Data Recognition Workshop

Pilot Sprint: Intelligent Information Extraction

Build a prototype that automatically extracts structured data from unstructured documents in five days, from source analysis through pipeline development to evaluation and handover.

What does the workshop include? 

  • Analysis of unstructured source documents (e.g. HTML websites, PDF quotations, Word contracts) 
  • Definition of extraction targets with the relevant department (e.g. issuer, individual quotation items) and provision of several annotated reference documents 
  • Prototype development of the LLM-based pipeline for intelligent information extraction 
  • Evaluation and live demo with business users 
  • Handover of the PoC (technical documentation, architecture diagrams, deployment template, next-steps plan for the MVP) 
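
To make the steps above more concrete, here is a minimal sketch of how extraction targets and the evaluation against annotated reference documents could be expressed. It is an illustration only: the class names (Quotation, QuotationItem), the function field_accuracy, and the exact-match scoring rule are assumptions for this example and not a description of the pipeline built in the workshop.

```python
from dataclasses import dataclass, field


# Hypothetical extraction targets for a quotation document.
# The field names are illustrative assumptions, not a fixed schema.
@dataclass(frozen=True)
class QuotationItem:
    description: str
    quantity: int
    unit_price: float


@dataclass
class Quotation:
    issuer: str
    items: list[QuotationItem] = field(default_factory=list)


def field_accuracy(
    predicted: list[Quotation], reference: list[Quotation]
) -> dict[str, float]:
    """Return, per target field, the share of documents whose extracted
    value matches the annotated reference (assumed scoring rule)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one reference document is required")

    total = len(reference)
    pairs = list(zip(predicted, reference))

    # Issuer names are compared case-insensitively, ignoring outer whitespace.
    issuer_hits = sum(
        p.issuer.strip().lower() == r.issuer.strip().lower() for p, r in pairs
    )
    # Item lists must match exactly, including order.
    item_hits = sum(p.items == r.items for p, r in pairs)

    return {"issuer": issuer_hits / total, "items": item_hits / total}
```

In this sketch, the annotated reference documents act as the test dataset, and the per-field scores are the kind of figure an evaluation report could contain. A real evaluation would likely need more tolerant matching, for example for reordered items or rounded prices.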

By the end of the workshop you will receive: 

  • Prototype extraction pipeline for the defined document types 
  • Test dataset with annotations and evaluation report 
  • Complete source code with documentation, plus architecture and data flow diagrams 
  • Next-steps plan, effort estimate, and ROI calculation for the MVP 

Who is this workshop for? 

  • Departments with high document volumes (e.g. accounting, procurement) 
  • Data entry and process automation teams 
  • IT and data engineering managers