Data Recognition Workshop
Pilot Sprint: Intelligent Information Extraction
In 5 days, we build a prototype that automatically extracts structured data from unstructured documents, covering source analysis, pipeline development, evaluation, and handover.
What does the workshop include?
- Analysis of unstructured source documents (e.g. HTML websites, PDF quotations, Word contracts)
- Definition of extraction targets together with the relevant department (e.g. issuer, individual quotation items) and provision of several annotated reference documents
- Prototyping the LLM-based pipeline for intelligent information extraction
- Evaluation and live demo with business users
- Handover of the PoC (technical documentation, architecture diagrams, deployment template, next-steps plan for the MVP)
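To make the idea of "extraction targets" concrete, the following is a minimal sketch of how a target such as a quotation with an issuer and individual items might be written down as a schema. All class, field, and function names here (`Quotation`, `QuotationItem`, `parse_quotation`) are illustrative assumptions, not part of the workshop deliverables; the actual targets are defined with the department during the workshop.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class QuotationItem:
    """One line item of a quotation (hypothetical field names)."""
    description: str
    quantity: int
    unit_price: Decimal


@dataclass
class Quotation:
    """Extraction target for a quotation document (hypothetical schema)."""
    issuer: str
    items: list[QuotationItem] = field(default_factory=list)

    def total(self) -> Decimal:
        # Sum of quantity * unit price over all items.
        return sum((i.quantity * i.unit_price for i in self.items), Decimal("0"))


def parse_quotation(raw_json: str) -> Quotation:
    """Turn a JSON string (e.g. the pipeline's raw output) into a typed record.

    Raises KeyError or ValueError if a required field is missing or malformed,
    so incomplete extractions fail loudly instead of passing through silently.
    """
    data = json.loads(raw_json)
    items = [
        QuotationItem(
            description=str(item["description"]),
            quantity=int(item["quantity"]),
            # Go through str() so float prices do not pick up binary noise.
            unit_price=Decimal(str(item["unit_price"])),
        )
        for item in data["items"]
    ]
    return Quotation(issuer=str(data["issuer"]), items=items)


if __name__ == "__main__":
    sample = (
        '{"issuer": "ACME GmbH", "items": ['
        '{"description": "Widget", "quantity": 2, "unit_price": 19.99}]}'
    )
    print(parse_quotation(sample))
```

A typed schema of this kind gives the department and the engineering team a shared, checkable definition of what counts as a complete extraction.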
By the end of the workshop you will receive:
- Prototype extraction pipeline for the defined document types
- Test dataset with annotations and evaluation report
- Complete source code with documentation, plus architecture and data flow diagrams
- Next-steps plan, effort estimate, and ROI calculation for the MVP
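The evaluation report listed among the deliverables could, for example, rest on a field-level comparison between the pipeline output and the annotated test dataset. The sketch below assumes that both are available as flat dictionaries per document and uses exact-match accuracy per field; the function name `field_accuracy` and the data layout are assumptions made for illustration, and the metric actually used in a workshop may differ.

```python
def field_accuracy(predictions, references):
    """Return the share of exactly matching values per field.

    predictions and references are equally long lists of dicts, one dict per
    document, paired by position. A field counts only for documents whose
    reference annotation contains it; a missing prediction counts as wrong.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    field_names = sorted({name for ref in references for name in ref})
    report = {}
    for name in field_names:
        pairs = [(p, r) for p, r in zip(predictions, references) if name in r]
        correct = sum(1 for p, r in pairs if p.get(name) == r[name])
        report[name] = correct / len(pairs)
    return report


if __name__ == "__main__":
    predicted = [
        {"issuer": "ACME GmbH", "total": "44.98"},
        {"issuer": "Beta AG", "total": "10.00"},
    ]
    annotated = [
        {"issuer": "ACME GmbH", "total": "44.98"},
        {"issuer": "Beta Ltd", "total": "10.00"},
    ]
    for name, score in field_accuracy(predicted, annotated).items():
        print(f"{name}: {score:.0%}")
```

Reporting accuracy per field rather than per document shows which extraction targets already work reliably and which need further attention on the way to the MVP.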
Who is this workshop for?
- Departments with high document volumes (e.g. accounting, procurement)
- Data entry and process automation teams
- IT and data engineering managers