Data Recognition Workshop

Pilot Sprint: Intelligent Information Extraction

In 5 days, we build a prototype that automatically extracts structured data from unstructured documents: from source analysis through pipeline development to evaluation and handover.

What does the workshop include?

Analysis of unstructured source documents (e.g. HTML websites, PDF quotations, Word contracts)
Definition of the extraction targets together with the relevant department (e.g. issuer, individual quotation line items) and preparation of several annotated reference documents
Development of a prototype LLM-based pipeline for intelligent information extraction
Evaluation and live demo with business users
Handover of the proof of concept (PoC): technical documentation, architecture diagrams, deployment template, and next-steps plan for the minimum viable product (MVP)

At the end of the workshop, you will receive:

Prototype extraction pipeline for the defined document types
Test dataset with annotations and evaluation report
Complete source code with documentation, architecture, and data flow diagrams
Next-steps plan, effort estimate, and ROI calculation for the MVP

Who is this workshop for?

Departments with high document volumes (e.g. accounting, procurement)
Data entry and process automation teams
IT and data engineering managers