An MCP server that ingests documents from anywhere, classifies them, stacks them in your reviewer's exact order, and extracts fields with grounded, page-level provenance. Built on LandingAI ADE, for regulated finance and healthcare.
Loan files, claims, intake packets. A pile of PDFs that someone has to read, sort, key in, and sign off on. A wrong field isn't a typo; it's a compliance event. Traditional OCR gives you a wall of text. Fluent LLMs give you confident answers with no way to check them. Neither is something you can put in front of an examiner.
Documents in from any source, a review-ready package out. Every grounded value points back to its page and box, and anything ungrounded is flagged for a human.
Ingest files from an upload portal, SFTP/S3, email, or an export. The pipeline is source-agnostic.
Detect each document's type to pick the right schema and stack slot.
LandingAI ADE returns typed fields with page, box, and confidence on grounded values.
Order the set exactly the way your reviewer expects. Configurable per use case.
A combined PDF cover sheet tied to source pages, plus a JSON sidecar.
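The stacking step described above can be sketched in a few lines. This is a minimal illustration, not idpflow-core's actual API: `ClassifiedDoc`, `stack`, `MORTGAGE_PROFILE`, and the document-type names are all hypothetical, and it assumes classification has already assigned each file a type.

```python
from dataclasses import dataclass

# Hypothetical stacking profile: document types in the order a reviewer expects.
# These type names are illustrative, not idpflow-core's actual identifiers.
MORTGAGE_PROFILE = ["1003", "paystub", "w2", "bank_statement", "id"]


@dataclass
class ClassifiedDoc:
    filename: str
    doc_type: str  # assumed output of the classification step


def stack(docs: list[ClassifiedDoc], profile: list[str]) -> list[ClassifiedDoc]:
    """Order docs by their slot in the profile; unknown types go last."""
    slot = {doc_type: i for i, doc_type in enumerate(profile)}
    # sorted() is stable, so documents of the same type keep their arrival order.
    return sorted(docs, key=lambda d: slot.get(d.doc_type, len(profile)))


docs = [
    ClassifiedDoc("scan_03.pdf", "bank_statement"),
    ClassifiedDoc("scan_01.pdf", "paystub"),
    ClassifiedDoc("scan_04.pdf", "unknown"),
    ClassifiedDoc("scan_02.pdf", "1003"),
]
stacked = stack(docs, MORTGAGE_PROFILE)
print([d.filename for d in stacked])
```

Keeping the profile as plain data is one way to make the order configurable per use case: a healthcare intake profile would be a different list passed to the same function.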
Built on the one idea that makes document AI usable where the stakes are real: a machine-read value should always point back to where it came from.
Grounded values carry page, bounding box, and confidence. Examinable by default.
Ungrounded or low-confidence values are flagged for human review, not trusted silently.
Assemble any document set into the exact order a reviewer wants.
Works on any list of files, however they arrived. Connectors are thin adapters.
One MCP core, called from Claude, Lyzr, LangGraph, CrewAI, or run on Databricks.
It outputs data and a review queue. It never approves, denies, scores, or ranks.
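To make "connectors are thin adapters" concrete, here is a rough sketch of what such an adapter could look like. The `Connector` protocol, `FolderConnector`, and `collect` are hypothetical names chosen for illustration; the project's real connector interface may differ.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Connector(Protocol):
    """Hypothetical adapter interface: anything that can yield local file paths."""

    def list_files(self) -> list[Path]: ...


class FolderConnector:
    """Illustrative connector that reads PDFs from a local folder."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def list_files(self) -> list[Path]:
        # Sort so the result is deterministic regardless of filesystem order.
        return sorted(self.root.glob("*.pdf"))


def collect(connector: Connector) -> list[Path]:
    """The core only ever sees a list of files, however they arrived."""
    return connector.list_files()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.pdf").write_bytes(b"%PDF-1.4")
    (root / "a.pdf").write_bytes(b"%PDF-1.4")
    (root / "notes.txt").write_text("ignored")
    names = [p.name for p in collect(FolderConnector(root))]

print(names)
```

Under this assumption, an SFTP, S3, or email connector would only need to download files to a working directory and return their paths.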
Not promised on a slide. The properties an auditor cares about are structural.
# A grounded field comes back like this
{
  "name": "borrower.income",
  "value": "6,500.00",
  "grounded": true,
  "confidence": 0.98,
  "source_doc": "paystub",
  "page": 1,
  "needs_review": false
}

# An ungrounded value is surfaced, not trusted
{
  "name": "account.holder",
  "grounded": false,
  "needs_review": true
}
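A downstream consumer might route records like the two above into a review queue. The sketch below is an assumption about how that rule could be written, not the project's implementation: the `needs_review` function and the `MIN_CONFIDENCE` cutoff of 0.90 are hypothetical, and the field names are taken from the sample records.

```python
import json

# Assumed cutoff for illustration only; the real threshold is a policy choice.
MIN_CONFIDENCE = 0.90

records = json.loads("""
[
  {"name": "borrower.income", "value": "6,500.00", "grounded": true,
   "confidence": 0.98, "source_doc": "paystub", "page": 1},
  {"name": "account.holder", "grounded": false}
]
""")


def needs_review(field: dict) -> bool:
    """Flag a field unless it is grounded and at or above the confidence cutoff."""
    if not field.get("grounded", False):
        return True
    # A grounded field with no confidence reported is treated as low-confidence.
    return field.get("confidence", 0.0) < MIN_CONFIDENCE


review_queue = [f["name"] for f in records if needs_review(f)]
unflagged = [f["name"] for f in records if not needs_review(f)]
print("review:", review_queue)
print("unflagged:", unflagged)
```

The rule only sorts fields into flagged and unflagged; it makes no decision about the underlying file, which stays with the human reviewer.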
The pattern that orders a mortgage credit file orders a healthcare intake or a claims packet.
Stack a credit file (1003, paystubs, W-2, bank statements, ID), extract income, identity, and collateral fields with provenance, and hand a reviewer a decision-ready package.
Order an intake, prior-auth, or claims packet, extract the fields a reviewer needs, flag anything ungrounded.
Turn a folder of PDFs into stacked, grounded, audit-ready data your team can act on.
No API key required. Stub mode runs the full pipeline on synthetic data so you can wire it up before spending a cent on ADE. Add your LandingAI key for live extraction.
# clone, install, run, no key needed
git clone https://github.com/rdmurugan/idpflow-core.git
cd idpflow-core
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e .
python examples/make_sample_docs.py
python examples/direct_library.py  # stub mode, free
Licensed under Apache-2.0. I'm especially looking for new stacking profiles, extraction schemas, and connectors from people in lending, banking, and healthcare ops. Tell me what documents you're drowning in.