Features
Everything CodexPDF provides
Built for teams that need an authoritative facts contract for PDFs, not duplicated extraction logic across services.
Boundary clarity
Read-only extraction by design
codexPDF focuses on facts extraction and avoids hidden product behavior.
-
No rendering mutations or rule adjudication in extraction.
-
Consumer-agnostic payload shape for all callers.
-
Stable extraction boundary for long-term system evolution.
Contract-first model
CodexDocument as the source of truth
Downstream systems consume one shared facts contract.
-
Versioned CodexDocument root model.
-
Schema files published under schemas/v1.
-
SemVer policy aligns schema evolution with runtime usage.
Validation and trust
Schema-validated workflows
Validate outputs in local tooling and CI to catch drift early.
-
CLI validation against published schema bundles.
-
Portable JSON payloads for reproducible checks.
-
Contract-aware output for downstream adapters.
Operational tooling
CLI and parity built in
Designed for real workflows beyond one-off extraction.
-
extract/probe for raw and lightweight summaries.
-
parity profiles for baseline comparison.
-
baseline command mode for external adapter checks.