CodexPDF CodexPDF

Features

Everything CodexPDF provides

Built for teams that need an authoritative facts contract for PDFs, not duplicated extraction logic across services.

Boundary clarity

Read-only extraction by design

codexPDF focuses on facts extraction and avoids hidden product behavior.

  • No rendering mutations or rule adjudication in extraction.

  • Consumer-agnostic payload shape for all callers.

  • Stable extraction boundary for long-term system evolution.

Contract-first model

CodexDocument as the source of truth

Downstream systems consume one shared facts contract.

  • Versioned CodexDocument root model.

  • Schema files published under schemas/v1.

  • SemVer policy aligns schema evolution with runtime usage.

Validation and trust

Schema-validated workflows

Validate outputs in local tooling and CI to catch drift early.

  • CLI validation against published schema bundles.

  • Portable JSON payloads for reproducible checks.

  • Contract-aware output for downstream adapters.

Operational tooling

CLI and parity built in

Designed for real workflows beyond one-off extraction.

  • extract/probe for raw and lightweight summaries.

  • parity profiles for baseline comparison.

  • baseline command mode for external adapter checks.