Last updated: 2026-04-25
What auto-extraction does
When you upload a statement, Anchorlet doesn't store the file as-is. It parses the contents into structured rows so you can search, sort, and run reports across months and properties.
What gets extracted
For every statement:
- Agency name (e.g. "Ray Cooke Auctioneers")
- Period start + end (the month the statement covers)
- Totals — rent received, management fee, expenses, net to landlord, carried balance
For every per-property row:
- Property address as written by the agency (the "raw" string)
- Invoice number + date
- Rent received, management fee, expenses deducted, balance transferred, property balance
- Match confidence — how sure Anchorlet is that the row maps to a specific property in your workspace
For every expense line:
- Expense date, description, raw property address, net / VAT / total amounts, invoice id
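To make the three levels concrete, the extracted fields could be modelled roughly as below. This is only an illustrative sketch: the class names, field names, the use of Decimal for money, and the 0.0 to 1.0 confidence scale are assumptions, not Anchorlet's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class ExpenseItem:
    # One expense line (hypothetical field names).
    expense_date: date
    description: str
    raw_property_address: str
    net: Decimal
    vat: Decimal
    total: Decimal
    invoice_id: Optional[str] = None


@dataclass
class StatementEntry:
    # One per-property row.
    raw_address: str  # address exactly as the agency wrote it
    invoice_number: Optional[str]
    invoice_date: Optional[date]
    rent_received: Decimal
    management_fee: Decimal
    expenses_deducted: Decimal
    balance_transferred: Decimal
    property_balance: Decimal
    match_confidence: float  # assumed to be on a 0.0-1.0 scale


@dataclass
class Statement:
    # Statement-level fields plus the rows and expense lines under it.
    agency_name: str
    period_start: date
    period_end: date
    rent_received: Decimal
    management_fee: Decimal
    expenses: Decimal
    net_to_landlord: Decimal
    carried_balance: Decimal
    entries: List[StatementEntry] = field(default_factory=list)
    expense_items: List[ExpenseItem] = field(default_factory=list)
```

Keeping the raw address string next to a separate confidence value lets a low-confidence match be reviewed later without losing what the agency originally wrote.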
XLSX vs PDF
- XLSX uses a deterministic parser. It is fast (under a second) and exact, but it only works for known formats.
- PDF uses an LLM (Claude Opus 4.7). It is slower (10–30 seconds) but handles any layout, provided the PDF has a text layer. The LLM is given a strict JSON schema to follow, so its output has the same shape as the XLSX path's.
Both paths write to the same three tables (statements, statement_entries, and statement_expense_items), so the rest of the app doesn't need to know how a statement was ingested.
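The idea of two parsers feeding one shared shape can be sketched as a small dispatcher. Everything here is hypothetical: the extract function, the REQUIRED_KEYS set, and the stand-in parsers are illustrative names, and the real parsers would read the file rather than return fixed data.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical top-level keys, mirroring the three tables named above.
REQUIRED_KEYS = {"statement", "statement_entries", "statement_expense_items"}

Parser = Callable[[str], dict]


def extract(path: str, parsers: Dict[str, Parser]) -> dict:
    """Route a file to a parser by extension and check the shared shape."""
    suffix = Path(path).suffix.lower()
    if suffix not in parsers:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    result = parsers[suffix](path)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"parser output is missing keys: {sorted(missing)}")
    return result


# Stand-in parsers for illustration only; real ones would read the file.
def fake_xlsx_parser(path: str) -> dict:
    return {
        "statement": {"source": "xlsx"},
        "statement_entries": [],
        "statement_expense_items": [],
    }


def fake_pdf_parser(path: str) -> dict:
    return {
        "statement": {"source": "pdf"},
        "statement_entries": [],
        "statement_expense_items": [],
    }


PARSERS = {".xlsx": fake_xlsx_parser, ".pdf": fake_pdf_parser}
```

Calling extract("march.xlsx", PARSERS) and extract("march.pdf", PARSERS) returns dictionaries with the same keys, so downstream code can treat both the same way.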
What doesn't get extracted
- Free-text notes the agency wrote in cells (these are kept as raw text but not indexed).
- Diagrams, scans, or images embedded in PDFs. Only the text layer is read.
- Anything outside the standard rent/fee/expense grid. If your agency includes deposits, mortgage payments, or capital expenditures inline, those land in the raw_extract blob but aren't surfaced in the totals.
Reconciliation
After ingest, Anchorlet checks that the per-row totals add up to the statement-level totals. If they don't, you'll see a Reconciliation mismatch banner on the statement detail page. This usually means the agency's spreadsheet had a manual override or a row that doesn't fit the standard shape. The extracted data is still saved; the banner only flags the discrepancy.
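A check of this kind can be sketched as follows. The field names, the choice of which totals are compared, and the one-cent tolerance are assumptions for illustration; the actual rules Anchorlet applies are not documented here.

```python
from decimal import Decimal
from typing import Dict, List

# Assumed mapping: statement-level total -> per-row field summed against it.
FIELD_MAP = {
    "rent_received": "rent_received",
    "management_fee": "management_fee",
    "expenses": "expenses_deducted",
}


def reconcile(
    statement_totals: Dict[str, Decimal],
    entries: List[Dict[str, Decimal]],
    tolerance: Decimal = Decimal("0.01"),
) -> Dict[str, Decimal]:
    """Return {total_name: row_sum - statement_total} for each mismatch."""
    mismatches = {}
    for total_name, row_field in FIELD_MAP.items():
        row_sum = sum((row[row_field] for row in entries), Decimal("0"))
        diff = row_sum - statement_totals[total_name]
        if abs(diff) > tolerance:
            mismatches[total_name] = diff
    return mismatches


totals = {
    "rent_received": Decimal("2000.00"),
    "management_fee": Decimal("200.00"),
    "expenses": Decimal("150.00"),
}
rows = [
    {"rent_received": Decimal("1200.00"), "management_fee": Decimal("120.00"),
     "expenses_deducted": Decimal("100.00")},
    {"rent_received": Decimal("800.00"), "management_fee": Decimal("80.00"),
     "expenses_deducted": Decimal("0.00")},
]
result = reconcile(totals, rows)  # only "expenses" differs, by -50.00
```

An empty result means the rows reconcile; a non-empty one is the situation that would trigger the banner. Using Decimal rather than float avoids rounding noise being reported as a mismatch.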
Cost + telemetry
Each PDF extraction is a single Opus 4.7 call. Token cost is logged to usage_logs and visible at Settings → Usage. XLSX parses are local and free.