What AI Agents in Finance Actually Look Like in Production
Most AI agents in finance are demos. Here's what 200+ pipelines across 100+ entities in production actually look like — and why foundation beats model.
Everyone is shipping AI agents in finance. Almost nobody is running them in production.
The space is loud right now. Decks. Pilots. Sandboxes. Slide-deck case studies. Vendors and Big 4 firms cluster around the same demo loop: a chatbot that summarizes your P&L, a "copilot" that drafts a journal entry, a 90-day pilot that ends with a polite report and a quiet death when the budget cycle resets.
Production is a different category. A production agent reconciles on-chain and off-chain transactions at 3am, across 100+ legal entities and multiple banking partners. It catches exceptions and routes them to the right person. It logs a clean audit trail. It runs whether anyone is watching. It ran last night, and it will run tonight.
That's not the same thing as a demo.
I've built both. The demo is easy. The production version is hard for one specific reason most vendors won't tell you: you cannot run agents on top of a broken ledger. The agents surface garbage faster than the humans they replaced. Then the project dies and someone says "AI doesn't work in finance."
AI works in finance. The prerequisite is the foundation underneath it.
What "production" actually means
Let's be precise. A production AI agent inside a finance function meets a few non-negotiable tests:
- It runs on a schedule, not on a click. Triggered by data arrival, not by a human pressing a button.
- It handles exceptions. Not "calls a human for help on every edge case." Catches the exception, classifies it, routes it, escalates only what needs judgment.
- It produces an audit trail. Every action, every transformation, every override logged with provenance — sourced back to the journal entry.
- It survives the calendar. Period close, year-end, audit fieldwork, fiscal-year changes, regulatory updates. The agent doesn't break when the world around it changes.
If any of those is missing, you're looking at a pilot, not production.
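The first three tests can be made concrete in a few lines. This is a minimal, hypothetical sketch — the function names (`run_pipeline`, `classify`, `log`) and the classification thresholds are illustrative, not the actual system — but it shows the shape: a run triggered by data arrival, exceptions classified and routed rather than raised, and every action appended to an audit log with a timestamp.

```python
import datetime as dt

AUDIT_LOG = []  # in production this would be an append-only store, not a list

def log(action, detail):
    """Append an audit record with provenance: what happened, to what, when."""
    AUDIT_LOG.append({
        "ts": dt.datetime.now(dt.timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
    })

def classify(exc):
    """Classify an exception so it can be routed, not dumped on a human."""
    if exc["amount_diff"] == 0:
        return "timing"      # will settle itself; no human needed
    if abs(exc["amount_diff"]) < 1.00:
        return "rounding"    # auto-adjust within policy
    return "judgment"        # escalate to a controller

def run_pipeline(bank_rows, book_rows):
    """One scheduled run: auto-match, classify exceptions, route, log everything."""
    book_by_ref = {r["ref"]: r for r in book_rows}
    for bank in bank_rows:
        book = book_by_ref.get(bank["ref"])
        if book and book["amount"] == bank["amount"]:
            log("auto_matched", bank["ref"])
            continue
        diff = bank["amount"] - (book["amount"] if book else 0.0)
        kind = classify({"ref": bank["ref"], "amount_diff": diff})
        log("exception_routed", {"ref": bank["ref"], "kind": kind})
    return AUDIT_LOG
```

Note what the sketch does *not* do: it never blocks waiting for a click, and it never swallows an exception silently — every row ends up either matched or routed, and both outcomes are logged.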
The number that matters: 200+
At one of the world's largest crypto exchanges, I built and ran 200+ automated pipelines across accounting, treasury, and on-chain data. Not proofs of concept. Not pilots. Production. Every day. Across 100+ legal entities, multiple blockchains, multiple banking partners, and a Big 4 audit that asked hard questions.
Here's what those pipelines actually did:
- Bank-to-book reconciliation. Every entity, every account, every day. Auto-matched, exception-routed, controller-reviewed.
- On-chain to off-chain matching. Crypto transactions tied back to their accounting entries with full provenance, across every chain we operated on.
- Intercompany eliminations. Automated, posted, documented — no spreadsheet warriors.
- Treasury rebalancing. 100% of the moves automated, with AI agents handling routing decisions inside policy guardrails.
- Variance analysis. Anomalies flagged with explanations, sourced back to the underlying transactions, before the controller had to ask.
- Regulatory reporting. Schedules generated, validated, and queued for review — not assembled from scratch every quarter.
None of that is glamorous. All of it is the work that turns AI in finance from a slide into a system.
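Of the pipelines above, variance analysis is the easiest to sketch. The version below is a hypothetical illustration, not the production code — the data model and the 25% threshold are assumptions — but it captures the key design choice: every flag carries the IDs of the transactions that drove it, so the explanation is attached before the controller has to ask.

```python
from collections import defaultdict

def variance_flags(prior, current, txns, threshold=0.25):
    """Compare account balances period-over-period; explain material moves.

    prior/current: {account: balance}; txns: current-period transactions,
    each with an id, account, and amount.
    """
    txns_by_account = defaultdict(list)
    for t in txns:
        txns_by_account[t["account"]].append(t)

    flags = []
    for account, now in current.items():
        before = prior.get(account, 0.0)
        if before == 0:
            continue  # new accounts need a different rule, not a ratio
        change = (now - before) / abs(before)
        if abs(change) >= threshold:
            # Largest drivers first, so the reviewer sees the "why" immediately
            drivers = sorted(txns_by_account[account],
                             key=lambda t: abs(t["amount"]), reverse=True)[:3]
            flags.append({"account": account,
                          "change_pct": round(change * 100, 1),
                          "drivers": [t["id"] for t in drivers]})
    return flags
```

The output is deliberately small: account, size of the move, and the top underlying transactions — enough for a human to confirm or dig deeper in one step.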
Why most AI agent projects fail in finance
The pattern is consistent. A finance team buys an AI agent on top of a stack that looks like this:
- An ERP held together by CSV exports
- A data lake that's actually a Google Drive folder

- Reconciliation that runs in someone's personal spreadsheet
- Controls documentation that hasn't been touched since the last audit
The agent gets deployed. It surfaces every contradiction the humans had been quietly papering over. The output looks worse than the manual process — not because the agent is wrong, but because the data underneath it was always wrong. The humans were just hiding it.
That's why we built the work around three pillars in this exact order: Lock the Ledger → Kill the Month-End → Ask the Books. Foundation first. Engine second. Output last. Skip the foundation and the engine grinds itself apart.
Lock the ledger: why the foundation comes first
This is the pillar that's easiest to under-invest in and most painful to skip.
Lock the Ledger means one ledger — every entity, every chain, every bank — reconciled, matched, and provable before your team logs in. When the auditor asks "where did this number come from," the answer is one click, not a forty-tab workbook. When the CFO asks "what's our cash position right now," the answer is a number, not a meeting.
That is the prerequisite. Everything an AI agent does inside a finance function — drafting journal entries, surfacing variance, answering plain-English questions, generating regulatory schedules — depends on the ledger being trustworthy in real time. If the underlying data is stale, fragmented, or unreconciled, every downstream agent inherits the rot.
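The "one click, not a forty-tab workbook" claim rests on a simple data-model decision, sketched below with hypothetical names: every reported figure keeps the IDs of the journal entries that produced it, so drilling down is a lookup, not a reconstruction.

```python
def build_report_line(entries):
    """Aggregate journal entries into one reported figure with provenance."""
    total = sum(e["amount"] for e in entries)
    return {"amount": total, "provenance": [e["id"] for e in entries]}

def drill_down(report_line, ledger):
    """The one-click answer: the entries behind a reported number."""
    return [ledger[i] for i in report_line["provenance"]]
```

If provenance is captured at aggregation time, "where did this number come from" never requires rebuilding the calculation; if it isn't, every audit question becomes archaeology.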
I have watched firms try to skip this step. They buy the agent, deploy it on top of a broken stack, and within three months the agent is quietly disabled. The post-mortem blames "the model." The model wasn't the problem. The problem was that there was no ledger to lock.
The order of operations
If you are a CFO or COO evaluating AI agents in finance right now, here is the honest sequence:
- Locate the gaps in your foundation. Where is reconciliation manual? Where do humans bridge between systems? Where does the close depend on a spreadsheet someone built in a panic? That map is the work.
- Lock the ledger before you deploy agents. Get the data layer right — every entity, every chain, every bank account, into one reconciled view. This is plumbing. It is not glamorous. It is the only thing that matters.
- Layer agents into the workflows the ledger now supports. Recurring journal entries. Variance analysis. Anomaly routing. Reconciliation exceptions. Each agent inherits the trust of the foundation underneath it.
- Open the books to plain-English Q&A last. When the data is clean and the agents are running, asking the books a question becomes trivial. When the data is dirty, the answers are dangerous.
It is the order we ran inside an institutional crypto finance function with 100+ entities and audit obligations across multiple jurisdictions, and it is the order we run for every client that engages us.
What this means for your team
If you have been told that an AI agent will fix your month-end close, ask one question: what does the foundation underneath it look like?
If the answer is "the agent integrates with your existing systems," translate that as "we deploy on top of whatever you have and hope for the best." That's pilot risk. That's how AI in finance gets a bad reputation.
If the answer is "we build the ledger first, then the agents," that's production. That's the only version that's still running a year from now.
We have done it. It is the methodology we run. Lock the ledger first. The agents come second. The questions answer themselves last. That is the order. That is the work.
Related: What Is Zero-Day Close — And Why Every CFO Should Care — what happens to the close cycle once the ledger is locked.
Let's Map It.
Start with a diagnostic — a fixed-scope assessment that maps every gap between where your finance function is and where it needs to be.