What AI Agents in Finance Actually Look Like in Production
Most AI agents in finance are demos. Here's what 200+ pipelines across 100+ entities in production actually look like — and why foundation beats model.
Everyone is shipping AI agents in finance. Almost nobody is running them in production.
The space is loud right now. Decks. Pilots. Sandboxes. Slide-deck case studies. Vendors and Big 4 firms cluster around the same demo loop: a chatbot that summarizes your P&L, a "copilot" that drafts a journal entry, a 90-day pilot that ends with a polite report and a quiet death when the budget cycle resets.
Production is a different category. A production agent reconciles on-chain and off-chain transactions at 3am, across 100+ legal entities and multiple banking partners. It catches exceptions and routes them to the right person. It logs a clean audit trail. It runs whether anyone is watching. It ran last night, and it will run tonight.
That's not the same thing as a demo.
I've built both. The demo is easy. The production version is hard for one specific reason most vendors won't tell you: you cannot run agents on top of a broken ledger. The agents surface garbage faster than the humans they replaced. Then the project dies and someone says "AI doesn't work in finance."
AI works in finance. The prerequisite is the foundation underneath it.
What "production" actually means
Let's be precise. A production AI agent inside a finance function meets a few non-negotiable tests:
- It runs on a schedule, not on a click. Triggered by data arrival, not by a human pressing a button.
- It handles exceptions. Not "calls a human for help on every edge case." Catches the exception, classifies it, routes it, escalates only what needs judgment.
- It produces an audit trail. Every action, every transformation, every override logged with provenance — sourced back to the journal entry.
- It survives the calendar. Period close, year-end, audit fieldwork, fiscal-year changes, regulatory updates. The agent doesn't break when the world around it changes.
If any of those is missing, you're looking at a pilot, not production.
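The first three tests can be made concrete in a few lines. This is a minimal, hypothetical sketch — the function names (`run_pipeline`, `classify`, `log`) and the classification thresholds are illustrative, not the actual system — but it shows the shape: a run triggered by data arrival, exceptions classified and routed rather than raised, and every action appended to an audit log with a timestamp.

```python
import datetime as dt

AUDIT_LOG = []  # in production this would be an append-only store, not a list

def log(action, detail):
    """Append an audit record with provenance: what happened, to what, when."""
    AUDIT_LOG.append({
        "ts": dt.datetime.now(dt.timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
    })

def classify(exc):
    """Classify an exception so it can be routed, not dumped on a human."""
    if exc["amount_diff"] == 0:
        return "timing"      # will settle itself; no human needed
    if abs(exc["amount_diff"]) < 1.00:
        return "rounding"    # auto-adjust within policy
    return "judgment"        # escalate to a controller

def run_pipeline(bank_rows, book_rows):
    """One scheduled run: auto-match, classify exceptions, route, log everything."""
    book_by_ref = {r["ref"]: r for r in book_rows}
    for bank in bank_rows:
        book = book_by_ref.get(bank["ref"])
        if book and book["amount"] == bank["amount"]:
            log("auto_matched", bank["ref"])
            continue
        diff = bank["amount"] - (book["amount"] if book else 0.0)
        kind = classify({"ref": bank["ref"], "amount_diff": diff})
        log("exception_routed", {"ref": bank["ref"], "kind": kind})
    return AUDIT_LOG
```

Note what the sketch does *not* do: it never blocks waiting for a click, and it never swallows an exception silently — every row ends up either matched or routed, and both outcomes are logged.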
The number that matters: 200+
At one of the world's largest crypto exchanges, I built and ran 200+ automated pipelines across accounting, treasury, and on-chain data. Not proofs of concept. Not pilots. Production. Every day. Across 100+ legal entities, multiple blockchains, multiple banking partners, and a Big 4 audit that asked hard questions.
Here's what those pipelines actually did:
- Bank-to-book reconciliation. Every entity, every account, every day. Auto-matched, exception-routed, controller-reviewed.
- On-chain to off-chain matching. Crypto transactions tied back to their accounting entries with full provenance, across every chain we operated on.
- Intercompany eliminations. Automated, posted, documented — no spreadsheet warriors.
- Treasury rebalancing. 100% of the moves automated, with AI agents handling routing decisions inside policy guardrails.
- Variance analysis. Anomalies flagged with explanations, sourced back to the underlying transactions, before the controller had to ask.
- Regulatory reporting. Schedules generated, validated, and queued for review — not assembled from scratch every quarter.
None of that is glamorous. All of it is the work that turns AI in finance from a slide into a system.
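Of the pipelines above, variance analysis is the easiest to sketch. The version below is a hypothetical illustration, not the production code — the data model and the 25% threshold are assumptions — but it captures the key design choice: every flag carries the IDs of the transactions that drove it, so the explanation is attached before the controller has to ask.

```python
from collections import defaultdict

def variance_flags(prior, current, txns, threshold=0.25):
    """Compare account balances period-over-period; explain material moves.

    prior/current: {account: balance}; txns: current-period transactions,
    each with an id, account, and amount.
    """
    txns_by_account = defaultdict(list)
    for t in txns:
        txns_by_account[t["account"]].append(t)

    flags = []
    for account, now in current.items():
        before = prior.get(account, 0.0)
        if before == 0:
            continue  # new accounts need a different rule, not a ratio
        change = (now - before) / abs(before)
        if abs(change) >= threshold:
            # Largest drivers first, so the reviewer sees the "why" immediately
            drivers = sorted(txns_by_account[account],
                             key=lambda t: abs(t["amount"]), reverse=True)[:3]
            flags.append({"account": account,
                          "change_pct": round(change * 100, 1),
                          "drivers": [t["id"] for t in drivers]})
    return flags
```

The output is deliberately small: account, size of the move, and the top underlying transactions — enough for a human to confirm or dig deeper in one step.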
Why most AI agent projects fail in finance
The pattern is consistent. A finance team buys an AI agent on top of a stack that looks like this:
- An ERP held together by CSV exports
- A data lake that's actually a Google Drive folder

- Reconciliation that runs in someone's personal spreadsheet
- Controls documentation that hasn't been touched since the last audit
The agent gets deployed. It surfaces every contradiction the humans had been quietly papering over. The output looks worse than the manual process — not because the agent is wrong, but because the data underneath it was always wrong. The humans were just hiding it.
That's why we built the work around three pillars in this exact order: Lock the Ledger → Kill the Month-End → Ask the Books. Foundation first. Engine second. Output last. Skip the foundation and the engine grinds itself apart.
Lock the ledger: why the foundation comes first
This is the pillar that's easiest to under-invest in and most painful to skip.
Lock the Ledger means one ledger — every entity, every chain, every bank — reconciled, matched, and provable before your team logs in. When the auditor asks "where did this number come from," the answer is one click, not a forty-tab workbook. When the CFO asks "what's our cash position right now," the answer is a number, not a meeting.
That is the prerequisite. Everything an AI agent does inside a finance function — drafting journal entries, surfacing variance, answering plain-English questions, generating regulatory schedules — depends on the ledger being trustworthy in real time. If the underlying data is stale, fragmented, or unreconciled, every downstream agent inherits the rot.
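The "one click, not a forty-tab workbook" claim rests on a simple data-model decision, sketched below with hypothetical names: every reported figure keeps the IDs of the journal entries that produced it, so drilling down is a lookup, not a reconstruction.

```python
def build_report_line(entries):
    """Aggregate journal entries into one reported figure with provenance."""
    total = sum(e["amount"] for e in entries)
    return {"amount": total, "provenance": [e["id"] for e in entries]}

def drill_down(report_line, ledger):
    """The one-click answer: the entries behind a reported number."""
    return [ledger[i] for i in report_line["provenance"]]
```

If provenance is captured at aggregation time, "where did this number come from" never requires rebuilding the calculation; if it isn't, every audit question becomes archaeology.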
I have watched firms try to skip this step. They buy the agent, deploy it on top of a broken stack, and within three months the agent is quietly disabled. The post-mortem blames "the model." The model wasn't the problem. The problem was that there was no ledger to lock.
The order of operations
If you are a CFO or COO evaluating AI agents in finance right now, here is the honest sequence:
- Locate the gaps in your foundation. Where is reconciliation manual? Where do humans bridge between systems? Where does the close depend on a spreadsheet someone built in a panic? That map is the work.
- Lock the ledger before you deploy agents. Get the data layer right — every entity, every chain, every bank account, into one reconciled view. This is plumbing. It is not glamorous. It is the only thing that matters.
- Layer agents into the workflows the ledger now supports. Recurring journal entries. Variance analysis. Anomaly routing. Reconciliation exceptions. Each agent inherits the trust of the foundation underneath it.
- Open the books to plain-English Q&A last. When the data is clean and the agents are running, asking the books a question becomes trivial. When the data is dirty, the answers are dangerous.
It is the order we ran inside an institutional crypto finance function with 100+ entities and audit obligations across multiple jurisdictions, and it is the order we run for every client that engages us.
What this means for your team
If you have been told that an AI agent will fix your month-end close, ask one question: what does the foundation underneath it look like?
If the answer is "the agent integrates with your existing systems," translate that as "we deploy on top of whatever you have and hope for the best." That's pilot risk. That's how AI in finance gets a bad reputation.
If the answer is "we build the ledger first, then the agents," that's production. That's the only version that's still running a year from now.
We have done it. It is the methodology we run. Lock the ledger first. The agents come second. The questions answer themselves last. That is the order. That is the work.
Related: What Is Zero-Day Close — And Why Every CFO Should Care — what happens to the close cycle once the ledger is locked.
Let's Map It.
Start with a diagnostic — a fixed-scope assessment that maps every gap between where your finance function is and where it needs to be.