- Scope: agreed. Both find AI reaches roughly 18–20% of total wages. Not the source of the gap.
- Speedup: not agreed. Lab experiments find 1.2–1.6×. Real-world usage data shows 9–12×. That single difference drives a huge gap in aggregate impacts.
- Policy stakes: high either way. At 0.06 percentage points per year (pp/yr), AI is a gradual automation wave. At 0.7–1.2 pp/yr, it is the largest U.S. productivity surge since the 1990s.
Motivation
What is AI's actual impact on the economy today? It is one of the most important questions in economics right now, and surprisingly hard to answer. Two research teams looked at the same question and got answers more than an order of magnitude apart. MIT economist Daron Acemoglu estimated AI will add about 0.06 percentage points to annual productivity growth over the next decade. Anthropic's Economic Index, using real-world usage data from Claude, put the number at 0.7 to 1.8 percentage points per year. Same theoretical foundation. Same accounting framework. Very different conclusions.
The gap is not a mistake. It reflects a genuine disagreement about what AI actually does when people use it. This post walks through how each estimate is built, what they agree on, and where they part ways.
The Task-Based Framework
Both methods rest on the same theoretical foundation. Start with a simple idea: the economy is a bundle of tasks (writing, coding, analyzing, building, advising) performed by people and, increasingly, machines. AI expands what machines can do, shifting some tasks away from human labor. The question is how that shift translates into economy-wide productivity.
Two forces matter. First, tasks are complements, not standalone pieces. Speeding up one step does not raise total output one-for-one if other steps remain bottlenecks. A lawyer who drafts twice as fast still needs client meetings, review cycles, and court dates. That interdependence puts a ceiling on what partial automation can do. Second, AI’s impact is uneven across tasks: it tends to help most with routine cognitive work (drafting, coding, classification) and least where physical presence, interpersonal judgment, or deep context are essential.
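The bottleneck logic is the same as Amdahl's law: if only part of a workflow is accelerated, the unaccelerated steps cap the overall gain. A minimal sketch (the 50% drafting share and the 2× step speedup are illustrative assumptions, not figures from either study):

```python
def workflow_speedup(accelerated_share: float, step_speedup: float) -> float:
    """Overall speedup when only a fraction of a workflow is accelerated.

    Amdahl's-law-style bound: the remaining (1 - accelerated_share) of the
    work proceeds at the old pace and becomes the bottleneck.
    """
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / step_speedup)

# A lawyer who drafts twice as fast, where drafting is half the job:
print(workflow_speedup(0.5, 2.0))   # ~1.33x overall, not 2x

# Even an arbitrarily large speedup on drafting cannot push the
# whole job past 2x while the other half is unchanged:
print(workflow_speedup(0.5, 1e9))   # just under 2x
```

This is why partial automation of complementary tasks yields less than the per-task numbers suggest.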
Hulten's Theorem
Both teams use the same accounting framework to turn task-level observations into an economy-wide number. The idea is simple: if AI makes a set of tasks faster, the gain in total factor productivity (TFP measures the efficiency with which inputs are turned into output, and is often used as a proxy for economy-wide productivity) depends on two things: how large a slice of the economy those tasks represent, and how much faster AI makes them. This framework, known as Hulten’s theorem, can be written as:

ΔTFP ≈ s̄A × φ × sL × π̄

where:
- task scope (s̄A): wage-bill–weighted share of tasks AI can do
- feasibility (φ): share of those tasks where AI is cost-effective in practice
- labor share (sL): labor income share (≈0.65)
- productivity gain per task (π̄): average fraction of task time saved
Both approaches plug their data into this same formula. The scope and labor share inputs are nearly identical across the two studies, so the entire gap comes from a single input: the productivity gain per task, π̄. Acemoglu draws on controlled lab experiments; Anthropic draws on real usage data.
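The arithmetic can be made explicit. A sketch that plugs the round-number inputs quoted in this post into the Hulten-style formula (this reproduces the order of magnitude of the gap, not the published headline figures, which involve further adjustments):

```python
def tfp_gain(task_share: float, feasible_share: float,
             labor_share: float, time_saved: float) -> float:
    """Hulten-style aggregate TFP gain from task-level AI speedups:
    dTFP ~= s_A * phi * s_L * pi_bar."""
    return task_share * feasible_share * labor_share * time_saved

# Lab-experiment inputs (Acemoglu-style, per this post): ~19% wage-bill
# scope, ~1 in 4 capable tasks cost-effective, ~27% of task time saved.
lab = tfp_gain(0.19, 0.25, 0.65, 0.27)        # ~0.008, i.e. ~0.8% cumulative

# Usage-data inputs (Anthropic-style): same scope, no cost-effectiveness
# discount, a ~10x speedup implying ~90% of task time saved.
usage = tfp_gain(0.19, 1.0, 0.65, 1 - 1 / 10)  # ~0.11, i.e. ~11%

print(f"lab-based gain:   {lab:.2%}")
print(f"usage-based gain: {usage:.2%}")
print(f"ratio: {usage / lab:.0f}x")           # roughly an order of magnitude
```

With scope and labor share held fixed, the ratio between the two results is driven entirely by φ and π̄, which is the point of the comparison.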
Where the Two Estimates Diverge
How broadly does AI reach?
Both studies find that AI-capable tasks account for roughly 18–20% of total wages paid across the economy. Acemoglu asks which tasks a current AI model can do in principle; Anthropic looks at which tasks workers actually use Claude for. They measure different things (potential versus actual reach) and land at nearly the same number. Scope is not where the two estimates diverge.
How much does AI help when it's used?
This is where the gap opens up, and it is large.
Acemoglu draws on two carefully run experiments from 2022–2023. In one, professional writers using GPT-4 finished tasks about 40% faster than those working without it. In another, call-center workers with an AI tool resolved 14% more issues per hour. Averaging those two studies, AI cuts task time by roughly 27%, meaning a task that took an hour now takes about 45 minutes.
Anthropic's data tells a very different story. For each task that appears in Claude usage, the model estimates how long the task would take a professional without AI (in hours) and how long it actually took with Claude in the conversation (in minutes). Across thousands of tasks, Anthropic finds speedups of 9–12×: what took an hour without AI takes around 6–7 minutes with it. That difference, multiplied across the same 18–20% slice of the wage bill, produces almost the entire gap in headline estimates.
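The two conventions are easy to conflate: a speedup multiple and a fraction of time saved are related but not interchangeable. A 1.4× speedup saves far less than half the time, while a 10× speedup saves 90% of it. A small conversion sketch:

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of task time saved by a given speedup multiple.
    A task taking T minutes now takes T / speedup, so savings = 1 - 1/speedup."""
    return 1.0 - 1.0 / speedup

# The two ranges discussed in this post, applied to a 60-minute task:
for s in (1.2, 1.6, 9.0, 12.0):
    remaining = 60 / s
    print(f"{s:>5.1f}x speedup -> {time_saved_fraction(s):.0%} of time saved "
          f"(60-min task takes ~{remaining:.0f} min)")
```

The lab range (1.2–1.6×) corresponds to saving roughly 17–37% of task time, while the usage range (9–12×) corresponds to saving roughly 89–92%, which is why the same formula yields such different aggregates.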
Is cost-effectiveness assumed?
Acemoglu applies a discount: even where AI can do a task, only about 1 in 4 of those tasks are worth doing with AI given current wages and subscription costs. Think of a paralegal earning $40/hr being asked to use an AI tool for a 20-minute drafting task. If the AI saves 5 minutes but requires setup time and error review, the economics may not favor it. Anthropic skips this calculation entirely. Their reasoning: if a worker is already using Claude for something, they have already decided it is worth it. When Anthropic's analysts do apply adjustments — accounting for tasks where Claude fails or where automating one step creates new bottlenecks elsewhere — the estimate narrows from 1.8 to 0.7–1.2 percentage points per year.
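The paralegal break-even logic can be made concrete. A sketch of the calculation (the 3-minute and 6-minute overhead figures are illustrative assumptions, not from either study):

```python
def net_saving_usd(wage_per_hr: float, minutes_saved: float,
                   overhead_minutes: float, tool_cost: float = 0.0) -> float:
    """Dollar value of AI assistance on one task: wage value of time saved,
    minus setup/review overhead, minus any per-task tool cost."""
    return wage_per_hr * (minutes_saved - overhead_minutes) / 60.0 - tool_cost

# Paralegal at $40/hr, AI saves 5 minutes on a 20-minute drafting task.
# With 3 minutes of setup and error review, the net gain is small:
print(net_saving_usd(40, minutes_saved=5, overhead_minutes=3))  # ~$1.33

# With 6 minutes of overhead, using the tool is a net loss:
print(net_saving_usd(40, minutes_saved=5, overhead_minutes=6))  # negative
```

Acemoglu's ~1 in 4 feasibility discount is, in effect, a claim that for most capable tasks the second case dominates; Anthropic's observed-usage data only ever samples the first.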
The Numbers Side by Side
Table 1 — Key inputs and results
| Parameter | Acemoglu (2024) | Anthropic AEI (Jan 2026) |
|---|---|---|
| How scope is measured | Which tasks GPT-4 can do in principle (potential exposure) | Which tasks workers actually use Claude for (observed usage) |
| Share of wage bill covered | ~18–19% | ~18–20% |
| How speedups are measured | Randomized experiments, 2022–23 | Observed Claude conversations, late 2025 |
| Observed speedup range | 1.2× – 1.6× (task takes ~45 min instead of 1 hr) | 9× – 12× (task takes ~6 min instead of 1 hr) |
| Cost-effectiveness filter | Applied: only ~1 in 4 capable tasks used in practice | Not applied: usage itself is the evidence of cost-effectiveness |
| Headline / adjusted estimate | 0.62% over 10 years (0.06 pp/yr) | 1.8 pp/yr baseline → 0.7–1.2 pp/yr after adjustments |
Both approaches agree AI covers a similar slice of the economy's task portfolio. The entire spread comes from one place: how fast AI speeds up the tasks it reaches.
Why Are the Speedups So Different?
The honest answer is that we do not know yet; three explanations each probably play a part. The most straightforward is selection. People use Claude for tasks where they expect it to help. If workers naturally lean toward their AI's strengths, the tasks that show up in usage data will be the ones where AI performs best. Controlled experiments sidestep this by randomly assigning tasks, including the hard ones where AI underperforms.
A second issue is self-reporting: Anthropic's speedup estimates come from asking Claude to judge how long tasks take with and without AI assistance. An AI system evaluating its own contribution is not an independent auditor. Third, and perhaps most hopeful: models may have genuinely improved. The experiments Acemoglu relies on used GPT-4 in 2022–2023. Anthropic's data reflects Claude in late 2025. If frontier AI capabilities have grown substantially in that time (and most benchmarks suggest they have), part of the speedup gap could be real. That would mean the true productivity impact sits somewhere between the two estimates, not at either extreme.
Conclusion
Any aggregate productivity estimate is only as good as the parameters feeding into it. Both studies use the same accounting identity and agree on most inputs. The divergence comes almost entirely from one: the productivity gain per task — how much faster AI makes something when it is actually used. Plug in lab-experiment speedups and you get a modest, gradual effect. Plug in real-usage speedups and the implied gain is an order of magnitude larger.
We do not yet have the experiment that would settle which is closer to the truth: a large, randomized study on current frontier models across the full range of tasks workers actually bring to AI. Until that evidence exists, the gap between these two estimates is itself the most informative number we have.