AI Economics

The AI Productivity Gap: Why Exposure and Usage Estimates Diverge

Same theorem, same task taxonomy, estimates roughly 16× apart. What Acemoglu (2024) and Anthropic’s Economic Index actually disagree on.

Shisham Adhikari · March 1, 2026

Key Takeaways
  • Scope: agreed. Both find AI reaches roughly 18–20% of total wages. Not the source of the gap.
  • Speedup: not agreed. Lab experiments find 1.2–1.6×. Real-world usage data shows 9–12×. That single difference drives a huge gap in aggregate impacts.
  • Policy stakes: high either way. At 0.06 percentage points per year (pp/yr), AI is a gradual automation wave. At 0.7–1.2 pp/yr, it is the largest U.S. productivity surge since the 1990s.

Motivation

What is AI's actual impact on the economy today? It is one of the most important questions in economics right now, and surprisingly hard to answer. Two research teams asked the same question and got answers roughly 16 times apart. MIT economist Daron Acemoglu estimated that AI will add about 0.06 percentage points to annual productivity growth over the next decade. Anthropic's Economic Index, using real-world usage data from Claude, put the number at 0.7 to 1.8 percentage points per year. Same theoretical foundation. Same accounting framework. Very different conclusions.

The gap is not a mistake. It reflects a genuine disagreement about what AI actually does when people use it. This post walks through how each estimate is built, what they agree on, and where they part ways.

The Task-Based Framework

Both methods rest on the same theoretical foundation. Start with a simple idea: the economy is a bundle of tasks (writing, coding, analyzing, building, advising) performed by people and, increasingly, machines. AI expands what machines can do, shifting some tasks away from human labor. The question is how that shift translates into economy-wide productivity.

Two forces matter. First, tasks are complements, not standalone pieces. Speeding up one step does not raise total output one-for-one if other steps remain bottlenecks. A lawyer who drafts twice as fast still needs client meetings, review cycles, and court dates. That interdependence puts a ceiling on what partial automation can do. Second, AI’s impact is uneven across tasks: it tends to help most with routine cognitive work (drafting, coding, classification) and least where physical presence, interpersonal judgment, or deep context are essential.

Hulten's Theorem

Both teams use the same accounting framework to turn task-level observations into an economy-wide number. The idea is simple: if AI makes a set of tasks faster, the gain in total factor productivity (TFP, the efficiency with which inputs are turned into output, often used as a proxy for economy-wide productivity) depends on two things: how large a slice of the economy those tasks represent and how much faster AI makes them. This framework, known as Hulten's theorem, can be written as:

TFP gain ≈ task scope × feasibility × labor share × productivity gain per task ≈ s̄A × φ × sL × π̄
  • task scope (s̄A): wage-bill–weighted share of tasks AI can do
  • feasibility (φ): share of those tasks where AI is cost-effective in practice
  • labor share (sL): labor income share (≈0.65)
  • productivity gain per task (π̄): average fraction of task time saved

Both approaches plug their data into this same formula. The scope and labor share inputs are nearly identical across both studies. The entire gap comes from one place: “productivity gain per task.” Acemoglu draws on controlled lab experiments; Anthropic draws on real usage data. That single input is where the two estimates diverge.
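To make the identity concrete, here is a minimal sketch that plugs illustrative values into the linearized formula. The figures below are rounded approximations of each study's inputs as described in this post, not either paper's exact calibration:

```python
def tfp_gain(task_scope, feasibility, labor_share, gain_per_task):
    """Linearized Hulten approximation: delta log TFP ~= s_A * phi * s_L * pi."""
    return task_scope * feasibility * labor_share * gain_per_task

# Acemoglu-style inputs (illustrative): ~20% wage-bill scope, ~1 in 4 tasks
# cost-effective, labor share ~0.65, ~27% of task time saved.
acemoglu_style = tfp_gain(0.20, 0.25, 0.65, 0.27)

# Anthropic-style inputs (illustrative): same scope, no feasibility filter,
# and a 10x speedup expressed as the fraction of time saved (1 - 1/10 = 0.9).
anthropic_style = tfp_gain(0.20, 1.00, 0.65, 0.90)

print(f"Acemoglu-style : {acemoglu_style:.4f} total")
print(f"Anthropic-style: {anthropic_style:.4f} total")
print(f"ratio: {anthropic_style / acemoglu_style:.1f}x")
```

With these round numbers the two styles of input differ by roughly 13×, in the same ballpark as the headline ~16× gap, even before the linearization issue discussed in the appendix.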

Where the Two Estimates Diverge

How broadly does AI reach?

Both studies find that AI-capable tasks account for roughly 18–20% of total wages paid across the economy. Acemoglu asks which tasks a current AI model can do in principle; Anthropic looks at which tasks workers actually use Claude for. They measure different things (potential versus actual reach) and land at nearly the same number. Scope is not where the two estimates diverge.

How much does AI help when it's used?

This is where the gap opens up, and it is large.

Acemoglu draws on two carefully run experiments from 2022–2023. In one, professional writers using GPT-4 finished tasks about 40% faster than those working without it. In another, call-center workers with an AI tool resolved 14% more issues per hour. Averaged across those two studies, AI cuts task time by roughly 27%, meaning a task that took an hour now takes about 45 minutes.

Anthropic's data tells a very different story. For each task that appears in Claude usage, the model estimates how long the task would take a professional without AI (in hours) and how long it actually took with Claude in the conversation (in minutes). Across thousands of tasks, Anthropic finds speedups of 9–12×: what took an hour without AI takes around 5–7 minutes with it. That difference, multiplied across the same 18–20% slice of the wage bill, produces almost the entire gap in headline estimates.

Is cost-effectiveness assumed?

Acemoglu applies a discount: even where AI can do a task, only about 1 in 4 of those tasks are worth doing with AI given current wages and subscription costs. Think of a paralegal earning $40/hr being asked to use an AI tool for a 20-minute drafting task. If the AI saves 5 minutes but requires setup time and error review, the economics may not favor it. Anthropic skips this calculation entirely. Their reasoning: if a worker is already using Claude for something, they have already decided it is worth it. When Anthropic's analysts do apply adjustments — accounting for tasks where Claude fails or where automating one step creates new bottlenecks elsewhere — the estimate narrows from 1.8 to 0.7–1.2 percentage points per year.
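The paralegal example can be made concrete with a small cost-benefit check. All of the numbers here (wage, minutes saved, overhead) are hypothetical illustrations of the filter's logic, not data from either study:

```python
def worth_using_ai(wage_per_hr, minutes_saved, overhead_minutes, tool_cost=0.0):
    """AI use pays off only if the wage value of *net* time saved exceeds tool cost."""
    net_minutes = minutes_saved - overhead_minutes
    return net_minutes * wage_per_hr / 60 > tool_cost

# $40/hr paralegal, 5 minutes saved, but 6 minutes of setup and error review:
print(worth_using_ai(40, minutes_saved=5, overhead_minutes=6))  # False: net time is negative
# Same task with negligible overhead:
print(worth_using_ai(40, minutes_saved=5, overhead_minutes=1))  # True
```

This is the shape of Acemoglu's feasibility discount: a task can be AI-capable yet fail the cost-effectiveness test once overhead is counted.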

The Numbers Side by Side

Table 1 — Key inputs and results

| Parameter | Acemoglu (2024) | Anthropic AEI (Jan 2026) |
| --- | --- | --- |
| How scope is measured | Which tasks GPT-4 can do in principle (potential exposure) | Which tasks workers actually use Claude for (observed usage) |
| Share of wage bill covered | ∼18–19% | ∼18–20% |
| How speedups are measured | Randomized experiments, 2022–23 | Observed Claude conversations, late 2025 |
| Observed speedup range | 1.2×–1.6× (task takes ~45 min instead of 1 hr) | 9×–12× (task takes ~5–7 min instead of 1 hr) |
| Cost-effectiveness filter | Applied: only ~1 in 4 capable tasks used in practice | Not applied: usage itself is evidence of cost-effectiveness |
| Headline / adjusted estimate | 0.62% over 10 years (0.06 pp/yr) | 1.8 pp/yr baseline → 0.7–1.2 pp/yr after adjustments |

Both approaches agree AI covers a similar slice of the economy's task portfolio. The entire spread comes from one place: how fast AI speeds up the tasks it reaches.

Why Are the Speedups So Different?

The honest answer is that we do not know yet, but three explanations are probably all partially right. The most straightforward is selection. People use Claude for tasks where they expect it to help. If workers naturally lean toward their AI's strengths, the tasks that show up in usage data will be the ones where AI performs best. Controlled experiments sidestep this by randomly assigning tasks, including the hard ones where AI underperforms.

A second issue is self-reporting: Anthropic's speedup estimates come from asking Claude to judge how long tasks take with and without AI assistance. An AI system evaluating its own contribution is not an independent auditor. Third, and perhaps most hopeful: models may have genuinely improved. The experiments Acemoglu relies on used GPT-4 in 2022–2023. Anthropic's data reflects Claude in late 2025. If frontier AI capabilities have grown substantially in that time (and most benchmarks suggest they have), part of the speedup gap could be real. That would mean the true productivity impact sits somewhere between the two estimates, not at either extreme.

Conclusion

Any aggregate productivity estimate is only as good as the parameters feeding into it. Both studies use the same accounting identity and agree on most inputs. The divergence comes almost entirely from one: the productivity gain per task — how much faster AI makes something when it is actually used. Plug in lab-experiment speedups and you get a modest, gradual effect. Plug in real-usage speedups and the implied gain is an order of magnitude larger.

We do not yet have the experiment that would settle which is closer to the truth: a large, randomized study on current frontier models across the full range of tasks workers actually bring to AI. Until that evidence exists, the gap between these two estimates is itself the most informative number we have.

Appendix

Technical notes on methodology, data sources, and the earlier Sep 2025 usage-concentration exercise.

A. Hulten's Theorem and the Two Formulas

Hulten (1978) states that the TFP impact of a productivity improvement in a sector equals that sector's GDP share times the log size of the improvement: d log TFP = revenue_share × d log A. Applied at the task level, this gives the aggregate labor productivity gain as the wage-bill–weighted sum of log speedups across all AI-affected tasks.

Acemoglu (2024) uses a linearized version of this for small productivity gains: replacing log(speedup) with π̄ (fraction of task time saved), and incorporating feasibility (φ) and labor income share (sL) to yield Δlog Y ≈ s̄A × φ × sL × π̄. This linearization is accurate for small speedups (1.1–1.6×) but understates gains for large speedups (9–12×), where log(speedup) is substantially larger than 1 − 1/speedup.
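A quick numerical sketch shows the size of this linearization gap, comparing the linear term π̄ = 1 − 1/speedup against the exact log(speedup) term at both studies' speedup ranges:

```python
import math

# Linear term (fraction of task time saved) vs. exact Hulten term (log speedup),
# evaluated at lab-experiment speedups (1.2-1.6x) and usage-data speedups (9-12x).
for s in (1.2, 1.6, 9.0, 12.0):
    linear = 1 - 1 / s    # Acemoglu's linearized pi-bar
    exact = math.log(s)   # Anthropic's direct application of Hulten
    print(f"{s:5.1f}x  linear={linear:.3f}  log={exact:.3f}  ratio={exact / linear:.2f}")
```

At 1.2–1.6× the two terms differ by under 25%, so the linearization is harmless; at 9–12× the log term is roughly 2.5–2.7× the linear one, which is why the approximation understates gains for large speedups.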

Anthropic (2026) applies Hulten directly: Δlog Y = Σ_t (task_wb_share_t × log(speedup_t)) for the baseline case (σ = 1). For σ ≠ 1, they use a CES aggregator within occupations first, then apply Hulten across occupations. Footnote 9 of their report describes this in detail.

B. Anthropic's Speedup Measurement

For each O*NET task observed in Claude usage, Anthropic asks Claude to estimate: (i) how many hours a competent professional would need to complete the task without AI (human-only time), and (ii) how many minutes the user spent completing it with Claude (human-with-AI time). Speedup is then:

speedup = (human-only time × 60) / human-with-AI time

A task taking 1 hour alone and 10 minutes with AI gives a speedup of 6×. Anthropic confirms this directly in their report: "reducing a 1 hour task to 10 minutes would give a 6x speedup." Across the tasks in their January 2026 data with at least 200 observations, the mean speedup is in the range of 9–12×, depending on task complexity. The 200-observation threshold is chosen to replicate their earlier results and to avoid inflating the aggregate with very rare tasks whose speedup estimates are noisy. Without the threshold, the implied productivity gain would be around 5 pp/yr, much higher than their headline figure.
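The formula reduces to a one-line sanity check, using the worked example quoted above:

```python
def speedup(human_only_hours, with_ai_minutes):
    """speedup = (human-only time, converted to minutes) / (human-with-AI time in minutes)."""
    return human_only_hours * 60 / with_ai_minutes

print(speedup(1, 10))  # 1 hr alone, 10 min with Claude -> 6.0
print(speedup(1, 5))   # 1 hr alone, 5 min with Claude -> 12.0
```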

C. The Sep 2025 AEI Usage-Concentration Exercise

An earlier version of this analysis used the September 2025 AEI release, which reports aei_usage_pct: each O*NET task's share of total Claude interactions (summing to 100% across tasks). This is a concentration measure, not an intensity measure: it records market share, not per-occupation task penetration. When aggregated to the occupation level and plugged into Hulten's s̄A slot, the result is a GDP-weighted share of about 0.25%, roughly 75 times smaller than the exposure index. The resulting TFP estimate of 0.04% reflects the structural mismatch between a concentration measure and an intensity slot, not any substantive finding about AI's economic impact. The January 2026 release resolves this by providing task-level speedup data that can be properly aggregated.

D. Data Sources

E. References