Ars Technica AI · 5 min read

How did Anthropic measure AI's "theoretical capabilities" in the job market?

Redakcja Pixelift

Photo: Getty Images

Up to 80 percent of tasks in sectors such as law, finance, and management could be streamlined by artificial intelligence, according to a high-profile report from Anthropic that has stirred debate about the labor market. The theoretical capabilities of large language models (LLMs) drastically exceed their current real-world application, suggesting that creative industries (Arts & Media) and administrative sectors (Office & Admin) are on the brink of a fundamental shift.

The analysis is based on data from the "GPTs are GPTs" study prepared by OpenAI and the University of Pennsylvania. Researchers used the O*NET database to break occupations down into their component tasks and assess whether AI could cut the time required for specific activities by at least 50 percent while maintaining the same quality. Significantly, these forecasts do not assume the total replacement of humans; they focus on productivity growth through "anticipated LLM-powered software."

For users and professionals, this means that the ability to operate advanced software built on language models will become a key competency. While the figures are unsettling, they are based on the subjective assessments of AI experts rather than empirical tests in real-world business conditions. Instead of mass layoffs, we are more likely to see an evolution of job roles toward the supervision of automated processes.

In the debate over the impact of artificial intelligence on the economy, a chart has emerged that quickly went viral in tech circles. A report from Anthropic juxtaposes the current "observed exposure" of occupations to LLMs with their "theoretical capability." At first glance, the data are chilling: the blue field suggests that systems based on large language models could theoretically perform at least 80 percent of tasks in almost all key categories, from administration and media to law, finance, and management.

A deeper look at the methodology behind these numbers, however, reveals a picture that is far less dramatic and far more speculative. The "theoretical capabilities" cited by Anthropic do not stem from empirical tests of the latest models, but from an August 2023 report titled "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models." This document, co-authored by researchers from OpenAI, OpenResearch, and the University of Pennsylvania, rests on a series of assumptions about future software that did not exist at the time of publication and remains largely conceptual to this day.

Foundations built on guesswork and GPT-4

The methodology of the study cited by Anthropic relied on O*NET Detailed Work Activity reports, which break down individual occupations into their constituent parts — specific, granular tasks. The research team used a mix of human annotation and GPT-4 assistance to assess whether the most powerful model from OpenAI at the time would be able to reduce the time needed to perform a given task by at least 50 percent while maintaining "equivalent quality."

The key problem, however, is who performed these assessments. They were not specialists in the professions in question, nor even people familiar with them. They were AI experts who evaluated the potential of the technology in fields about which, as they themselves admitted, they had little knowledge. The report's authors point to the "subjectivity of labeling" and the "unclear logic of task aggregation" as fundamental limitations of their approach. As a result, a measure that looks like an objective economic indicator is actually a collection of educated guesses from mid-2023.
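For context, the "GPTs are GPTs" paper labels each O*NET work activity as roughly E0 (no exposure), E1 (exposed to the LLM alone), or E2 (exposed only with additional LLM-powered software), and rolls these labels up into lower- and upper-bound exposure shares per occupation. A minimal sketch of that aggregation, with invented task labels:

```python
# Toy illustration of how task-level exposure labels aggregate into
# occupation-level scores, loosely following the "GPTs are GPTs" rubric:
#   E0 - no exposure, E1 - exposed to the LLM alone,
#   E2 - exposed only via additional LLM-powered software.
# The sample labels below are invented for illustration.

def exposure_scores(labels):
    """Return (alpha, beta, zeta) exposure shares for one occupation."""
    n = len(labels)
    e1 = labels.count("E1") / n
    e2 = labels.count("E2") / n
    alpha = e1              # lower bound: current models only
    beta = e1 + 0.5 * e2    # midpoint: partial software buildout
    zeta = e1 + e2          # upper bound: full "anticipated software"
    return round(alpha, 2), round(beta, 2), round(zeta, 2)

# A hypothetical occupation broken into 10 O*NET work activities:
labels = ["E1", "E1", "E2", "E2", "E2", "E0", "E0", "E1", "E2", "E0"]
print(exposure_scores(labels))  # -> (0.3, 0.5, 0.7)
```

The gap between alpha and zeta is exactly the gap between the modest "observed" numbers and the dramatic "theoretical" bars: the upper bound counts every task that some not-yet-built software might one day handle.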

Visualizations of AI's impact on the labor market are often based on theoretical performance models rather than actual data from implementations.

A wish list instead of hard data

The rubric used by researchers to assess the "direct exposure" of tasks to LLMs listed activities that language models were already performing well at the time of the study:

  • Writing and transforming code and text according to complex instructions.
  • Editing existing content according to specifications.
  • Translating texts between languages.
  • Summarizing medium-length documents.
  • Generating questions for documentation and providing answers to them.

While this list accurately reflects the capabilities of GPT-4 at the time, the assumption that the model would perform these tasks twice as fast while maintaining the same quality is risky. It is worth recalling a 2025 study which found that programmers using AI were 19 percent slower than those working traditionally once the time spent writing prompts and verifying erroneous code was factored in. The problems of hallucination and excessive sycophancy in LLMs put a huge question mark over the thesis of "equivalent quality" in the generated results.

The promise of "future software"

The most controversial element of the report, and the source of the spectacular bars in the Anthropic graphic, is the concept of "anticipated LLM-powered software." Under a restrictive approach, researchers estimated that only 15 percent of professional tasks could have their completion time cut in half by then-current models. To reach numbers in the range of 80 to 100 percent, they had to assume the creation of a new generation of tools built on top of LLMs.

The context must be kept in mind: August 2023 was the peak of market hype. This was when Elon Musk was calling for a pause on AI development, Geoffrey Hinton was leaving Google while warning that humanity could lose control of the technology, and Eliezer Yudkowsky was suggesting airstrikes on data centers to stop a rogue intelligence. In this atmosphere, experts made projections without imposing any time frames. "We do not make predictions about the timeline of development or adoption of such models," the authors wrote, thereby creating an open-ended forecast that could come true as easily in a year as in a decade.

Theoretical AI capabilities assume seamless integration of language models with everyday work tools.

Task automation is not worker replacement

In the most optimistic (or pessimistic, depending on your point of view) scenario, researchers predict that between 47 and 56 percent of all tasks in the economy will eventually be accelerated by at least 50 percent. For some occupations, such as mathematicians, writers, and digital interface designers, this rate is expected to reach 100 percent.

However, a crucial distinction often missing from sensational headlines is necessary: increasing efficiency in a specific task is not synonymous with replacing a human. A tool that allows an article or code to be written twice as fast makes the worker more productive, but it does not eliminate the need for their oversight, creativity, and responsibility for the final result. Anthropic, by citing these data, shows the transformative potential of the technology, but it relies on foundations that still require many years of verification in real market conditions.
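One way to ground this distinction: even if every exposed task really were completed twice as fast, the overall gain per worker would be far smaller than the headline share suggests, because the unexposed tasks still take the same time. A back-of-the-envelope Amdahl's-law calculation (not taken from the report):

```python
# Back-of-the-envelope estimate (not from the report): if only a share
# of a worker's tasks is sped up 2x and the rest is unchanged, the
# overall speedup follows Amdahl's law.

def overall_speedup(exposed_share, task_speedup=2.0):
    """Overall throughput gain when only `exposed_share` of work is accelerated."""
    remaining_time = (1 - exposed_share) + exposed_share / task_speedup
    return 1 / remaining_time

# The report's 47-56% range of exposed tasks:
print(round(overall_speedup(0.47), 2))  # -> 1.31
print(round(overall_speedup(0.56), 2))  # -> 1.39
```

A 30 to 40 percent productivity gain is significant, but it is a long way from the "80 percent of tasks" framing that suggests four-fifths of a job disappearing.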

Instead of a revolution that wipes out entire sectors of the economy overnight, these data, once stripped of marketing enthusiasm, suggest an evolution of work tools. "Theoretical capabilities" remain merely a mathematical extrapolation until the software they rest on actually reaches the hands of users and proves its value against the real complexity of human professions.
