Industry · 5 min read · The Register

Netflix, Meta, and IBM speakers: AI will make anyone a 10x programmer, but with 10x the cleanup

By Redakcja Pixelift

Photo: The Register

Even a tenfold increase in programmer productivity thanks to AI comes with ten times more work cleaning up the generated code. Experts from Netflix, Meta, and IBM at the All Things AI conference in Durham agree that while artificial intelligence allows solutions to be created instantly in previously unfamiliar languages, such as Python or Groovy, the price of this speed is high. The phenomenon aligns with the Jevons paradox: the more efficient a tool becomes, the more often it is used, which generates new, specific tasks instead of reducing employment. A practical solution implemented at Netflix is the construction of multi-layered agentic structures. UI architect Ben Ilegbodu points to the necessity of "adversarial code review," a model in which one agent performs a task, a second evaluates its work, and a third acts as an orchestrator managing their interaction. For users, this means a transition from the role of creator to that of controller and context-engineering specialist. The greatest challenge is "context rot," a phenomenon where an excess of data fed to the AI scatters its attention and leads to erroneous results. Instead of less work, an era of intensive digital-assistant management awaits, in which precisely defining rules and tools matters more than writing the lines of code themselves.

The vision of a programmer becoming ten times more productive thanks to artificial intelligence is tempting, but the reality of conference hallways paints a much more complex picture. During the All Things AI event in Durham, experts from giants such as Netflix, Meta, and IBM made it clear: AI can indeed turn anyone into a "10x developer," but the price for this productivity leap is the necessity of doing ten times more work in cleaning and verifying code. Instead of the promised rest, engineers face the challenge of managing the chaos generated by their digital assistants.

This phenomenon fits perfectly into the so-called Jevons Paradox, which was frequently cited during the presentations. This theory suggests that the more efficient a resource becomes, the more often it is used, which paradoxically leads not to savings, but to an increase in total demand. In a technological context, this means that AI will not eliminate jobs but will radically change their nature — instead of writing every line of code, programmers become quality controllers and process architects who must spend more and more time preparing context and checking results.

Adversarial code review and an army of agents

Ben Ilegbodu, a UI architect at Netflix, presented a fascinating, though somewhat exhausting, vision of daily work with automation. According to him, creating one agent to perform a task is just the beginning. For the process to be reliable, it is necessary to employ a second agent whose sole purpose is to evaluate the work of the first. Ilegbodu goes even a step further, using a method he calls "adversarial code review," where a task is divided among many specialized agents reviewing different fragments of software.

In this ecosystem, a third agent becomes essential, acting as an orchestrator that manages communication and actions between the two sides. Such a structure allows Ilegbodu to "multiply himself in parallel," enabling him to work in languages he didn't previously know, such as Python, Bash, or Groovy. However, this constant context switching and the role of a digital orchestra conductor come at a price — as he admits, at the end of the day, he feels exhausted by the fact that he spent eight hours "talking to something" instead of simply creating.

The trap of the insatiable intern and context rot

Justin Jeffress, Developer Advocate at Meta, compares today's AI models to extremely enthusiastic but naive junior developers. The main difference is that AI never feels overwhelmed by an excess of data — it will accept any amount of information as long as there are enough tokens. However, this leads to a phenomenon Jeffress describes as context rot. The more disorganized data that enters an agent, the more scattered its attention becomes, which drastically increases the risk of incorrect answers.

The solution to this problem is meant to be "context engineering" — a new discipline that involves building precise rules, tools, and skills that an agent can refer to at a specific moment. Jeffress suggests that programmers must master "prompt chaining," which is breaking down complex commands into small, sequential steps. Interestingly, he noticed a specific, fractal nature of working with AI: a bot performs 80% of a task, leaving 20% to the human. However, when the human tries to finish that 20%, it turns out that 80% of that remaining fragment can again be performed by bots — and so on, in an endless process of cleaning and refining details.
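Prompt chaining, as Jeffress describes it, means turning one large request into a sequence of small prompts, each consuming the previous output. A minimal sketch of the idea, with the model call stubbed out (the `ask` function and the step wording are invented for illustration):

```python
# Minimal illustration of "prompt chaining": a complex request is broken
# into small sequential prompts, each step threading through the output
# of the previous one. `ask` stands in for a real model call.

def ask(prompt: str) -> str:
    # In real use this would call an LLM; here we echo for illustration.
    return f"[answer to: {prompt}]"

def chain(task: str, steps: list[str]) -> str:
    """Run each sub-prompt in order, carrying the running context along."""
    context = task
    for step in steps:
        context = ask(f"{step}\n\nContext so far: {context}")
    return context

result = chain(
    "migrate the build script to Groovy",
    ["List the steps required.", "Draft the code.", "Review for errors."],
)
```

Keeping each step small is also the defense against context rot: every call sees only the context it needs, rather than the whole disorganized pile at once.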

An end to wishful prompting

Luis Lastras, Director of Language and Multimodal Technologies at IBM, believes that AI errors often do not stem from the flaws of the technology itself, but from the users' lack of task decomposition skills. He criticizes so-called "wishful prompting," which involves adding phrases to commands like: "Please don't hallucinate, my career depends on this." According to Lastras, this is the equivalent of casting spells, which has nothing to do with engineering.

Instead of asking "not to hallucinate," IBM promotes a modular approach. The company recently released an open-source library, mellea.ai, which contains ready-made patterns and functions coded in Python. These allow for:

  • Adding hard requirements to LLM calls.
  • Detecting harmful or incorrect outputs.
  • Structuring responses into specific data schemas.
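The three capabilities above can be sketched as a generic validate-and-retry loop. To be clear, this is not mellea.ai's actual API, which the article does not detail; it is only an illustration of the underlying pattern of attaching hard requirements to an LLM call and rejecting outputs that fail them. The `generate` stub and check names are invented.

```python
# Generic sketch of "hard requirements on LLM calls": every output must
# pass a set of checks, and failures trigger a retry. NOT mellea.ai's
# API -- just the pattern it is described as packaging up.

import json

def generate(prompt: str, attempt: int) -> str:
    """Stub for an LLM call; returns valid JSON only on the second try."""
    return '{"status": "ok"}' if attempt > 0 else "Sure! Here is JSON: ..."

def must_be_json(output: str) -> bool:
    """Hard requirement: the output must parse as a JSON object."""
    try:
        return isinstance(json.loads(output), dict)
    except json.JSONDecodeError:
        return False

def call_with_requirements(prompt: str, checks, retries: int = 3) -> str:
    for attempt in range(retries):
        out = generate(prompt, attempt)
        if all(check(out) for check in checks):
            return out  # all hard requirements satisfied
    raise ValueError("model never satisfied the hard requirements")

print(call_with_requirements("Summarize as JSON.", [must_be_json]))
```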

Lastras also revealed that IBM is working on a "switch brains" feature, which will allow agents to dynamically change the LLM model depending on the specifics of the task. The company's research shows that a smaller, specialized domain model given more time for reasoning often outperforms the largest general models on the market.
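In its simplest form, such "brain switching" is a routing decision made per task. The sketch below is speculative, since IBM has not published details of the feature; the model names and the keyword-based routing rule are invented purely to show the shape of the idea.

```python
# Speculative sketch of dynamic model switching: route a task to a small
# specialized model when its domain matches, otherwise fall back to a
# large general one. Model names and the routing rule are invented.

ROUTES = {
    "sql": "small-sql-specialist",
    "legal": "small-legal-specialist",
}

def pick_model(task: str) -> str:
    """Return the model best suited to the task, per the routing table."""
    for keyword, model in ROUTES.items():
        if keyword in task.lower():
            return model
    return "large-general-model"  # default fallback

print(pick_model("Write a SQL join for the orders table"))
```

A real router would likely use a classifier rather than keywords, but the payoff IBM describes is the same: the small specialist, given more reasoning time, can beat the general giant on its home turf.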

The preparation tax and hard constraints

Justin Chau, a senior software engineer at Intuit, points to another aspect: technical debt resulting from implicit assumptions. What is obvious to a human is not obvious to a machine. Chau advises that instead of instructing AI on what to do, one should impose hard constraints on it. An LLM might ignore an instruction if it thinks it has found a "better" way to the goal, but it is much harder for it to break a categorical prohibition, such as one regarding the use of a specific markup language like HTML.
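One way to turn Chau's HTML example from an instruction into a hard constraint is to enforce it mechanically: rather than asking the model to avoid markup, reject any output that contains it. A hypothetical sketch, not Intuit's code:

```python
# Turning "please don't use HTML" into a categorical prohibition:
# a mechanical check that fails loudly instead of trusting the model.
# Hypothetical sketch, not Intuit's implementation.

import re

HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def enforce_no_html(output: str) -> str:
    """Reject the output outright if it contains any HTML markup."""
    if HTML_TAG.search(output):
        raise ValueError("constraint violated: output contains HTML markup")
    return output

print(enforce_no_html("plain text passes through unchanged"))
```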

The most effective form of control, however, remains the complete removal of permissions. If an agent is not given access to GitHub, we have absolute certainty that it will not modify code in the repository without our knowledge. This approach shifts the focus of a programmer's work from writing itself to managing access and security architecture.
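The "complete removal of permissions" idea can be sketched as a tool allowlist: an agent may only invoke tools it was explicitly granted, so a missing grant, say, no GitHub tool, makes repository writes impossible by construction rather than by request. The class and tool names below are hypothetical.

```python
# Sketch of permission removal as a tool allowlist: an agent without a
# grant for a tool cannot call it at all. Names are hypothetical.

class ToolRegistry:
    def __init__(self, granted: set[str]):
        self.granted = granted

    def call(self, tool: str) -> str:
        """Run a tool only if it was explicitly granted to this agent."""
        if tool not in self.granted:
            raise PermissionError(f"agent has no access to {tool!r}")
        return f"ran {tool}"

# This agent can read files and run tests, but was never given GitHub.
agent_tools = ToolRegistry(granted={"read_file", "run_tests"})
print(agent_tools.call("read_file"))
# agent_tools.call("github_push")  # would raise PermissionError
```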

Modern software engineering in the age of AI is beginning to resemble the search in Douglas Adams' "The Hitchhiker's Guide to the Galaxy." Much like the computer Deep Thought, which after centuries of calculation gave the result "42," today's models provide answers that require the construction of even more powerful systems just to understand what we actually asked. AI does not so much lift the burden of work from us as it imposes a "preparation tax," forcing us to be more precise than ever before. In this new paradigm, the greatest value is no longer the ability to write code, but the ability to decompose and rigorously verify it.

Source: The Register