The Download: gig workers training humanoids, and better AI benchmarks

Photo: MIT Tech Review
Gig workers who previously earned a living through simple data labeling are becoming a key link in robotics development, donning motion-capture suits to teach humanoid machines human motor skills. Companies such as Tesla, Figure, and Appen are recruiting thousands of gig workers to perform everyday activities, from walking to moving objects, so that reinforcement learning algorithms can be fed real-world behavior patterns. In parallel, the industry faces the challenge of reliably assessing the progress of artificial intelligence. Traditional benchmarks, such as MMLU, are losing relevance as models are effectively trained to the test, which artificially inflates their results. The proposed solution lies in new, dynamic benchmarks, such as Scale AI's SEAL Leaderboards or LiveCodeBench, which focus on tasks generated in real time and on expert verification rather than automated scripts alone. For the average user, these changes signal the dawn of an era of robots that move fluidly and naturally, as well as access to more reliable information about the actual capabilities of AI assistants. The shift from static testing to dynamic evaluation, together with the involvement of humans in physical machine training, signals that the technology is moving beyond the digital world and beginning to truly understand physical reality. This development, however, calls for a new discussion of labor ethics and of the transparency of the training processes that are shaping our future.
In a world dominated by headlines about language models and algorithms processing data in the cloud, we often forget that the true AI revolution needs a physical form. For humanoid robots to move efficiently through our chaotic world, pick up objects, or assist in medical operations, they need thousands of hours of training under human supervision. Behind these advanced movements lies not just code but an army of contract workers like Zeus, a medical student from Nigeria who, after hours spent in the hospital, swaps his stethoscope for specialized equipment to control machines remotely.
This is the new face of the gig economy, where the line between physical and digital work blurs in surprising ways. Instead of delivering food or coding simple scripts, workers from the Global South are becoming "movement teachers" for machines that are intended to one day replace humans in the most difficult tasks. It is a fascinating yet stark picture of modern technology: state-of-the-art humanoids learn to grasp a glass of water thanks to repetitive movements performed by a human in a modest apartment thousands of kilometers away.
This work, though performed remotely, requires extraordinary precision and coordination. Zeus, by donning sensors and goggles, becomes the robot's digital shadow. Every gesture he makes, every twitch of his wrist, and the way he balances his body is recorded and transmitted to databases used to train the neural networks responsible for robot motor skills. It is a tedious and demanding process, yet crucial for companies striving to create machines capable of functioning autonomously in a human environment.
A global army of robot teachers
Using contract workers to train AI is not new – for years, thousands of people have been involved in photo tagging or content moderation. However, the transition to training humanoid robots takes this relationship to a completely different level. Here, clicking a box on a screen is not enough; full kinesthetic awareness is required. Tech companies are increasingly outsourcing these tasks to individuals in countries with lower labor costs, allowing for the generation of the massive datasets necessary for reinforcement learning.
For people like Zeus, this work is a chance to earn an income that significantly exceeds local standards, even if it involves working at night to synchronize with servers in a different time zone. From the tech industry's perspective, this is the only way to quickly "feed" AI models with high-quality data. Computer simulations have their limits – they cannot perfectly replicate friction, the unpredictability of physical objects, or the subtlety of human touch. This is why human input remains invaluable.
- Teleoperation: Remote control of a robot in real time to collect movement data.
- Imitation Learning: A technique where AI learns to mimic actions performed by a human.
- Global reach: Utilizing internet infrastructure to engage talent from around the world, from Nigeria to the Philippines.
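As a rough illustration of the imitation learning item above, the sketch below fits a very simple policy to demonstrations recorded during teleoperation. It is a minimal example under stated assumptions, not how any company named in this article trains its robots: the `Demo` record, the one-dimensional observation and action, and the linear model are all hypothetical simplifications chosen to keep the idea visible.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demo:
    """One recorded teleoperation sample (hypothetical, simplified to 1-D)."""
    observation: float  # e.g. distance from gripper to object, in meters
    action: float       # the operator's command at that moment


def fit_linear_policy(demos: List[Demo]) -> Tuple[float, float]:
    """Fit action = w * observation + b by ordinary least squares.

    This is the simplest form of behavior cloning: treat the human's
    actions as labels and learn to reproduce them from observations.
    """
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two demonstrations")
    mean_x = sum(d.observation for d in demos) / n
    mean_y = sum(d.action for d in demos) / n
    var_x = sum((d.observation - mean_x) ** 2 for d in demos)
    if var_x == 0:
        raise ValueError("observations must not all be identical")
    cov_xy = sum((d.observation - mean_x) * (d.action - mean_y) for d in demos)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def policy(observation: float, w: float, b: float) -> float:
    """Predict the action the human demonstrator would likely have taken."""
    return w * observation + b


# Usage: demonstrations where the operator moves faster when farther away.
demos = [Demo(x, 0.5 * x + 0.1) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = fit_linear_policy(demos)
predicted = policy(4.0, w, b)  # action for an observation never demonstrated
```

Real systems replace the linear model with a neural network and the single number with camera images and joint angles, but the structure is the same: the quality of the learned policy depends directly on the quality of the human demonstrations.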
It is worth noting the paradox of this situation. Robotization is often presented as a process that will eliminate low-paid physical labor. Meanwhile, to reach that stage, we need thousands of people performing the same repetitive physical actions, but in front of cameras and sensors. This is the symbiosis that defines the current stage of robotics development: a machine is only as intelligent as its human teacher is patient.
The problem of tainted AI benchmarks
While robots are learning to walk, digital AI models like GPT-4 or Claude 3 are competing in performance rankings. The problem is that the traditional benchmarks used to evaluate machine intelligence are beginning to fail. Increasingly, we see data contamination, the machine equivalent of "teaching to the test," in which test data leaks into the models' training sets. As a result, a model may solve a problem not through reasoning but because it has "already seen the answers" during training.
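One common way to look for this kind of leakage is to check whether word sequences from benchmark questions also appear in the training text. The sketch below is a minimal illustration of that idea, assuming whitespace tokenization and exact n-gram matching; the function names are hypothetical, and production contamination checks are considerably more elaborate.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items: List[str],
                       training_corpus: List[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the corpus.

    A high rate suggests the model may have seen the test during training.
    """
    if not benchmark_items:
        return 0.0
    corpus_ngrams: Set[Tuple[str, ...]] = set()
    for document in training_corpus:
        corpus_ngrams |= ngrams(document, n)
    flagged = [item for item in benchmark_items
               if ngrams(item, n) & corpus_ngrams]
    return len(flagged) / len(benchmark_items)


# Usage with a toy corpus; n=3 only because the example texts are short.
corpus = ["the capital of france is paris as everyone knows"]
items = ["What is the capital of France", "Two plus two equals what number"]
rate = contamination_rate(items, corpus, n=3)
```

Exact matching of this kind misses paraphrased or translated leaks, which is one reason static test sets are hard to protect once they are published.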
The industry faces an urgent need to create new, more dynamic evaluation methods. Current tests, based on static sets of multiple-choice questions, are too easy to manipulate – whether intentionally or accidentally. Experts point out that we need evaluation systems that test the AI's ability to adapt to new, previously unknown scenarios, rather than just its memory of facts gathered on the internet. Without reliable benchmarks, it is difficult to assess the real progress and safety of upcoming models like GPT-5.
"If a model knows the questions before the exam, the result tells us nothing about its intelligence, only about the capacity of its database." This sentence best captures the current crisis of confidence in leaderboards within the AI sector.
Modern approaches to benchmarking involve creating tests that are generated in real time by other AI systems or that require interaction with a physical environment (as in the case of the aforementioned robots). Only when AI is faced with a task it could not have "memorized" beforehand will we know its true potential. This is crucial not only for corporate prestige but, above all, for the safety of deployments in medicine or autonomous transport.
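The principle behind tests that cannot be memorized can be shown with a toy example. The sketch below substitutes simple procedural generation for the AI-generated tasks described above, which is a deliberate simplification: each evaluation run draws fresh arithmetic questions from a seed, so no fixed answer key exists to leak. All names here are hypothetical.

```python
import random
from typing import Callable, Tuple


def generate_task(rng: random.Random) -> Tuple[str, int]:
    """Create one fresh arithmetic question together with its answer."""
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", answer


def evaluate(model: Callable[[str], int], num_tasks: int, seed: int) -> float:
    """Score a model on tasks generated at evaluation time.

    Changing the seed yields a different test, so memorizing one run's
    answers does not help on the next.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_tasks):
        question, answer = generate_task(rng)
        if model(question) == answer:
            correct += 1
    return correct / num_tasks


def exact_solver(question: str) -> int:
    """A stand-in 'model' that actually computes the answer."""
    parts = question.rstrip("?").split()
    a, op, b = int(parts[2]), parts[3], int(parts[4])
    return {"+": a + b, "-": a - b, "*": a * b}[op]


# Usage: a solver that reasons scores perfectly on any seed.
score = evaluate(exact_solver, num_tasks=20, seed=1)
```

A model that had merely memorized answers from a previous run would have no advantage here, which is the property dynamic benchmarks aim for at a much larger scale.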
A new definition of work in the AI era
The story of Zeus and the challenges related to benchmarks share a common denominator: the importance of human input in the development of technology. We often think of AI as an autonomous entity, when in reality it is a system deeply rooted in human labor and human evaluation systems. The job of a "robot trainer" could become a new standard in the service sector, requiring a unique combination of physical dexterity and technical understanding.
At the same time, we must be aware of the ethical and economic consequences of this model. Are we building a future where the prosperity of one part of the world relies on the tedious digital labor of another? While for many workers in developing countries this is a chance for economic advancement, it raises questions about the sustainability of such a career path. Once robots achieve a sufficient level of proficiency, their "teachers" may become redundant, which is a classic scenario in the history of automation.
In my opinion, the coming years will see the AI trainer role become a profession in its own right. Trainers will no longer be just random gig-economy workers, but certified specialists in machine motor skills working with engineers to perfect the most precise movements. Simultaneously, the fight for objective benchmarks will become the new "arms race" in Silicon Valley. Whoever creates the most reliable evaluation system will control the narrative of who is truly leading the race for AGI (Artificial General Intelligence).
Ultimately, the success of humanoid robots will not be measured only by the power of their processors, but by the quality of the data provided to them by people like Zeus. They are the silent architects of a new era of physical artificial intelligence, performing thousands of movements in their homes so that tomorrow's machines can walk confidently on the ground. Paradoxically, technology becomes more "human" because direct, physical human input lies at its foundation.








