Why AI Freelancers Are Still Floundering: Study Shows Agents Only Earn ~1% of Freelance Work
Think AI agents are poised to replace freelancers overnight? Think again. A new benchmark from Scale AI and the Center for AI Safety (CAIS) finds that even the most advanced AI “workers” can complete less than 3% of simulated freelance tasks, raising serious questions about the hype surrounding AI job displacement. ([WIRED][1])
The Big Picture
According to an article by Will Knight in WIRED, researchers devised a benchmark called the Remote Labor Index to assess whether frontier AI agents can handle economically valuable freelance work. ([WIRED][1]) They tested leading models and agents, including Manus, Grok, Claude, ChatGPT, and Gemini, on a variety of real-world freelance tasks (graphic design, video editing, administrative chores, data scraping). The result: the best-performing agent earned only US$1,810 of a possible US$143,991, which is under 1.3%. ([WIRED][1])
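The "under 1.3%" figure follows directly from the two dollar amounts reported by WIRED. A quick arithmetic check, sketched in Python (the variable names are our own, chosen for illustration):

```python
# Dollar figures reported in the WIRED article for the best-performing agent
earned_usd = 1_810        # value of the work the agent completed acceptably
available_usd = 143_991   # total value of all tasks in the benchmark

# Fraction of the available value the agent actually captured
share = earned_usd / available_usd
print(f"Share of available value earned: {share:.2%}")
```

The printed share comes out at roughly 1.26%, consistent with both the "under 1.3%" in the text and the "~1%" in the headline.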
Key Findings & Insights
- The tasks covered were drawn from real‑world freelance gigs posted on platforms like Upwork: they included a job description, the relevant files, and a completed example (by a human) for context. ([WIRED][1])
- The top agent, Manus (built by a Chinese startup), outperformed the others but still captured only a sliver of the available value. ([WIRED][1])
- The researchers believe this shows that although AI models have improved in areas such as coding, math, and logical reasoning, they still falter when tasks demand tool‑use, multi‑step workflows, long‑term memory, or learning from experience. ([WIRED][1])
- The study contrasts with another benchmark, OpenAI’s GDPval, which claims frontier models are “approaching human abilities on 220 office tasks.” The authors caution that this kind of upper-bound estimate may paint an overly optimistic picture. ([WIRED][1])
- The findings hint that, despite waves of AI hype (particularly claims that large swathes of jobs might vanish), the reality is more nuanced and slower-moving. Even with impressive model releases, the complex, messy world of real work remains a high bar.
Implications & What It Means
For workers, freelancers, and companies alike, this should reassure those worried about an immediate, wide-scale AI takeover of jobs. Yes, automation is ongoing, but this study suggests we are not yet at the tipping point where large numbers of independent tasks can be handed off to AI agents.
For AI developers and investors, the message is: don’t underestimate the difficulty of “real work.” Tool integration, workflow orchestration, context retention, judgement, and iterative improvement all remain hard for AI.
For policy-makers and strategists, the takeaway is that mass-displacement narratives may need recalibration. The gap between hype and reality suggests that transitional support, skill adaptation, and human-machine collaboration will remain central themes for the foreseeable future.
Glossary
- Frontier AI model: A highly advanced artificial intelligence system (typically a large language model or multimodal model) considered to be at the cutting edge of capability.
- AI agent: An AI system designed to perform tasks with some degree of autonomy—e.g., using tools, interacting with environments, completing workflows rather than simply generating text.
- Benchmark: A standardised test or set of tasks used to evaluate and compare performance across different models or systems.
- Remote Labor Index: The benchmark created by Scale AI and CAIS to measure how well AI agents can perform economically valuable freelance tasks.
- Tool‑use in AI: The capability for an AI system to integrate external tools (software utilities, APIs, data sources) and use them effectively as part of a task workflow.
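To make the glossary’s notion of tool use a little more concrete, here is a toy sketch in Python. It is purely illustrative and does not reflect how Manus or any other tested agent works: the tool names (`word_count`, `to_upper`) and the `run_plan` helper are hypothetical, and the "plan" is hard-coded where a real agent would have a model choose each step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: ordinary functions the agent is permitted to call.
def word_count(text: str) -> str:
    return str(len(text.split()))

def to_upper(text: str) -> str:
    return text.upper()

# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": word_count,
    "to_upper": to_upper,
}

def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute (tool_name, argument) steps, standing in for tool calls
    a model might emit. Unknown tools are reported rather than executed."""
    results: List[str] = []
    for name, arg in plan:
        tool = TOOLS.get(name)
        if tool is None:
            results.append(f"error: unknown tool {name!r}")
        else:
            results.append(tool(arg))
    return results

print(run_plan([
    ("word_count", "draft the client invoice"),
    ("to_upper", "done"),
    ("resize_image", "logo.png"),  # not registered, so it is reported as an error
]))
```

Even in this toy, the hard parts the study highlights are visible: deciding which tool to call, recovering when a needed tool is missing, and carrying results forward across steps.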
Conclusion
The “AI will take your job tomorrow” refrain may still make headlines, but the empirical evidence from this new study suggests that today’s agents are far from replacing human freelancers in any substantial way. As developers and companies explore task automation, the heavy lifting remains firmly in human hands, especially where complexity, context, and iteration matter.
Source link: https://www.wired.com/story/ai-agents-are-terrible-freelance-workers/
[1]: https://www.wired.com/story/ai-agents-are-terrible-freelance-workers/ “AI Agents Are Terrible Freelance Workers | WIRED”