Simulated Company Shows Most AI Agents Flunk the Job:
The results should comfort people worried about AI replacing them. The best of the agents, Claude 3.5 Sonnet from Anthropic, completed only 24% of the tasks. Google’s Gemini 2.0 Flash came in second with 11.4%, and OpenAI’s GPT-4o was third with 8.6%.
lol.