{"id":153577,"date":"2025-07-05T14:22:55","date_gmt":"2025-07-05T14:22:55","guid":{"rendered":"https:\/\/teknomers.com\/en\/some-researchers-founded-a-company-where-all-the-employees-were-ai-agents-but-they-didnt-complete-a-quarter-of-the-work\/"},"modified":"2025-07-05T14:22:57","modified_gmt":"2025-07-05T14:22:57","slug":"some-researchers-founded-a-company-where-all-the-employees-were-ai-agents-but-they-didnt-complete-a-quarter-of-the-work","status":"publish","type":"post","link":"https:\/\/teknomers.com\/en\/some-researchers-founded-a-company-where-all-the-employees-were-ai-agents-but-they-didnt-complete-a-quarter-of-the-work\/","title":{"rendered":"Some researchers founded a company where all the employees were AI agents, but they didn&#8217;t complete a quarter of the work."},"content":{"rendered":"\n<div>\n<p>With generative AI already showing <u>signs of deceleration<\/u>, the next great leap is glimmering on the horizon: <u>AI agents<\/u>. Unlike a chatbot, an AI agent can be given a complex task and will act independently, making decisions on the fly to achieve its goal. Everything pointed to <u>2025 being the year of AI agents<\/u>. To test that claim, some researchers conducted <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.businessinsider.com\/ai-agents-study-company-run-by-ai-disaster-replace-jobs-2025-4\" target=\"_blank\">a curious experiment<\/a>: they put several of these agents to work in a fictitious company. It didn\u2019t go very well.<\/p>\n<p><!-- BREAK 1 --> <\/p>\n<h2>A Fictitious Company<\/h2>\n<p>The study was conducted by <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/arxiv.org\/pdf\/2412.14161\" target=\"_blank\"><u>Carnegie Mellon University researchers<\/u><\/a> and sought to measure how effective AI agents are. They created an environment that mimicked a small software development company, which they dubbed TheAgentCompany. 
The company had 18 employees and a detailed plan of objectives for the quarterly <em>sprint<\/em>. The researchers also provided ample internal documentation, such as an employee handbook, human resources policies, and a best-practices guide. Employees communicated through a chat program similar to Slack.<\/p>\n<p><!-- BREAK 2 --><\/p>\n<h2>The Staff<\/h2>\n<p>The AI agents employed at TheAgentCompany included models from Google, OpenAI, Meta, Amazon, and Anthropic. They were assigned various roles, including Financial Analyst, Project Manager, and Software Engineer. The researchers also created a technology director and a human resources manager that agents could contact if they needed help. Their tasks included writing code, searching the internet, opening programs, and organizing data in spreadsheets, typical jobs at such a company.<\/p>\n<p><!-- BREAK 3 -->  <\/p>\n<h2>The Problems<\/h2>\n<p>Initially, everything seemed to run smoothly, but problems and misunderstandings soon emerged. One agent encountered a popup that blocked the information it needed. Although it could easily have closed the popup by clicking the &#8216;X&#8217; in the upper right corner, it asked human resources for help instead and was told that the IT department would contact it soon. No one ever did, and the task was left unfinished.<\/p>\n<p><!-- BREAK 4 --><\/p>\n<p>Curiously, the agents behaved erratically when they were unsure which steps to take. In some instances, they invented shortcuts to skip the hard parts of a task. 
For instance, when one agent could not find the right person to ask a question, it simply renamed another user to the name of the person it was looking for.<\/p>\n<p><!-- BREAK 5 --><\/p>\n<h2>The Results<\/h2>\n<p>The accolade for Employee of the Month went to Anthropic\u2019s Claude 3.5 Sonnet model, which managed to complete 24% of its assigned tasks. In contrast, Gemini 2.0 Flash and ChatGPT each completed only about 10%, and the worst performer was Amazon\u2019s Nova Pro v1, which finished a disappointing 1.7% of its tasks. The most common errors stemmed from a lack of social skills and difficulties with internet searches, highlighting the limitations of AI agents in complex workplace environments.<\/p>\n<p><!-- BREAK 6 --> <\/p>\n<h2>The Threat of AI Agents<\/h2>\n<p>According to the latest <a rel=\"noopener, noreferrer nofollow\" href=\"https:\/\/www.weforum.org\/publications\/the-future-of-jobs-report-2025\/\" target=\"_blank\"><u>World Economic Forum Report<\/u><\/a>, AI could eliminate more than 90 million jobs in the next five years, although almost <u>twice that number of new positions<\/u> are expected to be created. Nevertheless, AI agents pose a significant threat to many jobs. Experiments like this one show, however, that current technology is not yet ready to replace human employees entirely. 
As it stands, AI agents <u>make numerous mistakes<\/u> and, as with Tesla&#8217;s Autopilot, it\u2019s advisable to keep one\u2019s hands on the steering wheel for the time being.<\/p>\n<p><!-- BREAK 7 --><\/p>\n<p>Image | Gemini<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>With generative AI already showing signs of deceleration, the next great leap is glimmering on the horizon: AI agents. Unlike a chatbot, an AI agent can be given a complex task and will act independently, making decisions on the fly to achieve its goal. Everything pointed to the fact that 2025 was going [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":153578,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36399],"tags":[12968,541,6600,6068,673,17719,380,761,319],"class_list":["post-153577","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-agents","tag-company","tag-complete","tag-didnt","tag-employees","tag-founded","tag-quarter","tag-researchers","tag-work"],"_links":{"self":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/153577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/comments?post=153577"}],"version-history":[{"count":0,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/posts\/153577\/revisions"}],"wp:featuredmedia":[{"embeddable":tr
ue,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media\/153578"}],"wp:attachment":[{"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/media?parent=153577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/categories?post=153577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teknomers.com\/en\/wp-json\/wp\/v2\/tags?post=153577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}