SAN FRANCISCO: In late 2022, OpenAI introduced ChatGPT, sparking a wave of interest in chatbots. Last year, newer systems from OpenAI and Anthropic set off a fresh surge of interest, this time in AI agents. These agents can act as personal digital assistants, handling a variety of tasks.
A San Francisco startup named Arena is working to clarify how people actually use these agents. Its service, called Agent Mode, recently reported that users turn to AI agents for coding tasks about 17% of the time. About 10% of interactions involve research.
Following research, the next most common uses for agents include creating images, generating documents like graphs and spreadsheets, and brainstorming ideas. Roughly 5% of the time, users engage AI agents for creative writing, tutoring, and educational purposes. Other tasks include debugging code and general chatting.
AI systems developed by OpenAI, Anthropic, and others can write, test, and edit code. This allows skilled programmers to automate much of the work they would otherwise do by hand. These agents can also spend extended periods searching the internet for information on various subjects, including finance, healthcare, and legal matters.
While some functions overlap with what conventional chatbots can achieve, AI agents have the added capability to interact with other software programs on behalf of users. This includes working with spreadsheets, calendars, and email applications.
“An agent can access the internet, search, create files, and interact with other AI models to finish its work,” said Anastasios Angelopoulos, Arena’s CEO and co-founder.
In Silicon Valley, many treat these bots like employees, delegating tasks to them at any hour of the day. Some tech leaders and analysts believe that AI agents could replace certain office roles in the near future.
In a notable example, the fintech company Block, which owns Square and Cash App, announced it would reduce its workforce by 40% in anticipation of these AI developments. The move illustrates how AI could reshape office work.
However, these digital assistants can manage only a limited range of tasks and are sometimes unreliable. Like chatbots, AI agents can make errors and behave unpredictably, especially when sending emails or messages. For that reason, Arena prevents users from connecting their agents to email and messaging apps. The company focuses on selling data and insights rather than offering direct interaction.
Arena also keeps agents within a controlled “sandbox” environment to prevent accidental damage to users’ computers, such as deleting files or software. Arena’s tracking indicates that about 8% of the time, agents incorrectly claim to have completed tasks. This “bluffing” can cause larger problems because tasks often build on one another.
“The models might say, ‘Yes, I did this,’ but they might not have,” Angelopoulos explained. “They could claim they’ve created a file, but it’s not actually there.”
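For readers who build on agents, one simple guard against this kind of bluffing is to check an agent's claims against the file system before starting the next task. The Python sketch below is only an illustration and does not describe how Arena measures or handles the problem; the function name `find_missing_files` and the idea that an agent reports a list of file paths relative to a workspace folder are assumptions made for the example.

```python
from pathlib import Path


def find_missing_files(claimed_paths, workspace):
    """Return the claimed paths that do not exist as files under workspace.

    claimed_paths: relative paths an agent says it created (hypothetical input).
    workspace: the folder the agent was allowed to write into (hypothetical).
    """
    root = Path(workspace)
    # Keep only the claims that cannot be confirmed on disk.
    return [p for p in claimed_paths if not (root / p).is_file()]
```

If the agent reports that it created `report.csv` and `chart.png` but only `report.csv` is on disk, the function returns `["chart.png"]`, and the follow-up task that depends on the chart can be held back.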
Arena also evaluates the AI models that power agents. According to its findings, the most effective agents are powered by OpenAI’s GPT-5.5 High, while Anthropic’s Claude Opus 4.7 Thinking ranks second. Both have proven significantly more effective than offerings from other major players like Google and Elon Musk’s xAI.
