Close Menu
  • Home
  • World News
  • India News
  • Business News
  • Health
  • Sports
  • Indian Diaspora In US
  • Technology
  • Bollywood
  • Education
Facebook X (Twitter) Instagram
Thursday, May 28, 2026
Breaking News
  • Indian Consulates Across the U.S. Celebrate and Fortify US-India Ties through Engaging Events
  • Premier League: Will Xabi Alonso Ignite a New Era for Chelsea?
  • ETC and Chula Unisearch Kick Off Dynamic New Business Pitching Platform for Practical Learning
  • Autonomous Electric Trucks Take to Ohio’s Roads This Summer 2026
  • Salman Khan Plays Matchmaker as He Mediates in Ranveer Singh-Farhan Akhtar’s Don 3 Clash!
  • Younger Leaders Emerge as Indian CMs Trim the Average Age by 2.6 Years in a Boomer-Dominated Era
  • Carney’s Indian Journey Paves the Way for Renewed Ties with Canada
  • WFI Takes on High Court Ruling: Vinesh Phogat’s Bid for Asian Games Trials at Stake
Facebook X (Twitter) Instagram
India Bulletin
Advertisement
  • Home
  • World News
  • India News
  • Business News
  • Health
  • Sports
  • Indian Diaspora In US
  • Technology
  • Bollywood
  • Education
India Bulletin
Home»Business News»Salesforce’s CRM Study Reveals AI Agents Face Challenges in Real-World Business Settings
Business News

Salesforce’s CRM Study Reveals AI Agents Face Challenges in Real-World Business Settings

June 15, 20252 Mins Read
Facebook Twitter Email
Share
Facebook Twitter Email


Salesforce’s CRMArena-Pro Benchmark Highlights AI Challenges in Business

Salesforce has introduced its new CRMArena-Pro benchmark, revealing significant hurdles AI agents face in business environments. Even highly advanced models like Gemini 2.5 Pro achieve only a 58% success rate in straightforward tasks. When interactions become longer, success rates drop to a mere 35%.

CRMArena-Pro aims to assess how well large language models (LLMs) can perform in actual business tasks, particularly in areas like sales, customer service, and pricing. This benchmark expands on the previous CRMArena, including more business functions, multi-turn dialogues, and data privacy testing. The Salesforce team generated 4,280 task instances across 19 business activities using synthetic data.

Challenges with Longer Conversations

The findings shed light on the limitations of current LLMs. For simple, single-turn tasks, models like Gemini 2.5 Pro reach about 58% accuracy. However, when it comes to multi-turn conversations—where follow-up questions are necessary—performance drops dramatically to 35%.

Salesforce ran thorough tests on nine LLMs and discovered that many struggle to ask appropriate follow-up questions. In a review of 20 unsuccessful multi-turn tasks involving Gemini 2.5 Pro, nearly half failed due to the model not seeking vital information. Models that are more proactive in asking questions perform better in these situations.

The best results were seen in automated workflows, like managing customer service cases, where Gemini 2.5 Pro achieved an impressive 83% success rate. However, accuracy significantly declined in tasks that required deeper understanding, such as identifying incorrect product configurations or extracting information from call logs.

Data Privacy Concerns

The benchmark also highlights shortcomings in data privacy. Generally, LLMs do not recognize or refuse requests for sensitive information, like personal details or internal company data.

Only by adjusting the system prompt to include explicit privacy guidelines did models begin to reject these sensitive requests, but this came at a cost to overall performance. For instance, GPT-4o improved its ability to detect confidential information from 0% to 34.2%, but its task completion rate fell by 2.7 points. Open-source models like LLaMA-3.1 were even less responsive to prompt changes, indicating they require better training to prioritize instructions correctly.

Kung-Hsiang Steeve Huang, one of the authors of this study, emphasizes that data protection tests have often been overlooked in benchmarks until now. CRMArena-Pro represents a pioneering effort to systematically evaluate this aspect of AI performance.

Agents AI agents Benchmark
Share. Facebook Twitter Email
admin
  • Website

Related Posts

ETC and Chula Unisearch Kick Off Dynamic New Business Pitching Platform for Practical Learning

May 28, 2026

CATL to Revolutionize Global Energy Storage with Major Testing Hub in Xiamen

May 28, 2026

InterSystems and 59stVentures Boost AI-Driven Data Evolution in ASEAN

May 28, 2026
  • Facebook
  • Twitter
  • Instagram
Don't Miss

Indian Consulates Across the U.S. Celebrate and Fortify US-India Ties through Engaging Events

Premier League: Will Xabi Alonso Ignite a New Era for Chelsea?

ETC and Chula Unisearch Kick Off Dynamic New Business Pitching Platform for Practical Learning

Autonomous Electric Trucks Take to Ohio’s Roads This Summer 2026

Started in 2004, India Bulletin is the largest and
most read South Asian publication
in Chicago and surrounding Midwest.

  • Home
  • About Us
  • Contact
  • Advertise With Us
  • Privacy Policy
  • Terms of Use
  • Disclaimer
  • CCPA
News
  • Bollywood
  • Business News
  • Health
  • India News
  • Indian Diaspora In US
  • Sports
  • Technology
  • World News
Facebook X (Twitter) Instagram

Type above and press Enter to search. Press Esc to cancel.

Accessibility Adjustments

Powered by OneTap

How long do you want to hide the toolbar?
Hide Toolbar Duration
Select your accessibility profile
Vision Impaired Mode
Enhances website's visuals
Seizure Safe Profile
Clear flashes & reduces color
ADHD Friendly Mode
Focused browsing, distraction-free
Blindness Mode
Reduces distractions, improves focus
Epilepsy Safe Mode
Dims colors and stops blinking
Content Modules
Font Size

Default

Line Height

Default

Color Modules
Orientation Modules