Google Using Anthropic’s AI to Review Gemini’s Performance
Google is using Anthropic’s Claude to evaluate its own Gemini AI, as reported by TechCrunch. The practice raises questions about whether it complies with Anthropic’s terms of service.
According to internal documents seen by TechCrunch, contractors working on Gemini compare its responses with those generated by Claude, rating Gemini’s answers on criteria such as truthfulness and thoroughness. The report states that contractors have up to 30 minutes per prompt to determine which model produced the better answer.
Comparing Responses: Gemini vs. Claude
The report notes that contractors found explicit references to Claude within the internal platform Google uses to evaluate AI models. They observed that Claude often prioritized safety more than Gemini, either declining to answer prompts it deemed unsafe or responding more cautiously. In one instance, a Gemini answer was flagged as a major safety violation for including inappropriate content.
One contractor remarked that "Claude’s safety settings are the strictest" among AI models currently available.
Anthropic’s Terms and Google’s Response
Anthropic’s terms of service prohibit using Claude to build or train competing AI systems without prior approval. A Google DeepMind spokesperson confirmed that the company compares outputs from different models as part of its evaluation process, but firmly denied using Anthropic’s models to help train Gemini. Notably, Google is also a major investor in Anthropic.
Shira McNamara, a representative for Google DeepMind, stated, "In line with standard industry practice, in some cases we compare model outputs as part of our evaluation process," adding, "However, any suggestion that we have used Anthropic models to train Gemini is inaccurate."
