Alibaba Unveils Compact and Efficient AI Model
Alibaba Group Holding, a key player in China's artificial intelligence landscape, has introduced a new generation of foundation models that deliver impressive performance while being much smaller and more cost-effective than their predecessors.
On Friday, Alibaba Cloud, the company's AI and cloud computing arm, launched the new large language models, which it describes as "the future of efficient LLMs." Remarkably, they are nearly 13 times smaller than Alibaba's largest AI model, released just a week earlier.
Despite its reduced size, the Qwen3-Next-80B-A3B has been praised as one of the company's best models to date. Alibaba reports throughput up to 10 times higher on some tasks than the earlier Qwen3-32B model launched in April, while cutting training costs by 90%.
Emad Mostaque, co-founder of Stability AI, remarked that Alibaba’s new model surpasses “pretty much any model from last year” with an estimated training cost of under $500,000. In stark contrast, training Google’s Gemini Ultra, released in February 2024, reportedly cost about $191 million.
Artificial Analysis, a notable AI benchmarking firm, reported that the Qwen3-Next-80B-A3B outperformed competing models, including ones from DeepSeek and the Alibaba-backed startup Moonshot AI. Experts attribute much of its success to an innovative technique known as "hybrid attention."
Transformer-based models traditionally struggle with efficiency as inputs grow longer because of "attention," the mechanism they use to weigh every token in the input against every other token. That all-pairs comparison buys accuracy but means compute and memory costs grow quadratically with input length, making it expensive to train complex AI systems that can act autonomously over long contexts.
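The quadratic cost is easy to see in code. Below is a minimal sketch of standard scaled dot-product attention in PyTorch (an illustration, not Alibaba's code): the score matrix has one entry for every pair of tokens, so doubling the input length quadruples the work.

```python
import torch
import torch.nn.functional as F

def standard_attention(q, k, v):
    """Vanilla scaled dot-product attention.

    The score matrix is (seq_len, seq_len): one entry per pair of
    tokens, so compute and memory grow quadratically with input length.
    """
    scale = q.shape[-1] ** -0.5
    scores = q @ k.transpose(-2, -1) * scale   # (seq_len, seq_len)
    return F.softmax(scores, dim=-1) @ v

# Doubling the input length quadruples the number of scores to compute.
for seq_len in (1024, 2048, 4096):
    q = k = v = torch.randn(seq_len, 64)
    standard_attention(q, k, v)
    print(f"{seq_len} tokens -> {seq_len * seq_len:,} attention scores")
```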
The Qwen3-Next-80B-A3B tackles this using "Gated DeltaNet," a technique developed by researchers from MIT and Nvidia. Instead of comparing every token against every other, the model keeps a fixed-size memory that is updated as each token arrives: a gate decides what to forget, and a delta-rule update overwrites only the stale parts, so the per-token cost stays constant. Qwen3-Next interleaves these layers with standard attention layers, producing the hybrid design.
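The Gated DeltaNet paper describes this as a recurrence over a fixed-size memory matrix. The following is a heavily simplified, single-head sketch of that gated delta rule (normalization details and the chunked parallel form used in practice are omitted); it illustrates the idea and is not Qwen3-Next's actual implementation.

```python
import torch
import torch.nn.functional as F

def gated_delta_step(S, k, v, alpha, beta):
    """One token's update to a fixed-size (d_v, d_k) memory S.

    alpha in (0, 1): gate that decays (forgets) old memory.
    beta  in (0, 1): write strength for the new key/value pair.

    The delta rule looks up what the decayed memory already stores for
    this key, then writes only the difference -- a targeted edit rather
    than blind accumulation.
    """
    k = F.normalize(k, dim=-1)           # unit-norm key
    pred = alpha * (S @ k)               # memory's current answer for k
    return alpha * S + beta * torch.outer(v - pred, k)

d_k, d_v = 64, 64
S = torch.zeros(d_v, d_k)
for _ in range(4096):                    # per-token cost never grows
    k, v = torch.randn(d_k), torch.randn(d_v)
    S = gated_delta_step(S, k, v, alpha=0.99, beta=0.5)
print("memory stays", tuple(S.shape), "regardless of input length")
```

Because the fixed-size state replaces the ever-growing score matrix of standard attention, each new token costs the same to process as the first, which is where the long-input efficiency gains come from.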
According to Alibaba, tests show that the new model performs comparably to its more powerful counterpart, the Qwen3-235B-A22B-Thinking-2507, despite being smaller and less expensive. This reflects a growing trend toward smaller yet more efficient AI models, especially as concerns regarding the costs of scaling larger models increase.
For instance, training expenses for xAI's top model, Grok 4, reportedly reached an estimated $490 million, and future projects may exceed $1 billion. Moreover, researchers from Nvidia have recently argued that small language models are the future of agentic AI, citing their adaptability and efficiency.
In a similar effort, Chinese AI companies are working to ensure their models can run on everyday devices like laptops and smartphones. Tencent recently launched several open-source AI models under 7 billion parameters, while startup Z.ai released the GLM 4.5 Air model, which activates only 12 billion of its parameters per token.
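"Active parameters" refers to mixture-of-experts designs, in which a small router selects just a few expert sub-networks to run for each token while the rest of the model's weights sit idle; Alibaba's naming follows the same convention, with the "A3B" suffix indicating roughly 3 billion active parameters out of 80 billion in total. The toy router below (hypothetical names, not any vendor's code) shows the idea.

```python
import torch

def route_token(x, gate_weights, k=2):
    """Toy top-k mixture-of-experts router.

    Scores every expert for this token and keeps only the k best;
    only those experts' weights are "active" for the token, so
    per-token compute is a small fraction of the model's total size.
    """
    logits = x @ gate_weights                    # (num_experts,)
    scores, expert_ids = torch.topk(logits, k)
    return torch.softmax(scores, dim=-1), expert_ids

d_model, num_experts = 128, 64
gate_weights = torch.randn(d_model, num_experts)
mix, ids = route_token(torch.randn(d_model), gate_weights)
print("token handled by experts", ids.tolist(), "with weights", mix.tolist())
```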
Notably, Alibaba's Qwen3-Next-80B-A3B is compact enough to run on a single Nvidia H200 graphics processing unit. After its release on the open-source platform Hugging Face, the model quickly gained traction, logging almost 20,000 downloads within a day.
Alibaba describes the new architecture as a glimpse of where its AI models are headed. Experts suggest that the evolution of large language models will increasingly focus on improving efficiency and reducing training costs, paving the way for further innovation in AI.
