Let’s start with the key point: ALIA, the Spanish AI model, should not have been released when it was.
From our conversations at Xataka with one of the principal developers, we’ve gained deeper insight into ALIA’s scope and objectives, as well as the reasons why comparisons with ChatGPT are not only unfair but inappropriate.
ALIA-40b is a foundational AI model: a large-scale artificial intelligence system trained on a massive and varied dataset that serves as a basis for many applications. The project is coordinated by the Barcelona Supercomputing Center (BSC-CNS) and has relied on the infrastructure of the MareNostrum 5 supercomputer, although its access to that machine has been limited.
The performance comparison of ALIA-40b with models like GPT-5 or Gemini 3 is inappropriate. To understand this better, we need to consider the project’s origins, its objectives, and how its development is far more modest—and yet promising—compared to the grand proprietary models from the U.S. and the striking open models appearing in China. Let’s delve deeper into what transpired with ALIA and what we expect for its future.
Promises and Realities
On January 20, 2025, Pedro Sánchez, the President of the Government of Spain, announced ALIA’s launch, which initially sounded promising. Almost a year earlier, he had hinted at this initiative, though with few details. He suggested a family of AI models designed to foster research in the field and develop technological solutions in Spanish, the fourth most spoken language globally and the second most used on the Internet.
It was even mentioned that pilot projects were underway to apply ALIA at the Tax Agency and in primary healthcare. The BSC-CNS website described ALIA-40b as “the most advanced multilingual public foundational model in Europe with 40 billion parameters.” According to that description, it had been trained for over 8 months on MareNostrum 5 with 6.9 trillion tokens in 35 European languages, with the final version expected to be trained on as many as 9.2 trillion tokens.
However, the reality was quite different. ALIA-40b drew criticism from its first testers. Early evaluations showed weak results, comparable to those of Llama-2-34b, an open-source model released in mid-2023.
A subsequent study from the University of Valencia found that ALIA performed worse than other language models on math tests, even scoring below the level expected from random guessing.
The message was clear: ALIA lagged significantly behind its competitors. The model wasn’t even included in major performance comparisons such as LLM-Stats, Artificial Analysis, and LMArena, where it failed to make the cut among 176 models.
A Premature Launch
The primary reason for such underwhelming performance is straightforward: ALIA was released before it was ready to be evaluated. Aitor González-Agirre, one of ALIA’s developers at BSC, clarifies that this launch “was not a technical decision.”
During training, the team faced a classic problem: having to cut short a lengthy process. The learning rate (how quickly the model learns) plays a crucial role here. At the start, a high rate is desirable so the model grasps general concepts; later, the rate is lowered so it can refine specific details.
Training used a scheduler, which tells the training run on MareNostrum 5 how to adjust the learning rate over time. The model was initially intended to be trained with 12 trillion tokens, but due to strategic, not technical, decisions, the team had to halt prematurely.
Although the team initially had access to MareNostrum 5 for tests, ALIA was launched after only 2.3 trillion tokens. The learning rate was still in its high phase, which left the model fundamentally “raw.”
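To make the effect of stopping early concrete, here is a minimal sketch of a learning rate schedule. It assumes a linear warmup followed by cosine decay, a common choice for large language models; the article does not say which scheduler ALIA actually used, and the peak rate, minimum rate, warmup length, and tokens per step below are hypothetical values chosen only for illustration. The two token counts are the ones quoted above.

```python
import math

def cosine_lr(step, total_steps, peak_lr=3e-4, min_lr=3e-5, warmup_steps=2000):
    """Linear warmup, then cosine decay from peak_lr down to min_lr.

    An assumed, commonly used schedule; not ALIA's confirmed configuration.
    """
    if step < warmup_steps:
        # Ramp up linearly so early updates do not destabilize training.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

PLANNED_TOKENS = 12e12    # planned budget quoted in the article
RELEASED_TOKENS = 2.3e12  # tokens seen at launch, per the article
TOKENS_PER_STEP = 4e6     # hypothetical batch size in tokens

total_steps = round(PLANNED_TOKENS / TOKENS_PER_STEP)
stop_step = round(RELEASED_TOKENS / TOKENS_PER_STEP)

lr_at_stop = cosine_lr(stop_step, total_steps)   # where training was cut
lr_at_end = cosine_lr(total_steps, total_steps)  # where it was meant to finish

print(f"LR at launch checkpoint: {lr_at_stop:.2e}")
print(f"LR at planned end:       {lr_at_end:.2e}")
```

Under these assumptions, the learning rate at the launch checkpoint is still close to its peak, while the planned run would have ended at the much lower minimum rate. That gap is what a model stuck in its high-rate phase looks like: it has absorbed general patterns but has not gone through the slow, fine-grained refinement the schedule was designed to provide.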
Adding to the complications, ALIA-40b was launched as a pre-trained model, without instruction tuning or alignment. It wasn’t a final product, yet many expected it to answer questions coherently, which it did not. It was a “crude” model that merely predicted the next words without treating the input as a question to be answered.
Overcoming Obstacles
Months later, the initial phase has been completed, and González-Agirre says the model’s performance has improved significantly. According to him, it now competes well, ranking as the best model in Basque and second in Catalan and Galician.
However, the path ahead is still fraught with challenges, especially regarding access to training datasets. González-Agirre explains the inherent restrictions: “There are many improvements to be made, but we also have to respect dataset limitations.” Unlike the tech giants, ALIA does not use copyrighted content, nor outputs from other models that do not grant permission to use their data.
The bottom line is that users are increasingly aware that AI models have been trained on digital content, often without consent, which has led to numerous lawsuits. ALIA’s development team has been regenerating its datasets to comply with legal requirements.
Although the team had access to MareNostrum’s resources at the outset, its computing capacity has dwindled. It once operated with 512 nodes but is now down to 16, which hampers pre-training efforts.
Another pressing issue is that ALIA currently lacks a public inference service. There is no app or platform where users can interact directly with the model as they would with ChatGPT or Gemini, which makes it harder to gather user feedback.
A Vision for the Future
González-Agirre emphasizes the need for public alternatives to proprietary AI systems. “If you don’t have a private vehicle, at least you should be able to ride a bus,” he says, arguing for publicly funded AI models that align with Spanish values and culture.
Despite fierce competition, particularly from more or less open models from China, the future of ALIA holds promise. By the end of the year, the team aims to deliver usable versions with performance comparable to that of similar models. Ultimately, its vision remains to develop a capable AI that operates within ethically defined boundaries.

