Inception, a G42 company focused on advancing AI technology, has released “Jais,” the world’s most advanced Arabic large language model, as open source. Jais is a 13-billion-parameter model trained on a newly developed dataset of 395 billion tokens in Arabic and English.
Jais, named after the UAE’s highest peak, brings generative AI to the Arabic-speaking world. The model is the result of a collaboration between Inception, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) – the world’s first research institution dedicated solely to AI – and Cerebras Systems. Jais was trained on Condor Galaxy, a recently announced multi-exaFLOP AI supercomputer built by G42 and Cerebras.
The release of Jais marks a significant milestone for Arabic AI. Developed in Abu Dhabi, Jais brings the potential of generative AI to more than 400 million Arabic speakers. The release also underscores Abu Dhabi’s standing as a hub for AI, cultural preservation, innovation, and global collaboration.
By releasing Jais as open source, Inception aims to engage the scientific, academic, and developer communities and accelerate the growth of a thriving Arabic-language AI ecosystem. The initiative can also serve as a model for other languages that are underrepresented in mainstream AI.
Andrew Jackson, Inception’s CEO, stresses the importance of collaboration in fostering innovation: “We firmly believe that innovation flourishes when collaboration thrives. Through this release, we’re setting a fresh benchmark for AI progress in the Middle East. Jais stands as a testament to our unwavering commitment to excellence and our mission to democratize AI while fueling innovation.”
Jais outperforms existing Arabic models by a significant margin and is competitive with English models of comparable size, despite being trained on substantially less English data. This result shows that the model can draw on both its English and Arabic data, and it points to new directions in large language model development and training.
Eric Xing, President and University Professor of MBZUAI, notes, “Crafting a top-tier Arabic LLM necessitated cutting-edge AI research and a profound understanding of Arabic’s diversity, heritage, and the expanding role of LLMs across society. Thanks to our research and collaborations with Inception and other eminent global organizations, MBZUAI remains at the forefront of pioneering efficient, effective, and precise LLMs.”
Alongside the model’s launch, Inception and MBZUAI have initiated an academic partnership to provide early access to forthcoming Arabic LLMs for testing purposes. Academic collaborators include Carnegie Mellon University, Ecole Polytechnique, Hamad bin Khalifa University, Sorbonne Paris Nord – LIPN, NYU Abu Dhabi’s CAMeL Lab, and The University of Edinburgh. Organizations including the UAE Ministry of Foreign Affairs, the UAE Ministry of Industry and Advanced Technology, The Department of Health – Abu Dhabi, Abu Dhabi National Oil Company (ADNOC), Etihad Airways, First Abu Dhabi Bank (FAB), and e& will use Jais and provide insights that will further improve the model.
Jais is a transformer-based large language model that incorporates several recent techniques. ALiBi position embeddings enable the model to handle longer inputs, improving its use of context and its accuracy. SwiGLU and maximal update parameterization further improve training efficiency and accuracy. A joint team from Inception and MBZUAI carried out training, fine-tuning, and evaluation on Condor Galaxy 1 (CG-1), the AI supercomputer built by G42 and Cerebras Systems. The open-source model has 13 billion parameters and was trained on a purpose-built dataset of 116 billion Arabic tokens and 279 billion English tokens, 395 billion in total. Inception and MBZUAI remain committed to refining and expanding Jais as its user base grows.
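For readers unfamiliar with ALiBi, the idea can be illustrated with a minimal sketch. This is not the Jais implementation. It follows the general ALiBi formulation, in which each attention head adds a linear penalty proportional to the distance between a query position and a key position. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two number of heads and causal (left-to-right) attention.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumption for this sketch: n is a power of two.
    """
    if num_heads < 1 or (num_heads & (num_heads - 1)) != 0:
        raise ValueError("this sketch assumes a power-of-two head count")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j], to be added to attention scores before softmax.

    For key position j at or before query position i, the bias is
    -slope_h * (i - j), so more distant keys are penalized more.
    Future positions (j > i) are masked with -inf.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # Print the bias matrix of the first head, one query row per line.
    for row in bias[0]:
        print(row)
```

Because the penalty depends only on relative distance, and not on a learned embedding for each absolute position, the same rule applies to sequences longer than those seen in training. Head counts that are not powers of two need a different slope schedule, which is omitted here.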
Andrew Feldman, Cerebras Systems’ co-founder and CEO, underscores the significance of the partnership: “Our collaboration with G42 is already yielding pioneering outcomes, exemplified by the introduction of the multi-exaFLOP AI supercomputer, Condor Galaxy 1 (CG-1). The partnership now adds another crucial breakthrough: the foremost Arabic LLM for the open-source community. Jais constitutes a remarkable contribution to the international open-source sphere, a testament to the user-friendly nature of CG-1 and its role in accelerating rapid AI model development.”
Inception operates at the intersection of academia, business, and regulation, building partnerships and accelerating the adoption of AI across industries.
Jais is available for download on Hugging Face. Users can also try the model online by registering their interest on the Jais website, after which they will receive an invitation to the playground environment. Benchmarks and comparisons with other models are detailed in the Jais white paper.