What is Arabic AbSum v2?

It is an Arabic abstractive summarisation benchmark developed by regional research labs, testing models on a diverse corpus including news, legal, and academic Arabic. AbSum v2 is the current reference standard.

Can I access Fanar-2 or Noor as a developer?

Both are available through regional cloud providers. Fanar-2 is accessible via QCRI's developer portal and Core42. Noor is available through Ai71 direct API and partner clouds.

How do these models compare to Jais and ALLaM?

Fanar-2 and Noor represent newer generations of Arabic-first models, generally outperforming the 2024 Jais and early ALLaM releases. ALLaM has matched pace in 2026 with new releases.

Is open weight release planned?

Fanar has historically released open weights for smaller variants. Ai71 has open-weighted portions of the Falcon line. Full Noor weights remain commercial, though Ai71 has indicated a partial release is under discussion.

What workloads do these models fail at today?

Multi-hop reasoning, code generation, and long-context tasks beyond 32,000 tokens. These are areas where frontier English models retain a clear edge.

Fanar-2 and Ai71's Noor Are Quietly Converging on Arabic Summarisation, and the Gap to GPT Is Closing

Qatar's Fanar and Abu Dhabi's Ai71 both shipped updated Arabic-first models this month, and the benchmarks they have published tell a surprisingly consistent story: the two models are within two to three percentage points of each other on Arabic summarisation, and within eight points of GPT-5 on the same tasks. That is the closest the regional models have been to frontier performance on an Arabic-specific workload since the first Jais release in 2023.

Why summarisation matters

Summarisation is the Arabic AI benchmark that Gulf enterprises actually care about. Legal firms want to summarise hundreds of pages of Arabic contracts. Banks need short-form summaries of regulator circulars. Ministries want readable briefs from longer policy documents. The gap between a model that works and one that fails is measured in reader acceptance, and historical Arabic models have failed at the morphological level, dropping negations, compressing diacritic distinctions, or introducing classical phrasings into modern news text.

Fanar-2 and Noor have visibly improved on those failure modes.

By The Numbers

Fanar-2 scored 83.4 on Arabic AbSum v2 summarisation, up from 76.1 for Fanar-1.
Ai71 Noor scored 81.6 on the same benchmark, a 9 point improvement over Noor-Lite.
GPT-5 scores 89.9 on Arabic AbSum v2, a gap of roughly six to eight points.
Hugging Face lists 37 actively maintained Arabic-first models as of April 2026, up from 18 a year ago.
Fanar-2 and Noor both support Modern Standard Arabic and five major dialects: Egyptian, Levantine, Gulf, Maghrebi, and Iraqi.

Arabic summarisation was the benchmark that kept revealing the weakness of Arabic LLMs. That the regional models are now within a single generation of frontier is significant.
Dr. Houda Bouamor, Principal Scientist, Qatar Computing Research Institute

Fanar-2 and Ai71's Noor Are Quietly Converging on Arabic Summarisation, and the Gap to GPT Is Closing

What changed between releases

Fanar-2 was trained with a substantially expanded Arabic corpus, including parliamentary records from Gulf countries and Egypt, historical press archives, and a curated set of Islamic legal texts. The training team, a partnership between QCRI and HBKU, also invested in instruction tuning with human Arabic editors, a detail often skipped in earlier regional efforts.

Ai71's Noor took a different path. The TII-affiliated team focused on synthetic data generation with quality filters, producing what Ai71 calls "grounded-synthetic" training pairs for summarisation and long-context reasoning. The result is a smaller model, 12 billion parameters compared to Fanar-2's 34 billion, that runs significantly cheaper in production.

Head-to-head capabilities

Capability	Fanar-2	Ai71 Noor	GPT-5
Arabic summarisation (AbSum v2)	83.4	81.6	89.9
MSA reading comprehension	78.2	76.9	85.1
Dialect accuracy (5-dialect avg)	79.4	82.1	74.6
Classical Arabic	71.8	68.2	62.3
Parameters	34B	12B	Undisclosed
Inference cost per 1M tokens	$1.20	$0.45	$3.00

The dialect column is the interesting one. On five-dialect average, both regional models outperform GPT-5. That is the category where local engineering, local data, and local evaluation pays off.

Frontier-grade English models still struggle with Egyptian and Gulf dialect nuances. Our models are built specifically for these, and that is where the gap closes.
Dr. Rowan Jaber, Head of Research, Ai71

Production deployments

Qatar National Library is piloting Fanar-2 for automated abstracts on its Arabic academic collection. Abu Dhabi Judicial Department is evaluating Noor for internal court document summarisation. Saudi's ALLaM team is running an internal comparison benchmark on legal documents. Most significantly, Al Jazeera Digital is testing both models for newsroom summarisation, the most visible production workload in the region.

For the broader Arabic AI picture, see our earlier coverage of April 2026's Falcon-H1 lead, the Arabic dialect benchmarks, and the April Arabic LLM scoreboard.

The AI in Arabia View: The Arabic LLM narrative has shifted from "we need to catch GPT" to "we need to be better than GPT on Arabic workloads". That is the right frame. Fanar-2 and Noor will probably never outperform GPT-5 on English reasoning, and that is fine. They already outperform on dialect, on Islamic jurisprudence, on classical Arabic, and they are within six points on the workloads that matter commercially. The open question is distribution. QCRI and Ai71 need to get these into production faster. Commercial adoption drives the next round of data, which drives the next round of quality. A benchmark win matters only if customers are using the model in anger two quarters later.