December 2, 2025
Misraj Team
Lahjawi by Misraj is a cross-dialect Arabic translation model that enables accurate translation between multiple Arabic dialects and conversion to Modern Standard Arabic. Discover how it was developed...
At Misraj, we continuously invest in advanced language technologies that understand Arabic in all its richness and diversity. Because Arabic dialects differ substantially from one another, we saw a real need for a model that can translate accurately across dialects and convert them into Modern Standard Arabic (MSA). That need led to Lahjawi, the first comprehensive Arabic dialect-to-dialect translation model, designed to deliver smarter and more natural language understanding.
Traditional NLP models struggle to interpret Arabic dialects because of major differences in vocabulary, grammar, and expression. Dialects function as independent linguistic systems rather than as simple variations. Previous research focused largely on converting a single dialect to MSA, leaving dialect-to-dialect translation virtually unexplored.
We developed a two-part translation system:
Lahjawi-D2D — for translating text across Arabic dialects.
Lahjawi-D2MSA — for converting any dialect into Modern Standard Arabic.
We fine-tuned a lightweight language model on one of the largest dialectal datasets to date, totaling 463,913 pairs:
197,042 dialect → MSA pairs
266,871 dialect → dialect pairs
We also adopted a question–answer training structure to help the model capture subtle linguistic differences between dialects.
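The post does not publish Lahjawi's prompt template, so the following is only a minimal sketch of how a parallel pair might be framed as a question and an answer. The class `TranslationPair`, the function `build_qa_example`, the field names, and the template wording are all hypothetical illustrations, not the actual training format. The idea it shows is that one shared format, with the source and target varieties named in the question, lets a single model learn both the dialect-to-dialect and dialect-to-MSA directions.

```python
from dataclasses import dataclass


@dataclass
class TranslationPair:
    """One parallel example (hypothetical schema, not Lahjawi's)."""
    source_text: str
    source_variety: str   # e.g. "Egyptian Arabic"
    target_text: str
    target_variety: str   # another dialect, or "Modern Standard Arabic"


def build_qa_example(pair: TranslationPair) -> dict:
    """Frame a pair as a question (prompt) and an answer (target text).

    Hypothetical template: naming both varieties in the question lets one
    model handle dialect -> dialect and dialect -> MSA with the same format.
    """
    question = (
        f"Translate the following {pair.source_variety} text "
        f"into {pair.target_variety}:\n{pair.source_text}"
    )
    return {"question": question, "answer": pair.target_text}


# Illustrative greetings ("how are you?"), not entries from the dataset.
d2msa = TranslationPair("إزيك؟", "Egyptian Arabic", "كيف حالك؟", "Modern Standard Arabic")
d2d = TranslationPair("كيفك؟", "Levantine Arabic", "شلونك؟", "Gulf Arabic")

for example in (build_qa_example(d2msa), build_qa_example(d2d)):
    print(example["question"])
    print("->", example["answer"])
```

In a real fine-tuning setup the question would typically become the model input and the answer the supervised target; how Lahjawi tokenizes or masks these fields is not described in the post.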
Our model delivered strong and promising results:
BLEU 9.62 for dialect → MSA
BLEU 9.88 for dialect → dialect
78% fluency (human evaluation)
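BLEU measures n-gram overlap between system outputs and reference translations: it takes the geometric mean of clipped n-gram precisions (typically n = 1 to 4) and multiplies it by a brevity penalty for outputs shorter than the references, and it is commonly reported on a 0 to 100 scale. The post does not state which BLEU implementation or tokenization produced the scores above, so the following is only a minimal from-scratch sketch of standard corpus-level BLEU under simplifying assumptions: one reference per sentence, whitespace tokenization, and no smoothing. Published scores are usually computed with an established tool such as sacreBLEU, and tokenization choices alone can shift the numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, whitespace tokenization,
    no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, an order with zero matches makes the score zero.
    if hyp_len == 0 or any(m == 0 for m in matches):
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than or equal to the references are scaled.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

Because the 4-gram precision enters a geometric mean, a single sentence with no matching 4-grams scores 0 without smoothing; corpus-level aggregation, as above, is the usual way to avoid this, which is why translation results are reported over a whole test set.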
These outcomes demonstrate the model’s ability to retain meaning and linguistic nuance across dialects.
Lahjawi represents a core step in our vision for building a complete Arabic AI ecosystem. It:
Addresses a major gap in Arabic NLP.
Enhances communication between speakers of different dialects.
Enables more intelligent and human-like Arabic AI experiences.
Uses a lightweight architecture suitable for real-world deployment.
The model is practical and ready for integration into various applications:
Social media comment translation.
Intelligent chat assistants and conversational agents.
Educational tools that explain dialect differences.
Media and content analysis for text normalization.
We are proud to have developed Lahjawi at Misraj, as it reflects our ongoing commitment to advancing Arabic AI. Lahjawi is more than a translation model: it is a foundational step toward smarter, more inclusive, and more capable Arabic language technologies.
Research paper link:
Contact us to discover how Misraj's technologies can transform the way your organization works.