IIIT-Hyderabad develops India’s first AI-based tribal language translator
IIIT-Hyderabad played a key role in building ‘Adi Vaani’, India’s first AI-powered translator for tribal languages. Supporting Santali, Bhili, Mundari, and Gondi, it uses neural translation and speech tools to bridge communication gaps and preserve tribal heritage
Published Date - 4 September 2025, 05:17 PM
Hyderabad: The International Institute of Information Technology (IIIT) – Hyderabad has played a vital role in developing ‘Adi Vaani’, a first-of-its-kind AI-powered translator for tribal languages, which has recently been launched by the Central government.
The translator, which is available on Google Play as well as on a dedicated web platform, supports the Santali, Bhili, Mundari, and Gondi languages. More languages, such as Kui and Garo, will be added shortly. The translator aims to bridge the communication gap between tribal and non-tribal communities and to help safeguard endangered languages.
The platform was developed by a consortium of premier institutions led by IIT Delhi and comprising IIIT Hyderabad, BITS Pilani, and IIIT Naya Raipur, in collaboration with the Tribal Research Institutes (TRIs) in Jharkhand, Odisha, Madhya Pradesh, Chhattisgarh, and Meghalaya. It enables real-time translation of both text and speech between Hindi/English and the tribal languages.
In addition, ‘Adi Vaani’ helps in preserving folklore, oral traditions, and cultural heritage via optical character recognition technology. It can support and promote civic inclusion in tribal communities by spreading awareness about government schemes and other important initiatives.
According to the 2011 Census, India has 461 tribal languages and 71 distinct tribal mother tongues. Among these, 81 are vulnerable and 42 are critically endangered. These languages face the risk of extinction due to limited documentation and gaps in intergenerational transmission.
The Speech and Natural Language Processing groups at the oldest lab of the Language Technologies Research Centre, IIIT-Hyderabad, used a Transformer-based sequence-to-sequence (seq2seq) architecture for four machine translation systems: English to Santali, Hindi to Santali, and the two reverse directions.
“This has become the state-of-the-art approach in neural machine translation (NMT). The parallel corpus was built with the help of Tribal Research Institute, Odisha. After the base model was built, additional data was generated and post-edited by Santali native speakers, which helped improve the systems,” said Prof Radhika Mamidi of IIIT-Hyderabad.
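For readers who want a concrete picture of the workflow described above, the following is a minimal sketch in Python, not the team's actual code. It lists the four translation directions and models one round of the "generate, then post-edit" step. The function names `translation_directions`, `augment_corpus`, `translate`, and `post_edit` are hypothetical, and the model and the native-speaker review are represented by placeholder callables rather than a real Transformer or real Santali text.

```python
# Illustrative sketch only: names and structure are assumptions, not the
# IIIT-Hyderabad implementation.

PIVOT_LANGUAGES = ["English", "Hindi"]  # languages paired with Santali in the article
TRIBAL_LANGUAGE = "Santali"


def translation_directions(pivots, tribal):
    """Return (source, target) pairs: each pivot to the tribal language and back."""
    directions = []
    for pivot in pivots:
        directions.append((pivot, tribal))
        directions.append((tribal, pivot))
    return directions


def augment_corpus(corpus, new_sources, translate, post_edit):
    """Run one round of the generate-then-post-edit loop.

    corpus      -- list of (source, target) sentence pairs already collected
    new_sources -- source sentences that have no translation yet
    translate   -- hypothetical stand-in for the base model (str -> str)
    post_edit   -- hypothetical stand-in for a native speaker's correction,
                   called as post_edit(source, draft) -> str
    Returns a new list; the input corpus is left unchanged.
    """
    augmented = list(corpus)
    for source in new_sources:
        draft = translate(source)
        augmented.append((source, post_edit(source, draft)))
    return augmented


if __name__ == "__main__":
    for src, tgt in translation_directions(PIVOT_LANGUAGES, TRIBAL_LANGUAGE):
        print(f"{src} -> {tgt}")
```

In a real system, each direction would correspond to a trained model, and the enlarged corpus returned by `augment_corpus` would be used to retrain or fine-tune it.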
The researchers also developed Text-to-Speech (TTS) tools for the Santali, Mundari, and Bhili languages. The TTS tool for Gondi is currently under development. Anindita Mondal, a research scholar who built the TTS tools, worked closely with native speakers, who spent a considerable amount of time at IIIT-Hyderabad recording speech data.