Making AI Speak 9 Indian languages: Open-source Text-to-Speech Models

Germany

Translation

Medium replicability and adaptation

Implementing Organisation

German Federal Ministry for Economic Cooperation and Development (BMZ)

Germany, Germany

Government

Implementing Point of Contact

Wolfger Bungarten

Deputy Head of Division, Digital technologies

Contributor of the Impact Story

Germany

Year of implementation

2024

Problem statement

In a country as linguistically diverse as India, voice technology has the potential to transform how people connect with information, services, and each other. Yet for millions, this promise remains out of reach. The Indian Institute of Science (IISc), in collaboration with the BMZ initiative “FAIR Forward - Artificial Intelligence for All”, has been working to change that. By building high-quality Text-to-Speech (TTS) corpora in nine Indian languages, IISc is laying the foundation for inclusive, accessible voice-based applications that serve people in their own languages and cultural contexts.

These digital public goods for voice technologies unlock a wide range of applications, from voice assistants and screen readers to language learning tools, automated helplines, and accessible public service delivery. They empower users in rural and low-literacy communities to interact with digital systems in their native languages, bridging the digital divide and fostering more inclusive participation in education, healthcare, governance, and beyond.

IISc’s SYSPIN project focuses on nine languages: Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magahi, Maithili, Marathi, and Telugu. These languages have historically lacked the technological resources needed to support modern voice systems based on artificial intelligence (AI). IISc addresses this gap by creating 720 hours of studio-quality audio data.

Submission Overview

The German Federal Ministry for Economic Cooperation and Development (BMZ) develops the guidelines and the fundamental concepts on which German development policy is based. German development policy is guided by the goal of improving living conditions for people worldwide. BMZ works to move the world forward in cooperation with the inter

AI Technology Used

Natural Language Processing

Key Outcomes

Inclusion & Equity

BMZ’s initiative FAIR Forward - Artificial Intelligence for All, implemented by GIZ in partnership with Indian institutions, is working with the Indian Institute of Science (IISc) and the national Bhashini mission to create high-quality open-source text-to-speech corpora and models in nine Indian languages (Hindi, Bengali, Marathi, Telugu, Bhojpuri, Kannada, Magahi, Chhattisgarhi, and Maithili). The studio-quality corpora are being released as digital public goods so that developers, governments, and civil society can build voice assistants, screen readers, helplines, and other applications that “speak” to users in their own languages, especially in rural and low-literacy communities. By closing a long-standing resource gap for Indian languages in AI, the SYSPIN project strengthens inclusion and equity and improves the quality and accessibility of voice technologies. It also supports India’s wider effort, through Bhashini and allied initiatives, to democratise AI and ensure that language is no longer a barrier to participating in education, healthcare, governance, and economic life.

Impact Metrics

Number of high-quality open-source Text-to-Speech (TTS) models available in Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magahi, Maithili, Marathi, and Telugu

Baseline Value

N/A

Post-Implementation

9 languages

Internal Monitoring · Jan 2024 - Feb 2026

Implementation Context

Deployed

India

Rural and low-literacy communities in India, especially speakers of Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magahi, Maithili, Marathi, and Telugu

Key Partnerships

Indian Institute of Science (IISc), Bhashini, FAIR Forward – AI for All, and the sister project RESPIN, which is supported by the Gates Foundation

Replicability & Adaptation

Moderate

1. Infrastructure and toolchain for recording and processing studio-quality speech
2. TTS corpus creation
3. Training and hosting TTS models
4. Publishing them as open-source artefacts
5. Linguists and native speakers for all nine languages
6. Recording and quality-control teams
7. AI/ML engineers
8. Programme management and partnership staff
9. Responsible AI and inclusion experts

* The data presented is self-reported by the respective organisations. Readers should consult the original sources for further details.