
Germany
Translation
Implementing Organisation
German Federal Ministry for Economic Cooperation and Development (BMZ)
Germany
Implementing Point of Contact
Wolfger Bungarten
Deputy Head of Division, Digital technologies
Contributor of the Impact Story
Germany
Year of implementation
2023
Problem statement
To enable as many people as possible to benefit from AI, the technology must be able to “speak” and “understand” a wide range of languages, including those that are currently underserved in digital spaces. Conversational AI holds great potential for imparting knowledge and enabling access to services, yet it is often only available in English, which is spoken by just a fraction of the world’s population. Many African languages, including Kiswahili, Kinyarwanda and Luganda, lack the high-quality, open speech datasets needed to build inclusive voice technologies. As a result, users in rural and low-literacy communities are excluded from interacting with digital systems in their own languages, limiting their participation in education, healthcare, governance and economic opportunities. This use case addresses the gap by creating large-scale, open-source speech datasets for three East African languages on Mozilla’s Common Voice platform, enabling developers, researchers and innovators worldwide to build speech-to-text systems and other voice-enabled applications that understand these languages and can serve local needs.
Submission Overview
The German Federal Ministry for Economic Cooperation and Development (BMZ) develops the guidelines and the fundamental concepts on which German development policy is based. German development policy is guided by the goal of improving living conditions for people worldwide. BMZ works to move the world forward in cooperation with the inter
AI Technology Used
Key Outcomes
Inclusion & Equity
By creating more than 4,100 hours of open, high-quality speech data for Kiswahili, Kinyarwanda and Luganda, this initiative has established one of the largest voice resources for East African languages and turned them into digital public goods for AI developers and local innovators. The datasets enable the development of speech-to-text and other voice applications that “understand” three widely spoken African languages, opening up more inclusive access to information and services for users in rural and low-literacy communities who have long been excluded from English-only voice technologies. Because all data and associated models are open-source and well-documented, governments, civil society, researchers and companies can use them to build educational, health, financial and governance solutions tailored to their constituencies, while responsible-AI assessment and a Gender Action Plan for Kiswahili provide a template for mitigating harms and ensuring that women and other underrepresented groups are equitably represented in future AI systems.
Impact Metrics
Number of hours of diverse, high-quality open-source Speech-to-Text (STT) datasets available in Kinyarwanda
Baseline Value
NA Hours
Post-Implementation
2,388 Hours
Number of hours of diverse, high-quality open-source Speech-to-Text (STT) datasets available in Kiswahili (Swahili)
Baseline Value
NA Hours
Post-Implementation
1,137 Hours
Number of hours of diverse, high-quality open-source Speech-to-Text (STT) datasets available in Luganda
Baseline Value
NA Hours of recorded spoken language
Post-Implementation
582 Hours of recorded spoken language
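As a quick consistency check, the three post-implementation figures reported above can be summed and compared with the "more than 4,100 hours" stated under Key Outcomes. The sketch below is illustrative only: the variable names are hypothetical, and the values are the self-reported hours from this page.

```python
# Post-implementation hours as self-reported under Impact Metrics.
reported_hours = {
    "Kinyarwanda": 2388,
    "Kiswahili": 1137,
    "Luganda": 582,
}

# Combined size of the three open speech datasets, in hours.
total_hours = sum(reported_hours.values())

# Show each language's share of the combined total.
for language, hours in reported_hours.items():
    share = hours / total_hours
    print(f"{language}: {hours:,} hours ({share:.1%} of total)")

print(f"Total: {total_hours:,} hours")
```

The total comes to 4,107 hours, which agrees with the headline figure.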
Implementation Context
Kenya, Rwanda, Uganda
Speakers of Kiswahili, Kinyarwanda and Luganda, particularly users in rural and low-literacy communities, developers, public institutions, civil society and private sector actors across East Africa and globally
Key Partnerships
Mozilla Common Voice, “FAIR Forward – AI for All” of German Development Cooperation (GIZ), Bill & Melinda Gates Foundation, UK Foreign, Commonwealth & Development Office (FCDO)
Replicability & Adaptation
1. All data and models are fully open-source to enable easy replication in other languages and countries.
2. To replicate this AI commons, implementers should localise community outreach, governance and inclusion strategies to each language community, including tailored gender and inclusion action plans and responsible AI assessments.
Supporting Materials
* The data presented is self-reported by the respective organisations. Readers should consult the original sources for further details.