Google launches WAXAL voice dataset for Sub-Saharan African languages

By: Samira Njoya

Date: Thursday, 05 February 2026 02:26

  • Database offers 11,000 hours across 21 languages, open access
  • Project aims to boost voice AI and preserve African languages

Google said on Monday, Feb. 2, it had launched WAXAL, a voice database aimed at supporting the development of AI tools tailored to Sub-Saharan Africa. The dataset covers 21 languages, including Yoruba, Acholi, Hausa, Luganda, Malagasy and Shona, and contains more than 11,000 hours of audio from nearly 2 million recordings.

“We wanted to capture how people really talk, so we asked participants to describe different pictures in their native languages. We also recorded professional voice actors in the studio to create the high-quality audio needed for text-to-speech technology,” Google said.

The company said WAXAL includes 1,250 hours of transcribed speech for automatic speech recognition and more than 20 hours of studio audio for text-to-speech synthesis. The project was developed with African partners including Makerere University in Uganda, the University of Ghana and Digital Umuganda in Rwanda.

WAXAL is available under an open licence on the Hugging Face platform, giving researchers and developers free access. Google said the initiative aims to spur innovation in voice technology and help safeguard African languages online.

UNESCO estimates Africa is home to between 1,500 and 3,000 languages, but most digital tools support only a handful. Limited high-quality data has slowed the development of voice assistants, educational apps and automated transcription on the continent.

Several local initiatives are also working to address the shortage. In Benin, the “JaimeMaLangue” project encourages citizens to help build a national voice database. Other datasets, such as African Voices in Nigeria and African Next Voices in Mali, are expanding resources for underrepresented languages.

