WAXAL: A serious push to fix African language speech tech’s data problem

Google Research just dropped WAXAL, and honestly, it’s about time someone tackled this properly. For years, voice assistants and transcription tools have been pretty useless if you speak anything other than English, Mandarin, or a handful of other “high-resource” languages. Sub-Saharan Africa alone has over 2,000 languages, and most of them have been completely ignored by the speech tech industry.

WAXAL is their answer. It’s a massive open dataset covering 27 Sub-Saharan African languages spoken by over 100 million people across 26+ countries. The project started back in 2021, and Google has been working with African academic and community groups to get this right. The numbers are solid: roughly 1,846 hours of transcribed natural speech for automatic speech recognition (ASR), plus over 565 hours of high-fidelity recordings for text-to-speech (TTS). It’s all released under a Creative Commons CC-BY-4.0 license, which means researchers and developers can use it, including commercially, without jumping through legal hoops beyond giving attribution.

What I like about this approach is how they collected the ASR data. Instead of having people read boring scripts, they showed participants visual prompts (pictures from Google’s Open Images covering 50+ topics) and asked them to describe what they saw in their native language. This captures real speech patterns, tonal nuances, and code-switching in a way that scripted recordings rarely can. The result is spontaneous, natural speech, which matters a lot for building ASR systems that actually work in the wild.

The TTS side is equally thoughtful. Local community members worked in pairs, drafting scripts of 10,000 to 20,000 words and taking turns reading and recording. Some participants even used project funding to build custom studio boxes for professional-grade acoustics. That level of community involvement is rare, and it shows in the output: over 565 hours of phonetically balanced audio is no small feat.

Now, 27 languages is a start, but let’s be real: Sub-Saharan Africa has over 2,000 languages, so this covers roughly 1% of them by count. Google says it intends for WAXAL to “continuously evolve and expand,” but I’ve heard that promise from big tech before. The real test will be whether Google sticks with it and whether the African AI ecosystem can build on this foundation.

The permissive license is the smartest move here. By choosing CC-BY-4.0, they’re not just dumping data; they’re enabling startups, universities, and local researchers to build products that actually serve their communities. No corporate gatekeeping, no restrictive terms. Just raw material for voice-enabled tech that might finally work for the hundreds of millions who’ve been left out.

I’m cautiously optimistic. WAXAL is a genuinely useful resource, and the methodology is sound. But the proof will be in the applications that emerge from it. If this sparks a wave of African-language voice assistants, transcription tools, and accessibility products, then it’s a win. If it ends up as another forgotten dataset on a server, that’s on all of us.

Examples from Google’s Open Images used as prompts to elicit natural speech for the ASR dataset.