Beyond the big four: Enriching Nigeria’s AI language diversity

BY JOSHUA OLUFEMI

A former colleague walked into my office two months ago, and barely 20 minutes into our conversation, he jokingly accused me of leaving his state of origin's popular languages (Ibibio and Eket) out of those our data collection outfit, Goloka, recently collected and open-sourced for Meta to use in its ongoing multilingual language model projects.

A similar conversation ensued in June at a meeting where I noted that we would add Fulfulde to the first five languages we planned to collect. A mentor moderating the conversation jokingly remarked that I wasn't being politically correct in picking Fulfulde over Kanuri, which I knew was his own language, and that I would be paying a fine for it.

These two conversations illustrate the attachment and value that Nigerians, and I dare say people globally, place on their indigenous languages, generally referred to as the mother tongue. For Nigerians at home or abroad, hearing our mother tongue is not just a matter of communication. It is a call to life and an open invitation to communion.

The mother tongue carries memory, identity, and cultural belonging. Sadly, the bulk of our digital and civic conversations on TikTok, Instagram, X, Facebook, or YouTube remains locked in English (and, of course, you are reading this in English). For millions of citizens, this amounts to exclusion from policy debates, consumer information, election conversations, and even everyday product choices.

Imagine if every tweet, policy paper, product campaign, or civic announcement could be translated automatically, accurately, and in context into Yoruba, Tiv, Igala, Kanuri, Fulfulde, Edo, Nupe, or Idoma. The impact would be profound: in real terms, it would mean accessible governance, inclusive markets, and a society where no citizen is left voiceless because of language.

Artificial intelligence (AI) has the power to break this barrier. It is already shaping Africa's digital economy, from its early adoption by startups in the agriculture, health, and creative sectors to its use in streamlining customer service delivery and, increasingly, in civic education and policy communication. No doubt, AI is creating new productivity frontiers. However, the greater and much-overlooked opportunity lies elsewhere: in language.

But here lies the other challenge we identified during our scoping study: investment in Nigerian language AI currently prioritises the "Big Four" (Yoruba, Hausa, Igbo, and Pidgin English). While these languages have tens of millions of speakers, focusing on them alone risks reinforcing linguistic dominance and silencing another 50–70 million Nigerians whose lives are rooted in other languages. AI could unintentionally become the next tool of internal colonisation, with a handful of tongues dominating the digital landscape.

When Dataphyte mapped out its AI strategy in 2024, it was clear that, alongside literacy, learning, and localisation, language would be central to everything we do with AI. First, we know it is a path many startups and social impact organisations won't chart, because it is considered unappealing and capital-intensive in the short to medium term. Second, learning (research and dialogue) is already our forte as a think tank. Third, we know literacy and localisation are easy (and genuinely significant) to sell across borders.

The question is not whether it is commercially viable to train AI on Tiv, Kanuri, Efik, or Igala. The question is whether Nigeria and Africa can afford the cultural and civic cost of not doing so. This is not just about immediate return on investment; it is about cultural renaissance. It is about holistic development. It is about language preservation, about keeping alive the words, idioms, and rhythms of communities that risk disappearing within a generation. It is about creating equity in policy communication, unlocking new consumer bases for businesses, and enabling diaspora Nigerians to reconnect emotionally with their roots.

At Dataphyte, these convictions shaped our partnership with Meta: we went beyond Yoruba to include Fulfulde, and we are on a journey to build datasets for ten more demographically significant Nigerian languages. We see it as a proof of concept for what Africa must do: build community-owned language data, partner with global platforms to mainstream it, and unlock the cultural and economic dividends that follow.

As an organisation, we are building on the understanding that AI can be more than a productivity tool. For Africa, it can be the engine of linguistic justice and cultural renaissance. The future of our languages is the future of our digital economy. The two cannot be separated.

We understand that digital futures will only be relevant to the extent that African languages are at the core of the conversation. As a data access and technology institution, we are strategically positioned to offer this public interest commodity to the young population that is defining Africa today and tomorrow.

Joshua Olufemi is the founder of Dataphyte and Goloka Analytics. He can be reached via LinkedIn and X.

Views expressed by contributors are strictly personal and not of TheCable.