Is the English language hindering collective intelligence?

The prevalence of a dominant language on the internet may be a catalyst for information exchange or, conversely, a barrier to the web’s potential

by Valentina Lunardi


Joshua Hoehne / Unsplash

Culture 02 May 2024

The unifying vision of the internet has echoes of a fable from the past. In that ancient parable, a united human race speaking a single language set out to build a magnificent city and a tower that would reach the heavens. Fearing the boundless potential of these beings, God confounded their speech and scattered them around the world. The origin myth of the Tower of Babel has stood the test of time, inspiring countless interpretations from Pieter Bruegel the Elder to a popular language-learning app.

In our time, that towering ambition has become an intricate digital network, a vast global resource connecting over 5 billion users worldwide. However, while the internet boasts content in hundreds of languages, closer examination reveals that more than half of its content is in English, even though native English speakers make up less than 5% of the world’s population. This discrepancy in digital language representation poses formidable challenges and profoundly affects how we express our identities, think, and work. For speakers of non-European languages, this linguistic divide threatens cultural heritage. UNESCO has long advocated the preservation of online content in all human languages to safeguard linguistic diversity. Yet, as the internet expands, the chasm between the languages people speak and their online representation continues to widen.


The struggle is real (even for scientists)

One might argue that the solution is for everyone to master a lingua franca, such as English. However, the issue goes beyond cultural representation and vocabulary; it reaches into the everyday processes of producing and sharing knowledge. Recent research published in PLOS Biology, part of the translatE project, surveyed 908 environmental scientists from eight countries. The findings underscore the significant challenges faced by non-native English-speaking scientists:

  • Reading papers in English took them 91% more time than native speakers.
  • Writing research papers in English took them 51% more time.
  • The cost of professional editors for proofreading posed a financial burden.
  • Papers authored by non-native English speakers were 2.6 times more likely to be rejected by journals.
  • Non-native English speakers received revision requests 12.5 times more frequently.
  • Non-native speakers spent 94% more time preparing presentations than their native counterparts.
  • The stress and anxiety induced by these hurdles could cause talented scientists to leave academia prematurely.


AI to the rescue

In the face of such language fatigue, artificial intelligence offers a potential solution. Recent developments, such as Spotify’s Voice Translation pilot program, which uses OpenAI to automatically translate podcasts while preserving the original speaker’s voice, are impressive. Nevertheless, integrating AI into content creation and evaluation is not without its challenges and biases. A study conducted at Stanford University subjected 91 English essays created for the Test of English as a Foreign Language (TOEFL) to AI detectors. Surprisingly, more than half of these essays were incorrectly flagged as AI-generated. In contrast, essays written by native English-speaking eighth graders were correctly identified as human-generated over 90% of the time. The crux of the issue lies in how AI detectors assess the “text perplexity” of content. This metric measures how surprised a language model is when predicting the next word in a sentence: the more predictable the text, the lower its perplexity.

AI models like ChatGPT are trained to generate text with low perplexity, so detectors treat low-perplexity writing as a sign of machine authorship. This makes the detectors prone to misidentifying human-generated content, especially when it relies on common words and familiar language patterns, as the writing of non-native English speakers often does. Indeed, rewriting the flagged essays with more complex language made them appear human-generated to the detectors. Paradoxically, this suggests that AI detectors may push non-native writers toward AI-generated content to evade detection, while mistakenly flagging their college and job applications as AI-generated and further marginalizing non-native English speakers.
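To make the perplexity idea concrete, here is a minimal sketch in Python of how the metric is computed from a model's per-token probabilities. The probability values are made up for illustration; real detectors obtain them from a language model, but the arithmetic is the same.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the probability a language model
    assigned to each successive token.

    It is the exponential of the average negative log-probability
    (cross-entropy) per token; lower values mean the text was easier
    for the model to predict.
    """
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical sentence built from common, predictable word choices:
# the model assigns each token a high probability.
predictable = [0.9, 0.8, 0.85, 0.9]

# Hypothetical sentence with rarer, more surprising word choices.
surprising = [0.2, 0.1, 0.05, 0.3]

print(perplexity(predictable))  # low perplexity: risks being flagged as AI-written
print(perplexity(surprising))   # high perplexity: reads as "human" to a detector
```

The paradox described above falls out of this arithmetic: a writer using simpler, more predictable vocabulary, as non-native speakers often do, produces lower-perplexity text and is therefore more likely to be misclassified.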


Illustration by Francesca Ragazzi

Linguistic diversity, linguistic inclusivity

English-speaking individuals can effortlessly access content, services, and information aligned with their interests. In contrast, those who speak languages such as Bengali or Swahili have limited options, which not only creates individual inconvenience but also perpetuates extensive “information vacuums” in non-dominant languages.

Addressing this linguistic divide requires a focus on the concept of local content. Local content encompasses a variety of information, entertainment, platforms, news, and services tailored to meet local needs, and its availability stands as a cornerstone for boosting internet participation.

Consider Rwanda, where Kinyarwanda is spoken by 93% of the population. Providing services exclusively in English limits users’ ability to make full use of online resources, including mobile banking. Technology-based solutions have emerged to bridge these language gaps. Translation tools such as Google Translate have simplified access to content in various languages. Organizations like Localization Lab translate internet security applications and services into a multitude of languages. Media outlets have embraced diversity in their content, as evidenced by the BBC’s West African Pidgin English news site, which adapts to a primarily oral language.

Although these services provide immediate relief, they are not long-term solutions. A comprehensive approach entails empowering communities with the resources, knowledge, and skills to create, access, and share relevant content. Local internet exchange points and tech hubs can play pivotal roles in establishing ecosystems for internet content creation and regional development. Broader recognition among stakeholders, including user and citizen groups, service providers, and policymakers, is essential.

As the world advances toward greater internet adoption, prioritizing linguistic inclusivity within the digital landscape will be essential to building a place for collective thinking and intelligence.