2024-09-11

India’s digitisation journey has been progressing at a breakneck pace, but the disparity is wide. The Indian Council for Research on International Economic Relations, an independent public policy organisation, in its State of India’s Digital Economy report 2024, proposed a new framework to measure where a nation’s digital economy stands, based on certain parameters.

“We have huge datasets. We’re the largest population in the world and the most diverse, but these datasets are mostly all on paper,” said Raval.

On an aggregate digital economy level, India ranks third in the world, behind the US and China, but it drops to 12th place when ranked by user score. The reason is that India makes consistent advances in the production of new technologies like AI and has seen increased investment in startups, but the country lags in the adoption of older, standard technology like broadband and internet accessibility.

The challenge, Raval said, is ensuring that data is digital at source. That is where speech recognition, combined with LLMs that understand what is being spoken, can turn speech into structured databases and make for a very powerful application.

The lack of gold standard data and structured databases has been a common complaint in India, particularly from those developing AI. Raval offers a different way of looking at it.

“You really have to ask the question, what do you really need gold standard data for?”

To train an AI or a model, you need data that fits the context. It must be data that comes from the same contextual setting that it seeks to serve. For example, if you’re building a model for a large hospital setting with top-quality equipment, curating and carving out gold standard datasets is possible.

That wouldn’t work so well in a rural, farm setting. There, building gold standard datasets is simply not possible. “Your AI should in fact be trained on a certain level of noisy data, so that it can understand ‘noise,’” Raval said.

Throwing in other information likely won’t be useful. Additionally, by the time a model makes its way into the hands of users, practical realities are starkly different from the data the model was trained on.