
Mainstream, VOL LVII No 32 New Delhi July 27, 2019

Crocodile Tears about Data Availability: A Tragic Story Never Told

Saturday 27 July 2019

by Atanu Sengupta and Sanjoy De

Recently there has been much hue and cry over the publication of National Sample Survey Organisation (NSSO) data. A statement signed by 108 eminent economists and social scientists laid bare the supposed intention of the present Central Government to play a hide-and-seek game with important information that the nation demands. The statement cited similar withholding of important information in the fields of employment generation, migration status and other such vital areas. While it is true that there have been some covert exercises on the part of the government regarding certain important data, it would be inappropriate to suppose that previous governments were somehow indifferent or innocent babes in this regard.

The story of data hiding and manipulation goes back ages. The Gnostic texts discovered in the caves of Jabal al-Tarif, about 5 km north of the Egyptian town of Nag Hammadi, give a completely different perspective on the Bible from the common discourse. History, too, offers instances of data manipulation and distortion. Banabhatta fails to mention the defeat of Harsha Vardhana at the hands of Pulakeshin II, whereas the Aihole Inscription of Ravi Kirti at the Meguti temple records the defeat of Harsha Vardhana by the Chalukya king. Down the line, the British account of India's day-to-day life offers little clue to the recurring famines. Even independent India is not immune to such distortions.

We will concentrate here on the deliberate distortion of agricultural data, particularly data on farm income and expenditure. Indian farmers generally cultivate a number of crops on different parts of their plots. They are also engaged in non-farm activities to augment their income. Capturing this whole picture was the aim of the original Farm Management Survey. The exercise concentrated on select areas, where data were collected for a few consecutive years. Under this system, the complete cost structure of the farmers, covering the various crops cultivated, animal husbandry, non-farm businesses and other items, was taken into account. The Farm Management Survey also collected data on the socio-economic features of the farmers and their families.

The scheme was discontinued in the early 1970s. It was replaced by a new one, the comprehensive cost of cultivation scheme, under which data on the cost of cultivating various crops are collected and presented. A standard sampling procedure is used to select tehsils and farmers. As Abhijit Sen and M S Bhatia (2004) note, these data are available in the public domain for ready use, even at the plot level. At first sight, this appears to be a rich dataset, giving plot-level information for various crops across various states.

A close look at the data, however, sends a shiver down our spine. The data give plot-level information on the inputs and outputs of different crops, along with their costs. But this information is of a dissected nature. Suppose we take the plot-level data for aman paddy in West Bengal. These data merely tell us about the technicalities of production and cost for that one crop.

But economics is not merely a technical science; it is a human science. We need information about the producers who grow these crops. The current data tell us nothing about the other crops that a paddy cultivator grows. Information on wheat or other crops is, admittedly, provided elsewhere, but there is no common thread of information with which to identify the same cultivator across these records and combine them. Nor can we learn about any other crops the farmers have cultivated if those crops are not covered by the comprehensive cost of cultivation scheme. Hence the information obtained from this source is dissected and often meaningless.

Nor do we have any socio-economic information about the farmers and their families. We obtain no information on educational status, social category, gender composition and other such factors that bear on the farmers' choices. Hence, we cannot say anything about the general welfare of the farming community.

The answers to the burning questions of today, farmers' suicides and distress, lie beyond the reach of the present public data. Thus, the 'rich data' that the government has been publishing is actually a headless monster. It provides plenty of technicalities but sheds little light on the social and economic regularities that govern them. We can use it to estimate production functions or measure efficiency. Unfortunately, like the headless statue of Kanishka, it gives us no impression of the king.

This is true of many other data sets. The "huge" Census data gives us no economic data on assets¹ or income; economists often have to cull this from NSSO data instead. Even where the Census does provide data, it is incomplete. We have figures on families with latrines but no data on their usability. We have data on electrification but not on its affordability. We have data on housing but nothing about its location. Even the literacy data are incomplete. What do we mean by literacy: the ability to read, the ability to read and write, or the ability to read, write and comprehend? In his autobiography, E.M.S. Namboodiripad recounted how, as a youngster, he was made to memorise the entire Veda without being able to comprehend it. Similarly, women in Bangladesh were found to recite ayat from the Quran without comprehending them. In everyday life, many people have been found to have memorised how to write their signature or name, but can write nothing else. The word 'literacy' thus poses serious problems of interpretation.

In all, working with data is a genuinely sensitive issue in a developing country like ours. The problem has been compounded by the growing politicisation of data among political parties. The current debate on NSSO data only adds fuel to the ongoing fiasco. Only determined public effort on this sensitive issue can remedy the problem.


1. The so-called listed Census assets are of little value: recording the ownership of various assets without their valuation tells us little. The chief culprit is the data on bank account use. We merely know the number of persons availing themselves of banking services. In an era when most government financial benefits are routed through banks and the Jan Dhan Yojana is expanding rapidly, this variable loses its meaning.

Atanu Sengupta is a Professor, Department of Economics, Burdwan University, Burdwan (West Bengal).

Sanjoy De is a Research Scholar, Department of Economics, Burdwan University, Burdwan (West Bengal).
