Big Data for Justice

Development Data Lab
Feb 5, 2021

An open-access dataset of 80 million Indian legal case records

The 7,000-odd courts that make up India’s lower judiciary processed more than 80 million cases between 2010 and 2018. It is well known that huge backlogs and scarce resources plague Indian courts. But which districts bear the greatest burden? Where has delay in due process been the most crippling? Are the benches diverse, and do they mirror the underlying population of a state? Have crimes against women been on the rise, and are specific districts particularly notorious? These just scratch the surface of the pressing questions on law and order in India that can now be answered in a matter of minutes using the largest open-access dataset on judicial proceedings in the world.

The team at Development Data Lab has processed and de-identified legal case records for all lower courts in India filed between 2010 and 2018, using the government’s online case-management portal, E-courts. The result: charges; filing, hearing, and decision dates; trial outcomes; and case type details for 25 million criminal and 65 million civil cases. We have gone a step further and applied a neural network to identify the gender of the accused, advocate, and judge for each case.

Using this data, we have already released a detailed academic study on in-group bias among judges. However, the underlying dataset can offer insights well beyond our research. For instance, let’s consider delay in court proceedings. The mean number of days a case takes to go from filing to decision ranges from 149 days in Gyalshing district in Sikkim to 5 years in Chandauli district in Uttar Pradesh!

Figure: Mean days between filing date and decision date of cases (2010–2018). *No data: could not match case data to geographic coordinates data.
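For readers who want to reproduce this kind of district-level average, here is a minimal sketch in plain Python. The record layout (district, filing date, decision date) and the sample values are made up for illustration and are not the dataset's actual schema. Cases without a decision date are simply skipped, which is an assumption that will understate delay wherever many cases are still pending.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: (district, filing_date, decision_date) as ISO strings.
# This layout is an assumption for illustration, not the dataset's real schema.
cases = [
    ("District A", "2012-01-10", "2012-06-08"),
    ("District A", "2013-03-01", "2013-03-31"),
    ("District B", "2011-05-01", "2014-05-01"),
    ("District B", "2015-02-01", None),  # still undecided, so it is skipped
]


def mean_days_to_decision(records):
    """Return {district: mean days from filing to decision} for decided cases."""
    durations = defaultdict(list)
    for district, filed, decided in records:
        if not filed or not decided:
            continue  # no duration can be computed without both dates
        days = (date.fromisoformat(decided) - date.fromisoformat(filed)).days
        if days >= 0:  # drop records whose dates are out of order
            durations[district].append(days)
    return {district: mean(values) for district, values in durations.items()}


result = mean_days_to_decision(cases)
for district, days in sorted(result.items()):
    print(f"{district}: {days:.0f} days on average")
```

Sorting the resulting dictionary by value would surface the fastest and slowest districts, which is the comparison the map summarizes.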

That said, things seem to be improving over time, at least in some states.

*Note: only the six most populous states are displayed.

Maybe resources need to be allocated to the districts that appear the most overburdened. This dataset allows policymakers to do precisely that. For instance, the chart below indicates the number of open cases per court across districts in Delhi.
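A workload measure like open cases per court can be sketched in a few lines. The example below assumes, hypothetically, that each record carries a district, a court identifier, and a decision date that is empty while the case is open. The field names and sample rows are invented, and counting only the courts that appear in the records is a simplification.

```python
from collections import defaultdict

# Hypothetical records: (district, court_id, decision_date or None).
# A missing decision date is treated as "case still open" (an assumption).
cases = [
    ("District X", "court-1", None),
    ("District X", "court-1", "2016-04-02"),
    ("District X", "court-2", None),
    ("District X", "court-2", None),
    ("District Y", "court-3", None),
    ("District Y", "court-3", "2017-01-15"),
]


def open_cases_per_court(records):
    """Return {district: open cases divided by distinct courts seen}."""
    open_counts = defaultdict(int)
    courts = defaultdict(set)
    for district, court, decided in records:
        courts[district].add(court)
        if decided is None:
            open_counts[district] += 1
    return {d: open_counts[d] / len(courts[d]) for d in courts}


load = open_cases_per_court(cases)
for district, per_court in sorted(load.items()):
    print(f"{district}: {per_court:.1f} open cases per court")
```

Districts with the highest ratio would be the first candidates for additional judicial resources under this simple measure.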

Our aim was to publish an open-access resource to transform our understanding of India’s lower courts. Our hope is that this resource will be used. We’ve cut down the time users would have to spend acquiring, assembling, and making sense of the data (it took us two intense years of industrial data building).

We hope that this dataset will enable compelling investigative stories from India’s growing ranks of data journalists, insightful commentary by legal think-tanks, excellent research at the intersection of law & economics by students and faculty, and informed monitoring by policymakers. This is part of our broader mission at Development Data Lab to make the wealth of data being generated in India and around the world truly accessible and useful for policymakers, researchers, and civil society.

by Aditi Bhowmick, India Director for the Development Data Lab


