SHRUG to NDAP: A scale-up story

Institutionalizing the SHRUG model at the national government level in India

Development Data Lab
7 min read · May 13, 2022

There is a familiar story in development policy research: a well-crafted, perfectly-designed policy intervention yields promising results, but falls apart when the proof-of-concept transitions from a research to a government implementation setting. There is no better test of a seemingly transformative policy than to embed it in local institutions to see how it works when launched at scale. Examples such as Deworm the World and Teaching at the Right Level demonstrate what is possible to achieve when programs are successfully scaled up through government institutions.

Development Data Lab (DDL) and IDinsight have partnered with the Indian government to undertake just such an ambitious scale-up in the public data sector. A few years ago, NITI Aayog, India’s national policy coordination committee, set out to address the challenges in India’s public data ecosystem by creating a new open data platform. India already has a robust infrastructure to collect rich administrative data, but such data are under-utilized as they are difficult to search, browse, access, and explore. NITI recognized that only by addressing these service delivery challenges could the potential value of the country’s public data be realized in both the public and private sectors.

Here at DDL, we have been developing technical solutions to address precisely this problem. We integrate high resolution data from disparate sources to be released on a single, harmonized data platform known as the SHRUG. Now working collaboratively with NITI Aayog, we have used the SHRUG as the technical proof-of-concept at the core of the new National Data and Analytics Platform (NDAP).

What is the SHRUG?

The SHRUG is the largest, highest resolution, open access socio-economic dataset across developing country contexts. It provides seamlessly linked data at the village and town level across sources (satellite data to census records), across domains (firm activity to night lights), and across time (1990–present). It is a one-stop shop where publicly available data is served in an interlinked, ultra-clean, and accessible form for free.

Today, the SHRUG is the backbone for research ranging from the effect of the COVID-19 pandemic on livelihoods to the economic impact of India’s 40 billion USD rural road construction program to mapping criminal activity among politicians following mining booms. It has been downloaded tens of thousands of times by researchers across hundreds of universities, incorporated by journalists into data-driven stories, and used by state-level bureaucrats and civil society organizations to answer policy questions. The proof of concept was seemingly a success.

Representational image from the SHRUG Atlas — a beta tool for visualizing the data

However, like every first iteration, the SHRUG has limitations and can be improved.

First, the SHRUG primarily sources data from publicly available administrative records. Because census records and sample survey data are released sporadically, you won’t find real-time, high frequency data on the current iteration of the SHRUG.

Second, a lot of high-value administrative data languishes on gated government dashboards or in restricted management information systems. We may even be completely unaware of some of these datasets. Unless these are made publicly available, we cannot fold them into open data platforms such as the SHRUG.

Third, as a small, research-focused organization, we do not have the capacity to support a platform with a more sophisticated front-end that makes the data easy to search and explore. The current iteration of the SHRUG is best suited for use by researchers and technologically sophisticated civil society members.

Fourth, the current iteration of the SHRUG doesn’t allow merging and visualization across datasets on the web portal itself. Users must download the data and use their own software to execute any data merging or analysis.

Now, NDAP is poised to leverage the advantages of the SHRUG model in conjunction with the capacity afforded by government resources, making significant advancements possible.

What is NDAP?

The National Data and Analytics Platform (NDAP) is a NITI Aayog initiative aiming to solve the last-mile data delivery challenges of India’s open data landscape. NDAP sources data from across all government ministries. These data are initially hosted in any number of forms, such as HTML tables, PDF reports, and interactive dashboards. NDAP ingests these data into a machine-readable format, then transforms them into a single, harmonized data model. This data processing pipeline was built following the SHRUG model, but scaled up to handle many more datasets. NDAP then hosts the data on a user-friendly platform where the data can be easily searched, explored, and visualized.

The NDAP platform
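To make the "single, harmonized data model" idea concrete, here is a minimal sketch of what a harmonization step could look like. It is not NDAP's or the SHRUG's actual pipeline: the source names, column names, `Record` schema, and numbers below are all made up for illustration. The point is only that each source arrives with its own column names and spellings, and a per-source mapping brings them into one shared shape.

```python
from dataclasses import dataclass

# Made-up raw extracts: each source uses its own column names and formatting.
RAW_HEALTH = [{"State Name": "UTTAR PRADESH", "Dist": "Bulandshahr ", "clinics": "412"}]
RAW_SCHOOLS = [{"state": "Uttar Pradesh", "district_name": "BULANDSHAHR", "n_schools": "2,310"}]

# Hypothetical per-source mapping from raw column names to the shared schema.
COLUMN_MAPS = {
    "health": {"State Name": "state", "Dist": "district", "clinics": "value"},
    "schools": {"state": "state", "district_name": "district", "n_schools": "value"},
}


@dataclass(frozen=True)
class Record:
    """One row of the (assumed) harmonized data model."""
    source: str
    state: str
    district: str
    value: int


def clean_name(name: str) -> str:
    """Collapse whitespace and normalize capitalization of a place name."""
    return " ".join(name.split()).title()


def harmonize(source: str, rows: list[dict]) -> list[Record]:
    """Rename columns, clean place names, and parse numbers for one source."""
    mapping = COLUMN_MAPS[source]
    records = []
    for row in rows:
        renamed = {mapping[key]: val for key, val in row.items() if key in mapping}
        records.append(
            Record(
                source=source,
                state=clean_name(renamed["state"]),
                district=clean_name(renamed["district"]),
                value=int(renamed["value"].replace(",", "")),
            )
        )
    return records


harmonized = harmonize("health", RAW_HEALTH) + harmonize("schools", RAW_SCHOOLS)
```

After this step both sources share the same district spelling, so they can be linked on `(state, district)`. A real pipeline would likely rely on stable location codes rather than cleaned names, since names alone are ambiguous.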

Why is the NDAP a big deal?

At first glance, NDAP already does not look like a typical government-run data portal. Its thoughtful user interface is intuitive and easy to follow, and does not assault users with a clutter of information as soon as they open it up. However, the design alone is not what gets us excited about NDAP.

For the first time, we have a one-stop portal for open-access government data in India that pulls data from all corners of the government machinery and makes it seamlessly linkable at sub-national geographic levels (state, district, sub-district, village/town). The user can merge the data on the platform itself and save several hours that would otherwise have been spent figuring out, for instance, how to align the modern Telangana districts with their 2010 Andhra Pradesh counterparts. All of this background work has already been done by the team that built the platform.

Merging data from the population and economic censuses to compare female population to female employment at the village level.
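The boundary-alignment work mentioned above usually comes down to a crosswalk: a lookup from present-day units to the older units they were carved out of. The sketch below shows the idea under simplifying assumptions. The district IDs and counts are invented, every new district is assumed to sit entirely inside one old district, and the variable is assumed to be an additive count. It is not the crosswalk used by NDAP or the SHRUG.

```python
from collections import defaultdict

# Invented crosswalk: present-day district ID -> ID of its parent under older boundaries.
CROSSWALK = {"new_01": "old_A", "new_02": "old_A", "new_03": "old_B"}

# Invented counts: one dataset keyed by present-day districts, one by older districts.
recent = {"new_01": 120, "new_02": 80, "new_03": 50}
historical = {"old_A": 150, "old_B": 40}


def aggregate_to_old(values: dict[str, int], crosswalk: dict[str, str]) -> dict[str, int]:
    """Sum counts from present-day districts up to their older parent units."""
    totals: dict[str, int] = defaultdict(int)
    for new_id, count in values.items():
        totals[crosswalk[new_id]] += count
    return dict(totals)


def merge_on_old(historical: dict[str, int], recent_agg: dict[str, int]) -> dict[str, dict[str, int]]:
    """Inner-join the two datasets on the older district ID."""
    shared = sorted(historical.keys() & recent_agg.keys())
    return {k: {"historical": historical[k], "recent": recent_agg[k]} for k in shared}


merged = merge_on_old(historical, aggregate_to_old(recent, CROSSWALK))
```

Summing only works for counts. Rates or averages would need population weights, and districts that were split across several parents would need apportioning rules. Handling those cases once, centrally, is the background work the platform saves each user from redoing.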

Users can investigate individual or merged datasets in tabular form or by creating visualizations.

The female literate population (from the Population Census) plotted against the number of women with formal employment (from the Economic Census) for each district in Uttar Pradesh. Hovering over each point will raise the informational box as shown for district Bulandshahr.

The platform also allows seamless spatial visualization of the data.

This map compares the percentage of women with 10 or more years of schooling, plotted as a heatmap to the percentage of women aged 15–49 using any modern family planning method, plotted with varying bubble size. Source: NFHS-4 (2015–16), generated on NDAP.

This is a huge win for government, researchers, civil society, and citizens. In the pre-NDAP era, government officials had to wait several days to get access to data collected by a different ministry, department, or program. They then had to decode these myriad data sources in the absence of clear documentation and undertake the herculean task of stitching everything together. Researchers, instead of spending months and valuable resources on baseline surveys to understand the lay of the land, can generate big picture descriptive insights at scale in a matter of minutes, so that research can be conceived quickly and efficiently. Bureaucrats and program managers can link datasets across domains to answer bespoke questions about how to optimize service delivery or improve accountability. Overall, any user can start generating useful insights much faster than they ever would have been able to before.

The government of India creates massive amounts of digital exhaust. India’s statistical systems were way ahead of the curve in the 1950s under the stalwart leadership of PC Mahalanobis. In the past few decades, the statistical capacity of the country has fallen behind in terms of last-mile delivery. NDAP has the potential to put India back on the map as a leader in terms of statistical capacity and transparency in public policy among developing countries. The NDAP model is unique, and provides an example of India’s capacity to lead globally on mobilizing government data.

By providing real access to public data, NDAP is providing a service to the public that is long overdue. All of the data is accessible not only for download but also via an intuitive API. This means that NDAP is an open platform that citizens, businesses, researchers, and journalists can leverage to create new products and services. NDAP has the potential to positively disrupt the policy knowledge industry and finally turn the lofty talk surrounding “data for development” into tangible change.

Going back to the SHRUG…

By institutionalizing an open data platform like NDAP within the government, DDL hopes to help boost state capacity to fulfill its obligation to serve public data freely and transparently. Since NDAP is an initiative under NITI Aayog, the scope of administrative datasets that the platform can access and ingest is much larger than that of the SHRUG. Leveraging its policy coordination role, NITI Aayog is best placed to aggregate datasets across sectors, ministries and levels of government into the platform. We also hope that this first release of NDAP encourages ministries to form partnerships with NITI to directly link their data releases to NDAP.

Finally, one goal of the SHRUG was to bolster the norm of policy-making based on high-resolution data. However, the realization of this goal hinges on our team’s ability to reach out to policy-makers and assist them in gleaning insights from the data on our platform. Since NDAP is an in-house government product, the policy goal of creating a culture of granular-data based decision making across levels of government becomes much more feasible and direct.

Parting thoughts…

At DDL, we have been extremely excited about taking the core principles of interoperability, standardized data schema, machine readability, comprehensive metadata, and immediate usability from the SHRUG into the government context through NDAP. This is an opportunity to institutionalize and scale the gains we have realized in our own experiment with the SHRUG.

The most effective interventions stand the test of last mile delivery through government implementation. Our hope with NDAP is similar. We hope the open data experiment tested by SHRUG will be succeeded by a sustained open-data culture across governance levels and departments in India that outlasts our involvement and stands on its own.

— Aditi Bhowmick and Alison Campion, Development Data Lab

Development Data Lab

We develop cutting edge data sources and harness the latest analytical tools to help people in poverty around the world achieve their true potential.