India has one of the youngest populations in the world, with an average age of 29. A report published by the Federation of Indian Chambers of Commerce and Industry (FICCI) in 2013 estimated that, to train this large young population, the country must build six new universities and 270 new colleges every month for the next 20 years.7 It was an impossible target! Or so it seemed until 2014, when IIT Kharagpur conceptualized the National Digital Library of India (NDLI; https://ndl.iitkgp.ac.in/) with the aim of bringing equity of access to educational resources to every Indian through a single-window access mechanism. NDLI, a project funded by the Ministry of Education, Government of India, is a meta-library: a portal that connects users to hundreds of libraries in India and abroad and provides more than 81 million items of educational content, including books, lecture videos, and research articles, in over 100 languages, including several vernaculars used throughout the country.
In the pre-NDLI era, there were many isolated initiatives on digital education in the country, including a growing number of institutional repositories with disparate metadata schemas, archives of video lectures such as NPTEL, and national thesis repositories like Shodhganga. NDLI provided the long-awaited integration among them by becoming the common portal through which each of them is seamlessly accessible.
On the international stage, many major digital libraries, such as Europeana, DPLA, Canadiana, Trove, World Digital Library, and Digital NZ, focus on culture and heritage. NDLI, responding to national need, has instead been designed to focus on the educational space. Unlike general-purpose search engines, NDLI indexes only educational resources, based on their corresponding metadata. NDLI also exposes several filters that allow the user to refine search results based on educational metadata (for example, subject domain, educational level, educational degree, and learning resource type). This helps users express and iteratively refine their search intent using precise metadata values, a feature missing in general-purpose search engines. Figure 1 shows the NDLI interface: it offers both browse and search options, and the user can customize the retrieved results using metadata-based filters visible on either side of the interface. Many of the educational resources in NDLI are organized into collections (for example, schoolbooks are organized by curriculum), allowing users to easily discover related resources. Another exclusive feature of NDLI is free access to special resources, such as the South Asia Archive, the World eBook Library, and the digitized 'red books' of the Oscar-winning filmmaker Satyajit Ray, that are not otherwise publicly accessible.
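The metadata-based refinement described above can be illustrated with a small in-memory sketch. It is not NDLI's implementation: the records, the field names (`subject`, `level`, `type`), and the helper functions are all hypothetical, chosen only to show how exact-match filters and facet counts let a user narrow a result set step by step.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only
# and are not taken from the NDLI schema.
RECORDS = [
    {"title": "Algebra for Class X", "subject": "Mathematics",
     "level": "Secondary", "type": "book"},
    {"title": "Intro to Algorithms (lecture)", "subject": "Computer Science",
     "level": "Undergraduate", "type": "video"},
    {"title": "Linear Algebra Notes", "subject": "Mathematics",
     "level": "Undergraduate", "type": "book"},
    {"title": "Graph Algorithms Survey", "subject": "Computer Science",
     "level": "Postgraduate", "type": "article"},
]


def search(records, text="", **filters):
    """Return records whose title contains `text` (case-insensitive)
    and whose metadata matches every given filter exactly."""
    text = text.lower()
    return [
        r for r in records
        if text in r["title"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]


def facet_counts(records, field):
    """Count how many records fall under each value of a metadata field."""
    return Counter(r[field] for r in records if field in r)


# First pass: free-text search, then inspect the available facet values.
hits = search(RECORDS, "algebra")
level_facets = facet_counts(hits, "level")

# Second pass: refine the same query with a precise metadata value.
refined = search(RECORDS, "algebra", level="Undergraduate")
refined_titles = [r["title"] for r in refined]
```

The facet counts are what a filter panel would display next to each value; choosing one simply reruns the query with an additional exact-match constraint.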
Challenges and Mitigations
Designing a digital library of this scale posed several technical challenges related to:
- Metadata harvesting. NDLI harvests metadata of resources from different sources. Noise, diversity, and the sheer volume of metadata pose significant difficulties in integration. The proposed NDLI schema combines three globally accepted metadata standards, namely, Dublin Core (DC), Learning Resource Metadata Initiative (LRMI), and Electronic Thesis and Dissertation (ETD) schema, to accommodate its wide spectrum of resources, while remaining open to incorporating new profiles in the future.4 A semiautomatic metadata curation pipeline was designed to map, enhance, and sanitize metadata records from different sources into the NDLI schema at very large scale.
- Large-scale indexing and access. A core requirement is that NDLI can index and maintain this massive volume of data. Further, NDLI must be accessible to thousands of users simultaneously. To achieve this, the search index (500GB) of NDLI has been organized into a federated search architecture powered by Apache SolrCloud. The resulting system can handle a very large number of queries concurrently while remaining highly robust even under disruptive circumstances (through a disaster recovery setup). The NDLI database presently amounts to 71.5 million records (32GB). NDLI exposes a flexible API to third parties to encourage the development of new tools and applications.
- Multilinguality. To cater to the multilingual community in India, support for a multilingual interface, multilingual resources, and multilingual access mechanism is required. Currently, the NDLI interface is available in 11 Indian languages and search can be performed in three languages, namely, Hindi, Bengali, and English.
- Copyright protection. For most resources, NDLI stores only the metadata, which is generally not copyrighted. Once a user clicks on a resource link, the user is directed to the appropriate repository, which is then responsible for protecting the copyright of its resources. While many source organizations provide full-text access to all their documents, some require users to furnish additional information to access full-text versions; for example, the user must log in to the source, send an explicit email request to the source, or have a subscription. Some documents can be accessed free of cost by logging into NDLI but must be paid for if accessed from another site. An icon beside each resource alerts the user to the access rights associated with it. Presently, NDLI resources are categorized into five access types: Open, Subscribed, Authorized, NDLI members only, and Limited. A high-level architecture of NDLI is illustrated in Figure 2.
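The map-and-sanitize step of the curation pipeline and the five access types can be sketched as follows. This is a minimal illustration under stated assumptions, not the NDLI pipeline: the crosswalk tables, the unified field names, and the `to_unified` helper are hypothetical, and the source field names are only loosely modeled on Dublin Core, LRMI, and ETD conventions. Fields without a known mapping are returned for human review, which is one plausible reading of "semiautomatic".

```python
from enum import Enum


class AccessType(Enum):
    """The five access categories named in the text."""
    OPEN = "Open"
    SUBSCRIBED = "Subscribed"
    AUTHORIZED = "Authorized"
    MEMBERS_ONLY = "NDLI members only"
    LIMITED = "Limited"


# Hypothetical crosswalks: source field -> unified field.
# Both sides are assumptions made for illustration.
CROSSWALKS = {
    "dc": {"dc:title": "title", "dc:creator": "author",
           "dc:language": "language"},
    "lrmi": {"name": "title", "educationalLevel": "educational_level",
             "learningResourceType": "resource_type"},
    "etd": {"title": "title", "thesis.degree.name": "degree"},
}


def normalize(value):
    """Sanitize a raw value: collapse runs of whitespace, drop empties."""
    cleaned = " ".join(str(value).split())
    return cleaned or None


def to_unified(source_schema, record, access=AccessType.OPEN):
    """Map one source record into the unified schema.

    Returns (unified_record, unmapped_fields); unmapped fields are left
    for a human curator, reflecting a semiautomatic workflow.
    """
    mapping = CROSSWALKS[source_schema]
    unified = {"access": access.value}
    unmapped = []
    for field, raw in record.items():
        target = mapping.get(field)
        value = normalize(raw)
        if target is None:
            unmapped.append(field)
        elif value is not None:
            unified[target] = value
    return unified, unmapped


source_record = {"dc:title": "  Organic   Chemistry ",
                 "dc:creator": "A. Author",
                 "dc:rights": "n/a"}
unified, unmapped = to_unified("dc", source_record, AccessType.SUBSCRIBED)
```

Keeping the crosswalks as plain data rather than code means a new metadata profile could be supported by adding a table, without touching the mapping logic.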
NDLI users comprise primary, secondary, and higher-secondary school students, undergraduates, postgraduates, research scholars, and lifelong learners. Typical use cases include faceted search and browsing of the repository through multiple modalities (for example, by subject domain). Access is also provided to specially curated collections like Exam Preparatory, IIT-JEE Preparatory, and News Archives. Presently, around 6.4 million active users access NDLI resources. Most traffic comes from users at higher education institutions. Figure 3 presents some statistics on repository growth and usage of NDLI.
During COVID-19, NDLI created a virtual collection, Study at Home, which includes specially curated resources on all subjects for students appearing for Class-X and Class-XII examinations. Not surprisingly, there was a sharp rise in daily document views after this feature was introduced. Another valuable addition has been the COVID-19 Research Resources Repository, containing COVID-19 research articles, blogs, and preprints. NDLI's contributions to digital learning during COVID-19 have been recognized through several international awards and other recognition. In this context, it is also worthwhile to quote Sundaram "Sundy" Srinivasan, President, PanIIT USA:
"The Indian diaspora as well as the rest of the world can now avail knowledge from this multilingual, multi-subject, multimedia digital repository [NDLI] that any individual learner or a lifelong learner can avail."
Despite the enormous repository size and user-friendly discovery services in NDLI, it is not an easy task to carry its benefits to every learner in the country. Therefore, the virtual NDLI Club service (see Figure 4) has been designed to facilitate and coordinate activities around NDLI resources at different institutions. This service aims to improve user engagement with NDLI by organizing competitions, training sessions, and workshops at the institution level, at the regional level where multiple institutions participate, and at the global level. Presently, more than 2,600 NDLI clubs have been established across the country.
Computing Research at NDLI
The design and implementation of NDLI services have spawned several research challenges, some of which have been deliberated under the ambit of NDLI:
- Metadata extraction and information retrieval. For metadata enrichment of scholarly papers, NDLI has developed several deep learning-based algorithms to extract key phrases from scholarly articles10 and to semantically segment abstracts for better indexing.1 To facilitate search for school-level content, a search auto-completion technique powered by NDLI metadata has been developed and found to be effective in a user study.8
- Augmented access. To provide alternate access to paywalled research papers, the concept of a surrogate of a paper has been advanced. Surrogator is an application that points users to a free surrogate of an access-restricted article.9,11 Algorithms have also been designed for topical segmentation and augmentation of long lecture videos, enabling quick topic-based access2,3,5 and thereby improving the learner's experience.
- Construction of data repository. A notable initiative is the sister project, Comprehensive Archive of Imaging in Cancer (CHAVI), which aims to construct a national bank of annotated medical images with a flexible query interface and link it with a pipeline of radiomic services.6
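To make the idea of metadata-powered auto-completion concrete, here is a minimal sketch. The text does not describe the technique evaluated in the cited user study, so this is a generic prefix lookup over a sorted list and not the authors' method; the `PrefixCompleter` class and the sample phrases are hypothetical stand-ins for terms drawn from metadata fields such as titles or subject keywords.

```python
import bisect


class PrefixCompleter:
    """Suggest phrases that start with the text typed so far.

    Phrases are stored lowercased in a sorted list, so all phrases that
    share a prefix sit next to each other and can be found by binary search.
    """

    def __init__(self, phrases):
        self._phrases = sorted({p.lower() for p in phrases})

    def complete(self, prefix, limit=5):
        prefix = prefix.lower()
        # Index of the first phrase that is >= the prefix.
        start = bisect.bisect_left(self._phrases, prefix)
        results = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix):
                break  # past the block of matching phrases
            results.append(phrase)
            if len(results) == limit:
                break
        return results


# Hypothetical metadata-derived phrases for school-level content.
completer = PrefixCompleter([
    "Photosynthesis", "Photoelectric effect", "Physics",
    "Phosphorus cycle", "Algebra",
])
suggestions = completer.complete("pho")
```

A production system would likely rank suggestions (for example, by popularity or educational level) rather than return them alphabetically; that ranking is outside the scope of this sketch.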
NDLI has seen enormous growth in user engagement in the last few years, especially in the post-pandemic era. It has largely been able to take digital education to the remote corners of the country. Presently, NDLI is in the process of transforming into a community-driven service in which collections can be managed by relevant communities and quality learning resources can be created in vernaculars using semiautomatic translation. Together, these changes open the way to crowdsourced contributions from the NDLI user community.
References
1. Banerjee, S., Sanyal, D.K., Chattopadhyay, S., Bhowmick, P.K., and Das, P.P. Segmenting scientific abstracts into discourse categories: A deep learning-based approach for sparse labeled data. In Proceedings of the ACM/IEEE Joint Conf. Digital Libraries, 2020.
3. Das, A. and Das, P.P. Semantic segmentation of MOOC lecture videos by analyzing concept change in domain knowledge graph. In Proceedings of the 22nd Intern. Conf. Asian Digital Libraries. Springer, 2020, 55–70.
5. Ghosh, K., Nangi, S.R., Kanchugantla, Y., Rayapati, P.G., Bhowmick, P.K., and Goyal, P. Augmenting video lectures: Identifying off-topic concepts and linking to relevant video lecture segments. Intern. J. Artificial Intelligence in Education (2021), 1–31.
8. Sadhu, S. and Bhowmick, P.K. Automatic segmentation and semantic annotation of verbose queries in digital library. In Proceedings of the 22nd Intern. Conf. Theory and Practice of Digital Libraries. Springer, 2018, 270–276.
9. Santosh, T.Y.S.S., Sanyal, D.K., Bhowmick, P.K., and Das, P.P. Surrogator: A tool to enrich a digital library with open access surrogate resources. In Proceedings of the 18th ACM/IEEE-CS Joint Conf. Digital Libraries, 2018, 379–380.
10. Santosh, T.Y.S.S., Sanyal, D.K., Bhowmick, P.K., and Das, P.P. DAKE: Document-level attention for keyphrase extraction. In Proceedings of the 42nd European Conf. Information Retrieval, 2020, 392–401.
©2022 ACM 0001-0782/22/11