1 Introduction to Digital Library

Jagdish Arora


I.     Objectives


The objectives of this module are to i) introduce basic concepts and characteristics of digital libraries to the learners; ii) define digital libraries and highlight important differences between digital libraries and their precursors and related technologies such as traditional libraries, information retrieval systems, virtual libraries, Internet search engines, etc.; and iii) introduce terminologies that are associated with digital libraries.


II.    Learning Outcomes 


After going through this lesson, the learner will have gained knowledge of the basic concepts and characteristics of digital libraries and of the different terms that are associated with them and are often used interchangeably to refer to digital libraries. The learner will also have gained knowledge of the different types of digital libraries and the technologies behind them.


III.    Structure 


1.       Introduction

2.      Traditional Library, Information Retrieval System and Digital Library

3.      Electronic, Virtual, Hybrid and Digital Libraries

4.      Characteristics of Digital Libraries

5.      World Wide Web (WWW) V/s Digital Library

6.      Digital Library: Towards a Definition

7.      Why Digital Library?

8.      Some Important Digital Libraries

8.1.  New Zealand Digital Library (http://www.nzld.org/)

8.2.  Networked Computer Science Technical Reference Library (http://www.ncstrl.org/)

8.3.  ArXiv.org (http://www.arxiv.org/)

8.4.  ScienceDirect (http://www.sciencedirect.com/)

9.      Summary




1.  Introduction 


The computerization of libraries during the past few decades has focused heavily on creating surrogate records of the printed documents available in a library, or on providing computerized services through secondary databases held locally on CD ROM or magnetic tape. The scope and functions of integrated library packages were, until recently, essentially restricted to providing access to documents at the bibliographic level. Newer versions of integrated library packages, however, tend to provide additional features and functionality akin to those of digital libraries. Similarly, secondary information systems such as MEDLINE, INSPEC, COMPENDEX+ and CAS were essentially designed to serve as effective tools for bibliographic control of research information. However, since these databases provide only bibliographic information on research articles, users had to depend heavily on the physical collection available in their institutional library, or on inter-library loan from other libraries, to obtain the references retrieved from the secondary services. Several attempts were made in the past to make the full text of research articles available through online search services, although the technology available until the late 1980s and early 1990s supported only simple text (ASCII) without graphics. By 1989, more than 1,700 full-text journals were available in this form through online search services such as DIALOG and STN. The tools, techniques and protocols necessary for building digital libraries evolved with the availability of computing power that allows parallel processing, multitasking, parallel consultation and parallel knowledge navigation, and of software tools that facilitate artificial intelligence and interactivity. Coinciding with the availability of software, hardware and networking technology, the advent of the World Wide Web (WWW), its ever-increasing usage and highly evolved browsers have paved the way for the creation of digital libraries.


With rapid developments in the technologies necessary for developing digital libraries, the world of digital information resources has expanded quickly and exponentially. An increasing number of commercial and society publishers are using the Internet as a global way to offer their publications to the international community of scientists and technologists. As a result, increasingly large numbers of scientific, technical and medical (STM) electronic journals are appearing on the web. Digital information resources include not only rapidly growing collections of electronic full-text resources, but also images, video, sound, and even virtual-reality objects.


The most significant shift in building digital collections is greater interoperability among information systems across networks. With the technology available at an affordable cost, libraries are initiating small digitization projects, either individually or as groups of libraries. Building up a digital collection and the infrastructure required to access it is a challenge that every library has to deal with. Today’s digital libraries are built around Internet and web technologies, with electronic journals as their building blocks. The increasing popularity of the Internet and developments in web technologies are catalysts for the concept of digital libraries. Figure 1 is a pictorial representation of digital library infrastructure and the services that can be generated from it.


The growth and development of digital libraries can generally be attributed to the emergence of the Internet, particularly the World Wide Web (WWW), as a medium of information delivery and access; the availability of highly evolved, extraordinarily simple and intuitive user interfaces, e.g. web browsers such as Internet Explorer and Chrome; and advances in online storage technologies that enable storage of large amounts of content at increasingly affordable cost. The products and services that go into a digital library come from electronic publishing.


Several terms have been coined at different times to represent the concept of a library without books, a library holding information in computer-readable format, or a library having access to information in digitized or digital format. The terms that have been in vogue at different times include: paperless library, electronic library, virtual library, library without boundaries and, more recently, digital library. On the one hand, the term digital library is used to refer to a system or application whose chief function is to extend electronic access to material available in a conventional library to remote users; on the other hand, it is used to describe both commercial and academic systems designed to give authorized users electronic access to large corpora of electronic documents.


The term digital library may mean different things to different people. It has been applied to an extraordinary range of applications and is frequently used to denote one or more of the following:


•    Collections in which complete contents of documents (as opposed to bibliographic citation or abstracts) are created or converted to computer processible form for online access;


•    Systems providing digital access to material that already exists within traditional library collections, i.e. libraries of scanned images, images of photographic or printed texts, and digital video segments;


•    Scientific data sets such as protein sequences or nucleic acid sequences; software libraries and multimedia works are also often referred to as digital libraries;


•    Online databases and CD ROM information products, particularly those with multimedia or interactive video components or those which contain the complete contents of books or other publications;


•    Computer storage devices on which information repositories reside, such as optical discs, juke boxes, CD ROM / DVD ROM towers, etc.;


•    Databases, including library catalogues, accessible through the Internet; and


•    Digital audio, video clips or full-length movies.


The only thing the products and services mentioned above have in common is that they are “digital” or “digitized”. While some of them qualify as digital libraries, others do not meet the characteristics and definition of a digital library given later in this chapter. The relatively recent use of the term “digital library” can be traced to the “Digital Library Initiatives” funded in the United States by the National Science Foundation, the Advanced Research Projects Agency, and the National Aeronautics and Space Administration (NASA). In 1994, these agencies granted US$ 24.4 million to six US universities for digital library research, impelled by the sudden explosive growth of the Internet and web technology. The term was quickly adopted by computer scientists, librarians and others. Thus, while the term “digital library” is relatively new, the concept behind the term, and information resources consisting of digitized material, have a history spanning several years.


A digital library is not merely a collection of electronic information; it is an organized system of information that can serve as a rich resource for its user community. The library and information science community treats digital libraries as a “logical extension and augmentation of physical libraries in the electronic information society” (Marchionini, 1998). Digital libraries extend and augment their physical counterparts by extending existing resources and services and by enabling the development of new possibilities for information access and retrieval (Fox, 1998).


2.  Traditional Library, Information Retrieval System and Digital Library 


The services and collections of a traditional library are built around its physical possessions, consisting of books, journals, microforms, video and audio cassettes, technical reports, theses and dissertations, standards and patents, etc. The primary purpose of a library OPAC is to indicate the physical location of a document in the library. In a traditional library environment consisting of physical collections, a user must either visit the library or obtain the document in order to use it. Moreover, only one person at a time can use a physical document. However, traditional libraries offer additional social and educational benefits. Besides, most traditional libraries also offer hybrid services. A digital library removes the physical restrictions that prevail in traditional libraries and provides multiple access, multiple listings and electronic transmission of its collection. Moreover, newer Web 2.0 / Library 2.0 applications now enable digital libraries to offer “social networking” and “tagging” in a web environment, thereby imitating some of the social and educational benefits offered by traditional libraries. Digital libraries, however, come with complications of their own, such as intellectual property, rights management, digital preservation, and licenses with their terms and conditions.


Information retrieval systems (IRS) can be considered precursors of digital libraries. An IRS is built with bibliographic databases as the target for searching and retrieving the textual information stored in them. Searching an information retrieval system is based on exact matching against strings of text stored in the bibliographic database, using Boolean and proximity operators. A mistake made at the time of data entry, or in the search query, therefore results in a mismatch. Digital libraries, in contrast, are based on pattern searching and inexact matching. While an IRS provides access to metadata only, a digital library provides access to both metadata and data. The migration from information retrieval systems to digital libraries coincided with the development of full-text e-resources and the spread of the World Wide Web (WWW).
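
The contrast between exact and inexact matching can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three titles are hypothetical records, the function names are invented here, and the similarity measure (the standard-library difflib ratio with a threshold of 0.8) merely stands in for whatever pattern-matching technique a real digital library might use.

```python
from difflib import SequenceMatcher

# Hypothetical bibliographic records (titles only); the second one
# carries a typing mistake made at data entry.
TITLES = [
    "Introduction to Digital Libraries",
    "Informaton Retrieval Systems",
    "Library Automation Packages",
]


def exact_and_search(titles, terms):
    """Boolean AND search: every term must equal a word of the title."""
    hits = []
    for title in titles:
        words = set(title.lower().split())
        if all(term.lower() in words for term in terms):
            hits.append(title)
    return hits


def inexact_search(titles, terms, threshold=0.8):
    """Accept a title when every term is close enough to one of its words.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    hits = []
    for title in titles:
        words = title.lower().split()
        if all(
            any(
                SequenceMatcher(None, term.lower(), word).ratio() >= threshold
                for word in words
            )
            for term in terms
        ):
            hits.append(title)
    return hits


query = ["information", "retrieval"]
print(exact_and_search(TITLES, query))  # the misspelt record is missed
print(inexact_search(TITLES, query))    # the misspelt record is found
```

With the query "information AND retrieval", the exact search misses the record whose title was mistyped at data entry, while the inexact search still retrieves it.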


3.  Electronic, Virtual, Hybrid and Digital Libraries 


While the terms digital library and electronic library are used interchangeably and synonymously, the term “virtual library” or “library without walls” usually refers to meta resources or subject portals that extend virtual accessibility to digital collections from several diverse sources, without the users even knowing where a resource actually resides. Unlike digital libraries, virtual libraries do not consist of full-text resources; instead, they are more like an index of relevant, hand-picked links to resources available on the Web. A virtual library could potentially be enormous, linking huge collections from all around the globe, or it could be very small, consisting of a few hundred links to digital resources maintained by an individual. The concept of the “hybrid library” (Rusbridge, 1998) reflects the realities faced by libraries as they attempt to integrate electronic resources acquired on CD ROM or other media, or electronic access that they buy, with the digital collections produced in-house. The hybrid library can be considered a transitional phase between the conventional and the digital library, in which electronic and paper-based information sources are used alongside each other. The challenge in managing a hybrid library is to encourage end-user resource discovery and information use, in a variety of formats and from a number of local and remote sources, in a seamlessly integrated way (Schwartz, 2000). The hybrid library should be designed to bring a range of technologies from different sources together in the context of a working library. In effect, a hybrid library maintains all or a major part of its collections in computer-processible form, as an alternative, a supplement or a complement to the conventional printed materials that exist in the library. It has a web-enabled computerized catalogue (WebPAC) accessible through the Internet, and most of its other in-house services, such as acquisition, book processing and circulation, are computerized. A hybrid library has a strong presence on the Internet, with a home page for the library providing an integrated access interface not only to the digital collections available locally, but also to other commercial and non-commercial web-based digitized collections across the world that are accessible to the library.


4.  Characteristics of Digital Libraries 


A digital library promises one-step, equitable and timely access to a vast amount of diverse resources in a given specialty, in a shared mode, lifting the traditional barriers of time and space. Digital libraries have the following characteristics associated with them:


•   Digital libraries are the digital counterparts of traditional libraries and include both electronic (digital) as well as print and other (e.g. audio, video, graphics, animation, etc.) material;


•   Digital libraries are not bound to physical spaces. Different components of a digital library may be distributed across different locations and work together coherently to meet the requirements of users;


•   In a digital environment, the requirement for physical space is reduced essentially to that needed for i) housing servers that host digital content; ii) PCs used as clients for accessing digital content; and iii) staff who maintain the digital library;


•   A digital library owns and controls the information: it provides access to the information itself, not just a pointer to it;


•   A digital library has a unified organizational structure with consistent points for accessing the data;


•   A digital library is not a single entity: it may also provide access to digital material and resources from outside the actual confines of any one digital library;


•   Digital libraries support quick and efficient access to a large number of distributed but interlinked information sources that are seamlessly integrated;


•   Digital libraries offer access to their content to multiple users simultaneously, and this content can be listed in multiple ways by different users at the same time;


•   Digital libraries have collections that i) are large and persist over time; ii) are well-organized and managed; iii) contain many formats; iv) contain objects and not just their representations; v) contain objects that may be otherwise unobtainable; and vi) contain some objects that are born digital; and


•   Digital libraries include all the processes and services offered by traditional libraries, though these processes will have to be revised to accommodate the differences between digital and paper media.


5.  World Wide Web (WWW) V/s Digital Library 


The World Wide Web (WWW), or the Web, is a vast collection of documents and is considered a digital library by many people. The web is the means by which most digital libraries are accessed, but it is not a digital library itself, although it has several features of one. The web, unlike a digital library, is an unorganized collection of documents, many of which contain ephemeral information that has no durability or lasting value. Most search engines hunt down their holdings from web sites distributed across the web space, whereas digital libraries are generally more tightly controlled and have a targeted customer set.


Today’s digital libraries are built around Internet and web technologies. While the Internet serves as the carrier and provides the content delivery mechanism, the web provides the tools and techniques for publishing, hosting and accessing content. The increasing popularity of the Internet and developments in web technologies are catalysts for the concept of digital libraries. Further, the availability of computing power that allows parallel processing, multitasking, parallel consultation and parallel knowledge navigation, put together, creates a semblance of the artificial intelligence and interactivity necessary for developing a digital library. Coinciding with the availability of software, hardware and networking technology, the advent of the World Wide Web (WWW), its ever-increasing usage and highly evolved browsers have paved the way for the creation of a global digital library.


6.  Digital Library: Towards a Definition 


The Association of Research Libraries (Waters, 1998), one of the leaders in collaborative digitization programs in the US, assigns the following tenets to a digital library:


•    The digital library is not a single entity;

•    The digital library requires technology to link the resources of many;

•    These linkages between many digital libraries and information services are transparent to end-users;

•    Universal access to the digital libraries and information services is the goal; and

•    Digital library collections are not limited to document surrogates; they also include digital artifacts that cannot be represented or distributed in printed formats.


Borgman (1992) emphasized that digital libraries should not be viewed only as a point of access to digital information, but as a combination of


•   a service;

•   an architecture;

•   a set of information resources, such as databases of text, numbers, graphics, sound, music or animation, etc.; and

•   a set of tools and capabilities to locate, retrieve and utilize the information resources available.


Terence R. Smith (1997) defined digital libraries as “controlled collections of information bearing objects (IBOs) that are in digital form and that may be organized, accessed, evaluated and used by means of heterogeneous and extensible set of distributed services that are supported by digital technology”.


Clifford Lynch (1995), a well-known expert on digital libraries and new technologies, defined a digital library as “a system providing a community of users with coherent access to a large, organized repository of digital information and knowledge. The digital library is not just one entity, but multiple sources seamlessly integrated.”


Michael Lesk, who predicted that half of the materials accessed in major libraries would be digital by the early 21st century (Lesk, 1997), defines digital libraries as “organized collections of digital information that combine the structuring and gathering of information, which libraries and archives have always done, with the digital representation that computers have made possible. Digital information can be accessed rapidly around the world, copied for preservation without error, stored compactly, and searched very quickly. A true digital library also provides the principles governing what is included and how the collection is organized” (Lesk, 1997).


Emphasizing the management aspects of digital collections and services, Arms (2000) defines a digital library as a “managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network”. Laying emphasis on digital technology, Oppenheim and Smithson (1999) define a digital library as “an information service in which all the information resources are available in computer processible form and functions of acquisitions, storage, preservation, retrieval, access and display are carried out through the use of digital technologies”.


Painting a multi-dimensional picture, Marchionini and Fox (1999) identified the following four dimensions of digital libraries:


i.  Community: reflects social, political, legal and cultural issues;


ii.  Technology: includes technical progress in computing, networking, information storage and retrieval, multimedia, interface design, etc.;


iii. Services: includes present and future services, personalization, digital reference services, real-time question answering, on-demand help, information literacy and user involvement mechanisms; and


iv. Content: represents all possible kinds of forms and genre of information, printed as well as digital.


It is critical that digital libraries provide organized and structured access to information content in a distributed environment and assist users in searching, evaluating and utilizing resources irrespective of their format. Digital libraries combine collections and expertise in a seamless interface and therefore require specialized staff to select, organize, evaluate, interpret, offer intellectual access to, preserve the integrity of, and ensure the persistence over time of digital works, so that they are readily and economically available for use by a defined community or set of communities (Waters, 1992).


7.  Why Digital Library? 


The unprecedented surge of activity and interest in digital libraries can generally be attributed to the following three factors:


i.  Emergence of Internet and web technologies as a medium of information delivery and access. The Internet, particularly the World Wide Web (WWW), allows rapid access to a wide variety of networked information resources, extending a uniform interface to a vast number of multimedia resources. The web, being a hypermedia-based system, allows linking amongst electronic resources;


ii.  Availability of highly evolved, extraordinarily simple and intuitive user interfaces, e.g. web browsers such as Internet Explorer and Netscape Navigator, for all prevalent platforms; and


iii.  Advances in online storage technologies enabling storage of large amounts of contents at increasingly affordable cost.


The digital library offers significant and unparalleled improvement and value addition to library services, while providing workable solutions to problems traditionally associated with the management of print-based collections in traditional libraries. Improved information retrieval and enhanced document delivery capabilities are widely acclaimed strengths of digital libraries. Moreover, the cost of creating, storing, manipulating and transmitting digital information has decreased considerably, providing the necessary impetus to digital library initiatives worldwide. Rising acquisition and subscription fees have forced libraries to find other means of making information available to their users, and content aggregators and electronic publishers are providing the means to do so.


Several large-scale digitization projects are aimed at conserving old, fragile and deteriorating documents of high scholarly value, not only to preserve them but also to provide the increased access and search possibilities that open up once the documents are available in computer-processible form. Digital libraries enable greater access to digital content, can be managed from remote locations and provide a way to enrich the teaching and learning environment. Since the information in a digital library is stored and accessed electronically, it is not bound by space and time. Digital library systems can be accessed simultaneously by multiple users, guaranteeing continuous availability of documents. Implementing a digital library can dramatically reduce floor space requirements as compared with conventional shelf-type storage of books and journals.


8. Some Important Digital Libraries 


8.1. New Zealand Digital Library (http://www.nzld.org/)


The New Zealand Digital Library (NZDL), maintained by the University of Waikato, provides web access to several document collections with varied subject content, languages and formats. It includes historical documents, humanitarian and development information, computer science technical reports and bibliographies, literary works, and magazines. Content formats include text (ASCII, PostScript, PDF), graphics, audio and video. The NZDL supports a simple but powerful bibliographic and full-text search and browse interface, including hierarchical browsing and display. The NZDL has given a lot of importance to structuring the search, browse and display interfaces and making them user friendly. Metadata (author, title, keywords, etc.) plays a key role in supporting field-based searches and browsing. The NZDL has been built using the Greenstone Digital Library software, developed out of a research programme at the University of Waikato. The Greenstone Digital Library software (GSDL) is freely available as open source under the terms of the GNU General Public License.


Fig. 1:  New Zealand Digital Library (http://www.nzld.org/)
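
The role that metadata plays in field-based searching and browsing, as opposed to full-text searching, can be sketched in a few lines of Python. This is only an illustrative sketch and does not reflect how Greenstone is implemented: the Document class, the function names and the two sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A hypothetical record: descriptive metadata plus the full text."""
    title: str
    author: str
    keywords: list = field(default_factory=list)
    full_text: str = ""


def _values(doc, field_name):
    """Return a metadata field as a list, whether it holds one value or many."""
    stored = getattr(doc, field_name)
    return stored if isinstance(stored, list) else [stored]


def field_search(docs, field_name, value):
    """Field-based search: look only inside one metadata field."""
    value = value.lower()
    return [
        doc.title
        for doc in docs
        if any(value in v.lower() for v in _values(doc, field_name))
    ]


def full_text_search(docs, phrase):
    """Full-text search: look inside the body of each document."""
    phrase = phrase.lower()
    return [doc.title for doc in docs if phrase in doc.full_text.lower()]


def browse_by(docs, field_name):
    """Browsing: group titles under each value of a metadata field."""
    groups = {}
    for doc in docs:
        for v in _values(doc, field_name):
            groups.setdefault(v, []).append(doc.title)
    return dict(sorted(groups.items()))


# Hypothetical sample collection.
DOCS = [
    Document("Report on Text Compression", "A. Author",
             ["compression", "indexing"],
             "This report surveys compression methods for full-text indexes."),
    Document("Village Water Supply Handbook", "B. Writer",
             ["water", "development"],
             "Practical guidance on wells and pumps."),
]

print(field_search(DOCS, "keywords", "indexing"))
print(full_text_search(DOCS, "pumps"))
print(browse_by(DOCS, "author"))
```

The same collection thus supports three access paths: a search restricted to one metadata field, a search over the document text, and a browsable listing grouped by a metadata field.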


8.2.   Networked Computer Science Technical Reference Library (http://www.ncstrl.org/)


The Networked Computer Science Technical Reference Library (NCSTRL) is an international collection of computer science research reports and papers made available by more than 100 participating institutions worldwide. The majority of the institutions are universities and research laboratories. While the full-text reports are maintained on servers at the participating institutions, a central index of bibliographic details, with searching and linking facilities, is maintained for all full-text records. The index is updated automatically when a new report is added to any of the institutional servers. New institutions can join NCSTRL and use the software, which is made available free of charge. The NCSTRL is being shifted to an OAI-compliant repository, using E-prints software for institutional repositories and ARC for harvesting metadata from distributed repositories.


Fig. 2: Networked Computer Science Technical Reference Library (http://www.ncstrl.org/)
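
Harvesting metadata from distributed OAI-compliant repositories into a central index can be sketched as follows. The sketch assumes the OAI-PMH protocol with unqualified Dublin Core (oai_dc) records; it is not the code used by NCSTRL or ARC. The repository address, the record identifier and the sample response are hypothetical, and no network request is made: the response a repository would return is replaced by an embedded sample.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses carrying Dublin Core records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request a harvester would send to a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response from a hypothetical institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:tr-0001</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Technical Report</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# The central index keeps only bibliographic details plus the identifier
# that links back to the full text held on the institutional server.
central_index = parse_records(SAMPLE_RESPONSE)
print(list_records_url("http://repo.example.org/oai"))
print(central_index)
```

In this arrangement the full text never leaves the institutional server; the harvester copies only the bibliographic metadata, which is what allows one central index to cover many independent repositories.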


8.3. ArXiv.org (http://www.arxiv.org/) 


The ArXiv (http://www.arXiv.org/), started in 1991 by Paul Ginsparg at Los Alamos National Laboratory, is the oldest e-print archive. The repository, now hosted at Cornell University, has become a fundamental means of communication for a growing number of fields, starting with theoretical high-energy physics, later spreading to other areas of physics, and now also to computer science and mathematics. ArXiv is a leading example of the successful application of developments in information technology, which has led to an alternative model of scholarly communication. This archive processes 35,000 submissions every year. It receives two-thirds of its two million weekly hits from institutions outside the United States, including many research facilities in developing regions. The arXiv has become indispensable to researchers worldwide, and in particular to research institutions in developing countries. The success and widespread adoption of arXiv has prompted the establishment of institutional archives and subject-based digital repositories in different disciplines. Scientists and librarians have become aware of the benefits of open access archiving, which is being considered as an alternative method of scholarly publishing. The Institute of Mathematical Sciences, Chennai, maintains the Indian mirror site of ArXiv.


Fig. 3: ArXiv.org (http://www.arxiv.org/)


8.4.    ScienceDirect (http://www.sciencedirect.com/)


ScienceDirect is the web-based interface to the full-text database of a commercial publisher, namely Elsevier Science, one of the world’s largest providers of scientific, technical and medical (STM) literature. ScienceDirect contains over 25% of the world’s science, technology and medicine information. It offers a rich electronic environment for research journals, bibliographic databases and reference works. The database offers more than 2,000 scientific, technical and medical peer-reviewed journals, over 59 million abstracts, over 7 million full-text scientific journal articles, an expanding suite of bibliographic databases and linking to another one million full-text articles via CrossRef to other publishers’ platforms. In addition, the Backfiles program of ScienceDirect offers the ability to search a historical archive of over 6.75 million articles directly from the desktop of a user, back to Volume 1, Issue 1.


Fig. 4: ScienceDirect (http://www.sciencedirect.com/)


9.      Summary


Digital libraries are amongst the most complex and advanced forms of information systems. Deployment of a digital library requires the integration of several information technologies because of the many diverse requirements involved: creation of digital content, its organization, ontology, development of interactive interfaces for users, multiple accesses and listings, digital document imaging, OCR, distributed database management, web technology, hypertext, information storage and retrieval systems, expert systems, intellectual property rights, integration of multimedia information services, management of multilingual collections, data mining, electronic and real-time reference services, electronic document delivery and personalization. Owing to these unique challenges and opportunities, digital libraries are emerging as a growing interdisciplinary area of research and education for information science, computer science, library science and a number of other related disciplines.


Various terms associated with the development and evolution of hypertext, imaging technology, the World Wide Web and other related technologies are discussed. The basic characteristics of digital libraries are enunciated, along with definitions given by some of the leading technologists in this field. The section deliberates upon the concept of the “hybrid library”, which reflects the realities faced by libraries as they attempt to integrate electronic resources acquired on CD ROM or other media, or electronic access that they buy, with the digital collections produced in-house. The hybrid library lies on a continuum between the conventional and the digital library, where electronic and paper-based information sources are used alongside each other. The section discusses the need for digital libraries and compares them with the Web, traditional libraries and traditional information storage and retrieval systems. Examples of different types of digital libraries conclude this section on the digital library. A glossary of terms used in the text is given to provide a better understanding of the concept of digital libraries.






