8 Knowledge Organization in Digital Libraries

Mitesh Pandya


I.     Objectives


The module intends to impart knowledge of the following aspects of the digital library organization:


•    Identify the main aspects of knowledge organization;

•    Understand knowledge organization methods;

•    Apply the capabilities of automated systems to support knowledge organization;

•    Explore knowledge organization tools; and

•    Evaluate frameworks and systems for knowledge organization.



II.     Learning Outcomes 


After going through this lesson, the learner will understand the problems, need, purposes and levels of knowledge organization in digital libraries. The learner will become familiar with the tools of knowledge organization used in traditional libraries as well as in digital libraries, and with the procedural and descriptive markup languages used in digital libraries to represent and manipulate content. In addition, the learner will understand the significant differences between knowledge organization in a traditional library and in a digital library.


III.   Structure 


1.      Introduction

2.      Need and Purpose

3.      Problems of Knowledge Organization in Digital Library

4.      Levels of Knowledge Organization in Digital Library

4.1.       Organization of Knowledge in Database

4.2.       Organization of Knowledge within a Document

4.3.       Organization of Metadata

5.      Tools of Knowledge Organization in Traditional Libraries

5.1.       Classification Systems

5.2.       Cataloguing Codes

5.3.       Thesaurus or Subject Heading

6.      Tools of Knowledge Organization in Digital Library

6.1.       Develop Metadata Schema

6.2.       Assign Metadata to each Digital Object

6.3.       Assign Unique Object Identifier to Each Digital Object

6.3.1.      Persistent Uniform Resource Locator (PURL)

6.3.2.      Handle System

6.3.3.      Digital Object Identifier

6.3.4.      OpenURL

7.      Digital Content Mark-up and Manipulation

7.1.            Procedural Markup

7.2.            Descriptive Markup

8.      Knowledge Organization: Traditional Library v/s Digital Library

9.      Knowledge Organization in Selected Digital Libraries

9.1.            Institutional Repository

9.2.            Commercial Digital Libraries

10.  Summary




1. Introduction


The term knowledge organization combines two distinct concepts: knowledge and organization. While knowledge is a widely understood concept, its organization refers to the activities involved in categorizing knowledge for effective retrieval. It covers all activities undertaken to organize published knowledge, whether in physical or digital format. Libraries organize knowledge through classification, cataloguing and indexing of the documents they hold, with the aim of providing prompt access to a specific collection of knowledge resources. In essence, knowledge organization means classifying knowledge into categories so that it can be retrieved easily whenever required.


Traditional libraries deploy a number of tools and techniques to organize their physical collections so that documents can be browsed, searched and listed effectively and efficiently. In the traditional library system, classification and cataloguing are used to organize knowledge available in print format, and these tools became popular because of their effectiveness and flexibility. Such traditional tools and techniques are also being applied to web information resources. However, there are a number of key differences between traditional print-based resources and electronic resources. Since traditional tools and techniques were found inadequate for effective resource discovery and information access, new metadata standards, tools and techniques have been developed to handle digital information resources.


2. Need and Purpose 


The first and foremost reason for knowledge organization is to enable the required pieces of knowledge to be searched and retrieved effectively and efficiently. Knowledge Organization Systems (KOS) are increasingly being deployed for this purpose. These KOS are user-friendly, with easy-to-access interfaces. The main purposes of knowledge organization are as follows:


•   To assist users in search and retrieval of knowledge effectively and efficiently;

•   To enable a user to browse knowledge resources available in a digital library;

•   To generate multiple listing of knowledge resources available in a digital library;

•   To facilitate searching and browsing;

•   To provide pathways to reach the documents;

•   To define a relationship/link between documents; and

•   To locate a particular digital document within a fraction of time.


3. Problems of Knowledge Organization in Digital Library 


A digital library consists of a set of collections available in a variety of formats and types. The digital objects it houses should be organized in a structured and systematic manner so that users can search, browse and retrieve them effectively and efficiently. Building a simple and effective knowledge organization tool for a digital library is a challenge. In the traditional library system, published knowledge is organized using tools such as the Anglo-American Cataloguing Rules, 2nd edition (AACR2) and the Classified Catalogue Code (CCC) for cataloguing, and the Dewey Decimal Classification (DDC), Colon Classification (CC), Universal Decimal Classification (UDC) and Library of Congress Classification (LCC) for classification. These tools, designed for printed documents, are not adequate on their own for organizing digital content. As such, new tools including metadata schemas, markup languages, ontologies and thesauri have emerged as knowledge organization tools and methods for organizing resources in a digital library. Metadata standards and schemas help us to create metadata for digital content, although they cannot be used to refer to a specific portion of a digital document. Markup languages such as SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language) act as pointers to specific sections of a digital document: with appropriate and precise tags, specific content can be extracted from a document using a predefined set of tags. In short, while traditional tools were effective for printed documents, new sets of tools are required for knowledge organization in the digital environment to facilitate effective searching, retrieval, browsing and multiple listing of digital content.


4. Levels of Knowledge Organization in Digital Library 


Knowledge is available in different types and varieties, and it is organized according to its type, variety and level, as described below.


4.1  Organization of Knowledge in Database


In a database, knowledge is organized in one or more tables. Tables are further organized into fields and sub-fields, and these fields and sub-fields are related to each other. Most databases follow this kind of structure for organizing digital knowledge.
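
The table-and-field structure described here can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (author, document), the column names and the helper titles_by_author are assumptions made for illustration only; they are not taken from any particular digital library system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE document (
        doc_id    INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER REFERENCES author(author_id)  -- relates the two tables
    );
""")

conn.execute(
    "INSERT INTO author (author_id, name) VALUES (?, ?)",
    (1, "Fritz, Deborah A"),
)
conn.execute(
    "INSERT INTO document (doc_id, title, author_id) VALUES (?, ?, ?)",
    (10, "MARC21 for everyone: A practical guide", 1),
)


def titles_by_author(name):
    """Return the titles linked to an author through the author_id field."""
    rows = conn.execute(
        "SELECT d.title FROM document AS d "
        "JOIN author AS a ON a.author_id = d.author_id "
        "WHERE a.name = ? ORDER BY d.title",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Here the author_id field is what relates a row in one table to a row in the other, so a search on an author's name can retrieve the linked documents.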


4.2  Organization of Knowledge within a Document


Information content or knowledge, consisting of textual as well as pictorial data, is organized in the form of documents. Within a document, information is organized as articles in the case of journals and conference proceedings, or as chapters in the case of books.


4.3  Organization of Metadata


Metadata is data that describes a digital document, and it is typically created for each and every document. Metadata may include, for example, the title, creator, publisher and date of creation of the document. Once metadata is assigned to a digital document, the document can be searched easily over networks, since each metadata element is linked to the associated digital object.


5.  Tools of Knowledge Organization in Traditional Libraries 


Tools used for knowledge organization in traditional libraries are described below:


5.1.  Classification Systems


In a traditional library, classification systems are used to classify documents according to the subject discipline they cover. The main objective of classification is to bring documents on the same subject together. The Dewey Decimal Classification (DDC), Colon Classification (CC), Library of Congress Classification (LCC) and Universal Decimal Classification (UDC) are some of the most popular schemes of classification.


5.2. Cataloguing Codes


In a traditional library, cataloguing codes are used to prepare a catalogue of library material. A cataloguing code is a set of principles, rules and regulations for describing books and other library material in the catalogue. Most libraries use the Anglo-American Cataloguing Rules, 2nd edition (AACR2), developed jointly by the American Library Association, the Library Association (UK) and the Canadian Library Association, for cataloguing the documents available in their collections.


5.3. Thesaurus or Subject Heading


A thesaurus or subject heading list is a list of words grouped together according to the similarity of their meaning. A thesaurus covers a broad range of subjects, whereas a subject heading list is generally structured hierarchically or in alphabetical sequence. The Sears List of Subject Headings, Library of Congress Subject Headings, Medical Subject Headings, the INSPEC thesaurus, the GESIS (Leibniz Institute for the Social Sciences) thesaurus and the UNESCO thesaurus are some examples of subject heading lists and thesauri.


6. Tools of Knowledge Organization in Digital Library 


A digital library comprises digital objects in different formats, such as structured text, unstructured text, statistical data, images and audio-visual (multimedia) material, that are stored on various media such as hard disk, CD-ROM, DVD-ROM and floppy disk. Digital objects stored on such media without search and browse facilities are useless. As such, an interface that supports effective and efficient access and enables a user to browse, search and navigate digital resources is absolutely essential for a digital library. As digital libraries are built using Web and Internet technology, they use the object and addressing protocols of the Internet. The process of organizing digital objects includes:


6.1  Develop Metadata Schema


A metadata schema is a scheme for describing the content, context and structure of a record. A number of metadata schemas have been developed for various types of data. Standards and protocols are used to make metadata interoperable and to exclude unnecessary variations in the data. Some of the metadata schemas and standards defined for various types of digital objects are Dublin Core, the Metadata Encoding and Transmission Standard (METS), the Text Encoding Initiative (TEI) and Encoded Archival Description (EAD).


6.2  Assign Metadata to each Digital Object 


Metadata must be assigned to each and every digital object in order to facilitate its discovery and retrieval. The following is an example of assigning metadata using the Dublin Core metadata standard.



<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="http://inflibnet.ac.in/">

    <dc:creator>Fritz, Deborah A</dc:creator>

    <dc:title>MARC21 for everyone: A practical guide</dc:title>

    <dc:description>This document provides information for implementing MARC21 in libraries. It also provides practical exposure for newcomers.</dc:description>

    <dc:publisher>American Library Association</dc:publisher>

  </rdf:Description>

</rdf:RDF>
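
As a rough illustration of how software might read such a record, the following sketch parses a well-formed version of the Dublin Core example with Python's standard xml.etree.ElementTree module. The function name read_dublin_core and the shape of the returned dictionary are assumptions made for this example; only the namespaces and element values come from the record itself.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the example record (description element omitted for brevity).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://inflibnet.ac.in/">
    <dc:creator>Fritz, Deborah A</dc:creator>
    <dc:title>MARC21 for everyone: A practical guide</dc:title>
    <dc:publisher>American Library Association</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_dublin_core(xml_text):
    """Map each described resource to its Dublin Core elements (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = {}
    for description in root.findall("rdf:Description", NS):
        # Namespaced attributes are stored as "{namespace}localname".
        about = description.get("{%s}about" % NS["rdf"])
        fields = {}
        for child in description:
            namespace, _, local_name = child.tag[1:].partition("}")
            if namespace == NS["dc"]:
                # A Dublin Core element may repeat, so values are kept in a list.
                fields.setdefault(local_name, []).append((child.text or "").strip())
        records[about] = fields
    return records
```

Calling read_dublin_core(RECORD) returns a dictionary keyed by the rdf:about value, which is the link between the metadata elements and the digital object they describe.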






6.3  Assign Unique Object Identifier to Each Digital Object


It is essential to give a unique identifier to each and every digital object in order to identify, locate and retrieve it. Some of the important unique identifiers are as follows:


6.3.1  PURL stands for Persistent Uniform Resource Locator. A PURL redirects to the current location of the requested web resource. PURLs were developed by OCLC in 1995, and the PURL service was implemented using a forked pre-1.0 release of the Apache HTTP Server. If a document moves, its URL is updated but the PURL remains the same. In operation, a user requests a document through a PURL; a PURL server looks up the corresponding URL in a database, and the user is then redirected to the resolved URL. A PURL is itself a URL but, instead of pointing directly to the location of an Internet resource, it points to an intermediate resolution service that runs on a central PURL server.
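
The look-up-and-redirect behaviour can be sketched in a few lines of Python. This is only an illustrative model, not the actual PURL server software: the class PurlServer, its method names, the example paths and the choice of HTTP status 302 for the redirect are all assumptions made for this example.

```python
class PurlServer:
    """Toy model of a PURL resolver: a table mapping PURL paths to current URLs."""

    def __init__(self):
        self._table = {}

    def register(self, purl_path, target_url):
        """Create a PURL, or update the target when the document moves."""
        self._table[purl_path] = target_url

    def resolve(self, purl_path):
        """Return an HTTP-style (status, headers) pair for a requested PURL."""
        target = self._table.get(purl_path)
        if target is None:
            return 404, {}
        # Assumed status code: a temporary redirect to the current location.
        return 302, {"Location": target}


server = PurlServer()
server.register("/docs/report", "http://old.example.org/report.pdf")

# The document moves: only the table entry changes, the PURL path stays the same.
server.register("/docs/report", "http://new.example.org/files/report.pdf")
```

After the move, server.resolve("/docs/report") redirects to the new location even though the PURL that users hold has not changed.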


6.3.2  Handle System is designed to provide efficient, extensible, and secure resolution services for unique and persistent identifiers of digital objects. It is a component of the Corporation for National Research Initiatives' (CNRI) Digital Object Architecture. The Handle System includes an open set of protocols, a namespace, and a reference implementation of the protocols. The protocols enable a distributed computer system to store identifiers, known as handles.

Figure 1: Handle System


The Handle System allows handles to be resolved in a distributed manner, either by dedicated clients or by other clients, such as web browsers or plug-ins, that go through various proxies. In all cases, communication with the Handle System is carried out using Handle System protocols.


As shown in Figure 1, a client such as a web browser encounters a handle, e.g. 10.123/234, on a wide area network or a local area network (intranet), generally as a hyperlink. The client sends the handle to the Handle System for resolution. This can be done directly by a client that understands the handle resolution protocol. Institutional repositories set up using DSpace use the Handle System.
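
A handle such as 10.123/234 has two parts separated by the first slash: a prefix that identifies the naming authority and a suffix that is unique under that prefix. The sketch below splits a handle into these parts and builds a web address that a browser could use to reach a proxy. The helper names parse_handle and proxy_url are hypothetical, and the proxy address https://hdl.handle.net is not mentioned in this module; it is included here as an assumption about the public proxy service.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # naming authority, e.g. "10.123"
    suffix: str  # local name, unique under the prefix, e.g. "234"


def parse_handle(text):
    """Split 'prefix/suffix' at the first slash; the suffix may itself contain slashes."""
    prefix, separator, suffix = text.strip().partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("not a handle of the form prefix/suffix: %r" % text)
    return Handle(prefix, suffix)


def proxy_url(handle, proxy="https://hdl.handle.net"):
    """Build a URL for resolving the handle through a web proxy (assumed address)."""
    # No percent-encoding is applied; this sketch assumes URL-safe characters.
    return "%s/%s/%s" % (proxy, handle.prefix, handle.suffix)
```

A client that does not speak the handle resolution protocol itself, such as an ordinary web browser, would follow a link of this kind and let the proxy perform the resolution.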


6.3.3  Digital Object Identifier (DOI) [3, 4, 5] is widely used by publishers. It provides a system for persistent and actionable identification and for the exchange of managed information on digital networks. The DOI system is designed for interoperability, that is, it works with existing identifier and metadata schemes.


Figure 2: Digital Object Identifier


A DOI name (for example, 10.1234/56) is assigned to a content entity, and the DOI system provides resolution from that name to a current URL/URI. When the content, previously located at www.sample.com, is moved to a new URL, www.newsample.com, a single change is made in the DOI directory: all instances of the DOI name identifying that content will then resolve to the new URL, without the user having to take any action or be aware of the change. The DOI name is persistent in nature and remains unchanged. Most publishers use the DOI as a unique identifier.
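
The effect of the single change in the DOI directory can be sketched as follows. The directory is modelled as a plain dictionary, and the function names (assign, move, resolve) and the citing documents "article-a" and "article-b" are hypothetical; the DOI name and the two URLs are the sample values used in the paragraph above.

```python
directory = {}  # DOI name -> current URL (toy stand-in for the DOI directory)


def assign(doi_name, url):
    """Register a new DOI name; a name is assigned only once."""
    if doi_name in directory:
        raise ValueError("DOI name already assigned: " + doi_name)
    directory[doi_name] = url


def move(doi_name, new_url):
    """Record that the content has moved; the DOI name itself does not change."""
    if doi_name not in directory:
        raise KeyError(doi_name)
    directory[doi_name] = new_url


def resolve(doi_name):
    """Return the current URL for a DOI name."""
    return directory[doi_name]


assign("10.1234/56", "http://www.sample.com")

# Two citing documents store only the DOI name, never the URL.
citing_documents = {"article-a": "10.1234/56", "article-b": "10.1234/56"}

# The content moves: one change in the directory.
move("10.1234/56", "http://www.newsample.com")

resolved = {doc: resolve(doi) for doc, doi in citing_documents.items()}
```

Both citing documents now resolve to the new URL although neither of them was edited, which is the sense in which the DOI name is persistent.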


6.3.4 OpenURL is a standardized format of uniform resource locator (URL) intended to enable Internet users to find a copy of a resource that they are allowed to access. Although OpenURL can be used with any kind of resource on the Internet, it is designed to enable linking from information resources such as abstracting and indexing databases (sources) to library services (targets), such as academic journals, whether online, in print or in other formats. The linking is mediated by "link resolvers" or "link servers", which parse the elements of an OpenURL and provide links to appropriate targets available through a library.
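
To make the idea concrete, the sketch below builds an OpenURL for a journal article in the key/encoded-value style of the ANSI/NISO Z39.88-2004 standard and then parses it back, as a link resolver would. The resolver address https://resolver.example.edu/openurl, the function names and every bibliographic value (article title, journal title, ISSN, volume, page, year) are invented placeholders for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base, atitle, jtitle, issn, volume, spage, date):
    """Encode a journal-article citation as an OpenURL query string."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),   # article title
        ("rft.jtitle", jtitle),   # journal title
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", spage),     # start page
        ("rft.date", date),
    ]
    return resolver_base + "?" + urlencode(params)


def parse_openurl(url):
    """What a link resolver does first: recover the citation elements."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


# All values below are made-up placeholders, not a real citation.
link = build_openurl(
    "https://resolver.example.edu/openurl",
    atitle="An example article",
    jtitle="Journal of Examples",
    issn="1234-5678",
    volume="12",
    spage="34",
    date="2014",
)
citation = parse_openurl(link)
```

The source database only needs to emit such a link; the library's own link resolver decides which target (online copy, print holding or document delivery) is appropriate for that user.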


7.  Digital Content Mark-up and Manipulation [2]


Markup languages are designed for the processing and presentation of text. A markup language specifies codes for formatting, covering both layout and style, within a text file. The codes used to specify the formatting are called tags. There are two types of markup languages.


7.1  Procedural Markup 


A procedural mark-up language applies to a single way of presenting information; it does not define how to display the file in other media on the Internet. HTML and XHTML are examples of procedural markup languages. Basically, the purpose of procedural markup is to display the content in a specific manner.

Figure 3: Procedural Markup


7.2  Descriptive Markup 


The content written on a page is rarely one long piece of text. Often the author divides the text into paragraphs, headings and subheadings. This structure can easily be translated into a descriptive structure. Descriptive mark-up describes the purpose of the text in the document rather than just how it should appear on the page. For example, procedural mark-up would indicate "print on a new line in bold face", whereas descriptive mark-up would indicate which part of the document is the title, heading, paragraph, footnote, keywords, etc. Descriptive mark-up therefore separates the content of a document from its style of presentation.


Figure 4: Descriptive Markup


A descriptive markup system uses markup codes that simply provide names to categorize parts of a document. It describes the structural role of an element (for example, "header", "list" or "element of bibliography") and presupposes that the rendering software will choose an appropriate design for that element. SGML and XML are examples of descriptive markup languages.
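
The separation of content from presentation can be demonstrated with a small sketch. The XML document, its tag names (article, title, para, keywords) and the three functions are all assumptions invented for this example; the point is that the same descriptively tagged content can be extracted by tag or rendered in more than one style.

```python
import xml.etree.ElementTree as ET
from html import escape

# Descriptive markup: tags name the role of each part, not its appearance.
DOCUMENT = """<article>
  <title>Knowledge Organization</title>
  <para>Metadata describes a digital object.</para>
  <keywords>metadata, markup</keywords>
</article>"""


def extract(xml_text, tag):
    """Pull out the content of every element with the given tag."""
    return [element.text for element in ET.fromstring(xml_text).iter(tag)]


def render_plain(xml_text):
    """One presentation: title in capitals, paragraphs as plain lines."""
    lines = []
    for element in ET.fromstring(xml_text):
        if element.tag == "title":
            lines.append(element.text.upper())
        elif element.tag == "para":
            lines.append(element.text)
    return "\n".join(lines)


def render_html(xml_text):
    """Another presentation of the same content, chosen by the rendering software."""
    lines = []
    for element in ET.fromstring(xml_text):
        if element.tag == "title":
            lines.append("<h1>%s</h1>" % escape(element.text))
        elif element.tag == "para":
            lines.append("<p>%s</p>" % escape(element.text))
    return "\n".join(lines)
```

Neither renderer required any change to the document itself, and extract(DOCUMENT, "keywords") retrieves a specific part of the document purely by its tag.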


8.  Knowledge Organization: Traditional Library v/s Digital Library


The following are some of the important differences between knowledge organization in a traditional library and in a digital library:


•  Classification is one of the tools for organizing knowledge in the traditional library system, where it is used to bring documents on the same subject together, essentially to facilitate browsing. In a digital library, by contrast, knowledge is organized by assigning digital objects to communities and subject collections.


•  In a traditional library system, a unique identification is assigned to locate a specific document. It comprises class number + book number + accession number. In a digital library environment, unique identifiers such as DOI, Handle System, OpenURL and PURL are used for each digital object.


For example: Each article published in a journal has a DOI, and each article in an institutional repository has a handle.


•   Cataloguing is done in a traditional library to retrieve physical objects, and subject headings are assigned to facilitate searching and browsing. Free-text search is not possible in traditional libraries. In a digital library, metadata is assigned to retrieve each digital object, and free-text searching is also possible. A traditional library uses only descriptive metadata, whereas a digital library uses descriptive, structural, administrative and preservation metadata.


•   The units of organization in physical libraries are books, journals, conference proceedings, etc., whereas the units of organization in a digital library are articles published in journals and conference proceedings as well as chapters of books.


9.  Knowledge Organization in Selected Digital Libraries


9.1  Institutional Repository


An institutional repository is a showcase of the research output of an institute. It is a collection of digital content produced by the researchers of the institute. Let us see how digital knowledge is organized in some well-known institutional repositories.


9.1.1  Shodhganga 


Shodhganga is a national repository of electronic theses and dissertations with full-text content. The Ph.D. theses produced by research scholars at Indian universities are archived in Shodhganga using the DSpace digital library software. At the primary level, the repository is organized by university and, at the secondary level, by department within a university. The user can select a specific university collection in which to search for theses or, alternatively, search across more than one university, since cross-collection searching is supported. Besides this, Shodhganga also allows browsing by a single subject or a set of subjects across more than one university.


9.1.2  ePrints@IISc


ePrints@IISc is a digital repository of the research output of the Indian Institute of Science (IISc). It includes preprints, post-prints and other scholarly communications emanating from the IISc community. The digital documents in ePrints@IISc can be browsed in five ways, i.e. by subject, year, author, e-print type and latest additions. The user can select a specific subject, such as biological sciences, or a specific class within a broad group, such as chemical sciences or electrical sciences. The digital contents are not classified; instead, the collection is partitioned according to major disciplines or broad classes.


9.2 Commercial Digital Libraries


ScienceDirect (Elsevier), Springer and Taylor & Francis publish journals as well as books and are well known for scientific publications. The digital libraries of these publishers are organized into subject collections, and the DOI is used as the unique identifier for organizing the knowledge.



Figure 5: ScienceDirect Subject Collection


Figure 6: Springer Subject Collection


Figure 7: Taylor & Francis Subject Collection


10.  Summary 


Whether it is a traditional library or a digital library, knowledge organization plays a vital role in collection development, discovery services, navigation, visualization, etc. Adopting a knowledge organization system in a digital library requires in-depth analysis of factors such as the kind of digital content, user needs, software and infrastructure. This module discusses the need for, purpose of and issues related to knowledge organization systems, along with the various tools, such as PURL, OpenURL and DOI, used to develop a knowledge organization system in a digital library. It describes the two types of markup, procedural and descriptive, used to describe digital content in a digital library system. The module also discusses selected digital library systems, including institutional repositories such as Shodhganga and ePrints@IISc and commercial digital libraries such as ScienceDirect, Taylor & Francis and Springer, and how digital content is organized in them.




References


1. Chowdhury, G. G., & Chowdhury, S. (2007). Organizing Information: From the Shelf to the Web (1st ed.). London: Facet Publishing.

2. Descriptive Markup. (2014, May 27). Retrieved May 27, 2014, from http://www.wwp.brown.edu/outreach/seminars/UCLA/presentations/html/introduction_markup_lecture.xhtml

3. Digital object identifier. (2014, March 6). Retrieved from http://en.wikipedia.org/w/index.php?title=Digital_object_identifier&oldid=596901107

4. Digital Object Identifier System. (2014, May 27). Retrieved May 27, 2014, from http://www.doi.org/

5. Digital Object Identifier System Handbook. (2014, March 10). Retrieved March 10, 2014, from http://www.doi.org/hb.html

6. Functions of knowledge organization. (2014, May 27). Retrieved May 27, 2014, from http://www.iva.dk/bh/lifeboat_ko/CONCEPTS/knowledge_organizat_functions.htm

7. Handle System. (2014, February 16). Retrieved from http://en.wikipedia.org/w/index.php?title=Handle_System&oldid=595508810

8. Handle System Administration Manual: 1. Handle System Overview. (2014, March 19). Retrieved March 19, 2014, from http://www.handle.net/hs_manual_18jan02/server_manual_1.html

9. HTML Unleashed. SGML and the HTML DTD: Procedural and Descriptive Markup | WebReference. (2014, May 27). Retrieved May 27, 2014, from http://www.webreference.com/dlab/books/html/3-1.html

10. Hunter, J. L. (2003). A survey of metadata research for organizing the web. Library Trends, 318–344.

11. Interactive Web Development, Dr. Drew Hwang. (n.d.). Retrieved June 18, 2014, from http://hwang.cisdept.csupomona.edu/cis311/structure.aspx?m=ml

12. Knowledge organization tools based on "human needs" for digital and Internet Environments. (2014, April 10). Human Science. Retrieved April 10, 2014, from http://humanscience.wikia.com/wiki/Knowledge_organization_tools_based_on_%E2

13. Markup Languages. (2014, May 27). Retrieved May 27, 2014, from http://www.peterindia.net/MarkupLanguageOverview.html

14. OpenURL. (2014, February 22). Retrieved from http://en.wikipedia.org/w/index.php?title=OpenURL&oldid=593003049

15. Paskin, N. (2014). Digital Object Identifier (DOI®) System. In Encyclopedia of Library and Information Sciences, Third Edition (pp. 1586–1592). Taylor & Francis. Retrieved from http://www.tandfonline.com/doi/abs/10.1081/E-ELIS3-120044418

16. Persistent uniform resource locator. (2014, February 28). Retrieved from http://en.wikipedia.org/w/index.php?title=Persistent_uniform_resource_locator&oldid=597549231

17. Soergel, D. (2009). Digital Libraries and Knowledge Organization. In S. R. Kruk & B. McDaniel (Eds.), Semantic Digital Libraries (pp. 9–39). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-85434-0_2

18. What is a Markup Language? – Definition from Techopedia. (n.d.). Techopedia.com. Retrieved June 18, 2014, from http://www.techopedia.com/definition/2668/markup-language