30 Case Study: Eprints

Shiv Ram


I.  Objectives


The objective of this module is to impart knowledge of the following aspects of the Eprints institutional repository software:


•   Basic concepts, tools and technologies of digital archiving;

•   Eprints: Major features and functionalities;

•   Workflow of Eprints; and

•   Installation, configuration and customization of Eprints.


II.  Learning Outcomes 


After going through this lesson, learners will gain knowledge of the major features and functionalities of the Eprints institutional repository software and of its installation, configuration and customization process. Learners will also be equipped with knowledge of the collection-building process using Eprints.


III. Structure 


1. Introduction

2. Need of Digital Archive Software

3. Eprints software

4. System requirements

4.1 Hardware

4.2 Software

4.3 Network

5. Key Features

7. Eprints workflow

8. Eprints web configuration

9. Folder structure

9.1 Global configuration folders

9.2 Repository specific folders

10. Customization

10.1 Look and feel

10.2 Metadata

10.3 Document type

10.4 Subjects

10.5 Browse views


11. Eprints Example Repositories

12. Summary



1.  Introduction: 


The paradigm shift from traditional to digital library services has changed the concept of information services in a short span of time. Digital libraries are organized and managed collections of digital material, with associated services, accessible over a network. Significant contributions in terms of standards, technologies, techniques and best practices for the development and management of digital libraries have been made in the recent past. Digital libraries encompass several facets, such as content creation and capture, information storage (digital objects + metadata), search, display and access (user interface), access management, interoperability, and preservation and maintenance. Key digital services such as electronic journals, portals and gateways, RSS and digital archiving have made the role of LIS professionals more efficient and effective. Digital archiving is an important component of digital libraries and has evolved along with them; today various technologies and tools are available in the field of digital archiving.


1.1  Digital Archives: 


A digital archive provides a foundation for the preservation of digital collections by storing them and providing seamless access to them in a secure environment. Digital preservation is the set of management processes that ensure the long-term accessibility of digital content. Digital archiving involves digitization and the capture of born-digital documents, storage, preservation, metadata assignment, collection policies, an access interface (retrieval), and dissemination of information. Strategic consideration of broader issues such as digital materials, standardization, archiving tools and technologies, and technological obsolescence is important when setting up long-term digital archives.


2.  Need of Digital Archive Software 


The explosion of information in digital media and of computer processing power has resulted in many systems where the Producer role (researcher) and the Archive role (librarian) are the responsibility of the same entity (the organization). Having a robust digital archiving system in place, with long-term preservation policies, is the need of the hour. Many organizations across the world have come forward to put their scholarly literature in the open domain by establishing digital archives, popularly known as ‘institutional repositories’. These archives help organizations preserve and showcase their intellectual output to the external community by providing copyright-compliant open access to their scholarly literature. Libraries and the research community benefit from this initiative through timely access to scholarly literature free of cost.


Today many universities and research organizations across the world are showing keen interest in establishing institutional repositories to benefit themselves as well as society at large. The field of digital archiving demands robust tools and technologies that enable long-term preservation and delivery in a secure environment. Various proprietary and open source tools and technologies, with large user communities and support, are available for digital archiving. Popular open source digital repository software packages include Eprints, DSpace and Fedora.


3.  Eprints software: 


Eprints is open source software, available under the GNU General Public License, developed at the Department of Electronics and Computer Science (E&CS), University of Southampton. It is written in PERL and is recommended for UNIX-like operating systems. Eprints depends on other software for its configuration, namely MySQL as the RDBMS, Apache as the web server, XML for import and export options, and IRStats for repository analytics. Operating-system-specific versions of Eprints can be downloaded from http://www.eprints.org/software/. Eprints is also backed by a strong user community.


The primary objective of this software is to enable institutions/organizations to set up and maintain eprint archives, or institutional repositories, for their scholarly digital content and to make that content available in the open domain. Eprints supports the ‘Green Road’ to open access publishing by enabling institutions/organizations to establish institutional repositories. It provides a web interface for managing, submitting, discovering and downloading documents. Eprints emphasizes high metadata quality, with features that ease data entry and improve interoperability.


4.  System requirements (recommended) for Eprints installation: 


4.1  Hardware:


•   Intel P4 processor

•   512 MB RAM & above

•   40 GB Hard Disk Space (Depends on collection size)

•   Network Interface

•   Eprints also runs on lower hardware configurations


4.2  Software: 


•   OS: Linux (Fedora Core, RHEL, CentOS, Ubuntu) or Windows XP and later (Win32)

•   Web server: Apache 2.0 or later with the mod_perl 2.0 module (which significantly improves the performance of PERL scripts)

•   RDBMS: MySQL 5.0

•   ImageMagick & tetex-latex (for rendering equations)

•   PERL modules (perl-MIME-Lite, perl-XML-LibXML, perl-XML-Parser, Term::ReadKey)

•   Xpdf & antiword (for full-text indexing)

•   Browser (Mozilla or any other graphical browser)

•   Mail server (sendmail or any other)
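Before installing, it can help to confirm that the packages listed above are present. The following Python sketch looks for one command-line tool per package on the PATH. The package-to-command mapping is an assumption (for example, Apache's control script is named apache2ctl on Debian/Ubuntu), and the PERL modules are not checked, so treat the result as a quick first check rather than a full dependency test.

```python
import shutil
from typing import Callable, Dict, Optional

# Assumed mapping: package from the list above -> one command it installs.
# Command names vary by distribution (e.g. apache2ctl on Debian/Ubuntu).
REQUIRED_TOOLS: Dict[str, str] = {
    "Apache web server": "apachectl",
    "MySQL client": "mysql",
    "PERL": "perl",
    "ImageMagick": "convert",
    "teTeX/LaTeX": "latex",
    "Xpdf": "pdftotext",
    "antiword": "antiword",
    "sendmail": "sendmail",
}

def check_tools(tools: Dict[str, str],
                which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, bool]:
    """Map each package to True if its command is found on the PATH.

    Tools installed outside the PATH (sendmail often lives in /usr/sbin)
    are reported as missing even though they may be installed.
    """
    return {package: which(cmd) is not None for package, cmd in tools.items()}

if __name__ == "__main__":
    for package, found in check_tools(REQUIRED_TOOLS).items():
        print(f"{package:18} {'found' if found else 'MISSING'}")
```

The `which` parameter defaults to `shutil.which` but can be replaced, which makes the check easy to adapt (for example, to also search /usr/sbin).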


4.3  Network: 


•    A public IP address for the server and a registered host name (fully qualified domain name), provided by the ISP.

Example: (Public IP) & nal-ir.nal.res.in (domain name)


Installation manual:


A detailed step-by-step installation manual for various operating systems is available on the Eprints documentation website: http://wiki.eprints.org/w/EPrints_Manual


5. Eprints: Key Features 


•  Eprints institutional repository software has been freely available on the Internet, with source code, since 2000. New versions have been released from time to time with new features and upgrade packages.


• It is developed by a dedicated team at the Department of Electronics and Computer Science, University of Southampton. All versions of Eprints, with contributed modules, can be downloaded from http://www.eprints.org/ or http://files.eprints.org/


•  Developed and distributed under the GNU General Public License (GPL). Anyone can use and customize the source code as required, and may even distribute it commercially, but any modified version that is redistributed must also be released under the GPL with its source code, so the software cannot be turned into a closed, proprietary product.


•  It can be installed on various Linux-like operating systems, e.g. Fedora, Red Hat Linux, Ubuntu and Debian, as well as on Windows XP and later (Win32). Linux variants are recommended for best results.


• Eprints is built on various technologies: PERL as the scripting language, MySQL as the back-end database, Apache as the web server, and XML for import, export and display functions, together with other packages such as the TeX system and ImageMagick for rendering LaTeX equations, and antiword and Xpdf for full-text indexing.


•  It supports multiple archives on a single installation, which means an administrator can run several independent repositories from one installation of the software.


• It supports various document retrieval mechanisms, such as simple and advanced search, and browsing by metadata fields such as author, title, document type and year.


• Web-based administration enables authorized users to administer the repository from anywhere in the world after authenticating with a user ID and password.


• Eprints is Unicode-compliant software that accommodates the major languages of the world. Archive managers can build searchable collections in many languages, and the user interface can also be developed in native languages.


•   It supports the OAI-PMH framework, which enables cross-repository search: centralized harvesters collect metadata from many repositories, so users can search multiple repositories from a single point and are redirected to the parent repository for full-text documents.


•   It can be customized in various ways, including the home page, browse views, document types, metadata fields and subject categories.


•  Eprints functionality can be extended by writing plugins in PERL, and users can find various third-party modules that extend its functionality on the Eprints website.


•  It supports Web 2.0 functionality by generating RSS feeds for recent items, browse views, search results, etc., so that users can stay updated on the latest additions to the repository.


•   It supports multi-tier access control for archived documents: “Anyone”, where anybody on the Internet can access the full-text document; “Registered users only”, where only authorized users of the repository (members of the organization or institute) can access it; and “Archive staff only”, where only the depositor and the archive administrator have access to the full-text document.


•   Multiple role-based user types are available for delegating roles and responsibilities in archive management. Registered users can only deposit documents in the archive/repository; editors/moderators/reviewers can deposit documents and review deposited ones, editing them, rejecting them or sending them back to the depositor; and administrators have overall administrative privileges over the repository.


•   Authority files help to avoid duplicate entries and improve metadata quality and uniformity.


•  It supports various metadata formats such as METS, Dublin Core and other digital library interoperability formats. Users can also add custom metadata fields as their collections require.


•  It supports bulk import and export to make the archiving process easier and faster. Popular file formats that Eprints supports for import and export include BibTeX, PubMed XML, EndNote and Reference Manager.


•   Good documentation is available on the Eprints website, and a dedicated team answers queries raised by the Eprints community through an email discussion forum. These discussion threads are archived to serve as a ready reference for similar problems.
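The OAI-PMH harvesting mentioned in the feature list can be illustrated with a small Python sketch. It builds a ListRecords request for Dublin Core (oai_dc) metadata and extracts identifiers, titles and creators from a response. The endpoint path /cgi/oai2 is the usual location on Eprints 3 installations but should be checked for your repository; repo.example.org is a placeholder host, and the sample response is made up for illustration. Network access is left out so the parsing can be tried offline.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and oai_dc (Dublin Core) specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def parse_records(xml_text: str) -> List[Dict[str, object]]:
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # absent for deleted records
        titles = [e.text for e in dc.iterfind("dc:title", NS)] if dc is not None else []
        creators = [e.text for e in dc.iterfind("dc:creator", NS)] if dc is not None else []
        records.append({"identifier": ident, "titles": titles, "creators": creators})
    return records

# Made-up response containing a single record, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repo.example.org/cgi/oai2"))
print(parse_records(SAMPLE))
```

A real harvester would also fetch the URL (for example with urllib.request.urlopen) and follow the resumptionToken element that OAI-PMH uses to page through large result sets; both are omitted to keep the sketch short.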


7.  Eprints Workflow: For better understanding, the Eprints workflow can be divided into two blocks: the first for archive managers (brown) and the second for archive users (green). The first block has two stages. In the first stage, an authorized user (repository staff, author, creator, etc.) deposits documents, assigning appropriate metadata, access rights, subject categories, etc. In the second stage, a moderator/editor/reviewer checks the authenticity of the deposited document, validates its metadata and access rights, makes changes if necessary, and then moves it to the live archive. If a deposited document fails to satisfy the archive policies, the moderator can either destroy it or send it back to the depositor, citing an appropriate reason.
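The deposit and moderation block described above can be modelled as a small state machine. This Python sketch is a simplification, not Eprints code: the status values inbox, buffer and archive follow the names Eprints uses internally for the user workspace, the review buffer and the live archive, while DESTROYED, the Item class and the action names are hypothetical labels chosen for this example.

```python
from enum import Enum
from typing import List, Tuple

class Status(Enum):
    INBOX = "inbox"          # depositor's workspace: item being prepared
    BUFFER = "buffer"        # deposited, waiting for moderator review
    ARCHIVE = "archive"      # live archive, visible to end users
    DESTROYED = "destroyed"  # hypothetical label: removed by the moderator

# Allowed moves in this simplified model: (current status, action) -> next status.
TRANSITIONS = {
    (Status.INBOX, "deposit"): Status.BUFFER,
    (Status.BUFFER, "accept"): Status.ARCHIVE,
    (Status.BUFFER, "return_to_depositor"): Status.INBOX,
    (Status.BUFFER, "destroy"): Status.DESTROYED,
}

class Item:
    def __init__(self, title: str) -> None:
        self.title = title
        self.status = Status.INBOX
        self.notes: List[Tuple[str, str]] = []  # (action, reason) pairs

    def apply(self, action: str, reason: str = "") -> Status:
        """Move the item to its next status, rejecting moves the workflow forbids."""
        key = (self.status, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} an item in status {self.status.value!r}")
        if action == "return_to_depositor" and not reason:
            raise ValueError("a reason must be cited when returning an item")
        self.status = TRANSITIONS[key]
        if reason:
            self.notes.append((action, reason))
        return self.status

item = Item("Sample thesis")
item.apply("deposit")
item.apply("return_to_depositor", reason="abstract missing")
item.apply("deposit")
print(item.apply("accept"), item.notes)
```

Keeping the allowed moves in one table makes the policy easy to read and change: an item can reach the live archive only by passing through review.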


The second block of the workflow covers document retrieval by end users. Archive users can retrieve documents through the browse or search mechanisms, and the full text is available according to the access type granted by the repository managers. Users can subscribe to the RSS feed of the archive to stay updated about the latest additions. The following flow chart depicts the workflow of the Eprints software.



8. Eprints Web Configuration: The Eprints web configuration can be split into the following components: the web server, the SQL database, PERL scripts for repository activities, and XML configuration files. The user submits a request to the archive via a web browser, which talks to the web server. The web server invokes PERL scripts to perform the required task. The request is processed by consulting the database, metadata, documents and various configuration files. The results are then passed back to the web server, which in turn delivers them to the end user's web browser.


Fig. 2
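The request cycle described above (browser, web server, PERL scripts, database and configuration, response) can be traced in a toy Python model. It is only an analogy: an in-memory sqlite3 database stands in for MySQL, a dictionary stands in for the configuration files, and web_server and handle_request are hypothetical names, not Eprints components.

```python
import sqlite3
from html import escape
from typing import Tuple

# Stand-in for the configuration files consulted by the scripts (hypothetical).
CONFIG = {"repository_name": "Sample Repository"}

def setup_db() -> sqlite3.Connection:
    """In-memory SQLite database standing in for the MySQL back end."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE eprint (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    db.executemany("INSERT INTO eprint (title, year) VALUES (?, ?)",
                   [("Wind tunnel testing", 2010), ("Composite materials", 2012)])
    return db

def handle_request(db: sqlite3.Connection, path: str) -> Tuple[int, str]:
    """Plays the part of a script the web server invokes for one request."""
    if path != "/latest":
        return 404, "<p>Not found</p>"
    rows = db.execute("SELECT title, year FROM eprint ORDER BY year DESC").fetchall()
    items = "".join(f"<li>{escape(title)} ({year})</li>" for title, year in rows)
    return 200, f"<h1>{escape(CONFIG['repository_name'])}</h1><ul>{items}</ul>"

def web_server(db: sqlite3.Connection, path: str) -> str:
    """Receives the browser's request, dispatches it, returns the response."""
    status, body = handle_request(db, path)
    return f"HTTP {status}\n\n{body}"

print(web_server(setup_db(), "/latest"))
```

The point of the separation is the same as in the real system: the web server only dispatches, while the script combines stored data with configuration to build the page.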


9.  Folder Structure: The Eprints folder structure can be divided into two categories, viz. global configuration folders and repository-specific folders.


9.1  Global configuration folders: These files are the least likely to be changed; they are shared by multiple repositories and treated as read-only, although they can be overridden at the local (repository) level. They contain the following subfolders, with the respective roles:


•   lib: contains many subfolders and files responsible for the global configuration of all repositories on a single installation


•   archives: contains the repository-specific configuration of each repository


•   bin & cgi: these two are directories for storing programs


•   perl_lib: holds all the PERL modules required by Eprints


•   cfg: holds information about the Apache (web server) configuration


•   var: all temporary files used during the process are stored in this folder


•   testdata:  contains test data for populating repository


9.2 Repository specific folders: Each repository has its own set of subfolders and files under the top-level archives folder. These are often changed to customize a specific repository to the user's needs. All repository-specific subfolders reside at eprints3/archives/ARCHIVEID/ (where ARCHIVEID is the folder name of the specific repository).


• cfg: contains a number of subfolders responsible for the entire configuration of the specific repository:


cfg.d: contains all configuration files of the repository (the PERL scripts that configure the repository)

citations: contains citation definitions for the documents of the repository

lang: language-specific files for this repository (phrases, static pages and images, site template)

namedsets: contains lists of values for named-set fields

static: contains static pages and images of the repository

workflows: contains workflow configuration files for eprints and users

autocomplete: contains autocomplete data files


•   documents: stores all full-text files in their various formats (PDF, DOC, PPT)


•   html: contains processed static web pages (index, policies, etc.).


•   var: contains temporary files of the repository
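To check an installation against the folder layout described above, a short Python script can list which expected folders are missing. The folder names come from the two lists above; the installation root (/opt/eprints3 in the usage block) and the repository id nal are assumptions to adapt. Some folders, such as testdata or autocomplete, may legitimately be absent on a given installation, so a missing entry is a prompt to investigate rather than necessarily an error.

```python
from pathlib import Path
from typing import List

# Folder names taken from the global and repository-specific lists above.
GLOBAL_DIRS = ["lib", "archives", "bin", "cgi", "perl_lib", "cfg", "var", "testdata"]
REPO_DIRS = ["cfg/cfg.d", "cfg/citations", "cfg/lang", "cfg/namedsets",
             "cfg/static", "cfg/workflows", "cfg/autocomplete",
             "documents", "html", "var"]

def expected_paths(root: Path, repo_id: str) -> List[Path]:
    """Global folders under root, then folders under archives/<repo_id>."""
    repo = root / "archives" / repo_id
    return [root / d for d in GLOBAL_DIRS] + [repo / d for d in REPO_DIRS]

def missing_folders(root: Path, repo_id: str) -> List[Path]:
    """Expected folders that do not exist (or are not directories)."""
    return [p for p in expected_paths(root, repo_id) if not p.is_dir()]

if __name__ == "__main__":
    # Hypothetical installation root; adjust to your own system.
    for path in missing_folders(Path("/opt/eprints3"), "nal"):
        print("missing:", path)
```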




10. Eprints Customization: Eprints can be customized and localized at different levels. It enables administrators to change the look and feel (branding) and to add new metadata fields, document types, views, and browse and search options. This can be achieved through the web interface provided for administrators, or by editing the source files and reloading the configuration.


10.1  Look and Feel (Branding): The simplest way is to log in as an administrator and click ‘Home’ to view the default static home page; a context-sensitive menu tool named ‘Edit page’ then appears in the toolbar. Clicking this button lets the user download index.xpage as an HTML-encapsulated file and make the required customizations, or make the same changes directly in the browser. Similarly, a logo for the repository can be uploaded from this screen. Once customization is complete, the configuration must be reloaded by clicking the ‘Reload configuration’ button at the top of the page. The same results can be achieved by editing the respective source files.


Fig. 4


10.2  Metadata Customization: New metadata fields can be added by clicking the Admin/config.Tools/Manage Metadata Fields button and then selecting the Eprints dataset fields. Adding a metadata field has four stages, viz. Type, Properties, Phrases and Commit. The user can set various properties for the new metadata field, such as whether it is mandatory, its length, whether null values are allowed, etc.
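The properties chosen in the Properties stage (mandatory, length, etc.) decide which values a deposit may contain. The following Python sketch models that idea with a hypothetical FieldDef class and validate function. It is not the format Eprints itself uses (Eprints field definitions are PERL configuration in cfg/cfg.d); it only illustrates how such properties constrain a record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class FieldDef:
    """Hypothetical model of the properties set for a new metadata field."""
    name: str
    type: str = "text"               # e.g. "text" or "int"
    required: bool = False           # mandatory field
    maxlength: Optional[int] = None  # maximum length for text values

def validate(record: Dict[str, object], fields: List[FieldDef]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for f in fields:
        value = record.get(f.name)
        if value in (None, ""):
            if f.required:
                problems.append(f"{f.name}: required field is empty")
            continue  # optional and empty: nothing else to check
        if f.type == "int" and not isinstance(value, int):
            problems.append(f"{f.name}: expected an integer")
        elif f.maxlength is not None and len(str(value)) > f.maxlength:
            problems.append(f"{f.name}: longer than {f.maxlength} characters")
    return problems

fields = [FieldDef("title", required=True, maxlength=10), FieldDef("pages", type="int")]
print(validate({"pages": "twelve"}, fields))
```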




10.3   Document Type: The default Eprints installation comes with various document types, namely Article, Book, Book Section, Conference Item, Monograph, Patent, Thesis, Artefact, Exhibition, Composition, Performance, Image, Video, Audio, Dataset, Experiment, Teaching Resource and Other. In addition, users can add their own document types by adding them to the namedsets/eprint and lang/en/phrases/local.xml source files. The workflow and citation style for a newly added document type can also be customized in workflows/eprint/default.xml and citations/eprint/default.xml. The following figure shows, for reference, a new document type added in this way.

Fig. 6
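When a type is added by hand, it is easy to list it in namedsets/eprint but forget its display phrase in local.xml. The Python sketch below cross-checks the two files. It assumes the namedset file lists one type id per line and that each type's display name is a phrase with the id eprint_typename_<type> in the http://eprints.org/ep3/phrase namespace; verify both conventions against the files shipped with your installation. The type id standard is a hypothetical example.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Assumed phrase namespace; check it against your own phrase files.
EPP = "http://eprints.org/ep3/phrase"

def read_namedset(text: str) -> List[str]:
    """One type id per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def phrase_ids(xml_text: str) -> Set[str]:
    """Collect the id attribute of every epp:phrase element."""
    root = ET.fromstring(xml_text)
    return {p.get("id") for p in root.iter(f"{{{EPP}}}phrase")}

def types_without_phrase(namedset_text: str, phrases_xml: str) -> List[str]:
    """Type ids that have no matching eprint_typename_<type> phrase."""
    ids = phrase_ids(phrases_xml)
    return [t for t in read_namedset(namedset_text)
            if f"eprint_typename_{t}" not in ids]

NAMEDSET = "article\nthesis\nstandard\n"
PHRASES = """<epp:phrases xmlns:epp="http://eprints.org/ep3/phrase">
  <epp:phrase id="eprint_typename_article">Article</epp:phrase>
  <epp:phrase id="eprint_typename_thesis">Thesis</epp:phrase>
</epp:phrases>"""
print(types_without_phrase(NAMEDSET, PHRASES))
```

Real phrase files may declare entities in an external DTD, which ElementTree does not load; such a declaration may need to be handled before parsing.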


10.4 Subjects: A new subject tree can be added, or an existing one modified, by a user with administrative privileges at Admin/config.Tools/Edit Subjects.


Fig. 7
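A subject tree is a hierarchy in which each subject names its parent(s). The Python sketch below parses lines of the assumed form id:name:parents:depositable (parents comma-separated, depositable 1 or 0) with a root subject named subjects, and prints the resulting hierarchy. This line format is an assumption based on the Eprints subjects file convention; check it against the subjects file in your own repository before relying on it.

```python
from typing import Dict, List

def parse_subjects(text: str) -> Dict[str, dict]:
    """Parse lines of the assumed form id:name:parent1,parent2:depositable."""
    subjects: Dict[str, dict] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sid, rest = line.split(":", 1)
        # rsplit keeps any colon inside the subject name intact
        name, parents, depositable = rest.rsplit(":", 2)
        subjects[sid] = {
            "name": name,
            "parents": [p for p in parents.split(",") if p],
            "depositable": depositable == "1",
        }
    return subjects

def render_tree(subjects: Dict[str, dict], root: str = "subjects") -> List[str]:
    """Indented subject names, depth first, children in file order.
    Assumes the hierarchy has no cycles."""
    children: Dict[str, List[str]] = {}
    for sid, info in subjects.items():
        for parent in info["parents"]:
            children.setdefault(parent, []).append(sid)
    lines: List[str] = []

    def walk(node: str, depth: int) -> None:
        for child in children.get(node, []):
            lines.append("  " * depth + subjects[child]["name"])
            walk(child, depth + 1)

    walk(root, 0)
    return lines

SAMPLE = """subjects:Subjects::0
T:Technology:subjects:0
TL:Aeronautics:T:1
Q:Science:subjects:0"""
print("\n".join(render_tree(parse_subjects(SAMPLE))))
```

Only depositable subjects (flag 1) would normally be offered to depositors; higher-level headings such as Technology serve for browsing.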


10.5  Browse Views: Users can customize browse views by referencing the required metadata fields in the repository-specific views configuration file, /eprints3/archives/nal/cfg/cfg.d/views.pl.


The following figure shows a new Journal view added for the Eprints collection; users can build views as required by referencing the appropriate metadata element in the views.pl file.


Fig. 8
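A browse view groups items by the value of one metadata field, producing one heading per value with the matching items listed beneath it. The Python sketch below shows that grouping for a journal view. It illustrates what a views.pl entry configures rather than reproducing Eprints code, and the field name publication is assumed here as the field holding the journal title.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def build_view(records: Iterable[dict], field: str) -> Dict[str, List[str]]:
    """Group record titles by one metadata field, as a browse view page does.
    Records without a value for the field go under 'Unspecified'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        value = rec.get(field) or "Unspecified"
        groups[value].append(rec["title"])
    # Headings sorted alphabetically, titles sorted within each heading.
    return {heading: sorted(titles) for heading, titles in sorted(groups.items())}

records = [
    {"title": "B paper", "publication": "Journal X"},
    {"title": "A paper", "publication": "Journal X"},
    {"title": "C report"},
]
print(build_view(records, "publication"))
```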


Beyond the customization options above, Eprints enables repository administrators to customize other aspects, such as the workflow, controlled vocabularies, phrases (renaming labels), the search page, etc.


11. Eprints Example websites: Eprints digital repository (digital library) software is used to create diverse types of repositories, such as research, theses, data, project, political and subject-based repositories.


The following table lists one example of each type of repository, each customized according to its own document types.



12.  Summary


The module introduced the world of digital archiving, the need for archiving technologies, and Eprints in detail. A large number of repositories have been established using Eprints across the world. Its capabilities, reliability, good documentation, and support from the development team and user community have made it one of the most popular open source software packages for digital archiving. The Eprints configuration and workflow were discussed in detail, along with the hardware and software requirements for installation. The module highlighted the various features of the Eprints software so that users are fully aware of the capabilities of the tool. Understanding the Eprints folder structure is essential, and it was explained in detail. Customizing an archive installation was explained with screenshots and flowcharts so that it can be achieved easily. Various popular archives of different types were discussed to give a broader overview of Eprints capabilities.


