IsoveraDL Software Requirements Specification

Document Summary

This document specifies the requirements for the BEN Collaborators Collection Tool. The collections tool will be an open-source software system written for the PHP/MySQL platform. Although the system will be written in an extensible way, its primary focus will be the provision of the BEN Collaborative with a suite of tools which facilitate its mission of providing a federated network of peer-reviewed biological learning resources and their metadata.

Revision History

Date | Version | Description of Document Updates | Author
3/10/2006 | 0.1 | First draft | ccollins et al
3/10/2006 | 0.2 | First draft of glossary | mbusby
3/13/2006 | 0.3 | Second draft of glossary | ccollins
3/14/2006 | 0.4 | Included first release of glossary. Added purpose, scope, and overview. | ssachs
3/15/2006 | 0.5 | Updated definitions, added use cases | ssachs
3/15/2006 | 0.6 | Added diagrams, as well as operating environment and database description. | ssachs
3/15/2006 | 0.7 | Finalize use case descriptions. First release of document. | ssachs
4/9/2006 | 0.8 | Incorporate comments from BEN collaborators. Add section on Default Configurations. | ssachs

Introduction

Purpose

The primary goal of the collections tool is to streamline the maintenance of existing BEN Collaborative digital library collections, and to facilitate the addition of new collections to the collaborative.

Scope

The collections tool will provide users with all of the tools they need to create and maintain a fully functioning catalog of peer-reviewed learning resources and their metadata. This tool does not provide for end user functionality, such as searching and browsing. Furthermore, this tool does not provide the functionality required for a peer review system. However, this tool will be built with the assumption that such end user and peer review functionality will be built to augment the collections tool. Consequently, the software will be developed in a modular, extensible manner which facilitates such augmentation.

System Overview

The Collections Tool will facilitate the following tasks in a smoothly integrated manner:

- cataloging a resource and associated metadata

- quality control, including editing metadata, validating metadata, and managing validation assignments

- harvesting metadata into the catalog and out of the catalog, and managing the harvesting process

- managing the collection metadata structure

- managing the cataloging process(es)

- managing the collection vocabularies

- managing the collection users

The collections tool does not include facilities for end users to view, browse, or search collection resources or metadata.

The cataloging process will make provisions for the user to refer to an external resource, to upload a resource for permanent archival on the collection’s server, and to refer to a resource previously uploaded to the collection’s server via FTP. Accurate meta-metadata will be recorded for the entire lifecycle of the metadata, including creation, editing, and validation.

The metadata validation will be a simple, fixed process. The administrator may assign a validator to validate a resource’s metadata, or may allow any validator to validate the metadata. Validators will view, modify, and validate or reject the metadata. The administrator may change the assignments for a resource’s metadata at any time before the validation is complete.

Taking into account the assignment of validators, the editing of metadata, and the validation process, the lifecycle of a resource and its metadata record is depicted in Figure 1.


Figure 1 The resource and metadata lifecycle
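The validation rules described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the tool itself is specified for PHP/MySQL), and it is not derived from Figure 1: the class name, method names, and status labels are assumptions introduced here, and only the rules stated in the text (optional validator assignment, reassignment before validation is complete, validate or reject) are modeled.

```python
# Status labels are assumptions made for this sketch, not taken from Figure 1.
NOT_VALIDATED = "Not Validated"
VALIDATED = "Validated"
REJECTED = "Rejected"


class MetadataRecord:
    """Hypothetical in-memory stand-in for one resource's metadata record."""

    def __init__(self, resource_id, assigned_validator=None):
        self.resource_id = resource_id
        # None means that any validator may validate this record.
        self.assigned_validator = assigned_validator
        self.status = NOT_VALIDATED

    def can_be_validated_by(self, validator):
        return self.assigned_validator is None or self.assigned_validator == validator

    def reassign(self, validator):
        # Assignments may change only before validation is complete.
        if self.status != NOT_VALIDATED:
            raise ValueError("validation is already complete")
        self.assigned_validator = validator

    def decide(self, validator, accept):
        # A validator either validates or rejects the metadata.
        if self.status != NOT_VALIDATED:
            raise ValueError("validation is already complete")
        if not self.can_be_validated_by(validator):
            raise PermissionError("record is assigned to another validator")
        self.status = VALIDATED if accept else REJECTED


record = MetadataRecord("resource-1", assigned_validator="alice")
record.reassign(None)              # administrator opens the record to any validator
record.decide("bob", accept=True)  # bob validates the metadata
```

In this sketch a completed decision is final. Whether a rejected record can re-enter validation after editing is left open here, because the text does not say.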

The peer review process for resources submitted by outside users, while anticipated, remains out of scope of this document and of the initial version of this software. Collection administrators and validators are assumed to use methods external to the collections tool (e.g., email-based peer reviews) to complete the peer review process for such resources, and to start the validation process only after the resource has been peer-reviewed.

The validation process follows the peer review process because the validation process is meant to certify that the metadata record accurately describes the learning resource. If the learning resource changes as a result of the peer review process, then it will be necessary to modify the metadata record as well, and to certify the accuracy of the resulting metadata during validation.

The harvesting process will be compliant with the OAI Protocol for Metadata Harvesting, version 2.0. Through this process, collection administrators will be able to import metadata into the system from another database, and export metadata to another catalog, in particular the BEN Portal.
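As an illustration of what OAI-PMH 2.0 compliance implies for the import side, the sketch below builds ListRecords request URLs. The verb and argument names (metadataPrefix, from, set, resumptionToken) and the oai_dc prefix come from the OAI-PMH 2.0 specification. The function names and the base URL are hypothetical, and Python is used only for illustration, since the tool itself targets PHP.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec     # selective harvesting by set
    return base_url + "?" + urlencode(params)


def build_resumption_url(base_url, token):
    """Build a follow-up request for the next chunk of an incomplete list.

    resumptionToken is an exclusive argument in OAI-PMH 2.0, so only the
    verb is sent alongside it.
    """
    params = {"verb": "ListRecords", "resumptionToken": token}
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL, used only for this example.
url = build_list_records_url("http://collection.example.org/oai",
                             from_date="2006-03-10")
```

A harvester would issue the first request, parse the response, and keep calling the resumption URL until the repository returns an empty resumptionToken.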

The collection tool allows the administrator to manage the metadata structure and the cataloging process separately. The metadata structure is the set of metadata fields for which data is collected about each of the resources in the collection. The cataloging process is the process of providing values for each of those fields in order to describe one resource. The separation between the two will allow the administrator to maintain multiple cataloging processes. For example, the administrator may specify that one cataloging process allows outside users to submit metadata about a new resource, and that another process allows organization staff to catalog metadata about a resource previously peer reviewed by the organization. The processes may differ in the instructions given to the user for providing metadata, in the options available to the user, or in the fields displayed to the user.
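One way to picture this separation is a single shared field list with several process definitions layered on top of it. The sketch below is an assumption-laden illustration in Python (the tool itself is specified for PHP): the class, the field names, and the instruction strings are all invented for the example and do not come from the BEN Metadata Specification.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogingProcess:
    """Hypothetical description of one cataloging process."""
    name: str
    allowed_roles: set
    # Field name -> instructions shown to the user. Fields not listed here
    # are not displayed by this process.
    instructions: dict = field(default_factory=dict)

    def displayed_fields(self, structure):
        # Keep the order defined by the shared metadata structure.
        return [name for name in structure if name in self.instructions]


# One metadata structure shared by every process (field names are made up).
METADATA_STRUCTURE = ["title", "description", "author", "rights", "peer_review_date"]

outside_submission = CatalogingProcess(
    name="Outside submission",
    allowed_roles={"Submitter"},
    instructions={
        "title": "Give the resource a short title.",
        "description": "Describe the resource in a few sentences.",
        "author": "List the authors of the resource.",
    },
)

staff_cataloging = CatalogingProcess(
    name="Staff cataloging",
    allowed_roles={"Cataloguer"},
    instructions={name: "See the staff cataloging guide." for name in METADATA_STRUCTURE},
)
```

Changing METADATA_STRUCTURE affects every process at once, while each process independently decides which fields it shows, what guidance it gives, and which roles may use it.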

As a baseline, the collection tool’s metadata structure will be that specified in the BEN Metadata Specification, including optional Learning Object Metadata (LOM) specification fields not required by the BEN Metadata Specification. The collection administrator may add metadata fields to that structure, but may not remove or modify the baseline fields.
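The add-only rule for the baseline structure could be enforced along the following lines. This is a sketch under stated assumptions: it is written in Python for illustration rather than in the tool's PHP, the class and method names are hypothetical, the field names are placeholders, and it assumes that administrator-added fields may later be removed, which the text does not address.

```python
class MetadataStructure:
    """Hypothetical metadata structure with a protected baseline."""

    def __init__(self, baseline_fields):
        # Baseline fields are fixed once the structure is created.
        self._baseline = frozenset(baseline_fields)
        self._custom = set()

    def add_field(self, name):
        if name in self._baseline or name in self._custom:
            raise ValueError("field already exists: " + name)
        self._custom.add(name)

    def remove_field(self, name):
        if name in self._baseline:
            raise PermissionError("baseline fields may not be removed: " + name)
        if name not in self._custom:
            raise KeyError(name)
        # Assumption: fields added by the administrator can be removed again.
        self._custom.remove(name)

    def fields(self):
        return sorted(self._baseline | self._custom)


# Placeholder field names, not the actual BEN or LOM field list.
structure = MetadataStructure(["title", "description"])
structure.add_field("lab_safety_level")
```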

Administrators will be able to manage the vocabularies available for each metadata field, thereby creating new vocabularies and modifying them over time if necessary. It will also be possible to create mappings which translate terms in one vocabulary into terms in a corresponding BEN vocabulary.
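A vocabulary mapping of this kind can be pictured as a lookup table from local terms to BEN terms. The sketch below is illustrative Python (the tool itself targets PHP). The function name and every term in the example are made up, and reporting unmapped terms separately is an assumption about how gaps in a mapping might be handled.

```python
def translate_terms(terms, mapping):
    """Translate local vocabulary terms using a local -> BEN term mapping.

    Returns the translated terms plus the local terms that have no mapping,
    so that an administrator could review them.
    """
    translated = []
    unmapped = []
    for term in terms:
        if term in mapping:
            translated.append(mapping[term])
        else:
            unmapped.append(term)
    return translated, unmapped


# Invented example terms; these are not actual BEN vocabulary entries.
local_to_ben = {
    "Cell Bio": "Cell biology",
    "Genetics 101": "Genetics",
}

translated, unmapped = translate_terms(["Cell Bio", "Ecology Lab"], local_to_ben)
```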

While it will be possible to modify the metadata structure, controlled vocabularies and cataloging processes for the collection at any time, it will be advantageous to finalize these elements, to the degree possible, before entering any resources or metadata into the collection. The suggested workflow for installing, configuring, and using the collection is depicted in Figure 2.


Figure 2 The collection installation, configuration and use procedure

The collections tool will be built under the assumption that it may coexist with other installations of the collections tool on the same server. Consequently, it will be possible for multiple collections tools to share the same user database. Each user’s role on each collection will be set by administrators for that collection, making it possible for the collections to share users, if so desired by the administrators.
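The shared-user arrangement amounts to storing accounts once and keying role assignments by both user and collection. The following Python sketch is illustrative only (the tool itself is specified for PHP/MySQL, and no schema has been decided): the class, its methods, and the collection identifiers are hypothetical, while the role names are taken from the User Roles section of this document.

```python
class SharedUserDirectory:
    """Hypothetical user store shared by several collections tool installations."""

    def __init__(self):
        self._users = set()
        # (username, collection) -> set of role names for that collection only
        self._roles = {}

    def add_user(self, username):
        self._users.add(username)

    def grant(self, username, collection, role):
        if username not in self._users:
            raise KeyError("unknown user: " + username)
        self._roles.setdefault((username, collection), set()).add(role)

    def roles_for(self, username, collection):
        # A user has no roles on a collection until its administrators grant one.
        return set(self._roles.get((username, collection), set()))


directory = SharedUserDirectory()
directory.add_user("jdoe")
# Made-up collection identifiers.
directory.grant("jdoe", "collection-a", "Collection Manager")
directory.grant("jdoe", "collection-a", "Validator")
directory.grant("jdoe", "collection-b", "Submitter")
```

The same account therefore carries different responsibilities on each collection, and granting a role on one collection never affects another.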

Because the collection will also enable users to archive learning resources on the server, it will also be necessary to maintain a resource repository. The repository will be a file store. By default, the resources in the repository will be available for Web download without restrictions. However, it will also be possible to restrict access to the resources using simple web server configuration. The organization maintaining the collection may subsequently implement a separate resource access application to provide end users with access to resources in its repository according to the organization’s business rules. The collection administrator will be responsible for setting up the cataloging process so that the rights metadata is accurately recorded for such resources.

Although this document is primarily concerned with the functionality available to the user via a web user interface, the user interface will be built on top of a well-documented application program interface (API). The API will enable other web applications and services to access and modify data in the collection’s databases. This architecture will pave the way for further extensibility, including the implementation of a peer review system, the integration of a resource access application, as well as advanced resource browse and search web applications. Wherever possible, the API will be designed to enable each organization maintaining a collection to develop and install modules which will extend the application to fit its unique business requirements.

The interaction between the web user interface, the API, and the collection data stores is depicted in Figure 3.


Figure 3 Collections Tool Architecture

The user database and the combined metadata repository and collection administration database will be built on the MySQL 4.0 platform. The API and user interface will be built in PHP version 4.0, using the PEAR v1.4 extensions.

References

NSDL Metadata Policy for Collections – http://policy.comm.nsdlib.org/cgi-bin/wiki.pl?Policy_Drafts_MS-1

BEN Metadata White Paper – http://www.biosciednet.org/docs/BEN_Metadata_White_Paper_V5.pdf

IEEE Learning Objects Metadata Specification – http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf

The Open Archives Initiative Protocol for Metadata Harvesting – http://www.openarchives.org/OAI/openarchivesprotocol.html

Definitions

Glossary

General

Collection – A group of peer-reviewed scientific learning resources, and the metadata records associated with those learning resources.

Collections Tool – A generalized application, database, and user interface framework for resource & metadata management for BEN Collaborators. The tool includes a metadata catalog, user registration and login, resource submission/cataloging interface (including upload of resource to online repository), basic administration of users, metadata and resources, and metadata record editing/validation. Runs on Apache/PHP/MySQL platform.

Objects

Learning Resource – A file or set of files which together comprise a digital learning object.

Metadata Record – A set of data which describes a learning resource, and conforms to a metadata specification. All metadata records created by the BEN Collaborators Collections Tool will conform to the BEN Metadata Specification.

Activities

Acceptance – The decision to include a learning resource in the collection. This decision certifies that the resource is scientifically accurate and/or of sufficient pedagogical quality to meet the standards of the organization maintaining the collection.

Cataloging – The process of creating a metadata record for a resource for inclusion in the collection. Generally performed by a staff member or affiliate of the Collaborator’s organization for already peer-reviewed resources.

Metadata Editing – The process of modifying a metadata record for a specific learning resource.

Peer Review – The process of subjecting a learning resource to the scrutiny of one or more experts in the field for the purposes of reviewing the scientific accuracy and/or pedagogical quality of a learning resource. Used to help make a decision on eventual acceptance & publication of the learning resource. May involve resource revision and metadata editing.

Publishing – The process of making a learning resource and its associated metadata record available to non-administrative users in a collection.

Resource Revision – The process of modifying a learning resource. Usually performed by the resource’s author, in order to respond to comments made during the peer review process.

Submission – The process of submitting a resource for inclusion in the collection. Generally performed by the author of the resource or somebody otherwise outside of the Collaborator’s organization for resources that have not yet been peer-reviewed. Includes creation of a minimal metadata record for the resource.

Validation – The process by which a Validator confirms that the metadata record representing a learning resource is complete and accurate, and that the learning resource is a valid, peer-reviewed resource. May involve metadata editing, but usually does not involve resource revision.

Learning Resource/Metadata Status

Accepted – Learning resource has been peer reviewed and is considered of sufficient scientific and/or pedagogical quality to meet the standards of the organization maintaining the collection.

Not Accepted – Learning resource has not yet been included in the collection.

Not Validated – Metadata record for a learning resource has not yet been validated.

Published – Submission is in final format and has been made available in the online library.

Rejected – Submission has been peer reviewed and has not been accepted for publication.

Validated – The metadata record associated with a learning resource has been reviewed by an expert and acknowledged as accurate and complete.

User Roles

Administrator – System administrator. Has control over configuration of the collections tool and user accounts.

Author – Person who creates a learning resource.

Browser – User who consumes learning resources via the services available in the digital library, usually search & browse.

Cataloger – User who creates the metadata record for a learning resource with the collections tool.

Editor – A manager of a collection’s peer review and/or validation and publication processes.

Reviewer – User who reviews a resource for scientific and/or pedagogical quality.

Validator – User who validates a metadata record.

Technical

Production – Live site. Actual site used by system users.

Staging – Test site. Usually appears to be very similar to the live site, but is not made available to external system users. Used for deployments of new system versions before migration to production use.

Operating environment

The collections tool will be developed for the following platform:

- Apache v. 1.3

- PHP v. 4.0, with the PEAR v 1.4 extensions

- MySQL v. 4.0

It will be possible to install the collections tool on most Unix platforms and on Windows Server 2003.

Databases

The collections tool will include three data stores:

- A user database, implemented as a MySQL 4.0 database

- A metadata repository and collection administration database, implemented as a MySQL 4.0 database

- A resource archive, implemented as a file store referenced by the metadata repository.

The schema for these data stores has not yet been determined. See Figure 3 for a description of the interface between these data stores, the application program interface, and the web user interface.

User Roles

The following user roles define sets of responsibilities for users in the system. In some cases it is desirable to have the same person filling more than one role – for example, the role of both Collection Administrator and Collection Manager. The user management component of this tool will make these types of role assignments possible, at the collection administrator’s discretion.

- Collection Administrator – The collection administrator manages the collection tool, by configuring it so that its metadata structure, cataloging processes, and controlled vocabularies are sufficient to capture metadata about the resources to be included in the collection. The collection administrator also oversees other users on the system.

- Collection Manager – The collection manager oversees the cataloging, validation and harvesting processes. The manager will be responsible for determining, for each cataloging process, which user roles may use that process, and who is assigned to validate a resource. The manager will also be able to perform metadata imports via harvesting.

- Validator – The validator is responsible for certifying that the metadata for a resource is correct and that the resource is peer-reviewed.

- Cataloguer – The cataloguer is responsible for adding new resources, and the metadata for those resources, to the collection, by entering the relevant metadata into the system. The cataloguer is usually on staff of the organization maintaining the collection.

- Submitter – The submitter is a user who may, at the collection manager’s discretion, participate in the cataloging process, like the cataloguer. The submitter is usually not on staff of the organization maintaining the collection, and usually will not be allowed to use the same cataloging processes as the cataloguer.

Use Cases

A use case describes how a user of the proposed system will interact with the system to perform a unit of work. It describes an interaction over time that has meaning for the end user (person, machine or other system), and leaves the system in a complete state.

- A use case typically has requirements and constraints that describe the essential features and rules under which it operates.

- A use case may have a diagram or description illustrating behavior over time: who does what, to whom, and when.

- A use case typically has scenarios associated with it that describe the work flow over time that produces the end result. Alternate work flows (to capture exceptions, etc.) are also allowed.

The following use cases are included in the collection tool:

Metadata Structure Management