DSpace/dspace/docs/html/ch13.html
Jeffrey Trimble 0bd461646d Final Revisions for 1.6.1
git-svn-id: http://scm.dspace.org/svn/repo/dspace/trunk@5002 9c30dcfa-912a-0410-8fc2-9e0234be79fd
2010-05-21 16:39:25 +00:00

<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter&nbsp;13.&nbsp;DSpace System Documentation: Business Logic Layer</title><meta content="DocBook XSL Stylesheets V1.75.2" name="generator"><link rel="home" href="index.html" title="DSpace Manual"><link rel="up" href="index.html" title="DSpace Manual"><link rel="prev" href="ch12.html" title="Chapter&nbsp;12.&nbsp;DSpace System Documentation: Application Layer"><link rel="next" href="ch14.html" title="Chapter&nbsp;14.&nbsp;DSpace System Documentation: Customizing and Configuring Submission User Interface"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF" marginwidth="5m"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">Chapter&nbsp;13.&nbsp;DSpace System Documentation: Business Logic Layer</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ch12.html">Prev</a>&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="ch14.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;13.&nbsp;DSpace System Documentation: Business Logic Layer"><div class="titlepage"><div><div><h2 class="title"><a name="N176D0"></a>Chapter&nbsp;13.&nbsp;<a name="docbook-business.html"></a>DSpace System Documentation: Business Logic Layer</h2></div><div><h3 class="subtitle"><i>(<code class="literal">ConfigurationManager</code>)</i></h3></div></div><div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch13.html#N176DC">13.1. Core Classes</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N176E9">13.1.1. The Configuration Manager (<code class="literal">ConfigurationManager</code>)</a></span></dt><dt><span class="section"><a href="ch13.html#N1772C">13.1.2. Constants</a></span></dt><dt><span class="section"><a href="ch13.html#N1774A">13.1.3. 
Context</a></span></dt><dt><span class="section"><a href="ch13.html#N17798">13.1.4. Email</a></span></dt><dt><span class="section"><a href="ch13.html#N177B0">13.1.5. LogManager</a></span></dt><dt><span class="section"><a href="ch13.html#N17895">13.1.6. Utils</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N1789E">13.2. Content Management API</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17948">13.2.1. Other Classes</a></span></dt><dt><span class="section"><a href="ch13.html#N1796C">13.2.2. Modifications</a></span></dt><dt><span class="section"><a href="ch13.html#N179F1">13.2.3. What's In Memory?</a></span></dt><dt><span class="section"><a href="ch13.html#N17A41">13.2.4. Dublin Core Metadata</a></span></dt><dt><span class="section"><a href="ch13.html#N17B1A">13.2.5. Support for Other Metadata Schemas</a></span></dt><dt><span class="section"><a href="ch13.html#N17B39">13.2.6. Packager Plugins</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N17B75">13.3. Plugin Manager</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17B7E">13.3.1. Concepts</a></span></dt><dt><span class="section"><a href="ch13.html#N17BAD">13.3.2. Using the Plugin Manager</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17BB1">13.3.2.1. Types of Plugin</a></span></dt><dt><span class="section"><a href="ch13.html#N17BE0">13.3.2.2. Self-Named Plugins</a></span></dt><dt><span class="section"><a href="ch13.html#N17BF5">13.3.2.3. Obtaining a Plugin Instance</a></span></dt><dt><span class="section"><a href="ch13.html#N17C07">13.3.2.4. Lifecycle Management</a></span></dt><dt><span class="section"><a href="ch13.html#N17C1D">13.3.2.5. Getting Meta-Information</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N17C2D">13.3.3. Implementation</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17C3D">13.3.3.1. 
PluginManager Class</a></span></dt><dt><span class="section"><a href="ch13.html#N17CB6">13.3.3.2. SelfNamedPlugin Class</a></span></dt><dt><span class="section"><a href="ch13.html#N17CC3">13.3.3.3. Errors and Exceptions</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N17CD7">13.3.4. Configuring Plugins</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17D0D">13.3.4.1. Configuring Singleton (Single) Plugins</a></span></dt><dt><span class="section"><a href="ch13.html#N17D29">13.3.4.2. Configuring Sequence of Plugins</a></span></dt><dt><span class="section"><a href="ch13.html#N17D3D">13.3.4.3. Configuring Named Plugins</a></span></dt><dt><span class="section"><a href="ch13.html#N17D87">13.3.4.4. Configuring the Reusable Status of a Plugin</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N17D9B">13.3.5. Validating the Configuration</a></span></dt><dt><span class="section"><a href="ch13.html#N17DEF">13.3.6. Use Cases</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17DF5">13.3.6.1. Managing the MediaFilter plugins transparently</a></span></dt><dt><span class="section"><a href="ch13.html#N17DFF">13.3.6.2. A Singleton Plugin</a></span></dt><dt><span class="section"><a href="ch13.html#N17E12">13.3.6.3. Plugin that Names Itself</a></span></dt><dt><span class="section"><a href="ch13.html#N17E38">13.3.6.4. Stackable Authentication</a></span></dt></dl></dd></dl></dd><dt><span class="section"><a href="ch13.html#N17E42">13.4. Workflow System</a></span></dt><dt><span class="section"><a href="ch13.html#N17EBA">13.5. Administration Toolkit</a></span></dt><dt><span class="section"><a href="ch13.html#N17EF5">13.6. E-person/Group Manager</a></span></dt><dt><span class="section"><a href="ch13.html#N17F3C">13.7. Authorization</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N17FC5">13.7.1. Special Groups</a></span></dt><dt><span class="section"><a href="ch13.html#N17FCB">13.7.2. 
Miscellaneous Authorization Notes</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N17FD1">13.8. Handle Manager/Handle Plugin</a></span></dt><dt><span class="section"><a href="ch13.html#N1804A">13.9. Search</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N180AD">13.9.1. Current Lucene Implementation</a></span></dt><dt><span class="section"><a href="ch13.html#N180BF">13.9.2. Indexed Fields</a></span></dt><dt><span class="section"><a href="ch13.html#N18174">13.9.3. Harvesting API</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N181A3">13.10. Browse API</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N1822D">13.10.1. Using the API</a></span></dt><dt><span class="section"><a href="ch13.html#N1828C">13.10.2. Index Maintenance</a></span></dt><dt><span class="section"><a href="ch13.html#N1829C">13.10.3. Caveats</a></span></dt></dl></dd><dt><span class="section"><a href="ch13.html#N182B7">13.11. Checksum checker</a></span></dt><dt><span class="section"><a href="ch13.html#N182C9">13.12. OpenSearch Support</a></span></dt><dt><span class="section"><a href="ch13.html#N18332">13.13. Embargo Support</a></span></dt><dd><dl><dt><span class="section"><a href="ch13.html#N18339">13.13.1. What is an Embargo?</a></span></dt><dt><span class="section"><a href="ch13.html#N1833F">13.13.2. 
Embargo Model and Life-Cycle</a></span></dt></dl></dd></dl></div><div class="section" title="13.1.&nbsp;Core Classes"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N176DC"></a>13.1.&nbsp;<a name="docbook-business.html-core"></a>Core Classes</h2></div></div><div></div></div><p>The <code class="literal">org.dspace.core</code> package provides some basic classes that are used throughout the DSpace code.</p><div class="section" title="13.1.1.&nbsp;The Configuration Manager (ConfigurationManager)"><div class="titlepage"><div><div><h3 class="title"><a name="N176E9"></a>13.1.1.&nbsp;The Configuration Manager (<code class="literal">ConfigurationManager</code>)</h3></div></div><div></div></div><p>The configuration manager is responsible for reading the main <code class="literal">dspace.cfg</code> properties file, managing the 'template' configuration files for other applications such as Apache, and for obtaining the text for e-mail messages.</p><p>The system is configured by editing the relevant files in <code class="literal">/dspace/config</code>, as described in the <a class="link" href="ch05.html#docbook-configure.html">configuration section</a>.</p><p>
<span class="bold"><strong>When editing configuration files for applications that DSpace uses, such as Apache, remember to edit the file in /dspace/config/templates and then run /dspace/bin/install-configs rather than editing the 'live' version directly!</strong></span>
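</p><p>Since <code class="literal">dspace.cfg</code> is described above as a properties file, the following minimal sketch shows how such a file could be read with the standard <code class="literal">java.util.Properties</code> class. It is an illustration only and assumes the file follows the standard Java properties syntax; the class name <code class="literal">ConfigSketch</code> and the sample values are hypothetical, and the real <code class="literal">ConfigurationManager</code> API is not reproduced here.</p><pre class="screen">import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of DSpace: reads properties-style text
// (such as the contents of dspace.cfg) using only the Java standard library.
class ConfigSketch {

    // Parse properties-format text into a Properties object.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative values only; a real dspace.cfg holds many more settings.
        String sample = "dspace.dir = /dspace\n"
                      + "# comment lines are ignored\n"
                      + "dspace.name = Example Repository\n";
        Properties props = parse(sample);
        System.out.println(props.getProperty("dspace.dir"));
        // A property that is not set yields null, so callers should check for it.
        System.out.println(props.getProperty("no.such.property"));
    }
}</pre><p>Within DSpace itself, configuration values would normally be obtained through the <code class="literal">ConfigurationManager</code> class rather than by reading the file directly.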
</p><p>The <code class="literal">ConfigurationManager</code> class can also be invoked as a command line tool, with two possible uses:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">/dspace/bin/install-configs</code></p><p>This processes and installs configuration files for other applications, as described in the <a class="link" href="ch05.html#docbook-configure.html">configuration section</a>.</p></li><li class="listitem"><p><code class="literal">/dspace/bin/dsrun org.dspace.core.ConfigurationManager -property property.name</code></p><p>This writes the value of <code class="literal">property.name</code> from <code class="literal">dspace.cfg</code> to the standard output, so that shell scripts can access the DSpace configuration. For an example, see <code class="literal">/dspace/bin/start-handle-server</code>. If the property has no value, nothing is written.</p></li></ul></div></div><div class="section" title="13.1.2.&nbsp;Constants"><div class="titlepage"><div><div><h3 class="title"><a name="N1772C"></a>13.1.2.&nbsp;Constants</h3></div></div><div></div></div><p>This class contains constants that are used to represent types of object and actions in the database. For example, authorization policies can relate to objects of different types, so the <code class="literal">resourcepolicy</code> table has columns <code class="literal">resource_id</code>, which is the internal ID of the object, and <code class="literal">resource_type_id</code>, which indicates whether the object is an item, collection, bitstream etc. 
The value of <code class="literal">resource_type_id</code> is taken from the <code class="literal">Constants</code> class, for example <code class="literal">Constants.ITEM</code>.</p></div><div class="section" title="13.1.3.&nbsp;Context"><div class="titlepage"><div><div><h3 class="title"><a name="N1774A"></a>13.1.3.&nbsp;Context</h3></div></div><div></div></div><p>The <code class="literal">Context</code> class is central to the operation of DSpace. Any code that wishes to use any API in the business logic layer must first create a <code class="literal">Context</code> object. This is akin to opening a connection to a database (which is in fact one of the things that happens).</p><p>A context object is involved in most method calls and object constructors, so that the method or object has access to information about the current operation. When the context object is constructed, the following information is automatically initialized:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> A connection to the database. This is a transaction-safe connection, i.e. the 'auto-commit' flag is set to false.</p></li><li class="listitem"><p> A cache of content management API objects. Each time a content object is created (for example <code class="literal">Item</code> or <code class="literal">Bitstream</code>) it is stored in the <code class="literal">Context</code> object. If the object is then requested again, the cached copy is used. 
Apart from reducing database use, this addresses the problem of having two copies of the same object in memory in different states.</p></li></ul></div><p>The following information is also held in a context object, though it is the responsibility of the application creating the context object to fill it out correctly:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> The current authenticated user, if any</p></li><li class="listitem"><p> Any 'special groups' the user is a member of. For example, a user might automatically be part of a particular group based on the IP address they are accessing DSpace from, even though they don't have an e-person record. Such a group is called a 'special group'.</p></li><li class="listitem"><p> Any extra information from the application layer that should be added to log messages that are written within this context. For example, the Web UI adds a session ID, so that when the logs are analyzed the actions of a particular user in a particular session can be tracked.</p></li><li class="listitem"><p> A flag indicating whether authorization should be circumvented. This should only be used in rare, specific circumstances. For example, when first installing the system, there are no authorized administrators who would be able to create an administrator account!</p><p>As noted above, the public API is <span class="emphasis"><em>trusted</em></span>, so it is up to applications in the application layer to use this flag responsibly.</p></li></ul></div><p>Typical use of the context object will involve constructing one, and setting the current user if one is authenticated. Several operations may be performed using the context object. If all goes well, <code class="literal">complete</code> is called to commit the changes and free up any resources used by the context. 
If anything has gone wrong, <code class="literal">abort</code> is called to roll back any changes and free up the resources.</p><p>You should always <code class="literal">abort</code> a context if <span class="emphasis"><em>any</em></span> error happens during its lifespan; otherwise the data in the system may be left in an inconsistent state. You can also <code class="literal">commit</code> a context, which means that any changes are written to the database, and the context is kept active for further use.</p></div><div class="section" title="13.1.4.&nbsp;Email"><div class="titlepage"><div><div><h3 class="title"><a name="N17798"></a>13.1.4.&nbsp;Email</h3></div></div><div></div></div><p>Sending e-mails is pretty easy. Just use the configuration manager's <code class="literal">getEmail</code> method, set the arguments and recipients, and send.</p><p>The e-mail texts are stored in <code class="literal">/dspace/config/emails</code>. They are processed by the standard <code class="literal">java.text.MessageFormat</code>. At the top of each e-mail are listed the appropriate arguments that should be filled out by the sender. Example usage is shown in the <code class="literal">org.dspace.core.Email</code> Javadoc API documentation.</p></div><div class="section" title="13.1.5.&nbsp;LogManager"><div class="titlepage"><div><div><h3 class="title"><a name="N177B0"></a>13.1.5.&nbsp;LogManager</h3></div></div><div></div></div><p>The log manager consists of a method that creates a standard log header, and returns it as a string suitable for logging. 
Note that this class does not actually write anything to the logs; the log header returned should be logged directly by the sender using an appropriate Log4J call, so that information about where the logging is taking place is also stored.</p><p>The level of logging can be configured on a per-package or per-class basis by editing <code class="literal">/dspace/config/templates/log4j.properties</code> and then executing <code class="literal">/dspace/bin/install-configs</code>. You will need to stop and restart Tomcat for the changes to take effect.</p><p>A typical log entry looks like this:</p><p>
<code class="literal">2002-11-11 08:11:32,903 INFO org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:view_item:handle=1721.1/1686</code>
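</p><p>Because the fields of such an entry are separated by fixed delimiters, an entry can be taken apart with plain string handling. The sketch below is only an illustration under the assumption that every entry is well formed, as in the example above; the class name <code class="literal">LogEntrySketch</code> is hypothetical and not part of DSpace, and no error handling is attempted.</p><pre class="screen">// Hypothetical helper, not part of DSpace: splits one log entry of the
// form shown above into its fields using plain string operations.
class LogEntrySketch {

    // Returns { timestamp, level, class, user, context info, action, extra info }.
    static String[] parse(String line) {
        // The log header and the DSpace-specific part are separated by " @ ".
        String[] halves = line.split(" @ ", 2);
        // Header: date, time (with milliseconds), level, Java class.
        String[] head = halves[0].trim().split("\\s+", 4);
        // Remainder: user, context info, action, extra info. The limit of 4
        // keeps any further colons inside the last field.
        String[] tail = halves[1].split(":", 4);
        return new String[] {
            head[0] + " " + head[1], head[2], head[3],
            tail[0], tail[1], tail[2], tail[3]
        };
    }

    public static void main(String[] args) {
        String line = "2002-11-11 08:11:32,903 INFO "
            + "org.dspace.app.webui.servlet.DSpaceServlet @ "
            + "anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:"
            + "view_item:handle=1721.1/1686";
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}</pre><p>Run on the example entry, the sketch yields seven fields: the timestamp, the level, the Java class, the user, the extra log information from the context, the action and the extra information.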
</p><p>The log entry above breaks down as follows:</p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td>
<p>Date and time, milliseconds</p>
</td><td>
<p>
<code class="literal">2002-11-11 08:11:32,903</code>
</p>
</td></tr><tr><td>
<p>Level (<code class="literal">FATAL</code>, <code class="literal">ERROR</code>, <code class="literal">WARN</code>, <code class="literal">INFO</code> or <code class="literal">DEBUG</code>)</p>
</td><td>
<p>
<code class="literal">INFO</code>
</p>
</td></tr><tr><td>
<p>Java class</p>
</td><td>
<p>
<code class="literal">org.dspace.app.webui.servlet.DSpaceServlet</code>
</p>
</td></tr><tr><td>
<p></p>
</td><td>
<p>
<code class="literal">@</code>
</p>
</td></tr><tr><td>
<p>User email or <code class="literal">anonymous</code></p>
</td><td>
<p>
<code class="literal">anonymous</code>
</p>
</td></tr><tr><td>
<p></p>
</td><td>
<p>
<code class="literal">:</code>
</p>
</td></tr><tr><td>
<p>Extra log info from context</p>
</td><td>
<p>
<code class="literal">session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB</code>
</p>
</td></tr><tr><td>
<p></p>
</td><td>
<p>
<code class="literal">:</code>
</p>
</td></tr><tr><td>
<p>Action</p>
</td><td>
<p>
<code class="literal">view_item</code>
</p>
</td></tr><tr><td>
<p></p>
</td><td>
<p>
<code class="literal">:</code>
</p>
</td></tr><tr><td>
<p>Extra info</p>
</td><td>
<p>
<code class="literal">handle=1721.1/1686</code>
</p>
</td></tr></tbody></table></div><p>The above format allows the logs to be easily parsed and analyzed. The <code class="literal">/dspace/bin/log-reporter</code> script is a simple tool for analyzing logs. Try:</p><p><code class="literal">/dspace/bin/log-reporter --help</code></p><p>It's a good idea to run the log reporter under <code class="literal">nice</code> to avoid an impact on server performance.</p></div><div class="section" title="13.1.6.&nbsp;Utils"><div class="titlepage"><div><div><h3 class="title"><a name="N17895"></a>13.1.6.&nbsp;Utils</h3></div></div><div></div></div><p><code class="literal">Utils</code> contains miscellaneous utility methods that are required in a variety of places throughout the code, and thus have no particular 'home' in a subsystem.</p></div></div><div class="section" title="13.2.&nbsp;Content Management API"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1789E"></a>13.2.&nbsp;<a name="docbook-business.html-content"></a>Content Management API</h2></div></div><div></div></div><p>The content management API package <code class="literal">org.dspace.content</code> contains Java classes for reading and manipulating content stored in the DSpace system. This is the API that components in the application layer will probably use most.</p><p>Classes corresponding to the main elements in the <a class="link" href="ch02.html#docbook-functional.html-data_model">DSpace data model</a> (<code class="literal">Community</code>, <code class="literal">Collection</code>, <code class="literal">Item</code>, <code class="literal">Bundle</code> and <code class="literal">Bitstream</code>) are sub-classes of the abstract class <code class="literal">DSpaceObject</code>. The <code class="literal">Item</code> object handles the Dublin Core metadata record.</p><p>Each class generally has one or more static <code class="literal">find</code> methods, which are used to instantiate content objects. Constructors do not have public access and are just used internally. 
The reasons for this are:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> "Constructing" an object may be misconstrued as the action of creating an object in the DSpace system. For example, one might expect something like:</p><pre class="screen">Context context = new Context();
Item myItem = new Item(context, id);</pre><p>to construct a brand new item in the system, rather than simply instantiating an in-memory instance of an object in the system.</p></li><li class="listitem"><p>
<code class="literal">find</code> methods may often be called with invalid IDs, and return <code class="literal">null</code> in such a case. A constructor would have to throw an exception in this case. A <code class="literal">null</code> return value from a static method can in general be dealt with more simply in code.</p></li><li class="listitem"><p> If an instantiation representing the same underlying archival entity already exists, the <code class="literal">find</code> method can simply return that same instantiation to avoid multiple copies and any inconsistencies which might result.</p></li></ul></div><p><code class="literal">Collection</code>, <code class="literal">Bundle</code> and <code class="literal">Bitstream</code> do not have <code class="literal">create</code> methods; rather, one has to create an object using the relevant method on the container. For example, to create a collection, one must invoke <code class="literal">createCollection</code> on the community that the collection is to appear in:</p><pre class="screen">Context context = new Context();
Community existingCommunity = Community.find(context, 123);
Collection myNewCollection = existingCommunity.createCollection();</pre><p>The primary reason for this is authorization. In order to know whether an e-person may create an object, the system must know which container the object is to be added to. It makes no sense to create a collection outside of a community, and the authorization system does not have a policy for that.</p><p><code class="literal">Item</code>s are first created in the form of an implementation of <code class="literal">InProgressSubmission</code>. An <code class="literal">InProgressSubmission</code> represents an item under construction; once it is complete, it is installed into the main archive and added to the relevant collection by the <code class="literal">InstallItem</code> class. The <code class="literal">org.dspace.content</code> package provides an implementation of <code class="literal">InProgressSubmission</code> called <code class="literal">WorkspaceItem</code>; this is a simple implementation that contains some fields used by the Web submission UI. The <code class="literal">org.dspace.workflow</code> package also contains an implementation called <code class="literal">WorkflowItem</code>, which represents a submission undergoing a workflow process.</p><p>An earlier chapter gives an <a class="link" href="ch02.html#docbook-functional.html-ingest">overview of the item ingest process</a>, which should clarify the previous paragraph. 
Also see the section on <a class="link" href="ch13.html#docbook-business.html-workflow">the workflow system</a>.</p><p><code class="literal">Community</code> and <code class="literal">BitstreamFormat</code> do have static <code class="literal">create</code> methods; one must be a site administrator to have authorization to invoke these.</p><div class="section" title="13.2.1.&nbsp;Other Classes"><div class="titlepage"><div><div><h3 class="title"><a name="N17948"></a>13.2.1.&nbsp;Other Classes</h3></div></div><div></div></div><p>Classes whose name begins <code class="literal">DC</code> are for manipulating Dublin Core metadata, as <a class="link" href="ch13.html#docbook-business.html-dublincore">explained below</a>.</p><p>The <code class="literal">FormatIdentifier</code> class attempts to guess the bitstream format of a particular bitstream. Presently, it does this simply by looking at any file extension in the bitstream name and matching it up with the file extensions associated with bitstream formats. 
Hopefully this can be greatly improved in the future!</p><p>The <code class="literal">ItemIterator</code> class allows items to be retrieved from storage one at a time, and is returned by methods that may return a large number of items, more than would be desirable to have in memory at once.</p><p>The <code class="literal">ItemComparator</code> class is an implementation of the standard <code class="literal">java.util.Comparator</code> that can be used to compare and order items based on a particular Dublin Core metadata field.</p></div><div class="section" title="13.2.2.&nbsp;Modifications"><div class="titlepage"><div><div><h3 class="title"><a name="N1796C"></a>13.2.2.&nbsp;Modifications</h3></div></div><div></div></div><p>When creating, modifying or for whatever reason removing data with the content management API, it is important to know when changes happen in-memory, and when they occur in the physical DSpace storage.</p><p>Primarily, one should note that no change made using a particular <code class="literal">org.dspace.core.Context</code> object will actually be made in the underlying storage unless <code class="literal">complete</code> or <code class="literal">commit</code> is invoked on that <code class="literal">Context</code>. If anything should go wrong during an operation, the context should always be aborted by invoking <code class="literal">abort</code>, to ensure that no inconsistent state is written to the storage.</p><p>Additionally, some changes made to objects only happen in-memory. In these cases, invoking the <code class="literal">update</code> method lines up the in-memory changes to occur in storage when the <code class="literal">Context</code> is committed or completed. In general, methods that change any [meta]data field only make the change in-memory; methods that involve relationships with other objects in the system line up the changes to be committed with the context. 
See individual methods in the API Javadoc.</p><p>Some examples to illustrate this are shown below:</p><div class="informaltable"><table border="0"><colgroup><col><col></colgroup><tbody><tr><td>
<p>
<pre class="screen">Context context = new Context();
Bitstream b = Bitstream.find(context, 1234);
b.setName("newfile.txt");
b.update();
context.complete();</pre>
</p>
</td><td>
<p><span class="bold"><strong>Will</strong></span> change storage</p>
</td></tr><tr><td>
<p>
<pre class="screen">Context context = new Context();
Bitstream b = Bitstream.find(context, 1234);
b.setName("newfile.txt");
b.update();
context.abort();</pre>
</p>
</td><td>
<p><span class="bold"><strong>Will not</strong></span> change storage (context aborted)</p>
</td></tr><tr><td>
<p>
<pre class="screen">Context context = new Context();
Bitstream b = Bitstream.find(context, 1234);
b.setName("newfile.txt");
context.complete();</pre>
</p>
</td><td>
<p>The new name <span class="bold"><strong>will not</strong></span> be stored since <code class="literal">update</code> was not invoked</p>
</td></tr><tr><td>
<p>
<pre class="screen">Context context = new Context();
Bitstream bs = Bitstream.find(context, 1234);
Bundle bnd = Bundle.find(context, 5678);
bnd.add(bs);
context.complete();</pre>
</p>
</td><td>
<p>The bitstream <span class="bold"><strong>will</strong></span> be included in the bundle, since <code class="literal">update</code> doesn't need to be called</p>
</td></tr></tbody></table></div></div><div class="section" title="13.2.3.&nbsp;What's In Memory?"><div class="titlepage"><div><div><h3 class="title"><a name="N179F1"></a>13.2.3.&nbsp;What's In Memory?</h3></div></div><div></div></div><p>Instantiating some content objects also causes other content objects to be loaded into memory.</p><p>Instantiating a <code class="literal">Bitstream</code> object causes the appropriate <code class="literal">BitstreamFormat</code> object to be instantiated. Of course the <code class="literal">Bitstream</code> object does not load the underlying bits from the bitstream store into memory!</p><p>Instantiating a <code class="literal">Bundle</code> object causes the appropriate <code class="literal">Bitstream</code> objects (and hence <code class="literal">BitstreamFormat</code>s) to be instantiated.</p><p>Instantiating an <code class="literal">Item</code> object causes the appropriate <code class="literal">Bundle</code> objects (etc.) and hence <code class="literal">BitstreamFormat</code>s to be instantiated. All the Dublin Core metadata associated with that item are also loaded into memory.</p><p>The reasoning behind this is that for the vast majority of cases, anyone instantiating an item object is going to need information about the bundles and bitstreams within it, and this methodology allows that to be done in the most efficient way and is simple for the caller. For example, in the Web UI, the servlet (controller) needs to pass information about an item to the viewer (JSP), which needs to have all the information in-memory to display the item without further accesses to the database which may cause errors mid-display.</p><p>You do not need to worry about multiple in-memory instantiations of the same object, or any inconsistencies that may result; the <code class="literal">Context</code> object keeps a cache of the instantiated objects. 
The <code class="literal">find</code> methods of classes in <code class="literal">org.dspace.content</code> will use a cached object if one exists.</p><p>It may be that in enough cases this automatic instantiation of contained objects reduces performance in situations where it is important; if this proves to be true the API may be changed in the future to include a <code class="literal">loadContents</code> method or somesuch, or perhaps a Boolean parameter indicating what to do will be added to the <code class="literal">find</code> methods.</p><p>When a <code class="literal">Context</code> object is completed, aborted or garbage-collected, any objects instantiated using that context are invalidated and should not be used (in much the same way an AWT button is invalid if the window containing it is destroyed).</p></div><div class="section" title="13.2.4.&nbsp;Dublin Core Metadata"><div class="titlepage"><div><div><h3 class="title"><a name="N17A41"></a>13.2.4.&nbsp;<a name="docbook-business.html-dublincore"></a>Dublin Core Metadata</h3></div></div><div></div></div><p>The <code class="literal">DCValue</code> class is a simple container that represents a single Dublin Core element, optional qualifier, value and language. Note that since DSpace 1.4 the <code class="literal">MetadataValue</code> and associated classes are preferred (see <a class="link" href="ch13.html#docbook-business.html-otherschemas">Support for Other Metadata Schemas</a>). The other classes starting with <code class="literal">DC</code> are utility classes for handling types of data in Dublin Core, such as people's names and dates. As supplied, the DSpace registry of elements and qualifiers corresponds to the <a class="ulink" href="http://www.dublincore.org/documents/2002/09/24/library-application-profile/" target="_top">Library Application Profile</a> for Dublin Core. 
It should be noted that these utility classes assume that the values will be in a certain syntax, which will be true for all data generated within the DSpace system, but since Dublin Core does not always define strict syntax, this may not be true for Dublin Core originating outside DSpace.</p><p>Below is the specific syntax that DSpace expects various fields to adhere to:</p><div class="informaltable"><table border="0"><colgroup><col><col><col><col></colgroup><tbody><tr><td>
<p>
<span class="bold"><strong>Element</strong></span>
</p>
</td><td>
<p>
<span class="bold"><strong>Qualifier</strong></span>
</p>
</td><td>
<p>
<span class="bold"><strong>Syntax</strong></span>
</p>
</td><td>
<p>
<span class="bold"><strong>Helper Class</strong></span>
</p>
</td></tr><tr><td>
<p>
<code class="literal">date</code>
</p>
</td><td>
<p>Any or unqualified</p>
</td><td><p>ISO 8601 in the UTC time zone, with either year, month, day, or second precision. Examples:</p>
<code class="literal">2000</code><br><code class="literal">2002-10</code><br><code class="literal">2002-08-14</code><br><code class="literal">1999-01-01T14:35:23Z</code>
</td><td>
<p>
<code class="literal">DCDate</code>
</p>
</td></tr><tr><td>
<p>
<code class="literal">contributor</code>
</p>
</td><td>
<p>Any or unqualified</p>
</td><td><p>In general last name, then a comma, then first names, then any additional information like "Jr.". If the contributor is an organization, then simply the name. Examples:</p>
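<p><code class="literal">Doe, John</code></p>
<p><code class="literal">Smith, John Jr.</code></p>
<p><code class="literal">van Dyke, Dick</code></p>
<p><code class="literal">Massachusetts Institute of Technology</code></p>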
</td><td>
<p>
<code class="literal">DCPersonName</code>
</p>
</td></tr><tr><td>
<p>
<code class="literal">language</code>
</p>
</td><td>
<p>
<code class="literal">iso</code>
</p>
</td><td><p>A two-letter language code taken from ISO 639, optionally followed by a two-letter country code taken from ISO 3166. Examples:</p>
<p><code class="literal">en</code></p>
<p><code class="literal">fr</code></p>
<p><code class="literal">en_US</code></p>
</td><td>
<p>
<code class="literal">DCLanguage</code>
</p>
</td></tr><tr><td>
<p>
<code class="literal">relation</code>
</p>
</td><td>
<p>
<code class="literal">ispartofseries</code>
</p>
</td><td><p>The series name, followed by a semicolon and then the number in that series. Alternatively, just free text. Examples:</p>
<p><code class="literal">MIT-TR; 1234</code></p>
<p><code class="literal">My Report Series; ABC-1234</code></p>
<p><code class="literal">NS1234</code></p>
</td><td>
<p>
<code class="literal">DCSeriesNumber</code>
</p>
</td></tr></tbody></table></div></div><div class="section" title="13.2.5.&nbsp;Support for Other Metadata Schemas"><div class="titlepage"><div><div><h3 class="title"><a name="N17B1A"></a>13.2.5.&nbsp;<a name="docbook-business.html-otherschemas"></a>Support for Other Metadata Schemas</h3></div></div><div></div></div><p>To support additional metadata schemas, a new set of metadata classes has been added. These are backwards compatible with the DC classes and should be used rather than the DC-specific classes wherever possible. Note that hierarchical metadata schemas are not currently supported; only flat schemas (such as DC) can be defined.</p><p>The <code class="literal">MetadataField</code> class describes a metadata field by schema, element and optional qualifier. The value of a <code class="literal">MetadataField</code> is described by a <code class="literal">MetadataValue</code>, which is roughly equivalent to the older <code class="literal">DCValue</code> class. Finally, the <code class="literal">MetadataSchema</code> class is used to describe supported schemas. The DC schema is supported by default. Refer to the javadoc for method details.</p></div><div class="section" title="13.2.6.&nbsp;Packager Plugins"><div class="titlepage"><div><div><h3 class="title"><a name="N17B39"></a>13.2.6.&nbsp;<a name="docbook-business.html-packager"></a>Packager Plugins</h3></div></div><div></div></div><p>The Packager plugins let you <span class="emphasis"><em>ingest</em></span> a package to create a new DSpace Object, and <span class="emphasis"><em>disseminate</em></span> a content Object as a package. 
A package is simply a data stream; its contents are defined by the packager plugin's implementation.</p><p>To ingest an object, which is currently only implemented for Items, the sequence of operations is:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> Get an instance of the chosen <code class="literal">PackageIngester</code> plugin.</p></li><li class="listitem"><p> Locate a Collection in which to create the new Item.</p></li><li class="listitem"><p> Call its <code class="literal">ingest</code> method, and get back a <code class="literal">WorkspaceItem</code>.</p></li></ol></div><p>The packager also takes a <code class="literal">PackageParameters</code> object, which is a property list of parameters specific to that packager which might be passed in from the user interface.</p><p>Here is an example package ingestion code fragment:</p><pre class="screen">Collection collection = <span class="emphasis"><em> find target collection</em></span>
InputStream source = ...;
PackageParameters params = ...;
String license = null;
PackageIngester sip = (PackageIngester) PluginManager
.getNamedPlugin(PackageIngester.class, packageType);
WorkspaceItem wi = sip.ingest(context, collection, source, params, license);</pre><p>Here is an example of a package dissemination:</p><pre class="screen"> OutputStream destination = ...;
PackageParameters params = ...;
DSpaceObject dso = ...;
PackageDisseminator dip = (PackageDisseminator) PluginManager
.getNamedPlugin(PackageDisseminator.class, packageType);
dip.disseminate(context, dso, params, destination);</pre></div></div><div class="section" title="13.3.&nbsp;Plugin Manager"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17B75"></a>13.3.&nbsp;<a name="docbook-business.html-plugin"></a>Plugin Manager</h2></div></div><div></div></div><p>The PluginManager is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the life cycle of a plugin.</p><div class="section" title="13.3.1.&nbsp;Concepts"><div class="titlepage"><div><div><h3 class="title"><a name="N17B7E"></a>13.3.1.&nbsp;Concepts</h3></div></div><div></div></div><p>The following terms are important in understanding the rest of this section:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>Plugin Interface</strong></span> A Java interface, the defining characteristic of a plugin. The consumer of a plugin asks for its plugin by interface.</p></li><li class="listitem"><p><span class="bold"><strong>Plugin</strong></span> a.k.a. Component, this is an instance of a class that implements a certain interface. It is interchangeable with other implementations, so that any of them may be "plugged in", hence the name. A Plugin is an instance of any class that implements the plugin interface.</p></li><li class="listitem"><p><span class="bold"><strong>Implementation class</strong></span> The actual class of a plugin. It may implement several plugin interfaces, but must implement at least one.</p></li><li class="listitem"><p><span class="bold"><strong>Name</strong></span> Plugin implementations can be distinguished from each other by name, a short String meant to symbolically represent the implementation class. They are called "named plugins". 
Plugins only need to be named when the caller has to make an active choice between them.</p></li><li class="listitem"><p><span class="bold"><strong>SelfNamedPlugin class</strong></span> Plugins that extend the <code class="literal">SelfNamedPlugin</code> class can take advantage of additional features of the Plugin Manager. Any class can be managed as a plugin, so it is not necessary, just possible.</p></li><li class="listitem"><p><span class="bold"><strong>Reusable</strong></span> Reusable plugins are only instantiated once, and the Plugin Manager returns the same (cached) instance whenever that same plugin is requested again. This behavior can be turned off if desired.</p></li></ul></div></div><div class="section" title="13.3.2.&nbsp;Using the Plugin Manager"><div class="titlepage"><div><div><h3 class="title"><a name="N17BAD"></a>13.3.2.&nbsp;Using the Plugin Manager</h3></div></div><div></div></div><div class="section" title="13.3.2.1.&nbsp;Types of Plugin"><div class="titlepage"><div><div><h4 class="title"><a name="N17BB1"></a>13.3.2.1.&nbsp;Types of Plugin</h4></div></div><div></div></div><p>The Plugin Manager supports three different patterns of usage:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>Singleton Plugins</strong></span> There is only one implementation class for the plugin. It is indicated in the configuration. This type of plugin chooses an implementation of a service, for the entire system, at configuration time. Your application just fetches the plugin for that interface and gets the configured-in choice. 
See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">getSinglePlugin()</a> method.</p></li><li class="listitem"><p><span class="bold"><strong>Sequence Plugins</strong></span> You need a sequence or series of plugins, to implement a mechanism like Stackable Authentication or a pipeline, where each plugin is called in order to contribute its implementation of a process to the whole. The Plugin Manager supports this by letting you configure a sequence of plugins for a given interface. See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">getPluginSequence()</a> method.</p></li><li class="listitem"><p><span class="bold"><strong>Named Plugins</strong></span> Use a named plugin when the application has to choose one plugin implementation out of many available ones. Each implementation is bound to one or more names (symbolic identifiers) in the configuration.</p><p>The name is just a string to be associated with the combination of implementation class and interface. It may contain any characters except for comma (,) and equals (=). It may contain embedded spaces. Comma is a special character used to separate names in the configuration entry.</p><p>Names must be unique within an interface: No plugin classes implementing the same interface may have the same name.</p><p>Think of plugin names as a controlled vocabulary -- for a given plugin interface, there is a set of names for which plugins can be found. 
The designer of a Named Plugin interface is responsible for deciding what the name means and how to derive it; for example, names of metadata crosswalk plugins may describe the target metadata format.</p><p>See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">getNamedPlugin()</a> method and the getAllPluginNames() method.</p></li></ol></div></div><div class="section" title="13.3.2.2.&nbsp;Self-Named Plugins"><div class="titlepage"><div><div><h4 class="title"><a name="N17BE0"></a>13.3.2.2.&nbsp;<a name="docbook-business.html-selfnamedplugin"></a>Self-Named Plugins</h4></div></div><div></div></div><p>Named plugins can get their names either from the configuration or, for a variant called self-named plugins, from within the plugin itself.</p><p>Self-named plugins are necessary because one plugin implementation can itself be configured to take on many "personalities", each of which deserves its own plugin name. It is already managing its own configuration for each of these personalities, so it makes sense to allow it to export them to the Plugin Manager rather than expecting the plugin configuration to be kept in sync with its own configuration.</p><p>An example helps clarify the point: There is a named plugin that does crosswalks, call it <code class="literal">CrosswalkPlugin</code>. It has several implementations that crosswalk some kind of metadata. Now we add a new plugin which uses XSL stylesheet transformation (XSLT) to crosswalk many types of metadata -- so the single plugin can act like many different plugins, depending on which stylesheet it employs.</p><p>This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet -- it has to, since of course the Plugin Manager doesn't know anything about stylesheets. 
It becomes a self-named plugin, so that it reads its configuration data, gets the list of names to which it can respond, and passes those on to the Plugin Manager.</p><p>When the Plugin Manager creates an instance of the XSLT-crosswalk, it records the Plugin Name that was responsible for that instance. The plugin can look at that Name later in order to configure itself correctly for the Name that created it. This mechanism is all part of the SelfNamedPlugin class which is part of any self-named plugin.</p></div><div class="section" title="13.3.2.3.&nbsp;Obtaining a Plugin Instance"><div class="titlepage"><div><div><h4 class="title"><a name="N17BF5"></a>13.3.2.3.&nbsp;Obtaining a Plugin Instance</h4></div></div><div></div></div><p>The most common thing you will do with the Plugin Manager is obtain an instance of a plugin. To request a plugin, you must always specify the plugin interface you want. You will also supply a name when asking for a named plugin.</p><p>A sequence plugin is returned as an array of <code class="literal">Object</code>s since it is actually an ordered list of plugins.</p><p>See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">getSinglePlugin(), getPluginSequence(), getNamedPlugin()</a> methods.</p></div><div class="section" title="13.3.2.4.&nbsp;Lifecycle Management"><div class="titlepage"><div><div><h4 class="title"><a name="N17C07"></a>13.3.2.4.&nbsp;Lifecycle Management</h4></div></div><div></div></div><p>When <code class="literal">PluginManager</code> fulfills a request for a plugin, it checks whether the implementation class is reusable; if so, it creates one instance of that class and returns it for every subsequent request for that interface and name. 
If it is not reusable, a new instance is always created.</p><p>For reasons that will become clear later, the manager actually caches a separate instance of an implementation class for each name under which it can be requested.</p><p>You can ask the <code class="literal">PluginManager</code> to forget about (decache) a plugin instance by releasing it. See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">PluginManager.releasePlugin()</a> method. The manager will drop its reference to the plugin so the garbage collector can reclaim it. The next time that plugin/name combination is requested, it will create a new instance.</p></div><div class="section" title="13.3.2.5.&nbsp;Getting Meta-Information"><div class="titlepage"><div><div><h4 class="title"><a name="N17C1D"></a>13.3.2.5.&nbsp;Getting Meta-Information</h4></div></div><div></div></div><p>The <code class="literal">PluginManager</code> can list all the names of the Named Plugins which implement an interface. You may need this, for example, to implement a menu in a user interface that presents a choice among all possible plugins. See the <a class="link" href="ch13.html#docbook-business.html-pluginmethods">getAllPluginNames()</a> method.</p><p>Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (e.g. a key into the I18N message catalog) then you should add a method to the plugin itself to return that.</p></div></div><div class="section" title="13.3.3.&nbsp;Implementation"><div class="titlepage"><div><div><h3 class="title"><a name="N17C2D"></a>13.3.3.&nbsp;Implementation</h3></div></div><div></div></div><p>Note: The <code class="literal">PluginManager</code> refers to interfaces and classes internally only by their names whenever possible, to avoid loading classes until absolutely necessary (i.e. to create an instance). 
As you'll see below, self-named classes still have to be loaded to query them for names, but for the most part it can avoid loading classes. This saves a lot of time at start-up and keeps the JVM memory footprint down, too. As the Plugin Manager gets used for more classes, this will become a greater concern.</p><p>The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away. The solution is to call the <code class="literal">checkConfiguration()</code> method after making any changes to the configuration.</p><div class="section" title="13.3.3.1.&nbsp;PluginManager Class"><div class="titlepage"><div><div><h4 class="title"><a name="N17C3D"></a>13.3.3.1.&nbsp;<a name="docbook-business.html-pluginmethods"></a>PluginManager Class</h4></div></div><div></div></div><p>The <code class="literal">PluginManager</code> class is your main interface to the Plugin Manager. It behaves like a factory class that never gets instantiated, so its public methods are static.</p><p>Here are the public methods, followed by explanations:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>
<pre class="screen">static Object getSinglePlugin(Class intface)
throws PluginConfigurationError;</pre>
</p><p> Returns an instance of the singleton (single) plugin implementing the given interface. There must be exactly one single plugin configured for this interface, otherwise the <code class="literal">PluginConfigurationError</code> is thrown.</p><p>Note that this is the only "get plugin" method which throws an exception. It is typically used at initialization time to set up a permanent part of the system so any failure is fatal.</p><p>See the <code class="literal">plugin.single</code> configuration key for configuration details.</p></li><li class="listitem"><p><code class="literal">static Object[] getPluginSequence(Class intface);</code> Returns instances of all plugins that implement the interface <code class="literal">intface</code>, in an array. Returns an empty array if there are no matching plugins.</p><p>The order of the plugins in the array is the same as their class names in the configuration's value field.</p><p>See the <code class="literal">plugin.sequence</code> configuration key for configuration details.</p></li><li class="listitem"><p><code class="literal">static Object getNamedPlugin(Class intface, String name);</code> Returns an instance of a plugin that implements the interface <code class="literal">intface</code> and is bound to a name matching <code class="literal">name</code>. If there is no matching plugin, it returns <code class="literal">null</code>. The names are matched by <code class="literal">String.equals()</code>.</p><p>See the <code class="literal">plugin.named</code> and <code class="literal">plugin.selfnamed</code> configuration keys for configuration details.</p></li><li class="listitem"><p><code class="literal">static void releasePlugin(Object plugin);</code> Tells the Plugin Manager to let go of any references to a reusable plugin, to prevent it from being given out again and to allow the object to be garbage-collected. 
Call this when a plugin instance must be taken out of circulation.</p></li><li class="listitem"><p><code class="literal">static String[] getAllPluginNames(Class intface);</code> Returns all of the names under which a named plugin implementing the interface <code class="literal">intface</code> can be requested (with <code class="literal">getNamedPlugin()</code>). The array is empty if there are no matches. Use this to populate a menu of plugins for interactive selection, or to document what the possible choices are.</p><p>The names are NOT returned in any predictable order, so you may wish to sort them first.</p><p>Note: Since a plugin may be bound to more than one name, the list of names this returns does not represent the list of plugins. To get the list of unique implementation classes corresponding to the names, you might have to eliminate duplicates (i.e. create a Set of classes).</p></li><li class="listitem"><p><code class="literal">static void checkConfiguration();</code> Validates the keys in the DSpace <code class="literal">ConfigurationManager</code> pertaining to the Plugin Manager and reports any errors by logging them. This is intended to be used interactively by a DSpace administrator, to check the configuration file after modifying it. See the section about <a class="link" href="ch13.html#docbook-business.html-confval">validating configuration</a> for details.</p></li></ul></div></div><div class="section" title="13.3.3.2.&nbsp;SelfNamedPlugin Class"><div class="titlepage"><div><div><h4 class="title"><a name="N17CB6"></a>13.3.3.2.&nbsp;SelfNamedPlugin Class</h4></div></div><div></div></div><p>A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See <a class="link" href="ch13.html#docbook-business.html-selfnamedplugin">Self-Named Plugins</a> for why this is sometimes necessary.</p><pre class="screen">abstract class SelfNamedPlugin
{
// Your class must override this:
// Return all names by which this plugin should be known.
public static String[] getPluginNames();
// Returns the name under which this instance was created.
// This is implemented by SelfNamedPlugin and should NOT be
// overridden.
public String getPluginInstanceName();
}</pre></div><div class="section" title="13.3.3.3.&nbsp;Errors and Exceptions"><div class="titlepage"><div><div><h4 class="title"><a name="N17CC3"></a>13.3.3.3.&nbsp;Errors and Exceptions</h4></div></div><div></div></div><pre class="screen">public class PluginConfigurationError extends Error
{
public PluginConfigurationError(String message);
}</pre><p>An error of this type means the caller asked for a single plugin, but either there was no single plugin configured matching that interface, or there was more than one. Either case causes a fatal configuration error.</p><pre class="screen">public class PluginInstantiationException extends RuntimeException
{
public PluginInstantiationException(String msg, Throwable cause);
}</pre><p>This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something unexpected happens in the course of instantiating a plugin, e.g. an access error, class not found, etc. Simply not finding a class in the configuration is not an exception.</p><p>This is a <code class="literal">RuntimeException</code> so it doesn't have to be declared, and can be passed all the way up to a generalized fatal exception handler.</p></div></div><div class="section" title="13.3.4.&nbsp;Configuring Plugins"><div class="titlepage"><div><div><h3 class="title"><a name="N17CD7"></a>13.3.4.&nbsp;<a name="docbook-business.html-pluginconfig"></a>Configuring Plugins</h3></div></div><div></div></div><p>All of the Plugin Manager's configuration comes from the DSpace Configuration Manager, which is a Java Properties map. You can configure these characteristics of each plugin:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>Interface</strong></span>: Classname of the Java interface which defines the plugin, including package name. e.g. <code class="literal">org.dspace.app.mediafilter.FormatFilter</code></p></li><li class="listitem"><p><span class="bold"><strong>Implementation Class</strong></span>: Classname of the implementation class, including package. e.g. <code class="literal">org.dspace.app.mediafilter.PDFFilter</code></p></li><li class="listitem"><p><span class="bold"><strong>Names</strong></span>: (Named plugins only) There are two ways to bind names to plugins: listing them in the value of a plugin.named.interface key, or configuring a class in <code class="literal">plugin.selfnamed.interface</code> which extends the <code class="literal">SelfNamedPlugin</code> class.</p></li><li class="listitem"><p><span class="bold"><strong>Reusable option</strong></span>: (Optional) This is declared in a <code class="literal">plugin.reusable</code> configuration line. 
Plugins are reusable by default, so you only need to configure the non-reusable ones.</p></li></ol></div><div class="section" title="13.3.4.1.&nbsp;Configuring Singleton (Single) Plugins"><div class="titlepage"><div><div><h4 class="title"><a name="N17D0D"></a>13.3.4.1.&nbsp;Configuring Singleton (Single) Plugins</h4></div></div><div></div></div><p>This entry configures a Single Plugin for use with getSinglePlugin():</p><p>
<code class="literal">plugin.single.interface = classname</code>
</p><p>For example, this configures the class <code class="literal">org.dspace.checker.SimpleDispatcher</code> as the plugin for interface <code class="literal">org.dspace.checker.BitstreamDispatcher</code>:</p><p>
<code class="literal">plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher</code>
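</p><p>As a rough illustration of what such an entry drives, the sketch below resolves a <code class="literal">plugin.single.</code> key to an instance by reflection. It is not the DSpace implementation: <code class="literal">SinglePluginSketch</code>, <code class="literal">Greeter</code>, <code class="literal">EnglishGreeter</code> and <code class="literal">getSingle()</code> are hypothetical names, a plain <code class="literal">java.util.Properties</code> object stands in for the DSpace configuration, and an <code class="literal">IllegalStateException</code> stands in for the real error type.</p><pre class="screen">import java.util.Properties;

public class SinglePluginSketch
{
    // Hypothetical plugin interface and implementation; in DSpace these
    // would be classes such as BitstreamDispatcher and SimpleDispatcher.
    public interface Greeter
    {
        String greet();
    }

    public static class EnglishGreeter implements Greeter
    {
        public String greet()
        {
            return "hello";
        }
    }

    // Hypothetical helper: read "plugin.single." + interface name from
    // the properties and instantiate the class named by the value.
    public static Object getSingle(Properties config, Class iface)
        throws Exception
    {
        String key = "plugin.single." + iface.getName();
        String classname = config.getProperty(key);
        if (classname == null)
        {
            throw new IllegalStateException("No entry for " + key);
        }
        Object plugin = Class.forName(classname.trim())
            .getDeclaredConstructor().newInstance();
        if (!iface.isInstance(plugin))
        {
            throw new IllegalStateException(
                classname + " does not implement " + iface.getName());
        }
        return plugin;
    }

    public static void main(String[] args) throws Exception
    {
        Properties config = new Properties();
        config.setProperty("plugin.single." + Greeter.class.getName(),
            EnglishGreeter.class.getName());
        Greeter greeter = (Greeter) getSingle(config, Greeter.class);
        System.out.println(greeter.greet()); // prints "hello"
    }
}</pre><p>In real code the lookup is done by <code class="literal">PluginManager.getSinglePlugin()</code>; the sketch only shows why a misspelled classname in the value is noticed when the plugin is first requested rather than when the configuration is read.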
</p></div><div class="section" title="13.3.4.2.&nbsp;Configuring Sequence of Plugins"><div class="titlepage"><div><div><h4 class="title"><a name="N17D29"></a>13.3.4.2.&nbsp;Configuring Sequence of Plugins</h4></div></div><div></div></div><p>This kind of configuration entry defines a Sequence Plugin, which is bound to a sequence of implementation classes. The key identifies the interface, and the value is a comma-separated list of classnames:</p><pre class="programlisting">plugin.sequence.interface = classname, ...</pre><p>The plugins are returned by <code class="literal">getPluginSequence()</code> in the same order as their classes are listed in the configuration value.</p><p>For example, this entry configures Stackable Authentication with three implementation classes:</p><pre class="screen">plugin.sequence.org.dspace.eperson.AuthenticationMethod = \
org.dspace.eperson.X509Authentication, \
org.dspace.eperson.PasswordAuthentication, \
edu.mit.dspace.MITSpecialGroup</pre></div><div class="section" title="13.3.4.3.&nbsp;Configuring Named Plugins"><div class="titlepage"><div><div><h4 class="title"><a name="N17D3D"></a>13.3.4.3.&nbsp;Configuring Named Plugins</h4></div></div><div></div></div><p>There are two ways of configuring named plugins:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>Plugins Named in the Configuration</strong></span> A named plugin which gets its name(s) from the configuration is listed in this kind of entry:</p><p>
<code class="literal">plugin.named.interface = classname = name [ , name.. ] [ classname = name.. ]</code>
</p><p>The syntax of the configuration value is: classname, followed by an equal-sign and then at least one plugin name. Bind more names to the same implementation class by adding them here, separated by commas. Names may include any character other than comma (,) and equal-sign (=).</p><p>For example, this entry creates one plugin with the names GIF, JPEG, and image/png, and another with the name TeX:</p><pre class="screen">plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.JPEGFilter = GIF, JPEG, image/png \
org.dspace.app.mediafilter.TeXFilter = TeX</pre><p>This example shows a plugin name with an embedded whitespace character. Since comma (,) is the separator character between plugin names, spaces are legal (between words of a name; leading and trailing spaces are ignored).</p><p>This plugin is bound to the names "Adobe PDF", "PDF", and "Portable Document Format".</p><pre class="screen">plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.TeXFilter = TeX \
org.dspace.app.mediafilter.PDFFilter = Adobe PDF, PDF, Portable Document Format</pre><p>NOTE: Since there can only be one key with plugin.named. followed by the interface name in the configuration, all of the plugin implementations must be configured in that entry.</p></li><li class="listitem"><p><span class="bold"><strong>Self-Named Plugins</strong></span> Since a self-named plugin supplies its own names through a static method call, the configuration only has to include its interface and classname:</p><p>
<code class="literal">plugin.selfnamed.interface = classname [ , classname.. ]</code>
</p><p>The following example first demonstrates how the plugin class, <code class="literal">XsltDisseminationCrosswalk</code> is configured to implement its own names "MODS" and "DublinCore". These come from the keys starting with <code class="literal">crosswalk.dissemination.stylesheet.</code>. The value is a stylesheet file.</p><p>The class is then configured as a self-named plugin:</p><pre class="screen">crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.MODSDisseminationCrosswalk, \
org.dspace.content.metadata.XsltDisseminationCrosswalk
</pre><p>NOTE: Since there can only be one key with <code class="literal">plugin.selfnamed.</code> followed by the interface name in the configuration, all of the plugin implementations must be configured in that entry. The <code class="literal">MODSDisseminationCrosswalk</code> class is only shown to illustrate this point.</p></li></ol></div></div><div class="section" title="13.3.4.4.&nbsp;Configuring the Reusable Status of a Plugin"><div class="titlepage"><div><div><h4 class="title"><a name="N17D87"></a>13.3.4.4.&nbsp;Configuring the Reusable Status of a Plugin</h4></div></div><div></div></div><p>Plugins are assumed to be reusable by default, so you only need to configure the ones which you would prefer not to be reusable. The format is as follows:</p><p>
<code class="literal">plugin.reusable.classname = ( true | false )</code>
</p><p>For example, this marks the PDF plugin from the example above as non-reusable:</p><p>
<code class="literal">plugin.reusable.org.dspace.app.mediafilter.PDFFilter = false</code>
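</p><p>The effect of the flag can be pictured with a small sketch. This is not how <code class="literal">PluginManager</code> is implemented; <code class="literal">ReusablePluginSketch</code> and its nested <code class="literal">PdfFilter</code> class are hypothetical and only mimic the behavior described in this chapter: a reusable plugin is created once and handed out again, a non-reusable one is created on every request, and releasing drops the cached instance.</p><pre class="screen">public class ReusablePluginSketch
{
    // Hypothetical plugin implementation class.
    public static class PdfFilter
    {
    }

    private final boolean reusable;

    // The single shared instance, used only when reusable is true.
    private Object cached;

    public ReusablePluginSketch(boolean reusable)
    {
        this.reusable = reusable;
    }

    // Return the cached instance when reusable, otherwise a new one.
    public Object get()
    {
        if (!reusable)
        {
            return new PdfFilter();
        }
        if (cached == null)
        {
            cached = new PdfFilter();
        }
        return cached;
    }

    // Forget the cached instance so the next get() creates a new one.
    public void release()
    {
        cached = null;
    }

    public static void main(String[] args)
    {
        ReusablePluginSketch shared = new ReusablePluginSketch(true);
        ReusablePluginSketch fresh = new ReusablePluginSketch(false);
        System.out.println(shared.get() == shared.get()); // prints "true"
        System.out.println(fresh.get() == fresh.get());   // prints "false"
    }
}</pre><p>A plugin whose instances hold per-request state is the kind of case where the non-reusable setting would presumably be chosen.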
</p></div></div><div class="section" title="13.3.5.&nbsp;Validating the Configuration"><div class="titlepage"><div><div><h3 class="title"><a name="N17D9B"></a>13.3.5.&nbsp;<a name="docbook-business.html-confval"></a>Validating the Configuration</h3></div></div><div></div></div><p>The Plugin Manager is very sensitive to mistakes in the DSpace configuration. Subtle errors can have unexpected consequences that are hard to detect: for example, if there are two "plugin.single" entries for the same interface, one of them will be silently ignored.</p><p>To validate the Plugin Manager configuration, call the <code class="literal">PluginManager.checkConfiguration()</code> method. It looks for the following mistakes:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> Any duplicate keys starting with "<code class="literal">plugin.</code>".</p></li><li class="listitem"><p> Keys starting with <code class="literal">plugin.single</code>, <code class="literal">plugin.sequence</code>, <code class="literal">plugin.named</code>, and <code class="literal">plugin.selfnamed</code> that don't include a valid interface.</p></li><li class="listitem"><p> Classnames in the configuration values that don't exist, or don't implement the plugin interface in the key.</p></li><li class="listitem"><p> Classes declared in plugin.selfnamed lines that don't extend the <code class="literal">SelfNamedPlugin</code> class.</p></li><li class="listitem"><p> Any name collisions among named plugins for a given interface.</p></li><li class="listitem"><p> Named plugin configuration entries without any names.</p></li><li class="listitem"><p> Classnames mentioned in <code class="literal">plugin.reusable</code> keys that don't exist or haven't been configured as a plugin implementation class.</p></li></ul></div><p>The <code class="literal">PluginManager</code> class also has a <code class="literal">main()</code> method which simply runs <code class="literal">checkConfiguration()</code>, 
so you can invoke it from the command line to test the validity of plugin configuration changes.</p><p>Eventually, someone should develop a general configuration-file sanity checker for DSpace, which would just call <code class="literal">PluginManager.checkConfiguration()</code>.</p></div><div class="section" title="13.3.6.&nbsp;Use Cases"><div class="titlepage"><div><div><h3 class="title"><a name="N17DEF"></a>13.3.6.&nbsp;Use Cases</h3></div></div><div></div></div><p>Here are some usage examples to illustrate how the Plugin Manager works.</p><div class="section" title="13.3.6.1.&nbsp;Managing the MediaFilter plugins transparently"><div class="titlepage"><div><div><h4 class="title"><a name="N17DF5"></a>13.3.6.1.&nbsp;Managing the MediaFilter plugins transparently</h4></div></div><div></div></div><p>The existing DSpace 1.3 MediaFilterManager implementation has been largely replaced by the Plugin Manager. The MediaFilter classes become plugins named in the configuration. Refer to the <a class="link" href="">configuration guide</a> for further details.</p></div><div class="section" title="13.3.6.2.&nbsp;A Singleton Plugin"><div class="titlepage"><div><div><h4 class="title"><a name="N17DFF"></a>13.3.6.2.&nbsp;A Singleton Plugin</h4></div></div><div></div></div><p>This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin:</p><p>Configuration:</p><p>
<code class="literal">plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher</code>
</p><p>The following code fragment shows how dispatcher, the service object, is initialized and used:</p><pre class="screen">BitstreamDispatcher dispatcher =
    (BitstreamDispatcher) PluginManager.getSinglePlugin(BitstreamDispatcher.class);
int id = dispatcher.next();
while (id != BitstreamDispatcher.SENTINEL)
{
    /*
       do some processing here
    */
    id = dispatcher.next();
}</pre></div><div class="section" title="13.3.6.3.&nbsp;Plugin that Names Itself"><div class="titlepage"><div><div><h4 class="title"><a name="N17E12"></a>13.3.6.3.&nbsp;Plugin that Names Itself</h4></div></div><div></div></div><p>This crosswalk plugin acts like many different plugins since it is configured with different XSL translation stylesheets. Since it already gets each of its stylesheets out of the DSpace configuration, it makes sense to have the plugin give PluginManager the names to which it answers instead of forcing someone to configure those names in two places (and try to keep them synchronized).</p><p>NOTE: Remember how <code class="literal">getPlugin()</code> caches a separate instance of an implementation class for every name bound to it? This is why: the instance can look at the name under which it was invoked and configure itself specifically for that name. Since the instance for each name might be different, the Plugin Manager has to cache a separate instance for each name.</p><p>Here is the configuration file listing both the plugin's own configuration and the <code class="literal">PluginManager</code> config line:</p><pre class="screen">crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.XsltDisseminationCrosswalk</pre><p>This look into the implementation shows how it finds configuration entries to populate the array of plugin names returned by the <code class="literal">getPluginNames()</code> method. Also note, in the <code class="literal">getStylesheet()</code> method, how it uses the plugin name that created the current instance (returned by <code class="literal">getPluginInstanceName()</code>) to find the correct stylesheet.</p><pre class="screen">public class XsltDisseminationCrosswalk extends SelfNamedPlugin
{
    ....
    // static, so that the static getPluginNames() method can refer to it
    private static final String prefix =
        "crosswalk.dissemination.stylesheet.";
    ....
    public static String[] getPluginNames()
    {
        List aliasList = new ArrayList();
        Enumeration pe = ConfigurationManager.propertyNames();
        while (pe.hasMoreElements())
        {
            String key = (String)pe.nextElement();
            if (key.startsWith(prefix))
                aliasList.add(key.substring(prefix.length()));
        }
        return (String[])aliasList.toArray(new String[aliasList.size()]);
    }
    // get the crosswalk stylesheet for an instance of the plugin:
    private String getStylesheet()
    {
        return ConfigurationManager.getProperty(prefix +
            getPluginInstanceName());
    }
}</pre></div><div class="section" title="13.3.6.4.&nbsp;Stackable Authentication"><div class="titlepage"><div><div><h4 class="title"><a name="N17E38"></a>13.3.6.4.&nbsp;Stackable Authentication</h4></div></div><div></div></div><p>The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the order of configuration, since order is significant. It gets a Sequence Plugin from the Plugin Manager. Refer to the <a class="link" href="ch05.html#docbook-configure.html-authentication">Configuration Section on Stackable Authentication</a> for further details.</p></div></div></div><div class="section" title="13.4.&nbsp;Workflow System"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17E42"></a>13.4.&nbsp;<a name="docbook-business.html-workflow"></a>Workflow System</h2></div></div><div></div></div><p>The primary classes are:</p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td>
<p>
<code class="literal">org.dspace.content.WorkspaceItem</code>
</p>
</td><td>
<p>contains an Item before it enters a workflow</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.workflow.WorkflowItem</code>
</p>
</td><td>
<p>contains an Item while in a workflow</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.workflow.WorkflowManager</code>
</p>
</td><td>
<p>responds to events, manages the WorkflowItem states</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.content.Collection</code>
</p>
</td><td>
<p>contains List of defined workflow steps</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.eperson.Group</code>
</p>
</td><td>
<p>people who can perform workflow tasks are defined in EPerson Groups</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.core.Email</code>
</p>
</td><td>
<p>used to email messages to Group members and submitters</p>
</td></tr></tbody></table></div><p>The workflow system models the states of an Item as a state machine with five primary states (SUBMIT, STEP_1, STEP_2, STEP_3, ARCHIVE). STEP_1, STEP_2 and STEP_3 are the three optional steps in which the item can be viewed and corrected by different groups of people. Counting the pooled states STEP_1_POOL, STEP_2_POOL and STEP_3_POOL, there are really eight states. An item is in a pooled state while it waits to enter the corresponding primary state.</p><p>The WorkflowManager is invoked by events. While an Item is being submitted, it is held by a WorkspaceItem. Calling the start() method in the WorkflowManager converts a WorkspaceItem to a WorkflowItem, and begins processing the WorkflowItem's state. Since all three steps of the workflow are optional, if no steps are defined, then the Item is simply archived.</p><p>Workflows are set per Collection, and steps are defined by creating corresponding entries in the List named workflowGroup. If you wish the workflow to have a step 1, use the administration tools for Collections to create a workflow Group whose members you want to be able to view and approve the Item; workflowGroup[0] is then set to the ID of that Group.</p><p>If a step is defined in a Collection's workflow, then the WorkflowItem's state is set to that step's pooled state (STEP_x_POOL, where x is the corresponding step). In this state the WorkflowItem is waiting for an EPerson in that Group to claim the step's task. The WorkflowManager emails the members of the Group to notify them that there is a task to be performed (the text is defined in config/emails). When an EPerson goes to their 'My DSpace' page to claim the task, the WorkflowManager is invoked with a claim event, and the WorkflowItem's state advances from STEP_x_POOL to STEP_x. The EPerson can also generate an 'unclaim' event, returning the WorkflowItem to STEP_x_POOL.</p><p>The WorkflowManager also handles the advance() event, which advances the WorkflowItem to the next state. 
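</p><p>The transitions described in this section can be summarised in a small, self-contained sketch. It is an illustration only and is not the DSpace <code class="literal">WorkflowManager</code>: the class <code class="literal">WorkflowSketch</code>, its method signatures and the group IDs are assumptions made for the example, and a <code class="literal">null</code> entry in <code class="literal">workflowGroup</code> is assumed to mean that the step is not defined.</p>

```java
import java.util.Arrays;

/** Hypothetical illustration of the workflow states; not the DSpace WorkflowManager. */
class WorkflowSketch {
    enum State { SUBMIT, STEP_1_POOL, STEP_1, STEP_2_POOL, STEP_2, STEP_3_POOL, STEP_3, ARCHIVE }

    // Pooled and claimed state for each of the three optional steps.
    private static final State[] POOL = { State.STEP_1_POOL, State.STEP_2_POOL, State.STEP_3_POOL };
    private static final State[] CLAIMED = { State.STEP_1, State.STEP_2, State.STEP_3 };

    /** Pooled state of the first defined step at or after index 'from', else ARCHIVE. */
    static State nextPool(Integer[] workflowGroup, int from) {
        for (int i = from; i < POOL.length; i++) {
            if (workflowGroup[i] != null) {
                return POOL[i];
            }
        }
        return State.ARCHIVE;
    }

    /** start(): enter the first defined step, or archive if no step is defined. */
    static State start(Integer[] workflowGroup) {
        return nextPool(workflowGroup, 0);
    }

    /** claim: STEP_x_POOL to STEP_x. */
    static State claim(State s) {
        int i = Arrays.asList(POOL).indexOf(s);
        if (i < 0) {
            throw new IllegalStateException("cannot claim from " + s);
        }
        return CLAIMED[i];
    }

    /** unclaim: STEP_x back to STEP_x_POOL. */
    static State unclaim(State s) {
        int i = Arrays.asList(CLAIMED).indexOf(s);
        if (i < 0) {
            throw new IllegalStateException("cannot unclaim from " + s);
        }
        return POOL[i];
    }

    /** advance: from STEP_x to the pool of the next defined step, or ARCHIVE. */
    static State advance(State s, Integer[] workflowGroup) {
        int i = Arrays.asList(CLAIMED).indexOf(s);
        if (i < 0) {
            throw new IllegalStateException("cannot advance from " + s);
        }
        return nextPool(workflowGroup, i + 1);
    }

    public static void main(String[] args) {
        // Steps 1 and 3 are defined, step 2 is not; the group IDs are made up.
        Integer[] workflowGroup = { 12, null, 34 };
        State s = start(workflowGroup);
        System.out.println(s);
        s = claim(s);
        System.out.println(s);
        s = advance(s, workflowGroup);
        System.out.println(s);
    }
}
```

<p>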
If there are no further states, then the WorkflowItem is removed, and the Item is then archived. An EPerson performing one of the tasks can reject the Item, which stops the workflow, rebuilds the WorkspaceItem for it and sends a rejection note to the submitter. More drastically, an abort() event is generated by the admin tools to cancel a workflow outright.</p></div><div class="section" title="13.5.&nbsp;Administration Toolkit"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17EBA"></a>13.5.&nbsp;<a name="docbook-business.html-administer"></a>Administration Toolkit</h2></div></div><div></div></div><p>The <code class="literal">org.dspace.administer</code> package contains some classes for administering a DSpace system that are not generally needed by most applications.</p><p>The <code class="literal">CreateAdministrator</code> class is a simple command-line tool, executed via <code class="literal">/dspace/bin/create-administrator</code>, that creates an administrator e-person with information entered from standard input. This is generally used only once when a DSpace system is initially installed, to create an initial administrator who can then use the Web administration UI to further set up the system. This script does not check for authorization, since it is typically run before there are any e-people to authorize! Since it must be run as a command-line tool on the server machine, generally this shouldn't cause a problem. A possibility is to have the script only operate when there are no e-people in the system already, though in general, someone with access to command-line scripts on your server is probably in a position to do what they want anyway!</p><p>The <code class="literal">DCType</code> class is similar to the <code class="literal">org.dspace.content.BitstreamFormat</code> class. It represents an entry in the Dublin Core type registry, that is, a particular element and qualifier, or unqualified element. 
It is in the <code class="literal">administer</code> package because it is generally required only when manipulating the registry itself. Elements and qualifiers are specified as literals in <code class="literal">org.dspace.content.Item</code> methods and the <code class="literal">org.dspace.content.DCValue</code> class. Only administrators may modify the Dublin Core type registry.</p><p>The <code class="literal">org.dspace.administer.RegistryLoader</code> class contains methods for initializing the Dublin Core type registry and bitstream format registry with entries in an XML file. Typically this is executed via the command line during the build process (see <code class="literal">build.xml</code> in the source). To see examples of the XML formats, see the files in <code class="literal">config/registries</code> in the source directory. There is no XML schema, so the files are not strictly validated when they are loaded.</p></div><div class="section" title="13.6.&nbsp;E-person/Group Manager"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17EF5"></a>13.6.&nbsp;<a name="docbook-business.html-eperson"></a>E-person/Group Manager</h2></div></div><div></div></div><p>DSpace keeps track of registered users with the <code class="literal">org.dspace.eperson.EPerson</code> class. The class has methods to create and manipulate an <code class="literal">EPerson</code> such as get and set methods for first and last names, email, and password. (Actually, there is no <code class="literal">getPassword()</code> method: an MD5 hash of the password is stored, and can only be verified with the <code class="literal">checkPassword()</code> method.) 
There are find methods to find an EPerson by email (which is assumed to be unique,) or to find all EPeople in the system.</p><p>The <code class="literal">EPerson</code> object should probably be reworked to allow for easy expansion; the current EPerson object tracks pretty much only what MIT was interested in tracking - first and last names, email, phone. The access methods are hardcoded and should probably be replaced with methods to access arbitrary name/value pairs for institutions that wish to customize what EPerson information is stored.</p><p>Groups are simply lists of <code class="literal">EPerson</code> objects. Other than membership, <code class="literal">Group</code> objects have only one other attribute: a name. Group names must be unique, so we have adopted naming conventions where the role of the group is its name, such as <code class="literal">COLLECTION_100_ADD</code>. Groups add and remove EPerson objects with <code class="literal">addMember()</code> and <code class="literal">removeMember()</code> methods. One important thing to know about groups is that they store their membership in memory until the <code class="literal">update()</code> method is called - so when modifying a group's membership don't forget to invoke <code class="literal">update()</code> or your changes will be lost! Since group membership is used heavily by the authorization system a fast <code class="literal">isMember()</code> method is also provided.</p><p>Another kind of Group is also implemented in DSpace&mdash;special Groups. 
The <code class="literal">Context</code> object for each session carries around a List of Group IDs that the user is also a member of&mdash;currently the MITUser Group ID is added to the list of a user's special groups if certain IP address or certificate criteria are met.</p></div><div class="section" title="13.7.&nbsp;Authorization"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17F3C"></a>13.7.&nbsp;<a name="docbook-business.html-authorize"></a>Authorization</h2></div></div><div></div></div><p>The primary classes are:</p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td>
<p>
<code class="literal">org.dspace.authorize.AuthorizeManager</code>
</p>
</td><td>
<p>does all authorization, checking policies against Groups</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.authorize.ResourcePolicy</code>
</p>
</td><td>
<p>defines all allowable actions for an object</p>
</td></tr><tr><td>
<p>
<code class="literal">org.dspace.eperson.Group</code>
</p>
</td><td>
<p>all policies are defined in terms of EPerson Groups</p>
</td></tr></tbody></table></div><p>The authorization system is based on the classic 'police state' model of security; no action is allowed unless it is expressed in a policy. The policies are attached to resources (hence the name <code class="literal">ResourcePolicy</code>,) and detail who can perform that action. The resource can be any of the DSpace object types, listed in <code class="literal">org.dspace.core.Constants</code> (<code class="literal">BITSTREAM</code>, <code class="literal">ITEM</code>, <code class="literal">COLLECTION</code>, etc.) The 'who' is made up of EPerson groups. The actions are also in <code class="literal">Constants.java</code> (<code class="literal">READ</code>, <code class="literal">WRITE</code>, <code class="literal">ADD</code>, etc.) The only non-obvious actions are <code class="literal">ADD</code> and <code class="literal">REMOVE</code>, which are authorizations for container objects. To be able to create an Item, you must have <code class="literal">ADD</code> permission in a Collection, which contains Items. (Communities, Collections, Items, and Bundles are all container objects.)</p><p>Currently most of the read policy checking is done with items&mdash;communities and collections are assumed to be openly readable, but items and their bitstreams are checked. Separate policy checks for items and their bitstreams enables policies that allow publicly readable items, but parts of their content may be restricted to certain groups.</p><p>The <code class="literal">AuthorizeManager</code> class'
<code class="literal">authorizeAction(Context, object, action)</code> is the primary source of all authorization in the system. It gets a list of all of the ResourcePolicies in the system that match the object and action. It then iterates through the policies, extracting the EPerson Group from each policy, and checks whether the EPerson identified in the Context is a member of any of those groups. If all of the policies are queried and no permission is found, then an <code class="literal">AuthorizeException</code> is thrown. An <code class="literal">authorizeActionBoolean()</code> method is also supplied that returns a boolean instead of throwing an exception, for applications that require higher performance.</p><p>ResourcePolicies are very simple, and there are quite a lot of them. Each can only list a single group, a single action, and a single object. So each object will likely have several policies, and if multiple groups share permissions for actions on an object, each group will get its own policy. (It's a good thing they're small.)</p><div class="section" title="13.7.1.&nbsp;Special Groups"><div class="titlepage"><div><div><h3 class="title"><a name="N17FC5"></a>13.7.1.&nbsp;Special Groups</h3></div></div><div></div></div><p>All users are assumed to be part of the public group (ID=0). DSpace admins (ID=1) are automatically part of all groups, much like super-users in the Unix OS. The Context object also carries around a List of special groups, which are also checked first for membership. These special groups are used at MIT to indicate membership in the MIT community, something that is very difficult to enumerate in the database! 
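</p><p>The check described in this section, including the public, administrator and special groups, might be sketched as follows. This is a simplified illustration and not the DSpace <code class="literal">AuthorizeManager</code>: the class <code class="literal">AuthorizeSketch</code>, the <code class="literal">Policy</code> type, the method signature and the numeric codes used in <code class="literal">main()</code> are all assumptions made for the example. Only the group IDs 0 (public) and 1 (administrators) come from the text above.</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the policy check; not the DSpace AuthorizeManager. */
class AuthorizeSketch {
    static final int PUBLIC_GROUP = 0; // every user is assumed to belong to this group
    static final int ADMIN_GROUP = 1;  // its members are treated as members of all groups

    /** One policy names a single object, a single action and a single group. */
    static final class Policy {
        final int resourceType;
        final int resourceId;
        final int action;
        final int groupId;

        Policy(int resourceType, int resourceId, int action, int groupId) {
            this.resourceType = resourceType;
            this.resourceId = resourceId;
            this.action = action;
            this.groupId = groupId;
        }

        boolean matches(int type, int id, int act) {
            return resourceType == type && resourceId == id && action == act;
        }
    }

    /**
     * Returns true if some policy for the object and action names a group the
     * user belongs to. No matching policy means the action is not allowed.
     */
    static boolean isAuthorized(List<Policy> policies, Set<Integer> memberOf,
                                Set<Integer> specialGroups,
                                int resourceType, int resourceId, int action) {
        if (memberOf.contains(ADMIN_GROUP)) {
            return true;
        }
        for (Policy p : policies) {
            if (!p.matches(resourceType, resourceId, action)) {
                continue;
            }
            if (p.groupId == PUBLIC_GROUP
                    || specialGroups.contains(p.groupId)
                    || memberOf.contains(p.groupId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int ITEM = 2, READ = 0, WRITE = 1; // placeholder codes for this example
        List<Policy> policies = Arrays.asList(
                new Policy(ITEM, 7, READ, PUBLIC_GROUP),
                new Policy(ITEM, 7, WRITE, 5));
        Set<Integer> none = Collections.emptySet();
        Set<Integer> editors = new HashSet<Integer>(Arrays.asList(5));
        System.out.println(isAuthorized(policies, none, none, ITEM, 7, READ));
        System.out.println(isAuthorized(policies, none, none, ITEM, 7, WRITE));
        System.out.println(isAuthorized(policies, none, editors, ITEM, 7, WRITE));
    }
}
```

<p>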
When a user logs in with an MIT certificate or with an MIT IP address, the login code adds this MIT user group to the user's Context.</p></div><div class="section" title="13.7.2.&nbsp;Miscellaneous Authorization Notes"><div class="titlepage"><div><div><h3 class="title"><a name="N17FCB"></a>13.7.2.&nbsp;Miscellaneous Authorization Notes</h3></div></div><div></div></div><p>Where do items get their read policies? From their collection's read policy. There once was a separate item read default policy in each collection, and perhaps there will be again since it appears that administrators are notoriously bad at defining collections' read policies. There is also code in place to enable timed policies, that is, policies that have a start and end date. However, the admin tools to enable these sorts of policies have not been written.</p></div></div><div class="section" title="13.8.&nbsp;Handle Manager/Handle Plugin"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N17FD1"></a>13.8.&nbsp;<a name="docbook-business.html-handle"></a>Handle Manager/Handle Plugin</h2></div></div><div></div></div><p>The <code class="literal">org.dspace.handle</code> package contains two classes: <code class="literal">HandleManager</code> is used to create and look up Handles, and <code class="literal">HandlePlugin</code> is used to expose and resolve DSpace Handles for the outside world via the CNRI Handle Server code.</p><p>Handles are stored internally in the <code class="literal">handle</code> database table in the form:</p><p><code class="literal">1721.123/4567</code></p><p>Typically when they are used outside of the system they are displayed in either URI or "URL proxy" forms:</p><pre class="screen">hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567</pre><p>It is the responsibility of the caller to extract the basic form from whichever displayed form is used.</p><p>The <code class="literal">handle</code> table maps these Handles to resource type/resource ID pairs, where resource type is a value from <code class="literal">org.dspace.core.Constants</code> and resource ID is the internal identifier (database primary key) of the object. This allows Handles to be assigned to any type of object in the system, though as <a class="link" href="ch02.html#docbook-functional.html-handles">explained in the functional overview</a>, only communities, collections and items are presently assigned Handles.</p><p><code class="literal">HandleManager</code> contains static methods for:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> Creating a Handle</p></li><li class="listitem"><p> Finding the Handle for a <code class="literal">DSpaceObject</code>, though this is usually only invoked by the object itself, since <code class="literal">DSpaceObject</code> has a <code class="literal">getHandle</code> method</p></li><li class="listitem"><p> Retrieving the <code class="literal">DSpaceObject</code> identified by a particular Handle</p></li><li class="listitem"><p> Obtaining displayable forms of the Handle (URI or "proxy URL").</p></li></ul></div><p><code class="literal">HandlePlugin</code> is a simple implementation of the Handle Server's <code class="literal">net.handle.hdllib.HandleStorage</code> interface. It only implements the basic Handle retrieval methods, which get information from the <code class="literal">handle</code> database table. 
The CNRI Handle Server is configured to use this plug-in via its <code class="literal">config.dct</code> file.</p><p>Note that since the Handle server runs as a separate JVM to the DSpace Web applications, it uses a separate 'Log4J' configuration, since Log4J does not support multiple JVMs using the same daily rolling logs. This alternative configuration is held as a template in <code class="literal">/dspace/config/templates/log4j-handle-plugin.properties</code>, written to <code class="literal">/dspace/config/log4j-handle-plugin.properties</code> by the <code class="literal">install-configs</code> script. The <code class="literal">/dspace/bin/start-handle-server</code> script passes in the appropriate command line parameters so that the Handle server uses this configuration.</p></div><div class="section" title="13.9.&nbsp;Search"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1804A"></a>13.9.&nbsp;<a name="docbook-business.html-search"></a>Search</h2></div></div><div></div></div><p>DSpace's search code is a simple API which currently wraps the Lucene search engine. The first half of the search task is indexing, and <code class="literal">org.dspace.search.DSIndexer</code> is the indexing class, which contains <code class="literal">indexContent()</code> which if passed an <code class="literal">Item</code>, <code class="literal">Community</code>, or <code class="literal">Collection</code>, will add that content's fields to the index. The methods <code class="literal">unIndexContent()</code> and <code class="literal">reIndexContent()</code> remove and update content's index information. The <code class="literal">DSIndexer</code> class also has a <code class="literal">main()</code> method which will rebuild the index completely. This can be invoked by the <code class="literal">dspace/bin/index-init</code> (complete rebuild) or <code class="literal">dspace/bin/index-update</code> (update) script. 
The intent was for the <code class="literal">main()</code> method to be invoked on a regular basis to avoid index corruption, but we have had no problem with that so far.</p><p>Which fields are indexed by <code class="literal">DSIndexer</code>? These fields are defined in dspace.cfg, in the section "Fields to index for search", as name-value pairs. Each name must be unique and of the form search.index.i (where i is an arbitrary positive number). The value begins with an index name, which must also be unique and which can be referenced in the search form (e.g. title, author), followed by a colon and the metadata element to be indexed. '*' is a wildcard that includes all sub-elements. For example:</p><p>
<code class="literal">search.index.4 = keyword:dc.subject.*</code>
</p><p>tells the indexer to create a keyword index containing all dc.subject element values. Since the wildcard ('*') character was used in place of a qualifier, all subject metadata fields will be indexed (e.g. dc.subject.other, dc.subject.lcsh, etc)</p><p>By default, the fields shown in the <code class="literal">Indexed Fields</code> section below are indexed. These are hardcoded in the DSIndexer class. If any search.index.i items are specified in <code class="literal">dspace.cfg</code> these are used rather than these hardcoded fields.</p><p>The query class <code class="literal">DSQuery</code> contains the three flavors of <code class="literal">doQuery()</code> methods&mdash;one searches the DSpace site, and the other two restrict searches to Collections and Communities. The results from a query are returned as three lists of handles; each list represents a type of result. One list is a list of Items with matches, and the other two are Collections and Communities that match. This separation allows the UI to handle the types of results gracefully without resolving all of the handles first to see what kind of content the handle points to. The <code class="literal">DSQuery</code> class also has a <code class="literal">main()</code> method for debugging via command-line searches.</p><div class="section" title="13.9.1.&nbsp;Current Lucene Implementation"><div class="titlepage"><div><div><h3 class="title"><a name="N180AD"></a>13.9.1.&nbsp;Current Lucene Implementation</h3></div></div><div></div></div><p>Currently we have our own Analyzer and Tokenizer classes (<code class="literal">DSAnalyzer</code> and <code class="literal">DSTokenizer</code>) to customize our indexing. They invoke the stemming and stop word features within Lucene. We create an <code class="literal">IndexReader</code> for each query, which we now realize isn't the most efficient use of resources - we seem to run out of filehandles on really heavy loads. (A wildcard query can open many filehandles!) 
Since Lucene is thread-safe, a better future implementation would be to have a single Lucene IndexReader shared by all queries, which is invalidated and re-opened when the index changes. Future API growth could include relevance scores (Lucene generates them, but we ignore them) and abstractions for more advanced search concepts such as booleans.</p></div><div class="section" title="13.9.2.&nbsp;Indexed Fields"><div class="titlepage"><div><div><h3 class="title"><a name="N180BF"></a>13.9.2.&nbsp;Indexed Fields</h3></div></div><div></div></div><p>The <code class="literal">DSIndexer</code> class shipped with DSpace indexes the Dublin Core metadata in the following way:</p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td>
<p>
<span class="bold"><strong>Search Field</strong></span>
</p>
</td><td>
<p>
<span class="bold"><strong>Taken from Dublin Core Fields</strong></span>
</p>
</td></tr><tr><td>
<p>Authors</p>
</td><td>
<p>
<code class="literal">contributor.*</code>
</p>
<p>
<code class="literal">creator.*</code>
</p>
<p>
<code class="literal">description.statementofresponsibility</code>
</p>
</td></tr><tr><td>
<p>Titles</p>
</td><td>
<p>
<code class="literal">title.*</code>
</p>
</td></tr><tr><td>
<p>Keywords</p>
</td><td>
<p>
<code class="literal">subject.*</code>
</p>
</td></tr><tr><td>
<p>Abstracts</p>
</td><td>
<p>
<code class="literal">description.abstract</code>
</p>
<p>
<code class="literal">description.tableofcontents</code>
</p>
</td></tr><tr><td>
<p>Series</p>
</td><td>
<p>
<code class="literal">relation.ispartofseries</code>
</p>
</td></tr><tr><td>
<p>MIME types</p>
</td><td>
<p>
<code class="literal">format.mimetype</code>
</p>
</td></tr><tr><td>
<p>Sponsors</p>
</td><td>
<p>
<code class="literal">description.sponsorship</code>
</p>
</td></tr><tr><td>
<p>Identifiers</p>
</td><td>
<p>
<code class="literal">identifier.*</code>
</p>
</td></tr></tbody></table></div></div><div class="section" title="13.9.3.&nbsp;Harvesting API"><div class="titlepage"><div><div><h3 class="title"><a name="N18174"></a>13.9.3.&nbsp;Harvesting API</h3></div></div><div></div></div><p>The <code class="literal">org.dspace.search</code> package also provides a 'harvesting' API. This allows callers to extract information about items modified within a particular timeframe, and within a particular scope (all of DSpace, or a community or collection). Currently this is used by the Open Archives Initiative metadata harvesting protocol application, and the e-mail subscription code.</p><p>The <code class="literal">Harvest.harvest</code> method is invoked with the required scope and start and end dates. Either date can be omitted. The dates should be in the ISO8601, UTC time zone format used elsewhere in the DSpace system.</p><p><code class="literal">HarvestedItemInfo</code> objects are returned. These objects are simple containers with basic information about the items falling within the given scope and date range. Depending on parameters passed to the <code class="literal">harvest</code> method, the <code class="literal">containers</code> and <code class="literal">item</code> fields may have been filled out with the IDs of communities and collections containing an item, and the corresponding <code class="literal">Item</code> object respectively. 
Electing not to have these fields filled out means the harvest operation executes considerably faster.</p><p>In case it is required, <code class="literal">Harvest</code> also offers a method for creating a single <code class="literal">HarvestedItemInfo</code> object, which might make things easier for the caller.</p></div></div><div class="section" title="13.10.&nbsp;Browse API"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N181A3"></a>13.10.&nbsp;<a name="docbook-business.html-browse"></a>Browse API</h2></div></div><div></div></div><p>The browse API maintains indexes of dates, authors, titles and subjects, and allows callers to extract parts of these:</p><div class="variablelist"><dl><dt><span class="term"><span class="bold"><strong>Title</strong></span></span></dt><dd><p> Values of the Dublin Core element <span class="bold"><strong>title</strong></span> (unqualified) are indexed. These are sorted in a case-insensitive fashion, with any leading article removed. For example:</p><p><code class="literal">The DSpace System</code></p><p>appears under 'D' rather than 'T'.</p></dd><dt><span class="term"><span class="bold"><strong>Author</strong></span></span></dt><dd><p> Values of the <span class="bold"><strong>contributor</strong></span> (any qualifier or unqualified) element are indexed. Since <code class="literal">contributor</code> values typically are in the form 'last name, first name', a simple case-insensitive alphanumeric sort is used which orders authors in last name order.</p><p>Note that this is an index of <span class="emphasis"><em>authors</em></span>, and not <span class="emphasis"><em>items by author</em></span>. If four items have the same author, that author will appear in the index only once. 
Hence, the index of authors may be larger or smaller than the index of titles; items often have more than one author, though the same author may have authored several items.</p><p>The author indexing in the browse API does have limitations:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> Ideally, a name that appears as an author for more than one item would appear in the author index only once. For example, 'Doe, John' may be the author of tens of items. However, in practice, authors' names often appear in slightly different forms, for example:</p><pre class="screen">Doe, John
Doe, John Stewart
Doe, John S.</pre><p>Currently, the above three names would all appear as separate entries in the author index even though they may refer to the same author. In order for an author of several papers to correctly appear once in the index, each item must specify <span class="emphasis"><em>exactly</em></span> the same form of their name, which doesn't always happen in practice.</p></li><li class="listitem"><p> Another issue is that two authors may have the same name, even within a single institution. If this is the case they may appear as one author in the index.</p></li></ul></div><p>These issues are typically resolved in libraries with <span class="emphasis"><em>authority control records</em></span>, in which a 'preferred' form of the author's name is kept, with extra information (such as date of birth/death) in order to distinguish between authors of the same name. Maintaining such records is a huge task with many issues, particularly when metadata is received from faculty directly rather than trained library catalogers. For these reasons, DSpace does not yet feature 'authority control' functionality.</p></dd><dt><span class="term"><span class="bold"><strong>Date of Issue</strong></span></span></dt><dd><p> Items are indexed by date of issue. This may be different from the date that an item appeared in DSpace; many items may have been originally published elsewhere beforehand. The Dublin Core field used is <span class="bold"><strong>date.issued</strong></span>. The ordering of this index may be reversed so 'earliest first' and 'most recent first' orderings are possible.</p><p>Note that the index is of <span class="emphasis"><em>items by date</em></span>, as opposed to an index of <span class="emphasis"><em>dates</em></span>. 
If 30 items have the same issue date (say 2002), then those 30 items all appear in the index adjacent to each other, as opposed to a single 2002 entry.</p><p>Since dates in DSpace Dublin Core are in ISO8601, all in the UTC time zone, a simple alphanumeric sort is sufficient to sort by date, including dealing with varying granularities of date reasonably. For example:</p><pre class="screen">2001-12-10
2002
2002-04
2002-04-05
2002-04-09T15:34:12Z
2002-04-09T19:21:12Z
2002-04-10</pre></dd><dt><span class="term"><span class="bold"><strong>Date Accessioned</strong></span></span></dt><dd><p> In order to determine which items most recently appeared, rather than using the date of issue, an item's accession date is used. This is the Dublin Core field <span class="bold"><strong>date.accessioned</strong></span>. In other respects this index is identical to the date of issue index.</p></dd><dt><span class="term"><span class="bold"><strong>Items by a Particular Author</strong></span></span></dt><dd><p> The browse API can also extract items by a particular author. That author does not have to be the primary author of an item for that item to be extracted. You can specify a scope, too; that is, you can ask for items by author X in collection Y, for example.</p><p>This particular flavor of browse is slightly simpler than the others. You cannot presently specify a particular subset of results to be returned. The API call will simply return all of the items by a particular author within a certain scope.</p><p>Note that the author of the item must <span class="emphasis"><em>exactly</em></span> match the author passed in to the API; see the limitations of author indexing described above to see why this is the case.</p></dd><dt><span class="term"><span class="bold"><strong>Subject</strong></span></span></dt><dd><p> Values of the Dublin Core element <span class="bold"><strong>subject</strong></span> (both unqualified and with any qualifier) are indexed. These are sorted in a case-insensitive fashion.</p></dd></dl></div><div class="section" title="13.10.1.&nbsp;Using the API"><div class="titlepage"><div><div><h3 class="title"><a name="N1822D"></a>13.10.1.&nbsp;Using the API</h3></div></div><div></div></div><p>The API is generally invoked by creating a <code class="literal">BrowseScope</code> object, and setting the parameters for which particular part of an index you want to extract. 
This is then passed to the relevant <code class="literal">Browse</code> method call, which returns a <code class="literal">BrowseInfo</code> object which contains the results of the operation. The parameters set in the <code class="literal">BrowseScope</code> object are:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>How many entries from the index you want</p></li><li class="listitem"><p>Whether you only want entries from a particular community or collection, or from the whole of DSpace</p></li><li class="listitem"><p>Which part of the index to start from (called the <span class="emphasis"><em>focus</em></span> of the browse). If you don't specify this, the start of the index is used</p></li><li class="listitem"><p>How many entries to include before the <span class="emphasis"><em>focus</em></span> entry</p></li></ul></div><p>To illustrate, here is an example:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>We want <span class="bold"><strong>7</strong></span> entries in total</p></li><li class="listitem"><p>We want entries from collection <span class="emphasis"><em>x</em></span></p></li><li class="listitem"><p>We want the focus to be 'Really'</p></li><li class="listitem"><p>We want <span class="bold"><strong>2</strong></span> entries included before the focus.</p></li></ul></div><p>The results of invoking <code class="literal">Browse.getItemsByTitle</code> with the above parameters might look like this:</p><pre class="screen"> Rabble-Rousing Rabbis From Sardinia
Reality TV: Love It or Hate It?
FOCUS&gt; The Really Exciting Research Video
Recreational Housework Addicts: Please Visit My House
Regional Television Variation Studies
Revenue Streams
Ridiculous Example Titles: I'm Out of Ideas</pre><p>Note that in the case of title and date browses, <code class="literal">Item</code> objects are returned as opposed to actual titles. In these cases, you can specify the 'focus' to be a specific item, or a partial or full literal value. In the case of a literal value, if no entry in the index matches exactly, the closest match is used as the focus. It's quite reasonable to specify a focus of a single letter, for example.</p><p>Being able to specify a specific item to start at is particularly important with dates, since many items may have the same issue date. Say 30 items in a collection have the issue date 2002. To be able to page through the index 20 items at a time, you need to be able to specify exactly which item's 2002 is the focus of the browse; otherwise, each time you invoked the browse code, the results would start at the first item with the issue date 2002.</p><p>Author browses return <code class="literal">String</code> objects with the actual author names. You can only specify the focus as a full or partial literal <code class="literal">String</code>.</p><p>Another important point to note is that presently, the browse indexes contain metadata for all items in the main archive, regardless of authorization policies. This means that all items in the archive will appear to all users when browsing. Of course, should the user attempt to access a non-public item, the usual authorization mechanism will apply. Whether this approach is ideal is under review; implementing the browse API such that the results retrieved reflect a user's level of authorization may be possible, but rather tricky.</p></div><div class="section" title="13.10.2.&nbsp;Index Maintenance"><div class="titlepage"><div><div><h3 class="title"><a name="N1828C"></a>13.10.2.&nbsp;Index Maintenance</h3></div></div><div></div></div><p>The browse API contains calls to add and remove items from the index, and to regenerate the indexes from scratch. 
In general the content management API invokes the necessary browse API calls to keep the browse indexes in sync with what is in the archive, so most applications will not need to invoke those methods.</p><p>If the browse index becomes inconsistent for some reason, the <code class="literal">InitializeBrowse</code> class is a command line tool (generally invoked using the <code class="literal">/dspace/bin/dspace index-init</code> command) that causes the indexes to be regenerated from scratch.</p></div><div class="section" title="13.10.3.&nbsp;Caveats"><div class="titlepage"><div><div><h3 class="title"><a name="N1829C"></a>13.10.3.&nbsp;Caveats</h3></div></div><div></div></div><p>Presently, the browse API is not tremendously efficient. 'Indexing' takes the form of simply extracting the relevant Dublin Core value, normalizing it (lower-casing and removing any leading article in the case of titles), and inserting that normalized value with the corresponding item ID in the appropriate browse database table. Database views of this table include collection and community IDs for browse operations with a limited scope. When a browse operation is performed, a simple <code class="literal">SELECT</code> query is performed, along the lines of:</p><pre class="screen">SELECT item_id FROM ItemsByTitle ORDER BY sort_title OFFSET 40 LIMIT 20</pre><p>There are two main drawbacks to this: Firstly, <code class="literal">LIMIT</code> and <code class="literal">OFFSET</code> are PostgreSQL-specific keywords. Secondly, the database is still actually performing dynamic sorting of the titles, so the browse code as it stands will not scale particularly well. 
The code does cache <code class="literal">BrowseInfo</code> objects, so that common browse operations are performed quickly, but this is not an ideal solution.</p></div></div><div class="section" title="13.11.&nbsp;Checksum checker"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N182B7"></a>13.11.&nbsp;<a name="docbook-business.html-checker"></a>Checksum checker</h2></div></div><div></div></div><p>The checksum checker is used to verify every item within DSpace. Because DSpace calculates and records the checksum of every file submitted to it, the checker can determine whether a file has been changed. The idea is that the earlier you can identify that a file has changed, the more likely you are to be able to record it (assuming it was not a wanted change).</p><p>The <code class="literal">org.dspace.checker.CheckerCommand</code> class implements the checksum checker tool, which calculates a checksum for each bitstream whose ID is in the <code class="literal">most_recent_checksum</code> table and compares it against the last calculated checksum for that bitstream.</p></div><div class="section" title="13.12.&nbsp;OpenSearch Support"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N182C9"></a>13.12.&nbsp;<a name="docbook-business.html-opensearch"></a>OpenSearch Support</h2></div></div><div></div></div><p>DSpace is able to support OpenSearch. For those not acquainted with the standard, a very brief introduction follows, with emphasis on the possibilities it holds for current use and future development.</p><p>OpenSearch is a small set of conventions and documents for describing and using 'search engines', meaning any service that returns a set of results for a query. It is nearly ubiquitous&mdash;but also nearly invisible&mdash;in modern web sites with search capability. If you look at the page source of Wikipedia, Facebook, CNN, etc., you will find a link element buried there declaring OpenSearch support. 
It is very much a lowest-common-denominator abstraction (think Google box), but does provide a means to extend its expressive power. This first implementation for DSpace supports <span class="emphasis"><em>none</em></span> of these extensions&mdash;many of which are of potential value&mdash;so it should be regarded as a foundation, not a finished solution. So the short answer is that DSpace appears as a 'search-engine' to OpenSearch-aware software.</p><p>Another way to look at OpenSearch is as a RESTful web service for search, very much like SRW/U, but considerably simpler. This comparative loss of power is offset by the fact that it is widely supported by web tools and players: browsers understand it, as do large metasearch tools.</p><p>&nbsp;</p><p><span class="bold"><strong>How Can It Be Used</strong></span></p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Browser Integration</p><p>Many recent browsers (IE7+, FF2+) can detect, or 'autodiscover', links to the document describing the search engine. Thus you can easily add your own or other DSpace instances to the drop-down list of search engines in your browser. This list typically appears in the upper right corner of the browser, with a search box. In Firefox, for example, when you visit a site supporting OpenSearch, the drop-down list widget changes color, and if you open it to show the list of search engines, you are offered an opportunity to add the site to the list. IE works nearly the same way but instead labels the web sites 'search providers'. When you select a DSpace instance as the search engine and enter a search, you are simply sent to the regular search results page of the instance.</p></li><li class="listitem"><p>Flexible, interesting RSS Feeds</p><p>Because one of the formats that OpenSearch specifies for its results is RSS (or Atom), you can turn any search query into an RSS feed. 
So if there are keywords highly discriminative of content in a collection or repository, these can be turned into a URL that a feed reader can subscribe to. Taken to the extreme, one could take any search a user makes, and dynamically compose an RSS feed URL for it in the page of returned results. To see an example, if you have a DSpace with OpenSearch enabled, try:</p><p>http://dspace.mysite.edu/open-search/?query=&lt;your query&gt;</p><p>The default format returned is Atom 1.0, so you should see an Atom document containing your search results.</p></li><li class="listitem"><p>You can extend the syntax with a few other parameters, as follows:</p><p>
<div class="informaltable"><table border="1" width="100%"><colgroup><col><col></colgroup><tbody><tr><td>Parameter</td><td>Values</td></tr><tr><td>format</td><td>atom, rss, html</td></tr><tr><td>scope</td><td>&lt;handle&gt;&mdash;search is restricted to a collection or community with the indicated handle.</td></tr><tr><td>rpp</td><td>number indicating the number of results per page (i.e. per request)</td></tr><tr><td>start</td><td>number of page to start with (if paginating results)</td></tr><tr><td>sort_by</td><td>number indicating sorting criteria (same as DSpace advanced search values)</td></tr></tbody></table></div>
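<p>As a rough illustration of how these parameters combine into a request, the following Java sketch assembles such a URL. It is only a sketch: the class and method names (<code class="literal">OpenSearchUrlSketch</code>, <code class="literal">build</code>) are hypothetical and not part of DSpace, the host name is the placeholder used in the example above, and it assumes that the search terms are passed as <code class="literal">query=</code> and that the other parameters are appended as ordinary query-string pairs.</p><pre class="programlisting">import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; it is not part of the DSpace API.
class OpenSearchUrlSketch {

    // Appends the query plus the "format" and "rpp" parameters from the table.
    // The ampersand separator is written as the escape \u0026 so that this
    // listing contains no raw HTML special characters.
    static String build(String baseUrl, String query, String format, int rpp) {
        try {
            return baseUrl + "/open-search/?query=" + URLEncoder.encode(query, "UTF-8")
                    + "\u0026format=" + format
                    + "\u0026rpp=" + rpp;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Ask for an RSS feed of the first 10 results for "open access".
        System.out.println(build("http://dspace.mysite.edu", "open access", "rss", 10));
    }
}</pre><p>Encoding the query with <code class="literal">URLEncoder</code> keeps spaces and punctuation in the search terms from breaking the URL; the <code class="literal">scope</code>, <code class="literal">start</code> and <code class="literal">sort_by</code> parameters could be appended in the same way.</p>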
</p></li><li class="listitem"><p>Cheap metasearch</p><p>Search aggregators like A9 (Amazon) recognize OpenSearch-compliant providers, so such providers can be added to metasearch sets using the aggregators' UIs. Then your site can be used to aggregate search results with others.</p></li></ul></div><p>Configuration is through the <code class="literal">dspace.cfg</code> file. See <a class="link" href="ch05.html#docbook-configure.html-opensearch">OpenSearch Support</a>.</p></div><div class="section" title="13.13.&nbsp;Embargo Support"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N18332"></a>13.13.&nbsp;<a name="docbook-business.html-embargo"></a>Embargo Support</h2></div></div><div></div></div><div class="section" title="13.13.1.&nbsp;What is an Embargo?"><div class="titlepage"><div><div><h3 class="title"><a name="N18339"></a>13.13.1.&nbsp;What is an Embargo?</h3></div></div><div></div></div><p>An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. 
The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes.</p></div><div class="section" title="13.13.2.&nbsp;Embargo Model and Life-Cycle"><div class="titlepage"><div><div><h3 class="title"><a name="N1833F"></a>13.13.2.&nbsp;Embargo Model and Life-Cycle</h3></div></div><div></div></div><p>Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples:</p><table summary="Simple list" border="0" class="simplelist"><tr><td>"2020-09-12" - an absolute date (i.e. the date embargo will be lifted)</td></tr><tr><td>"6 months" - a time relative to when the item is accessioned</td></tr><tr><td>"forever" - an indefinite, or open-ended embargo</td></tr><tr><td>"local only until 2015" - both a time and an exception (public has no access until 2015, local users OK immediately)</td></tr><tr><td>"Nature Publishing Group standard" - look-up to a policy somewhere (typically 6 months)</td></tr></table><p>These terms are 'interpreted' by the embargo system to yield a specific date on which the embargo can be removed or 'lifted', and a specific set of access policies. Obviously, some terms are easier to interpret than others (the absolute date really requires no interpretation at all), and the 'default' embargo logic understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to add in your own 'interpreters' to cope with any terms expressions you wish to have. 
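</p><p>To make the idea of interpretation concrete, here is a minimal Java sketch of an interpreter that handles only the two basic kinds of terms mentioned above, an absolute date and "forever". It is not the DSpace implementation: the class name, the method name, and the use of <code class="literal">LocalDate.MAX</code> as a stand-in for an open-ended embargo are all assumptions made for illustration.</p><pre class="programlisting">import java.time.LocalDate;

// Hypothetical interpreter for illustration only; it is not the DSpace embargo setter.
class EmbargoTermsSketch {

    // Assumed stand-in for an open-ended embargo; DSpace may represent this differently.
    static final LocalDate FOREVER = LocalDate.MAX;

    // Turns a terms string into a lift date, or null when no terms were supplied.
    static LocalDate liftDate(String terms) {
        if (terms == null || terms.trim().isEmpty()) {
            return null; // missing terms: no embargo is imposed
        }
        String value = terms.trim();
        if (value.equalsIgnoreCase("forever")) {
            return FOREVER;
        }
        // Absolute dates such as 2020-09-12; anything else (for example "6 months")
        // is rejected with a DateTimeParseException in this sketch.
        return LocalDate.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(liftDate("2020-09-12"));
        System.out.println(liftDate("forever").equals(FOREVER));
    }
}</pre><p>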
This date that is the result of the interpretation is stored with the item and the embargo system detects when that date has passed, and removes the embargo ("lifts it"), so the item bitstreams become available. Here is a more detailed life-cycle for an embargoed item:</p><div class="orderedlist"><ol class="orderedlist" type="A"><li class="listitem"><p>Terms Assignment.</p><p>The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace. The terms are not immediately acted upon, and may be revised, corrected, removed, etc, up until the next stage of the life-cycle. Thus a submitter could enter one value, and a collection editor replace it, and only the last value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values, but in the default implementation only one is recognized.</p></li><li class="listitem"><p>Terms interpretation/imposition.</p><p>In DSpace terminology, when an item has exited the last of any workflow steps (or if none have been defined for it), it is said to be 'installed' into the repository. At this precise time, the 'interpretation' of the terms occurs, and a computed 'lift date' is assigned, which like the terms is recorded in a configurable metadata field. It is important to understand that this interpretation happens only once, (just like the installation), and cannot be revisited later. Thus, although an administrator can assign a new value to the metadata field holding the terms after the item has been installed, this will have no effect on the embargo, whose 'force' now resides entirely in the 'lift date' value. 
For this reason, you cannot embargo content already in your repository (at least using standard tools). The other action taken at installation time is the actual imposition of the embargo. The default behavior here is simply to remove the read policies on all the bundles and bitstreams except for the "LICENSE" or "METADATA" bundles. See the section on <span class="emphasis"><em>Extending Embargo Functionality</em></span> for how to alter this behavior. Also note that since these policy changes occur before installation, there is no time during which embargoed content is 'exposed' (accessible by non-administrators). The terms interpretation and imposition together are called 'setting' the embargo, and the component that performs them both is called the embargo 'setter'.</p></li><li class="listitem"><p>Embargo Period.</p><p>After an embargoed item has been installed, the policy restrictions remain in effect until removed. This is not an automatic process, however: a 'lifter' must be run periodically to look for items whose 'lift date' is past. Note that this means the effective removal of an embargo is <span class="bold"><strong>not</strong></span> the lift date, but the earliest date after the lift date that the lifter is run. Typically, a nightly cron-scheduled invocation of the lifter is more than adequate, given the granularity of embargo terms. Also note that during the embargo period, all metadata of the item remains visible. This default behavior can be changed. One final point to note is that the 'lift date', although it was computed and assigned during the previous stage, is in the end a regular metadata field. That means, if there are extraordinary circumstances that require an administrator (or collection editor&mdash;anyone with edit permissions on metadata) to change the lift date, they can do so. Thus, they can 'revise' the lift date without reference to the original terms. This date will be checked the next time the 'lifter' is run. 
One could immediately lift the embargo by setting the lift date to the current day, or change it to 'forever' to indefinitely postpone lifting.</p></li><li class="listitem"><p>Embargo Lift.</p><p>When the lifter discovers an item whose lift date is in the past, it removes (lifts) the embargo. The default behavior of the lifter is to add the resource policies <span class="emphasis"><em>that would have been added</em></span> had the embargo not been imposed. That is, it replicates the standard DSpace behavior, in which an item inherits its policies from its owning collection. As with all other parts of the embargo system, you may replace or extend the default behavior of the lifter (see section V. below). You may wish, e.g., to send an email to an administrator or other interested parties when an embargoed item becomes available.</p></li><li class="listitem"><p>Post Embargo.</p><p>After the embargo has been lifted, the item ceases to respond to any of the embargo life-cycle events. The values of the metadata fields reflect essentially historical or provenance values. With the exception of the additional metadata fields, such items are indistinguishable from items that were never subject to embargo.</p></li></ol></div></div></div></div><HR><p class="copyright">Copyright &copy; 2002-2010
<a class="ulink" href="http://www.duraspace.org/" target="_top">DuraSpace</a>
</p><div class="legalnotice" title="Legal Notice"><a name="N1001D"></a><p>
<a class="ulink" href="http://creativecommons.org/licenses/by/3.0/us/" target="_top">
<span class="inlinemediaobject"><img src="http://i.creativecommons.org/l/by/3.0/us/88x31.png"></span>
</a>
</p><p>Licensed under a Creative Commons Attribution 3.0 United States License</p></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ch12.html">Prev</a>&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="ch14.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">Chapter&nbsp;12.&nbsp;DSpace System Documentation: Application Layer&nbsp;</td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%">&nbsp;Chapter&nbsp;14.&nbsp;DSpace System Documentation: Customizing and Configuring Submission User Interface</td></tr></table></div></body></html>