<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 9. DSpace System Documentation: Application Layer</title><meta content="DocBook XSL Stylesheets V1.74.0" name="generator"><link rel="home" href="index.html" title="DSpace 1.5.1Beta1 Manual"><link rel="up" href="index.html" title="DSpace 1.5.1Beta1 Manual"><link rel="prev" href="ch08.html" title="Chapter 8. DSpace System Documentation: Architecture"><link rel="next" href="ch10.html" title="Chapter 10. DSpace System Documentation: Business Logic Layer"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">Chapter 9. DSpace System Documentation: Application Layer</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ch08.html">Prev</a> </td><th align="center" width="60%"> </th><td align="right" width="20%"> <a accesskey="n" href="ch10.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="N12F39"></a>Chapter 9. DSpace System Documentation: Application Layer</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch09.html#N12F43">9.1. Web User Interface</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N12F4E">9.1.1. Web UI Files</a></span></dt><dt><span class="section"><a href="ch09.html#N1301C">9.1.2. The Build Process</a></span></dt><dt><span class="section"><a href="ch09.html#N130A0">9.1.3. Servlets and JSPs</a></span></dt><dt><span class="section"><a href="ch09.html#N13153">9.1.4. Custom JSP Tags</a></span></dt><dt><span class="section"><a href="ch09.html#N131F8">9.1.5. Internationalisation</a></span></dt><dt><span class="section"><a href="ch09.html#N132D5">9.1.6. HTML Content in Items</a></span></dt><dt><span class="section"><a href="ch09.html#N1334F">9.1.7. 
Thesis Blocking</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N13365">9.2. OAI-PMH Data Provider</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N13424">9.2.1. Sets</a></span></dt><dt><span class="section"><a href="ch09.html#N13433">9.2.2. Unique Identifier</a></span></dt><dt><span class="section"><a href="ch09.html#N1345F">9.2.3. Access control</a></span></dt><dt><span class="section"><a href="ch09.html#N13469">9.2.4. Modification Date (OAI Date Stamp)</a></span></dt><dt><span class="section"><a href="ch09.html#N1346F">9.2.5. 'About' Information</a></span></dt><dt><span class="section"><a href="ch09.html#N13475">9.2.6. Deletions</a></span></dt><dt><span class="section"><a href="ch09.html#N13484">9.2.7. Flow Control (Resumption Tokens)</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N134C8">9.3. Community and Collection Structure Importer</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N134E7">9.3.1. Limitation</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N134F0">9.4. Package Importer and Exporter</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N13509">9.4.1. Ingesting</a></span></dt><dt><span class="section"><a href="ch09.html#N1353D">9.4.2. Disseminating</a></span></dt><dt><span class="section"><a href="ch09.html#N1356A">9.4.3. METS packages</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N1358A">9.5. Item Importer and Exporter</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N13593">9.5.1. DSpace simple archive format</a></span></dt><dt><span class="section"><a href="ch09.html#N135D0">9.5.2. Importing Items</a></span></dt><dt><span class="section"><a href="ch09.html#N13608">9.5.3. Exporting Items</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N13622">9.6. Transferring Items Between DSpace Instances</a></span></dt><dt><span class="section"><a href="ch09.html#N13661">9.7. 
Registering (Not Importing) Bitstreams</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N13672">9.7.1. Accessible Storage</a></span></dt><dt><span class="section"><a href="ch09.html#N13690">9.7.2. Registering Items Using the Item Importer</a></span></dt><dt><span class="section"><a href="ch09.html#N13722">9.7.3. Internal Identification and Retrieval of Registered Items</a></span></dt><dt><span class="section"><a href="ch09.html#N1374E">9.7.4. Exporting Registered Items</a></span></dt><dt><span class="section"><a href="ch09.html#N13758">9.7.5. METS Export of Registered Items</a></span></dt><dt><span class="section"><a href="ch09.html#N13762">9.7.6. Deleting Registered Items</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N1377A">9.8. METS Tools</a></span></dt><dd><dl><dt><span class="section"><a href="ch09.html#N13783">9.8.1. The Export Tool</a></span></dt><dt><span class="section"><a href="ch09.html#N137B4">9.8.2. The AIP Format</a></span></dt><dt><span class="section"><a href="ch09.html#N13857">9.8.3. Limitations</a></span></dt></dl></dd><dt><span class="section"><a href="ch09.html#N1386F">9.9. MediaFilters: Transforming DSpace Content</a></span></dt><dt><span class="section"><a href="ch09.html#N1393B">9.10. Sub-Community Management</a></span></dt></dl></div><p>
<a class="ulink" href="architecture.html" target="_top">Back to architecture overview</a>
</p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N12F43"></a>9.1. <a name="docbook-application.html-webui"></a>Web User Interface</h2></div></div></div><p>The DSpace Web UI is the largest and most-used component in the application layer. Built on Java Servlet and JavaServer Page technology, it allows end-users to access DSpace over the Web via their Web browsers. As of DSpace 1.3.2 the UI conforms to both the XHTML 1.0 standard and the Web Accessibility Initiative (WAI) level-2 standard.</p><p>It also features an administration section, consisting of pages intended for use by central administrators. Presently, this part of the Web UI is not particularly sophisticated; users of the administration section need to know what they are doing! Selected parts of this may also be used by collection administrators.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N12F4E"></a>9.1.1. Web UI Files</h3></div></div></div><p>The Web UI-related files are located in a variety of directories in the DSpace source tree. Note that as of DSpace version 1.2, the deployment mechanism has changed; the build process creates easy-to-deploy Web application archives (<code class="literal">.war</code> files).</p><div class="table"><a name="N12F58"></a><p class="title"><b>Table 9.1. Locations of Web UI Source Files</b></p><div class="table-contents"><table summary="Locations of Web UI Source Files" border="0"><colgroup><col><col></colgroup><tbody><tr><td>
<p><span class="bold"><strong>Location</strong></span></p>
</td><td>
<p><span class="bold"><strong>Description</strong></span></p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui</code></p>
</td><td>
<p>Web UI source files</p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui.filter</code></p>
</td><td>
<p>Servlet Filters (Servlet 2.3 spec)</p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui.jsptag</code></p>
</td><td>
<p>Custom JSP tag class files</p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui.servlet</code></p>
</td><td>
<p>Servlets for main Web UI (controllers)</p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui.servlet.admin</code></p>
</td><td>
<p>Servlets that comprise the administration part of the Web UI</p>
</td></tr><tr><td>
<p><code class="literal">org.dspace.app.webui.util</code></p>
</td><td>
<p>Miscellaneous classes used by the servlets and filters</p>
</td></tr><tr><td>
<p><code class="literal">[dspace-source]/jsp</code></p>
</td><td>
<p>The JSP files</p>
</td></tr><tr><td>
<p><code class="literal">[dspace-source]/jsp/local</code></p>
</td><td>
<p>This is where you can place customized versions of JSPs -- see <a class="ulink" href="configure.html#customui" target="_top">the configuration section</a></p>
</td></tr><tr><td>
<p><code class="literal">[dspace-source]/jsp/WEB-INF/dspace-tags.tld</code></p>
</td><td>
<p>Custom DSpace JSP tag descriptor</p>
</td></tr><tr><td>
<p><code class="literal">[dspace-source]/etc/dspace-web.xml</code></p>
</td><td>
<p>The Web application deployment descriptor. Before including in the <code class="literal">.war</code> file, the text <code class="literal">@@dspace.dir@@</code> will be replaced with the DSpace installation directory (referred to as <span class="emphasis"><em>[dspace]</em></span> elsewhere in this system documentation). This allows the Web application to pick up the DSpace configuration and environment.</p>
</td></tr></tbody></table></div></div><br class="table-break"></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1301C"></a>9.1.2. <a name="docbook-application.html-webui_build"></a>The Build Process</h3></div></div></div><p>The DSpace build process constructs a Web application archive, which is placed in <code class="literal">[dspace-source]/build/dspace.war</code>. The <code class="literal">build_wars</code> Ant target does the work. The process works as follows:</p><div class="itemizedlist"><ul type="disc"><li><p> All the DSpace source code is compiled.</p></li><li><p><code class="literal">[dspace-source]/etc/dspace-web.xml</code> is copied to <code class="literal">[dspace-source]/build</code> and the <code class="literal">@@dspace.dir@@</code> token inside it is replaced with the DSpace installation directory (the <code class="literal">dspace.dir</code> property from <code class="literal">dspace.cfg</code>)</p></li><li><p> The JSPs are all copied to <code class="literal">[dspace-source]/build/jsp</code></p></li><li><p> Customized JSPs from <code class="literal">[dspace-source]/jsp/local</code> are copied on top of these, thus 'overriding' the default versions</p></li><li><p><code class="literal">[dspace-source]/build/dspace.war</code> is built</p></li></ul></div><p>The contents of <code class="literal">dspace.war</code> are:</p><div class="itemizedlist"><ul type="disc"><li><p> (Top level) -- the JSPs (customized versions from <code class="literal">[dspace-source]/jsp/local</code> will have overwritten the defaults from the DSpace source distribution)</p></li><li><p><code class="literal">WEB-INF/classes</code> -- the compiled DSpace classes</p></li><li><p><code class="literal">WEB-INF/lib</code> -- the third-party library JAR files from <code class="literal">[dspace-source]/lib</code>, minus <code class="literal">servlet.jar</code>, which will be available as part of Tomcat (or other servlet engine)</p></li><li><p><code class="literal">WEB-INF/web.xml</code> -- web deployment descriptor, copied from <code class="literal">[dspace-source]/build/dspace-web.xml</code></p></li><li><p><code class="literal">WEB-INF/dspace-tags.tld</code> -- tag descriptor</p></li></ul></div><p>Note that this means there are multiple copies of the compiled DSpace code and third-party libraries in the system, so care must be taken to ensure that they are all in sync. (The storage overhead is a few megabytes, totally insignificant these days.) In general, when you change any DSpace code or JSP, it's best to do a complete update of the installation (<code class="literal">[dspace]</code>) and to rebuild and redeploy the Web UI and OAI <code class="literal">.war</code> files by running this in <code class="literal">[dspace-source]</code>:</p><pre class="screen">
ant -Dconfig=<span class="emphasis"><em>[dspace]</em></span>/config/dspace.cfg update
</pre><p>and then following the instructions that command writes to the console.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N130A0"></a>9.1.3. Servlets and JSPs</h3></div></div></div><p>The Web UI is loosely based around the MVC (model, view, controller) pattern. The content management API corresponds to the model, the Java Servlets are the controllers, and the JSPs are the views. Interactions take the following basic form:</p><div class="orderedlist"><ol type="1"><li><p> An HTTP request is received from a browser</p></li><li><p> The appropriate servlet is invoked, and processes the request by invoking the DSpace business logic layer public API</p></li><li><p> Depending on the outcome of the processing, the servlet invokes the appropriate JSP</p></li><li><p> The JSP is processed and the result is sent to the browser</p></li></ol></div><p>The reasons for this approach are:</p><div class="itemizedlist"><ul type="disc"><li><p> All of the processing is done before the JSP is invoked, so any error or problem that occurs does not occur halfway through HTML rendering</p></li><li><p> The JSPs contain as little code as possible, so they can be customized without having to delve into Java code too much</p></li></ul></div><p>The <code class="literal">org.dspace.app.webui.servlet.LoadDSpaceConfig</code> servlet is always loaded first. This is a very simple servlet that checks the <code class="literal">dspace-config</code> context parameter from the DSpace deployment descriptor, and uses it to locate <code class="literal">dspace.cfg</code>. It also loads up the Log4j configuration. It's important that this servlet is loaded first: if another servlet were loaded before it, the system would try to load the DSpace and Log4j configurations, neither of which would be found.</p><p>All DSpace servlets are subclasses of the <code class="literal">DSpaceServlet</code> class.
The <code class="literal">DSpaceServlet</code> class handles some basic operations such as creating a DSpace <code class="literal">Context</code> object (opening a database connection etc.), authentication and error handling. Instead of overriding the <code class="literal">doGet</code> and <code class="literal">doPost</code> methods as one normally would for a servlet, DSpace servlets implement <code class="literal">doDSGet</code> or <code class="literal">doDSPost</code>, which have an extra context parameter and allow the servlet to throw various exceptions that can be handled in a standard way.</p><p>The DSpace servlet processes the contents of the HTTP request. This might involve retrieving the results of a search with a query term, accessing the current user's eperson record, or updating a submission in progress. According to the results of this processing, the servlet must decide which JSP should be displayed. The servlet then fills out the appropriate attributes in the <code class="literal">HttpServletRequest</code> object that represents the HTTP request being processed. This is done by invoking the <code class="literal">setAttribute</code> method of the <code class="literal">javax.servlet.http.HttpServletRequest</code> object that is passed into the servlet from Tomcat. The servlet then forwards control of the request to the appropriate JSP using the <code class="literal">JSPManager.showJSP</code> method.</p><p>The <code class="literal">JSPManager.showJSP</code> method uses the standard Java servlet forwarding mechanism to forward the HTTP request to the JSP. The JSP is processed by Tomcat and the results are sent back to the user's browser.</p><p>There is an exception to this servlet/JSP style: <code class="literal">index.jsp</code>, the 'home page', receives the HTTP request directly from Tomcat without a servlet being invoked first.
This is because in the servlet 2.3 specification, there is no way to map a servlet to handle only requests made to '<code class="literal">/</code>'; such a mapping results in every request being directed to that servlet. By default, Tomcat forwards requests to '<code class="literal">/</code>' to <code class="literal">index.jsp</code>. To try and make things as clean as possible, <code class="literal">index.jsp</code> contains some simple code that would normally go in a servlet, and then forwards to <code class="literal">home.jsp</code> using the <code class="literal">JSPManager.showJSP</code> method. This means locally customized versions of the 'home page' can be created by placing a customized <code class="literal">home.jsp</code> in <code class="literal">[dspace-source]/jsp/local</code>, in the same manner as other JSPs.</p><p><code class="literal">[dspace-source]/jsp/dspace-admin/index.jsp</code>, the administration UI index page, is invoked directly by Tomcat and not through a servlet for similar reasons.</p><p>At the top of each JSP file, right after the license and copyright header, the attributes that a servlet must fill out prior to forwarding to that JSP are documented. No validation is performed; if the servlet does not fill out the necessary attributes, it is likely that an internal server error will occur.</p><p>Many JSPs containing forms will include hidden parameters that tell the servlets which form has been filled out. The submission UI servlet (<code class="literal">SubmissionController</code>) is a prime example of a servlet that deals with the input from many different JSPs. The <code class="literal">step</code> and <code class="literal">page</code> hidden parameters (written out by the <code class="literal">SubmissionController.getSubmissionParameters()</code> method) are used to inform the servlet which page of which step has just been filled out (i.e.
which page of the submission the user has just completed).</p><p>Below is a detailed, scary diagram depicting the flow of control during the whole process of processing and responding to an HTTP request. Most of the information about the authentication mechanism is <a class="ulink" href="configure.html#authenticate" target="_top">described in the configuration section</a>.</p><p>
<span class="inlinemediaobject"><img src="image/web-ui-flow.gif" width="585"></span>
</p><p>Flow of Control During HTTP Request Processing</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13153"></a>9.1.4. Custom JSP Tags</h3></div></div></div><p>The DSpace JSPs all use some custom tags defined in <code class="literal">/dspace/jsp/WEB-INF/dspace-tags.tld</code>, and the corresponding Java classes reside in <code class="literal">org.dspace.app.webui.jsptag</code>. The tags are listed below. The <code class="literal">dspace-tags.tld</code> file contains detailed comments about how to use the tags, so that information is not repeated here.</p><div class="variablelist"><dl><dt><span class="term">
<code class="literal">layout</code>
</span></dt><dd><p> Just about every JSP uses this tag. It produces the standard HTML header and <code class="literal">&lt;BODY&gt;</code> tag. Thus the content of each JSP is nested inside a <code class="literal">&lt;dspace:layout&gt;</code> tag. The (XML-style) attributes of this tag are slightly complicated; see <code class="literal">dspace-tags.tld</code>. The JSPs in the source code bundle also provide plenty of examples.</p></dd><dt><span class="term">
<code class="literal">sidebar</code>
</span></dt><dd><p> Can only be used inside a <code class="literal">layout</code> tag, and can only be used once per JSP. The content between the start and end <code class="literal">sidebar</code> tags is rendered in a column on the right-hand side of the HTML page. The contents can contain further JSP tags and Java 'scriptlets'.</p></dd><dt><span class="term">
<code class="literal">date</code>
</span></dt><dd><p> Displays the date represented by an <code class="literal">org.dspace.content.DCDate</code> object. Currently only one representation of the date is rendered, but in the future this could use the user's browser preferences to display a localized date.</p></dd><dt><span class="term">
<code class="literal">include</code>
</span></dt><dd><p> Obsolete, simple tag, similar to <code class="literal">jsp:include</code>. In versions prior to DSpace 1.2, this tag would use the locally modified version of a JSP if one was installed in jsp/local. As of 1.2, the build process performs this function; however, this tag is left in for backwards compatibility.</p></dd><dt><span class="term">
<code class="literal">item</code>
</span></dt><dd><p> Displays an item record, including Dublin Core metadata and links to the bitstreams within it. Note that the displaying of the bitstream links is simplistic, and does not take into account any of the bundling structure. This is because DSpace does not have a fully-fledged dissemination architectural piece yet.</p><p>Displaying an item record is done by a tag rather than a JSP for two reasons: firstly, it happens in several places (when verifying an item record during submission or workflow review, as well as during standard item accesses), and secondly, displaying the item turns out to be mostly code-work rather than HTML anyway. Of course, the disadvantage of doing it this way is that it is slightly harder to customize exactly what is displayed from an item record; it is necessary to edit the tag code (<code class="literal">org.dspace.app.webui.jsptag.ItemTag</code>). Hopefully a better solution can be found in the future.</p></dd><dt><span class="term"><code class="literal">itemlist</code>, <code class="literal">collectionlist</code>, <code class="literal">communitylist</code></span></dt><dd><p> These tags display ordered sequences of items, collections and communities, showing minimal information but including a link to the page containing full details. These need to be used in HTML tables.</p></dd><dt><span class="term">
<code class="literal">popup</code>
</span></dt><dd><p> This tag is used to render a link to a pop-up page (typically a help page). If JavaScript is available, the link will either open a new DSpace pop-up window or bring any existing one to the front. If JavaScript is not available, a standard HTML link is displayed that renders the link destination in a window named '<code class="literal">dspace.popup</code>'. In graphical browsers, this usually opens a new window or re-uses an existing window of that name, but if a window is re-used it is not 'raised', which might confuse the user. In text browsers, following this link will simply replace the current page with the destination of the link. This obviously means that JavaScript offers the best functionality, but other browsers are still supported.</p></dd><dt><span class="term">
<code class="literal">selecteperson</code>
</span></dt><dd><p> A tag which produces a widget analogous to HTML <code class="literal">&lt;SELECT&gt;</code>, that allows a user to select one or multiple e-people from a pop-up list.</p></dd><dt><span class="term">
<code class="literal">sfxlink</code>
</span></dt><dd><p> Using an item's Dublin Core metadata, DSpace can display an SFX link if an SFX server is available. This tag does so for a particular item if the <code class="literal">sfx.server.url</code> property is defined in <code class="literal">dspace.cfg</code>.</p></dd></dl></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N131F8"></a>9.1.5. <a name="docbook-application.html-i18n"></a>Internationalisation</h3></div></div></div><p>The <a class="ulink" href="http://jakarta.apache.org/taglibs/doc/standard-1.0-doc/intro.html" target="_top">JSP Standard Tag Library (JSTL) v1.0</a> is used to specify messages in the JSPs like this:</p><p>OLD:</p><pre class="screen">
&lt;H1&gt;Search Results&lt;/H1&gt;
</pre><p>NEW:</p><pre class="screen">
&lt;H1&gt;&lt;fmt:message key="jsp.search.results.title" /&gt;&lt;/H1&gt;
</pre><p>This message can now be changed using the <code class="literal">config/language-packs/Messages.properties</code> file. (This must be done at build-time: <code class="literal">Messages.properties</code> is placed in the <code class="literal">dspace.war</code> Web application file.)</p><pre class="screen">
jsp.search.results.title = Search Results
</pre><p>Phrases may have parameters to be passed in, to make the job of translating easier, reduce the number of 'keys' and to allow translators to make the translated text flow more appropriately for the target language.</p><p>OLD:</p><pre class="screen">
&lt;P&gt;Results &lt;%= r.getFirst() %&gt; to &lt;%= r.getLast() %&gt; of &lt;%= r.getTotal() %&gt;&lt;/P&gt;
</pre><p>NEW:</p><pre class="screen">
&lt;fmt:message key="jsp.search.results.text"&gt;
  &lt;fmt:param&gt;&lt;%= r.getFirst() %&gt;&lt;/fmt:param&gt;
  &lt;fmt:param&gt;&lt;%= r.getLast() %&gt;&lt;/fmt:param&gt;
  &lt;fmt:param&gt;&lt;%= r.getTotal() %&gt;&lt;/fmt:param&gt;
&lt;/fmt:message&gt;
</pre><p>(Note: JSTL 1.0 does not seem to allow JSP &lt;%= %&gt; expressions to be passed in as the value of the attribute in &lt;fmt:param value=""/&gt;)</p><p>The above would appear in the <code class="literal">Messages_xx.properties</code> file as:</p><pre class="screen">
jsp.search.results.text = Results {0}-{1} of {2}
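</pre><p>As a rough illustration of how the placeholders are filled, the sketch below uses <code class="literal">java.text.MessageFormat</code>, the standard Java class that JSTL's <code class="literal">fmt:message</code> relies on for parameter substitution. The class name <code class="literal">SearchResultsMessage</code> and the sample values are made up for this example and are not part of DSpace:</p><pre class="screen">
import java.text.MessageFormat;

// Hypothetical example class, not part of the DSpace source.
public class SearchResultsMessage {
    // Pattern copied from the Messages.properties example above.
    static final String PATTERN = "Results {0}-{1} of {2}";

    // Fills {0}, {1} and {2} in order, as the three fmt:param values would.
    static String render(Object first, Object last, Object total) {
        return MessageFormat.format(PATTERN, first, last, total);
    }

    public static void main(String[] args) {
        // Sample values passed as strings, so no number formatting is applied.
        System.out.println(render("1", "10", "57"));
    }
}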
</pre><p>Introducing number parameters that should be formatted according to the locale used makes no difference in the message key compared to string parameters:</p><pre class="screen">
jsp.submit.show-uploaded-file.size-in-bytes = {0} bytes
</pre><p>In the JSP, this key can be used as shown below:</p><pre class="screen">
&lt;fmt:message key="jsp.submit.show-uploaded-file.size-in-bytes"&gt;
  &lt;fmt:param&gt;&lt;fmt:formatNumber&gt;&lt;%= bitstream.getSize() %&gt;&lt;/fmt:formatNumber&gt;&lt;/fmt:param&gt;
&lt;/fmt:message&gt;
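</pre><p>The nested tags above work in two steps, which can be approximated in plain Java. This is only a sketch: the class name <code class="literal">SizeInBytesMessage</code> is hypothetical, and it assumes that <code class="literal">fmt:formatNumber</code> behaves like <code class="literal">java.text.NumberFormat</code> for the given locale:</p><pre class="screen">
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical example class, not part of the DSpace source.
public class SizeInBytesMessage {
    // Pattern copied from the Messages.properties example above.
    static final String PATTERN = "{0} bytes";

    static String render(long size, Locale locale) {
        // Step 1, the role of fmt:formatNumber: format the number for the locale.
        String formatted = NumberFormat.getInstance(locale).format(size);
        // Step 2, the role of fmt:message: substitute the formatted text for {0}.
        return MessageFormat.format(PATTERN, formatted);
    }

    public static void main(String[] args) {
        // The same size rendered with US and German digit grouping.
        System.out.println(render(1234567L, Locale.US));
        System.out.println(render(1234567L, Locale.GERMANY));
    }
}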
</pre><p>(Note: JSTL offers a way to format numbers in the message itself, as in <code class="literal">jsp.foo.key = {0,number} bytes</code>. Setting the parameter as <code class="literal">&lt;fmt:param value="${variable}" /&gt;</code> works when <code class="literal">variable</code> is a single variable name, but does not work when trying to use a method's return value instead, such as <code class="literal">bitstream.getSize()</code>. Passing the number as a string (or using the &lt;%= %&gt; expression) also does not work.)</p><p>Multiple <code class="literal">Messages.properties</code> files can be created for different languages. See <a class="ulink" href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/ResourceBundle.html#getBundle(java.lang.String,%20java.util.Locale,%20java.lang.ClassLoader)" target="_top">ResourceBundle.getBundle</a>. For example, you can add German and Canadian French translations:</p><pre class="screen">
Messages_de.properties
Messages_fr_CA.properties
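</pre><p>The order in which <code class="literal">java.util.ResourceBundle</code> looks for these files can be printed with the standard <code class="literal">ResourceBundle.Control</code> class (Java 6 or later). This is a sketch of standard Java behaviour, not DSpace code; the class name <code class="literal">BundleLookupOrder</code> is made up for this example, and the additional step of trying the server's default locale is not shown:</p><pre class="screen">
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical example class, not part of the DSpace source.
public class BundleLookupOrder {
    // Lists the property files ResourceBundle would look for, most specific first.
    static String lookupOrder(String baseName, Locale preferred) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        StringBuilder files = new StringBuilder();
        for (Locale candidate : control.getCandidateLocales(baseName, preferred)) {
            if (files.length() != 0) {
                files.append(", ");
            }
            // toBundleName appends the locale suffix, e.g. Messages_fr_CA.
            files.append(control.toBundleName(baseName, candidate)).append(".properties");
        }
        return files.toString();
    }

    public static void main(String[] args) {
        // A Canadian French browser preference falls back to fr, then to the base file.
        System.out.println(lookupOrder("Messages", Locale.CANADA_FRENCH));
    }
}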
</pre><p>The end user's browser settings determine which language is used. The English language file <code class="literal">Messages.properties</code> (or the default server locale) will be used as a default if there's no language bundle for the end user's preferred language. (Note that the English file is not called <code class="literal">Messages_en.properties</code> -- this is so it is always available as a default, regardless of server configuration.)</p><p>The <code class="literal">dspace:layout</code> tag has been updated to allow dictionary keys to be passed in for the titles. It now has two new parameters: <code class="literal">titlekey</code> and <code class="literal">parenttitlekey</code>. So where before you'd do:</p><pre class="screen">
&lt;dspace:layout title="Here"
               parentlink="/mydspace"
               parenttitle="My DSpace"&gt;
</pre><p>You now do:</p><pre class="screen">
&lt;dspace:layout titlekey="jsp.page.title"
               parentlink="/mydspace"
               parenttitlekey="jsp.mydspace"&gt;
</pre><p>The layout tag itself then looks up the corresponding text in the dictionary. <code class="literal">title</code> and <code class="literal">parenttitle</code> still work as before for backwards compatibility, and for the occasional case where that is preferable.</p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N1328A"></a>Message Key Convention</h4></div></div></div><p>When translating further pages, please follow the convention for naming message keys to avoid clashes.</p><p><span class="bold"><strong>For text in JSPs</strong></span> use the complete path + filename of the JSP, then a one-word name for the message. For example, for the title of <code class="literal">jsp/mydspace/main.jsp</code> use:</p><pre class="screen">
jsp.mydspace.main.title
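</pre><p>The convention can be expressed as a small helper. This is purely illustrative: <code class="literal">MessageKeys</code> and <code class="literal">keyFor</code> are hypothetical names, and nothing like them is claimed to exist in the DSpace source:</p><pre class="screen">
// Hypothetical example class, not part of the DSpace source.
public class MessageKeys {
    // Turns "jsp/mydspace/main.jsp" plus "title" into "jsp.mydspace.main.title".
    static String keyFor(String jspPath, String name) {
        // Drop the .jsp extension if present, then use dots instead of slashes.
        String path = jspPath.endsWith(".jsp")
                ? jspPath.substring(0, jspPath.length() - ".jsp".length())
                : jspPath;
        return path.replace('/', '.') + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(keyFor("jsp/mydspace/main.jsp", "title"));
    }
}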
</pre><p>Some common words (e.g. "Help") can be brought out into keys starting <code class="literal">jsp.</code> for ease of translation, e.g.:</p><pre class="screen">
jsp.admin = Administer
</pre><p>Other common words/phrases are brought out into 'general' parameters if they relate to a set (directory) of JSPs, e.g.</p><pre class="screen">
jsp.tools.general.delete = Delete
</pre><p>Phrases that relate <span class="bold"><strong>strongly</strong></span> to a topic (e.g. MyDSpace) but are used in many JSPs outside that topic's directory are more conveniently cross-referenced. For example, one could use the key below in <code class="literal">jsp/submit/saved.jsp</code> to provide a link back to the user's <span class="emphasis"><em>MyDSpace</em></span>:</p><p>
<span class="emphasis"><em>(Cross-referencing of keys <span class="bold"><strong>in general</strong></span> is not a good idea as it may make maintenance more difficult. But in some cases it has more advantages as the meaning is obvious.)</em></span>
</p><pre class="screen">
jsp.mydspace.general.goto-mydspace = Go to My DSpace
</pre><p><span class="bold"><strong>For text in servlet code</strong></span>, in custom JSP tags or wherever applicable use the fully qualified classname + a one-word name for the message. e.g.</p><pre class="screen">
org.dspace.app.webui.jsptag.ItemListTag.title = Title
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N132CB"></a>Which Languages are currently supported?</h4></div></div></div><p>To view translations currently being developed, please refer to the <a class="ulink" href="http://wiki.dspace.org/I18nSupport" target="_top">i18n page</a> of the DSpace Wiki.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N132D5"></a>9.1.6. HTML Content in Items</h3></div></div></div><p>For the most part, the DSpace item display just gives a link that allows an end-user to download a bitstream. However, if a bundle has a primary bitstream whose format is of MIME type <code class="literal">text/html</code>, instead a link to the HTML servlet is given.</p><p>So if we had an HTML document like this:</p><pre class="screen">
contents.html
chapter1.html
chapter2.html
chapter3.html
figure1.gif
figure2.jpg
figure3.gif
figure4.jpg
figure5.gif
figure6.gif
</pre><p>The Bundle's primary bitstream field would point to the contents.html Bitstream. Its format's MIME type tells us that it is HTML, so we know which file to serve up first.</p><p>The HTML servlet employs a trick to serve up HTML documents without actually modifying the HTML or other files themselves. If someone is looking at <code class="literal">contents.html</code> from the above example, the URL in their browser will look like this:</p><pre class="screen">
https://dspace.mit.edu/html/1721.1/12345/contents.html
</pre><p>If there's an image called <code class="literal">figure1.gif</code> in that HTML page, the browser will do HTTP GET on this URL:</p><pre class="screen">
https://dspace.mit.edu/html/1721.1/12345/figure1.gif
</pre><p>The HTML document servlet can work out which item the user is looking at, and then which Bitstream in it is called <code class="literal">figure1.gif</code>, and serve up that bitstream. Links to other HTML pages are followed in the same way. Of course, all the links and image references have to be relative and not absolute.</p><p>HTML documents must be "self-contained", as <a class="ulink" href="functional.html#html" target="_top">explained here</a>. Provided that full path information is known by DSpace, any depth or complexity of HTML document can be served subject to those constraints. This is usually possible with some kind of batch import. If, however, the document has been uploaded one file at a time using the Web UI, the path information has been stripped. The system can still cope with relative links that refer to a deeper path, e.g.</p><pre class="screen">
<IMG SRC="images/figure1.gif">
</pre><p>If the item has been uploaded via the Web submit UI, in the Bitstream table in the database we have the 'name' field, which will contain the filename with no path (<code class="literal">figure1.gif</code>). We can still work out what <code class="literal">images/figure1.gif</code> is by making the HTML document servlet strip any path that comes in from the URL, e.g.</p><pre class="screen">
https://dspace.mit.edu/html/1721.1/12345/images/figure1.gif
                                         ^^^^^^^
                                         Strip this
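</pre><p>The path-stripping step can be sketched as follows. This is an illustrative Python sketch only, not the servlet's actual code; the function name <code class="literal">find_bitstream</code> and the use of a plain set of bitstream names are assumptions made for the example.</p><pre class="screen">
import posixpath


def find_bitstream(request_path, bitstream_names):
    """Return the stored name matching the request path, or None.

    Tries the full relative path first, then falls back to the bare
    filename, mirroring the path-stripping described above.
    """
    if request_path in bitstream_names:
        return request_path
    filename = posixpath.basename(request_path)
    return filename if filename in bitstream_names else None


# Names as stored after a one-file-at-a-time Web UI upload (no paths).
names = {"contents.html", "figure1.gif"}
print(find_bitstream("images/figure1.gif", names))  # prints figure1.gif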
</pre><p>BUT all the filenames (regardless of directory names) must be unique. For example, this wouldn't work:</p><pre class="screen">
contents.html
chapter1.html
chapter2.html
chapter1_images/figure.gif
chapter2_images/figure.gif
</pre><p>since the HTML document servlet wouldn't know which bitstream to serve up for:</p><pre class="screen">
https://dspace.mit.edu/html/1721.1/12345/chapter1_images/figure.gif
https://dspace.mit.edu/html/1721.1/12345/chapter2_images/figure.gif
</pre><p>since it would just have <code class="literal">figure.gif</code> to go on.</p><p>To prevent "infinite URL spaces" appearing (e.g. if a file <code class="literal">foo.html</code> linked to <code class="literal">bar/foo.html</code>, which would link to <code class="literal">bar/bar/foo.html</code>...), the number of path levels the servlet will strip can be limited by setting the configuration property <code class="literal">webui.html.max-depth-guess</code>.</p><p>For example, if we receive a request for <code class="literal">foo/bar/index.html</code>, and we have a bitstream called just <code class="literal">index.html</code>, we will serve up that bitstream for the request if <code class="literal">webui.html.max-depth-guess</code> is 2 or greater. If <code class="literal">webui.html.max-depth-guess</code> is 1 or less, we would not serve that bitstream, as the depth of the file is greater. If <code class="literal">webui.html.max-depth-guess</code> is zero, the request filename and path must always exactly match the bitstream name. The default value (if that property is not present in <code class="literal">dspace.cfg</code>) is 3.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1334F"></a>9.1.7. Thesis Blocking</h3></div></div></div><p>The submission UI has an optional feature that came about as a result of MIT Libraries policy. If the <code class="literal">block.theses</code> parameter in <code class="literal">dspace.cfg</code> is <code class="literal">true</code>, an extra checkbox is included in the first page of the submission UI. This asks the user if the submission is a thesis. If the user checks this box, the submission is halted (deleted) and an error message is displayed, explaining that DSpace should not be used to submit theses. 
This feature can be turned off and on, and the message displayed (<code class="literal">/dspace/jsp/submit/no-theses.jsp</code>) can be localized as necessary.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N13365"></a>9.2. <a name="docbook-application.html-oai"></a>OAI-PMH Data Provider</h2></div></div></div><p>The DSpace platform supports the <a class="ulink" href="http://www.openarchives.org/" target="_top">Open Archives Initiative Protocol for Metadata Harvesting</a> (OAI-PMH) version 2.0 as a data provider. This is accomplished using the <a class="ulink" href="http://www.oclc.org/research/software/oai/cat.shtm" target="_top">OAICat framework from OCLC</a>.</p><p>The DSpace build process builds a Web application archive, <code class="literal">[dspace-source]/build/oai.war</code>, in much the same way as <a class="link" href="ch09.html#docbook-application.html-webui_build">the Web UI build process</a> described above. The only differences are that the JSPs are not included, and <code class="literal">[dspace-source]/etc/oai-web.xml</code> is used as the deployment descriptor. This 'webapp' is deployed to receive and respond to OAI-PMH requests via HTTP. Note that typically it should <span class="emphasis"><em>not</em></span> be deployed on SSL (<code class="literal">https:</code> protocol). In a typical configuration, this is deployed at <code class="literal">oai</code>, for example:</p><pre class="screen">
http://dspace.myu.edu/oai/request?verb=Identify
</pre><p>The 'base URL' of this DSpace deployment would be:</p><pre class="screen">
http://dspace.myu.edu/oai/request
</pre><p>It is this URL that should be registered with <a class="ulink" href="http://www.openarchives.org/" target="_top">www.openarchives.org</a>. Note that you can easily change the '<code class="literal">request</code>' portion of the URL by editing <code class="literal">[dspace-source]/etc/oai-web.xml</code> and rebuilding and deploying <code class="literal">oai.war</code>.</p><p>DSpace provides implementations of the OAICat interfaces <code class="literal">AbstractCatalog</code>, <code class="literal">RecordFactory</code> and <code class="literal">Crosswalk</code> that interface with the DSpace content management API and harvesting API (in the search subsystem).</p><p>Only the basic <code class="literal">oai_dc</code> unqualified Dublin Core metadata set export is enabled by default; this is particularly easy since all items have qualified Dublin Core metadata. When this metadata is harvested, the qualifiers are simply stripped; for example, <code class="literal">description.abstract</code> is exposed as unqualified <code class="literal">description</code>. The <code class="literal">description.provenance</code> field is hidden, as this contains private information about the submitter and workflow reviewers of the item, including their e-mail addresses. Additionally, to keep in line with OAI community practices, values of <code class="literal">contributor.author</code> are exposed as <code class="literal">creator</code> values.</p><p>Other metadata formats are supported as well, using other <code class="literal">Crosswalk</code> implementations; consult the <code class="literal">oaicat.properties</code> file described below. To enable a format, simply uncomment the lines beginning with <code class="literal">Crosswalks.*</code>. Multiple formats are allowed, and the current list includes, in addition to unqualified DC: MPEG DIDL, METS, MODS. 
There is also an incomplete, experimental qualified DC.</p><p>Note that the simple DC implementation (<code class="literal">org.dspace.app.oai.OAIDCCrosswalk</code>) does not currently strip out any invalid XML characters that may be present in the data. If your database contains a DC value with, for example, some ASCII control codes (form feed etc.), this may cause problems for OAI harvesters. This should rarely occur, however. XML special characters (such as <code class="literal">></code>) are encoded as entities (e.g. <code class="literal">&gt;</code>).</p><p>In addition to the implementations of the OAICat interfaces, there are two configuration files relevant to OAI support:</p><div class="variablelist"><dl><dt><span class="term">
<code class="literal">oaicat.properties</code>
</span></dt><dd><p> This resides as a template in <code class="literal">[dspace]/config/templates</code>, and the live version is written to <code class="literal">[dspace]/config</code>. You probably won't need to edit this; the <code class="literal">install-configs</code> script fills out the relevant deployment-specific parameters. You might want to change the <code class="literal">earliestDatestamp</code> field to accurately reflect the oldest datestamp in the system. (Note that this is the value of the <code class="literal">last_modified</code> column in the <code class="literal">Item</code> database table.)</p></dd><dt><span class="term">
<code class="literal">oai-web.xml</code>
</span></dt><dd><p> This standard Java Servlet 'deployment descriptor' is stored in the source as <code class="literal">[dspace-source]/etc/oai-web.xml</code>, and is written to <code class="literal">/dspace/oai/WEB-INF/web.xml</code>.</p></dd></dl></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13424"></a>9.2.1. Sets</h3></div></div></div><p>OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. A record can be in zero or more sets.</p><p>DSpace exposes collections as sets. The organization of communities is likely to change over time, and is therefore a less stable basis for selective harvesting.</p><p>Each collection has a corresponding OAI set, discoverable by harvesters via the ListSets verb. The setSpec is the Handle of the collection, with the ':' and '/' converted to underscores so that the Handle is a legal setSpec, for example:</p><pre class="screen">
hdl_1721.1_1234
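</pre><p>The conversion can be sketched in a few lines. This is an illustrative Python sketch, not the DSpace implementation; the function name <code class="literal">handle_to_set_spec</code> is hypothetical.</p><pre class="screen">
def handle_to_set_spec(handle):
    """Convert a collection Handle such as 'hdl:1721.1/1234' to a setSpec."""
    # ':' and '/' are not legal inside a single setSpec segment, so both
    # are replaced with underscores; '.' is left untouched.
    return handle.replace(":", "_").replace("/", "_")


print(handle_to_set_spec("hdl:1721.1/1234"))  # prints hdl_1721.1_1234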
</pre><p>Naturally enough, the collection name is also the name of the corresponding set.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13433"></a>9.2.2. Unique Identifier</h3></div></div></div><p>Every item in an OAI-PMH data repository must have a unique identifier, which must conform to the URI syntax. As of DSpace 1.2, Handles are not used directly as OAI identifiers; this is because in OAI-PMH, the OAI identifier identifies the <span class="emphasis"><em>metadata record</em></span> associated with the <span class="emphasis"><em>resource</em></span>. The <span class="emphasis"><em>resource</em></span> is the DSpace item, whose <span class="emphasis"><em>resource identifier</em></span> is the Handle. In practical terms, using the Handle for the OAI identifier may cause problems in the future if DSpace instances share items with the same Handles; the OAI metadata record identifiers should be different, as the different DSpace instances would need to be harvested separately and may have different metadata for the item.</p><p>The OAI identifiers that DSpace uses are of the form:</p><p>
<code class="literal">oai:host name:handle</code>
</p><p>For example:</p><p>
<code class="literal">oai:dspace.myu.edu:123456789/345</code>
</p><p>If you wish to use a different scheme, this can easily be changed by editing the value of <code class="literal">OAI_ID_PREFIX</code> at the top of the <code class="literal">org.dspace.app.oai.DSpaceOAICatalog</code> class. (You do not need to change the code if the above scheme works for you; the code picks up the host name and Handles automatically from the DSpace configuration.)</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1345F"></a>9.2.3. Access control</h3></div></div></div><p>OAI provides no authentication/authorisation details, although these could be implemented using standard HTTP methods. It is assumed that all access will be anonymous for the time being.</p><p>A question is, "is all metadata public?" Presently the answer to this is yes; all metadata is exposed via OAI-PMH, even if the item has restricted access policies. The reasoning behind this is that people who do actually have permission to read a restricted item should still be able to use OAI-based services to discover the content.</p><p>If, in the future, this 'expose all metadata' approach proves unsatisfactory for any reason, it should be possible to expose only publicly readable metadata. The authorisation system has separate permissions for READing an item and READing the content (bitstreams) within it. This means the system can differentiate between an item with public metadata and hidden content, and an item with hidden metadata as well as hidden content. In this case the OAI data repository should only expose those items with anonymous READ access, so it can hide the existence of records from the outside world completely. In this scenario, one should be wary of protected items that are made public after a time. When this happens, the items are "new" from the OAI-PMH perspective.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13469"></a>9.2.4. 
Modification Date (OAI Date Stamp)</h3></div></div></div><p>OAI-PMH harvesters need to know when a record has been created, changed or deleted. DSpace keeps track of a 'last modified' date for each item in the system, and this date is used for the OAI-PMH date stamp. This means that any changes to the metadata (e.g. admins correcting a field, or a withdrawal) will be exposed to harvesters.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1346F"></a>9.2.5. 'About' Information</h3></div></div></div><p>As part of each record given out to a harvester, there is an optional, repeatable "about" section which can be filled out in any (XML-schema conformant) way. Common uses are for provenance and rights information, and there are schemas in use by OAI communities for this. Presently DSpace does not provide any of this information.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13475"></a>9.2.6. Deletions</h3></div></div></div><p>DSpace keeps track of deletions (withdrawals). These are exposed via OAI, which has a specific mechanism for dealing with this. Since DSpace keeps a permanent record of withdrawn items, in the OAI-PMH sense DSpace supports deletions 'persistently'. This is as opposed to 'transient' deletion support, which would mean that deleted records are forgotten after a time.</p><p>Once an item has been withdrawn, OAI-PMH harvests of the date range in which the withdrawal occurred will find the 'deleted' record header. Harvests of a date range prior to the withdrawal will <span class="emphasis"><em>not</em></span> find the record, despite the fact that the record did exist at that time.</p><p>As an example of this, consider an item that was created on 2002-05-02 and withdrawn on 2002-10-06. A request to harvest the month 2002-10 will yield the 'record deleted' header. 
However, a harvest of the month 2002-05 will not yield the original record.</p><p>Note that presently, the deletion of 'expunged' items is not exposed through OAI.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13484"></a>9.2.7. Flow Control (Resumption Tokens)</h3></div></div></div><p>An OAI data provider can prevent any performance impact caused by harvesting by forcing a harvester to receive data in time-separated chunks. If the data provider receives a request for a lot of data, it can send part of the data with a resumption token. The harvester can then return later with the resumption token and continue.</p><p>DSpace supports resumption tokens for 'ListRecords' OAI-PMH requests. ListIdentifiers and ListSets requests do not produce a particularly high load on the system, so resumption tokens are not used for those requests.</p><p>Each OAI-PMH ListRecords request will return at most 100 records. This limit is set at the top of <code class="literal">org.dspace.app.oai.DSpaceOAICatalog.java</code> (<code class="literal">MAX_RECORDS</code>). A potential issue here is that if a harvest yields an exact multiple of <code class="literal">MAX_RECORDS</code>, the last operation will result in a harvest with no records in it. It is unclear from the OAI-PMH specification if this is acceptable.</p><p>When a resumption token is issued, the optional <code class="literal">completeListSize</code> and <code class="literal">cursor</code> attributes are not included. OAICat sets the <code class="literal">expirationDate</code> of the resumption token to one hour after it was issued, though in fact since DSpace resumption tokens contain all the information required to continue a request they do not actually expire.</p><p>Resumption tokens contain all the state information required to continue a request. The format is:</p><pre class="screen">
from/until/setSpec/offset
</pre><p><code class="literal">from</code> and <code class="literal">until</code> are the ISO 8601 dates passed in as part of the original request, and <code class="literal">setSpec</code> is also taken from the original request. <code class="literal">offset</code> is the number of records that have already been sent to the harvester. For example:</p><pre class="screen">
2003-01-01//hdl_1721.1_1234/300
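</pre><p>Because the token carries all of its own state, it can be taken apart and advanced without any server-side session. The following Python sketch is illustrative only and is not the DSpace or OAICat code; the names <code class="literal">parse_token</code> and <code class="literal">next_token</code> are hypothetical, and the page size of 100 is the <code class="literal">MAX_RECORDS</code> limit mentioned earlier in this section. The last line parses a token of the same shape as the example:</p><pre class="screen">
from collections import namedtuple

ResumptionToken = namedtuple("ResumptionToken", "from_date until_date set_spec offset")

MAX_RECORDS = 100  # page size stated earlier in this section


def parse_token(token):
    """Split 'from/until/setSpec/offset' into its four fields."""
    from_date, until_date, set_spec, offset = token.split("/")
    return ResumptionToken(from_date, until_date, set_spec, int(offset))


def next_token(token):
    """Build the token for the following page of results."""
    t = parse_token(token)
    fields = (t.from_date, t.until_date, t.set_spec, str(t.offset + MAX_RECORDS))
    return "/".join(fields)


print(parse_token("2003-01-01//hdl_1721.1_1234/300"))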
</pre><p>This means the harvest is 'from' <code class="literal">2003-01-01</code>, has no 'until' date, is for collection hdl:1721.1/1234, and 300 records have already been sent to the harvester. (Actually, if the original OAI-PMH request doesn't specify a 'from' or 'until', OAICat fills them out automatically to '0000-00-00T00:00:00Z' and '9999-12-31T23:59:59Z' respectively. This means DSpace resumption tokens will always have from and until dates in them.)</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N134C8"></a>9.3. <a name="docbook-application.html-structbuilder"></a>Community and Collection Structure Importer</h2></div></div></div><p>This command-line tool gives you the ability to import a community and collection structure directly from a source XML file. It is executed as follows:</p><p>
<code class="literal">[dspace]/bin/structure-builder -f [source xml] -o [output xml file] -e [administrator email]</code>
</p><p>This will examine the contents of <code class="literal">[source xml]</code>, import the structure into DSpace while logged in as the supplied administrator, and then output the same structure to the output file, but including the handle for each imported community and collection as an attribute.</p><p>The source xml document needs to be in the following format:</p><pre class="screen">
<import_structure>
    <community>
        <name>Community Name</name>
        <description>Descriptive text</description>
        <intro>Introductory text</intro>
        <copyright>Special copyright notice</copyright>
        <sidebar>Sidebar text</sidebar>
        <community>
            <name>Sub Community Name</name>
            <community> ...[ad infinitum]... </community>
        </community>
        <collection>
            <name>Collection Name</name>
            <description>Descriptive text</description>
            <intro>Introductory text</intro>
            <copyright>Special copyright notice</copyright>
            <sidebar>Sidebar text</sidebar>
            <license>Special licence</license>
            <provenance>Provenance information</provenance>
        </collection>
    </community>
</import_structure>
</pre><p>The resulting output document will be as follows:</p><pre class="screen">
<import_structure>
    <community identifier="123456789/1">
        <name>Community Name</name>
        <description>Descriptive text</description>
        <intro>Introductory text</intro>
        <copyright>Special copyright notice</copyright>
        <sidebar>Sidebar text</sidebar>
        <community identifier="123456789/2">
            <name>Sub Community Name</name>
            <community identifier="123456789/3"> ...[ad infinitum]... </community>
        </community>
        <collection identifier="123456789/4">
            <name>Collection Name</name>
            <description>Descriptive text</description>
            <intro>Introductory text</intro>
            <copyright>Special copyright notice</copyright>
            <sidebar>Sidebar text</sidebar>
            <license>Special licence</license>
            <provenance>Provenance information</provenance>
        </collection>
    </community>
</import_structure>
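</pre><p>The output document is convenient for scripting, since the assigned handles can be read back from the <code class="literal">identifier</code> attributes. The following is a minimal Python sketch of doing so, under the assumption that the output looks like the (abridged) document above; the function name <code class="literal">handles_by_name</code> is hypothetical and the sketch is not part of DSpace.</p><pre class="screen">
import xml.etree.ElementTree as ET

# Abridged copy of the output document shown above.
OUTPUT_XML = """
<import_structure>
  <community identifier="123456789/1">
    <name>Community Name</name>
    <community identifier="123456789/2">
      <name>Sub Community Name</name>
    </community>
    <collection identifier="123456789/4">
      <name>Collection Name</name>
    </collection>
  </community>
</import_structure>
"""


def handles_by_name(xml_text):
    """Map each community or collection name to its assigned handle."""
    root = ET.fromstring(xml_text)
    result = {}
    for node in root.iter():
        if node.tag in ("community", "collection"):
            # findtext looks only at direct children, so nested names
            # are attributed to the right community or collection.
            result[node.findtext("name")] = node.get("identifier")
    return result


for name, handle in handles_by_name(OUTPUT_XML).items():
    print(handle, name)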
</pre><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N134E7"></a>9.3.1. Limitation</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li><p> Currently this does not export community and collection structures, although it should only be a small modification to make it do so</p></li></ul></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N134F0"></a>9.4. <a name="docbook-application.html-packager"></a>Package Importer and Exporter</h2></div></div></div><p>This command-line tool gives you access to the Packager plugins. It can <span class="emphasis"><em>ingest</em></span> a package to create a new DSpace Item, or <span class="emphasis"><em>disseminate</em></span> an Item as a package.</p><p>To see all the options, invoke it as:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/packager --help
</pre><p> This mode also displays a list of the names of package ingesters and disseminators that are available.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13509"></a>9.4.1. Ingesting</h3></div></div></div><p>To ingest a package from a file, give the command:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/packager -e <span class="emphasis"><em>user</em></span> -c <span class="emphasis"><em>handle</em></span> -t <span class="emphasis"><em>packager</em></span> <span class="emphasis"><em>path</em></span>
</pre><p> Where <span class="emphasis"><em>user</em></span> is the e-mail address of the E-Person under whose authority this runs; <span class="emphasis"><em>handle</em></span> is the Handle of the collection into which the Item is added, <span class="emphasis"><em>packager</em></span> is the plugin name of the package ingester to use, and <span class="emphasis"><em>path</em></span> is the path to the file to ingest (or <code class="literal">"-"</code> to read from the standard input).</p><p> Here is an example that loads a PDF file with internal metadata as a package:</p><pre class="screen">
/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf thesis.pdf
</pre><p>This example takes the result of retrieving a URL and ingests it:</p><pre class="screen">
wget -O - http://alum.mit.edu/jarandom/my-thesis.pdf | \
    /dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf -
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1353D"></a>9.4.2. Disseminating</h3></div></div></div><p>To disseminate an Item as a package, give the command:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/packager -e <span class="emphasis"><em>user</em></span> -d -i <span class="emphasis"><em>handle</em></span> -t <span class="emphasis"><em>packager</em></span> <span class="emphasis"><em>path</em></span>
</pre><p> Where <span class="emphasis"><em>user</em></span> is the e-mail address of the E-Person under whose authority this runs; <span class="emphasis"><em>handle</em></span> is the Handle of the Item to disseminate; <span class="emphasis"><em>packager</em></span> is the plugin name of the package disseminator to use; and <span class="emphasis"><em>path</em></span> is the path to the file to create (or <code class="literal">"-"</code> to write to the standard output). This example writes an Item out as a METS package in the file "454.zip": <pre class="screen">
/dspace/bin/packager -e florey@mit.edu -d -i 1721.2/454 -t METS 454.zip
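</pre> When the packager is driven from a script, the two invocation patterns can be assembled as argument lists. The following Python sketch is illustrative only; the function names are hypothetical, and only the options shown in this section (-e, -c, -d, -i, -t) are used. The resulting list could be handed to a process launcher such as Python's subprocess.run. <pre class="screen">
import shlex


def ingest_command(dspace_dir, user, collection_handle, packager, path="-"):
    """Argument list for ingesting a package ('-' reads standard input)."""
    return [dspace_dir + "/bin/packager", "-e", user,
            "-c", collection_handle, "-t", packager, path]


def disseminate_command(dspace_dir, user, item_handle, packager, path="-"):
    """Argument list for disseminating an Item ('-' writes standard output)."""
    return [dspace_dir + "/bin/packager", "-e", user, "-d",
            "-i", item_handle, "-t", packager, path]


# Rebuilds the dissemination example above as a shell-quoted string.
print(shlex.join(disseminate_command(
    "/dspace", "florey@mit.edu", "1721.2/454", "METS", "454.zip")))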
</pre></p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1356A"></a>9.4.3. METS packages</h3></div></div></div><p>DSpace 1.4 includes a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as <a class="ulink" href="http://www.loc.gov/standards/mets/" target="_top">METS</a>, <a class="ulink" href="http://www.loc.gov/standards/mods/" target="_top">MODS</a>, <a class="ulink" href="http://www.loc.gov/standards/premis/" target="_top">and PREMIS</a>. The plugin name is <code class="literal">METS</code> by default, and it uses MODS for descriptive metadata.</p><p>The DSpace METS SIP profile is available at: <a class="ulink" href="http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf" target="_top">
<code class="literal">http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf</code></a>
.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1358A"></a>9.5. <a name="docbook-application.html-itemimporter"></a>Item Importer and Exporter</h2></div></div></div><p>DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace simple archive format. The tools are not terribly robust, but are useful and are easily modified. They also give a good demonstration of how to implement your own item importer if desired.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13593"></a>9.5.1. DSpace simple archive format</h3></div></div></div><p>The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata, and the files that make up the item.</p><pre class="screen">
archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata
        contents                -- text file containing one line per filename
        file_1.doc              -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...
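</pre><p>An item directory of this shape can also be generated by a script. The following Python sketch is illustrative only and is not a DSpace tool; the function name <code class="literal">write_item</code> is hypothetical. It writes the two metadata files in the formats described in this section and assumes the bitstream files themselves are copied into the directory separately.</p><pre class="screen">
import os
import tempfile
import xml.etree.ElementTree as ET


def write_item(item_dir, metadata, filenames):
    """Write dublin_core.xml and contents for one item directory.

    metadata is a list of (element, qualifier, value) tuples; the
    bitstream files themselves are assumed to be copied in separately.
    """
    os.makedirs(item_dir, exist_ok=True)
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"), encoding="utf-8")
    # The contents file lists one bitstream filename per line.
    with open(os.path.join(item_dir, "contents"), "w") as out:
        for name in filenames:
            out.write(name + "\n")


archive = tempfile.mkdtemp()
write_item(os.path.join(archive, "item_000"),
           [("title", "none", "A Tale of Two Cities"), ("date", "issued", "1990")],
           ["file_1.doc", "file_2.pdf"])
print(sorted(os.listdir(os.path.join(archive, "item_000"))))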
</pre><p>The <code class="literal">dublin_core.xml</code> file has the following format, where each Dublin Core element has its own entry within a <code class="literal"><dcvalue></code> tagset. There are currently three tag attributes available in the <code class="literal"><dcvalue></code> tagset:</p><div class="itemizedlist"><ul type="disc"><li><p><code class="literal"><element></code> - the Dublin Core element</p></li><li><p><code class="literal"><qualifier></code> - the element's qualifier</p></li><li><p><code class="literal"><language></code> - (optional) ISO language code for element</p></li></ul></div><pre class="screen">
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
    <dcvalue element="title" qualifier="alternate" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
</pre><p>(Note the optional language tag attribute which notifies the system that the optional title is in French.)</p><p>The <code class="literal">contents</code> file simply enumerates, one file per line, the bitstream file names. The bitstream name may optionally be followed by the sequence:</p><p>
<code class="literal">\tbundle:bundlename</code>
</p><p> where '\t' is the tab character and 'bundlename' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N135D0"></a>9.5.2. <a name="docbook-application.html-importingitems"></a>Importing Items</h3></div></div></div><p><span class="bold"><strong>Note:</strong></span> Before running the item importer over items previously exported from a DSpace instance, please first refer to <a class="ulink" href="application.html#transferitem" target="_top">Transferring Items Between DSpace Instances</a>.</p><p>The item importer is in <code class="literal">org.dspace.app.itemimport.ItemImport</code>, and is run with the <code class="literal">import</code> utility in the <code class="literal">dspace/bin</code> directory. Running it with -h lists the current command-line arguments. Another very important flag is the --test flag, which you can use with any command to simulate all of the actions it will perform without actually making any changes to your DSpace instance; this is very useful for validating your item directories before doing an import. For the EPerson argument you can give either the user's database ID or e-mail address, and for the collection argument either the collection's database ID or its handle. Currently with the importer you can add, remove, and replace items in a collection. If you specify more than one collection argument then the items will be imported to multiple collections, and the first collection specified becomes the "owning" collection. If there is an error and the import is aborted, there is a --resume flag that you can use to try to resume the import where you left off after you fix the error.</p><p>To add items to a collection with an EPerson as the submitter, type:</p><pre class="screen">
[dspace]/bin/import --add --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
</pre><p>(or by using the short form)</p><pre class="screen">
[dspace]/bin/import -a -e joe@user.com -c collectionID -s items_dir -m mapfile
</pre><p>which would then cycle through the archive directory's items, import them, and then generate a map file which stores the mapping of item directories to item handles. Save this map file! Using the map file you can then 'unimport' with the command:</p><pre class="screen">
[dspace]/bin/import --delete --mapfile=mapfile
</pre><p>The imported items listed in the map file would then be deleted. If you wish to replace previously imported items, you can give the command:</p><pre class="screen">
[dspace]/bin/import --replace --eperson=joe@user.com --collection=collectID --source=items_dir --mapfile=mapfile
</pre><p>Replacing items uses the map file to replace the old items and still retain their handles.</p><p>The importer usually bypasses any workflow assigned to a collection, but adding the --workflow option will route the imported items through the workflow system.</p><p>The importer also has a --test flag that will simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the import step.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13608"></a>9.5.3. <a name="docbook-application.html-exportingitems"></a>Exporting Items</h3></div></div></div><p>The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported. To export a collection's items you type:</p><pre class="screen">
[dspace]/bin/export --type=COLLECTION --id=collID --dest=dest_dir --number=seq_num
</pre><p>The keyword <code class="literal">COLLECTION</code> means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply. To export a single item use the keyword <code class="literal">ITEM</code> and give the item ID as an argument:</p><pre class="screen">
[dspace]/bin/export --type=ITEM --id=itemID --dest=dest_dir --number=seq_num
</pre><p>Each exported item will have an additional file in its directory, named 'handle'. This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N13622"></a>9.6. <a name="docbook-application.html-transferitem"></a>Transferring Items Between DSpace Instances</h2></div></div></div><p>Where items are to be moved between DSpace instances (for example from a test DSpace into a production DSpace) the item exporter and item importer can be used in conjunction with a script to assist in this process.</p><p>After running the item exporter each <code class="literal">dublin_core.xml</code> file will contain metadata that was automatically added by DSpace. These fields are as follows:</p><div class="itemizedlist"><ul type="disc"><li><p> date.accessioned</p></li><li><p> date.available</p></li><li><p> date.issued</p></li><li><p> description.provenance</p></li><li><p> format.extent</p></li><li><p> format.mimetype</p></li><li><p> identifier.uri</p></li></ul></div><p>In order to avoid duplication of this metadata, run</p><p>
<code class="literal">dspace_migrate &lt;exported item directory&gt;</code>
</p><p>prior to running the item importer. This will remove the above metadata items from the <code class="literal">dublin_core.xml</code> file, except for date.issued (kept if the item has been published or publicly distributed before) and identifier.uri (kept if it is not the handle), and will remove all <code class="literal">handle</code> files. It will then be safe to run the item importer. Use</p><p>
<code class="literal">dspace_migrate --help</code>
</p><p>for instructions on use of the script.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N13661"></a>9.7. <a name="docbook-application.html-registration"></a>Registering (Not Importing) Bitstreams</h2></div></div></div><p>Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be that there is a repository for existing digital assets. Rather than using the normal <a class="ulink" href="functional.html#ingest" target="_top">interactive ingest process</a> or the <a class="ulink" href="functional.html#importexport" target="_top">batch import</a> to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13672"></a>9.7.1. Accessible Storage</h3></div></div></div><p>To register an item its bitstreams must reside on storage accessible to DSpace and therefore referenced by an asset store number in <code class="literal">dspace.cfg</code>. The configuration file <code class="literal">dspace.cfg</code> establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in the DSpace host's file system or a set of SRB account parameters. This asset store number is described in <a class="ulink" href="configure.html#dspacecfg" target="_top">The <code class="literal">dspace.cfg</code> Configuration Properties File</a> section and in the <code class="literal">dspace.cfg</code> file itself. 
The asset store number(s) used for registered items should generally not be the value of the <code class="literal">assetstore.incoming</code> property since it is unlikely that you will want to mix the bitstreams of normally ingested and imported items with those of registered items.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13690"></a>9.7.2. Registering Items Using the Item Importer</h3></div></div></div><p>DSpace uses the same import tool that is used for batch import except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool.</p><p>The archive format for registration does not include the actual content files (bitstreams) being registered. The format is however a directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata (<code class="literal">dublin_core.xml</code>) and a file listing the item's content files (<code class="literal">contents</code>), but not the actual content files themselves.</p><p>The <code class="literal">dublin_core.xml</code> file for item registration is exactly the same as for regular item import.</p><p>The <code class="literal">contents</code> file, like that for regular item import, lists the item's content files, one content file per line, but each line has one of the following formats:</p><pre class="screen">
-r -s n -f filepath
-r -s n -f filepath\tbundle:bundlename
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text
</pre><p>where</p><div class="itemizedlist"><ul type="disc"><li><p><code class="literal">-r</code> indicates this is a file to be registered</p></li><li><p><code class="literal">-s n</code> indicates the asset store number (<code class="literal">n</code>)</p></li><li><p><code class="literal">-f filepath</code> indicates the path and name of the content file to be registered (filepath)</p></li><li><p><code class="literal">\t</code> is a tab character</p></li><li><p><code class="literal">bundle:bundlename</code> is an optional bundle name</p></li><li><p><code class="literal">permissions: -[r|w] 'group name'</code> is an optional read or write permission that can be attached to the bitstream</p></li><li><p><code class="literal">description: some text</code> is an optional description field to add to the file</p></li></ul></div><p>The bundle, that is everything after the filepath, is optional and is normally not used.</p><p>The command line for registration is just like the one for regular import:</p><pre class="screen">
dsrun org.dspace.app.itemimport.ItemImport --add --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
</pre><p>(or by using the short form)</p><pre class="screen">
dsrun org.dspace.app.itemimport.ItemImport -a -e joe@user.com -c collectionID -s items_dir -m mapfile
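# Sketch only, not part of the manual's command: one way to produce a
# registration line for the contents file of an item under items_dir, with a
# real tab character before the optional bundle name. The asset store number
# (1), the file path and the bundle name are placeholder values.
line=$(printf '%s\tbundle:%s' "-r -s 1 -f docs/report.pdf" "ORIGINAL")
printf '%s\n' "$line"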
</pre><p>The <code class="literal">--workflow</code> and <code class="literal">--test</code> flags will function as described in <a class="ulink" href="application.html#importingitems" target="_top">Importing Items</a>.</p><p>The <code class="literal">--delete</code> flag will function as described in <a class="ulink" href="application.html#importingitems" target="_top">Importing Items</a> but the registered content files will not be removed from storage. See <a class="link" href="ch09.html#docbook-application.html-deletingregistereditems">Deleting Registered Items</a>.</p><p>The <code class="literal">--replace</code> flag will function as described in <a class="ulink" href="application.html#importingitems" target="_top">Importing Items</a> but care should be taken to consider different cases and implications. With old items and new items being registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using <code class="literal">--replace</code> will not be removed from the storage where it resides. See <a class="ulink" href="application.html#deletingregistereditems" target="_top">Deleting Registered Items</a>. A new item added to DSpace using <code class="literal">--replace</code> will be ingested normally or will be registered depending on whether or not it is marked in the <code class="literal">contents</code> file with <code class="literal">-r</code>.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13722"></a>9.7.3. Internal Identification and Retrieval of Registered Items</h3></div></div></div><p>Once an item has been registered, superficially it is indistinguishable from items ingested interactively or by batch import. But internally there are some differences:</p><p>First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. 
Instead, the file path and name are those specified in the <code class="literal">contents</code> file.</p><p>Second, the <code class="literal">store_number</code> column of the bitstream database row contains the asset store number specified in the <code class="literal">contents</code> file.</p><p>Third, the <code class="literal">internal_id</code> column of the bitstream database row contains a leading flag (<code class="literal">-R</code>) followed by the registered file path and name. For example, <code class="literal">-Rfilepath</code> where <code class="literal">filepath</code> is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be traditional storage in the DSpace server's file system or an SRB account.</p><p>Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file is in remote storage (say, SRB) a checksum is calculated on just the file name! This is an efficiency choice since registering a large number of large files that are in SRB would consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option but it's not yet in production release.</p><p>Registered items and their bitstreams can be retrieved transparently just like normally ingested items.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1374E"></a>9.7.4. Exporting Registered Items</h3></div></div></div><p>Registered items may be exported as described in <a class="link" href="ch09.html#docbook-application.html-exportingitems">Exporting Items</a>. If so, the export directory will contain actual copies of the files being exported but the lines in the contents file will flag the files as registered. 
This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again be registered in DSpace instead of being uploaded and ingested normally.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13758"></a>9.7.5. METS Export of Registered Items</h3></div></div></div><p>The <a class="link" href="ch09.html#docbook-application.html-mets">METS Export Tool</a> can also be used but note the cautions described in that section and note that MD5 values for items in remote storage are actually MD5 values on just the file name.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13762"></a>9.7.6. <a name="docbook-application.html-deletingregistereditems"></a>Deleting Registered Items</h3></div></div></div><p>If a registered item is deleted from DSpace, either interactively or by using the <code class="literal">--delete</code> or <code class="literal">--replace</code> flags described in <a class="ulink" href="application.html#importingitems" target="_top">Importing Items</a>, the item will disappear from DSpace but its registered content files will remain in place just as they were prior to registration. Bitstreams not registered but added by DSpace as part of registration, such as <code class="literal">license.txt</code> files, will be deleted.</p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1377A"></a>9.8. <a name="docbook-application.html-mets"></a>METS Tools</h2></div></div></div><p>The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.</p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13783"></a>9.8.1. 
The Export Tool</h3></div></div></div><p>The METS export tool is invoked via the command line like this:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/dsrun org.dspace.app.mets.METSExport --help
</pre><p>The tool can export an individual item, the items within a given collection, or everything in the DSpace instance. To export an individual item, use:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/dsrun org.dspace.app.mets.METSExport --item <span class="emphasis"><em>[handle]</em></span>
</pre><p>To export the items in collection <code class="literal">hdl:123.456/789</code>, use:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/dsrun org.dspace.app.mets.METSExport --collection hdl:123.456/789
</pre><p>To export all the items in DSpace, use:</p><pre class="screen">
<span class="emphasis"><em>[dspace]</em></span>/bin/dsrun org.dspace.app.mets.METSExport --all
</pre><p>With any of the above forms, you can specify the base directory into which the items will be exported, using <code class="literal">--destination [directory]</code>. If this parameter is omitted, the current directory is used.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N137B4"></a>9.8.2. The AIP Format</h3></div></div></div><p>Each exported item is written to a separate directory, created under the base directory specified in the command-line arguments, or in the current directory if <code class="literal">--destination</code> is omitted. The name of each directory is the Handle, URL-encoded so that the directory name is 'legal'.</p><p>Within each item directory is a <code class="literal">mets.xml</code> file which contains the METS-encoded metadata for the item. Bitstreams in the item are also stored in the directory. Their filenames are their MD5 checksums, firstly for easy integrity checking, and also to avoid any problems with 'special characters' in the filenames that were legal on the original filing system they came from but are illegal in the server filing system. The <code class="literal">mets.xml</code> file includes XLink pointers to these bitstream files.</p><p>An example AIP might look like this:</p><div class="itemizedlist"><ul type="disc"><li><p>
<code class="literal">hdl%3A123456789%2F8/</code>
<div class="itemizedlist"><ul type="circle"><li><p><code class="literal">mets.xml</code> -- METS metadata</p></li><li><p><code class="literal">184BE84F293342</code> -- bitstream</p></li><li><p><code class="literal">3F9AD0389CB821</code></p></li><li><p><code class="literal">135FB82113C32D</code></p></li></ul></div>
</p></li></ul></div><p>The contents of the METS in the <code class="literal">mets.xml</code> file are as follows:</p><div class="itemizedlist"><ul type="disc"><li><p> A <code class="literal">dmdSec</code> (descriptive metadata section) containing the item's metadata in <a class="ulink" href="http://www.loc.gov/standards/mods/" target="_top">Metadata Object Description Schema (MODS)</a> XML. The Dublin Core descriptive metadata is mapped to MODS since there is no official qualified Dublin Core XML schema in existence as of yet, and the Library Application Profile of DC that DSpace uses includes some qualifiers that are not part of the <a class="ulink" href="http://dublincore.org/documents/dcmi-terms/" target="_top">DCMI Metadata Terms</a>.</p></li><li><p> An <code class="literal">amdSec</code> (administrative metadata section), which contains a rights metadata element, which in turn contains the base64-encoded deposit license (the license the submitter granted as part of the submission process).</p></li><li><p> A <code class="literal">fileSec</code> containing a list of the bitstreams in the item. Each bundle constitutes a <code class="literal">fileGrp</code>. Each bitstream is represented by a <code class="literal">file</code> element, which contains an <code class="literal">FLocat</code> element with a simple XLink to the bitstream in the same directory as the <code class="literal">mets.xml</code> file. The <code class="literal">file</code> attributes consist of most of the basic technical metadata for the bitstream. 
Additionally, bitstreams that are thumbnails or text extracted from another bitstream in the item (that is, 'derived' bitstreams) have the same <code class="literal">GROUPID</code> as the bitstream they were derived from, so that clients can tell there is a relationship.</p><p>The <code class="literal">OWNERID</code> of each <code class="literal">file</code> is the <a class="ulink" href="functional.html#bitstream_ids" target="_top">'persistent' bitstream identifier</a> assigned by the DSpace instance. The <code class="literal">ID</code> and <code class="literal">GROUPID</code> attributes consist of the item's Handle, together with the bitstream's sequence ID, with underscores used in place of dots and slashes. For example, a bitstream with sequence ID 24, in the item <code class="literal">hdl:123.456/789</code> will have the <code class="literal">ID</code> <code class="literal">123_456_789_24</code>. This is because <code class="literal">ID</code> and <code class="literal">GROUPID</code> attributes must be of type <code class="literal">xsd:ID</code>.</p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N13857"></a>9.8.3. Limitations</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li><p> No corresponding import tool yet</p></li><li><p> No <code class="literal">structmap</code> section</p></li><li><p> Some technical metadata not written, e.g. the primary bitstream in a bundle, original filenames or descriptions.</p></li><li><p> Only the MIME type is stored, not the (finer grained) bitstream format.</p></li><li><p> Dublin Core to MODS mapping is very simple, probably needs verification</p></li></ul></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1386F"></a>9.9. 
<a name="docbook-application.html-mediafilters"></a>MediaFilters: Transforming DSpace Content</h2></div></div></div><p>DSpace can apply filters to content/bitstreams, creating new content. Filters are included that extract text for <span class="bold"><strong>full-text searching</strong></span>, and create <span class="bold"><strong>thumbnails</strong></span> for items that contain images. The media filters are controlled by the <code class="literal">MediaFilterManager</code> which traverses the asset store, invoking the <code class="literal">MediaFilter</code> or <code class="literal">FormatFilter</code> classes on bitstreams. The media filter plugin configuration <code class="literal">filter.plugins</code> in <code class="literal">dspace.cfg</code> contains a list of all enabled media/format filter plugins (see <a class="ulink" href="configure.html#mediafilters" target="_top">Configuring Media Filters</a> for more information). The media filter system is intended to be run from the command line (or regularly as a cron task):</p><pre class="screen">
[dspace]/bin/filter-media
</pre><p>With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered.</p><p>
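As a hedged illustration of the cron usage mentioned above, a crontab entry for the DSpace account might look like the following. This is a sketch rather than a recommended configuration: the 03:00 daily schedule is an arbitrary assumption, and <code class="literal">[dspace]</code> stands for the installation directory and must be replaced with a real path:</p><pre class="screen">
# Assumed schedule: run the media filters once a day at 03:00
0 3 * * * [dspace]/bin/filter-media
</pre><p>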
<span class="bold"><strong>Available Command-Line Options:</strong></span>
</p><div class="itemizedlist"><ul type="disc"><li><p><span class="bold"><strong>Help</strong></span> : <code class="literal">[dspace]/bin/filter-media -h</code></p><div class="itemizedlist"><ul type="circle"><li><p> Display help message describing all command-line options.</p></li></ul></div></li><li><p><span class="bold"><strong>Force mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -f</code></p><div class="itemizedlist"><ul type="circle"><li><p> Apply filters to ALL bitstreams, even if they've already been filtered. If they've already been filtered, the previously filtered content is overwritten.</p></li></ul></div></li><li><p><span class="bold"><strong>Identifier mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -i 123456789/2</code></p><div class="itemizedlist"><ul type="circle"><li><p> Restrict processing to the community, collection, or item named by the identifier - by default, all bitstreams of all items in the repository are processed. The identifier must be a Handle, not a DB key. This option may be combined with any other option.</p></li></ul></div></li><li><p><span class="bold"><strong>Maximum mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -m 1000</code></p><div class="itemizedlist"><ul type="circle"><li><p> Suspend operation after the specified maximum number of items have been processed - by default, no limit exists. This option may be combined with any other option.</p></li></ul></div></li><li><p><span class="bold"><strong>No-Index mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -n</code></p><div class="itemizedlist"><ul type="circle"><li><p> Suppress index creation - by default, a new search index is created for full-text searching. 
This option suppresses index creation if you intend to run <code class="literal">index-all</code> elsewhere.</p></li></ul></div></li><li><p><span class="bold"><strong>Plugin mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -p "PDF Text Extractor","Word Text Extractor"</code></p><div class="itemizedlist"><ul type="circle"><li><p> Apply ONLY the filter plugin(s) listed (separated by commas). By default all named filters listed in the <code class="literal">filter.plugins</code> field of <code class="literal">dspace.cfg</code> are applied. This option may be combined with any other option. <span class="emphasis"><em>WARNING:</em></span> multiple plugin names must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').</p></li></ul></div></li><li><p><span class="bold"><strong>Skip mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -s 123456789/9,123456789/100</code></p><div class="itemizedlist"><ul type="circle"><li><p> SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB Keys). They may refer to items, collections or communities which should be skipped. This option may be combined with any other option. <span class="emphasis"><em>WARNING:</em></span> multiple identifiers must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').</p></li><li><p> NOTE: If you have a large number of identifiers to skip, you may maintain this comma-separated list within a separate file (e.g. <code class="literal">filter-skiplist.txt</code>), and call it similar to the following: </p><div class="itemizedlist"><ul type="square"><li><p>
<code class="literal">[dspace]/bin/filter-media -s `less filter-skiplist.txt`</code>
</p></li></ul></div></li></ul></div></li><li><p><span class="bold"><strong>Verbose mode</strong></span> : <code class="literal">[dspace]/bin/filter-media -v</code></p><div class="itemizedlist"><ul type="circle"><li><p> Verbose mode - print all extracted text and other filter details to STDOUT.</p></li></ul></div></li></ul></div><p>Adding your own filters is done by creating a class which <code class="literal">implements</code> the <code class="literal">org.dspace.app.mediafilter.FormatFilter</code> interface. See the <a class="ulink" href="configure.html#newfilter" target="_top">Creating a new Media Filter</a> topic and comments in the source file FormatFilter.java for more information. In theory filters could be implemented in any programming language (C, Perl, etc.). However, they need to be invoked by the Java code in the Media Filter class that you create.</p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1393B"></a>9.10. <a name="docbook-application.html-filiator"></a>Sub-Community Management</h2></div></div></div><p>DSpace provides an administrative tool - 'CommunityFiliator' - for managing community sub-structure. Normally this structure seldom changes, but prior to the 1.2 release sub-communities were not supported, so this tool could be used to place existing pre-1.2 communities into a hierarchy. It has two operations, either establishing a community to sub-community relationship, or dis-establishing an existing relationship.</p><p>The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be either a 'parent' community - meaning it has at least one sub-community, or a 'child' community - meaning it is a sub-community of another community, or both or neither. 
In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user-interface, since there is no parent community 'above' them. The first operation - establishing a parent/child relationship - can take place between any community and an orphan. The second operation - removing a parent/child relationship - will make the child an orphan.</p><p>Using the dsrun utility in the dspace/bin directory, the establish operation looks like this:</p><pre class="screen">
dsrun org.dspace.administer.CommunityFiliator --set --parent=parentID --child=childID
</pre><p>(or using the short form)</p><pre class="screen">
dsrun org.dspace.administer.CommunityFiliator -s -p parentID -c childID
</pre><p>where '-s' or '--set' means establish a relationship whereby the community identified by the '-p' parameter becomes the parent of the community identified by the '-c' parameter. Both the 'parentID' and 'childID' values may be handles or database IDs.</p><p>The reverse operation looks like this:</p><pre class="screen">
dsrun org.dspace.administer.CommunityFiliator --remove --parent=parentID --child=childID
</pre><p>(or using the short form)</p><pre class="screen">
dsrun org.dspace.administer.CommunityFiliator -r -p parentID -c childID
</pre><p>where '-r' or '--remove' means dis-establish the current relationship in which the community identified by 'parentID' is the parent of the community identified by 'childID'. The outcome will be that the 'childID' community will become an orphan, i.e. a top-level community.</p><p>If the required constraints of operation are violated, an error message will appear explaining the problem, and no change will be made. For example, in a removal operation where the stated child community does not have the stated parent community as its parent, the message is: "Error, child community not a child of parent community".</p><p>It is possible to effect arbitrary changes to the community hierarchy by chaining the basic operations together. For example, to move a child community from one parent to another, simply perform a 'remove' from its current parent (which will leave it an orphan), followed by a 'set' to its new parent.</p><p>It is important to understand that when any operation is performed, all the sub-structure of the child community follows it. Thus, if a child itself has children (sub-communities), or collections, they will all move with it to its new 'location' in the community tree.</p></div></div><HR><p class="copyright">Copyright © 2002-2008
<a class="ulink" href="http://www.dspace.org/" target="_top">The DSpace Foundation</a>
</p><div class="legalnotice"><a name="N10017"></a><p>
<a class="ulink" href="http://creativecommons.org/licenses/by/3.0/us/" target="_top">
<span class="inlinemediaobject"><img src="http://i.creativecommons.org/l/by/3.0/us/88x31.png"></span>
Licensed under a Creative Commons Attribution 3.0 United States License
</a>
</p></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ch08.html">Prev</a> </td><td align="center" width="20%"> </td><td align="right" width="40%"> <a accesskey="n" href="ch10.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">Chapter 8. DSpace System Documentation: Architecture </td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%"> Chapter 10. DSpace System Documentation: Business Logic Layer</td></tr></table></div></body></html>