Jeffrey Trimble 0bd461646d Final Revisions for 1.6.1
git-svn-id: http://scm.dspace.org/svn/repo/dspace/trunk@5002 9c30dcfa-912a-0410-8fc2-9e0234be79fd
2010-05-21 16:39:25 +00:00


<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter&nbsp;8.&nbsp;DSpace System Documentation: System Administration</title><meta content="DocBook XSL Stylesheets V1.75.2" name="generator"><link rel="home" href="index.html" title="DSpace Manual"><link rel="up" href="index.html" title="DSpace Manual"><link rel="prev" href="ch07.html" title="Chapter&nbsp;7.&nbsp;DSpace System Documentation: Manakin [XMLUI] Configuration and Customization"><link rel="next" href="ch09.html" title="Chapter&nbsp;9.&nbsp;DSpace System Documentation: Storage Layer"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF" marginwidth="5m"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">Chapter&nbsp;8.&nbsp;DSpace System Documentation: System Administration</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ch07.html">Prev</a>&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="ch09.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;8.&nbsp;DSpace System Documentation: System Administration"><div class="titlepage"><div><div><h2 class="title"><a name="N158AB"></a>Chapter&nbsp;8.&nbsp;<a name="docbook-sys_admin.html"></a>DSpace System Documentation: System Administration</h2></div></div><div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch08.html#N158EA">8.1. Community and Collection Structure Importer</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15945">8.1.1. Limitation</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N1594E">8.2. Package Importer and Exporter</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N1596E">8.2.1. Ingesting</a></span></dt><dt><span class="section"><a href="ch08.html#N159AB">8.2.2. 
Disseminating</a></span></dt><dt><span class="section"><a href="ch08.html#N159DE">8.2.3. METS packages</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N159FE">8.3. Item Importer and Exporter</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15A07">8.3.1. DSpace Simple Archive Format</a></span></dt><dt><span class="section"><a href="ch08.html#N15A57">8.3.2. Configuring <code class="literal">metadata-[prefix].xml</code> for Different Schema</a></span></dt><dt><span class="section"><a href="ch08.html#N15A8A">8.3.3. Importing Items</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15B5D">8.3.3.1. Adding Items to a Collection</a></span></dt><dt><span class="section"><a href="ch08.html#N15B93">8.3.3.2. Replacing Items in Collection</a></span></dt><dt><span class="section"><a href="ch08.html#N15BAA">8.3.3.3. Deleting or Unimporting Items in a Collection</a></span></dt><dt><span class="section"><a href="ch08.html#N15BC1">8.3.3.4. Other Options</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N15BEB">8.3.4. Exporting Items</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N15CA6">8.4. Transferring Items Between DSpace Instances</a></span></dt><dt><span class="section"><a href="ch08.html#N15CE1">8.5. Item Update</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15D06">8.5.1. DSpace simple Archive Format</a></span></dt><dt><span class="section"><a href="ch08.html#N15D19">8.5.2. ItemUpdate Commands</a></span></dt><dt><span class="section"><a href="ch08.html#N15DDE">8.5.3. CLI Examples</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N15DED">8.6. Registering (Not Importing) Bitstreams</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15DFE">8.6.1. Accessible Storage</a></span></dt><dt><span class="section"><a href="ch08.html#N15E1C">8.6.2. 
Registering Items Using the Item Importer</a></span></dt><dt><span class="section"><a href="ch08.html#N15EB0">8.6.3. Internal Identification and Retrieval of Registered Items</a></span></dt><dt><span class="section"><a href="ch08.html#N15EDC">8.6.4. Exporting Registered Items</a></span></dt><dt><span class="section"><a href="ch08.html#N15EE6">8.6.5. METS Export of Registered Items</a></span></dt><dt><span class="section"><a href="ch08.html#N15EF0">8.6.6. Deleting Registered Items</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N15F08">8.7. METS Tools</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N15F11">8.7.1. The Export Tool</a></span></dt><dt><span class="section"><a href="ch08.html#N15FA3">8.7.2. The AIP Format</a></span></dt><dt><span class="section"><a href="ch08.html#N16047">8.7.3. Limitations</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N1605F">8.8. MediaFilters: Transforming DSpace Content</a></span></dt><dt><span class="section"><a href="ch08.html#N16139">8.9. Sub-Community Management</a></span></dt><dt><span class="section"><a href="ch08.html#N161D0">8.10. Batch Metadata Editing</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N161EC">8.10.1. Export Function</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N16254">8.10.1.1. Exporting Process</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N16276">8.10.2. Import Function</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N162FA">8.10.2.1. Importing Process</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N1631C">8.10.3. The CSV Files</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N16370">8.11. Checksum Checker</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N16415">8.11.1. Checker Execution Mode</a></span></dt><dt><span class="section"><a href="ch08.html#N164C1">8.11.2. 
Checker Results Pruning</a></span></dt><dt><span class="section"><a href="ch08.html#N164E5">8.11.3. Checker Reporting</a></span></dt><dt><span class="section"><a href="ch08.html#N16502">8.11.4. Cron or Automatic Execution of Checksum Checker</a></span></dt><dt><span class="section"><a href="ch08.html#N16525">8.11.5. Automated Checksum Checkers' Results</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N165AC">8.12. Embargo</a></span></dt><dt><span class="section"><a href="ch08.html#N16636">8.13. Browse Index Creation</a></span></dt><dd><dl><dt><span class="section"><a href="ch08.html#N16735">8.13.1. Running the Indexing Programs</a></span></dt><dt><span class="section"><a href="ch08.html#N1675B">8.13.2. Indexing Customization</a></span></dt></dl></dd><dt><span class="section"><a href="ch08.html#N167B9">8.14. DSpace Log Converter</a></span></dt><dt><span class="section"><a href="ch08.html#N168A2">8.15. Client Statistics</a></span></dt><dt><span class="section"><a href="ch08.html#N1692C">8.16. Test Database</a></span></dt></dl></div><p>DSpace operates on several levels: as a Tomcat servlet, through cron jobs, and through on-demand operations. This section explains many of the on-demand operations. Some of these commands may also be set up as cron jobs. Many of these operations are performed at the Command Line Interface (CLI), also known as the Unix prompt ($:). The rest of this chapter uses the term CLI whenever a command must be run at the command line.</p><p>The "Command Help Table" below explains the information given in each of the command help tables in the sections that follow.</p><div class="table"><a name="N158B5"></a><p class="title"><b>Table&nbsp;8.1.&nbsp;Command Help Table</b></p><div class="table-contents"><table summary="Command Help Table" border="1" width="75%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em>The command, including the directory in which it is found.</em></span>
</td></tr><tr><td align="left">Java class:</td><td align="left">
<span class="emphasis"><em>The Java class that does the actual work.</em></span>
</td></tr><tr><td align="left">Arguments:</td><td align="left">
<span class="emphasis"><em>The required or optional arguments available to the user.</em></span>
</td></tr></tbody></table></div></div><br class="table-break"><p><span class="bold"><strong>DSpace Command Launcher</strong></span>. With DSpace Release 1.6, many commands and scripts have been replaced with a single <code class="literal">[dspace]/bin/dspace &lt;command&gt;</code> command. See the Application Layer chapter for details of the DSpace Command Launcher.</p><div class="section" title="8.1.&nbsp;Community and Collection Structure Importer"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N158EA"></a>8.1.&nbsp;<a name="docbook-sys_admin.html-structbuilder"></a>Community and Collection Structure Importer</h2></div></div><div></div></div><p>This CLI tool gives you the ability to import a community and collection structure directly from a source XML file.</p><div class="table"><a name="N158F3"></a><p class="title"><b>Table&nbsp;8.2.&nbsp;Structure Importer Command Table</b></p><div class="table-contents"><table summary="Structure Importer Command Table" border="1" width="75%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><code class="literal">[dspace]/bin/dspace structure-builder</code></td></tr><tr><td align="left">Java class:</td><td align="left"><code class="literal">org.dspace.administer.StructBuilder</code></td></tr><tr><td align="left">Argument: short and long (if available) forms:</td><td align="left">Description of the argument</td></tr><tr><td align="left"><code class="literal">-f</code></td><td align="left">Source XML file.</td></tr><tr><td align="left"><code class="literal">-o</code></td><td align="left">Output XML file.</td></tr><tr><td align="left"><code class="literal">-e</code></td><td align="left">E-mail address of the DSpace administrator.</td></tr></tbody></table></div></div><br class="table-break"><p>The administrator needs to build the source XML document in the following format:</p><pre class="screen">&lt;import_structure&gt;
  &lt;community&gt;
    &lt;name&gt;Community Name&lt;/name&gt;
    &lt;description&gt;Descriptive text&lt;/description&gt;
    &lt;intro&gt;Introductory text&lt;/intro&gt;
    &lt;copyright&gt;Special copyright notice&lt;/copyright&gt;
    &lt;sidebar&gt;Sidebar text&lt;/sidebar&gt;
    &lt;community&gt;
      &lt;name&gt;Sub Community Name&lt;/name&gt;
      &lt;community&gt; ...[ad infinitum]...
      &lt;/community&gt;
    &lt;/community&gt;
    &lt;collection&gt;
      &lt;name&gt;Collection Name&lt;/name&gt;
      &lt;description&gt;Descriptive text&lt;/description&gt;
      &lt;intro&gt;Introductory text&lt;/intro&gt;
      &lt;copyright&gt;Special copyright notice&lt;/copyright&gt;
      &lt;sidebar&gt;Sidebar text&lt;/sidebar&gt;
      &lt;license&gt;Special licence&lt;/license&gt;
      &lt;provenance&gt;Provenance information&lt;/provenance&gt;
    &lt;/collection&gt;
  &lt;/community&gt;
&lt;/import_structure&gt;
</pre><p>The resulting output document will be as follows:</p><pre class="screen">&lt;import_structure&gt;
&lt;community identifier="123456789/1"&gt;
&lt;name&gt;Community Name&lt;/name&gt;
&lt;description&gt;Descriptive text&lt;/description&gt;
&lt;intro&gt;Introductory text&lt;/intro&gt;
&lt;copyright&gt;Special copyright notice&lt;/copyright&gt;
&lt;sidebar&gt;Sidebar text&lt;/sidebar&gt;
&lt;community identifier="123456789/2"&gt;
&lt;name&gt;Sub Community Name&lt;/name&gt;
&lt;community identifier="123456789/3"&gt; ...[ad infinitum]...
&lt;/community&gt;
&lt;/community&gt;
&lt;collection identifier="123456789/4"&gt;
&lt;name&gt;Collection Name&lt;/name&gt;
&lt;description&gt;Descriptive text&lt;/description&gt;
&lt;intro&gt;Introductory text&lt;/intro&gt;
&lt;copyright&gt;Special copyright notice&lt;/copyright&gt;
&lt;sidebar&gt;Sidebar text&lt;/sidebar&gt;
&lt;license&gt;Special licence&lt;/license&gt;
&lt;provenance&gt;Provenance information&lt;/provenance&gt;
&lt;/collection&gt;
&lt;/community&gt;
&lt;/import_structure&gt;
</pre><p>The structure importer is executed as follows:</p><p><code class="literal">[dspace]/bin/dspace structure-builder -f /path/to/source.xml -o /path/to/output.xml -e admin@user.com</code></p><p>This examines the contents of the source XML file, imports the structure into DSpace while logged in as the supplied administrator, and then writes the same structure to the output file, with the handle of each imported community and collection added as an <code class="literal">identifier</code> attribute.</p><div class="section" title="8.1.1.&nbsp;Limitation"><div class="titlepage"><div><div><h3 class="title"><a name="N15945"></a>8.1.1.&nbsp;Limitation</h3></div></div><div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Currently this tool does not export community and collection structures, although adding that ability should require only a small modification.</p></li></ul></div></div></div><div class="section" title="8.2.&nbsp;Package Importer and Exporter"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1594E"></a>8.2.&nbsp;<a name="docbook-sys_admin.html-packager"></a>Package Importer and Exporter</h2></div></div><div></div></div><p>This command-line tool gives you access to the Packager plugins. It can <span class="emphasis"><em>ingest</em></span> a package to create a new DSpace Item, or <span class="emphasis"><em>disseminate</em></span> an Item as a package.</p><p>To see all the options, invoke it as:</p><p>
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/packager --help</code>
</p><p>This mode also lists the names of the available package ingesters and disseminators.</p><div class="section" title="8.2.1.&nbsp;Ingesting"><div class="titlepage"><div><div><h3 class="title"><a name="N1596E"></a>8.2.1.&nbsp;Ingesting</h3></div></div><div></div></div><p>To ingest a package from a file, give the command:</p><pre class="screen"><span class="emphasis"><em>[dspace]</em></span>/bin/packager -e <span class="emphasis"><em>user</em></span> -c <span class="emphasis"><em>handle</em></span> -t <span class="emphasis"><em>packager</em></span> <span class="emphasis"><em>path</em></span></pre><p>Where <span class="emphasis"><em><code class="literal">user</code></em></span> is the e-mail address of the E-Person under whose authority this runs; <span class="emphasis"><em><code class="literal">handle</code></em></span> is the Handle of the collection into which the Item is added; <span class="emphasis"><em><code class="literal">packager</code></em></span> is the plugin name of the package ingester to use; and <span class="emphasis"><em><code class="literal">path</code></em></span> is the path to the file to ingest (or <code class="literal">"-"</code> to read from the standard input).</p><p>Here is an example that loads a PDF file with internal metadata as a package:</p><p>
<code class="literal">/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf thesis.pdf</code>
</p><p>This example takes the result of retrieving a URL and ingests it:</p><pre class="screen">wget -O - http://alum.mit.edu/jarandom/my-thesis.pdf | \
/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf -</pre></div><div class="section" title="8.2.2.&nbsp;Disseminating"><div class="titlepage"><div><div><h3 class="title"><a name="N159AB"></a>8.2.2.&nbsp;Disseminating</h3></div></div><div></div></div><p>To disseminate an Item as a package, give the command:</p><pre class="screen"><span class="emphasis"><em>[dspace]</em></span>/bin/packager -e <span class="emphasis"><em>user</em></span> -d -i <span class="emphasis"><em>handle</em></span> -t <span class="emphasis"><em>packager</em></span> <span class="emphasis"><em>path</em></span></pre><p>Where <span class="emphasis"><em><code class="literal">user</code></em></span> is the e-mail address of the E-Person under whose authority this runs; <span class="emphasis"><em><code class="literal">handle</code></em></span> is the Handle of the Item to disseminate; <span class="emphasis"><em><code class="literal">packager</code></em></span> is the plugin name of the package disseminator to use; and <span class="emphasis"><em><code class="literal">path</code></em></span> is the path to the file to create (or <code class="literal">"-"</code> to write to the standard output). This example writes an Item out as a METS package in the file "454.zip":</p><p>
<code class="literal">/dspace/bin/packager -e florey@mit.edu -d -i 1721.2/454 -t METS 454.zip</code>
</p></div><div class="section" title="8.2.3.&nbsp;METS packages"><div class="titlepage"><div><div><h3 class="title"><a name="N159DE"></a>8.2.3.&nbsp;METS packages</h3></div></div><div></div></div><p>Since the DSpace 1.4 release, the software has included a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as <a class="ulink" href="http://www.loc.gov/standards/mets/" target="_top">METS</a>, <a class="ulink" href="http://www.loc.gov/standards/mods/" target="_top">MODS</a>, and <a class="ulink" href="http://www.loc.gov/standards/premis/" target="_top">PREMIS</a>. The plugin name is <code class="literal">METS</code> by default, and it uses MODS for descriptive metadata.</p><p>The DSpace METS SIP profile is available at: <a class="ulink" href="http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf" target="_top">
<span class="underline">http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf</span></a>.</p></div></div><div class="section" title="8.3.&nbsp;Item Importer and Exporter"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N159FE"></a>8.3.&nbsp;<a name="docbook-sys_admin.html-itemimporter"></a>Item Importer and Exporter</h2></div></div><div></div></div><p>DSpace has a set of command-line tools for importing and exporting items in batches, using the DSpace simple archive format. The tools are not terribly robust, but they are useful and easily modified. They also give a good demonstration of how to implement your own item importer if desired.</p><div class="section" title="8.3.1.&nbsp;DSpace Simple Archive Format"><div class="titlepage"><div><div><h3 class="title"><a name="N15A07"></a>8.3.1.&nbsp;<a name="docbook-sys_admin.html-dsaf"></a>DSpace Simple Archive Format</h3></div></div><div></div></div><p>The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with one subdirectory per item. Each item directory contains a file for the item's descriptive metadata and the files that make up the item.</p><pre class="screen">
archive_directory/
    item_000/
        dublin_core.xml        -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml  -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
        contents               -- text file containing one line per filename
        file_1.doc             -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
    ...
</pre><p>The <code class="literal">dublin_core.xml</code> or <code class="literal">metadata_[prefix].xml</code> file has the following format, where each metadata element has its own entry within a <code class="literal">&lt;dcvalue&gt;</code> tag. There are currently three attributes available on the <code class="literal">&lt;dcvalue&gt;</code> tag:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">element</code> - the Dublin Core element</p></li><li class="listitem"><p><code class="literal">qualifier</code> - the element's qualifier</p></li><li class="listitem"><p><code class="literal">language</code> - (optional) ISO language code for the element</p></li></ul></div><pre class="screen">
&lt;dublin_core&gt;
&lt;dcvalue element="title" qualifier="none"&gt;A Tale of Two Cities&lt;/dcvalue&gt;
&lt;dcvalue element="date" qualifier="issued"&gt;1990&lt;/dcvalue&gt;
&lt;dcvalue element="title" qualifier="alternate" language="fr"&gt;J'aime les Printemps&lt;/dcvalue&gt;
&lt;/dublin_core&gt;
</pre><p>(Note the optional <code class="literal">language</code> attribute, which tells the system that the alternate title is in French.)</p><p>Every metadata field used must first be registered in the metadata registry of the DSpace instance.</p><p>The <code class="literal">contents</code> file simply lists the bitstream file names, one per line. For example:</p><pre class="screen">
file_1.doc
file_2.pdf
license
</pre><p>Please note that the <span class="emphasis"><em>license</em></span> file is optional. If you wish to include one, place the file in the item's directory (for example, .../item_001/).</p><p>The bitstream name may optionally be followed by the sequence:</p><p>
<code class="literal">\tbundle:bundlename</code>
</p><p>where '\t' is the tab character and 'bundlename' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.</p></div><div class="section" title="8.3.2.&nbsp;Configuring metadata-[prefix].xml for Different Schema"><div class="titlepage"><div><div><h3 class="title"><a name="N15A57"></a>8.3.2.&nbsp;<a name="docbook-sys_admin.html-dsafvariations"></a>Configuring <code class="literal">metadata-[prefix].xml</code> for Different Schema</h3></div></div><div></div></div><p>It is possible to use other schemas, such as EAD, VRA Core, etc. Make sure you have first defined the new schema in the DSpace Metadata Schema Registry. <div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Create a separate file for the other schema named "<code class="literal">metadata_{prefix}.xml</code>", where <code class="literal">{prefix}</code> is replaced with the schema's prefix.</p></li><li class="listitem"><p>Inside the XML file use the same Dublin Core <span class="emphasis"><em>syntax</em></span>, but on the <code class="literal">&lt;dublin_core&gt;</code> element include the attribute <code class="literal">schema="{prefix}"</code>.</p></li><li class="listitem"><p>Here is an example for ETD metadata, which would be in the file "<code class="literal">metadata_etd.xml</code>":</p><pre class="screen">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;dublin_core schema="etd"&gt;
&lt;dcvalue element="degree" qualifier="department"&gt;Computer Science&lt;/dcvalue&gt;
&lt;dcvalue element="degree" qualifier="level"&gt;Masters&lt;/dcvalue&gt;
&lt;dcvalue element="degree" qualifier="grantor"&gt;Texas A &amp; M&lt;/dcvalue&gt;
</pre></li></ol></div></p></div><div class="section" title="8.3.3.&nbsp;Importing Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15A8A"></a>8.3.3.&nbsp;<a name="docbook-sys_admin.html-importingitems"></a>Importing Items</h3></div></div><div></div></div><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td valign="top" align="center" rowspan="2" width="25"><img alt="[Note]" src="/jspui/doc/image/note.png"></td><th align="left"></th></tr><tr><td valign="top" align="left"><p>Before running the item importer over items previously exported from a DSpace instance, please first refer to <a class="link" href="ch08.html#docbook-sys_admin.html-transferitem">Transferring Items Between DSpace Instances</a>.</p></td></tr></table></div><div class="table"><a name="N15A98"></a><p class="title"><b>Table&nbsp;8.3.&nbsp;Import Items Command Table</b></p><div class="table-contents"><table summary="Import Items Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace import</code>
</td></tr><tr><td align="left">Java class:</td><td align="left">
<code class="literal">org.dspace.app.itemimport.ItemImport</code>
</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--add</code></td><td align="left">Add items to DSpace &Dagger;</td></tr><tr><td align="left"><code class="literal">-r</code> or <code class="literal">--replace</code></td><td align="left">Replace items listed in mapfile &Dagger;</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--delete</code></td><td align="left">Delete items listed in mapfile &Dagger;</td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--source</code></td><td align="left">Source of the items (directory)</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--collection</code></td><td align="left">Destination collection, given by its Handle or database ID</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--mapfile</code></td><td align="left">Location of the mapfile for the items (directory and file name)</td></tr><tr><td align="left"><code class="literal">-e</code> or <code class="literal">--eperson</code></td><td align="left">E-mail address of the EPerson doing the importing</td></tr><tr><td align="left"><code class="literal">-w</code> or <code class="literal">--workflow</code></td><td align="left">Send submissions through the collection's workflow</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--notify</code></td><td align="left">Send e-mail notification that the item(s) have been imported</td></tr><tr><td align="left"><code class="literal">-t</code> or <code class="literal">--test</code></td><td align="left">Test run: do not actually import items</td></tr><tr><td align="left"><code class="literal">-p</code> or <code class="literal">--template</code></td><td align="left">Apply the collection template</td></tr><tr><td align="left"><code 
class="literal">-R</code> or <code class="literal">--resume</code></td><td align="left">Resume a failed import (used with add only)</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Command help</td></tr></tbody></table></div></div><br class="table-break"><p>&Dagger; These are mutually exclusive.</p><p>The item importer can batch import any number of items into a particular collection using a single CLI command with the arguments listed above.</p><div class="section" title="8.3.3.1.&nbsp;Adding Items to a Collection"><div class="titlepage"><div><div><h4 class="title"><a name="N15B5D"></a>8.3.3.1.&nbsp;Adding Items to a Collection</h4></div></div><div></div></div><p>To add items to a collection, gather the following information:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>EPerson (the e-mail address of the person doing the import)</p></li><li class="listitem"><p>Collection ID: either a Handle (e.g. 123456789/14) or a database ID (e.g. 2)</p></li><li class="listitem"><p>Source directory where the items reside</p></li><li class="listitem"><p>Mapfile. Since you don't have one yet, decide where it will be written (e.g. /Import/Col_14/mapfile)</p></li></ul></div><p>At the command line:</p><p>
<code class="literal">[dspace]/bin/dspace import --add --eperson=joe@user.com --collection=CollectionID --source=items_dir --mapfile=mapfile</code>
</p><p>or by using the short form:</p><p>
<code class="literal">[dspace]/bin/dspace import -a -e joe@user.com -c CollectionID -s items_dir -m mapfile</code>
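</p><p>Preparing the source directory by hand is error-prone for large batches, so it can help to generate the DSpace simple archive with a small script before running the command above. The following is a minimal Python sketch, not part of DSpace: the functions <code class="literal">write_metadata_file</code> and <code class="literal">write_item</code> are hypothetical, and the sketch assumes only the item directory layout, the <code class="literal">dcvalue</code> attributes, and the <code class="literal">contents</code> line format described in the DSpace Simple Archive Format section. Each call to <code class="literal">write_item</code> creates one <code class="literal">item_NNN</code> directory.</p>

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_metadata_file(item_dir, values, schema="dc"):
    """Write dublin_core.xml (dc schema) or metadata_SCHEMA.xml (other schemas).

    values: list of (element, qualifier, language, text) tuples; qualifier and
    language may be None. Hypothetical helper following the dcvalue format.
    """
    root = ET.Element("dublin_core")
    if schema != "dc":
        # Files for other schemas carry the prefix on the root element.
        root.set("schema", schema)
    for element, qualifier, language, text in values:
        value = ET.SubElement(root, "dcvalue", element=element,
                              qualifier=qualifier or "none")
        if language:
            value.set("language", language)
        value.text = text  # ElementTree escapes special characters on write
    name = "dublin_core.xml" if schema == "dc" else "metadata_%s.xml" % schema
    path = os.path.join(item_dir, name)
    ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)
    return path


def write_item(archive_dir, index, dc_values, files, other_schemas=None):
    """Create archive_dir/item_NNN with metadata files, bitstreams and contents.

    files: list of (file_name, data_bytes, bundle) tuples; bundle may be None,
    in which case the importer uses the ORIGINAL bundle.
    other_schemas: optional dict mapping a schema prefix to a values list.
    """
    item_dir = os.path.join(archive_dir, "item_%03d" % index)
    os.makedirs(item_dir, exist_ok=True)
    write_metadata_file(item_dir, dc_values)
    for schema, values in (other_schemas or {}).items():
        write_metadata_file(item_dir, values, schema)
    lines = []
    for file_name, data, bundle in files:
        with open(os.path.join(item_dir, file_name), "wb") as fh:
            fh.write(data)
        # One line per bitstream, with an optional TAB + "bundle:NAME" suffix.
        lines.append(file_name + ("\tbundle:" + bundle if bundle else ""))
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return item_dir


if __name__ == "__main__":
    archive = tempfile.mkdtemp(prefix="archive_directory_")
    item = write_item(
        archive, 0,
        dc_values=[("title", None, None, "A Tale of Two Cities"),
                   ("date", "issued", None, "1990"),
                   ("title", "alternate", "fr", "J'aime les Printemps")],
        files=[("file_1.doc", b"placeholder", None),
               ("file_2.pdf", b"placeholder", "ORIGINAL")],
        other_schemas={"etd": [("degree", "level", None, "Masters")]},
    )
    # prints ['contents', 'dublin_core.xml', 'file_1.doc', 'file_2.pdf', 'metadata_etd.xml']
    print(sorted(os.listdir(item)))
```

<p>The generated archive directory is what you would pass to the importer as <code class="literal">--source</code>. Building the XML with a library rather than by string concatenation means characters such as the ampersand in "Texas A &amp; M" are escaped automatically, and a <code class="literal">--test</code> run remains a cheap way to check the generated files before the real import.</p><p>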
</p><p>The above command cycles through the archive directory's items, imports them, and then generates a map file that records the mapping of item directories to item handles. <span class="bold"><strong>SAVE THIS MAP FILE.</strong></span> You will need it later to replace or delete (unimport) the imported items.</p><p><span class="bold"><strong>Testing.</strong></span> You can add <code class="literal">--test</code> (or <code class="literal">-t</code>) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.</p></div><div class="section" title="8.3.3.2.&nbsp;Replacing Items in Collection"><div class="titlepage"><div><div><h4 class="title"><a name="N15B93"></a>8.3.3.2.&nbsp;Replacing Items in Collection</h4></div></div><div></div></div><p>Replacing existing items is relatively easy. Remember that mapfile you were <span class="emphasis"><em>supposed</em></span> to save? Now you will use it. The command (in short form):</p><p>
<code class="literal">[dspace]/bin/dspace import -r -e joe@user.com -c collectionID -s items_dir -m mapfile</code>
</p><p>Long form:</p><p>
<code class="literal">[dspace]/bin/dspace import --replace --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile</code>
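</p><p>Because both replacing and deleting depend on the mapfile, it can be worth inspecting it before running either command. The Python sketch below is an illustration, not a DSpace tool: <code class="literal">read_mapfile</code> and <code class="literal">missing_item_dirs</code> are hypothetical names. It assumes that each mapfile line holds an item directory name and its handle separated by whitespace (for example <code class="literal">item_000 123456789/5</code>), and that a replace looks for each mapped item directory inside the <code class="literal">--source</code> directory. Both are assumptions to verify against your own mapfile and DSpace version. The sketch reports mapped item directories that are missing from the new source directory.</p>

```python
import os
import tempfile


def read_mapfile(path):
    """Return {item_directory: handle} from an importer mapfile.

    Assumes one "item_directory handle" pair per line, separated by
    whitespace; blank lines are skipped.
    """
    mapping = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError("%s line %d: expected 2 fields, got %r"
                                 % (path, lineno, line.rstrip("\n")))
            item_dir, handle = fields
            mapping[item_dir] = handle
    return mapping


def missing_item_dirs(mapping, source_dir):
    """Item directories listed in the mapfile but absent from source_dir."""
    return sorted(name for name in mapping
                  if not os.path.isdir(os.path.join(source_dir, name)))


if __name__ == "__main__":
    # Build a throwaway source directory and mapfile to demonstrate.
    source = tempfile.mkdtemp(prefix="items_dir_")
    os.mkdir(os.path.join(source, "item_000"))
    mapfile = os.path.join(source, "mapfile")
    with open(mapfile, "w", encoding="utf-8") as fh:
        fh.write("item_000 123456789/5\nitem_001 123456789/6\n")
    mapping = read_mapfile(mapfile)
    print(mapping)                             # {'item_000': '123456789/5', 'item_001': '123456789/6'}
    print(missing_item_dirs(mapping, source))  # ['item_001']
```

<p>An empty list from <code class="literal">missing_item_dirs</code> only means every mapped directory is present; the replacement itself is still carried out by the importer command above.</p><p>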
</p></div><div class="section" title="8.3.3.3.&nbsp;Deleting or Unimporting Items in a Collection"><div class="titlepage"><div><div><h4 class="title"><a name="N15BAA"></a>8.3.3.3.&nbsp;Deleting or Unimporting Items in a Collection</h4></div></div><div></div></div><p>You are able to unimport or delete items provided you have the mapfile. Remember that mapfile you were <span class="emphasis"><em>supposed</em></span> to save? The command is (in short form):</p><p>
<code class="literal">[dspace]/bin/dspace import -d -m mapfile</code>
</p><p>In long form:</p><p>
<code class="literal">[dspace]/bin/dspace import --delete --mapfile mapfile</code>
</p></div><div class="section" title="8.3.3.4.&nbsp;Other Options"><div class="titlepage"><div><div><h4 class="title"><a name="N15BC1"></a>8.3.3.4.&nbsp;Other Options</h4></div></div><div></div></div><p><span class="bold"><strong>Workflow</strong></span>. By default, the importer bypasses any workflow assigned to a collection. Adding the <code class="literal">--workflow</code> (<code class="literal">-w</code>) argument routes the imported items through the workflow system.</p><p><span class="bold"><strong>Templates</strong></span>. If the collection has a template containing constant data that you wish to apply during batch importing, add the <code class="literal">--template</code> (<code class="literal">-p</code>) argument.</p><p><span class="bold"><strong>Resume</strong></span>. If an error aborts the import, fix the error and then rerun the command with the <code class="literal">--resume</code> (<code class="literal">-R</code>) flag to try to resume the import where it left off.</p></div><div class="section" title="8.3.4.&nbsp;Exporting Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15BEB"></a>8.3.4.&nbsp;<a name="docbook-sys_admin.html-exportingitems"></a>Exporting Items</h3></div></div><div></div></div><p>The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.</p><div class="table"><a name="N15BF3"></a><p class="title"><b>Table&nbsp;8.4.&nbsp;Export Items Command Table</b></p><div class="table-contents"><table summary="Export Items Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace export</code>
</td></tr><tr><td align="left">Java class:</td><td align="left">
<code class="literal">org.dspace.app.itemexport.ItemExport</code>
</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-t</code> or <code class="literal">--type</code></td><td align="left">Type of export. <code class="literal">COLLECTION</code> exports every item in the collection; <code class="literal">ITEM</code> exports only the specified item. (Type the keyword in capital letters; see the examples below.)</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--id</code></td><td align="left">The ID or Handle of the Collection or Item to export.</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--dest</code></td><td align="left">The destination directory in which the exported items will be placed. Include the full path if necessary.</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--number</code></td><td align="left">Sequence number with which to begin numbering the exported items. The number you give becomes the name of the first directory created for your export. The export uses the same directory layout as an import.</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--migrate</code></td><td align="left">Export the item/collection for migration. 
This removes the handle and the metadata that will be re-created in the new DSpace instance.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Brief help.</td></tr></tbody></table></div></div><br class="table-break"><p><span class="bold"><strong>Exporting a Collection</strong></span></p><p>To export a collection's items, type at the CLI:</p><p><code class="literal">[dspace]/bin/dspace export --type=COLLECTION --id=collID --dest=dest_dir --number=seq_num</code></p><p>Short form:</p><p><code class="literal">[dspace]/bin/dspace export -t COLLECTION -i CollID or Handle -d /path/to/destination -n some_number</code></p><p>The keyword <code class="literal">COLLECTION</code> means that you intend to export an entire collection. The ID can be either the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply.</p><p><span class="bold"><strong>Exporting a Single Item</strong></span></p><p>To export a single item, use the keyword <code class="literal">ITEM</code> and give the item ID or handle as an argument:</p><p><code class="literal">[dspace]/bin/dspace export --type=ITEM --id=itemID --dest=dest_dir --number=seq_num</code></p><p>Short form:</p><p><code class="literal">[dspace]/bin/dspace export -t ITEM -i itemID or Handle -d /path/to/destination -n some_number</code></p><p>Each exported item will have an additional file in its directory, named <code class="literal">handle</code>. It contains the handle that was assigned to the item; the importer reads this file so that items exported and then imported into another machine retain their original handles.</p><p><span class="bold"><strong>The <code class="literal">-m</code> Argument</strong></span></p><p>Using the <code class="literal">-m</code> argument exports the item/collection and also performs the migration step. 
It performs the same process that is described in the next section, <a class="link" href="ch08.html#docbook-sys_admin.html-transferitem">Transferring Items Between DSpace Instances</a>. We recommend reading that section before using this flag. </p></div></div><div class="section" title="8.4.&nbsp;Transferring Items Between DSpace Instances"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N15CA6"></a>8.4.&nbsp;<a name="docbook-sys_admin.html-transferitem"></a>Transferring Items Between DSpace Instances</h2></div><div><h3 class="subtitle">Migration of Data</h3></div></div><div></div></div><p>Where items are to be moved between DSpace instances (for example, from a test DSpace into a production DSpace), the item exporter and item importer can be used together with a script that assists in the process.</p><p>After the item exporter runs, each <code class="literal">dublin_core.xml</code> file will contain metadata that was automatically added by DSpace. These fields are as follows:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>date.accessioned</p></li><li class="listitem"><p>date.available</p></li><li class="listitem"><p>date.issued</p></li><li class="listitem"><p>description.provenance</p></li><li class="listitem"><p>format.extent</p></li><li class="listitem"><p>format.mimetype</p></li><li class="listitem"><p>identifier.uri</p></li></ul></div><p>To avoid duplicating this metadata, run</p><p><code class="literal">dspace_migrate &lt;/path/to/exported item directory&gt;</code></p><p>before running the item importer. This removes the metadata fields listed above from the <code class="literal">dublin_core.xml</code> file, with two exceptions: <code class="literal">date.issued</code> is kept if the item has been published or publicly distributed before, and <code class="literal">identifier.uri</code> is kept if it is not the handle. It also removes all <code class="literal">handle</code> files. 
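</p><p>The clean-up described above can be approximated in Python as follows. This is a hypothetical sketch, not the <code class="literal">dspace_migrate</code> script itself: it edits each item's <code class="literal">dublin_core.xml</code> with the standard <code class="literal">xml.etree.ElementTree</code> module, treats <code class="literal">identifier.uri</code> as "the handle" when its value ends with the handle recorded in the item's <code class="literal">handle</code> file (an assumption), and leaves the <code class="literal">date.issued</code> decision to the caller through a hypothetical <code class="literal">keep_date_issued</code> parameter.</p>

```python
"""Sketch of a dspace_migrate-style clean-up for an exported simple archive.

Hypothetical Python approximation; it is not the dspace_migrate script itself.
"""
import os
import xml.etree.ElementTree as ET

# (element, qualifier) pairs that DSpace adds automatically, as listed in this section.
AUTO_FIELDS = {
    ("date", "accessioned"),
    ("date", "available"),
    ("date", "issued"),
    ("description", "provenance"),
    ("format", "extent"),
    ("format", "mimetype"),
    ("identifier", "uri"),
}


def clean_item(item_dir, keep_date_issued=False):
    """Strip auto-added fields from one item directory; return how many were removed."""
    handle = None
    handle_path = os.path.join(item_dir, "handle")
    if os.path.exists(handle_path):
        with open(handle_path, encoding="utf-8") as f:
            handle = f.read().strip()
        os.remove(handle_path)  # the target instance will assign a new handle

    dc_path = os.path.join(item_dir, "dublin_core.xml")
    tree = ET.parse(dc_path)
    root = tree.getroot()
    removed = 0
    for dcvalue in list(root.findall("dcvalue")):  # copy: we remove while looping
        key = (dcvalue.get("element"), dcvalue.get("qualifier"))
        if key not in AUTO_FIELDS:
            continue
        if key == ("date", "issued") and keep_date_issued:
            continue  # previously published item: keep its issue date
        if key == ("identifier", "uri"):
            value = (dcvalue.text or "").strip()
            # Assumption: the URI "is the handle" if it ends with the handle string.
            # Without a handle file we cannot tell, so the value is kept.
            if handle is None or not value.endswith(handle):
                continue
        root.remove(dcvalue)
        removed += 1
    tree.write(dc_path, encoding="utf-8", xml_declaration=True)
    return removed


def clean_archive(archive_dir, keep_date_issued=False):
    """Apply clean_item to every item subdirectory that has a dublin_core.xml."""
    for name in sorted(os.listdir(archive_dir)):
        item_dir = os.path.join(archive_dir, name)
        if os.path.isfile(os.path.join(item_dir, "dublin_core.xml")):
            removed = clean_item(item_dir, keep_date_issued)
            print(f"{name}: removed {removed} field(s)")
```

<p>For example, <code class="literal">clean_archive("/path/to/exported/archive")</code> processes every item directory in an export; pass <code class="literal">keep_date_issued=True</code> for items that were published or publicly distributed before.</p><p>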
It will then be safe to run the item importer.</p></div><div class="section" title="8.5.&nbsp;Item Update"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N15CE1"></a>8.5.&nbsp;<a name="docbook-sys_admin.html-itemupdate"></a>Item Update</h2></div></div><div></div></div><p>ItemUpdate is a batch-mode command-line tool for altering the metadata and bitstream content of existing items in a DSpace instance. It is a companion tool to ItemImport and uses the DSpace simple archive format to specify changes in metadata and bitstream contents. Those familiar with generating the source trees for ItemImport will find that this batch processing tool works in a similar way.</p><p>For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadata elements. For bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run.</p><p>ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as with ItemImport. However, unlike ItemImport, there is no resume feature for incomplete processing. Logging is more extensive, ending with a summary statement that counts the items processed successfully and unsuccessfully.</p><p>One likely scenario for using this tool is an external primary data source for which the DSpace instance is a secondary or downstream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format and then applied by ItemUpdate to synchronize the two systems.</p><p>A note on terminology: <span class="bold"><strong>item</strong></span> refers to a DSpace item. 
<span class="bold"><strong>metadata element</strong></span> refers generally to a qualified or unqualified element in a schema, in the form <code class="literal">[schema].[element].[qualifier]</code> or <code class="literal">[schema].[element]</code>, and occasionally, more specifically, to the second part of that form. <span class="bold"><strong>metadata field</strong></span> refers to a specific instance pairing a metadata element with a value.</p><div class="section" title="8.5.1.&nbsp;DSpace simple Archive Format"><div class="titlepage"><div><div><h3 class="title"><a name="N15D06"></a>8.5.1.&nbsp;DSpace simple Archive Format</h3></div></div><div></div></div><p>As with ItemImport, the idea behind the DSpace simple archive format is to create an archive directory with a subdirectory per item. A few features have been added to this format specifically for ItemUpdate. Note that in the simple archive format, the item directory names are merely local references, used by ItemUpdate only in its log output.</p><p>For the basic format, see the previous section <a class="link" href="ch08.html#docbook-sys_admin.html-dsaf">DSpace Simple Archive Format.</a></p><p>Additionally, a <span class="bold"><strong>delete_contents</strong></span> file is now supported. This file lists the bitstreams to be deleted, one bitstream ID per line. Currently, bitstream IDs are the only identifiers accepted for this function. This file is an addition to the archive format specifically for ItemUpdate.</p><p>The optional <span class="bold"><strong>suppress_undo</strong></span> file is a flag indicating that the 'undo archive' should not be written to disk. This file is usually written by the application into an undo archive to prevent a recursive undo. 
This file is an addition to the Archive format specifically for ItemUpdate.</p></div><div class="section" title="8.5.2.&nbsp;ItemUpdate Commands"><div class="titlepage"><div><div><h3 class="title"><a name="N15D19"></a>8.5.2.&nbsp;ItemUpdate Commands</h3></div></div><div></div></div><div class="table"><a name="N15D1D"></a><p class="title"><b>Table&nbsp;8.5.&nbsp;ItemUpdate Command Table</b></p><div class="table-contents"><table summary="ItemUpdate Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em>
<code class="literal">[dspace]</code>
</em></span>
<code class="literal">/bin/dspace itemupdate</code>
</td></tr><tr><td align="left">Java class:</td><td align="left">
<code class="literal">org.dspace.app.itemupdate.ItemUpdate</code>
</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--addmetadata [metadata element]</code></td><td align="left">Repeatable for multiple elements. The metadata element should be in the form dc.x or dc.x.y. The mandatory argument names the metadata fields in the dublin_core.xml file that are to be added unless already present. Fields that are already present are not duplicated in the item metadata, and no warning or error is raised for them.</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--deletemetadata [metadata element]</code></td><td align="left">Repeatable for multiple elements. All metadata fields matching the element will be deleted.</td></tr><tr><td align="left"><code class="literal">-A</code> or <code class="literal">--addbitstream</code></td><td align="left">Adds the bitstreams listed in the contents file, with the bitstream metadata cited there.</td></tr><tr><td align="left"><code class="literal">-D</code> or <code class="literal">--deletebitstream [filter plugin classname or alias]</code></td><td align="left">Not repeatable. With no argument, this operation deletes the bitstreams listed in the <code class="literal">delete_contents</code> file. Only bitstream IDs are recognized as identifiers for this operation. The optional filter argument is either the classname of an implementation of the <code class="literal">org.dspace.app.itemupdate.BitstreamFilter</code> class, used to identify files for deletion, or one of the aliases (ORIGINAL, ORIGINAL_AND_DERIVATIVES, TEXT, THUMBNAIL), which reference existing filters based on membership in a bundle of that name. In this case, the <code class="literal">delete_contents</code> file is not required for any item. The filter properties file contains properties pertinent to the particular filter used. 
Multiple filters are not allowed.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Displays brief command line help.</td></tr><tr><td align="left"><code class="literal">-e</code> or <code class="literal">--eperson</code></td><td align="left">Email address or database ID of the EPerson performing the update <span class="bold"><strong>(Required)</strong></span></td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--source</code></td><td align="left">Archive directory to process <span class="bold"><strong>(Required)</strong></span></td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--itemidentifier</code></td><td align="left">Specifies an alternate metadata field (not a handle) that holds the identifier used to match each DSpace item with its counterpart in the archive. If omitted, the item handle is expected to be located in the <code class="literal">dc.identifier.uri</code> field. (Optional)</td></tr><tr><td align="left"><code class="literal">-t</code> or <code class="literal">--test</code></td><td align="left">Runs the process in test mode, with logging but with no changes applied to the DSpace instance. (Optional)</td></tr><tr><td align="left"><code class="literal">-P</code> or <code class="literal">--alterprovenance</code></td><td align="left">Prevents any changes to the provenance field to represent changes in the bitstream content resulting from an Add or Delete. No provenance statements are written for thumbnails or text derivative bitstreams, in keeping with the practice of MediaFilterManager. 
(Optional)</td></tr><tr><td align="left"><code class="literal">-F</code> or <code class="literal">--filterproperties</code></td><td align="left">The filter properties file to be used by the delete bitstreams action. (Optional)</td></tr></tbody></table></div></div><br class="table-break"></div><div class="section" title="8.5.3.&nbsp;CLI Examples"><div class="titlepage"><div><div><h3 class="title"><a name="N15DDE"></a>8.5.3.&nbsp;CLI Examples</h3></div></div><div></div></div><p><span class="bold"><strong>Adding Metadata</strong></span>:</p><p><code class="literal">[dspace]/bin/dspace itemupdate -e joe@user.com -s [path/to/archive] -a dc.description</code></p><p><span class="emphasis"><em>This adds the dc.description fields from your archive to the matching items, which are identified by the handle in dc.identifier.uri (because the -i argument was not used).</em></span></p></div></div><div class="section" title="8.6.&nbsp;Registering (Not Importing) Bitstreams"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N15DED"></a>8.6.&nbsp;<a name="docbook-sys_admin.html-registration"></a>Registering (Not Importing) Bitstreams</h2></div></div><div></div></div><p>Registration is an alternative means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of bitstreams that already reside in storage accessible to DSpace. For example, there might be an existing repository of digital assets. Rather than using the normal <a class="link" href="ch02.html#docbook-functional.html-ingest">interactive ingest process</a> or the <a class="link" href="ch02.html#docbook-functional.html-importexport">batch import</a> to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. 
DSpace uses a variation of the import tool to accomplish registration.</p><div class="section" title="8.6.1.&nbsp;Accessible Storage"><div class="titlepage"><div><div><h3 class="title"><a name="N15DFE"></a>8.6.1.&nbsp;Accessible Storage</h3></div></div><div></div></div><p>To register an item, its bitstreams must reside on storage that is accessible to DSpace and is therefore referenced by an asset store number in <code class="literal">dspace.cfg</code>. The configuration file <code class="literal">dspace.cfg</code> establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in the DSpace host's file system or a set of SRB account parameters. This asset store number is described in <a class="link" href="ch05.html#docbook-configure.html-dspace-cfg">The <code class="literal">dspace.cfg</code> Configuration Properties File</a> section and in the <code class="literal">dspace.cfg</code> file itself. The asset store number(s) used for registered items should generally not be the value of the <code class="literal">assetstore.incoming</code> property, since it is unlikely that you will want to mix the bitstreams of registered items with those of normally ingested and imported items.</p></div><div class="section" title="8.6.2.&nbsp;Registering Items Using the Item Importer"><div class="titlepage"><div><div><h3 class="title"><a name="N15E1C"></a>8.6.2.&nbsp;Registering Items Using the Item Importer</h3></div></div><div></div></div><p>DSpace uses the same import tool that is used for batch import, except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool.</p><p>The archive format for registration does not include the actual content files (bitstreams) being registered. The format is, however, still a directory of items to be registered, with one subdirectory per item. 
Each item directory contains a file for the item's descriptive metadata (<code class="literal">dublin_core.xml</code>) and a file listing the item's content files (<code class="literal">contents</code>), but not the actual content files themselves.</p><p>The <code class="literal">dublin_core.xml</code> file for item registration is exactly the same as for regular item import.</p><p>The <code class="literal">contents</code> file, like that for regular item import, lists the item's content files, one content file per line, but each line has one of the following formats:</p><pre class="screen">-r -s n -f filepath
-r -s n -f filepath\tbundle:bundlename
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text</pre><p>where</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><code class="literal">-r</code> indicates this is a file to be registered</p></li><li class="listitem"><p><code class="literal">-s n</code> indicates the asset store number (<code class="literal">n</code>)</p></li><li class="listitem"><p><code class="literal">-f filepath</code> indicates the path and name of the content file to be registered (filepath)</p></li><li class="listitem"><p><code class="literal">\t</code> is a tab character</p></li><li class="listitem"><p><code class="literal">bundle:bundlename</code> is an optional bundle name</p></li><li class="listitem"><p><code class="literal">permissions: -[r|w] 'group name'</code> is an optional read or write permission that can be attached to the bitstream</p></li><li class="listitem"><p><code class="literal">description: some text</code> is an optional description field to add to the file</p></li></ul></div><p>Everything after the filepath (the bundle, permissions, and description) is optional and is normally not used.</p><p>The command line for registration is just like the one for regular import:</p><p><code class="literal">[dspace]/bin/dspace import -a -e joe@user.com -c collectionID -s items_dir -m mapfile</code></p><p>(or, using the long form)</p><p><code class="literal">[dspace]/bin/dspace import --add --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile</code></p><p>The <code class="literal">--workflow</code> and <code class="literal">--test</code> flags will function as described in <a class="link" href="ch08.html#docbook-sys_admin.html-importingitems">Importing Items</a>.</p><p>The <code class="literal">--delete</code> flag will function as described in <a class="link" href="ch08.html#docbook-sys_admin.html-importingitems">Importing Items</a>, but the registered content files will not be removed from 
storage. See <a class="link" href="ch08.html#docbook-sys_admin.html-deletingregistereditems">Deleting Registered Items</a>.</p><p>The <code class="literal">--replace</code> flag will function as described in <a class="link" href="ch08.html#docbook-sys_admin.html-importingitems">Importing Items</a>, but care should be taken to consider the different cases and their implications. Because old items and new items may each be registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using <code class="literal">--replace</code> will not be removed from the storage where it resides; see <a class="link" href="ch08.html#docbook-sys_admin.html-deletingregistereditems">Deleting Registered Items</a>. A new item added to DSpace using <code class="literal">--replace</code> will be ingested normally or registered, depending on whether or not it is marked with <code class="literal">-r</code> in its <code class="literal">contents</code> file.</p></div><div class="section" title="8.6.3.&nbsp;Internal Identification and Retrieval of Registered Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15EB0"></a>8.6.3.&nbsp;Internal Identification and Retrieval of Registered Items</h3></div></div><div></div></div><p>Once an item has been registered, it is superficially indistinguishable from items ingested interactively or by batch import. Internally, however, there are some differences:</p><p>First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. 
Instead, the file path and name are those specified in the <code class="literal">contents</code> file.</p><p>Second, the <code class="literal">store_number</code> column of the bitstream database row contains the asset store number specified in the <code class="literal">contents</code> file.</p><p>Third, the <code class="literal">internal_id</code> column of the bitstream database row contains a leading flag (<code class="literal">-R</code>) followed by the registered file path and name. For example, <code class="literal">-Rfilepath</code>, where <code class="literal">filepath</code> is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be traditional storage in the DSpace server's file system or an SRB account.</p><p>Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file is in remote storage (for example, SRB), the checksum is calculated on just the file name! This is an efficiency choice, since registering a large number of large files that are in SRB would otherwise consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option, but it is not yet in a production release.</p><p>Registered items and their bitstreams can be retrieved transparently, just like normally ingested items.</p></div><div class="section" title="8.6.4.&nbsp;Exporting Registered Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15EDC"></a>8.6.4.&nbsp;Exporting Registered Items</h3></div></div><div></div></div><p>Registered items may be exported as described in <a class="link" href="ch08.html#docbook-sys_admin.html-exportingitems">Exporting Items</a>. If so, the export directory will contain actual copies of the files being exported, but the lines in the contents file will flag the files as registered. 
This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again be registered in DSpace instead of being uploaded and ingested normally.</p></div><div class="section" title="8.6.5.&nbsp;METS Export of Registered Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15EE6"></a>8.6.5.&nbsp;METS Export of Registered Items</h3></div></div><div></div></div><p>The <a class="link" href="ch08.html#docbook-sys_admin.html-mets">METS Export Tool</a> can also be used, but note the cautions described in that section, and note that the MD5 values for items in remote storage are actually MD5 values of just the file name.</p></div><div class="section" title="8.6.6.&nbsp;Deleting Registered Items"><div class="titlepage"><div><div><h3 class="title"><a name="N15EF0"></a>8.6.6.&nbsp;<a name="docbook-sys_admin.html-deletingregistereditems"></a>Deleting Registered Items</h3></div></div><div></div></div><p>If a registered item is deleted from DSpace, either interactively or by using the <code class="literal">--delete</code> or <code class="literal">--replace</code> flags described in <a class="link" href="ch08.html#docbook-sys_admin.html-importingitems">Importing Items</a>, the item will disappear from DSpace, but its registered content files will remain in place just as they were prior to registration. 
Bitstreams not registered but added by DSpace as part of registration, such as <code class="literal">license.txt</code> files, will be deleted.</p></div></div><div class="section" title="8.7.&nbsp;METS Tools"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N15F08"></a>8.7.&nbsp;<a name="docbook-sys_admin.html-mets"></a>METS Tools</h2></div></div><div></div></div><p>The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.</p><div class="section" title="8.7.1.&nbsp;The Export Tool"><div class="titlepage"><div><div><h3 class="title"><a name="N15F11"></a>8.7.1.&nbsp;The Export Tool</h3></div></div><div></div></div><p>This tool is obsolete and does not export a complete AIP. Its use is strongly deprecated.</p><div class="table"><a name="N15F17"></a><p class="title"><b>Table&nbsp;8.6.&nbsp;Mets Export Command table</b></p><div class="table-contents"><table summary="Mets Export Command table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em>
<code class="literal">[dspace]</code>
</em></span>
<code class="literal">/bin/dspace mets-export</code>
</td></tr><tr><td align="left">Java class:</td><td align="left"><code class="literal">org.dspace.app.mets.METSExport</code></td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--all</code></td><td align="left">Export all items in the archive.</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--collection</code></td><td align="left">Handle of the collection to export.</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--destination</code></td><td align="left">Destination directory.</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--item</code></td><td align="left">Handle of the item to export.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Help</td></tr></tbody></table></div></div><br class="table-break"><p>The following examples show the kinds of export the METS tool can perform.</p><p><span class="bold"><strong>Exporting an individual item.</strong></span> From the CLI:</p><p><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace mets-export -i </code><span class="emphasis"><em><code class="literal">[handle]</code></em></span><code class="literal"> -d /path/to/destination</code></p><p><span class="bold"><strong>Exporting a collection.</strong></span> 
From the CLI:</p><p><code class="literal">[dspace]/bin/dspace mets-export -c [handle] -d /path/to/destination</code></p><p><span class="bold"><strong>Exporting all the items in DSpace.</strong></span> From the CLI:</p><p><code class="literal">[dspace]/bin/dspace mets-export -a -d /path/to/destination</code></p></div><div class="section" title="8.7.2.&nbsp;The AIP Format"><div class="titlepage"><div><div><h3 class="title"><a name="N15FA3"></a>8.7.2.&nbsp;The AIP Format</h3></div></div><div></div></div><p>Note that this tool is deprecated, and the output format is not a true AIP.</p><p>Each exported item is written to a separate directory, created under the base directory specified in the command-line arguments, or in the current directory if <code class="literal">--destination</code> is omitted. The name of each directory is the Handle, URL-encoded so that the directory name is 'legal'.</p><p>Within each item directory is a <code class="literal">mets.xml</code> file, which contains the METS-encoded metadata for the item. Bitstreams in the item are also stored in the directory. Their filenames are their MD5 checksums, both for easy integrity checking and to avoid problems with 'special characters' in filenames that were legal on the original file system but are illegal on the server's file system. The <code class="literal">mets.xml</code> file includes XLink pointers to these bitstream files.</p><p>An example AIP might look like this:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>
<code class="literal">hdl%3A123456789%2F8/</code>
<div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><code class="literal">mets.xml</code> -- METS metadata</p></li><li class="listitem"><p><code class="literal">184BE84F293342</code> -- bitstream</p></li><li class="listitem"><p>
<code class="literal">3F9AD0389CB821</code>
</p></li><li class="listitem"><p>
<code class="literal">135FB82113C32D</code>
</p></li></ul></div></p></li></ul></div><p>The contents of the METS in the <code class="literal">mets.xml</code> file are as follows:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> A <code class="literal">dmdSec</code> (descriptive metadata section) containing the item's metadata in <a class="ulink" href="http://www.loc.gov/standards/mods/" target="_top">Metadata Object Description Schema (MODS)</a> XML. The Dublin Core descriptive metadata is mapped to MODS because no official qualified Dublin Core XML schema exists yet, and the Library Application Profile of DC that DSpace uses includes some qualifiers that are not part of the <a class="ulink" href="http://dublincore.org/documents/dcmi-terms/" target="_top">DCMI Metadata Terms</a>.</p></li><li class="listitem"><p> An <code class="literal">amdSec</code> (administrative metadata section), which contains a rights metadata element, which in turn contains the base64-encoded deposit license (the license the submitter granted as part of the submission process).</p></li><li class="listitem"><p> A <code class="literal">fileSec</code> containing a list of the bitstreams in the item. Each bundle constitutes a <code class="literal">fileGrp</code>. Each bitstream is represented by a <code class="literal">file</code> element, which contains an <code class="literal">FLocat</code> element with a simple XLink to the bitstream in the same directory as the <code class="literal">mets.xml</code> file. The <code class="literal">file</code> attributes consist of most of the basic technical metadata for the bitstream. 
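</p><p>To inspect an exported <code class="literal">mets.xml</code>, the <code class="literal">fileSec</code> can be walked with Python's standard <code class="literal">xml.etree.ElementTree</code> module. This is a sketch, not part of DSpace: it uses the standard METS (<code class="literal">http://www.loc.gov/METS/</code>) and XLink namespaces, and it assumes that each <code class="literal">fileGrp</code> carries its bundle name in a <code class="literal">USE</code> attribute, which should be checked against real METSExport output. The helper names <code class="literal">list_files</code> and <code class="literal">group_by_groupid</code> are hypothetical.</p>

```python
"""Sketch: list the bitstreams recorded in the fileSec of an exported mets.xml."""
import xml.etree.ElementTree as ET

# Standard METS namespace and the fully qualified XLink href attribute name.
NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def list_files(source):
    """Return (fileGrp USE, file ID, GROUPID, FLocat href) for each file element.

    `source` is a path or a file object. Assumption: the bundle name is carried
    in the fileGrp USE attribute. Missing attributes come back as None.
    """
    root = ET.parse(source).getroot()
    rows = []
    for grp in root.iterfind("mets:fileSec/mets:fileGrp", NS):
        for file_el in grp.iterfind("mets:file", NS):
            flocat = file_el.find("mets:FLocat", NS)
            href = flocat.get(XLINK_HREF) if flocat is not None else None
            rows.append((grp.get("USE"), file_el.get("ID"), file_el.get("GROUPID"), href))
    return rows


def group_by_groupid(rows):
    """Map each GROUPID to the IDs of the file elements that share it."""
    groups = {}
    for _use, file_id, group_id, _href in rows:
        groups.setdefault(group_id, []).append(file_id)
    return groups
```

<p>For example, <code class="literal">list_files("hdl%3A123456789%2F8/mets.xml")</code> would list each bitstream file in that item directory together with its group, and <code class="literal">group_by_groupid</code> collects files that share a <code class="literal">GROUPID</code>.</p><p>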
Additionally, for those bitstreams that are thumbnails or text extracted from another bitstream in the item, those 'derived' bitstreams have the same <code class="literal">GROUPID</code> as the bitstream they were derived from, so that clients can recognize the relationship.</p><p>The <code class="literal">OWNERID</code> of each <code class="literal">file</code> is the <a class="link" href="ch02.html#docbook-functional.html-bitstream_ids">'persistent' bitstream identifier</a> assigned by the DSpace instance. The <code class="literal">ID</code> and <code class="literal">GROUPID</code> attributes consist of the item's Handle together with the bitstream's sequence ID, with underscores used in place of dots and slashes. For example, a bitstream with sequence ID 24 in the item <code class="literal">hdl:123.456/789</code> will have the <code class="literal">ID</code> <code class="literal">123_456_789_24</code>. This is because <code class="literal">ID</code> and <code class="literal">GROUPID</code> attributes must be of type <code class="literal">xsd:ID</code>.</p></li></ul></div></div><div class="section" title="8.7.3.&nbsp;Limitations"><div class="titlepage"><div><div><h3 class="title"><a name="N16047"></a>8.7.3.&nbsp;Limitations</h3></div></div><div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p> No corresponding import tool yet</p></li><li class="listitem"><p> No <code class="literal">structmap</code> section</p></li><li class="listitem"><p> Some technical metadata not written, e.g. 
the primary bitstream in a bundle, original filenames or descriptions.</p></li><li class="listitem"><p> Only the MIME type is stored, not the (finer grained) bitstream format.</p></li><li class="listitem"><p> Dublin Core to MODS mapping is very simple, probably needs verification</p></li></ul></div></div></div><div class="section" title="8.8.&nbsp;MediaFilters: Transforming DSpace Content"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1605F"></a>8.8.&nbsp;<a name="docbook-sys_admin.html-mediafilters"></a>MediaFilters: Transforming DSpace Content</h2></div></div><div></div></div><p>DSpace can apply filters to content/bitstreams, creating new content. Filters are included that extract text for <span class="bold"><strong>full-text searching</strong></span>, and create <span class="bold"><strong>thumbnails</strong></span> for items that contain images. The media filters are controlled by the <code class="literal">MediaFilterManager</code> which traverses the asset store, invoking the <code class="literal">MediaFilter</code> or <code class="literal">FormatFilter</code> classes on bitstreams. The media filter plugin configuration <code class="literal">filter.plugins</code> in <code class="literal">dspace.cfg</code> contains a list of all enabled media/format filter plugins (see <a class="link" href="">Configuring Media Filters</a> for more information). The media filter system is intended to be run from the command line (or regularly as a cron task):</p><pre class="screen">[dspace]/bin/filter-media</pre><p>With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered.</p><p>
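The following Java sketch is a rough, hypothetical illustration of that default skip rule and of force mode (<code class="literal">-f</code>, listed below). It assumes that a filter names its output after the source bitstream with <code class="literal">.txt</code> appended, and that a bitstream counts as already filtered when a bitstream with that derived name exists. The class and method names are invented for illustration and are not part of the <code class="literal">MediaFilterManager</code> API.</p><pre class="screen">import java.util.Arrays;

// Hypothetical sketch of the filter-media skip rule; not DSpace's actual API.
public class FilterSkipSketch {

    // Assumed naming convention for a derived (e.g. extracted-text) bitstream.
    static String derivedName(String sourceName) {
        return sourceName + ".txt";
    }

    // Returns the names of the source bitstreams that would be filtered.
    // A source is skipped when its derived bitstream already exists,
    // unless force is true (mimicking the -f option).
    static String[] toFilter(String[] sources, String[] existingDerived, boolean force) {
        var existing = Arrays.asList(existingDerived);
        return Arrays.stream(sources)
                .filter(name -> force || !existing.contains(derivedName(name)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] sources = {"thesis.pdf", "appendix.pdf"};
        String[] existing = {"thesis.pdf.txt"};
        // Default run: only appendix.pdf still needs filtering.
        System.out.println(Arrays.toString(toFilter(sources, existing, false))); // [appendix.pdf]
        // Force run: both sources are processed again.
        System.out.println(Arrays.toString(toFilter(sources, existing, true)));  // [thesis.pdf, appendix.pdf]
    }
}
</pre><p>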
<span class="bold"><strong>Available Command-Line Options:</strong></span>
</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>Help</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -h</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Display a help message describing all command-line options.</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Force mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -f</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Apply filters to ALL bitstreams, even if they have already been filtered. Any previously filtered content is overwritten.</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Identifier mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -i 123456789/2</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Restrict processing to the community, collection, or item named by the identifier. By default, all bitstreams of all items in the repository are processed. The identifier must be a Handle, not a DB key. This option may be combined with any other option.</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Maximum mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -m 1000</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Suspend operation after the specified maximum number of items has been processed. By default, there is no limit. 
This option may be combined with any other option.</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>No-Index mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -n</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Suppress index creation. By default, a new search index is created for full-text searching. Use this option if you intend to run <code class="literal">index-update</code> separately.</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Plugin mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Apply ONLY the listed filter plugin(s), separated by commas. By default, all named filters listed in the <code class="literal">filter.plugins</code> field of <code class="literal">dspace.cfg</code> are applied. This option may be combined with any other option. <span class="emphasis"><em>WARNING:</em></span> multiple plugin names must be separated by a comma (i.e. <code class="literal">','</code>) and NOT by a comma followed by a space (i.e. <code class="literal">', '</code>).</p></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Skip mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -s 123456789/9,123456789/100</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB keys). They may refer to items, collections or communities that should be skipped. This option may be combined with any other option. <span class="emphasis"><em>WARNING:</em></span> multiple identifiers must be separated by a comma (i.e. <code class="literal">','</code>) and NOT by a comma followed by a space (i.e. 
<code class="literal">', '</code>).</p></li><li class="listitem"><p> NOTE: If you have a large number of identifiers to skip, you may keep this comma-separated list in a separate file (e.g. <code class="literal">filter-skiplist.txt</code>) and pass its contents to the program as follows. <span class="emphasis"><em>Note the use of the backtick, or grave accent (<code class="literal">`</code>), for command substitution; do not use single quotation marks. </em></span></p><div class="itemizedlist"><ul class="itemizedlist" type="square"><li class="listitem"><p>
<code class="literal">[dspace]/bin/dspace filter-media -s `less filter-skiplist.txt`</code>
</p></li></ul></div></li></ul></div></li><li class="listitem"><p><span class="bold"><strong>Verbose mode</strong></span> : <code class="literal">[dspace]/bin/dspace filter-media -v</code></p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p> Print all extracted text and other filter details to STDOUT.</p></li></ul></div></li></ul></div><p>To add your own filter, create a class that <code class="literal">implements</code> the <code class="literal">org.dspace.app.mediafilter.FormatFilter</code> interface. See the <a class="link" href="">Creating a new Media Filter</a> topic and the comments in the source file FormatFilter.java for more information. In theory, filters could be implemented in any programming language (C, Perl, etc.); however, they must be invoked by the Java code in the filter class that you create.</p></div><div class="section" title="8.9.&nbsp;Sub-Community Management"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N16139"></a>8.9.&nbsp;<a name="docbook-sys_admin.html-filiator"></a>Sub-Community Management</h2></div></div><div></div></div><p>DSpace provides an administrative tool, 'CommunityFiliator', for managing community sub-structure. This structure rarely changes, but sub-communities were not supported before the 1.2 release, so this tool can be used to place existing pre-1.2 communities into a hierarchy. It supports two operations: establishing a community/sub-community relationship, or removing an existing relationship.</p><p>The familiar parent/child metaphor explains how it works. Every community in DSpace can be a 'parent' community (it has at least one sub-community), a 'child' community (it is a sub-community of another community), both, or neither. 
In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent). Orphans are referred to as 'top-level' communities in the DSpace user interface, since there is no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan. The second operation, removing a parent/child relationship, makes the child an orphan.</p><div class="table"><a name="N16144"></a><p class="title"><b>Table&nbsp;8.7.&nbsp;Community Filiator Command table</b></p><div class="table-contents"><table summary="Community Filiator Command table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace community-filiator</code>
</td></tr><tr><td align="left">Java class:</td><td align="left"><code class="literal">org.dspace.administer.CommunityFiliator</code></td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--set</code></td><td align="left">Set a parent/child relationship</td></tr><tr><td align="left"><code class="literal">-r</code> or <code class="literal">--remove</code></td><td align="left">Remove a parent/child relationship</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--child</code></td><td align="left">Child community (Handle or database ID)</td></tr><tr><td align="left"><code class="literal">-p</code> or <code class="literal">--parent</code></td><td align="left">Parent community (Handle or database ID)</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Online help.</td></tr></tbody></table></div></div><br class="table-break"><p>To <span class="bold"><strong>set</strong></span> a parent/child relationship, issue the following at the command line:</p><p><code class="literal">[dspace]/bin/dspace community-filiator --set --parent=parentID --child=childID</code></p><p>(or using the short form)</p><p><code class="literal">[dspace]/bin/dspace community-filiator -s -p parentID -c childID</code></p><p>where '-s' or '--set' establishes a relationship in which the community identified by the '-p' parameter becomes the parent of the community identified by the '-c' parameter. 
Both the 'parentID' and 'childID' values may be handles or database IDs.</p><p>The reverse operation looks like this:</p><p><code class="literal">[dspace]/bin/dspace community-filiator --remove --parent=parentID --child=childID</code></p><p>(or using the short form)</p><p><code class="literal">[dspace]/bin/dspace community-filiator -r -p parentID -c childID</code></p><p>where '-r' or '--remove' removes the current relationship in which the community identified by 'parentID' is the parent of the community identified by 'childID'. As a result, the 'childID' community becomes an orphan, i.e. a top-level community.</p><p>If the required constraints of an operation are violated, an error message explaining the problem appears, and no change is made. For example, if in a removal operation the stated child community does not have the stated parent community as its parent, the message is: "Error, child community not a child of parent community".</p><p>You can make arbitrary changes to the community hierarchy by chaining the basic operations together. For example, to move a child community from one parent to another, perform a 'remove' from its current parent (which leaves it an orphan), followed by a 'set' to its new parent.</p><p>It is important to understand that when any operation is performed, all the sub-structure of the child community follows it. Thus, if a child itself has children (sub-communities) or collections, they all move with it to its new 'location' in the community tree.</p></div><div class="section" title="8.10.&nbsp;Batch Metadata Editing"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N161D0"></a>8.10.&nbsp;<a name="docbook-sys_admin.html-batchedits"></a>Batch Metadata Editing</h2></div></div><div></div></div><p>DSpace provides a batch metadata editing tool. The batch editing tool can produce a comma-delimited file in CSV format. 
The batch editing tool lets the user perform the following:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Batch editing of metadata (e.g. perform an external spell check)</p></li><li class="listitem"><p>Batch additions of metadata (e.g. add an abstract to a set of items, add controlled vocabulary such as LCSH)</p></li><li class="listitem"><p>Batch find and replace of metadata values (e.g. correct a misspelled surname across several records)</p></li><li class="listitem"><p>Mass moves of items between collections</p></li><li class="listitem"><p>Batch addition of new items (without bitstreams) via a CSV file</p></li><li class="listitem"><p>Re-ordering of the values in a list (e.g. authors)</p></li></ul></div><div class="section" title="8.10.1.&nbsp;Export Function"><div class="titlepage"><div><div><h3 class="title"><a name="N161EC"></a>8.10.1.&nbsp;Export Function</h3></div></div><div></div></div><p>The following table summarizes the basics.</p><div class="table"><a name="N161F2"></a><p class="title"><b>Table&nbsp;8.8.&nbsp;Batch Editing Metadata Export Command Table</b></p><div class="table-contents"><table summary="Batch Editing Metatdata Export Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace metadata-export</code>
</td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.app.bulkedit.MetadataExport</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-f</code> or <code class="literal">--file</code></td><td align="left">Required. The filename of the resulting CSV.</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--id</code></td><td align="left">The Item, Collection, or Community handle or Database ID to export. If not specified, <span class="bold"><strong>all</strong></span> items will be exported.</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--all</code></td><td align="left">Include all metadata fields, including those that are not normally changed (e.g. provenance) and those configured in <code class="literal">dspace.cfg</code> to be ignored on export.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Display the help page.</td></tr></tbody></table></div></div><br class="table-break"><div class="section" title="8.10.1.1.&nbsp;Exporting Process"><div class="titlepage"><div><div><h4 class="title"><a name="N16254"></a>8.10.1.1.&nbsp;Exporting Process</h4></div></div><div></div></div><p>To run the batch editing exporter, enter the following at the command line:</p><p>
<code class="literal">[dspace]/bin/dspace metadata-export -f name_of_file.csv -i 1023/24 </code>
</p><p>Example:</p><p>
<code class="literal">[dspace]/bin/dspace metadata-export -f /batch_export/col_14.csv -i 1989.1/24</code>
</p><p>In the above example, we request that the collection with handle '<code class="literal">1989.1/24</code>' be exported in its entirety to the file '<code class="literal">col_14.csv</code>' in the '<code class="literal">/batch_export</code>' directory.</p></div></div><div class="section" title="8.10.2.&nbsp;Import Function"><div class="titlepage"><div><div><h3 class="title"><a name="N16276"></a>8.10.2.&nbsp;Import Function</h3></div></div><div></div></div><p>The following table summarizes the basics.</p><div class="table"><a name="N1627C"></a><p class="title"><b>Table&nbsp;8.9.&nbsp;Batch Editing Metadata Import Command Table</b></p><div class="table-contents"><table summary="Batch Editing Metatdata Import Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left">
<span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace metadata-import</code>
</td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.app.bulkedit.MetadataImport</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-f</code> or <code class="literal">--file</code></td><td align="left">Required. The filename of the CSV file to load.</td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--silent</code></td><td align="left">Silent mode. The import function does not ask you to confirm the changes before making them.</td></tr><tr><td align="left"><code class="literal">-e</code> or <code class="literal">--email</code></td><td align="left">The email address of the submitting user. This is required only when adding new items.</td></tr><tr><td align="left"><code class="literal">-w</code> or <code class="literal">--workflow</code></td><td align="left">When adding new items, queue them in the Collection Workflow process.</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--notify</code></td><td align="left">When adding new items using a workflow, send notification emails.</td></tr><tr><td align="left"><code class="literal">-t</code> or <code class="literal">--template</code></td><td align="left">When adding new items, use the Collection template, if it exists.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Display the brief help page.</td></tr></tbody></table></div></div><br class="table-break"><div class="caution" title="Caution" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Caution"><tr><td valign="top" align="center" rowspan="2" width="25"><img alt="[Caution]" src="/jspui/doc/image/caution.png"></td><th align="left"></th></tr><tr><td valign="top" align="left"><p>Use Silent Mode carefully. 
Because no confirmation is requested, it is easy to overlay the wrong data and cause irreparable damage to the database. </p></td></tr></table></div><div class="section" title="8.10.2.1.&nbsp;Importing Process"><div class="titlepage"><div><div><h4 class="title"><a name="N162FA"></a>8.10.2.1.&nbsp;Importing Process</h4></div></div><div></div></div><p>To run the batch importer, enter the following at the command line:</p><p>
<code class="literal">[dspace]/bin/dspace metadata-import -f name_of_file.csv </code>
</p><p>Example:</p><p>
<code class="literal">[dspace]/bin/dspace metadata-import -f /dImport/col_14.csv</code>
</p><p>To add new items with metadata but <span class="bold"><strong>without</strong></span> bitstreams, enter the following at the command line:</p><p>
<code class="literal">[dspace]/bin/dspace metadata-import -f /dImport/new_file.csv -e joe@user.com -w -n -t</code>
</p><p>The above example uses all of the optional arguments. The metadata is added as new items submitted by the given user, the items are sent through the collection workflow with notification emails, and the collection template is applied to them.</p></div></div><div class="section" title="8.10.3.&nbsp;The CSV Files"><div class="titlepage"><div><div><h3 class="title"><a name="N1631C"></a>8.10.3.&nbsp;The CSV Files</h3></div></div><div></div></div><p>The CSV files that this tool imports and exports follow the RFC 4180 CSV format <a class="ulink" href="http://www.ietf.org/rfc/rfc4180.txt" target="_top"><span class="underline">http://www.ietf.org/rfc/rfc4180.txt</span></a>. This means that line breaks and embedded commas can be included in a field by wrapping the field in double quotes, and a double quote can be included by writing it twice. The tool does all this for you, and any good CSV editor, such as Excel or OpenOffice, follows this convention.</p><p><span class="bold"><strong>File Structure.</strong></span> The first row of the CSV must define the metadata fields that the rest of the CSV represents. The first column must always be "id", which refers to the item's ID. All other columns are optional; they name the Dublin Core metadata fields in which the data resides. </p><p>A typical heading row looks like:</p><pre class="screen"><code class="code">id,collection,dc.title,dc.contributor,dc.date.issued,etc,etc,etc.</code></pre><p>Subsequent rows in the CSV file relate to items. A typical row might look like:</p><pre class="screen"><code class="code">350,2292,Item title,"Smith, John",2008</code></pre><p>If you want to store multiple values for a given metadata element, separate them with the double pipe '||' (or another separator defined in your <code class="literal">dspace.cfg</code> file). For example:</p><pre class="screen"><code class="code">Horses||Dogs||Cats</code></pre><p>Elements are stored in the database in the order in which they appear in the CSV file. 
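</p><p>To illustrate these rules, the Java sketch below builds CSV cells by hand. A field containing a comma, a double quote or a line break is wrapped in double quotes, embedded double quotes are doubled, and several values of one field are joined with <code class="literal">||</code>. It is a minimal sketch, not the exporter's own code, and it assumes the default <code class="literal">||</code> separator has not been changed in <code class="literal">dspace.cfg</code>. The class and method names are hypothetical.</p><pre class="screen">import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch of writing batch-edit CSV cells by hand; not the DSpace exporter code.
public class CsvRowSketch {

    // RFC 4180: wrap a field in double quotes if it contains a comma,
    // a double quote or a line break, and double any embedded quotes.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Join several values of one metadata field, in the order given,
    // with the default "||" separator (assumed unchanged in dspace.cfg).
    static String multiValue(String... values) {
        return String.join("||", values);
    }

    // Quote every cell and join the cells with commas to form one row.
    static String row(String... cells) {
        return Arrays.stream(cells)
                .map(CsvRowSketch::quote)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(row("350", "2292", "Item title", "Smith, John", "2008"));
        // prints: 350,2292,Item title,"Smith, John",2008
        System.out.println(row("351", "2292", "Another title",
                multiValue("Smith, John", "Doe, Jane"), "2009"));
        // prints: 351,2292,Another title,"Smith, John||Doe, Jane",2009
    }
}
</pre><p>The values are joined in the order given, which is the order in which the importer stores them. 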
You can therefore use the CSV file to order elements where order matters, such as authors, or controlled vocabulary such as Library of Congress Subject Headings.</p><p>When importing a CSV file, the importer will <span class="emphasis"><em>overlay</em></span> the data onto what is already in the repository to determine the differences. It acts only on the contents of the CSV file, rather than on the complete item metadata. This means that the exported CSV file can be manipulated quite substantially before being re-imported. Rows (items) or columns (metadata elements) that you remove are simply ignored. For example, if you only want to edit item abstracts, you can remove all of the other columns and leave just the abstract column. (You must leave the id column intact; it is mandatory.)</p><p><span class="bold"><strong>Editing collection membership.</strong></span> Items can be moved between collections by editing the collection handles in the 'collection' column. Multiple collections can be included. The first collection is the 'owning collection', the primary collection in which the item appears. Subsequent collections (separated by the field separator) are treated as mapped collections; this is the same as using the map item functionality in the DSpace user interface. To move items between collections, or to change which other collections they are mapped to, change the data in the collection column.</p><p><span class="bold"><strong>Adding items.</strong></span> New metadata-only items can be added to DSpace using the batch metadata importer. To do this, enter a plus sign '+' in the 'id' column (the first column). The importer will then treat the row as a new item. 
If you are using the command-line importer, you must use the <code class="literal">-e</code> flag to specify the email address or ID of the user registered as the submitter of the items.</p><p><span class="bold"><strong>Deleting Data.</strong></span> You can delete all values of a given metadata field from an exported file. For example, suppose you have keywords (dc.subject) that need to be removed <span class="emphasis"><em>en masse</em></span>. Leave the column (dc.subject) intact, but remove the data in the corresponding rows.</p><p><span class="bold"><strong>Migrating or Exchanging Data.</strong></span> You may have data in one Dublin Core (DC) element that really belongs in another. For example, your staff may have entered Library of Congress Subject Headings in the Subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps, and your data is migrated upon import:</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Insert a new column. Its first row should be the new metadata element. (We will refer to it as the TARGET.)</p></li><li class="listitem"><p>Select the column/rows of the data you wish to move. (We will refer to it as the SOURCE.)</p></li><li class="listitem"><p>Cut this data and paste it into the new column (TARGET) you created in Step 1.</p></li><li class="listitem"><p>Leave the column (SOURCE) you just cut the data from empty. Do not delete it.</p></li></ol></div></div></div><div class="section" title="8.11.&nbsp;Checksum Checker"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N16370"></a>8.11.&nbsp;<a name="docbook-sys_admin.html-checksum"></a>Checksum Checker</h2></div></div><div></div></div><p>Checksum Checker is a program that verifies the checksum of every bitstream within DSpace. It was designed with the expectation that most system administrators will run it from cron. 
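</p><p>Conceptually, each check recomputes a bitstream's checksum from the stored file and compares it with the checksum recorded when the bitstream was ingested. The Java sketch below shows that comparison for an in-memory byte array. It assumes the MD5 algorithm, which DSpace normally uses for bitstream checksums. The class and method names are illustrative and are not the <code class="literal">ChecksumChecker</code> API.</p><pre class="screen">import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch of one checksum comparison; not DSpace code.
public class ChecksumSketch {

    // Returns the lowercase hex MD5 digest of the given bytes.
    static String md5Hex(byte[] content) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with standard JDKs; fail loudly if it is missing.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest(content)) {
            // %02x prints each byte as two hex digits (negative bytes as unsigned).
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when the recomputed checksum equals the stored one (case-insensitive).
    static boolean matches(byte[] content, String storedChecksum) {
        return md5Hex(content).equalsIgnoreCase(storedChecksum);
    }

    public static void main(String[] args) {
        String stored = "900150983cd24fb0d6963f7d28e17f72"; // MD5 of "abc"
        byte[] intact = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] altered = "abd".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matches(intact, stored));  // true
        System.out.println(matches(altered, stored)); // false
    }
}
</pre><p>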
Choose the options according to the size of your repository.</p><div class="table"><a name="N16379"></a><p class="title"><b>Table&nbsp;8.10.&nbsp;Checksum Checker Information Table</b></p><div class="table-contents"><table summary="Checksum Checker Information Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace checker</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.app.checker.ChecksumChecker</td></tr><tr><td align="left">Arguments short and (long) forms:</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-L</code> or <code class="literal">--continuous</code></td><td align="left">Loop continuously through the bitstreams</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--handle</code></td><td align="left">Specify a handle to check</td></tr><tr><td align="left"><code class="literal">-b</code> &lt;bitstream-ids&gt;</td><td align="left">Space-separated list of bitstream IDs</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--count</code></td><td align="left">Number of bitstreams to check</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--duration</code></td><td align="left">Checking duration</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Display online help</td></tr><tr><td align="left"><code class="literal">-l</code> or <code class="literal">--looping</code></td><td align="left">Loop once through the bitstreams</td></tr><tr><td align="left"><code class="literal">-p</code> &lt;prune&gt;</td><td align="left">Prune old results (optionally using the specified properties file for configuration)</td></tr><tr><td align="left"><code 
class="literal">-v</code> or <code class="literal">--verbose</code></td><td align="left">Report on all processing</td></tr></tbody></table></div></div><br class="table-break"><p>There are three aspects of the Checksum Checker's operation that can be configured:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>the execution mode</p></li><li class="listitem"><p>the logging output</p></li><li class="listitem"><p>the policy for removing old checksum results from the database</p></li></ul></div><p>Refer to <a class="link" href="ch05.html#docbook-configure.html-checksum">Chapter 5. Configuration</a> for the specific configuration keys in the <code class="literal">dspace.cfg</code> file.</p><div class="section" title="8.11.1.&nbsp;Checker Execution Mode"><div class="titlepage"><div><div><h3 class="title"><a name="N16415"></a>8.11.1.&nbsp;Checker Execution Mode</h3></div></div><div></div></div><p>The execution mode is configured using command-line options, which are summarized in the table above. The different modes are described below.</p><p>Unless a particular bitstream or handle is specified, the Checksum Checker always checks bitstreams starting with the least recently checked one. (This means that the most recently ingested bitstreams will be the last ones checked.)</p><p><span class="bold"><strong>Available command line options</strong></span></p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>Limited-count mode: </strong></span><code class="literal">[dspace]/bin/dspace checker -c</code></p><p>Checks a specific number of bitstreams. 
The <code class="literal">-c</code> option is followed by an integer, the number of bitstreams to check.</p><p>Example: <code class="literal">[dspace]/bin/dspace checker -c 10</code></p><p>This is particularly useful for confirming that the checker is running properly. By default, the Checksum Checker checks a single bitstream, as if the option were <code class="literal">-c 1</code>.</p></li><li class="listitem"><p><span class="bold"><strong>Duration mode:</strong></span>
<code class="literal">[dspace]/bin/dspace checker -d</code></p><p>Runs the checker for a specific period of time. The time argument is a number followed by one of the units below:</p><p>Example: <code class="literal">[dspace]/bin/dspace checker -d 2h</code> (the checker will run for 2 hours)</p><div class="informaltable"><table border="1" width="60%"><colgroup><col><col></colgroup><tbody><tr><td>s</td><td>Seconds</td></tr><tr><td>m</td><td>Minutes</td></tr><tr><td>h</td><td>Hours</td></tr><tr><td>d</td><td>Days</td></tr><tr><td>w</td><td>Weeks</td></tr><tr><td>y</td><td>Years</td></tr></tbody></table></div><p>The checker keeps starting new bitstream checks until the specified duration has elapsed, so the actual run time will be slightly longer than the specified duration. Bear this in mind when scheduling checks.</p></li><li class="listitem"><p><span class="bold"><strong>Specific Bitstream mode:</strong></span>
<code class="literal">[dspace]/bin/dspace checker -b</code></p><p>Checks only the bitstreams with the given internal bitstream IDs.</p><p>Example: <code class="literal">[dspace]/bin/dspace checker -b 112 113 4567</code> The checker will check only bitstream IDs 112, 113 and 4567.</p></li><li class="listitem"><p><span class="bold"><strong>Specific Handle mode:</strong></span>
<code class="literal">[dspace]/bin/dspace checker -a</code></p><p>Checks only the bitstreams within the given Community, Collection, or Item.</p><p>Example: <code class="literal">[dspace]/bin/dspace checker -a 123456/999</code> The checker will check only this handle. If it is a Collection or Community, the checker will run through the entire Collection or Community.</p></li><li class="listitem"><p><span class="bold"><strong>Looping mode:</strong></span>
<code class="literal">[dspace]/bin/dspace checker -l</code> or <code class="literal">[dspace]/bin/dspace checker -L</code></p><p>There are two looping modes. The lowercase 'el' (-l) checks every bitstream in the repository once. This is recommended for smaller repositories that can loop through all their content in a few hours at most. The uppercase 'L' (-L) loops through the repository continuously. This is not recommended for most repository systems. </p><p><span class="bold"><strong>Cron Jobs</strong></span>. For large repositories that cannot be checked completely in a couple of hours, we recommend using the -d option in cron.</p></li><li class="listitem"><p><span class="bold"><strong>Pruning mode:</strong></span>
<code class="literal">[dspace]/bin/dspace checker -p</code></p><p>The Checksum Checker stores the result of every check in the checksum_history table. By default, when the -p option is used, successful checksum matches that are eight weeks old or older are deleted. (Unsuccessful ones are retained indefinitely.) Without this option, the retention settings are ignored and the database table may grow rather large!</p></li></ul></div></div><div class="section" title="8.11.2.&nbsp;Checker Results Pruning"><div class="titlepage"><div><div><h3 class="title"><a name="N164C1"></a>8.11.2.&nbsp;Checker Results Pruning</h3></div></div><div></div></div><p>As noted under "Pruning mode" above, the checksum_history table can grow rather large, and running the checker with the -p option keeps it to a manageable size. The amount of time for which results are retained in the checksum_history table can be changed in one of two ways: </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Edit the retention policies in <code class="literal">[dspace]/config/dspace.cfg</code>. See Chapter 5 <a class="link" href="ch05.html#docbook-configure.html-checksum">Configuration</a> for the property keys.</p><p>OR</p></li><li class="listitem"><p>Pass in a properties file containing retention policies when using the -p option.</p><p>To do this, create a file with the following two property keys: <pre class="screen">checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w</pre> You can use the table above for your time units.</p><p>At the command line: <pre class="screen">[dspace]/bin/dspace checker -p retention_file_name &lt;ENTER&gt;</pre></p></li></ol></div></div><div class="section" title="8.11.3.&nbsp;Checker Reporting"><div class="titlepage"><div><div><h3 class="title"><a name="N164E5"></a>8.11.3.&nbsp;Checker Reporting</h3></div></div><div></div></div><p>The Checksum Checker uses log4j to report its results. By default it reports to a log called <code class="literal">[dspace]/log/checker.log</code>, and it reports only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked, regardless of outcome, use the <code class="literal">-v</code> (verbose) command line option:</p><p><code class="literal">[dspace]/bin/dspace checker -l -v</code> (This loops through the repository once and reports in detail on every bitstream checked.)</p><p>To change the location of the log, or to modify the prefix used on each line of output, edit the <code class="literal">[dspace]/config/templates/log4j.properties</code> file and run <code class="literal">[dspace]/bin/install_configs</code>.</p></div><div class="section" title="8.11.4.&nbsp;Cron or Automatic Execution of Checksum Checker"><div class="titlepage"><div><div><h3 class="title"><a name="N16502"></a>8.11.4.&nbsp;Cron or Automatic Execution of Checksum Checker</h3></div></div><div></div></div><p>You should schedule the Checksum Checker to run automatically, based on how frequently you back up your DSpace instance (and how long you keep those backups). The size of your repository is also a factor. For very large repositories, you may need to schedule it to run for an hour (e.g. with the <code class="literal">-d 1h</code> option) each evening so that it gets through your entire repository within a week or so. 
Smaller repositories can likely get by with running it weekly.</p><p><span class="bold"><strong>Unix, Linux, or Mac OS</strong></span>. You can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace:</p><p><code class="literal">0 4 * * 0 [dspace]/bin/dspace checker -d2h -p</code></p><p>The above cron entry schedules the checker to run every Sunday at 4:00 a.m. for 2 hours. The -p option also 'prunes' the database based on the retention settings in <code class="literal">dspace.cfg</code>.</p><p><span class="bold"><strong>Windows OS</strong></span>. You will be unable to use the checker shell script. Instead, use Windows Scheduled Tasks to schedule the following command to run at the appropriate times:</p><p><code class="literal">[dspace]/bin/dsrun.bat org.dspace.app.checker.ChecksumChecker -d2h -p</code> (This command should appear on a single line.)</p></div><div class="section" title="8.11.5.&nbsp;Automated Checksum Checkers' Results"><div class="titlepage"><div><div><h3 class="title"><a name="N16525"></a>8.11.5.&nbsp;Automated Checksum Checkers' Results</h3></div></div><div></div></div><p>Optionally, you may choose to receive automated emails listing the Checksum Checker's results. 
Schedule it to run <span class="bold"><strong>after</strong></span> the Checksum Checker has completed its processing (otherwise the email may not contain all the results).</p><div class="informaltable"><table border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace checker</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.checker.DailyReportEmailer</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-a</code> or <code class="literal">--All</code></td><td align="left">Send all the results (everything specified below)</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--Deleted</code></td><td align="left">Send E-mail report for all bitstreams set as deleted for today.</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--Missing</code></td><td align="left">Send E-mail report for all bitstreams not found in the assetstore for today.</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--Changed</code></td><td align="left">Send E-mail report for all bitstreams where checksum has been changed for today.</td></tr><tr><td align="left"><code class="literal">-u</code> or <code class="literal">--Unchecked</code></td><td align="left">Send the Unchecked bitstream report.</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--Not Processed</code></td><td align="left">Send E-mail report for all bitstreams set to no longer be processed for today.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Help</td></tr></tbody></table></div><div class="tip" title="Tip" 
style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip"><tr><td valign="top" align="center" rowspan="2" width="25"><img alt="[Tip]" src="/jspui/doc/image/tip.png"></td><th align="left"></th></tr><tr><td valign="top" align="left"><p>You can also combine options (e.g. -m -c) for combined reports.</p></td></tr></table></div><p><span class="bold"><strong>Cron</strong></span>. Follow the same steps given above for running the checker in cron. Change the time, but match the regularity. Remember to schedule this <span class="bold"><strong>after</strong></span> the Checksum Checker has run.</p></div></div><div class="section" title="8.12.&nbsp;Embargo"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N165AC"></a>8.12.&nbsp;<a name="docbook-sys_admin-html-embargo"></a>Embargo</h2></div></div><div></div></div><p>If you have implemented the Embargo feature, you will need to run it periodically to check for Items with expired embargoes and lift them.</p><div class="table"><a name="N165B4"></a><p class="title"><b>Table&nbsp;8.11.&nbsp;Embargo Manager Command Table</b></p><div class="table-contents"><table summary="Embargo Manager Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace embargo-lifter</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.embargo.EmbargoManager</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-c</code> or <code class="literal">--check</code></td><td align="left">ONLY check the state of embargoed Items; do NOT lift any embargoes</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--identifier</code></td><td align="left">Process ONLY this handle identifier(s), which 
must be an Item. Can be repeated.</td></tr><tr><td align="left"><code class="literal">-l</code> or <code class="literal">--lift</code></td><td align="left">Only lift embargoes; do NOT check the state of any embargoed items.</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--dryrun</code></td><td align="left">Do not change anything in the data model; print a message instead.</td></tr><tr><td align="left"><code class="literal">-v</code> or <code class="literal">--verbose</code></td><td align="left">Print a line describing the action taken for each embargoed item found.</td></tr><tr><td align="left"><code class="literal">-q</code> or <code class="literal">--quiet</code></td><td align="left">No output except upon error.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Display brief help screen.</td></tr></tbody></table></div></div><br class="table-break"><p>You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift them. For example, to check the status, at the CLI:</p><p><code class="literal">[dspace]/bin/dspace embargo-lifter -c</code></p><p>To lift the embargoes on those items that meet the time criteria, at the CLI:</p><p><code class="literal">[dspace]/bin/dspace embargo-lifter -l</code></p></div><div class="section" title="8.13.&nbsp;Browse Index Creation"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N16636"></a>8.13.&nbsp;<a name="docbook-sys_admin.html-indexbrowse"></a>Browse Index Creation</h2></div></div><div></div></div><p>There are a variety of options available for creating the browse indexes that you define in the <a class="link" href="ch05.html#docbook-configure.html-browse-index">Configuration Section</a> (Chapter 5). 
You can see these options below in the command table.</p><div class="table"><a name="N16642"></a><p class="title"><b>Table&nbsp;8.12.&nbsp;Browse Index Command Table</b></p><div class="table-contents"><table summary="Browse Index Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace index-init</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.browse.IndexBrowse</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-r</code> or <code class="literal">--rebuild</code></td><td align="left">Rebuild all the indexes, removing old tables and creating new ones. For use with <code class="literal">-f</code>. Mutually exclusive with <code class="literal">-d</code>.</td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--start</code></td><td align="left"><code class="literal">[-s &lt;int&gt;] </code>Start from this index number and work upwards (mostly useful only for debugging). For use with <code class="literal">-t</code> and <code class="literal">-f</code>.</td></tr><tr><td align="left"><code class="literal">-x</code> or <code class="literal">--execute</code></td><td align="left">Execute all the remove and create SQL against the database. For use with <code class="literal">-t</code> and <code class="literal">-f</code>.</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--index</code></td><td align="left">Actually do the indexing. 
Mutually exclusive with <code class="literal">-t</code> and <code class="literal">-f</code>.</td></tr><tr><td align="left"><code class="literal">-o</code> or <code class="literal">--out</code></td><td align="left"><code class="literal">[-o &lt;filename&gt;]</code> Write the remove and create SQL to the given file. For use with <code class="literal">-t</code> and <code class="literal">-f</code>.</td></tr><tr><td align="left"><code class="literal">-p</code> or <code class="literal">--print</code></td><td align="left">Write the remove and create SQL to stdout. For use with <code class="literal">-t</code> and <code class="literal">-f</code>.</td></tr><tr><td align="left"><code class="literal">-t</code> or <code class="literal">--tables</code></td><td align="left">Create the tables only; do not attempt to index. Mutually exclusive with <code class="literal">-f</code> and <code class="literal">-i</code>.</td></tr><tr><td align="left"><code class="literal">-f</code> or <code class="literal">--full</code></td><td align="left">Make the tables, and do the indexing. This forces <code class="literal">-x</code>. Mutually exclusive with <code class="literal">-t</code> and <code class="literal">-i</code>.</td></tr><tr><td align="left"><code class="literal">-v</code> or <code class="literal">--verbose</code></td><td align="left">Print extra information to stdout. If used in conjunction with <code class="literal">-p</code>, you cannot use stdout to generate your database structure.</td></tr><tr><td align="left"><code class="literal">-d</code> or <code class="literal">--delete</code></td><td align="left">Delete all the indexes, but do not create new ones. For use with <code class="literal">-f</code>. This is mutually exclusive with <code class="literal">-r</code>.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Show this help documentation. 
Overrides all other arguments.</td></tr></tbody></table></div></div><br class="table-break"><div class="section" title="8.13.1.&nbsp;Running the Indexing Programs"><div class="titlepage"><div><div><h3 class="title"><a name="N16735"></a>8.13.1.&nbsp;Running the Indexing Programs</h3></div></div><div></div></div><p><span class="bold"><strong>Complete Index Regeneration</strong></span>. Running <code class="literal">[dspace]/bin/dspace index-init</code> completely regenerates your indexes, tearing down all old tables and reconstructing them with the new configuration. This is the same as running:</p><p><code class="literal">[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -f -r</code></p><p><span class="bold"><strong>Updating the Indexes</strong></span>. Running <code class="literal">[dspace]/bin/dspace index-update</code> reindexes your full browse without modifying the table structure. (This should be your default approach when indexing periodically, for example via a cron job.) This is the same as running:</p><p><code class="literal">[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -i</code></p><p><span class="bold"><strong>Destroy and rebuild.</strong></span> You can destroy and rebuild the database tables without doing the indexing. The following command does this verbosely, printing the SQL to the screen and writing it to a file as well as executing it against the database. At the CLI:</p><p><code class="literal">[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -r -t -p -v -x -o myfile.sql</code></p></div><div class="section" title="8.13.2.&nbsp;Indexing Customization"><div class="titlepage"><div><div><h3 class="title"><a name="N1675B"></a>8.13.2.&nbsp;Indexing Customization</h3></div></div><div></div></div><p>DSpace provides robust browse indexing. It is possible to expand upon the default indexes delivered at the time of the installation. 
The System Administrator should review <a class="link" href="ch05.html#docbook-configure.html-browse-index-define">"Defining the Indexes" in Chapter 5, Configuration</a> to become familiar with the property keys and the definitions used therein before attempting heavy customizations.</p><p>Through customization it is possible to:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Add new browse indexes besides the four that are delivered upon installation. Examples: <div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>Series</p></li><li class="listitem"><p>Specific subject fields (e.g. Library of Congress Subject Headings). <span class="emphasis"><em>(It is possible to create a browse index based on a controlled vocabulary or thesaurus.)</em></span></p></li><li class="listitem"><p>Other metadata schema fields</p></li></ul></div></p></li><li class="listitem"><p>Combine metadata fields into one browse</p></li><li class="listitem"><p>Combine different metadata schemas in one browse</p></li></ul></div><p><span class="bold"><strong>Examples of new browse indexes that are possible.</strong></span>
<span class="emphasis"><em>(The system administrator is reminded to read the section on Defining the Indexes in Chapter 5, Configuration.)</em></span></p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>Add a Series Browse</strong></span>. You want to add a new browse using a previously unused metadata element. </p><p><code class="literal">webui.browse.index.6 = series:metadata:dc.relation.ispartofseries:text:single</code></p><p>Note: the index number needs to be adjusted to fit your browse stanza in the <code class="literal">dspace.cfg</code> file. Also, you will need to update your <code class="literal">Messages.properties</code> file. </p></li><li class="listitem"><p><span class="bold"><strong>Combine more than one metadata field into a browse.</strong></span> You may have other title fields used in your repository. You may want only one or two of them added, not all title fields. You may also want your series to file in there. </p><p><code class="literal">webui.browse.index.3 = title:metadata:dc.title,dc.title.uniform,dc.relation.ispartofseries:title:full</code></p></li><li class="listitem"><p><span class="bold"><strong>Separate subject browse.</strong></span> You may want to have a separate subject browse limited to only one type of subject. 
</p><p><code class="literal">webui.browse.index.7 = lcsubject:metadata:dc.subject.lcsh:text:single</code></p></li></ul></div><p>As one can see, the choices are limited only by your metadata schema, the metadata, and your imagination.</p><div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip"><tr><td valign="top" align="center" rowspan="2" width="25"><img alt="[Tip]" src="/jspui/doc/image/tip.png"></td><th align="left"></th></tr><tr><td valign="top" align="left"><p>Remember to run <code class="literal">index-init</code> after adding any new definitions in <code class="literal">dspace.cfg</code> to have the indexes created and the data indexed.</p></td></tr></table></div></div></div><div class="section" title="8.14.&nbsp;DSpace Log Converter"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N167B9"></a>8.14.&nbsp;<a name="docbook-sys_admin.html-log-converter"></a>DSpace Log Converter</h2></div></div><div></div></div><p>With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of SOLR for statistics makes it possible to keep a database of statistics. This raises the question of how a site can make use of its older log files. The following commands convert the existing log files and then import them into SOLR. This needs to be done only once. 
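Both the converter and the importer offer a -m option that appends a wildcard to the file name so that rotated logs are picked up. As a minimal illustration, the following Java sketch shows which file names a trailing wildcard such as dspace.log* would select. It assumes the wildcard behaves like a simple glob on the file name; the class LogFileWildcard is hypothetical and not part of DSpace, and the converter's actual file matching may differ.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DSpace: shows which file names a trailing
// "*" wildcard on an input name (e.g. dspace.log*) would select.
// Assumes baseName contains no glob metacharacters such as *, ?, [ or {.
public class LogFileWildcard {

    // True when fileName matches the glob built by appending "*" to baseName.
    public static boolean matches(String baseName, String fileName) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + baseName + "*");
        return matcher.matches(Paths.get(fileName));
    }

    // Keeps only the candidate file names that the wildcard selects, in order.
    public static List<String> select(String baseName, List<String> candidates) {
        List<String> selected = new ArrayList<>();
        for (String name : candidates) {
            if (matches(baseName, name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> files =
                List.of("dspace.log", "dspace.log.1", "dspace.log.2", "checker.log");
        // prints [dspace.log, dspace.log.1, dspace.log.2]
        System.out.println(select("dspace.log", files));
    }
}
```

Note that the base name itself (dspace.log) also matches, because the wildcard can stand for zero characters, while unrelated logs such as checker.log are left out.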
</p><p>The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.</p><div class="table"><a name="N167C4"></a><p class="title"><b>Table&nbsp;8.13.&nbsp;Log Converter Table</b></p><div class="table-contents"><table summary="Log Converter Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace stats-log-converter</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.statistics.util.ClassicDSpaceLogConverter</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--in</code></td><td align="left">Input file</td></tr><tr><td align="left"><code class="literal">-o</code> or <code class="literal">--out</code></td><td align="left">Output file</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--multiple</code></td><td align="left">Adds a wildcard at the end of the input and output file names, so that, for example, dspace.log* is converted. 
(For example, the following files would be included because of this argument: <code class="literal">dspace.log, dspace.log.1, dspace.log.2, dspace.log.3,</code> etc.)</td></tr><tr><td align="left"><code class="literal">-n</code> or <code class="literal">--newformat</code></td><td align="left">Use if the log files were created with DSpace 1.6</td></tr><tr><td align="left"><code class="literal">-v</code> or <code class="literal">--verbose</code></td><td align="left">Display verbose output (helpful for debugging)</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Help</td></tr></tbody></table></div></div><br class="table-break"><p>The stats-log-importer command loads the intermediate log files created by the Log Converter into SOLR.</p><div class="table"><a name="N16835"></a><p class="title"><b>Table&nbsp;8.14.&nbsp;Log Import Table</b></p><div class="table-contents"><table summary="Log Import Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace stats-log-importer</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.statistics.util.StatisticsImporter</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--</code></td><td align="left">Input file</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--</code></td><td align="left">Adds a wildcard at the end of the input file name, so that, for example, dspace.log* is imported</td></tr><tr><td align="left"><code class="literal">-s</code> or <code class="literal">--</code></td><td align="left">Skip the reverse DNS lookups that work out where a user is from. 
(The DNS lookup finds information about the host from its IP address, such as its geographical location. This can be slow, and it will not work on a server that is not connected to the internet.)</td></tr><tr><td align="left"><code class="literal">-v</code> or <code class="literal">--</code></td><td align="left">Display verbose output (helpful for debugging)</td></tr><tr><td align="left"><code class="literal">-l</code> or <code class="literal">--</code></td><td align="left">For developers: allows you to import a log file from another system. Because the handles in that file won't exist locally, it adds the hits to random items in your local system instead.</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--</code></td><td align="left">Help</td></tr></tbody></table></div></div><br class="table-break"><p>Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from complete. After converting your old logs, please refer to Client Statistics (8.15) for spider removal operations.</p></div><div class="section" title="8.15.&nbsp;Client Statistics"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N168A2"></a>8.15.&nbsp;<a name="docbook-sys_admin.html-statistics"></a>Client Statistics</h2></div></div><div></div></div><div class="table"><a name="N168A9"></a><p class="title"><b>Table&nbsp;8.15.&nbsp;Client Statistics Command Table</b></p><div class="table-contents"><table summary="Client Statistics Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace stats-util</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.statistics.util.StatisticsClient</td></tr><tr><td align="left">Arguments (short and long forms):</td><td 
align="left">Description</td></tr><tr><td align="left"><code class="literal">-u</code> or <code class="literal">--update-spider-files</code></td><td align="left">Update Spider IP Files from the internet into [dspace]/config/spiders. Downloads Spider files identified in <code class="literal">dspace.cfg</code> under property</td></tr><tr><td align="left"><code class="literal">-f</code> or <code class="literal">--delete-spiders-by-flag</code></td><td align="left">Delete Spiders in Solr By isBot Flag. Will prune out all records that have <code class="literal">isBot:true</code></td></tr><tr><td align="left"><code class="literal">-i</code> or <code class="literal">--delete-spiders-by-ip</code></td><td align="left">Delete Spiders in Solr By IP Address. Will prune out all records that have IPs that match spider IPs.</td></tr><tr><td align="left"><code class="literal">-m</code> or <code class="literal">--mark-spiders</code></td><td align="left">Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files</td></tr><tr><td align="left"><code class="literal">-h</code> or <code class="literal">--help</code></td><td align="left">Calls up this brief help table at CLI.</td></tr></tbody></table></div></div><br class="table-break"><p>Notes:</p><p>How these options are used is up to you. If you want to keep spider entries in your repository, you can just mark them using "<code class="literal">-m</code>" and they will be excluded from statistics queries when "<code class="literal">solr.statistics.query.filter.isBot = true</code>" in <code class="literal">dspace.cfg</code>.</p><p>If you want to keep the spiders out of the Solr repository, just use the "<code class="literal">-i</code>" option and they will be removed immediately.</p><p>There are guards in place to control what can be defined as an IP range for a bot. In <code class="literal">[dspace]/config/spiders</code>, spider IP address ranges must be at least three subnet sections in length (e.g. 123.123.123), and IP ranges can only cover the smallest subnet [123.123.123.0 - 123.123.123.255]. If an entry does not meet these rules, loading that row will cause exceptions in the DSpace logs and that IP entry will be excluded.</p></div><div class="section" title="8.16.&nbsp;Test Database"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N1692C"></a>8.16.&nbsp;<a name="docbook-sys_admin.html-testDB"></a>Test Database</h2></div></div><div></div></div><p>This command can be used at any time to test database connectivity. It will assist in troubleshooting PostgreSQL and Oracle connection issues with the database.</p><div class="table"><a name="N16935"></a><p class="title"><b>Table&nbsp;8.16.&nbsp;Test Database Command Table</b></p><div class="table-contents"><table summary="Test Database Command Table" border="1" width="100%"><colgroup><col align="left"><col align="left"></colgroup><tbody><tr><td align="left">Command used:</td><td align="left"><span class="emphasis"><em><code class="literal">[dspace]</code></em></span><code class="literal">/bin/dspace test-database</code></td></tr><tr><td align="left">Java class:</td><td align="left">org.dspace.storage.rdbms.DatabaseManager</td></tr><tr><td align="left">Arguments (short and long forms):</td><td align="left">Description</td></tr><tr><td align="left"><code class="literal">-</code> or <code class="literal">--</code></td><td align="left">There are no arguments used at this time.</td></tr></tbody></table></div></div><br class="table-break"></div></div><HR><p class="copyright">Copyright &copy; 2002-2010
<a class="ulink" href="http://www.duraspace.org/" target="_top">DuraSpace</a>
</p><div class="legalnotice" title="Legal Notice"><a name="N1001D"></a><p>
<a class="ulink" href="http://creativecommons.org/licenses/by/3.0/us/" target="_top">
<span class="inlinemediaobject"><img src="http://i.creativecommons.org/l/by/3.0/us/88x31.png"></span>
</a>
</p><p>Licensed under a Creative Commons Attribution 3.0 United States License</p></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ch07.html">Prev</a>&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="ch09.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">Chapter&nbsp;7.&nbsp;DSpace System Documentation: Manakin [XMLUI] Configuration and Customization&nbsp;</td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%">&nbsp;Chapter&nbsp;9.&nbsp;DSpace System Documentation: Storage Layer</td></tr></table></div></body></html>