mirror of
https://github.com/DSpace/DSpace.git
synced 2025-10-07 01:54:22 +00:00
Version 1.7 for 1.7RC1 release.
git-svn-id: http://scm.dspace.org/svn/repo/dspace/trunk@5759 9c30dcfa-912a-0410-8fc2-9e0234be79fd
This commit is contained in:
643
dspace/docs/html/AipBackupRestore.html
Executable file
@@ -0,0 +1,643 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>DSpace Documentation : AipBackupRestore</title>
|
||||
<link rel="stylesheet" href="styles/site.css" type="text/css" />
|
||||
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<table class="pagecontent" border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#ffffff">
|
||||
<tr>
|
||||
<td valign="top" class="pagebody">
|
||||
<div class="pageheader">
|
||||
<span class="pagetitle">
|
||||
DSpace Documentation : AipBackupRestore
|
||||
</span>
|
||||
</div>
|
||||
<div class="pagesubheading">
|
||||
This page last changed on Nov 06, 2010 by <font color="#0050B2">jtrimble</font>.
|
||||
</div>
|
||||
|
||||
<h1><a name="AipBackupRestore-AIPBackup%26RestoreforDSpace"></a>AIP Backup & Restore for DSpace</h1>
|
||||
|
||||
<h2><a name="AipBackupRestore-Background%26Overview"></a>Background & Overview</h2>
|
||||
|
||||
<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/warning.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td>Additional background information available in the Open Repositories 2010 Presentation entitled <a href="http://www.slideshare.net/tdonohue/improving-dspace-backups-restores-migrations">Improving DSpace Backups, Restores & Migrations</a></td></tr></table></div>
|
||||
|
||||
<p>As of DSpace 1.7, DSpace can back up and restore all of its contents as a set of <a href="DSpaceAIPFormat.html" title="DSpaceAIPFormat">AIP Files</a>. This includes all Communities, Collections, Items, Groups and People in the system.</p>
|
||||
|
||||
<p>This feature came out of a requirement for DSpace to better integrate with DuraCloud (<a href="http://www.duracloud.org">http://www.duracloud.org</a>), and other backup storage systems. One of these requirements is to be able to essentially "backup" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.</p>
|
||||
|
||||
<p>Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (a METS-based, <a href="DSpaceAIPFormat.html" title="DSpaceAIPFormat">AIP format</a>). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in the same or different DSpace installation).</p>
|
||||
|
||||
<p><b>Benefits for the DSpace community:</b></p>
|
||||
<ul>
|
||||
<li>Allows folks to more easily move entire Communities or Collections between DSpace instances.</li>
|
||||
<li>Allows for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own local backup system), rather than relying on synchronizing a backup of your DB (metadata/relationships) and assetstore (bitstreams).</li>
|
||||
<li>Provides a way for people to more easily get their data out of DSpace (whatever the purpose may be).</li>
|
||||
<li>Provides a relatively standard format for people to migrate entire hierarchies (Communities/Collections) into DSpace (from another system).</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="AipBackupRestore-HowdoesthisworkhelpDSpaceinteractwithDuraCloud%3F"></a>How does this work help DSpace interact with DuraCloud?</h3>
|
||||
|
||||
<p>This work is entirely about <b>exporting</b> DSpace content objects to a location on a local filesystem. So, this work is not tied solely to DuraCloud, and could be used by any backup storage system to back up your DSpace contents.</p>
|
||||
|
||||
<p>In the initial DuraCloud work, the DuraCloud team is working on a way to "synchronize" DuraCloud with a local file folder. So, DuraCloud can be configured to "watch" a given folder and automatically replicate its contents into the cloud.</p>
|
||||
|
||||
<p>Therefore, moving content from DSpace to DuraCloud would currently be a two-step process:</p>
|
||||
<ol>
|
||||
<li>First, export AIPs describing that content from DSpace to a filesystem folder</li>
|
||||
<li>Second, enable DuraCloud to watch that same filesystem folder and replicate it into the cloud.</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>Similarly, moving content from DuraCloud back into DSpace would also be a two-step process:</p>
|
||||
<ol>
|
||||
<li>First, you'd tell DuraCloud to replicate the AIPs from the cloud to a folder on your file system</li>
|
||||
<li>Second, you'd ingest those AIPs back into DSpace</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>(These backup/restore processes may change as we go forward and investigate more use cases. This is just the initial plan.)</p>
|
||||
|
||||
<h2><a name="AipBackupRestore-MakeupandDefinitionofAIPs"></a>Makeup and Definition of AIPs</h2>
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPsareArchivalInformationPackages."></a>AIPs are Archival Information Packages.</h3>
|
||||
|
||||
<ul>
|
||||
<li>An AIP is a package describing one archival object.
|
||||
<ul>
|
||||
<li>An archival object may be an <b>Item</b>, <b>Collection</b>, <b>Community</b>, or <b>Site</b> (Site AIPs contain site-wide information). Bitstreams are included in an Item's AIP.</li>
|
||||
<li>Each AIP is logically self-contained and can be restored without the rest of the archive. (So you could restore a single Item, Collection or Community.)</li>
|
||||
<li>AIP profile favors completeness and accuracy rather than presenting the semantics of an object in a standard format. It conforms to the quirks of DSpace's internal object model rather than attempting to produce a universally understandable representation of the object. When possible, an AIP tries to use common standards to express objects.</li>
|
||||
<li>An AIP <em>can</em> serve as a DIP (Dissemination Information Package) or SIP (Submission Information Package), especially when transferring custody of objects to another DSpace implementation.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>In contrast to SIP or DIP, the AIP should include all available DSpace structural and administrative metadata, and basic provenance information. AIPs will also describe some basic system level information (e.g. Groups and People).</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPStructure%2FFormat"></a>AIP Structure / Format</h3>
|
||||
|
||||
<p>Generally speaking, an AIP is a Zip file containing a METS manifest and all related content bitstreams.</p>
|
||||
|
||||
<p>For more specific details of the AIP format / structure, along with examples, please see <a href="DSpaceAIPFormat.html" title="DSpaceAIPFormat">DSpaceAIPFormat</a>.</p>
|
||||
|
||||
<h2><a name="AipBackupRestore-RunningtheCode"></a>Running the Code</h2>
|
||||
|
||||
<h3><a name="AipBackupRestore-ExportingAIPs"></a>Exporting AIPs</h3>
|
||||
|
||||
<h4><a name="AipBackupRestore-ExportModes%26Options"></a>Export Modes & Options</h4>
|
||||
|
||||
<p>All AIP Exports are done by using the Dissemination Mode (<tt>-d</tt> option) of the <tt>packager</tt> command.</p>
|
||||
|
||||
<p>There are two types of AIP Dissemination you can perform:</p>
|
||||
<ul>
|
||||
<li><b>Single AIP</b> (default, using <tt>-d</tt> option) - Exports just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection, you'd just end up with a single Collection AIP (which would not include AIPs for all its child Items)</li>
|
||||
<li><b>Hierarchy of AIPs</b> (using the <tt>-d --all</tt> or <tt>-d -a</tt> option) - Exports the requested AIP describing an object, plus the AIP for all child objects. Some examples follow:
|
||||
<ul>
|
||||
<li>For a Site - this would export <b>all</b> Communities, Collections & Items within the site into AIP files (in a provided directory)</li>
|
||||
<li>For a Community - this would export that Community and all SubCommunities, Collections and Items into AIP files (in a provided directory)</li>
|
||||
<li>For a Collection - this would export that Collection and all contained Items into AIP files (in a provided directory)</li>
|
||||
<li>For an Item – this just exports the Item into an AIP as normal (as it already contains its Bitstreams/Bundles by default)</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
<h4><a name="AipBackupRestore-ExportingjustasingleAIP"></a>Exporting just a single AIP</h4>
|
||||
|
||||
<p>To export in single AIP mode (default), use this 'packager' command template:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -d -t AIP -e &lt;eperson&gt; -i &lt;handle&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>for example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -d -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". This will <b>not</b> include any child objects for Communities or Collections.</p>
|
||||
|
||||
|
||||
<h4><a name="AipBackupRestore-ExportingAIPHierarchy"></a>Exporting AIP Hierarchy</h4>
|
||||
|
||||
<p>To export an AIP hierarchy, use the <tt>-a</tt> (or <tt>--all</tt>) package parameter.</p>
|
||||
|
||||
<p>For example, use this 'packager' command template:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -d -a -t AIP -e &lt;eperson&gt; -i &lt;handle&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>for example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>The above command will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". In addition, it will export all child objects to the same directory as the "aip4567.zip" file. The child AIP files are all named using the following format:</p>
|
||||
<ul>
|
||||
<li>File Name Format: <tt>&lt;Obj-Type&gt;@&lt;Handle-with-dashes&gt;.zip</tt>
|
||||
<ul>
|
||||
<li>e.g. COMMUNITY@123456789-1.zip, COLLECTION@123456789-2.zip, ITEM@123456789-200.zip</li>
|
||||
<li>This general file naming convention ensures that you can easily locate an object to restore by its name (assuming you know its Object Type and Handle).</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Alternatively, if an object doesn't have a Handle, it uses this File Name Format: <tt>&lt;Obj-Type&gt;@internal-id-&lt;DSpace-ID&gt;.zip</tt> (e.g. ITEM@internal-id-234.zip)</li>
|
||||
</ul>
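<p>The naming convention above can be sketched as a small shell helper. This is only an illustration: the function name <tt>aip_filename</tt> is hypothetical, and it assumes that "Handle-with-dashes" simply means replacing the "/" in the Handle with "-".</p>

```shell
# Hypothetical helper: derive the expected AIP file name for an object.
# Assumption: only the "/" in the Handle is replaced by "-".
aip_filename() {
  obj_type=$1   # e.g. COMMUNITY, COLLECTION or ITEM
  handle=$2     # e.g. 123456789/200
  printf '%s@%s.zip\n' "$obj_type" "$(printf '%s' "$handle" | tr '/' '-')"
}
```

<p>For instance, calling <tt>aip_filename ITEM 123456789/200</tt> yields the same name as the Item example in the list above, which may be handy when locating a single object to restore.</p>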
|
||||
|
||||
|
||||
<h5><a name="AipBackupRestore-ExportingEntireSite"></a>Exporting Entire Site</h5>
|
||||
|
||||
<p>To export an entire DSpace Site, pass the packager the Handle <tt>&lt;site-handle-prefix&gt;/0</tt>. For example, if your site prefix is "4321", you'd run a command similar to the following:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/0 sitewide-aip.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for <b>all</b> Communities, Collections and Items into the same directory as the Site AIP.</p>
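<p>For a scripted backup, the site-wide export could be wrapped roughly as follows. This is a minimal sketch, not part of DSpace: the function name <tt>site_export_cmd</tt> and its argument order are hypothetical, and the function only prints the command (using the options documented above) so it can be reviewed before it is run.</p>

```shell
# Hypothetical helper: print the site-wide AIP export command.
# Arguments: DSpace install dir, eperson email, handle prefix, backup dir.
# It only prints the command; nothing is executed.
site_export_cmd() {
  dspace_dir=$1
  eperson=$2
  prefix=$3
  backup_dir=$4
  printf '%s/bin/dspace packager -d -a -t AIP -e %s -i %s/0 %s/sitewide-aip.zip\n' \
    "$dspace_dir" "$eperson" "$prefix" "$backup_dir"
}
```

<p>Calling <tt>site_export_cmd /dspace admin@myu.edu 4321 /backups/aip</tt> prints the full command; piping that output to <tt>sh</tt> would run it. The install path and backup directory shown here are placeholders.</p>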
|
||||
|
||||
<h3><a name="AipBackupRestore-Ingesting%2FRestoringAIPs"></a>Ingesting / Restoring AIPs</h3>
|
||||
|
||||
<h4><a name="AipBackupRestore-IngestionModes%26Options"></a>Ingestion Modes & Options</h4>
|
||||
|
||||
<p>Ingestion of AIPs is a bit more complex than Dissemination, as there are several different "modes" available:</p>
|
||||
<ol>
|
||||
<li>Submit/Ingest Mode (<tt>-s</tt> option, default) – submit AIP(s) to DSpace in order to create new object(s) (i.e. the AIP is treated like a SIP – Submission Information Package)</li>
|
||||
<li>Restore Mode (<tt>-r</tt> option) – restore pre-existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "submit", where the object is created with a known Handle and known relationships.</li>
|
||||
<li>Replace Mode (<tt>-r -f</tt> option) – replace existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "restore" where the contents of existing object(s) is replaced by the contents in the AIP(s). By default, if a normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which object already exists.</li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>Again, like export, there are two types of AIP Ingestion you can perform (using any of the above modes):</p>
|
||||
<ul>
|
||||
<li><b>Single AIP</b> (default) - Ingests just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection AIP, you'd just create a DSpace Collection from the AIP (but not ingest any of its child objects)</li>
|
||||
<li><b>Hierarchy of AIPs</b> (by including the <tt>--all</tt> or <tt>-a</tt> option after the mode) - Ingests the requested AIP describing an object, plus the AIP for all child objects. Some examples follow:
|
||||
<ul>
|
||||
<li>For a Site - this would ingest <b>all</b> Communities, Collections & Items based on the located AIP files</li>
|
||||
<li>For a Community - this would ingest that Community and all SubCommunities, Collections and Items based on the located AIP files</li>
|
||||
<li>For a Collection - this would ingest that Collection and all contained Items based on the located AIP files</li>
|
||||
<li>For an Item – this just ingests the Item (including all Bitstreams &amp; Bundles) based on the AIP file.</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h5><a name="AipBackupRestore-Thedifferencebetween%22Submit%22and%22Restore%2FReplace%22modes"></a>The difference between "Submit" and "Restore/Replace" modes</h5>
|
||||
|
||||
<p>It's worth understanding the primary differences between a Submission (specified by <tt>-s</tt> parameter) and a Restore (specified by <tt>-r</tt> parameter).</p>
|
||||
|
||||
<ul>
|
||||
<li><b>Submission Mode</b> (<tt>-s</tt> mode) - creates a new object (AIP is treated like a SIP)
|
||||
<ul>
|
||||
<li>By default, a new Handle is always assigned
|
||||
<ul>
|
||||
<li>However, you can force it to use the handle specified in the AIP by specifying <tt>-o ignoreHandle=false</tt> as one of your parameters</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>By default, a new Parent object <b>must</b> be specified (using the <tt>-p</tt> parameter). This is the location where the new object will be created.
|
||||
<ul>
|
||||
<li>However, you can force it to use the parent object specified in the AIP by specifying <tt>-o ignoreParent=false</tt> as one of your parameters</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>By default, a submission will respect a Collection's Workflow process when you submit an Item to a Collection
|
||||
<ul>
|
||||
<li>However, you can specifically <em>skip</em> any workflow approval processes by specifying <tt>-w</tt> parameter.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><b>Always</b> adds a new Deposit License to Items</li>
|
||||
<li><b>Always</b> adds new DSpace System metadata to Items (includes new 'dc.date.accessioned', 'dc.date.available', 'dc.date.issued' and 'dc.description.provenance' entries)</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<ul>
|
||||
<li><b>Restore / Replace Mode</b> (<tt>-r</tt> mode) - restores a previously existing object (as if from a backup)
|
||||
<ul>
|
||||
<li>By default, the Handle specified in the AIP is restored
|
||||
<ul>
|
||||
<li>However, for restores, you can force a new handle to be generated by specifying <tt>-o ignoreHandle=true</tt> as one of your parameters. (NOTE: Doesn't work for <em>replace</em> mode as the new object always retains the handle of the replaced object)</li>
|
||||
<li><img class="emoticon" src="images/icons/emoticons/information.gif" height="16" width="16" align="absmiddle" alt="" border="0"/> Although a Restore/Replace does restore Handles, it will not necessarily restore the same internal IDs in your Database.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>By default, the object is restored under the Parent specified in the AIP
|
||||
<ul>
|
||||
<li>However, for restores, you can force it to restore under a different parent object by using the <tt>-p</tt> parameter. (NOTE: Doesn't work for <em>replace</em> mode, as the new object always retains the parent of the replaced object)</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><b>Always</b> skips any Collection workflow approval processes when restoring/replacing an Item in a Collection</li>
|
||||
<li><b>Never</b> adds a new Deposit License to Items (rather it restores the previous deposit license, as long as it is stored in the AIP)</li>
|
||||
<li><b>Never</b> adds new DSpace System metadata to Items (rather it just restores the metadata as specified in the AIP)</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
<h4><a name="AipBackupRestore-SubmittingAIP%28s%29tocreateanewobject"></a>Submitting AIP(s) to create a new object</h4>
|
||||
|
||||
<h5><a name="AipBackupRestore-SubmittingaSingleAIP"></a>Submitting a Single AIP</h5>
|
||||
|
||||
<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/warning.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>AIPs treated as SIPs</b><br />This option allows you to essentially use an AIP as a SIP (Submission Information Package). The default settings will create a new DSpace object (with a new handle and a new parent object, if specified) from your AIP.</td></tr></table></div>
|
||||
|
||||
<p>To ingest a single AIP and create a new DSpace object under a parent of your choice, specify the <tt>-p</tt> (or <tt>--parent</tt>) package parameter to the command. Also, note that you are running the <tt>packager</tt> in <tt>-s</tt> (submit) mode.</p>
|
||||
|
||||
<p><em>NOTE:</em> This only ingests the single AIP specified. It does <b>not</b> ingest any child objects.</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -s -t AIP -e &lt;eperson&gt; -p &lt;parent-handle&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<p>If you leave out the <tt>-p</tt> parameter, the AIP package ingester will attempt to install the AIP under the same parent it had before. As you are also specifying the <tt>-s</tt> (submit) parameter, the <tt>packager</tt> will assume you want a new Handle to be assigned (as you are effectively specifying that you are submitting a <b>new</b> object). If you want the object to retain the Handle specified in the AIP, you can specify the <tt>-o ignoreHandle=false</tt> option to force the packager to <em>not</em> ignore the Handle specified in the AIP.</p>
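<p>As an illustration of the paragraph above, a submission that keeps the Handle stored in the AIP might be assembled like this. The function name <tt>submit_keep_handle_cmd</tt> and its argument order are hypothetical; only the packager options already described in this section are used, and the command is printed rather than executed.</p>

```shell
# Hypothetical helper: print a submit-mode command that retains the
# Handle stored in the AIP (via -o ignoreHandle=false).
# Arguments: DSpace install dir, eperson email, parent handle, AIP file.
submit_keep_handle_cmd() {
  dspace_dir=$1
  eperson=$2
  parent=$3
  aip_file=$4
  printf '%s/bin/dspace packager -s -t AIP -e %s -p %s -o ignoreHandle=false %s\n' \
    "$dspace_dir" "$eperson" "$parent" "$aip_file"
}
```

<p>For example, <tt>submit_keep_handle_cmd /dspace admin@myu.edu 4321/12 aip4567.zip</tt> prints a command that submits "aip4567.zip" under parent "4321/12" without assigning a new Handle.</p>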
|
||||
|
||||
|
||||
<h5><a name="AipBackupRestore-SubmittinganAIPHierarchy"></a>Submitting an AIP Hierarchy</h5>
|
||||
|
||||
<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/warning.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>AIPs treated as SIPs</b><br />This option allows you to essentially use a set of AIPs as SIPs (Submission Information Packages). The default settings will create a new DSpace object (with a new handle and a new parent object, if specified) from each AIP.</td></tr></table></div>
|
||||
|
||||
<p>To ingest an AIP hierarchy from a directory of AIPs, use the <tt>-a</tt> (or <tt>--all</tt>) package parameter.</p>
|
||||
|
||||
<p>For example, use this 'packager' command template:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -s -a -t AIP -e &lt;eperson&gt; -p &lt;parent-handle&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>for example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/12 aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>The above command will ingest the package named "aip4567.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object is assigned a new Handle (since <tt>-s</tt> is specified). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (a new Handle is also assigned for each child AIP).</p>
|
||||
|
||||
<p>Another example – <b>Ingesting a Top-Level Community</b> (by using the Site Handle, <tt>&lt;site-handle-prefix&gt;/0</tt>):</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/0 community-aip.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>The above command will ingest the package named "community-aip.zip" as a <b>top-level community</b> (i.e. the specified parent is "4321/0" which is a Site Handle). Again, the resulting object is assigned a new Handle. In addition, any child AIPs referenced by "community-aip.zip" are also recursively ingested (a new Handle is also assigned for each child AIP).</p>
|
||||
|
||||
<h4><a name="AipBackupRestore-Restoring%2FReplacingusingAIP%28s%29"></a>Restoring/Replacing using AIP(s)</h4>
|
||||
|
||||
<p><b>Restoring</b> is slightly different than just <b>submitting</b>. When restoring, we make every attempt to restore the object as it <b>used to be</b> (including its handle, parent object, etc.).</p>
|
||||
|
||||
<p>There are currently three restore modes:</p>
|
||||
<ol>
|
||||
<li>Default Restore Mode (<tt>-r</tt>) = Attempt to restore object (and optionally children). Roll back all changes if any object is found to already exist.</li>
|
||||
<li>Restore, Keep Existing Mode (<tt>-r -k</tt>) = Attempt to restore object (and optionally children). If an object is found to already exist, skip over it (and all its child objects), and continue to restore all other non-existing objects.</li>
|
||||
<li>Force Replace Mode (<tt>-r -f</tt>) = Restore an object (and optionally children) and <b>overwrite</b> any existing objects in DSpace. Therefore, if an object is found to already exist in DSpace, its contents are replaced by the contents of the AIP. <em>WARNING: This mode is potentially dangerous as it will permanently destroy any object contents that do not currently exist in the AIP. You may want to perform a secondary backup, unless you are sure you know what you are doing!</em></li>
|
||||
</ol>
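<p>The three restore modes above differ only in the flags passed to the packager. A script that lets an operator pick a mode by name could map names to flags as sketched below; the function name <tt>restore_flags</tt> and the mode names "default", "keep" and "replace" are hypothetical labels chosen for this example.</p>

```shell
# Hypothetical helper: map a restore mode name to its packager flags.
# The mode names are illustrative labels, not DSpace terminology.
restore_flags() {
  case "$1" in
    default) echo "-r" ;;      # roll back if any object already exists
    keep)    echo "-r -k" ;;   # skip objects that already exist
    replace) echo "-r -f" ;;   # overwrite existing objects (destructive)
    *)
      echo "unknown restore mode: $1" >&2
      return 1
      ;;
  esac
}
```

<p>Rejecting unknown names keeps a typo from silently falling through to a destructive mode.</p>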
|
||||
|
||||
|
||||
<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/information.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>Restoring a Single AIP</b><br />All of the below examples show how to restore an entire hierarchy of objects (using <tt>-a</tt> option). To restore a single object, you can use the same commands, but remove the <tt>-a</tt> option.</td></tr></table></div>
|
||||
|
||||
<h5><a name="AipBackupRestore-DefaultRestoreMode"></a>Default Restore Mode</h5>
|
||||
|
||||
<p>By default, the restore mode (<tt>-r</tt> option) will roll back all changes if any object is found to already exist. The user will be informed which object already exists within their DSpace installation.</p>
|
||||
|
||||
<p>Use this 'packager' command template:</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -t AIP -e &lt;eperson&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>For example:</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -t AIP -e admin@myu.edu aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<p><em>Notice that unlike</em> <tt><em>-s</em></tt> <em>option (for submission/ingesting), the</em> <tt><em>-r</em></tt> <em>option does not require the Parent Object (</em><tt><em>-p</em></tt> <em>option) to be specified if it can be determined from the package itself.</em></p>
|
||||
|
||||
<p>In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the <tt>-a</tt> option specifies to also restore all child AIPs). They are also restored with the Handles &amp; Parent Objects provided with their package. If any object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace).</p>
|
||||
|
||||
<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/warning.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>Highly Recommended to Update Database Sequences after a Large Restore</b><br />In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is <b>highly recommended to always</b> re-run the "update-sequences.sql" script on your DSpace database after a large-scale restore. This database script can be run while the system is online (i.e. no need to stop Tomcat or PostgreSQL). The script can be found in the following locations for PostgreSQL and Oracle, respectively:<br/>
|
||||
<tt>[dspace]/etc/postgres/update-sequences.sql</tt><br/>
|
||||
<tt>[dspace]/etc/oracle/update-sequences.sql</tt></td></tr></table></div>
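<p>On PostgreSQL, the sequence update described in the note above could be run with the standard <tt>psql</tt> client, roughly as sketched here. The function name <tt>update_sequences_cmd</tt> is hypothetical, and the database name and database user are assumptions that depend on your installation; the function only prints the command.</p>

```shell
# Hypothetical helper: print a psql command that runs the PostgreSQL
# update-sequences.sql script shipped with DSpace.
# Arguments: DSpace install dir, database user, database name
# (user and name are installation-specific assumptions).
update_sequences_cmd() {
  dspace_dir=$1
  db_user=$2
  db_name=$3
  printf 'psql -U %s -d %s -f %s/etc/postgres/update-sequences.sql\n' \
    "$db_user" "$db_name" "$dspace_dir"
}
```

<p>For example, <tt>update_sequences_cmd /dspace dspace dspace</tt> prints a command for an installation whose database and database user are both named "dspace".</p>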
|
||||
|
||||
<h5><a name="AipBackupRestore-Restore%2CKeepExistingMode"></a>Restore, Keep Existing Mode</h5>
|
||||
|
||||
<p>When the "Keep Existing" flag (<tt>-k</tt> option) is specified, the restore will attempt to skip over any objects found to already exist. It will report to the user that the object was found to exist (and was not modified or changed). It will then continue to restore all objects which do not already exist.</p>
|
||||
|
||||
<p>One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore items to an existing Collection.</p>
|
||||
|
||||
<p>Use this 'packager' command template:</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -k -t AIP -e &lt;eperson&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>For example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -k -t AIP -e admin@myu.edu aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<p>In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the <tt>-a</tt> option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored.</p>
|
||||
|
||||
<h5><a name="AipBackupRestore-ForceReplaceMode"></a>Force Replace Mode</h5>
|
||||
|
||||
<p>When the "Force Replace" flag (<tt>-f</tt> option) is specified, the restore will <b>overwrite</b> any objects found to already exist in DSpace. In other words, existing content is deleted and then replaced by the contents of the AIP(s).</p>
|
||||
|
||||
<div class='panelMacro'><table class='warningMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/forbidden.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>Potential for Data Loss</b><br />Because this mode actually <b>destroys</b> existing content in DSpace, it is potentially dangerous and may result in data loss! You may wish to perform a secondary full backup (assetstore files & database) before attempting to replace any existing object(s) in DSpace.</td></tr></table></div>
|
||||
|
||||
<p>Use this 'packager' command template:</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -f -t AIP -e &lt;eperson&gt; &lt;file-path&gt;
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>For example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -f -t AIP -e admin@myu.edu aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<p>In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested. They are also restored with the Handles & Parent Objects provided with their package. <em>If any object is found to already exist, its contents are replaced by the contents of the appropriate AIP.</em></p>
|
||||
|
||||
<p>If any error occurs, the script attempts to roll back the entire replacement process.</p>
|
||||
|
||||
<h5><a name="AipBackupRestore-RestoringEntireSite"></a>Restoring Entire Site</h5>
|
||||
|
||||
<p>In order to restore an entire Site from a set of AIPs, you must do the following:</p>
|
||||
<ol>
|
||||
<li>Install a completely "fresh" version of DSpace by following the <a href="Installation.html" title="Installation">Installation instructions in the DSpace Manual</a>
|
||||
<ul>
|
||||
<li>At this point, you should have a completely empty, but fully-functional DSpace installation. You will need to create an initial Administrator user in order to perform this restore (as a full-restore can only be performed by a DSpace Administrator).</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Once DSpace is installed, run the following command to restore all its contents from AIPs:
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -f -t AIP -e &lt;eperson&gt; -i &lt;site-handle-prefix&gt;/0 /full/path/to/your/site-aip.zip
|
||||
</pre>
|
||||
</div></div></li>
|
||||
</ol>
|
||||
|
||||
|
||||
<p>Please note the following about the above restore command:</p>
|
||||
<ul>
|
||||
<li>Notice that you are running this command in "Force Replace" mode (<tt>-r -f</tt>). This is necessary as your empty DSpace install will already include a few default groups (Administrators and Anonymous) and your initial administrative user. You need to replace these groups in order to restore your prior DSpace contents completely.</li>
|
||||
<li><tt><eperson></tt> should be replaced with the Email Address of the initial Administrator (who you created when you reinstalled DSpace).</li>
|
||||
<li><tt><site-handle-prefix></tt> should be replaced with your DSpace site's assigned Handle Prefix. This is equivalent to the <tt>handle.prefix</tt> setting in your <tt>dspace.cfg</tt></li>
|
||||
<li><tt>/full/path/to/your/site-aip.zip</tt> is the full path to the AIP file which represents your DSpace SITE. This file will be named whatever you named it when you actually <a href="#AipBackupRestore-ExportingEntireSite">exported your entire site</a>. All other AIPs are assumed to be referenced from this SITE AIP (in most cases, they should be in the same directory as that SITE AIP).</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<div class='panelMacro'><table class='noteMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/warning.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>Highly Recommended to Update Database Sequences after a Large Restore</b><br />In some cases, when you restore a large amount of content to your DSpace, the internal database counters (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is <b>highly recommended to always</b> re-run the "update-sequences.sql" script on your DSpace database after a large-scale restore. This database script can be run while the system is online (i.e. no need to stop Tomcat or PostgreSQL). The script can be found in the following locations for PostgreSQL and Oracle, respectively:<br/>
|
||||
<tt>[dspace]/etc/postgres/update-sequences.sql</tt><br/>
|
||||
<tt>[dspace]/etc/oracle/update-sequences.sql</tt></td></tr></table></div>
|
||||
|
||||
<h2><a name="AipBackupRestore-AdditionalPackagerOptions"></a>Additional Packager Options</h2>
|
||||
|
||||
<p>In addition to the various "modes" settings described under "<a href="#AipBackupRestore-RunningtheCode">Running the Code</a>" above, the AIP Packager supports the following packager options. These options allow you to better tweak how your AIPs are processed (especially during ingests/restores/replaces).</p>
|
||||
|
||||
<div class='table-wrap'>
|
||||
<table class='confluenceTable'><tbody>
|
||||
<tr>
|
||||
<th class='confluenceTh'> Option </th>
|
||||
<th class='confluenceTh'> Ingest or Export </th>
|
||||
<th class='confluenceTh'> Default Value </th>
|
||||
<th class='confluenceTh'> Description </th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>createMetadataFields</tt> </td>
|
||||
<td class='confluenceTd'> ingest-only </td>
|
||||
<td class='confluenceTd'> true </td>
|
||||
<td class='confluenceTd'> Tells the AIP ingester to automatically create any metadata fields which are found to be <b>missing</b> from the DSpace Metadata Registry. When 'true', this means as each AIP is ingested, new fields may be added to the DSpace Metadata Registry if they don't already exist. When 'false', an AIP ingest will fail if it encounters a metadata field that doesn't exist in the DSpace Metadata Registry. (NOTE: This will <b>not</b> create missing DSpace Metadata <em>Schemas</em>. If a schema is found to be missing, the ingest will always fail.) </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>ignoreHandle</tt> </td>
|
||||
<td class='confluenceTd'> ingest-only </td>
|
||||
<td class='confluenceTd'> Restore/Replace Mode defaults to 'false', <br class="atl-forced-newline" />
|
||||
Submit Mode defaults to 'true' </td>
|
||||
<td class='confluenceTd'> If 'true', the AIP ingester will ignore any Handle specified in the AIP itself, and instead create a new Handle during the ingest process (this is the default when running in Submit mode, using the <tt>-s</tt> flag). If 'false', the AIP ingester attempts to restore the Handles specified in the AIP (this is the default when running in Restore/replace mode, using the <tt>-r</tt> flag). </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>ignoreParent</tt> </td>
|
||||
<td class='confluenceTd'> ingest-only </td>
|
||||
<td class='confluenceTd'> Restore/Replace Mode defaults to 'false', <br class="atl-forced-newline" />
|
||||
Submit Mode defaults to 'true' </td>
|
||||
<td class='confluenceTd'> If 'true', the AIP ingester will ignore any Parent object specified in the AIP itself, and instead ingest under a new Parent object (this is the default when running in Submit mode, using the <tt>-s</tt> flag). The new Parent object must be specified via the <tt>-p</tt> flag (run <tt>dspace packager -h</tt> for more help). If 'false', the AIP ingester attempts to restore the object directly under its old Parent (this is the default when running in Restore/replace mode, using the <tt>-r</tt> flag). </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>includeBundles</tt> </td>
|
||||
<td class='confluenceTd'> export-only </td>
|
||||
<td class='confluenceTd'> defaults to "all" </td>
|
||||
<td class='confluenceTd'> This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. <em>WARNING: any bundles</em> <b><em>not</em></b> <em>included in AIPs cannot be restored.</em> This option expects a comma-separated list of bundle names (e.g. "ORIGINAL,LICENSE,CC_LICENSE,METADATA"), or "all" if all bundles should be included. <br class="atl-forced-newline" />
|
||||
(NOTE: If you choose to no longer export LICENSE or CC_LICENSE bundles, you will also need to disable the License Dissemination Crosswalks in the <tt>aip.disseminate.rightsMD</tt> configuration for the changes to take effect) </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>manifestOnly</tt> </td>
|
||||
<td class='confluenceTd'> both </td>
|
||||
<td class='confluenceTd'> false </td>
|
||||
<td class='confluenceTd'> If 'true', the AIP Disseminator will export an AIP which only consists of the METS Manifest file (i.e. result will be a single 'mets.xml' file). This METS Manifest contains URI references to all content files, but does <em>not</em> contain any content files. <b>This option is experimental, and should never be set to 'true' if you want to be able to restore content files.</b> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>passwords</tt> </td>
|
||||
<td class='confluenceTd'> export-only </td>
|
||||
<td class='confluenceTd'> false </td>
|
||||
<td class='confluenceTd'> If 'true' (and the 'DSPACE-ROLES' crosswalk is enabled, see <a href="#AipBackupRestore-AIPMetadataDisseminationConfigurations">AIP Metadata Dissemination Configurations</a>), then the AIP Disseminator will export user password hashes (i.e. encrypted passwords) into the Site AIP's METS Manifest. This allows you to restore users' passwords from the Site AIP. If 'false', then user password hashes are not stored in the Site AIP, and passwords cannot be restored at a later time. </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>unauthorized</tt> </td>
|
||||
<td class='confluenceTd'> export-only </td>
|
||||
<td class='confluenceTd'> <em>unspecified</em> </td>
|
||||
<td class='confluenceTd'> If 'skip', the AIP Disseminator will skip over any unauthorized Bundle or Bitstream encountered (i.e. it will not be added to the AIP). If 'zero', the AIP Disseminator will add a Zero-length "placeholder" file to the AIP when it encounters an unauthorized Bitstream. If unspecified (the default value), the AIP Disseminator will throw an error if an unauthorized Bundle or Bitstream is encountered. </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>updatedAfter</tt> </td>
|
||||
<td class='confluenceTd'> export-only </td>
|
||||
<td class='confluenceTd'> <em>unspecified</em> </td>
|
||||
<td class='confluenceTd'> This option works as a basic form of "incremental backup". This option requires that an <a href="http://en.wikipedia.org/wiki/ISO_8601">ISO-8601 date</a> is specified. When specified, the AIP Disseminator will only export Item AIPs which have a last-modified date <b>after</b> the specified ISO-8601 date. This option has no effect on the export of Site, Community or Collection AIPs as DSpace does not record a last-modified date for Sites, Communities or Collections. Therefore, when this option is specified, the AIP Disseminator will export the Site AIP, all Community AIPs, all Collection AIPs, and only Item AIPs modified after that date and time. </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class='confluenceTd'> <tt>validate</tt> </td>
|
||||
<td class='confluenceTd'> ingest-only </td>
|
||||
<td class='confluenceTd'> false </td>
|
||||
<td class='confluenceTd'> If 'true', the AIP Ingester will attempt to validate every METS file within every AIP before ingesting it into DSpace (this will cause the ingestion processing to take longer, but tips on speeding it up can be found in the "<a href="#AipBackupRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">AIP Configurations To Improve Ingestion Speed while Validating</a>" section below). If 'false', it will ingest without any validation (this will process much faster, but may be less secure as you are not verifying each METS file before processing it) </td>
|
||||
</tr>
|
||||
</tbody></table>
|
||||
</div>
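<p>As an illustration of the <tt>updatedAfter</tt> option described in the table above, the following is a minimal sketch of how a wrapper script might compute an ISO-8601 cutoff for a weekly incremental export. The class and method names are hypothetical (they are not part of DSpace), the sketch relies on the <tt>java.time</tt> API (Java 8 or later), and it assumes that the packager accepts a UTC timestamp of the form <tt>2010-10-30T19:27:00Z</tt>; verify the accepted date format against your DSpace version before relying on it.</p>

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper (not part of DSpace) that builds an updatedAfter option value. */
final class UpdatedAfterOption {

    /**
     * Returns "updatedAfter=<timestamp>" where the timestamp lies `age` before `now`.
     * Assumption: a UTC ISO-8601 timestamp with second precision is acceptable.
     */
    static String forAge(Instant now, Duration age) {
        // Instant.toString() emits ISO-8601 in UTC; truncating drops fractional seconds.
        Instant cutoff = now.minus(age).truncatedTo(ChronoUnit.SECONDS);
        return "updatedAfter=" + cutoff;
    }

    public static void main(String[] args) {
        // Value to pass after the packager's -o flag when exporting
        // only Items modified during the last 7 days.
        System.out.println(forAge(Instant.now(), Duration.ofDays(7)));
    }
}
```

<p>The printed value would be passed to the packager after the <tt>-o</tt> flag. Remember that Site, Community and Collection AIPs are exported regardless of the cutoff.</p>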
|
||||
|
||||
|
||||
<h3><a name="AipBackupRestore-Howtousetheseoptions"></a>How to use these options</h3>
|
||||
|
||||
<p>These options can be passed in two main ways:</p>
|
||||
|
||||
<p><b>From the Command Line</b></p>
|
||||
|
||||
<p>From the command-line, you can add the option to your command by using the <tt>-o</tt> or <tt>--option</tt> parameter.</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -t AIP -o [option1-value] -o [option2-value] -e admin@myu.edu aip4567.zip
|
||||
</pre>
|
||||
</div></div>
|
||||
<p>For example:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java"> [dspace]/bin/dspace packager -r -a -t AIP -o ignoreParent=<span class="code-keyword">false</span> -o createMetadataFields=<span class="code-keyword">false</span> -e admin@myu.edu aip4567.zip
|
||||
</pre>
|
||||
</div></div>
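<p>If you launch the packager from your own tooling, it can help to assemble the <tt>-o name=value</tt> pairs programmatically rather than by string concatenation. The sketch below is one possible way to do that; <tt>PackagerArgs</tt> and <tt>restoreArgs</tt> are hypothetical names (not part of DSpace), and only the flags already shown in the example above are used.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of DSpace) that assembles packager arguments. */
final class PackagerArgs {

    /** Builds the argument list for a recursive AIP restore with extra -o options. */
    static List<String> restoreArgs(String eperson, String aipPath, Map<String, String> options) {
        List<String> args = new ArrayList<>();
        args.add("packager");
        args.add("-r");           // restore mode
        args.add("-a");           // also process all child AIPs
        args.add("-t");
        args.add("AIP");
        // Each packager option becomes its own "-o name=value" pair.
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add("-o");
            args.add(option.getKey() + "=" + option.getValue());
        }
        args.add("-e");
        args.add(eperson);
        args.add(aipPath);
        return args;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("ignoreParent", "false");
        options.put("createMetadataFields", "false");
        System.out.println(String.join(" ", restoreArgs("admin@myu.edu", "aip4567.zip", options)));
    }
}
```

<p>The resulting list could be appended to the path of the <tt>[dspace]/bin/dspace</tt> launcher and handed to <tt>java.lang.ProcessBuilder</tt>, which avoids shell quoting problems in option values.</p>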
|
||||
|
||||
<p><b>Via the Java API call</b></p>
|
||||
|
||||
<p>If you are programmatically calling the <tt>org.dspace.content.packager.DSpaceAIPIngester</tt> from your own custom script, you can specify these options via the <tt>org.dspace.content.packager.PackageParameters</tt> class.</p>
|
||||
|
||||
<p>As a basic example:</p>
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java">PackageParameters params = <span class="code-keyword">new</span> PackageParameters();
|
||||
params.addProperty(<span class="code-quote">"createMetadataFields"</span>, <span class="code-quote">"<span class="code-keyword">false</span>"</span>);
|
||||
params.addProperty(<span class="code-quote">"ignoreParent"</span>, <span class="code-quote">"<span class="code-keyword">true</span>"</span>);
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<h2><a name="AipBackupRestore-Configurationin%27dspace.cfg%27"></a>Configuration in 'dspace.cfg'</h2>
|
||||
|
||||
<p>The following new configurations relate to AIPs:</p>
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPMetadataDisseminationConfigurations"></a>AIP Metadata Dissemination Configurations</h3>
|
||||
|
||||
<p>The following configurations allow you to specify what metadata is stored within each METS-based AIP. In 'dspace.cfg', the general format for each of these settings is:</p>
|
||||
|
||||
<ul>
|
||||
<li><tt>aip.disseminate.<setting> = <mdType>:<DSpace-crosswalk-name> [, ...]</tt>
|
||||
<ul>
|
||||
<li><setting> is the setting name (see below for the full list of valid settings)</li>
|
||||
<li><mdType> is optional. It allows you to specify the value of the @MDTYPE or @OTHERMDTYPE attribute in the corresponding METS element.</li>
|
||||
<li><DSpace-crosswalk-name> is required. It specifies the name of the DSpace Crosswalk which should be used to generate this metadata.</li>
|
||||
<li>Zero or more <tt><mdType>:<DSpace-crosswalk-name></tt> entries may be specified for each setting, separated by commas.</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
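<p>To make the value format above concrete, here is a minimal sketch of how such a setting could be split into its entries. It is an illustration only and not DSpace's own parsing code; the <tt>DisseminateSetting</tt> class is hypothetical, and an omitted <tt><mdType></tt> is simply represented as <tt>null</tt>.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser (not DSpace code) for an aip.disseminate.* value. */
final class DisseminateSetting {

    /** One "<mdType>:<DSpace-crosswalk-name>" entry; mdType is null when omitted. */
    static final class Entry {
        final String mdType;
        final String crosswalk;

        Entry(String mdType, String crosswalk) {
            this.mdType = mdType;
            this.crosswalk = crosswalk;
        }
    }

    /** Splits a comma-separated setting value into its entries. */
    static List<Entry> parse(String value) {
        List<Entry> entries = new ArrayList<>();
        for (String part : value.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue; // an empty value yields zero entries
            }
            int colon = item.indexOf(':');
            if (colon < 0) {
                entries.add(new Entry(null, item)); // crosswalk name only
            } else {
                entries.add(new Entry(item.substring(0, colon).trim(),
                                      item.substring(colon + 1).trim()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : parse("DSpaceDepositLicense:DSPACE_DEPLICENSE, METSRights")) {
            System.out.println(e.mdType + " -> " + e.crosswalk);
        }
    }
}
```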
|
||||
|
||||
|
||||
<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/information.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>AIP Metadata Recommendations</b><br />It is recommended to <b>minimally</b> use the default settings when generating AIPs. DSpace can only restore information that is included within an AIP. Therefore, if you choose to no longer include some information in an AIP, DSpace will no longer be able to restore that information from an AIP backup</td></tr></table></div>
|
||||
|
||||
<p>The default settings in 'dspace.cfg' are:</p>
|
||||
|
||||
<ul>
|
||||
<li><tt>aip.disseminate.techMD</tt> - Lists the DSpace Crosswalks (by name) which should be called to populate the <tt><techMD></tt> section of the METS file within the AIP (Default: <tt>PREMIS, DSPACE-ROLES</tt>)
|
||||
<ul>
|
||||
<li>The <tt>PREMIS</tt> crosswalk generates PREMIS metadata for the object specified by the AIP</li>
|
||||
<li>The <tt>DSPACE-ROLES</tt> crosswalk exports DSpace Group / EPerson information into AIPs in a DSpace-specific XML format. Using this crosswalk means that AIPs can be used to recreate Groups & People within the system. (NOTE: The <tt>DSPACE-ROLES</tt> crosswalk should be used alongside the <tt>METSRights</tt> crosswalk if you also wish to restore the <em>permissions</em> that Groups/People have within the System. See below for more info on the <tt>METSRights</tt> crosswalk.)</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><tt>aip.disseminate.sourceMD</tt> - Lists the DSpace Crosswalks (by name) which should be called to populate the <tt><sourceMD></tt> section of the METS file within the AIP (Default: <tt>AIP-TECHMD</tt>)
|
||||
<ul>
|
||||
<li>The AIP-TECHMD Crosswalk generates technical metadata (in DIM format) for the object specified by the AIP</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><tt>aip.disseminate.digiprovMD</tt> - Lists the DSpace Crosswalks (by name) which should be called to populate the <tt><digiprovMD></tt> section of the METS file within the AIP (Default: <em>None</em>)</li>
|
||||
<li><tt>aip.disseminate.rightsMD</tt> - Lists the DSpace Crosswalks (by name) which should be called to populate the <tt><rightsMD></tt> section of the METS file within the AIP (Default: <tt>DSpaceDepositLicense:DSPACE_DEPLICENSE, CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:DSPACE_CCTEXT, METSRights</tt>)
|
||||
<ul>
|
||||
<li>The <tt>DSPACE_DEPLICENSE</tt> crosswalk ensures the DSpace Deposit License is referenced/stored in AIP</li>
|
||||
<li>The <tt>DSPACE_CCRDF</tt> crosswalk ensures any Creative Commons RDF Licenses are referenced/stored in AIP</li>
|
||||
<li>The <tt>DSPACE_CCTEXT</tt> crosswalk ensures any Creative Commons Textual Licenses are referenced/stored in AIP</li>
|
||||
<li>The <tt>METSRights</tt> crosswalk ensures that Permissions/Rights on DSpace Objects (Communities, Collections, Items or Bitstreams) are referenced/stored in AIP. Using this crosswalk means that AIPs can be used to restore permissions that a particular Group or Person had on a DSpace Object. (NOTE: The <tt>METSRights</tt> crosswalk should always be used in conjunction with the <tt>DSPACE-ROLES</tt> crosswalk (see above) or a similar crosswalk. The <tt>METSRights</tt> crosswalk can only restore permissions, and cannot re-create Groups or EPeople in the system. The <tt>DSPACE-ROLES</tt> crosswalk re-creates the Groups or EPeople as needed.)</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><tt>aip.disseminate.dmd</tt> - Lists the DSpace Crosswalks (by name) which should be called to populate the <tt><dmdSec></tt> section of the METS file within the AIP (Default: MODS, DIM)
|
||||
<ul>
|
||||
<li>The MODS crosswalk translates the DSpace descriptive metadata (for this object) into MODS. As MODS is a relatively "standard" metadata schema, it may be useful to include a copy of MODS metadata in your AIPs if you should ever want to import them into another (non-DSpace) system.</li>
|
||||
<li>The DIM crosswalk just translates the DSpace internal descriptive metadata into an XML format. This XML format is proprietary to DSpace, but stores the metadata in a format similar to Qualified Dublin Core.</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPIngestionMetadataCrosswalkConfigurations"></a>AIP Ingestion Metadata Crosswalk Configurations</h3>
|
||||
|
||||
<p>The following configurations allow you to specify what DSpace Crosswalks are used during the ingestion/restoration of AIPs. These configurations also allow you to ignore areas of the METS file (in the AIP) if you do not want that area to be restored.</p>
|
||||
|
||||
<p>In <tt>dspace.cfg</tt>, the general format for each of these settings is:</p>
|
||||
|
||||
<ul>
|
||||
<li><tt>mets.dspaceAIP.ingest.crosswalk.<mdType> = <DSpace-crosswalk-name></tt>
|
||||
<ul>
|
||||
<li><mdType> is the type of metadata as specified in the METS file. This corresponds to the value of the @MDTYPE attribute (of that metadata section in the METS). When the @MDTYPE attribute is "OTHER", then the <mdType> corresponds to the @OTHERMDTYPE attribute value.</li>
|
||||
<li><DSpace-crosswalk-name> specifies the name of the DSpace Crosswalk which should be used to ingest this metadata into DSpace. You can specify the "NULLSTREAM" crosswalk if you specifically want this metadata to be ignored (and skipped over during ingestion).</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>By default, the settings in <tt>dspace.cfg</tt> are:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java">mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM
|
||||
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM
|
||||
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM
|
||||
</pre>
|
||||
</div></div>
|
||||
|
||||
<p>The above settings tell the ingester to <b>ignore</b> any metadata sections which reference DSpace Deposit Licenses or Creative Commons Licenses. These metadata sections can be safely ignored as long as the "LICENSE" and "CC_LICENSE" bundles are included in AIPs (which is the default setting). As the Licenses are included in those Bundles, they will already be restored when restoring the bundle contents.</p>
|
||||
|
||||
<div class='panelMacro'><table class='infoMacro'><colgroup><col width='24'><col></colgroup><tr><td valign='top'><img src="images/icons/emoticons/information.gif" width="16" height="16" align="absmiddle" alt="" border="0"></td><td><b>More Info on Default Crosswalks used</b><br />If unspecified in the above settings, the AIP ingester will automatically use the Crosswalk which is named the same as the @MDTYPE or @OTHERMDTYPE attribute for the metadata section. For example, a metadata section with an @MDTYPE="PREMIS" will be processed by the DSpace Crosswalk named "PREMIS".</td></tr></table></div>
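<p>The lookup rule described above (use the configured crosswalk if there is one, otherwise fall back to the crosswalk named after the metadata type) can be sketched as follows. This is an illustration of the rule only, not DSpace's actual implementation; the <tt>IngestCrosswalkResolver</tt> class is hypothetical, and the configuration is modelled as a plain <tt>java.util.Properties</tt> object.</p>

```java
import java.util.Properties;

/** Hypothetical illustration (not DSpace code) of the ingest crosswalk lookup rule. */
final class IngestCrosswalkResolver {

    static final String PREFIX = "mets.dspaceAIP.ingest.crosswalk.";

    /** Returns the configured crosswalk for mdType, or mdType itself if none is configured. */
    static String resolve(Properties config, String mdType) {
        return config.getProperty(PREFIX + mdType, mdType).trim();
    }

    /** True when the metadata section is mapped to NULLSTREAM, i.e. skipped on ingest. */
    static boolean isIgnored(Properties config, String mdType) {
        return "NULLSTREAM".equals(resolve(config, mdType));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(PREFIX + "DSpaceDepositLicense", "NULLSTREAM");
        System.out.println(resolve(config, "PREMIS"));                 // falls back to the mdType
        System.out.println(isIgnored(config, "DSpaceDepositLicense")); // explicitly ignored
    }
}
```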
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPIngestionEPersonConfigurations"></a>AIP Ingestion EPerson Configurations</h3>
|
||||
|
||||
<p>The following setting determines whether the AIP Ingester should create an EPerson (if necessary) when attempting to restore or ingest an Item whose Submitter cannot be located in the system. By default it is set to "false", as for AIPs the creation of EPeople (and Groups) is generally handled by the <tt>DSPACE-ROLES</tt> crosswalk (see <a href="#AipBackupRestore-AIPMetadataDisseminationConfigurations">AIP Metadata Dissemination Configurations</a> for more info on <tt>DSPACE-ROLES</tt> crosswalk.)</p>
|
||||
|
||||
<ul>
|
||||
<li><tt>mets.dspaceAIP.ingest.createSubmitter = false</tt></li>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3><a name="AipBackupRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating"></a>AIP Configurations To Improve Ingestion Speed while Validating</h3>
|
||||
|
||||
<p>It is recommended to validate all AIPs on ingestion (when possible). But validation can be extremely slow, as each validation request must first download all referenced Schema documents from various locations on the web (validating a single METS file may require downloading as many as 10 schemas). To make matters worse, the same schema is re-downloaded each time it is used (i.e. it is not cached locally). So, if you are validating just 20 METS files which each reference 10 schemas, that results in 200 download requests.</p>
|
||||
|
||||
<p>In order to speed up validation, you can pull down a local copy of <b>all</b> schemas. Validation will then use this local cache, which can sometimes make it up to 10 times faster.</p>
|
||||
|
||||
<p>To use a local cache of XML schemas when validating, use the following settings in 'dspace.cfg'. The general format is:</p>
|
||||
|
||||
<ul>
|
||||
<li><tt>mets.xsd.<abbreviation> = <namespace> <local-file-name></tt>
|
||||
<ul>
|
||||
<li><tt><abbreviation></tt> is a unique abbreviation (of your choice) for this schema</li>
|
||||
<li><tt><namespace></tt> is the Schema namespace</li>
|
||||
<li><tt><local-file-name></tt> is the full name of the cached schema file (which should reside in your <tt>[dspace]/config/schemas/</tt> directory; by default this directory does not exist, so you will need to create it)</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<p>The default settings are all commented out, but they provide a full listing of all schemas currently used during validation of AIPs. In order to utilize them, uncomment the settings, download the appropriate schema file, and save it to your <tt>[dspace]/config/schemas/</tt> directory (by default this directory does not exist, so you will need to create it) using the specified file name:</p>
|
||||
|
||||
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
|
||||
<pre class="code-java">#mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd
#mets.xsd.xlink = http://www.w3.org/1999/xlink xlink.xsd
#mets.xsd.mods = http://www.loc.gov/mods/v3 mods.xsd
#mets.xsd.xml = http://www.w3.org/XML/1998/namespace xml.xsd
#mets.xsd.dc = http://purl.org/dc/elements/1.1/ dc.xsd
#mets.xsd.dcterms = http://purl.org/dc/terms/ dcterms.xsd
#mets.xsd.premis = http://www.loc.gov/standards/premis PREMIS.xsd
#mets.xsd.premisObject = http://www.loc.gov/standards/premis PREMIS-Object.xsd
#mets.xsd.premisEvent = http://www.loc.gov/standards/premis PREMIS-Event.xsd
#mets.xsd.premisAgent = http://www.loc.gov/standards/premis PREMIS-Agent.xsd
#mets.xsd.premisRights = http://www.loc.gov/standards/premis PREMIS-Rights.xsd
|
||||
</pre>
|
||||
</div></div>
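<p>Each <tt>mets.xsd.*</tt> value above pairs a schema namespace with a local file name. As a rough sketch of how such a value maps to a file in the schema cache directory, consider the following. The <tt>LocalSchema</tt> class is hypothetical (it is not DSpace code), and the directory passed in is assumed to be your <tt>[dspace]/config/schemas/</tt> directory.</p>

```java
import java.io.File;

/** Hypothetical holder (not DSpace code) for one parsed mets.xsd.* setting. */
final class LocalSchema {

    final String namespace;
    final File file;

    private LocalSchema(String namespace, File file) {
        this.namespace = namespace;
        this.file = file;
    }

    /** Parses "<namespace> <local-file-name>" and resolves the file inside schemaDir. */
    static LocalSchema parse(String value, File schemaDir) {
        // The two parts are separated by whitespace.
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<namespace> <local-file-name>': " + value);
        }
        return new LocalSchema(parts[0], new File(schemaDir, parts[1]));
    }

    public static void main(String[] args) {
        // The directory path here is only an example location.
        LocalSchema mets = parse("http://www.loc.gov/METS/ mets.xsd", new File("/dspace/config/schemas"));
        System.out.println(mets.namespace + " is cached at " + mets.file.getPath());
    }
}
```

<p>A check such as <tt>file.isFile()</tt> on each parsed entry is a quick way to confirm that every uncommented setting has a matching downloaded schema before you run a validating ingest.</p>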
|
||||
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
<table border="0" cellpadding="0" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td height="12" background="https://wiki.duraspace.org/images/border/border_bottom.gif"><img src="images/border/spacer.gif" width="1" height="1" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td align="center"><font color="grey">Document generated by Confluence on Nov 06, 2010 19:27</font></td>
|
||||
</tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|