diff --git a/dspace/docs/html/AIP Backup and Restore.html b/dspace/docs/html/AIP Backup and Restore.html new file mode 100644 index 0000000000..1ce9be7980 --- /dev/null +++ b/dspace/docs/html/AIP Backup and Restore.html @@ -0,0 +1,860 @@ + + +
+
+
+
+ DSpace Documentation : AIP Backup and Restore
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ AIP Backup & Restore for DSpace+ +
+
+
+
Background & Overview+ +
As of DSpace 1.7, DSpace can back up and restore all of its contents as a set of AIP Files. This includes all Communities, Collections, Items, Groups and People in the system. + +This feature came out of a requirement for DSpace to better integrate with DuraCloud and other backup storage systems. One of these requirements is to be able to "back up" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time. + +Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (a METS-based AIP format). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in the same or a different DSpace installation). + +Benefits for the DSpace community: +
How does this differ from traditional DSpace Backups? Which Backup route is better?+ +Traditionally, it has always been recommended to back up and restore DSpace's database and files (also known as the "assetstore") separately. This is described in more detail in the Storage Layer section of the DSpace System Documentation. The traditional backup and restore route is still a recommended and supported option. + +However, the new AIP Backup & Restore option seeks to resolve many of the complexities of a traditional backup and restore. The table below details some of the differences between these two valid Backup and Restore options. + +
+
+
+
+
Based on your local institution's needs, you will want to choose the backup & restore process which is most appropriate for you. You may also find it beneficial to use both types of backups on different schedules, in order to minimize the likelihood of losing your DSpace installation settings or its contents. For example, you may choose to perform a Traditional Backup once per week (to back up your local system configurations and customizations) and an AIP Backup on a daily basis. Alternatively, you may choose to perform daily Traditional Backups and only use the AIP Backup as a "permanent archives" option (perhaps performed on a weekly or monthly basis). + +
How does this work help DSpace interact with DuraCloud?+ +This work is entirely about exporting DSpace content objects to a location on a local filesystem. So, this work is not tied to DuraCloud, and could be used by any backup storage system to back up your DSpace contents. + +In the initial DuraCloud work, the DuraCloud team is working on a way to "synchronize" DuraCloud with a local file folder, so that DuraCloud can be configured to "watch" a given folder and automatically replicate its contents into the cloud. + +Therefore, moving content from DSpace to DuraCloud would currently be a two-step process: +
Similarly, moving content from DuraCloud back into DSpace would also be a two-step process: +
(These backup/restore processes may change as we go forward and investigate more use cases. This is just the initial plan.) + +Makeup and Definition of AIPs+ +AIPs are Archival Information Packages.+ +
AIP Structure / Format+ +Generally speaking, an AIP is a Zip file containing a METS manifest and all related content bitstreams. + +For more specific details of the AIP format / structure, along with examples, please see DSpace AIP Format. + +Running the Code+ +Exporting AIPs+ +Export Modes & Options+ +All AIP Exports are done by using the Dissemination Mode (-d option) of the packager command. + +There are two types of AIP Dissemination you can perform: +
Exporting just a single AIP+ +To export in single AIP mode (default), use this 'packager' command template: + +
+ [dspace]/bin/dspace packager -d -t AIP -e <eperson> -i <handle> <file-path> ++ for example: + +
+ [dspace]/bin/dspace packager -d -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip ++ The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". This will not include any child objects for Communities or Collections. + + +Exporting AIP Hierarchy+ +To export an AIP hierarchy, use the -a (or --all) package parameter. + +For example, use this 'packager' command template: + +
+ [dspace]/bin/dspace packager -d -a -t AIP -e <eperson> -i <handle> <file-path> ++ for example: + +
+ [dspace]/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip ++ The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". In addition, it will export all child objects to the same directory as the "aip4567.zip" file. The child AIP files are all named using the following format: +
AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This means that in-progress, uncompleted submissions are not described in AIPs and cannot be restored after a disaster. + +Exporting Entire Site+ +To export an entire DSpace Site, pass the packager the Handle <site-handle-prefix>/0. For example, if your site prefix is "4321", you'd run a command similar to the following: + +
+ [dspace]/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/0 sitewide-aip.zip ++ Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for all Communities, Collections and Items into the same directory as the Site AIP. + +Ingesting / Restoring AIPs+ +Ingestion Modes & Options+ +Ingestion of AIPs is a bit more complex than Dissemination, as there are several different "modes" available: +
Again, like export, there are two types of AIP Ingestion you can perform (using any of the above modes): +
The difference between "Submit" and "Restore/Replace" modes+ +It's worth understanding the primary differences between a Submission (specified by -s parameter) and a Restore (specified by -r parameter). + +
Submitting AIP(s) to create a new object+ +Submitting a Single AIP+ +
To ingest a single AIP and create a new DSpace object under a parent of your choice, specify the -p (or --parent) package parameter to the command. Also, note that you are running the packager in -s (submit) mode. + +NOTE: This only ingests the single AIP specified. It does not ingest any child objects. + +
+ [dspace]/bin/dspace packager -s -t AIP -e <eperson> -p <parent-handle> <file-path> ++ If you leave out the -p parameter, the AIP package ingester will attempt to install the AIP under the same parent it had before. As you are also specifying the -s (submit) parameter, the packager will assume you want a new Handle to be assigned (as you are effectively specifying that you are submitting a new object). If you want the object to retain the Handle specified in the AIP, you can specify the -o ignoreHandle=false option to force the packager to not ignore the Handle specified in the AIP. + + +Submitting an AIP Hierarchy+ +
To ingest an AIP hierarchy from a directory of AIPs, use the -a (or --all) package parameter. + +For example, use this 'packager' command template: + +
+ [dspace]/bin/dspace packager -s -a -t AIP -e <eperson> -p <parent-handle> <file-path> ++ for example: + +
+ [dspace]/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/12 aip4567.zip ++ The above command will ingest the package named "aip4567.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object is assigned a new Handle (since -s is specified). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (a new Handle is also assigned for each child AIP). + +Another example – Ingesting a Top-Level Community (by using the Site Handle, <site-handle-prefix>/0): +
+ [dspace]/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/0 community-aip.zip ++ The above command will ingest the package named "community-aip.zip" as a top-level community (i.e. the specified parent is "4321/0" which is a Site Handle). Again, the resulting object is assigned a new Handle. In addition, any child AIPs referenced by "community-aip.zip" are also recursively ingested (a new Handle is also assigned for each child AIP). + +Restoring/Replacing using AIP(s)+ +Restoring is slightly different from just submitting. When restoring, we make every attempt to restore the object as it used to be (including its handle, parent object, etc.). + +There are currently three restore modes: +
Default Restore Mode+ +By default, the restore mode (-r option) will throw an error and roll back all changes if any object is found to already exist. The user will be informed of which object already exists within their DSpace installation. + +Restore a Single AIP: Use this 'packager' command template to restore a single object from an AIP (not including any child objects): +
+ [dspace]/bin/dspace packager -r -t AIP -e <eperson> <AIP-file-path> ++ Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along with all child objects (from their AIPs): +
+ [dspace]/bin/dspace packager -r -a -t AIP -e <eperson> <AIP-file-path> ++ For example: +
+ [dspace]/bin/dspace packager -r -a -t AIP -e admin@myu.edu aip4567.zip ++ Notice that, unlike the -s option (for submission/ingesting), the -r option does not require the Parent Object (-p option) to be specified if it can be determined from the package itself. + +In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (the -a option specifies that all child AIPs should also be restored). They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace). + +
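To make the all-or-nothing behavior of the default restore mode concrete, here is a minimal Java sketch. It is not DSpace code: the Aip record and the in-memory repository map are hypothetical stand-ins, and where the real packager rolls back its changes, this sketch reaches the same outcome by checking every object before restoring any of them. It also shows why -r does not need -p: each AIP carries its own Handle and parent Handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DefaultRestoreSketch {
    /** Hypothetical stand-in for one AIP: its own Handle, its parent's Handle, and its child AIPs. */
    record Aip(String handle, String parentHandle, List<Aip> children) {}

    /** Hypothetical repository contents: Handle -> parent Handle. */
    final Map<String, String> repository = new HashMap<>();

    /** Restores the whole hierarchy (as with -r -a), or nothing at all if any object already exists. */
    void restoreAll(Aip root) {
        List<Aip> pending = new ArrayList<>();
        collect(root, pending);
        // First pass: refuse to change anything if even one object already exists
        for (Aip aip : pending) {
            if (repository.containsKey(aip.handle())) {
                throw new IllegalStateException("Object already exists: " + aip.handle());
            }
        }
        // Second pass: Handle and parent come from each package, so no -p is needed
        for (Aip aip : pending) {
            repository.put(aip.handle(), aip.parentHandle());
        }
    }

    /** Depth-first walk, so that each parent is restored before its children. */
    private static void collect(Aip aip, List<Aip> out) {
        out.add(aip);
        for (Aip child : aip.children()) {
            collect(child, out);
        }
    }
}
```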
Restore, Keep Existing Mode+ +When the "Keep Existing" flag (-k option) is specified, the restore will attempt to skip over any objects found to already exist. It will report to the user that the object was found to exist (and was not modified or changed). It will then continue to restore all objects which do not already exist. + +One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore items to an existing Collection. + +Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along with all child objects (from their AIPs): +
+ [dspace]/bin/dspace packager -r -a -k -t AIP -e <eperson> <AIP-file-path> ++ For example: + +
+ [dspace]/bin/dspace packager -r -a -k -t AIP -e admin@myu.edu aip4567.zip ++ In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the -a option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored. + +Force Replace Mode+ +When the "Force Replace" flag (-f option) is specified, the restore will overwrite any objects found to already exist in DSpace. In other words, existing content is deleted and then replaced by the contents of the AIP(s). + +
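The difference between the Keep Existing (-k) and Force Replace (-f) flags when a hierarchy is restored can be sketched as follows. Again, this is a hypothetical in-memory model rather than the packager's implementation: each Aip has a Handle, some content and children, and "replacing" is simplified to overwriting the stored content (the real -f mode deletes the existing object before restoring it). The key point is the early return for -k, which skips an existing object together with its whole subtree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreModesSketch {
    /** The two flags compared here: -k (keep existing) and -f (force replace). */
    enum Mode { KEEP_EXISTING, FORCE_REPLACE }

    /** Hypothetical stand-in for one AIP: its Handle, its content, and its child AIPs. */
    record Aip(String handle, String content, List<Aip> children) {}

    /** Hypothetical repository contents: Handle -> content. */
    final Map<String, String> repository = new HashMap<>();
    /** Handles reported to the user as "already exists, not modified". */
    final List<String> skipped = new ArrayList<>();

    /** Restores an AIP and (as with -a) all of its children. */
    void restore(Aip aip, Mode mode) {
        boolean exists = repository.containsKey(aip.handle());
        if (exists && mode == Mode.KEEP_EXISTING) {
            // -k: leave the existing object untouched and skip its entire subtree
            skipped.add(aip.handle());
            return;
        }
        // New object, or -f on an existing one: the AIP's content wins
        repository.put(aip.handle(), aip.content());
        for (Aip child : aip.children()) {
            restore(child, mode);
        }
    }
}
```

Because of that early return, an Item missing from an existing Collection is not restored in -k mode, which matches the special case noted under Restore, Keep Existing Mode.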
Replace using a Single AIP: Use this 'packager' command template to replace a single object from an AIP (not including any child objects): +
+ [dspace]/bin/dspace packager -r -f -t AIP -e <eperson> <AIP-file-path> ++ Replace using a Hierarchy of AIPs: Use this 'packager' command template to replace an object from an AIP along with all child objects (from their AIPs): +
+ [dspace]/bin/dspace packager -r -a -f -t AIP -e <eperson> <AIP-file-path> ++ For example: + +
+ [dspace]/bin/dspace packager -r -a -f -t AIP -e admin@myu.edu aip4567.zip ++ In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested. They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, its contents are replaced by the contents of the appropriate AIP. + +If any error occurs, the script attempts to roll back the entire replacement process. + +Restoring Entire Site+ +In order to restore an entire Site from a set of AIPs, you must do the following: +
Please note the following about the above restore command: +
Additional Packager Options+ +In addition to the various "mode" settings described under "Running the Code" above, the AIP Packager supports the following packager options. These options allow you to fine-tune how your AIPs are processed (especially during ingests/restores/replaces). + +
+
+
+
+
How to use these options+ +These options can be passed in two main ways: + +From the Command Line + +From the command-line, you can add the option to your command by using the -o or --option parameter. +
+ [dspace]/bin/dspace packager -r -a -t AIP -o [option1-value] -o [option2-value] -e admin@myu.edu aip4567.zip ++ For example: + +
+ [dspace]/bin/dspace packager -r -a -t AIP -o ignoreParent=false -o createMetadataFields=false -e admin@myu.edu aip4567.zip ++ Via the Java API call + +If you are programmatically calling the org.dspace.content.packager.DSpaceAIPIngester from your own custom script, you can specify these options via the org.dspace.content.packager.PackageParameters class. + +As a basic example: +
+ +import org.dspace.content.packager.PackageParameters; + +// Same effect as passing -o createMetadataFields=false -o ignoreParent=true on the command line +PackageParameters params = new PackageParameters(); +params.addProperty("createMetadataFields", "false"); +params.addProperty("ignoreParent", "true"); ++ Configuration in 'dspace.cfg'+ +The following new configurations relate to AIPs: + +AIP Metadata Dissemination Configurations+ +The following configurations allow you to specify what metadata is stored within each METS-based AIP. In 'dspace.cfg', the general format for each of these settings is: + +
The default settings in 'dspace.cfg' are: + +
AIP Ingestion Metadata Crosswalk Configurations+ +The following configurations allow you to specify what DSpace Crosswalks are used during the ingestion/restoration of AIPs. These configurations also allow you to ignore areas of the METS file (in the AIP) if you do not want that area to be restored. + +In dspace.cfg, the general format for each of these settings is: + +
By default, the settings in dspace.cfg are: + +
+ +mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM +mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM +mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM ++ The above settings tell the ingester to ignore any metadata sections which reference DSpace Deposit Licenses or Creative Commons Licenses. These metadata sections can be safely ignored as long as the "LICENSE" and "CC_LICENSE" bundles are included in AIPs (which is the default setting). As the Licenses are included in those Bundles, they will already be restored when restoring the bundle contents. + +
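The NULLSTREAM convention can be illustrated with plain java.util.Properties. This is only a sketch of how such a lookup could work, not how the AIP ingester or DSpace's own configuration classes read dspace.cfg; the class and helper names are hypothetical, and returning null for an unconfigured metadata type is an assumption made for the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class IngestCrosswalkConfigSketch {
    static final String PREFIX = "mets.dspaceAIP.ingest.crosswalk.";

    /** Returns the crosswalk configured for a metadata type, or null if none is configured. */
    static String crosswalkFor(Properties cfg, String mdType) {
        String value = cfg.getProperty(PREFIX + mdType);
        return value == null ? null : value.trim();
    }

    /** A metadata section is ignored when its type is mapped to NULLSTREAM. */
    static boolean isIgnored(Properties cfg, String mdType) {
        return "NULLSTREAM".equals(crosswalkFor(cfg, mdType));
    }

    public static void main(String[] args) throws IOException {
        // The default settings quoted above, loaded as a properties file
        Properties cfg = new Properties();
        cfg.load(new StringReader(String.join("\n",
                "mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM",
                "mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM",
                "mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM")));
        for (String type : new String[] {"DSpaceDepositLicense", "CreativeCommonsRDF", "DC"}) {
            System.out.println(type + " ignored? " + isIgnored(cfg, type));
        }
    }
}
```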
AIP Ingestion EPerson Configurations+ +The following setting determines whether the AIP Ingester should create an EPerson (if necessary) when attempting to restore or ingest an Item whose Submitter cannot be located in the system. By default it is set to "false", as for AIPs the creation of EPeople (and Groups) is generally handled by the DSPACE-ROLES crosswalk (see AIP Metadata Dissemination Configurations for more info on DSPACE-ROLES crosswalk.) + +
AIP Configurations To Improve Ingestion Speed while Validating+ +It is recommended to validate all AIPs on ingestion (when possible). However, validation can be extremely slow, as each validation request must first download all referenced schema documents from various locations on the web (sometimes as many as 10 schemas must be downloaded to validate a single METS file). To make matters worse, the same schema will be re-downloaded each time it is used (i.e. it is not cached locally). So, if you are validating just 20 METS files which each reference 10 schemas, that results in 200 download requests. + +In order to speed up validation, you can download a local copy of all schemas. Validation will then use this local cache, which can sometimes make it up to 10 times faster. + +To use a local cache of XML schemas when validating, use the following settings in 'dspace.cfg'. The general format is: + +
The default settings are all commented out. But, they provide a full listing of all schemas currently used during validation of AIPs. In order to utilize them, uncomment the settings, download the appropriate schema file, and save it to your [dspace]/config/schemas/ directory (by default this directory does not exist – you will need to create it) using the specified file name: + +
+ +#mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd +#mets.xsd.xlink = http://www.w3.org/1999/xlink xlink.xsd +#mets.xsd.mods = http://www.loc.gov/mods/v3 mods.xsd +#mets.xsd.xml = http://www.w3.org/XML/1998/namespace xml.xsd +#mets.xsd.dc = http://purl.org/dc/elements/1.1/ dc.xsd +#mets.xsd.dcterms = http://purl.org/dc/terms/ dcterms.xsd +#mets.xsd.premis = http://www.loc.gov/standards/premis PREMIS.xsd +#mets.xsd.premisObject = http://www.loc.gov/standards/premis PREMIS-Object.xsd +#mets.xsd.premisEvent = http://www.loc.gov/standards/premis PREMIS-Event.xsd +#mets.xsd.premisAgent = http://www.loc.gov/standards/premis PREMIS-Agent.xsd +#mets.xsd.premisRights = http://www.loc.gov/standards/premis PREMIS-Rights.xsd ++ Common Issues or Error Messages+ +The below table lists common fixes to issues you may encounter when backing up or restoring objects using AIP Backup and Restore. + +
+
+
+
+
+
|
+
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Appendices
+
+
+
+ This page last changed on Dec 29, 2009 by mdiggory.
+
+
+
+
+
+ |
+
+
+
+ DSpace Documentation : Application Layer
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ System Architecture: Application Layer+ +The following explains how the application layer is built and used. + +
+
+
+
Web User Interface+ +The DSpace Web UI is the largest and most-used component in the application layer. Built on Java Servlet and JavaServer Page technology, it allows end-users to access DSpace over the Web via their Web browsers. As of DSpace 1.3.2, the UI meets both the XHTML 1.0 standard and the Web Accessibility Initiative (WAI) level-2 standard. + +It also features an administration section, consisting of pages intended for use by central administrators. Presently, this part of the Web UI is not particularly sophisticated; users of the administration section need to know what they are doing! Selected parts of this may also be used by collection administrators. + +Web UI Files+ +The Web UI-related files are located in a variety of directories in the DSpace source tree. Note that as of DSpace version 1.5, the deployment has changed. The build system has moved to a Maven-based system that splits the various projects (JSPUI, XMLUI, etc.) into separate projects. The system still uses the familiar Ant tool to deploy the webapps in later stages. +
+
+
+
+
+
The Build Process+ +The DSpace Maven build process constructs a full DSpace installation template directory structure containing a series of web applications. The results are placed in [dspace-source]/dspace/target/dspace-[version]-build.dir/. The process works as follows: + +
In order to then install & deploy DSpace from this "installation template" folder, you must run the following from [dspace-source]/dspace/target/dspace-[version]-build.dir/ : + +
+ ant -Dconfig=[dspace]/config/dspace.cfg update+ Please see the Installation instructions for more details about the installation process. + + +Servlets and JSPs (JSPUI Only)+ +The JSPUI Web UI is loosely based on the MVC (model-view-controller) pattern. The content management API corresponds to the model, the Java Servlets are the controllers, and the JSPs are the views. Interactions take the following basic form: + +
All DSpace servlets are subclasses of the DSpaceServlet class. The DSpaceServlet class handles some basic operations such as creating a DSpace Context object (opening a database connection etc.), authentication and error handling. Instead of overriding the doGet and doPost methods as one normally would for a servlet, DSpace servlets implement doDSGet or doDSPost, which have an extra context parameter and allow the servlet to throw various exceptions that can be handled in a standard way. + +The DSpace servlet processes the contents of the HTTP request. This might involve retrieving the results of a search with a query term, accessing the current user's eperson record, or updating a submission in progress. According to the results of this processing, the servlet must decide which JSP should be displayed. The servlet then fills out the appropriate attributes in the request object that represents the HTTP request being processed, by invoking the setAttribute method of the javax.servlet.http.HttpServletRequest object that Tomcat passes into the servlet. The servlet then forwards control of the request to the appropriate JSP using the JSPManager.showJSP method. + +The JSPManager.showJSP method uses the standard Java servlet forwarding mechanism to forward the HTTP request to the JSP. The JSP is processed by Tomcat and the results are sent back to the user's browser. + +There is an exception to this servlet/JSP style: index.jsp, the 'home page', receives the HTTP request directly from Tomcat without a servlet being invoked first. This is because in the servlet 2.3 specification, there is no way to map a servlet to handle only requests made to '/'; such a mapping results in every request being directed to that servlet. By default, Tomcat forwards requests to '/' to index.jsp. 
To try to make things as clean as possible, index.jsp contains some simple code that would normally go in a servlet, and then forwards to home.jsp using the JSPManager.showJSP method. This means localized versions of the 'home page' can be created by placing a customized home.jsp in [dspace-source]/jsp/local, in the same manner as other JSPs. + +[dspace-source]/jsp/dspace-admin/index.jsp, the administration UI index page, is invoked directly by Tomcat and not through a servlet for similar reasons. + +At the top of each JSP file, right after the license and copyright header, the attributes that a servlet must fill out before forwarding to that JSP are documented. No validation is performed; if the servlet does not fill out the necessary attributes, it is likely that an internal server error will occur. + +Many JSPs containing forms will include hidden parameters that tell the servlets which form has been filled out. The submission UI servlet (SubmissionController) is a prime example of a servlet that deals with the input from many different JSPs. The step and page hidden parameters (written out by the SubmissionController.getSubmissionParameters() method) are used to inform the servlet which page of which step has just been filled out (i.e. which page of the submission the user has just completed). + +Below is a detailed, scary diagram depicting the flow of control during the whole process of processing and responding to an HTTP request. The authentication mechanism is described in more detail in the configuration section. + +Flow of Control During HTTP Request Processing + + +Custom JSP Tags (JSPUI Only)+ +The DSpace JSPs all use some custom tags defined in /dspace/jsp/WEB-INF/dspace-tags.tld, and the corresponding Java classes reside in org.dspace.app.webui.jsptag. The tags are listed below. The dspace-tags.tld file contains detailed comments about how to use the tags, so that information is not repeated here. + +
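Returning to the servlet/JSP flow described under Servlets and JSPs above: the division of work between DSpaceServlet and its subclasses is a template-method pattern. The minimal sketch below models it without the servlet API. Request, Context, BaseServlet and SearchServlet are hypothetical stand-ins, doDSGet returns the JSP path instead of forwarding to it, and the JSP paths are invented for illustration. The real classes differ; for example, the real doDSGet also receives the HTTP response.

```java
import java.util.HashMap;
import java.util.Map;

public class DSpaceServletPattern {
    /** Hypothetical stand-in for HttpServletRequest: parameters plus attributes for the JSP. */
    static class Request {
        final Map<String, String> params = new HashMap<>();
        final Map<String, Object> attributes = new HashMap<>();

        void setAttribute(String name, Object value) {
            attributes.put(name, value);
        }
    }

    /** Hypothetical stand-in for the DSpace Context (database connection, current user, ...). */
    static class Context implements AutoCloseable {
        boolean open = true;

        @Override
        public void close() {
            open = false;
        }
    }

    /** Plays the role of DSpaceServlet: shared setup and error handling around doDSGet. */
    abstract static class BaseServlet {
        /** Stands in for doGet; returns the path of the JSP to forward to. */
        final String doGet(Request request) {
            try (Context context = new Context()) {
                return doDSGet(context, request);
            } catch (Exception e) {
                // Exceptions from any subclass are handled in one standard way
                return "/error/internal.jsp";
            }
        }

        /** Subclasses implement this instead of doGet, with the extra Context parameter. */
        protected abstract String doDSGet(Context context, Request request) throws Exception;
    }

    /** Example controller: processes the request, sets attributes, chooses the view. */
    static class SearchServlet extends BaseServlet {
        @Override
        protected String doDSGet(Context context, Request request) {
            String query = request.params.getOrDefault("query", "");
            // Attributes the target JSP expects, as documented at the top of that JSP
            request.setAttribute("query", query);
            // A real DSpace servlet would now forward with JSPManager.showJSP
            return "/search/results.jsp";
        }
    }
}
```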
Internationalization (JSPUI Only)+ +
The JavaServer Pages Standard Tag Library (JSTL) v1.0 is used to specify messages in the JSPs like this: + +OLD: +
+ <H1>Search Results</H1>+ NEW: +
+ <H1><fmt:message key="jsp.search.results.title"/></H1>
+This message can now be changed using the config/language-packs/Messages.properties file. (This must be done at build-time: Messages.properties is placed in the dspace.war Web application file.) +
+ jsp.search.results.title = Search Results+ Phrases may have parameters to be passed in, to make the job of translating easier, reduce the number of 'keys' and to allow translators to make the translated text flow more appropriately for the target language. + +OLD: +
+ <P>Results <%= r.getFirst() %> to <%= r.getLast() %> of <%=r.getTotal() %></P>+ NEW: +
+ <fmt:message key="jsp.search.results.text">
+ <fmt:param><%= r.getFirst() %></fmt:param>
+ <fmt:param><%= r.getLast() %></fmt:param>
+ <fmt:param><%= r.getTotal() %></fmt:param>
+</fmt:message>
+(Note: JSTL 1.0 does not seem to allow JSP <%= %> expressions to be passed in as attribute values in <fmt:param value=""/>.) + +The above would appear in the Messages_xx.properties file as: +
+ jsp.search.results.text = Results {0}-{1} of {2}+ Introducing number parameters that should be formatted according to the locale used makes no difference in the message key compared to string parameters: +
+ jsp.submit.show-uploaded-file.size-in-bytes = {0} bytes+ In the JSP, this key can be used as follows: +
+ <fmt:message key="jsp.submit.show-uploaded-file.size-in-bytes">
+ <fmt:param><fmt:formatNumber><%= bitstream.getSize()%></fmt:formatNumber></fmt:param>
+</fmt:message>
+
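To check a parameterised message outside the web application, it helps to know that JSTL's fmt:message substitutes its fmt:param values into the {0}, {1}, ... placeholders using java.text.MessageFormat. The sketch below reuses the two patterns from the examples above; the class and helper names are made up, and pre-formatting the number with NumberFormat plays the role that fmt:formatNumber plays in the JSP.

```java
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class MessageKeyDemo {
    // Patterns as they appear in Messages.properties in the examples above
    static final String RESULTS = "Results {0}-{1} of {2}";
    static final String SIZE_IN_BYTES = "{0} bytes";

    /** Substitutes the parameters into a message pattern for the given locale. */
    static String format(String pattern, Locale locale, Object... params) {
        return new MessageFormat(pattern, locale).format(params);
    }

    public static void main(String[] args) {
        // String parameters are inserted verbatim, like the body of <fmt:param>
        System.out.println(format(RESULTS, Locale.ENGLISH, "1", "10", "42"));

        // Format the number for the user's locale first, as <fmt:formatNumber> does
        String size = NumberFormat.getInstance(Locale.GERMAN).format(1234567L);
        System.out.println(format(SIZE_IN_BYTES, Locale.GERMAN, size));
    }
}
```

One MessageFormat rule that often trips up translators: in a pattern that takes parameters, a single apostrophe starts a quoted section, so a literal apostrophe has to be written as two ('').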
+(Note: JSTL offers a way to include numbers in the message keys as jsp.foo.key = {0,number} bytes. Setting the parameter as <fmt:param value="${variable}" /> works when variable is a single variable name, but doesn't work when trying to use a method's return value instead (e.g. bitstream.getSize()). Passing the number as a string (or using the <%= %> expression) also does not work.) + +Multiple Messages.properties files can be created for different languages (see ResourceBundle.getBundle). For example, you can add German and Canadian French translations: +
+ Messages_de.properties +Messages_fr_CA.properties+ The end user's browser settings determine which language is used. The English language file Messages.properties (or the default server locale) will be used as a default if there's no language bundle for the end user's preferred language. (Note that the English file is not called Messages_en.properties – this is so it is always available as a default, regardless of server configuration.) + +The dspace:layout tag has been updated to allow dictionary keys to be passed in for the titles. It now has two new parameters: titlekey and parenttitlekey. So where before you'd do: +
+ <dspace:layout title="Here" + parentlink="/mydspace" + parenttitle="My DSpace"> ++ You now do: +
+ <dspace:layout titlekey="jsp.page.title" + parentlink="/mydspace" + parenttitlekey="jsp.mydspace"> + ++ The layout tag itself then looks up the relevant text in the dictionary. title and parenttitle still work as before, for backwards compatibility and for the occasional case where they are preferable. + +Message Key Convention+ +When translating further pages, please follow the convention for naming message keys to avoid clashes. + +For text in JSPs, use the complete path + filename of the JSP, then a one-word name for the message. For example, for the title of jsp/mydspace/main.jsp use: +
+ jsp.mydspace.main.title+ Some common words (e.g. "Help") can be brought out into keys starting jsp. for ease of translation, e.g.: +
+ jsp.admin = Administer+ Other common words/phrases are brought out into 'general' parameters if they relate to a set (directory) of JSPs, e.g. +
+ jsp.tools.general.delete = Delete+ Phrases that relate strongly to a topic (e.g. MyDSpace) but are used in many JSPs outside the particular directory are more conveniently cross-referenced. For example, one could use the key below in jsp/submit/saved.jsp to provide a link back to the user's MyDSpace: + +(Cross-referencing of keys is in general not a good idea, as it may make maintenance more difficult. But in some cases the advantages outweigh this, as the meaning is obvious.) +
+ jsp.mydspace.general.goto-mydspace = Go to My DSpace
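The JSP naming rules above are mechanical enough to write down as code. This small sketch is a hypothetical helper (it is not part of DSpace) that reproduces the example keys from this section; it covers only keys for individual JSPs and the "general" keys shared by a directory of JSPs.

```java
public class MessageKeyConvention {
    /** Key for a message in one JSP, e.g. "jsp/mydspace/main.jsp" + "title" -> "jsp.mydspace.main.title". */
    static String jspKey(String jspPath, String name) {
        String withoutExtension = jspPath.endsWith(".jsp")
                ? jspPath.substring(0, jspPath.length() - ".jsp".length())
                : jspPath;
        return withoutExtension.replace('/', '.') + "." + name;
    }

    /** Key shared by a directory of JSPs, e.g. "jsp/tools" + "delete" -> "jsp.tools.general.delete". */
    static String generalKey(String jspDirectory, String name) {
        return jspDirectory.replace('/', '.') + ".general." + name;
    }

    public static void main(String[] args) {
        System.out.println(jspKey("jsp/mydspace/main.jsp", "title"));
        System.out.println(generalKey("jsp/tools", "delete"));
        System.out.println(generalKey("jsp/mydspace", "goto-mydspace"));
    }
}
```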
+For text in servlet code, in custom JSP tags or wherever applicable use the fully qualified classname + a one-word name for the message. e.g. +
+ org.dspace.app.webui.jsptag.ItemListTag.title = Title+ Which Languages are currently supported?+ +To view translations currently being developed, please refer to the i18n page of the DSpace Wiki. + + + +HTML Content in Items+ +For the most part, the DSpace item display just gives a link that allows an end-user to download a bitstream. However, if a bundle has a primary bitstream whose format is of MIME type text/html, instead a link to the HTML servlet is given. + +So if we had an HTML document like this: +
+ contents.html +chapter1.html +chapter2.html +chapter3.html +figure1.gif +figure2.jpg +figure3.gif +figure4.jpg +figure5.gif +figure6.gif+ The Bundle's primary bitstream field would point to the contents.html Bitstream, which we know is HTML (check the format MIME type) and so we know which to serve up first. + +The HTML servlet employs a trick to serve up HTML documents without actually modifying the HTML or other files themselves. Say someone is looking at contents.html from the above example, the URL in their browser will look like this: +
+ https://dspace.mit.edu/html/1721.1/12345/contents.html
+If there's an image called figure1.gif in that HTML page, the browser will do HTTP GET on this URL: +
+ https://dspace.mit.edu/html/1721.1/12345/figure1.gif
The HTML document servlet can work out which item the user is looking at, and then which Bitstream in it is called figure1.gif, and serve up that bitstream. The same applies when following links to other HTML pages. Of course, all the links and image references have to be relative, not absolute. + +HTML documents must be "self-contained", as explained here. Provided that full path information is known by DSpace, any depth or complexity of HTML document can be served subject to those constraints. This is usually possible with some kind of batch import. If, however, the document has been uploaded one file at a time using the Web UI, the path information has been stripped. The system can cope with relative links that refer to a deeper path, e.g. +
+ <IMG SRC="images/figure1.gif">
+If the item has been uploaded via the Web submit UI, in the Bitstream table in the database we have the 'name' field, which will contain the filename with no path (figure1.gif). We can still work out what images/figure1.gif is by making the HTML document servlet strip any path that comes in from the URL, e.g. +
+ https://dspace.mit.edu/html/1721.1/12345/images/figure1.gif + ^^^^^^^ + Strip this+ BUT all the filenames (regardless of directory names) must be unique. For example, this wouldn't work: +
+ contents.html +chapter1.html +chapter2.html +chapter1_images/figure.gif +chapter2_images/figure.gif+ since the HTML document servlet wouldn't know which bitstream to serve up for: +
+ https://dspace.mit.edu/html/1721.1/12345/chapter1_images/figure.gif +https://dspace.mit.edu/html/1721.1/12345/chapter2_images/figure.gif+ since in both cases it would just have figure.gif to go on. + +To prevent "infinite URL spaces" from appearing (e.g. if a file foo.html linked to bar/foo.html, which would link to bar/bar/foo.html...), the number of leading path components the servlet will strip can be limited by setting the configuration property webui.html.max-depth-guess. + +For example, if we receive a request for foo/bar/index.html, and we have a bitstream called just index.html, we will serve up that bitstream for the request if webui.html.max-depth-guess is 2 or greater. If webui.html.max-depth-guess is 1 or less, we would not serve that bitstream, as the depth of the file is greater. If webui.html.max-depth-guess is zero, the request filename and path must always exactly match the bitstream name. The default value (if that property is not present in dspace.cfg) is 3. + + +Thesis Blocking+ +The submission UI has an optional feature that came about as a result of MIT Libraries policy. If the block.theses parameter in dspace.cfg is true, an extra checkbox is included in the first page of the submission UI. This asks the user if the submission is a thesis. If the user checks this box, the submission is halted (deleted) and an error message is displayed, explaining that DSpace should not be used to submit theses. This feature can be turned off and on, and the message displayed (/dspace/jsp/submit/no-theses.jsp) can be localized as necessary. + + + +OAI-PMH Data Provider+ +The DSpace platform supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0 as a data provider. This is accomplished using the OAICat framework from OCLC. + +The DSpace build process builds a Web application archive ([dspace-source]/build/oai.war) in much the same way as the Web UI build process described above.
The only differences are that the JSPs are not included, and [dspace-source]/etc/oai-web.xml is used as the deployment descriptor. This 'webapp' is deployed to receive and respond to OAI-PMH requests via HTTP. Note that typically it should not be deployed on SSL (https: protocol). In a typical configuration, this is deployed at oai, for example: +
+
+http://dspace.myu.edu/oai/request?verb=Identify
+
+The 'base URL' of this DSpace deployment would be: +
+
+http://dspace.myu.edu/oai/request
+
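Harvesters send every request to this single base URL and put everything else in the query string. The sketch below is a hypothetical helper class. OaiNaming and its methods are illustrative names and are not part of DSpace or OAICat. It builds the strings a harvester and DSpace exchange, and it assumes the conventions described later in this section:

* a collection's setSpec is its Handle with ':' and '/' replaced by underscores;
* OAI identifiers take the form oai:host name:handle;
* resumption tokens use the from/until/setSpec/offset layout.

The verb, metadataPrefix and set query parameters are standard OAI-PMH.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class OaiNaming {

    // setSpec for a collection: the Handle with ':' and '/' replaced by '_'
    static String setSpec(String handle) {
        return handle.replace(':', '_').replace('/', '_');
    }

    // OAI identifier of the form oai:host name:handle
    static String oaiIdentifier(String hostName, String handle) {
        return "oai:" + hostName + ":" + handle;
    }

    // A ListRecords request against the base URL, restricted to one set
    static String listRecordsUrl(String baseUrl, String setSpec) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=oai_dc&set="
                + URLEncoder.encode(setSpec, StandardCharsets.UTF_8);
    }

    // Resumption token in the from/until/setSpec/offset layout; missing values stay empty
    static String resumptionToken(String from, String until, String setSpec, int offset) {
        return String.join("/", orEmpty(from), orEmpty(until), orEmpty(setSpec),
                Integer.toString(offset));
    }

    // Reads the offset (number of records already sent) back out of a token
    static int offsetOf(String token) {
        String[] parts = token.split("/", -1); // -1 keeps empty fields such as a missing 'until'
        if (parts.length != 4) {
            throw new IllegalArgumentException("unexpected token: " + token);
        }
        return Integer.parseInt(parts[3]);
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        String spec = setSpec("hdl:1721.1/1234");
        System.out.println(spec);
        System.out.println(oaiIdentifier("dspace.myu.edu", "123456789/345"));
        System.out.println(listRecordsUrl("http://dspace.myu.edu/oai/request", spec));
        String token = resumptionToken("2003-01-01", null, spec, 300);
        System.out.println(token + " -> offset " + offsetOf(token));
    }
}
```

In this sketch, '/' separates the token fields, so converting '/' in the Handle to '_' also keeps the setSpec intact when the token is split again.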
+It is this URL that should be registered with www.openarchives.org. Note that you can easily change the 'request' portion of the URL by editing [dspace-source]/etc/oai-web.xml and rebuilding and deploying oai.war. + +DSpace provides implementations of the OAICat interfaces AbstractCatalog, RecordFactory and Crosswalk that interface with the DSpace content management API and harvesting API (in the search subsystem). + +Only the basic oai_dc unqualified Dublin Core metadata set export is enabled by default; this is particularly easy since all items have qualified Dublin Core metadata. When this metadata is harvested, the qualifiers are simply stripped; for example, description.abstract is exposed as unqualified description. The description.provenance field is hidden, as this contains private information about the submitter and workflow reviewers of the item, including their e-mail addresses. Additionally, to keep in line with OAI community practices, values of contributor.author are exposed as creator values. + +Other metadata formats are supported as well, using other Crosswalk implementations; consult the oaicat.properties file described below. To enable a format, simply uncomment the lines beginning with Crosswalks.*. Multiple formats are allowed, and the current list includes, in addition to unqualified DC: MPEG DIDL, METS, MODS. There is also an incomplete, experimental qualified DC. + +Note that the current simple DC implementation (org.dspace.app.oai.OAIDCCrosswalk) does not strip out any invalid XML characters that may be lying around in the data. If your database contains a DC value with, for example, some ASCII control codes (form feed etc.), this may cause problems for OAI harvesters. This should rarely occur, however. Characters with special meaning in XML (such as >) are escaped as entities (e.g. > becomes &gt;). + +In addition to the implementations of the OAICat interfaces, there is one main configuration file relevant to OAI-PMH support: +
Sets+ +OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. A record can be in zero or more sets. + +DSpace exposes collections as sets. The organization of communities is likely to change over time, and is therefore a less stable basis for selective harvesting. + +Each collection has a corresponding OAI set, discoverable by harvesters via the ListSets verb. The setSpec is the Handle of the collection, with the ':' and '/' converted to underscores so that the Handle is a legal setSpec, for example: +
+ +hdl_1721.1_1234 ++ Naturally enough, the collection name is also the name of the corresponding set. + + +Unique Identifier+ +Every item in an OAI-PMH data repository must have a unique identifier, which must conform to the URI syntax. As of DSpace 1.2, Handles are not used as OAI identifiers; this is because in OAI-PMH, the OAI identifier identifies the metadata record associated with the resource. The resource is the DSpace item, whose resource identifier is the Handle. In practical terms, using the Handle for the OAI identifier may cause problems in the future if DSpace instances share items with the same Handles; the OAI metadata record identifiers should be different, as the different DSpace instances would need to be harvested separately and may have different metadata for the item. + +The OAI identifiers that DSpace uses are of the form: + +
+ oai:host name:handle+ For example: + +
+ oai:dspace.myu.edu:123456789/345+ If you wish to use a different scheme, this can easily be changed by editing the value of OAI_ID_PREFIX at the top of the org.dspace.app.oai.DSpaceOAICatalog class. (You do not need to change the code if the above scheme works for you; the code picks up the host name and Handles automatically from the DSpace configuration.) + + +Access control+ +OAI provides no authentication/authorisation details, although these could be implemented using standard HTTP methods. It is assumed that all access will be anonymous for the time being. + +One question is, "is all metadata public?" Presently the answer to this is yes; all metadata is exposed via OAI-PMH, even if the item has restricted access policies. The reasoning behind this is that people who do actually have permission to read a restricted item should still be able to use OAI-based services to discover the content. + +If, in the future, this 'expose all metadata' approach proves unsatisfactory for any reason, it should be possible to expose only publicly readable metadata. The authorisation system has separate permissions for READing an item and READing the content (bitstreams) within it. This means the system can differentiate between an item with public metadata and hidden content, and an item with hidden metadata as well as hidden content. In this case the OAI data repository should only expose items with anonymous READ access, so it can hide the existence of records from the outside world completely. In this scenario, one should be wary of protected items that are made public after a time. When this happens, the items are "new" from the OAI-PMH perspective. + + +Modification Date (OAI Date Stamp)+ +OAI-PMH harvesters need to know when a record has been created, changed or deleted. DSpace keeps track of a 'last modified' date for each item in the system, and this date is used for the OAI-PMH date stamp. This means that any changes to the metadata (e.g.
admins correcting a field, or a withdrawal) will be exposed to harvesters. + + +'About' Information+ +As part of each record given out to a harvester, there is an optional, repeatable "about" section which can be filled out in any (XML-schema conformant) way. Common uses are for provenance and rights information, and there are schemas in use by OAI communities for this. Presently DSpace does not provide any of this information. + + +Deletions+ +DSpace keeps track of deletions (withdrawals). These are exposed via OAI, which has a specific mechanism for dealing with this. Since DSpace keeps a permanent record of withdrawn items, in the OAI-PMH sense DSpace supports deletions 'persistently'. This is as opposed to 'transient' deletion support, which would mean that deleted records are forgotten after a time. + +Once an item has been withdrawn, OAI-PMH harvests of the date range in which the withdrawal occurred will find the 'deleted' record header. Harvests of a date range prior to the withdrawal will not find the record, despite the fact that the record did exist at that time. + +As an example of this, consider an item that was created on 2002-05-02 and withdrawn on 2002-10-06. A request to harvest the month 2002-10 will yield the 'record deleted' header. However, a harvest of the month 2002-05 will not yield the original record. + +Note that presently, the deletion of 'expunged' items is not exposed through OAI. + + +Flow Control (Resumption Tokens)+ +An OAI data provider can prevent any performance impact caused by harvesting by forcing a harvester to receive data in time-separated chunks. If the data provider receives a request for a lot of data, it can send part of the data with a resumption token. The harvester can then return later with the resumption token and continue. + +DSpace supports resumption tokens for 'ListRecords' OAI-PMH requests.
ListIdentifiers and ListSets requests do not produce a particularly high load on the system, so resumption tokens are not used for those requests. + +Each OAI-PMH ListRecords request will return at most 100 records. This limit is set at the top of org.dspace.app.oai.DSpaceOAICatalog.java (MAX_RECORDS). A potential issue here is that if a harvest yields an exact multiple of MAX_RECORDS, the last operation will result in a harvest with no records in it. It is unclear from the OAI-PMH specification if this is acceptable. + +When a resumption token is issued, the optional completeListSize and cursor attributes are not included. OAICat sets the expirationDate of the resumption token to one hour after it was issued, though in fact since DSpace resumption tokens contain all the information required to continue a request they do not actually expire. + +Resumption tokens contain all the state information required to continue a request. The format is: +
+ +from/until/setSpec/offset ++ from and until are the ISO 8601 dates passed in as part of the original request, and setSpec is also taken from the original request. offset is the number of records that have already been sent to the harvester. For example: +
+
+2003-01-01//hdl_1721.1_1234/300
+
+This means the harvest is 'from' 2003-01-01, has no 'until' date, is for the collection with Handle hdl:1721.1/1234, and 300 records have already been sent to the harvester. DSpace Command Launcher+ +Introduced in Release 1.6, the DSpace Command Launcher brings together the various commands and scripts into a standard practice for running command-line (CLI) programs. + +Older Versions+ +Prior to Release 1.6, there were various scripts written that masked a more manual approach to running CLI programs. The user had to issue [dspace]/bin/dsrun followed by the Java class that ran that program. With release 1.5, scripts were written to mask the [dspace]/bin/dsrun command. We have left the java class in the System Administration section since it does have value for debugging purposes and for those who wish to learn about DSpace Command Launcher Structure+ +There are two components to the command launcher: the dspace script and the launcher.xml. The DSpace command calls a Java class, which in turn refers to launcher.xml, stored in the [dspace]/config directory. + +launcher.xml is made of several components: +
+
+ Attachments:
+
+
+
+
+ |
+
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Architecture
+
+
+
+ This page last changed on Dec 15, 2010 by tdonohue.
+
+
+ DSpace System Documentation: Architecture+ +
+
+
+
Overview+ +The DSpace system is organized into three layers, each of which consists of a number of components. + +DSpace System Architecture+ +The storage layer is responsible for physical storage of metadata and content. The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow. The application layer contains components that communicate with the world outside of the individual DSpace installation, for example the Web user interface and the Open Archives Initiative Protocol for Metadata Harvesting service. + +Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for example. Each component in the storage and business logic layers has a defined public API. The union of the APIs of those components is referred to as the Storage API (in the case of the storage layer) and the DSpace Public API (in the case of the business logic layer). These APIs are in-process Java classes, objects and methods. + +It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic layer, the system relies on individual applications in the application layer to correctly and securely authenticate e-people. If a 'hostile' or insecure application were allowed to invoke the Public API directly, it could very easily perform actions as any e-person in the system. + +The reason for this design choice is that authentication methods will vary widely between different applications, so it makes sense to leave the logic and responsibility for that in these applications. + +The source code is organized to adhere very strictly to this three-layer architecture. Also, only methods in a component's public API are given the public access level. This means that the Java compiler helps ensure that the source code conforms to the architecture. +
+
+
+
+
The storage and business logic layer APIs are extensively documented with Javadoc-style comments. Generate the HTML version of these by entering the [dspace-source]/dspace directory and running: +
+ +mvn javadoc:javadoc ++ The resulting documentation will be at [dspace-source]/dspace-api/target/site/apidocs/index.html. The package-level documentation of each package usually contains an overview of the package and some example usage. This information is not repeated in this architecture document; this and the Javadoc APIs are intended to be used in parallel. + +Each layer is described in a separate section: + +
+
+ Attachments:
+
+
+
+
+ |
+
+
+
+
+ DSpace Documentation : Business Logic Layer
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ System Architecture: Business Logic Layer+ +
+
+
+
Core Classes+ +The org.dspace.core package provides some basic classes that are used throughout the DSpace code. + +The Configuration Manager+ +The configuration manager is responsible for reading the main dspace.cfg properties file, managing the 'template' configuration files for other applications such as Apache, and obtaining the text for e-mail messages. + +The system is configured by editing the relevant files in [dspace]/config, as described in the configuration section. + +When editing configuration files for applications that DSpace uses, such as Apache Tomcat, you may want to edit the copy in [dspace-source] and then run ant update or ant overwrite_configs rather than editing the 'live' version directly! This will ensure you have a backup copy of your modified configuration files, so that they are not accidentally overwritten in the future. + +The ConfigurationManager class can also be invoked as a command line tool: +
Constants+ +This class contains constants that are used to represent types of object and actions in the database. For example, authorization policies can relate to objects of different types, so the resourcepolicy table has columns resource_id, which is the internal ID of the object, and resource_type_id, which indicates whether the object is an item, collection, bitstream etc. The value of resource_type_id is taken from the Constants class, for example Constants.ITEM. + + +Context+ +The Context class is central to DSpace's operation. Any code that wishes to use any API in the business logic layer must first create a Context object. This is akin to opening a connection to a database (which is, in fact, one of the things that happens). + +A context object is involved in most method calls and object constructors, so that the method or object has access to information about the current operation. When the context object is constructed, the following information is automatically initialized: +
You should always abort a context if any error happens during its lifespan; otherwise the data in the system may be left in an inconsistent state. You can also commit a context, which means that any changes are written to the database, and the context is kept active for further use. + + +Sending e-mails is pretty easy. Just use the configuration manager's getEmail method, set the arguments and recipients, and send. + +The e-mail texts are stored in [dspace]/config/emails. They are processed by the standard java.text.MessageFormat. At the top of each e-mail are listed the appropriate arguments that should be filled out by the sender. Example usage is shown in the org.dspace.core.Email Javadoc API documentation. + + +LogManager+ +The log manager consists of a method that creates a standard log header, and returns it as a string suitable for logging. Note that this class does not actually write anything to the logs; the log header returned should be logged directly by the sender using an appropriate Log4J call, so that information about where the logging is taking place is also stored. + +The level of logging can be configured on a per-package or per-class basis by editing [dspace]/config/log4j.properties. You will need to stop and restart Tomcat for the changes to take effect. + +A typical log entry looks like this: + +2002-11-11 08:11:32,903 INFO org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:view_item:handle=1721.1/1686 + +This breaks down as follows: +
+
+
+
+
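The structure of the example entry can be checked mechanically. The following Java sketch is hypothetical: LogLineParser and its field names are illustrative, and it is not the parser used by log-reporter. It assumes the layout shown in the example line:

* Log4J's date, time, level and class name;
* a " @ " separator;
* the LogManager header of user (e-mail or anonymous), context information, action and extra information, separated by colons.

```java
public class LogLineParser {

    // One parsed entry; the field names are our own labels for the parts of the line
    static final class Entry {
        final String date, time, level, logger, user, contextInfo, action, extraInfo;

        Entry(String[] prefix, String[] header) {
            date = prefix[0];
            time = prefix[1];
            level = prefix[2];
            logger = prefix[3];
            user = header[0];
            contextInfo = header[1];
            action = header[2];
            extraInfo = header[3];
        }
    }

    static Entry parse(String line) {
        int at = line.indexOf(" @ ");
        if (at < 0) {
            throw new IllegalArgumentException("missing ' @ ' separator: " + line);
        }
        // Log4J part: date, time, level and class name separated by whitespace
        String[] prefix = line.substring(0, at).trim().split("\\s+");
        // LogManager header: at most four ':'-separated fields, so extra info may itself contain ':'
        String[] header = line.substring(at + 3).split(":", 4);
        if (prefix.length != 4 || header.length != 4) {
            throw new IllegalArgumentException("unexpected layout: " + line);
        }
        return new Entry(prefix, header);
    }

    public static void main(String[] args) {
        Entry e = parse("2002-11-11 08:11:32,903 INFO org.dspace.app.webui.servlet.DSpaceServlet"
                + " @ anonymous:session_id=BD84E7C194C2CF4BD0EC3A6CAD0142BB:view_item:handle=1721.1/1686");
        System.out.println(e.level + " " + e.user + " " + e.action + " " + e.extraInfo);
    }
}
```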
The above format allows the logs to be easily parsed and analyzed. The [dspace]/bin/log-reporter script is a simple tool for analyzing logs. Try: + +
+ [dspace]/bin/log-reporter --help+ It's a good idea to 'nice' this log reporter to avoid an impact on server performance. + + +Utils+ +Utils contains miscellaneous utility methods that are required in a variety of places throughout the code, and thus have no particular 'home' in a subsystem. + + + +Content Management API+ +The content management API package org.dspace.content contains Java classes for reading and manipulating content stored in the DSpace system. This is the API that components in the application layer will probably use most. + +Classes corresponding to the main elements in the DSpace data model (Community, Collection, Item, Bundle and Bitstream) are sub-classes of the abstract class DSpaceObject. The Item object handles the Dublin Core metadata record. + +Each class generally has one or more static find methods, which are used to instantiate content objects. Constructors do not have public access and are just used internally. The reasons for this are: +
Collection, Bundle and Bitstream do not have create methods; rather, one has to create an object using the relevant method on the container. For example, to create a collection, one must invoke createCollection on the community that the collection is to appear in: + +
+ Context context = new Context();
+Community existingCommunity = Community.find(context, 123);
+Collection myNewCollection = existingCommunity.createCollection();
+The primary reason for this is authorization. In order to know whether an e-person may create an object, the system must know which container the object is to be added to. It makes no sense to create a collection outside of a community, and the authorization system does not have a policy for that. + +Items are first created in the form of an implementation of InProgressSubmission. An InProgressSubmission represents an item under construction; once it is complete, it is installed into the main archive and added to the relevant collection by the InstallItem class. The org.dspace.content package provides an implementation of InProgressSubmission called WorkspaceItem; this is a simple implementation that contains some fields used by the Web submission UI. The org.dspace.workflow package also contains an implementation called WorkflowItem, which represents a submission undergoing a workflow process. + +In the previous chapter there is an overview of the item ingest process, which should clarify the previous paragraph. Also see the section on the workflow system. + +Community and BitstreamFormat do have static create methods; one must be a site administrator to have authorization to invoke these. + +Other Classes+ +Classes whose names begin with DC are for manipulating Dublin Core metadata, as explained below. + +The FormatIdentifier class attempts to guess the bitstream format of a particular bitstream. Presently, it does this simply by looking at any file extension in the bitstream name and matching it up with the file extensions associated with bitstream formats. Hopefully this can be greatly improved in the future! + +The ItemIterator class allows items to be retrieved from storage one at a time, and is returned by methods that may return a large number of items, more than would be desirable to have in memory at once.
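The idea behind ItemIterator, fetching one item at a time instead of materializing a whole result set, can be illustrated with a minimal generic sketch. It assumes the items to visit are known by ID: the iterator holds only the IDs and loads each object when next() is called. LazyIterator and its loader function are hypothetical names. DSpace's own ItemIterator has its own API, so consult its Javadoc rather than treating this as its implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

public class LazyIterator<T> implements Iterator<T> {

    private final Iterator<Integer> ids;   // only the IDs are held up front
    private final IntFunction<T> loader;   // hypothetical "load object by ID" function

    public LazyIterator(List<Integer> ids, IntFunction<T> loader) {
        this.ids = ids.iterator();
        this.loader = loader;
    }

    @Override
    public boolean hasNext() {
        return ids.hasNext();
    }

    @Override
    public T next() {
        if (!ids.hasNext()) {
            throw new NoSuchElementException();
        }
        // Load exactly one object; earlier ones can be garbage-collected once the caller drops them
        return loader.apply(ids.next());
    }

    public static void main(String[] args) {
        int[] loads = {0};
        LazyIterator<String> it = new LazyIterator<>(List.of(11, 12, 13), id -> {
            loads[0]++;
            return "item " + id;
        });
        System.out.println(loads[0]);  // 0: nothing loaded yet
        System.out.println(it.next()); // item 11
        System.out.println(loads[0]);  // 1
    }
}
```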
+ +The ItemComparator class is an implementation of the standard java.util.Comparator that can be used to compare and order items based on a particular Dublin Core metadata field. + + +Modifications+ +When creating, modifying or for whatever reason removing data with the content management API, it is important to know when changes happen in-memory, and when they occur in the physical DSpace storage. + +Primarily, one should note that no change made using a particular org.dspace.core.Context object will actually be made in the underlying storage unless complete or commit is invoked on that Context. If anything should go wrong during an operation, the context should always be aborted by invoking abort, to ensure that no inconsistent state is written to the storage. + +Additionally, some changes made to objects only happen in-memory. In these cases, invoking the update method lines up the in-memory changes to occur in storage when the Context is committed or completed. In general, methods that change any [meta]data field only make the change in-memory; methods that involve relationships with other objects in the system line up the changes to be committed with the context. See individual methods in the API Javadoc. + +Some examples to illustrate this are shown below: +
+
+
+
+
+
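The rule described above can be modelled in a few lines:

* setters change objects only in memory;
* update() lines the change up with the Context;
* nothing reaches storage until the Context is committed or completed;
* abort() discards lined-up changes.

The sketch below is a toy model, not the DSpace API. ToyContext, ToyItem and the map standing in for the database are all hypothetical. Only the commit/complete/abort semantics are taken from this section.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ContextModel {

    // Toy stand-in for org.dspace.core.Context (not the real API)
    static final class ToyContext {
        private final Map<Integer, String> storage;                         // plays the database
        private final Map<Integer, String> pending = new LinkedHashMap<>(); // changes lined up by update()
        private boolean open = true;

        ToyContext(Map<Integer, String> storage) {
            this.storage = storage;
        }

        void queue(int id, String title) {
            requireOpen();
            pending.put(id, title);
        }

        // Writes lined-up changes to storage; the context stays usable
        void commit() {
            requireOpen();
            storage.putAll(pending);
            pending.clear();
        }

        // Writes lined-up changes, then closes the context
        void complete() {
            commit();
            open = false;
        }

        // Throws away lined-up changes, then closes the context
        void abort() {
            pending.clear();
            open = false;
        }

        private void requireOpen() {
            if (!open) {
                throw new IllegalStateException("context already completed or aborted");
            }
        }
    }

    // Toy stand-in for an Item with a single metadata field
    static final class ToyItem {
        private final ToyContext context;
        private final int id;
        private String title;

        ToyItem(ToyContext context, int id, String title) {
            this.context = context;
            this.id = id;
            this.title = title;
        }

        void setTitle(String title) { this.title = title; } // in-memory change only
        void update() { context.queue(id, title); }          // lines the change up with the context
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "Old title");

        ToyContext context = new ToyContext(db);
        ToyItem item = new ToyItem(context, 1, db.get(1));
        item.setTitle("New title");
        item.update();
        System.out.println(db.get(1)); // Old title: nothing written yet
        context.complete();
        System.out.println(db.get(1)); // New title

        ToyContext failing = new ToyContext(db);
        ToyItem again = new ToyItem(failing, 1, db.get(1));
        again.setTitle("Half-finished edit");
        again.update();
        failing.abort();               // as an error handler would
        System.out.println(db.get(1)); // New title: the aborted change never reached storage
    }
}
```

In application code this usually takes the shape of a block that ends by completing the context, paired with an error handler that aborts it, as recommended earlier in this page.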
What's In Memory?+ +Instantiating some content objects also causes other content objects to be loaded into memory. + +Instantiating a Bitstream object causes the appropriate BitstreamFormat object to be instantiated. Of course, the Bitstream object does not load the underlying bits from the bitstream store into memory! + +Instantiating a Bundle object causes the appropriate Bitstream objects (and hence BitstreamFormats) to be instantiated. + +Instantiating an Item object causes the appropriate Bundle objects (etc.) and hence BitstreamFormats to be instantiated. All the Dublin Core metadata associated with that item are also loaded into memory. + +The reasoning behind this is that in the vast majority of cases, anyone instantiating an item object is going to need information about the bundles and bitstreams within it, and this methodology allows that to be done in the most efficient way and is simple for the caller. For example, in the Web UI, the servlet (controller) needs to pass information about an item to the viewer (JSP), which needs to have all the information in-memory to display the item without further accesses to the database which may cause errors mid-display. + +You do not need to worry about multiple in-memory instantiations of the same object, or any inconsistencies that may result; the Context object keeps a cache of the instantiated objects. The find methods of classes in org.dspace.content will use a cached object if one exists. + +In some situations where performance matters, this automatic instantiation of contained objects may prove too costly; if that turns out to be common, the API may be changed in the future to include a loadContents method or similar, or perhaps a Boolean parameter indicating what to do will be added to the find methods.
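The per-context cache mentioned above is what makes repeated find calls safe: both callers receive the same instance, so an in-memory change made through one reference is visible through the other. Here is a minimal sketch of the idea, assuming objects are looked up by integer ID. ObjectCache is a hypothetical class, not the DSpace implementation, and StringBuilder merely stands in for a mutable content object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class ObjectCache<T> {

    private final Map<Integer, T> cache = new HashMap<>();
    private final IntFunction<T> loader; // hypothetical "load from storage" function
    private int loads = 0;

    public ObjectCache(IntFunction<T> loader) {
        this.loader = loader;
    }

    public T find(int id) {
        // Load from storage only on the first request for this ID; later calls reuse the instance
        return cache.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    public int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ObjectCache<StringBuilder> items = new ObjectCache<>(id -> new StringBuilder("item " + id));
        StringBuilder a = items.find(7);
        StringBuilder b = items.find(7);
        a.append(" (edited)");
        System.out.println(a == b);            // true: one shared instance
        System.out.println(b);                 // item 7 (edited)
        System.out.println(items.loadCount()); // 1
    }
}
```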
+ +When a Context object is completed, aborted or garbage-collected, any objects instantiated using that context are invalidated and should not be used (in much the same way an AWT button is invalid if the window containing it is destroyed). + + +Dublin Core Metadata+ +The DCValue class is a simple container that represents a single Dublin Core element, optional qualifier, value and language. Note that since DSpace 1.4 the MetadataValue and associated classes are preferred (see Support for Other Metadata Schemas). The other classes starting with DC are utility classes for handling types of data in Dublin Core, such as people's names and dates. As supplied, the DSpace registry of elements and qualifiers corresponds to the Library Application Profile for Dublin Core. It should be noted that these utility classes assume that the values will be in a certain syntax, which will be true for all data generated within the DSpace system, but since Dublin Core does not always define strict syntax, this may not be true for Dublin Core originating outside DSpace. + +Below is the specific syntax that DSpace expects various fields to adhere to: +
+
+
+
+
+
Support for Other Metadata Schemas+ +To support additional metadata schemas, a new set of metadata classes has been added. These are backwards compatible with the DC classes and should be used rather than the DC-specific classes wherever possible. Note that hierarchical metadata schemas are not currently supported; only flat schemas (such as DC) can be defined. + +The MetadataField class describes a metadata field by schema, element and optional qualifier. The value of a MetadataField is described by a MetadataValue, which is roughly equivalent to the older DCValue class. Finally, the MetadataSchema class is used to describe supported schemas. The DC schema is supported by default. Refer to the javadoc for method details. + + +Packager Plugins+ +The Packager plugins let you ingest a package to create a new DSpace Object, and disseminate a content Object as a package. A package is simply a data stream; its contents are defined by the packager plugin's implementation. + +To ingest an object, which is currently only implemented for Items, the sequence of operations is: +
Here is an example package ingestion code fragment: +
+ Collection collection = ...; // find the target collection + InputStream source = ...; + PackageParameters params = ...; + String license = null; + + PackageIngester sip = (PackageIngester) PluginManager + .getNamedPlugin(PackageIngester.class, packageType); + + WorkspaceItem wi = sip.ingest(context, collection, source, params, license);+ Here is an example of a package dissemination: +
+ OutputStream destination = ...; + PackageParameters params = ...; + DSpaceObject dso = ...; + + PackageDisseminator dip = (PackageDisseminator) PluginManager + .getNamedPlugin(PackageDisseminator.class, packageType); + + dip.disseminate(context, dso, params, destination);+ Plugin Manager+ +The PluginManager is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the life cycle of a plugin. + +Concepts+ +The following terms are important in understanding the rest of this section: +
Using the Plugin Manager+ +Types of Plugin+ +The Plugin Manager supports three different patterns of usage: + +
Self-Named Plugins+ +Named plugins can get their names either from the configuration or, for a variant called self-named plugins, from within the plugin itself. + +Self-named plugins are necessary because one plugin implementation can configure itself to take on many "personalities", each of which deserves its own plugin name. It is already managing its own configuration for each of these personalities, so it makes sense to allow it to export them to the Plugin Manager rather than expecting the plugin configuration to be kept in sync with its own configuration. + +An example helps clarify the point: there is a named plugin that does crosswalks; call it CrosswalkPlugin. It has several implementations that crosswalk some kind of metadata. Now we add a new plugin which uses XSL stylesheet transformation (XSLT) to crosswalk many types of metadata, so the single plugin can act like many different plugins, depending on which stylesheet it employs. + +This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet. It has to, since of course the Plugin Manager doesn't know anything about stylesheets. It becomes a self-named plugin, so that it reads its configuration data, gets the list of names to which it can respond, and passes those on to the Plugin Manager. + +When the Plugin Manager creates an instance of the XSLT-crosswalk, it records the Plugin Name that was responsible for that instance. The plugin can look at that Name later in order to configure itself correctly for the Name that created it. This mechanism is all part of the SelfNamedPlugin class, which is part of any self-named plugin. + + +Obtaining a Plugin Instance+ +The most common thing you will do with the Plugin Manager is obtain an instance of a plugin. To request a plugin, you must always specify the plugin interface you want. You will also supply a name when asking for a named plugin.
+ +A sequence plugin is returned as an array of Objects, since it is actually an ordered list of plugins. + +See the getSinglePlugin(), getPluginSequence(), getNamedPlugin() methods. + + +Lifecycle Management+ +When PluginManager fulfills a request for a plugin, it checks whether the implementation class is reusable; if so, it creates one instance of that class and returns it for every subsequent request for that interface and name. If it is not reusable, a new instance is always created. + +For reasons that will become clear later, the manager actually caches a separate instance of an implementation class for each name under which it can be requested. + +You can ask the PluginManager to forget about (decache) a plugin instance by releasing it. See the PluginManager.releasePlugin() method. The manager will drop its reference to the plugin so the garbage collector can reclaim it. The next time that plugin/name combination is requested, it will create a new instance. + + +Getting Meta-Information+ +The PluginManager can list all the names of the Named Plugins which implement an interface. You may need this, for example, to implement a menu in a user interface that presents a choice among all possible plugins. See the getPluginNames() method. + +Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (i.e. a key into the I18N message catalog) then you should add a method to the plugin itself to return that. + + + +Implementation+ +Note: The PluginManager refers to interfaces and classes internally only by their names whenever possible, to avoid loading classes until absolutely necessary (i.e. to create an instance). As you'll see below, self-named classes still have to be loaded to query them for names, but for the most part it can avoid loading classes. This saves a lot of time at start-up and keeps the JVM memory footprint down, too. As the Plugin Manager gets used for more classes, this will become a greater concern.
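The lifecycle and on-demand loading rules above can be sketched in plain Java. NamedPluginRegistry is a hypothetical, much simplified stand-in for PluginManager. Its method names echo the ones mentioned in this section, but its signatures are not DSpace's. The sketch follows these rules:

* configuration holds only class names;
* a class is loaded the first time it is requested;
* a reusable implementation is cached once per plugin name;
* releasing an instance makes the next request create a new one.

Treating every implementation as reusable unless told otherwise is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NamedPluginRegistry {

    // interface name -> (plugin name -> implementation class name); no classes are loaded here
    private final Map<String, Map<String, String>> names = new HashMap<>();
    // implementation classes configured as not reusable
    private final Set<String> nonReusable = new HashSet<>();
    // "interface|plugin name" -> cached instance of a reusable implementation
    private final Map<String, Object> cache = new HashMap<>();

    public void configure(String iface, String pluginName, String implClass) {
        names.computeIfAbsent(iface, k -> new HashMap<>()).put(pluginName, implClass);
    }

    public void setReusable(String implClass, boolean reusable) {
        if (reusable) {
            nonReusable.remove(implClass);
        } else {
            nonReusable.add(implClass);
        }
    }

    public <T> T getNamedPlugin(Class<T> iface, String pluginName) {
        String implClass = names.getOrDefault(iface.getName(), Map.of()).get(pluginName);
        if (implClass == null) {
            return null; // not configured: not an error
        }
        boolean reusable = !nonReusable.contains(implClass);
        String key = iface.getName() + "|" + pluginName;
        if (reusable && cache.containsKey(key)) {
            return iface.cast(cache.get(key));
        }
        try {
            // The implementation class is loaded here, when first needed
            Object instance = Class.forName(implClass).getDeclaredConstructor().newInstance();
            if (reusable) {
                cache.put(key, instance); // one cached instance per plugin name
            }
            return iface.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + implClass, e);
        }
    }

    // Drops any cached reference to this instance; the next request creates a new one
    public void releasePlugin(Object plugin) {
        cache.values().removeIf(cached -> cached == plugin);
    }

    public static void main(String[] args) {
        NamedPluginRegistry pm = new NamedPluginRegistry();
        pm.configure("java.util.List", "first", "java.util.ArrayList");
        pm.configure("java.util.List", "second", "java.util.ArrayList");

        List<?> a = pm.getNamedPlugin(List.class, "first");
        System.out.println(a == pm.getNamedPlugin(List.class, "first"));  // true: reused
        System.out.println(a == pm.getNamedPlugin(List.class, "second")); // false: cached per name
        pm.releasePlugin(a);
        System.out.println(a == pm.getNamedPlugin(List.class, "first"));  // false: new instance
        System.out.println(pm.getNamedPlugin(List.class, "missing"));     // null
    }
}
```

Where this sketch wraps reflection failures in IllegalStateException, the DSpace PluginManager signals such instantiation failures with its PluginInstantiationException.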
+ +The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away. The solution is to call the checkConfiguration() method after making any changes to the configuration. + +PluginManager Class+ +The PluginManager class is your main interface to the Plugin Manager. It behaves like a factory class that never gets instantiated, so its public methods are static. + +Here are the public methods, followed by explanations: + +
SelfNamedPlugin Class+ +A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See Self-Named Plugins for why this is sometimes necessary. +
+ abstract class SelfNamedPlugin +{ + // Your class must define its own version of this static method: + // return all names by which this plugin should be known. + public static String[] getPluginNames(); + + // Returns the name under which this instance was created. + // This is implemented by SelfNamedPlugin and should NOT be overridden. + public String getPluginInstanceName(); +}+ Errors and Exceptions+
+ public class PluginConfigurationError extends Error +{ + public PluginConfigurationError(String message); +}+ An error of this type means the caller asked for a single plugin, but either there was no single plugin configured matching that interface, or there was more than one. Either case causes a fatal configuration error. +
+ public class PluginInstantiationException extends RuntimeException +{ + public PluginInstantiationException(String msg, Throwable cause); +}+ This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something unexpected happens in the course of instantiating a plugin, e.g. an access error, class not found, etc. Simply not finding a class in the configuration is not an exception. + +This is a RuntimeException so it doesn't have to be declared, and can be passed all the way up to a generalized fatal exception handler. + + + +Configuring Plugins+ +All of the Plugin Manager's configuration comes from the DSpace Configuration Manager, which is a Java Properties map. You can configure these characteristics of each plugin: + +
Configuring Singleton (Single) Plugins+ +This entry configures a Single Plugin for use with getSinglePlugin(): + +
+ plugin.single.interface = classname
+For example, this configures the class org.dspace.checker.SimpleDispatcher as the plugin for interface org.dspace.checker.BitstreamDispatcher: + +
+ plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher+ Configuring Sequence of Plugins+ +This kind of configuration entry defines a Sequence Plugin, which is bound to a sequence of implementation classes. The key identifies the interface, and the value is a comma-separated list of classnames. For example, this entry configures Stackable Authentication with three implementation classes: +
+ plugin.sequence.org.dspace.eperson.AuthenticationMethod = \ + org.dspace.eperson.X509Authentication, \ + org.dspace.eperson.PasswordAuthentication, \ + edu.mit.dspace.MITSpecialGroup+ Configuring Named Plugins+ +There are two ways of configuring named plugins: + +
Configuring the Reusable Status of a Plugin+ +Plugins are assumed to be reusable by default, so you only need to configure the ones which you would prefer not to be reusable. The format is as follows: + +
+ plugin.reusable.classname = ( true | false )+ For example, this marks the PDF plugin from the example above as non-reusable: + +
+ plugin.reusable.org.dspace.app.mediafilter.PDFFilter = false
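As a rough illustration of how the three kinds of plugin.* keys described above could be read from a Java Properties map, here is a hypothetical helper (PluginConfigSketch is not part of DSpace). It relies on standard java.util.Properties behaviour: a trailing backslash continues a value onto the next line, which is why the multi-line sequence entry works. Plugins default to reusable when no plugin.reusable entry exists, as stated above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper showing how plugin.single / plugin.sequence /
// plugin.reusable keys could be read; not the actual PluginManager code.
class PluginConfigSketch {
    private final Properties props = new Properties();

    PluginConfigSketch(String configText) {
        try {
            // Properties.load() joins lines ending in '\' into one value.
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // plugin.single.<interface> = <classname>
    String singleClass(String iface) {
        return props.getProperty("plugin.single." + iface);
    }

    // plugin.sequence.<interface> = comma-separated classnames, order preserved
    List<String> sequenceClasses(String iface) {
        List<String> result = new ArrayList<>();
        String value = props.getProperty("plugin.sequence." + iface);
        if (value != null) {
            for (String part : value.split(",")) {
                if (!part.trim().isEmpty()) {
                    result.add(part.trim());
                }
            }
        }
        return result;
    }

    // plugin.reusable.<classname> = true|false; reusable by default
    boolean isReusable(String className) {
        return Boolean.parseBoolean(props.getProperty("plugin.reusable." + className, "true"));
    }
}
```

Preserving the order of the sequence list matters because, as the Stackable Authentication use case notes, the order of configuration is significant.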
+Validating the Configuration+ +The Plugin Manager is very sensitive to mistakes in the DSpace configuration. Subtle errors can have unexpected consequences that are hard to detect: for example, if there are two "plugin.single" entries for the same interface, one of them will be silently ignored. + +To validate the Plugin Manager configuration, call the PluginManager.checkConfiguration() method. It looks for the following mistakes: + +
Eventually, someone should develop a general configuration-file sanity checker for DSpace, which would just call PluginManager.checkConfiguration(). + + +Use Cases+ +Here are some usage examples to illustrate how the Plugin Manager works. + +Managing the MediaFilter plugins transparently+ +The existing DSpace 1.3 MediaFilterManager implementation has been largely replaced by the Plugin Manager. The MediaFilter classes become plugins named in the configuration. Refer to the configuration guide for further details. + + +A Singleton Plugin+ +This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin: + +Configuration: + +plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher + +The following code fragment shows how dispatcher, the service object, is initialized and used: +
+ BitstreamDispatcher dispatcher = + (BitstreamDispatcher) PluginManager.getSinglePlugin(BitstreamDispatcher.class); + +int id = dispatcher.next(); + +while (id != BitstreamDispatcher.SENTINEL) +{ + /* + do some processing here + */ + + id = dispatcher.next(); +}+ Plugin that Names Itself+ +This crosswalk plugin acts like many different plugins since it is configured with different XSL transformation stylesheets. Since it already gets each of its stylesheets out of the DSpace configuration, it makes sense to have the plugin give PluginManager the names to which it answers instead of forcing someone to configure those names in two places (and try to keep them synchronized). + +NOTE: Remember how the Plugin Manager caches a separate instance of an implementation class for every name bound to it? This is why: the instance can look at the name under which it was invoked and configure itself specifically for that name. Since the instance for each name might be different, the Plugin Manager has to cache a separate instance for each name. + +Here is the configuration file listing both the plugin's own configuration and the PluginManager config line: +
+ crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl +crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl + +plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \ + org.dspace.content.metadata.XsltDisseminationCrosswalk+ This excerpt from the implementation shows how it finds configuration entries to populate the array of plugin names returned by the getPluginNames() method. Also note, in the getStylesheet() method, how it uses the plugin name that created the current instance (returned by getPluginInstanceName()) to find the correct stylesheet. +
+ import java.util.ArrayList; +import java.util.Enumeration; +import java.util.List; + +import org.dspace.core.ConfigurationManager; +import org.dspace.core.SelfNamedPlugin; + +public class XsltDisseminationCrosswalk extends SelfNamedPlugin +{ + .... + // static, because the static getPluginNames() method reads it + private static final String prefix = + "crosswalk.dissemination.stylesheet."; + .... + public static String[] getPluginNames() + { + List<String> aliasList = new ArrayList<String>(); + Enumeration pe = ConfigurationManager.propertyNames(); + + while (pe.hasMoreElements()) + { + String key = (String)pe.nextElement(); + if (key.startsWith(prefix)) + { + aliasList.add(key.substring(prefix.length())); + } + } + return aliasList.toArray(new String[aliasList.size()]); + } + + // get the crosswalk stylesheet for an instance of the plugin: + private String getStylesheet() + { + return ConfigurationManager.getProperty(prefix + getPluginInstanceName()); + } +}+ Stackable Authentication+ +The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the order of configuration, since order is significant. It gets a Sequence Plugin from the Plugin Manager. Refer to the Configuration Section on Stackable Authentication for further details. + + + + +Workflow System+ +The primary classes are: +
+
+
+
+
The workflow system models the states of an Item in a state machine with 5 states (SUBMIT, STEP_1, STEP_2, STEP_3, ARCHIVE). STEP_1, STEP_2 and STEP_3 are the three optional steps in which the item can be viewed and corrected by different groups of people. Strictly speaking there are 8 states, adding STEP_1_POOL, STEP_2_POOL, and STEP_3_POOL; in these pooled states, items are waiting to enter the corresponding primary states. + +The WorkflowManager is invoked by events. While an Item is being submitted, it is held by a WorkspaceItem. Calling the start() method in the WorkflowManager converts a WorkspaceItem to a WorkflowItem, and begins processing the WorkflowItem's state. Since all three steps of the workflow are optional, if no steps are defined, then the Item is simply archived. + +Workflows are set per Collection, and steps are defined by creating corresponding entries in the List named workflowGroup. If you wish the workflow to have a step 1, use the administration tools for Collections to create a workflow Group with members who you want to be able to view and approve the Item; workflowGroup[0] is then set to the ID of that Group. + +If a step is defined in a Collection's workflow, then the WorkflowItem's state is set to that step's STEP_x_POOL state. This pooled state is the WorkflowItem waiting for an EPerson in that group to claim the step's task for that WorkflowItem. The WorkflowManager emails the members of that Group notifying them that there is a task to be performed (the text is defined in config/emails), and when an EPerson goes to their 'My DSpace' page to claim the task, the WorkflowManager is invoked with a claim event, and the WorkflowItem's state advances from STEP_x_POOL to STEP_x (where x is the corresponding step). The EPerson can also generate an 'unclaim' event, returning the WorkflowItem to STEP_x_POOL. + +Another event the WorkflowManager handles is advance(), which advances the WorkflowItem to the next state. 
If there are no further states, then the WorkflowItem is removed, and the Item is then archived. An EPerson performing one of the tasks can reject the Item, which stops the workflow, rebuilds the WorkspaceItem for it and sends a rejection note to the submitter. More drastically, an abort() event is generated by the admin tools to cancel a workflow outright. + + +Administration Toolkit+ +The org.dspace.administer package contains some classes for administering a DSpace system that are not generally needed by most applications. + +The CreateAdministrator class is a simple command-line tool, executed via [dspace]/bin/dspace create-administrator, that creates an administrator e-person with information entered from standard input. This is generally used only once when a DSpace system is initially installed, to create an initial administrator who can then use the Web administration UI to further set up the system. This script does not check for authorization, since it is typically run before there are any e-people to authorize! Since it must be run as a command-line tool on the server machine, generally this shouldn't cause a problem. A possibility is to have the script only operate when there are no e-people in the system already, though in general, someone with access to command-line scripts on your server is probably in a position to do what they want anyway! + +The DCType class is similar to the org.dspace.content.BitstreamFormat class. It represents an entry in the Dublin Core type registry, that is, a particular element and qualifier, or unqualified element. It is in the administer package because it is only generally required when manipulating the registry itself. Elements and qualifiers are specified as literals in org.dspace.content.Item methods and the org.dspace.content.DCValue class. Only administrators may modify the Dublin Core type registry. 
+ +The org.dspace.administer.RegistryLoader class contains methods for initializing the Dublin Core type registry and bitstream format registry with entries in an XML file. Typically this is executed via the command line during the build process (see build.xml in the source). To see examples of the XML formats, see the files in config/registries in the source directory. There is no XML schema; the files are not strictly validated when loaded. + + +E-person/Group Manager+ +DSpace keeps track of registered users with the org.dspace.eperson.EPerson class. The class has methods to create and manipulate an EPerson such as get and set methods for first and last names, email, and password. (Actually, there is no getPassword() method; an MD5 hash of the password is stored, and can only be verified with the checkPassword() method.) There are find methods to find an EPerson by email (which is assumed to be unique), or to find all EPeople in the system. + +The EPerson object should probably be reworked to allow for easy expansion; the current EPerson object tracks pretty much only what MIT was interested in tracking: first and last names, email, phone. The access methods are hardcoded and should probably be replaced with methods to access arbitrary name/value pairs for institutions that wish to customize what EPerson information is stored. + +Groups are simply lists of EPerson objects. Other than membership, Group objects have only one other attribute: a name. Group names must be unique, so we have adopted naming conventions where the role of the group is its name, such as COLLECTION_100_ADD. Groups add and remove EPerson objects with addMember() and removeMember() methods. One important thing to know about groups is that they store their membership in memory until the update() method is called, so when modifying a group's membership don't forget to invoke update() or your changes will be lost! 
Since group membership is used heavily by the authorization system, a fast isMember() method is also provided. + +Another kind of Group is also implemented in DSpace: special Groups. The Context object for each session carries around a List of Group IDs that the user is also a member of; currently the MITUser Group ID is added to the list of a user's special groups if certain IP address or certificate criteria are met. + + +Authorization+ +The primary classes are: +
+
+
+
+
The authorization system is based on the classic 'police state' model of security; no action is allowed unless it is expressed in a policy. The policies are attached to resources (hence the name ResourcePolicy) and detail who can perform that action. The resource can be any of the DSpace object types, listed in org.dspace.core.Constants (BITSTREAM, ITEM, COLLECTION, etc.). The 'who' is made up of EPerson groups. The actions are also in Constants.java (READ, WRITE, ADD, etc.). The only non-obvious actions are ADD and REMOVE, which are authorizations for container objects. To be able to create an Item, you must have ADD permission in a Collection, which contains Items. (Communities, Collections, Items, and Bundles are all container objects.) + +Currently most of the read policy checking is done with items; communities and collections are assumed to be openly readable, but items and their bitstreams are checked. Separate policy checks for items and their bitstreams enable policies that make an item publicly readable while restricting parts of its content to certain groups. + +The ResourcePolicies used by the AuthorizeManager class are very simple, and there are quite a lot of them. Each can only list a single group, a single action, and a single object. So each object will likely have several policies, and if multiple groups share permissions for actions on an object, each group will get its own policy. (It's a good thing they're small.) + +Special Groups+ +All users are assumed to be part of the public group (ID=0). DSpace admins (ID=1) are automatically part of all groups, much like super-users in the Unix OS. The Context object also carries around a List of special groups, which are also first checked for membership. These special groups are used at MIT to indicate membership in the MIT community, something that is very difficult to enumerate in the database! 
When a user logs in with an MIT certificate or with an MIT IP address, the login code adds this MIT user group to the user's Context. + + +Miscellaneous Authorization Notes+ +Where do items get their read policies? From their collection's read policy. There once was a separate item read default policy in each collection, and perhaps there will be again, since it appears that administrators are notoriously bad at defining collections' read policies. There is also code in place to enable timed policies, that is, policies that have a start and end date. However, the admin tools to enable these sorts of policies have not been written. + + + +Handle Manager/Handle Plugin+ +The org.dspace.handle package contains two classes: HandleManager is used to create and look up Handles, and HandlePlugin is used to expose and resolve DSpace Handles for the outside world via the CNRI Handle Server code. + +Handles are stored internally in the handle database table in the form: + +1721.123/4567 + +Typically when they are used outside of the system they are displayed in either URI or "URL proxy" forms: +
+ hdl:1721.123/4567
+http://hdl.handle.net/1721.123/4567
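Since reducing a displayed Handle to its basic form is left to the caller, here is a minimal sketch of one way to do it. HandleForms.toBasicForm() is a hypothetical helper, not part of HandleManager; it only recognizes the two displayed forms shown above, and any other input is returned unchanged (after trimming).

```java
// Hypothetical helper: reduce a displayed Handle to the basic "prefix/suffix"
// form stored in the handle table. Only the URI form ("hdl:") and the
// "URL proxy" form (http://hdl.handle.net/) are recognised.
final class HandleForms {
    private static final String URI_PREFIX = "hdl:";
    private static final String PROXY_PREFIX = "http://hdl.handle.net/";

    private HandleForms() {
    }

    static String toBasicForm(String displayed) {
        String s = displayed.trim();
        if (s.startsWith(URI_PREFIX)) {
            return s.substring(URI_PREFIX.length());
        }
        if (s.startsWith(PROXY_PREFIX)) {
            return s.substring(PROXY_PREFIX.length());
        }
        return s; // assume it is already in basic form
    }
}
```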
+It is the responsibility of the caller to extract the basic form from whichever displayed form is used. + +The handle table maps these Handles to resource type/resource ID pairs, where resource type is a value from org.dspace.core.Constants and resource ID is the internal identifier (database primary key) of the object. This allows Handles to be assigned to any type of object in the system, though as explained in the functional overview, only communities, collections and items are presently assigned Handles. + +HandleManager contains static methods for: + +
Note that since the Handle server runs in a separate JVM from the DSpace Web applications, it uses a separate 'Log4J' configuration, since Log4J does not support multiple JVMs using the same daily rolling logs. This alternative configuration is located at [dspace]/config/log4j-handle-plugin.properties. The [dspace]/bin/start-handle-server script passes in the appropriate command line parameters so that the Handle server uses this configuration. + + +Search+ +DSpace's search code is a simple API which currently wraps the Lucene search engine. The first half of the search task is indexing, and org.dspace.search.DSIndexer is the indexing class. It contains indexContent(), which, if passed an Item, Community, or Collection, will add that content's fields to the index. The methods unIndexContent() and reIndexContent() remove and update content's index information. The DSIndexer class also has a main() method which will rebuild the index completely. This can be invoked by the dspace/bin/index-init (complete rebuild) or dspace/bin/index-update (update) script. The intent was for the main() method to be invoked on a regular basis to avoid index corruption, but we have had no problem with that so far. + +Which fields are indexed by DSIndexer? These fields are defined in dspace.cfg in the section "Fields to index for search" as name-value pairs. The name must be unique and has the form search.index.i (where i is an arbitrary positive number). The value consists of an index name, which must also be unique and can be referenced in the search form (e.g. title, author), followed by a colon and the metadata field to be indexed. '*' is a wildcard which includes all qualifiers of an element. For example: + +
+ search.index.4 = keyword:dc.subject.*+ tells the indexer to create a keyword index containing all dc.subject element values. Since the wildcard ('*') character was used in place of a qualifier, all subject metadata fields will be indexed (e.g. dc.subject.other, dc.subject.lcsh, etc.). + +By default, the fields shown in the Indexed Fields section below are indexed. These are hardcoded in the DSIndexer class. If any search.index.i items are specified in dspace.cfg these are used rather than these hardcoded fields. + +The query class DSQuery contains three flavors of doQuery() methods: one searches the DSpace site, and the other two restrict searches to Collections and Communities. The results from a query are returned as three lists of handles; each list represents a type of result. One list is a list of Items with matches, and the other two are Collections and Communities that match. This separation allows the UI to handle the types of results gracefully without resolving all of the handles first to see what kind of content the handle points to. The DSQuery class also has a main() method for debugging via command-line searches. + +Current Lucene Implementation+ +Currently we have our own Analyzer and Tokenizer classes (DSAnalyzer and DSTokenizer) to customize our indexing. They invoke the stemming and stop word features within Lucene. We create an IndexReader for each query, which we now realize isn't the most efficient use of resources; we seem to run out of file handles under really heavy loads. (A wildcard query can open many file handles!) Since Lucene is thread-safe, a better future implementation would be to have a single Lucene IndexReader shared by all queries, which is invalidated and re-opened when the index changes. Future API growth could include relevance scores (Lucene generates them, but we ignore them) and abstractions for more advanced search concepts such as booleans. 
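The search.index.i value format described in this section (an index name, a colon, then a metadata field whose qualifier may be the '*' wildcard) can be illustrated with the following hypothetical parser. SearchIndexConfig is not DSIndexer's code, and treating the unqualified element as covered by the wildcard is an assumption of this sketch.

```java
// Hypothetical sketch of how a "search.index.<i>" value such as
// "keyword:dc.subject.*" splits into an index name and a metadata field,
// and how the '*' wildcard could be matched. Not DSIndexer's actual code.
final class SearchIndexConfig {
    final String indexName; // e.g. "keyword", referenced from the search form
    final String field;     // e.g. "dc.subject.*"

    private SearchIndexConfig(String indexName, String field) {
        this.indexName = indexName;
        this.field = field;
    }

    static SearchIndexConfig parse(String value) {
        int colon = value.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected <index>:<field>, got: " + value);
        }
        return new SearchIndexConfig(value.substring(0, colon).trim(),
                value.substring(colon + 1).trim());
    }

    // Does a concrete metadata field (schema.element[.qualifier]) belong to this index?
    boolean matches(String metadataField) {
        if (field.endsWith(".*")) {
            String element = field.substring(0, field.length() - 2); // e.g. "dc.subject"
            // Assumption: the wildcard also covers the unqualified element.
            return metadataField.equals(element) || metadataField.startsWith(element + ".");
        }
        return metadataField.equals(field);
    }
}
```

Matching on `element + "."` rather than a bare prefix keeps a field such as dc.subjectx from being swept into the dc.subject index.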
+ + +Indexed Fields+ +The DSIndexer class shipped with DSpace indexes the Dublin Core metadata in the following way: +
+
+
+
+
+
Harvesting API+ +The org.dspace.search package also provides a 'harvesting' API. This allows callers to extract information about items modified within a particular timeframe, and within a particular scope (all of DSpace, or a community or collection). Currently this is used by the Open Archives Initiative metadata harvesting protocol application, and the e-mail subscription code. + +The Harvest.harvest() method is invoked with the required scope and start and end dates. Either date can be omitted. The dates should be in the ISO8601, UTC time zone format used elsewhere in the DSpace system. + +HarvestedItemInfo objects are returned. These objects are simple containers with basic information about the items falling within the given scope and date range. Depending on parameters passed to the harvest method, the containers and item fields may have been filled out with the IDs of communities and collections containing an item, and the corresponding Item object respectively. Electing not to have these fields filled out means the harvest operation executes considerably faster. + +In case it is required, Harvest also offers a method for creating a single HarvestedItemInfo object, which might make things easier for the caller. + + + +Browse API+ +The browse API maintains indexes of dates, authors, titles and subjects, and allows callers to extract parts of these: + +
Using the API+ +The API is generally invoked by creating a BrowseScope object, and setting the parameters for which particular part of an index you want to extract. This is then passed to the relevant Browse method call, which returns a BrowseInfo object which contains the results of the operation. The parameters set in the BrowseScope object are: + +
To illustrate, here is an example: + +
The results of invoking Browse.getItemsByTitle with the above parameters might look like this: + +
+ Rabble-Rousing Rabbis From Sardinia + Reality TV: Love It or Hate It? +FOCUS> The Really Exciting Research Video + Recreational Housework Addicts: Please Visit My House + Regional Television Variation Studies + Revenue Streams + Ridiculous Example Titles: I'm Out of Ideas+ Note that in the case of title and date browses, Item objects are returned as opposed to actual titles. In these cases, you can specify the 'focus' to be a specific item, or a partial or full literal value. In the case of a literal value, if no entry in the index matches exactly, the closest match is used as the focus. It's quite reasonable to specify a focus of a single letter, for example. + +Being able to specify a specific item to start at is particularly important with dates, since many items may have the same issue date. Say 30 items in a collection have the issue date 2002. To be able to page through the index 20 items at a time, you need to be able to specify exactly which item's 2002 is the focus of the browse; otherwise each time you invoked the browse code, the results would start at the first item with the issue date 2002. + +Author browses return String objects with the actual author names. You can only specify the focus as a full or partial literal String. + +Another important point to note is that presently, the browse indexes contain metadata for all items in the main archive, regardless of authorization policies. This means that all items in the archive will appear to all users when browsing. Of course, should the user attempt to access a non-public item, the usual authorization mechanism will apply. Whether this approach is ideal is under review; implementing the browse API such that the results retrieved reflect a user's level of authorization may be possible, but rather tricky. + + +Index Maintenance+ +The browse API contains calls to add and remove items from the index, and to regenerate the indexes from scratch. 
In general the content management API invokes the necessary browse API calls to keep the browse indexes in sync with what is in the archive, so most applications will not need to invoke those methods. + +If the browse index becomes inconsistent for some reason, the InitializeBrowse class is a command line tool (generally invoked using the [dspace]/bin/dspace index-init command) that causes the indexes to be regenerated from scratch. + + +Caveats+ +Presently, the browse API is not tremendously efficient. 'Indexing' takes the form of simply extracting the relevant Dublin Core value, normalizing it (lower-casing and removing any leading article in the case of titles), and inserting that normalized value with the corresponding item ID in the appropriate browse database table. Database views of this table include collection and community IDs for browse operations with a limited scope. When a browse operation is performed, a simple SELECT query is performed, along the lines of: +
+ SELECT item_id FROM ItemsByTitle ORDER BY sort_title OFFSET 40 LIMIT 20+ There are two main drawbacks to this. Firstly, LIMIT and OFFSET are PostgreSQL-specific keywords. Secondly, the database is still actually performing dynamic sorting of the titles, so the browse code as it stands will not scale particularly well. The code does cache BrowseInfo objects, so that common browse operations are performed quickly, but this is not an ideal solution. + + + +Checksum checker+ +The checksum checker is used to verify every item within DSpace. Because DSpace calculates and records the checksum of every file submitted to it, the checker can determine whether a file has since been changed. The idea is that the earlier you can identify that a file has changed, the more likely you are to be able to record it (assuming it was not a wanted change). + +The org.dspace.checker.CheckerCommand class is the checksum checker tool. It calculates the checksum of each bitstream whose ID is in the most_recent_checksum table, and compares it against the last calculated checksum for that bitstream. + + +OpenSearch Support+ +DSpace is able to support OpenSearch. For those not acquainted with the standard, here is a very brief introduction, with emphasis on what possibilities it holds for current use and future development. + +OpenSearch is a small set of conventions and documents for describing and using 'search engines', meaning any service that returns a set of results for a query. It is nearly ubiquitous, but also nearly invisible, in modern web sites with search capability. If you look at the page source of Wikipedia, Facebook, CNN, etc., you will find a link element buried in it that declares OpenSearch support. It is very much a lowest-common-denominator abstraction (think Google box), but does provide a means to extend its expressive power. 
This first implementation for DSpace supports none of these extensions, many of which are of potential value, so it should be regarded as a foundation, not a finished solution. So the short answer is that DSpace appears as a 'search engine' to OpenSearch-aware software. + +Another way to look at OpenSearch is as a RESTful web service for search, very much like SRW/U, but considerably simpler. This comparative loss of power is offset by the fact that it is widely supported by web tools and players: browsers understand it, as do large metasearch tools. + + + +How Can It Be Used + +
Configuration is through the dspace.cfg file. See OpenSearch Support for more details. + + +Embargo Support+ +What is an Embargo?+ +An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes. + + +Embargo Model and Life-Cycle+ +Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples:
|
+
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Configuration
+
+
+
+ This page last changed on Mar 16, 2011 by helix84.
+
+
DSpace System Documentation: Configuration+ +There are a number of ways in which DSpace may be configured and/or customized. This chapter of the documentation will discuss the configuration of the software and will also reference customizations that may be performed in the following chapter. + +For ease of use, the Configuration documentation is broken into several parts: + +
The full table of contents follows: +
+
+
+
+
General Configuration+ +In the following sections you will learn about the different configuration files that you will need to edit so that you may make your DSpace installation work. Of the several configuration files which you will work with, it is the dspace.cfg file you need to learn to configure first and foremost. + +In general, the configuration files, notably dspace.cfg and xmlui.xconf, are a good source of information not only about configuration but also about customization (cf. the Customization chapters). + +Input Conventions+ +We will use the dspace.cfg as our example for input conventions used throughout the system. It is a basic Java properties file, where lines are either comments, starting with a '#', blank lines, or property/value pairs of the form: + +property.name = property value + +Some property defaults are "commented out". That is, they have a "#" preceding them, and the DSpace software ignores the config property. This may leave the corresponding feature disabled, or cause a built-in default value to be used when the software is compiled and updated. + +The property value may contain references to other configuration properties, in the form ${property.name}. This follows the Ant convention of allowing references in property files. A property may not refer to itself. Examples: + +
+ property.name = word1 ${other.property.name} more words
+property2.name = ${dspace.dir}/rest/of/path
+
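The ${property.name} reference syntax can be illustrated with a small, hypothetical expander (PropertyExpander is not DSpace's ConfigurationManager code). It substitutes references recursively, leaves unknown references as written, and rejects a property that refers to itself, directly or through a cycle; that cycle check is a choice made for this sketch, following the rule that a property may not refer to itself.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Ant-style ${property.name} expansion.
final class PropertyExpander {
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    private PropertyExpander() {
    }

    static String expand(Properties props, String key) {
        return expand(props, key, new HashSet<>());
    }

    private static String expand(Properties props, String key, Set<String> inProgress) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return null;
        }
        if (!inProgress.add(key)) {
            throw new IllegalArgumentException("Property refers to itself: " + key);
        }
        Matcher m = REF.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = expand(props, m.group(1), inProgress);
            // Unknown references are left exactly as written.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        inProgress.remove(key);
        return out.toString();
    }
}
```

With dspace.dir = /dspace and dspace.history = ${dspace.dir}/history, expanding dspace.history yields /dspace/history.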
+Property values can include other, previously defined values, by enclosing the property name in ${...}. For example, if your dspace.cfg contains: + +
+ dspace.dir = /dspace +dspace.history = ${dspace.dir}/history+ Then the value of the dspace.history property is expanded to /dspace/history. This method is especially useful for handling commonly used file paths. + + +Update Reminder+ +Things you should know about editing dspace.cfg files.
To keep the two files in synchronization, you can edit your files in [dspace-source]/dspace/config/ and then run the following commands: + +
+ cd [dspace-source]/dspace/target/dspace-<version>-build.dir
+ant update_configs+ This will copy the source dspace.cfg (along with other configuration files) into the runtime ([dspace]/config) directory. + +After editing your configuration file(s), when you are done and wish to put the changes into effect, you will need to: + +
The dspace.cfg Configuration Properties File+ +The primary way of configuring DSpace is to edit dspace.cfg. You will definitely have to do this before you can run DSpace properly. dspace.cfg contains basic information about a DSpace installation, including system path information, network host information, and other similar items. To assist you, below is a place for you to write down some of the preliminary data so that configuration goes faster. +
The dspace.cfg file+ +Below is a brief "Properties" table for the dspace.cfg file, with references to the detailed documentation. Please refer to those sections for the complete details of the parameter you are working with. +
+
+
+
+
+
Main DSpace Configurations+ +
+
+
+
+
+
DSpace Database Configuration+ +Many of the database configurations are software-dependent. That is, they depend on the choice of database software being used. Currently, DSpace properly supports PostgreSQL and Oracle. +
+
+
+
+
+
DSpace Email Settings+ +The configuration of email is simple and provides a mechanism to alert the person(s) responsible for different features of the DSpace software. +
+
+
+
+
Wording of E-mail Messages+ +Sometimes DSpace automatically sends e-mail messages to users, for example, to inform them of a new workflow task, or as a subscription e-mail alert. The wording of emails can be changed by editing the relevant file in [dspace]/config/emails. Each file is commented. Be careful to keep the right number of placeholders (e.g. {2}). + +Note: You should replace the contact-information "dspace-help@myu.edu or call us at xxx-555-xxxx" with your own contact details in: File Storage+ +DSpace supports two distinct options for storing your repository bitstreams (uploaded files). The files are not stored in the database that holds the metadata, user information and other records. An assetstore is a directory on your server in which the bitstreams are stored and from which they are later read. Using assetstore directories is the default "technique" in DSpace. The parameters below define which assetstores are present, and which one should be used for newly incoming items. Alternatively, DSpace can use SRB (Storage Resource Broker). See SRB File Storage for details regarding SRB. +
+
+
+
+
assetstore.dir = /storevgm/assetstore
+
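The numbering scheme, in which store 0 is 'assetstore.dir', store n is 'assetstore.dir.n', and 'assetstore.incoming' chooses where new bitstreams go (explained in more detail under SRB File Storage), can be pictured with the following Java sketch. The AssetstoreResolver class and the /storage2/assetstore path are hypothetical; the sketch only mirrors the property naming and does not reproduce DSpace's storage code.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical illustration of the numbered assetstore properties; not DSpace code.
class AssetstoreResolver {
    private final Properties cfg;

    AssetstoreResolver(Properties cfg) {
        this.cfg = cfg;
    }

    // Store 0 is 'assetstore.dir'; store n (n > 0) is 'assetstore.dir.n'.
    File directoryFor(int storeNumber) {
        String key = (storeNumber == 0) ? "assetstore.dir" : "assetstore.dir." + storeNumber;
        String dir = cfg.getProperty(key);
        if (dir == null) {
            throw new IllegalStateException("No directory configured for " + key);
        }
        return new File(dir.trim());
    }

    // 'assetstore.incoming' picks the store for new bitstreams; it defaults to 0.
    int incomingStore() {
        return Integer.parseInt(cfg.getProperty("assetstore.incoming", "0").trim());
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("assetstore.dir", "/dspace/assetstore");
        cfg.setProperty("assetstore.dir.1", "/storage2/assetstore"); // hypothetical second store
        cfg.setProperty("assetstore.incoming", "1");
        AssetstoreResolver r = new AssetstoreResolver(cfg);
        // New bitstreams go to store 1 ...
        System.out.println("incoming: " + r.directoryFor(r.incomingStore()));
        // ... while a bitstream whose recorded store_number is 0 is still found in store 0.
        System.out.println("store 0:  " + r.directoryFor(0));
    }
}
```

Because each bitstream's store number is recorded in the database, changing 'assetstore.incoming' only affects where new files are written, which is why existing stores must not be renumbered.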
SRB (Storage Resource Broker) File Storage+ +An alternative to using the default storage framework is to use the Storage Resource Broker (SRB). This can provide a different level of storage and disaster recovery (for example, the storage can be located off-site). Refer to http://www.sdsc.edu/srb/index.php/Main_Page for complete details regarding SRB. + +The same framework is used to configure SRB storage. That is, the asset store number (0..n) can reference a file system directory as above or it can reference a set of SRB account parameters. However, any particular asset store number can reference one or the other, but not both. This way traditional and SRB storage can both be used, but with different asset store numbers. The same cautions mentioned above apply to SRB asset stores as well. The particular asset store a bitstream is stored in is held in the database, so don't move bitstreams between asset stores, and do not renumber them. +
+
+
+
+
The 'assetstore.incoming' property is an integer that references where new bitstreams will be stored. The default (the starting reference) is zero. The value will be used to identify the storage where all new bitstreams will be stored until this number is changed. This number is stored in the Bitstream table (store_number column) in the DSpace database, so older bitstreams that may have been stored when 'assetstore.incoming' had a different value can still be found. + +In the simple case in which DSpace uses local (or mounted) storage, the number can refer to different directories (or partitions). This gives DSpace some level of scalability. The number links to another set of properties, 'assetstore.dir', 'assetstore.dir.1' (remember zero is the default), 'assetstore.dir.2', etc., where the values are directories. + +To support the use of SRB, DSpace uses the same scheme, broadened to support: + +
If SRB is chosen from the first install of DSpace, it is suggested that 'assetstore.dir' (no integer appended) be retained to reference a local directory (as above under File Storage) because build.xml uses this value to do a mkdir. In this case, 'assetstore.incoming' can be set to 1 (i.e. uncomment the line in File Storage above) and the 'assetstore.dir' will not be used. + +Logging Configuration+ +
+
+
+
+
Previous releases of DSpace provided an example ${dspace.dir}/config/log4j.xml as an alternative to log4j.properties. This caused some confusion and has been removed. log4j continues to support both Properties and XML forms of configuration, and you may continue (or begin) to use any form that log4j supports. + + +Configuring Lucene Search Indexes+ +Search indexes can be configured and customized easily in the dspace.cfg file. This allows institutions to choose which DSpace metadata fields are indexed by Lucene. +
+
+
+
+
For example, the following entries appear in the default DSpace installation: The format of each entry is search.index.<id> = <search label>:<schema>.<metadata field> where: +
+
+
+
+
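The entry format above, together with the asterisk wildcard described in the paragraphs that follow, can be illustrated with a small Java sketch. The SearchIndexConfig class is hypothetical; it assumes entries are numbered consecutively from search.index.1 and that a trailing ".*" matches the element with or without a qualifier. DSIndexer's actual parsing may differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of reading search.index.<id> = <label>:<schema>.<field> entries.
class SearchIndexConfig {
    static final class Entry {
        final String label;        // e.g. "author"
        final String fieldPattern; // e.g. "dc.contributor.*"

        Entry(String label, String fieldPattern) {
            this.label = label;
            this.fieldPattern = fieldPattern;
        }
    }

    // Reads search.index.1, search.index.2, ... and stops at the first missing number.
    static List<Entry> load(Properties cfg) {
        List<Entry> entries = new ArrayList<>();
        String value;
        for (int i = 1; (value = cfg.getProperty("search.index." + i)) != null; i++) {
            int colon = value.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in search.index." + i);
            }
            entries.add(new Entry(value.substring(0, colon).trim(),
                                  value.substring(colon + 1).trim()));
        }
        return entries;
    }

    // "dc.subject.*" matches dc.subject and any qualified form such as dc.subject.lcsh;
    // a pattern without the wildcard must match the field exactly.
    static boolean matches(String pattern, String field) {
        if (pattern.endsWith(".*")) {
            String base = pattern.substring(0, pattern.length() - 2);
            return field.equals(base) || field.startsWith(base + ".");
        }
        return pattern.equals(field);
    }

    // Lists the search labels (indexes) that a given metadata field contributes to.
    static List<String> labelsFor(List<Entry> entries, String field) {
        List<String> labels = new ArrayList<>();
        for (Entry e : entries) {
            if (matches(e.fieldPattern, field) && !labels.contains(e.label)) {
                labels.add(e.label);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("search.index.1", "author:dc.contributor.*");
        cfg.setProperty("search.index.2", "keyword:dc.subject.*");
        cfg.setProperty("search.index.3", "lcsh:dc.subject.lcsh"); // hypothetical label
        System.out.println(labelsFor(load(cfg), "dc.subject.lcsh")); // prints [keyword, lcsh]
    }
}
```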
In the example above, search.index.1, search.index.2 and search.index.3 are configured as the author search field. The author index is created by Lucene indexing all dc.contributor.*, dc.creator.* and dc.description.statementofresponsibility metadata fields. + +After changing the configuration, run [dspace]/bin/dspace index-init to regenerate the indexes. + +Regenerating the indexes only affects the search results and has no effect on the search components of the user interface. You will need to customize the user interface to reflect the changes, for example, to add a new search category to the Advanced Search. + +In the above examples, notice the asterisk (*). The metadata field (at least for Dublin Core) is made up of the "element" and the "qualifier". The asterisk is used as the "wildcard". So, for example, keyword:dc.subject.* will index all subjects regardless of whether the term resides in a qualified field (subject versus subject.lcsh). One could customize the search to index only LCSH (Library of Congress Subject Headings) with the entry keyword:dc.subject.lcsh instead of keyword:dc.subject.* + +Authority Control Note: + +Although DSIndexer automatically builds a separate index for the authority keys of any index that contains authority-controlled metadata fields, the "Advanced Search" UIs do not allow direct access to it. Perhaps it will be added in the future. Fortunately, the OpenSearch API lets you submit a query directly to the Lucene search engine, and this may include the authority-controlled indexes. + + +Handle Server Configuration+ +The CNRI Handle system is a 3rd party service for maintaining persistent URLs. For a nominal fee, you can register a handle prefix for your repository. As a result, your repository items will also be available under the links http://handle.net/<<handle prefix>>/<<item id>>. 
As the base URL of your repository might change or evolve, the persistent handle.net URLs keep links to your repository items consistent. For complete information regarding the Handle server, consult Section 3.4.4, The Handle Server, of Installing DSpace. +
+
+
+
+
For complete information regarding the Handle server, consult Section 3.3.4, The Handle Server, of Installing DSpace. + + +Delegation Administration : Authorization System Configuration+ +(Authorization System Configuration) These settings determine whether the ADMIN of an object is also authorized to execute the functions that are allowed to a user with WRITE permission on that object (e.g. a community or collection admin will always be allowed to edit the metadata of that object). The default is "true" for all of these settings. +
+
+
+
+
Oracle users should consult Chapter 4, Updating a DSpace Installation, regarding the necessary database changes. + + +Stackable Authentication Method(s)+ +(formerly Custom Authentication) + +Since many institutions and organizations have existing authentication systems, DSpace has been designed to allow these to be easily integrated into an existing authentication infrastructure. It keeps a series, or "stack", of authentication methods, so each one can be tried in turn. This makes it easy to add new authentication methods or rearrange the order without changing any existing code. You can also share authentication code with other sites. +
The configuration property plugin.sequence.org.dspace.authenticate.AuthenticationMethod defines the authentication stack. It is a comma-separated list of class names. Each of these classes implements a different authentication method, or way of determining the identity of the user. They are invoked in the order specified until one succeeds. + +An authentication method is a class that implements the interface org.dspace.authenticate.AuthenticationMethod. It authenticates a user by evaluating the credentials (e.g. username and password) he or she presents and checking that they are valid. + +The basic authentication procedure in the DSpace Web UI is this: + +
Shibboleth Authentication Configuration Settings+ +Detailed instructions for installing Shibboleth on DSpace may be found at https://mams.melcoe.mq.edu.au/zope/mams/pubs/Installation/dspace15. + +DSpace requires email as the user's credentials. There are two ways of providing email to DSpace: + +
Authentication by Password+ +The default method org.dspace.authenticate.PasswordAuthentication has the following properties: + +
X.509 Certificate Authentication+ +The X.509 authentication method uses an X.509 certificate sent by the client to establish his/her identity. It requires the client to have a personal Web certificate installed on their browser (or other client software) which is issued by a Certifying Authority (CA) recognized by the web server. + +
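The stack behaviour described under Stackable Authentication Method(s), where each configured method is tried in order until one succeeds, can be modelled in a few lines of Java. Everything here is a simplified, hypothetical stand-in: the Method interface, the Result values and both example methods are invented for illustration, and DSpace's real org.dspace.authenticate.AuthenticationMethod interface has a richer signature. Reporting the most informative failure when no method succeeds is a design choice of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of an authentication stack; not the DSpace API.
class AuthStackSketch {
    enum Result { SUCCESS, BAD_CREDENTIALS, NO_SUCH_USER }

    interface Method {
        Result authenticate(String username, String password);
    }

    // Stand-in for an explicit, password-based method that knows one user.
    static final class PasswordMethod implements Method {
        public Result authenticate(String username, String password) {
            if (!"alice".equals(username)) {
                return Result.NO_SUCH_USER;
            }
            return "secret".equals(password) ? Result.SUCCESS : Result.BAD_CREDENTIALS;
        }
    }

    // Stand-in for a method that never recognises the user (e.g. a directory that is unreachable).
    static final class AlwaysUnknownMethod implements Method {
        public Result authenticate(String username, String password) {
            return Result.NO_SUCH_USER;
        }
    }

    // Invokes each method in configured order; the first SUCCESS wins.
    // If none succeeds, BAD_CREDENTIALS is preferred over NO_SUCH_USER.
    static Result authenticate(List<Method> stack, String user, String pass) {
        Result best = Result.NO_SUCH_USER;
        for (Method m : stack) {
            Result r = m.authenticate(user, pass);
            if (r == Result.SUCCESS) {
                return r;
            }
            if (r == Result.BAD_CREDENTIALS) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Method> stack = Arrays.<Method>asList(new AlwaysUnknownMethod(), new PasswordMethod());
        System.out.println(authenticate(stack, "alice", "secret")); // prints SUCCESS
        System.out.println(authenticate(stack, "alice", "wrong"));  // prints BAD_CREDENTIALS
    }
}
```

Reordering the list is all it takes to change which method is consulted first, which mirrors how the plugin.sequence property is edited without touching any method's code.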
Example of a Custom Authentication Method+ +Also included in the source is an implementation of an authentication method used at MIT, edu.mit.dspace.MITSpecialGroup. This does not actually authenticate a user; it only adds the current user to a special (dynamic) group called 'MIT Users' (which must be present in the system!). This allows us to create authorization policies for MIT users without having to manually maintain membership of the MIT users group. + +By keeping this code in a separate method, we can customize the authentication process for MIT by simply adding it to the stack in the DSpace configuration. None of the code has to be touched. + +You can create your own custom authentication method and add it to the stack. Use the most similar existing method as a model, e.g. org.dspace.authenticate.PasswordAuthentication for an "explicit" method (with credentials entered interactively) or org.dspace.authenticate.X509Authentication for an implicit method. + + +Configuring IP Authentication+ +You can enable IP authentication by adding its method to the stack in the DSpace configuration, e.g.:
+ plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.IPAuthentication+ You are then able to map DSpace groups to IP addresses in dspace.cfg by setting authentication.ip.GROUPNAME = iprange[, iprange ...], e.g.: +
+ # In order: full IP, partial IP, CIDR, netmask, IPv6
+authentication.ip.MY_UNIVERSITY = 10.1.2.3, \
+    13.5, \
+    11.3.4.5/24, \
+    12.7.8.9/255.255.128.0, \
+    2001:18e8::/32
+ Negative matches can be set by prepending the entry with a '-'. For example, if you want to include all of a class B network except for users of a contained class C network, you could use: 111.222,-111.222.333. + +Notes: +
Configuring LDAP Authentication+ +You can enable LDAP authentication by adding its method to the stack in the DSpace configuration, e.g. +
+ plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \ + org.dspace.authenticate.LDAPAuthentication+ If LDAP is enabled in the dspace.cfg file, then new users will be able to register by entering their username and password without being sent the registration token. If users do not have a username and password, then they can still register and log in with just their email address, the same way they do now. + +If you want to give any special privileges to LDAP users, create a stackable authentication method to automatically put people who have a netid into a special group. You might also want to give certain email addresses special privileges. Refer to the Example of a Custom Authentication Method section above for more information about how to do this. + +Here is an explanation of what each of the configuration parameters is for: +
+
+
+
+
Hierarchical LDAP Settings. If your users are spread out across a hierarchical tree on your LDAP server, you will need to use the following stackable authentication class: +
+ plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \ + org.dspace.authenticate.LDAPHierarchicalAuthentication+ You can optionally specify the search scope. If anonymous access is not enabled on your LDAP server, you will need to specify the full DN and password of a user that is allowed to bind in order to search for the users. +
+
+
+
+
+
+
Restricted Item Visibility Settings+ +By default RSS feeds, OAI-PMH and subscription emails will include ALL items regardless of permissions set on them. If you wish to expose through these channels only those items for which the ANONYMOUS user is granted READ permission, set the following options to false. + +In large repositories, setting harvest.includerestricted.oai to false may cause performance problems, because the authorization permissions of every item need to be checked; and because DSpace has not implemented resumption tokens in ListIdentifiers, ALL items will need checking whenever a ListIdentifiers request is made. +
+
+
+
+
+
Proxy Settings+ +These proxy settings are commented out by default. Uncomment and specify both properties if a proxy server is required for external HTTP requests. Use a plain host name, without a port number. +
+
+
+
+
+
Configuring Media Filters+ +Media or Format Filters are classes used to generate derivative or alternative versions of content or bitstreams within DSpace. For example, the PDF Media Filter will extract textual content from PDF bitstreams, and the JPEG Media Filter can create thumbnails from image bitstreams. + +Media Filters are configured as Named Plugins, with each filter also having a separate configuration setting (in dspace.cfg) indicating which formats it can process. The default configuration is shown below. +
+
+
+
+
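How the filter.<class path>.inputFormats lists decide which filters apply to a given bitstream format can be sketched as follows. The MediaFilterSelector class is hypothetical, and the format names in the example are illustrative; in a real installation they must match the short descriptions in your Bitstream Format Registry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: pick media filters by their configured input formats.
class MediaFilterSelector {
    private static final String PREFIX = "filter.";
    private static final String SUFFIX = ".inputFormats";

    // Returns the class paths whose inputFormats list contains the given format name.
    static List<String> filtersFor(Properties cfg, String formatShortDescription) {
        List<String> matches = new ArrayList<>();
        for (String key : cfg.stringPropertyNames()) {
            boolean isInputFormatsKey = key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length();
            if (!isInputFormatsKey) {
                continue;
            }
            String classPath = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
            for (String format : cfg.getProperty(key).split(",")) {
                // Names are compared exactly against the registry short description.
                if (format.trim().equals(formatShortDescription)) {
                    matches.add(classPath);
                    break;
                }
            }
        }
        Collections.sort(matches); // stringPropertyNames() has no defined order
        return matches;
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("filter.org.dspace.app.mediafilter.PDFFilter.inputFormats", "Adobe PDF");
        cfg.setProperty("filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats", "BMP, GIF, JPEG");
        System.out.println(filtersFor(cfg, "JPEG")); // prints [org.dspace.app.mediafilter.JPEGFilter]
    }
}
```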
Names are assigned to each filter using the plugin.named.org.dspace.app.mediafilter.FormatFilter field (e.g. by default the PDFFilter is named "PDF Text Extractor"). + +Finally, the appropriate filter.<class path>.inputFormats defines the valid input formats to which each filter can be applied. These format names must match the short description field of the Bitstream Format Registry. + +You can also implement more dynamic or configurable Media/Format Filters which extend SelfNamedPlugin. + + +Crosswalk and Packager Plugin Settings+ +The subsections below give configuration details based on the types of crosswalks and packager plugins you need to implement. + +Configurable MODS Dissemination Crosswalk+ +The MODS crosswalk is a self-named plugin. To configure an instance of the MODS crosswalk, add a property to the DSpace configuration starting with "crosswalk.mods.properties."; the final word of the property name becomes the plugin's name. For example, a property name crosswalk.mods.properties.MODS defines a crosswalk plugin named "MODS". + +The value of this property is a path to a separate properties file containing the configuration for this crosswalk. The pathname is relative to the DSpace configuration directory, i.e. the config subdirectory of the DSpace install directory. Example from the dspace.cfg file: +
+
+
+
+
The MODS crosswalk properties file is a list of properties describing how DSpace metadata elements are to be turned into elements of the MODS XML output document. The property name is a concatenation of the metadata schema, element name, and optionally the qualifier. For example, the contributor.author element in the native Dublin Core schema would be: dc.contributor.author. The value of the property is a line containing two segments separated by the vertical bar ("|"): The first part is an XML fragment which is copied into the output document. The second is an XPath expression describing where in that fragment to put the value of the metadata element. For example, in this property: +
+ dc.contributor.author = <mods:name>
+ <mods:role>
+ <mods:roleTerm type="text">author</mods:roleTerm>
+ </mods:role>
+ <mods:namePart>%s</mods:namePart>
+ </mods:name>
+Some of the examples include the string "%s" in the prototype XML where the text value is to be inserted, but don't pay any attention to it; it is an artifact that the crosswalk ignores. For example, given an author named Jack Florey, the crosswalk will insert +
+ <mods:name>
+ <mods:role>
+ <mods:roleTerm type="text">author</mods:roleTerm>
+ </mods:role>
+ <mods:namePart>Jack Florey</mods:namePart>
+</mods:name>
+into the output document. Read the example configuration file for more details. + + +XSLT-based Crosswalks+ +The XSLT crosswalks use XSL stylesheet transformation (XSLT) to transform an XML-based external metadata format to or from DSpace's internal metadata. XSLT crosswalks are much more powerful and flexible than the configurable MODS and QDC crosswalks, but they demand some esoteric knowledge (XSL stylesheets). Given that knowledge, you can create all the crosswalks you need just by adding stylesheets and configuration lines, without touching any of the Java code. + +The default settings in the dspace.cfg file for the submission crosswalk: +
+
+
+
+
As shown above, there are four (4) parts that make up the property "key": +
+ crosswalk.submission.PluginName.stylesheet =
+ 1         2          3          4
+ crosswalk first part of the property key. You can make two different plugin names point to the same crosswalk by adding two configuration entries with the same path: +
+ crosswalk.submission.MyFormat.stylesheet = crosswalks/myformat.xslt + crosswalk.submission.almost_DC.stylesheet = crosswalks/myformat.xslt+ The dissemination crosswalk must also be configured with an XML Namespace (including prefix and URI) and an XML schema for its output format. This is configured with additional properties in the DSpace configuration: +
+ crosswalk.dissemination.PluginName.namespace.Prefix = namespace-URI + crosswalk.dissemination.PluginName.schemaLocation = schemaLocation value+ For example: +
+ crosswalk.dissemination.qdc.namespace.dc = http://purl.org/dc/elements/1.1/ + crosswalk.dissemination.qdc.namespace.dcterms = http://purl.org/dc/terms/ + crosswalk.dissemination.qdc.schemaLocation = http://purl.org/dc/elements/1.1/ \ + http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd+ Testing XSLT Crosswalks+ +The XSLT crosswalks will automatically reload an XSL stylesheet that has been modified, so you can edit and test stylesheets without restarting DSpace. You can test a dissemination crosswalk by hooking it up to an OAI-PMH crosswalk and using an OAI request to get the metadata for a known item. + +Testing the submission crosswalk is more difficult, so we have supplied a command-line utility to help. It calls the crosswalk plugin to translate an XML document you submit, and displays the resulting intermediate XML (DIM). Invoke it with: +
+ [dspace]/bin/dsrun + org.dspace.content.crosswalk.XSLTIngestionCrosswalk [-l] plugin input-file+ where plugin is the name of the crosswalk plugin to test (e.g. "LOM"), and input-file is a file containing an XML document of metadata in the appropriate format. + +Add the -l option to pass the ingestion crosswalk a list of elements instead of a whole document, as if the List form of the ingest() method had been called. This is needed to test ingesters for formats like DC that get called with lists of elements instead of a root element. + + + +Configurable Qualified Dublin Core (QDC) dissemination crosswalk+ +The QDC crosswalk is a self-named plugin. To configure an instance of the QDC crosswalk, add a property to the DSpace configuration starting with "crosswalk.qdc.properties."; the final word of the property name becomes the plugin's name. For example, a property name crosswalk.qdc.properties.QDC defines a crosswalk plugin named "QDC". + +The following is from dspace.cfg file: +
+
+
+
+
In the property key "crosswalk.qdc.properties.QDC", the value of this property is a path to a separate properties file containing the configuration for this crosswalk. The pathname is relative to the DSpace configuration directory, [dspace]/config. Referring back to the "Example Value" for this property key, crosswalks/qdc.properties defines a crosswalk named QDC whose configuration comes from the file [dspace]/config/crosswalks/qdc.properties. + +You will also need to configure the namespaces and schema location strings for the XML output generated by this crosswalk. The namespace property names are formatted: + +crosswalk.qdc.namespace.prefix = uri + +where prefix is the namespace prefix and uri is the namespace URI. See the Property and Example Value entries above for how the default dspace.cfg is configured. + +The QDC crosswalk properties file is a list of properties describing how DSpace metadata elements are to be turned into elements of the Qualified DC XML output document. The property name is a concatenation of the metadata schema, element name, and optionally the qualifier. For example, the contributor.author element in the native Dublin Core schema would be: dc.contributor.author. The value of the property is an XML fragment: the element whose value will be set to the value of the metadata field named in the property key. + +For example, in this property: + +dc.coverage.temporal = <dcterms:temporal /> + +the generated XML in the output document would look like, e.g.: Configuring Crosswalk Plugins+ +Ingestion crosswalk plugins are configured as named or self-named plugins for the interface org.dspace.content.crosswalk.IngestionCrosswalk. Dissemination crosswalk plugins are configured as named or self-named plugins for the interface org.dspace.content.crosswalk.DisseminationCrosswalk. + +You can add names for existing crosswalks, add new plugin classes, and add new configurations for the configurable crosswalks as noted below.
+ + +Configuring Packager Plugins+ +Package ingester plugins are configured as named or self-named plugins for the interface org.dspace.content.packager.PackageIngester. Package disseminator plugins are configured as named or self-named plugins for the interface org.dspace.content.packager.PackageDisseminator. + +You can add names for the existing plugins, and add new plugins, by altering these configuration properties. See the Plugin Manager architecture for more information about plugins. + + + +Event System Configuration+ +If you are unfamiliar with the Event System in DSpace, and require additional information with terms like "Consumer" and "Dispatcher", please refer to: http://wiki.dspace.org/index.php/EventSystemPrototype + +
+
+
+
+
+
Embargo+ +DSpace embargoes utilize standard metadata fields to hold both the 'terms' and the 'lift date'. Which fields you use are configurable, and no specific metadata element is dedicated or predefined for use in embargo. Rather, you specify exactly what field you want the embargo system to examine when it needs to find the terms or assign the lift date. +
+
+
+
+
Key Recommendations: + +
After the fields defined for terms and lift date have been assigned in dspace.cfg, and created and configured wherever they will be used, you can begin to embargo items simply by entering data (dates, if using the default setter) in the terms field. They will automatically be embargoed as they exit workflow. For the embargo to be lifted on any item, however, a new administrative procedure must be added: the 'embargo lifter' must be invoked on a regular basis. This task examines all embargoed items, and if their 'lift date' has passed, it removes the access restrictions on the item. Good practice dictates automating this procedure using cron jobs or the like, rather than manually running it. The lifter is available as a target of the 1.6 DSpace launcher: see Section 8. + + +Extending Embargo Functionality+ +The 1.6 Embargo system supplies a default 'interpreter/imposition' class (the 'Setter') as well as a 'Lifter', but they are fairly rudimentary in several aspects. + +
Step-by-Step Setup Examples+ +
Now add a new property called 'embargo.terms.days' as follows: +
+ # Map each value allowed in the terms field to an embargo period in days
+ embargo.terms.days = 90 days:90, 6 months:180, 1 year:365
+
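A table like embargo.terms.days maps each permitted value of the terms field to a number of days. The Java sketch below shows one way such a table could be parsed and turned into a lift date, plus the comparison a periodic lifter run would make. It is a hypothetical illustration using java.time, not DSpace's setter or lifter; in particular, counting from the date the item leaves workflow and treating the lift date itself as "passed" are assumptions.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a day-table embargo: terms -> days -> lift date.
class EmbargoDays {
    // Parses "90 days:90, 6 months:180, 1 year:365" into {"90 days"=90, ...}.
    static Map<String, Integer> parseTable(String value) {
        Map<String, Integer> table = new HashMap<>();
        for (String pair : value.split(",")) {
            int colon = pair.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Bad entry: " + pair);
            }
            table.put(pair.substring(0, colon).trim(),
                      Integer.parseInt(pair.substring(colon + 1).trim()));
        }
        return table;
    }

    // Assumed rule: lift date = date the item exits workflow + days for its terms.
    static LocalDate liftDate(Map<String, Integer> table, String terms, LocalDate exitedWorkflow) {
        Integer days = table.get(terms.trim());
        if (days == null) {
            throw new IllegalArgumentException("Unknown embargo terms: " + terms);
        }
        return exitedWorkflow.plusDays(days);
    }

    // The check a scheduled lifter run would make (the lift date itself counts as passed).
    static boolean shouldLift(LocalDate liftDate, LocalDate today) {
        return !today.isBefore(liftDate);
    }

    public static void main(String[] args) {
        Map<String, Integer> table = parseTable("90 days:90, 6 months:180, 1 year:365");
        LocalDate lift = liftDate(table, "6 months", LocalDate.of(2011, 1, 1));
        System.out.println(lift); // prints 2011-06-30
    }
}
```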
Checksum Checker Settings+ +DSpace now comes with a Checksum Checker script ([dspace]/bin/dspace checker) which can be scheduled to verify the checksum of every item within DSpace. Since DSpace calculates and records the checksum of every file submitted to it, this script is able to determine whether or not a file has been changed (either manually or by some sort of corruption or virus). The earlier you can identify that a file has changed, the more likely you are to be able to recover it (assuming the change was not intended). +
+
+
+
+
+
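The principle behind the checker, recomputing a file's digest and comparing it with the value recorded when the file was submitted, fits in a short Java sketch. The ChecksumSketch class is hypothetical and assumes the recorded value is a hexadecimal MD5 digest; the real checker does considerably more (for example, recording the results of each check), none of which is shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the checksum comparison a checker performs.
class ChecksumSketch {
    // Streams the file through MessageDigest and returns a lowercase hex digest.
    static String digestHex(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when the recorded MD5 checksum still matches the file's current content.
    static boolean unchanged(Path file, String recordedMd5)
            throws IOException, NoSuchAlgorithmException {
        return digestHex(file, "MD5").equalsIgnoreCase(recordedMd5.trim());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("bitstream", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        String recorded = digestHex(tmp, "MD5");                    // value stored at submission
        Files.write(tmp, "hellO".getBytes(StandardCharsets.UTF_8)); // simulated corruption
        System.out.println(unchanged(tmp, recorded) ? "OK" : "CHECKSUM MISMATCH");
        Files.delete(tmp);
    }
}
```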
Item Export and Download Settings+ +It is possible for an authorized user to request a complete export and download of a DSpace item in a compressed zip file. This zip file may contain the following: The configuration settings control several aspects of this feature: +
+
+
+
+
+
Subscription Emails+ +DSpace, through some advanced installation and setup, is able to send out email to users about the collections they have subscribed to. A user who is subscribed to a collection is emailed each time an item is added or modified. The following property key controls whether or not a user should be notified of a modification. +
+
+
+
+
+
Batch Metadata Editing+ +The following configurations allow the administrator to extract a set of records from the DSpace database, via a metadata export, for editing. This provides an easier way of editing large collections. +
+
+
+
+
+
Hiding Metadata+ +It is now possible to hide metadata from public view, so that it is only available to the Administrator. +
+
+
+
+
+
Settings for the Submission Process+ +These settings control two aspects of the submission process: thesis submission permission and whether or not a bitstream file is required when submitting to a collection. +
+
+
+
+
+
Configuring Creative Commons License+ +This enables the Creative Commons license step in the submission process of the JSP User Interface (JSPUI). Submitters are given an opportunity to select a Creative Commons license to accompany the item. Creative Commons licenses govern the use of the content. For further details, refer to the Creative Commons website at http://creativecommons.org. +
+
+
+
+
+
WEB User Interface Configurations+ +General Web User Interface Configurations
+
+
+
+
+
Browse Index Configuration+ +The browse indexes for DSpace can be extensively configured. This section of the configuration allows you to take control of the indexes you wish to browse, and how you wish to present the results. The configuration is broken into several parts: defining the indexes, defining the fields upon which users can sort results, defining truncation for potentially long fields (e.g. authors), setting cross-links between different browse contexts (e.g. from an author's name to a complete list of their items), how many recent submissions to display, and configuration for item mapping browse. +
+
+
+
+
Defining the Indexes.+ +DSpace arrives with four default indexes already defined: author, title, date issued, and subjects. Users may also define additional indexes or reconfigure the current indexes for different levels of specificity. For example, the following entries appear in the default dspace.cfg: +
+ webui.browse.index.1 = dateissued:metadata:dc.date.issued:date:full +webui.browse.index.2 = author:metadata:dc.contributor.*:text +webui.browse.index.3 = title:metadata:dc.title:title:full +webui.browse.index.4 = subject:metadata:dc.subject.*:text +#webui.browse.index.5 = dateaccessioned:item:dateaccessioned+ The format of each entry is webui.browse.index.<n> = <index name>:<metadata>:<schema prefix>.<element>.<qualifier>:<data-type field>:<sort option>. Note that the punctuation matters when typing this property key in the dspace.cfg file. The following table explains each element: + +
+
+
+
+
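The colon-separated browse index entries can be split into their parts as in this Java sketch. BrowseIndexDef is a hypothetical class that follows the format string and default entries above; it assumes that "metadata" indexes carry a field, a data type and an optional sort option, while "item" indexes (such as the commented-out dateaccessioned entry) carry only an item attribute.

```java
// Hypothetical parser for webui.browse.index.<n> values; not DSpace's BrowseIndex class.
class BrowseIndexDef {
    final String name;       // e.g. "author"
    final String source;     // "metadata" or "item"
    final String field;      // metadata field such as "dc.contributor.*", or an item attribute
    final String dataType;   // "text", "date", "title", ...; null for item indexes
    final String sortOption; // e.g. "full"; null when omitted

    private BrowseIndexDef(String name, String source, String field,
                           String dataType, String sortOption) {
        this.name = name;
        this.source = source;
        this.field = field;
        this.dataType = dataType;
        this.sortOption = sortOption;
    }

    static BrowseIndexDef parse(String value) {
        String[] parts = value.trim().split(":");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        // name:item:attribute, e.g. dateaccessioned:item:dateaccessioned
        if (parts.length == 3 && parts[1].equals("item")) {
            return new BrowseIndexDef(parts[0], "item", parts[2], null, null);
        }
        // name:metadata:field:datatype[:sortoption]
        if ((parts.length == 4 || parts.length == 5) && parts[1].equals("metadata")) {
            String sort = (parts.length == 5) ? parts[4] : null;
            return new BrowseIndexDef(parts[0], "metadata", parts[2], parts[3], sort);
        }
        throw new IllegalArgumentException("Unrecognised browse index definition: " + value);
    }

    public static void main(String[] args) {
        String[] defaults = {
            "dateissued:metadata:dc.date.issued:date:full",
            "author:metadata:dc.contributor.*:text",
            "dateaccessioned:item:dateaccessioned"
        };
        for (String d : defaults) {
            BrowseIndexDef def = parse(d);
            System.out.println(def.name + " -> " + def.field
                    + " (" + def.dataType + ", " + def.sortOption + ")");
        }
    }
}
```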
If you are customizing this list beyond the default, you will need to insert the text you wish to appear in the navigation and on links and buttons. You need to edit the Messages.properties file. The form of the parameter(s) in the file: Defining Sort Options+ +Sort options will be available when browsing a list of items (i.e. only in "full" mode, not "single" mode). You can define an arbitrary number of fields to sort on, irrespective of which fields you display using webui.itemlist.columns. For example, the following entries appear in the default dspace.cfg: +
+ webui.itemlist.sort-option.1 = title:dc.title:title +webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date +webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date+ The format of each entry is webui.itemlist.sort-option.<n> = <option name>:<schema prefix>.<element>.<qualifier>:<datatype>. Please notice the punctuation used between the different elements. The following table explains each element: +
+
+
+
+
+
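To make the sort-option format concrete, the hypothetical Java sketch below parses a webui.itemlist.sort-option.<n> value and orders a list of records by the chosen field. The record representation (a map from field name to value) is a simplification, and the assumptions that dates are ISO-8601 strings and that case-insensitive comparison is adequate for titles are illustrative; DSpace's own sorting may normalise values differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a webui.itemlist.sort-option.<n> entry; not DSpace's SortOption class.
class SortOptionSketch {
    final String name;     // e.g. "dateissued"
    final String field;    // e.g. "dc.date.issued"
    final String dataType; // e.g. "title", "date" or "text"

    SortOptionSketch(String value) {
        String[] parts = value.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected <name>:<field>:<datatype>, got " + value);
        }
        name = parts[0].trim();
        field = parts[1].trim();
        dataType = parts[2].trim();
    }

    // Orders records (field name -> value) by this option's field; missing values sort last.
    List<Map<String, String>> sort(List<Map<String, String>> records) {
        Comparator<String> byValue = dataType.equals("date")
                ? Comparator.<String>naturalOrder()  // assumes ISO-8601 dates such as 2011-02-17
                : String.CASE_INSENSITIVE_ORDER;     // simplistic ordering for titles and text
        List<Map<String, String>> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing((Map<String, String> r) -> r.get(field),
                                         Comparator.nullsLast(byValue)));
        return sorted;
    }

    // Small helper for building a one-field record in the example below.
    static Map<String, String> record(String field, String value) {
        Map<String, String> r = new HashMap<>();
        r.put(field, value);
        return r;
    }

    public static void main(String[] args) {
        SortOptionSketch byDate = new SortOptionSketch("dateissued:dc.date.issued:date");
        List<Map<String, String>> items = new ArrayList<>();
        items.add(record("dc.date.issued", "2010-05-01"));
        items.add(record("dc.date.issued", "2009-12-31"));
        System.out.println(byDate.sort(items));
    }
}
```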
Browse Index Normalization Rule Configuration+ +Normalization Rules are those rules that make it possible for the indexes to intermix entries without regard to case sensitivity. By default, the display of metadata in the browse indexes is case-sensitive. In the example below, you retrieve separate entries:
+
+
+
+
At the present time, you would need to edit your metadata to clean up the index presentation. + + +Other Browse Options+ +We set other browse values in the following section. +
+
+
+
+
+
Browse Index Authority Control Configuration+ +
+
+
+
+
Author (Multiple metadata value) Display+ +This section actually applies to any field with multiple values, but authors are the defining case and example here. +
+
+
+
+
Replace dc.contributor.* with another field if appropriate. The field should be listed in the configuration for webui.itemlist.columns; otherwise you will not see its effect. It must also be defined in webui.itemlist.columns as being of the datatype text, otherwise the functionality will be overridden by the specific data type feature. (This setting is not used by the XMLUI, as it is controlled by your theme.) + +Now that we know which field is our author (or other multiple-value metadata) field, we can provide the option to truncate the number of values displayed by default. We replace the remaining list of values with "et al" or the language-pack-specific alternative. Note that this is just for the default, and users will have the option of changing the number displayed when they browse the results. See the following table: +
+
+
+
+
+
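+ The truncation behaviour described above can be sketched in Java as follows. This is an illustration under assumptions, not DSpace code: the class name EtAl, the "; " separator and the convention that a non-positive limit means "show all values" are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of default "et al." truncation for a multi-valued field. */
final class EtAl {
    /**
     * Joins at most {@code limit} values with "; " and appends {@code etAl}
     * when further values were left out. A limit of 0 or less shows every value.
     */
    static String display(List<String> values, int limit, String etAl) {
        if (limit <= 0 || values.size() <= limit) {
            return String.join("; ", values);
        }
        return String.join("; ", values.subList(0, limit)) + "; " + etAl;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Smith, J.", "Jones, A.", "Brown, K.");
        System.out.println(display(authors, 2, "et al."));  // Smith, J.; Jones, A.; et al.
        System.out.println(display(authors, -1, "et al.")); // Smith, J.; Jones, A.; Brown, K.
    }
}
```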
Links to Other Browse Contexts+ +We can define which fields link to other browse listings. This is useful, for example, to link an author's name to a list of just that author's items. The effect this has is to create links to browse views for the item clicked on. If it is a "single" type, it will link to a view of all the items which share that metadata element in common (i.e. all the papers by a single author). If it is a "full" type, it will link to a view of the standard full browse page, starting with the value of the link clicked on. + +
+
+
+
+
The format of the property key is webui.browse.link.<n> = <index name>:<display column metadata>. Please note the punctuation used between the elements.
+
+
+
+
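+ The following Java sketch illustrates the webui.browse.link.<n> = <index name>:<display column metadata> format and how a displayed field might be checked against it. The class BrowseLink is hypothetical, the value author:dc.contributor.* is used only for illustration, and the rule that a trailing .* matches both the unqualified field and any qualifier of it is an assumption, not a description of DSpace's internal matching.

```java
/** Hypothetical sketch: one webui.browse.link.<n> value and the fields it links. */
final class BrowseLink {
    final String indexName;       // index to link into, e.g. "author"
    final String metadataPattern; // display column metadata, e.g. "dc.contributor.*"

    private BrowseLink(String indexName, String metadataPattern) {
        this.indexName = indexName;
        this.metadataPattern = metadataPattern;
    }

    /** Parses "<index name>:<display column metadata>", splitting at the first colon. */
    static BrowseLink parse(String value) {
        int colon = value.indexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("Expected <index name>:<metadata> but got: " + value);
        }
        return new BrowseLink(value.substring(0, colon).trim(), value.substring(colon + 1).trim());
    }

    /** Assumed rule: a trailing ".*" covers the unqualified field and any qualifier of it. */
    boolean matches(String field) {
        if (metadataPattern.endsWith(".*")) {
            String base = metadataPattern.substring(0, metadataPattern.length() - 2);
            return field.equals(base) || field.startsWith(base + ".");
        }
        return metadataPattern.equals(field);
    }

    public static void main(String[] args) {
        BrowseLink link = parse("author:dc.contributor.*");
        for (String field : new String[] {"dc.contributor.author", "dc.contributor", "dc.creator"}) {
            System.out.println(field + " links to '" + link.indexName + "': " + link.matches(field));
        }
    }
}
```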
Examples of some browse links used in a real DSpace installation instance: +
Recent Submissions+

This allows us to define which index to base the Recent Submissions display on, and how many items to show at any one time. This uses the PluginManager to automatically load the relevant plugin for the Community and Collection home pages. Values given in examples are the defaults supplied in dspace.cfg.
+
+
+
+
The processors that the PluginManager loads to perform the recent submissions query on the relevant pages must also be set up. This is already configured in the default dspace.cfg, so there should be no need for the administrator/programmer to worry about this.

+ plugin.sequence.org.dspace.plugin.CommunityHomeProcessor = \ + org.dspace.app.webui.components.RecentCommunitySubmissions + +plugin.sequence.org.dspace.plugin.CollectionHomeProcessor = \ + org.dspace.app.webui.components.RecentCollectionSubmissions+ Submission License Substitution Variables+ +
+
+
+
+
+
Syndication Feed (RSS) Settings+

This will enable syndication feed links to display on community and collection home pages. This setting is not used by the XMLUI, as you enable feeds in your theme.
+
+
+
+
+
OpenSearch Support+

OpenSearch is a small set of conventions and documents for describing and using "search engines", meaning any service that returns a set of results for a query. See the extensive description in the Business Layer section of the documentation.

Please note that for result data formatting, OpenSearch uses the Syndication Feed (RSS) Settings. So, even if Syndication Feeds are not enabled, they must be configured to enable OpenSearch. OpenSearch uses all the configuration properties for DSpace RSS to determine the mapping of metadata fields to feed fields. Note that a new field for authors has been added (used in Atom format only).
+
+
+
+
+
Content Inline Disposition Threshold+

The following configuration is used to change the disposition behavior of the browser, that is, whether the browser will attempt to open the file or download it to the user-specified location. For example, with the default threshold of 8MB, when an item being viewed is larger than 8MB, the browser will download the file to the desktop (or wherever you have it set to download) and the user will have to open it manually.
+
+
+
+
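+ The threshold rule described above amounts to a single size comparison. This hypothetical Java sketch (not DSpace's servlet code) assumes the threshold is given in bytes, taking 8MB as 8 * 1024 * 1024 = 8,388,608 bytes, and maps the decision onto the standard HTTP Content-Disposition types inline and attachment.

```java
/** Hypothetical sketch of the content inline disposition threshold decision. */
final class Disposition {
    /** 8MB expressed in bytes (assumed unit for the threshold). */
    static final long EIGHT_MB = 8L * 1024 * 1024;

    /**
     * Returns the Content-Disposition type to send: "inline" lets the browser
     * try to open the file; "attachment" makes it download the file instead.
     * Only files strictly larger than the threshold are downloaded.
     */
    static String forSize(long sizeInBytes, long thresholdInBytes) {
        return sizeInBytes > thresholdInBytes ? "attachment" : "inline";
    }

    public static void main(String[] args) {
        System.out.println(forSize(1_000_000L, EIGHT_MB));  // inline
        System.out.println(forSize(20_000_000L, EIGHT_MB)); // attachment
    }
}
```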
Other values are possible: Multi-file HTML Document/Site Settings+

This setting configures the "depth" of requests for HTML documents bearing the same name.
+
+
+
+
+
Sitemap Settings+

To help web crawlers index the content within your repository, you can make use of sitemaps.
+
+
+
+
+
Authority Control Settings+

Two new features of DSpace 1.6 fall under the heading of Authority Control: Choice Management and Authority Control of Item ("DC") metadata values. Authority control is a fully optional feature in DSpace 1.6. Implemented out of the box are the Library of Congress Names service and the Sherpa Romeo authority plugin.

For an in-depth description of this feature, please consult: http://wiki.dspace.org/index.php/Authority_Control_of_Metadata_Values

+
+
+
+
+
JSPUI Upload File Settings+ +To alter these properties for the XMLUI, please consult the Cocoon specific configuration at /WEB-INF/cocoon/properties/core.properties. +
+
+
+
+
+
JSP Web Interface (JSPUI) Settings+

The following section is limited to JSPUI. For XMLUI settings, please refer to Chapter 7: XMLUI Configuration and Customization.
+
+
+
+
+
JSPUI Configuring Multilingual Support+ +[i18n – Locales] + +Setting the Default Language for the Application+ +
+
+
+
+
+
Supporting More Than One Language+ +Changes in dspace.cfg+ +
+
+
+
+
If used, the settings in the table above will result in:

Related Files+

If you set webui.supported.locales, make sure that all the related additional files for each language are available. LOCALE should correspond to the locale set in webui.supported.locales, e.g. for webui.supported.locales = en, de, fr, there should be:

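+ Because each LOCALE suffix must match an entry in webui.supported.locales, the per-locale file names can be derived mechanically from the property value. The Java sketch below is a hypothetical helper, not part of DSpace; the base names Messages.properties and input-forms.xml are only assumed examples of files that take a _LOCALE suffix, not a complete list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for naming per-locale files from webui.supported.locales. */
final class LocaleFiles {
    /** Splits a value such as "en, de, fr" into ["en", "de", "fr"]. */
    static List<String> locales(String propertyValue) {
        List<String> result = new ArrayList<>();
        for (String part : propertyValue.split(",")) {
            String locale = part.trim();
            if (!locale.isEmpty()) {
                result.add(locale);
            }
        }
        return result;
    }

    /** Inserts "_LOCALE" before the extension, e.g. Messages.properties -> Messages_de.properties. */
    static String localized(String baseName, String locale) {
        int dot = baseName.lastIndexOf('.');
        if (dot < 0) {
            return baseName + "_" + locale;
        }
        return baseName.substring(0, dot) + "_" + locale + baseName.substring(dot);
    }

    public static void main(String[] args) {
        // Base names below are assumed examples, not a complete list of DSpace files.
        for (String locale : locales("en, de, fr")) {
            System.out.println(localized("Messages.properties", locale)
                    + "  " + localized("input-forms.xml", locale));
        }
    }
}
```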
JSPUI Item Mapper+

Because the item mapper requires a primitive implementation of the browse system to be present, we simply need to tell that system which of our indexes defines the author browse (or equivalent) so that the mapper can list authors' items for mapping.

Define the index name (from webui.browse.index) to use for displaying items by author.
+
+
+
+
+
Display of Group Membership+ +
+
+
+
+
+
JSPUI / XMLUI SFX Server+ +SFX Server is an OpenURL Resolver. +
+
+
+
+
All the parameter mappings are defined in the [dspace]/config/sfx.xml file. The program will check the parameters in sfx.xml and retrieve the correct metadata of the item. It will then pass the string to your resolver.

In the following example, the program will search the first query-pair, which is the DOI of the item. If there is a DOI for that item, it is used to build the query passed to the resolver. Example for setting the DOI in sfx.xml:
+ <query-pairs>
  <field>
    <querystring>rft_id=info:doi/</querystring>
    <dc-schema>dc</dc-schema>
    <dc-element>identifier</dc-element>
    <dc-qualifier>doi</dc-qualifier>
  </field>
</query-pairs>+ If there is no DOI for that item, it will search the next query-pair defined in [dspace]/config/sfx.xml, and so on.

Example of using ISSN, volume and issue for an item without a DOI. For parameter passing to the <querystring>:
+ <querystring>rft_id=info:doi/</querystring>+ Please refer to these. The program assumes it will not receive an empty string for the item, as there will at least be an author and a title for the item to pass to the resolver.

For contributor author, the program keeps the original DSpace SFX behavior of extracting the author's first and last names.
+ <field> + <querystring>rft.aulast=</querystring> + <dc-schema>dc</dc-schema> + <dc-element>contributor</dc-element> + <dc-qualifier>author</dc-qualifier> + </field> + <field> + <querystring>rft.aufirst=</querystring> + <dc-schema>dc</dc-schema> + <dc-element>contributor</dc-element> + <dc-qualifier>author</dc-qualifier> + </field>+ JSPUI Item Recommendation Setting+ +
+
+
+
+
+
Controlled Vocabulary Settings

DSpace now supports controlled vocabularies to confine the set of keywords that users can use while describing items.
+
+
+
+
The need for a limited set of keywords is important since it eliminates the ambiguity of a free description system, consequently simplifying the task of finding specific items of information.

The controlled vocabulary add-on allows the user to choose from a defined set of keywords organized in a tree (taxonomy) and then use these keywords to describe items while they are being submitted.

We have also developed a small search engine that displays the classification tree (or taxonomy), allowing the user to select the branches that best describe the information that he/she seeks.

The taxonomies are described in XML following this (very simple) structure:

+ <node id="acmccs98" label="ACMCCS98">
  <isComposedBy>
    <node id="A." label="General Literature">
      <isComposedBy>
        <node id="A.0" label="GENERAL"/>
        <node id="A.1" label="INTRODUCTORY AND SURVEY"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>+ You are free to use any application you want to create your controlled vocabularies. A simple text editor should be enough for small projects. Bigger projects will require more complex tools. You may use Protégé to create your taxonomies, save them as OWL and then use an XML Stylesheet (XSLT) to transform your documents to the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas such as OWL or RDF.

In order to make DSpace compatible with WAI 2.0, the add-on is turned off by default (the add-on relies strongly on JavaScript to function). It can be activated by setting the following property in dspace.cfg:

webui.controlledvocabulary.enable = true

New vocabularies should be placed in [dspace]/config/controlled-vocabularies/ and must follow the structure described. A validation XML Schema can be downloaded here.

Vocabularies need to be associated with the corresponding DC metadata fields. Edit the file [dspace]/config/input-forms.xml and place a "vocabulary" tag under the "field" element that you want to control. Set the value of the "vocabulary" element to the name of the file that contains the vocabulary, leaving out the extension (the add-on will only load files with extension "*.xml"). For example:

+ <field>
  <dc-schema>dc</dc-schema>
  <dc-element>subject</dc-element>
  <dc-qualifier></dc-qualifier>
  <!-- An input-type of twobox MUST be marked as repeatable -->
  <repeatable>true</repeatable>
  <label>Subject Keywords</label>
  <input-type>twobox</input-type>
  <hint> Enter appropriate subject keywords or phrases below. </hint>
  <required></required>
  <!-- the closed attribute is optional -->
  <vocabulary closed="false">nsi</vocabulary>
</field>+ The vocabulary element has an optional boolean attribute closed that can be used to force input only through the JavaScript of the controlled-vocabulary add-on. The default behavior (i.e. without this attribute) is the same as closed="false", which also allows the user to enter values freely.

The following vocabularies are currently available by default:

JSPUI Session Invalidation
+
+
+
+
+
XMLUI Specific Configuration+

The DSpace digital repository supports two user interfaces: one based upon JSP technologies and the other based upon the Apache Cocoon framework. This section describes those configuration settings which are specific to the XMLUI interface based upon the Cocoon framework. (Prior to DSpace Release 1.5.1, XMLUI was referred to as Manakin. You may still see references to "Manakin".)
+
+
+
+
+
OAI-PMH Configuration and Activation+

In the following sections, you will learn how to configure OAI-PMH and activate additional OAI-PMH crosswalks. Refer also to section 9.2, OAI-PMH Data Provider, for more details of the program.

OAI-PMH Configuration+

+
+
+
+
+
Activating Additional OAI-PMH Crosswalks+ +DSpace comes with an unqualified DC Crosswalk used in the default OAI-PMH data provider. There are also other Crosswalks bundled with the DSpace distribution which can be activated by editing one or more configuration files. How to do this for each available Crosswalk is described below. The DSpace source includes the following crosswalk plugins available for use with OAI-PMH: + +
DIDL+

By activating the DIDL provider, DSpace items are represented as MPEG-21 DIDL objects. These DIDL objects are XML documents that wrap both the Dublin Core metadata that describes the DSpace item and its actual bitstreams. A bitstream is provided inline in the DIDL object in a base64 encoded manner, and/or by means of a pointer to the bitstream. The data provider exposes DIDL objects via the metadataPrefix didl.

The crosswalk does not deal with special characters and purposely skips dissemination of the license.txt file, awaiting a better understanding of how to map DSpace rights information to MPEG21-DIDL.

The DIDL Crosswalk can be activated as follows:

OAI-ORE Harvester Configuration+ +This section describes the parameters used in configuring the OAI-ORE harvester. + +OAI-ORE Configuration+ +There are many possible configuration options for the OAI harvester. Most of them are technical and therefore omitted from the dspace.cfg file itself, using hard-coded defaults instead. However, should you wish to modify those values, including them in dspace.cfg will override the system defaults. +
+
+
+
+
+
+
DSpace SOLR Statistics Configuration+ +
+
+
+
+
+
+
Optional or Advanced Configuration Settings+

The following section explains how to configure either optional features or advanced features that are not necessary to run DSpace "out-of-the-box".

The Metadata Format and Bitstream Format Registries+

The [dspace]/config/registries directory contains three XML files. These are used to load the initial contents of the Dublin Core Metadata registry, the Bitstream Format registry and the SWORD metadata registry. After the initial loading (performed by ant fresh_install above), the registries reside in the database; the XML files are not updated.

In order to change the registries, you may adjust the XML files before the first installation of DSpace. On an already running instance, it is recommended to change bitstream registries via the DSpace admin UI, but the metadata registries can be loaded again at any time from the XML files without difficulty. The changes made via the admin UI are not reflected in the XML files.

Metadata Format Registries+

The default metadata schema is Dublin Core, so DSpace is distributed with a default Dublin Core Metadata Registry. Currently, the system requires that every item have a Dublin Core record.

There is a set of Dublin Core Elements which is used by the system and should not be removed or moved to another schema; see Appendix: Default Dublin Core Metadata registry.

Note: altering a Metadata Registry has no effect on corresponding parts, e.g. item submission interface, item display, item import and vice versa. Every metadata element used in the submission interface or item import must be registered before it is used.

Note also that deleting a metadata element will delete all its corresponding values.

If you wish to add more metadata elements, you can do this in one of two ways. Via the DSpace admin UI you may define new metadata elements in the different available schemas. But you may also modify the XML file (or provide an additional one), and re-import the data as follows:
+ [dspace]/bin/dsrun org.dspace.administer.MetadataImporter -f [xml file]+ The XML file should be structured as follows: +
+ <dspace-dc-types>
+ <dc-type>
+ <schema>dc</schema>
+ <element>contributor</element>
+ <qualifier>advisor</qualifier>
+ <scope_note>Use primarily for thesis advisor.</scope_note>
+ </dc-type>
+</dspace-dc-types>
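+ Before re-importing a registry file with MetadataImporter, it can help to list the fields it defines. The following Java sketch uses only the JDK's DOM parser to print schema.element[.qualifier] for each <dc-type> in a file structured as above. The class name RegistryReader is hypothetical, and the sketch does no validation beyond that listing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: lists the fields defined in a dspace-dc-types registry file. */
final class RegistryReader {
    /** Returns "schema.element[.qualifier]" for every <dc-type> in the XML text. */
    static List<String> fieldNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList types = doc.getElementsByTagName("dc-type");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            String qualifier = text(type, "qualifier");
            names.add(text(type, "schema") + "." + text(type, "element")
                    + (qualifier.isEmpty() ? "" : "." + qualifier));
        }
        return names;
    }

    /** Trimmed text of the first element with the given tag, or "" if it is absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dspace-dc-types><dc-type><schema>dc</schema><element>contributor</element>"
                + "<qualifier>advisor</qualifier><scope_note>Use primarily for thesis advisor.</scope_note>"
                + "</dc-type></dspace-dc-types>";
        System.out.println(fieldNames(xml)); // [dc.contributor.advisor]
    }
}
```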
+Bitstream Format Registry+ +The bitstream formats recognized by the system and levels of support are similarly stored in the bitstream format registry. This can also be edited at install-time via [dspace]/config/registries/bitstream-formats.xml or by the administration Web UI. The contents of the bitstream format registry are entirely up to you, though the system requires that the following two formats are present: + +
XPDF Filter+

This is an alternative suite of MediaFilter plugins that offers faster and more reliable text extraction from PDF Bitstreams, as well as thumbnail image generation. It replaces the built-in default PDF MediaFilter.

If this filter is so much better, why isn't it the default? The answer is that it relies on external executable programs which must be obtained and installed for your server platform. This would add too much complexity to the installation process, so it is left out as an optional "extra" step.

Installation Overview+

Here are the steps required to install and configure the filters:

Install XPDF Tools+ +First, download the XPDF suite found at: http://www.foolabs.com/xpdf and install it on your server. The executables can be located anywhere, but make a note of the full path to each command. + +You may be able to download a binary distribution for your platform, which simplifies installation. Xpdf is readily available for Linux, Solaris, MacOSX, Windows, NetBSD, HP-UX, AIX, and OpenVMS, and is reported to work on AIX, OS/2, and many other systems. + +The only tools you really need are: + +
Fetch and install jai_imageio JAR+ +Fetch and install the Java Advanced Imaging Image I/O Tools. + +For AIX, Sun support has the following: "JAI has native acceleration for the above but it also works in pure Java mode. So as long as you have an appropriate JDK for AIX (1.3 or later, I believe), you should be able to use it. You can download any of them, extract just the jars, and put those in your $CLASSPATH." + +Download the jai_imageio library version 1.0_01 or 1.1 found at: https://jai-imageio.dev.java.net/binary-builds.html#Stable_builds . + +For these filters you do NOT have to worry about the native code, just the JAR, so choose a download for any platform. +
+ curl -O http://download.java.net/media/jai-imageio/builds/release/1.1/jai_imageio-1_1-lib-linux-i586.tar.gz
+tar xzf jai_imageio-1_1-lib-linux-i586.tar.gz
+
+The preceding example leaves the JAR in jai_imageio-1_1/lib/jai_imageio.jar. Now install it in your local Maven repository (changing the path after -Dfile= if necessary), e.g.:
+ mvn install:install-file \
+ -Dfile=jai_imageio-1_1/lib/jai_imageio.jar \
+ -DgroupId=com.sun.media \
+ -DartifactId=jai_imageio \
+ -Dversion=1.0_01 \
+ -Dpackaging=jar \
+ -DgeneratePom=true
+
+You may have to repeat this procedure for the jai_core.jar library, as well, if it is not available in any of the public Maven repositories. Once acquired, this command installs it locally: +
+ mvn install:install-file -Dfile=jai_core-1.1.2_01.jar \
+ -DgroupId=javax.media -DartifactId=jai_core -Dversion=1.1.2_01 -Dpackaging=jar -DgeneratePom=true
+Edit DSpace Configuration+ +First, be sure there is a value for thumbnail.maxwidth and that it corresponds to the size you want for preview images for the UI, e.g.: (NOTE: this code doesn't pay any attention to thumbnail.maxheight but it's best to set it too so the other thumbnail filters make square images.) +
+ # maximum width and height of generated thumbnails + thumbnail.maxwidth= 80 + thumbnail.maxheight = 80+ Now, add the absolute paths to the XPDF tools you installed. In this example they are installed under /usr/local/bin (a logical place on Linux and MacOSX), but they may be anywhere. +
+ xpdf.path.pdftotext = /usr/local/bin/pdftotext + xpdf.path.pdftoppm = /usr/local/bin/pdftoppm + xpdf.path.pdfinfo = /usr/local/bin/pdfinfo+ Change the MediaFilter plugin configuration to remove the old org.dspace.app.mediafilter.PDFFilter and add the new filters, e.g: (New sections are in bold) +
+ filter.plugins = \ + PDF Text Extractor, \ + PDF Thumbnail, \ + HTML Text Extractor, \ + Word Text Extractor, \ + JPEG Thumbnail + plugin.named.org.dspace.app.mediafilter.FormatFilter = \ + org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \ + org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \ + org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \ + org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \ + org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \ + org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG+ Then add the input format configuration properties for each of the new filters, e.g.: +
+ filter.org.dspace.app.mediafilter.XPDF2Thumbnail.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.XPDF2Text.inputFormats = Adobe PDF+ Finally, if you want PDF thumbnail images, don't forget to add that filter name to the filter.plugins property, e.g.:
+ filter.plugins = PDF Thumbnail, PDF Text Extractor, ...+ Build and Install+ +Follow your usual DSpace installation/update procedure, only add -Pxpdf-mediafilter-support to the Maven invocation: +
+ mvn -Pxpdf-mediafilter-support package
 ant -Dconfig=[dspace]/config/dspace.cfg update
+Creating a new Media/Format Filter+ +Creating a simple Media Filter+ +New Media Filters must implement the org.dspace.app.mediafilter.FormatFilter interface. More information on the methods you need to implement is provided in the FormatFilter.java source file. For example: + +public class MySimpleMediaFilter implements FormatFilter + +Alternatively, you could extend the org.dspace.app.mediafilter.MediaFilter class, which just defaults to performing no pre/post-processing of bitstreams before or after filtering. + +public class MySimpleMediaFilter extends MediaFilter + +You must give your new filter a "name", by adding it and its name to the plugin.named.org.dspace.app.mediafilter.FormatFilter field in dspace.cfg. In addition to naming your filter, make sure to specify its input formats in the filter.<class path>.inputFormats config item. Note the input formats must match the short description field in the Bitstream Format Registry (i.e. bitstreamformatregistry table). +
+ plugin.named.org.dspace.app.mediafilter.FormatFilter = \
  org.dspace.app.mediafilter.MySimpleMediaFilter = My Simple Text Filter, \ ...

filter.org.dspace.app.mediafilter.MySimpleMediaFilter.inputFormats = Text+ If you neglect to define the inputFormats for a particular filter, the MediaFilterManager will never call that filter, since it will never find a bitstream which has a format matching that filter's input format(s).

If you have a complex Media Filter class, which actually performs different filtering for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define this as described in Chapter 13.3.2.2.


Creating a Dynamic or "Self-Named" Format Filter+

If you have a more complex Media/Format Filter, which actually performs multiple filtering or conversions for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define a class which implements the FormatFilter interface while also extending the SelfNamedPlugin class (see Chapter 13.3.2.2). For example:

public class MyComplexMediaFilter extends SelfNamedPlugin implements FormatFilter

Since SelfNamedPlugins are self-named (as stated), they must provide the various names the plugin uses by defining a getPluginNames() method. Generally speaking, each "name" the plugin uses should correspond to a different type of filter it implements (e.g. "Word2PDF" and "Excel2CSV" are two good names for a complex media filter which performs both Word to PDF and Excel to CSV conversions).

Self-Named Media/Format Filters are also configured differently in dspace.cfg. Below is a general template for a Self-Named Filter (defined by an imaginary MyComplexMediaFilter class, which can perform both Word to PDF and Excel to CSV conversions):
+ #Add to a list of all Self Named filters +plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter = \ + org.dspace.app.mediafilter.MyComplexMediaFilter +#Define input formats for each "named" plugin this filter implements + filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.inputFormats = Microsoft Word + filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.inputFormats = Microsoft Excel+ As shown above, each Self-Named Filter class must be listed in the plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter item in dspace.cfg. In addition, each Self-Named Filter must define the input formats for each named plugin defined by that filter. In the above example the MyComplexMediaFilter class is assumed to have defined two named plugins, Word2PDF and Excel2CSV. So, these two valid plugin names ("Word2PDF" and "Excel2CSV") must be returned by the getPluginNames() method of the MyComplexMediaFilter class. + +These named plugins take different input formats as defined above (see the corresponding inputFormats setting). +
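+ To make the naming rules concrete, the sketch below builds the inputFormats keys used above (filter.<class>.inputFormats for a simple filter, filter.<class>.<plugin name>.inputFormats for a Self-Named one) and checks a bitstream format's short description against them. It is a hypothetical illustration, not the MediaFilterManager implementation; it assumes the value is a comma-separated list, and, as described above, a filter with no inputFormats entry never matches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of how inputFormats settings select bitstreams for a filter. */
final class FilterConfig {
    static final String FILTER_PREFIX = "filter"; // prefix of the dspace.cfg keys shown above

    /** filter.<class>.inputFormats, or filter.<class>.<plugin name>.inputFormats for a Self-Named filter. */
    static String inputFormatsKey(String className, String pluginName) {
        StringBuilder key = new StringBuilder(FILTER_PREFIX).append('.').append(className);
        if (pluginName != null) {
            key.append('.').append(pluginName);
        }
        return key.append(".inputFormats").toString();
    }

    /** True if the format's short description appears in the comma-separated inputFormats value. */
    static boolean accepts(Map<String, String> config, String key, String shortDescription) {
        String formats = config.get(key);
        if (formats == null) {
            return false; // no inputFormats defined: this filter is never called
        }
        for (String format : formats.split(",")) {
            if (format.trim().equals(shortDescription)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cls = "org.dspace.app.mediafilter.MyComplexMediaFilter";
        Map<String, String> config = new HashMap<>();
        config.put(inputFormatsKey(cls, "Word2PDF"), "Microsoft Word");
        config.put(inputFormatsKey(cls, "Excel2CSV"), "Microsoft Excel");
        System.out.println(accepts(config, inputFormatsKey(cls, "Word2PDF"), "Microsoft Word"));  // true
        System.out.println(accepts(config, inputFormatsKey(cls, "Word2PDF"), "Microsoft Excel")); // false
    }
}
```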
For a particular Self-Named Filter, you are also welcome to define additional configuration settings in dspace.cfg. To continue with our current example, each of our imaginary plugins actually results in a different output format (Word2PDF creates "Adobe PDF", while Excel2CSV creates "Comma Separated Values"). To allow this complex Media Filter to be even more configurable (especially across institutions, with potential different "Bitstream Format Registries"), you may wish to allow for the output format to be customizable for each named plugin. For example: +
+ #Define output formats for each named plugin
+filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.outputFormat = Adobe PDF
+filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.outputFormat = Comma Separated Values
+Any custom configuration fields in dspace.cfg defined by your filter are ignored by the MediaFilterManager, so it is up to your custom media filter class to read those configurations and apply them as necessary. For example, you could use the following sample Java code in your MyComplexMediaFilter class to read these custom outputFormat configurations from dspace.cfg: +
+ // Get "outputFormat" configuration from dspace.cfg
String outputFormat = ConfigurationManager.getProperty(MediaFilterManager.FILTER_PREFIX + "." + MyComplexMediaFilter.class.getName() + "." + this.getPluginInstanceName() + ".outputFormat");+ Configuring Usage Instrumentation Plugins+

A usage instrumentation plugin is configured as a singleton plugin for the abstract class org.dspace.app.statistics.AbstractUsageEvent.

The Passive Plugin+

The Passive plugin is provided as the class org.dspace.app.statistics.PassiveUsageEvent. It absorbs events without effect. Use the Passive plugin when you have no use for usage event postings. This is the default if no plugin is configured.


The Tab File Logger Plugin+

The Tab File Logger plugin is provided as the class org.dspace.app.statistics.UsageEventTabFileLogger. It writes event records to a file in tab-separated column format. If left unconfigured, an error will be noted in the DSpace log and no file will be produced. To specify the file path, provide an absolute path as the value for usageEvent.tabFileLogger.file in dspace.cfg.


The XML Logger Plugin+

The XML Logger plugin is provided as the class org.dspace.app.statistics.UsageEventXMLLogger. It writes event records to a file in a simple XML-like format. If left unconfigured, an error will be noted in the DSpace log and no file will be produced. To specify the file path, provide an absolute path as the value for usageEvent.xmlLogger.file in dspace.cfg.



SWORD Configuration+

SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. DSpace implements the SWORD protocol via the 'sword' web application. The version of SWORD currently supported by DSpace is 1.3. The specification and further information can be downloaded from http://swordapp.org.
+

SWORD is based on the Atom Publishing Protocol (AtomPub) and allows service documents describing the structure of the repository to be requested, and packages to be deposited.
+
+
+
+
+
+
+
Document generated by Confluence on Mar 25, 2011 19:21
+
+
+ DSpace Documentation : Curation System
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
Curation System+

As of release 1.7, DSpace supports running curation tasks, which are described in this section. DSpace 1.7 and subsequent distributions will bundle (include) several useful tasks, but the system is also designed to allow new tasks to be added between releases: both general purpose tasks that come from the community, and locally written and deployed tasks.

+
+
+
Tasks+ +The goal of the curation system ('CS') is to provide a simple, extensible way to manage routine content operations on a repository. These operations are known to CS as 'tasks', and they can operate on any DSpaceObject (i.e. subclasses of DSpaceObject) - which means Communities, Collections, and Items - viz. core data model objects. Tasks may elect to work on only one type of DSpace object - typically an Item - and in this case they may simply ignore other data types (tasks have the ability to 'skip' objects for any reason). The DSpace core distribution will provide a number of useful tasks, but the system is designed to encourage local extension - tasks can be written for any purpose, and placed in any java package. This gives DSpace sites the ability to customize the behavior of their repository without having to alter - and therefore manage synchronization with - the DSpace source code. What sorts of activities are appropriate for tasks? + +Some examples: + +
Since tasks have access to, and can modify, DSpace content, performing tasks is considered an administrative function to be available only to knowledgeable collection editors, repository administrators, sysadmins, etc. No tasks are exposed in the public interfaces. + +Activation+ +For CS to run a task, the code for the task must of course be included with other deployed code (to [dspace]/lib, WAR, etc) but it must also be declared and given a name. This is done via a configuration property in [dspace]/config/modules/curate.cfg as follows: + +
+ 
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.curate.ProfileFormats = profileformats, \
org.dspace.curate.RequiredMetadata = requiredmetadata, \
org.dspace.curate.ClamScan = vscan
+ For each activated task, a key-value pair is added. The key is the fully qualified class name and the value is the taskname used elsewhere to configure the use of the task, as will be seen below. Note that the curate.cfg configuration file, while in the config directory, is located under 'modules'. The intent is that tasks, as well as any configuration they require, will be optional 'add-ons' to the basic system configuration. Adding or removing tasks has no impact on dspace.cfg.

For many tasks, this activation configuration is all that will be required to use it. But for others, the task needs specific configuration itself. A concrete example is described below, but note that these task-specific configuration property files also reside in [dspace]/config/modules.

Writing your own tasks+

A task is just a Java class that can contain arbitrary code, but it must have 2 properties:

First, it must provide a no-argument constructor, so it can be loaded by the PluginManager. Thus, all tasks are 'named' plugins, with the taskname being the plugin name.

Second, it must implement the interface 'org.dspace.curate.CurationTask'.

The CurationTask interface is almost a 'tagging' interface, and only requires a few very high-level methods to be implemented. The most significant is:

+ int perform(DSpaceObject dso);
+The return value should be a code describing one of 4 conditions: + +
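+ The four conditions correspond to the numeric codes listed under ui.statusmessages in curate.cfg: 0 (success), 1 (fail), 2 (skip) and -1 (error). The following DSpace-independent Java sketch shows a task-like method returning those codes. The class RequiredTitleTask and its map-based stand-in for a DSpaceObject are hypothetical; a real task would implement org.dspace.curate.CurationTask (or extend AbstractCurationTask) instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, DSpace-independent sketch of a task returning curation status codes. */
final class RequiredTitleTask {
    // Codes as listed under ui.statusmessages in curate.cfg
    static final int ERROR = -1;
    static final int SUCCESS = 0;
    static final int FAIL = 1;
    static final int SKIP = 2;

    /**
     * Stand-in for perform(DSpaceObject): the object is modeled as a map of
     * metadata field to values, or null for a non-Item object the task ignores.
     */
    static int perform(Map<String, List<String>> itemMetadata) {
        if (itemMetadata == null) {
            return SKIP; // not an Item: skip it rather than fail it
        }
        try {
            List<String> titles = itemMetadata.getOrDefault("dc.title", Collections.emptyList());
            return titles.isEmpty() ? FAIL : SUCCESS;
        } catch (RuntimeException e) {
            return ERROR; // e.g. a malformed record with a null value list
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> item = new HashMap<>();
        item.put("dc.title", Collections.singletonList("A Title"));
        System.out.println(perform(item));            // 0 (success)
        System.out.println(perform(new HashMap<>())); // 1 (fail: no title)
        System.out.println(perform(null));            // 2 (skip)
    }
}
```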
If a task extends the AbstractCurationTask class, that is the only method it needs to define. + +Task Invocation+ +Tasks are invoked using CS framework classes that manage a few details (to be described below), and this invocation can occur wherever needed, but CS offers great versatility 'out of the box': + +On the command line+ +A simple tool 'CurationCli' provides access to CS via the command line. This tool bears the name 'curate' in the DSpace launcher. For example, to perform a virus check on collection '4': + +
+ [dspace]/bin/dspace curate -t vscan -i 123456789/4+ The complete list of arguments: + +
+
-t taskname: name of task to perform
-T filename: name of file containing list of tasknames
-e epersonID: EPerson (email address) to act as; the superuser is used if unspecified
-i identifier: Id of object to curate. May be (1) a handle, (2) a workflow Id, or (3) 'all' to operate on the whole repository
-q queue: name of queue to process; -i and -q are mutually exclusive
-v: emit verbose output
-r: emit reporting to standard out
+
+As with other command-line tools, these invocations could be placed in a cron table and run on a fixed schedule, or run on demand by an administrator. + +In the admin UI+ +In the XMLUI, there is a 'Curate' tab (appearing within the 'Edit Community/Collection/Item') that exposes a drop-down list of configured tasks, with a button to 'perform' the task, or queue it for later operation (see section below). Not all activated tasks need appear in the Curate tab - you filter them by means of a configuration property. This property also permits you to assign to the task a more user-friendly name than the PluginManager taskname. The property resides in [dspace]/config/modules/curate.cfg: + +
+
+ui.tasknames = \
+ profileformats = Profile Bitstream Formats, \
+ requiredmetadata = Check for Required Metadata
+
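Because the property uses backslash line continuations, java.util.Properties reads it as a single comma-separated value. The hypothetical helper below (the class name UiTaskNames is illustrative, and this is not necessarily how DSpace itself parses the property) shows how such a value maps tasknames to Curate-tab labels; an activated task that is not listed simply gets no entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: turns a ui.tasknames value into taskname -> label pairs.
class UiTaskNames {
    static Map<String, String> parse(String value) {
        Map<String, String> labels = new LinkedHashMap<>(); // keeps configured order
        for (String pair : value.split(",")) {
            String entry = pair.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected 'taskname = label': " + entry);
            }
            labels.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return labels;
    }

    // Loads curate.cfg-style text and parses its ui.tasknames property.
    static Map<String, String> fromConfig(String cfgText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(cfgText)); // joins the backslash-continued lines
        return parse(props.getProperty("ui.tasknames", ""));
    }
}
```

This naive split on commas assumes that labels never contain a comma.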
+When a task is selected from the drop-down list and performed, the tab displays both a phrase interpreting the 'status code' of the task execution, and the 'result' message if any has been defined. When the task has been queued, an acknowledgement appears instead. You may configure the words used for status codes in curate.cfg (for clarity, language localization, etc): + +
+ +ui.statusmessages = \ + -3 = Unknown Task, \ + -2 = No Status Set, \ + -1 = Error, \ + 0 = Success, \ + 1 = Fail, \ + 2 = Skip, \ + other = Invalid Status ++ In workflow+ +CS provides the ability to attach any number of tasks to standard DSpace workflows. Using a configuration file [dspace]/config/workflow-curation.xml, you can declaratively (without coding) wire tasks to any step in a workflow. An example: + +
+ +<taskset-map> + <mapping collection-handle="default" taskset="cautious" /> +</taskset-map> +<tasksets> + <taskset name="cautious"> + <flowstep name="step1"> + <task name="vscan"> + <workflow>reject</workflow> + <notify on="fail">$flowgroup</notify> + <notify on="fail">$colladmin</notify> + <notify on="error">$siteadmin</notify> + </task> + </flowstep> + </taskset> +</tasksets> ++ This markup would cause a virus scan to occur during step one of workflow for any collection, and automatically reject any submissions with infected files. It would further notify (via email) both the reviewers (step 1 group), and the collection administrators, if either of these are defined. If it could not perform the scan, the site administrator would be notified. + +The notifications use the same procedures that other workflow notifications do - namely email. There is a new email template defined for curation task use: [dspace]/config/emails/flowtask_notify. This may be language-localized or otherwise modified like any other email template. + +Like configurable submission, you can assign these task rules per collection, as well as having a default for any collection. + +In arbitrary user code+ +If these pre-defined ways are not sufficient, you can of course manage curation directly in your code. You would use the CS helper classes. For example: + +
+ 

// 'context' is assumed to be an open org.dspace.core.Context
Collection coll = (Collection)HandleManager.resolveToObject(context, "123456789/4");
Curator curator = new Curator();
curator.addTask("vscan").curate(coll);
System.out.println("Result: " + curator.getResult("vscan"));
+ would do approximately what the command line invocation did. The method 'curate' performs all the tasks configured on the curator. 

Asynchronous (Deferred) Operation+ 

Because some tasks may consume a fair amount of time, it may not be desirable to run them in an interactive context. CS provides a simple API for deferring task execution by means of a queuing system. Thus, using the previous example: 

+ + Curator curator = new Curator(); + curator.addTask("vscan").queue(context, "monthly", "123456789/4"); ++ would place a request on a named queue "monthly" to virus scan the collection. To read (and process) the queue, we could for example: + +
+ [dspace]/bin/dspace curate -q monthly+ use the command-line tool, but we could also read the queue programmatically. Any number of queues can be defined and used as needed. 

Task Output and Reporting+ 

CS makes few assumptions about what the 'outcome' of a task may be (if any): it could, for example, produce a report to a temporary file, or modify DSpace content silently. But the CS runtime does provide a few pieces of information whenever a task is performed: 

Status Code+ 

This was mentioned above. A status code is returned to CS whenever a task is called. The complete list of values: 

+
+ -3 NOTASK - CS could not find the requested task
+ -2 UNSET - task did not return a status code because it has not yet run
+ -1 ERROR - task could not be performed
+ 0 SUCCESS - task performed successfully
+ 1 FAIL - task performed, but failed
+ 2 SKIP - task not performed due to object not being eligible
+
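The codes above, together with their translation into the ui.statusmessages phrases configured earlier, could be modeled as follows. CurationStatus and its helpers are hypothetical; the fallback to the 'other' entry for an unrecognized code is inferred from the sample configuration, not taken from the DSpace source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the status codes listed above.
enum CurationStatus {
    NOTASK(-3), UNSET(-2), ERROR(-1), SUCCESS(0), FAIL(1), SKIP(2);

    final int code;

    CurationStatus(int code) {
        this.code = code;
    }

    // Returns the matching constant, or null for a code outside the list.
    static CurationStatus fromCode(int code) {
        for (CurationStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        return null;
    }

    // Picks the UI phrase for a code, falling back to the "other" entry.
    static String label(int code, Map<String, String> statusMessages) {
        String phrase = statusMessages.get(Integer.toString(code));
        return phrase != null ? phrase : statusMessages.get("other");
    }

    // The sample ui.statusmessages configuration from this page, as a map.
    static Map<String, String> sampleMessages() {
        Map<String, String> m = new HashMap<>();
        m.put("-3", "Unknown Task");
        m.put("-2", "No Status Set");
        m.put("-1", "Error");
        m.put("0", "Success");
        m.put("1", "Fail");
        m.put("2", "Skip");
        m.put("other", "Invalid Status");
        return m;
    }
}
```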
+In the administrative UI, the status code is translated into the word or phrase configured by the ui.statusmessages property (discussed above) for display. 

Result String+ 

The task may define a string indicating details of the outcome. This result is displayed in the 'curation widget' described above: 

+
+ "Virus 12312 detected on Bitstream 4 of 1234567789/3"
+
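To make the division of labor concrete (the task decides the status and result string, the runtime only records them), here is a toy runner loosely modeled on the Curator calls used on this page: addTask, curate, getStatus and getResult. It is a hypothetical, self-contained sketch rather than org.dspace.curate.Curator: objects are plain handle strings, tasks are functions returning an Outcome, an unknown task reports NOTASK (-3), and a task that has been added but not yet run reports UNSET (-2).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical outcome of one task run: a status code plus an optional result string.
final class Outcome {
    final int status;
    final String result; // may be null: tasks are not obliged to set a result

    Outcome(int status, String result) {
        this.status = status;
        this.result = result;
    }
}

// Hypothetical task shape for this sketch; objects are identified by handle strings.
interface ToyTask {
    Outcome perform(String handle);
}

// Toy runner loosely mirroring addTask(...).curate(...), getStatus and getResult.
class ToyCurator {
    static final int NOTASK = -3; // requested task is not activated
    static final int UNSET = -2;  // task added but not yet run

    private final Map<String, ToyTask> activated;
    private final Map<String, Integer> statuses = new LinkedHashMap<>();
    private final Map<String, String> results = new LinkedHashMap<>();

    ToyCurator(Map<String, ToyTask> activated) {
        this.activated = activated;
    }

    ToyCurator addTask(String name) {
        statuses.put(name, activated.containsKey(name) ? UNSET : NOTASK);
        return this; // allows chaining, as in the examples above
    }

    // Runs every added task on the object; the runtime records, but never writes, results.
    void curate(String handle) {
        for (Map.Entry<String, Integer> entry : statuses.entrySet()) {
            ToyTask task = activated.get(entry.getKey());
            if (task == null) {
                continue; // stays NOTASK
            }
            Outcome outcome = task.perform(handle);
            entry.setValue(outcome.status);
            if (outcome.result != null) {
                results.put(entry.getKey(), outcome.result);
            }
        }
    }

    int getStatus(String name) {
        return statuses.getOrDefault(name, NOTASK);
    }

    String getResult(String name) {
        return results.get(name); // null when the task set no result
    }
}
```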
+CS does not interpret or assign result strings; the task does. A task is not required to assign a result, but the 'best practice' for tasks is to assign one whenever possible. 

Reporting Stream+ 

For very fine-grained information, a task may write to a reporting stream. This stream is sent to standard out, so it is only available when running a task from the command line. Unlike the result string, there is no limit to the amount of data that may be pushed to this stream. 

The status code and the result string are accessed (or set) by methods on the Curator object: 

+

 Curator curator = new Curator();
 curator.addTask("vscan").curate(coll);   // coll resolved as in the earlier example
 int status = curator.getStatus("vscan");
 String result = curator.getResult("vscan");
+ Task Annotations+ 

CS looks for, and will use, certain Java annotations in the task class definition that can help it invoke tasks more intelligently. An example may explain best. Since tasks operate on DSOs that can either be simple (Items) or containers (Collections and Communities), there is a fundamental problem or ambiguity in how a task is invoked: if the DSO is a collection, should CS invoke the task on each member of the collection, or does the task 'know' how to do that itself? The decision is made by looking for the @Distributive annotation: if present, CS assumes that the task will manage the details; otherwise CS will walk the collection and invoke the task on each member. The Java class would be defined: 

+

@Distributive
public class MyTask implements CurationTask
+ A related issue concerns how non-distributive tasks report their status and results: the status will normally reflect only the last invocation of the task in the container, so important outcomes could be lost. If a task declares itself @Suspendable, however, CS will cease processing when it encounters a FAIL status. When used in the UI, for example, this would mean that if our virus scan is running over a collection, it would stop at the first infected item it encounters and return that status (and result) to the UI. You can even tune @Suspendable tasks more precisely by annotating which invocations you want to suspend on. For example: 

+

@Suspendable(invoked=Curator.Invoked.INTERACTIVE)
public class MyTask implements CurationTask
+ would mean that the task would suspend if invoked in the UI, but would run to completion if run on the command line. 

Only a few annotation types have been defined so far, but as the number of tasks grows, we can look for common behavior that can be signaled by annotation. For example, there is a @Mutative type that tells CS that the task may alter (mutate) the object it is working on. 

Starter Tasks+ 

DSpace 1.7 bundles a few tasks and activates two (2) by default to demonstrate the use of the curation system. These may be removed (deactivated by means of configuration) if desired without affecting system integrity. Each task is briefly described here. 

Bitstream Format Profiler+ 

The task with the taskname 'profileformats' (in the admin UI it is labeled "Profile Bitstream Formats") examines all the bitstreams in an item and produces a table ("profile") which is assigned to the result string. It is activated by default, and is configured to display in the administrative UI. The result string has the layout: 

+ + 10 (K) Portable Network Graphics + 5 (S) Plain Text ++ where the left column is the count of bitstreams of the named format and the letter in parentheses is an abbreviation of the repository-assigned support level for that format: + +
+

 U Unsupported
 K Known
 S Supported
+ The profiler will operate on any DSpace object. If the object is an item, then only that item's bitstreams are profiled; if a collection, all the bitstreams of all the items; if a community, all the bitstreams of all the items in all the collections of the community. 

Required Metadata+ 

The 'requiredmetadata' task examines item metadata and determines whether fields that the web submission (input-forms.xml) marks as required are present. It sets the result string either to indicate that all required fields are present, or to list the metadata elements that are required but missing. When the task is performed on an item, it will display the result for that item. When performed on a collection or community, the task will be performed on each item, and will display the last item result. If every item in the community or collection has all required fields, the result displayed is therefore that of the last item. If the task fails for any item (i.e. the item lacks one or more required fields), the process is halted, so the result for the failing item is not lost. 

Virus Scan+ 

The 'vscan' task performs a virus scan on the bitstreams of items using the ClamAV software product. Set up the service using the ClamAV documentation.+ 

This plugin requires a ClamAV daemon installed and configured for TCP sockets. Instructions for installing ClamAV are available at http://www.clamav.net/doc/latest/clamdoc.pdf. 

NOTICE: The following directions assume there is a properly installed and configured clamav daemon. Refer to the link above for more information about ClamAV. 

DSpace Configuration+ 

In [dspace]/config/modules/curate.cfg, activate the task: 
+
+### Task Class implementations
+plugin.named.org.dspace.curate.CurationTask = \
+org.dspace.curate.ProfileFormats = profileformats, \
+org.dspace.curate.RequiredMetadata = requiredmetadata, \
+org.dspace.curate.ClamScan = vscan
+
+
+ 

ui.tasknames = \
profileformats = Profile Bitstream Formats, \
requiredmetadata = Check for Required Metadata, \
vscan = Scan for Viruses
+ In [dspace]/config/modules, edit the configuration file clamav.cfg: 

+

service.host = 127.0.0.1
# Change if not running on the same host as your DSpace installation.
service.port = 3310
# Change if not using the standard ClamAV port.
socket.timeout = 120
# Change if a longer timeout is needed.
scan.failfast = false
# Change only if items have large numbers of bitstreams.
+ Task Operation from the GUI+ 

Curation tasks can be run against container and item DSpace objects by e-persons with administrative privileges. A curation tab will appear in the administrative UI after logging into DSpace: 
Task Operation from the curation command line client+ +To output the results to the console: +
+ +[dspace]/bin/dspace curate -t vscan -i <handle of container or item dso> -r - ++ Or capture the results in a file: +
+ +[dspace]/bin/dspace curate -t vscan -i <handle of container or item dso> -r - > /<path...>/<name> ++ Table 1 – Virus Scan Results Table+ +
+
+
+
+
+
+
+
Document generated by Confluence on Mar 25, 2011 19:21
+
+
+ DSpace Documentation : DRI Schema Reference
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ DSpace System Documentation: DRI Schema Reference+ +Digital Repository Interface (DRI) is a schema that governs the structure of a Manakin DSpace page when encoded as an XML Document. It determines what elements can be present in the Document and the relationship of those elements to each other. This reference document explains the purpose of DRI, provides a broad architectural overview, and explains common design patterns. The appendix includes a complete reference for elements used in the DRI Schema, a graphical representation of the element hierarchy, and a quick reference table of elements and attributes. + +Table of Contents: +
+
+
+
Introduction+ 

This manual describes the Digital Repository Interface (DRI) as it applies to the DSpace digital repository and the Manakin-based XMLUI interface. DSpace XML UI is a comprehensive user interface system. It is centralized and generic, allowing it to be applied to all DSpace pages, effectively replacing the JSP-based interface system. Its ability to apply specific styles to arbitrarily large sets of DSpace pages significantly eases the task of adapting the DSpace look and feel to that of the adopting institution. This also allows for several levels of branding, lending institutional credibility to the repository and collections. 

Manakin, the second version of DSpace XML UI, consists of several components written in Java, XML, and XSL, and is implemented in Cocoon. Central to the interface is the XML Document, which is a semantic representation of a DSpace page. In Manakin, the XML Document adheres to a schema called the Digital Repository Interface (DRI) Schema, which was developed in conjunction with Manakin and is the subject of this guide. For the remainder of this guide, the terms XML Document, DRI Document, and Document will be used interchangeably. 

This reference document explains the purpose of DRI, provides a broad architectural overview, and explains common design patterns. The appendix includes a complete reference for elements used in the DRI Schema, a graphical representation of the element hierarchy, and a quick reference table of elements and attributes. 

The Purpose of DRI+ 

DRI is a schema that governs the structure of the XML Document. It determines the elements that can be present in the Document and the relationship of those elements to each other. Since all Manakin components produce XML Documents that adhere to the DRI schema, the XML Document serves as the abstraction layer. Two such components, Themes and Aspects, are essential to the workings of Manakin and are described briefly in this manual.
+ 


The Development of DRI+ 

The DRI schema was developed for use in Manakin. The choice to develop our own schema rather than adapt an existing one came after a careful analysis of the schema's purpose as well as the lessons learned from earlier attempts at customizing the DSpace interface. Since every DSpace page in Manakin exists as an XML Document at some point in the process, the schema describing that Document had to be able to structurally represent all content, metadata and relationships between different parts of a DSpace page. It had to be precise enough to avoid losing any structural information, and yet generic enough to allow Themes a certain degree of freedom in expressing that information in a readable format. 

Popular schemas such as XHTML suffer from the problem of not relating elements together explicitly. For example, if a heading precedes a paragraph, the heading is related to the paragraph not because it is encoded as such but because it happens to precede it. When one attempts to translate these structures into formats where such relationships are explicit, the translation becomes tedious and potentially problematic. More structured schemas, like TEI or DocBook, are domain-specific (much like DRI itself) and therefore not suitable for our purposes. 

We also decided that the schema should natively support a metadata standard for encoding artifacts. Rather than encoding artifact metadata in structural elements, like tables or lists, the schema would include artifacts as objects encoded in a particular standard. The inclusion of metadata in native format would enable the Theme to choose the best method to render the artifact for display without being tied to a particular structure. 

Ultimately, we chose to develop our own schema.
We have constructed the DRI schema by incorporating other standards when appropriate, such as Cocoon's i18n schema for internationalization, DCMI's Dublin Core, and the Library of Congress's METS schema. The design of structural elements was derived primarily from TEI, with some of the design patterns borrowed from other existing standards such as DocBook and XHTML. While the structural elements were designed to be easily translated into XHTML, they preserve the semantic relationships for use in more expressive languages. 



DRI in Manakin+ 

The general process for handling a request in DSpace XML UI consists of two parts. The first part builds the XML Document, and the second part stylizes that Document for output. In Manakin, the two parts are not discrete; instead they are wrapped within two processes: Content Generation, which builds an XML representation of the page, and Style Application, which stylizes the resulting Document. Content Generation is performed by Aspect chaining, while Style Application is performed by a Theme. 

Themes+ 

A Theme is a collection of XSL stylesheets and supporting files like images, CSS styles, translations, and help documents. The XSL stylesheets are applied to the DRI Document to convert it into a readable format and give it structure and basic visual formatting in that format. The supporting files are used to provide the page with a specific look and feel, insert images and other media, translate the content, and perform other tasks. The currently used output format is XHTML and the supporting files are generally limited to CSS, images, and JavaScript. More output formats, like PDF or SVG, may be added in the future. 

A DSpace installation running Manakin may have several Themes associated with it. When applied to a page, a Theme determines most of the page's look and feel.
Different themes can be applied to different sets of DSpace pages, allowing for both variety of styles between sets of pages and consistency within those sets. The xmlui.xconf configuration file determines which Themes are applied to which DSpace pages (see Chapter 7, Manakin [XMLUI] Configuration and Customization, for more information on installing and configuring themes). Themes may be configured to apply to all pages of a specific type, like browse-by-title, to all pages of one particular community or collection or sets of communities and collections, and to any mix of the two. They can also be configured to apply to a single arbitrary page or handle. 


Aspect Chains+ 

Manakin Aspects are arrangements of Cocoon components (transformers, actions, matchers, etc.) that implement a new set of coupled features for the system. These Aspects are chained together to form all the features of Manakin. Five Aspects exist in the default installation of Manakin, each handling a particular set of features of DSpace, and more can be added to implement extra features. All Aspects take a DRI Document as input and generate one as output. This allows Aspects to be linked together to form an Aspect chain. Each Aspect in the chain takes a DRI Document as input, adds its own functionality, and passes the modified Document to the next Aspect in the chain. 



Common Design Patterns+ 

There are several design patterns used consistently within the DRI schema. This section identifies the need for and describes the implementation of these patterns. Three patterns are discussed: language and internationalization issues, the standard attribute triplet (id, n, and rend), and the use of structure-oriented markup. 

Localization and Internationalization+ 

Internationalization is a very important component of the DRI system.
It allows content to be offered in other languages based on the user's locale and conditioned upon the availability of translations, as well as to present dates and currency in a localized manner. There are two types of translated content: content stored and displayed by DSpace itself, and content introduced by the DRI styling process in the XSL transformations. Both types are handled by Cocoon's i18n transformer without regard to their origin. 

When the Content Generation process produces a DRI Document, some of the textual content may be marked up with i18n elements to signify that translations are available for that content. During the Style Application process, the Theme can also introduce new textual content, marking it up with i18n tags. As a result, after the Theme's XSL templates are applied to the DRI Document, the final output consists of a DSpace page marked up in the chosen display format (like XHTML) with i18n elements from both DSpace and XSL content. This final document is sent through Cocoon's i18n transformer, which translates the marked-up text. 


Standard attribute triplet+ 

Many elements in the DRI system (all top-level containers, character classes, and many others) contain one or several of the three standard attributes: id, n, and rend. The id and n attributes can be required or optional based on the element's purpose, while the rend attribute is always optional. The first two are used for identification purposes, while the third is used as a display hint issued to the styling step. 

Identification is important because it allows elements to be separated from their peers for sorting, special-case rendering, and other tasks. The first attribute, id, is the global identifier and it is unique to the entire document. Any element that contains an id attribute can thus be uniquely referenced by it. The id attribute of an element can either be assigned explicitly or be generated from the Java Class Path of the originating object if no name is given.
While all elements that can be uniquely identified can carry the id attribute, only those that are independent of their context are required to do so. For example, tables are required to have an id since they retain meaning regardless of their location in the document, while table rows and cells can omit the attribute since their meaning depends on the parent element. 

The name attribute n is simply the name assigned to the element, and it is used to distinguish an element from its immediate peers. Within a particular list, for example, all items will have different names to distinguish them from each other. Other lists in the document, however, can also contain items whose names will be different from each other, but identical to those in the first list. The n attribute of an element is therefore unique only in the scope of that element's parent and is used mostly for sorting purposes and special rendering of a certain class of elements, for example all first items in lists, or all items named "browse". The n attribute follows the same rules as id when determining whether or not it is required for a given element. 

The last attribute in the standard triplet is rend. Unlike id and n, the rend attribute can consist of several space-delimited values and is optional for all elements that can contain it. Its purpose is to provide a rendering hint from the middle layer component to the styling theme. How that hint is interpreted, and whether it is used at all, is completely up to the theme. There are several cases, however, where the content of the rend attribute is outlined in detail and its use is encouraged. Those cases are the emphasis element hi, the division element div, and the list element. Please refer to the Element Reference for more detail on these elements. 


Structure-oriented markup+ 

The final design pattern is the use of structure-oriented markup for content carried by the XML Document.
Once generated by Cocoon, the Document contains two major types of information: metadata about the repository and its contents, and the actual content of the page to be displayed. A complete overview of metadata and content markup and their relationship to each other is given in the next section. An important thing to note here, however, is that the markup of the content is oriented towards explicitly stating structural relationships between the elements rather than focusing on the presentational aspects. This makes the markup used by the Document closer to TEI or DocBook than to HTML. For this reason, XSL templates are used by the themes to convert structural DRI markup to XHTML. Even then, an attempt is made to keep the generated XHTML as structural as possible, leaving presentation entirely to CSS. This allows the XML Document to be generic enough to represent any DSpace page without dictating how it should be rendered. 



Schema Overview+ 

The DRI XML Document consists of the root element document and three top-level elements that contain two major types of elements. The three top-level containers are meta, body, and options. The two types of elements they contain are metadata and content, carrying metadata about the page and the contents of the page, respectively. Figure 1 depicts the relationship between these six components. 

Figure 1: The two content types across three major divisions of a DRI page. 

The document element is the root for all DRI pages and contains all other elements. It bears only one attribute, version, that contains the version number of the DRI system and the schema used to validate the produced document. At the time of writing the working version number is "1.1". 

The meta element is a top-level element under document and contains all metadata information about the page, the user that requested it, and the repository it is used with.
It contains no structural elements, instead being the only container of metadata elements in a DRI Document. The metadata stored by the meta element is broken up into three major groups: userMeta, pageMeta, and objectMeta, each storing metadata about its respective component. Please refer to the reference entries for more information about these elements. 

The options element is another top-level element that contains all navigation and action options available to the user. The options are stored as items in list elements, broken up by the type of action they perform. The five types of actions are: browsing, search, language selection, actions that are always available, and actions that are context-dependent. The last two action types also contain sub-lists that contain actions available to users of varying degrees of access to the system. The options element contains no metadata elements and can only make use of a small set of structural elements, namely the list element and its children. 

The last major top-level element is the body element. It contains all structural elements in a DRI Document, including the lists used by the options element. Structural elements are used to build a generic representation of a DSpace page. Any DSpace page can be represented with a combination of the structural elements, which will in turn be transformed by the XSL templates into another format. This is the core mechanism that allows DSpace XML UI to apply uniform templates and styling rules to all DSpace pages and is the fundamental difference from the JSP approach currently used by DSpace. 

The body element directly contains only one type of element: div. The div element serves as a major division of content and any number of them can be contained by the body. Additionally, divisions are recursive, allowing divs to contain other divs. It is within these elements that all other structural elements are contained.
Those elements include tables, paragraph elements p, and lists, as well as their various child elements. At the lower levels of this hierarchy lie the character container elements. These elements, namely paragraphs p, table cells, list items, and the emphasis element hi, contain the textual content of a DSpace page, optionally modified with links, figures, and emphasis. If the division within which the character class is contained is tagged as interactive (via the interactive attribute), those elements can also contain interactive form fields. Divisions tagged as interactive must also provide method and action attributes for their fields to use. 

Figure 2: All the elements in the DRI schema (version 1.1). 


Merging of DRI Documents+ 

Having described the structure of the DRI Document, as well as its function in Manakin's Aspect chains, we now turn our attention to one last detail of their use: merging two Documents into one. There are several situations where the need to merge two documents arises. In Manakin, for example, every Aspect is responsible for adding different functionality to a DSpace page. Since every instance of a page has to be a complete DRI Document, each Aspect is faced with the task of merging the Document it generated with the ones generated (and merged into one Document) by previously executed Aspects. For this reason rules exist that describe which elements can be merged together and what happens to their data and child elements in the process. 

When merging two DRI Documents, one is considered to be the main document, and the other a feeder document that is added in. The three top-level containers (meta, body and options) of both documents are then individually analyzed and merged. In the case of the options and meta elements, the child elements are also taken individually and treated differently from their siblings.
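The body-merging rule described in the next paragraph (the main document's divs first, then the feeder's, with divs that share n and rend folded together recursively) can be sketched as below. This is a simplified illustration, not Manakin's implementation: Div is a hypothetical minimal type, only n and rend are compared (the extra action/method check for interactive divs and the pagination restriction are omitted), and a merged div keeps the id, n and rend of the div it was folded into.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical minimal model of a DRI div: only the attributes the merge rule uses.
class Div {
    final String id;
    final String n;
    final String rend; // may be null
    final List<Div> children = new ArrayList<>();

    Div(String id, String n, String rend, Div... kids) {
        this.id = id;
        this.n = n;
        this.rend = rend;
        children.addAll(List.of(kids));
    }

    // Two divs are merge candidates when their n and rend attributes match.
    boolean sameKey(Div other) {
        return Objects.equals(n, other.n) && Objects.equals(rend, other.rend);
    }

    // Deep copy, so merging never mutates the input documents.
    Div copy() {
        Div c = new Div(id, n, rend);
        for (Div kid : children) {
            c.children.add(kid.copy());
        }
        return c;
    }

    // Main divs first, then feeder divs; a feeder div whose key matches an
    // existing div is folded into it (keeping that div's id/n/rend), recursively.
    static List<Div> merge(List<Div> main, List<Div> feeder) {
        List<Div> out = new ArrayList<>();
        for (Div d : main) {
            out.add(d.copy());
        }
        for (Div f : feeder) {
            Div match = null;
            for (Div candidate : out) {
                if (candidate.sameKey(f)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                out.add(f.copy());
            } else {
                List<Div> mergedKids = merge(match.children, f.children);
                match.children.clear();
                match.children.addAll(mergedKids);
            }
        }
        return out;
    }
}
```

Copying the input divs keeps both source documents unchanged, which makes the merge easy to test in isolation.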
+

The body elements are the easiest to merge: their respective div children are preserved along with their ordering and are grouped together under one element. Thus, the new body tag will contain all the divs of the main document followed by all the divs of the feeder. However, if two divs have the same n and rend attributes (and, in the case of interactive divs, the same action and method attributes as well), those divs will be merged into one. The resulting div will bear the id, n, and rend attributes of the main document's div and contain all the child divs of the main document's div followed by all the child divs of the feeder's div. This process continues recursively until all the divs have been merged. It should be noted that two divisions with separate pagination rules cannot be merged together. 

Merging the options elements is somewhat different. First, list elements under options of both documents are compared with each other. Those unique to either document are simply added under the new options element, just like divs under body. In the case of duplicates, that is, list elements that belong to both documents and have the same n attribute, the two lists will be merged into one. The new list element will consist of the main document's head element, followed by the label-item pairs from the main document, and then finally the label-item pairs of the feeder, provided they are different from those of the main document. 

Finally, the meta elements are merged much like the elements under body. The three children of meta (userMeta, pageMeta, and objectMeta) are individually merged, adding the contents of the feeder after the contents of the main document. 


Version Changes+ 

The DRI schema will continue to evolve over time as the needs of interface design require. The version attribute on the document will indicate which version of the schema the document conforms to.
At the time Manakin was incorporated into the standard distribution of DSpace, the current version was "1.1"; however, earlier versions of the Manakin interface may use "1.0". 

Changes from 1.0 to 1.1+ 

There were major structural changes between these two version numbers. Several elements were removed from the schema: includeSet, include, objectMeta, and object. Originally, all metadata for objects was included in-line in the DRI document; this caused several problems, and in-line metadata has been removed in version 1.1 of the DRI schema. Instead of including metadata in-line, external references to the metadata are included. Thus, a reference element has been added along with referenceSet. These new elements operate like their counterparts in the previous version, except that instead of referencing metadata contained in the objectMeta element, they reference metadata in external files. The repository and repositoryMeta elements were also modified in a similar manner, removing in-line metadata and referencing external metadata documents. 



Element Reference+ 

+
+
+
+
BODY+ + + +Top-Level Container + +The body element is the main container for all content displayed to the user. It contains any number of div elements that group content into interactive and display blocks. + +Parent +
Children +
Attributes +
+ + <document version="1.1"> + <meta> ... </meta> + <body> + <div n="division-example1" + id="XMLExample.div.division-example1"> + ... + </div> + <div n="division-example2" id="XMLExample.div.division-example2" + interactive="yes" action="www.DRItest.com" + method="post"> + ... + </div> + ... + </body> + <options> ... </options> +</document> ++ cell+ + Rich Text Container + + + Structural Element + + The cell element contained in a row of a table carries content for that table. It is a character container, just like p, item, and hi, and its primary purpose is to display textual data, possibly enhanced with hyperlinks, emphasized blocks of text, images and form fields. Every cell can be annotated with a role (the most common being "header" and "data") and can stretch across any number of rows and columns. Since cells cannot exist outside their container (a row), their id attribute is optional. + + Parent
Children +
Attributes +
+ +<table n="table-example" id="XMLExample.table.table-example" rows="2" + cols="3"> + <row role="head"> + <cell cols="2">Data Label One and Two</cell> <cell>Data Label + Three</cell> + ... + </row> + <row> + <cell> Value One </cell> <cell> Value Two </cell> <cell> Value + Three </cell> + ... + </row> + ... +</table> ++ div+ +Structural Element + +The div element represents a major section of content and can contain a wide variety of structural elements to present that content to the user. It can contain paragraphs, tables, and lists, as well as references to artifact information stored in artifactMeta, repositoryMeta, collections, and communities. The div element is also recursive, allowing it to be further divided into other divs. Divs can be of two types: interactive and static. The two types are set by the use of the interactive attribute and differ in their ability to contain interactive content. Children elements of divs tagged as interactive can contain form fields, with the action and method attributes of the div serving to resolve those fields. + +Parent +
Children +
Attributes +
+ +<body> + <div n="division-example" + id="XMLExample.div.division-example"> + <head> Example Division </head> + <p> This example shows the use of divisions. </p> + <table ...> + ... + </table> + <referenceSet ...> + ... + </referenceSet> + <list ...> + ... + </list> + <div n="sub-division-example" + id="XMLExample.div.sub-division-example"> + <p> Divisions may be nested </p> + ... + </div> + ... + </div> + ... +</body> ++ DOCUMENT+ +Document Root + +The document element is the root container of an XML UI document. All other elements are contained within it either directly or indirectly. The only attribute it carries is the version of the Schema to which it conforms. + +Parent +
Children +
Attributes + +
+ <document
+ version="1.1">
+ <meta>
+ ...
+ </meta>
+ <body>
+ ...
+ </body>
+ <options>
+ ...
+ </options>
+ </document>
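Because the document element carries only the version attribute, a quick sanity check on an incoming DRI document is easy to sketch. The function below is a hypothetical helper, not part of DSpace. It assumes unnamespaced elements and, following the example above, expects exactly meta, body and options in that order.

```python
import xml.etree.ElementTree as ET

# Assumption: child order taken from the DOCUMENT example above.
EXPECTED_CHILDREN = ("meta", "body", "options")

def check_document(xml_text, expected_version="1.1"):
    """Return a list of problems found in a DRI document root (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "document":
        problems.append(f"root is <{root.tag}>, expected <document>")
    version = root.get("version")
    if version != expected_version:
        problems.append(f"version is {version!r}, expected {expected_version!r}")
    children = tuple(child.tag for child in root)
    if children != EXPECTED_CHILDREN:
        problems.append(f"children are {children}, expected {EXPECTED_CHILDREN}")
    return problems
```

Passing `expected_version="1.0"` lets the same check flag documents produced by earlier Manakin versions.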
+field+ +Text Container + +Structural Element + +The field element is a container for all information necessary to create a form field. The required type attribute determines the type of the field, while the children tags carry the information on how to build it. Fields can only occur in divisions tagged as "interactive". + +Parent +
Children +
Attributes + +
+ +<p> + <hi> ... </hi> + <xref> ... </xref> + <figure> ... </figure> + ... + <field id="XMLExample.field.name" n="name" type="text" + required="yes"> + <params size="16" maxlength="32"/> + <help>Some help text with <i18n>localized + content</i18n>.</help> + <value type="raw">Default value goes + here</value> + </field> +</p> + ++ figure+ +Text Container + +Structural Element + +The figure element is used to embed a reference to an image or a graphic element. It can be mixed freely with text, and any text within the tag itself will be used as an alternative descriptor or a caption. + +Parent +
Children +
Attributes + +
+ +<p> + <hi> ... </hi> + ... + <xref> ... </xref> + ... + <field> ... </field> + ... + <figure source="www.example.com/fig1"> This is a static image. + </figure> <figure source="www.example.com/fig1" + target="www.example.net"> + This image is also a link. + </figure> + ... +</p> ++ head+ +Text Container + +Structural Element + +The head element is primarily used as a label associated with its parent element. The rendering is determined by its parent tag, but can be overridden by the rend attribute. Since there can only be one head element associated with a particular tag, the n attribute is not needed, and the id attribute is optional. + +Parent +
Children +
Attributes + +
+
+<div ...>
+ <head> This is a simple header associated with its div element.
+ </head>
+ <div ...>
+ <head rend="green"> This header will be green.
+ </head>
+ <p>
+ <head> A header with <i18n>localized content</i18n>.
+ </head>
+ ...
+ </p>
+ </div>
+ <table ...>
+ <head> ...
+ </head>
+ ...
+ </table>
+ <list ...>
+ <head> ...
+ </head>
+ ...
+ </list>
+ ...
+</div>
+
+help+ +Text Container + +Structural Element + +The optional help element is used to supply help instructions in plain text and is normally contained by the field element. The method used to render the help text in the target markup is up to the theme. + +Parent +
Children +
Attributes +
+ + <p> + <hi> ... </hi> + ... + <xref> ... </xref> + ... + <figure> ... </figure> + ... + <field id="XMLExample.field.name" n="name" type="text" + required="yes"> + <params size="16" maxlength="32" /> + <help>Some help text with <i18n>localized + content</i18n>.</help> + </field> + ... +</p> ++ hi+ + Rich Text Container + + Structural Element + + The hi element is used for emphasis of text and occurs inside character containers like p and item. It can be mixed freely with text, and any text within the tag itself will be emphasized in a manner specified by the required rend attribute. Additionally, the hi element is the only text container component that is a rich text container itself, meaning it can contain other tags in addition to plain text. This allows it to contain other text containers, including other hi tags. + + Parent
Children +
Attributes + +
+ + <p> + This text is normal, while <hi rend="bold">this text is bold and + this text is <hi rend="italic">bold and + italic.</hi></hi> +</p> ++ instance+ + + Structural Element + + The instance element contains the value associated with a form field's multiple instances. Fields encoded as an instance should also include the value of each instance as a hidden field, whose name is the field name with an underscore and the instance's index number appended. Thus, if the field is "firstName", the instances would be named "firstName_1", "firstName_2", "firstName_3", etc. + + Parent
Children +
Attributes +
+ + Example needed. ++ item+ + Rich Text Container + + Structural Element + + The item element is a rich text container used to display textual data in a list. As a rich text container it can contain hyperlinks, emphasized blocks of text, images and form fields in addition to plain text. + + The item element can be associated with a label that directly precedes it. The Schema requires that if one item in a list has an associated label, then all other items must have one as well. This mitigates the problem of loose connections between elements that is commonly encountered in XHTML, since every item in a particular list has the same structure. + + Parent
Children +
Attributes + +
+ +<list n="list-example" + id="XMLExample.list.list-example"> + <head> Example List </head> + <item> This is the first item + </item> <item> This is the second item with <hi ...>highlighted text</hi>, + <xref ...> a link</xref> and an <figure + ...>image</figure>.</item> + ... + <list n="list-example2" + id="XMLExample.list.list-example2"> + <head> Example List </head> + <label>ITEM ONE:</label> + <item> This is the first item + </item> + <label>ITEM TWO:</label> + <item> This is the second item with <hi ...>highlighted + text</hi>, <xref ...> a link</xref> and an <figure + ...>image</figure>.</item> + <label>ITEM THREE:</label> + <item> This is the third item with a <field ...> ... </field> + </item> + ... + </list> + <item> This is the third item in the list + </item> + ... +</list> ++ label+ +Text Container + +Structural Element + +The label element is associated with an item and annotates that item with a number, a textual description of some sort, or a simple bullet. + +Parent +
Children +
Attributes + +
+ + <list n="list-example" + id="XMLExample.list.list-example"> + <head>Example List</head> + <label>1</label> + <item> This is the first item </item> + <label>2</label> + <item> This is the second item with <hi ...>highlighted text</hi>, + <xref ...> a link</xref> and an <figure + ...>image</figure>.</item> + ... + <list n="list-example2" + id="XMLExample.list.list-example2"> + <head>Example Sublist</head> + <label>ITEM + ONE:</label> + <item> This is the first item </item> + <label>ITEM + TWO:</label> + <item> This is the second item with <hi ...>highlighted + text</hi>, <xref ...> a link</xref> and an <figure + ...>image</figure>.</item> + <label>ITEM + THREE:</label> + <item> This is the third item with a <field ...> ... </field> + </item> + ... + </list> + <label>3</label> + <item> This is the third item in the list </item> + ... +</list> ++ list+ + Structural Element + + The list element is used to display sets of sequential data. It contains an optional head element, as well as any number of item and list elements. Items contain textual information, while sublists contain other item or list elements. An item can also be associated with a label element that annotates an item with a number, a textual description of some sort, or a simple bullet. The list type (ordered, bulleted, gloss, etc.) is then determined either by the content of labels on items or by an explicit value of the type attribute. Note that if labels are used in conjunction with any items in a list, all of the items in that list must have a label. It is also recommended to avoid mixing label styles unless an explicit type is specified. + + Parent
Children +
Attributes + +
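The labelling rule for lists lends itself to a small checker: if any item in a list has a label, every item in that list must have one. This hypothetical sketch assumes a label applies to the item that directly follows it, as the item entry describes. Each nested list is checked on its own, since the rule applies per list.

```python
import xml.etree.ElementTree as ET

def check_list_labels(lst):
    """Return the n attributes of lists that mix labelled and unlabelled items.

    Assumption: a <label> belongs to the <item> that directly follows it.
    Nested <list> elements are checked recursively and independently.
    """
    bad = []
    children = list(lst)
    labelled = unlabelled = 0
    for i, child in enumerate(children):
        if child.tag == "item":
            if i > 0 and children[i - 1].tag == "label":
                labelled += 1
            else:
                unlabelled += 1
        elif child.tag == "list":
            bad.extend(check_list_labels(child))
    if labelled and unlabelled:
        bad.append(lst.get("n"))
    return bad
```

In this sketch, a head element directly before the first item simply makes that item count as unlabelled.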
+ + <div ...> + ... + <list n="list-example" + id="XMLExample.list.list-example"> + <head>Example List</head> + <item> ... </item> + <item> ... </item> + ... + <list n="list-example2" + id="XMLExample.list.list-example2"> + <head>Example Sublist</head> + <label> ... </label> + <item> ... </item> + <label> ... </label> + <item> ... </item> + <label> ... </label> + <item> ... </item> + ... + </list> + <item> ... </item> + ... + </list> +</div> ++ META+ + Top-Level Container + + The meta element is a top level element and exists directly inside the document element. It serves as a container element for all metadata associated with a document, broken up into categories according to the type of metadata they carry. + + Parent
Children +
Attributes +
+ + <document version="1.1"> + <meta> + <userMeta> ... </userMeta> + <pageMeta> ... </pageMeta> + <repositoryMeta> ... </repositoryMeta> + </meta> + <body> ... </body> + <options> ... </options> +</document> ++ metadata+ + Text Container + + Structural Element + + The metadata element carries generic metadata information in the form of an attribute-value pair. The type of information it contains is determined by two attributes: element, which specifies the general type of metadata stored, and an optional qualifier attribute that narrows the type down. The standard representation for this pairing is element.qualifier. The actual metadata is contained in the text of the tag itself. Additionally, a language attribute can be used to specify the language used for the metadata entry. + + Parent
Children +
Attributes +
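The element.qualifier convention for the metadata element can be illustrated with a short, hypothetical helper. It flattens the metadata children of userMeta or pageMeta into (key, value) pairs. It returns a list rather than a dict because the same key may appear more than once. Surrounding whitespace in values is trimmed, and the language attribute is ignored.

```python
import xml.etree.ElementTree as ET

def metadata_pairs(container):
    """Collect (key, value) pairs from the <metadata> children of a
    container such as <userMeta> or <pageMeta>. The key uses the
    element.qualifier notation, or just element when there is no qualifier."""
    pairs = []
    for md in container.findall("metadata"):
        key = md.get("element")
        if md.get("qualifier"):
            key += "." + md.get("qualifier")
        pairs.append((key, (md.text or "").strip()))
    return pairs
```

For the userMeta example on this page, this would yield pairs such as ("identifier.firstName", "Bob").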
+ +<meta> + <userMeta> + <metadata element="identifier" qualifier="firstName"> Bob + </metadata> <metadata element="identifier" qualifier="lastName"> Jones + </metadata> <metadata ...> ... + </metadata> + ... + </userMeta> + <pageMeta> + <metadata element="rights" + qualifier="accessRights">user</metadata> <metadata ...> ... + </metadata> + ... + </pageMeta> +</meta> ++ OPTIONS+Top-Level Container + +The options element is the main container for all actions and navigation options available to the user. It consists of any number of list elements whose items contain navigation information and actions. While any list of navigational options may be contained in this element, it is suggested that at least the following 5 lists be included. + +Parent +
Children +
Attributes +
+ + <document version="1.1"> + + <meta> ... </meta> + + <body> ... </body> + + <options> + + <list n="navigation-example1" + id="XMLExample.list.navigation-example1"> + + <head>Example Navigation List 1</head> + + <item><xref target="/link/to/option">Option + One</xref></item> + + <item><xref target="/link/to/option">Option + two</xref></item> + + ... + + </list> + + <list n="navigation-example2" + id="XMLExample.list.navigation-example2"> + + <head>Example Navigation List 2</head> + + <item><xref target="/link/to/option">Option + One</xref></item> + + <item><xref target="/link/to/option">Option + two</xref></item> + + ... + + </list> + + ... + + </options> + +</document> ++ p+ + Rich Text Container + + Structural Element + + The p element is a rich text container used by divs to display textual data in a paragraph format. As a rich text container it can contain hyperlinks, emphasized blocks of text, images and form fields in addition to plain text. + + Parent
Children +
Attributes + +
+ + <div n="division-example" + id="XMLExample.div.division-example"> + + <p> This is a regular paragraph. + </p> <p> This text is normal, while <hi rend="bold">this text is bold + and this text is <hi rend="italic">bold and italic.</hi></hi> + </p> <p> This paragraph contains a <xref + target="/link/target">link</xref>, a static <figure + source="/image.jpg">image</figure>, and an <figure target= + "/link/target" source="/image.jpg">image link.</figure> + </p> + +</div> ++ pageMeta+ + Metadata Element + + The pageMeta element contains metadata associated with the document itself. It contains generic metadata elements to carry the content, and any number of trail elements to provide information on the user's current location in the system. Required and suggested values for metadata elements contained in pageMeta include but are not limited to: + + 
See the metadata and trail tag entries for more information on their structure. + +Parent +
Children +
Attributes +
+ +<meta> + + <userMeta> ... </userMeta> + + <pageMeta> + + <metadata element="title">Example DRI + page</metadata> + + <metadata + element="contextPath">/xmlui/</metadata> + + <metadata ...> ... </metadata> + + ... + + <trail source="123456789/6"> A bread crumb item + </trail> + + <trail ...> ... </trail> + + ... + + </pageMeta> + +</meta> ++ params+ +Structural Component + +The params element identifies extra parameters used to build a form field. There are several attributes that may be available for this element depending on the field type. + +Parent +
Children +
Attributes + +
+ + <p> + + <field id="XMLExample.field.name" n="name" type="text" + required="yes"> + + <params size="16" + maxlength="32"/> + + <help>Some help text with <i18n>localized + content</i18n>.</help> + + <default>Default value goes here</default> + + </field> + +</p> ++ reference+ + Metadata Reference Element + + The reference element is used to access information stored in an external metadata file. The url attribute is used to locate the external metadata file. The type attribute provides a short description of the referenced object's type. + + A reference element can both be contained by a referenceSet element and contain referenceSet elements itself, making the structure recursive. + + Parent
Children +
Attributes +
+ + <referenceSet n="browse-list" + id="XMLTest.referenceSet.browse-list"> + <reference url="/metadata/handle/123/4/mets.xml" + repositoryID="123" type="DSpace + Item"/> <reference url="/metadata/handle/123/5/mets.xml" + repositoryID="123" /> + ... + </referenceSet> + ++ referenceSet+ + Metadata Reference Element + + The referenceSet element is a container of artifact or repository references. + + Parent
Children +
Attributes + +
+ + <div ...> + <head> Example Division </head> + <p> ... </p> + <table> ... </table> + <list> + ... + </list> + <referenceSet n="browse-list" + id="XMLTest.referenceSet.browse-list" type="summaryView" + informationModel="DSpace"> + <head>A header for the referenceSet</head> + <reference + url="/metadata/handle/123/34/mets.xml"/> + <reference + url="/metadata/handle/123/34/mets.xml"/> + </referenceSet> + ... + </div> ++ repository+ + Metadata Element + + The repository element is used to describe the repository. Its principal component is a set of structural metadata that carries information on how the repository's objects under objectMeta are related to each other. The principal method of encoding these relationships at the time of this writing is a METS document, although other formats, like RDF, may be employed in the future. + + Parent
Children +
Attributes +
+ + <repositoryMeta> + + <repository repositoryID="123456789" + url="/metadata/handle/1234/4/mets.xml" /> + +</repositoryMeta> ++ repositoryMeta+ + Metadata Element + + The repositoryMeta element contains metadata references about the repositories used or referenced in the document. It can contain any number of repository elements. + + See the repository tag entry for more information on the structure of repository elements. + + Parent
Children +
Attributes +
+ + <meta> + + <userMeta> ... </userMeta> + + <pageMeta> ... </pageMeta> + + <repositoryMeta> + + <repository repositoryID="..." url="..." + /> + + </repositoryMeta> + +</meta> ++ row+ + Structural Element + + The row element is contained inside a table and serves as a container of cell elements. A required role attribute determines how the row and its cells are rendered. + + Parent
Children +
Attributes +
+ +<table n="table-example" id="XMLExample.table.table-example" rows="2" + cols="3"> + + <row + role="head"> + + <cell cols="2">Data Label One and + Two</cell> + + <cell>Data Label Three</cell> + + ... + + </row> <row> + + <cell> Value One </cell> + + <cell> Value Two </cell> + + <cell> Value Three </cell> + + ... + + </row> + + ... + +</table> ++ table+ +Structural Element + +The table element is a container for information presented in tabular format. It consists of a set of row elements and an optional header. + +Parent +
Children +
Attributes +
+ + <div n="division-example" + id="XMLExample.div.division-example"> + + <table n="table1" id="XMLExample.table.table1" rows="2" + cols="3"> + + <row role="head"> + + <cell cols="2">Data Label One and + Two</cell> + + <cell>Data Label Three</cell> + + ... + + </row> + + <row> + + <cell> Value One </cell> + + <cell> Value Two </cell> + + <cell> Value Three </cell> + + ... + + </row> + + ... + + </table> + ... +</div> ++ trail+ + Text Container + + Metadata Element + + The trail element carries information about the user's current location in the system relative to the repository's root page. Each instance of the element serves as one link in the path from the root to the current page. + + Parent
Children +
Attributes +
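Each trail element is one link in the path from the repository root to the current page. A theme can therefore rebuild the breadcrumb bar by reading the trails in document order.

The sketch below rests on a few assumptions:

- As in the trail example that follows, the link is held in the target attribute. Some other examples on this page use source instead.
- The helper names are hypothetical.
- Only the direct text of each trail is read, with whitespace normalised.

```python
import xml.etree.ElementTree as ET

def breadcrumbs(page_meta):
    """Return (label, target) tuples for each <trail> in document order.
    target is None for a crumb without a link (assumed attribute: target)."""
    return [(" ".join((t.text or "").split()), t.get("target"))
            for t in page_meta.findall("trail")]

def render_breadcrumbs(page_meta, sep=" > "):
    """Render the trail as plain text, e.g. for a breadcrumb bar."""
    return sep.join(label for label, _ in breadcrumbs(page_meta))
```

Keeping the (label, target) pairs separate from the text rendering leaves a theme free to emit links for crumbs that have a target.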
+ + <pageMeta> + + <metadata element="title">Example DRI + page</metadata> + + <metadata + element="contextPath">/xmlui/</metadata> + + <metadata ...> ... </metadata> + + ... + + <trail target="/myDSpace"> A bread crumb item pointing to a + page. </trail> <trail ...> ... </trail> + + ... + +</pageMeta> ++ userMeta+ + Metadata Element + + The userMeta element contains metadata associated with the user that requested the document. It contains generic metadata elements, which in turn carry the information. Required and suggested values for metadata elements contained in userMeta include but are not limited to: + + 
See the metadata tag entry for more information on the structure of metadata elements. + +Parent +
Children +
Attributes +
+ + <meta> + + <userMeta> + + <metadata element="identifier" qualifier="email">bobJones@tamu.edu</metadata> + + <metadata element="identifier" qualifier="firstName">Bob</metadata> + + <metadata element="identifier" qualifier="lastName">Jones</metadata> + + <metadata element="rights" qualifier="accessRights">user</metadata> + + <metadata ...> ... </metadata> + + ... + + </userMeta> + + <pageMeta> ... </pageMeta> + +</meta> ++ value+ + Rich Text Container + + Structural Element + + The value element contains the value associated with a form field and can serve a different purpose for various field types. The value element consists of two subelements: the raw element, which stores the unprocessed value directly from the user or other source, and the interpreted element, which stores the value in a format appropriate for display to the user, possibly including rich text markup. + + Parent
Children +
Attributes + +
+ +<p> + <hi> ... </hi> + <xref> ... </xref> + <figure> ... </figure> + <field id="XMLExample.field.name" n="name" type="text" + required="yes"> + <params size="16" maxlength="32"/> + <help>Some help text with <i18n>localized + content</i18n>.</help> + <value type="default">Author, + John</value> + </field> +</p> + ++ xref+ +Text Container + +Structural Element + +The xref element is a reference to an external document. It can be mixed freely with text, and any text within the tag itself will be used as part of the link's visual body. + +Parent +
Children +
Attributes +
+
+<p>
+ <xref target="/url/link/target">This text is shown as a link.</xref>
+</p>
+
++
+ Attachments:
+
+
+
+
+ |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : DSpace AIP Format
+
+
+
+ This page last changed on Dec 13, 2010 by tdonohue.
+
+
+ The DSpace AIP Format+ +
+
+
+
Makeup and Definition of AIPs+ +AIPs are Archival Information Packages.+ +
General AIP Structure / Examples+ + Generally speaking, an AIP is a Zip file containing a METS manifest and all related content bitstreams, license files and any other associated files. + + Some examples include: 
Notes: +
What is NOT in AIPs +
Customizing What Is Stored in Your AIPs+ +If you choose, you can customize exactly what information is stored in your AIPs. However, you should be aware that you can only restore information which is stored within your AIPs. If you choose to remove information from your AIPs, you will be unable to restore it later on (unless you are also backing up your entire DSpace database and assetstore folder). + +
There are two ways to go about customizing your AIP format: +
AIP Details: METS Structure+ +
Metadata in METS+ +The following tables describe how various metadata schemas are populated (via DSpace Crosswalks) in the METS file for an AIP. + +DIM (DSpace Intermediate Metadata) Schema+ +DIM Schema is essentially a way of representing DSpace internal metadata structure in XML. DSpace's internal metadata is very similar to a Qualified Dublin Core in its structure, and is primarily meant for descriptive metadata. However, DSpace's metadata allows for custom elements, qualifiers or schemas to be created (so it is extendable to any number of schemas, elements, qualifiers). These custom fields/schemas may or may not be able to be translated into normal Qualified Dublin Core. So, the DIM Schema must be able to express metadata schemas, elements or qualifiers which may or may not exist within Qualified Dublin Core. + +In the METS structure, DIM metadata always appears within a dmdSec inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM"> element. For example: +
+ + <dmdSec ID="dmdSec_2190"> + <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM"> + ... + </mdWrap> + </dmdSec> ++ By default, DIM metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ +aip.disseminate.dmd = MODS, DIM ++ DIM Descriptive Elements for Item objects+ +As all DSpace Items already have user-assigned DIM (essentially Qualified Dublin Core) metadata fields, those fields are just exported into the DIM Schema within the METS file. + +DIM Descriptive Elements for Collection objects+ +For Collections, the following fields are translated to the DIM schema: + +
+
+
+
+
+
DIM Descriptive Elements for Community objects+ +For Communities, the following fields are translated to the DIM schema: + +
+
+
+
+
+
DIM Descriptive Elements for Site objects+ +For the Site Object, the following fields are translated to the DIM schema: + +
+
+
+
+
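The aip.disseminate.dmd = MODS, DIM setting shown above is a comma-separated list of crosswalk names. The sketch below reads such settings from dspace.cfg-style text and splits them into their entries. It assumes Java-properties conventions: # starts a comment line, and a trailing backslash continues a value on the next line. It is not DSpace's own configuration loader.

```python
def read_cfg(text):
    """Parse dspace.cfg-style 'key = value' lines into a dict.

    Assumes Java-properties conventions: '#' starts a comment line and a
    trailing backslash continues the value on the next line.
    """
    props, pending = {}, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue                      # skip blanks and comments
        if line.endswith("\\"):
            pending += line[:-1] + " "    # continuation: keep accumulating
            continue
        line = pending + line
        pending = ""
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def crosswalk_list(props, key):
    """Split a comma-separated aip.disseminate.* value into its entries."""
    return [e.strip() for e in props.get(key, "").split(",") if e.strip()]
```

With the default configuration, `crosswalk_list(read_cfg(cfg_text), "aip.disseminate.dmd")` would give `["MODS", "DIM"]`.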
MODS Schema+ +By default, all DSpace descriptive metadata (DIM) is also translated into the MODS Schema by utilizing DSpace's MODSDisseminationCrosswalk. DSpace's DIM to MODS crosswalk is defined within your [dspace]/config/crosswalks/mods.properties configuration file. This file allows you to customize the MODS that is included within your AIPs. + +For more information on the MODS Schema, see http://www.loc.gov/standards/mods/mods-schemas.html + +In the METS structure, MODS metadata always appears within a dmdSec inside an <mdWrap MDTYPE="MODS"> element. For example: +
+ + <dmdSec ID="dmdSec_2189"> + <mdWrap MDTYPE="MODS"> + ... + </mdWrap> + </dmdSec> ++ By default, MODS metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ + aip.disseminate.dmd = MODS, DIM ++ The MODS metadata is included within your AIP to support interoperability. It provides a way for other systems to interact with or ingest the AIP without needing to understand the DIM Schema. You may choose to disable MODS if you wish; however, this may make it harder to ingest your AIPs into a non-DSpace system (unless that non-DSpace system is able to understand the DIM schema). When restoring/ingesting AIPs, DSpace will always first attempt to restore DIM descriptive metadata. Only if no DIM metadata is found will the MODS metadata be used during a restore. + + AIP Technical Metadata Schema (AIP-TECHMD)+ + The AIP Technical Metadata Schema is a way to translate technical metadata about a DSpace object into the DIM Schema. It is kept separate from DIM as it is considered technical metadata rather than descriptive metadata. + + In the METS structure, AIP-TECHMD metadata always appears within a sourceMD inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD"> element. For example: 
+ + <amdSec ID="amd_2191"> + ... + <sourceMD ID="sourceMD_2198"> + <mdWrap MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD"> + ... + </mdWrap> + </sourceMD> + ... + </amdSec> ++ By default, AIP-TECHMD metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ +aip.disseminate.sourceMD = AIP-TECHMD ++ AIP Technical Metadata for Item+ +
+
+
+
+
+
AIP Technical Metadata for Bitstream+ +
+
+
+
+
+
AIP Technical Metadata for Collection+ +
+
+
+
+
+
AIP Technical Metadata for Community+ +
+
+
+
+
+
AIP Technical Metadata for Site+ +
+
+
+
+
+
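The restore order stated for descriptive metadata can be sketched against a METS manifest: DIM first, MODS only when no DIM is found. The helper resolves each mdWrap to its effective type, using OTHERMDTYPE when MDTYPE is OTHER, as in the DIM and AIP-TECHMD wrappers shown in this section. It assumes the standard METS namespace http://www.loc.gov/METS/. It is an illustration, not DSpace's ingester.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}   # standard METS namespace

def wrap_type(md_wrap):
    """Effective metadata type of an mdWrap: OTHERMDTYPE when MDTYPE is
    'OTHER' (e.g. DIM, AIP-TECHMD), otherwise MDTYPE itself (e.g. MODS)."""
    mdtype = md_wrap.get("MDTYPE")
    return md_wrap.get("OTHERMDTYPE") if mdtype == "OTHER" else mdtype

def pick_descriptive_metadata(mets_root, preference=("DIM", "MODS")):
    """Return (type, mdWrap) for the first preferred type found in any
    dmdSec, mirroring the DIM-first, MODS-fallback order; None if neither."""
    found = {}
    for wrap in mets_root.iterfind(".//mets:dmdSec/mets:mdWrap", NS):
        found.setdefault(wrap_type(wrap), wrap)   # keep first of each type
    for md_type in preference:
        if md_type in found:
            return md_type, found[md_type]
    return None
```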
PREMIS Schema+ + At this point in time, the PREMIS Schema is only used to represent technical metadata about DSpace Bitstreams (i.e. Files). The PREMIS metadata is generated by DSpace's PREMISCrosswalk. Only the PREMIS Object Entity Schema is used. + + In the METS structure, PREMIS metadata always appears within a techMD inside an <mdWrap MDTYPE="PREMIS"> element. PREMIS metadata is always wrapped within a <premis:premis> element. For example: 
+ + <amdSec ID="amd_2209"> + ... + <techMD ID="techMD_2210"> + <mdWrap MDTYPE="PREMIS"> + <premis:premis> + ... + </premis:premis> + </mdWrap> + </techMD> + ... + </amdSec> ++ Each Bitstream (file) has its own amdSec within a METS manifest. So, there will be a separate PREMIS techMD for each Bitstream within a single Item. + +By default, PREMIS metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ +aip.disseminate.techMD = PREMIS, DSPACE-ROLES ++ PREMIS Metadata for Bitstream+ +The following Bitstream information is translated into PREMIS for each DSpace Bitstream (file): + +
+
+
+
+
+
DSPACE-ROLES Schema+ + All DSpace Groups and EPeople objects are translated into a custom DSPACE-ROLES XML Schema. This XML Schema is a very simple representation of the underlying DSpace database model for Groups and EPeople. The DSPACE-ROLES Schema is generated by DSpace's RoleCrosswalk. + + Only the following DSpace Objects use the DSPACE-ROLES Schema in their AIPs: 
In the METS structure, DSPACE-ROLES metadata always appears within a techMD inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DSPACE-ROLES"> element. For example: +
+ + <amdSec ID="amd_2068"> + ... + <techMD ID="techMD_2070"> + <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DSPACE-ROLES"> + ... + </mdWrap> + </techMD> + ... + </amdSec> ++ By default, DSPACE-ROLES metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ +aip.disseminate.techMD = PREMIS, DSPACE-ROLES ++ Example of DSPACE-ROLES Schema for a SITE AIP+ +Below is a general example of the structure of a DSPACE-ROLES XML file, as it would appear in a SITE AIP. + +
+ +<DSpaceRoles> + <Groups> + <Group ID="1" Name="Administrator"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + <Group ID="0" Name="Anonymous" /> + <Group ID="70" Name="COLLECTION_hdl:123456789/57_ADMIN"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + <Group ID="75" Name="COLLECTION_hdl:123456789/57_DEFAULT_READ"> + <MemberGroups> + <MemberGroup ID="0" Name="Anonymous" /> + </MemberGroups> + </Group> + <Group ID="71" Name="COLLECTION_hdl:123456789/57_SUBMIT"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + <Group ID="72" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_1"> + <MemberGroups> + <MemberGroup ID="1" Name="Administrator" /> + </MemberGroups> + </Group> + <Group ID="73" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_2"> + <MemberGroups> + <MemberGroup ID="1" Name="Administrator" /> + </MemberGroups> + </Group> + <Group ID="8" Name="COLLECTION_hdl:123456789/6703_DEFAULT_READ" /> + <Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + </Groups> + <People> + <Person ID="1"> + <Email>bsmith@myu.edu</Email> + <Netid>bsmith</Netid> + <FirstName>Bob</FirstName> + <LastName>Smith</LastName> + <Language>en</Language> + <CanLogin /> + </Person> + <Person ID="2"> + <Email>jjones@myu.edu</Email> + <FirstName>Jane</FirstName> + <LastName>Jones</LastName> + <Language>en</Language> + <CanLogin /> + <SelfRegistered /> + </Person> + </People> +</DSpaceRoles> ++
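The SITE example above records direct members under Members and nested groups under MemberGroups. A group's effective membership therefore has to be resolved recursively: the 57_WORKFLOW_STEP_1 group above, for instance, contains the Administrator group. The hypothetical helpers below parse such a document with ElementTree and follow MemberGroups with a cycle guard. They are not part of DSpace's RoleCrosswalk.

```python
import xml.etree.ElementTree as ET

def parse_roles(xml_text):
    """Summarise a DSPACE-ROLES document.

    Returns (groups, people): groups maps each Group Name to its direct
    member names and member-group names; people maps Person ID to Email.
    """
    root = ET.fromstring(xml_text)
    groups = {}
    for g in root.iterfind("Groups/Group"):
        groups[g.get("Name")] = {
            "members": [m.get("Name") for m in g.iterfind("Members/Member")],
            "member_groups": [m.get("Name") for m in g.iterfind("MemberGroups/MemberGroup")],
        }
    people = {p.get("ID"): p.findtext("Email") for p in root.iterfind("People/Person")}
    return groups, people

def effective_members(groups, name, seen=None):
    """All member names of a group, following MemberGroups recursively.
    The 'seen' set stops infinite recursion if groups nest cyclically."""
    seen = set() if seen is None else seen
    if name in seen or name not in groups:
        return set()
    seen.add(name)
    result = set(groups[name]["members"])
    for sub in groups[name]["member_groups"]:
        result |= effective_members(groups, sub, seen)
    return result
```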
Example of DSPACE-ROLES Schema for a Community or Collection+ + Below is a general example of the structure of a DSPACE-ROLES XML file, as it would appear in a Community or Collection AIP. + + This specific example is for a Collection, which has associated Administrator, Submitter, and Workflow approver groups. In this very simple example, each group has only one Person as a member. Note that each Person's details (Name, NetID, etc.) are NOT contained in this AIP; they are available in the DSPACE-ROLES section of the SITE AIP, as shown in the SITE example above. + + 
+ +<DSpaceRoles> + <Groups> + <Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN" Type="ADMIN"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + <Group ID="13" Name="COLLECTION_hdl:123456789/2_SUBMIT" Type="SUBMIT"> + <Members> + <Member ID="2" Name="jjones@myu.edu" /> + </Members> + </Group> + <Group ID="10" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1" Type="WORKFLOW_STEP_1"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + <Group ID="11" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2" Type="WORKFLOW_STEP_2"> + <Members> + <Member ID="2" Name="jjones@myu.edu" /> + </Members> + </Group> + <Group ID="12" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3" Type="WORKFLOW_STEP_3"> + <Members> + <Member ID="1" Name="bsmith@myu.edu" /> + </Members> + </Group> + </Groups> +</DSpaceRoles> ++ METSRights Schema+ +All DSpace Policies (permissions on objects) are translated into the METSRights schema. This is different than the above DSPACE-ROLES schema, which only represents Groups and People objects. Instead, the METSRights schema is used to translate the permission statements (e.g. a group named "Library Admins" has Administrative permissions on a Community named "University Library"). But the METSRights schema doesn't represent who is a member of a particular group (that is defined in the DSPACE-ROLES schema, as described above). + +
All DSpace objects' AIPs (except for the SITE AIP) use the METSRights schema to define what permissions people and groups have on that object. Although the METSRights schema has several sections, DSpace AIPs use only the <RightsDeclarationMD> section, as this is the section that describes rights on an object. + +In the METS structure, METSRights metadata always appears within a rightsMD element, inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS"> element. For example: +
+ + <amdSec ID="amd_2068"> + ... + <rightsMD ID="rightsMD_2074"> + <mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS"> + ... + </mdWrap> + </rightsMD> + ... + </amdSec> ++ By default, METSRights metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg: +
+ +aip.disseminate.rightsMD = DSpaceDepositLicense:DSPACE_DEPLICENSE, \ + CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:DSPACE_CCTEXT, METSRIGHTS ++ Example of METSRights Schema for an Item+ +An Item AIP will almost always contain several METSRights metadata sections within its METS Manifest. A separate METSRights metadata section is used to describe the permissions on: +
Below is an example of a METSRights section for a publicly visible Bitstream, Bundle or Item. Notice that it grants the "GENERAL PUBLIC" permission to DISCOVER and DISPLAY this object. +
+ +<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/" RIGHTSCATEGORY="LICENSED"> + <rights:Context CONTEXTCLASS="GENERAL PUBLIC"> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" /> + </rights:Context> +</rights:RightsDeclarationMD> ++ Example of METSRights Schema for a Collection+ +A Collection AIP contains one METSRights section, which describes the permissions different Groups or People have within the Collection. + +Below is an example of a METSRights section for a publicly visible Collection, which also has an Administrator group, a Submitter group, and a group for each of the three DSpace workflow approval steps. Each of the groups is given very specific permissions within the Collection: Submitters and Workflow approvers can "ADD CONTENTS" to the Collection (but cannot delete it), while Administrators have full rights. +
+ +<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/" RIGHTSCATEGORY="LICENSED"> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_SUBMIT</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true" OTHERPERMITTYPE="ADD CONTENTS" /> + </rights:Context> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true" OTHERPERMITTYPE="ADD CONTENTS" /> + </rights:Context> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true" OTHERPERMITTYPE="ADD CONTENTS" /> + </rights:Context> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true" OTHERPERMITTYPE="ADD CONTENTS" /> + </rights:Context> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_ADMIN</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" COPY="true" DUPLICATE="true" MODIFY="true" DELETE="true" PRINT="true" OTHER="true" OTHERPERMITTYPE="ADMIN" /> + </rights:Context> + <rights:Context CONTEXTCLASS="GENERAL PUBLIC"> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" /> + </rights:Context> +</rights:RightsDeclarationMD> ++ Example of METSRights Schema for a Community+ +A Community AIP contains one METSRights section, which describes the permissions different Groups or People have within that Community. 
+ +Below is an example of a METSRights section for a publicly visible Community, which also has an Administrator group. This content looks very similar to the Collection METSRights section described above. + +
+ +<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/" RIGHTSCATEGORY="LICENSED"> + <rights:Context CONTEXTCLASS="MANAGED_GRP"> + <rights:UserName USERTYPE="GROUP">COMMUNITY_hdl:123456789/10_ADMIN</rights:UserName> + <rights:Permissions DISCOVER="true" DISPLAY="true" COPY="true" DUPLICATE="true" MODIFY="true" DELETE="true" PRINT="true" OTHER="true" OTHERPERMITTYPE="ADMIN" /> + </rights:Context> + <rights:Context CONTEXTCLASS="GENERAL PUBLIC"> + <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" /> + </rights:Context> +</rights:RightsDeclarationMD> ++ +
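The following minimal Java sketch shows how these METSRights sections map onto permissions. For each rights:Context, it lists the permission attributes set to "true". It uses the JDK DOM parser with namespace awareness, because the elements carry the rights: prefix. The MetsRightsSummary class is hypothetical and not part of DSpace. It recognizes only the attributes that appear in the examples on this page, and it reports OTHER="true" using its OTHERPERMITTYPE value (such as ADD CONTENTS or ADMIN).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of DSpace): summarizes a METSRights RightsDeclarationMD. */
class MetsRightsSummary {
    static final String NS = "http://cosimo.stanford.edu/sdr/metsrights/";
    // Boolean permission attributes that appear in the examples above.
    static final String[] FLAGS = {"DISCOVER", "DISPLAY", "COPY", "DUPLICATE", "MODIFY", "DELETE", "PRINT"};

    /** Maps each Context (its UserName, or its CONTEXTCLASS if it has none) to the permissions set to "true". */
    static Map<String, List<String>> summarize(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // elements use the rights: prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> summary = new LinkedHashMap<>();
            NodeList contexts = doc.getElementsByTagNameNS(NS, "Context");
            for (int i = 0; i < contexts.getLength(); i++) {
                Element ctx = (Element) contexts.item(i);
                NodeList users = ctx.getElementsByTagNameNS(NS, "UserName");
                String who = users.getLength() > 0
                        ? users.item(0).getTextContent().trim()
                        : ctx.getAttribute("CONTEXTCLASS");
                List<String> granted = new ArrayList<>();
                Element perms = (Element) ctx.getElementsByTagNameNS(NS, "Permissions").item(0);
                if (perms != null) {
                    for (String flag : FLAGS) {
                        if ("true".equals(perms.getAttribute(flag))) granted.add(flag);
                    }
                    // OTHER="true" is qualified by OTHERPERMITTYPE, e.g. "ADD CONTENTS" or "ADMIN"
                    if ("true".equals(perms.getAttribute("OTHER"))) granted.add(perms.getAttribute("OTHERPERMITTYPE"));
                }
                summary.put(who, granted);
            }
            return summary;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse METSRights XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rights:RightsDeclarationMD xmlns:rights=\"" + NS + "\" RIGHTSCATEGORY=\"LICENSED\">"
                + "<rights:Context CONTEXTCLASS=\"MANAGED_GRP\">"
                + "<rights:UserName USERTYPE=\"GROUP\">COMMUNITY_hdl:123456789/10_ADMIN</rights:UserName>"
                + "<rights:Permissions DISCOVER=\"true\" DISPLAY=\"true\" COPY=\"true\" DUPLICATE=\"true\" MODIFY=\"true\""
                + " DELETE=\"true\" PRINT=\"true\" OTHER=\"true\" OTHERPERMITTYPE=\"ADMIN\" /></rights:Context>"
                + "<rights:Context CONTEXTCLASS=\"GENERAL PUBLIC\">"
                + "<rights:Permissions DISCOVER=\"true\" DISPLAY=\"true\" MODIFY=\"false\" DELETE=\"false\" />"
                + "</rights:Context></rights:RightsDeclarationMD>";
        summarize(xml).forEach((who, perms) -> System.out.println(who + " -> " + perms));
    }
}
```

Run against the Community example above, the sketch reports two entries. The COMMUNITY_hdl:123456789/10_ADMIN group receives every listed permission plus ADMIN. GENERAL PUBLIC receives only DISCOVER and DISPLAY. The sketch does not tell you who is in each group; that information lives in the DSPACE-ROLES file.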
+ Attachments:
+
+
+
+
+
+ ![]() + ![]() + ![]() + ![]() + |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : DSpace Services Framework
+
+
+
+ This page last changed on Mar 07, 2011 by awoods.
+
+
+ DSpace Services Framework+ +
+
+
+
The DSpace Services Framework is a backport of the DSpace 2.0 Development Group's work on creating a reasonable and abstractable "Core Services" layer for DSpace components to operate within. The Services Framework represents a "best practice" for new DSpace architecture and for implementing extensions to the DSpace application. DSpace Services are best described as a "Simple Registry" where plugins Architectural Overview+ +DSpace Kernel+ +The DSpace Kernel manages the startup of, and access to, the services in the DSpace Services framework. It is meant to provide a simple way to control the core parts of DSpace, along with flexible ways to start up the kernel. For example, the kernel can be run inside a single webapp along with a frontend piece (like JSPUI), or it can be started as part of the servlet container so that multiple webapps can share a single kernel (this increases speed and efficiency). The kernel is also designed to allow multiple kernels to run in a single servlet container, distinguished by identifier keys. + +Kernel registration+ +The kernel automatically registers itself as an MBean when it starts up, so that it can be managed via JMX. It allows startup and shutdown and provides direct access to the ServiceManager and the ConfigurationService. All the other core services can be retrieved from the ServiceManager by their APIs. Service Manager+ + +The ServiceManager abstracts the concepts of service lookups and lifecycle control. It also manages the configuration of services by allowing properties to be pushed into the services as they start up (mostly from the ConfigurationService). The ServiceManagerSystem abstraction allows the DSpace ServiceManager to use different systems to manage its services. The current implementations include Spring and Guice. This allows DSpace 2 to have very little service management code while still being flexible and not tied to a specific technology.
Developers who are comfortable with those technologies can consume the services from a parent Spring ApplicationContext or a parent Guice Module. The abstraction also means that we can replace Spring/Guice or add other dependency injection systems later without requiring developers to change their code. The interface provides simple methods for looking up services by interface type for developers who do not want to have to use or learn a dependency injection system or are using one which is not currently supported. + + +The DS2 kernel is compact so it can be completely started up in a unit test (technically integration test) environment. (This is how we test the kernel and core services currently). This allows developers to execute code against a fully functional kernel while developing and then deploy their code with high confidence. + +Basic Usage+ +To use the Framework you must begin by instantiating and starting a DSpaceKernel. The kernel will give you references to the ServiceManager and the ConfigurationService. The ServiceManager can be used to get references to other services and to register services which are not part of the core set. + +Access to the kernel is provided via the Kernel Manager through the DSpace object, which will locate the kernel object and allow it to be used. + +Standalone Applications+ +For standalone applications, access to the kernel is provided via the Kernel Manager and the DSpace object which will locate the kernel object and allow it to be used. +
+ +import org.dspace.kernel.ServiceManager; +import org.dspace.services.EventService; +import org.dspace.utils.DSpace; + +/* Instantiate the utility class */ +DSpace dspace = new DSpace(); + + +/* Get the Service Manager via its convenience method */ +ServiceManager manager = dspace.getServiceManager(); + + +/* Or get a core service directly via its convenience method */ +EventService service = dspace.getEventService(); ++ The DSpace launcher (
+ bin/dspace+ ) initializes a kernel before dispatching to the selected command. + +Application Frameworks (Spring, Guice, etc.)+ +Similar to Standalone Applications, but you can use your framework to instantiate an org.dspace.utils.DSpace object. +
+
+ <bean id="dspace" class="org.dspace.utils.DSpace"/>
+
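The Service Manager section above describes looking up services by interface type, for developers who do not want to use a dependency injection system directly. The hypothetical toy registry below illustrates that idea in plain Java. It is not the DSpace ServiceManager, and its class and method names (ToyServiceRegistry, register, lookup, lookupAll) are invented for this example. In DSpace itself, you obtain the real ServiceManager from the DSpace object, as shown above.

```java
import java.util.*;

/**
 * Hypothetical toy registry illustrating "look up services by interface type".
 * It is NOT the DSpace ServiceManager API; names and behavior are invented for illustration.
 */
class ToyServiceRegistry {
    private final Map<String, Object> services = new LinkedHashMap<>();

    /** Registers (or hot-swaps) a service under a unique name. */
    void register(String name, Object service) {
        services.put(name, Objects.requireNonNull(service, "service"));
    }

    /** Returns the named service if it implements the requested type. */
    <T> Optional<T> lookup(String name, Class<T> type) {
        Object service = services.get(name);
        return type.isInstance(service) ? Optional.of(type.cast(service)) : Optional.empty();
    }

    /** Returns every registered service implementing the requested type, in registration order. */
    <T> List<T> lookupAll(Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Object service : services.values()) {
            if (type.isInstance(service)) matches.add(type.cast(service));
        }
        return matches;
    }

    public static void main(String[] args) {
        ToyServiceRegistry registry = new ToyServiceRegistry();
        registry.register("greeting", "hello");
        registry.register("counter", new StringBuilder("0"));
        // Both a String and a StringBuilder implement CharSequence.
        System.out.println(registry.lookupAll(CharSequence.class).size()); // 2
        System.out.println(registry.lookup("greeting", StringBuilder.class).isPresent()); // false
    }
}
```

Re-registering a name replaces the old instance. This loosely mirrors the hot-swappable providers described below. The real ServiceManager also manages service lifecycles, which this toy does not.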
Web Applications+ +In web applications, the kernel can be started and accessed through the Servlet Filters/ContextListeners that are provided as part of the DSpace 2 utilities. Developers don't need to understand what is going on behind the scenes; they can simply write their applications, package them as webapps, and take advantage of the services offered by DSpace 2. + + +Providers and Plugins+ +For developers (how we are trying to make your lives easier): The DS2 ServiceManager supports a plugin/provider system which is runtime hot-swappable. The implementor can register any service/provider bean or class with the DS2 kernel ServiceManager. The ServiceManager will manage the lifecycle of beans (if desired) and will instantiate and manage the lifecycle of any classes it is given. This can be done at any time and does not have to be done during Kernel startup. This allows providers to be swapped out at runtime without disrupting the service if desired. The goal of this system is to allow DS2 to be extended without requiring any changes to the core codebase or a rebuild of the core code. + +Activators+ +Developers can provide an activator to allow the system to start up their service or provider. It is a simple interface with 2 methods, which are called by the ServiceManager to start up the provider(s) and later to shut them down. Activators simply allow a developer to run some arbitrary code in order to create and register services if desired. They are the mechanism for adding plugins directly to the system via configuration: the activators are simply listed in the configuration file, and the system starts them up in the order it finds them. + +Provider Stacks+ +Utilities are provided to assist with stacking and ordering providers. Ordering is handled via a priority number such that 1 is the highest priority and something like 10 would be lower.
0 indicates that priority is not important for this service and can be used to ensure the provider is placed at or near the end without having to set some arbitrarily high number. + + +Core Services+ +The core services are all behind APIs so that they can be reimplemented without affecting developers who are using the services. Most of the services have plugin/provider points so that customizations can be added into the system without touching the core services code. For example, let's say a deployer has a specialized authentication system and wants to manage the authentication calls which come into the system. The implementor can simply implement an AuthenticationProvider and then register it with the DS2 kernel's ServiceManager. This can be done at any time and does not have to be done during Kernel startup. This allows providers to be swapped out at runtime without disrupting the DS2 service if desired. It can also speed up development by allowing quick hot redeploys of code during development. + +Caching Service+ +Provides a centralized way to handle caching in the system, and thus a single point of configuration and control for all caches in the system. Provider and plugin developers are strongly encouraged to use this rather than implementing their own caching. The caching service has the concept of scopes, so even storing data in your own maps or lists is discouraged unless there is a good reason to do so. + +Configuration Service+ +The ConfigurationService controls the external and internal configuration of DSpace 2. It reads Properties files when the kernel starts up and merges them with any dynamic configuration data which is available from the services. This service allows settings to be updated while the system is running, and also defines listeners which allow services to know when their configuration settings have changed and take action if desired. It is the central point to access and manage all the configuration settings in DSpace 2.
+ +Manages the configuration of the DSpace 2 system. It can also be used to manage configuration for providers and plugins. + +EventService+ +Handles events and provides access to listeners for consumption of events. + + +RequestService+ +In DS2 a request is an atomic transaction in the system. It is likely to be an HTTP request in many cases, but it does not have to be. This service provides the core services with a way to manage atomic transactions, so that when a request comes in that requires multiple things to happen, they can either all succeed or all fail without each service attempting to manage this independently. In a nutshell, this simply allows identification of the current request and the ability to discover whether it succeeded or failed when it ends. Nothing in the system will enforce usage of the service, but we encourage developers who are interacting with the system to make use of this service so they know whether the request they are participating in has succeeded or failed and can take appropriate actions. + + +SessionService+ +In DS2 a session is like an HttpSession (and generally is actually one), so this service is here to allow developers to find information about the current session and to access information in it. The session identifies the current user (if authenticated), so it also serves as a way to track user sessions. Since we use HttpSession directly, it is easy to mirror sessions across multiple servers in order to allow for no-interruption failover for users when servers go offline. + +Examples+ +Configuring Event Listeners+ +Event Listeners can be created by implementing the EventListener interface: + +In Spring: + +
+ +<?xml version="1.0" encoding="UTF-8"?> +<beans> + + <bean id="dspace" class="org.dspace.utils.DSpace"/> + + <bean id="dspace.eventService" + factory-bean="dspace" + factory-method="getEventService"/> + + <bean class="org.my.EventListener"> + <property name="eventService" > + <ref bean="dspace.eventService"/> + </property> + </bean> +</beans> ++ (org.my.EventListener will need to register itself with the EventService, for which it is passed a reference to that service via the eventService property.) + +or in Java: + +
+ +DSpace dspace = new DSpace(); + +EventService eventService = dspace.getEventService(); + +EventListener listener = new org.my.EventListener(); +eventService.registerEventListener(listener); ++ (This registers the listener externally – the listener code assumes it is registered.) + + +TODO: examples in Guice + +TODO: examples of implementing and registering configurations in Spring and Guice + +TBS: how we did X before : how we do it using the Framework + ++
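The Provider Stacks section above gives the ordering rule. Priority 1 is the highest, larger numbers are lower, and 0 means the priority is unimportant, which places the provider at or near the end. The following hypothetical Java sketch shows one way to implement that rule. It is not DSpace's provider-stack utility; the Provider class and order method are invented for illustration. The sketch assumes priorities are non-negative and keeps registration order among providers with equal priority.

```java
import java.util.*;

/** Hypothetical sketch of provider-stack ordering: 1 is the highest priority, 0 means "don't care". */
class ProviderStackSketch {
    static final class Provider {
        final String name;
        final int priority; // assumed non-negative, as in the description above
        Provider(String name, int priority) { this.name = name; this.priority = priority; }
    }

    // Priority 0 sorts after every explicit priority; otherwise smaller numbers come first.
    static int sortKey(Provider p) {
        return p.priority == 0 ? Integer.MAX_VALUE : p.priority;
    }

    /** Returns provider names in stack order; List.sort is stable, so ties keep registration order. */
    static List<String> order(List<Provider> providers) {
        List<Provider> sorted = new ArrayList<>(providers);
        sorted.sort(Comparator.comparingInt(ProviderStackSketch::sortKey));
        List<String> names = new ArrayList<>();
        for (Provider p : sorted) names.add(p.name);
        return names;
    }

    public static void main(String[] args) {
        List<Provider> stack = List.of(
                new Provider("catchAll", 0), new Provider("custom", 10), new Provider("preferred", 1));
        System.out.println(order(stack)); // [preferred, custom, catchAll]
    }
}
```

Mapping 0 to Integer.MAX_VALUE avoids the "arbitrarily high number" the section mentions. Callers never have to pick one, yet zero-priority providers still sort last.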
+ Attachments:
+
+
+
+
+
+ ![]() + ![]() + ![]() + ![]() + |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : DSpace Statistics
+
+
+
+ This page last changed on Jan 14, 2011 by benbosman.
+
+
+ DSpace Statistics+ +DSpace 1.6 and newer versions use the Apache SOLR application as the back end for statistics. SOLR enables fast searching of, and adding to, vast amounts of (usage) data.
+
+
+
What exactly is being logged?+ +Each time a page or file is requested, the request is logged. Logging happens on the server side, so, unlike Google Analytics, it does not require JavaScript in the page to collect usage data. + +The fields to be stored are defined in the file dspace/solr/statistics/conf/schema.xml.
+ <field name="type" type="integer" indexed="true" stored="true" required="true" /> +<field name="id" type="integer" indexed="true" stored="true" required="true" /> +<field name="ip" type="string" indexed="true" stored="true" required="false" /> +<field name="time" type="date" indexed="true" stored="true" required="true" /> +<field name="epersonid" type="integer" indexed="true" stored="true" required="false" /> +<field name="country" type="string" indexed="true" stored="true" required="false" /> +<field name="city" type="string" indexed="true" stored="true" required="false"/> +<field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true" /> ++ The combination of type and id determines which resource (a community, collection or item page, or a file download) has been requested. + +Web user interface for DSpace statistics+ +In the XMLUI, statistics can be accessed from the bottom of the navigation menu. In the JSPUI, a view statistics button appears at the bottom of pages for which statistics are available. + +If you are not seeing these links or buttons, they are probably enabled only for administrators in your installation. Set the configuration parameter "statistics.item.authorization.admin" to false to make statistics visible to all repository visitors. + +Home page+ +Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire repository. + +Community home page+ +The following statistics are available for the community home pages: +
Collection home page+ +The following statistics are available for the collection home pages: +
Item home page+ +The following statistics are available for the item home pages: +
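Each of these views answers a counting question over the logged events, where the type and id fields identify the requested resource. The following in-memory Java sketch illustrates the "top 10 most popular items" idea. It counts events per (type, id) and keeps the n largest counts. The sketch is hypothetical: in DSpace these figures come from the SOLR statistics core rather than from iterating over events in Java. The numeric type code used for items in main is also an assumption; replace it with your installation's value.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration of "most popular" statistics; DSpace gets such counts from Solr. */
class TopItemsSketch {
    /** One logged request, reduced to the type and id fields from schema.xml. */
    static final class UsageEvent {
        final int type;
        final int id;
        UsageEvent(int type, int id) { this.type = type; this.id = id; }
    }

    /** Counts events of one resource type per id and returns the n most requested ids. */
    static List<Integer> topN(List<UsageEvent> events, int type, int n) {
        Map<Integer, Long> counts = new HashMap<>();
        for (UsageEvent e : events) {
            if (e.type == type) counts.merge(e.id, 1L, Long::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Integer, Long>comparingByKey())) // ties: lower id first
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int itemType = 2; // assumption: substitute the type code your installation uses for items
        List<UsageEvent> log = List.of(new UsageEvent(itemType, 10), new UsageEvent(itemType, 12),
                new UsageEvent(itemType, 12), new UsageEvent(0, 12), new UsageEvent(itemType, 11));
        System.out.println(topN(log, itemType, 2)); // [12, 10]
    }
}
```

Running main prints [12, 10]. Item 12 was requested twice, while items 10 and 11 tie with one request each; the tie goes to the lower id, which is an arbitrary choice of this sketch. The event of type 0 is ignored because it belongs to a different resource type.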
Usage Event Logging and Usage Statistics Gathering+ +The DSpace Statistics Implementation uses a client/server architecture based on Solr to collect usage events in the JSPUI and XMLUI user interface applications of DSpace. Solr runs as a separate web application, and an instance of Apache HttpClient is used to allow parallel requests to log statistics events into this Solr instance. + +Configuration settings for Statistics+ +In the dspace.cfg file, review the following fields to make sure they are uncommented: + +
+
+
+
+
Upgrade Process for Statistics+ +Example of rebuilding and redeploying DSpace (only if you have configured your distribution in this manner) + +First, follow the traditional DSpace build process for updating: + +
+ cd [dspace-source]/dspace
+ mvn package
+ cd [dspace-source]/dspace/target/dspace-<version>-build.dir
+ ant -Dconfig=[dspace]/config/dspace.cfg update
+ cp -R [dspace]/webapps/* [TOMCAT]/webapps
+
+The last step is only needed if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice). If you only need to rebuild the statistics and have not changed any other web applications, you can replace the copy step above with: + +
+ cp -R [dspace]/webapps/solr [TOMCAT]/webapps ++ Again, this is only needed if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice). + +Restart your webapps (Tomcat/Jetty/Resin). + +Older settings that are not related to the new 1.6 Statistics+ +The following dspace.cfg fields are only applicable to the older statistics solution. + +
+ ###### Statistical Report Configuration Settings ###### + + # should the stats be publicly available? should be set to false if you only + # want administrators to access the stats, or you do not intend to generate + # any + report.public = false + + # directory where live reports are stored + report.dir = ${dspace.dir}/reports/ ++ These fields are not used by the new 1.6 Statistics; they relate only to the statistics from previous DSpace releases. + +Statistics Administration+ +Converting older DSpace logs into SOLR usage data+ +If you have upgraded from a previous version of DSpace, converting older log files ensures that you carry over usage statistics from before the upgrade. + +Statistics Client Utility+ +The command line interface (CLI) scripts can be used to remove additional spider traffic from the usage database and to perform other maintenance tasks. + +Statistics differences between DSpace 1.6.x and 1.7.0+ +SOLR optimization added+ +If required, the Solr server can be optimized by running +
+ {dspace.dir}/bin/stats-util -o ++ . More information on how these Solr server optimizations work can be found here: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations. + +SOLR Autocommit+ +In DSpace 1.6.x, each Solr event was committed to the Solr server individually. For high-load DSpace installations, this resulted in a huge number of small Solr commits, placing a very high load on the Solr server.
+ {dspace.dir}/solr/statistics/conf/solrconfig.xml. ++ |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : DSpace System Documentation
+
+
+
+ This page last changed on Dec 16, 2010 by tdonohue.
+
+
+ The DSpace System Documentation+ + + + |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Directories
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ DSpace System Documentation: Directories and Files+ +
+
+
+
Overview+ +A complete DSpace installation consists of three separate directory trees: + +
Source Directory Layout+ +
Installed Directory Layout+ +Below is the basic layout of a DSpace installation using the default configuration. These paths can be configured if necessary. + +
Contents of JSPUI Web Application+ +DSpace's Ant build file creates a dspace-jspui-webapp/ directory with the following structure: + +
Contents of XMLUI Web Application (aka Manakin)+ +DSpace's Ant build file creates a dspace-xmlui-webapp/ directory with the following structure: + +
Log Files+ +The first source of potential confusion is the log files. Since DSpace uses a number of third-party tools, problems can occur in a variety of places. Below is a table listing the main log files used in a typical DSpace setup. The locations given are defaults, and might be different for your system depending on where you installed DSpace and the third-party tools. The ordering of the list is roughly the recommended order for searching them for the details about a particular problem or error. +
+
+
+
+
log4j.properties File+ +The file [dspace]/config/log4j.properties controls how and where log files are created. It contains three sets of configurations, called A1, A2, and A3, which control the logs for DSpace, the checksum checker, and the XMLUI, respectively. The important settings in this file are: +
+
+
+
+
+
|
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Discovery
+
+
+
+ This page last changed on Dec 16, 2010 by bram.
+
+
+ DSpace Discovery+ +
+
+
+Introduction Video+ +Usage Guidelines+ +The Discovery Module enables faceted searching for your repository. + +In a faceted search, a user can filter what they are looking for by grouping entries into facets and drilling down to find the content they are interested in. Although these techniques are new in DSpace, they might feel familiar from other platforms like AquaBrowser or Amazon, where facets such as price and brand help you select the right product. DSpace Discovery offers very powerful browse and search configurations that were only possible with code customization in the past. + +Instructions for enabling Discovery in DSpace 1.7.0+ +As with any upgrade procedure, it is highly recommended that you back up your existing data thoroughly. This includes upgrading DSpace from 1.6.2 to 1.7.0. Although Solr/Lucene upgrades do tend to be forward compatible for the data stored in the Lucene index, it is always best practice to back up your dspace.dir/solr/statistics core to ensure no data is lost. + + +
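The faceted drill-down described under Usage Guidelines can be illustrated with a small in-memory Java sketch. It counts how many results carry each value of a facet field, then narrows the results to one selected value and recounts. This is only an illustration: Discovery delegates faceting to its Solr "search" core. The FacetSketch class, the map-based records, and the author/year field names are hypothetical. The sketch also assumes single-valued fields.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration of facet counts and drill-down; Discovery uses Solr faceting for this. */
class FacetSketch {
    /** Counts documents per value of a single-valued facet field, most frequent first (ties alphabetical). */
    static Map<String, Long> facetCounts(List<Map<String, String>> docs, String field) {
        Map<String, Long> counts = docs.stream()
                .map(d -> d.get(field))
                .filter(Objects::nonNull)
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new)); // LinkedHashMap keeps the sorted order
    }

    /** Drill-down: keeps only the documents whose facet field has the selected value. */
    static List<Map<String, String>> drillDown(List<Map<String, String>> docs, String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = List.of(
                Map.of("author", "Smith", "year", "2010"),
                Map.of("author", "Jones", "year", "2010"),
                Map.of("author", "Smith", "year", "2011"));
        System.out.println(facetCounts(results, "author")); // {Smith=2, Jones=1}
        List<Map<String, String>> smith = drillDown(results, "author", "Smith");
        System.out.println(facetCounts(smith, "year"));     // {2010=1, 2011=1}
    }
}
```

Each drill-down step is just another filter on the current result set. That is why several facets can be combined simply by applying one selection after another.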
Instructions for Configuring Discovery+ +Discovery can be configured at multiple levels of the application. The sections below describe where changes can be made in Discovery to alter the presentation. The primary place for altering the user experience in the XMLUI is the dspace-solr-search.cfg file. + +Configuring Facets that are Exposed for Search Results+ +
+
+
+
+
Advanced Configuration in Solr+ +Solr now runs two cores: one for the Solr-based DSpace "statistics", and the other for the Solr-based Discovery "search". + +
+ solr +├── search +│ ├── conf +│ │ ├── admin-extra.html +│ │ ├── elevate.xml +│ │ ├── protwords.txt +│ │ ├── schema.xml +│ │ ├── scripts.conf +│ │ ├── solrconfig.xml +│ │ ├── spellings.txt +│ │ ├── stopwords.txt +│ │ ├── synonyms.txt +│ │ └── xslt +│ │ ├── DRI.xsl +│ │ ├── example.xsl +│ │ ├── example_atom.xsl +│ │ ├── example_rss.xsl +│ │ └── luke.xsl +│ └── conf2 +├── solr.xml +└── statistics + └── conf + ├── admin-extra.html + ├── elevate.xml + ├── protwords.txt + ├── schema.xml + ├── scripts.conf + ├── solrconfig.xml + ├── spellings.txt + ├── stopwords.txt + ├── synonyms.txt + └── xslt + ├── example.xsl + ├── example_atom.xsl + ├── example_rss.xsl + └── luke.xsl ++ |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Embargo
+
+
+
+ This page last changed on Dec 14, 2010 by tdonohue.
+
+
+ Embargo Support in DSpace 1.6+ +
+
+
+
What is an embargo?+ +An embargo is a temporary access restriction placed on content, commencing at the time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes. + +Embargo model and life-cycle+ +Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples: + +"2020-09-12" - an absolute date (i.e. the date the embargo will be lifted) These terms are 'interpreted' by the embargo system to yield a specific date on which the embargo can be removed or 'lifted', and a specific set of access policies. Obviously, some terms are easier to interpret than others (the absolute date really requires none at all), and the 'default' embargo logic understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to add in your own 'interpreters' to cope with any terms expressions you wish to have.
The date that results from this interpretation is stored with the item; the embargo system detects when that date has passed and removes the embargo ("lifts it"), so the item's bitstreams become available. Here is a more detailed life-cycle for an embargoed item: + +Terms assignment+ +The first step in placing an embargo on an item is to attach (assign) 'terms' to it. Terms interpretation/imposition+ +In DSpace terminology, when an Item has exited the last of any workflow steps (or if none have been defined for it), it is said to be 'installed' into the repository. At this precise time, the 'interpretation' of the terms occurs, and a computed 'lift date' is assigned, which like the terms is recorded in a configurable metadata field. It is important to understand that this interpretation happens only once (just like the installation) and cannot be revisited later. Thus, although an administrator can assign a new value to the metadata field holding the terms after the item has been installed, this will have no effect on the embargo, whose 'force' now resides entirely in the 'lift date' value. For this reason, you cannot embargo content already in your repository (at least using standard tools). The other action taken at installation time is the actual imposition of the embargo. The default behavior here is simply to remove the read policies on all the bundles and bitstreams except for the "LICENSE" or "METADATA" bundles. See "Extending embargo functionality" below for how to alter this behavior. Also note that since these policy changes occur before installation, there is no time during which embargoed content is 'exposed' (accessible by non-administrators). The terms interpretation and imposition together are called 'setting' the embargo, and the component that performs them both is called the embargo 'setter'. + +Embargo period+ +After an embargoed item has been installed, the policy restrictions remain in effect until removed.
This is not an automatic process, however: a 'lifter' must be run periodically to look for items whose 'lift date' is past. Note that this means the embargo is not effectively removed on the lift date itself, but on the first date after the lift date on which the lifter is run. Typically, a nightly cron-scheduled invocation of the lifter is more than adequate, given the granularity of embargo terms. Also note that during the embargo period, all metadata of the item remains visible. This default behavior can be changed. One final point to note is that the 'lift date', although it was computed and assigned during the previous stage, is in the end a regular metadata field. That means that if there are extraordinary circumstances that require an administrator (or collection editor - anyone with edit permissions on metadata) to change the lift date, they can do so. Thus, they can 'revise' the lift date without reference to the original terms. This date will be checked the next Embargo lift+ +When the lifter discovers an item whose lift date is in the past, it removes (lifts) the embargo. The default behavior of the lifter is to add the resource policies Post embargo+ +After the embargo has been lifted, the item ceases to respond to any of the embargo life-cycle events. The values of the metadata fields reflect essentially historical or provenance values. With the exception of the additional metadata fields, they are indistinguishable from items that were never subject to embargo. + +Configuration+ +DSpace embargoes utilize standard metadata fields to hold both the 'terms' and the 'lift date'. Which fields you use is configurable, and no specific metadata element is dedicated or pre-defined for use in embargo. Rather, you specify exactly which field you want the embargo system to examine when it needs to find the terms or assign the lift date. + +The properties that specify these assignments live in dspace.cfg:
+
+# DC metadata field to hold the user-supplied embargo terms
+embargo.field.terms = SCHEMA.ELEMENT.QUALIFIER
+
+# DC metadata field to hold computed "lift date" of embargo
+embargo.field.lift = SCHEMA.ELEMENT.QUALIFIER
+
+Replace the placeholder values with real metadata field names. If you only need the 'default' embargo behavior (which essentially accepts only absolute dates as 'terms'), note that there is also a property for the special date of 'forever': +
+
+# string in terms field to indicate indefinite embargo
+embargo.terms.open = forever
+
+which you may change to suit linguistic or other preference. + +You are free to use existing metadata fields, or create new fields. If you choose the latter, you must understand that the embargo system does not create or configure these fields: you must follow all the standard documented procedures for actually creating them (e.g. adding them to the metadata registry or to display templates), since this does not happen automatically. Likewise, if you want the field for 'terms' to appear in submission screens and workflows, you must follow the documented procedure for configurable submission (basically, this means adding the field to input-forms.xml). The flexibility of metadata configuration makes it easy for you to restrict embargoes to specific collections, since configurable submission can be defined per collection. + +Key recommendations: + +
Operation+ +After the fields defined for terms and lift date have been assigned in dspace.cfg, and created and configured wherever they will be used, you can begin to embargo items simply by entering data (dates, if using the default setter) in the terms field. They will automatically be embargoed as they exit workflow. For the embargo to be lifted on any item, however, a new administrative procedure must be added: the 'embargo lifter' must be invoked on a regular basis. This task examines all embargoed items, and if their 'lift date' has passed, it removes the access restrictions on the item. Good practice dictates automating this procedure using cron jobs or the like, rather than running it manually. + +Extending embargo functionality+ +The 1.6 embargo system supplies a default 'interpreter/imposition' class (the 'Setter') as well as a 'Lifter', but they are fairly rudimentary in several respects. + +Setter+ +The default setter recognizes only two expressions of terms: either a literal, non-relative date in the fixed ISO 8601 format 'yyyy-mm-dd', or a special string used for open-ended embargo (the default configured value for this is 'forever', but this can be changed in dspace.cfg to 'toujours', 'unendlich', etc.). It performs a minimal sanity check that the date is not in the past. Similarly, the default setter will only remove all read policies as noted above, rather than applying more nuanced rules (e.g. allow access to certain IP groups, deny the rest). Fortunately, the setter class itself is configurable and you can 'plug in' any behavior you like, provided it is written in Java and conforms to the setter interface. The dspace.cfg property: + +
+
+# implementation of embargo setter plugin - replace with local implementation if applicable
+plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter
+
+controls which setter to use. + +Lifter+ +The default lifter behavior as described above - essentially applying the collection policy rules to the item - might also not be sufficient for all purposes. It also can be replaced with another class: + +
+
+# implementation of embargo lifter plugin - replace with local implementation if applicable
+plugin.single.org.dspace.embargo.EmbargoLifter = org.dspace.embargo.DefaultEmbargoLifter
+
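To make the default setter and lifter rules concrete, here is a minimal Java sketch. It is hypothetical. `EmbargoSketch`, `parseTerms` and `dueForLift` are illustrative names, not the DSpace `EmbargoSetter`/`EmbargoLifter` interfaces. Items are reduced to string ids. The sketch also assumes that a lift date equal to the day the lifter runs counts as "passed".

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch of the default embargo rules described above.
// NOT the DSpace EmbargoSetter/EmbargoLifter API: items are plain string ids.
final class EmbargoSketch {

    // Mirrors the embargo.terms.open property (default value "forever").
    static final String OPEN_TERMS = "forever";

    private EmbargoSketch() {}

    /**
     * Setter side: interpret the terms field. Returns an empty Optional for an
     * open-ended embargo, otherwise the lift date parsed from yyyy-mm-dd.
     */
    static Optional<LocalDate> parseTerms(String terms, LocalDate today) {
        String trimmed = terms.trim();
        if (trimmed.equals(OPEN_TERMS)) {
            return Optional.empty(); // indefinite: the lifter never releases it
        }
        LocalDate lift;
        try {
            lift = LocalDate.parse(trimmed); // accepts ISO 8601 yyyy-mm-dd only
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Unrecognized embargo terms: " + terms, e);
        }
        // Minimal sanity check, like the default setter: no dates in the past.
        if (lift.isBefore(today)) {
            throw new IllegalArgumentException("Embargo lift date is in the past: " + terms);
        }
        return Optional.of(lift);
    }

    /**
     * Lifter side: given item id -> lift date, return the ids (sorted) whose
     * lift date is on or before the day the lifter runs (assumed boundary).
     * Open-ended embargoes have no lift date, so they never appear in the map.
     */
    static List<String> dueForLift(Map<String, LocalDate> liftDates, LocalDate runDate) {
        return liftDates.entrySet().stream()
                .filter(e -> !e.getValue().isAfter(runDate))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The lifter only sees an item when it runs. An item whose lift date falls between two runs is therefore released by the next run, which is why a nightly job is usually precise enough.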
+ |
+
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Functional Overview
+
+
+
+ This page last changed on Jan 07, 2011 by tdonohue.
+
+
+ DSpace System Documentation: Functional Overview+ +The following sections describe the various functional aspects of the DSpace system. + +
+
+
+
Data Model+ +Data Model Diagram + +The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace system. Each DSpace site is divided into communities, which can be further divided into sub-communities reflecting the typical university structure of college, department, research center, or laboratory. + +Communities contain collections, which are groupings of related content. A collection may appear in more than one community. + +Each collection is composed of items, which are the basic archival elements of the archive. Each item is owned by one collection. Additionally, an item may appear in additional collections; however every item has one and only one owning collection. + +Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files. Bitstreams that are somehow closely related, for example HTML files and images that compose a single HTML document, are organized into bundles. + +In practice, most items tend to have these named bundles: + +
Each bitstream is associated with one Bitstream Format. Because preservation services may be an important aspect of the DSpace service, it is important to capture the specific formats of files that users submit. In DSpace, a bitstream format is a unique and consistent way to refer to a particular file format. An integral part of a bitstream format is either an implicit or an explicit notion of how material in that format can be interpreted. For example, the interpretation for bitstreams encoded in the JPEG standard for still image compression is defined explicitly in the standard ISO/IEC 10918-1. The interpretation of bitstreams in Microsoft Word 2000 format is defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be more specific than MIME types or file suffixes. For example, application/msword and .doc span multiple versions of the Microsoft Word application, each of which produces bitstreams with presumably different characteristics. + +Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be able to preserve content in the format in the future. There are three possible support levels that bitstream formats may be assigned by the hosting institution. The host institution should determine the exact meaning of each support level, after careful consideration of costs and requirements. MIT Libraries' interpretation is shown below: +
+
+
+
+
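The point that one MIME type can cover several bitstream formats, each with its own support level, can be shown with a small hypothetical Java registry. `FormatRegistrySketch` is not DSpace's `BitstreamFormat` API, and the sample entries in the usage note are invented. The three level names (Supported, Known, Unsupported) are used here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical miniature bitstream format registry; not DSpace's API.
final class FormatRegistrySketch {

    // Assumed level names; each institution decides what each level means.
    enum SupportLevel { SUPPORTED, KNOWN, UNSUPPORTED }

    static final class Format {
        final String shortName;
        final String mimeType;
        final SupportLevel level;

        Format(String shortName, String mimeType, SupportLevel level) {
            this.shortName = shortName;
            this.mimeType = mimeType;
            this.level = level;
        }
    }

    private final List<Format> formats = new ArrayList<>();

    void register(Format format) {
        formats.add(format);
    }

    /**
     * All registered formats sharing a MIME type. More than one result means
     * the MIME type alone cannot tell which format a bitstream is in.
     */
    List<Format> formatsForMimeType(String mimeType) {
        return formats.stream()
                .filter(f -> f.mimeType.equalsIgnoreCase(mimeType))
                .collect(Collectors.toList());
    }
}
```

For example, registering separate "Microsoft Word 97" and "Microsoft Word 2000" formats under application/msword makes `formatsForMimeType("application/msword")` return two entries. That is exactly the ambiguity a format registry exists to resolve.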
Each item has one qualified Dublin Core metadata record. Other metadata might be stored in an item as a serialized bitstream, but we store Dublin Core for every item for interoperability and ease of discovery. The Dublin Core may be entered by end-users as they submit content, or it might be derived from other metadata as part of an ingest process. + +Items can be removed from DSpace in one of two ways. They may be 'withdrawn', which means they remain in the archive but are completely hidden from view. In this case, if an end-user attempts to access the withdrawn item, they are presented with a 'tombstone' that indicates the item has been removed. Alternatively, if necessary, an item may be 'expunged', in which case all traces of it are removed from the archive. +
+
+
+
+
+
Plugin Manager+ +The PluginManager is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the lifecycle of a plugin. + +A plugin is defined by a Java interface. The consumer of a plugin asks for its plugin by interface. A Plugin is an instance of any class that implements the plugin interface. It is interchangeable with other implementations, so that any of them may be "plugged in". + +The mediafilter is a simple example of a plugin implementation. Refer to the Business Logic Layer for more details on Plugins. + + +Metadata+ +Broadly speaking, DSpace holds three sorts of metadata about archived content: + +
Packager Plugins+ +Packagers are software modules that translate between DSpace Item objects and a self-contained external representation, or "package". A Package Ingester interprets, or ingests, the package and creates an Item. A Package Disseminator writes out the contents of an Item in the package format. + +A package is typically an archive file such as a Zip or "tar" file, including a manifest document which contains metadata and a description of the package contents. The IMS Content Package is a typical packaging standard. A package might also be a single document or media file that contains its own metadata, such as a PDF document with embedded descriptive metadata. + +Package ingesters and package disseminators are each a type of named plugin (see Plugin Manager), so it is easy to add new packagers specific to the needs of your site. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them. + +Most packager plugins call upon Crosswalk Plugins to translate the metadata between DSpace's object model and the package format. + +More information about calling Packagers to ingest or disseminate content can be found in the Package Importer and Exporter section of the System Administration documentation. + + +Crosswalk Plugins+ +Crosswalks are software modules that translate between DSpace object metadata and a specific external representation. An Ingestion Crosswalk interprets the external format and crosswalks it to DSpace's internal data structure, while a Dissemination Crosswalk does the opposite. + +For example, a MODS ingestion crosswalk translates descriptive metadata from the MODS format to the metadata fields on a DSpace Item. A MODS dissemination crosswalk generates a MODS document from the metadata on a DSpace Item. + +Crosswalk plugins are named plugins (see Plugin Manager), so it is easy to add new crosswalks. 
You do not have to supply both an ingestion and a dissemination crosswalk for each format; it is perfectly acceptable to implement just one of them. + +There is also a special pair of crosswalk plugins which use XSL stylesheets to translate the external metadata to or from an internal DSpace format. You can add and modify XSLT crosswalks simply by editing the DSpace configuration and the stylesheets, which are stored in files in the DSpace installation directory. + +The Packager plugins and OAI-PMH server make use of crosswalk plugins. + + +E-People and Groups+ +Although many of DSpace's functions such as document discovery and retrieval can be used anonymously, some features (and perhaps some documents) are only available to certain "privileged" users. E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges. This identity is bound to a session of a DSpace application such as the Web UI or one of the command-line batch programs. Both E-People and Groups are granted privileges by the authorization system described below. + +E-Person+ +DSpace holds the following information about each e-person: +
Groups+ +Groups are another kind of entity that can be granted permissions in the authorization system. A group is usually an explicit list of E-People; anyone identified as one of those E-People also gains the privileges granted to the group. + +However, an application session can be assigned membership in a group without being identified as an E-Person. For example, some sites use this feature to identify users of a local network so they can read restricted materials not open to the whole world. Sessions originating from the local network are given membership in the "LocalUsers" group and gain the corresponding privileges. + +Administrators can also use groups as "roles" to manage the granting of privileges more efficiently. + + + +Authentication+ +Authentication is when an application session positively identifies itself as belonging to an E-Person and/or Group. In DSpace 1.4 and later, it is implemented by a mechanism called Stackable Authentication: the DSpace configuration declares a "stack" of authentication methods. An application (like the Web UI) calls on the Authentication Manager, which tries each of these methods in turn to identify the E-Person to which the session belongs, as well as any extra Groups. The E-Person authentication methods are tried in turn until one succeeds. Every authenticator in the stack is given a chance to assign extra Groups. This mechanism offers the following advantages: + +
Authorization+ +DSpace's authorization system is based on associating actions with objects and the lists of EPeople who can perform them. The associations are called Resource Policies, and the lists of EPeople are called Groups. There are two built-in groups: 'Administrators', who can do anything in a site, and 'Anonymous', which is a list that contains all users. Assigning a policy for an action on an object to anonymous means giving everyone permission to do that action. (For example, most objects in DSpace sites have a policy of 'anonymous' READ.) Permissions must be explicit - lack of an explicit permission results in the default policy of 'deny'. Permissions also do not 'commute'; for example, if an e-person has READ permission on an item, they might not necessarily have READ permission on the bundles and bitstreams in that item. Currently Collections, Communities and Items are discoverable in the browse and search systems regardless of READ authorization. + +The following actions are possible: + +Collection +
+
+
+
+
Item +
+
+
+
+
Bundle +
+
+
+
+
Bitstream +
+
+
+
+
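The authorization rules described above can be sketched in a few lines of Java: permissions must be explicit, the default is deny, Administrators can do anything, everyone belongs to Anonymous, and nothing is inherited from container to contents. This is a hypothetical model, not DSpace's authorization API. Object ids such as "item:1" are invented strings, and only group-based policies (not individual e-people) are modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of explicit, default-deny resource policies.
// Not the DSpace API; objects are identified by made-up strings.
final class AuthzSketch {
    static final String ADMIN = "Administrators";
    static final String ANONYMOUS = "Anonymous";

    // Each policy is the triple (object, action, group).
    private final Set<List<String>> policies = new HashSet<>();

    void addPolicy(String objectId, String action, String group) {
        policies.add(Arrays.asList(objectId, action, group));
    }

    boolean isAllowed(String objectId, String action, Set<String> sessionGroups) {
        if (sessionGroups.contains(ADMIN)) {
            return true; // Administrators can do anything in a site
        }
        // Every session is implicitly a member of Anonymous.
        if (policies.contains(Arrays.asList(objectId, action, ANONYMOUS))) {
            return true;
        }
        for (String group : sessionGroups) {
            if (policies.contains(Arrays.asList(objectId, action, group))) {
                return true;
            }
        }
        // No explicit policy: deny. There is deliberately no lookup on the
        // containing object, so READ on an item does not imply READ on its bitstreams.
        return false;
    }
}
```

A policy of Anonymous READ on "item:1" lets anyone read the item. A bitstream inside it stays unreadable until it receives its own READ policy, which mirrors the "permissions do not commute" rule.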
Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must have REMOVE permission on all objects (in this case, collections) that contain it. The 'orphaned' item is then automatically deleted. + +Policies can apply to individual e-people or groups of e-people. + + +Ingest Process and Workflow+ +Rather than being a single subsystem, ingesting is a process that spans several. Below is a simple illustration of the current ingesting process in DSpace. + +DSpace Ingest Process + +The batch item importer is an application that turns an external SIP (an XML metadata document with some content files) into an "in progress submission" object. The Web submission UI is similarly used by an end-user to assemble an "in progress submission" object. + +Depending on the policy of the collection to which the submission is targeted, a workflow process may be started. This typically allows one or more human reviewers or 'gatekeepers' to check over the submission and ensure it is suitable for inclusion in the collection. + +When the Batch Ingester or Web Submit UI completes the InProgressSubmission object and invokes the next stage of ingest (be that workflow or item installation), a provenance message is added to the Dublin Core which includes the filenames and checksums of the content of the submission. Likewise, each time a workflow changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows us to track how the item has changed since a user submitted it. + +Once any workflow process is successfully and positively completed, the InProgressSubmission object is consumed by an "item installer," which converts the InProgressSubmission into a full-blown archived item in DSpace. The item installer: +
Workflow Steps+ +A collection's workflow can have up to three steps. Each collection may have an associated e-person group for performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection are installed straight into the main archive. + +In other words, the sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to those steps. + +When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it. + +The member of the group who has taken the task from the pool may then perform one of three actions: +
+
+
+
+
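The step-skipping and task-pool rules above can be sketched in Java. There are up to three steps, a step runs only if the collection has assigned a group to it, and claiming a task removes it from the group's pool. The names here are hypothetical and do not correspond to DSpace classes. Rejection, which returns the submission to the submitter, is not modelled.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of three-step workflow routing; not DSpace classes.
final class WorkflowSketch {
    static final int MAX_STEPS = 3;
    static final int INSTALL = -1; // sentinel: no steps left, install the item

    private WorkflowSketch() {}

    /**
     * Returns the next step after {@code current} (use 0 for a fresh
     * submission) that has a group assigned, or INSTALL if none remain.
     * Steps without a group are skipped.
     */
    static int nextStep(int current, Map<Integer, String> stepGroups) {
        for (int step = current + 1; step <= MAX_STEPS; step++) {
            if (stepGroups.get(step) != null) {
                return step;
            }
        }
        return INSTALL;
    }

    /** A group's task pool: claiming removes the task so only one member works on it. */
    static final class TaskPool {
        private final Set<String> tasks = new LinkedHashSet<>();

        synchronized void offer(String taskId) {
            tasks.add(taskId);
        }

        /** True only for the first member who claims the task. */
        synchronized boolean claim(String taskId) {
            return tasks.remove(taskId);
        }
    }
}
```

With a group assigned only to step 2, a new submission goes straight to step 2 and is installed once that step accepts it. A collection with no groups at all yields INSTALL immediately.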
Submission Workflow in DSpace + +If a submission is rejected, the reason (entered by the workflow participant) is e-mailed to the submitter, and the submission is returned to the submitter's 'My DSpace' page. The submitter can then make any necessary modifications and re-submit, whereupon the process starts again. + +If a submission is 'accepted', it is passed to the next step in the workflow. If there are no more workflow steps with associated groups, the submission is installed in the main archive. + +One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished using the administration UI. + +The reason for this apparently arbitrary design is that it was the simplest case that covered the needs of the early adopter communities at MIT. The functionality of the workflow system will no doubt be extended in the future. + +Supervision and Collaboration+ +Primarily to allow thesis authors to be supervised in the preparation of their e-theses, a supervision order system exists to bind groups of other users (thesis supervisors) to an item in someone's pre-submission workspace. The bound group can have system policies associated with it that allow different levels of interaction with the student's item; a small set of default policy groups is provided: +
This functionality could also be used in situations where researchers wish to collaborate on a particular submission, although there is no dedicated collaborative workspace functionality. + + +Handles+ +Researchers require a stable point of reference for their works. The simple evolution from sharing citations to e-mailing URLs broke down when Web users learned that sites can disappear or be reconfigured without notice, and that their bookmark files containing critical links to research results couldn't be trusted in the long term. To help solve this problem, a core DSpace feature is the creation of a persistent identifier for every item, collection and community stored in DSpace. To persist identifiers, DSpace requires a storage- and location-independent mechanism for creating and maintaining identifiers. DSpace uses the CNRI Handle System for creating these identifiers. The rest of this section assumes a basic familiarity with the Handle system. + +DSpace uses Handles primarily as a means of assigning globally unique identifiers to objects. Each site running DSpace needs to obtain a unique Handle 'prefix' from CNRI, so that identifiers created with that prefix are guaranteed not to clash with identifiers created elsewhere. + +Presently, Handles are assigned to communities, collections, and items. Bundles and bitstreams are not assigned Handles, since over time the way in which an item is encoded as bits may change, in order to allow access with future technologies and devices. Older versions may be moved to off-line storage as a new standard becomes de facto. Since it is usually the item that is being preserved, rather than the particular bit encoding, it only makes sense to persistently identify and allow access to the item, and allow users to access the appropriate bit encoding from there.
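A full Handle is the site's CNRI-assigned prefix, a slash, and a local part that the site manages itself. The hypothetical Java helpers below mint a Handle and render it in the two written forms this section goes on to describe: `hdl:` and the `http://hdl.handle.net/` resolver URL. They also build the bitstream 'persistent' URL pattern (dspace url/bitstream/handle/sequence ID/filename). These helpers are not the DSpace HandleManager API, and the filename is not URL-encoded here.

```java
// Hypothetical helpers for Handle and bitstream identifier forms;
// not the DSpace HandleManager API.
final class HandleSketch {
    static final String HDL_SCHEME = "hdl:";
    static final String RESOLVER = "http://hdl.handle.net/";

    private HandleSketch() {}

    /** A full Handle: site prefix, a slash, and a site-managed local part. */
    static String mint(String prefix, String localPart) {
        return prefix + "/" + localPart;
    }

    static String toHdlForm(String handle) {
        return HDL_SCHEME + handle;
    }

    static String toUrlForm(String handle) {
        return RESOLVER + handle;
    }

    /** Recovers the bare Handle from either written form. */
    static String parse(String written) {
        if (written.startsWith(HDL_SCHEME)) {
            return written.substring(HDL_SCHEME.length());
        }
        if (written.startsWith(RESOLVER)) {
            return written.substring(RESOLVER.length());
        }
        throw new IllegalArgumentException("Not a recognized Handle form: " + written);
    }

    /** The prefix is everything before the first slash, e.g. "1721.123". */
    static String prefixOf(String handle) {
        int slash = handle.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No prefix in: " + handle);
        }
        return handle.substring(0, slash);
    }

    /** dspace url/bitstream/handle/sequence ID/filename (filename not encoded). */
    static String bitstreamUrl(String dspaceUrl, String handle, int sequenceId, String filename) {
        return dspaceUrl + "/bitstream/" + handle + "/" + sequenceId + "/" + filename;
    }
}
```

Only the prefix comes from CNRI. Everything after the slash is the site's responsibility to keep mapped to the right community, collection or item.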
+ +Of course, it may be that a particular bit encoding of a file is explicitly being preserved; in this case, the bitstream could be the only one in the item, and the item's Handle would then essentially refer just to that bitstream. The same bitstream can also be included in other items, and thus would be citable as part of a greater item, or individually. + +The Handle system also features a global resolution infrastructure; that is, an end-user can enter a Handle into any service (e.g. Web page) that can resolve Handles, and the end-user will be directed to the object (in the case of DSpace, community, collection or item) identified by that Handle. In order to take advantage of this feature of the Handle system, a DSpace site must also run a 'Handle server' that can accept and resolve incoming resolution requests. All the code for this is included in the DSpace source code bundle. + +Handles can be written in two forms: +
+
+hdl:1721.123/4567
+http://hdl.handle.net/1721.123/4567
+
+Both forms above represent the same Handle. The first is possibly more convenient to use purely as an identifier; however, by using the second form, any Web browser becomes capable of resolving Handles. An end-user need only access this form of the Handle as they would any other URL. It is possible to enable some browsers to resolve the first form of Handle as if it were a standard URL using CNRI's Handle Resolver plug-in, but since the first form can always be simply derived from the second, DSpace displays Handles in the second form, so that they are more useful for end-users. + +It is important to note that DSpace uses the CNRI Handle infrastructure only at the 'site' level. In the above example, the DSpace site has been assigned the prefix '1721.123'. It is still the responsibility of the DSpace site to maintain the association between a full Handle (including the '4567' local part) and the community, collection or item in question. + + +Bitstream 'Persistent' Identifiers+ +Similar to Handles for DSpace items, bitstreams also have 'persistent' identifiers. They are more volatile than Handles, since if the content is moved to a different server or organization, they will no longer work (hence the quotes around 'persistent'). However, they are more easily persisted than the URLs based on database primary keys that were previously used. This means that external systems can more reliably refer to specific bitstreams stored in a DSpace instance. + +Each bitstream has a sequence ID, unique within an item. This sequence ID is used to create a persistent ID, of the form: + +dspace url/bitstream/handle/sequence ID/filename + +For example: +
+
+https://dspace.myu.edu/bitstream/123.456/789/24/foo.html
+
+The above refers to the bitstream with sequence ID 24 in the item with the Handle hdl:123.456/789. The foo.html is really just there as a hint to browsers: Although DSpace will provide the appropriate MIME type, some browsers only function correctly if the file has an expected extension. + + +Storage Resource Broker (SRB) Support+ +DSpace offers two means for storing bitstreams. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API. + +SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or remote storage resources. + + +Search and Browse+ +DSpace allows end-users to discover content in a number of ways, including: + +
Another important mechanism for discovery in DSpace is the browse. This is the process whereby the user views a particular index, such as the title index, and navigates around it in search of interesting items. The browse subsystem provides a simple API for achieving this by allowing a caller to specify an index, and a subsection of that index. The browse subsystem then discloses the portion of the index of interest. Indices that may be browsed are item title, item issue date, item author, and subject terms. Additionally, the browse can be limited to items within a particular collection or community. + + +HTML Support+ +For the most part, at present DSpace simply supports uploading and downloading of bitstreams as-is. This is fine for the majority of commonly-used file formats – for example PDFs, Microsoft Word documents, spreadsheets and so forth. HTML documents (Web sites and Web pages) are far more complicated, and this has important ramifications when it comes to digital preservation: + +
OAI Support+ +The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to programmatically retrieve or 'harvest' the metadata from several sources, and offer services using that metadata, such as indexing or linking services. Such a service could allow users to access information from a large number of sites from one place. + +DSpace exposes the Dublin Core metadata for items that are publicly (anonymously) accessible. Additionally, the collection structure is also exposed via the OAI protocol's 'sets' mechanism. OCLC's open source OAICat framework is used to provide this functionality. + +You can also configure the OAI service to make use of any crosswalk plugin to offer additional metadata formats, such as MODS. + +DSpace's OAI service does support the exposing of deletion information for withdrawn items, but not for items that are 'expunged' (see above). DSpace also supports OAI-PMH resumption tokens. + + +OpenURL Support+ +DSpace supports the OpenURL protocol from SFX, in a rather simple fashion. If your institution has an SFX server, DSpace will display an OpenURL link on every item page, automatically using the Dublin Core metadata. Additionally, DSpace can respond to incoming OpenURLs. Presently it simply passes the information in the OpenURL to the search subsystem. A list of results is then displayed, which usually gives the relevant item (if it is in DSpace) at the top of the list. + + +Creative Commons Support+ +DSpace provides support for Creative Commons licenses to be attached to items in the repository. They represent an alternative to traditional copyright. To learn more about Creative Commons, visit their website. Support for the licenses is controlled by a site-wide configuration option, and since license selection involves redirection to the Creative Commons website, additional parameters may be configured to work with a proxy server. 
If the option is enabled, users may select a Creative Commons license during the submission process, or elect to skip Creative Commons licensing. If a selection is made, a copy of the license text and RDF metadata is stored along with the item in the repository. There is also an indication (text and a Creative Commons icon) on the item display page of the web user interface when an item is licensed under Creative Commons. + + +Subscriptions+ +As noted above, end-users (e-people) may 'subscribe' to collections in order to be alerted when new items appear in those collections. Each day, end-users who are subscribed to one or more collections will receive an e-mail giving brief details of all new items that appeared in any of those collections the previous day. If no new items appeared in any of the subscribed collections, no e-mail is sent. Users can unsubscribe themselves at any time. RSS feeds of new items are also available for collections and communities. + + +Import and Export+ +DSpace also includes batch tools to import and export items in a simple directory structure, where the Dublin Core metadata is stored in an XML file. This may be used as the basis for moving content between DSpace and other systems. + +There is also a METS-based export tool, which exports items as METS-based metadata with associated bitstreams referenced from the METS file. + + +Registration+ +Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in accessible computer storage. For example, there might already be a repository of existing digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.
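The daily subscription rule described above can be sketched in Java. Each subscriber gets one e-mail listing the previous day's new items across all subscribed collections, and subscribers with nothing new get no e-mail. This is a hypothetical sketch, not DSpace's own subscription code. Subscribers, collections and item titles are plain strings, and each item is assumed to belong to one collection.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical daily subscription digest; not DSpace's implementation.
final class SubscriptionSketch {

    static final class NewItem {
        final String collection;
        final String title;
        final LocalDate accessioned;

        NewItem(String collection, String title, LocalDate accessioned) {
            this.collection = collection;
            this.title = title;
            this.accessioned = accessioned;
        }
    }

    private SubscriptionSketch() {}

    /**
     * Builds one digest per subscriber listing items that appeared yesterday
     * in any subscribed collection. Subscribers with nothing new are omitted,
     * i.e. they receive no e-mail.
     */
    static Map<String, List<String>> buildDigests(Map<String, Set<String>> subscriptions,
                                                  List<NewItem> items, LocalDate today) {
        LocalDate yesterday = today.minusDays(1);
        Map<String, List<String>> digests = new TreeMap<>();
        for (Map.Entry<String, Set<String>> sub : subscriptions.entrySet()) {
            List<String> titles = new ArrayList<>();
            for (NewItem item : items) {
                if (item.accessioned.equals(yesterday) && sub.getValue().contains(item.collection)) {
                    titles.add(item.title);
                }
            }
            if (!titles.isEmpty()) {
                digests.put(sub.getKey(), titles);
            }
        }
        return digests;
    }
}
```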
+ + +Statistics+ +DSpace offers system statistics for administrator usage, as well as usage statistics at the level of items, communities and collections. + +System Statistics+ +Various statistical reports about the contents and use of your system can be generated automatically. These are generated by analyzing DSpace's log files. Statistics can be broken down monthly. + +The report includes the following sections: + +
Item, Collection and Community Usage Statistics+ +Usage statistics can be retrieved from individual item, collection and community pages. These Usage Statistics pages show: + +
*File Downloads information is only displayed for item-level statistics. Note that downloads from separate bitstreams are also recorded and represented separately. DSpace is able to capture and store File Download information even when the bitstream was downloaded from a direct link on an external website. + +Checksum Checker+ +The purpose of the checker is to verify that the content in a DSpace repository has not become corrupted or been tampered with. The functionality can be invoked on an ad-hoc basis from the command line, or configured via cron or similar. Options exist to support large repositories that cannot be entirely checked in one run of the tool. The tool is extensible to new reporting and checking priority approaches. + + +Usage Instrumentation+ +DSpace can report usage events, such as bitstream downloads, to a pluggable event processor. This can be used, for example, to develop customized usage statistics. Sample event processor plugins write event records to a file as tab-separated values or XML. + + +Choice Management and Authority Control+ +This is a configurable framework that lets you define plug-in classes to control the choice of values for a given DSpace metadata field. It also lets you configure fields to include "authority" values along with the textual metadata value. The choice-control system includes a user interface in both the Configurable Submission UI and the Admin UI (edit Item pages) that assists the user in choosing metadata values. + +Introduction and Motivation+ +Definitions+ +Choice Management + +This is a mechanism that generates a list of choices for a value to be entered in a given metadata field. Depending on your implementation, the exact choice list might be determined by a proposed value or query, or it could be a fixed list that is the same for every query. It may also be closed (limited to choices produced internally) or open, allowing the user-supplied query to be included as a choice.
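The difference between a closed and an open choice list can be shown with a small hypothetical Java class. It filters a fixed vocabulary by the user's query, and only the open variant also offers the query itself as a choice. This is an illustration only, not DSpace's choice-authority plugin interface. Case-insensitive prefix matching is an arbitrary choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical choice list showing "closed" vs "open" behaviour;
// not DSpace's choice-authority plugin interface.
final class ChoiceSketch {
    private final List<String> vocabulary;
    private final boolean closed;

    ChoiceSketch(List<String> vocabulary, boolean closed) {
        this.vocabulary = new ArrayList<>(vocabulary);
        this.closed = closed;
    }

    /**
     * Returns vocabulary entries starting with the query (case-insensitive).
     * An open list also offers the user's own query as a choice.
     */
    List<String> choicesFor(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> choices = vocabulary.stream()
                .filter(v -> v.toLowerCase(Locale.ROOT).startsWith(q))
                .collect(Collectors.toCollection(ArrayList::new));
        if (!closed && !query.isEmpty() && !choices.contains(query)) {
            choices.add(query);
        }
        return choices;
    }
}
```

A closed list returns nothing for a name outside its vocabulary. An open list returns the typed name, so the user can still enter it.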
+ +Authority Control + +This works in addition to choice management to supply an authority key along with the chosen value, which is also assigned to the Item's metadata field entry. Any authority-controlled field is also inherently choice-controlled. + + +About Authority Control+ +The advantages we seek from an authority controlled metadata field are: + +
Some Terminology+ +
+
+
+
+
+
+
+
+
+
+ DSpace Documentation : Google Scholar Metadata Mappings
+
+
+
+ This page last changed on Nov 08, 2010 by sands.
+
+
+ Google Scholar, when crawling sites, prefers a meta-tag schema of its own devising. The names in this schema are all prefixed with the string "citation_" and provide various metadata about the article/item being indexed. As of DSpace 1.7, there is a mapping facility to connect metadata fields with these citation fields in HTML. To enable this functionality, set the following property in dspace.cfg:
+ google-metadata.enable = true
+
+ Once the feature is enabled, the mapping is configured by a separate configuration file located here:
+ ${dspace.dir}/config/google-metadata.properties
+
+ This file contains name/value pairs linking meta-tags with DSpace metadata fields. For example:
+ google.citation_title = dc.title
+ google.citation_publisher = dc.publisher
+ google.citation_authors = dc.author | dc.contributor.author | dc.creator
+
+ There is further documentation in this configuration file explaining the proper syntax for specifying which metadata fields to use. If a value is omitted for a meta-tag field, the meta-tag is simply not included in the HTML output. The values for each item are interpolated when the item is viewed, and the appropriate meta-tags are included in the HTML head tag, on both the Brief Item Display and the Full Item Display. This is implemented in both the XMLUI and the JSPUI.
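The mapping lines above can be read as: emit a meta-tag named after the key (minus the google. prefix), filled from the listed DSpace field. The Java sketch below makes three assumptions that should be checked against the comments in google-metadata.properties: that | separates alternative fields, that the first field with a value is used, and that a multi-valued field yields one tag per value. CitationMetaSketch and its methods are hypothetical, and item metadata is modelled as a plain map rather than a DSpace Item.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the field-to-meta-tag mapping; not the DSpace implementation.
class CitationMetaSketch {

    // Returns the values of the first field in "a | b | c" that has any value
    // (assumed semantics of '|'), or an empty list if none does.
    static List<String> resolve(String fieldSpec, Map<String, List<String>> metadata) {
        for (String field : fieldSpec.split("\\|")) {
            List<String> values = metadata.get(field.trim());
            if (values != null && !values.isEmpty()) {
                return values;
            }
        }
        return List.of();
    }

    // Minimal escaping so a value cannot break out of the content attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // One <meta> tag per value; a mapping with no value produces no tag at all,
    // mirroring "the meta-tag is simply not included in the HTML output".
    static List<String> metaTags(Map<String, String> mapping, Map<String, List<String>> metadata) {
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            String name = e.getKey().replaceFirst("^google\\.", "");
            for (String value : resolve(e.getValue(), metadata)) {
                tags.add("<meta name=\"" + name + "\" content=\"" + escape(value) + "\" />");
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("google.citation_title", "dc.title");
        mapping.put("google.citation_publisher", "dc.publisher");
        mapping.put("google.citation_authors", "dc.author | dc.contributor.author | dc.creator");

        // Example item: no dc.publisher, so no citation_publisher tag is emitted.
        Map<String, List<String>> item = Map.of(
                "dc.title", List.of("A Study of Things"),
                "dc.contributor.author", List.of("Smith, J.", "Doe, A."));

        metaTags(mapping, item).forEach(System.out::println);
    }
}
```

Falling through the alternatives in order is what lets a single mapping line cover repositories that record authors under dc.author, dc.contributor.author or dc.creator.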
+
+
+
+
+ DSpace Documentation : History
+
+
+
+ This page last changed on Mar 23, 2011 by tdonohue.
+
+
+ DSpace System Documentation: Version History+ +
+
+
+
Changes in DSpace 1.7.1+ +General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.7.0+ +New Features+ ++ + +
General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.6.2+ +General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.6.1+ +General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.6.0+ +New Features+ ++ + +
General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.5.2+ +New Features+ ++ + +
General Improvements+ ++ + +
Bug Fixes+ ++ + +
Changes in DSpace 1.5.1+ +General Improvements and Bug Fixes+ +(Scott Phillips) +
(Mark Diggory) +
(Claudia Juergen) +
(Stuart Lewis) +
(Zuki Ebetsu / Stuart Lewis) +
(Stuart Lewis / Claudia Juergen) +
(Tim Donohue) +
(Graham Triggs) +
Changes in DSpace 1.5+ +General Improvements+ +
Bug fixes and smaller patches+ +
Changes in DSpace 1.4.1+ +General Improvements+ +
Bug fixes+ +
Changes in DSpace 1.4+ +General Improvements+ +
Bug fixes+ +
Changes in DSpace 1.3.2+ +General Improvements+ +
Bug fixes+ +
Changes in DSpace 1.3.1+ +Bug fixes+ +
Changes in DSpace 1.3+ +General Improvements+ +
Bug fixes+ +
Changes in DSpace 1.2.2+ +General Improvements+ +
Bug fixes+ +
Changes in JSPs+ +
Changes in DSpace 1.2.1+ +General Improvements+ +
Bug fixes+ +
Changed JSPs+ +
Changes in DSpace 1.2+ +General Improvements+ +
Administration+ +
Import/Export/OAI+ +
Miscellaneous+ +
JSP file changes between 1.1 and 1.2+ +This list was generated with cvs -Q rdiff -s -r dspace-1_1 dspace and a sprinkling of Perl. + +
Changes in DSpace 1.1.1+ +Bug fixes+ +
Improvements+ +
Changes in DSpace 1.1+ +
|
+
+
+
+
+ DSpace Documentation : Installation
+
+
+
+ This page last changed on Mar 11, 2011 by stuartlewis.
+
+
+ DSpace System Documentation: Installation+ +
+
+
+
For the Impatient+ +Since some users might want to get their test version up and running as fast as possible, offered below is an unsupported outline of getting DSpace to run quickly in a Unix-based environment. + +
+ useradd -m dspace
+gunzip -c dspace-1.x-src-release.tar.gz | tar -xf -
+createuser -U postgres -d -A -P dspace
+createdb -U dspace -E UNICODE dspace
+cd [dspace-source]/dspace/config
+vi dspace.cfg
+mkdir [dspace]
+chown dspace [dspace]
+su - dspace
+cd [dspace-source]/dspace
+mvn package
+cd [dspace-source]/dspace/target/dspace-<version>-build.dir
+ant fresh_install
+cp -r [dspace]/webapps/* [tomcat]/webapps
+/etc/init.d/tomcat start
+[dspace]/bin/dspace create-administrator
+Prerequisite Software+ +The list below describes the third-party components and tools you'll need to run a DSpace server. These are just guidelines. Since DSpace is built on open source, standards-based tools, there are numerous other possibilities and setups. + +Also, please note that the configuration and installation guidelines relating to a particular tool below are here for convenience. You should refer to the documentation for each individual component for complete and up-to-date details. Many of the tools are updated on a frequent basis, and the guidelines below may become out of date. + +UNIX-like OS or Microsoft Windows+ +
Oracle Java JDK 6 or later (standard SDK is fine, you don't need J2EE)+ +DSpace now requires Oracle Java 6 or greater because it uses language features introduced in Java 5 and 6 that make coding easier and cleaner. + +Java can be downloaded from the following location: http://java.sun.com/javase/downloads/index.jsp + + +Only Oracle's Java has been tested with each release and is known to work correctly. Other Java implementations may pose problems. + +Apache Maven 2.2.x (Java build tool)+ +
Maven is necessary in the first stage of the build process to assemble the installation package for your DSpace instance. It gives you the flexibility to customize DSpace using the existing Maven projects found in the [dspace-source]/dspace/modules directory or by adding your own Maven project to build the installation package for DSpace, and to apply any custom interface "overlay" changes. + +Maven can be downloaded from the following location: http://maven.apache.org/download.html + + +Configuring a Proxy+ +You can configure a proxy to use for some or all of your HTTP requests in Maven 2.0. The username and password are only required if your proxy requires basic authentication (note that later releases may support storing your passwords in a secured keystore; in the meantime, please ensure your settings.xml file (usually ${user.home}/.m2/settings.xml) is secured with permissions appropriate for your operating system). + +Example: +
+ <settings>
+ .
+ .
+ <proxies>
+ <proxy>
+ <active>true</active>
+ <protocol>http</protocol>
+ <host>proxy.somewhere.com</host>
+ <port>8080</port>
+ <username>proxyuser</username>
+ <password>somepassword</password>
+ <nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
+ </proxy>
+ </proxies>
+ .
+ .
+</settings>
+Apache Ant 1.7 or later (Java build tool)+ +Apache Ant is still required for the second stage of the build process. It is used once the installation package has been constructed in [dspace-source]/dspace/target/dspace-<version>-build.dir and still uses some of the familiar ant build targets found in the 1.4.x build process. + +Ant can be downloaded from the following location: http://ant.apache.org + + + +Relational Database: (PostgreSQL or Oracle).+ +
Servlet Engine: (Apache Tomcat 5.5 or 6, Jetty, Caucho Resin or equivalent).+ +
Perl (only required for [dspace]/bin/dspace-info.pl)+ + +Installation Instructions+ +Overview of Install Options+ +With the advent of a new Apache Maven 2-based build architecture (first introduced in DSpace 1.5.x), you now have two options for installing and managing your local installation of DSpace. If you've used DSpace 1.4.x, please recognize that the initial build procedure has changed to allow for more customization. You will find the later 'Ant based' stages of the installation procedure familiar. Maven is used to resolve the dependencies of DSpace online from the 'Maven Central Repository' server. + +It is important to note that the strategies are identical in terms of the list of procedures required to complete the build process, the only difference being that the Source Release includes "more modules" that will be built given their presence in the distribution package. + +
Overview of DSpace Directories+ +Before beginning an installation, it is important to get a general understanding of the DSpace directories and the names by which they are generally referred to. (Please use the directory names below when asking for help on the DSpace Mailing Lists, as it will help everyone understand which directory you are referring to.) + +DSpace uses three separate directory trees. Although you don't need to know all the details of them in order to install DSpace, you do need to know they exist and also know how they're referred to in this document: + +
Installation+ +This method gets you up and running with DSpace quickly and easily. It is identical in both the Default Release and Source Release distributions. + +
In order to set up some communities and collections, you'll need to log in as your DSpace Administrator (which you created with create-administrator above) and access the administration UI in either the JSP or XML user interface. + + + +Advanced Installation+ +The above installation steps are sufficient to set up a test server to play around with, but there are a few other steps and options you should probably consider before deploying a DSpace production site. + +'cron' Jobs+ +A couple of DSpace features require that a script be run regularly: the e-mail subscription feature, which alerts users to newly deposited items, and the new 'media filter' tool, which generates thumbnails of images and extracts the full text of documents for indexing. + +To set these up, you just need to run the following command as the dspace UNIX user: +
+ crontab -e
+ Then add the following lines: +
+ # Send out subscription e-mails at 01:00 every day
+0 1 * * * [dspace]/bin/dspace sub-daily
+# Run the media filter at 02:00 every day
+0 2 * * * [dspace]/bin/dspace filter-media
+# Run the checksum checker at 03:00
+0 3 * * * [dspace]/bin/dspace checker -lp
+# Mail the results to the sysadmin at 04:00
+0 4 * * * [dspace]/bin/dspace checker-emailer -c
+
+ Naturally you should change the frequencies to suit your environment. + +PostgreSQL also benefits from regular 'vacuuming', which optimizes the indexes and clears out any deleted data. Become the postgres UNIX user, run crontab -e and add (for example): +
+ # Clean up the database nightly at 4.20am
+20 4 * * * vacuumdb --analyze dspace > /dev/null 2>&1
+To ensure that statistical reports are generated regularly and thus kept up to date, set up the following cron jobs: +
+ # Run stat analysis
+0 1 * * * [dspace]/bin/dspace stat-general
+0 1 * * * [dspace]/bin/dspace stat-monthly
+0 2 * * * [dspace]/bin/dspace stat-report-general
+0 2 * * * [dspace]/bin/dspace stat-report-monthly
+
+ Obviously, you should choose execution times which are most useful to you, and you should ensure that the
+
+Multilingual Installation+ +In order to deploy a multilingual version of DSpace you have to configure two parameters in [dspace-source]/config/dspace.cfg: + +
The Locales take the form language, language_country or language_country_variant (for example, en or en_US). + +Depending on the languages you wish to support, make sure that all the i18n-related files are available; see Multilingual User Interface (Configuring MultiLingual Support) for the JSPUI, or Multilingual Support for XMLUI, in the configuration documentation. + + +DSpace over HTTPS+ +If your DSpace is configured to have users log in with a username and password (as opposed to, say, client Web certificates), then you should consider using HTTPS. Whenever a user logs in with the Web form (e.g. dspace.myuni.edu/dspace/password-login), their DSpace password is exposed in plain text on the network. This is a very serious security risk since network traffic monitoring is very common, especially at universities. If the risk seems minor, then consider that your DSpace administrators also log in this way and they have ultimate control over the archive. + +The solution is to use HTTPS (HTTP over SSL, i.e. Secure Socket Layer, an encrypted transport), which protects your passwords against being captured. You can configure DSpace to require SSL on all "authenticated" transactions so it only accepts passwords on SSL connections. + +The following sections show how to set up the most commonly used Java Servlet containers to support HTTP over SSL. + +To enable the HTTPS support in Tomcat 5.0:+ +
To use SSL on Apache HTTPD with mod_jk:+ +If you choose Apache HTTPD as your primary HTTP server, you can have it forward requests to the Tomcat servlet container via Apache Jakarta Tomcat Connector. This can be configured to work over SSL as well. First, you must configure Apache for SSL; for Apache 2.0 see Apache SSL/TLS Encryption for information about using mod_ssl. + +If you are using X.509 Client Certificates for authentication: add these configuration options to the appropriate httpd configuration file, e.g. ssl.conf, and be sure they are in force for the virtual host and namespace locations dedicated to DSpace: +
+ ## SSLVerifyClient can be "optional" or "require"
+ SSLVerifyClient optional
+ SSLVerifyDepth 10
+ SSLCACertificateFile path-to-your-client-CA-certificate
+ SSLOptions StdEnvVars ExportCertData
+
+ Now consult the Apache Jakarta Tomcat Connector documentation to configure the mod_jk (note: NOT mod_jk2) module. Select the AJP 1.3 connector protocol. Also follow the instructions there to configure your Tomcat server to respond to AJP. + +To use SSL on Apache HTTPD with mod_webapp, consult the DSpace 1.3.2 documentation. Apache has deprecated the mod_webapp connector and recommends using mod_jk. + +To use Jetty's HTTPS support, consult the Jetty documentation. + + + +The Handle Server+ +First, a few facts to clear up some common misconceptions: + +
A Handle server runs as a separate process that receives TCP requests from other Handle servers, and issues resolution requests to a global server or servers if a Handle entered locally does not correspond to some local content. The Handle protocol is based on TCP, so the Handle server will need to be installed on a machine that can send and receive TCP traffic on port 2641. + +
Updating Existing Handle Prefixes+ +If you need to update the handle prefix on items created before the CNRI registration process you can run the [dspace]/bin/dspace update-handle-prefix script. You may need to do this if you loaded items prior to CNRI registration (e.g. setting up a demonstration system prior to migrating it to production). The script takes the current and new prefix as parameters. For example: +
+ [dspace]/bin/dspace update-handle-prefix 123456789 1303
+
+ This script will change any handles currently assigned prefix 123456789 to prefix 1303; for example, handle 123456789/23 will be updated to 1303/23 in the database. + + + +Google and HTML sitemaps+ +To help web crawlers index the content within your repository, you can make use of sitemaps. There are currently two forms of sitemaps included in DSpace: Google sitemaps and HTML sitemaps. + +Sitemaps allow DSpace to expose its content without the crawlers having to index every page. HTML sitemaps provide a list of all items, collections and communities in HTML format, whilst Google sitemaps provide the same information in gzipped XML format. + +To generate the sitemaps, run [dspace]/bin/generate-sitemaps. This creates the sitemaps in [dspace]/sitemaps/ + +The sitemaps can be accessed from the following URLs: + +
When [dspace]/bin/generate-sitemaps runs, the script informs Google that the sitemaps have been updated. For this update to register correctly, you must first register your Google sitemap index page (/dspace/sitemap) with Google at http://www.google.com/webmasters/sitemaps/. If your DSpace server requires the use of an HTTP proxy to connect to the Internet, ensure that you have set http.proxy.host and http.proxy.port in [dspace]/config/dspace.cfg. + +The URL for pinging Google (and, in future, other search engines) is configured in [dspace]/config/dspace.cfg using the sitemap.engineurls setting, where you can provide a comma-separated list of URLs to 'ping'. + +You can generate the sitemaps automatically every day using an additional cron job: +
+ # Generate sitemaps
+0 6 * * * [dspace]/bin/generate-sitemaps
+
+ DSpace Statistics+ +DSpace uses the Apache Solr application underlying the statistics. There is no need to download any separate software; all the necessary software is included. For detailed information on all of the configuration property keys, refer to 5.2.35 DSpace Statistic Configuration. + +
Windows Installation+ +Pre-requisite Software+ +If you are installing DSpace on Windows, you will still need to install all the same Prerequisite Software, as listed above. + +
Installation Steps+ +
Checking Your Installation+ +The administrator needs to check the installation to make sure all components are working. Here is a list of checks to be performed. In brackets after each item are the associated component or components that might be causing the issue and need resolution. + +
Known Bugs+ +In any software project of the scale of DSpace, there will be bugs. Sometimes, a stable version of DSpace includes known bugs. We do not always wait until every known bug is fixed before a release. If the software is sufficiently stable and an improvement on the previous release, and the bugs are minor and have known workarounds, we release it to enable the community to take advantage of those improvements. + +The known bugs in a release are documented in the KNOWN_BUGS file in the source package. + +Please see the DSpace bug tracker for further information on current bugs, and to find out if the bug has subsequently been fixed. This is also where you can report any further bugs you find. + + +Common Problems+ +In an ideal world everyone would follow the above steps and have a fully functioning DSpace. Of course, in the real world it doesn't always seem to work out that way. This section lists common problems that people encounter when installing DSpace, and likely causes and fixes. This is likely to grow over time as we learn about users' experiences. + +
+
+
+
+
+
+ |
+
+
+
+
+ DSpace Documentation : Introduction
+
+
+
+ This page last changed on Mar 07, 2011 by tdonohue.
+
+
+
DSpace System Documentation: Introduction+ + +DSpace is an open source software platform that enables organisations to: + +
This system documentation includes a functional overview of the system, which is a good introduction to the capabilities of the system, and should be readable by non-technical folk. Everyone should read this section first because it introduces some terminology used throughout the rest of the documentation. + +For people actually running a DSpace service, there is an installation guide, and sections on configuration and the directory structure. + +Finally, for those interested in the details of how DSpace works, and those potentially interested in modifying the code for their own purposes, there is a detailed architecture and design section. + +Other good sources of information are: + +
|
+
+
+
+
+ DSpace Documentation : JSPUI Configuration and Customization
+
+
+
+ This page last changed on Dec 14, 2010 by tdonohue.
+
+
+ DSpace System Documentation: JSPUI Configuration and Customization+ +The DSpace digital repository supports two user interfaces: one based on JavaServer Pages (JSP) technologies and one based upon the Apache Cocoon framework (XMLUI). This chapter describes those parameters which are specific to the JSPUI interface. + +
+
+
+
Configuration+ +The user will need to refer to the extensive WebUI/JSPUI configuration settings contained in JSP Web Interface Settings. + + +Customizing the JSP pages+ +The JSPUI interface is implemented using Java Servlets, which handle the business logic, and JavaServer Pages (JSPs), which produce the HTML pages sent to an end-user. Since the JSPs are much closer to HTML than Java code, altering the look and feel of DSpace is relatively easy. + +To make it even easier, DSpace allows you to 'override' the JSPs included in the source distribution with modified versions, which are stored in a separate place, so that when it comes to updating your site with a new DSpace release, your modified versions will not be overwritten. It should be possible to dramatically change the look of DSpace to suit your organization by just changing the CSS style file and the site 'skin' or 'layout' JSPs in jsp/layout; if possible, it is recommended you limit local customizations to these files to make future upgrades easier. + +You can also easily edit the text that appears on each JSP page by editing the Messages.properties file. However, note that unless you change the entry in all of the different language message files, users of other languages will still see the default text for their language. See Internationalization in Application Layer. + +Note that the data (attributes) passed from an underlying Servlet to the JSP may change between versions, so you may have to modify your customized JSP to deal with the new data. + +Thus, if possible, it is recommended you limit your changes to the 'layout' JSPs and the stylesheet. + +The JSPs are available in one of two places: + +
If you wish to modify a particular JSP, place your edited version in the [dspace-source]/dspace/modules/jspui/src/main/webapp/ directory (this is the replacement for the pre-1.5 /jsp/local directory), with the same path as the original. If they exist, these will be used in preference to the default JSPs. For example: +
+
+
+
+
Heavy use is made of a style sheet, styles.css. If you make edits, copy the local version to [dspace-source]/dspace/modules/jspui/src/main/webapp/styles.css, and it will be used automatically in preference to the default, as described above. + +Fonts and colors can be easily changed using the stylesheet. The stylesheet is a JSP so that the user's browser version can be detected and the stylesheet tweaked accordingly. + +The 'layout' of each page, that is, the top and bottom banners and the navigation bar, is determined by the JSPs /layout/header-*.jsp and /layout/footer-*.jsp. You can provide modified versions of these (in [dspace-source]/dspace/modules/jspui/src/main/webapp/layout), or define more styles and apply them to pages by using the "style" attribute of the dspace:layout tag. + +
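The override rule described above (a file under [dspace-source]/dspace/modules/jspui/src/main/webapp/ with the same relative path as a stock JSP or stylesheet is used instead of it) can be modelled as a simple overlay merge. This Java sketch only illustrates the precedence; as we understand the build, the real merge is done by Maven when the web application is assembled, not by code like this. JspOverlaySketch is a hypothetical class, both trees are modelled as maps from relative path to a source label, and the file names in the demo are examples only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of overlay precedence between stock and customized JSPUI files.
class JspOverlaySketch {

    // Maps each relative path to the version that ends up in the deployed webapp.
    static Map<String, String> effectiveFiles(Map<String, String> stock, Map<String, String> custom) {
        Map<String, String> result = new TreeMap<>(stock); // start from the stock files
        result.putAll(custom);                             // same relative path: the custom file wins
        return result;
    }

    // Customized files that shadow a stock file: the ones worth re-checking after
    // an upgrade, since the data passed to them from the Servlets may change.
    static Set<String> overridden(Map<String, String> stock, Map<String, String> custom) {
        Set<String> result = new TreeSet<>(custom.keySet());
        result.retainAll(stock.keySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stock = Map.of(
                "home.jsp", "stock",
                "layout/header-default.jsp", "stock",
                "layout/footer-default.jsp", "stock");
        Map<String, String> custom = Map.of(
                "layout/header-default.jsp", "custom");
        effectiveFiles(stock, custom)
                .forEach((path, source) -> System.out.println(path + " <- " + source));
        System.out.println("Re-check after upgrading: " + overridden(stock, custom));
    }
}
```

Keeping the set returned by overridden small is the practical payoff of limiting changes to the layout JSPs and the stylesheet.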
|
+
+
+
+
+ DSpace Documentation : Mirage Configuration and Customization
+
+
+
+ This page last changed on Mar 25, 2011 by peterdietz.
+
+
+ Mirage Theme Configuration and Customization+ +
+
+
+
Introduction+ +Mirage is a new XMLUI theme, added in DSpace 1.7 by @mire. The code was mainly developed by Art Lowel. The main benefits of Mirage are: + +
Configuration Parameters+ +
+
+
+
+
+
Technical Features+ +Look & Feel+ +
Structural enhancements for easier customization.+ +
Enhanced Performance+ +
Troubleshooting+ +Errors using HTTPS+ +DSpace 1.7.0 ships with a hardcoded http:// link for jQuery, which causes problems for users running the 1.7.0 Mirage theme over HTTPS. While awaiting the fix in an upcoming release, you can solve this in the addJavascript template of the following file: lib/core/page-structure.xsl. In this file, you will need to replace + +
+ <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"> </script>
+
+ with + +
+ <script type="text/javascript">
+ <xsl:text disable-output-escaping="yes">var JsHost = (("https:" == document.location.protocol) ? "https://" : "http://");
+ document.write(unescape("%3Cscript src='" + JsHost + "ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js' type='text/javascript'%3E%3C/script%3E"));</xsl:text>
+ </script>
+
+ Thanks to Peter Dietz for providing this fix. Note: This issue is resolved in 1.7.1.
+
+
+
+
+ DSpace Documentation : Preface
+
+
+
+ This page last changed on Mar 25, 2011 by peterdietz.
+
+
+
Preface+ +Welcome to Release 1.7.1! The committers have volunteered many hours to fix, re-write and contribute new software code for this release. Documentation has also been updated. The 1.7.1 bug fix release includes all the new features of 1.7.0 and also fixes many known issues discovered since the release of 1.7.0. + + +Bug Fixes in 1.7.1 include (not an exhaustive list): +
Improvements in 1.7.1 include (not an exhaustive list): +
The following is a list of the new features included for release 1.7.0 (not an exhaustive list): + +
A full list of all changes / bug fixes in 1.7.0 is available in the History section. + +The following people have contributed directly to this release of DSpace: @mire, Andrea Bollini, Andrea Schweer, Andreas Schwander, Andrew Hankinson, Andrew Taylor, Antero Neto, Ben Bosman, Bill Hays, BioMed Central, Bram Luyten, Caryn Neiswender, Christophe Dupriez, Claudia Jürgen, Enovation Solutions, Erick Rocha Fonseca, Flávio Botelho, Gabriela Mircea, Gareth Waller, Graham Triggs, Hardy Pottinger, Ivan Masár, Jason Stirnaman, Jeffrey Trimble, Keiji Suzuki, Keith Gilbertson, Kevin Van de Velde, Kim Shepherd, Mark Diggory, Mark H. Wood, Marvin Pollard, Michael B. Klein, Nicholas Riley, Nick Nicholas, OhioLINK, Oleksandr Sytnyk, Pere Villega, Peter Dietz, Reinhard Engels, Richard Rodgers, Robin Taylor, Sands Fish, Sarah Shreeves, Scott Phillips, Simon Brown, Stuart Lewis, Tim Donohue, Vladislav Zhivkov, Yin Yin Latt. Many of them could not do this work without the support (release time and financial) of their associated institutions. We offer thanks to those institutions for supporting their staff to take time to contribute to the DSpace project. + +We apologize to any contributor accidentally left off this list. DSpace has such a large, active development community that we sometimes lose track of all our contributors. Our ongoing list of all known people/institutions that have contributed to DSpace software can be found on our DSpace Contributors page. Acknowledgements to those left off will be made in future releases. Want to make sure you make it on the short list of contributors? All you have to do is report an issue, fix a bug or help us determine the necessary requirements for a new feature! Visit our Issue Tracker to take part and get your name on the list of DSpace Contributors! + +The Documentation Gardener for this release was Jeffrey Trimble with input from everyone. All typos are his fault. + +Peter Dietz is the Release Coordinator of this release. 
Tim Donohue helped out with coordinating the final days of the release. + +Additional thanks to Tim Donohue from DuraSpace for keeping all of us focused on the work at hand, for calming us when we got excited, and for his general support of the DSpace project.
+
+
+
+
+ DSpace Documentation : Storage Layer
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ System Architecture: Storage Layer+ +In this section, we explain the storage layer: the database structure, maintenance, and the bitstream store and configurations. + +
+
+
+
RDBMS / Database Structure+ +DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse. + +Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. Presently, the browse indices use some features specific to PostgreSQL and Oracle, so some modification to the code would be needed before DSpace would function fully with an alternative database back-end. + +The org.dspace.storage.rdbms package provides access to an SQL database in a somewhat simpler form than using JDBC directly. The main class is DatabaseManager, which executes SQL queries and returns TableRow or TableRowIterator objects. The InitializeDatabase class is used to load SQL into the database via JDBC, for example to set up the schema. + +All calls to the Database Manager require a DSpace Context object. Example use of the database manager API is given in the org.dspace.storage.rdbms package Javadoc. + +The database schema used by DSpace is created by SQL statements stored in a directory specific to each supported RDBMS platform: + +
Also in [dspace-source]/dspace/etc/[database] are various SQL files called database_schema_1x_1y. These contain the necessary SQL commands to update a live DSpace database from version 1.x to 1.y. Note that this might not be the only part of an upgrade process: see Updating a DSpace Installation for details. + +The DSpace database code uses an SQL function getnextid to assign primary keys to newly created rows. This SQL function must be safe to use if several JVMs are accessing the database at once; for example, the Web UI might be creating new rows in the database at the same time as the batch item importer. The PostgreSQL-specific implementation of the method uses SEQUENCES for each table in order to create new IDs. If an alternative database backend were to be used, the implementation of getnextid could be updated to operate with that specific DBMS. + +The etc directory in the source distribution contains two further SQL files. clean-database.sql contains the SQL necessary to completely clean out the database, so use with caution! The Ant target clean_database can be used to execute this. update-sequences.sql contains SQL to reset the primary key generation sequences to appropriate values. You'd need to do this if, for example, you're restoring a backup database dump which creates rows with specific primary keys already defined. In such a case, the sequences would allocate primary keys that were already used. + +Versions of the .sql files for Oracle are stored in [dspace-source]/dspace/etc/oracle. These need to be copied over their PostgreSQL counterparts in [dspace-source]/dspace/etc prior to installation. + +Maintenance and Backup+ +When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. This is performed by the vacuumdb command which can be executed via a 'cron' job, for example by putting this in the system crontab: +
+
+# clean up the database nightly
+40 2 * * * /usr/local/pgsql/bin/vacuumdb --analyze dspace > /dev/null 2>&1
+
+The DSpace database can be backed up and restored using the usual methods, for example with pg_dump and psql. However, when restoring a database, you will need to perform these additional steps: + +
Configuring the RDBMS Component+ +The database manager is configured with the following properties in dspace.cfg: +
+
+
+
+
+
+
Bitstream Store+ +DSpace offers two means for storing content. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API. + +SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, back up) the content on other local or remote storage resources. + +The terms "store", "retrieve", "in the system", "storage", and so forth, used below can refer to storage in the file system on the server ("traditional") or in SRB. + +The BitstreamStorageManager provides low-level access to bitstreams stored in the system. In general, it should not be used directly; instead, use the Bitstream object in the content management API, since that encapsulates authorization and other bitstream metadata that is not maintained by the BitstreamStorageManager. + +The bitstream storage manager provides three methods that store, retrieve and delete bitstreams. Bitstreams are referred to by their 'ID'; that is, the primary key bitstream_id column of the corresponding row in the database. + +As of DSpace version 1.1, there can be multiple bitstream stores. Each of these bitstream stores can be traditional storage or SRB storage. This means that the potential storage of a DSpace system is not bound by the maximum size of a single disk or file system and also that traditional and SRB storage can be combined in one DSpace installation. Both traditional and SRB storage are specified by configuration parameters. Also see Configuring the Bitstream Store below. + +Stores are numbered, starting with zero, then counting upwards. Each bitstream entry in the database has a store number, used to retrieve the bitstream when required. 
+ +At the moment, the store in which new bitstreams are placed is decided using a configuration parameter, and there is no provision for moving bitstreams between stores. Administrative tools for manipulating bitstreams and stores will be provided in future releases. Right now you can move a whole store (e.g. you could move store number 1 from /localdisk/store to /fs/anotherdisk/store), but it would still have to be store number 1 and have exactly the same contents. + +Bitstreams also have a 38-digit internal ID, different from the primary key ID of the bitstream table row. This is not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative to the relevant store directory) at which the bitstream is stored in traditional or SRB storage. The first three pairs of digits form the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal ID as the filename. + +For example, a bitstream with the internal ID 12345678901234567890123456789012345678 is stored in the directory: +
+ +(assetstore dir)/12/34/56/12345678901234567890123456789012345678 ++ The reasons for storing files this way are: + +
The bitstream storage manager is fully transaction-safe. In order to implement transaction-safety, the following algorithm is used to store bitstreams: + +
Similarly, when a bitstream is deleted, its deleted flag is set to true as part of the overall transaction, and the corresponding file in storage is not deleted. + +The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup method of BitstreamStorageManager goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, in case cleanup happens in the middle of a storage operation. + +This cleanup can be invoked from the command line via the Cleanup class, which can in turn be easily executed from a shell on the server machine using /dspace/bin/cleanup. You might like to have this run regularly by cron, though since DSpace is read-lots, write-not-so-much, it doesn't need to be run very often. + +Backup+ +The bitstreams (files) in traditional storage may be backed up very easily by simply 'tarring' or 'zipping' the assetstore directory (or whichever directory is configured in dspace.cfg). Restoring is as simple as extracting the backed-up compressed file in the appropriate location. + +Similar means could be used for SRB, but SRB offers many more options for managing backup. + +It is important to note that since the bitstream storage manager holds the bitstreams in storage, and information about them in the database, a database backup and a backup of the files in the bitstream store must be made at the same time; the bitstream data in the database must correspond to the stored files. + +Of course, it isn't really ideal to 'freeze' the system while backing up to ensure that the database and files match up. Since DSpace uses the bitstream data in the database as the authoritative record, it's best to back up the database before the files. 
This is because it's better to have a bitstream in storage but not in the database (effectively non-existent to DSpace) than a bitstream record in the database but not in storage, since people would be able to find the bitstream but not actually get the contents. + +With DSpace 1.7 and above, there is also the option to back up both files and metadata via the AIP Backup and Restore feature. + +Configuring the Bitstream Store+ +Both traditional and SRB bitstream stores are configured in dspace.cfg. + +Configuring Traditional Storage+ +Bitstream stores in the file system on the server are configured like this: +
+ +assetstore.dir = [dspace]/assetstore ++ (Remember that [dspace] is a placeholder for the actual name of your DSpace install directory). + +The above example specifies a single asset store. +
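To make the directory layout described earlier concrete, the following Python sketch combines a configured store directory with a bitstream's internal ID to produce its on-disk path: the first three pairs of digits become nested directories and the full ID is the file name. It is illustrative only. `bitstream_path` is a hypothetical helper (not part of the DSpace API), and `/dspace/assetstore` stands in for an assumed value of `[dspace]/assetstore`.

```python
from pathlib import PurePosixPath


def bitstream_path(assetstore_dir: str, internal_id: str) -> PurePosixPath:
    """Return where traditional storage keeps a bitstream (hypothetical helper).

    The first three pairs of digits of the internal ID become nested
    directories under the store directory; the full ID is the file name.
    """
    # The 38-digit length check follows the format described in this section.
    if len(internal_id) != 38 or not internal_id.isdigit():
        raise ValueError("expected a 38-digit internal ID")
    levels = [internal_id[i:i + 2] for i in range(0, 6, 2)]  # e.g. ["12", "34", "56"]
    return PurePosixPath(assetstore_dir, *levels, internal_id)


if __name__ == "__main__":
    iid = "12345678901234567890123456789012345678"
    print(bitstream_path("/dspace/assetstore", iid))
    # /dspace/assetstore/12/34/56/12345678901234567890123456789012345678
```

Because the directory levels come from the ID itself, no lookup table is needed to find a file once its internal ID and store directory are known.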
+ +assetstore.dir = [dspace]/assetstore_0 +assetstore.dir.1 = /mnt/other_filesystem/assetstore_1 ++ The above example specifies two asset stores. assetstore.dir specifies asset store number 0 (zero); after that, use assetstore.dir.1, assetstore.dir.2 and so on. The asset store a bitstream is stored in is recorded in the database, so don't move bitstreams between asset stores, and don't renumber them. + +By default, newly created bitstreams are put in asset store 0 (i.e. the one specified by the assetstore.dir property). This allows backwards compatibility with pre-DSpace 1.1 configurations. To change this, for example when asset store 0 is getting full, add a line to dspace.cfg like: +
+ +assetstore.incoming = 1 ++ Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1, which is /mnt/other_filesystem/assetstore_1 in the above example. + + +Configuring SRB Storage+ +The same framework is used to configure SRB storage. That is, an asset store number (0..n) can reference either a file system directory, as above, or a set of SRB account parameters, but not both. This way traditional and SRB storage can be used together, under different asset store numbers. The same cautions mentioned above apply to SRB asset stores: the asset store a bitstream is stored in is recorded in the database, so don't move bitstreams between asset stores, and don't renumber them. + +For example, suppose asset store number 1 will refer to SRB. Then there will be a set of SRB account parameters like this: +
+ +srb.host.1 = mysrbmcathost.myu.edu +srb.port.1 = 5544 +srb.mcatzone.1 = mysrbzone +srb.mdasdomainname.1 = mysrbdomain +srb.defaultstorageresource.1 = mydefaultsrbresource +srb.username.1 = mysrbuser +srb.password.1 = mysrbpassword +srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain +srb.parentdir.1 = mysrbdspaceassetstore ++ Several of the terms, such as mcatzone, have meaning only in the SRB context and will be familiar to SRB users. The last, srb.parentdir.n, can be used to add an extra (SRB) upper directory level within an SRB account. This property value may also be left blank. + +(If asset store 0 were to refer to SRB, the properties would be srb.host = ..., srb.port = ..., and so on, with the .0 suffix omitted, consistent with the traditional storage configuration above.) + +Using assetstore.incoming in the same way to reference asset store 0 (the default) or 1..n (an explicit property) means that new bitstreams will be written to traditional or SRB storage, depending on whether the selected store number references a file system directory on the server or a set of SRB account parameters. + +There are comments in dspace.cfg that further elaborate the configuration of traditional and SRB storage. + ++
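The numbering rules above can be summarised in a short Python sketch. Store 0 uses the unsuffixed keys (assetstore.dir, srb.host), stores 1..n use a .N suffix, each number must resolve to either a directory or SRB parameters but not both, and assetstore.incoming selects the store for new bitstreams (0 if unset). This is a sketch under stated assumptions: the parser is deliberately naive (plain `key = value` lines only, no variable substitution), it treats the presence of srb.host(.N) as marking an SRB store, and `parse_cfg`, `describe_store` and `incoming_store` are hypothetical names rather than DSpace's own configuration loader.

```python
from typing import Dict, Tuple


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def describe_store(props: Dict[str, str], store: int) -> Tuple[str, str]:
    """Return ("dir", path) or ("srb", host) for asset store number `store`."""
    suffix = "" if store == 0 else f".{store}"  # store 0 has no ".0" suffix
    directory = props.get("assetstore.dir" + suffix)
    srb_host = props.get("srb.host" + suffix)
    if directory and srb_host:
        raise ValueError(f"store {store} is both a directory and an SRB store")
    if directory:
        return ("dir", directory)
    if srb_host:
        return ("srb", srb_host)
    raise KeyError(f"asset store {store} is not configured")


def incoming_store(props: Dict[str, str]) -> int:
    """New bitstreams go to store 0 unless assetstore.incoming is set."""
    return int(props.get("assetstore.incoming", "0"))


if __name__ == "__main__":
    cfg = parse_cfg("""
assetstore.dir = /dspace/assetstore_0
srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
assetstore.incoming = 1
""")
    store = incoming_store(cfg)
    print(store, describe_store(cfg, store))  # 1 ('srb', 'mysrbmcathost.myu.edu')
```

Raising an error when a number resolves to both kinds of store mirrors the rule that any particular asset store number can reference one or the other, but not both.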
+
+
+
+
+ |
+
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Submission User Interface
+
+
+
+ This page last changed on Dec 15, 2010 by tdonohue.
+
+
+ DSpace System Documentation: Customizing and Configuring Submission User Interface+ +This page explains various customization and configuration options that are available within DSpace for the Item Submission user interface. + +
+
+
+
Understanding the Submission Configuration File+ +The [dspace]/config/item-submission.xml contains the submission configurations for both the DSpace JSP user interface (JSPUI) and the DSpace XML user interface (XMLUI or Manakin). This configuration file contains detailed documentation within the file itself, which should help you understand how best to use it. + +The Structure of item-submission.xml+
+ <item-submission> <!-- Where submission processes are mapped to specific Collections --> <submission-map> <name-map collection-handle="default" submission-name="traditional" /> ... </submission-map> <!-- Where "steps" which are used across many submission processes can be defined in a single place. They can then be referred to by ID later. --> <step-definitions> <step id="collection"> <processing-class>org.dspace.submit.step.SelectCollectionStep</processing-class> <workflow-editable>false</workflow-editable> </step> ... </step-definitions> <!-- Where actual submission processes are defined and given names. Each <submission-process> has many <step> nodes which are in the order that the steps should be in.--> <submission-definitions> <submission-process name="traditional"> ... <!-- Step definitions appear here! --> </submission-process> ... </submission-definitions> </item-submission>+ Because this file is in XML format, you should be familiar with XML before editing it. By default, this file contains the "traditional" Item Submission Process for DSpace, which consists of the following Steps (in this order): + +Select Collection -> Initial Questions -> Describe -> Upload -> Verify -> License -> Complete + +If you would like to customize the steps used or the ordering of the steps, you can do so within the <submission-definitions> section of item-submission.xml. + +You may also specify different Submission Processes for different DSpace Collections. This can be done in the <submission-map> section. The item-submission.xml file itself documents the syntax required to perform these configuration changes. + + +Defining Steps (<step>) within the item-submission.xml+ +This section describes how Steps of the Submission Process are defined within the item-submission.xml.
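As an illustration of how the pieces of item-submission.xml fit together, the Python sketch below reads a cut-down version of the file and lists the processing classes of one submission process in order. It assumes that an empty `<step id="..."/>` inside a `<submission-process>` refers to a shared definition in `<step-definitions>`, which is this sketch's reading of the comment "They can then be referred to by ID later". `process_steps` is a hypothetical helper, not DSpace code.

```python
import xml.etree.ElementTree as ET
from typing import List

# A cut-down item-submission.xml, modelled on the structure shown above.
SAMPLE = """
<item-submission>
  <submission-map>
    <name-map collection-handle="default" submission-name="traditional"/>
  </submission-map>
  <step-definitions>
    <step id="collection">
      <processing-class>org.dspace.submit.step.SelectCollectionStep</processing-class>
      <workflow-editable>false</workflow-editable>
    </step>
  </step-definitions>
  <submission-definitions>
    <submission-process name="traditional">
      <step id="collection"/>
      <step>
        <heading>submit.progressbar.describe</heading>
        <processing-class>org.dspace.submit.step.DescribeStep</processing-class>
        <workflow-editable>true</workflow-editable>
      </step>
    </submission-process>
  </submission-definitions>
</item-submission>
"""


def process_steps(root: ET.Element, process_name: str) -> List[str]:
    """Return the processing classes of one submission process, in order."""
    shared = {s.get("id"): s for s in root.findall("./step-definitions/step")}
    for process in root.findall("./submission-definitions/submission-process"):
        if process.get("name") != process_name:
            continue
        classes = []
        for step in process.findall("step"):
            ref = step.get("id")
            # Assumed convention: an empty <step id="..."/> points at a
            # shared definition in <step-definitions>.
            if step.find("processing-class") is None and ref:
                step = shared[ref]
            classes.append(step.findtext("processing-class"))
        return classes
    raise KeyError(f"no submission-process named {process_name!r}")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    for cls in process_steps(root, "traditional"):
        print(cls)
```

Because the result preserves document order, moving a `<step>` element inside the `<submission-process>` changes the order in which its class appears, which matches the rule that step order follows the XML.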
+ +Where to place your <step> definitions+ +<step> definitions can appear in one of two places within the item-submission.xml configuration file. + +
The ordering of <step> definitions matters!+ +The ordering of the <step> tags within a <submission-process> definition directly corresponds to the order in which those steps will appear! + +For example, the following defines a Submission Process where the License step directly precedes the Initial Questions step (more information about the structure of the information under each <step> tag can be found in the section on Structure of the <step> Definition below): +
+ <submission-process> <!--Step 1 will be to Sign off on the License--> <step> <heading>submit.progressbar.license</heading> <processing-class>org.dspace.submit.step.LicenseStep</processing-class> <jspui-binding>org.dspace.app.webui.submit.step.JSPLicenseStep</jspui-binding> <xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.LicenseStep</xmlui-binding> <workflow-editable>false</workflow-editable> </step> <!--Step 2 will be to Ask Initial Questions--> <step> <heading>submit.progressbar.initial-questions</heading> <processing-class>org.dspace.submit.step.InitialQuestionsStep</processing-class> <jspui-binding>org.dspace.app.webui.submit.step.JSPInitialQuestionsStep</jspui-binding> <xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.InitialQuestionsStep</xmlui-binding> <workflow-editable>true</workflow-editable> </step> ...[other steps]... </submission-process>+ Structure of the <step> Definition+ +The same <step> definition is used by both the DSpace JSP user interface (JSPUI) and the DSpace XML user interface (XMLUI or Manakin). Therefore, you will notice each <step> definition contains information specific to each of these two interfaces. + +The structure of the <step> Definition is as follows: +
+ <step>
+ <heading>submit.progressbar.describe</heading>
+ <processing-class>org.dspace.submit.step.DescribeStep</processing-class>
+ <jspui-binding>org.dspace.app.webui.submit.step.JSPDescribeStep</jspui-binding>
+ <xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.DescribeStep</xmlui-binding>
+ <workflow-editable>true</workflow-editable>
+</step>
+Each step contains the following elements. The required elements are so marked: + +
Reordering/Removing Submission Steps+ +The removal of existing steps and reordering of existing steps is a relatively easy process! + +Reordering steps + +
Removing one or more steps + +
Assigning a custom Submission Process to a Collection+ +Assigning a custom submission process to a Collection in DSpace involves working with the submission-map section of the item-submission.xml. For a review of the structure of the item-submission.xml, see the section above on Understanding the Submission Configuration File. + +Each name-map element within submission-map associates a collection with the name of a submission definition. Its collection-handle attribute is the Handle of the collection. Its submission-name attribute is the submission definition name, which must match the name attribute of a submission-process element (in the submission-definitions section of item-submission.xml). + +For example, the following fragment shows how the collection with handle "12345.6789/42" is assigned the "custom" submission process: + +
+ <submission-map>
+ <name-map collection-handle="12345.6789/42" submission-name="custom" />
+ ...
+ </submission-map>
+
+ <submission-definitions>
+ <submission-process name="custom">
+ ...
+ </submission-process>
+ </submission-definitions>
+
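The lookup that a submission map implies can be sketched in Python as follows: a collection listed by Handle gets its named submission process, and any other collection falls back to the "default" entry. In this sketch Handles are compared as exact strings, so the attribute should contain the bare Handle. `submission_name_for` is a hypothetical helper, not DSpace's own implementation.

```python
import xml.etree.ElementTree as ET

# A submission-map like the one above, plus the usual "default" entry.
SUBMISSION_MAP = """
<submission-map>
  <name-map collection-handle="default" submission-name="traditional"/>
  <name-map collection-handle="12345.6789/42" submission-name="custom"/>
</submission-map>
"""


def submission_name_for(map_xml: str, handle: str) -> str:
    """Pick the submission process for a collection Handle.

    Falls back to the "default" entry when the Handle is not listed;
    raises KeyError if there is no default entry at all.
    """
    mapping = {
        nm.get("collection-handle"): nm.get("submission-name")
        for nm in ET.fromstring(map_xml.strip()).findall("name-map")
    }
    return mapping.get(handle, mapping["default"])


if __name__ == "__main__":
    print(submission_name_for(SUBMISSION_MAP, "12345.6789/42"))  # custom
    print(submission_name_for(SUBMISSION_MAP, "12345.6789/99"))  # traditional
```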
+It's a good idea to keep the definition of the default name-map from the example item-submission.xml, so there is always a default for collections which do not have a custom submission process. + +Getting A Collection's Handle+ +You will need the handle of a collection in order to assign it a custom submission process. To discover the handle, go to the "Communities & Collections" page under "Browse" in the left-hand menu on your DSpace home page. Then, find the link to your collection. It should look something like: + +
+ http://myhost.my.edu/dspace/handle/12345.6789/42
+The part of the URL after /handle/ (12345.6789/42 in this example) is the handle. It should look familiar to any DSpace administrator. That is what goes in the collection-handle attribute of your name-map element. + + +Custom Metadata-entry Pages for Submission+ +Introduction+ +This section explains how to customize the Web forms used by submitters and editors to enter and modify the metadata for a new item. These metadata web forms are controlled by the Describe step within the Submission Process. However, they are also configurable via their own XML configuration file (input-forms.xml). + +You can customize the "default" metadata forms used by all collections, and also create alternate sets of metadata forms and assign them to specific collections. In creating custom metadata forms, you can choose: + +
NOTE: The cosmetic and ergonomic details of metadata entry fields remain the same as the fixed metadata pages in previous DSpace releases, and can only be altered by modifying the appropriate stylesheet and JSP pages. + +All of the custom metadata-entry forms for a DSpace instance are controlled by a single XML file, input-forms.xml, in the config subdirectory under the DSpace home. DSpace comes with a sample configuration that implements the traditional metadata-entry forms, which also serves as a well-documented example. The rest of this section explains how to create your own sets of custom forms. + + +Describing Custom Metadata Forms+ +The description of a set of pages through which submitters enter their metadata is called a form (although it is actually a set of forms, in the HTML sense of the term). A form is identified by a unique symbolic name. In the XML structure, the form is broken down into a series of pages: each of these represents a separate Web page for collecting metadata elements. + +To set up one of your DSpace collections with customized submission forms, first you make an entry in the form-map. This is effectively a table that relates a collection to a form set, by connecting the collection's Handle to the form name. Collections are identified by handle because their names are mutable and not necessarily unique, while handles are unique and persistent. + +A special map entry, for the collection handle "default", defines the default form set. It applies to all collections which are not explicitly mentioned in the map. In the example XML this form set is named traditional (for the "traditional" DSpace user interface) but it could be named anything. + + +The Structure of input-forms.xml+ +The XML configuration file has a single top-level element, input-forms, which contains three elements in a specific order. The outline is as follows: + +
+ +<input-forms> + + <!-- Map of Collections to Form Sets --> + <form-map> + <name-map collection-handle="default" form-name="traditional" + /> + ... + </form-map> + + <!-- Form Set Definitions --> + <form-definitions> + <form name="traditional"> + ... + </form-definitions> + + <!-- Name/Value Pairs used within Multiple Choice Widgets + --> + <form-value-pairs> + <value-pairs value-pairs-name="common_iso_languages" + dc-term="language_iso"> + ... + </form-value-pairs> +</input-forms> ++ Adding a Collection Map+ +Each name-map element within form-map associates a collection with the name of a form set. Its collection-handle attribute is the Handle of the collection, and its form-name attribute is the form set name, which must match the name attribute of a form element. + +For example, the following fragment shows how the collection with handle "12345.6789/42" is attached to the "TechRpt" form set: + +
+ + <form-map> + <name-map collection-handle="12345.6789/42" form-name="TechRpt"/> + ... + </form-map> + + <form-definitions> + <form name="TechRpt"> + ... + </form-definitions> ++ It's a good idea to keep the definition of the default name-map from the example input-forms.xml so there is always a default for collections which do not have a custom form set. + +Getting A Collection's Handle+ +You will need the handle of a collection in order to assign it a custom form set. To discover the handle, go to the "Communities & Collections" page under "Browse" in the left-hand menu on your DSpace home page. Then, find the link to your collection. It should look something like: + +
+ http://myhost.my.edu/dspace/handle/12345.6789/42
+The part of the URL after /handle/ (12345.6789/42 in this example) is the handle. It should look familiar to any DSpace administrator. That is what goes in the collection-handle attribute of your name-map element. + + + +Adding a Form Set+ +You can add a new form set by creating a new form element within the form-definitions element. It has one attribute, name, which, as seen above, must match the form-name attribute of the name-map entries for the collections it is to be used for. + +Forms and Pages+ +The content of the form is a sequence of page elements. Each of these corresponds to a Web page of forms for entering metadata elements, presented in sequence between the initial "Describe" page and the final "Verify" page (which presents a summary of all the metadata collected). + +A form must contain at least one and at most six pages. They are presented in the order they appear in the XML. Each page element must include a number attribute, which should be its sequence number, e.g. +
+
+<page number="1">
+
+The page element, in turn, contains a sequence of field elements. Each field defines an interactive dialog where the submitter enters one of the Dublin Core metadata items. + + +Composition of a Field+ +Each field contains the following elements, in the order indicated. The required sub-elements are so marked: + +
For the use of controlled vocabularies see the Configuring Controlled Vocabularies section. + + +Automatically Elided Fields+ +You may notice that some fields are automatically skipped when a custom form page is displayed, depending on the kind of item being submitted. This is because the DSpace user-interface engine skips Dublin Core fields which are not needed, according to the initial description of the item. For example, if the user indicates there are no alternate titles on the first "Describe" page (the one with a few checkboxes), the input for the title.alternative DC element is automatically elided, even on custom submission pages. + +When a user initiates a submission, DSpace first displays what we'll call the "initial-questions page". By default, it contains three questions with check-boxes: +
The answers to the first two questions control whether inputs for certain DC metadata fields will be displayed, even if they are defined as fields in a custom page. Conversely, if the metadata fields controlled by a checkbox are not mentioned in the custom form, the checkbox is elided from the initial page to avoid confusing or misleading the user. + +The two relevant checkbox entries are "The item has more than one title, e.g. a translated title", and "The item has been published or publicly distributed before". The checkbox for multiple titles triggers the display of the field with dc-element equal to 'title' and dc-qualifier equal to 'alternative'. If the controlling collection's form set does not contain this field, then the multiple titles question will not appear on the initial questions page. + + +Adding Value-Pairs+ +Finally, your custom form description needs to define the "value pairs" for any fields with input types that refer to them. Do this by adding a value-pairs element to the contents of form-value-pairs. It has the following required attributes: +
Example+ +Here is a menu of types of common identifiers: + +
+ + <value-pairs value-pairs-name="common_identifiers" dc-term="identifier"> + <pair> + <displayed-value>Gov't Doc #</displayed-value> + <stored-value>govdoc</stored-value> + </pair> + <pair> + <displayed-value>URI</displayed-value> + <stored-value>uri</stored-value> + </pair> + <pair> + <displayed-value>ISBN</displayed-value> + <stored-value>isbn</stored-value> + </pair> + </value-pairs> ++ It generates the following HTML, which is rendered as a drop-down menu. (Note that there is no way to indicate a default choice in the custom input XML, so it cannot generate the HTML SELECTED attribute to mark one of the options as a pre-selected default.) + +
+ +<select name="identifier_qualifier_0"> + <option VALUE="govdoc">Gov't Doc #</option> + <option VALUE="uri">URI</option> + <option VALUE="isbn">ISBN</option> +</select> ++ Deploying Your Custom Forms+ +The DSpace web application only reads your custom form definitions when it starts up, so it is important to remember: +
Any mistake in the syntax or semantics of the form definitions, such as poorly formed XML or a reference to a nonexistent field name, will cause a fatal error in the DSpace UI. The exception message (at the top of the stack trace in the dspace.log file) usually has a concise and helpful explanation of what went wrong. Don't forget to stop and restart the servlet container before testing your fix to a bug. + + +Configuring the File Upload step+ +The Upload step in the DSpace submission process has two configuration options which can be set with your [dspace]/config/dspace.cfg configuration file. They are as follows: + +
Creating new Submission Steps+ +First, a brief warning: Creating a new Submission Step requires some Java knowledge, and should therefore be undertaken by a Java programmer whenever possible. + +That being said, at a higher level, creating a new Submission Step requires the following (in this relative order): + +
Creating a Non-Interactive Step+ +Non-interactive steps are ones that have no user interface and only perform backend processing. You may find a need to create non-interactive steps which perform further processing of previously entered information. + +To create a non-interactive step, do the following: + +
+ <step>
+ <processing-class>org.dspace.submit.step.MyNonInteractiveStep</processing-class>
+ <workflow-editable>false</workflow-editable>
+</step>
+Note: Non-interactive steps will not appear in the Progress Bar! Therefore, your submitters will not even know they are there. However, because they are not visible to your users, you should make sure that your non-interactive step does not take a large amount of time to finish its processing and return control to the next step (otherwise there will be a visible time delay in the user interface). + + + |
+
+
+
+
+ DSpace Documentation : System Administration
+
+
+
+ This page last changed on Mar 21, 2011 by tdonohue.
+
+
+ DSpace System Documentation: System Administration+ +DSpace operates on several levels: as a Tomcat servlet, as cron jobs, and as on-demand operations. This section explains many of the on-demand operations. Some of the command operations may also be set up as cron jobs. Many of these operations are performed at the Command Line Interface (CLI), also known as the Unix prompt ($:). The rest of this documentation uses the term CLI wherever a command needs to be run at the command line. + +Below is the "Command Help Table". This table explains what data is contained in the individual command/help tables in the sections that follow. +
+
+
+
+
Table of Contents: +
+
+
+
Community and Collection Structure Importer+ +This CLI tool gives you the ability to import a community and collection structure directly from a source XML file. +
+
+
+
+
The administrator needs to build the source XML document in the following format: +
+ <import_structure> + <community> + <name>Community Name</name> + <description>Descriptive text</description> + <intro>Introductory text</intro> + <copyright>Special copyright notice</copyright> + <sidebar>Sidebar text</sidebar> + <community> + <name>Sub Community Name</name> + <community> ...[ad infinitum]... + </community> + </community> + <collection> + <name>Collection Name</name> + <description>Descriptive text</description> + <intro>Introductory text</intro> + <copyright>Special copyright notice</copyright> + <sidebar>Sidebar text</sidebar> + <license>Special licence</license> + <provenance>Provenance information</provenance> + </collection> + </community> +</import_structure> ++ The resulting output document will be as follows: +
+ <import_structure> + <community identifier="123456789/1"> + <name>Community Name</name> + <description>Descriptive text</description> + <intro>Introductory text</intro> + <copyright>Special copyright notice</copyright> + <sidebar>Sidebar text</sidebar> + <community identifier="123456789/2"> + <name>Sub Community Name</name> + <community identifier="123456789/3"> ...[ad infinitum]... + </community> + </community> + <collection identifier="123456789/4"> + <name>Collection Name</name> + <description>Descriptive text</description> + <intro>Introductory text</intro> + <copyright>Special copyright notice</copyright> + <sidebar>Sidebar text</sidebar> + <license>Special licence</license> + <provenance>Provenance information</provenance> + </collection> + </community> +</import_structure> ++ This command-line tool gives you the ability to import a community and collection structure directly from a source XML file. It is executed as follows: + +
+ [dspace]/bin/dspace structure-builder -f /path/to/source.xml -o path/to/output.xml -e admin@user.com+ This will examine the contents of source.xml, import the structure into DSpace while logged in as the supplied administrator, and then output the same structure to the output file, but including the handle for each imported community and collection as an attribute. + +Limitation+ +
Package Importer and Exporter+ +This command-line tool gives you access to the Packager plugins. It can ingest a package to create a new DSpace Object (Community, Collection or Item), or disseminate a DSpace Object as a package. + +To see all the options, invoke it as: + +
+ [dspace]/bin/dspace packager --help+ This mode also displays a list of the names of package ingestion and dissemination plugins that are currently installed in your DSpace. Each Packager plugin also may allow for custom options, which may provide you more control over how a package is imported or exported. You can see a listing of all specific packager options by invoking --help (or -h) with the --type (or -t) option: + +
+ [dspace]/bin/dspace packager --help --type METS+ The above example will display the normal help message, while also listing any additional options available to the "METS" packager plugin. + +Ingesting+ +Ingestion Modes & Options+ +When ingesting packages, DSpace supports several different "modes". (Please note that not all packager plugins support all modes of ingestion.) +
Ingesting a Single Package+ +To ingest a single package from a file, give the command: +
+ [dspace]/bin/dspace packager -e [user-email] -p [parent-handle] -t [packager-name] /full/path/to/package
+Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [parent-handle] is the Handle of the Parent Object into which the package is ingested, [packager-name] is the plugin name of the package ingester to use, and /full/path/to/package is the path to the file to ingest (or "-" to read from the standard input). + +Here is an example that loads a PDF file with internal metadata as a package: + +
+ [dspace]/bin/dspace packager -e admin@myu.edu -p 4321/10 -t PDF thesis.pdf+ This example takes the result of retrieving a URL and ingests it: +
+ wget -O - http://alum.mit.edu/jarandom/my-thesis.pdf | [dspace]/bin/dspace packager -e admin@myu.edu -p 4321/10 -t PDF -
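When scripting ingests, it can be convenient to build the documented command as an argument list rather than a shell string. The Python sketch below does only that and does not run anything. `ingest_command` is a hypothetical helper, and `/dspace` is an assumed install directory standing in for `[dspace]`.

```python
from typing import List


def ingest_command(dspace_dir: str, email: str, parent_handle: str,
                   packager: str, package_path: str = "-") -> List[str]:
    """Build the argument list for ingesting one package (hypothetical helper).

    Mirrors the documented form:
      [dspace]/bin/dspace packager -e EMAIL -p PARENT -t TYPE PATH
    A package path of "-" means the package is read from standard input.
    """
    return [f"{dspace_dir}/bin/dspace", "packager",
            "-e", email, "-p", parent_handle, "-t", packager, package_path]


if __name__ == "__main__":
    cmd = ingest_command("/dspace", "admin@myu.edu", "4321/10", "PDF", "thesis.pdf")
    print(" ".join(cmd))
    # /dspace/bin/dspace packager -e admin@myu.edu -p 4321/10 -t PDF thesis.pdf
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)`. For a pipeline like the `wget` example above, leave the path as "-" and supply the downloaded bytes with `subprocess.run(cmd, input=data, check=True)`. Passing a list also avoids shell quoting problems with unusual file names.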
+Ingesting Multiple Packages at Once+ +Some Packager plugins support bulk ingest functionality using the --all (or -a) flag. When --all is used, the packager will attempt to ingest all child packages referenced by the initial package (and continue on recursively). Some examples follow: +
Here is a basic example of a bulk ingest 'packager' command template: + +
+ [dspace]/bin/dspace packager -s -a -t AIP -e <eperson> -p <parent-handle> <file-path> ++ for example: + +
+ [dspace]/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/12 collection-aip.zip ++ The above command will ingest the package named "collection-aip.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object is assigned a new Handle (since -s is specified). In addition, any child packages directly referenced by "collection-aip.zip" are also recursively ingested (a new Handle is also assigned for each child AIP). + +
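The recursive behaviour of --all in submit mode can be pictured with a toy model. Each package is ingested and receives a new Handle, and then its child packages are ingested beneath that new Handle. The Python sketch below only models the behaviour described above; it is not the packager's implementation. The `Package` class, the child package names and the counter-based Handle numbers are all made up for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Tuple


@dataclass
class Package:
    """Toy stand-in for an AIP file and the child packages it references."""
    name: str
    children: List["Package"] = field(default_factory=list)


def ingest_all(root: Package, parent_handle: str,
               prefix: str = "4321") -> Dict[str, Tuple[str, str]]:
    """Model 'packager -s -a': ingest a package, then its children, recursively.

    Returns {package name: (new Handle, parent Handle)}. The Handle numbers
    come from a simple counter and are purely illustrative.
    """
    next_suffix = count(100)
    assigned: Dict[str, Tuple[str, str]] = {}

    def visit(pkg: Package, parent: str) -> None:
        handle = f"{prefix}/{next(next_suffix)}"  # submit mode: always a new Handle
        assigned[pkg.name] = (handle, parent)
        for child in pkg.children:  # children attach to the parent's new Handle
            visit(child, handle)

    visit(root, parent_handle)
    return assigned


if __name__ == "__main__":
    tree = Package("collection-aip.zip", [Package("item-1.zip"), Package("item-2.zip")])
    for name, (handle, parent) in ingest_all(tree, "4321/12").items():
        print(f"{name}: {handle} (child of {parent})")
```

Parents are processed before their children, so every child can be attached to an object that already exists.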
Restoring/Replacing using Packages+ +Restoring is slightly different than just ingesting. When restoring, the packager makes every attempt to restore the object as it used to be (including its handle, parent object, etc.). + +There are currently three restore modes: +
Default Restore Mode+ +By default, the restore mode (-r option) will roll back all changes if any object is found to already exist. The user will be informed which object already exists within their DSpace installation. + +Use this 'packager' command template: +
+ [dspace]/bin/dspace packager -r -t AIP -e <eperson> <file-path> ++ For example: +
+ +[dspace]/bin/dspace packager -r -t AIP -e admin@myu.edu aip4567.zip ++ Notice that unlike the -s option (for submission/ingesting), the -r option does not require the Parent Object (-p option) to be specified if it can be determined from the package itself. + +In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). If the object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace). + +Restore, Keep Existing Mode+ +When the "Keep Existing" flag (-k option) is specified, the restore will attempt to skip over any objects found to already exist. It will report to the user that the object was found to exist (and was not modified or changed). It will then continue to restore all objects which do not already exist. This flag is most useful when attempting a bulk restore (using the --all or -a option). + +One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore items to an existing Collection. + +Here's an example of how to use this 'packager' command: +
+ [dspace]/bin/dspace packager -r -a -k -t AIP -e <eperson> <file-path> ++ For example: + +
+ [dspace]/bin/dspace packager -r -a -k -t AIP -e admin@myu.edu aip4567.zip ++ In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child packages referenced by "aip4567.zip" are also recursively restored (the -a option specifies to also restore all child packages). They are also restored with the Handles & Parent Objects provided within their packages. If any object is found to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored.

Force Replace Mode+

When the "Force Replace" flag (-f option) is specified, the restore will overwrite any objects found to already exist in DSpace. In other words, existing content is deleted and then replaced by the contents of the package(s).
Here's an example of how to use this 'packager' command: +
+ [dspace]/bin/dspace packager -r -f -t AIP -e <eperson> <file-path> ++ For example: + +
+ [dspace]/bin/dspace packager -r -f -t AIP -e admin@myu.edu aip4567.zip ++ In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). If the -a option is also specified, any child packages referenced by "aip4567.zip" are also recursively restored, with the Handles & Parent Objects provided within their packages. If any object is found to already exist, its contents are replaced by the contents of the appropriate package.

If any error occurs, the script attempts to roll back the entire replacement process.

Disseminating+

Disseminating a Single Object+

To disseminate a single object as a package, give the command:
+ [dspace]/bin/dspace packager -d -e [user-email] -i [handle] -t [packager-name] [file-path]+ Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [handle] is the Handle of the Object to disseminate; [packager-name] is the plugin name of the package disseminator to use; and [file-path] is the path to the file to create (or "-" to write to the standard output). For example: + +
+ [dspace]/bin/dspace packager -d -t METS -e admin@myu.edu -i 4321/4567 4567.zip ++ The above code will export the object of the given handle (4321/4567) into a METS file named "4567.zip". + +Disseminating Multiple Objects at Once+ +To export an object hierarchy, use the -a (or --all) package parameter. + +For example, use this 'packager' command template: + +
+ [dspace]/bin/dspace packager -d -a -e [user-email] -i [handle] -t [packager-name] [file-path] ++ For example:
+ [dspace]/bin/dspace packager -d -a -t METS -e admin@myu.edu -i 4321/4567 4567.zip ++ The above command will export the object with the given handle (4321/4567) into a METS file named "4567.zip". In addition, it will export all child objects to the same directory as the "4567.zip" file.

Archival Information Packages (AIPs)+

As of DSpace 1.7, DSpace can back up and restore all of its contents as a set of AIP Files. This includes all Communities, Collections, Items, Groups and People in the system.

This feature came out of a requirement for DSpace to better integrate with DuraCloud (http://www.duracloud.org) and other backup storage systems. One part of this requirement is to be able to "back up" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.

Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (a METS-based, AIP format). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in the same or a different DSpace installation).

For more information, see the section on AIP Backup & Restore for DSpace.

METS packages+

Since the DSpace 1.4 release, the software includes a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as METS, MODS, and PREMIS. The plugin name is METS by default, and it uses MODS for descriptive metadata.

The DSpace METS SIP profile is available at: https://wiki.duraspace.org/display/DSPACE/DSpaceMETSSIPProfile

Item Importer and Exporter+

DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace simple archive format.
The tools are not terribly robust, but are useful and are easily modified. They also give a good demonstration of how to implement your own item importer if desired.

DSpace Simple Archive Format+

The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata and the files that make up the item.
+
+archive_directory/
+ item_000/
+ dublin_core.xml -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
+ metadata_[prefix].xml -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
+ contents -- text file containing one line per filename
+ file_1.doc -- files to be added as bitstreams to the item
+ file_2.pdf
+ item_001/
+ dublin_core.xml
+ contents
+ file_1.png
+ ...
+
+The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:
Every metadata field used must first be registered via the metadata registry of the DSpace instance.

The contents file simply enumerates, one file per line, the bitstream file names. See the following example:
+
 file_1.doc
 file_2.pdf
 license
++ Please notice that the license is optional; if you wish to have one included, place the license file in the item's directory (for example, .../item_001/).

The bitstream name may optionally be followed by the sequence:

\tbundle:bundlename

where '\t' is the tab character and 'bundlename' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.

Configuring metadata_[prefix].xml for Different Schemas+

It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.
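The layout and file formats above can be produced by a short script. The following Python sketch writes one item directory containing a dublin_core.xml file and a contents file. The function name write_item and its arguments are hypothetical; it assumes qualifier="none" marks an unqualified element (as in common DSpace sample files), uses only the element and qualifier attributes, and leaves off the bundle suffix when no bundle is given so that the importer falls back to the ORIGINAL bundle.

```python
import os
from xml.sax.saxutils import escape, quoteattr


def write_item(archive_dir, item_name, dc_values, files):
    """Write one item directory in the DSpace simple archive format (sketch).

    dc_values: list of (element, qualifier, value); qualifier None -> "none".
    files:     list of (filename, data_bytes, bundle); bundle None -> no
               bundle suffix, so the importer uses its default ORIGINAL bundle.
    """
    item_dir = os.path.join(archive_dir, item_name)
    os.makedirs(item_dir, exist_ok=True)

    # dublin_core.xml: one <dcvalue> entry per metadata value (dc schema).
    lines = ["<dublin_core>"]
    for element, qualifier, value in dc_values:
        lines.append("  <dcvalue element=%s qualifier=%s>%s</dcvalue>" % (
            quoteattr(element), quoteattr(qualifier or "none"), escape(value)))
    lines.append("</dublin_core>")
    with open(os.path.join(item_dir, "dublin_core.xml"), "w",
              encoding="utf-8", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")

    # contents: one file name per line, optionally followed by <TAB>bundle:NAME.
    with open(os.path.join(item_dir, "contents"), "w",
              encoding="utf-8", newline="\n") as fh:
        for name, data, bundle in files:
            with open(os.path.join(item_dir, name), "wb") as bf:
                bf.write(data)
            fh.write(name + ("\tbundle:" + bundle if bundle else "") + "\n")
    return item_dir
```

For example, calling write_item with the files ("file_1.pdf", ..., None) and ("thumb.jpg", ..., "THUMBNAIL") yields a contents file whose first line is just file_1.pdf and whose second line is thumb.jpg, a tab, and bundle:THUMBNAIL.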
Importing Items+ +Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances. +
+
+
+
+
‡ These are mutually exclusive.

The item importer is able to batch import any number of items into a particular collection using a very simple CLI command and arguments.

Adding Items to a Collection+

To add items to a collection, you need to gather the following information:
+ [dspace]/bin/dspace import --add --eperson=joe@user.com --collection=CollectionID --source=items_dir --mapfile=mapfile
+or by using the short form: + +
+ [dspace]/bin/dspace import -a -e joe@user.com -c CollectionID -s items_dir -m mapfile
+The above command would cycle through the archive directory's items, import them, and then generate a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You will need it later to replace or delete (unimport) the imported items.

Testing. You can add --test (or -t) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.

Replacing Items in a Collection+

Replacing existing items is relatively easy. Remember that map file you were supposed to save? Now you will use it. The command (in short form):
+ [dspace]/bin/dspace import -r -e joe@user.com -c collectionID -s items_dir -m mapfile
+Long form: + +
+ [dspace]/bin/dspace import --replace --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
+Deleting or Unimporting Items in a Collection+ +You are able to unimport or delete items provided you have the mapfile. Remember that mapfile you were supposed to save? The command is (in short form): + +
+ [dspace]/bin/dspace import -d -m mapfile
+In long form: + +
+ [dspace]/bin/dspace import --delete --mapfile mapfile
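Because --replace and --delete rely entirely on the map file, it may be worth sanity-checking it before use. The Python sketch below assumes each line of the map file holds an item directory name and its assigned handle separated by whitespace (for example "item_000 123456789/1"); confirm this against a map file produced by your own installation. The function name read_mapfile is hypothetical.

```python
def read_mapfile(path):
    """Return {item_directory: handle} parsed from an import map file (sketch).

    Assumes each non-blank line is "<item directory> <handle>", separated by
    whitespace, e.g. "item_000 123456789/1".
    """
    mapping = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines
            parts = line.split()
            # A handle is expected to look like "prefix/suffix".
            if len(parts) != 2 or "/" not in parts[1]:
                raise ValueError("line %d: expected '<dir> <handle>', got %r"
                                 % (lineno, line))
            item_dir, handle = parts
            if item_dir in mapping:
                raise ValueError("line %d: duplicate item directory %r"
                                 % (lineno, item_dir))
            mapping[item_dir] = handle
    return mapping
```

Checking that every directory in items_dir has an entry in the returned dictionary before a --replace run is one way to catch a stale or truncated map file.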
+Other Options+ +
Exporting Items+ +The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported. +
+
+
+
+
Exporting a Collection + +To export a collection's items you type at the CLI: + +
+ [dspace]/bin/dspace export --type=COLLECTION --id=collID --dest=dest_dir --number=seq_num+ Short form: + +
+ [dspace]/bin/dspace export -t COLLECTION -i CollID or Handle -d /path/to/destination -n Some_number+ The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply.

Exporting a Single Item

To export a single item, use the keyword ITEM and give the item ID as an argument:
+ [dspace]/bin/dspace export --type=ITEM --id=itemID --dest=dest_dir --number=seq_num+ Short form: + +
+ [dspace]/bin/dspace export -t ITEM -i itemID or Handle -d /path/to/destination -n some_number+ Each exported item will have an additional file in its directory, named 'handle'. This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle. + +The -m Argument + +Using the -m argument will export the item/collection and also perform the migration step. It will perform the same process that the next section Transferring Items Between DSpace Instances performs. We recommend that the next section be read in conjunction with this flag being used. + + + +Transferring Items Between DSpace Instances+ +Migration of Data After running the item exporter each dublin_core.xml file will contain metadata that was automatically added by DSpace. These fields are as follows: + +
+ [dspace]/bin/dspace_migrate </path/to/exported item directory>+ prior to running the item importer. This will remove the above metadata items from the dublin_core.xml file, except for date.issued (if the item has been published or publicly distributed before) and identifier.uri (if it is not the handle), and will remove all handle files. It will then be safe to run the item importer.

Item Update+

ItemUpdate is a batch-mode command-line tool for altering the metadata and bitstream content of existing items in a DSpace instance. It is a companion tool to ItemImport and uses the DSpace simple archive format to specify changes in metadata and bitstream contents. Those familiar with generating the source trees for ItemImporter will find a similar environment in the use of this batch processing tool.

For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadata elements. For bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run.

ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as with ItemImport. However, unlike ItemImport, there is no resume feature for incomplete processing. There is more extensive logging, with a summary statement at the end giving counts of successful and unsuccessful items processed.

One likely scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or down-stream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the changes.

A note on terminology: item refers to a DSpace item. metadata element refers generally to a qualified or unqualified element in a schema in the form [schema].[element].[qualifier] or [schema].[element] and occasionally in a more specific way to the second part of that form.
metadata field refers to a specific instance pairing a metadata element to a value.

DSpace Simple Archive Format+

As with ItemImporter, the idea behind DSpace's simple archive format is to create an archive directory with a subdirectory per item. There are a few additional features added to this format specifically for ItemUpdate. Note that in the simple archive format, the item directories are merely local references and only used by ItemUpdate in the log output.

For details, see the previous section, DSpace Simple Archive Format.

Additionally, a delete_contents file is now available. This file lists the bitstreams to be deleted, one bitstream ID per line. Currently, no other identifiers for bitstreams are usable for this function. This file is an addition to the Archive format specifically for ItemUpdate.

The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate.

ItemUpdate Commands+
+
+
+
+
+
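A delete_contents file is plain text, so it is easy to generate from a list of bitstream IDs. The following Python sketch uses a hypothetical helper name and assumes bitstream IDs are positive database integers, since the file accepts no other kind of identifier.

```python
import os


def write_delete_contents(item_dir, bitstream_ids):
    """Write an ItemUpdate delete_contents file, one bitstream ID per line (sketch)."""
    lines = []
    for bid in bitstream_ids:
        # Only numeric bitstream IDs can be used in this file (assumed positive).
        if isinstance(bid, bool) or not isinstance(bid, int) or bid <= 0:
            raise ValueError("bitstream IDs must be positive integers: %r" % (bid,))
        lines.append("%d\n" % bid)
    os.makedirs(item_dir, exist_ok=True)
    path = os.path.join(item_dir, "delete_contents")
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.writelines(lines)
    return path
```

Keep in mind that bitstream deletion is the one ItemUpdate action that cannot be undone, so running in test mode first is prudent.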
CLI Examples+ +Adding Metadata: + +
+ [dspace]/bin/dspace itemupdate -e joe@user.com -s [path/to/archive] -a dc.description+ This will add the dc.description element from your archive to the corresponding items, identifying each item by the handle in its URI (since the -i argument wasn't used).

Registering (Not Importing) Bitstreams+

Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be an existing repository of digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.

Accessible Storage+

To register an item, its bitstreams must reside on storage accessible to DSpace and must therefore be referenced by an asset store number in dspace.cfg. The configuration file dspace.cfg establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in the DSpace host's file system or a set of SRB account parameters. This asset store number is described in The dspace.cfg Configuration Properties File section and in the dspace.cfg file itself. The asset store number(s) used for registered items should generally not be the value of the assetstore.incoming property, since it is unlikely that you will want to mix the bitstreams of normally ingested and imported items with those of registered items.

Registering Items Using the Item Importer+

DSpace uses the same import tool that is used for batch import, except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool.

The archive format for registration does not include the actual content files (bitstreams) being registered.
The format is, however, a directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata (dublin_core.xml) and a file listing the item's content files (contents), but not the actual content files themselves.

The dublin_core.xml file for item registration is exactly the same as for regular item import.

The contents file, like that for regular item import, lists the item's content files, one content file per line, but each line has one of the following formats:
+ -r -s n -f filepath +-r -s n -f filepath\tbundle:bundlename +-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name' +-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text+ where + +
The command line for registration is just like the one for regular import: + +
+ [dspace]/bin/dspace import -a -e joe@user.com -c collectionID -s items_dir -m mapfile
+(or by using the long form) + +
+ [dspace]/bin/dspace import --add --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
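Because each registration line packs several tab-separated parts, assembling it programmatically helps avoid stray spaces where tabs are required. The following Python sketch builds lines in the four forms listed earlier in this section. The helper name is hypothetical; the rule that permissions appear only after a bundle, and a description only after permissions, is inferred from the order of those forms; and it does not handle group names containing single quotes.

```python
def registration_line(store, filepath, bundle=None, permission=None,
                      description=None):
    """Build one line of a registration 'contents' file (sketch).

    store:      asset store number from dspace.cfg (an integer)
    filepath:   file path relative to that asset store
    permission: optional (mode, group_name) tuple, mode 'r' or 'w'
    """
    for text in (filepath, bundle, description):
        if text is not None and ("\t" in text or "\n" in text):
            raise ValueError("fields must not contain tabs or newlines")
    line = "-r -s %d -f %s" % (store, filepath)
    if bundle is not None:
        line += "\tbundle:" + bundle
    if permission is not None:
        if bundle is None:
            raise ValueError("permissions require a bundle")
        mode, group = permission
        if mode not in ("r", "w"):
            raise ValueError("permission mode must be 'r' or 'w'")
        line += "\tpermissions: -%s '%s'" % (mode, group)
    if description is not None:
        if permission is None:
            raise ValueError("a description requires permissions")
        line += "\tdescription: " + description
    return line
```

For instance, registration_line(1, "images/page1.tif") returns "-r -s 1 -f images/page1.tif", the simplest of the four forms.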
+The --workflow and --test flags will function as described in Importing Items.

The --delete flag will function as described in Importing Items, but the registered content files will not be removed from storage. See Deleting Registered Items.

The --replace flag will function as described in Importing Items, but care should be taken to consider different cases and implications. With old items and new items being registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using --replace will not be removed from the storage where it resides. See Deleting Registered Items. A new item added to DSpace using --replace will be ingested normally or will be registered depending on whether or not it is marked in the contents file with the -r flag.

Internal Identification and Retrieval of Registered Items+

Once an item has been registered, it is superficially indistinguishable from items ingested interactively or by batch import. But internally there are some differences:

First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. Instead, the file path and name are those specified in the contents file.

Second, the store_number column of the bitstream database row contains the asset store number specified in the contents file.

Third, the internal_id column of the bitstream database row contains a leading flag (-R) followed by the registered file path and name. For example, -Rfilepath where filepath is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be traditional storage in the DSpace server's file system or an SRB account.

Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file is in remote storage (say, SRB), a checksum is calculated on just the file name!
This is an efficiency choice, since registering a large number of large files that are in SRB would consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option, but it is not yet in production release.

Registered items and their bitstreams can be retrieved transparently just like normally ingested items.

Exporting Registered Items+

Registered items may be exported as described in Exporting Items. If so, the export directory will contain actual copies of the files being exported, but the lines in the contents file will flag the files as registered. This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again be registered in DSpace instead of being uploaded and ingested normally.

METS Export of Registered Items+

The METS Export Tool can also be used, but note the cautions described in that section and note that MD5 values for items in remote storage are actually MD5 values on just the file name.

Deleting Registered Items+

If a registered item is deleted from DSpace, either interactively or by using the - METS Tools+
The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.

The Export Tool+

This tool is obsolete. Its use is strongly discouraged. Please use the Package Importer and Exporter instead.

The following are examples of the types of processes the METS tool can provide.

Exporting an individual item. From the CLI:
+ [dspace]/bin/dspace org.dspace.app.mets.METSExport -i [handle] -d /path/to/destination+ Exporting a collection. From the CLI: + +
+ [dspace]/bin/dspace org.dspace.app.mets.METSExport -c [handle] -d /path/to/destination+ Exporting all the items in DSpace. From the CLI: + +
+ [dspace]/bin/dspace org.dspace.app.mets.METSExport -a -d /path/to/destination+ Limitations+ +
MediaFilters: Transforming DSpace Content+ +DSpace can apply filters to content/bitstreams, creating new content. Filters are included that extract text for full-text searching, and create thumbnails for items that contain images. The media filters are controlled by the MediaFilterManager which traverses the asset store, invoking the MediaFilter or FormatFilter classes on bitstreams. The media filter plugin configuration filter.plugins in dspace.cfg contains a list of all enabled media/format filter plugins (see Configuring Media Filters for more information). The media filter system is intended to be run from the command line (or regularly as a cron task): +
+ [dspace]/bin/dspace filter-media+ With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered. + +Available Command-Line Options: + +
Sub-Community Management+

DSpace provides an administrative tool, 'CommunityFiliator', for managing community sub-structure. Normally this structure seldom changes, but prior to the 1.2 release sub-communities were not supported, so this tool could be used to place existing pre-1.2 communities into a hierarchy. It has two operations, either establishing a community to sub-community relationship, or dis-establishing an existing relationship.

The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be either a 'parent' community, meaning it has at least one sub-community, or a 'child' community, meaning it is a sub-community of another community, or both or neither. In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user-interface, since there is no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan. The second operation, removing a parent/child relationship, will make the child an orphan.
+
+
+
+
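The parent/child rules described above can be modelled with a simple parent-pointer map. The following Python sketch is an illustration only, not CommunityFiliator's code: the class and method names are hypothetical, and the cycle check is an extra safeguard this sketch adds. It shows why a child's sub-structure moves with it: only the child's own parent pointer changes.

```python
class CommunityTree:
    """Parent-pointer model of the set/remove rules (illustration only)."""

    def __init__(self):
        self.parent = {}  # community id -> parent id, or None for an orphan

    def add(self, cid):
        self.parent.setdefault(cid, None)

    def set(self, parent_id, child_id):
        # 'set' is only allowed when the child is currently an orphan.
        if self.parent[child_id] is not None:
            raise ValueError("child community already has a parent")
        # Extra check added by this sketch: refuse to create a cycle.
        node = parent_id
        while node is not None:
            if node == child_id:
                raise ValueError("parent is the child or one of its descendants")
            node = self.parent[node]
        self.parent[child_id] = parent_id

    def remove(self, parent_id, child_id):
        if self.parent[child_id] != parent_id:
            raise ValueError("child community not a child of parent community")
        self.parent[child_id] = None  # the child becomes an orphan

    def path(self, cid):
        """Return the chain of ids from the top-level community down to cid."""
        chain = [cid]
        while self.parent[chain[-1]] is not None:
            chain.append(self.parent[chain[-1]])
        return chain[::-1]
```

Moving community C from parent A to parent B is a remove("A", "C") followed by set("B", "C"); any sub-community D of C then reports the path B, C, D without being touched.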
Set a parent/child relationship, issue the following at the CLI: + +
+ [dspace]/bin/dspace community-filiator --set --parent=parentID --child=childID+ (or using the short form)
+ [dspace]/bin/dspace community-filiator -s -p parentID -c childID+ where ' The reverse operation looks like this: + +
+ [dspace]/bin/dspace community-filiator --remove --parent=parentID --child=childID+ (or using the short form) + +
+ [dspace]/bin/dspace community-filiator -r -p parentID -c childID+ where ' If the required constraints of operation are violated, an error message will appear explaining the problem, and no change will be made. An example is a removal operation where the stated child community does not have the stated parent community as its parent: "Error, child community not a child of parent community".

It is possible to effect arbitrary changes to the community hierarchy by chaining the basic operations together. For example, to move a child community from one parent to another, simply perform a 'remove' from its current parent (which will leave it an orphan), followed by a 'set' to its new parent.

It is important to understand that when any operation is performed, all the sub-structure of the child community follows it. Thus, if a child itself has children (sub-communities) or collections, they will all move with it to its new 'location' in the community tree.

Batch Metadata Editing+

DSpace provides a batch metadata editing tool. The batch editing tool is able to produce a comma-delimited file in CSV format. The batch editing tool enables the user to perform the following:
The following table summarizes the basics. +
+
+
+
+
Exporting Process+ +To run the batch editing exporter, at the command line: + +
+ [dspace]/bin/dspace metadata-export -f name_of_file.csv -i 1023/24+ Example: + +
+ [dspace]/bin/dspace metadata-export -f /batch_export/col_14.csv -i 1989.1/24+ In the above example, we have requested that the collection with handle '1989.1/24' be exported in its entirety to the file 'col_14.csv' in the '/batch_export' directory.

Import Function+

The following table summarizes the basics.
+
+
+
+
Silent Mode should be used carefully. It is possible (and probable) that you can overlay the wrong data and cause irreparable damage to the database. + +Importing Process+ +To run the batch importer, at the command line: + +
+ [dspace]/bin/dspace metadata-import -f name_of_file.csv
+Example + +
+ [dspace]/bin/dspace metadata-import -f /dImport/col_14.csv
+If you wish to upload new metadata without bitstreams, at the command line:
+ [dspace]/bin/dspace metadata-import -f /dImport/new_file.csv -e joe@user.com -w -n -t
+The above example uses all of the arguments: it adds the metadata and applies the workflow (-w), notification (-n), and template (-t) options to the items being added.
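For a file intended for metadata-import, the quoting rules are easiest to get right with a CSV library rather than by hand. The following Python sketch uses the standard csv module and follows the conventions described under The CSV Files below: an "id" first column, '+' as the id of a new item, and the default '||' separator for multiple values. The function name, the handle 1989.1/24, and the column names are illustrative assumptions.

```python
import csv
import io

SEPARATOR = "||"  # default multi-value separator; may be changed in dspace.cfg


def build_import_csv(rows, columns):
    """Return CSV text for metadata-import (sketch).

    rows:    list of dicts keyed by "id" and by the names in columns; a value
             may be a string or a list of strings (multiple values).
    columns: metadata columns after "id", e.g. ["collection", "dc.title"].
    Use "+" as the id to mark a new item.
    """
    out = io.StringIO()
    # The default 'excel' dialect quotes fields containing commas, quotes or
    # line breaks and doubles embedded quotes; CRLF line endings per RFC 4180.
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow(["id"] + columns)
    for row in rows:
        record = [row["id"]]
        for col in columns:
            value = row.get(col, "")
            if isinstance(value, (list, tuple)):
                value = SEPARATOR.join(value)  # multiple values in one cell
            record.append(value)
        writer.writerow(record)
    return out.getvalue()
```

Be aware that for an existing item, an empty cell in an included column tells the importer to remove that field's values (see Deleting Data), so include only the columns you mean to change. The -e flag is still needed on the command line when the file adds new items.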
The CSV Files+

The csv files that this tool can import and export abide by the RFC4180 CSV format http://www.ietf.org/rfc/rfc4180.txt. This means that new lines and embedded commas can be included by wrapping elements in double quotes. Double quotes can be included by using two double quotes. The code does all this for you, and any good csv editor such as Excel or OpenOffice will comply with this convention.

File Structure. The first row of the csv must define the metadata values that the rest of the csv represents. The first column must always be "id", which refers to the item's id. All other columns are optional. The other columns contain the Dublin Core metadata fields in which the data is to reside.

A typical heading row looks like:
+ id,collection,dc.title,dc.contributor,dc.date.issued,etc,etc,etc.+ Subsequent rows in the csv file relate to items. A typical row might look like: +
+ 350,2292,Item title,"Smith, John",2008
+If you want to store multiple values for a given metadata element, they can be separated with the double-pipe '||' (or another separator that you have defined in your dspace.cfg file). For example:
+ Horses||Dogs||Cats+ Elements are stored in the database in the order that they appear in the csv file. You can use this to order elements where order may matter, such as authors, or controlled vocabulary such as Library of Congress Subject Headings. + +When importing a csv file, the importer will overlay the data onto what is already in the repository to determine the differences. It only acts on the contents of the csv file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows (items) or Columns (metadata elements) can be removed and will be ignored. For example, if you only want to edit item abstracts, you can remove all of the other columns and just leave the abstract column. (You do need to leave the ID column intact. This is mandatory). + +Editing collection membership. Items can be moved between collections by editing the collection handles in the 'collection' column. Multiple collections can be included. The first collection is the 'owning collection'. The owning collection is the primary collection that the item appears in. Subsequent collections (separated by the field separator) are treated as mapped collections. These are the same as using the map item functionality in the DSpace user interface. To move items between collections, or to edit which other collections they are mapped to, change the data in the collection column. + +Adding items. New metadata-only items can be added to DSpace using the batch metadata importer. To do this, enter a plus sign '+' in the first 'id' column. The importer will then treat this as a new item. If you are using the command line importer, you will need to use the -e flag to specify the user email address or id of the user that is registered as submitting the items. + +Deleting Data. It is possible to perform deletes across the board of certain metadata fields from an exported file. 
For example, let's say you have used keywords (dc.subject) that need to be removed en masse. You would leave the column (dc.subject) intact, but remove the data in the corresponding rows. + +Migrating Data or Exchanging data. It is possible that you have data in one Dublin Core (DC) element and you wish to really have it in another. An example would be that your staff have input Library of Congress Subject Headings in the Subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps and your data is migrated upon import: + +
Checksum Checker+

Checksum Checker is a program that can be run to verify the checksum of every bitstream within DSpace. Checksum Checker was designed with the idea that most system administrators will run it from cron. Choose the options carefully according to the size of your repository.
+
+
+
+
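Conceptually, each check recomputes a bitstream's checksum and compares it with the value stored for it. The following Python sketch illustrates that comparison on ordinary files; it is not the Checksum Checker's implementation, it assumes hex-encoded MD5 checksums, and the record keys (path, stored_md5, last_checked) are hypothetical. It processes records in least-recently-checked order, mirroring the default order described under Checker Execution Mode.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bitstreams(records):
    """Return paths whose recomputed MD5 differs from the stored value (sketch).

    records: dicts with hypothetical keys "path", "stored_md5" (hex string) and
    "last_checked" (any sortable value); least recently checked go first.
    """
    mismatches = []
    for rec in sorted(records, key=lambda r: r["last_checked"]):
        if md5_of_file(rec["path"]) != rec["stored_md5"].lower():
            mismatches.append(rec["path"])
    return mismatches
```

Reading in fixed-size chunks keeps memory use flat even for very large bitstreams, which matters when a run is limited to a time window such as -d 1h.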
There are three aspects of the Checksum Checker's operation that can be configured: + +
Checker Execution Mode+

Execution mode can be configured using command line options. Information on the options is found in the table above. The different modes are described below.

Unless a particular bitstream or handle is specified, the Checksum Checker will always check bitstreams in order of the least recently checked bitstream. (Note that this means that the most recently ingested bitstreams will be the last ones checked by the Checksum Checker.)

Available command line options
Checker Results Pruning+

As stated above in "Pruning mode", the checksum_history table can get rather large, and running the checker with the -p option helps keep the size of the checksum_history table manageable. The amount of time for which results are retained in the checksum_history table can be modified by one of two methods:
Checker Reporting+

Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log, and it will report only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use the -v (verbose) command line option:

[dspace]/bin/dspace checker -l -v (This will loop through the repository once and report in detail about every bitstream checked.)

To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties file and run [dspace]/bin/install_configs.

Cron or Automatic Execution of Checksum Checker+

You should schedule the Checksum Checker to run automatically, based on how frequently you back up your DSpace instance (and how long you keep those backups). The size of your repository is also a factor. For very large repositories, you may need to schedule it to run for an hour (e.g. -d 1h option) each evening to ensure it makes it through your entire repository within a week or so. Smaller repositories can likely get by with just running it weekly.

Unix, Linux, or Mac OS. You can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace:
+ 0 4 * * 0 [dspace]/bin/dspace checker -d2h -p+ The above cron entry would schedule the checker to run every Sunday at 4:00 a.m. for 2 hours. It also specifies that the database should be 'pruned' based on the retention settings in dspace.cfg. + +Windows OS. You will be unable to use the checker shell script. Instead, you should use Windows Scheduled Tasks to schedule the following command to run at the appropriate times: + +
+ [dspace]/bin/dspace checker -d2h -p+ (This command should appear on a single line.) + + +Automated Checksum Checkers' Results+ +Optionally, you may choose to receive automated emails listing the Checksum Checker's results. Schedule this email report to run after the Checksum Checker has completed its processing (otherwise the email may not contain all the results). + 
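As a rough illustration of that ordering rule, the hypothetical Python sketch below (not part of DSpace) adds the checker's run length to its start time to find the earliest safe start for the email report. It assumes the -d value is a whole number followed by one unit letter, and it handles only s, m, h and d; check the checker's option table for the units your version actually accepts. The 15-minute safety margin is an arbitrary choice.

```python
from datetime import datetime, timedelta

# Hypothetical mapping of duration suffixes to seconds. Which suffixes the
# real `dspace checker -d` option accepts is NOT taken from DSpace here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> timedelta:
    """Convert a string such as '2h' or '30m' into a timedelta."""
    value = value.strip()
    if len(value) < 2:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = value[:-1], value[-1].lower()
    if not number.isdigit() or unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(seconds=int(number) * UNIT_SECONDS[unit])


def earliest_report_start(checker_start: datetime, duration: str,
                          margin: timedelta = timedelta(minutes=15)) -> datetime:
    """Earliest start time for the email report: the end of the checker's
    run window plus a safety margin (15 minutes is an arbitrary default)."""
    return checker_start + parse_duration(duration) + margin


# The cron entry "0 4 * * 0" means minute 0, hour 4, any day of the month,
# any month, day of week 0 (Sunday). 27 March 2011 fell on a Sunday.
start = datetime(2011, 3, 27, 4, 0)
print(earliest_report_start(start, "2h"))  # 2011-03-27 06:15:00
```

With the Sunday 4:00 a.m. entry and -d2h, this suggests scheduling the email report for about 6:15 a.m. on Sundays or later.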
+
+
+
+
You can also combine options (e.g. -m -c) for combined reports. + +Cron. Follow the same steps as for running the checker in cron. Change the time but keep the same frequency. Remember to schedule this after the Checksum Checker has run. + + + +Embargo+ +If you have implemented the Embargo feature, you will need to run the Embargo Lifter periodically to check for items with expired embargoes and lift them. + 
+
+
+
+
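The difference between checking (-c) and lifting (-l) embargoes can be pictured with a toy model. This is a hypothetical Python sketch, not DSpace code: the Item class, its lift_date field and both functions are invented for illustration, and the real Embargo Lifter reads lift dates from item metadata as defined by your embargo configuration. Treating a lift date equal to today as expired is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for an item; not the DSpace Item class."""
    handle: str
    lift_date: Optional[date]  # None means no lift date is recorded
    embargoed: bool = True


def expired(items: List[Item], today: date) -> List[Item]:
    """Check-only pass: return embargoed items whose lift date has been
    reached, without modifying anything."""
    return [item for item in items
            if item.embargoed and item.lift_date is not None
            and item.lift_date <= today]


def lift(items: List[Item], today: date) -> List[str]:
    """Lift pass: clear the embargo flag on every expired item and return
    the handles that were changed."""
    lifted = []
    for item in expired(items, today):
        item.embargoed = False
        lifted.append(item.handle)
    return lifted


items = [
    Item("123456789/1", date(2011, 1, 1)),   # lift date in the past
    Item("123456789/2", date(2012, 1, 1)),   # lift date in the future
    Item("123456789/3", None, embargoed=False),
]
today = date(2011, 3, 25)
print([item.handle for item in expired(items, today)])  # ['123456789/1']
print(lift(items, today))                               # ['123456789/1']
print(expired(items, today))                            # []
```

Keeping the check pass free of side effects mirrors why running the check first is a safe way to see what a lift would change.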
You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift those embargoes. For example, to only check the status, at the CLI: + +
+ [dspace]/bin/dspace embargo-lifter -c+ To lift the actual embargoes on those items that meet the time criteria, at the CLI: + +
+ [dspace]/bin/dspace embargo-lifter -l+ Browse Index Creation+ +To create the various browse indexes that you define in the Configuration Section (Chapter 5), there are a variety of options available to you. These options are listed in the command table below. + 
+
+
+
+
Running the Indexing Programs+ +Complete Index Regeneration. By running [dspace]/bin/dspace index-init you will completely regenerate your indexes, tearing down all old tables and reconstructing with the new configuration. +
+ [dspace]/bin/dspace index-init+ Updating the Indexes. By running [dspace]/bin/dspace index-update you will reindex your full browse without modifying the table structure. (This should be your default approach if you reindex periodically, for example via a cron job.) + +
+ [dspace]/bin/dspace index-update+ Destroy and rebuild. You can destroy and rebuild the database without performing the indexing. The following command does this verbosely: it outputs the SQL to the screen and to a file, and also executes it against the database. At the CLI: + +
+ [dspace]/bin/dspace index -r -t -p -v -x -o myfile.sql+ Indexing Customization+ +DSpace provides robust browse indexing. It is possible to expand upon the default indexes delivered at the time of the installation. Before attempting heavy customizations, the System Administrator should review "Defining the Indexes" in Chapter 5 (Configuration) to become familiar with the property keys and the definitions used therein. + +Through customization it is possible to: + +
Remember to run index-init after adding any new definitions to dspace.cfg, so that the indexes are created and the data is indexed. + + + +DSpace Log Converter+ +With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of SOLR for statistics makes it possible to have a database of statistics. This raises the question of how a site can make use of its older log files. The following two-step process converts the existing log files and then imports them for SOLR use. This only needs to be performed once. + +The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR. + 
+
+
+
+
This command loads the intermediate log files created by the Log Converter into SOLR. + 
+
+
+
+
Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from complete. After converting your old logs, please refer to Statistics Client (8.15) for spider removal operations. + + +Client Statistics+ + 
+
+
+
+
Notes: + +Which of these options to use is up to you. If you want to keep spider entries in your repository, you can simply mark them using "-m"; they will then be excluded from statistics queries when "solr.statistics.query.filter.isBot = true" is set in dspace.cfg. + +If you want to keep spiders out of the SOLR repository entirely, use the "-i" option and they will be removed immediately. + +There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges must be at least three octets long (e.g. 123.123.123), and IP ranges can only cover the smallest subnet (e.g. 123.123.123.0 - 123.123.123.255). Otherwise, loading that row will cause exceptions in the DSpace logs and that IP entry will be excluded. + + +Test Database+ +This command can be used at any time to test database connectivity. It will assist in troubleshooting PostgreSQL and Oracle connection issues. + 
+
+
+
+
Moving items+ +It is possible for administrators to move items one at a time using either the JSPUI or the XMLUI. When editing an item, on the 'Edit item' screen select the 'Move Item' option. To move the item, select the new collection for the item to appear in. When the item is moved, it will take its authorizations (who can READ / WRITE it) with it. + +If you wish for the item to take on the default authorizations of the destination collection, tick the 'Inherit default policies of destination collection' checkbox. This is useful if you are moving an item from a private collection to a public collection, or from a public collection to a private collection. + +
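The effect of the 'Inherit default policies of destination collection' checkbox can be summarized in a few lines. This is a toy Python model, not the DSpace API: the Collection and Item classes and the move_item function are hypothetical, and policies are reduced to sets of (action, group) pairs. In this model, inheriting replaces the item's policies with the destination's defaults.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Policy = Tuple[str, str]  # (action, group), e.g. ("READ", "Anonymous")


@dataclass
class Collection:
    """Hypothetical collection with default policies for its items."""
    name: str
    default_item_policies: Set[Policy] = field(default_factory=set)


@dataclass
class Item:
    """Hypothetical item carrying its own policies."""
    title: str
    owner: Collection
    policies: Set[Policy] = field(default_factory=set)


def move_item(item: Item, destination: Collection, inherit: bool = False) -> None:
    """Move an item; keep its own policies unless `inherit` is set, in which
    case they are replaced by a copy of the destination's defaults."""
    item.owner = destination
    if inherit:
        item.policies = set(destination.default_item_policies)


private = Collection("Private", {("READ", "Administrator")})
public = Collection("Public", {("READ", "Anonymous")})
thesis = Item("Thesis", private, {("READ", "Administrator")})

move_item(thesis, public)                # policies travel with the item
print(thesis.policies)                   # {('READ', 'Administrator')}
move_item(thesis, public, inherit=True)  # take on the destination defaults
print(thesis.policies)                   # {('READ', 'Anonymous')}
```

The first move shows why an item moved from a private to a public collection stays private unless the checkbox is ticked.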
Items may also be moved in bulk by using the CSV batch metadata editor (see above). + + + |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : Upgrading a DSpace Installation
+
+
+
+ This page last changed on Mar 25, 2011 by mwood.
+
+
+ DSpace System Documentation: Upgrading a DSpace Installation+ +This section describes how to upgrade a DSpace installation from one version to the next. Details of the differences between the functionality of each version are given in the Version History section. + +
+
+
+
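Each section below covers a single hop between releases, so an older installation may have to apply several sections in order (the 1.1-to-1.2 instructions, for instance, require 1.0.1 sites to upgrade to 1.1 first). The Python sketch below is an illustration only, not a DSpace tool: it encodes a simplified reading of the section headings as a graph, pins each "x" range to one concrete release (an assumption made purely for illustration), and uses breadth-first search to list the hops in order.

```python
from collections import deque
from typing import Dict, List, Optional

# Simplified reading of the section headings: release -> releases a section
# upgrades it to. Ranges such as "1.5.x to 1.6.x" are pinned to one concrete
# release here purely for illustration.
UPGRADE_SECTIONS: Dict[str, List[str]] = {
    "1.0.1": ["1.1"],
    "1.1": ["1.1.1", "1.2"],
    "1.1.1": ["1.2"],
    "1.2": ["1.2.1"],
    "1.2.1": ["1.2.2"],
    "1.2.2": ["1.3.2"],       # "Upgrading From 1.2.x to 1.3.x"
    "1.3.1": ["1.3.2"],
    "1.3.2": ["1.4.2"],       # "Upgrading From 1.3.2 to 1.4.x"
    "1.4": ["1.4.2"],         # "Upgrading From 1.4 to 1.4.x"
    "1.4.1": ["1.4.2"],
    "1.4.2": ["1.5"],
    "1.5": ["1.5.2"],
    "1.5.1": ["1.5.2"],
    "1.5.2": ["1.6"],         # "Upgrading from 1.5.x to 1.6.x"
    "1.6": ["1.6.1", "1.7"],  # "1.6 to 1.6.x" and "1.6.x to 1.7.x"
    "1.6.1": ["1.7"],
    "1.7": ["1.7.1"],
}


def upgrade_path(start: str, target: str) -> Optional[List[str]]:
    """Shortest sequence of releases from start to target (breadth-first
    search), or None if the graph offers no route."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            node: Optional[str] = current
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in UPGRADE_SECTIONS.get(current, []):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(upgrade_path("1.0.1", "1.2"))    # ['1.0.1', '1.1', '1.2']
print(upgrade_path("1.5.1", "1.7.1"))  # ['1.5.1', '1.5.2', '1.6', '1.7', '1.7.1']
```

A real upgrade should still follow the prose of each section; the graph only helps to see which sections apply and in what order.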
Upgrading from 1.7 to 1.7.x+ +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.7.1. Whenever you see these path references, be sure to replace them with the actual path names on your local system. Additionally, be sure to back up your configs, source code modifications, and database before doing a step that could destroy your instance. + + +
Upgrading from 1.6.x to 1.7.x+ +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.7.x. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading from 1.6 to 1.6.x+ +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.6.1. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading from 1.5.x to 1.6.x+ +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.6. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.5 or 1.5.1 to 1.5.2+ +The changes in DSpace 1.5.2 do not include any database schema upgrades, and the upgrade should be straightforward. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.5. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.4.2 to 1.5+ +The changes in DSpace 1.5 are significant and widespread, involving database schema upgrades, code restructuring, completely new user and programmatic interfaces, and a new build system. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for DSpace 1.5. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.4.1 to 1.4.2+ +See Upgrading From 1.4 to 1.4.x; the same instructions apply. + + +Upgrading From 1.4 to 1.4.x+ +The changes in 1.4.x releases are only code and configuration changes, so the update is simply a matter of rebuilding the WAR files and making slight changes to your config file. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.4.x-source] to the source directory for DSpace 1.4.x. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.3.2 to 1.4.x+ +
Upgrading From 1.3.1 to 1.3.2+ +The changes in 1.3.2 are only code changes, so the update is simply a matter of rebuilding the WAR files. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.3.2-source] to the source directory for DSpace 1.3.2. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.2.x to 1.3.x+ +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.3.x-source] to the source directory for DSpace 1.3.x. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.2.1 to 1.2.2+ +The changes in 1.2.2 are only code and config changes, so the update should be fairly simple. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.2.2-source] to the source directory for DSpace 1.2.2. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.2 to 1.2.1+ +The changes in 1.2.1 are only code changes, so the update should be fairly simple. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.2.1-source] to the source directory for DSpace 1.2.1. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.1 (or 1.1.1) to 1.2+ +The process for upgrading to 1.2 from either 1.1 or 1.1.1 is the same. If you are running DSpace 1.0 or 1.0.1, you need to follow the instructions for upgrading from 1.0.1 to 1.1 before following these instructions. + +Note also that these instructions apply to an unmodified 1.1.1 DSpace instance; if you've substantially modified DSpace, you'll need to adapt the process to any modifications you've made. + +This document refers to the install directory for your existing DSpace installation as [dspace], and to the source directory for DSpace 1.2 as [dspace-1.2-source]. Whenever you see these path references below, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.1 to 1.1.1+ +Fortunately, the changes in 1.1.1 are only code changes, so the update is fairly simple. + +In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.1.1-source] to the source directory for DSpace 1.1.1. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
Upgrading From 1.0.1 to 1.1+ +To upgrade from DSpace 1.0.1 to 1.1, follow the steps below. Your dspace.cfg does not need to be changed. In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-1.1-source] to the source directory for DSpace 1.1. Whenever you see these path references, be sure to replace them with the actual path names on your local system. + +
|
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : XMLUI Base Theme Templates (dri2xhtml)
+
+
+
+ This page last changed on Dec 15, 2010 by tdonohue.
+
+
+ Base Templates for Creating XMLUI Themes+ +
+
+
+
dri2xhtml+ +The dri2xhtml base template is the original template for creating XMLUI themes. It attempts to provide generic XSLT templates which are then applied across the entire DSpace site, thus making it easier to make site-wide changes. + +The dri2xhtml base template is used in the following Themes: +
Template Structure+ +The dri2xhtml base template consists of five main XSLTs: +
dri2xhtml-alt+ +The dri2xhtml-alt base template is an alternative template for creating XMLUI themes. It contains the same XSLT templates from dri2xhtml, but they are divided into multiple files and folders. Each file attempts to group XSLT templates together based on their function, in order to make it easier to find the templates related to the feature you're trying to modify. + +The dri2xhtml-alt base template is used in the following Themes: +
Configuration and Installation+ +The alternative base template is called "dri2xhtml-alt".
+ +<xsl:stylesheet xmlns:i18n="http://apache.org/cocoon/i18n/2.1" + xmlns:dri="http://di.tamu.edu/DRI/1.0/" + xmlns:mets="http://www.loc.gov/METS/" + xmlns:xlink="http://www.w3.org/TR/xlink/" + xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" + xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" + xmlns:xhtml="http://www.w3.org/1999/xhtml" + xmlns:mods="http://www.loc.gov/mods/v3" + xmlns:dc="http://purl.org/dc/elements/1.1/" + xmlns="http://www.w3.org/1999/xhtml" + exclude-result-prefixes="i18n dri mets xlink xsl dim xhtml mods dc"> + + <!-- + comment out original dri2xhtml + <xsl:import href="../dri2xhtml.xsl"/> + and enable dri2xhtml-alt + --> + + <xsl:import href="../dri2xhtml-alt/dri2xhtml.xsl"/> + + <xsl:output indent="yes"/> + ++ Because the contents of dri2xhtml-alt is identical to the current dri2xhtml.xsl and its derivatives, updating any of the existing themes to reference the new dri2xhtml-alt should not impose any changes in the rendering of the pages. + + +Features+ +
Template Structure+ +
+ +/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes/dri2xhtml-alt/ + +├── aspect +│ ├── administrative +│ │ └── harvesting.xsl +│ ├── artifactbrowser +│ │ ├── COinS.xsl +│ │ ├── ORE.xsl +│ │ ├── artifactbrowser.xsl +│ │ ├── collection-list.xsl +│ │ ├── collection-view.xsl +│ │ ├── common.xsl +│ │ ├── community-list.xsl +│ │ ├── community-view.xsl +│ │ ├── item-list.xsl +│ │ └── item-view.xsl +│ └── general +│ └── choice-authority-control.xsl +├── core +│ ├── attribute-handlers.xsl +│ ├── elements.xsl +│ ├── forms.xsl +│ ├── global-variables.xsl +│ ├── navigation.xsl +│ ├── page-structure.xsl +│ └── utils.xsl +└── dri2xhtml.xsl ++ |
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
+
+ DSpace Documentation : XMLUI Configuration and Customization
+
+
+
+ This page last changed on Feb 17, 2011 by helix84.
+
+
+ DSpace System Documentation: Manakin [XMLUI] Configuration and Customization+ +The DSpace digital repository supports two user interfaces: one based on JavaServer Pages (JSP) technologies and one based upon the Apache Cocoon framework (XMLUI). This chapter describes those parameters which are specific to the Manakin (XMLUI) interface based upon the Cocoon framework. + +
+
+
+
Manakin Configuration Property Keys+ +In an effort to save the programmer/administrator some time, the configuration table below is taken from 5.3.43. XMLUI Specific Configuration. +
+
+
+
+
+
Configuring Themes and Aspects+ +The Manakin user interface is composed of two distinct components: aspects and themes. Manakin aspects are like extensions or plugins for Manakin; they are interactive components that modify existing features or provide new features for the digital repository. Manakin themes stylize the look-and-feel of the repository, community, or collection. + +The repository administrator is able to define which aspects and themes are installed for the particular repository by editing the [dspace]/config/xmlui.xconf configuration file. The xmlui.xconf file consists of two major sections: Aspects and Themes. + +Aspects+ +The <aspects> section defines the "Aspect Chain", or the linear set of aspects that are installed in the repository. Each installed aspect makes new features available in the interface. For example, if the "submission" aspect were to be commented out or removed from the xmlui.xconf, then users would not be able to submit new items into the repository (even the links and language prompting users to submit items are removed). Each <aspect> element has two attributes, name and path. The name is used to identify the Aspect, while the path determines the directory where the aspect's code is located. Here is the default aspect configuration: + 
+ + <aspects> + <aspect name="Artifact Browser" path="resource://aspects/ArtifactBrowser/" /> + <aspect name="Administration" path="resource://aspects/Administrative/" /> + <aspect name="E-Person" path="resource://aspects/EPerson/" /> + <aspect name="Submission and Workflow" path="resource://aspects/Submission/" /> + </aspects>+ A standard distribution of Manakin/DSpace includes four "core" aspects: +
Themes+ +The <themes> section defines a set of "rules" that determine where themes are installed in the repository. Each rule is processed in the order that it appears, and the first rule that matches determines the theme that is applied (so order is important). Each rule consists of a <theme> element with several possible attributes: + +
Multilingual Support+ +The XMLUI user interface supports multiple languages through the use of internationalization catalogues as defined by the Cocoon Internationalization Transformer. Each catalog contains the translation of all user-displayed strings into a particular language or variant. Each catalog is a single XML file whose name is based on the language it is intended for, as follows: + +messages_language_country_variant.xml + +messages_language_country.xml + +messages_language.xml + +messages.xml + +The interface will automatically determine which file to select based upon the user's browser and system configuration. For example, if the user's browser is set to Australian English, the system will first check whether messages_en_AU.xml is available. If this translation is not available, it will fall back to messages_en.xml, and finally, if that is not available, to messages.xml. + +Manakin supplies an English-only translation of the interface. In order to add other translations to the system, locate the [dspace-source]/dspace/modules/xmlui/src/main/webapp/i18n/ directory. By default this directory will be empty; to add additional translations, add alternative versions of the messages.xml file in specific language and country variants as needed for your installation. + +To set a language other than English as the default language for the repository's interface, simply name the translation catalogue for the new default language "messages.xml". + + +Creating a New Theme+ +Manakin themes stylize the look-and-feel of the repository, community, or collection and are distributed as self-contained packages. A Manakin/DSpace installation may have multiple themes installed and available to be used in different parts of the repository. The central component of a theme is the sitemap.xmap, which defines what resources are available to the theme such as XSL stylesheets, CSS stylesheets, images, or multimedia files.
+ + <global-variables> + <theme-path>[your theme's directory]</theme-path> + <theme-name>[your theme's name]</theme-name> + </global-variables>+ Update the theme's path to the name of the directory you created in step one, and set the theme's name (which is used only for documentation). [your theme's directory]/lib/style.css (the base style sheet used for all browsers) + +[your theme's directory]/lib/style-ie.css (a specific style sheet used for Internet Explorer)
Customizing the News Document+ +The XMLUI "news" document is only shown on the root page of your repository. It was intended to provide the title and introductory message, but you may use it for anything. + +The news document is located at [dspace]/config/news-xmlui.xml. There is only one version; it is localized by inserting "i18n" callouts into the text areas. It must be a complete and valid XML DRI document (see Chapter 15). + +The exact rendering of the news document in the XHTML UI depends, of course, on the theme. The default content is designed to operate with the reference themes, so when you modify it, be sure to preserve the tag structure and, for example, the exact attributes of the first DIV tag. Also note that the text is DRI, not HTML, so you must use only DRI tags, such as the XREF tag to construct a link. + +Example 1: a single language: + 
+ <document> + <body> + <div id="file.news.div.news" n="news" rend="primary"> + <head> TITLE OF YOUR REPOSITORY HERE </head> + <p> + INTRO MESSAGE HERE + Welcome to my wonderful repository etc etc ... + A service of <xref target="http://myuni.edu/">My University</xref> + </p> + </div> + </body> + <options/> + <meta> + <userMeta/> + <pageMeta/> + <repositoryMeta/> + </meta> + </document>+ Example 2: all text replaced by references to localizable message keys: +
+ +<document xmlns:i18n="http://apache.org/cocoon/i18n/2.1"> + <body> + <div id="file.news.div.news" n="news" rend="primary"> + <head><i18n:text>myuni.repo.title</i18n:text></head> + <p> + <i18n:text>myuni.repo.intro</i18n:text> + <i18n:text>myuni.repo.a.service.of</i18n:text> + <xref target="http://myuni.edu/"><i18n:text>myuni.name</i18n:text></xref> + </p> + </div> + </body> + <options/> + <meta> + <userMeta/> + <pageMeta/> + <repositoryMeta/> + </meta> + </document> ++ Adding Static Content+ +The XMLUI user interface supports the addition of globally static content (as well as static content within individual themes). + +Globally static content can be placed in the [dspace-source]/dspace/modules/xmlui/src/main/webapp/static/ directory. By default this directory only contains the default robots.txt file, which provides helpful site information to web spiders/crawlers. However, you may also add static HTML (*.html) content to this directory, as needed for your installation. + +Any static HTML content you add to this directory may also reference static content (e.g. CSS, Javascript, Images, etc.) from the same [dspace-source]/dspace/modules/xmlui/src/main/webapp/static/ directory. You may reference other static content from your static HTML files similar to the following: + +
+ + <link href="./static/mystyle.css" rel="stylesheet" type="text/css"/> + <img src="./static/images/static-image.gif" alt="Static image in /static/images/ directory"/> + <img src="./static/static-image.jpg" alt="Static image in /static/ directory"/>+ Enabling OAI-ORE Harvester using XMLUI+ +This section describes the steps needed to set up the OAI-ORE Harvester using Manakin. + +Setting up a collection (Collection Edit Screen): + +
Automatic Harvesting (Scheduler)+ +Setting up automatic harvesting in the Control Panel Screen. + +
Additional XMLUI Learning Resources+ +Useful links with further information on XMLUI development: + +
|
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +
+
![]() |
+
Document generated by Confluence on Mar 25, 2011 19:21 | +