diff --git a/dspace/docs/html/apa.html b/dspace/docs/html/apa.html index c058fecc56..9b4f90a75a 100644 --- a/dspace/docs/html/apa.html +++ b/dspace/docs/html/apa.html @@ -1,4 +1,4 @@ -
+
¹Used by system. DO NOT REMOVE
Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must have REMOVE permission on all objects (in this case, collection) that contain it. The 'orphaned' item is automatically deleted. Policies can apply to individual e-people or groups of e-people. Rather than being a single subsystem, ingesting is a process that spans several. Below is a simple illustration of the current ingesting process in DSpace.
DSpace Ingest Process
The batch item importer is an application that turns an external SIP (an XML metadata document with some content files) into an "in progress submission" object. The Web submission UI is similarly used by an end-user to assemble an "in progress submission" object. Depending on the policy of the collection to which the submission is targeted, a workflow process may be started. This typically allows one or more human reviewers or 'gatekeepers' to check over the submission and ensure it is suitable for inclusion in the collection. When the Batch Ingester or Web Submit UI completes the InProgressSubmission object and invokes the next stage of ingest (be that workflow or item installation), a provenance message is added to the Dublin Core which includes the filenames and checksums of the content of the submission. Likewise, each time a workflow changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows us to track how the item has changed since a user submitted it. Once any workflow process is successfully and positively completed, the InProgressSubmission object is consumed by an "item installer", which converts the InProgressSubmission into a full-blown archived item in DSpace. The item installer:
A collection's workflow can have up to three steps. Each collection may have an associated e-person group for performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection are installed straight into the main archive. In other words, the sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to those steps. When a step is invoked, the task of performing that workflow step is put in the 'task pool' of the associated group. One member of that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it. The member of the group who has taken the task from the pool may then perform one of three actions:
It is important to remember that there are two
The runtime file is supposed to be the copy of the source file, which is considered the master version. However, the DSpace server and command programs only look at the runtime configuration file, so when you are revising your configuration values, it is tempting to only edit the runtime file. DO NOT do this. Always make the same changes to the source version of To keep the two files in synchronization, you can edit your files in cd /[dspace-source]/dspace/target/dspace-<version>-build.dir ant update_configs - This will copy the source
The primary way of configuring DSpace is to edit the
Normalization Rules are those rules that make it possible for the indexes to intermix entries without regard to case sensitivity. By default, the display of metadata in the browse indexes is case-sensitive. In the example below, you retrieve separate entries:
However, clicking through from either of these will result in the same set of items (i.e., any item that contains either representation in the correct field).
At the present time, you would need to edit your metadata to clean up the index presentation. We set other browse values in the following section.
Replace Now that we know which field is our author or other multiple metadata value field we can provide the option to truncate the number of values displayed by default. We replace the remaining list of values with "et al" or the language pack specific alternative. Note that this is just for the default, and users will have the option of changing the number displayed when they browse the results. See the following table:
We can define which fields link to other browse listings. This is useful, for example, to link an author's name to a list of just that author's items. The effect this has is to create links to browse views for the item clicked on. If it is a "single" type, it will link to a view of all the items which share that metadata element in common (i.e. all the papers by a single author). If it is a "full" type, it will link to a view of the standard full browse page, starting with the value of the link clicked on.
The format of the property key is
+
webui.browse.link.n | n is an arbitrary number you choose |
<index name> | This needs to match your entry for the index name from the webui.browse.index property key. |
<display column metadata> | Use the DC element (and qualifier) |
Examples of some browse links used in a real DSpace installation instance:
webui.browse.link.1 = author:dc.contributor.* Creates a link for all types of contributors (authors, editors, illustrators, others, etc.) | webui.browse.link.2 = subject:dc.subject.lcsh Creates a link to subjects that are Library of Congress only. In this case, you have a browse index that contains only LC Subject Headings |
webui.browse.link.3 = series:dc.relation.ispartofseries Creates a link for the browse index "Series". Please note that this is, again, a customized browse index and not part of the DSpace distributed release. |
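The browse link examples above all follow one pattern: a numbered key, then an index name and a metadata field separated by a colon. The following is a minimal Python sketch of how such a line could be split into its parts. It is only an illustration, not how DSpace itself reads the configuration; the function name `parse_browse_link` is hypothetical, and the format is inferred from the examples shown here.

```python
PREFIX = "webui.browse.link."


def parse_browse_link(line):
    """Split one browse link property line into (n, index name, metadata).

    Assumed format, inferred from the examples above:
        webui.browse.link.<n> = <index name>:<display column metadata>
    """
    key, _, value = line.partition("=")
    key = key.strip()
    if not key.startswith(PREFIX):
        raise ValueError("not a browse link property: %r" % key)
    n = int(key[len(PREFIX):])  # the arbitrary number chosen by the admin
    index_name, sep, metadata = value.strip().partition(":")
    if not sep or not index_name or not metadata:
        raise ValueError("expected <index name>:<metadata>, got %r" % value)
    return n, index_name, metadata


if __name__ == "__main__":
    for example in (
        "webui.browse.link.1 = author:dc.contributor.*",
        "webui.browse.link.2 = subject:dc.subject.lcsh",
        "webui.browse.link.3 = series:dc.relation.ispartofseries",
    ):
        print(parse_browse_link(example))
```

A check like this can be handy when reviewing a hand-edited configuration file, since a missing colon or a misspelled key prefix raises an error immediately.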
This allows us to define which index to base Recent Submission display on, and how many we should show at any one time. This uses the PluginManager to automatically load the relevant plugin for the Community and Collection home pages. Values given in examples are the defaults supplied in dspace.cfg
Property: | recent.submission.sort-option |
Example Value: | recent.submission.sort-option = dateaccessioned |
Informational Note: | Defines the sort name (from webui.browse.sort-options) to use for displaying recent submissions. |
Property: | recent.submissions.count |
Example Value: | recent.submissions.count = 5 |
Informational Note: | Defines how many recent submissions should be displayed at any one time. |
The processors that the PluginManager loads to perform the recent submissions query on the relevant pages must also be set up. This is already configured by default in dspace.cfg, so there should be no need for the administrator/programmer to worry about this.
plugin.sequence.org.dspace.plugin.CommunityHomeProcessor = \
    org.dspace.app.webui.components.RecentCommunitySubmissions
plugin.sequence.org.dspace.plugin.CollectionHomeProcessor = \
    org.dspace.app.webui.components.RecentCollectionSubmissions
-
This will enable syndication feeds—links display on community and collection home pages. This setting is not used by the XMLUI, as you enable feeds in your theme.
Property: | webui.feed.enable |
Example Value: | webui.feed.enable = false |
Informational Note: | By default, RSS feeds are set to false (off). Change key to "true" to enable. |
Property: | webui.feed.items |
Example Value: | webui.feed.items = 4 |
Informational Note: | Defines the number of DSpace items per feed (the most recent submissions). |
Property: | webui.feed.cache.size |
Example Value: | webui.feed.cache.size = 100 |
Informational Note: | Defines the maximum number of feeds in memory cache. A value of "0" will disable caching. |
Property: | webui.feed.cache.age |
Example Value: | webui.feed.cache.age = 48 |
Informational Note: | Defines the number of hours to keep cached feeds before checking currency. A value of "0" will force a check with each request. |
Property: | webui.feed.formats |
Example Value: | webui.feed.formats = rss_1.0,rss_2.0,atom_1.0 |
Informational Note: | Defines which syndication formats to offer. You can use more than one; use a comma-separated list. The following values are available: rss_0.90, rss_0.91, rss_0.92, rss_0.93, rss_0.94, rss_1.0, rss_2.0, atom_1.0. |
Property: | webui.feed.localresolve |
Example Value: | webui.feed.localresolve = false |
Informational Note: | By default (set to false), URLs returned by the feed will point at the global handle resolver (e.g. http://hdl.handle.net/123456789/1). If set to true, the local server URLs are used (e.g. http://myserver.myorg/handle/123456789/1). |
Property: |
-webui.feed.item.title + This will enable syndication feeds—links display on community and collection home pages. This setting is not used by the XMLUI, as you enable feeds in your theme.
The following configuration is used to change the disposition behavior of the browser, that is, whether the browser will attempt to open the file or download it to the user's specified location. For example, the default size is 8MB. When an item being viewed is larger than 8MB, the browser will download the file to the desktop (or wherever you have it set to download) and the user will have to open it manually.
Other values are possible:
The setting is used to configure the "depth" of request for html documents bearing the same name.
To help web crawlers index the content within your repository, you can make use of sitemaps.
The following section is limited to JSPUI. If the user wishes to use XMLUI settings, please refer to Chapter 7: XMLUI Configuration and Customization.
Property: | webui.supported.locales |
Example Value: | webui.supported.locales = en, de or perhaps webui.supported.locales = en, en_ca, de |
Informational Note: | All the locales that are supported by this instance of DSpace. Comma separated list. |
When set, the property in the table above results in:
a language switch in the default header
the user will be able to choose his/her preferred language; this will be part of his/her profile
wording of emails
mails to registered users (e.g. the alerting service) will use the preferred language of the user
mails to unregistered users (e.g. suggest an item) will use the language of the session
according to the language selected for the session, using dspace-admin Edit News will edit the news file for that language
If you set webui.supported.locales, make sure that all the related additional files for each language are available. LOCALE should correspond to the locale set in webui.supported.locales, e.g. for webui.supported.locales = en, de, fr, there should be:
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages.properties
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_en.properties
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_de.properties
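The naming pattern in the list above (a default Messages.properties plus one Messages_LOCALE.properties per supported locale) can be sketched in Python as follows. This is an illustrative helper only: the function name `expected_message_files` is hypothetical, and the sketch covers just the message catalogs, not the other per-locale files.

```python
RESOURCES = "[dspace-source]/dspace/modules/jspui/src/main/resources"


def expected_message_files(supported_locales, base_dir=RESOURCES):
    """List the message catalogs implied by a webui.supported.locales value.

    supported_locales is the raw comma-separated property value,
    for example "en, de, fr".
    """
    locales = [loc.strip() for loc in supported_locales.split(",") if loc.strip()]
    # The default catalog is always expected, followed by one file per locale.
    files = [base_dir + "/Messages.properties"]
    files += ["%s/Messages_%s.properties" % (base_dir, loc) for loc in locales]
    return files


if __name__ == "__main__":
    for path in expected_message_files("en, de, fr"):
        print(path)
```

Comparing this list against the files actually present in the source tree is one way to spot a locale that was added to the property without its message catalog.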
@@ -1147,10 +1151,10 @@ webui.itemlist.<sort or index name>.columns
[dspace]/webapps/jspui/help/site-admin_LOCALE.html
must be copied to [dspace-source]/dspace/modules/jspui/src/main/webapp/help
-
Because the item mapper requires a primitive implementation of the browse system to be present, we simply need to tell that system which of our indexes defines the author browse (or equivalent) so that the mapper can list authors' items for mapping
Define the index name (from webui.browse.index) to use for displaying items by author.
Property: | itemmap.author.index |
Example Value: | itemmap.author.index = author |
Informational Note: | If you change the name of your author browse field, you will also need to update this property key. |
Property: | webui.mydspace.showgroupmembership |
Example Value: | webui.mydspace.showgroupmembership = false |
Informational Note: | To display group membership set to "true". If omitted, the default behavior is false. |
SFX Server is an OpenURL Resolver.
Property: | sfx.server.url |
Example Value: | sfx.server.url = http://sfx.myu.edu:8888/sfx? |
Informational Note: | SFX query is appended to this URL. If this property is commented out or omitted, SFX support is switched off. |
Property: | webui.suggest.enable |
Example Value: | webui.suggest.enable = true |
Informational Note: | Show a link to the item recommendation page from item display page. |
Property: | webui.suggest.loggedinusers.only |
Example Value: | webui.suggest.loggedinusers.only = true |
Informational Note: | Enable only if the user is logged in. If this key is commented out, the default value is false. |
DSpace now supports controlled vocabularies to confine the set of keywords that users can use while describing items.
Property: | webui.controlledvocabulary.enable |
Example Value: | webui.controlledvocabulary.enable = true |
Informational Note: | Enable or disable the controlled vocabulary add-on. WARNING: This feature is not compatible with WAI (it requires JavaScript to function). |
The need for a limited set of keywords is important since it eliminates the ambiguity of a free description system, consequently simplifying the task of finding specific items of information.
The controlled vocabulary add-on allows the user to choose from a defined set of keywords organized in a tree (taxonomy) and then use these keywords to describe items while they are being submitted.
We have also developed a small search engine that displays the classification tree (or taxonomy) allowing the user to select the branches that best describe the information that he/she seeks.
The taxonomies are described in XML following this (very simple) structure:
+
<node id="acmccs98" label="ACMCCS98"> - <isComposedBy> + <isComposedBy> <node id="A." label="General Literature"> <isComposedBy> <node id="A.0" label="GENERAL"/> @@ -1175,14 +1179,26 @@ webui.itemlist.<sort or index name>.columns<required></required> <vocabulary [closed="false"]>nsi</vocabulary> </field> -
The vocabulary element has an optional boolean attribute closed that can be used to force input only with the JavaScript of the controlled-vocabulary add-on. The default behavior (i.e. without this attribute) is the same as setting closed="false". This also allows the user to enter a value freely.
The following vocabularies are currently available by default:
nsi - nsi.xml - The Norwegian Science Index
srsc - srsc.xml - Swedish Research Subject Categories
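The taxonomy files use the nested node / isComposedBy structure described earlier in this section. As a rough illustration of how that tree can be read, here is a minimal Python sketch that walks a small sample and prints the label path of every node. The sample XML is a shortened fragment based on the ACMCCS98 example, and the "::" path separator is an assumption made for this sketch, not something this section specifies.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the structure shown in this section.
SAMPLE = """
<node id="acmccs98" label="ACMCCS98">
  <isComposedBy>
    <node id="A." label="General Literature">
      <isComposedBy>
        <node id="A.0" label="GENERAL"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>
"""


def walk(node, path=()):
    """Yield the label path of this node, then of every descendant node."""
    current = path + (node.get("label"),)
    yield "::".join(current)  # "::" is an assumed separator
    composed = node.find("isComposedBy")
    if composed is not None:
        for child in composed.findall("node"):
            for label_path in walk(child, current):
                yield label_path


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    for label_path in walk(root):
        print(label_path)
```

Running the same walk over a full vocabulary file such as nsi.xml or srsc.xml should list every keyword with its position in the tree, assuming those files follow the structure shown above.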
The DSpace digital repository supports two user interfaces: one based upon JSP technologies and the other based upon the Apache Cocoon framework. This section describes those configuration settings which are specific to the XMLUI interface based upon the Cocoon framework. (Prior to DSpace Release 1.5.1, XMLUI was referred to as Manakin. You may still see references to "Manakin".)
Property: | xmlui.supported.locales |
Example Value: | xmlui.supported.locales = en, de |
Informational Note: | A list of supported locales for Manakin. Manakin will look at a user's browser configuration for the first language that appears in this list and make it available in the interface. This parameter is a comma-separated list of locales. All types of locale are supported: language, language_country, language_country_variant. Note that if the appropriate files are not present (i.e. Messages_XX_XX.xml) then Manakin will fall back to a more general language. |
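The fallback described above can be pictured with a small sketch. This is not Manakin's own code: the class name, the helper, and the unsuffixed default file name `Messages.xml` are assumptions made only to illustrate how a specific locale degrades to a more general one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Manakin's real lookup may differ.
class LocaleFallbackSketch {

    // Lists candidate message files from the most specific locale to the most general.
    static List<String> candidateFiles(String locale) {
        List<String> files = new ArrayList<>();
        String current = locale.trim();
        while (!current.isEmpty()) {
            files.add("Messages_" + current + ".xml");
            // Drop the last "_part" to obtain the next, more general locale.
            int cut = current.lastIndexOf('_');
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        files.add("Messages.xml"); // assumed name of the default catalogue
        return files;
    }

    public static void main(String[] args) {
        // Same comma-separated shape as xmlui.supported.locales.
        for (String locale : "en, de_AT".split(",")) {
            System.out.println(locale.trim() + " -> " + candidateFiles(locale));
        }
    }
}
```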
Property: | xmlui.force.ssl |
Example Value: | xmlui.force.ssl = true |
Informational Note: | Force all authenticated connections to use SSL; only non-authenticated connections are allowed over plain http. If set to true, then you need to ensure that the 'dspace.hostname' parameter is set correctly. |
Property: | xmlui.user.registration |
Example Value: | xmlui.user.registration = true |
Informational Note: | Determine if new users should be allowed to register. This parameter is useful in conjunction with Shibboleth where you want to disallow registration because Shibboleth will automatically register the user. Default value is true. |
Property: | xmlui.user.editmetadata |
Example Value: | xmlui.user.editmetadata = true |
Informational Note: | Determines if users should be able to edit their own metadata. This parameter is useful in conjunction with Shibboleth where you want to disable the user's ability to edit their metadata because it came from Shibboleth. Default value is true. |
Property: | xmlui.user.assumelogon |
Example Value: | xmlui.user.assumelogon = true |
Informational Note: | Determine if super administrators (those who are in the Administrators group) can log in as another user from the "edit eperson" page. This is useful for debugging problems in a running DSpace instance, especially in the workflow process. The default value is false, i.e., no one may assume the login of another user. |
Property: | xmlui.user.loginredirect |
Example Value: | xmlui.user.loginredirect = /profile |
Informational Note: | After a user has logged into the system, to which URL should they be directed? Leave this parameter blank or undefined to direct users to the homepage, or use /profile for the user's profile. Another reasonable choice is /submissions, to see if the user has any tasks awaiting their attention. The default is the repository home page. |
Property: | xmlui.theme.allowoverrides |
Example Value: | xmlui.theme.allowoverrides = false |
Informational Note: | Allow the user to override which theme is used to display a particular page. When submitting a request, add the HTTP parameter "themepath", which corresponds to a particular theme; that theme will be used instead of any other configured theme. Note that this is a potential security hole allowing execution of unintended code on the server. This option is only for development and debugging and should be turned off for any production repository. The default value, unless otherwise specified, is "false". |
Property: | xmlui.bundle.upload |
Example Value: | xmlui.bundle.upload = ORIGINAL, METADATA, THUMBNAIL, LICENSE, CC_LICENSE |
Informational Note: | Determine which bundles administrators and collection administrators may upload into an existing item through the administrative interface. If the user does not have the appropriate privileges (add and write) on the bundle then that bundle will not be shown to the user as an option. |
Property: | xmlui.community-list.render.full |
Example Value: | xmlui.community-list.render.full = true |
Informational Note: | Determines whether all the metadata about a community/collection should be available to the theme on the community-list page. This parameter defaults to true, but if you are experiencing performance problems on the community-list page you should experiment with turning this option off. |
Property: | xmlui.community-list.cache |
Example Value: | xmlui.community-list.cache = 12 hours |
Informational Note: | Normally, Manakin will fully verify any cached pages before using a cached copy. This means that when the community-list page is viewed the database is queried for each community/collection to see if its metadata has been modified. This can be expensive for repositories with a large community tree. To help solve this problem you can set the cache to be assumed valid for a specific period of time. The downside is that new or edited communities/collections may not show up on the website for a period of time. |
Property: | xmlui.bitstream.mods |
Example Value: | xmlui.bitstream.mods = true |
Informational Note: | Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The MODS metadata file must be inside the "METADATA" bundle and named MODS.xml. If this option is set to 'true' and the bitstream is present then it is made available to the theme for display. |
Property: | xmlui.bitstream.mets |
Example Value: | xmlui.bitstream.mets = true |
Informational Note: | Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The METS metadata file must be inside the "METADATA" bundle and named METS.xml. If this option is set to "true" and the bitstream is present then it is made available to the theme for display. |
Property: | xmlui.google.analytics.key |
Example Value: | xmlui.google.analytics.key = UA-XXXXXX-X |
Informational Note: | If you would like to use Google Analytics to track general website statistics then use the following parameter to provide your analytics key. First sign up for an account at http://analytics.google.com, then create an entry for your repository's website. Google Analytics will give you a snippet of javascript code to place on your site. Inside that snippet is your Google Analytics key, usually found in the line: _uacct = "UA-XXXXXXX-X". Take this key (just the UA-XXXXXX-X part) and place it here in this parameter. |
Property: | xmlui.controlpanel.activity.max |
Example Value: | xmlui.controlpanel.activity.max = 250 |
Informational Note: | Assign how many page views will be recorded and displayed in the control panel's activity viewer. The activity tab allows an administrator to debug problems in a running DSpace by understanding who is currently using it and how. The default value is 250. |
Property: | xmlui.controlpanel.activity.ipheader |
Example Value: | xmlui.controlpanel.activity.ipheader = X-Forwarded-For |
Informational Note: | Determine where the control panel's activity viewer receives an event's IP address from. If your DSpace is in a load-balanced environment or otherwise behind a context-switch then you will need to set the parameter to the HTTP header that records the original IP address. |
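As a rough sketch of why this setting matters: behind a load balancer the connection's remote address is the balancer itself, so the original client address has to be read from a forwarding header. The class and method below are hypothetical and are not the control panel's implementation; they assume the header carries a comma-separated list with the original client first, which is the usual X-Forwarded-For convention.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not DSpace code.
class ClientIpSketch {

    // Returns the first address in the configured header, or the socket address if the header is unusable.
    // Note: real HTTP header names are case-insensitive; this plain Map lookup is not.
    static String clientIp(Map<String, String> headers, String ipHeader, String remoteAddr) {
        String value = (ipHeader == null) ? null : headers.get(ipHeader);
        if (value == null) {
            return remoteAddr;
        }
        int comma = value.indexOf(',');
        String first = (comma < 0 ? value : value.substring(0, comma)).trim();
        return first.isEmpty() ? remoteAddr : first;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-Forwarded-For", "203.0.113.7, 10.0.0.2");
        // 10.0.0.2 stands in for the load balancer's own address.
        System.out.println(clientIp(headers, "X-Forwarded-For", "10.0.0.2"));
    }
}
```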
In the following sections, you will learn how to configure OAI-PMH and activate additional OAI-PMH crosswalks. The user is also referred to 9.2 OAI-PMH Data Provider for more detail on the program.
Property: | oai.didl.maxresponse |
Example Value: | oai.didl.maxresponse = 0 |
Informational Note: |
DSpace comes with an unqualified DC Crosswalk used in the default OAI-PMH data provider. There are also other Crosswalks bundled with the DSpace distribution which can be activated by editing one or more configuration files. How to do this for each available Crosswalk is described below. The DSpace source includes the following crosswalk plugins available for use with OAI-PMH:
mets
- The manifest document from a DSpace METS SIP.
mods
- MODS metadata, produced by the table-driven MODS dissemination crosswalk.
qdc
- Qualified Dublin Core, produced by the configurable QDC crosswalk. Note that this QDC does not include all of the DSpace "dublin core" metadata fields, since the XML standard for QDC is defined for a different set of elements and qualifiers.
OAI-PMH crosswalks based on Crosswalk Plugins are activated as follows:
Ensure the crosswalk plugin has a lower-case name (possibly in addition to its upper-case name) in the plugin configuration.
Add a line to the file config/templates/oaicat.properties of the form: Crosswalks.plugin_name=org.dspace.app.oai.PluginCrosswalk
-
substituting the plugin's name, e.g. "mets" or "qdc", for plugin_name.
Run the bin/install-configs script
Restart your servlet container, e.g. Tomcat, for the change to take effect.
By activating the DIDL provider, DSpace items are represented as MPEG-21 DIDL objects. These DIDL objects are XML documents that wrap both the Dublin Core metadata that describes the DSpace item and its actual bitstreams. A bitstream is provided inline in the DIDL object in a base64 encoded manner, and/or by means of a pointer to the bitstream. The data provider exposes DIDL objects via the metadataPrefix didl.
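To make "provided inline in a base64 encoded manner" concrete, here is a minimal sketch using the JDK's standard `java.util.Base64`. It only illustrates the encoding step. The class name is hypothetical and this is not the DIDL crosswalk's own code, whose details are not shown in this document.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; not the DSpace DIDL crosswalk.
class InlineBitstreamSketch {

    // Turns raw bitstream bytes into text that can be embedded in an XML document.
    static String encode(byte[] bitstream) {
        return Base64.getEncoder().encodeToString(bitstream);
    }

    // Recovers the original bytes from the inline text.
    static byte[] decode(String inline) {
        return Base64.getDecoder().decode(inline);
    }

    public static void main(String[] args) {
        byte[] content = "DSpace".getBytes(StandardCharsets.UTF_8);
        String inline = encode(content);
        System.out.println(inline);
        System.out.println(new String(decode(inline), StandardCharsets.UTF_8));
    }
}
```

Base64 text is roughly a third larger than the bytes it encodes, which is one reason a harvester might prefer the pointer to the bitstream for large files.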
The crosswalk does not deal with special characters and purposely skips dissemination of the license.txt file, awaiting a better understanding of how to map DSpace rights information to MPEG-21 DIDL.
The DIDL Crosswalk can be activated as follows:
Uncomment the oai.didl.maxresponse item in dspace.cfg
Uncomment the DIDL Crosswalk entry from the config/templates/oaicat.properties file
Run the bin/install-configs script
Restart Tomcat
Verify the Crosswalk is activated by accessing a URL such as http://mydspace/oai/request?verb=ListRecords&metadataPrefix=didl
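The verification URL can also be assembled programmatically, for example in a monitoring script. The sketch below only builds the request URI; the class name is hypothetical and `http://mydspace` is the placeholder host used in the step above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only.
class OaiRequestSketch {

    // Builds an OAI-PMH ListRecords request for the given metadata prefix.
    static URI listRecords(String baseUrl, String metadataPrefix) {
        String prefix = URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/oai/request?verb=ListRecords&metadataPrefix=" + prefix);
    }

    public static void main(String[] args) {
        System.out.println(listRecords("http://mydspace", "didl"));
    }
}
```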
Property: | core.authorization.community-admin.create-subelement |
Example Value: | core.authorization.community-admin.create-subelement = true |
Informational Note: |
This section describes the parameters used in configuring the OAI-ORE harvester. There are many possible configuration options for the OAI harvester. Most of them are technical and therefore omitted from the dspace.cfg file itself, using hard-coded defaults instead. However, should you wish to modify those values, including them in
The following configurations allow the administrator to extract a set of records from the DSpace database for editing, by means of a metadata export. This provides an easier way of editing large collections.
It is now possible to hide metadata from public view so that it is available only to the Administrator.
The following section explains how to configure either optional features or advanced features that are not necessary to make DSpace run "out-of-the-box".
The [dspace]/config/registries directory contains three XML files. These are used to load the initial contents of the Dublin Core Metadata registry, the Bitstream Format registry and the SWORD metadata registry. After the initial loading (performed by ant fresh_install above), the registries reside in the database; the XML files are not updated.
In order to change the registries, you may adjust the XML files before the first installation of DSpace. On an already running instance it is recommended to change bitstream registries via the DSpace admin UI, but the metadata registries can be loaded again at any time from the XML files without difficulty. The changes made via the admin UI are not reflected in the XML files.
The default metadata schema is Dublin Core, so DSpace is distributed with a default Dublin Core Metadata Registry. Currently, the system requires that every item have a Dublin Core record.
There is a set of Dublin Core Elements, which is used by the system and should not be removed or moved to another schema, see Appendix: Default Dublin Core Metadata registry.
Note: altering a Metadata Registry has no effect on corresponding parts, e.g. item submission interface, item display, item import and vice versa. Every metadata element used in submission interface or item import must be registered before using it.
Note also that deleting a metadata element will delete all its corresponding values.
If you wish to add more metadata elements, you can do this in one of two ways. Via the DSpace admin UI you may define new metadata elements in the different available schemas. But you may also modify the XML file (or provide an additional one), and re-import the data as follows:
[dspace]/bin/dsrun org.dspace.administer.MetadataImporter -f [xml file]
The XML file should be structured as follows:
<dspace-dc-types> <dc-type> <schema>dc</schema> <element>contributor</element> <qualifier>advisor</qualifier> <scope_note>Use primarily for thesis advisor.</scope_note> </dc-type> -</dspace-dc-types>
+</dspace-dc-types>
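Before re-importing an edited registry file, it can help to check which fields it declares. The sketch below reads the structure shown above with the JDK's standard DOM parser and prints each entry as schema.element.qualifier. The class and its helpers are hypothetical; this is not the MetadataImporter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not DSpace's MetadataImporter.
class RegistryFileSketch {

    // Lists every dc-type entry as "schema.element" or "schema.element.qualifier".
    static List<String> fieldNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList types = doc.getElementsByTagName("dc-type");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                String name = text(type, "schema") + "." + text(type, "element");
                String qualifier = text(type, "qualifier");
                if (!qualifier.isEmpty()) {
                    name += "." + qualifier;
                }
                names.add(name);
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable registry file", e);
        }
    }

    // Text of the first child element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<dspace-dc-types><dc-type><schema>dc</schema>"
                + "<element>contributor</element><qualifier>advisor</qualifier>"
                + "<scope_note>Use primarily for thesis advisor.</scope_note>"
                + "</dc-type></dspace-dc-types>";
        System.out.println(fieldNames(xml));
    }
}
```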
The bitstream formats recognized by the system and levels of support are similarly stored in the bitstream format registry. This can also be edited at install-time via [dspace]/config/registries/bitstream-formats.xml or by the administration Web UI. The contents of the bitstream format registry are entirely up to you, though the system requires that the following two formats are present:
Unknown
License
-
Deleting a format will cause any existing bitstreams of this format to be reverted to the unknown bitstream format.
These filters handle PDF resources with more sophisticated tools that can produce thumbnail images of PDF and 3D PDF files, and do much faster (and more complete) text extraction as well.
The following instructions are the "quick and dirty" method for installing the XPDF filter. They do not address any changes that may be needed for Maven modules or POMs, nor do they take into account maintaining or upgrading the DSpace installation.
Obtain a source distribution of DSpace 1.6, configure, and build it.
Edit configuration lines in dspace.cfg
Add -Pxpdf-mediafilter-support to maven build
Build and install.
First, download the XPDF suite found at: http://www.foolabs.com/xpdf and install it on your server under /usr/local/bin.
The only tools you really need are:
pdftoppm
pdfinfo
pdftotext
The user will be using Java™ Advanced Imaging Image I/O Tools.
For AIX, Sun support has the following: "JAI has native acceleration for the above but it also works in pure Java mode. So as long as you have an appropriate JDK for AIX (1.3 or later, I believe), you should be able to use it. You can download any of them, extract just the jars, and put those in your $CLASSPATH."
Download the jai_imageio library version 1.0_01 or 1.1 found at https://jai-imageio.dev.java.net/binary-builds.html#Stable_builds .
Install it in your local Maven repository with the command (assuming the library is installed locally at /opt/facade/lib/jai_imageio.jar):
- mvn install:install-file \ - -Dfile=/opt/facade/lib/jai_imageio.jar \ - -DgroupId=com.sun.media \ - -DartifactId=jai_imageio \ - -Dversion=1.0_01 \ - -Dpackaging=jar \ - -DgeneratePom=true -
First, be sure there is a value for thumbnail.maxwidth and that it corresponds to the size you want for preview images for the UI, e.g.: (NOTE: this code doesn't pay any attention to thumbnail.maxheight but it's best to set it too so the other thumbnail filters make square images.)
+
Deleting a format will cause any existing bitstreams of this format to be reverted to the unknown bitstream format.
This is an alternative suite of MediaFilter plugins that offers faster and more reliable text extraction from PDF Bitstreams, as well as thumbnail image generation. It replaces the built-in default PDF MediaFilter.
If this filter is so much better, why isn't it the default? The answer is that it relies on external executable programs which must be obtained and installed for your server platform. This would add too much complexity to the installation process, so it is left out as an optional "extra" step.
Here are the steps required to install and configure the filters:
Install the xpdf tools for your platform, from the downloads at http://www.foolabs.com/xpdf
Acquire the Sun Java Advanced Imaging Tools and create a local Maven package.
Edit DSpace configuration properties to add location of xpdf executables, reconfigure MediaFilter plugins.
Build and install DSpace, adding -Pxpdf-mediafilter-support to Maven invocation.
First, download the XPDF suite found at: http://www.foolabs.com/xpdf and install it on your server. The executables can be located anywhere, but make a note of the full path to each command.
You may be able to download a binary distribution for your platform, which simplifies installation. Xpdf is readily available for Linux, Solaris, MacOSX, Windows, NetBSD, HP-UX, AIX, and OpenVMS, and is reported to work on AIX, OS/2, and many other systems.
The only tools you really need are:
pdfinfo
- displays properties and Info dict
pdftotext
- extracts text from PDF
pdftoppm
- images PDF for thumbnails
Fetch and install the Java™ Advanced Imaging Image I/O Tools.
For AIX, Sun support has the following: "JAI has native acceleration for the above but it also works in pure Java mode. So as long as you have an appropriate JDK for AIX (1.3 or later, I believe), you should be able to use it. You can download any of them, extract just the jars, and put those in your $CLASSPATH."
Download the jai_imageio
library version 1.0_01 or 1.1 found at: https://jai-imageio.dev.java.net/binary-builds.html#Stable_builds .
For these filters you do NOT have to worry about the native code, just the JAR, so choose a download for any platform.
+curl -O http://download.java.net/media/jai-imageio/builds/release/1.1/jai_imageio-1_1-lib-linux-i586.tar.gz +tar xzf jai_imageio-1_1-lib-linux-i586.tar.gz +
The preceding example leaves the JAR in jai_imageio-1_1/lib/jai_imageio.jar. Now install it in your local Maven repository, e.g. (changing the path after file= if necessary):
+ mvn install:install-file \ + -Dfile=jai_imageio-1_1/lib/jai_imageio.jar \ + -DgroupId=com.sun.media \ + -DartifactId=jai_imageio \ + -Dversion=1.0_01 \ + -Dpackaging=jar \ + -DgeneratePom=true +
You may have to repeat this procedure for the jai_core.jar
library, as well, if it is not available in any of the public Maven repositories. Once acquired, this command installs it locally:
+mvn install:install-file -Dfile=jai_core-1.1.2_01.jar \ + -DgroupId=javax.media -DartifactId=jai_core -Dversion=1.1.2_01 -Dpackaging=jar -DgeneratePom=true +
First, be sure there is a value for thumbnail.maxwidth
and that it corresponds to the size you want for preview images for the UI, e.g.: (NOTE: this code doesn't pay any attention to thumbnail.maxheight
but it's best to set it too so the other thumbnail filters make square images.)
# maximum width and height of generated thumbnails thumbnail.maxwidth 300 thumbnail.maxheight 300 -
Now, add the absolute paths of the XPDF tools you installed:
- xpdf.path.pdftotext = /var/local/bin/pdftotext - xpdf.path.pdftoppm = /var/local/bin/pdftoppm - xpdf.path.pdfinfo = /var/local/bin/pdfinfo -
Also be sure the mediafilter configuration includes the new filters, e.g.: (New sections are in bold)
+
Now, add the absolute paths to the XPDF tools you installed. In this example they are installed under /usr/local/bin
(a logical place on Linux and MacOSX), but they may be anywhere.
+ xpdf.path.pdftotext = /usr/local/bin/pdftotext + xpdf.path.pdftoppm = /usr/local/bin/pdftoppm + xpdf.path.pdfinfo = /usr/local/bin/pdfinfo +
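Since the filters depend on these external programs, a quick check that each configured path points at an executable file can save a failed filter run. The sketch below is a hypothetical helper, not part of DSpace. It assumes the three property names shown above and uses plain `java.util.Properties` in place of DSpace's own configuration API.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; not part of DSpace.
class XpdfPathCheckSketch {

    static final String[] KEYS = {
        "xpdf.path.pdftotext", "xpdf.path.pdftoppm", "xpdf.path.pdfinfo"
    };

    // Returns the property names that are unset or do not point at an executable file.
    static List<String> missingTools(Properties config) {
        List<String> missing = new ArrayList<>();
        for (String key : KEYS) {
            String path = config.getProperty(key);
            if (path == null || !Files.isExecutable(Paths.get(path.trim()))) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("xpdf.path.pdftotext", "/usr/local/bin/pdftotext");
        config.setProperty("xpdf.path.pdftoppm", "/usr/local/bin/pdftoppm");
        config.setProperty("xpdf.path.pdfinfo", "/usr/local/bin/pdfinfo");
        System.out.println("Not usable: " + missingTools(config));
    }
}
```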
Change the MediaFilter plugin configuration to remove the old org.dspace.app.mediafilter.PDFFilter and add the new filters, e.g.: (New sections are in bold)
filter.plugins = \ PDF Text Extractor, \ PDF Thumbnail, \ @@ -1265,77 +1287,79 @@ core.authorization.item-admin.cc-license
#Configure each filter's input format(s)
+
Then add the input format configuration properties for each of the new filters, e.g.:
filter.org.dspace.app.mediafilter.XPDF2Thumbnail.inputFormats = Adobe PDF filter.org.dspace.app.mediafilter.XPDF2Text.inputFormats = Adobe PDF - Add -Pxpdf-mediafilter-support to maven build -
Edit the POM for the dspace-api module. Within the <dependencies> element, add this new element:
- %mvn -Pxpdf-mediafilter-support package -
Follow the usual DSpace installation/update procedure (mvn package
and then ant -Dconfig=
etc. ...)
These instructions were retrieved from "http://libstaff.mit.edu/facade/index.php/DSpace_PDF_Media_Filters"
+
Finally, if you want PDF thumbnail images, don't forget to add that filter name to the filter.plugins
property, e.g.:
+ filter.plugins = PDF Thumbnail, PDF Text Extractor, ...
+
New Media Filters must implement the org.dspace.app.mediafilter.FormatFilter
interface. More information on the methods you need to implement is provided in the FormatFilter.java
source file. For example:
public class MySimpleMediaFilter implements
- FormatFilter
+ FormatFilter
Alternatively, you could extend the org.dspace.app.mediafilter.MediaFilter
class, which just defaults to performing no pre/post-processing of bitstreams before or after filtering.
public class MySimpleMediaFilter extends
- MediaFilter
+ MediaFilter
You must give your new filter a "name", by adding it and its name to the plugin.named.org.dspace.app.mediafilter.FormatFilter field in dspace.cfg. In addition to naming your filter, make sure to specify its input formats in the filter.<class path>.inputFormats config item. Note the input formats must match the short description field in the Bitstream Format Registry (i.e. bitstreamformatregistry table).
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
- org.dspace.app.mediafilter.MySimpleMediaFilter = My Simple Text
- Filter, \ ...
- filter.org.dspace.app.mediafilter.MySimpleMediaFilter.inputFormats =
- Text
+ org.dspace.app.mediafilter.MySimpleMediaFilter = My Simple Text
+ Filter, \ ...
+ filter.org.dspace.app.mediafilter.MySimpleMediaFilter.inputFormats =
+ Text
WARNING: If you neglect to define the inputFormats
for a particular filter, the MediaFilterManager
will never call that filter, since it will never find a bitstream which has a format matching that filter's input format(s).
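The effect described in the warning can be modelled in a few lines. The sketch below is a simplified, hypothetical model of format-based dispatch and not the MediaFilterManager's actual code: a filter is selected only when the bitstream's format appears in its configured input formats, so a filter with no input formats is never selected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only; not the real MediaFilterManager.
class FilterDispatchSketch {

    // Returns the names of the filters whose configured input formats include the bitstream's format.
    static List<String> filtersFor(Map<String, List<String>> inputFormats, String bitstreamFormat) {
        List<String> matching = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : inputFormats.entrySet()) {
            if (entry.getValue().contains(bitstreamFormat)) {
                matching.add(entry.getKey());
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputFormats = new LinkedHashMap<>();
        inputFormats.put("My Simple Text Filter", List.of("Text"));
        inputFormats.put("Unconfigured Filter", List.of()); // inputFormats was never defined
        System.out.println(filtersFor(inputFormats, "Text"));
    }
}
```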
-
+
If you have a complex Media Filter class, which actually performs different filtering for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define this as a Dynamic / Self-Named Format Filter.
If you have a more complex Media/Format Filter, which actually performs multiple filtering or conversions for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define a class which implements the FormatFilter interface, while also extending the SelfNamedPlugin class. For example:
public class MyComplexMediaFilter extends
- SelfNamedPlugin implements FormatFilter
+ SelfNamedPlugin implements FormatFilter
Since SelfNamedPlugins
are self-named (as stated), they must provide the various names the plugin uses by defining a getPluginNames() method. Generally speaking, each "name" the plugin uses should correspond to a different type of filter it implements (e.g. "Word2PDF" and "Excel2CSV" are two good names for a complex media filter which performs both Word to PDF and Excel to CSV conversions).
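A minimal sketch of the naming part might look as follows. To stay self-contained it does not extend the real SelfNamedPlugin class or implement FormatFilter, and declaring getPluginNames() as static is an assumption about how the plugin mechanism discovers names. Check the SelfNamedPlugin source before copying the signature.

```java
import java.util.Arrays;

// Stand-alone sketch: a real filter would extend SelfNamedPlugin and implement FormatFilter.
class MyComplexMediaFilterSketch {

    // One name per kind of conversion this filter performs.
    private static final String[] PLUGIN_NAMES = {"Word2PDF", "Excel2CSV"};

    // Assumed signature; returns a copy so callers cannot alter the list.
    public static String[] getPluginNames() {
        return PLUGIN_NAMES.clone();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getPluginNames()));
    }
}
```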
Self-Named Media/Format Filters are also configured differently in dspace.cfg. Below is a general template for a Self Named Filter (defined by an imaginary MyComplexMediaFilter class, which can perform both Word to PDF and Excel to CSV conversions):
#Add to a list of all Self Named filters
- plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter = \
- org.dspace.app.mediafilter.MyComplexMediaFilter #Define input formats
- for each "named" plugin this filter implements
- filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.inputF
+ plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter = \
+ org.dspace.app.mediafilter.MyComplexMediaFilter #Define input formats
+ for each "named" plugin this filter implements
+ filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.inputF
ormats = Microsoft Word
- filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.input
+ filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.input
Formats = Microsoft Excel
As shown above, each Self-Named Filter class must be listed in the plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter item in dspace.cfg. In addition, each Self-Named Filter must define the input formats for each named plugin defined by that filter. In the above example the MyComplexMediaFilter class is assumed to have defined two named plugins, Word2PDF and Excel2CSV. So, these two valid plugin names ("Word2PDF" and "Excel2CSV") must be returned by the getPluginNames() method of the MyComplexMediaFilter class.
These named plugins take different input formats as defined above (see the corresponding inputFormats
setting). WARNING: If you neglect to define the inputFormats
for a particular named plugin, the MediaFilterManager
will never call that plugin, since it will never find a bitstream which has a format matching that plugin's input format(s).
For a particular Self-Named Filter, you are also welcome to define additional configuration settings in dspace.cfg. To continue with our current example, each of our imaginary plugins actually results in a different output format (Word2PDF creates "Adobe PDF", while Excel2CSV creates "Comma Separated Values"). To allow this complex Media Filter to be even more configurable (especially across institutions, with potentially different "Bitstream Format Registries"), you may wish to allow for the output format to be customizable for each named plugin. For example:
#Define output formats for each named plugin
- filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.output
+ filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.output
Format = Adobe PDF
- filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.outpu
+ filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.outpu
tFormat = Comma Separated Values
Any custom configuration fields in dspace.cfg defined by your filter are ignored by the MediaFilterManager, so it is up to your custom media filter class to read those configurations and apply them as necessary. For example, you could use the following sample Java code in your MyComplexMediaFilter class to read these custom outputFormat configurations from dspace.cfg:
//get "outputFormat" configuration from dspace.cfg
String outputFormat =
    ConfigurationManager.getProperty(MediaFilterManager.FILTER_PREFIX +
        "." + MyComplexMediaFilter.class.getName() + "." +
        this.getPluginInstanceName() + ".outputFormat");
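To make the key composition concrete, the following is a minimal, self-contained sketch of how such a per-plugin key could be assembled and looked up. It uses java.util.Properties as a stand-in for ConfigurationManager, and the class OutputFormatLookup is hypothetical. The prefix value "filter" is an assumption inferred from the property names shown above, not taken from the MediaFilterManager source.

```java
import java.util.Properties;

/** Hypothetical illustration only; not part of DSpace. */
final class OutputFormatLookup {
    // Assumed value of MediaFilterManager.FILTER_PREFIX, inferred from the keys in dspace.cfg
    static final String FILTER_PREFIX = "filter";

    /** Builds the configuration key for one named plugin of a self-named filter. */
    static String outputFormatKey(String filterClassName, String pluginName) {
        return FILTER_PREFIX + "." + filterClassName + "." + pluginName + ".outputFormat";
    }

    public static void main(String[] args) {
        // Stand-in for the settings that would normally live in dspace.cfg
        Properties config = new Properties();
        config.setProperty(
            "filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.outputFormat",
            "Adobe PDF");

        String key = outputFormatKey(
            "org.dspace.app.mediafilter.MyComplexMediaFilter", "Word2PDF");
        System.out.println(key + " = " + config.getProperty(key));
    }
}
```

Because each named plugin contributes its own name to the key, one filter class can carry a separate output format for every plugin it exposes.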
To ease the hassle of keeping configuration files for other applications involved in running a DSpace site, for example Apache, in sync, the DSpace system can automatically update them for you when the main DSpace configuration is changed. This feature of the DSpace system is entirely optional, but we found it useful.
The way this is done is by placing the configuration files for those applications in [dspace]/config/templates
, and inserting special values in the configuration file that will be filled out with appropriate DSpace configuration properties. Then, tell DSpace where to put the filled-out, 'live' version of the configuration by adding an appropriate property to dspace.cfg
, and run [dspace]/bin/install-configs
.
Take the apache13.conf
file as an example. This contains plenty of Apache-specific stuff, but where it uses a value that should be kept in sync across DSpace and associated applications, a 'placeholder' value is written. For example, the host name:
ServerName @@dspace.hostname@@
The text @@dspace.hostname@@
will be filled out with the value of the dspace.hostname
property in dspace.cfg
. Then we decide where the 'live' version, that is, the version actually read in by Apache when it starts up, will go.
Let's say we want the live version to be located at /opt/apache/conf/dspace-httpd.conf
. To do this, we add the following property to dspace.cfg
so DSpace knows where to put it:
config.template.apache13.conf = /opt/apache/conf/dspace-httpd.conf
Now, we run [dspace]/bin/install-configs
. This reads in [dspace]/config/templates/apache13.conf
, and places a copy at /opt/apache/conf/dspace-httpd.conf
with the placeholders filled out.
So, in /opt/apache/conf/dspace-httpd.conf
, there will be a line like:
ServerName dspace.myu.edu
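The placeholder substitution described above can be sketched in a few lines. This is only an illustration of the idea, not the code that install-configs runs: the class TemplateFiller is hypothetical, java.util.Properties stands in for dspace.cfg, and leaving unknown placeholders untouched is an assumption made so that a missing property stays visible.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the install-configs implementation. */
final class TemplateFiller {
    // Matches placeholders of the form @@property.name@@
    private static final Pattern PLACEHOLDER =
        Pattern.compile("@@([A-Za-z0-9_.\\-]+)@@");

    /** Returns the template text with every known placeholder replaced by its value. */
    static String fill(String template, Properties config) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = config.getProperty(m.group(1));
            // Assumption: an unknown placeholder is left as-is rather than blanked
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("dspace.hostname", "dspace.myu.edu");
        // prints "ServerName dspace.myu.edu"
        System.out.println(fill("ServerName @@dspace.hostname@@", config));
    }
}
```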
The advantage of this approach is that if a property like the hostname changes, you can just change it in dspace.cfg
and run install-configs
, and all of your tools' configuration files will be updated.
However, take care to make all your edits to the versions in [dspace]/config/templates
! It's a wise idea to put a big reminder at the top of each file, since someone might unwittingly edit a 'live' configuration file which would later be overwritten.
A usage instrumentation plugin is configured as a singleton plugin for the abstract class org.dspace.app.statistics.AbstractUsageEvent
.
The Passive plugin is provided as the class org.dspace.app.statistics.PassiveUsageEvent
. It absorbs events without effect. Use the Passive plugin when you have no use for usage event postings. This is the default if no plugin is configured.
The Tab File Logger plugin is provided as the class org.dspace.app.statistics.UsageEventTabFileLogger
. It writes event records to a file in tab-separated column format. If left unconfigured, an error will be noted in the DSpace log and no file will be produced. To specify the file path, provide an absolute path as the value for usageEvent.tabFileLogger.file
in dspace.cfg
.
The XML Logger plugin is provided as the class org.dspace.app.statistics.UsageEventXMLLogger
. It writes event records to a file in a simple XML-like format. If left unconfigured, an error will be noted in the DSpace log and no file will be produced. To specify the file path, provide an absolute path as the value for usageEvent.xmlLogger.file
in dspace.cfg
.
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. DSpace implements the SWORD protocol via the 'sword' web application. The version of SWORD currently supported by DSpace is 1.3. The specification and further information can be downloaded from http://swordapp.org.
SWORD is based on the Atom Publish Protocol and allows service documents to be requested which describe the structure of the repository, and packages to be deposited.
Properties: | sword.mets-ingester.package-ingester |
Example Value: | sword.mets-ingester.package-ingester = METS |
Informational Note: | Configure the plugins to process incoming packages. The form of this configuration is as per the Plugin Manager's Named Plugin documentation: plugin.named.[interface] = [implementation] = [package format identifier] \ . Package ingesters should implement the SWORDIngester interface, and will be loaded when a package of the format specified above in: sword.accept-packaging.[package format].identifier = [package format identifier] is received. In the event that this is a simple file deposit, with no package format, then the class named by "SimpleFileIngester" will be loaded and executed where appropriate. This case will only occur when a single file is being deposited into an existing DSpace Item. |
Properties: | sword.accepts |
Example Value: | sword.accepts = application/zip, foo/bar |
Informational Note: | A comma separated list of MIME types that SWORD will accept. |
OpenSearch is a small set of conventions and documents for describing and using "search engines", meaning any service that returns a set of results for a query. See the extensive description in the Business Layer section of the documentation.
Please note that for result data formatting, OpenSearch uses Syndication Feed Settings (RSS). So, even if Syndication Feeds are not enabled, they must be configured to enable OpenSearch. OpenSearch uses all the configuration properties for DSpace RSS to determine the mapping of metadata fields to feed fields. Note that a new field for authors has been added (used in Atom format only).
Property: | websvc.opensearch.enable |
Example Value: | websvc.opensearch.enable = false |
Informational Note: | Whether or not OpenSearch is enabled. By default, the feature is disabled. Change the property value to 'true' to enable. |
Property: | websvc.opensearch.uicontext |
Example Value: | websvc.opensearch.uicontext = simple-search |
Informational Note: | Context for HTML request URLs. Change only for non-standard servlet mapping. |
Property: | websvc.opensearch.svccontext |
Example Value: | websvc.opensearch.svccontext = open-search/ |
Informational Note: | Context for RSS/Atom request URLs. Change only for non-standard servlet mapping. |
Property: | websvc.opensearch.autolink |
Example Value: | websvc.opensearch.autolink = true |
Informational Note: | Present autodiscovery link in every page head. |
Property: | websvc.opensearch.validity |
Example Value: | websvc.opensearch.validity = 48 |
Informational Note: | Number of hours to retain results before recalculating. This applies to the Manakin interface only. |
Property: | websvc.opensearch.shortname |
Example Value: | websvc.opensearch.shortname = DSpace |
Informational Note: | A short name used in browsers for search service. It should be sixteen (16) or fewer characters. |
Property: | websvc.opensearch.longname |
Example Value: | websvc.opensearch.longname = ${dspace.name} |
Informational Note: | A longer name up to 48 characters. |
Property: | websvc.opensearch.description |
Example Value: | websvc.opensearch.description = ${dspace.name} DSpace repository |
Informational Note: | Brief service description |
Property: | websvc.opensearch.faviconurl |
Example Value: | websvc.opensearch.faviconurl = http://www.dspace.org/images/favicon.ico |
Informational Note: | Location of favicon for service, if any. It must be 16 x 16 pixels. You can provide your own local favicon instead of the default. |
Property: | websvc.opensearch.samplequery |
Example Value: | websvc.opensearch.samplequery = photosynthesis |
Informational Note: | Sample query. This should return results. You can replace the sample query with search terms that should actually yield results in your repository. |
Property: | websvc.opensearch.tags |
Example Value: | websvc.opensearch.tags = IR DSpace |
Informational Note: | Tags used to describe search service. |
Property: | websvc.opensearch.formats |
Example Value: | websvc.opensearch.formats = html,atom,rss |
Informational Note: | Result formats offered. Use one or more, comma-separated, from the list: html, atom, rss. Please note that html is required for autodiscovery in browsers to function, and must be the first in the list if present. |
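Several of the OpenSearch settings above carry constraints that are easy to overlook: the short name should be sixteen or fewer characters, the long name up to 48, and html must come first in the formats list if present. The following is a minimal sketch of a check for those three rules. The class OpenSearchSettingsCheck and its messages are hypothetical and not part of DSpace; the values are passed in as plain strings rather than read from dspace.cfg.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of DSpace. */
final class OpenSearchSettingsCheck {
    private static final List<String> ALLOWED_FORMATS = Arrays.asList("html", "atom", "rss");

    /** Returns a list of problems found; an empty list means the values look acceptable. */
    static List<String> check(String shortName, String longName, String formats) {
        List<String> problems = new ArrayList<>();
        if (shortName.length() > 16) {
            problems.add("shortname exceeds 16 characters");
        }
        if (longName.length() > 48) {
            problems.add("longname exceeds 48 characters");
        }
        // Split the comma-separated formats value, ignoring surrounding whitespace
        List<String> parsed = new ArrayList<>();
        for (String raw : formats.split(",")) {
            String format = raw.trim();
            if (format.isEmpty()) {
                continue;
            }
            if (!ALLOWED_FORMATS.contains(format)) {
                problems.add("unknown format: " + format);
            }
            parsed.add(format);
        }
        if (parsed.isEmpty()) {
            problems.add("at least one format is required");
        } else if (parsed.contains("html") && !parsed.get(0).equals("html")) {
            problems.add("html must be listed first");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("DSpace", "DSpace repository", "atom,html"));
    }
}
```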
It is now possible to configure a DSpace instance to have an "Embargo" feature for use with theses and dissertations.
Property: | embargo.field.terms |
Example Value: | embargo.field.terms = SCHEMA.ELEMENT.QUALIFIER |
Informational Note: | DC metadata field to hold the user-supplied embargo terms |
Property: | embargo.field.lift |
Example Value: | embargo.field.lift = SCHEMA.ELEMENT.QUALIFIER |
Informational Note: | DC metadata field to hold computed "lift date" of embargo. You may need to create a DC metadata field in your Metadata Format Registry if it does not already exist. |
Property: | embargo.terms.open |
Example Value: | embargo.terms.open = forever |
Informational Note: | The string in terms field to indicate indefinite embargo |
Property: | plugin.single.org.dspace.embargo.EmbargoSetter |
Example Value: | plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter |
Informational Note: | Implementation of embargo setter plugin |
Property: | plugin.single.org.dspace.embargo.EmbargoLifter |
Example Value: | plugin.single.org.dspace.embargo.EmbargoLifter = org.dspace.embargo.DefaultEmbargoLifter |
Informational Note: | Implementation of embargo lifter plugin |
Remember that you need to replace SCHEMA.ELEMENT.QUALIFIER
with a real metadata field. Additionally, you need to replace the CLASSNAME
with a properly implemented plugin.
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
\ No newline at end of file diff --git a/dspace/docs/html/ch06.html b/dspace/docs/html/ch06.html index 31fb9884a6..9fbf029a83 100644 --- a/dspace/docs/html/ch06.html +++ b/dspace/docs/html/ch06.html @@ -1,9 +1,8 @@ -Table of Contents
The DSpace digital repository supports two user interfaces: one based on JavaServer Pages (JSP) technologies and one based upon the Apache Cocoon framework. This chapter describes those parameters which are specific to the JSPUI interface.
The user will need to refer to the extensive WebUI/JSPUI configurations that are contained in 5.2.36 JSP Web Interface Settings.
The JSPUI interface is implemented using Java Servlets which handle the business logic, and JavaServer Pages (JSPs) which produce the HTML pages sent to an end-user. Since the JSPs are much closer to HTML than Java code, altering the look and feel of DSpace is relatively easy.
To make it even easier, DSpace allows you to 'override' the JSPs included in the source distribution with modified versions that are stored in a separate place, so that when you update your site with a new DSpace release, your modified versions will not be overwritten. It should be possible to dramatically change the look of DSpace to suit your organization by just changing the CSS style file and the site 'skin' or 'layout' JSPs in jsp/layout
; if possible, it is recommended you limit local customizations to these files to make future upgrades easier.
You can also easily edit the text that appears on each JSP page by editing the Messages.properties
file. However, note that unless you change the entry in all of the different language message files, users of other languages will still see the default text for their language. See Internationalization in Application Layer.
Note that the data (attributes) passed from an underlying Servlet to the JSP may change between versions, so you may have to modify your customized JSP to deal with the new data.
Thus, if possible, it is recommended you limit your changes to the 'layout' JSPs and the stylesheet.
The JSPs are available in one of two places:
[dspace-source]/dspace-jspui/dspace-jspui-webapp/src/main/webapp/
- Only exists if you downloaded the full Source Release of DSpace
[dspace-source]/dspace/target/dspace-[version].dir/webapps/dspace-jspui-webapp/
- The location where they are copied after first building DSpace.
If you wish to modify a particular JSP, place your edited version in the [dspace-source]/dspace/modules/jspui/src/main/webapp/
directory (this is the replacement for the pre-1.5 /jsp/local
directory), with the same path as the original. If they exist, these will be used in preference to the default JSPs. For example:
DSpace default | Locally-modified version |
[jsp.dir]/community-list.jsp | [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/community-list.jsp |
[jsp.dir]/mydspace/main.jsp | [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/mydspace/main.jsp |
Heavy use is made of a style sheet, styles.css.jsp
. If you make edits, copy the local version to [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/styles.css.jsp
, and it will be used automatically in preference to the default, as described above.
Fonts and colors can be easily changed using the stylesheet. The stylesheet is a JSP so that the user's browser version can be detected and the stylesheet tweaked accordingly.
The 'layout' of each page, that is, the top and bottom banners and the navigation bar, is determined by the JSPs /layout/header-*.jsp
and /layout/footer-*.jsp
. You can provide modified versions of these (in [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/layout
), or define more styles and apply them to pages by using the "style" attribute of the dspace:layout
tag.
Rebuild the DSpace installation package by running the following command from your [dspace-source]/dspace/
directory:
mvn package
Update all DSpace webapps to [dspace]/webapps
by running the following command from your [dspace-source]/dspace/target/dspace-[version]-build.dir
directory:
ant -Dconfig=[dspace]/config/dspace.cfg update
Deploy the new webapps:
cp -R /[dspace]/webapps/* /[tomcat]/webapps
Restart Tomcat
When you restart the web server you should see your customized JSPs.
Copyright © 2002-2009 - The DSpace Foundation +
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
\ No newline at end of file diff --git a/dspace/docs/html/ch07.html b/dspace/docs/html/ch07.html index 49d9298884..2cd328233e 100644 --- a/dspace/docs/html/ch07.html +++ b/dspace/docs/html/ch07.html @@ -1,15 +1,15 @@ -Table of Contents
The DSpace digital repository supports two user interfaces: one based on JavaServer Pages (JSP) technologies and one based upon the Apache Cocoon framework. This chapter describes those parameters which are specific to the Manakin (XMLUI) interface based upon the Cocoon framework.
In an effort to save the programmer/administrator some time, the configuration table below is taken from 5.3.43. XMLUI Specific Configuration.
Property: | xmlui.supportedLocales |
Example Value: | xmlui.supportedLocales = en, de |
Informational Note: | A list of supported locales for Manakin. Manakin will look at a user's browser configuration for the first language that appears in this list to make available in the interface. This parameter is a comma-separated list of Locales. All types of Locales may be used: country, country_language, country_language_variant. Note that if the appropriate files are not present (i.e. Messages_XX_XX.xml) then Manakin will fall back to a more general language. |
Property: | xmlui.force.ssl |
Example Value: | xmlui.force.ssl = true |
Informational Note: | Force all authenticated connections to use SSL; only non-authenticated connections are allowed over plain http. If set to true, then you need to ensure that the 'dspace.hostname' parameter is set correctly. |
Property: | xmlui.user.registration |
Example Value: | xmlui.user.registration = true |
Informational Note: | Determine if new users should be allowed to register. This parameter is useful in conjunction with Shibboleth where you want to disallow registration because Shibboleth will automatically register the user. Default value is true. |
Property: | xmlui.user.editmetadata |
Example Value: | xmlui.user.editmetadata = true |
Informational Note: | Determines if users should be able to edit their own metadata. This parameter is useful in conjunction with Shibboleth where you want to disable the user's ability to edit their metadata because it came from Shibboleth. Default value is true. |
Property: | xmlui.user.assumelogon |
Example Value: | xmlui.user.assumelogon = true |
Informational Note: | Determine if super administrators (those who are in the Administrators group) can log in as another user from the "edit eperson" page. This is useful for debugging problems in a running DSpace instance, especially in the workflow process. The default value is false, i.e., no one may assume the login of another user. |
Property: | xmlui.user.loginredirect |
Example Value: | xmlui.user.loginredirect = /profile |
Informational Note: | After a user has logged into the system, to which URL should they be directed? Leave this parameter blank or undefined to direct users to the homepage, or use /profile for the user's profile. Another reasonable choice is /submissions, to see if the user has any tasks awaiting their attention. The default is the repository home page. |
Property: | xmlui.theme.allowoverrides |
Example Value: | xmlui.theme.allowoverrides = false |
Informational Note: | Allow the user to override which theme is used to display a particular page. When submitting a request, add the HTTP parameter "themepath", which corresponds to a particular theme; the specified theme will be used instead of any other configured theme. Note that this is a potential security hole allowing execution of unintended code on the server. This option is only for development and debugging; it should be turned off for any production repository. The default value unless otherwise specified is "false". |
Property: | xmlui.bundle.upload |
Example Value: | xmlui.bundle.upload = ORIGINAL, METADATA, THUMBNAIL, LICENSE, CC_LICENSE |
Informational Note: | Determine which bundles administrators and collection administrators may upload into an existing item through the administrative interface. If the user does not have the appropriate privileges (add and write) on the bundle then that bundle will not be shown to the user as an option. |
Property: | xmlui.community-list.render.full |
Example Value: | xmlui.community-list.render.full = true |
Informational Note: | Determines whether all the metadata about a community/collection should be available to the theme on the community-list page. This parameter defaults to true, but if you are experiencing performance problems on the community-list page you should experiment with turning this option off. |
Property: | xmlui.community-list.cache |
Example Value: | xmlui.community-list.cache = 12 hours |
Informational Note: | Normally, Manakin will fully verify any cached pages before using a cached copy. This means that when the community-list page is viewed the database is queried for each community/collection to see if their metadata has been modified. This can be expensive for repositories with a large community tree. To help solve this problem you can set the cache to be assumed valid for a specific period of time. The downside of this is that new or edited communities/collections may not show up on the website for a period of time. |
Property: | xmlui.bitstream.mods |
Example Value: | xmlui.bitstream.mods = true |
Informational Note: | Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The MODS metadata file must be inside the "METADATA" bundle and named MODS.xml. If this option is set to 'true' and the bitstream is present then it is made available to the theme for display. |
Property: | xmlui.bitstream.mets |
Example Value: | xmlui.bitstream.mets = true |
Informational Note: | Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The METS metadata file must be inside the "METADATA" bundle and named METS.xml. If this option is set to "true" and the bitstream is present then it is made available to the theme for display. |
Property: | xmlui.google.analytics.key |
Example Value: | xmlui.google.analytics.key = UA-XXXXXX-X |
Informational Note: | If you would like to use Google Analytics to track general website statistics, use this parameter to provide your analytics key. First sign up for an account at http://analytics.google.com, then create an entry for your repository's website. Google Analytics will give you a snippet of JavaScript code to place on your site; inside that snippet is your Google Analytics key, usually found in the line: _uacct = "UA-XXXXXXX-X". Take this key (just the UA-XXXXXX-X part) and place it in this parameter. |
Property: | xmlui.controlpanel.activity.max |
Example Value: | xmlui.controlpanel.activity.max = 250 |
Informational Note: | Assign how many page views will be recorded and displayed in the control panel's activity viewer. The activity tab allows an administrator to debug problems in a running DSpace by understanding who is currently using their DSpace and how. The default value is 250. |
Property: | xmlui.controlpanel.activity.ipheader |
Example Value: | xmlui.controlpanel.activity.ipheader = X-Forward-For |
Informational Note: | Determine where the control panel's activity viewer receives an event's IP address from. If your DSpace is in a load-balanced environment or otherwise behind a context-switch then you will need to set the parameter to the HTTP header that records the original IP address. |
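The locale selection described for xmlui.supportedLocales in the table above can be sketched as follows. This is an illustration of the general idea only, not Manakin's actual code: the class LocaleChooser is hypothetical, and the rule of trying an exact match before falling back to the bare language is an assumption modelled on the fallback behaviour the note describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not Manakin's implementation. */
final class LocaleChooser {
    /** Parses a comma-separated value such as "en, de" into Locale objects. */
    static List<Locale> parseSupported(String value) {
        List<Locale> locales = new ArrayList<>();
        for (String raw : value.split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                // Accept both "en_US" and "en-US" spellings
                locales.add(Locale.forLanguageTag(tag.replace('_', '-')));
            }
        }
        return locales;
    }

    /** Picks the first browser preference that is supported, else the fallback. */
    static Locale choose(List<Locale> browserPreferences, List<Locale> supported, Locale fallback) {
        for (Locale preferred : browserPreferences) {
            if (supported.contains(preferred)) {
                return preferred;
            }
            // Assumed fallback step: retry with the language alone, e.g. de-DE becomes de
            Locale languageOnly = Locale.forLanguageTag(preferred.getLanguage());
            if (supported.contains(languageOnly)) {
                return languageOnly;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Locale> supported = parseSupported("en, de");
        List<Locale> browser = Arrays.asList(
            Locale.forLanguageTag("fr-FR"), Locale.forLanguageTag("de-DE"));
        System.out.println(choose(browser, supported, Locale.ENGLISH));
    }
}
```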
The Manakin user interface is composed of two distinct components: aspects and themes. Manakin aspects are like extensions or plugins for Manakin; they are interactive components that modify existing features or provide new features for the digital repository. Manakin themes stylize the look-and-feel of the repository, community, or collection.
The repository administrator is able to define which aspects and themes are installed for the particular repository by editing the [dspace]/config/xmlui.xconf
configuration file. The xmlui.xconf
file consists of two major sections: Aspects and Themes.
The <aspects>
section defines the "Aspect Chain", or the linear set of aspects that are installed in the repository. For each aspect that is installed in the repository, the aspect makes available new features to the interface. For example, if the "submission" aspect were to be commented out or removed from the xmlui.xconf
, then users would not be able to submit new items into the repository (even the links and language prompting users to submit items are removed). Each <aspect>
element has two attributes, name and path. The name is used to identify the Aspect, while the path determines the directory where the aspect's code is located. Here is the default aspect configuration:
+Chapter 7. DSpace System Documentation: Manakin [XMLUI] Configuration and Customization Table of Contents
The DSpace digital repository supports two user interfaces: one based on JavaServer Pages (JSP) technologies and one based upon the Apache Cocoon framework. This chapter describes those parameters which are specific to the Manakin (XMLUI) interface based upon the Cocoon framework.
In an effort to save the programmer/administrator some time, the configuration table below is taken from 5.3.43. XMLUI Specific Configuration.
Property: xmlui.supportedLocales
Example Value: xmlui.supportedLocales = en, de
Informational Note: A list of supported locales for Manakin. Manakin will look at a user's browser configuration for the first language that appears in this list to make available to in the interface. This parameter is a comma separated list of Locales. All types of Locales country, country_language, country_language_variant. Note that if the appropriate files are not present (i.e. Messages_XX_XX.xml) then Manakin will fall back through to a more general language. Property: xmlui.force.ssl
Example Value: xmlui.force.ssl = true
Informational Note: Force all authenticated connections to use SSL; only non-authenticated connections are allowed over plain HTTP. If set to true, then you need to ensure that the ' dspace.hostname
' parameter is set correctly. Property: xmlui.user.registration
Example Value: xmlui.user.registration = true
Informational Note: Determine if new users should be allowed to register. This parameter is useful in conjunction with Shibboleth where you want to disallow registration because Shibboleth will automatically register the user. Default value is true. Property: xmlui.user.editmetadata
Example Value: xmlui.user.editmetadata = true
Informational Note: Determines if users should be able to edit their own metadata. This parameter is useful in conjunction with Shibboleth where you want to disable the user's ability to edit their metadata because it came from Shibboleth. Default value is true. Property: xmlui.user.assumelogon
Example Value: xmlui.user.assumelogon = true
Informational Note: Determine if super administrators (those who are in the Administrators group) can log in as another user from the "edit eperson" page. This is useful for debugging problems in a running DSpace instance, especially in the workflow process. The default value is false, i.e., no one may assume the login of another user. Property: xmlui.user.loginredirect
Example Value: xmlui.user.loginredirect = /profile
Informational Note: After a user has logged into the system, to which URL should they be directed? Leave this parameter blank or undefined to direct users to the homepage, or use /profile
for the user's profile. Another reasonable choice is /submissions
to see if the user has any tasks awaiting their attention. The default is the repository home page. Property: xmlui.theme.allowoverrides
Example Value: xmlui.theme.allowoverrides = false
Informational Note: Allow the user to override which theme is used to display a particular page. When a request includes the HTTP parameter "themepath", which corresponds to a particular theme, the specified theme is used instead of any other configured theme. Note that this is a potential security hole that allows execution of unintended code on the server. This option is only for development and debugging and should be turned off for any production repository. The default value unless otherwise specified is "false". Property: xmlui.bundle.upload
Example Value: xmlui.bundle.upload = ORIGINAL, METADATA, THUMBNAIL, LICENSE, CC_LICENSE
Informational Note: Determine which bundles administrators and collection administrators may upload into an existing item through the administrative interface. If the user does not have the appropriate privileges (add and write) on the bundle then that bundle will not be shown to the user as an option. Property: xmlui.community-list.render.full
Example Value: xmlui.community-list.render.full = true
Informational Note: Determines whether all the metadata about a community/collection is made available to the theme on the community-list page. This parameter defaults to true, but if you are experiencing performance problems on the community-list page you should experiment with turning this option off. Property: xmlui.community-list.cache
Example Value: xmlui.community-list.cache = 12 hours
Informational Note: Normally, Manakin will fully verify any cached pages before using a cached copy. This means that when the community-list page is viewed the database is queried for each community/collection to see if its metadata has been modified. This can be expensive for repositories with a large community tree. To help solve this problem you can set the cache to be assumed valid for a specific period of time. The downside of this is that new or edited communities/collections may not show up on the website for a period of time. Property: xmlui.bitstream.mods
Example Value: xmlui.bitstream.mods = true
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The MODS metadata file must be inside the "METADATA" bundle and named MODS.xml. If this option is set to 'true' and the bitstream is present then it is made available to the theme for display. Property: xmlui.bitstream.mets
Example Value: xmlui.bitstream.mets = true
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream. The METS metadata file must be inside the "METADATA" bundle and named METS.xml. If this option is set to "true" and the bitstream is present then it is made available to the theme for display. Property: xmlui.google.analytics.key
Example Value: xmlui.google.analytics.key = UA-XXXXXX-X
Informational Note: If you would like to use Google Analytics to track general website statistics, use this parameter to provide your analytics key. First sign up for an account at http://analytics.google.com, then create an entry for your repository's website. Google Analytics will give you a snippet of JavaScript code to place on your site; inside that snippet is your Google Analytics key, usually found in the line: _uacct = "UA-XXXXXXX-X". Take this key (just the UA-XXXXXX-X part) and place it in this parameter. Property: xmlui.controlpanel.activity.max
Example Value: xmlui.controlpanel.activity.max = 250
Informational Note: Assign how many page views will be recorded and displayed in the control panel's activity viewer. The activity tab allows an administrator to debug problems in a running DSpace by showing who is currently using it and how. The default value is 250. Property: xmlui.controlpanel.activity.ipheader
Example Value: xmlui.controlpanel.activity.ipheader = X-Forward-For
Informational Note: Determines where the control panel's activity viewer receives an event's IP address from. If your DSpace is in a load-balanced environment or otherwise behind a context switch, you will need to set this parameter to the HTTP header that records the original IP address.
The Manakin user interface is composed of two distinct components: aspects and themes. Manakin aspects are like extensions or plugins for Manakin; they are interactive components that modify existing features or provide new features for the digital repository. Manakin themes stylize the look-and-feel of the repository, community, or collection.
The repository administrator is able to define which aspects and themes are installed for the particular repository by editing the
[dspace]/config/xmlui.xconf
configuration file. The xmlui.xconf
file consists of two major sections: Aspects and Themes. The
<aspects>
section defines the "Aspect Chain", or the linear set of aspects that are installed in the repository. For each aspect that is installed in the repository, the aspect makes available new features to the interface. For example, if the "submission" aspect were to be commented out or removed from the xmlui.xconf
, then users would not be able to submit new items into the repository (even the links and language prompting users to submit items are removed). Each <aspect>
element has two attributes, name and path. The name is used to identify the Aspect, while the path determines the directory where the aspect's code is located. Here is the default aspect configuration:<aspects> <aspect name="Artifact Browser" path="resource://aspects/ArtifactBrowser/" /> <aspect name="Administration" path="resource://aspects/Administrative/" /> <aspect name="E-Person" path="resource://aspects/EPerson/" /> <aspect name="Submission and Workflow" path="resource://aspects/Submission/" /> - </aspects>A standard distribution of Manakin/DSpace includes four "core" aspects:
Artifact Browser
The Artifact Browser Aspect is responsible for browsing communities, collections, items and bitstreams, viewing an individual item and searching the repository.
E-Person
The E-Person Aspect is responsible for logging in, logging out, registering new users, dealing with forgotten passwords, editing profiles and changing passwords.
Submission
The Submission Aspect is responsible for submitting new items to DSpace, determining the workflow process and ingesting the new items into the DSpace repository.
Administrative
The Administrative Aspect is responsible for administrating DSpace, such as creating, modifying and removing all communities, collections, e-persons, groups, registries and authorizations.
The
<themes>
section defines a set of "rules" that determine where themes are installed in the repository. Each rule is processed in the order that it appears, and the first rule that matches determines the theme that is applied (so order is important). Each rule consists of a <theme>
element with several possible attributes:
name (always required)
The name attribute is used to document the theme's name.
path (always required)
The path attribute determines where the theme is located relative to the
themes/
directory and must either contain a trailing slash or point directly to the theme's sitemap.xmap
file.
regex (either regex and/or handle is required)
The regex attribute determines which URLs the theme should apply to.
handle (either regex and/or handle is required)
The handle attribute determines which community, collection, or item the theme should apply to.
If you use the "handle" attribute, the effect is cascading, meaning if a rule is established for a community then all collections and items within that community will also have this theme apply to them as well. Here is an example configuration:
+ </aspects>A standard distribution of Manakin/DSpace includes four "core" aspects:
Artifact Browser
The Artifact Browser Aspect is responsible for browsing communities, collections, items and bitstreams, viewing an individual item and searching the repository.
E-Person
The E-Person Aspect is responsible for logging in, logging out, registering new users, dealing with forgotten passwords, editing profiles and changing passwords.
Submission
The Submission Aspect is responsible for submitting new items to DSpace, determining the workflow process and ingesting the new items into the DSpace repository.
Administrative
The Administrative Aspect is responsible for administrating DSpace, such as creating, modifying and removing all communities, collections, e-persons, groups, registries and authorizations.
The
<themes>
section defines a set of "rules" that determine where themes are installed in the repository. Each rule is processed in the order that it appears, and the first rule that matches determines the theme that is applied (so order is important). Each rule consists of a <theme>
element with several possible attributes:
name (always required)
The name attribute is used to document the theme's name.
path (always required)
The path attribute determines where the theme is located relative to the
themes/
directory and must either contain a trailing slash or point directly to the theme's sitemap.xmap
file.
regex (either regex and/or handle is required)
The regex attribute determines which URLs the theme should apply to.
handle (either regex and/or handle is required)
The handle attribute determines which community, collection, or item the theme should apply to.
If you use the "handle" attribute, the effect is cascading, meaning if a rule is established for a community then all collections and items within that community will also have this theme apply to them as well. Here is an example configuration:
<themes> <theme name="Theme 1" handle="123456789/23" path="theme1/"/> <theme name="Theme 2" regex="community-list" path="theme2/"/> <theme name="Reference Theme" regex=".*" path="Reference/"/> - </themes>In the example above three themes are configured: "Theme 1", "Theme 2", and the "Reference Theme". The first rule specifies that "Theme 1" will apply to all communities, collections, or items that are contained under the parent community "123456789/23". The next rule specifies any URL containing the string "community-list" will get "Theme 2". The final rule, using the regular expression ".*", will match anything, so all pages which have not matched one of the preceding rules will be matched to the Reference Theme.
The XMLUI user interface supports multiple languages through the use of internationalization catalogues as defined by the Cocoon Internationalization Transformer. Each catalog contains the translation of all user-displayed strings into a particular language or variant. Each catalog is a single xml file whose name is based upon the language it is designated for, thus:
messages_language_country_variant.xml
messages_language_country.xml
messages_language.xml
messages.xml
The interface will automatically determine which file to select based upon the user's browser and system configuration. For example, if the user's browser is set to Australian English then first the system will check if
messages_en_au.xml
is available. If this translation is not available it will fall back to messages_en.xml
, and finally if that is not available, messages.xml
. Manakin supplies an English-only translation of the interface. In order to add other translations to the system, locate the
[dspace-source]/dspace/modules/xmlui/src/main/webapp/i18n/
directory. By default this directory will be empty; to add additional translations add alternative versions of the messages.xml
file in specific language and country variants as needed for your installation. To set a language other than English as the default language for the repository's interface, simply name the translation catalogue for the new default language "
messages.xml
". Manakin themes stylize the look-and-feel of the repository, community, or collection and are distributed as self-contained packages. A Manakin/DSpace installation may have multiple themes installed and available to be used in different parts of the repository. The central component of a theme is the sitemap.xmap, which defines what resources are available to the theme such as XSL stylesheets, CSS stylesheets, images, or multimedia files.
+ </themes>
In the example above three themes are configured: "Theme 1", "Theme 2", and the "Reference Theme". The first rule specifies that "Theme 1" will apply to all communities, collections, or items that are contained under the parent community "123456789/23". The next rule specifies any URL containing the string "community-list" will get "Theme 2". The final rule, using the regular expression ".*", will match anything, so all pages which have not matched one of the preceding rules will be matched to the Reference Theme.
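The first-match-wins behaviour of the rules above can be illustrated with a short Python sketch. This is not Manakin's implementation: the `ThemeRule` class, the `resolve_theme` function, and the idea of passing the viewed object's handle together with its parents' handles are assumptions made for illustration. The sketch also assumes that a rule carrying both attributes matches when either one matches, which the text does not specify.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ThemeRule:
    """One <theme> rule; regex and handle mirror the attributes described above."""
    name: str
    path: str
    regex: Optional[str] = None
    handle: Optional[str] = None


# The three rules from the example configuration, in document order.
RULES = [
    ThemeRule("Theme 1", "theme1/", handle="123456789/23"),
    ThemeRule("Theme 2", "theme2/", regex="community-list"),
    ThemeRule("Reference Theme", "Reference/", regex=".*"),
]


def resolve_theme(url: str, handles: Sequence[str] = ()) -> Optional[ThemeRule]:
    """Return the first rule that matches, or None.

    `handles` is a hypothetical input: the handle of the object being viewed
    followed by the handles of its parent collections and communities, which
    is how the cascading effect of the handle attribute is modelled here.
    """
    for rule in RULES:
        if rule.handle is not None and rule.handle in handles:
            return rule
        if rule.regex is not None and re.search(rule.regex, url):
            return rule
    return None
```

Because the catch-all ".*" rule is last, moving it to the top of `RULES` would make it win for every page, which is why the order of the rules matters.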
The XMLUI user interface supports multiple languages through the use of internationalization catalogues as defined by the Cocoon Internationalization Transformer. Each catalog contains the translation of all user-displayed strings into a particular language or variant. Each catalog is a single xml file whose name is based upon the language it is designated for, thus:
messages_language_country_variant.xml
messages_language_country.xml
messages_language.xml
messages.xml
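The catalogue names above are tried from most specific to most general. A minimal Python sketch of that lookup order follows; the function names are hypothetical, and the real selection is performed by the Cocoon i18n machinery rather than by code like this.

```python
from typing import Iterable, List, Optional


def catalogue_candidates(language: str,
                         country: Optional[str] = None,
                         variant: Optional[str] = None) -> List[str]:
    """List catalogue file names from most specific to most general."""
    parts = [language]
    if country:
        parts.append(country)
        if variant:
            parts.append(variant)
    names = []
    while parts:
        names.append("messages_" + "_".join(parts) + ".xml")
        parts.pop()  # drop the most specific part and try again
    names.append("messages.xml")  # the default catalogue is always last
    return names


def pick_catalogue(available: Iterable[str],
                   language: str,
                   country: Optional[str] = None,
                   variant: Optional[str] = None) -> Optional[str]:
    """Return the first candidate that exists in `available`, if any."""
    present = set(available)
    for name in catalogue_candidates(language, country, variant):
        if name in present:
            return name
    return None
```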
The interface will automatically determine which file to select based upon the user's browser and system configuration. For example, if the user's browser is set to Australian English then first the system will check if
messages_en_au.xml
is available. If this translation is not available it will fall back to messages_en.xml
, and finally if that is not available, messages.xml
. Manakin supplies an English-only translation of the interface. In order to add other translations to the system, locate the
[dspace-source]/dspace/modules/xmlui/src/main/webapp/i18n/
directory. By default this directory will be empty; to add additional translations add alternative versions of the messages.xml
file in specific language and country variants as needed for your installation. To set a language other than English as the default language for the repository's interface, simply name the translation catalogue for the new default language "
messages.xml
". Manakin themes stylize the look-and-feel of the repository, community, or collection and are distributed as self-contained packages. A Manakin/DSpace installation may have multiple themes installed and available to be used in different parts of the repository. The central component of a theme is the sitemap.xmap, which defines what resources are available to the theme such as XSL stylesheets, CSS stylesheets, images, or multimedia files.
1) Create theme skeleton
Most theme developers do not create a new theme from scratch; instead they start from the standard theme template, which defines a skeleton structure for a theme. The template is located at:
[dspace-source]/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes/template
. To start your new theme simply copy the theme template into your locally defined modules directory, [dspace-source]/dspace/modules/xmlui/src/main/webapp/themes/[your theme's directory]/
. 2) Modify theme variables @@ -21,14 +21,50 @@ 3) Add your CSS stylesheets
The base theme template will produce a repository interface without any style - just plain XHTML with no color or formatting. To make your theme useful you will need to supply a CSS Stylesheet that creates your desired look-and-feel. Add your new CSS stylesheets:
[your theme's directory]/lib/style.css
(The base style sheet used for all browsers)
[your theme's directory]/lib/style-ie.css
(Specific stylesheet used for Internet Explorer)
4) Install theme and rebuild DSpace -
Next rebuild and deploy DSpace (replace <version> with your current release):
Rebuild the DSpace installation package by running the following command from your
[dspace-source]/dspace/
directory: mvn package
Update all DSpace webapps to
[dspace]/webapps
by running the following command from your [dspace-source]/dspace/target/dspace-[version]-build.dir
directory: ant -Dconfig=[dspace]/config/dspace.cfg update
Deploy the new webapps:
cp -R /[dspace]/webapps/* /[tomcat]/webapps
Restart Tomcat
This will ensure the theme has been installed as described in the previous section "Configuring Themes and Aspects".
The XMLUI user interface supports the addition of globally static content (as well as static content within individual themes).
Globally static content can be placed in the
[dspace-source]/dspace/modules/xmlui/src/main/webapp/static/
directory. By default this directory only contains the default robots.txt
file, which provides helpful site information to web spiders/crawlers. However, you may also add static HTML (*.html
) content to this directory, as needed for your installation. Any static HTML content you add to this directory may also reference static content (e.g. CSS, JavaScript, images, etc.) from the same
[dspace-source]/dspace/modules/xmlui/src/main/webapp/static/
directory. You may reference other static content from your static HTML files similar to the following:
+Next rebuild and deploy DSpace (replace <version> with your current release):
Rebuild the DSpace installation package by running the following command from your
[dspace-source]/dspace/
directory: mvn package
Update all DSpace webapps to
[dspace]/webapps
by running the following command from your [dspace-source]/dspace/target/dspace-[version]-build.dir
directory: ant -Dconfig=[dspace]/config/dspace.cfg update
Deploy the new webapps:
cp -R /[dspace]/webapps/* /[tomcat]/webapps
Restart Tomcat
This will ensure the theme has been installed as described in the previous section "Configuring Themes and Aspects".
The XMLUI "news" document is only shown on the root page of your repository. It was intended to provide the title and introductory message, but you may use it for anything.
The news document is located at
[dspace]/dspace/config/news-xmlui.xml
. There is only one version; it is localized by inserting "i18n" callouts into the text areas. It must be a complete and valid XML DRI document (see Chapter 15).
The exact rendering of the news document in the XHTML UI depends, of course, on the theme. The default content is designed to operate with the reference themes, so when you modify it, be sure to preserve the tag structure and, for example, the exact attributes of the first DIV tag. Also note that the text is DRI, not HTML, so you must use only DRI tags, such as the XREF tag to construct a link.
Example 1: a single language:
<document>
+ <body>
+ <div id="file.news.div.news" n="news" rend="primary">
+ <head> TITLE OF YOUR REPOSITORY HERE </head>
+ <p>
+ INTRO MESSAGE HERE
+ Welcome to my wonderful repository etc etc ...
+ A service of <xref target="http://myuni.edu/">My University</xref>
+ </p>
+ </div>
+ </body>
+ <options/>
+ <meta>
+ <userMeta/>
+ <pageMeta/>
+ <repositoryMeta/>
+ </meta>
+ </document>
Example 2: all text replaced by references to localizable message keys:
+<document>
+ <body>
+ <div id="file.news.div.news" n="news" rend="primary">
+ <head><i18n:text>myuni.repo.title</i18n:text></head>
+ <p>
+ <i18n:text>myuni.repo.intro</i18n:text>
+ <i18n:text>myuni.repo.a.service.of</i18n:text>
+ <xref target="http://myuni.edu/"><i18n:text>myuni.name</i18n:text></xref>
+ </p>
+ </div>
+ </body>
+ <options/>
+ <meta>
+ <userMeta/>
+ <pageMeta/>
+ <repositoryMeta/>
+ </meta>
+ </document>
+The XMLUI user interface supports the addition of globally static content (as well as static content within individual themes).
Globally static content can be placed in the
[dspace-source]/dspace/modules/xmlui/src/main/webapp/static/
directory. By default this directory only contains the default robots.txt
file, which provides helpful site information to web spiders/crawlers. However, you may also add static HTML (*.html
) content to this directory, as needed for your installation. Any static HTML content you add to this directory may also reference static content (e.g. CSS, JavaScript, images, etc.) from the same
[dspace-source]/dspace/modules/xmlui/src/main/webapp/static/
directory. You may reference other static content from your static HTML files similar to the following:<link href="./static/mystyle.css" rel="stylesheet" type="text/css"/> <img src="./static/images/static-image.gif" alt="Static image in /static/images/ directory"/> - <img src="./static/static-image.jpg" alt="Static image in /static/ directory"/>
Copyright © 2002-2009 - The DSpace Foundation + <img src="./static/static-image.jpg" alt="Static image in /static/ directory"/>
This section gives the necessary steps to set up the OAI-ORE Harvester using Manakin.
Setting up a collection (Collection Edit Screen):
Login and create a new collection.
Go to the tab named "Content Source" that now appears next to "Edit Metadata" and "Assign Roles" in the collection edit screens.
The two content source options are standard (selected by default) and harvested. Select "harvests from external source" and click Save.
A new set of menus appears to configure the harvesting settings:
"OAI Provider" is the URL of the OAI-PMH provider that the content from this collection should be harvested from. The PMH provider deployed with DSpace typically has the form:
"http://dspace.url/oai/request". For this example use "http://web01.library.tamu.edu/oai-h151/request".
"OAI Set id" is the setSpec of the collection you wish to harvest from.
Use "hdl_1969.1_5671" for this example.
"Metadata format" determines the format in which the descriptive metadata will be harvested. Since DSpace stores metadata in its own internal format, not all metadata values might be harvested if a specific format is specified. Select "DSpace Intermediate Metadata" if available and "Simple Dublin Core" otherwise.
Clicking the Test Settings button verifies the settings supplied in the previous steps and will usually let you know what, if anything, is missing or does not match up.
The list of radio buttons labeled "Content being harvested" allows you to select the harvest level. The first one requires no OAI-ORE support on the part of the provider and can be used to harvest metadata from any provider compliant with the OAI-PMH 2.0 specifications. The middle options will harvest the metadata and generate links to bitstreams stored remotely, while the last one will perform full local replication.
Select the middle option and click Save.
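For orientation, the provider URL, set id, and metadata format entered above correspond to the parameters of an OAI-PMH ListRecords request. The Python sketch below builds such a request URL from the example values. The function name is hypothetical, and the assumption that "Simple Dublin Core" corresponds to the standard `oai_dc` metadata prefix comes from the OAI-PMH specification rather than from this document; the exact requests issued by the DSpace harvester may differ.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: str,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for one set.

    base_url        -- the "OAI Provider" value
    set_spec        -- the "OAI Set id" value
    metadata_prefix -- the OAI-PMH prefix for the chosen metadata format
                       (oai_dc is the Dublin Core prefix every provider supports)
    """
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# Values taken from the example settings above.
EXAMPLE_URL = list_records_url(
    "http://web01.library.tamu.edu/oai-h151/request",
    "hdl_1969.1_5671",
)
```

Opening a URL of this shape in a browser is one way to check by hand that the provider and set id are reachable before saving the settings.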
At this point the settings are saved and the menu changes to provide three options:
"Change Settings" takes you back to the edit screen.
"Import Now" performs a single harvest from the remote collection into the local one. Success, notes, and errors encountered in the process will be reflected in the "Last Harvest Result" entry. More detailed information is available in the DSpace log. Note that the whole harvest cycle is executed within a single HTTP request and will time out for large collections. For this reason, it is advisable to use the automatic harvest scheduler, set up either in XMLUI or from the command line. If the scheduler is running, "Import Now" will handle the harvest task as a separate thread.
"Reset and Reimport Collection" will perform the same function as "Import Now", but will clear the collection of all existing items before doing so.
Setting up automatic harvesting in the Control Panel Screen.
A new tab, Harvesting, has been added under Administrative -> Control Panel.
The panel offers the following information:
Available actions:
Start Harvester: starts the scheduler. From this point on, all properly configured collections (listed on the next line) will be harvested at regular intervals. This interval can be changed in the dspace.cfg
using the "harvester.harvestFrequency
" parameter.
Pause: the "nice" stop; waits for the active harvests to finish, saves the state/progress and pauses execution. Can be either resumed or stopped.
Stop: the "full stop"; waits for the current item to finish harvesting, and aborts further execution.
Reset Harvest Status: since stopping in the middle of a harvest is likely to result in collections getting "stuck" in the queue, this button is available to clear all states.
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
\ No newline at end of file diff --git a/dspace/docs/html/ch08.html b/dspace/docs/html/ch08.html index ffe12d1ab7..008df1a7ad 100644 --- a/dspace/docs/html/ch08.html +++ b/dspace/docs/html/ch08.html @@ -1,10 +1,10 @@ -Table of Contents
DSpace operates on several levels: as a Tomcat servlet, cron jobs, and on-demand operations. This section explains many of the on-demand operations. Some of the command operations may also be set up as cron jobs. Many of these operations are performed at the Command Line Interface (CLI), also known as the Unix prompt ($:). Future references will use the term CLI when the user needs to be at the command line.
Below is the "Command Help Table". This table explains what data is contained in the individual command/help tables in the sections that follow.
Table 8.1. Command Help Table
Command used: |
+ Table of Contents
DSpace operates on several levels: as a Tomcat servlet, cron jobs, and on-demand operations. This section explains many of the on-demand operations. Some of the command operations may also be set up as cron jobs. Many of these operations are performed at the Command Line Interface (CLI), also known as the Unix prompt ($:). Future references will use the term CLI when the user needs to be at the command line. Below is the "Command Help Table". This table explains what data is contained in the individual command/help tables in the sections that follow. Table 8.1. Command Help Table
DSpace Command Launcher. With DSpace Release 1.6, the many commands and scripts have been replaced with a simple This CLI tool gives you the ability to import a community and collection structure directly from a source XML file. Table 8.2. Structure Importer Command Table
The administrator needs to build the source XML document in the following format: <import_structure> + |
DSpace Command Launcher. With DSpace Release 1.6, the many commands and scripts have been replaced with a simple [dspace]/bin/dspace <command>
command. See Application Layer chapter for the details of the DSpace Command Launcher.
This CLI tool gives you the ability to import a community and collection structure directly from a source XML file.
Table 8.2. Structure Importer Command Table
Command used: | [dspace]/bin/dspace structure-builder |
Java class: | org.dspace.administer.StructBuilder |
Argument: short and long (if available) forms: | Description of the argument |
-f | Source xml file. |
-o | Output xml file. |
-e | Email of DSpace Administrator. |
The administrator needs to build the source XML document in the following format:
<import_structure> <community> <name>Community Name</name> <description>Descriptive text</description> @@ -50,18 +50,18 @@ </collection> </community> </import_structure> -
This command-line tool gives you the ability to import a community and collection structure directly from a source XML file. It is executed as follows:
[dspace]/bin/dspace structure-builder -f /path/to/source.xml -o path/to/output.xml -e admin@user.com
This will examine the contents of [source xml]
, import the structure into DSpace while logged in as the supplied administrator, and then output the same structure to the output file, but including the handle for each imported community and collection as an attribute.
This command-line tool gives you access to the Packager plugins. It can ingest a package to create a new DSpace Item, or disseminate an Item as a package.
To see all the options, invoke it as:
+
This command-line tool gives you the ability to import a community and collection structure directly from a source XML file. It is executed as follows:
[dspace]/bin/dspace structure-builder -f /path/to/source.xml -o path/to/output.xml -e admin@user.com
This will examine the contents of [source xml]
, import the structure into DSpace while logged in as the supplied administrator, and then output the same structure to the output file, but including the handle for each imported community and collection as an attribute.
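The source document for the structure builder can be written by hand or generated. The Python sketch below generates one using only the element names visible in the format shown earlier (`import_structure`, `community`, `name`, `description`, `collection`). The function name and the input dictionary layout are hypothetical, and placing a `name` element inside each `collection` is an assumption, since the collection's children are elided in the format listing.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_structure(communities: List[Dict]) -> str:
    """Serialize a list of communities into an <import_structure> document.

    Each dict is expected to hold "name", optionally "description", and
    optionally "collections" (a list of collection names).
    """
    root = ET.Element("import_structure")
    for community in communities:
        node = ET.SubElement(root, "community")
        ET.SubElement(node, "name").text = community["name"]
        ET.SubElement(node, "description").text = community.get("description", "")
        for collection_name in community.get("collections", []):
            coll = ET.SubElement(node, "collection")
            # Assumption: collections carry a <name> child like communities do.
            ET.SubElement(coll, "name").text = collection_name
    return ET.tostring(root, encoding="unicode")


SOURCE_XML = build_structure([
    {
        "name": "Community Name",
        "description": "Descriptive text",
        "collections": ["Collection A", "Collection B"],
    }
])
```

The resulting string can be saved to a file and passed to the structure builder with the -f option.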
This command-line tool gives you access to the Packager plugins. It can ingest a package to create a new DSpace Item, or disseminate an Item as a package.
To see all the options, invoke it as:
[dspace]
/bin/packager --help
-
This mode also displays a list of the names of package ingesters and disseminators that are available.
To ingest a package from a file, give the command:
[dspace]/bin/packager -e user -c handle -t packager path
Where user
is the e-mail address of the E-Person under whose authority this runs; handle
is the Handle of the collection into which the Item is added, packager
is the plugin name of the package ingester to use, and path
is the path to the file to ingest (or "-"
to read from the standard input).
Here is an example that loads a PDF file with internal metadata as a package:
+
This mode also displays a list of the names of package ingesters and disseminators that are available.
To ingest a package from a file, give the command:
[dspace]/bin/packager -e user -c handle -t packager path
Where user
is the e-mail address of the E-Person under whose authority this runs; handle
is the Handle of the collection into which the Item is added, packager
is the plugin name of the package ingester to use, and path
is the path to the file to ingest (or "-"
to read from the standard input).
Here is an example that loads a PDF file with internal metadata as a package:
/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf thesis.pdf
This example takes the result of retrieving a URL and ingests it:
wget -O - http://alum.mit.edu/jarandom/my-thesis.pdf | \ -/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf -
To disseminate an Item as a package, give the command:
[dspace]/bin/packager -e user -d -i handle -t packager path
Where user
is the e-mail address of the E-Person under whose authority this runs; handle
is the Handle of the Item to disseminate; packager
is the plugin name of the package disseminator to use; and path
is the path to the file to create (or "-"
to write to the standard output). This example writes an Item out as a METS package in the file "454.zip":
+/dspace/bin/packager -e florey@mit.edu -c 1721.2/13 -t pdf -
To disseminate an Item as a package, give the command:
[dspace]/bin/packager -e user -d -i handle -t packager path
Where user
is the e-mail address of the E-Person under whose authority this runs; handle
is the Handle of the Item to disseminate; packager
is the plugin name of the package disseminator to use; and path
is the path to the file to create (or "-"
to write to the standard output). This example writes an Item out as a METS package in the file "454.zip":
/dspace/bin/packager -e florey@mit.edu -d -i 1721.2/454 -t METS 454.zip
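The ingest command shown earlier can be wrapped in a loop to load several files in one pass. The following is a minimal sketch, not part of DSpace: the ingest_pdfs function and the PACKAGER variable are hypothetical names, and it assumes every PDF carries its own internal metadata, as in the single-file example.

```shell
#!/bin/sh
# PACKAGER is a hypothetical variable; point it at [dspace]/bin/packager.
PACKAGER=${PACKAGER:-/dspace/bin/packager}

# ingest_pdfs EPERSON COLLECTION_HANDLE DIRECTORY
# Runs the package ingester once per *.pdf file found in DIRECTORY.
ingest_pdfs() {
    for f in "$3"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDF files.
        [ -e "$f" ] || continue
        "$PACKAGER" -e "$1" -c "$2" -t pdf "$f" || echo "failed: $f" >&2
    done
}
```

It could be invoked as: ingest_pdfs florey@mit.edu 1721.2/13 /path/to/pdfs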
-
Since DSpace 1.4 release, the software includes a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as METS, MODS, and PREMIS. The plugin name is METS
by default, and it uses MODS for descriptive metadata.
The DSpace METS SIP profile is available at: - http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf .
DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace simple archive format. The tools are not terribly robust, but are useful and are easily modified. They also give a good demonstration of how to implement your own item importer if desired.
The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata, and the files that make up the item.
+
Since DSpace 1.4 release, the software includes a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as METS, MODS, and PREMIS. The plugin name is METS
by default, and it uses MODS for descriptive metadata.
The DSpace METS SIP profile is available at: + http://www.dspace.org/standards/METS/SIP/profilev1p0/metsipv1p0.pdf .
DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace simple archive format. The tools are not terribly robust, but are useful and are easily modified. They also give a good demonstration of how to implement your own item importer if desired.
The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata, and the files that make up the item.
archive_directory/ item_000/ dublin_core.xml -- qualified Dublin Core metadata for metadata fields belonging to the dc schema @@ -87,92 +87,90 @@ archive_directory/ license
Note that the license is optional. To include one, place the file in the item's directory, for example .../item_001/.
The bitstream name may optionally be followed by the sequence:
\tbundle:bundlename
-
where '\t' is the tab character and 'bundlename' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.
![]() | |
Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances. |
Table 8.3. Import Items Command Table
Command used: |
+ where '\t' is the tab character and 'bundlename' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.
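The layout described above can be built by hand for a first trial import. The following is a minimal sketch under stated assumptions: the file names thesis.pdf and thesis.txt, the title value, and the use of the TEXT bundle are all illustrative, and the archive is created in a temporary directory.

```shell
#!/bin/sh
# Sketch: build a one-item simple archive. All file names are hypothetical.
set -eu

archive=$(mktemp -d)
item="$archive/item_000"
mkdir -p "$item"

# Descriptive metadata for the dc schema.
cat > "$item/dublin_core.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
  <dcvalue element="title" qualifier="none">A sample title</dcvalue>
</dublin_core>
EOF

# Placeholder content files standing in for real bitstreams.
printf 'placeholder content\n' > "$item/thesis.pdf"
printf 'placeholder extracted text\n' > "$item/thesis.txt"

# contents: one bitstream per line; a tab plus "bundle:NAME" is optional.
{
    printf 'thesis.pdf\n'
    printf 'thesis.txt\tbundle:TEXT\n'
} > "$item/contents"

echo "archive created in $archive"
```

The first line of the contents file names no bundle, so that bitstream would go to the 'ORIGINAL' bundle.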
Table 8.3. Import Items Command Table
‡ These are mutually exclusive. The item importer is able to batch import unlimited numbers of items for a particular collection using a very simple CLI command and 'arguments' - To add items to a collection, you gather the following information:
At the command line: + | |||||||||||||||||||||||||||||||||||
Arguments short and (long) forms: | Description | |||||||||||||||||||||||||||||||||||
-a or --add | Add items to DSpace ‡ | |||||||||||||||||||||||||||||||||||
-r or --replace | Replace items listed in mapfile ‡ | |||||||||||||||||||||||||||||||||||
-d or --delete | Delete items listed in mapfile ‡ | |||||||||||||||||||||||||||||||||||
-s or --source | Source of the items (directory) | |||||||||||||||||||||||||||||||||||
-c or --collection | Destination collection, given by its Handle or database ID | |||||||||||||||||||||||||||||||||||
-m or --mapfile | Where the mapfile for items can be found (name and directory) | |||||||||||||||||||||||||||||||||||
-e or --eperson | Email of eperson doing the importing | |||||||||||||||||||||||||||||||||||
-w or --workflow | Send submission through collection's workflow | |||||||||||||||||||||||||||||||||||
-n or --notify | Sends an email notification that the item(s) have been imported | |||||||||||||||||||||||||||||||||||
-t or --test | Test run—do not actually import items | |||||||||||||||||||||||||||||||||||
-p or --template | Apply the collection template | |||||||||||||||||||||||||||||||||||
-R or --resume | Resume a failed import (Used on Add only) | |||||||||||||||||||||||||||||||||||
-h or --help | Command help |
‡ These are mutually exclusive.
The item importer can batch import any number of items into a particular collection using a simple CLI command and its arguments.
To add items to a collection, you gather the following information:
eperson
Collection ID (either Handle (e.g. 123456789/14) or Database ID (e.g. 2))
Source directory where the items reside
Mapfile. Since you don't have one, you need to determine where it will be (e.g. /Import/Col_14/mapfile)
At the command line:
[dspace]/bin/import --add --eperson=joe@user.com --collection=CollectionID --source=items_dir --mapfile=mapfile
or by using the short form:
[dspace]/bin/import -a -e joe@user.com -c CollectionID -s items_dir -m mapfile
-
The above command cycles through the archive directory's items, imports them, and then generates a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You will need it to replace or delete (unimport) the imported items.
Testing. You can add --test
(or -t
) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.
Replacing existing items is relatively easy. Remember that mapfile you were supposed to save? Now you will use it. The command (in short form):
+
The above command cycles through the archive directory's items, imports them, and then generates a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You will need it to replace or delete (unimport) the imported items.
Testing. You can add --test
(or -t
) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.
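A test run and the real import can be chained so that nothing is imported unless the test run succeeds. The following is a minimal sketch: safe_import and IMPORT_CMD are hypothetical names, and it assumes the importer exits with a non-zero status when the test run finds a problem, which should be verified on your installation.

```shell
#!/bin/sh
# IMPORT_CMD is a hypothetical variable; point it at [dspace]/bin/import.
IMPORT_CMD=${IMPORT_CMD:-/dspace/bin/import}

# safe_import EPERSON COLLECTION SOURCE_DIR MAPFILE
safe_import() {
    # First pass: simulate the import without changing anything.
    "$IMPORT_CMD" --add --test --eperson="$1" --collection="$2" \
        --source="$3" --mapfile="$4" || {
        echo "test run failed; nothing was imported" >&2
        return 1
    }
    # Second pass: the real import, reached only if the test run succeeded.
    "$IMPORT_CMD" --add --eperson="$1" --collection="$2" \
        --source="$3" --mapfile="$4"
}
```

It could be invoked as: safe_import joe@user.com 123456789/14 items_dir mapfile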
Replacing existing items is relatively easy. Remember that mapfile you were supposed to save? Now you will use it. The command (in short form):
[dspace]/bin/import -r -e joe@user.com -c collectionID -s items_dir -m mapfile
Long form:
[dspace]/bin/import --replace --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
-
You are able to unimport or delete items provided you have the mapfile. Remember that mapfile you were supposed to save? The command is (in short form):
+
You are able to unimport or delete items provided you have the mapfile. Remember that mapfile you were supposed to save? The command is (in short form):
[dspace]/bin/import -d -m mapfile
In long form:
[dspace]/bin/import --delete --mapfile mapfile
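Before deleting, it can be worth listing exactly which items the map file refers to. The following is a minimal sketch that assumes each line of the map file holds an item directory name followed by the item's handle, separated by whitespace; that layout is an assumption, so inspect your own map file first. The function name mapfile_handles is hypothetical.

```shell
#!/bin/sh
# mapfile_handles MAPFILE
# Prints the second whitespace-separated field (assumed to be the handle)
# of every line that has at least two fields.
mapfile_handles() {
    awk 'NF >= 2 { print $2 }' "$1"
}
```

Running mapfile_handles mapfile prints one handle per line, which can be reviewed before the delete command is issued.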
-
Workflow. The importer usually bypasses any workflow assigned to a collection. Adding the --workflow
(-w
) argument routes the imported items through the workflow system.
Templates. If you have templates that have constant data and you wish to apply that data during batch importing, add the --template
(-p
) argument.
Resume. If an error aborts the import, fix the error and then use the --resume
(-R
) flag to resume the import where it left off.
The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.
Table 8.4. Export Items Command Table
Command used: |
+ Workflow. The importer usually bypasses any workflow assigned to a collection. Adding the --workflow (-w) argument routes the imported items through the workflow system.
Templates. If you have templates that have constant data and you wish to apply that data during batch importing, add the --template (-p) argument.
Resume. If an error aborts the import, fix the error and then use the --resume (-R) flag to resume the import where it left off.
The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.
Table 8.4. Export Items Command Table
Exporting a Collection To export a collection's items you type at the CLI: [dspace]/bin/dspace export --type=COLLECTION --id=collID --dest=dest_dir --number=seq_num Short form:
Exporting a Single Item The keyword
Short form:
Each exported item will have an additional file in its directory, named 'handle'. This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle. The Using the Where items are to be moved between DSpace instances (for example from a test DSpace into a production DSpace) the item exporter and item importer can be used in conjunction with a script to assist in this process. After running the item exporter each
In order to avoid duplication of this metadata, run
prior to running the item importer. This will remove the above metadata items, except for date.issued - if the item has been published or publicly distributed before and ItemUpdate is a batch-mode command-line tool for altering the metadata and bitstream content of existing items in a DSpace instance. It is a companion tool to ItemImport and uses the DSpace simple archive format to specify changes in metadata and bitstream contents. Those familiar with generating the source trees for ItemImporter will find a similar environment in the use of this batch processing tool. For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadta elements. For bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run. ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as with ItemImport. However, unlike ItemImport, there is no resume feature for incomplete processing. There is more extensive logging with a summary statement at the end with counts of successful and unsuccessful items processed. One probable scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or down-stream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the changes. A note on terminology: item refers to a DSpace item. metadata element refers generally to a qualified or unqualified element in a schema in the form As with ItemImporter, the idea behind the DSpace's simple archive format is to create an archive directory with a subdirectory per item. There are a few additional features added to this format specifically for ItemUpdate. Note that in the simple archive format, the item directories are merely local references and only used by ItemUpdate in the log output. 
The user is referred to the previous section DSpace Simple Archive Format. Additionally, the use of a delete_contents is now available. This file lists the bitstreams to be deleted, one bitstream ID per line. Currently, no other identifiers for bitstreams are usable for this function. This file is an addition to the Archive format specifically for ItemUpdate. The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate. Table 8.5. ItemUpdate Command Table
Exporting a Collection To export a collection's items you type at the CLI: [dspace]/bin/dspace export --type=COLLECTION --id=collID --dest=dest_dir --number=seq_num Short form:
Exporting a Single Item The keyword
Short form:
Each exported item will have an additional file in its directory, named 'handle'. This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle. The Using the Where items are to be moved between DSpace instances (for example from a test DSpace into a production DSpace) the item exporter and item importer can be used in conjunction with a script to assist in this process. After running the item exporter each
In order to avoid duplication of this metadata, run
prior to running the item importer. This will remove the above metadata items, except for date.issued - if the item has been published or publicly distributed before and ItemUpdate is a batch-mode command-line tool for altering the metadata and bitstream content of existing items in a DSpace instance. It is a companion tool to ItemImport and uses the DSpace simple archive format to specify changes in metadata and bitstream contents. Those familiar with generating the source trees for ItemImporter will find a similar environment in the use of this batch processing tool. For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadta elements. For bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run. ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as with ItemImport. However, unlike ItemImport, there is no resume feature for incomplete processing. There is more extensive logging with a summary statement at the end with counts of successful and unsuccessful items processed. One probable scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or down-stream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the changes. A note on terminology: item refers to a DSpace item. metadata element refers generally to a qualified or unqualified element in a schema in the form As with ItemImporter, the idea behind the DSpace's simple archive format is to create an archive directory with a subdirectory per item. There are a few additional features added to this format specifically for ItemUpdate. Note that in the simple archive format, the item directories are merely local references and only used by ItemUpdate in the log output. 
The user is referred to the previous section DSpace Simple Archive Format. Additionally, the use of a delete_contents is now available. This file lists the bitstreams to be deleted, one bitstream ID per line. Currently, no other identifiers for bitstreams are usable for this function. This file is an addition to the Archive format specifically for ItemUpdate. The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate. Table 8.5. ItemUpdate Command Table
Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration. To register an item its bitstreams must reside on storage accessible to DSpace and therefore referenced by an asset store number in DSpace uses the same import tool that is used for batch import except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool. The archive format for registration does not include the actual content files (bitstreams) being registered. The format is however a directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata ( The The -r -s n -f filepath + | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Arguments short and (long) forms: | Description | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-a or --addmetadata [metadata element] | Repeatable for multiple elements. The metadata element should be in the form dc.x or dc.x.y. The mandatory argument indicates the metadata fields in the dublin_core.xml file to be added unless already present. However, duplicate fields will not be added to the item metadata without warning or error. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-d or --deletemetadata [metadata element] | Repeatable for multiple elements. All metadata fields matching the element will be deleted. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-A or --addbitstream | Adds bitstreams listed in the contents file with the bitstream metadata cited there. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-D or --deletebitstream [filter plug classname or alias] | Not repeatable. With no argument, this operation deletes bitstreams listed in the delete_contents file. Only bitstream IDs are recognized identifiers for this operation. The optional filter argument is the classname of an implementation of the org.dspace.app.itemupdate.BitstreamFilter class to identify files for deletion, or one of the aliases (ORIGINAL, ORIGINAL_AND_DERIVATIVES, TEXT, THUMBNAIL) which reference existing filters based on membership in a bundle of that name. In this case, the delete_contents file is not required for any item. The filter properties file will contain properties pertinent to the particular filter used. Multiple filters are not allowed. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-h or --help | Displays brief command line help. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-e or --eperson | Email address of the person or the user's database ID (Required) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-s or --source | Directory archive to process (Required) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-i or --itemidentifier | Specifies an alternate metadata field (not a handle) used to hold an identifier used to match the DSpace item with that in the archive. If omitted, the item handle is expected to be located in the dc.identifier.uri field. (Optional) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-t or --test | Runs the process in test mode with logging but no changes applied to the DSpace instance. (Optional) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-P or --alterprovenance | Prevents any changes to the provenance field to represent changes in the bitstream content resulting from an Add or Delete. No provenance statements are written for thumbnails or text derivative bitstreams, in keeping with the practice of MediaFilterManager. (Optional) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-F or --filterproperties | The filter properties files to be used by the delete bitstreams action (Optional) |
Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.
To register an item, its bitstreams must reside on storage accessible to DSpace and must therefore be referenced by an asset store number in dspace.cfg
. The configuration file dspace.cfg
establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in the DSpace host's file system or a set of SRB account parameters. This asset store number is described in The dspace.cfg
Configuration Properties File section and in the dspace.cfg
file itself. The asset store number(s) used for registered items should generally not be the value of the assetstore.incoming
property since it is unlikely that you will want to mix the bitstreams of normally ingested and imported items and registered items.
DSpace uses the same import tool that is used for batch import except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool.
The archive format for registration does not include the actual content files (bitstreams) being registered. The format is however a directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata (dublin_core.xml
) and a file listing the item's content files (contents
), but not the actual content files themselves.
The dublin_core.xml
file for item registration is exactly the same as for regular item import.
The contents
file, like that for regular item import, lists the item's content files, one content file per line, but each line has one of the following formats:
-r -s n -f filepath
-r -s n -f filepath\tbundle:bundlename
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'
--r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text
where
-r
indicates this is a file to be registered
-s n
indicates the asset store number (n
)
-f filepath
indicates the path and name of the content file to be registered (filepath)
\t
is a tab character
bundle:bundlename
is an optional bundle name
permissions: -[r|w] 'group name'
is an optional read or write permission that can be attached to the bitstream
description: some text
is an optional description field to add to the file
The bundle, that is everything after the filepath, is optional and is normally not used.
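Because the fields are separated by literal tab characters, it is easy to get a registration line wrong when typing it by hand. The following is a minimal sketch of a helper that emits the first two formats listed above; the function name register_line and the example paths are hypothetical, and the permissions and description fields are deliberately left out.

```shell
#!/bin/sh
# register_line ASSETSTORE_NUMBER FILEPATH [BUNDLENAME]
# Prints one contents-file line for a registered bitstream.
register_line() {
    if [ -n "${3:-}" ]; then
        # With a bundle: a real tab separates the file path from the bundle.
        printf '%s -s %s -f %s\tbundle:%s\n' '-r' "$1" "$2" "$3"
    else
        printf '%s -s %s -f %s\n' '-r' "$1" "$2"
    fi
}
```

A contents file for one item could then be written with, for example: register_line 1 images/scan_001.tif > item_000/contents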
The command line for registration is just like the one for regular import:
[dspace]/bin/dspace import -a -e joe@user.com -c collectionID -s items_dir -m mapfile
(or by using the long form)
[dspace]/bin/dspace import --add --eperson=joe@user.com --collection=collectionID --source=items_dir --mapfile=mapfile
The --workflow
and --test
flags will function as described in Importing Items.
The --delete
flag will function as described in Importing Items but the registered content files will not be removed from storage. See Deleting Registered Items.
The --replace
flag will function as described in Importing Items but care should be taken to consider different cases and implications. With old items and new items being registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using --replace
will not be removed from the storage where it resides. See Deleting Registered Items. A new item added to DSpace using --replace
will be ingested normally or will be registered depending on whether or not it is marked in the contents
file with -r.
Once an item has been registered, superficially it is indistinguishable from items ingested interactively or by batch import. But internally there are some differences:
First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. Instead, the file path and name are those specified in the contents
file.
Second, the store_number
column of the bitstream database row contains the asset store number specified in the contents
file.
Third, the internal_id
column of the bitstream database row contains a leading flag (-R
) followed by the registered file path and name. For example, -Rfilepath
where filepath
is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be traditional storage in the DSpace server's file system or an SRB account.
Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file is in remote storage (say, SRB) a checksum is calculated on just the file name! This is an efficiency choice since registering a large number of large files that are in SRB would consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option but it's not yet in production release.
Registered items and their bitstreams can be retrieved transparently just like normally ingested items.
Registered items may be exported as described in Exporting Items. If so, the export directory will contain actual copies of the files being exported but the lines in the contents file will flag the files as registered. This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again be registered in DSpace instead of being uploaded and ingested normally.
The METS Export Tool can also be used but note the cautions described in that section and note that MD5 values for items in remote storage are actually MD5 values on just the file name.
If a registered item is deleted from DSpace, either interactively or by using the --delete
or --replace
flags described in Importing Items, the item will disappear from DSpace but its registered content files will remain in place just as they were prior to registration. Bitstreams not registered but added by DSpace as part of registration, such as license.txt
files, will be deleted.
The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.
This tool is obsolete, and does not export a complete AIP. Its use is strongly deprecated.
Table 8.6. Mets Export Command table
Command used: |
+-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text where
The bundle, that is everything after the filepath, is optional and is normally not used. The command line for registration is just like the one for regular import:
(or by using the long form)
The The The Once an item has been registered, superficially it is indistinguishable from items ingested interactively or by batch import. But internally there are some differences: First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. Instead, the file path and name are that specified in the Second, the Third, the Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registerd file is in remote storage (say, SRB) a checksum is calculated on just the file name! This is an efficiency choice since registering a large number of large files that are in SRB would consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option but it's not yet in production release. Registered items and their bitstreams can be retrieved transparently just like normally ingested items. Registered items may be exported as described in Exporting Items. If so, the export directory will contain actual copies of the files being exported but the lines in the contents file will flag the files as registered. This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again registered in DSpace instead of being uploaded and ingested normally. The METS Export Tool can also be used but note the cautions described in that section and note that MD5 values for items in remote storage are actually MD5 values on just the file name. If a registered item is deleted from DSpace, either interactively or by using the The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS. This tool is obsolete, and does not export a complete AIP. 
Its use is strongly deprecated. Table 8.6. Mets Export Command table
The following are examples of the types of process the METS tool can provide. Exporting an individual item. From the CLI:
Exporting a collection. From the CLI:
Exporting all the items in DSpace. From the CLI:
Note that this tool is deprecated, and the output format is not a true AIP Each exported item is written to a separate directory, created under the base directory specified in the command-line arguments, or in the current directory if Within each item directory is a An example AIP might look like this:
| ||||||||||||||||
Java class: | org.dspace.app.mets.METSExport | ||||||||||||||||
Arguments short and (long) forms: | Description | ||||||||||||||||
-a or --all | Export all items in the archive. | ||||||||||||||||
-c or --collection | Handle of the collection to export. | ||||||||||||||||
-d or --destination | Destination directory. | ||||||||||||||||
-i or --item | Handle of the item to export. | ||||||||||||||||
-h or --help | Help |
The following are examples of the types of process the METS tool can provide.
Exporting an individual item. From the CLI:
[dspace]
/bin/dspace mets-export -i
[handle] -d /path/to/destination
Exporting a collection. From the CLI:
[dspace]/bin/dspace mets-export -c [handle] -d /path/to/destination
Exporting all the items in DSpace. From the CLI:
[dspace]/bin/dspace mets-export -a -d /path/to/destination
Note that this tool is deprecated, and the output format is not a true AIP.
Each exported item is written to a separate directory, created under the base directory specified in the command-line arguments, or in the current directory if --destination
is omitted. The name of each directory is the Handle, URL-encoded so that the directory name is 'legal'.
Within each item directory is a mets.xml
file which contains the METS-encoded metadata for the item. Bitstreams in the item are also stored in the directory. Their filenames are their MD5 checksums, firstly for easy integrity checking, and also to avoid any problems with 'special characters' in the filenames that were legal on the original filing system they came from but are illegal in the server filing system. The mets.xml
file includes XLink pointers to these bitstream files.
An example AIP might look like this:
hdl%3A123456789%2F8/
mets.xml
-- METS metadata
184BE84F293342
-- bitstream
3F9AD0389CB821
135FB82113C32D
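The directory name in the listing above is the item's Handle after URL-encoding. The following is a minimal sketch of that mapping; handle_dirname is a hypothetical helper that encodes only the percent sign, colon, and slash, which covers the example but is not a general URL encoder.

```shell
#!/bin/sh
# handle_dirname HANDLE
# Encodes '%', ':' and '/' so the Handle can be used as a directory name.
handle_dirname() {
    printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's/:/%3A/g' -e 's,/,%2F,g'
}
```

With the Handle from the example, handle_dirname hdl:123456789/8 prints hdl%3A123456789%2F8.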
-
The contents of the METS in the mets.xml file are as follows:
A dmdSec (descriptive metadata section) containing the item's metadata in Metadata Object Description Schema (MODS) XML. The Dublin Core descriptive metadata is mapped to MODS since there is no official qualified Dublin Core XML schema in existence as of yet, and the Library Application Profile of DC that DSpace uses includes some qualifiers that are not part of the DCMI Metadata Terms.
An amdSec (administrative metadata section), which contains a rights metadata element, which in turn contains the base64-encoded deposit license (the license the submitter granted as part of the submission process).
A fileSec containing a list of the bitstreams in the item. Each bundle constitutes a fileGrp. Each bitstream is represented by a file element, which contains an FLocat element with a simple XLink to the bitstream in the same directory as the mets.xml file. The file attributes consist of most of the basic technical metadata for the bitstream. Additionally, for those bitstreams that are thumbnails or text extracted from another bitstream in the item, those 'derived' bitstreams have the same GROUPID as the bitstream they were derived from, in order that clients understand that there is a relationship.
The OWNERID of each file is the 'persistent' bitstream identifier assigned by the DSpace instance. The ID and GROUPID attributes consist of the item's Handle, together with the bitstream's sequence ID, with underscores used in place of dots and slashes. For example, a bitstream with sequence ID 24, in the item hdl:123.456/789 will have the ID 123_456_789_24. This is because ID and GROUPID attributes must be of type xsd:ID.
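The ID rule above can be expressed as a short Python sketch. The function name is hypothetical, and stripping the 'hdl:' prefix is an assumption inferred from the worked example rather than something the exporter is documented to do in exactly this way.

```python
import re


def mets_file_id(handle, sequence_id):
    """Build an ID from an item's Handle and a bitstream's sequence ID."""
    # Drop an optional 'hdl:' prefix (assumed from the example in the text).
    bare = handle[4:] if handle.startswith("hdl:") else handle
    # Dots and slashes become underscores, then the sequence ID is appended.
    return re.sub(r"[./]", "_", bare) + "_" + str(sequence_id)


print(mets_file_id("hdl:123.456/789", 24))  # 123_456_789_24
```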
No corresponding import tool yet
No structMap section
Some technical metadata not written, e.g. the primary bitstream in a bundle, original filenames or descriptions.
Only the MIME type is stored, not the (finer grained) bitstream format.
Dublin Core to MODS mapping is very simple, probably needs verification
DSpace can apply filters to content/bitstreams, creating new content. Filters are included that extract text for full-text searching, and create thumbnails for items that contain images. The media filters are controlled by the MediaFilterManager which traverses the asset store, invoking the MediaFilter or FormatFilter classes on bitstreams. The media filter plugin configuration filter.plugins in dspace.cfg contains a list of all enabled media/format filter plugins (see Configuring Media Filters for more information). The media filter system is intended to be run from the command line (or regularly as a cron task):
[dspace]/bin/dspace filter-media
With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered.
Available Command-Line Options:
Help : [dspace]/bin/dspace filter-media -h
Display help message describing all command-line options.
Force mode : [dspace]/bin/dspace filter-media -f
Apply filters to ALL bitstreams, even if they've already been filtered. If they've already been filtered, the previously filtered content is overwritten.
Identifier mode : [dspace]/bin/dspace filter-media -i 123456789/2
Restrict processing to the community, collection, or item named by the identifier - by default, all bitstreams of all items in the repository are processed. The identifier must be a Handle, not a DB key. This option may be combined with any other option.
Maximum mode : [dspace]/bin/dspace filter-media -m 1000
Suspend operation after the specified maximum number of items have been processed - by default, no limit exists. This option may be combined with any other option.
No-Index mode : [dspace]/bin/dspace filter-media -n
Suppress index creation - by default, a new search index is created for full-text searching. This option suppresses index creation if you intend to run index-update elsewhere.
Plugin mode : [dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
Apply ONLY the filter plugin(s) listed (separated by commas). By default all named filters listed in the filter.plugins field of dspace.cfg are applied. This option may be combined with any other option. WARNING: multiple plugin names must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
Skip mode : [dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB keys). They may refer to items, collections or communities which should be skipped. This option may be combined with any other option. WARNING: multiple identifiers must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
NOTE: If you have a large number of identifiers to skip, you may maintain this comma-separated list within a separate file (e.g. filter-skiplist.txt). Use the following format to call the program. Please note the use of the "grave" or "tick" (`) symbol; do not use the single quotation mark.
[dspace]/bin/dspace filter-media -s `less filter-skiplist.txt`
Verbose mode : [dspace]/bin/dspace filter-media -v
Verbose mode - print all extracted text and other filter details to STDOUT.
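When filter-media is driven from a script, the comma rule for the -p and -s options is easy to get wrong. The Python sketch below builds an argument list using only the options documented above. The helper names are hypothetical, and "[dspace]/bin/dspace" is the same placeholder used throughout this section, to be replaced with the real installation path.

```python
from pathlib import Path

DSPACE_LAUNCHER = "[dspace]/bin/dspace"  # placeholder, substitute your install path


def filter_media_args(plugins=(), skip=(), max_items=None):
    """Build an argument list for filter-media with comma-only separators."""
    args = [DSPACE_LAUNCHER, "filter-media"]
    if plugins:
        # Join with ',' only: a comma followed by a space is not accepted.
        args += ["-p", ",".join(name.strip() for name in plugins)]
    if skip:
        args += ["-s", ",".join(handle.strip() for handle in skip)]
    if max_items is not None:
        args += ["-m", str(max_items)]
    return args


def read_skip_list(path):
    """Read a comma-separated skip list such as filter-skiplist.txt."""
    text = Path(path).read_text()
    return [handle.strip() for handle in text.split(",") if handle.strip()]


print(filter_media_args(plugins=["PDF Text Extractor", "Word Text Extractor"]))
```

Passing the resulting list to subprocess.run(args, check=True) avoids shell quoting altogether, so plugin names that contain spaces need no extra quotation marks.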
Adding your own filters is done by creating a class which implements the org.dspace.app.mediafilter.FormatFilter interface. See the Creating a new Media Filter topic and comments in the source file FormatFilter.java for more information. In theory filters could be implemented in any programming language (C, Perl, etc.). However, they need to be invoked by the Java code in the Media Filter class that you create.
DSpace provides an administrative tool, 'CommunityFiliator', for managing community sub-structure. Normally this structure seldom changes, but prior to the 1.2 release sub-communities were not supported, so this tool could be used to place existing pre-1.2 communities into a hierarchy. It has two operations: establishing a community to sub-community relationship, or dis-establishing an existing relationship.
The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be a 'parent' community (meaning it has at least one sub-community), a 'child' community (meaning it is a sub-community of another community), both, or neither. In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user-interface, since there is no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan. The second operation, removing a parent/child relationship, will make the child an orphan.
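The two operations can be modelled with a small Python sketch. It is a toy in-memory model for illustration, not the CommunityFiliator implementation; the class name, method names, and error wording are all hypothetical.

```python
class CommunityTree:
    """Toy model: maps each child community to its parent. Orphans have no entry."""

    def __init__(self):
        self.parent_of = {}

    def is_orphan(self, community):
        return community not in self.parent_of

    def set_parent(self, parent, child):
        # A relationship can only be established with an orphan as the child.
        if not self.is_orphan(child):
            raise ValueError(f"{child} already has a parent")
        self.parent_of[child] = parent

    def remove_parent(self, parent, child):
        # The stated parent must really be the child's current parent.
        if self.parent_of.get(child) != parent:
            raise ValueError(f"{child} is not a child of {parent}")
        del self.parent_of[child]  # the child becomes a top-level community


tree = CommunityTree()
tree.set_parent("Science", "Physics")
tree.remove_parent("Science", "Physics")   # Physics is an orphan again
tree.set_parent("Engineering", "Physics")  # moved to a new parent
```

Only the link between the child and its parent changes in this model, so anything recorded beneath the child stays attached to it when it moves.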
Table 8.7. Community Filiator Command table
Command used: |
To set a parent/child relationship, issue the following at the CLI:
(or using the short form)
where '-s' or '--set' means establish a relationship whereby the community identified by the '-p' parameter becomes the parent of the community identified by the '-c' parameter. Both the 'parentID' and 'childID' values may be handles or database IDs. The reverse operation looks like this:
(or using the short form)
where '-r' or '--remove' means dis-establish the current relationship in which the community identified by 'parentID' is the parent of the community identified by 'childID'. The outcome will be that the 'childID' community will become an orphan, i.e. a top-level community.
If the required constraints of operation are violated, an error message will appear explaining the problem, and no change will be made. An example in a removal operation, where the stated child community does not have the stated parent community as its parent: "Error, child community not a child of parent community".
It is possible to effect arbitrary changes to the community hierarchy by chaining the basic operations together. For example, to move a child community from one parent to another, simply perform a 'remove' from its current parent (which will leave it an orphan), followed by a 'set' to its new parent.
It is important to understand that when any operation is performed, all the sub-structure of the child community follows it. Thus, if a child itself has children (sub-communities), or collections, they will all move with it to its new 'location' in the community tree.
DSpace provides a batch metadata editing tool. The batch editing tool is able to produce a comma-delimited file in the CSV format. The batch editing tool allows the user to perform the following:
The following table summarizes the basics.
Table 8.8. Batch Editing Metadata Export Command Table
Java class: | org.dspace.app.bulkedit.MetadataExport |
Arguments (short and long forms): | Description |
-f or --file | Required. The filename of the resulting CSV. |
-i or --id | The Item, Collection, or Community handle or Database ID to export. If not specified, all items will be exported. |
-a or --all | Include all the metadata fields that are not normally changed (e.g. provenance) or those fields you configured in the dspace.cfg to be ignored on export. |
-h or --help | Display the help page. |
To run the batch editing exporter, at the command line:
[dspace]/bin/dspace metadata-export -f name_of_file.csv -i 1023/24
Example:
[dspace]/bin/dspace metadata-export -f /batch_export/col_14.csv -i 1989.1/24
In the above example we have requested that the collection assigned handle '1989.1/24' be exported in its entirety to the file 'col_14.csv' in the '/batch_export' directory.
The following table summarizes the basics.
Table 8.9. Batch Editing Metadata Import Command Table
WARNING: Silent Mode should be used carefully. It is possible (and probable) that you can overlay the wrong data and cause irreparable damage to the database.
To run the batch importer, at the command line:
[dspace]/bin/dspace metadata-import -f name_of_file.csv
Example
[dspace]/bin/dspace metadata-import -f /dImport/col_14.csv
If you wish to upload new metadata without bitstreams, at the command line:
[dspace]/bin/dspace metadata-import -f /dImport/new_file.csv -e joe@user.com -w -n -t
In the above example we threw in all the arguments. This would add the metadata and apply the workflow, notification, and templates to the items that are being added.
The csv files that this tool can import and export abide by the RFC4180 CSV format http://www.ietf.org/rfc/rfc4180.txt. This means that new lines, and embedded commas can be included by wrapping elements in double quotes. Double quotes can be included by using two double quotes. The code does all this for you, and any good csv editor such as Excel or OpenOffice will comply with this convention.
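The quoting rules can be seen with Python's standard csv module, whose default dialect follows the same conventions (double quotes around fields containing commas, quotes, or new lines, with embedded quotes doubled). This is a generic illustration of RFC 4180 quoting, not the DSpace code; the helper names are hypothetical.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row, quoting only the fields that need it."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


line = to_csv_line(['A "quoted" title', "Smith, John", "2008"])
print(line)  # "A ""quoted"" title","Smith, John",2008
```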
File Structure. The first row of the CSV must define the metadata values that the rest of the CSV represents. The first column must always be "id", which refers to the item's ID. All other columns are optional. The other columns name the Dublin Core metadata fields in which the data is to reside.
A typical heading row looks like:
id,collection,dc.title,dc.contributor,dc.date.issued,etc,etc,etc.
Subsequent rows in the csv file relate to items. A typical row might look like:
350,2292,Item title,"Smith, John",2008
If you want to store multiple values for a given metadata element, they can be separated with the double-pipe '||' (or another character that you defined in your dspace.cfg file). For example:
Horses||Dogs||Cats
Elements are stored in the database in the order that they appear in the csv file. You can use this to order elements where order may matter, such as authors, or controlled vocabulary such as Library of Congress Subject Headings.
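A reader of such a file might look like the following Python sketch. It assumes the default '||' separator, and the function name and sample data are made up for illustration; it is not the importer's own parsing code.

```python
import csv
import io

SEPARATOR = "||"  # the default named above; dspace.cfg may define another


def read_items(csv_text):
    """Return one dict per item row, splitting every non-id cell into a list."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        item = {"id": row["id"]}
        for column, cell in row.items():
            if column is None or column == "id":
                continue
            # split() keeps the values in the order they appear in the file.
            item[column] = cell.split(SEPARATOR) if cell else []
        items.append(item)
    return items


sample = (
    "id,collection,dc.title,dc.contributor,dc.subject\n"
    '350,2292,Item title,"Smith, John",Horses||Dogs||Cats\n'
)
print(read_items(sample)[0]["dc.subject"])  # ['Horses', 'Dogs', 'Cats']
```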
When importing a CSV file, the importer will overlay the data onto what is already in the repository to determine the differences. It only acts on the contents of the CSV file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows (items) or columns (metadata elements) can be removed and will be ignored. For example, if you only want to edit item abstracts, you can remove all of the other columns and just leave the abstract column. (You do need to leave the ID column intact. This is mandatory.)
Deleting Data. It is possible to perform deletes across the board of certain metadata fields from an exported file. For example, let's say you have used keywords (dc.subject) that need to be removed en masse. You would leave the column (dc.subject) intact, but remove the data in the corresponding rows.
Migrating Data or Exchanging Data. It is possible that you have data in one Dublin Core (DC) element and you wish to have it in another. An example would be that your staff have input Library of Congress Subject Headings in the Subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps and your data is migrated upon import:
Insert a new column. The first row should be the new metadata element. (We will refer to it as the TARGET)
Select the column/rows of the data you wish to change. (We will refer to it as the SOURCE)
Cut and paste this data into the new column (TARGET) you created in Step 1.
Leave the column (SOURCE) you just cut and pasted from empty. Do not delete it.
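The four steps above can also be scripted instead of being done by hand in a spreadsheet. The Python sketch below is one way to do it on an exported file's text; the function name is hypothetical, and it assumes the TARGET column does not already hold values that need to be kept.

```python
import csv
import io


def migrate_column(csv_text, source, target):
    """Move every value from the SOURCE column to the TARGET column."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    fieldnames = list(reader.fieldnames)
    if target not in fieldnames:
        fieldnames.append(target)          # step 1: insert the new column
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[target] = row.get(source, "")  # steps 2 and 3: cut and paste
        row[source] = ""                   # step 4: keep the column, emptied
        writer.writerow(row)
    return out.getvalue()


exported = "id,dc.subject\r\n350,Horses||Dogs\r\n"
print(migrate_column(exported, "dc.subject", "dc.subject.lcsh"))
```

The SOURCE column stays in the header with empty cells, which matches the instruction not to delete it.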
Checksum Checker is a program that can run to verify the checksum of every item within DSpace. Checksum Checker was designed with the idea that most System Administrators will run it from cron. Choose the options wisely, depending on the size of the repository.
Table 8.10. Checksum Checker Information Table
Command used: | [dspace]/bin/dspace checker |
Java class: | org.dspace.app.checker.ChecksumChecker |
Arguments (short and long forms): | Description |
-L or --continuous | Loop continuously through the bitstreams |
-a or --handle | Specify a handle to check |
-b <bitstream-ids> | Space separated list of bitstream IDs |
-c or --count | Check count |
-d or --duration | Checking duration |
-h or --help | Calls online help |
-l or --looping | Loop once through bitstreams |
-p <prune> | Prune old results (optionally using specified properties file for configuration) |
-v or --verbose | Report all processing |
There are three aspects of the Checksum Checker's operation that can be configured:
the execution mode
the logging output
the policy for removing old checksum results from the database
The user should refer to Chapter 5. Configuration for specific configuration keys in the dspace.cfg file.
Execution mode can be configured using command line options. Information on the options is found in the table above. The different modes are described below.
Unless a particular bitstream or handle is specified, the Checksum Checker will always check bitstreams in order of the least recently checked bitstream. (Note that this means that the most recently ingested bitstreams will be the last ones checked by the Checksum Checker.)
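The ordering rule can be pictured with a short Python sketch. It is a simplified model, not the checker's query: the function name is hypothetical, and it assumes each bitstream carries a single "last checked" timestamp, with a newly ingested bitstream treated as just checked.

```python
from datetime import datetime


def next_to_check(last_checked, count=1):
    """Pick bitstream IDs to check, least recently checked first.

    last_checked maps a bitstream ID to the datetime of its last check.
    """
    ordered = sorted(last_checked, key=last_checked.get)
    return ordered[:count]


history = {
    112: datetime(2010, 1, 5),
    113: datetime(2010, 1, 1),
    4567: datetime(2010, 3, 9),  # most recently ingested, so checked last
}
print(next_to_check(history, count=2))  # [113, 112]
```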
Available command line options
Limited-count mode: [dspace]/bin/dspace checker -c
To check a specific number of bitstreams. The -c option is followed by an integer, the number of bitstreams to check.
Example: [dspace]/bin/dspace checker -c 10
This is particularly useful for checking that the checker is executing properly. The Checksum Checker's default execution mode is to check a single bitstream, as if the option was -c 1.
Duration mode:
[dspace]/bin/dspace checker -d
To run the checker for a specific period of time, supply a time argument. You may use any of the time units listed below:
Example: [dspace]/bin/dspace checker -d 2h
(Checker will run for 2 hours)
s | Seconds |
m | Minutes |
h | Hours |
d | Days |
w | Weeks |
y | Years |
The checker will keep starting new bitstream checks until the specified duration has elapsed, so actual execution duration will be slightly longer than the specified duration. Bear this in mind when scheduling checks.
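The time arguments and the slight overrun can be illustrated in Python. This is a sketch under stated assumptions, not the checker's code: the function names are hypothetical, a year is taken to be 365 days, and the loop simply tests the deadline before starting each check.

```python
import re
import time

# Seconds per unit, following the table above (a year is assumed to be 365 days).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a time argument such as '2h' or '8w' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def run_for(duration, check_one, clock=time.monotonic):
    """Keep starting checks until the duration has elapsed; return how many ran."""
    deadline = clock() + parse_duration(duration)
    checked = 0
    while clock() < deadline:
        check_one()  # a check started just before the deadline still runs to the end
        checked += 1
    return checked


print(parse_duration("2h"))  # 7200
```

Because the deadline is only tested between checks, the last check can finish after the deadline, which is why the total run time slightly exceeds the requested duration.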
Specific Bitstream mode:
[dspace]/bin/dspace checker -b
Checker will only look at the internal bitstream IDs.
Example: [dspace]/bin/dspace checker -b 112 113 4567
Checker will only check bitstream IDs 112, 113 and 4567.
Specific Handle mode:
[dspace]/bin/dspace checker -a
Checker will only check bitstreams within the Community, Collection, or the item itself.
Example: [dspace]/bin/dspace checker -a 123456/999
Checker will only check this handle. If it is a Collection or Community, it will run through the entire Collection or Community.
The Check
Looping mode:
[dspace]/bin/dspace checker -l
or [dspace]/bin/dspace checker -L
There are two modes. The lowercase 'el' (-l) specifies to check every bitstream in the repository once. This is recommended for smaller repositories that are able to loop through all their content in just a few hours maximum. An uppercase 'L' (-L) specifies to continuously loop through the repository. This is not recommended for most repository systems.
Cron Jobs. For large repositories that cannot be completely checked in a couple of hours, we recommend the -d option in cron.
Pruning mode:
[dspace]/bin/dspace checker -p
The Checksum Checker will store the result of every check in the checksum_history table. By default, successful checksum matches that are eight weeks old or older will be deleted when the -p option is used. (Unsuccessful ones will be retained indefinitely.) Without this option, the retention settings are ignored and the database table may grow rather large!
As stated above in "Pruning mode", the checksum_history table can get rather large; running the checker with the -p option keeps its size manageable. The amount of time for which results are retained in the checksum_history table can be modified by one of two methods:
Editing the retention policies in [dspace]/config/dspace.cfg
See Chapter 5 Configuration for the property keys.
OR
Pass in a properties file containing retention policies when using the -p option.
To do this, create a file with the following two property keys:
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w
You can use the table above for your time units.
At the command line:
[dspace]/bin/dspace checker -p retention_file_name <ENTER>
Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log, and it will report only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use the -v (verbose) command line option:
[dspace]/bin/dspace checker -l -v
(This will loop through the repository once and report in detail about every bitstream checked.)
To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties file and run [dspace]/bin/install_configs.
You should schedule the Checksum Checker to run automatically, based on how frequently you back up your DSpace instance (and how long you keep those backups). The size of your repository is also a factor. For very large repositories, you may need to schedule it to run for an hour (e.g. the -d 1h option) each evening to ensure it makes it through your entire repository within a week or so. Smaller repositories can likely get by with just running it weekly.
Unix, Linux, or Mac OS. You can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace:
0 4 * * 0 [dspace]/bin/dspace checker -d2h -p
The above cron entry would schedule the checker to run every Sunday at 0400 (4:00 a.m.) for 2 hours. It also specifies to 'prune' the database based on the retention settings in dspace.cfg.
Windows OS. You will be unable to use the checker shell script. Instead, you should use Windows Scheduled Tasks to schedule the following command to run at the appropriate times:
[dspace]/bin/dsrun.bat org.dspace.app.checker.ChecksumChecker -d2h -p
(This command should appear on a single line).
Optionally, you may choose to receive automated emails listing the Checksum Checker's results. Schedule the emailer to run after the Checksum Checker has completed its processing (otherwise the email may not contain all the results).
Command used: | [dspace]/bin/dspace checker |
Java class: | org.dspace.checker.DailyReportEmailer |
Arguments (short and long forms): | Description |
-a or --All | Send all the results (everything specified below) |
-d or --Deleted | Send E-mail report for all bitstreams set as deleted for today. |
-m or --Missing | Send E-mail report for all bitstreams not found in assetstore for today. |
-c or --Changed | Send E-mail report for all bitstreams where checksum has been changed for today. |
-u or --Unchanged | Send the Unchecked bitstream report. |
-n or --Not Processed | Send E-mail report for all bitstreams set to no longer be processed for today. |
-h or --help | Help |
NOTE: You can also combine options (e.g. -m -c) for combined reports.
Cron. Follow the same steps as above for running the checker in cron. Change the time but keep the same frequency. Remember to schedule this **after** the Checksum Checker has run.
If you have implemented the Embargo feature, you will need to run the Embargo Lifter periodically to check for Items with expired embargoes and lift them.
Table 8.11. Embargo Manager Command Table
Command used: | [dspace] /bin/dspace embargo-lifter |
Java class: | org.dspace.embargo.EmbargoManager |
Arguments (short and long forms): | Description |
-c or --check | ONLY check the state of embargoed Items, do NOT lift any embargoes |
-i or --identifier | Process ONLY this handle identifier(s), which must be an Item. Can be repeated. |
-l or --lift | Only lift embargoes, do NOT check the state of any embargoed items. |
-n or --dryrun | Do not change anything in the data model, print message instead. |
-v or --verbose | Print a line describing the action taken for each embargoed item found. |
-q or --quiet | No output except upon error. |
-h or --help | Display brief help screen. |
You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift them from being embargoed. For example, to check the status, at the CLI:
[dspace]/bin/dspace embargo-lifter -c
To lift the actual embargoes on those items that meet the time criteria, at the CLI:
[dspace]/bin/dspace embargo-lifter -l
To create all the various browse indexes that you define in the Configuration Section (Chapter 5) there are a variety of options available to you. You can see these options below in the command table.
Table 8.12. Browse Index Command Table
Command used: | [dspace] /bin/dspace index-init |
Java class: | org.dspace.browse.IndexBrowse |
Arguments (short and long forms): | Description |
-r or --rebuild | Rebuild all the indexes, which removes old tables and creates new ones. For use with -f . Mutually exclusive with -d |
-s or --start | [-s <int>] Start from this index number and work upwards (mostly only useful for debugging). For use with -t and -f |
-x or --execute | Execute all the remove and create SQL against the database. For use with -t and -f |
-i or --index | Actually do the indexing. Mutually exclusive with -t and -f . |
-o or --out | [-o<filename>] Write the remove and create SQL to the given file. For use with -t and -f |
-p or --print | Write the remove and create SQL to stdout. For use with -t and -f . |
-t or --tables | Create the tables only, do not attempt to index. Mutually exclusive with -f and -i |
-f or --full | Make the tables, and do the indexing. This forces -x . Mutually exclusive with -t and -i . |
-v or --verbose | Print extra information to stdout. If used in conjunction with -p , you cannot use the stdout to generate your database structure. |
-d or --delete | Delete all the indexes, but do not create new ones. For use with -f . This is mutually exclusive with -r . |
-h or --help | Show this help documentation. Overrides all other arguments. |
Complete Index Regeneration. By running [dspace]/bin/dspace index-init
you will completely regenerate your indexes, tearing down all old tables and reconstructing them with the new configuration. Running this is the same as:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -f -r
Updating the Indexes. By running [dspace]/bin/dspace index-update
you will reindex your full browse without modifying the table structure. (This should be your default approach if indexing, for example, via a periodic cron job.) Running this is the same as:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -i
Destroy and rebuild. You can destroy and rebuild the browse tables without doing the indexing. The following command outputs the SQL to the screen and to a file, executes it against the database, and is verbose while doing so. At the CLI screen:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -r -t -p -v -x -o myfile.sql
DSpace provides robust browse indexing. It is possible to expand upon the default indexes delivered at the time of the installation. The System Administrator should review "Defining the Indexes" from the Chapter 5. Configuration to become familiar with the property keys and the definitions used therein before attempting heavy customizations.
Through customization it is possible to:
Add new browse indexes besides the four that are delivered upon installation. Examples:
Series
Specific subject fields (Library of Congress Subject Headings). (It is possible to create a browse index based on a controlled vocabulary or thesaurus.)
Other metadata schema fields
Combine metadata fields into one browse
Combine different metadata schemas in one browse
Examples of new browse indexes that are possible. - (The system administrator is reminded to read the section on Defining the Indexes in Chapter 5. Configuration.)
Add a Series Browse. You want to add a new browse using a previously unused metadata element.
webui.browse.index.6 = series:metadata:dc.relation.ispartofseries:text:single
Note: the index number needs to be adjusted to fit your browse stanza in the dspace.cfg
file. Also, you will need to update your Messages.properties
file.
Combine more than one metadata field into a browse. You may have other title fields used in your repository. You may only want one or two of them added, not all title fields. You may also want your series to file in there.
webui.browse.index.3 = title:metadata:dc.title,dc.title.uniform,dc.relation.ispartofseries:title:full
Separate subject browse. You may want to have a separate subject browse limited to only one type of subject.
webui.browse.index.7 = lcsubject:metadata:dc.subject.lcsh:text:single
As one can see, the choices are limited only by your metadata schema, the metadata, and your imagination.
Remember to run |
Copyright © 2002-2009
- The DSpace Foundation
+ [dspace]/bin/dspace checker -p
The Checksum Checker will store the result of every check in the checksum_history table. By default, successful checksum matches that are eight weeks old or older will be deleted when the -p option is used. (Unsuccessful ones will be retained indefinitely.) Without this option, the retention settings are ignored and the database table may grow rather large!
As stated above in "Pruning mode", the checksum_history table can get rather large; running the checker with the -p option helps keep its size manageable. The amount of time for which results are retained in the checksum_history table can be modified by one of two methods:
Editing the retention policies in [dspace]/config/dspace.cfg
See Chapter 5 Configuration for the property keys.
OR
Pass in a properties file containing retention policies when using the -p option.
To do this, create a file with the following two property keys:
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w
You can use the table above for your time units.
At the command line:
[dspace]/bin/dspace checker -p retention_file_name <ENTER>
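To make the retention keys more concrete, here is a minimal Python sketch of how such a properties file could be read and its values turned into time spans. It is not the checker's own code: the function names `parse_duration` and `parse_retention` are hypothetical, and the unit letters (s, m, h, d, w, y) and the 365-day year are assumptions that should be checked against the time-unit table mentioned above.

```python
import re
from datetime import timedelta

# Assumed unit letters; verify them against the checker's time-unit table.
# A year is treated as 365 days here, which is also an assumption.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def parse_duration(text):
    """Turn a value such as '8w' or '10y' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhdwy])\s*", text)
    if match is None:
        raise ValueError("not a duration: %r" % text)
    count, unit = match.groups()
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


def parse_retention(properties_text):
    """Collect checker.retention.* keys from properties-style text."""
    prefix = "checker.retention."
    policies = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(prefix):
            policies[key[len(prefix):]] = parse_duration(value)
    return policies


example = """
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w
"""
policies = parse_retention(example)
```

With the two keys from the example file, `policies` maps `default` and `CHECKSUM_MATCH` to their respective time spans, which is the information a pruning run needs to decide which history rows are old enough to delete.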
Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log
, and it will report only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use the -v
(verbose) command line option:
[dspace]/bin/dspace checker -l -v
This will loop through the repository once and report in detail about every bitstream checked.
To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties
file and run [dspace]/bin/install_configs
.
You should schedule the Checksum Checker to run automatically, based on how frequently you back up your DSpace instance (and how long you keep those backups). The size of your repository is also a factor. For very large repositories, you may need to schedule it to run for an hour (e.g. -d 1h
option) each evening to ensure it makes it through your entire repository within a week or so. Smaller repositories can likely get by with just running it weekly.
Unix, Linux, or Mac OS. You can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace:
0 4 * * 0 [dspace]/bin/dspace checker -d2h -p
The above cron entry would schedule the checker to run every Sunday at 04:00 (4:00 a.m.) for 2 hours. It also specifies to 'prune' the database based on the retention settings in dspace.cfg
.
Windows OS. You will be unable to use the checker shell script. Instead, you should use Windows Scheduled Tasks to schedule the following command to run at the appropriate times:
[dspace]/bin/dsrun.bat org.dspace.app.checker.ChecksumChecker -d2h -p
(This command should appear on a single line.)
Optionally, you may choose to receive automated emails listing the Checksum Checker's results. Schedule the emailer to run after the Checksum Checker has completed its processing (otherwise the email may not contain all the results).
Command used: | [dspace] /bin/dspace checker |
Java class: | org.dspace.checker.DailyReportEmailer |
Arguments (short and long forms): | Description |
-a or --All | Send all the results (everything specified below) |
-d or --Deleted | Send E-mail report for all bitstreams set as deleted for today. |
-m or --Missing | Send E-mail report for all bitstreams not found in assetstore for today. |
-c or --Changed | Send E-mail report for all bitstreams where checksum has been changed for today. |
-u or --Unchanged | Send the Unchecked bitstream report. |
-n or --Not Processed | Send E-mail report for all bitstreams set to no longer be processed for today. |
-h or --help | Help |
You can also combine options (e.g. -m -c) for combined reports.
Cron. Follow the same steps as above for running the checker in cron. Change the time but keep the same frequency. Remember to schedule this **after** the Checksum Checker has run.
If you have implemented the Embargo feature, you will need to run the Embargo Lifter periodically to check for Items with expired embargoes and lift them.
Table 8.11. Embargo Manager Command Table
Command used: | [dspace] /bin/dspace embargo-lifter |
Java class: | org.dspace.embargo.EmbargoManager |
Arguments (short and long forms): | Description |
-c or --check | ONLY check the state of embargoed Items, do NOT lift any embargoes |
-i or --identifier | Process ONLY this handle identifier(s), which must be an Item. Can be repeated. |
-l or --lift | Only lift embargoes, do NOT check the state of any embargoed items. |
-n or --dryrun | Do not change anything in the data model, print message instead. |
-v or --verbose | Print a line describing the action taken for each embargoed item found. |
-q or --quiet | No output except upon error. |
-h or --help | Display brief help screen. |
You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift them from being embargoed. For example, to check the status, at the CLI:
[dspace]/bin/dspace embargo-lifter -c
To lift the actual embargoes on those items that meet the time criteria, at the CLI:
[dspace]/bin/dspace embargo-lifter -l
To create all the various browse indexes that you define in the Configuration Section (Chapter 5) there are a variety of options available to you. You can see these options below in the command table.
Table 8.12. Browse Index Command Table
Command used: | [dspace] /bin/dspace index-init |
Java class: | org.dspace.browse.IndexBrowse |
Arguments (short and long forms): | Description |
-r or --rebuild | Rebuild all the indexes, which removes old tables and creates new ones. For use with -f . Mutually exclusive with -d |
-s or --start | [-s <int>] Start from this index number and work upwards (mostly only useful for debugging). For use with -t and -f |
-x or --execute | Execute all the remove and create SQL against the database. For use with -t and -f |
-i or --index | Actually do the indexing. Mutually exclusive with -t and -f . |
-o or --out | [-o<filename>] Write the remove and create SQL to the given file. For use with -t and -f |
-p or --print | Write the remove and create SQL to stdout. For use with -t and -f . |
-t or --tables | Create the tables only, do not attempt to index. Mutually exclusive with -f and -i |
-f or --full | Make the tables, and do the indexing. This forces -x . Mutually exclusive with -t and -i . |
-v or --verbose | Print extra information to stdout. If used in conjunction with -p , you cannot use the stdout to generate your database structure. |
-d or --delete | Delete all the indexes, but do not create new ones. For use with -f . This is mutually exclusive with -r . |
-h or --help | Show this help documentation. Overrides all other arguments. |
Complete Index Regeneration. By running [dspace]/bin/dspace index-init
you will completely regenerate your indexes, tearing down all old tables and reconstructing them with the new configuration. Running this is the same as:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -f -r
Updating the Indexes. By running [dspace]/bin/dspace index-update
you will reindex your full browse without modifying the table structure. (This should be your default approach if indexing, for example, via a periodic cron job.) Running this is the same as:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -i
Destroy and rebuild. You can destroy and rebuild the browse tables without doing the indexing. The following command outputs the SQL to the screen and to a file, executes it against the database, and is verbose while doing so. At the CLI screen:
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -r -t -p -v -x -o myfile.sql
DSpace provides robust browse indexing. It is possible to expand upon the default indexes delivered at the time of the installation. The System Administrator should review "Defining the Indexes" from the Chapter 5. Configuration to become familiar with the property keys and the definitions used therein before attempting heavy customizations.
Through customization it is possible to:
Add new browse indexes besides the four that are delivered upon installation. Examples:
Series
Specific subject fields (Library of Congress Subject Headings). (It is possible to create a browse index based on a controlled vocabulary or thesaurus.)
Other metadata schema fields
Combine metadata fields into one browse
Combine different metadata schemas in one browse
Examples of new browse indexes that are possible. + (The system administrator is reminded to read the section on Defining the Indexes in Chapter 5. Configuration.)
Add a Series Browse. You want to add a new browse using a previously unused metadata element.
webui.browse.index.6 = series:metadata:dc.relation.ispartofseries:text:single
Note: the index number needs to be adjusted to fit your browse stanza in the dspace.cfg
file. Also, you will need to update your Messages.properties
file.
Combine more than one metadata field into a browse. You may have other title fields used in your repository. You may only want one or two of them added, not all title fields. You may also want your series to file in there.
webui.browse.index.3 = title:metadata:dc.title,dc.title.uniform,dc.relation.ispartofseries:title:full
Separate subject browse. You may want to have a separate subject browse limited to only one type of subject.
webui.browse.index.7 = lcsubject:metadata:dc.subject.lcsh:text:single
As one can see, the choices are limited only by your metadata schema, the metadata, and your imagination.
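Because these definitions pack several values into one colon-separated string, a stray dot or colon is easy to miss. The following Python sketch splits a definition into its parts and rejects one that does not have exactly five of them. The five-part layout (name, source, fields, data type, display type) is inferred from the examples in this section, and `BrowseIndex` and `parse_browse_index` are hypothetical names, not part of DSpace.

```python
from typing import List, NamedTuple


class BrowseIndex(NamedTuple):
    """Hypothetical container for one webui.browse.index.N definition."""
    number: int
    name: str
    source: str
    fields: List[str]
    data_type: str
    display: str


def parse_browse_index(line):
    """Split 'webui.browse.index.N = name:source:fields:type:display'."""
    key, _, value = line.partition("=")
    # The index number is the last dot-separated piece of the key.
    number = int(key.strip().rsplit(".", 1)[1])
    parts = [part.strip() for part in value.split(":")]
    if len(parts) != 5:
        raise ValueError(
            "expected 5 colon-separated parts, found %d" % len(parts)
        )
    name, source, fields, data_type, display = parts
    # Several metadata fields may be combined with commas.
    field_list = [field.strip() for field in fields.split(",")]
    return BrowseIndex(number, name, source, field_list, data_type, display)


series = parse_browse_index(
    "webui.browse.index.6 = series:metadata:dc.relation.ispartofseries:text:single"
)
```

A check like this only validates the shape of the string; whether the named metadata fields exist in your registry still has to be confirmed against your own schema.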
Remember to run |
With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of SOLR for statistics makes it possible to have a database of statistics. With this in mind, there is the issue of the older log files and how a site can use them. The following commands convert the existing log files and then import them for SOLR use. The user will need to perform this only once.
The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.
Table 8.13. Log Converter Table
Command used: | [dspace] /bin/dspace log-converter |
Java class: | org.dspace.statistics.util.ClassicDSpaceLogConverter |
Arguments (short and long forms): | Description |
-i or --in | Input file |
-o or --out | Output file |
-m or --multiple | Adds a wildcard at the end of input and output, so it would mean dspace.log* would be converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.) |
-n or --newformat | If the log files have been created with DSpace 1.6 |
-v or --verbose | Display verbose output (helpful for debugging) |
-h or --help | Help |
The Log Importer command loads the intermediate log files that have been created by the Log Converter into SOLR.
Table 8.14. Log Import Table
Command used: | [dspace] /bin/dspace log-importer |
Java class: | org.dspace.statistics.util.StatisticsImporter |
Arguments (short and long forms): | Description |
-i or -- | input file |
-m or -- | Adds a wildcard at the end of the input, so it would mean dspace.log* would be imported |
-s or -- | To skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds the information about the host from its IP address, such as geographical location, etc. This can be slow, and wouldn't work on a server not connected to the internet.) |
-v or -- | Display verbose output (helpful for debugging) |
-l or -- | For developers: allows you to import a log file from another system, so because the handles won't exist, it looks up random items in your local system to add hits to instead. |
-h or -- | Help |
This command can be used at any time to test for database connectivity. It will assist in troubleshooting PostgreSQL and Oracle connection issues with the database.
Table 8.15. Test Database Command Table
Command used: | [dspace] /bin/dspace test-database |
Java class: | org.dspace.storage.rdbms.DatabaseManager |
Arguments (short and long forms): | Description |
- or -- | There are no arguments used at this time. |
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
\ No newline at end of file diff --git a/dspace/docs/html/ch09.html b/dspace/docs/html/ch09.html index 6dcd7a6c3e..2d0487e399 100644 --- a/dspace/docs/html/ch09.html +++ b/dspace/docs/html/ch09.html @@ -1,10 +1,10 @@ -Table of Contents
+
Table of Contents
Back to architecture overview -
DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse.
+
DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse.
Graphical visualization of the relational database
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. Presently, the browse indices use some features specific to PostgreSQL and Oracle, so some modification to the code would be needed before DSpace would function fully with an alternative database back-end.
The org.dspace.storage.rdbms
package provides access to an SQL database in a somewhat simpler form than using JDBC directly. The main class is DatabaseManager
, which executes SQL queries and returns TableRow
or TableRowIterator
objects. The InitializeDatabase
class is used to load SQL into the database via JDBC, for example to set up the schema.
All calls to the Database Manager
require a DSpace Context
object. Example use of the database manager API is given in the org.dspace.storage.rdbms
package Javadoc.
The database schema used by DSpace is created by SQL statements stored in a directory specific to each supported RDBMS platform:
PostgreSQL schemas are in [dspace-source]/dspace/etc/postgres/
Oracle schemas are in [dspace-source]/dspace/etc/oracle/
database_schema.sql
. The schema SQL file also creates the two required e-person groups (Anonymous
and Administrator
) that are required for the system to function properly.Also in [dspace-source]/dspace/etc/[database]
are various SQL files called database_schema_1x_1y
. These contain the necessary SQL commands to update a live DSpace database from version 1.x
to 1.y
. Note that this might not be the only part of an upgrade process: see Updating a DSpace Installation for details.
The DSpace database code uses an SQL function getnextid
to assign primary keys to newly created rows. This SQL function must be safe to use if several JVMs are accessing the database at once; for example, the Web UI might be creating new rows in the database at the same time as the batch item importer. The PostgreSQL-specific implementation of the method uses SEQUENCES
for each table in order to create new IDs. If an alternative database backend were to be used, the implementation of getnextid
could be updated to operate with that specific DBMS.
The etc
directory in the source distribution contains two further SQL files. clean-database.sql
contains the SQL necessary to completely clean out the database, so use with caution! The Ant target clean_database
can be used to execute this. update-sequences.sql
contains SQL to reset the primary key generation sequences to appropriate values. You'd need to do this if, for example, you're restoring a backup database dump which creates rows with specific primary keys already defined. In such a case, the sequences would allocate primary keys that were already used.
Versions of the *.sql*
files for Oracle are stored in [dspace-source]/dspace/etc/oracle
. These need to be copied over their PostgreSQL counterparts in [dspace-source]/dspace/etc
prior to installation.
When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. This is performed by the vacuumdb
command which can be executed via a 'cron' job, for example by putting this in the system crontab
:
+ The SQL (DDL) statements to create the tables for the current release, starting with an empty database, are in database_schema.sql
. The schema SQL file also creates the two required e-person groups (Anonymous
and Administrator
) that are required for the system to function properly.
Also in [dspace-source]/dspace/etc/[database]
are various SQL files called database_schema_1x_1y
. These contain the necessary SQL commands to update a live DSpace database from version 1.x
to 1.y
. Note that this might not be the only part of an upgrade process: see Updating a DSpace Installation for details.
The DSpace database code uses an SQL function getnextid
to assign primary keys to newly created rows. This SQL function must be safe to use if several JVMs are accessing the database at once; for example, the Web UI might be creating new rows in the database at the same time as the batch item importer. The PostgreSQL-specific implementation of the method uses SEQUENCES
for each table in order to create new IDs. If an alternative database backend were to be used, the implementation of getnextid
could be updated to operate with that specific DBMS.
The etc
directory in the source distribution contains two further SQL files. clean-database.sql
contains the SQL necessary to completely clean out the database, so use with caution! The Ant target clean_database
can be used to execute this. update-sequences.sql
contains SQL to reset the primary key generation sequences to appropriate values. You'd need to do this if, for example, you're restoring a backup database dump which creates rows with specific primary keys already defined. In such a case, the sequences would allocate primary keys that were already used.
Versions of the *.sql*
files for Oracle are stored in [dspace-source]/dspace/etc/oracle
. These need to be copied over their PostgreSQL counterparts in [dspace-source]/dspace/etc
prior to installation.
When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. This is performed by the vacuumdb
command which can be executed via a 'cron' job, for example by putting this in the system crontab
:
# clean up the database nightly
40 2 * * * /usr/local/pgsql/bin/vacuumdb --analyze dspace > /dev/null 2>&1
@@ -15,7 +15,7 @@ DELETE FROM epersongroup;
After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do this by executing the SQL in
[dspace-source]/dspace/etc/update-sequences.sql
, for example with:
psql -U dspace -f [dspace-source]/dspace/etc/update-sequences.sql
-
Future updates of DSpace may involve minor changes to the database schema. Specific instructions on how to update the schema whilst keeping live data will be included. The current schema also contains a few currently unused database columns, to be used for extra functionality in future releases. These unused columns have been added in advance to minimize the effort required to upgrade.
The database manager is configured with the following properties in dspace.cfg
:
+ Future updates of DSpace may involve minor changes to the database schema. Specific instructions on how to update the schema whilst keeping live data will be included. The current schema also contains a few currently unused database columns, to be used for extra functionality in future releases. These unused columns have been added in advance to minimize the effort required to upgrade. The database manager is configured with the following properties in
DSpace offers two means for storing content. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API. SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or remote storage resources. The terms "store", "retrieve", "in the system", "storage", and so forth, used below can refer to storage in the file system on the server ("traditional") or in SRB. The The bitstream storage manager provides three methods that store, retrieve and delete bitstreams. Bitstreams are referred to by their 'ID'; that is the primary key As of DSpace version 1.1, there can be multiple bitstream stores. Each of these bitstream stores can be traditional storage or SRB storage. This means that the potential storage of a DSpace system is not bound by the maximum size of a single disk or file system and also that traditional and SRB storage can be combined in one DSpace installation. Both traditional and SRB storage are specified by configuration parameters. Also see Configuring the Bitstream Store below. Stores are numbered, starting with zero, then counting upwards. Each bitstream entry in the database has a store number, used to retrieve the bitstream when required. At the moment, the store in which new bitstreams are placed is decided using a configuration parameter, and there is no provision for moving bitstreams between stores. Administrative tools for manipulating bitstreams and stores will be provided in future releases. Right now you can move a whole store (e.g. you could move store number 1 from Bitstreams also have an 38-digit internal ID, different from the primary key ID of the bitstream table row. 
This is not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative to the relevant store directory) that the bitstream is stored in traditional or SRB storage. The first three pairs of digits are the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal ID as the filename. For example, a bitstream with the internal ID + |
DSpace offers two means for storing content. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API.
SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or remote storage resources.
The terms "store", "retrieve", "in the system", "storage", and so forth, used below can refer to storage in the file system on the server ("traditional") or in SRB.
The BitstreamStorageManager
provides low-level access to bitstreams stored in the system. In general, it should not be used directly; instead, use the Bitstream
object in the content management API since that encapsulates authorization and other metadata to do with a bitstream that are not maintained by the BitstreamStorageManager
.
The bitstream storage manager provides three methods that store, retrieve and delete bitstreams. Bitstreams are referred to by their 'ID'; that is the primary key bitstream_id
column of the corresponding row in the database.
As of DSpace version 1.1, there can be multiple bitstream stores. Each of these bitstream stores can be traditional storage or SRB storage. This means that the potential storage of a DSpace system is not bound by the maximum size of a single disk or file system and also that traditional and SRB storage can be combined in one DSpace installation. Both traditional and SRB storage are specified by configuration parameters. Also see Configuring the Bitstream Store below.
Stores are numbered, starting with zero, then counting upwards. Each bitstream entry in the database has a store number, used to retrieve the bitstream when required.
At the moment, the store in which new bitstreams are placed is decided using a configuration parameter, and there is no provision for moving bitstreams between stores. Administrative tools for manipulating bitstreams and stores will be provided in future releases. Right now you can move a whole store (e.g. you could move store number 1 from /localdisk/store
to /fs/anotherdisk/store
), but it would still have to be store number 1 and have exactly the same contents.
Bitstreams also have a 38-digit internal ID, different from the primary key ID of the bitstream table row. This is not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative to the relevant store directory) at which the bitstream is stored in traditional or SRB storage. The first three pairs of digits are the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal ID as the filename.
For example, a bitstream with the internal ID 12345678901234567890123456789012345678
is stored in the directory:
(assetstore dir)/12/34/56/12345678901234567890123456789012345678
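The mapping from internal ID to file location can be sketched in a few lines of Python. This is an illustration of the rule described above rather than the storage manager's actual code: `bitstream_path` and `new_internal_id` are hypothetical names, and the way a real installation generates its random IDs is not specified here.

```python
import posixpath
import secrets
import string


def new_internal_id():
    """Return a random 38-digit string (illustrative generator only)."""
    return "".join(secrets.choice(string.digits) for _ in range(38))


def bitstream_path(store_dir, internal_id):
    """Derive the file location from the first three pairs of digits."""
    if len(internal_id) != 38 or not internal_id.isdigit():
        raise ValueError("internal ID must be exactly 38 digits")
    return posixpath.join(
        store_dir,
        internal_id[0:2],   # first pair of digits
        internal_id[2:4],   # second pair
        internal_id[4:6],   # third pair
        internal_id,        # the full ID is the filename
    )


path = bitstream_path("/assetstore", "12345678901234567890123456789012345678")
```

For the example ID used in the text, `path` is `/assetstore/12/34/56/12345678901234567890123456789012345678`, matching the directory layout shown above.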
The reasons for storing files this way are:
Using a randomly-generated 38-digit number means that the 'number space' is less cluttered than simply using the primary keys, which are allocated sequentially and are thus close together. This means that the bitstreams in the store are distributed around the directory structure, improving access efficiency.
The internal ID is used as the filename partly to avoid requiring an extra lookup of the filename of the bitstream, and partly because bitstreams may be received from a variety of operating systems. The original name of a bitstream may be an illegal UNIX filename.
When storing a bitstream, the BitstreamStorageManager
DOES set the following fields in the corresponding database table row:
bitstream_id
@@ -55,14 +55,14 @@ psql -U dspace -f
deleted
store_number
-
The remaining fields are the responsibility of the Bitstream
content management API class.
The bitstream storage manager is fully transaction-safe. In order to implement transaction-safety, the following algorithm is used to store bitstreams:
A database connection is created, separately from the currently active connection in the current DSpace context.
A unique internal identifier (separate from the database primary key) is generated.
The bitstream DB table row is created using this new connection, with the deleted
column set to true
.
The new connection is commit
ted, so the 'deleted' bitstream row is written to the database
The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and filename derived from the internal ID
The deleted
flag in the bitstream row is set to false
. This will occur (or not) as part of the current DSpace Context
.
This means that should anything go wrong before, during or after the bitstream storage, only one of the following can be true:
No bitstream table row was created, and no file was stored
A bitstream table row with deleted=true
was created, no file was stored
A bitstream table row with deleted=true
was created, and a file was stored
None of these affect the integrity of the data in the database or bitstream store.
Similarly, when a bitstream is deleted for some reason, its deleted
flag is set to true as part of the overall transaction, and the corresponding file in storage is not deleted.
The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup
method of BitstreamStorageManager
goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the middle of a storage operation.
This cleanup can be invoked from the command line via the Cleanup
class, which can in turn be easily executed from a shell on the server machine using /dspace/bin/cleanup
. You might like to have this run regularly by cron
, though since DSpace is read-lots, write-not-so-much it doesn't need to be run very often.
The bitstreams (files) in traditional storage may be backed up very easily by simply 'tarring' or 'zipping' the assetstore
directory (or whichever directory is configured in dspace.cfg
). Restoring is as simple as extracting the backed-up compressed file in the appropriate location.
Similar means could be used for SRB, but SRB offers many more options for managing backup.
It is important to note that, since the bitstream storage manager holds the bitstreams in storage and information about them in the database, a database backup and a backup of the files in the bitstream store must be made at the same time; the bitstream data in the database must correspond to the stored files.
Of course, it isn't really ideal to 'freeze' the system while backing up to ensure that the database and files match up. Since DSpace uses the bitstream data in the database as the authoritative record, it's best to back up the database before the files. This is because it's better to have a bitstream in storage but not the database (effectively non-existent to DSpace) than a bitstream record in the database but not storage, since people would be able to find the bitstream but not actually get the contents.
Both traditional and SRB bitstream stores are configured in dspace.cfg
.
Bitstream stores in the file system on the server are configured like this:
+
The remaining fields are the responsibility of the Bitstream
content management API class.
The bitstream storage manager is fully transaction-safe. In order to implement transaction-safety, the following algorithm is used to store bitstreams:
A database connection is created, separately from the currently active connection in the current DSpace context.
A unique internal identifier (separate from the database primary key) is generated.
The bitstream DB table row is created using this new connection, with the deleted
column set to true
.
The new connection is commit
ted, so the 'deleted' bitstream row is written to the database
The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and filename derived from the internal ID
The deleted
flag in the bitstream row is set to false
. This will occur (or not) as part of the current DSpace Context
.
This means that should anything go wrong before, during or after the bitstream storage, only one of the following can be true:
No bitstream table row was created, and no file was stored
A bitstream table row with deleted=true
was created, no file was stored
A bitstream table row with deleted=true
was created, and a file was stored
None of these affect the integrity of the data in the database or bitstream store.
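The storage algorithm and its possible failure outcomes can be modelled with a small in-memory simulation. This is a sketch only: the map and set stand in for the bitstream table and the asset store, and the class `TwoPhaseStoreSketch`, the `FailAt` enum and all method names are hypothetical rather than taken from DSpace. In the real system the final flag update happens inside the caller's Context transaction; here it is simply the last step.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TwoPhaseStoreSketch {
    // Hypothetical failure points used to simulate a crash between steps.
    enum FailAt { NEVER, BEFORE_ROW, AFTER_ROW, AFTER_FILE }

    // Stand-ins for the bitstream table (internal ID -> deleted flag)
    // and for the files in the asset store.
    final Map<String, Boolean> rows = new HashMap<>();
    final Set<String> files = new HashSet<>();

    boolean store(String internalId, FailAt failAt) {
        try {
            failIf(failAt == FailAt.BEFORE_ROW);
            rows.put(internalId, Boolean.TRUE);   // row committed with deleted=true
            failIf(failAt == FailAt.AFTER_ROW);
            files.add(internalId);                // bitstream written to the store
            failIf(failAt == FailAt.AFTER_FILE);
            rows.put(internalId, Boolean.FALSE);  // flag cleared: bitstream is live
            return true;
        } catch (IllegalStateException e) {
            return false;                         // whatever was written stays as it is
        }
    }

    static void failIf(boolean condition) {
        if (condition) throw new IllegalStateException("simulated failure");
    }

    String state(String internalId) {
        Boolean deleted = rows.get(internalId);
        String row = (deleted == null) ? "no row" : "deleted=" + deleted;
        return row + ", " + (files.contains(internalId) ? "file stored" : "no file");
    }

    public static void main(String[] args) {
        for (FailAt failAt : FailAt.values()) {
            TwoPhaseStoreSketch sketch = new TwoPhaseStoreSketch();
            sketch.store("id", failAt);
            System.out.println(failAt + " -> " + sketch.state("id"));
        }
    }
}
```

Every simulated failure leaves one of the three states listed above; only a run that completes ends with a live row (deleted=false) and a stored file.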
Similarly, when a bitstream is deleted for some reason, its deleted
flag is set to true as part of the overall transaction, and the corresponding file in storage is not deleted.
The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup
method of BitstreamStorageManager
goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the middle of a storage operation.
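The selection rule used by cleanup (deleted, and more than one hour old) can be illustrated as follows. This is a minimal sketch under stated assumptions: the `Row` class, its `lastChanged` timestamp and the method `eligibleForCleanup` are hypothetical, and the text above does not say which timestamp the real implementation consults.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CleanupFilterSketch {
    // Hypothetical view of a bitstream row; the timestamp field is an assumption.
    static final class Row {
        final String internalId;
        final boolean deleted;
        final Instant lastChanged;

        Row(String internalId, boolean deleted, Instant lastChanged) {
            this.internalId = internalId;
            this.deleted = deleted;
            this.lastChanged = lastChanged;
        }
    }

    // Returns the internal IDs of rows that are flagged deleted and are more
    // than one hour old, so that in-flight storage operations are left alone.
    static List<String> eligibleForCleanup(List<Row> rows, Instant now) {
        Instant cutoff = now.minus(Duration.ofHours(1));
        List<String> eligible = new ArrayList<>();
        for (Row row : rows) {
            if (row.deleted && row.lastChanged.isBefore(cutoff)) {
                eligible.add(row.internalId);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Row> rows = Arrays.asList(
            new Row("old-deleted", true, now.minus(Duration.ofHours(2))),
            new Row("fresh-deleted", true, now.minus(Duration.ofMinutes(5))),
            new Row("live", false, now.minus(Duration.ofHours(2))));
        System.out.println(eligibleForCleanup(rows, now));
    }
}
```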
This cleanup can be invoked from the command line via the Cleanup
class, which can in turn be easily executed from a shell on the server machine using /dspace/bin/cleanup
. You might like to have this run regularly by cron
, though since DSpace is read-lots, write-not-so-much it doesn't need to be run very often.
The bitstreams (files) in traditional storage may be backed up very easily by simply 'tarring' or 'zipping' the assetstore
directory (or whichever directory is configured in dspace.cfg
). Restoring is as simple as extracting the backed-up compressed file in the appropriate location.
Similar means could be used for SRB, but SRB offers many more options for managing backup.
It is important to note that, since the bitstream storage manager holds the bitstreams in storage and information about them in the database, a database backup and a backup of the files in the bitstream store must be made at the same time; the bitstream data in the database must correspond to the stored files.
Of course, it isn't really ideal to 'freeze' the system while backing up to ensure that the database and files match up. Since DSpace uses the bitstream data in the database as the authoritative record, it's best to back up the database before the files. This is because it's better to have a bitstream in storage but not the database (effectively non-existent to DSpace) than a bitstream record in the database but not storage, since people would be able to find the bitstream but not actually get the contents.
Both traditional and SRB bitstream stores are configured in dspace.cfg
.
Bitstream stores in the file system on the server are configured like this:
assetstore.dir = [dspace]/assetstore
(Remember that [dspace] is a placeholder for the actual name of your DSpace install directory).
The above example specifies a single asset store.
assetstore.dir = [dspace]/assetstore_0
assetstore.dir.1 = /mnt/other_filesystem/assetstore_1
The above example specifies two asset stores. assetstore.dir specifies the asset store number 0 (zero); after that use assetstore.dir.1, assetstore.dir.2 and so on. The particular asset store a bitstream is stored in is held in the database, so don't move bitstreams between asset stores, and don't renumber them.
By default, newly created bitstreams are put in asset store 0 (i.e. the one specified by the assetstore.dir property.) This allows backwards compatibility with pre-DSpace 1.1 configurations. To change this, for example when asset store 0 is getting full, add a line to dspace.cfg
like:
assetstore.incoming = 1 -
Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1
, which is /mnt/other_filesystem/assetstore_1
in the above example.
The same framework is used to configure SRB storage. That is, the asset store number (0..n) can reference a file system directory as above or it can reference a set of SRB account parameters. But any particular asset store number can reference one or the other but not both. This way traditional and SRB storage can both be used but with different asset store numbers. The same cautions mentioned above apply to SRB asset stores as well: The particular asset store a bitstream is stored in is held in the database, so don't move bitstreams between asset stores, and don't renumber them.
For example, let's say asset store number 1 will refer to SRB. Then there will be a set of SRB account parameters like this:
+
Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1
, which is /mnt/other_filesystem/assetstore_1
in the above example.
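The lookup rule described above (assetstore.dir for store 0, assetstore.dir.N for store N, and assetstore.incoming defaulting to 0) can be sketched with java.util.Properties. The class `IncomingStoreSketch` and its methods are hypothetical and cover only file system stores; this is not how DSpace itself reads dspace.cfg.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class IncomingStoreSketch {
    // Store 0 uses the unnumbered key; stores 1..n use a numeric suffix.
    static String directoryKey(int storeNumber) {
        return storeNumber == 0 ? "assetstore.dir" : "assetstore.dir." + storeNumber;
    }

    // Resolves the directory that new bitstreams would be written to.
    // A missing assetstore.incoming is treated as 0, as described in the text.
    static String incomingDirectory(Properties config) {
        int incoming = Integer.parseInt(config.getProperty("assetstore.incoming", "0").trim());
        String directory = config.getProperty(directoryKey(incoming));
        if (directory == null) {
            throw new IllegalStateException("no directory configured for asset store " + incoming);
        }
        return directory;
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(
            "assetstore.dir = /dspace/assetstore_0\n"
          + "assetstore.dir.1 = /mnt/other_filesystem/assetstore_1\n"
          + "assetstore.incoming = 1\n"));
        // prints /mnt/other_filesystem/assetstore_1
        System.out.println(incomingDirectory(config));
    }
}
```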
The same framework is used to configure SRB storage. That is, the asset store number (0..n) can reference a file system directory as above or it can reference a set of SRB account parameters. But any particular asset store number can reference one or the other but not both. This way traditional and SRB storage can both be used but with different asset store numbers. The same cautions mentioned above apply to SRB asset stores as well: The particular asset store a bitstream is stored in is held in the database, so don't move bitstreams between asset stores, and don't renumber them.
For example, let's say asset store number 1 will refer to SRB. Then there will be a set of SRB account parameters like this:
srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
srb.mcatzone.1 = mysrbzone
@@ -72,11 +72,10 @@
srb.username.1 = mysrbuser
srb.password.1 = mysrbpassword
srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain
srb.parentdir.1 = mysrbdspaceassetstore
-
Several of the terms, such as mcatzone
, have meaning only in the SRB context and will be familiar to SRB users. The last, srb.parentdir.n
, can be used for additional (SRB) upper directory structure within an SRB account. This property value could be blank as well.
(If asset store 0 would refer to SRB it would be srb.host =
..., srb.port =
..., and so on (.0
omitted) to be consistent with the traditional storage configuration above.)
The similar use of assetstore.incoming
to reference asset store 0 (default) or 1..n (explicit property) means that new bitstreams will be written to traditional or SRB storage, depending on whether that asset store number references a file system directory on the server or a set of SRB account parameters.
There are comments in dspace.cfg that further elaborate the configuration of traditional and SRB storage.
Copyright © 2002-2009 - The DSpace Foundation +
Several of the terms, such as mcatzone
, have meaning only in the SRB context and will be familiar to SRB users. The last, srb.parentdir.n
, can be used for additional (SRB) upper directory structure within an SRB account. This property value could be blank as well.
(If asset store 0 would refer to SRB it would be srb.host =
..., srb.port =
..., and so on (.0
omitted) to be consistent with the traditional storage configuration above.)
The similar use of assetstore.incoming
to reference asset store 0 (default) or 1..n (explicit property) means that new bitstreams will be written to traditional or SRB storage, depending on whether that asset store number references a file system directory on the server or a set of SRB account parameters.
There are comments in dspace.cfg that further elaborate the configuration of traditional and SRB storage.
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
\ No newline at end of file diff --git a/dspace/docs/html/ch10.html b/dspace/docs/html/ch10.html index 06ac8ebfa2..9c481bc6b4 100644 --- a/dspace/docs/html/ch10.html +++ b/dspace/docs/html/ch10.html @@ -1,17 +1,17 @@ -Table of Contents
A complete DSpace installation consists of three separate directory trees:
This is where (surprise!) the source code lives. Note that the config files here are used only during the initial install process. After the install, config files should be changed in the install directory. It is referred to in this document as [dspace-source]
.
This directory is populated during the install process and also by DSpace as it runs. It contains config files, command-line tools (and the libraries necessary to run them), and usually--although not necessarily--the contents of the DSpace archive (depending on how DSpace is configured). After the initial build and install, changes to config files should be made in this directory. It is referred to in this document as [dspace]
.
This directory is generated by the web server the first time it finds a dspace.war file in its webapps directory. It contains the unpacked contents of dspace.war, i.e. the JSPs and java classes and libraries necessary to run DSpace. Files in this directory should never be edited directly; if you wish to modify your DSpace installation, you should edit files in the source directory and then rebuild. The contents of this directory aren't listed here since its creation is completely automatic. It is usually referred to in this document as [tomcat]/webapps/dspace
.
+
Table of Contents
A complete DSpace installation consists of three separate directory trees:
This is where (surprise!) the source code lives. Note that the config files here are used only during the initial install process. After the install, config files should be changed in the install directory. It is referred to in this document as [dspace-source]
.
This directory is populated during the install process and also by DSpace as it runs. It contains config files, command-line tools (and the libraries necessary to run them), and usually--although not necessarily--the contents of the DSpace archive (depending on how DSpace is configured). After the initial build and install, changes to config files should be made in this directory. It is referred to in this document as [dspace]
.
This directory is generated by the web server the first time it finds a dspace.war file in its webapps directory. It contains the unpacked contents of dspace.war, i.e. the JSPs and java classes and libraries necessary to run DSpace. Files in this directory should never be edited directly; if you wish to modify your DSpace installation, you should edit files in the source directory and then rebuild. The contents of this directory aren't listed here since its creation is completely automatic. It is usually referred to in this document as [tomcat]/webapps/dspace
.
[dspace-source]
dspace/
- Directory which contains all build and configuration information for DSpace
CHANGES
- Detailed list of code changes between versions.
KNOWN_BUGS
- Known bugs in the current version.
LICENSE
- DSpace source code license.
README
- Obligatory basic information file.
bin/
- Some shell and Perl scripts for running DSpace command-line tasks.
config/
- Configuration files:
controlled-vocabularies/
- Fixed, limited vocabularies used in metadata entry
crosswalks/
- Metadata crosswalks - property files or XSL stylesheets
dspace.cfg
- The Main DSpace configuration file (You will need to edit this).
dc2mods.cfg
- Mappings from Dublin Core metadata to MODS for the METS export.
default.license
- The default license that users must grant when submitting items.
dstat.cfg
, dstat.map
- Configuration for statistical reports.
input-forms.xml
- Submission UI metadata field configuration.
news-side.html
- Text of the front-page news in the sidebar, only used in JSPUI.
news-top.html
- Text of the front-page news in the top box, only used in the JSPUI.
emails/
- Text and layout templates for emails sent out by the system.
registries/
- Initial contents of the bitstream format registry and Dublin Core element/qualifier registry. These are only used on initial system setup, after which they are maintained in the database.
docs/
- DSpace system documentation. The technical documentation for functionality, installation, configuration, etc.
etc/
-
This directory contains administrative files needed for the install process and by developers, mostly database initialization and upgrade scripts. Any .xml
files in etc/
are common to all supported database systems.
postgres/
- Versions of the database schema and updater SQL scripts for PostgreSQL.
oracle/
- Versions of the database schema and updater SQL scripts for Oracle.
modules/
- The Web UI modules "overlay" directory. DSpace uses Maven to automatically look here for any customizations you wish to make to DSpace Web interfaces.
jspui
- Contains all customizations for the JSP User Interface.
src/main/resources/
- The overlay for JSPUI Resources. This is the location to place any custom Messages.properties files. (Previously this file had been stored at: [dspace-source]/config/language-packs/Messages.properties)
src/main/webapp/
- The overlay for JSPUI Web Application. This is the location to place any custom JSPs to be used by DSpace.
lni
- Contains all customizations for the Lightweight Network Interface.
oai
- Contains all customizations for the OAI-PMH Interface.
sword
- Contains all customizations for the SWORD (Simple Web-service Offering Repository Deposit) Interface.
xmlui
- Contains all customizations for the XML User Interface (aka Manakin).
src/main/webapp/
- The overlay for XMLUI Web Application. This is the location to place custom Themes or Configurations.
i18n/
- The location to place a custom version of the XMLUI's messages.xml (You have to manually create this folder)
themes/
- The location to place custom Themes for the XMLUI (You have to manually create this folder).
src/
- Maven configurations for DSpace System. This directory contains the Maven and Ant build files for DSpace.
target/
- (Only exists after building DSpace) This is the location Maven uses to build your DSpace installation package.
dspace-[version].dir
- The location of the DSpace Installation Package (which can then be installed by running ant update
)
Below is the basic layout of a DSpace installation using the default configuration. These paths can be configured if necessary.
+
Below is the basic layout of a DSpace installation using the default configuration. These paths can be configured if necessary.
[dspace]
assetstore/
- asset store files
bin/
- shell and Perl scripts
config/
- configuration, with sub-directories as above
handle-server/
- Handle server files
history/
- stored history files (generally RDF/XML)
lib/
- JARs, including dspace.jar, containing the DSpace classes
log/
- Log files
reports/
- Reports generated by statistical report generator
search/
- Lucene search index files
upload/
- temporary directory used during file uploads etc.
webapps/
- location where DSpace installs all Web Applications
DSpace's Ant build file creates a dspace-jspui-webapp/
directory with the following structure:
(top level dir)
The JSPs
+
DSpace's Ant build file creates a dspace-jspui-webapp/
directory with the following structure:
(top level dir)
The JSPs
WEB-INF/
web.xml
- DSpace JSPUI Web Application configuration and Servlet mappings
dspace-tags.tld
- DSpace custom tag descriptor
fmt.tld
- JSTL message format tag descriptor, for internationalization
lib/
- All the third-party JARs and pre-compiled DSpace API JARs needed to run JSPUI
classes/
- Any additional necessary class files
DSpace's Ant build file creates a dspace-xmlui-webapp/
directory with the following structure:
(top level dir)
aspects/
- Contains overarching Aspect Generator config and Prototype DRI (Digital Repository Interface) document for Manakin.
i18n/
- Internationalization / Multilingual support. Contains the messages.xml
English language pack by default.
themes/
- Contains all out-of-the-box Manakin themes
Classic/
- The classic theme, which makes the XMLUI look like classic DSpace
dri2xhtml/
- The base theme, which converts XMLUI DRI (Digital Repository Interface) format into XHTML for display
Reference/
- The default reference theme for XMLUI
template/
- A theme template...useful as a starting point for your own custom theme(s)
dri2xhtml.xsl
- The DRI-to-XHTML XSL Stylesheet. Uses the above 'dri2xhtml' theme to generate XHTML
themes.xmap
- The Theme configuration file. It determines which theme(s) are used by XMLUI
+
DSpace's Ant build file creates a dspace-xmlui-webapp/
directory with the following structure:
(top level dir)
aspects/
- Contains overarching Aspect Generator config and Prototype DRI (Digital Repository Interface) document for Manakin.
i18n/
- Internationalization / Multilingual support. Contains the messages.xml
English language pack by default.
themes/
- Contains all out-of-the-box Manakin themes
Classic/
- The classic theme, which makes the XMLUI look like classic DSpace
dri2xhtml/
- The base theme, which converts XMLUI DRI (Digital Repository Interface) format into XHTML for display
Reference/
- The default reference theme for XMLUI
template/
- A theme template...useful as a starting point for your own custom theme(s)
dri2xhtml.xsl
- The DRI-to-XHTML XSL Stylesheet. Uses the above 'dri2xhtml' theme to generate XHTML
themes.xmap
- The Theme configuration file. It determines which theme(s) are used by XMLUI
WEB-INF/
lib/
- All the third-party JARs and pre-compiled DSpace JARs needed to run XMLUI
classes/
- Any additional necessary class files
cocoon.xconf
- XMLUI's Apache Cocoon configuration
logkit.xconf
- XMLUI's Apache Cocoon Logging configuration
web.xml
- XMLUI Web Application configuration and Servlet mappings
The first source of potential confusion is the log files. Since DSpace uses a number of third-party tools, problems can occur in a variety of places. Below is a table listing the main log files used in a typical DSpace setup. The locations given are defaults, and might be different for your system depending on where you installed DSpace and the third-party tools. The ordering of the list is roughly the recommended order for searching them for the details about a particular problem or error.
Table 10.1. DSpace Log File Locations
+ The first source of potential confusion is the log files. Since DSpace uses a number of third-party tools, problems can occur in a variety of places. Below is a table listing the main log files used in a typical DSpace setup. The locations given are defaults, and might be different for your system depending on where you installed DSpace and the third-party tools. The ordering of the list is roughly the recommended order for searching them for the details about a particular problem or error. Table 10.1. DSpace Log File Locations
Copyright © 2002-2009 - The DSpace Foundation + |
The file [dspace]/config/log4j.properties
controls how and where log files are created. There are three sets of configurations in that file, called A1, A2, and A3. These are used to control the logs for DSpace, the checksum checker, and the XMLUI respectively. The important settings in this file are:
Table 10.2. log4j.properties Table
log4j.rootCategory=INFO,A +log4j.logger.org.dspace=INFO,A1 | These lines control what level of logging takes place. Normally they should be set to INFO, but if you need to see more information in the logs, set them to DEBUG and restart your web server. |
log4j.appender.A1=org.dspace.app.util.DailyFileAppender | This is the name of the log file creation method used. The DailyFileAppender creates a new date-stamped file every day or month. |
log4j.appender.A1.File=${log.dir}/dspace.log | This sets the filename and location of where the log file will be stored. It will have a date stamp appended to the file name. |
log4j.appender.A1.DatePattern=yyyy-MM-DD | This defines the format for the date stamp that is appended to the log file names. If you wish to have log files created monthly instead of daily, change this to yyyy-MM |
log4j.appender.A1.MaxLogs=0 | This defines how many log files will be created. You may wish to define a retention period for log files. If you set this to 365, logs older than a year will be deleted. By default this is set to 0 so that no logs are ever deleted. Ensure that you monitor the disk space used by the logs to make sure that you have enough space for them. It is often important to keep the log files for a long time in case you want to rebuild your statistics. |
Copyright © 2002-2010 + The DuraSpace Foundation
\ No newline at end of file +Licensed under a Creative Commons Attribution 3.0 United States License
Table of Contents
The DSpace system is organized into three layers, each of which consists of a number of components.
+
Table of Contents
The DSpace system is organized into three layers, each of which consists of a number of components.
-
DSpace System Architecture
The storage layer is responsible for physical storage of metadata and content. The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow. The application layer contains components that communicate with the world outside of the individual DSpace installation, for example the Web user interface and the Open Archives Initiative protocol for metadata harvesting service.
Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for example. Each component in the storage and business logic layers has a defined public API. The union of the APIs of those components is referred to as the Storage API (in the case of the storage layer) and the DSpace Public API (in the case of the business logic layer). These APIs are in-process Java classes, objects and methods.
It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic layer, the system relies on individual applications in the application layer to correctly and securely authenticate e-people. If a 'hostile' or insecure application were allowed to invoke the Public API directly, it could very easily perform actions as any e-person in the system.
The reason for this design choice is that authentication methods will vary widely between different applications, so it makes sense to leave the logic and responsibility for that in these applications.
The source code is organized to adhere very strictly to this three-layer architecture. Also, only methods in a component's public API are given the public
access level. This means that the Java compiler helps ensure that the source code conforms to the architecture.
Table 11.1. Source Code Packages
+ DSpace System Architecture The storage layer is responsible for physical storage of metadata and content. The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow. The application layer contains components that communicate with the world outside of the individual DSpace installation, for example the Web user interface and the Open Archives Initiative protocol for metadata harvesting service. Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for example. Each component in the storage and business logic layers has a defined public API. The union of the APIs of those components is referred to as the Storage API (in the case of the storage layer) and the DSpace Public API (in the case of the business logic layer). These APIs are in-process Java classes, objects and methods. It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic layer, the system relies on individual applications in the application layer to correctly and securely authenticate e-people. If a 'hostile' or insecure application were allowed to invoke the Public API directly, it could very easily perform actions as any e-person in the system. The reason for this design choice is that authentication methods will vary widely between different applications, so it makes sense to leave the logic and responsibility for that in these applications. The source code is organized to adhere very strictly to this three-layer architecture. Also, only methods in a component's public API are given the Table 11.1. Source Code Packages
The above format allows the logs to be easily parsed and analysed. The
It's a good idea to 'nice' this log reporter to avoid an impact on server performance.
The content management API package Classes corresponding to the main elements in the DSpace data model ( Each class generally has one or more static
Context context = new Context(); Community existingCommunity = Community.find(context, 123); -Collection myNewCollection = existingCommunity.createCollection(); The primary reason for this is for determining authorization. In order to know whether an e-person may create an object, the system must know which container the object is to be added to. It makes no sense to create a collection outside of a community, and the authorization system does not have a policy for that.
In the previous chapter there is an overview of the item ingest process which should clarify the previous paragraph. Also see the section on the workflow system.
Classes whose name begins The The The When creating, modifying or for whatever reason removing data with the content management API, it is important to know when changes happen in-memory, and when they occur in the physical DSpace storage. Primarily, one should note that no change made using a particular Additionally, some changes made to objects only happen in-memory. In these cases, invoking the Some examples to illustrate this are shown below:
To support additional metadata schemas a new set of metadata classes has been added. These are backwards compatible with the DC classes and should be used rather than the DC specific classes wherever possible. Note that hierarchical metadata schemas are not currently supported; only flat schemas (such as DC) can be defined. The The Packager plugins let you ingest a package to create a new DSpace Object, and disseminate a content Object as a package. A package is simply a data stream; its contents are defined by the packager plugin's implementation. To ingest an object, which is currently only implemented for Items, the sequence of operations is:
The packager also takes a Here is an example package ingestion code fragment: Collection collection = find target collection
InputStream source = ...;
PackageParameters params = ...;
String license = null;
@@ -194,10 +194,10 @@ context.complete();
PackageDisseminator dip = (PackageDisseminator) PluginManager
.getNamedPlugin(PackageDisseminator.class, packageType);
dip.disseminate(context, dso, params, destination);
The PluginManager is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the lifecycle of a plugin. The following terms are important in understanding the rest of this section:
The Plugin Manager supports three different patterns of usage:
Named plugins can get their names either from the configuration or, for a variant called self-named plugins, from within the plugin itself. Self-named plugins are necessary because one plugin implementation can itself be configured to take on many "personalities", each of which deserves its own plugin name. It is already managing its own configuration for each of these personalities, so it makes sense to allow it to export them to the Plugin Manager rather than expecting the plugin configuration to be kept in sync with its own configuration. An example helps clarify the point: There is a named plugin that does crosswalks, call it This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet -- it has to, since of course the Plugin Manager doesn't know anything about stylesheets. It becomes a self-named plugin, so that it reads its configuration data, gets the list of names to which it can respond, and passes those on to the Plugin Manager. When the Plugin Manager creates an instance of the XSLT-crosswalk, it records the Plugin Name that was responsible for that instance. The plugin can look at that Name later in order to configure itself correctly for the Name that created it. This mechanism is all part of the SelfNamedPlugin class, which every self-named plugin extends.
The most common thing you will do with the Plugin Manager is obtain an instance of a plugin. To request a plugin, you must always specify the plugin interface you want. You will also supply a name when asking for a named plugin. A sequence plugin is returned as an array of See the getSinglePlugin(), getPluginSequence(), getNamedPlugin() methods. When For reasons that will become clear later, the manager actually caches a separate instance of an implementation class for each name under which it can be requested. You can ask the The Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (i.e. a key into the I18N message catalog) then you should add a method to the plugin itself to return that. Note: The The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away. The solution is to call the The Here are the public methods, followed by explanations:
The order of the plugins in the array is the same as their class names in the configuration's value field. See the
See the
The names are NOT returned in any predictable order, so you may wish to sort them first. Note: Since a plugin may be bound to more than one name, the list of names this returns does not represent the list of plugins. To get the list of unique implementation classes corresponding to the names, you might have to eliminate duplicates (i.e. create a Set of classes).
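The note above about unordered names and duplicate implementations can be illustrated with a small, self-contained sketch. It does not call the DSpace Plugin Manager: the class `PluginNameSketch` and its `bindings` map are hypothetical stand-ins for a table of plugin names and the implementation classes they are bound to, assumed here purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of DSpace: models a name -> implementation-class table.
class PluginNameSketch {

    /** Returns the plugin names sorted alphabetically, since the source order is unspecified. */
    static List<String> sortedNames(Map<String, Class<?>> bindings) {
        List<String> names = new ArrayList<>(bindings.keySet());
        Collections.sort(names);
        return names;
    }

    /** Collapses names that are bound to the same class into a set of unique classes. */
    static Set<Class<?>> uniqueImplementations(Map<String, Class<?>> bindings) {
        Set<Class<?>> classes = new LinkedHashSet<>();
        for (String name : sortedNames(bindings)) {
            classes.add(bindings.get(name));
        }
        return classes;
    }

    public static void main(String[] args) {
        // Two names share one (placeholder) implementation class; a third has its own.
        Map<String, Class<?>> bindings = new HashMap<>();
        bindings.put("MODS", StringBuilder.class);
        bindings.put("DublinCore", StringBuilder.class);
        bindings.put("PDF", String.class);

        System.out.println(sortedNames(bindings));                  // [DublinCore, MODS, PDF]
        System.out.println(uniqueImplementations(bindings).size()); // 2
    }
}
```

Three names map to only two classes here, which is why a list of names should not be treated as a list of plugins.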
A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See Self-Named Plugins for why this is sometimes necessary.
abstract class SelfNamedPlugin
{
    // Your class must override this:
    // Return all names by which this plugin should be known.
@@ -207,20 +207,20 @@ context.complete();
    // This is implemented by SelfNamedPlugin and should NOT be overridden.
    public String getPluginInstanceName();
}
public class PluginConfigurationError extends Error
{
    public PluginConfigurationError(String message);
}
An error of this type means the caller asked for a single plugin, but either there was no single plugin configured matching that interface, or there was more than one. Either case causes a fatal configuration error.
public class PluginInstantiationException extends RuntimeException
{
    public PluginInstantiationException(String msg, Throwable cause)
}
This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something unexpected happens in the course of instantiating a plugin, e.g. an access error, class not found, etc. Simply not finding a class in the configuration is not an exception. This is a All of the Plugin Manager's configuration comes from the DSpace Configuration Manager, which is a Java Properties map. You can configure these characteristics of each plugin:
This entry configures a Single Plugin for use with getSinglePlugin():
For example, this configures the class
This kind of configuration entry defines a Sequence Plugin, which is bound to a sequence of implementation classes. The key identifies the interface, and the value is a comma-separated list of classnames:
plugin.sequence.interface = classname, ...
The plugins are returned by For example, this entry configures Stackable Authentication with three implementation classes:
plugin.sequence.org.dspace.eperson.AuthenticationMethod = \
    org.dspace.eperson.X509Authentication, \
    org.dspace.eperson.PasswordAuthentication, \
    edu.mit.dspace.MITSpecialGroup
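As a rough illustration of how a sequence entry such as the Stackable Authentication example could be read, here is a minimal sketch that uses only `java.util.Properties`. It is not the Plugin Manager's actual parsing code: the class `SequenceConfigSketch` and the method `classNamesFor` are hypothetical names, and the only assumptions taken from the text are the `plugin.sequence.` key prefix and the comma-separated value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of DSpace.
class SequenceConfigSketch {
    static final String PREFIX = "plugin.sequence.";

    /** Returns the class names configured for an interface, in configuration order. */
    static List<String> classNamesFor(Properties config, String interfaceName) {
        List<String> result = new ArrayList<>();
        String value = config.getProperty(PREFIX + interfaceName);
        if (value == null) {
            return result; // nothing configured for this interface
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name); // list order mirrors the order in the value
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Same entry as in the text; a trailing backslash continues the value.
        String text =
              "plugin.sequence.org.dspace.eperson.AuthenticationMethod = \\\n"
            + "    org.dspace.eperson.X509Authentication, \\\n"
            + "    org.dspace.eperson.PasswordAuthentication, \\\n"
            + "    edu.mit.dspace.MITSpecialGroup\n";
        Properties config = new Properties();
        config.load(new StringReader(text));

        for (String name : classNamesFor(config, "org.dspace.eperson.AuthenticationMethod")) {
            System.out.println(name);
        }
    }
}
```

Keeping the result in a `List` rather than a `Set` preserves the configured order, which matters for a sequence plugin.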
There are two ways of configuring named plugins:
NOTE: Since there can only be one key with
Plugins are assumed to be reusable by default, so you only need to configure the ones which you would prefer not to be reusable. The format is as follows:
For example, this marks the PDF plugin from the example above as non-reusable:
The Plugin Manager is very sensitive to mistakes in the DSpace configuration. Subtle errors can have unexpected consequences that are hard to detect: for example, if there are two "plugin.single" entries for the same interface, one of them will be silently ignored. To validate the Plugin Manager configuration, call the
The Eventually, someone should develop a general configuration-file sanity checker for DSpace, which would just call Here are some usage examples to illustrate how the Plugin Manager works. The existing DSpace 1.3 MediaFilterManager implementation has been largely replaced by the Plugin Manager. The MediaFilter classes become plugins named in the configuration. Refer to the configuration guide for further details. This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin: Configuration:
The following code fragment shows how dispatcher, the service object, is initialized and used:
BitstreamDispatcher dispatcher =
@@ -254,7 +254,7 @@ while (id != BitstreamDispatcher.SENTINEL)
     */
    id = dispatcher.next();
}
This crosswalk plugin acts like many different plugins since it is configured with different XSL translation stylesheets. Since it already gets each of its stylesheets out of the DSpace configuration, it makes sense to have the plugin give PluginManager the names to which it answers instead of forcing someone to configure those names in two places (and try to keep them synchronized). NOTE: Remember how Here is the configuration file listing both the plugin's own configuration and the
crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
@@ -285,7 +285,7 @@ plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
        return ConfigurationManager.getProperty(prefix + getPluginInstanceName());
    }
}
The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the order of configuration, since order is significant. It gets a Sequence Plugin from the Plugin Manager. Refer to the Configuration Section on Stackable Authentication for further details. The primary classes are:
|