mirror of
https://github.com/DSpace/DSpace.git
synced 2025-10-08 10:34:25 +00:00
Remove Older HistoryManager which is no longer called in code. (Part of Event System Changes).
git-svn-id: http://scm.dspace.org/svn/repo/trunk@2078 9c30dcfa-912a-0410-8fc2-9e0234be79fd
@@ -1,244 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
  Author:   Peter Breton
  Version:  $Revision$
  Date:     $Date$
-->
</head>
<body bgcolor="white">
Provides classes and methods to record information about changes in DSpace.
The main class is {@link org.dspace.history.HistoryManager}.

<h2>Overview</h2>

<p>The purpose of the history subsystem is two-fold:</p>

<ul>
<li>To capture a time-based record of significant changes in DSpace, in a
manner suitable for later refactoring or repurposing</li>

<li>To provide a corpus of data suitable for research by HP Labs and
other interested parties</li>
</ul>

<p>
Note that the history data is not expected to provide current
information about the archive; it simply records what has happened in
the past.
</p>

<h2>Harmony Model</h2>

<p>
The <a href="http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html">Harmony project</a>
describes a simple and powerful approach for modeling temporal data.
The DSpace history framework adopts this model.
</p>

<p>
The Harmony model is used by the serialization mechanism (and
ultimately by agents who interpret the serializations); users of the
History API need not be aware of it.
</p>

<h2>High-Level Approach</h2>

<p>
When anything of archival interest occurs in DSpace, the saveHistory
method of the HistoryManager is invoked. The parameters to the
call are references to the objects of archival interest.
</p>

<p>
The history data component receives the objects of interest via
method calls on the HistoryManager. (Note that this does not preclude other
interested parties from acting on the objects as well.) Upon receiving
an object, it serializes the state of all archive objects referred
to by it, and creates Harmony-style objects and associations to
describe the relationships between the objects. (A simple example is
given below.) Note that each archive object must have a unique
identifier to allow linkage between discrete events; this is discussed
under <a href="#unique">Unique Ids</a> below.
</p>

<p>
The serializations (including the Harmony objects and associations)
are persisted to the filesystem, and marked as history data in the
database.
</p>

<h2>Archival Events</h2>

<p>
Creating, modifying, or deleting Community, Collection, Item, EPerson,
WorkflowItem, or WorkspaceItem objects (including adding subobjects)
is generally of archival interest.
</p>

<h2>Serializations</h2>

<p>The serialization of an archival object consists of:</p>

<ul>
<li>Its instance fields (i.e., non-static, non-transient fields)</li>
<li>The serializations of associated objects (or references to these
serializations)</li>
</ul>

<p>
The implementation of serialization simply calls methods in the Content
API.
</p>

<p>
Version information for the serializer itself is included in
the serialization.
</p>

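<p>
As an illustration of what "instance fields" means in the list above, the
following minimal Java sketch uses reflection to pick out the non-static,
non-transient fields of a class. The names <code>InstanceFieldLister</code> and
<code>SampleItem</code> are hypothetical, and this is not how the history code
itself works: as noted, the actual serializer calls methods in the Content API.
</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class InstanceFieldLister {
    /** Returns the names of declared fields that are neither static nor transient. */
    static List<String> instanceFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip compiler-generated fields as well as static and transient ones.
            if (field.isSynthetic() || Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            names.add(field.getName());
        }
        return names;
    }

    /** Hypothetical stand-in for an archival object. */
    static class SampleItem {
        static int instanceCount;   // static: not part of the serialization
        transient Object cache;     // transient: not part of the serialization
        int id;                     // instance field: included
        String title;               // instance field: included
    }
}
```

<p>
Calling <code>InstanceFieldLister.instanceFieldNames(InstanceFieldLister.SampleItem.class)</code>
yields the names <code>id</code> and <code>title</code>; the order in which
reflection returns fields is not guaranteed.
</p>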
<a name="unique"></a>
<h2>Unique Ids</h2>

<p>
To be able to trace the history of an object, it is essential that the
object have a unique identifier.
</p>

<p>
After discussion, the unique identifiers are only weakly tied to the
Handle system. Instead, the identifier consists of:
</p>

<ul>
<li>an identifier for the project</li>
<li>a site id (using the handle prefix)</li>
<li>an id (usually RDBMS-based) for objects</li>
</ul>

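<p>
A minimal sketch of how the three parts might be combined into a single
identifier string. The class name <code>HistoryId</code>, the colon separator,
and the sample values are assumptions made for illustration; the text above
does not specify the concrete format.
</p>

```java
class HistoryId {
    /** Joins the project id, site id (handle prefix) and object id into one identifier. */
    static String of(String project, String handlePrefix, String objectId) {
        for (String part : new String[] {project, handlePrefix, objectId}) {
            // Reject parts that would make the joined identifier ambiguous.
            if (part == null || part.isEmpty() || part.contains(":")) {
                throw new IllegalArgumentException(
                        "each part must be non-empty and must not contain ':'");
            }
        }
        return project + ":" + handlePrefix + ":" + objectId;
    }
}
```

<p>
For example, <code>HistoryId.of("dspace", "123456789", "item-42")</code> returns
<code>dspace:123456789:item-42</code>.
</p>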
<h2>Why Synchronization Is Not a Problem</h2>

<p>
A classic problem with having data in two places is synchronization;
it is no longer always clear which data source is authoritative.
</p>

<p>
This is not a problem for the history data because:
</p>

<ul>
<li>The data is read-only; once generated, it is never changed</li>
<li>The data is temporal, and so it is only expected to be correct as
of the time when it was generated</li>
</ul>

<h2>Storage</h2>

<p>
The History system stores each serialization together with an MD5 checksum of
that serialization. When an object is serialized again, the checksum of the
new serialization is matched against the existing checksums for that
object. If the checksum already exists, the serialization is not stored again; a
reference to the existing one is used instead.
</p>

<p>
Note that since none of the serializations are deleted, reference counting
is unnecessary.
</p>

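<p>
The checksum comparison described above can be sketched roughly as follows.
This is an in-memory illustration only: the class <code>ChecksumStore</code>
and its methods are hypothetical, and the real system persists serializations
to the filesystem rather than keeping them in a map.
</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ChecksumStore {
    // Known checksums, grouped by the id of the object they were computed for.
    private final Map<String, Set<String>> checksumsByObject = new HashMap<>();

    /** Returns the MD5 checksum of the text as 32 lowercase hex digits. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Records a serialization for an object. Returns true if it is new and
     * would be stored, false if an identical serialization is already known
     * for that object and a reference would be used instead.
     */
    boolean store(String objectId, String serialization) {
        Set<String> known =
                checksumsByObject.computeIfAbsent(objectId, k -> new HashSet<>());
        return known.add(md5Hex(serialization));
    }
}
```

<p>
Because entries are never removed, the sketch needs no reference counting,
which matches the note above.
</p>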
<h2>History Maps</h2>

<p>
The history data is not initially stored in a queryable
form. Nonetheless, it is a good idea to provide at least basic
indications of what is stored, and where it is stored.
</p>

<p>
Therefore the following simple RDBMS tables are used:
</p>

<pre>
  History table:
    history_id            INTEGER PRIMARY KEY,
    -- When the history data was created (this data is also in the history!)
    timestamp             TIMESTAMP

  HistoryReference table:
    history_reference_id  INTEGER PRIMARY KEY,
    -- Reference to the history
    history_id            INTEGER FOREIGN KEY,
    -- Object Id
    object_id             VARCHAR(64)
</pre>

<p>
One way to trace the history of an object would be to find all history
serializations which refer to it (in the HistoryReference table), and
unwind and interpret these. When the history data refers to a
serialization of an object, use the History table to find the
serialization.
</p>

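<p>
The lookup described above might be sketched as follows, using in-memory
collections in place of the two tables. The class <code>HistoryMap</code> and
its methods are hypothetical; a real implementation would query the RDBMS
instead.
</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HistoryMap {
    /** One row of the HistoryReference table. */
    static final class Reference {
        final int historyId;
        final String objectId;

        Reference(int historyId, String objectId) {
            this.historyId = historyId;
            this.objectId = objectId;
        }
    }

    // Stands in for the History table: history_id -> timestamp (milliseconds).
    private final Map<Integer, Long> timestamps = new HashMap<>();
    // Stands in for the HistoryReference table.
    private final List<Reference> references = new ArrayList<>();

    /** Records one history entry and the ids of the objects it refers to. */
    void record(int historyId, long timestamp, String... objectIds) {
        timestamps.put(historyId, timestamp);
        for (String objectId : objectIds) {
            references.add(new Reference(historyId, objectId));
        }
    }

    /** Returns the history ids that refer to the object, oldest first. */
    List<Integer> trace(String objectId) {
        List<Integer> ids = new ArrayList<>();
        for (Reference ref : references) {
            if (ref.objectId.equals(objectId)) {
                ids.add(ref.historyId);
            }
        }
        ids.sort(Comparator.comparingLong(id -> timestamps.get(id)));
        return ids;
    }
}
```

<p>
Each id returned by <code>trace</code> identifies one serialization to unwind
and interpret.
</p>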
<h2>Example</h2>

<p>
An item is submitted to a collection via bulk upload. When (and if)
the Item is eventually added to the collection, the saveHistory
method is called, with references to the Item, its Collection, the User who
performed the bulk upload, and some indication of the fact that it was
submitted via a bulk upload.
</p>

<p>
When called, the HistoryManager creates the following new resources
(all with unique ids):
</p>

<ul>
<li>An event</li>
<li>A state</li>
<li>An action</li>
</ul>

<p>
It also generates the following relationships:
</p>

<pre>
  event  --atTime-->    time
  event  --hasOutput--> state
  Item   --inState-->   state
  state  --contains-->  Item
  action --creates-->   Item
  event  --hasAction--> action
  action --usesTool-->  DSpace Upload
  action --hasAgent-->  User
</pre>

<p>
The HistoryManager serializes the state of all archival objects
involved (in this case, the Item, the User, and the DSpace Upload). It
creates entries in the history map which associate the archival
objects with the generated serializations.
</p>

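<p>
As a rough illustration, the relationships in this example could be held as
simple subject/predicate/object statements. The class
<code>HarmonyRelations</code>, its method, and the sample ids are hypothetical
and are not taken from the HistoryManager code.
</p>

```java
import java.util.ArrayList;
import java.util.List;

class HarmonyRelations {
    /** A single "subject --predicate--> object" statement. */
    static final class Relation {
        final String subject;
        final String predicate;
        final String object;

        Relation(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " --" + predicate + "--> " + object;
        }
    }

    /** Builds the eight relationships of the bulk-upload example. */
    static List<Relation> itemAdded(String event, String state, String action,
                                    String time, String item, String tool, String user) {
        List<Relation> relations = new ArrayList<>();
        relations.add(new Relation(event, "atTime", time));
        relations.add(new Relation(event, "hasOutput", state));
        relations.add(new Relation(item, "inState", state));
        relations.add(new Relation(state, "contains", item));
        relations.add(new Relation(action, "creates", item));
        relations.add(new Relation(event, "hasAction", action));
        relations.add(new Relation(action, "usesTool", tool));
        relations.add(new Relation(action, "hasAgent", user));
        return relations;
    }
}
```

<p>
Printing each element of the returned list reproduces the arrow notation used
in the example, with the supplied ids substituted for the placeholder names.
</p>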
<h2>What History Data Is Not</h2>

<p>
History Data is not version control information. No effort has been
made to provide diffs, merges, or highly efficient storage; instead,
effort is focused on simple <em>remembrance</em>. Note that this does not
preclude more sophisticated approaches later.
</p>

<p>
History Data does not attempt to reconcile any contradictions in the
data it serializes.
</p>

<p>
History Data does not keep track of any kind of <em>current state</em>.
</p>

</body>
</html>