Remove Older HistoryManager which is no longer called in code. (Part of Event System Changes).

git-svn-id: http://scm.dspace.org/svn/repo/trunk@2078 9c30dcfa-912a-0410-8fc2-9e0234be79fd
This commit is contained in:
Mark Diggory
2007-07-20 20:02:53 +00:00
parent 1bbebf4026
commit 38382014b6
2 changed files with 0 additions and 1428 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -1,244 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
Author: Peter Breton
Version: $Revision$
Date: $Date$
-->
</head>
<body bgcolor="white">
Provides classes and methods to record information about changes in DSpace.
The main class is {@link org.dspace.history.HistoryManager}.
<h2>Overview</h2>
<p>The purpose of the history subsystem is two-fold:</p>
<ul>
<li>Capture a time-based record of significant changes in DSpace, in a
manner suitable for later refactoring or repurposing</li>
<li>To provide a corpus of data suitable for research by HP Labs and
other interested parties</li>
</ul>
<p>
Note that the history data is not expected to provide current
information about the archive; it simply records what has happened in
the past.
</p>
<h2>Harmony Model</h2>
<p>
The <a href="http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html">Harmony project</a>
describes a simple and powerful approach for modeling temporal data.
The DSpace history framework adopts this model.
</p>
<p>
The Harmony model is used by the serialization mechanism (and
ultimately by agents who interpret the serializations); users of the
History API need not be aware of it.
</p>
<h2>High-Level Approach</h2>
<p>
When anything of archival interest occurs in DSpace, the saveHistory
method of the HistoryManager is invoked. The parameters to the
call are references to anything of archival interest.
</p>
<p>
The history data component receives the objects of interest via
method calls on the HistoryManager. (Note that this does not preclude other
interested parties from acting on object as well). Upon reception
of the object, it serializes the state of all archive objects referred
to by it, and creates Harmony-style objects and associations to
describe the relationships between the objects. (A simple example is
given below). Note that each archive object must have a unique
identifier to allow linkage between discrete events; this is discussed
under <a href="#unique">Unique Ids</a> below.
</p>
<p>
The serializations (including the Harmony objects and associations)
are persisted to the filesystem, and marked as history data in the
database.
</p>
<h2>Archival Events</h2>
<p>
Creating, modifying or deleting Community, Collection, Item, EPerson,
WorkflowItem, or WorkspaceItem objects (including adding subobjects)
are generally of archival interest.
</p>
<h2>Serializations</h2>
<p>The serialization of an archival object consists of:</p>
<ul>
<li>Its instance fields (ie, non-static, non-transient fields)
<li>The serializations of associated objects (or references to these
serializations).
</ul>
<p>
The implementation of serialization simply calls methods in the Content
API.
</p>
<p>
Version information for the serializer itself is included in
the serialization.
</p>
<a name="#unique"/>
<h2>Unique Ids</h2>
<p>
To be able to trace the history of an object, it is essential that the
object have a unique identifier.
</p>
<p>
After discussion, the unique identifiers are only weakly tied to the
Handle system. Instead, the identifier consists of:
</p>
<ul>
<li> an identifer for the project</li>
<li> a site id (using the handle prefix)</li>
<li> an id (usually RDBMS-based) for objects</li>
</ul>
<h2>Why Synchronization Is Not a Problem</h2>
<p>
A classic problem with having data in two places is synchronization;
it is no longer always clear which data source is authoritative.
</p>
<p>
This is not a problem for the history data because:
</p>
<ul>
<li> The data is read-only; once generated, it is never changed</li>
<li> The data is temporal, and so it is only expected to be correct as
of the time when it was generated.</li>
</ul>
<h2>Storage</h2>
<p>
The History system stores serializations and an MD5 checksum for the
serialization. When another object is serialized, the checksum for the
serialization is matched against existing checksums for that
object. If the checksum already exists, the object is not stored; a
reference to the object is used instead.
</p>
<p>
Note that since none of the serializations are deleted, ref counting
is unnecessary.
</p>
<h2>History Maps</h2>
<p>
The history data is not initially stored in a queryable
form. Nonetheless, it is a good idea to provide at least basic
indications of what is stored, and where it is stored.
</p>
Therefore the following simple RDBMS tables are used:
<pre>
History table:
history_id INTEGER PRIMARY KEY,
-- When the history data was created (this data is also in the history!)
timestamp TIMESTAMP
HistoryReference table:
history_reference_id INTEGER PRIMARY KEY,
-- Reference to the history
history_id INTEGER FOREIGN KEY,
-- Object Id
object_id VARCHAR(64),
</pre>
<p>
One way to trace the history of an object would be to find all history
serializations which refer to it (in the HistoryReference table), and
unwind and interpret these. When the history data refers to a
serialization of an object, use the History table to find the
serialization.
</p>
<h2>Example</h2>
<p>
An item is submitted to a collection via bulk upload. When (and if)
the Item is eventually added to the collection, the saveHistory
method is called, with references to the Item, its Collection, the User who
performed the bulk upload, and some indication of the fact that it was
submitted via a bulk upload.
</p>
<p>
When called, the HistoryManager does the following:
It creates the following new resources (all with unique ids):
</p>
<li> An event</li>
<li> A state</li>
<li> An action</li>
<p>
It also generates the following relationships:
</p>
<pre>
event --atTime--> time
event --hasOutput--> state
Item --inState--> state
state --contains--> Item
action --creates--> Item
event --hasAction--> action
action --usesTool--> DSpace Upload
action --hasAgent--> User
</pre>
<p>
The HistoryManager serializes the state of all archival objects
involved (in this case, the Item, the User, and the DSpace Upload). It
creates entries in the history map which associate the archival
objects with the generated serializations.
</p>
<h2>What History Data Is Not</h2>
<p>
History Data is not version control information. No effort has been
made to provide diffs, merges, or highly efficient storage; instead,
effort is focused on simple <em>remembrance</em>. Note that this does not
preclude more sophisticated approaches later.
</p>
<p>
History Data does not attempt to reconcile any contradictions in the
data it serializes.
</p>
<p>
History Data does not keep track of any kind of <em>current state</em>.
</p>
</body>
</html>