mirror of
https://github.com/DSpace/DSpace.git
synced 2025-10-07 10:04:21 +00:00
Remove Older HistoryManager which is no longer called in code. (Part of Event System Changes).
git-svn-id: http://scm.dspace.org/svn/repo/trunk@2078 9c30dcfa-912a-0410-8fc2-9e0234be79fd
This commit is contained in:
File diff suppressed because it is too large
Load Diff
@@ -1,244 +0,0 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
||||
<html>
|
||||
<head>
|
||||
<!--
|
||||
Author: Peter Breton
|
||||
Version: $Revision$
|
||||
Date: $Date$
|
||||
-->
|
||||
</head>
|
||||
<body bgcolor="white">
|
||||
Provides classes and methods to record information about changes in DSpace.
|
||||
The main class is {@link org.dspace.history.HistoryManager}.
|
||||
|
||||
<h2>Overview</h2>
|
||||
|
||||
<p>The purpose of the history subsystem is two-fold:</p>
|
||||
|
||||
<ul>
|
||||
<li>Capture a time-based record of significant changes in DSpace, in a
|
||||
manner suitable for later refactoring or repurposing</li>
|
||||
|
||||
<li>To provide a corpus of data suitable for research by HP Labs and
|
||||
other interested parties</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
Note that the history data is not expected to provide current
|
||||
information about the archive; it simply records what has happened in
|
||||
the past.
|
||||
</p>
|
||||
|
||||
<h2>Harmony Model</h2>
|
||||
|
||||
<p>
|
||||
The <a href="http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html">Harmony project</a>
|
||||
describes a simple and powerful approach for modeling temporal data.
|
||||
The DSpace history framework adopts this model.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The Harmony model is used by the serialization mechanism (and
|
||||
ultimately by agents who interpret the serializations); users of the
|
||||
History API need not be aware of it.
|
||||
</p>
|
||||
|
||||
<h2>High-Level Approach</h2>
|
||||
|
||||
<p>
|
||||
When anything of archival interest occurs in DSpace, the saveHistory
|
||||
method of the HistoryManager is invoked. The parameters to the
|
||||
call are references to anything of archival interest.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The history data component receives the objects of interest via
|
||||
method calls on the HistoryManager. (Note that this does not preclude other
|
||||
interested parties from acting on object as well). Upon reception
|
||||
of the object, it serializes the state of all archive objects referred
|
||||
to by it, and creates Harmony-style objects and associations to
|
||||
describe the relationships between the objects. (A simple example is
|
||||
given below). Note that each archive object must have a unique
|
||||
identifier to allow linkage between discrete events; this is discussed
|
||||
under <a href="#unique">Unique Ids</a> below.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The serializations (including the Harmony objects and associations)
|
||||
are persisted to the filesystem, and marked as history data in the
|
||||
database.
|
||||
</p>
|
||||
|
||||
<h2>Archival Events</h2>
|
||||
|
||||
<p>
|
||||
Creating, modifying or deleting Community, Collection, Item, EPerson,
|
||||
WorkflowItem, or WorkspaceItem objects (including adding subobjects)
|
||||
are generally of archival interest.
|
||||
</p>
|
||||
|
||||
<h2>Serializations</h2>
|
||||
|
||||
<p>The serialization of an archival object consists of:</p>
|
||||
|
||||
<ul>
|
||||
<li>Its instance fields (ie, non-static, non-transient fields)
|
||||
<li>The serializations of associated objects (or references to these
|
||||
serializations).
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
The implementation of serialization simply calls methods in the Content
|
||||
API.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Version information for the serializer itself is included in
|
||||
the serialization.
|
||||
</p>
|
||||
|
||||
<a name="#unique"/>
|
||||
<h2>Unique Ids</h2>
|
||||
|
||||
<p>
|
||||
To be able to trace the history of an object, it is essential that the
|
||||
object have a unique identifier.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
After discussion, the unique identifiers are only weakly tied to the
|
||||
Handle system. Instead, the identifier consists of:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li> an identifer for the project</li>
|
||||
<li> a site id (using the handle prefix)</li>
|
||||
<li> an id (usually RDBMS-based) for objects</li>
|
||||
</ul>
|
||||
|
||||
<h2>Why Synchronization Is Not a Problem</h2>
|
||||
|
||||
<p>
|
||||
A classic problem with having data in two places is synchronization;
|
||||
it is no longer always clear which data source is authoritative.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This is not a problem for the history data because:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li> The data is read-only; once generated, it is never changed</li>
|
||||
<li> The data is temporal, and so it is only expected to be correct as
|
||||
of the time when it was generated.</li>
|
||||
</ul>
|
||||
|
||||
<h2>Storage</h2>
|
||||
|
||||
<p>
|
||||
The History system stores serializations and an MD5 checksum for the
|
||||
serialization. When another object is serialized, the checksum for the
|
||||
serialization is matched against existing checksums for that
|
||||
object. If the checksum already exists, the object is not stored; a
|
||||
reference to the object is used instead.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that since none of the serializations are deleted, ref counting
|
||||
is unnecessary.
|
||||
</p>
|
||||
|
||||
<h2>History Maps</h2>
|
||||
|
||||
<p>
|
||||
The history data is not initially stored in a queryable
|
||||
form. Nonetheless, it is a good idea to provide at least basic
|
||||
indications of what is stored, and where it is stored.
|
||||
</p>
|
||||
|
||||
Therefore the following simple RDBMS tables are used:
|
||||
|
||||
<pre>
|
||||
History table:
|
||||
history_id INTEGER PRIMARY KEY,
|
||||
-- When the history data was created (this data is also in the history!)
|
||||
timestamp TIMESTAMP
|
||||
|
||||
HistoryReference table:
|
||||
history_reference_id INTEGER PRIMARY KEY,
|
||||
-- Reference to the history
|
||||
history_id INTEGER FOREIGN KEY,
|
||||
-- Object Id
|
||||
object_id VARCHAR(64),
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
One way to trace the history of an object would be to find all history
|
||||
serializations which refer to it (in the HistoryReference table), and
|
||||
unwind and interpret these. When the history data refers to a
|
||||
serialization of an object, use the History table to find the
|
||||
serialization.
|
||||
</p>
|
||||
|
||||
<h2>Example</h2>
|
||||
|
||||
<p>
|
||||
An item is submitted to a collection via bulk upload. When (and if)
|
||||
the Item is eventually added to the collection, the saveHistory
|
||||
method is called, with references to the Item, its Collection, the User who
|
||||
performed the bulk upload, and some indication of the fact that it was
|
||||
submitted via a bulk upload.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
When called, the HistoryManager does the following:
|
||||
It creates the following new resources (all with unique ids):
|
||||
</p>
|
||||
|
||||
<li> An event</li>
|
||||
<li> A state</li>
|
||||
<li> An action</li>
|
||||
|
||||
<p>
|
||||
It also generates the following relationships:
|
||||
</p>
|
||||
|
||||
<pre>
|
||||
event --atTime--> time
|
||||
event --hasOutput--> state
|
||||
Item --inState--> state
|
||||
state --contains--> Item
|
||||
action --creates--> Item
|
||||
event --hasAction--> action
|
||||
action --usesTool--> DSpace Upload
|
||||
action --hasAgent--> User
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
The HistoryManager serializes the state of all archival objects
|
||||
involved (in this case, the Item, the User, and the DSpace Upload). It
|
||||
creates entries in the history map which associate the archival
|
||||
objects with the generated serializations.
|
||||
</p>
|
||||
|
||||
<h2>What History Data Is Not</h2>
|
||||
|
||||
|
||||
<p>
|
||||
History Data is not version control information. No effort has been
|
||||
made to provide diffs, merges, or highly efficient storage; instead,
|
||||
effort is focused on simple <em>remembrance</em>. Note that this does not
|
||||
preclude more sophisticated approaches later.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
History Data does not attempt to reconcile any contradictions in the
|
||||
data it serializes.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
History Data does not keep track of any kind of <em>current state</em>.
|
||||
</p>
|
||||
|
||||
</body>
|
||||
</html>
|
Reference in New Issue
Block a user