This is a simple protocol that supports submission of records to a library and asynchronous notification of the acceptance or rejection of those records.
An ingest server is a web service, operating on behalf of a library, that accepts library content, in the form of discrete records, from distributed submitters.
Our notion of "library" is very broad here, and includes any kind of database, repository, service, etc., that stores and exerts curatorial power over discrete records. A record may be any XML document that has a format acceptable to the library and that has a library-assigned identifier.
A submitter sends an ingest server an ingest request containing a record and, optionally, contact and identification information related to the record and its source. In return, the submitter (synchronously) receives an ingest disposition. The disposition is one of three kinds: accepted, meaning the record was added to the library, in which case the record identifier assigned by the library is returned; rejected, in which case a failure reason may be returned; or provisionally accepted, in which case the ultimate disposition will be sent asynchronously (i.e., at some future time) to a notification recipient identified by the submitter.
We first define the XML document formats utilized by the protocol.
For brevity, namespace declarations have been elided in the definitions
below, but all XML elements should be understood to reside in
namespace "http://www.alexandria.ucsb.edu". sip.xsd [TBD] is an XML
schema that defines the XML formats; sip.dtd
[TBD] is a roughly equivalent XML DTD.
<ingest-properties>
Ingest server properties. <notification-style>
is the server's notification style: if "synchronous",
ultimate record dispositions are always returned synchronously; if
"asynchronous", ultimate record dispositions are returned
synchronously or asynchronously depending on the errors encountered in
the record and the processing required. <formats>
lists one or more record formats accepted by the server. Each format
is expressed as the URL of the format's XML schema.
<!ELEMENT ingest-properties (notification-style,
formats)>
<!ELEMENT notification-style (#PCDATA)>
<!-- "synchronous" or "asynchronous" -->
<!ELEMENT formats (format+)>
<!ELEMENT format (#PCDATA)>
For example:
<ingest-properties>
<notification-style>synchronous</notification-style>
<formats>
<format>http://.../myschema.xsd</format>
</formats>
</ingest-properties>
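A submitter will typically read these properties before sending records. The following is a minimal sketch of that step, assuming Python's standard xml.etree.ElementTree module; the function name parse_ingest_properties and the example.org schema URL are illustrative and not part of the protocol.

```python
import xml.etree.ElementTree as ET

NS = "http://www.alexandria.ucsb.edu"  # protocol namespace given in this document


def parse_ingest_properties(xml_text):
    """Return (notification_style, [format URLs]) from an <ingest-properties> document."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS}}}ingest-properties":
        raise ValueError("not an ingest-properties document")
    style = root.findtext(f"{{{NS}}}notification-style", default="").strip()
    if style not in ("synchronous", "asynchronous"):
        raise ValueError(f"unexpected notification style: {style!r}")
    # The DTD requires one or more <format> elements inside <formats>.
    formats = [f.text.strip() for f in root.iter(f"{{{NS}}}format") if f.text]
    if not formats:
        raise ValueError("at least one <format> is required")
    return style, formats


# Hypothetical sample document; the schema URL is a placeholder.
SAMPLE = f"""<ingest-properties xmlns="{NS}">
  <notification-style>synchronous</notification-style>
  <formats>
    <format>http://example.org/myschema.xsd</format>
  </formats>
</ingest-properties>"""

style, formats = parse_ingest_properties(SAMPLE)
print(style, formats)
```

A submitter could use the returned style to decide whether it needs to supply a notification recipient at all.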
<ingest-request>
A request to ingest a single record into the library. The record,
enclosed within the <record> element, may be any
XML content, subject only to the restrictions that it appear as a
single XML element and that it adhere to one of the formats supported
by the ingest server. <source> describes the
source of the record for identification and contact purposes. Within
<source>, <submitter> describes,
by a name and email address, the record's source institution, project,
or person; <sequence> names the sequence of records
the submitted record is a member of; and
<source-identifier> is the source's identifier for
the record. All elements within <source>, and
<source> itself, are optional.
<notification-recipient>, if present, is the URL of
the notification recipient for the request.
<!ELEMENT ingest-request (source?,
notification-recipient?, record)>
<!ELEMENT source (submitter?, sequence?,
source-identifier?)>
<!ELEMENT submitter (name, email-address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email-address (#PCDATA)>
<!ELEMENT sequence (#PCDATA)>
<!ELEMENT source-identifier (#PCDATA)>
<!ELEMENT notification-recipient (#PCDATA)>
<!ELEMENT record ANY>
<!-- any single element -->
For example:
<ingest-request>
<source>
<submitter>
<name>Stanford DL Project</name>
<email-address>...@stanford.edu</email-address>
</submitter>
<sequence>Campus Buildings</sequence>
<source-identifier>145</source-identifier>
</source>
<notification-recipient>http://...</notification-recipient>
<record>
<ADL_gazetteer_entry xmlns="...">
...
</ADL_gazetteer_entry>
</record>
</ingest-request>
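As a rough illustration of how a submitter might assemble such a request, the sketch below builds the document with Python's standard xml.etree.ElementTree module. The function build_ingest_request, its parameter names, the urn:example:gazetteer record namespace, and the example.org addresses are all assumptions made for the example; only the element names and their order come from the DTD above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.alexandria.ucsb.edu"  # protocol namespace given in this document


def q(tag):
    """Qualify a protocol element name with the protocol namespace."""
    return f"{{{NS}}}{tag}"


def build_ingest_request(record, submitter=None, sequence=None,
                         source_identifier=None, notification_recipient=None):
    """Build an <ingest-request>, adding children in the order the DTD requires.

    `record` is the single XML element to submit; `submitter` is an optional
    (name, email_address) pair. All parameter names are illustrative.
    """
    request = ET.Element(q("ingest-request"))
    if submitter or sequence or source_identifier:
        source = ET.SubElement(request, q("source"))
        if submitter:
            name, email = submitter
            sub = ET.SubElement(source, q("submitter"))
            ET.SubElement(sub, q("name")).text = name
            ET.SubElement(sub, q("email-address")).text = email
        if sequence:
            ET.SubElement(source, q("sequence")).text = sequence
        if source_identifier:
            ET.SubElement(source, q("source-identifier")).text = source_identifier
    if notification_recipient:
        ET.SubElement(request, q("notification-recipient")).text = notification_recipient
    # <record> wraps exactly one element supplied by the caller.
    ET.SubElement(request, q("record")).append(record)
    return request


# Hypothetical record and addresses, for illustration only.
entry = ET.Element("{urn:example:gazetteer}entry")
request = build_ingest_request(
    entry,
    submitter=("Example Project", "contact@example.org"),
    sequence="Campus Buildings",
    source_identifier="145",
    notification_recipient="http://example.org/notify",
)
xml_text = ET.tostring(request, encoding="unicode")
print(xml_text)
```

Omitting every optional argument yields a request that contains only the record element, which the DTD also permits.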
<ingest-disposition>
The disposition of an ingest request. <source>
repeats the source information from the original request, to the
extent present. If the record was accepted,
<assigned-identifier> is the record's identifier as
assigned by the library. If the record was rejected,
<reason> says why.
<!ELEMENT ingest-disposition (source?, (accepted |
provisionally-accepted | rejected))>
<!ELEMENT accepted (assigned-identifier)>
<!ELEMENT assigned-identifier (#PCDATA)>
<!ELEMENT provisionally-accepted EMPTY>
<!ELEMENT rejected (reason?)>
<!ELEMENT reason (#PCDATA)>
For example:
<ingest-disposition>
<source>
<submitter>
<name>Stanford DL Project</name>
<email-address>...@stanford.edu</email-address>
</submitter>
<sequence>Campus Buildings</sequence>
<source-identifier>145</source-identifier>
</source>
<rejected>
<reason>Data value out of range...</reason>
</rejected>
</ingest-disposition>
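A submitter or notification recipient has to distinguish the three kinds of disposition. One possible way to do so is sketched below, again assuming Python's standard xml.etree.ElementTree module; parse_disposition and the sample reason text are illustrative names and values, not part of the protocol.

```python
import xml.etree.ElementTree as ET

NS = "http://www.alexandria.ucsb.edu"  # protocol namespace given in this document


def q(tag):
    """Qualify a protocol element name with the protocol namespace."""
    return f"{{{NS}}}{tag}"


def parse_disposition(xml_text):
    """Return (kind, detail) for an <ingest-disposition> document.

    detail is the assigned identifier for "accepted", the optional reason
    for "rejected", and None for "provisionally-accepted".
    """
    root = ET.fromstring(xml_text)
    if root.tag != q("ingest-disposition"):
        raise ValueError("not an ingest-disposition document")
    accepted = root.find(q("accepted"))
    if accepted is not None:
        return "accepted", accepted.findtext(q("assigned-identifier"))
    if root.find(q("provisionally-accepted")) is not None:
        return "provisionally-accepted", None
    rejected = root.find(q("rejected"))
    if rejected is not None:
        return "rejected", rejected.findtext(q("reason"))  # reason may be absent
    raise ValueError("disposition carries no outcome element")


# Hypothetical sample document, for illustration only.
SAMPLE = f"""<ingest-disposition xmlns="{NS}">
  <rejected>
    <reason>Data value out of range</reason>
  </rejected>
</ingest-disposition>"""

kind, detail = parse_disposition(SAMPLE)
print(kind, detail)
```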
All protocol operations are stateless. An ingest server provides the following two operations.
ingest-properties <- get-properties()
Returns the server's properties.
ingest-disposition <- ingest(ingest-request)
Accepts and processes an ingest request. The return document indicates the disposition of the request; if the disposition is provisionally accepted, the ultimate disposition will be delivered at a future time to the notification recipient specified in the request.
A notification recipient provides the following operation.
notify(ingest-disposition)
Accepts an ingest disposition. The disposition must not be provisionally accepted.
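The constraint on notify() can be illustrated with a toy, in-memory model. This is only a sketch: the NotificationRecipient class, the needs_notification helper, the use of plain strings for disposition kinds, and the identifier "example-id-1" are assumptions made for the example, and a real recipient would be a web service receiving XML documents.

```python
FINAL_KINDS = {"accepted", "rejected"}
ALL_KINDS = FINAL_KINDS | {"provisionally-accepted"}


class NotificationRecipient:
    """Toy in-memory recipient that records the dispositions it is sent."""

    def __init__(self):
        self.received = []

    def notify(self, kind, detail=None):
        # notify() may only carry an ultimate disposition.
        if kind not in FINAL_KINDS:
            raise ValueError(f"notify() needs accepted or rejected, got {kind!r}")
        self.received.append((kind, detail))


def needs_notification(kind):
    """True when ingest() returned a disposition that will be finalized later."""
    if kind not in ALL_KINDS:
        raise ValueError(f"unknown disposition kind: {kind!r}")
    return kind == "provisionally-accepted"


recipient = NotificationRecipient()

# The submitter receives a provisional answer from ingest() ...
pending = needs_notification("provisionally-accepted")

# ... and the server later delivers the ultimate disposition.
recipient.notify("accepted", "example-id-1")  # hypothetical identifier
print(pending, recipient.received)
```

Because all operations are stateless, any correlation between the original request and the later notification has to rely on the source information echoed in the disposition.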
The SOAP binding of this
protocol is largely defined by the above XML formats. We need only
note that documents are passed using document-style encoding, and that
notification recipient URLs must use either the "http" or
"smtp" protocols, corresponding to the respective
well-known SOAP transport mechanisms.
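An ingest server may wish to check the notification recipient URL before provisionally accepting a record. The sketch below shows one way to do this with urlsplit from Python's standard urllib.parse module; the helper name is_valid_recipient_url and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# The two URL schemes this document permits for notification recipients.
ALLOWED_SCHEMES = {"http", "smtp"}


def is_valid_recipient_url(url):
    """Return True if the URL's scheme is one the SOAP binding allows."""
    return urlsplit(url).scheme.lower() in ALLOWED_SCHEMES


# Hypothetical URLs, for illustration only.
checks = {
    "http://example.org/notify": is_valid_recipient_url("http://example.org/notify"),
    "smtp://mail.example.org": is_valid_recipient_url("smtp://mail.example.org"),
    "ftp://example.org/drop": is_valid_recipient_url("ftp://example.org/drop"),
}
print(checks)
```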
created 2003-02-01; last modified 2009-01-20 09:20