ADN Metadata Mapping

Here's a mapping from the ADN metadata standard to the ADL bucket, browse, and access metadata views. The mapping is based on the ADN standard as it has been interpreted and used by the DLESE collections.

In the following, we refer to ADN metadata fields both by name and by the XPath expressions of the corresponding XML elements. The latter tend to be verbose, so the following abbreviations are used:

R  =  /itemRecord
G  =  R/general
C  =  R/lifecycle/contributors/contributor
AP  =  C[@role="Author" or @role="Publisher"]
M  =  R/metaMetadata
T  =  R/technical
E  =  R/educational
GC  =  R/geospatialCoverages/geospatialCoverage
TC  =  R/temporalCoverages/timeAndPeriod
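
As a rough illustration of how these abbreviations compose, the sketch below expands an abbreviated path into an absolute XPath expression. The `expand` helper is hypothetical and not part of ADN or ADL; it assumes that abbreviations are the only all-uppercase leading tokens, which holds for the paths used in this document.

```python
import re

# Abbreviation table copied from the list above.
ABBREVIATIONS = {
    "R": "/itemRecord",
    "G": "R/general",
    "C": "R/lifecycle/contributors/contributor",
    "AP": 'C[@role="Author" or @role="Publisher"]',
    "M": "R/metaMetadata",
    "T": "R/technical",
    "E": "R/educational",
    "GC": "R/geospatialCoverages/geospatialCoverage",
    "TC": "R/temporalCoverages/timeAndPeriod",
}


def expand(path):
    """Repeatedly replace a leading abbreviation until none remains."""
    while True:
        # A leading run of capitals not followed by another letter,
        # e.g. "AP" in "AP/person/nameLast" or "C" in 'C[@role=...]'.
        match = re.match(r"([A-Z]+)(?![A-Za-z])(.*)", path, re.S)
        if not match or match.group(1) not in ABBREVIATIONS:
            return path
        path = ABBREVIATIONS[match.group(1)] + match.group(2)
```

For example, `expand("G/title")` yields `/itemRecord/general/title`.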

Bucket view

Below are mappings from 15 ADN metadata fields to the nine standard ADL buckets and three new, DLESE-specific buckets. Additional information about definitions of metadata fields and consolidation of textual values can be found under Bucket mapping notes at the end of this document. First, the nine ADL buckets:

adl:titles (textual)
Field:
Title
Element:
G/title

The ADN field Title (specifically, XML element G/title) maps to the adl:titles textual bucket.

adl:geographic-locations (spatial)
Field:
Overarching bounding box
Elements:
GC/boundBox/northCoord
GC/boundBox/southCoord
GC/boundBox/eastCoord
GC/boundBox/westCoord

The bounding box described by the above elements maps to either a box or a point, provided that element GC/body/planet is "Earth" (which it currently is for all DLESE resources). The ADN detailed geometries (GC/detGeos/detGeo//*) could be mapped as well, but with little practical benefit: in the DLESE collections all detailed geometries are just points and boxes anyway, and currently only three resources have more than one detailed geometry.
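
The box-or-point decision might be sketched as follows. This is only an illustration: the document does not say when a point is chosen, so the sketch assumes a point is used when the box is degenerate (north equals south and east equals west). The function name and the dictionary layout are hypothetical and are not ADL's representation.

```python
def map_bounding_box(north, south, east, west, planet="Earth"):
    """Map an ADN overarching bounding box to a box or point (sketch)."""
    # Only terrestrial coverages are mapped to adl:geographic-locations.
    if planet != "Earth":
        return None
    # Assumption: a zero-extent box is mapped as a point.
    if north == south and east == west:
        return {"type": "point", "latitude": north, "longitude": east}
    return {"type": "box", "north": north, "south": south,
            "east": east, "west": west}
```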

adl:dates (temporal)
Field:
Time AD
Elements:
TC/timeInfo/timeAD/begin/@date
TC/timeInfo/timeAD/end/@date

The range described by the above elements maps to either a range or a single year, month, or day; if it maps to a range, the begin and end times must be normalized to have the same precision. The string "Present" must be mapped to a specific date (e.g., 9999-12-31 if it appears as an end date).
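
A minimal sketch of this normalization follows. It assumes dates are written as YYYY, YYYY-MM, or YYYY-MM-DD, and it equalizes precision by truncating both ends to the coarser of the two, which is only one possible choice; the document does not prescribe one. The function name and tuple results are hypothetical.

```python
def map_time_ad(begin, end):
    """Map an ADN Time AD begin/end pair to a single date or a range (sketch)."""
    # "Present" as an end date becomes a fixed far-future date.
    if end == "Present":
        end = "9999-12-31"
    # Assumption: normalize by truncating to the coarser precision.
    # Works because YYYY, YYYY-MM, YYYY-MM-DD have lengths 4, 7, 10.
    length = min(len(begin), len(end))
    begin, end = begin[:length], end[:length]
    # Identical endpoints collapse to a single year, month, or day.
    if begin == end:
        return ("date", begin)
    return ("range", begin, end)
```

For instance, a begin date of 1995 paired with an end date of 1997-09-25 becomes the range 1995 to 1997 under this truncation rule.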

adl:types (hierarchical)

No mapping is presently possible. The thesaurus associated with this bucket is insufficiently expressive.

adl:formats (hierarchical)

Given the limited thesaurus currently associated with this bucket, and given that all DLESE resources are online, the only possible mapping is a constant mapping to the term "Online".

adl:assigned-terms (textual)
Fields:
Subjects
Keywords
Elements:
G/subjects/subject
G/keywords/keyword

The "DLESE:" prefix should be removed from subjects.

adl:subject-related-text (textual)
Fields:
Description
Placenames
Event names
Named time periods
Elements:
G/description
GC/detGeos/detGeo/detPlaces/place/name
GC/boundBox/bbPlaces/place/name
GC/detGeos/detGeo/detEvents/event/name
GC/boundBox/bbEvents/event/name
TC/periods/period/name

In addition to the above, this bucket implicitly includes the mappings to the adl:titles and adl:assigned-terms buckets. Metadata values used in mapping placenames and event names should be drawn from the overarching bounding box only if there are no corresponding values among the detailed geometries.
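
The fallback rule for placenames and event names could look like the sketch below. It operates on a single geospatialCoverage element parsed with Python's standard ElementTree module and, for brevity, assumes the element names carry no XML namespace; real ADN records may need namespace-qualified paths. The `names` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def names(coverage, kind):
    """Return place or event names for one geospatialCoverage element.

    kind is "place" or "event". Names attached to detailed geometries
    win; the overarching bounding box is consulted only if there are none.
    """
    group = kind.capitalize() + "s"  # "Places" or "Events"
    detailed = coverage.findall(f"detGeos/detGeo/det{group}/{kind}/name")
    if detailed:
        return [element.text for element in detailed]
    overall = coverage.findall(f"boundBox/bb{group}/{kind}/name")
    return [element.text for element in overall]
```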

adl:originators (textual)
Fields:
Author
Publisher
Elements:
AP/organization/instName
AP/person/nameFirst
AP/person/nameMiddle
AP/person/nameLast
AP/person/instName

In mapping a personal contributor, the name components and affiliation should be consolidated into a single value, as in "Donald E. Knuth, Stanford University".
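
One way to perform this consolidation is sketched below. The function name is hypothetical, and the handling of missing components (they are simply skipped) is an assumption not stated in the ADN standard.

```python
def consolidate_person(first, middle, last, institution):
    """Join name components and affiliation into one originator value."""
    # Skip empty or missing components so no stray spaces appear.
    name = " ".join(part for part in (first, middle, last) if part)
    # The affiliation, if any, follows the name after a comma.
    return ", ".join(part for part in (name, institution) if part)
```

Called with "Donald", "E.", "Knuth", and "Stanford University", this produces the value shown above.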

adl:identifiers (identification)
Field:
Identification number
Element:
M/catalogEntries/catalog/@entry

The namespace for all DLESE identifiers is "DLESE".

And here are the three DLESE-specific buckets:

dlese:grade-ranges (hierarchical)
Field:
Grade range
Element:
E/audiences/audience/gradeRange

The "DLESE:" prefix should be removed from grade ranges. The thesaurus associated with this bucket is "DLESE Grade Ranges", a flat list of 11 terms (nine grade ranges, e.g., "Middle school", plus two escape terms, "To be supplied" and "Not applicable").

dlese:resource-types (hierarchical)
Field:
Resource type
Element:
E/resourceTypes/resourceType

The "DLESE:" prefix should be removed from resource types. The thesaurus associated with this bucket is "DLESE Resource Types", a two-level thesaurus consisting of nine top-level terms (e.g., "Audio"), 62 second-level terms (e.g., "Audio:Radio broadcast"), and a tenth top-level term that serves as an escape, "To be supplied".

dlese:standards (hierarchical)
Field:
Content standard
Element:
E/contentStandards/contentStandard

There are multiple thesauri associated with this bucket, one for each content standard. In mapping content standard terms, the prefix that identifies the content standard (e.g., "NSES:") should be removed. The other types of ADN standards (i.e., process standards and teaching standards) could be mapped to this bucket as well, but as yet they are unpopulated by DLESE.
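
Since the prefix both identifies the content standard and has to be removed from the term, the two can be separated in one step. The following is a sketch only: `split_standard` is a hypothetical helper, it assumes the prefix ends at the first colon, and the term text in the usage note is a placeholder rather than an actual NSES term.

```python
def split_standard(value):
    """Split 'NSES:...' into (standard, term) at the first colon."""
    standard, separator, term = value.partition(":")
    if not separator:
        raise ValueError(f"no content standard prefix in {value!r}")
    # The standard selects the thesaurus; the term is the bucket value.
    return standard, term
```

For example, `split_standard("NSES:Example term")` returns `("NSES", "Example term")`. Any further colons are left in the term.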

Browse view

DLESE resources do not have browse graphics, nor is ADN capable of describing them.

Access view

Fields:
Primary URL
Mirror URL
Rights description
Elements:
T/online/primaryURL
T/online/mirrorURLs/mirrorURL
R/rights/description

All URLs are treated as simple web interface access points. If there is just a primary URL, it is the root access point; if mirror URLs are present as well, the root access point is an alternatives access point that encompasses all URLs. The rights description is associated with the root access point in both cases.
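
The shape of the resulting access view can be sketched as a small data structure. The dictionary keys and type labels below are hypothetical stand-ins, not ADL's actual access view syntax; only the nesting logic follows the description above.

```python
def build_root_access_point(primary_url, mirror_urls=(), rights=None):
    """Build the root access point for one resource (sketch)."""
    def web_interface(url):
        return {"type": "web-interface", "url": url}

    if mirror_urls:
        # With mirrors, the root groups all URLs as alternatives.
        root = {
            "type": "alternatives",
            "alternatives": [web_interface(url)
                             for url in (primary_url, *mirror_urls)],
        }
    else:
        # With only a primary URL, that URL is the root itself.
        root = web_interface(primary_url)
    # The rights description attaches to the root in both cases.
    if rights is not None:
        root["rights"] = rights
    return root
```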

The ADN elements T/online/mediums/medium (which is free text, but in practice holds a MIME type) and T/online/size suggest that mapping to download access points may be possible. But in DLESE's usage the latter element is unpopulated, and the former element reflects the type of content one will encounter in navigating the resource's URL, not the type of the URL itself. For example, an HTML resource might have the medium "image/gif" because the HTML page contains inline GIF images. Thus DLESE resources are fundamentally web interfaces.

Element T/offline could be mapped to an offline access point, but as yet this element is unpopulated; all DLESE resources are online.

Bucket mapping notes

Some additional notes on mapping to buckets.

Buckets vs. fields. In mapping a metadata value to a bucket, the metadata source can be explicitly represented in the form of a single source metadata field identified by a URI and human-readable name. For example, here's a mapping from the ADN Title field to the adl:titles bucket:

<bucket name="adl:titles">
  <textual-value>
    <field name="[ADN] Title" uri="..."/>
    <text>Jules Map Server</text>
  </textual-value>
</bucket>

The functional value of this information is that it allows ADL to search across both overall buckets and individual fields. For example, referring to the mapping to the adl:assigned-terms bucket above, it is possible to search over all assigned terms (subject terms and keywords), or over subject terms only.

To maximize functionality, the fields mentioned in the mappings above were named primarily for their utility in searching and only secondarily for their adherence to ADN. This is most apparent in the adl:originators bucket, where we've listed Author and Publisher as fields even though in ADN these are characterized as roles that contributors can play.

Field URIs. ADL requires that metadata fields be unambiguously named by URIs, which ADN does not define. One approach to creating URIs is to concatenate the ADN namespace ("http://adn.dlese.org") with the absolute XPath expression of (one of) the XML elements that corresponds to the field (e.g., "/itemRecord/general/title"). This is not optimal because of the loose correspondence between ADN fields and XML elements and because of the verbosity of XPath expressions. Consider the XPath portion of the URI for Publisher, wrapped here across two lines:

/itemRecord/lifecycle/contributors/contributor
[@role="Publisher"]/organization/instName
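
The concatenation approach described above amounts to the following sketch. The `field_uri` helper is hypothetical, and it performs no escaping of characters such as quotes or brackets, so it only illustrates the idea and its verbosity.

```python
ADN_NAMESPACE = "http://adn.dlese.org"


def field_uri(absolute_xpath):
    """Form a field URI by appending an absolute XPath to the namespace."""
    if not absolute_xpath.startswith("/"):
        raise ValueError("expected an absolute XPath expression")
    return ADN_NAMESPACE + absolute_xpath
```

With the Title example, `field_uri("/itemRecord/general/title")` gives `http://adn.dlese.org/itemRecord/general/title`.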

Consolidation of textual values. For non-textual buckets, including hierarchical buckets, multiple values for a metadata field (e.g., multiple time ranges or multiple resource types) must be mapped as separate bucket values; there is no alternative. For example, this ADN fragment containing two time ranges:

<timeAD>
  <begin date="1995-10-04"/>
  <end date="1995-10-05"/>
</timeAD>
...
<timeAD>
  <begin date="1997-09-25"/>
  <end date="1997-09-25"/>
</timeAD>

is mapped to the adl:dates bucket as two bucket values:

<bucket name="adl:dates">
  <temporal-value>
    <field name="[ADN] Time AD" uri="..."/>
    <range>
      <begin>1995-10-04</begin>
      <end>1995-10-05</end>
    </range>
  </temporal-value>
  <temporal-value>
    <field name="[ADN] Time AD" uri="..."/>
    <date>1997-09-25</date>
  </temporal-value>
</bucket>

Mappings to textual buckets can be handled in the same manner. But another option, one that gives equivalent searchability with less verbose XML, is to consolidate textual values from the same field into a single value by concatenating the values, separated by semicolons. For example, this ADN fragment containing two subjects and two keywords:

<subjects>
  <subject>DLESE:Biology</subject>
  <subject>DLESE:Ecology</subject>
</subjects>
<keywords>
  <keyword>bioecological systems</keyword>
  <keyword>social complexity</keyword>
</keywords>

can be mapped to the adl:assigned-terms bucket as only two bucket values:

<bucket name="adl:assigned-terms">
  <textual-value>
    <field name="[ADN] Subjects" uri="..."/>
    <text>Biology; Ecology</text>
  </textual-value>
  <textual-value>
    <field name="[ADN] Keywords" uri="..."/>
    <text>bioecological systems; social complexity</text>
  </textual-value>
</bucket>

This approach was used in the mappings described in this document, and it accounts for the pluralization of certain ADN field names.
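
The consolidation step itself is small; a sketch follows. The `consolidate` helper is hypothetical, and the separator (a semicolon followed by a space) is taken from the example above.

```python
def consolidate(values, prefix=""):
    """Join the values of one field into a single textual bucket value."""
    cleaned = []
    for value in values:
        # Drop a leading vocabulary prefix such as "DLESE:" if requested.
        if prefix and value.startswith(prefix):
            value = value[len(prefix):]
        cleaned.append(value)
    return "; ".join(cleaned)
```

Applied to the subjects above with the prefix "DLESE:", this yields "Biology; Ecology".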

created 2003-10-06; last modified 2009-01-14 09:24