An Experiment in Metadata Mapping

The sample Bucket99 configuration files represent an experiment in forming an ADL collection by directly mapping existing metadata to the ADL buckets. That is, given a set of items having item-level metadata, we form the collection by mapping the metadata to the ADL buckets with little to no manipulation; the only work required is deciding which metadata fields map to which buckets. In this particular experiment the mapping and configuration were done manually, but we're anticipating the development of future components that will build collections by metadata harvesting, mapping, and ingest processes that are entirely automated.

The collection in this experiment was a set of 2,851 USGS DRGs having FGDC metadata. The item-level FGDC metadata was derived from a single, comprehensive series-level FGDC record for the DRGs combined with a database of 13 small, item-level fields. This is perhaps an unfair experiment (other collections such as the DLESE collections feature metadata that is truly item-level); nevertheless, this type of collection is one that ADL has historically been targeted at and will continue to be.

The following are some problems encountered.

1. Mapping entire metadata text can lead to false hits

Directly mapping entire metadata fields to textual buckets can cause false hits because the text may contain words that, in the context of discovery, are misleading. For example, the FGDC Abstract field for the DRGs contains this sentence:

The DRG can be used to collect, review, and revise other digital data, especially digital line graphs (DLG).

This is certainly appropriate metadata for a DRG, but when the text
is mapped in its entirety to the

2. Mapping series-level metadata text can lead to false hits

As mentioned above, the metadata for this collection was largely derived from series-level metadata, not metadata that is truly specific to the individual items. This leads to another kind of inappropriate text. The FGDC Abstract field also contains the sentence:

The USGS is producing DRG's of the 1:24,000-, 1:24,000/1:25,000-, 1:63,360- (Alaska), 1:100,000-, and 1:250,000-scale topographic map series.

Mapping this sentence to the

3. Poor support for metadata field URIs

In bucket mappings, source metadata fields are identified by URIs.
Dublin Core has assigned URIs to its metadata fields (e.g.,
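
As a rough illustration of what a URI-keyed bucket mapping involves, the following Python sketch groups a record's values into buckets according to the URI of the source field. It is not ADL's configuration format or code. The bucket names (`titles`, `subject-text`), the helper functions, and the sample record are all hypothetical, and the abstract sentence quoted in problem 1 is placed in a Dublin Core description field purely for the sake of the example. Only the Dublin Core element URIs are real.

```python
from collections import defaultdict

# Base URI of the Dublin Core 1.1 element set.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from source field URI to bucket name.
# The bucket names are placeholders, not ADL's actual configuration.
FIELD_TO_BUCKET = {
    DC + "title": "titles",
    DC + "description": "subject-text",
    DC + "subject": "subject-text",
}


def map_to_buckets(record, field_to_bucket=FIELD_TO_BUCKET):
    """Group a record's values by bucket, keeping the source field URI.

    `record` is a list of (field_uri, value) pairs. Fields with no
    mapping are skipped rather than guessed at.
    """
    buckets = defaultdict(list)
    for field_uri, value in record:
        bucket = field_to_bucket.get(field_uri)
        if bucket is not None:
            buckets[bucket].append((field_uri, value))
    return dict(buckets)


def text_bucket_matches(buckets, bucket, phrase):
    """Naive case-insensitive phrase search over one textual bucket."""
    needle = phrase.lower()
    return any(needle in value.lower() for _, value in buckets.get(bucket, []))


# Hypothetical item record; the description reuses the sentence
# quoted in problem 1.
record = [
    (DC + "title", "Example DRG item"),
    (DC + "description",
     "The DRG can be used to collect, review, and revise other digital "
     "data, especially digital line graphs (DLG)."),
    (DC + "format", "GeoTIFF"),  # unmapped field, ignored
]

buckets = map_to_buckets(record)
print(sorted(buckets))
print(text_bucket_matches(buckets, "subject-text", "digital line graphs"))
```

The second call returns `True`: because the whole description is mapped verbatim, a search for digital line graphs matches an item that is a DRG, which is the kind of false hit described in problem 1.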

4. Configuration problems

Duplication of information. A general problem with
ADL's configuration mechanism (not specific to metadata mapping) is
the overlap and outright duplication between the bucket configuration
related to report generation and that related to query translation.
Consider the configuration for just the
And here's the query translator configuration:
Sub-buckets cause duplication problems of a different kind.
Because the Python-based universal query translator has no intrinsic
support for sub-buckets, such bucket relationships must be implemented
manually. In particular, the configurations for the

These problems of duplication of information will be solved (or rather, hidden) by the universal collection driver because, for at least those collections under its control, configuration will be largely automated and therefore shielded from users. A solution for manually-configured collections could involve the use of some kind of meta-configuration file.

Poor support for vocabularies. Supporting
hierarchical buckets (specifically,

Greg Janée
Created: 2004-10-01
Last modified: 2004-10-18 09:54