NGDA Second-year Goals and Tasks
Facet I: Research
Goal: Characterize the "landscape" of geospatial data from the
perspective of long-term preservation.
Why: There's little awareness in the library community of the
complexity of geospatial data (either its intrinsic complexity or its
production-related complexity).
- Finish Banning's analyses of the USGS DOQ and DRG, CaSIL
shapefile, and NASA Landsat 7 products.
- Analyze preservation of remote-sensing and seismographic
geospatial data. Look at data sizes and production rates, data
processing levels, availability of format- and semantics-defining
specifications, ability to reproduce products from raw data, and the
needs of the scientific communities that use (and will continue to
use) the data. Interview data producers and consumers.
- Consider the role federal agencies play in preservation.
- Host a workshop?
- Write up and publish the results.
Facet II: Registry
Goal: Develop a working, populated registry for format specifications
and other semantics-defining specifications.
Why: This represents one of our two principal strategies for
long-term preservation.
- Develop a registry system that models dependencies between
specifications and enforces format "recoverability."
- Develop a web interface that supports incremental population and
long-term maintenance of the registry.
- Consider the implications of registry maintenance by a distributed
community.
- Populate the registry with all formats encountered in practice,
including dependent formats.
- As the registry is populated, develop a data model for formats
and a vocabulary of format relationships.
- Acquire and ingest (possibly embargoed) ESRI format
specifications.
- Participate in GDFR discussions.
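One way to picture the dependency modeling and "recoverability" check described in the first task is the minimal sketch below. It assumes that a format counts as recoverable when its specification and every specification it depends on, directly or indirectly, are present in the registry; that reading is an assumption, not a definition taken from the project. The class, method names, and format identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry mapping a specification id to the ids it depends on."""
    deps: dict[str, set[str]] = field(default_factory=dict)

    def register(self, spec_id: str, depends_on: tuple[str, ...] = ()) -> None:
        # Dependencies may be named before they are registered themselves,
        # which allows the registry to be populated incrementally.
        self.deps[spec_id] = set(depends_on)

    def missing(self, spec_id: str) -> set[str]:
        """Return every specification reachable from spec_id that is unregistered."""
        missing: set[str] = set()
        seen: set[str] = set()
        stack = [spec_id]
        while stack:
            current = stack.pop()
            if current in seen:  # also guards against dependency cycles
                continue
            seen.add(current)
            if current not in self.deps:
                missing.add(current)
            else:
                stack.extend(self.deps[current])
        return missing

    def is_recoverable(self, spec_id: str) -> bool:
        # Assumed rule: recoverable means nothing in the dependency closure is missing.
        return not self.missing(spec_id)


# Illustrative identifiers only.
registry = Registry()
registry.register("GeoTIFF", depends_on=("TIFF",))
registry.register("TIFF")
registry.register("shapefile", depends_on=("dBASE",))  # dBASE spec not yet registered
```

Here registry.missing("shapefile") reports which dependent specifications still need to be acquired, which is the kind of feedback an incremental population interface could surface.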
Facet III: Archive
Goal: Develop an operational archive and ingest system.
Why: To archive at-risk content and validate proposed approaches to
long-term preservation.
- Complete development of the initial NGDA system.
- Implement validation and registry-related constraint checks.
- Evaluate Fedora as an archival platform.
- Investigate distributed storage systems and approaches.
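As a rough illustration of what ingest-time validation and registry-related constraint checks might look like, the sketch below verifies two things per component: that its declared format is known to the registry, and that its bytes match a recorded checksum. The Component fields, the ingest_problems function, and the choice of SHA-256 are all assumptions made for this example; they do not describe the actual NGDA system.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    format_id: str  # hypothetical registry identifier for the component's format
    sha256: str     # hex digest recorded by the provider (assumed checksum scheme)
    data: bytes


def ingest_problems(components, registered_formats):
    """Return a list of human-readable problems; an empty list means the object passes."""
    problems = []
    for c in components:
        # Registry constraint: every format used must already be registered.
        if c.format_id not in registered_formats:
            problems.append(f"{c.name}: format {c.format_id!r} is not in the registry")
        # Fixity check: stored bytes must match the recorded digest.
        if hashlib.sha256(c.data).hexdigest() != c.sha256:
            problems.append(f"{c.name}: checksum mismatch")
    return problems
```

Returning a list of problems rather than raising on the first failure lets an ingest report show everything that needs fixing in one pass.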
Facet IV: Access
Goal: Develop multiple access mechanisms for archived content.
Why: Inaccessible content is useless, and access is needed to make
project accomplishments visible.
- Develop a "simple" access mechanism such that archive objects are
located at canonical URLs (e.g., http://archive/id); object manifests
are retrievable as HTML documents; object components are downloadable
as MIME-typed files; and the archive as a whole is crawlable by
Internet search engines.
- ADL:
  - Develop ADL ingest services. Automate index-building.
  - Develop a crawler/mapper component that crawls archive collections
  and maps and ingests them into ADL using the aforementioned
  services.
- Provide OAI access to archive metadata.
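The "simple" access mechanism in the first task can be sketched as a handful of pure functions: one that maps an object identifier to its canonical URL, one that picks a MIME type for a component, and one that renders a manifest as plain HTML with ordinary links (which is what makes it crawlable). This is only a sketch under stated assumptions: the host "http://archive" is the placeholder from the example URL above, and the function names and the URL layout for components are hypothetical.

```python
import mimetypes
from html import escape
from urllib.parse import quote

BASE_URL = "http://archive"  # placeholder host from the example URL, not a real server


def object_url(object_id: str) -> str:
    """Canonical URL for an archive object."""
    return f"{BASE_URL}/{quote(object_id, safe='')}"


def component_url(object_id: str, name: str) -> str:
    """Assumed layout: components live directly beneath the object's URL."""
    return f"{object_url(object_id)}/{quote(name, safe='')}"


def content_type(name: str) -> str:
    """Guess a MIME type from the file name, falling back to a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(name)
    return guessed or "application/octet-stream"


def manifest_html(object_id: str, components: list[str]) -> str:
    """Render a manifest as an HTML list of plain links to each component."""
    items = "\n".join(
        f'<li><a href="{escape(component_url(object_id, name))}">{escape(name)}</a>'
        f" ({escape(content_type(name))})</li>"
        for name in components
    )
    return (
        f"<html><body><h1>{escape(object_id)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )
```

Because the manifest uses only static links and no scripting, a search-engine crawler that reaches an object's URL can follow the links to every component.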
Facet V: Content
Goal: Archive at-risk content.
Why: In addition to being the ultimate purpose of the project, this
provides needed feedback on the other facets.
- Complete ingest of the CaSIL (now Cal-Atlas) collections.
- Identify additional at-risk collections.
- Archive 'em.
- Develop a collection development policy.
  - Define "at-risk."
  - Consider value over time, urgency, and ephemerality.
Facet VI: Legal
Goal: Determine the legal (contractual, copyright, and other)
ramifications of long-term archival of geospatial data.
Why: It's a necessary evil.
- Research the impact of CRADAs on
access to government data.
- Develop prototype provider/archive contract(s).
created 2006-01-25, last modified 2009-11-19 13:12