NGDA Second-year Goals and Tasks

Facet I: Research

Goal— Characterize the "landscape" of geospatial data from the perspective of long-term preservation. Why— There's little awareness in the library community of the complexity of geospatial data (either its intrinsic complexity or its production-related complexity).

  1. Finish Banning's analyses of the USGS DOQ and DRG, CaSIL shapefile, and NASA Landsat 7 products.
  2. Analyze preservation of remote-sensing and seismographic geospatial data. Look at data sizes and production rates, data processing levels, availability of format- and semantics-defining specifications, ability to reproduce products from raw data, and the needs of the scientific communities that use (and will continue to use) the data. Interview data producers and consumers.
    • Consider the role federal agencies play in preservation.
    • Host a workshop?
  3. Write up and publish the results.

Facet II: Registry

Goal— Develop a working, populated registry for format specifications and other semantics-defining specifications. Why— This represents one of our two principal strategies for long-term preservation.

  1. Develop a registry system that models dependencies between specifications and enforces format "recoverability."
  2. Develop a web interface that supports incremental population and long-term maintenance of the registry.
    • Consider the implications of registry maintenance by a distributed community.
  3. Populate the registry with all formats encountered in practice, including dependent formats.
  4. As the registry is populated, develop a data model for formats and a vocabulary of format relationships.
  5. Acquire and ingest (possibly embargoed) ESRI format specifications.
  6. Participate in GDFR discussions.
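
The dependency modeling in task 1 can be illustrated with a small sketch. It is not the NGDA design. It assumes that a specification is "recoverable" when it and every specification it depends on, directly or transitively, are present in the registry; the text above does not define the term. The class name, method names, and the identifiers "shapefile" and "dbf" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Maps a specification identifier to the identifiers it depends on.
    deps: dict[str, set[str]] = field(default_factory=dict)

    def register(self, spec_id: str, depends_on: tuple[str, ...] = ()) -> None:
        """Add a specification along with its direct dependencies."""
        self.deps[spec_id] = set(depends_on)

    def missing_for(self, spec_id: str) -> set[str]:
        """Return every unregistered specification reachable from spec_id."""
        missing: set[str] = set()
        seen: set[str] = set()
        stack = [spec_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited; also guards against cycles
            seen.add(current)
            if current not in self.deps:
                missing.add(current)
                continue
            stack.extend(self.deps[current])
        return missing

    def is_recoverable(self, spec_id: str) -> bool:
        """Assumed definition: nothing in the dependency closure is missing."""
        return not self.missing_for(spec_id)


# Hypothetical identifiers, for illustration only.
registry = Registry()
registry.register("shapefile", depends_on=("dbf",))
print(registry.is_recoverable("shapefile"))  # False until "dbf" is registered
registry.register("dbf")
print(registry.is_recoverable("shapefile"))  # True
```

Under this reading, a populated registry could refuse to mark a format as complete until missing_for() returns an empty set, which is one way to make task 3's "including dependent formats" checkable.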

Facet III: Archive

Goal— Develop an operational archive and ingest system. Why— To archive at-risk content and validate proposed approaches to long-term preservation.

  1. Complete development of the initial NGDA system.
  2. Implement validation and registry-related constraint checks.
  3. Evaluate Fedora as an archival platform.
  4. Investigate distributed storage systems and approaches.
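
As a rough illustration of task 2, the sketch below checks an object before ingest. It assumes two kinds of checks that the text does not spell out: a fixity check (content matches a provider-supplied SHA-256 checksum) and a registry constraint (each declared format is already registered). The Component type, its fields, and check_object are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Component:
    name: str       # file name within the archive object
    format_id: str  # identifier of the format the file claims to use
    data: bytes     # file content
    sha256: str     # checksum supplied by the provider (assumed to exist)


def check_object(components: list[Component], registered: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the object passes."""
    problems = []
    for c in components:
        # Validation: the content must match the checksum it was submitted with.
        if hashlib.sha256(c.data).hexdigest() != c.sha256:
            problems.append(f"{c.name}: checksum mismatch")
        # Registry constraint: the declared format must already be registered.
        if c.format_id not in registered:
            problems.append(f"{c.name}: unregistered format {c.format_id!r}")
    return problems


# Hypothetical object with one good and one bad component.
good = Component("roads.shp", "shapefile", b"abc", hashlib.sha256(b"abc").hexdigest())
bad = Component("notes.xyz", "unknown-format", b"abc", "0" * 64)
for problem in check_object([good, bad], {"shapefile"}):
    print(problem)
```

Returning a list of problems rather than raising on the first failure lets an ingest run report everything wrong with an object in one pass.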

Facet IV: Access

Goal— Develop multiple access mechanisms for archived content. Why— Inaccessible content is useless, and access is needed to make project accomplishments visible.

  1. Develop a "simple" access mechanism such that archive objects are located at canonical URLs (e.g., http://archive/id); object manifests are retrievable as HTML documents; object components are downloadable as MIME-typed files; and the archive as a whole is crawlable by Internet search engines.
  2. ADL:
    1. Develop ADL ingest services. Automate index-building.
    2. Develop a crawler/mapper component that crawls archive collections and maps and ingests them into ADL using the aforementioned services.
  3. Provide OAI access to archive metadata.
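
The "simple" access mechanism in task 1 might look roughly like the sketch below. It takes the placeholder host from the example URL above and assumes, hypothetically, that components live at the object URL followed by the file name. The function names are illustrative, and MIME types are guessed from file extensions with a generic fallback.

```python
import html
import mimetypes
from urllib.parse import quote

BASE_URL = "http://archive"  # placeholder host from the example URL in the text


def object_url(object_id: str) -> str:
    """Canonical URL for an archive object."""
    return f"{BASE_URL}/{quote(object_id, safe='')}"


def component_mime_type(filename: str) -> str:
    """Guess a MIME type from the extension; fall back to a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def manifest_html(object_id: str, filenames: list[str]) -> str:
    """Render an object manifest as an HTML page of plain links."""
    base = object_url(object_id)
    items = "\n".join(
        f'  <li><a href="{base}/{quote(name)}" type="{component_mime_type(name)}">'
        f"{html.escape(name)}</a></li>"
        for name in filenames
    )
    title = html.escape(object_id)
    return (
        f"<html><head><title>{title}</title></head><body>\n"
        f"<ul>\n{items}\n</ul>\n</body></html>"
    )


# Hypothetical object identifier and file names.
print(manifest_html("obj1", ["readme.txt", "roads.shp"]))
```

Because the manifest consists only of ordinary hyperlinks, a search-engine crawler that reaches an object page can follow them to each component without any special protocol support.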

Facet V: Content

Goal— Archive at-risk content. Why— In addition to being the ultimate purpose of the project, this provides needed feedback on the other facets.

  1. Complete ingest of the CaSIL (now Cal-Atlas) collections.
  2. Identify additional at-risk collections.
  3. Archive them.
  4. Develop a collection development policy.
    • Define "at-risk."
    • Consider value over time, urgency, and ephemerality.

Facet VI: Legal

Goal— Determine the legal (contractual, copyright, and other) ramifications of long-term archival of geospatial data. Why— It's a necessary evil.

  1. Research the impact of CRADAs on access to government data.
  2. Develop prototype provider/archive contract(s).

created 2006-01-25, last modified 2009-11-19 13:12