INGEST PROCEDURE: OUTLINE
Metadata from raw to ready-to-search:
Overview:
Metadata will arrive at ADL in various formats, styles, and types, and in varying degrees of cleanliness. We encourage participants to format their metadata before we receive it. Even so, most metadata will need some transformation before it is ready to be inserted into the ADL catalog.
1. Perform preliminary crosswalk on the metadata.
- 1.1 Obtain description of dataset, with accompanying documentation.
- 1.2 Extract a representative sample of the records.
- 1.3 Prepare crosswalk based on the dataset's fields and how they match into ADL.
2. Extract, clean, organize and create metadata as needed. Revisit original crosswalk; update to reflect the metadata in hand.
- 2.1 Determine appropriate metadata manipulation tools and techniques.
- 2.1.1 Import metadata into MS Access and do manipulations and cleaning inside the program.
- 2.1.2 If it is more efficient to work in a Unix environment than in MS Access, export the metadata to an ASCII file. Parse, clean, and change the metadata with Perl, SQL, Unix command-line tools, or whichever tool suits the task.
- 2.2 Convert degrees, minutes, seconds into decimal degrees.
- 2.3 Create bounding box coordinates.
- 2.4 Generate footprints.
- 2.5 Check for coordinate errors.
- 2.6 Add fields (e.g., browse files, file locations).
- 2.7 Remove unwanted characters such as stray delimiters, slashes ("/"), and quotation marks (").
- 2.8 Delimit fields with accepted character ("|").
- 2.9 Determine how to sort out parent records if applicable. Create parent/child records.
- 2.10 Decode coded fields.
- 2.11 Assign ADL holding id number to each record.
- 2.12 Attach MIL local call numbers to records.
- 2.12.1 Determine from the shelflist or the actual items which items MIL holds. Enter this information (and whatever else may be needed, e.g., area) into database software such as Access.
- 2.12.2 Determine if any of the fields, or any combination of fields, in the dataset match or compose the MIL local class number.
- 2.12.3 Use the database resulting from step 2.12.1. Match it against the metadata dataset. For any metadata record that matches an item held by MIL, enter the local call number in that record.
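Steps 2.2, 2.3, 2.5, 2.7, and 2.8 can be sketched in Python as follows. This is only an illustration under stated assumptions, not ADL's actual tooling: the function names are hypothetical, coordinates are assumed to be (longitude, latitude) pairs that do not cross the 180-degree meridian, and unwanted characters are assumed to be replaced by a space rather than dropped.

```python
import re


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Step 2.2: convert degrees, minutes, seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # By the usual convention, South and West are negative.
    return -value if hemisphere.upper() in ("S", "W") else value


def bounding_box(points):
    """Step 2.3: return (west, south, east, north) for (lon, lat) pairs.

    Assumption: the points do not cross the 180-degree meridian.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def coordinate_errors(box):
    """Step 2.5: list the problems found in a (west, south, east, north) box."""
    west, south, east, north = box
    problems = []
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        problems.append("longitude out of range")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        problems.append("latitude out of range")
    if south > north:
        problems.append("south is greater than north")
    return problems


def delimit(fields, delimiter="|"):
    """Steps 2.7 and 2.8: strip unwanted characters, then join with "|"."""
    unwanted = re.compile(r'[|/"]')
    # Assumption: each unwanted character becomes a space, and runs of
    # whitespace are then collapsed to a single space.
    cleaned = [" ".join(unwanted.sub(" ", str(f)).split()) for f in fields]
    return delimiter.join(cleaned)
```

For example, `dms_to_decimal(119, 50, 42, "W")` yields a negative longitude, and `delimit` removes any "|" already present in a field so that it cannot be mistaken for a field separator.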
3. After formatting the metadata as much as possible, ADL will then place bucket metadata in a temp table structure in the ingest database. At this point, the metadata may need additional manipulations, transformations, or corrections.
- 3.1 Create temp table in Informix in order to ingest metadata into buckets.
- 3.1.1 Create fields as close as possible to desired datatype. Use int, varchar, and numeric in fields taking these datatypes.
- 3.2 Load parsed metadata into temp table in database.
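A rough sketch of steps 3.1 and 3.2 follows. It uses Python's built-in sqlite3 module purely as a stand-in for Informix, which would be reached through its own client and load utilities. The table name `temp_ingest` and its columns are hypothetical; the real temp table would follow the crosswalk.

```python
import sqlite3


def load_temp_table(lines, delimiter="|"):
    """Create a temp table and load pipe-delimited records into it.

    sqlite3 is a stand-in for Informix; the table layout is an assumption.
    """
    conn = sqlite3.connect(":memory:")
    # Step 3.1.1: declare each field as close as possible to its datatype.
    conn.execute(
        """CREATE TABLE temp_ingest (
               holding_id INTEGER,
               title      VARCHAR(255),
               west       NUMERIC,
               south      NUMERIC,
               east       NUMERIC,
               north      NUMERIC
           )"""
    )
    rows = []
    for line in lines:
        holding_id, title, west, south, east, north = (
            line.rstrip("\n").split(delimiter)
        )
        rows.append(
            (int(holding_id), title,
             float(west), float(south), float(east), float(north))
        )
    # Step 3.2: load the parsed records into the temp table.
    conn.executemany("INSERT INTO temp_ingest VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Converting each field with `int` or `float` before the insert means a malformed record raises an error at load time instead of entering the table silently.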
4. Create full report for each record. The full report catalogs all the original metadata as well as source and access information.
5. Create access report for each record. The access report directs the user to the data associated with the metadata via online linkage urls.
6. Data checking will now take place. Q/A the data again, inside the temp table. When the metadata reaches a "clean" state, it will then be placed in the schema tables within the Informix database. At this point, the metadata should be formatted, correct, free of error, and matching field definitions. If not, return to data checking (Q/A) until the metadata is clean.
- 6.1 Q/A metadata now in the Informix database. Revisit crosswalk.
- 6.2 Correct metadata if necessary.
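The Q/A pass in step 6 amounts to checking each record against its field definitions and collecting the records that still need correction. The sketch below shows one possible shape for such a check. The definitions in `FIELD_DEFINITIONS` and both function names are hypothetical and do not reflect the actual ADL schema.

```python
# Hypothetical field definitions; the real ones come from the ADL schema.
FIELD_DEFINITIONS = {
    "holding_id": {"type": int, "required": True},
    "title": {"type": str, "required": True, "max_length": 255},
    "north": {"type": float, "required": False},
}


def check_record(record, definitions=FIELD_DEFINITIONS):
    """Return a list of problems for one record (empty if it is clean)."""
    problems = []
    for name, rule in definitions.items():
        value = record.get(name)
        if value is None or value == "":
            if rule.get("required"):
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        max_length = rule.get("max_length")
        if max_length is not None and len(value) > max_length:
            problems.append(f"{name}: longer than {max_length} characters")
    return problems


def records_needing_correction(records):
    """Map record position to its problems, for records that are not clean."""
    failing = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            failing[index] = problems
    return failing
```

An empty result from `records_needing_correction` would correspond to the "clean" state described above; otherwise the listed records go back through correction (step 6.2) and are checked again.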
7. Document above steps.
8. Back up databases and text files regularly onto tape.
Contact: metadata@alexandria.sdc.ucsb.edu