/ modules / paradigms / Adaptor_Concatenation.py
SYNOPSIS
Adaptor_Concatenation (paradigms,
mapping=TextUtils.mappings.nonAlphanumericToWhitespace,
deleteList=TextUtils.deleteLists.keepAll)
paradigms
A dictionary that maps one or more field URIs
(e.g., "http://purl.org/dc/elements/1.1/creator") to
underlying paradigms.
mapping
A Python character mapping table (i.e., a string of
length 256, indexed by character code) with which to
process constraint text. Defaults to
'nonAlphanumericToWhitespace', which preserves
alphanumeric characters and maps all other characters to
whitespace (i.e., to word separators). Used only when
the constraint operator is "contains-all-words".
deleteList
A string of zero or more characters to delete from
constraint text. Defaults to 'keepAll', the empty string,
which keeps all characters. Used only when the constraint
operator is "contains-all-words".
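As an illustration of the 'mapping' and 'deleteList' arguments, the
following is a minimal sketch, assuming Python 3, where a 256-byte
table passed to bytes.translate stands in for the 256-character
string table. The names NON_ALNUM_TO_SPACE, KEEP_ALL, and
normalize_constraint_text are hypothetical and are not part of
TextUtils; "alphanumeric" is assumed here to mean ASCII letters and
digits only.

```python
import string

# Hypothetical stand-in for TextUtils.mappings.nonAlphanumericToWhitespace:
# a 256-entry table that keeps ASCII letters and digits and maps every
# other byte value to a space (assumption: ASCII-only alphanumerics).
_ALNUM = (string.ascii_letters + string.digits).encode("ascii")
NON_ALNUM_TO_SPACE = bytes(b if b in _ALNUM else 0x20 for b in range(256))

# Hypothetical stand-in for TextUtils.deleteLists.keepAll: delete nothing.
KEEP_ALL = b""


def normalize_constraint_text(text, mapping=NON_ALNUM_TO_SPACE,
                              delete_list=KEEP_ALL):
    """Delete the characters in delete_list, then map the rest.

    Assumes every character of text fits in one byte (Latin-1), so that
    the 256-entry table covers it.
    """
    raw = text.encode("latin-1")
    # bytes.translate removes the bytes in delete_list first and then
    # maps the remaining bytes through the table.
    return raw.translate(mapping, delete_list).decode("latin-1")
```

With the defaults, an apostrophe becomes a word separator; listing it
in the delete list instead joins the neighboring characters into one
word.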
DESCRIPTION
An adaptor that adds support for bucket-level textual searching
to a set of paradigms (the "underlying" paradigms), each of
which supports a specific field-level textual search, by
treating a bucket-level search as a virtual search over the
logical concatenation of the field-level textual content.
Specifically, a field-level constraint matching one of the URIs
listed in 'paradigms' is passed through to the corresponding
underlying paradigm; if that underlying paradigm does not
support field-level searching, it should treat the constraint as
being bucket-level. A field-level constraint not matching any
listed URI yields a query that selects no rows, of the form
SELECT id FROM table
WHERE 1 = 0
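The field-level routing described above can be sketched as follows.
This is an illustration under stated assumptions, not the module's
actual code: each underlying paradigm is modeled as a plain callable
taking (operator, text) and returning a SQL string, and the names
EMPTY_QUERY and field_level_query are hypothetical.

```python
# Query returned for a field URI that is not listed in 'paradigms';
# its WHERE clause is always false, so it selects no rows.
EMPTY_QUERY = "SELECT id FROM table\nWHERE 1 = 0"


def field_level_query(paradigms, field_uri, operator, text):
    """Route a field-level constraint to the paradigm for its URI."""
    paradigm = paradigms.get(field_uri)
    if paradigm is None:
        # Unlisted URI: nothing can match.
        return EMPTY_QUERY
    # Hypothetical interface: the underlying paradigm is a callable
    # that builds the query for this constraint.
    return paradigm(operator, text)


# Usage with a made-up underlying paradigm for the creator field.
CREATOR = "http://purl.org/dc/elements/1.1/creator"
demo_paradigms = {
    CREATOR: lambda op, text: "SELECT id FROM creators /* %s: %s */" % (op, text),
}
listed = field_level_query(demo_paradigms, CREATOR, "contains-phrase", "Janee")
unlisted = field_level_query(demo_paradigms, "urn:example:unlisted",
                             "contains-phrase", "Janee")
```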
A bucket-level constraint (O, T), where O is a textual operator
and T is constraint text, is handled as follows. If O is
"contains-any-words" or "contains-phrase", the constraint is
passed to all underlying paradigms and the resulting queries are
UNIONed together. Otherwise, if O is "contains-all-words", this
paradigm parses T into one or more words (W1, W2, W3, ...) by:
1) deleting from T any characters that appear in 'deleteList';
2) mapping the remaining characters using 'mapping'; and 3)
treating sequences of whitespace characters as word separators.
For each word W this paradigm then passes a new constraint
(O, W) to each underlying paradigm and UNIONs the resulting
queries, and those UNIONs are then INTERSECTed. If underlying
paradigm i returns query Qi(W) on word W, then the overall
returned query has the form:
(Q1(W1) UNION Q2(W1) UNION Q3(W1) ...) INTERSECT
(Q1(W2) UNION Q2(W2) UNION Q3(W2) ...) INTERSECT ...
Exceptions raised:
no query words specified
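The bucket-level handling described above can be sketched as follows,
again assuming Python 3 and modeling each underlying paradigm as a
callable taking (operator, text) and returning a SQL string. The names
bucket_level_query and SPACE_TABLE are hypothetical. The use of
ValueError and the rejection of other operators are choices of this
sketch; the text above only states that "no query words specified" is
reported.

```python
import string

# Assumed default table: keep ASCII letters and digits, map every other
# byte value to a space.
_ALNUM = (string.ascii_letters + string.digits).encode("ascii")
SPACE_TABLE = bytes(b if b in _ALNUM else 0x20 for b in range(256))


def bucket_level_query(paradigms, operator, text,
                       mapping=SPACE_TABLE, delete_list=b""):
    """Compose a bucket-level query from the underlying paradigms."""
    underlying = list(paradigms.values())
    if operator in ("contains-any-words", "contains-phrase"):
        # Pass the constraint through unchanged and UNION the results.
        return " UNION ".join(p(operator, text) for p in underlying)
    if operator != "contains-all-words":
        # Sketch-only choice: other operators are rejected.
        raise ValueError("unsupported operator: " + operator)
    # Steps 1-3: delete, map, then split on runs of whitespace.
    # Assumes the text fits in Latin-1 so the 256-entry table covers it.
    cleaned = text.encode("latin-1").translate(mapping, delete_list)
    words = cleaned.decode("latin-1").split()
    if not words:
        raise ValueError("no query words specified")
    # One UNION group per word, then INTERSECT the groups.
    groups = [
        "(" + " UNION ".join(p(operator, w) for p in underlying) + ")"
        for w in words
    ]
    return " INTERSECT ".join(groups)


# Usage with two made-up paradigms whose "queries" are short labels.
demo = {
    "urn:example:title": lambda op, t: "Q1(%s)" % t,
    "urn:example:creator": lambda op, t: "Q2(%s)" % t,
}
all_words = bucket_level_query(demo, "contains-all-words", "ocean-floor")
phrase = bucket_level_query(demo, "contains-phrase", "ocean floor")
```

Every word must match in at least one field, but different words may
match in different fields, which is what makes the result behave like
a search over the concatenated field text.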
AUTHOR
Greg Janee
gjanee@alexandria.ucsb.edu
HISTORY
$Log: Adaptor_Concatenation.py,v $
Revision 1.2 2003/10/29 21:48:26 gjanee
Per revision 1.8 of UniversalTranslator.py, this paradigm now
invokes field-level methods of the underlying paradigms if the
latter support field-level searching. Unresolved issue: given a
*bucket-level* constraint, this paradigm still calls, for each
underlying paradigm, the underlying paradigm's bucket-level
methods, even if the underlying paradigm supports field-level
searching. It's not clear what the correct behavior is in this
case; perhaps it should be configurable. At any rate, nesting a
field adaptor inside this paradigm may lead to unexpected
behavior.
Revision 1.1 2002/11/04 22:46:40 gjanee
Initial revision