modules/paradigms/Textual_LikeDelimitedSubstring.py
SYNOPSIS
Textual_LikeDelimitedSubstring (table, idColumn, textColumn,
delimiter, cardinality,
mapping=TextUtils.mappings.uppercaseAlphanumericOthersToWhitespace,
deleteList=TextUtils.deleteLists.keepAll, function=None)
table
A table to query, e.g., "holding".
idColumn
The table's object identifier column (i.e., the column
to be selected), e.g., "holding_id".
textColumn
The table column containing the text to search over
(i.e., the column against which the constraint is to be
placed), e.g., "subject_text".
delimiter
A single character that serves to delimit words in
'textColumn', e.g., "^".
cardinality
A Cardinality object representing the cardinality of
'table' with respect to 'textColumn'.
mapping
A Python character mapping table (i.e., a string of
length 256, indexed by 8-bit character code) used to
translate constraint text. Defaults to
'uppercaseAlphanumericOthersToWhitespace', which maps
alphanumeric characters to their uppercase equivalents
and all other characters to whitespace (i.e., to word
separators).
deleteList
A string of zero or more characters to delete from
constraint text. The default is the empty string, which
keeps all characters.
function
The name of an SQL function to apply to 'textColumn'
(e.g., "UPPER"), or None. Defaults to None.
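The 'mapping' parameter is a 256-entry lookup table rather than a function. The sketch below builds a table in the spirit of the default (alphanumerics to uppercase, everything else to a space) and applies it. The names make_uppercase_alnum_table and apply_mapping are hypothetical, and the real TextUtils.mappings table may differ in detail, for example in how character codes above 127 are treated. The sketch is Python 3, so it applies the table by indexing: in Python 3, str.translate expects a dict keyed by code point rather than a 256-character string.

```python
import string

def make_uppercase_alnum_table():
    """Return a 256-character string indexed by character code.

    Hypothetical stand-in for
    TextUtils.mappings.uppercaseAlphanumericOthersToWhitespace.
    """
    chars = []
    for code in range(256):
        c = chr(code)
        if c in string.ascii_letters or c in string.digits:
            chars.append(c.upper())  # alphanumerics -> uppercase
        else:
            chars.append(" ")        # everything else -> word separator
    return "".join(chars)

def apply_mapping(text, mapping):
    """Map each character of 'text' through the 256-entry table.

    Assumption of this sketch: characters with codes >= 256 become separators.
    """
    return "".join(mapping[ord(c)] if ord(c) < 256 else " " for c in text)

table = make_uppercase_alnum_table()
print(len(table))                         # prints 256
print(apply_mapping("Sam-I-am!", table))  # prints "SAM I AM " (trailing space)
```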
DESCRIPTION
Translates a textual constraint to a boolean combination of one
or more substring matches using SQL LIKE operators.
This paradigm assumes that the text in 'textColumn' has been
encoded such that words are delimited by a common delimiter
character and phrases are separated by two or more delimiter
characters. For example, assuming the delimiter character is
"^", a column value containing the two phrases "I am Sam" and
"Sam I am" would be encoded as:
^I^am^Sam^^Sam^I^am^
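The encoding can be sketched as follows. The helper encode_phrases is hypothetical (the paradigm only assumes the column was populated this way). Note the delimiter at each end of the value: every LIKE pattern the paradigm generates has the form %^W^%, so the first and last words need a delimiter on both sides in order to match.

```python
def encode_phrases(phrases, delimiter="^"):
    """Join words with one delimiter and phrases with two, and wrap the
    whole value in delimiters (hypothetical helper)."""
    encoded = (delimiter * 2).join(delimiter.join(p.split()) for p in phrases)
    return delimiter + encoded + delimiter

print(encode_phrases(["I am Sam", "Sam I am"]))  # prints ^I^am^Sam^^Sam^I^am^
```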
Given a textual constraint (B, O, T) where B is a textual
bucket, O is one of the standard textual operators, and T is a
text string, this paradigm parses T into a sequence of one or
more words (W1, W2, W3, ...) by: 1) deleting from T any
characters that appear in 'deleteList'; 2) mapping the remaining
characters using 'mapping'; and 3) treating sequences of
whitespace characters as word separators. The paradigm then
returns one of the following queries (we use "^" here to
represent the delimiter character). If O is
"contains-all-words":
SELECT idColumn FROM table
WHERE textColumn LIKE '%^W1^%' AND
textColumn LIKE '%^W2^%' AND
textColumn LIKE '%^W3^%' ...
If O is "contains-any-words":
SELECT idColumn FROM table
WHERE textColumn LIKE '%^W1^%' OR
textColumn LIKE '%^W2^%' OR
textColumn LIKE '%^W3^%' ...
If O is "contains-phrase":
SELECT idColumn FROM table
WHERE textColumn LIKE '%^W1^W2^W3^...^%'
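The three word-parsing steps described above (delete, map, split) might look like this in Python 3. parse_words and _default_mapping are hypothetical names, the table is an assumed equivalent of the default mapping, and raising ValueError is this sketch's rendering of the "no query words specified" exception.

```python
import string

def _default_mapping():
    # Hypothetical stand-in for uppercaseAlphanumericOthersToWhitespace.
    alnum = string.ascii_letters + string.digits
    return "".join(chr(i).upper() if chr(i) in alnum else " " for i in range(256))

def parse_words(text, mapping=None, delete_list=""):
    """Parse constraint text T into query words W1, W2, ..."""
    if mapping is None:
        mapping = _default_mapping()
    # 1) delete characters that appear in delete_list
    kept = (c for c in text if c not in delete_list)
    # 2) map the remaining characters through the 256-entry table
    #    (codes >= 256 become separators: an assumption of this sketch)
    mapped = "".join(mapping[ord(c)] if ord(c) < 256 else " " for c in kept)
    # 3) treat runs of whitespace as word separators
    words = mapped.split()
    if not words:
        raise ValueError("no query words specified")
    return words

print(parse_words("Sam-I-am"))                     # ['SAM', 'I', 'AM']
print(parse_words("don't stop", delete_list="'"))  # ['DONT', 'STOP']
```

Because deletion happens before mapping, listing the apostrophe in delete_list keeps "don't" together as one word. Without it, the default mapping turns the apostrophe into a separator and yields DON and T.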
If a text column function is specified (e.g., "UPPER"), the
returned query will have the form:
SELECT idColumn FROM table
WHERE UPPER(textColumn) LIKE ...
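Given the parsed words, the query shapes above, including the optional column function, can be assembled as in the following sketch. build_query is a hypothetical helper, not this module's API, and it does not cover the cases in which the module returns WHERE 1 = 0. It doubles single quotes so each pattern is a valid SQL literal; the default mapping already removes quotes, so this only matters for custom mappings. Pairing function="UPPER" with an uppercasing mapping makes the match effectively case-insensitive.

```python
def build_query(table, id_column, text_column, delimiter, operator, words,
                function=None):
    """Build the SQL for one textual constraint (hypothetical helper)."""
    if not words:
        raise ValueError("no query words specified")
    column = f"{function}({text_column})" if function else text_column

    def like(pattern):
        escaped = pattern.replace("'", "''")  # keep the SQL literal valid
        return f"{column} LIKE '%{escaped}%'"

    d = delimiter
    if operator == "contains-all-words":
        where = " AND ".join(like(f"{d}{w}{d}") for w in words)
    elif operator == "contains-any-words":
        where = " OR ".join(like(f"{d}{w}{d}") for w in words)
    elif operator == "contains-phrase":
        where = like(d + d.join(words) + d)
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return f"SELECT {id_column} FROM {table} WHERE {where}"

print(build_query("holding", "holding_id", "subject_text", "^",
                  "contains-phrase", ["SAM", "I", "AM"], function="UPPER"))
# SELECT holding_id FROM holding WHERE UPPER(subject_text) LIKE '%^SAM^I^AM^%'
```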
Under certain circumstances the query
SELECT idColumn FROM table
WHERE 1 = 0
may be returned.
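The semantics of the "contains-all-words" operator will
generally be correct only if the cardinality is "1" or "1?":
the AND-ed LIKE conditions are evaluated against a single row,
so an object whose words are spread over several rows of
'table' will not match. If the cardinality is "0+" or "1+",
wrap this paradigm in an Adaptor_IndivisibleConcatenation
paradigm.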
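The cardinality caveat can be reproduced with an in-memory SQLite database. This is a sketch: the holding/subject_text names follow the examples above, and ids_matching_all is a hypothetical helper that issues a contains-all-words query. Holding 1 stores its two phrases in two rows (cardinality "1+"). Holding 2 stores them concatenated in one row, the situation that the "1" and "1?" cardinalities guarantee.

```python
import sqlite3

def ids_matching_all(conn, words, delimiter="^"):
    # contains-all-words: every LIKE condition must hold on the SAME row.
    where = " AND ".join("subject_text LIKE ?" for _ in words)
    params = [f"%{delimiter}{w}{delimiter}%" for w in words]
    rows = conn.execute(
        f"SELECT holding_id FROM holding WHERE {where}", params)
    return sorted(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holding (holding_id INTEGER, subject_text TEXT)")
# Holding 1: one phrase per row.
conn.executemany("INSERT INTO holding VALUES (?, ?)",
                 [(1, "^SAM^"), (1, "^I^AM^")])
# Holding 2: both phrases concatenated into a single row.
conn.execute("INSERT INTO holding VALUES (2, '^SAM^^I^AM^')")

print(ids_matching_all(conn, ["SAM", "AM"]))  # [2]
```

Holding 1 contains both words, yet only holding 2 is returned, because no single row of holding 1 satisfies both LIKE conditions.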
Exceptions thrown:
no query words specified
AUTHOR
Greg Janee
gjanee@alexandria.ucsb.edu
HISTORY
$Log: Textual_LikeDelimitedSubstring.py,v $
Revision 1.4 2003/10/21 20:34:37 gjanee
Minor (but critical) documentation change.
Revision 1.3 2003/01/29 21:13:50 gjanee
Recoded slightly to take advantage of new paradigm convenience
functions.
Revision 1.2 2003/01/24 04:14:12 gjanee
Minor update to conform to the "transparent immutable objects"
programming model. Fixed an obscure bug.
Revision 1.1 2002/10/31 22:32:13 gjanee
Initial revision