ADL_query_translator modules/paradigms/Textual_MySQLFulltext.py


 SYNOPSIS

     Textual_MySQLFulltext (table, idColumn, textColumns, cardinality,
         regexpPhraseFilter=1,
         mapping=TextUtils.mappings.nonAlphanumericToWhitespace,
         deleteList=TextUtils.deleteLists.keepAll)

         table
             A table to query, e.g., "holding".

         idColumn
             The table's identifier column (i.e., the column to be
             selected), e.g., "holding_id".

         textColumns
             A single table column (e.g., "subject_text") or a list
             of one or more table columns (e.g., ["subject_text",
             "assigned_terms"]) containing the text to search over.

         cardinality
             A Cardinality object representing the cardinality of
             'table' with respect to the column or columns listed in
             'textColumns'.

         regexpPhraseFilter
             A boolean that indicates if "contains-phrase"
             constraints are to be translated as "contains-all-words"
             constraints conjoined with REGEXP conditions.  Defaults
             to true.  See below.

         mapping
             A Python character mapping table (i.e., a string of
             length 256, indexed by character code 0-255) with which
             to process constraint text.  Defaults to
             'nonAlphanumericToWhitespace', which maps
             non-alphanumeric characters to whitespace (i.e., to word
             separators).

         deleteList
             A string of zero or more characters to delete from
             constraint text.  Defaults to 'keepAll' (the empty
             string), which keeps all characters.
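
     The 'mapping' and 'deleteList' arguments can be illustrated with
     a minimal sketch.  The names below are hypothetical stand-ins,
     not the actual TextUtils definitions (which are not shown in
     this document), and the sketch is written for Python 3 and
     assumes that "alphanumeric" means ASCII letters and digits only.

```python
import string

# Hypothetical stand-ins for TextUtils.mappings.nonAlphanumericToWhitespace
# and TextUtils.deleteLists.keepAll; the real definitions may differ.
_ALNUM = set(string.ascii_letters + string.digits)
NON_ALNUM_TO_WHITESPACE = "".join(
    chr(code) if chr(code) in _ALNUM else " " for code in range(256)
)
KEEP_ALL = ""


def process(text, mapping=NON_ALNUM_TO_WHITESPACE, delete_list=KEEP_ALL):
    """Delete the characters in delete_list, then map the remainder."""
    kept = (ch for ch in text if ch not in delete_list)
    # Characters whose code falls outside the table pass through unchanged.
    return "".join(
        mapping[ord(ch)] if ord(ch) < len(mapping) else ch for ch in kept
    )


print(process("map-making, 1:24000"))
print(process("it's", delete_list="'"))
```

     With the default table every punctuation character becomes a
     word separator, whereas a character named in the delete list is
     removed outright, so that the text on either side of it joins
     into a single word.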

 DESCRIPTION

     Translates a textual constraint to a MySQL fulltext index
     search.  The returned query has the general form

         SELECT idColumn FROM table
             WHERE MATCH (textColumns, ...)
             AGAINST ('expression' IN BOOLEAN MODE)

     where 'expression' is a string expression whose form depends on
     the constraint operator.  In the following, let W1, W2, W3, ...,
     Wn be the words formed from the constraint text T by 1) deleting
     from T any characters that appear in 'deleteList'; 2) mapping
     the remaining characters using 'mapping'; and 3) treating
     sequences of whitespace characters as word separators.  Then the
     query expression is:

         contains-any-words
             W1 W2 W3 ... Wn

         contains-all-words
             +W1 +W2 +W3 ... +Wn

         contains-phrase
             "W1 W2 W3 ... Wn"

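     The three expression forms above can be sketched as follows.
     This is an illustration only, not the module's source: the
     function names are hypothetical, the use of ValueError for the
     "no query words specified" condition is an assumption, and the
     words are assumed to be alphanumeric already (as they are under
     the default mapping), so no SQL quoting is attempted.

```python
def split_words(processed_text):
    """Split processed constraint text on runs of whitespace."""
    words = processed_text.split()
    if not words:
        # Exception type is an assumption; only the message is documented.
        raise ValueError("no query words specified")
    return words


def form_any_expression(words):
    return " ".join(words)


def form_all_expression(words):
    return " ".join("+" + w for w in words)


def form_phrase_expression(words):
    return '"' + " ".join(words) + '"'


def fulltext_query(table, id_column, text_columns, expression):
    """Assemble the general query form; assumes trusted identifiers."""
    return "SELECT %s FROM %s WHERE MATCH (%s) AGAINST ('%s' IN BOOLEAN MODE)" % (
        id_column, table, ", ".join(text_columns), expression)


words = split_words("map  making 1 24000")
print(form_any_expression(words))
print(form_all_expression(words))
print(form_phrase_expression(words))
print(fulltext_query("holding", "holding_id",
                     ["subject_text", "assigned_terms"],
                     form_all_expression(words)))
```
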
     Note that MySQL's phrase matching (as of version 4.1.0alpha) is
     essentially simple substring matching, and thus will have poor
     recall unless the text in the table has been processed
     beforehand so that adjacent words within a phrase are separated
     by exactly one space.  However, if the 'regexpPhraseFilter'
     argument is true and the query phrase contains more than one
     word, the returned query has the alternate form

         SELECT idColumn FROM table
             WHERE MATCH (textColumns, ...)
             AGAINST ('+W1 +W2 +W3 ... +Wn' IN BOOLEAN MODE) AND
             (textColumn1 REGEXP
              '[[:<:]]W1[[:space:]]+W2[[:space:]]+...Wn[[:>:]]'
              OR textColumn2 REGEXP
              '[[:<:]]W1[[:space:]]+W2[[:space:]]+...Wn[[:>:]]'
              OR ...)

     That is, the REGEXP filter is more forgiving: it allows adjacent
     words within a phrase to be separated by one or more whitespace
     characters.
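
     A minimal sketch of how the alternate form might be assembled
     is given here.  The function names are hypothetical and only
     loosely mirror the module's private helpers, whose bodies are
     not shown in this document.  The sketch assumes trusted table
     and column names and does not handle MySQL string-literal
     escaping, which is not needed when the words are purely
     alphanumeric.

```python
import re


def protect_regexp_specials(word):
    """Backslash-escape characters that are special in a regexp."""
    return re.sub(r"([\\.^$*+?()\[\]{}|])", r"\\\1", word)


def form_regexp(words):
    """Match the words in order, separated by one or more whitespace."""
    body = "[[:space:]]+".join(protect_regexp_specials(w) for w in words)
    return "[[:<:]]" + body + "[[:>:]]"


def phrase_query(table, id_column, text_columns, words):
    columns = ", ".join(text_columns)
    if len(words) < 2:
        # A single word needs no REGEXP filter; use the plain phrase form.
        return ("SELECT %s FROM %s WHERE MATCH (%s) "
                "AGAINST ('\"%s\"' IN BOOLEAN MODE)"
                % (id_column, table, columns, " ".join(words)))
    all_words = " ".join("+" + w for w in words)
    regexp = form_regexp(words)
    # One REGEXP condition per text column, joined by OR.
    filters = " OR ".join("%s REGEXP '%s'" % (c, regexp) for c in text_columns)
    return ("SELECT %s FROM %s WHERE MATCH (%s) "
            "AGAINST ('%s' IN BOOLEAN MODE) AND (%s)"
            % (id_column, table, columns, all_words, filters))


print(phrase_query("holding", "holding_id",
                   ["subject_text", "assigned_terms"],
                   ["map", "making"]))
```

     The fulltext condition narrows the candidate rows using the
     index, and the REGEXP conditions then check word order and
     adjacency on those rows only.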

     The semantics of the "contains-all-words" operator will
     generally be correct only if the cardinality is "1" or "1?".  If
     the cardinality is "0+" or "1+", wrap this paradigm in an
     Adaptor_IndivisibleConcatenation paradigm.

     This paradigm assumes that the text processing specified by
     'mapping' and 'deleteList' is compatible with MySQL's notion of
     words.  This holds for the default values of both arguments.

     Exceptions thrown:

         no query words specified

 AUTHOR

     Greg Janee
     gjanee@alexandria.ucsb.edu

 HISTORY

     $Log: Textual_MySQLFulltext.py,v $
     Revision 1.2  2003/12/15 23:54:18  peter
     Modified source code documentation so that it formats properly when
     creating HTML documents with happydoc.

     Revision 1.1  2003/12/08 23:32:56  valentin
     update to oct2003

     Revision 1.1  2003/11/06 04:41:41  gjanee
     Initial revision

Imported Modules   

import UniversalTranslator
import edu.ucsb.adl.middleware
import paradigms
import string
import types

Functions   

_formAnyExpression ( wordList )
_formAllExpression ( wordList )
_formRegexp ( wordList )
_formPhraseExpression ( wordList )
_protectRegexpSpecials ( word )

Classes   

Textual_MySQLFulltext


This document was automatically generated Thu Mar 4 12:45:27 2004 by HappyDoc version WORKING