Home » Server Options » Text & interMedia » Problem with Fuzzy Query (10g)
Problem with Fuzzy Query [message #350357] Wed, 24 September 2008 16:59 Go to next message
formsdev
Messages: 6
Registered: June 2005
Junior Member
Hi,

I wrote the following query.
my indexes look like this. I have the lexer so that it takes away any . - , etc.

v_indexscript := 'create index fuzzy_idx_addr1 on gp(address1) indextype is ctxsys.context parameters (''WORDLIST Stem_Fuzzy_pref LEXER Fuzzy_Lexer'')';


c:= dbms_sql.open_cursor;
dbms_sql.parse(c,v_indexscript, dbms_sql.native);

n := dbms_sql.execute(c);
dbms_sql.close_cursor(c);

I have similar indexes on all the columns of my where clause.

This is the SQL where I pass everything as parameters including the score and variations.
select col1, col2 ......score(1)
from gp
where CONTAINS(firstname, 'fuzzy('||pFirstname||', '||pScore||','||pVariants||',weight)',1)>0
and CONTAINS(lastname, 'fuzzy('||pLastname||', '||pScore||','||pVariants||', weight)',2)> 0
and (pAddress1 is null or CONTAINS(address1, 'fuzzy('||replace(paddress1,' ','')||', '||pScore||','||pVariants||', weight)',3)>0)
and (pAddress2 is null or CONTAINS(address2, 'fuzzy('||replace(pAddress2,' ','')||', '||pScore||','||pVariants||', weight)',4)>0)
and (pCity is null or CONTAINS(city, 'fuzzy('||replace(pCity,' ','')||', '||pScore||','||pVariants||', weight)',5)>0)
and (pState is null or CONTAINS(state, 'fuzzy('||pState||', '||pScore||','||pVariants||', weight)',6)>0)
and (pZip is null or CONTAINS(zipcode, 'fuzzy('||pZip||', '||pScore||','||pVariants||', weight)',7)>0)
and (pPhone is null or CONTAINS(phone1, 'fuzzy('||replace(pPhone,' ','')||', '||pScore||','||pVariants||', weight)',8)>0)
and (pEmail is null or CONTAINS(email, 'fuzzy('||replace(pEmail,' ','')||', '||pScore||','||pVariants||', weight)',9)>0) ;


** I am using replace because the lexer doesnt seem to take away any spaces.
My table has this data

First Last addr1 addr2 city state zip phone email

SS VV 21.First St Los angeles CA 90001 123-456-7890 ss@yahoo.com
SS VV
SS VV
SS VV

I would like the above query to return all the 4 rows when I pass it SS,VV and 21.First St as pFirstname, pLastname and pAddress1.
Is there a way to do that?

thanks
Re: Problem with Fuzzy Query [message #350674 is a reply to message #350357] Thu, 25 September 2008 21:11 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Why are you creating your indexes dynamically? This is generally unnecessary and a bad practice.

I am not sure what you mean about your lexer. I don't know if you have specified the period, hyphen, and comma as whitespace or printjoins or punctuation or what. It might help if you provided the code that you used to set the attributes for your lexer. That might be part of your problem.

You should be using a multi_column_datastore with one index and one contains clause using WITHIN. Multiple contains clauses may result in slow performance or errors.

I have provided a partial demo below of what you seem to need. You should be able to modify it to add your other fields.

SCOTT@orcl_11g> -- table:
SCOTT@orcl_11g> CREATE TABLE gp
  2    (first	    VARCHAR2 ( 5),
  3  	last	    VARCHAR2 ( 9),
  4  	addr1	    VARCHAR2 (11),
  5  	addr2	    VARCHAR2 ( 5),
  6  	city	    VARCHAR2 (11),
  7  	state	    VARCHAR2 ( 5),
  8  	zip	    VARCHAR2 (10),
  9  	phone	    VARCHAR2 (12),
 10  	email	    VARCHAR2 (12),
 11  	searchcols  VARCHAR2 ( 1))
 12  /

Table created.

SCOTT@orcl_11g> -- data:
SCOTT@orcl_11g> INSERT ALL
  2  INTO gp (first, last) VALUES ('SS', 'VV')
  3  INTO gp VALUES ('SS', 'VV', '21 First St', NULL, 'Los Angeles', 'CA', '90001', '123-456-7890', 'SS@yahoo.com', NULL)
  4  INTO gp (first, last) VALUES ('forms', 'dev')
  5  INTO gp (first, last) VALUES ('form', 'developer')
  6  INTO gp (first) VALUES ('farms')
  7  SELECT * FROM DUAL
  8  /

5 rows created.

SCOTT@orcl_11g> -- basic_wordlist, multi_column_datastore AND basic_section_group:
SCOTT@orcl_11g> begin
  2    ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST');
  3    ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
  4    ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
  5    ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
  6    ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
  7    ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
  8  end;
  9  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('your_multi', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('your_multi', 'COLUMNS', 'first, last, addr1');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> BEGIN
  2    CTX_DDL.CREATE_SECTION_GROUP ('your_sec_group', 'BASIC_SECTION_GROUP');
  3    CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'first', 'first', TRUE);
  4    CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'last', 'last', TRUE);
  5    CTX_DDL.ADD_FIELD_SECTION ('your_sec_group', 'addr1', 'addr1', TRUE);
  6  END;
  7  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> -- index:
SCOTT@orcl_11g> CREATE INDEX fuzzy_idx_addr1 ON gp (searchcols)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS
  4    ('WORDLIST	stem_fuzzy_pref
  5  	 DATASTORE	your_multi
  6  	 SECTION GROUP	your_sec_group')
  7  /

Index created.

SCOTT@orcl_11g> -- procedure:
SCOTT@orcl_11g> CREATE OR REPLACE PROCEDURE your_proc
  2    (presult   OUT SYS_REFCURSOR,
  3  	pfirst	  IN  VARCHAR2 DEFAULT NULL,
  4  	plast	  IN  VARCHAR2 DEFAULT NULL,
  5  	paddress1 IN  VARCHAR2 DEFAULT NULL,
  6  	paddress2 IN  VARCHAR2 DEFAULT NULL)
  7  AS
  8  BEGIN
  9    OPEN presult FOR
 10    SELECT gp.*, SCORE (1)
 11    FROM   gp
 12    WHERE  CONTAINS
 13  		(searchcols,
 14  		 RTRIM
 15  		   (NVL2 (pfirst,    'FUZZY (' || REPLACE (pfirst,    ' ', ') FUZZY (')    || ') WITHIN first,' , NULL) ||
 16  		    NVL2 (plast,     'FUZZY (' || REPLACE (plast,     ' ', ') FUZZY (')     || ') WITHIN last ,' , NULL) ||
 17  		    NVL2 (paddress1, 'FUZZY (' || REPLACE (paddress1, ' ', ') FUZZY (') || ') WITHIN addr1,' , NULL) ||
 18  		    NVL2 (paddress2, 'FUZZY (' || REPLACE (paddress2, ' ', ') FUZZY (') || ') WITHIN addr2'  , NULL),
 19  		    ','),
 20  		 1) > 0
 21    ORDER  BY SCORE (1) DESC;
 22  END your_proc;
 23  /

Procedure created.

SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> -- execution:
SCOTT@orcl_11g> VARIABLE gresults REFCURSOR
SCOTT@orcl_11g> SET AUTOPRINT ON
SCOTT@orcl_11g> EXEC your_proc (:gresults, 'SS', 'VV', '21.First St')

PL/SQL procedure successfully completed.


FIRST LAST      ADDR1       ADDR2 CITY        STATE ZIP        PHONE        EMAIL        S   SCORE(1)
----- --------- ----------- ----- ----------- ----- ---------- ------------ ------------ - ----------
SS    VV        21 First St       Los Angeles CA    90001      123-456-7890 SS@yahoo.com           68
SS    VV                                                                                           35

SCOTT@orcl_11g> EXEC your_proc (:gresults, 'forms', 'dev')

PL/SQL procedure successfully completed.


FIRST LAST      ADDR1       ADDR2 CITY        STATE ZIP        PHONE        EMAIL        S   SCORE(1)
----- --------- ----------- ----- ----------- ----- ---------- ------------ ------------ - ----------
forms dev                                                                                          52
farms                                                                                               2
form  developer                                                                                     2

SCOTT@orcl_11g>





Previous Topic: CONTEXT INDEX and Tune LIKE clause (merged by bb and mc)
Next Topic: Near operator with Ctxcat index (split from CTXCAT / CONTEXT index by bb)
Goto Forum:
  


Current Time: Thu Mar 28 08:14:49 CDT 2024