Error trying to index a PDF file [message #483805] Wed, 24 November 2010 10:22
kastania
Messages: 19
Registered: May 2007
Junior Member
We get an error when trying to index a PDF document (not all PDF documents):
DRG-11207: user filter command exited with status 1
DRG-11221: Third-party filter indicates this document is corrupted.

The document opens in various PDF viewers, and no corruption is visible to us.

The statement that creates the index is:
CREATE INDEX documents_ot_idx ON documents(document_binary) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('SYNC(ON COMMIT) STORAGE INDEX_STORAGE lexer EQF_LEXER');


Our version is 10.2.0.4. I have read about AUTO_FILTER. We do not use AUTO_FILTER in the create statement. Could this be the solution?
If no filter is specified during index creation, is there any default filter used?

The document is PDF-1.6. If we save it as PDF-1.4 it is indexed successfully, but this is not an acceptable solution.

I found this in a forum: "AUTO_FILTER in 10.2.0.3 already supports PDF 1.6, see Note 309154.1", but my Metalink contract has expired and I cannot read the note. Can anyone who has access check it?

[Updated on: Wed, 24 November 2010 10:30]


Re: Error trying to index a PDF file [message #483809 is a reply to message #483805] Wed, 24 November 2010 10:59
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Supposedly, auto_filter is the default, so theoretically you should not have to declare it, but I have found that sometimes it does make a difference, so you should try adding "filter ctxsys.auto_filter" to your index parameters. If you upgrade to 11g, the filtering is better.
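A minimal sketch of that suggestion, assuming the index, table, and preference names from the first post (INDEX_STORAGE and EQF_LEXER must already exist as preferences). The second query assumes the CTX_USER_INDEX_OBJECTS view to show which filter the index actually picked up, and the third reads the per-document errors that a failed index or sync run leaves in CTX_USER_INDEX_ERRORS:

```sql
-- Recreate the index with the filter named explicitly instead of relying on the default.
DROP INDEX documents_ot_idx;

CREATE INDEX documents_ot_idx ON documents(document_binary)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER SYNC (ON COMMIT) STORAGE INDEX_STORAGE LEXER EQF_LEXER');

-- Which filter object did the index end up with?
SELECT ixo_class, ixo_object
  FROM ctx_user_index_objects
 WHERE ixo_index_name = 'DOCUMENTS_OT_IDX'
   AND ixo_class = 'FILTER';

-- Documents that failed to index, with the DRG- messages for each rowid.
SELECT err_timestamp, err_textkey, err_text
  FROM ctx_user_index_errors
 WHERE err_index_name = 'DOCUMENTS_OT_IDX'
 ORDER BY err_timestamp;
```

If the filter query already reports AUTO_FILTER before the change, naming it explicitly is unlikely to make a difference, and the problem is more likely in the filter version itself.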

Re: Error trying to index a PDF file [message #483810 is a reply to message #483809] Wed, 24 November 2010 11:03
kastania
Unfortunately we cannot migrate to 11g. Is there any other solution?
We and the client have the same version of Oracle Text. The same PDF is successfully indexed in our environment and not in theirs... Who knows... I'll just tell them not to use 1.6 PDFs...
Re: Error trying to index a PDF file [message #483811 is a reply to message #483810] Wed, 24 November 2010 11:10
Barbara Boehmer
According to the online documentation, PDF 1.6 is not supported in 10g. The following section of the online documentation lists some limitations. You might check that there is no problem with password protection or some such thing.

http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/afilsupt.htm#sthref2463
Re: Error trying to index a PDF file [message #483812 is a reply to message #483811] Wed, 24 November 2010 11:21
Barbara Boehmer
Considering your other post, it sounds like several things work on your system but not on your client's. I would start looking for what is different, even small, seemingly unrelated things. For example, on one system it was found that the DBA had restricted select on the all_tables view. It turned out that an Oracle Text background process calls a routine that verifies schema and table names and such, to avoid SQL injection, and that routine needs access to the all_tables view. Without that access it failed, but with an error message that gave no clue that this was the problem. It was a lengthy but interesting process to track it down.
Re: Error trying to index a PDF file [message #483813 is a reply to message #483812] Wed, 24 November 2010 11:24
Barbara Boehmer
When making comparisons between systems, I would try to make it as simple as possible, such as running from SQL*Plus. If it works for your client through SQL*Plus, but not through something else, then the problem is within the something else. If it does not work through SQL*Plus for your client, then you need to look at things like versions of SQL*Plus, compatibility parameters, and permissions.
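One way to start that comparison is to run the same few queries in SQL*Plus on both systems and diff the output. This is only a sketch: it assumes the index name from the first post, and V$VERSION may need a privilege that the application user does not have.

```sql
-- Oracle Text version as recorded in the data dictionary and in the code.
SELECT ver_dict, ver_code FROM ctx_version;

-- Database release banner.
SELECT banner FROM v$version;

-- Preferences and attribute values the index was actually built with.
SELECT ixv_class, ixv_object, ixv_attribute, ixv_value
  FROM ctx_user_index_values
 WHERE ixv_index_name = 'DOCUMENTS_OT_IDX'
 ORDER BY ixv_class, ixv_attribute;
```

Any difference between the two systems in these results is a candidate explanation worth ruling out before looking further.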
Re: Error trying to index a PDF file [message #483911 is a reply to message #483813] Thu, 25 November 2010 08:52
kastania
Well, according to your link, 1.6 is not supported. I finally reactivated my Metalink account, where I found note ID 1120683.1, which states: "This is a bug in the Filtering technology used in 10.2.0.4.0. In Oracle database versions where the filtering technology has been changed from Verity 9.2 to OIT 8.2.X this problem has been resolved."

The solution they recommend is to apply the 10.2.0.5.0 patch set or upgrade to 11.1.0.7.0 or 11.2.0.1.0. We will ask the client to apply the patch set, which is easier.

As far as my other post is concerned, you are right: running the queries from SQL*Plus does not fail, but running them from Java code does. Those errors were on queries against the index from this post (the PDF-related one), so maybe this solution will solve my other problem too.

It is difficult to debug the whole situation because the development is in another country, the client in another country and the database server in another one:) So the communication is sloooooww...

Just to mention, I found another note on Metalink which, in a few words, said that someone else had a peculiar problem that was not logged correctly. The explanation was insufficient privileges on the Oracle TMP directory, where Oracle Text temporarily stores the binary content (the PDF) until it finishes processing it. If none of the above works, I'll ask the client to check that...

[Updated on: Thu, 25 November 2010 08:53]


Re: Error trying to index a PDF file [message #484041 is a reply to message #483911] Fri, 26 November 2010 10:42
kastania
The client gets an empty ORA-20000 message in 10.2.0.4.
According to Metalink note ID 889805.1, it is a bug in 10.2.0.4 that no error message is provided.
How can I get more info on that error?
I managed to reproduce an empty ORA-20000 error by running the same query in my environment, but I do not know if it is the same error, as it doesn't provide any info.
I traced the SQL statement that fails, and the .trc file does not show any error. The same goes for the alert.log: it has no entries for the error.
Where is this info held?

Re: Error trying to index a PDF file [message #484052 is a reply to message #484041] Fri, 26 November 2010 12:13
Barbara Boehmer
Unfortunately, that information is obscured. The following is a simplified demonstration of what is happening. Suppose that somewhere within the Oracle Text code, certain values cause an error:

SCOTT@orcl_11gR2> declare
  2    v_num  number;
  3  begin
  4    select 1/0 into v_num from dual;
  5  end;
  6  /
declare
*
ERROR at line 1:
ORA-01476: divisor is equal to zero
ORA-06512: at line 4


What Oracle has done within its internal code is to obscure the actual error and line number with a generic message:

SCOTT@orcl_11gR2> declare
  2    v_num  number;
  3  begin
  4    select 1/0 into v_num from dual;
  5  exception
  6    when others then
  7      raise_application_error (-20000, null);
  8  end;
  9  /
declare
*
ERROR at line 1:
ORA-20000:
ORA-06512: at line 7


Only Oracle has access to their own source code, so they can temporarily disable the exception handling and see the actual error and line number, then work from there.
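For contrast, here is a sketch of what a handler that keeps the underlying error would look like in code you control. It does not help with Oracle's internal packages; it only illustrates the difference. The message text is made up for the example, and passing TRUE as the third argument to raise_application_error adds the new error on top of the existing error stack instead of replacing it.

```sql
declare
  v_num  number;
begin
  select 1/0 into v_num from dual;
exception
  when others then
    -- TRUE keeps the original error on the stack beneath the ORA-20000
    raise_application_error (-20000, 'lookup failed', true);
end;
/
```

Run from SQL*Plus, this should still report ORA-20000, but with a message and with the original ORA-01476 remaining visible on the error stack.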

You need to be able to provide a small script that will consistently reproduce the error, in order to report it as a bug to Oracle, so they can look into it and provide a patch or workaround or at least let you know what the problem is. If you can provide such a script, you can also post it on the OTN Text forum, where Oracle Text product manager Roger Ford regularly responds. He has been known to take such test cases and provide the snippet of internal code that raises the error, so that the group of us that regularly respond have been able to eventually discover the root of the problem. The important part is that you narrow it down to a reproducible test case, so that we can reproduce it on our systems.

Re: Error trying to index a PDF file [message #484053 is a reply to message #484052] Fri, 26 November 2010 12:16
Barbara Boehmer
Some problems are common and known. If you can post the sql statement that produced the error, there might be something familiar that would stand out.
Re: Error trying to index a PDF file [message #484273 is a reply to message #484053] Mon, 29 November 2010 04:47
kastania
I have created a new thread in http://forums.oracle.com/forums/thread.jspa?threadID=2136816&stqc=true
Re: Error trying to index a PDF file [message #508701 is a reply to message #484273] Tue, 24 May 2011 02:25
desertman909y
Messages: 9
Registered: May 2011
Location: dubai
Junior Member
kastania is right. I also had this same problem, and I did this.

[Updated on: Tue, 24 May 2011 02:25]
