Home » RDBMS Server » Performance Tuning » How to Improve query performance
How to Improve query performance [message #582923] Tue, 23 April 2013 07:20 Go to next message
panyamravi
Messages: 11
Registered: February 2009
Junior Member

Hi,

All , I need to run a query on a very big table ( having 400+ million rows ) to extract few fields which are not present in another small table ( 27k rows ). Any suggestions on how could I improve the performance of this query?

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production


Explain Plan for my query:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1730241303

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |   938M|    11G|       |  1837M  (1)|999:59:59 |       |   |
|   1 |  HASH UNIQUE               |                             |   938M|    11G|    17G|  1837M  (1)|999:59:59 |       |   |
|*  2 |   FILTER                   |                             |       |       |       |            |          |       |   |
|   3 |    PX COORDINATOR          |                             |       |       |       |            |          |       |   |
|   4 |     PX SEND QC (RANDOM)    | :TQ20000                    |   938M|    11G|       | 12239  (19)| 00:02:27 |       |   |
|   5 |      PX BLOCK ITERATOR     |                             |   938M|    11G|       | 12239  (19)| 00:02:27 |     1 |     1 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   6 |       TABLE ACCESS FULL    | US_SGSN_1                   |   938M|    11G|       | 12239  (19)| 00:02:27 |     1 |   746 |
|*  7 |    PX COORDINATOR          |                             |       |       |       |            |          |       |   |
|   8 |     PX SEND QC (RANDOM)    | :TQ10000                    | 81185 |       |       |     2   (0)| 00:00:01 |       |   |
|*  9 |      FILTER                |                             |       |       |       |            |          |       |   |
|  10 |       PX BLOCK ITERATOR    |                             | 81185 |       |       |     2   (0)| 00:00:01 |       |   |
|  11 |        INDEX FAST FULL SCAN| INDEX_LACIDCELLSACID_ES_LOC | 81185 |       |       |     2   (0)| 00:00:01 |       |   |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM "SAIDATA"."ES_LOCATION_1" "ES_LOCATION_1" WHERE NVL(:B1,(-1)) IS NOT NULL))
   7 - filter(NVL(:B1,(-1)) IS NOT NULL)
   9 - filter(NVL(:B1,(-1)) IS NOT NULL)

Note
-----
   - dynamic sampling used for this statement (level=4)

29 rows selected.


Query I used :

select DISTINCT colA from BIG_TABLE where nvl(colA,-1) not in (select nvl(colA,-1) from SMALL_TABLE ) .

Please note that colA does not have index on it. I can't create it either as it's a one time to task to extract this data.

Any help?

Regards,
Ravi
Re: How to Improve query performance [message #582976 is a reply to message #582923] Wed, 24 April 2013 02:56 Go to previous messageGo to next message
michael_bialik
Messages: 621
Registered: July 2006
Senior Member
Post the real query.
Re: How to Improve query performance [message #583017 is a reply to message #582976] Wed, 24 April 2013 20:38 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The problem is on Line 2 - it is using a FILTER with 2 child steps. This behaves as a Nested Loop: for each row in the big table it performs a full scan of the small table.

A NOT IN sub-query can be converted to an Anti Join by the optimizer, which will then allow it to use a HASH join instead of Nested Loops. This will be faster as it will perform a single pass of each table.

It is not always possible to convert NOT IN subqueries to Anti Joins. The main restriction is that you must ensure that that the joining column(s) can never be NULL. You have already done this with the NVL, so I'm a bit surprised to see it not working.

Try this

select DISTINCT colA 
from BIG_TABLE 
where colA IS NOT NULL
AND colA not in (
    select colA 
    from SMALL_TABLE s
    WHERE s.colA IS NOT NULL )


If that doesn't work, you can rewrite it as a join
SELECT b.colA
FROM (select DISTINCT colA FROM BIGTABLE WHERE colA IS NOT NULL) b
LEFT JOIN (select DISTINCT colA FROM SMALLTABLE WHERE colA IS NOT NULL) s
ON s.colA = b.colA
WHERE s.colA IS NULL   -- This gives the ones that fail the outer join


Another perhaps simpler method is to use MINUS
SELECT colA
FROM BIGTABLE
MINUS
SELECT colA
FROM SMALLTABLE


Ross Leishman
Re: How to Improve query performance [message #583221 is a reply to message #583017] Sat, 27 April 2013 22:47 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
If this is truly a one time task then there is no need to worry too much about it. On most systems this query should be doable in 20 minutes or so. You have likely spent more time than that thinking about it. Given the nature of this particular query (too bad it is a generic case as your real case may not match this one), there are two steps to getting this query done. You must:

FILTER
and
DISTINCT

But since your filtering is based on the same value(s) you are distincting, you have the option to do these in either order. Thus you can either:

1) filter then distinct
or
2) distinct then filter


It is a choice of
doing 400M filter checks and then distincting what remains
or
distincting 400M values and then filtering what remains


It is likely that one of these will be a lot faster than the other. Thus if you can figure out which, you can write the SQL to take advantage of that knowledge. Unfortunately, it will take more effort to figure this out than to just run the query. But to continue, consider this code:

create table temp_big as select distinct cola from bigtable;
select * from temp_big where nvl(cola,-1) not in (select nvl(cola,-1) from littletable);


vs.
create table temp_big as select cola from bigtable where nvl(cola,-1) not in (select nvl(cola,-1) from littletable);
select distinct cola from temp_big;


One of these MAY be much faster than the other. Depends upon your sort settings, how many rows pass the filter, and how many distinct values are in cola.

If the FILTER operation removes most rows then the DISTINCT will be faster if you filter first.
If COLA has only a few unique values then the FILTER operation will be faster if you distinct first.

Do you know your data well enough to know which it true?
If you have collected stats on both tables, Oracle may know already.

Also, does anyone know why the formatting tools no longer work? When I try to use CODE or QUOTE buttons, they erase my text. Used to be you highlighted the text and then clicked the button and it would surround the text with the tags. What gives?


An index on BIGTABLE.COLA would have been interesting as well. It would have opened the door to doing the DISTINCT without sorting and thus the opportunity to pipeline the distinct/filter operations making for a very fast query.

If this is not yet making sense then consider the extreme cases (I love extreme cases, they are so good a giving clarity).
Assume that your filter operation filters away all rows in the result set.  You then have 0 rows to be distincted.  That makes for a pretty fast distinct operation.
or
Assume that all values in COLA are NULL (or any other single value you prefer).  A distinct will reduce your 400M rows to 1 row.  You can check the filters for one row pretty fast.


So this extreme case shows that the order of operations can make a big difference depending upon your data.

The opposite is just as true.

Assume your filter removes no rows.  Then after filtering you must distinct 400M.
Assume COLA is unique.  Then after distincting you must filter 400M rows.


In this case it makes no difference what order you do the operations in, the cost will be the same.

But I think also Ross is right on the money. This should most likely be a HASH JOIN based plan not using the FILTER OPERATOR. Though I am a bit perplexed as to why Oracle did not choose a HJ in the first place. Try using NOPARALLEL hint to turn off PQ and see what happens.

Kevin

[Updated on: Sat, 27 April 2013 23:09]

Report message to a moderator

Re: How to Improve query performance [message #583241 is a reply to message #583221] Sun, 28 April 2013 04:47 Go to previous messageGo to next message
cookiemonster
Messages: 13917
Registered: September 2008
Location: Rainy Manchester
Senior Member
Kevin Meade wrote on Sun, 28 April 2013 04:47

Also, does anyone know why the formatting tools no longer work? When I try to use CODE or QUOTE buttons, they erase my text. Used to be you highlighted the text and then clicked the button and it would surround the text with the tags. What gives?



They work just find for me, so the bug must be browser specific. Do you want to create a new topic in suggestions and feedback for this?
Re: How to Improve query performance [message #583246 is a reply to message #583241] Sun, 28 April 2013 07:02 Go to previous message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Nah, you are right. I just checked my windowsXP machine with IE8 and it worked fine. So it must be my windows7/IE combination that is the problem. Good catch.

Kevin
Previous Topic: Bind variable issue
Next Topic: Sql query running slowly
Goto Forum:
  


Current Time: Thu Mar 28 03:22:19 CDT 2024