(136 ratings)
Is this answer out of date? If it is, please let us know via a Comment
SQL> select count(*) from emp T1
2 where not exists ( select null from emp T2 where t2.mgr = t1.empno );
you could have used
SQL> select count(*) from emp T1
2 where not exists ( select mgr from emp T2 where t2.mgr = t1.empno );
Could you tell what circumstances do we use "select null"
instead of "select <value>". Are there any advantages
[email protected]> select * from dual where dummy not in ( NULL );
no rows selected
[email protected]> select * from dual where NOT( dummy not in (NULL) );
no rows selected
(you would think one of the two queries would return a row -- but there is a third state for a boolean expression in sql -- "I don't know what the answer is")
Your solution is very useful
but i have one query
i have two tables
first table with item_cd, sub_item_cd (both the columns are in primary key)
second table with sub_item_cd (no primary key)
i want to filter only those records from first table where all sub_item_cd exists in second table
ex.
rows of first table
item_cd sub_item_cd
1010 A
1010 B
1010 C
1011 A
1011 D
1012 A
rows of second table
sub_item_cd
A
A
A
B
B
C
B
C
i need output as
item_cd
1010
1012
how can I get it in one select ?
can exists will solve it?
thanks in advance
ps$tkyte@ORA920> select t1.item_cd, t1.sub_item_cd, decode( t2.sub_item_cd, NULL, 1, 0 )
2 from t1, (select distinct sub_item_cd from t2 ) t2
3 where t1.sub_item_cd = t2.sub_item_cd(+)
ITEM_CD S DECODE(T2.SUB_ITEM_CD,NULL,1,0)
---------- - -------------------------------
1010 a 0
1011 a 0
1012 a 0
1010 b 0
1010 c 0
1011 d 1
6 rows selected.
ops$tkyte@ORA920> select t1.item_cd
2 from t1, (select distinct sub_item_cd from t2 ) t2
3 where t1.sub_item_cd = t2.sub_item_cd(+)
4 group by t1.item_cd
5 having max(decode( t2.sub_item_cd, NULL, 1, 0 )) = 0
ITEM_CD
----------
ops$tkyte@ORA920>
give me a "for example" and define "too slow".
on my pc, to full scan a 1,000,000 row table (a copy of all_objects -- the WIDTH * HEIGHT of a table is relevant -- not just the height. I have 1,000,000 tables that consume lots less space then a 10,000 row one) takes about 7 seconds. a 20million -- 20 times as long.
As long as you are using the CBO, and the tables are analyzed, it will use a hash anti join and be very fast.
Hi Tom, Thanks for your reply.
I have following observations :
Query:
SELECT ind_id, lst_ord_ss, dec_sc_is, lst_ord_is,
ss_group, dec_sc_ss, list_pri, cont_lvl,
site_upd_dt, source_cd, sd_custtyp, site_id,
ss_subgrp
FROM multi_mv
WHERE country_cd = :sys_b_00
AND dsf_index IN (:sys_b_01, :sys_b_02)
AND mail_cont IN (:sys_b_03, :sys_b_04)
AND mail_site = :sys_b_05
AND ss_group IN
(:sys_b_06, :sys_b_07, :sys_b_08, :sys_b_09, :sys_b_10)
AND NOT ind_id IN (SELECT DISTINCT ind_id
FROM ssw.promo
WHERE drop_cd = :sys_b_11)
1. The plan of above query. NL's
SELECT STATEMENT Cost = 193604
FILTER
TABLE ACCESS FULL MULTI_MV
TABLE ACCESS BY INDEX ROWID PROMO
INDEX RANGE SCAN PROM_DROP_CD
2. Adding "ORDER BY int_id"into above Query changes
the plan to seems like MERGE_AJ
SELECT STATEMENT Cost = 118609
TABLE ACCESS BY INDEX ROWID MULTI_MV
INDEX FULL SCAN MV_MULTI_IND_ID
TABLE ACCESS BY INDEX ROWID PROMO
INDEX RANGE SCAN PROM_DROP_CD
3. Changing "NOT IN" to "NOT EXISTS"
SELECT STATEMENT Cost = 198944
SORT ORDER BY
HASH JOIN ANTI
TABLE ACCESS FULL MULTI_MV
TABLE ACCESS FULL PROMO
Why is the CBO choosing Anti join only in case of NOT EXISTS ?
Internally the Anti Hash Join retrives the full row or only the join column ?
The approx time for the query to finish in first 2 cases is 5 days.
I would appreciate any comments on this ?.
obviously NO BIND variables on your system eh.
I'll guess (in light of no creates, no knowledge of indexes, no dbms_stats.set_table_stats to let me know how big/how many rows things are, etc etc etc)
Is ind_id in ssw.promo NULLABLE
and if so, what happens when you say:
AND NOT ind_id IN (SELECT DISTINCT ind_id
FROM ssw.promo
WHERE drop_cd = :sys_b_11
and IND_ID IS NOT NULL )
instead (you do know of course that NOT EXISTS and NOT IN are *not* interchangeable, they'll give different results!)
Tom,
I have 2 tables. Both tables have around 8000 records.
Table 1:
ID
parentID
col1
col2
col3
Table1: Primary key: ID
Table2
Module
ID
processDate
ParentID
dataCol1
dataCol2
source_column
source_fg
Table2 Primary key: Module, Id, processDate
Table 1 is a superset of all Ids that Table 2 has.
ParentIds for Ids in Table1 and Table2 would be same for same Id.
Now, I need to pick those Ids from Table1 that are
- not in Table2
- have same ParentID from Table2.
- for records that have source_column in table2 is not null
- for records in Table2 that have source_fg as null
Table1:
ID ParentId Col1 col2 col3
101 111 10 20 30
102 111 20 40 60
103 112 10 10 10
104 113 10 20 30
105 113 30 40 50
106 114 40 40 50
107 113 50 50 50
Table2:
Module ID ParentId ProcessDate datacol1 datacol2 source_column source_fg
1 101 111 27-MAY-2003 1000 1000 2000
1 103 112 27-MAY-2003 2000 2000 2000 Y
1 104 113 27-MAY-2003 3000 3000 3000
1 106 114 27-MAY-2003 2300 2300 3500
Now, I need the out to be:
102
105
107
Select t1.Id
from Table1 t1, Table2 t2
where not Exists (select null from Table2 where table2.module=1 and
table2.id = t1.id and table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null
Actually, I need to insert these Ids in Table2 and replicate data from their parentIDs in Table2.
so, New records would be
Module ID ParentId ProcessDate datacol1 datacol2 source_column source_fg
1 102 111 27-MAY-2003 1000 1000 2000
1 105 113 27-MAY-2003 3000 3000 3000
1 107 113 27-MAY-2003 3000 3000 3000
I am trying to use, Insert into table2
select ......
Which one would be better: (using RBO)
Select t1.Id, t2.datacol1, t2.datacol2..........
from Table1 t1, Table2 t2
where not Exists (select null from Table2 where table2.module=1 and
table2.id = t1.id and table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null
or
Select t1.Id, t2.datacol1, t2.datacol2..........
from Table1 t1, Table2 t2
where t1.Id not in (select Id from Table2 where table2.module=1 and
table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null
Please advice. Thanks
Hi Tom,
Its difficult getting you meanwhile this question is similar what is being discussed and I will be grateful if you will mail the reply to [email protected]. Please don't ignore reading this.I need an urgent reply to be able to get this report to management.
The problem is in generating a report in Report6i from a University HR database Oracle7.3.2 Server.
The problem is like this:
In a University, staff who retired are taken on contract to work for some time. Also staff who resigned from the university and want to come back are re-engaged.
Now management requested for a report on all active staff (i.e to exclude retirees and include staff re-engaged).
When I wrote a query to produce such report I was not getting the correct result. The query failed to exclude the retired staff. The logic i was using is that the subquery should select all staff who retired and resigned from the university. At that instance of creating their service record, their start date is the same as their end date so the subquery would select by using the condition where the start date equals the end date. When such records of staff are found by the subquery,the outer query would compare its result with the inner query and then exclude any matching records found in the inner query result. I hope i made myself clear.Here is the detail of my query:
SELECT distinct id,name
,INITCAP(LOWER(dname))
,disc,descrip
,startdate,endate
from staff a
,dept b
,jobhist c
,dictator d
,postab e
WHERE a.id= c.stid
AND a.deptno= b.deptno
AND c.indic = d.code
AND c.jobtitle = e.postid
AND NOT EXISTS
(SELECT 1
FROM jobhist j
WHERE j.stid=a.id
AND j.startdate=(SELECT MAX(NVL k.endate,k.startdate))
FROM jobhist k,jobhist n
WHERE k.stid = a.id
AND k.stid=n.stid
AND c.indic IN('01','02','04')
)
)
ORDER BY name asc
This is the result got from the above query:
DeptFile# NameService Indicator PostStart Date End Date
Operations 10003 Mana NEW APPOINTMENT LECTURER 1-Dec-87 20-Jan-99
Operations 10003 Mana re-enganged SENIOR LECTURER 1-Jun-01
Research 10001 Ofoe NEW APPOINTMENT LECTURER 15-Aug-02 15-May-03
Research 10001 Ofoe PROMOTION SENIOR LECTURER 15-May-03
Sales 10002 John NEW APPOINTMENT Junior Admin Asst 14-Jun-75 15-May-88
Sales 10002 John PROMOTION LECTURER 16-May-90 11-Jun-00
Sales 10002 John PROMOTION SENIOR ADMIN 15-May-88 30-May-90
Sales 10002 John PROMOTION SENIOR LECTURER 13-Jun-00 30-Jul-02
The correct output should have excluded staff called John since he is retired.
Here are the records stored in the various tables:
Jobhist Table
File# Ser.Indicator Postcode StartDate End Date
10001 A1 5 15-Aug-02 15-May-03
10001 A2 6 15-May-03
10002 A3 7 14-Jun-75 15-May-88
10002 A3 6 15-May-88 30-May-90
10002 A1 6 16-May-90 11-Jun-00
10002 A2 6 13-Jun-00 30-Jul-02
10002 A2 1 30-May-03 30-May-03
10003 A1 5 1-Dec-87 20-Jan-99
10003 A2 4 1-Jul-00 1-Jul-00
10003 A2 3 1-Jun-01
Staff table:
File# Name Hiredate StaffType Deptno Birthdate
10001 Ofoe 12-Aug-02 3 20
10004 Kofi 1-Jun-75 3 10 15-Jun-40
10002 John 12-May-14 3 30 12-Jun-70
10003 Mana 12-Sep-87 2 40 17-Jul-56
Service Indicator Table(dictator):
Code Description
1 retired
2 contract
3 re-enganged
4 resigned
5 NEW APPOINTMENT
6 PROMOTION
Is there a mistake I made, is there a different way of achieving the result or is SQL incapable of achieving the result? I need your help please. Thanks for your help.
well, first - i don't read every review, most but not all. Also, I read them as I have time -- not on any sort of set schedule
second, I'm not emailing people on individual followups. I'd spend the other half of my life doing that (leaving no half for me)
third, the example you put forth is really really really hard to follow. Your sample tables for example haven't many columns at all that match the columns in your query. I don't understand your datamodel (as you should). So, I'll make a simple guess
sounds like you want STAFF records such that their last (most recent) record in JOBHIST does not have an indic code in ( 1, 2, 4 )
so, I think we can ignore enddate all togther, just find the max startdate for each person. we can start with:
select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic )
from jobhist
group by file#
Now, that will give us a set of file# (which I assume is the key into STAFF) and their max startdate value ALONG WITH THE INDIC code for that max date.
We can then:
select file#
from (
select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic ) data
from jobhist
group by file#
)
where substr( data, 15 ) not in ( '01', '02', '04' )
/
That'll be the set of ACTIVE STAFF file#'s.
Now you can just
select *
from staff
where file# in ( <that query> );
Hi Tom,
Thanks for the response.
With reference to your understanding "sounds like you want STAFF records such that their last (most recent) record in
JOBHIST does not have an indic code in ( 1, 2, 4 )"
What i mean is that staff who resign(4) or retired(1)
shouldn't have their names or previous service records in the JOBHIST table appeared in the report at all.
Take it as that the STAFF TABLE is the MASTER Table with the primary key fileNo and the JOBHIST table is the DETAIL TABLE with FileNo as the foreign Key.
Your solutions you gave produced the names and previous service records of staff excluding their most recent records that has indic code in (1,2,4).
We don't the records of those whose most recent records that has indic code in (1,2,4) appeared.
Can't there be away of achieving that?
Thanks for your help.
you say
"we don't the records ... with 1,2,4 appeared"
If I read that to mean "why don't" -- then it is because I filtered them:
select file#
from (
select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic ) data
from jobhist
group by file#
)
where substr( data, 15 ) not in ( '01', '02', '04' )
/
Look at the concept there. (you should be able to fix this, it is after all your set of tables???)
That should -- if you've explained things accurately -- return the set of primary keys from jobhist for joining to the other table. The set of records this returns is the MAX startdate record for everyone in jobhist (their LAST ENTRY) if and only if their indic code isn't in 1, 2, 4
Take the technique, verify the results -- apply the technique to your problem.
Since I
a) don't know your tables
b) don't know your keys
c) am having a hard time following your explaination
this is the best I can do.
Hi Tom,
It is working perfectly.
Thanks for your patience although my questions were not explanatory enough,your guess were correct.I made some mistakes in implementing the solution.
Just as you said,I will now fine tune the query so i can get the records from more than one tables. I will do my best to get accurate result but please bear with me if i should get back to you for additional help.
God bless!!
SELECT
count(*)
FROM
AER A,
AER_PRODUCT B ,
AER_PRODUCT_APPROVAL_NO P ,
AER_REACT E
WHERE
A.AER_DELETED = '0' AND
B.RECORD_DELETED != '1' AND
E.RECORD_DELETED != '1' AND
(B.NOT_RELEVANT IS NULL) AND
(E.NOT_RELEVANT IS NULL) AND
B.TRADE_NAME LIKE :1 AND
(
( B.PRODUCT_TYPE_MULTI = '0'
AND
(B.PRODUCT_TYPE = '1' OR B.PRODUCT_TYPE = '2' OR B.PRODUCT_TYPE = '4' )
)
OR
( B.PRODUCT_TYPE_MULTI = '1'
AND
( ( P.PRODUCT_TYPE = '1' OR P.PRODUCT_TYPE = '2' OR P.PRODUCT_TYPE = '4' )
AND P.COUNTRY_CODE = NVL(A.AER_COUNTRY, '078')
AND B.SEQ_PRODUCT = P.SEQ_PRODUCT AND B.AER_ID = P.AER_ID
)
OR
(NOT EXISTS
( SELECT Y.APPROVAL_NO FROM AER_PRODUCT_APPROVAL_NO Y
WHERE Y.COUNTRY_CODE = NVL(A.AER_COUNTRY, '078')
AND B.SEQ_PRODUCT = Y.SEQ_PRODUCT AND B.AER_ID = Y.AER_ID
)
AND
B.PRODUCT_TYPE != '3'
))
)
AND
(B.PRODUCT_FLAG = '1' OR B.PRODUCT_FLAG = '4') AND
B.AER_ID = A.AER_ID AND
E.AER_ID = A.AER_ID AND
E.EVENT_TYPE != '2'
ORDER BY A.AER_NO, A.COMPOSITE_VERSION_NO DESC , A.AER_ID, E.AUTO_RANK, B.AUTO_RANK
Tables P.KEY
AER AER_ID
AER_PRODUCT AER_ID,SEQ_PRODUCT
AER_APPROVAL_NO
AER_ID,SEQ_PRODUCT,SEQ_APPROVAL_NUMBER
i WANT TO ELIMINATE THE CORRELATED SUBQRY
PLEASE HELP
and a.fromdmdunit = b.dmdunit
and a.fromdmdgroup = b.dmdgroup
and a.fromdfuloc = b.loc
and a.frommodel = b.model
no rows selected
Elapsed: 00:00:30.29
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=59587 Bytes=3634807)
1 0 FILTER
2 1 INDEX (FAST FULL SCAN) OF 'DFUMAP1' (NON-UNIQUE) (Cost=2 Card=59587 Bytes=36
34807)
3 1 INDEX (UNIQUE SCAN) OF 'DFU1' (UNIQUE) (Cost=2 Card=1 Bytes=23)
Statistics
----------------------------------------------------------
0 recursive calls
50 db block gets
3590173 consistent gets
0 physical reads
0 redo size
686 bytes sent via SQL*Net to client
319 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL> desc dfu
Name Null? Type
----------------------------------------------------------------------- -------- ----------------------------
DMDUNIT NOT NULL VARCHAR2(30)
DMDGROUP NOT NULL VARCHAR2(10)
LOC NOT NULL VARCHAR2(10)
HISTSTART NOT NULL DATE
EFF NOT NULL DATE
DISC NOT NULL DATE
FCSTHOR NOT NULL NUMBER(38)
DMDCAL NOT NULL VARCHAR2(18)
DMDPOSTDATE NOT NULL DATE
MODELDATE NOT NULL DATE
STATMSE NOT NULL FLOAT(126)
MAXHIST NOT NULL NUMBER(38)
TOTFCSTLOCK NOT NULL NUMBER(38)
LOCKDUR NOT NULL NUMBER(38)
REFITDATE NOT NULL DATE
MASK NOT NULL VARCHAR2(18)
SCEN NOT NULL NUMBER(38)
SCENBITS NOT NULL NUMBER(38)
MAPUSED NOT NULL VARCHAR2(18)
NETFCSTMSE NOT NULL FLOAT(126)
NETFCSTMSESMCONST NOT NULL FLOAT(126)
NETFCSTERROR NOT NULL FLOAT(126)
NEGFCSTSW NOT NULL NUMBER(38)
AUTOADJFACTOR NOT NULL FLOAT(126)
MODEL NOT NULL VARCHAR2(18)
AUTOADJMIN NOT NULL FLOAT(126)
AUTOADJMAX NOT NULL FLOAT(126)
PUBLISHDATE NOT NULL DATE
UDC_DFU_HIERARCHY VARCHAR2(50)
UDC_MODELNUM NOT NULL NUMBER(38)
UDC_ABCINDICATOR VARCHAR2(50)
UDC_DELETERECSW NOT NULL NUMBER(38)
UDC_IMPORTSW NUMBER(38)
UDC_NEWAUTOGENSW NOT NULL NUMBER(38)
SQL>desc dfumap
Name Null? Type
----------------------------------------------------------------------- -------- ----------------------------
MAP NOT NULL VARCHAR2(18)
TODMDUNIT NOT NULL VARCHAR2(30)
TODMDGROUP NOT NULL VARCHAR2(10)
TODFULOC NOT NULL VARCHAR2(10)
FROMDMDUNIT NOT NULL VARCHAR2(30)
FROMDMDGROUP NOT NULL VARCHAR2(10)
FROMDFULOC NOT NULL VARCHAR2(10)
EFF NOT NULL DATE
DISC NOT NULL DATE
FACTOR NOT NULL FLOAT(126)
LEVELNUM NOT NULL NUMBER(38)
FCSTTYPE NOT NULL NUMBER(38)
HISTTYPE NOT NULL NUMBER(38)
SCEN NOT NULL NUMBER(38)
SCENBITS NOT NULL NUMBER(38)
CONVFACTOR NOT NULL FLOAT(126)
LOCKSTART NOT NULL DATE
LOCKEND NOT NULL DATE
TOMODEL NOT NULL VARCHAR2(18)
FROMMODEL NOT NULL VARCHAR2(18)
UDC_NEWAUTOGENSW NOT NULL NUMBER(38)
SQL>select count(*) from dfu;
COUNT(*)
----------
770946
SQL>select count(*) from dfumap;
COUNT(*)
----------
1191740
parameters for
optimizer_index_caching integer 80
optimizer_index_cost_adj integer 25
What can you say about it?
If I delete the stats it works:
SQL> exec dbms_stats.delete_table_stats(tabname=> 'T1',ownname=>'ESCMANAGER');
PL/SQL procedure successfully completed.
SQL> /
F1 F2 F3
---------- ---------- ----------
1 5 3
It is a 9.2.0.4 database on Windows 2003.
Thanks,
Tom,
Could the problem above be explained by an implicit cast?
Where the optimizer is after the analyze aware of the fact the the datatype of the column is varchar2 and does not convert the literal to a number?
freek@DEV01> select (case when to_char(x) > to_char(y) then 'X bigger' else 'Y bigger' end)
2 from ( select 2 x, 10 y
3 from dual
4 );
(CASEWHE
--------
X bigger
Elapsed: 00:00:00.00
if you modify the example to use different values, you get following result
freek@DEV01> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
Elapsed: 00:00:00.00
freek@DEV01> drop table t;
Table dropped.
Elapsed: 00:00:00.00
freek@DEV01> create table t1
2 ( f1 varchar(10),
3 f2 varchar(10),
4 f3 varchar(10)
5 );
Table created.
Elapsed: 00:00:00.00
freek@DEV01> insert into t1 values ('1','2','3');
1 row created.
Elapsed: 00:00:00.00
freek@DEV01> insert into t1 values ('4','10','6');
1 row created.
Elapsed: 00:00:00.00
freek@DEV01> commit;
Commit complete.
Elapsed: 00:00:00.00
freek@DEV01> select f1,f2,f3
2 from t1 a
3 where a.f2 <= '20'
4 and not exists ( select '1'
5 from t1 b
6 where b.f2 <= '20'
7 and b.f2 > a.f2
8 );
F1 F2 F3
---------- ---------- ----------
1 2 3
Elapsed: 00:00:00.00
freek@DEV01> analyze table t1 compute statistics;
Table analyzed.
Elapsed: 00:00:00.00
freek@DEV01> select f1,f2,f3
2 from t1 a
3 where a.f2 <= '20'
4 and not exists ( select '1'
5 from t1 b
6 where b.f2 <= '20'
7 and b.f2 > a.f2
8 );
F1 F2 F3
---------- ---------- ----------
1 2 3
Elapsed: 00:00:00.00
now if you change the query to use an to_number
freek@DEV01> select f1,f2,f3
2 from t1 a
3 where to_number(a.f2) <= 20
4 and not exists ( select 1
5 from t1 b
6 where to_number(b.f2) <= 20
7 and to_number(b.f2) > to_number(a.f2)
8 );
F1 F2 F3
---------- ---------- ----------
4 10 6
Is there some environmental/config setting that could make a difference? I have 9.2.0.4.0 on a WinXP laptop and can not duplicate... Is this behavior specific to Windows Server 2003? I could not duplicate on 9.2.0.4.0 on Solaris, AIX, or Linux as well... (I am going to install Windows Server 2003 and test on that tomorrow)
SQL> select * from v$version;
BANNER
----------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
5 rows selected.
SQL> create table t1
3 f1 varchar(10),
4 f2 varchar(10),
5 f3 varchar(10)
6 );
Table created.
SQL> insert into t1 values ('1','5','3');
1 row created.
SQL> insert into t1 values ('4','4','6');
1 row created.
SQL> commit;
Commit complete.
SQL> SELECT f1,
2 f2,
3 f3
4 FROM t1 a
5 WHERE a.f2 <= '5'
6 AND NOT EXISTS (SELECT '1'
7 FROM t1 b
8 WHERE b.f2 <= '5'
9 AND b.f2 > a.f2);
F1 F2 F3
---------- ---------- ----------
1 5 3
1 row selected.
SQL> analyze table t1 compute statistics;
Table analyzed.
SQL> SELECT f1,
2 f2,
3 f3
4 FROM t1 a
5 WHERE a.f2 <= '5'
6 AND NOT EXISTS (SELECT '1'
7 FROM t1 b
8 WHERE b.f2 <= '5'
9 AND b.f2 > a.f2);
F1 F2 F3
---------- ---------- ----------
1 5 3
1 row selected.
Thanks,
it is the "poor mans hash anti-join", a good idea in the olden days of the RBO, not so good in the days of a smarter CBO.
start with:
ops$tkyte@ORA920PC> create table t1 as select * from all_objects where rownum <= 5000;
Table created.
ops$tkyte@ORA920PC> create table t2 as select * from all_objects where rownum <= 4950;
Table created.
ops$tkyte@ORA920PC> create index t2_idx on t2(object_id);
Index created.
select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 8.63 8.53 129066 197295 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 8.64 8.53 129066 197295 0 1
select count(*)
from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.13 83 10075 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.09 0.13 83 10075 0 1
select count(*)
from t1, t2 rbo
where t1.object_id = rbo.object_id(+)
and rbo.object_id IS NULL
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.11 72 5087 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.09 0.11 72 5087 0 1
<b>so, it looks "good" right? better than the alternatives in RBO, yes. but, analyze and</b>
select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.06 0.08 0 91 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.06 0.12 0 91 0 1
<b>hmmm, rbo not so good -- "user written hash anti join under rbo" not so good...
the cbo does OK though with the user written one:</b>
select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.10 0 91 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.08 0.11 0 91 0 1
<b>but I find the not in to be infinitely easier to code and alot more "informational" (clear,easy to understand)</b>
Hi Tom, I don't have any statistics to provide you but could you look at query A and query B and advise which would be more efficient to write.
Query A)
select *
from contr_prod_price cpp,
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_ind cpli
where cpp.contr = cpli.contr
and cpp.contr_frmla = cpli.contr_frmla
and cpp.contr_lvl = cpli.contr_lvl
union
select 'x'
from contr_prod_prc_low_lvl_dir cpld
where cpp.contr = cpld.contr
and cpp.contr_frmla = cpld.contr_frmla
and cpp.contr_lvl = cpld.contr_lvl )
Query B)
select *
from contr_prod_price cpp,
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_ind cpli
where cpp.contr = cpli.contr
and cpp.contr_frmla = cpli.contr_frmla
and cpp.contr_lvl = cpli.contr_lvl )
union
select *
from contr_prod_price cpp
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_dir cpld
where cpp.contr = cpld.contr
and cpp.contr_frmla = cpld.contr_frmla
and cpp.contr_lvl = cpld.contr_lvl )
Thanks
Hi Tom,
The following are 2 tables,(external_documents and external_document_links).They are linked by external_documents.NAME and external_document_links.EXTERNAL_DOCUMENT_NAME columns.
SQL> select * from external_documents --(Primary Key: NAME)
----------------------
corp-bro2.jpg
cvs_structure.doc
corp-bro.jpg
Raid.pdf
AutomaticUndoInternals.pdf
5 rows selected
-------------------------------------------------------------
SQL> select * from external_document_links; --(All the 4 columns are part of the primary key)
FORM DOCUMENT_NAME DOCUMENT_VERSION EXTERNAL_DOCUMENT_NAME
---------- ----------------------- -----------------------------
form_1 document_1 1 corp-bro2.jpg
form_1 document_1 1 Raid.pdf
form_2 document_2 1 AutomaticUndoInternals.pdf
form_2 document_1 1 Raid.pdf
PO po_4000542_2_69 1 Raid.pdf
PO po_4000542_2_69 1 AutomaticUndoInternals.pdf
form_3 document_1 1 Raid.pdf
form_1 document_2 1 corp-bro.jpg
PO po_4000542_2_69 1 corp-bro.jpg
PO po_4000542_2_69 1 corp-bro2.jpg
PO po_4000542_2_69 1 cvs_structure.doc
11 rows selected.
-------------------------------------------------------------
The input to the query will be external_document_links.form,external_document_links.document_name and external_document_links.document_version. Based on this criteria I need to retrive documents linked to the above 3 columns with a flag as 'Y' and rest of the documents from external_documents table with the flag as 'N'
I have written a query which goes like this :-
SQL> select a.name NAME,'Y' PRESENT from external_documents a,external_document_links b
where a.name=b.EXTERNAL_DOCUMENT_NAME AND b.form ='form_1' and b.DOCUMENT_VERSION =1 and b.DOCUMENT_NAME='document_1'
union
select name ,'N' from external_documents where name
not in (select a.name NAME from external_documents a,external_document_links b
where a.name=b.EXTERNAL_DOCUMENT_NAME AND b.form ='form_1'and b.DOCUMENT_VERSION =1 and b.DOCUMENT_NAME='document_1')
So I get the right output as follows :-
--------------------------------------------------------------------------------
NAME P
------------------------------ -
AutomaticUndoInternals.pdf N
Raid.pdf Y
corp-bro.jpg N
corp-bro2.jpg Y
cvs_structure.doc N
5 rows selected
----------------------------------------------------------------
Can you please tell me if there is any other way in which this query can be re-written, i.e without using the 'NOT IN' clause.
sigh, no table creates. no insert intos. no easy way to play with the data.
seems you want every row from external_documents, and if that document has a mate in external_document_links -- show Y else show N
select name, nvl( (select 'Y'
from external_document_links A
where a.name = b.external_document_name
and b.form = :bind1
and b.document_version = :bind2
and b.document_name = :bind3 ), 'N' ) present
from external_documents;
probably works, but I couldn't really run it or anything.
Hello Tom,
I would like to ask you for one our bad sql which uses "not in" for remote site. For some sites we have slow network and the package takes about 10min. (it transfers about 100rows, runs once per hour). Do you have any suggestion for us how can we improve performance of the bad sql?
Thank you.
SET AUTOTRACE TRACEONLY
-- insert into monitor.promon@tesa
select
PROINUM1,
PROINUM2,
PROAPPL,
--lc_host,
PROSTAT,
upper(PROPROG),
PROMESS,
PRODATE,
upper(PROUSER),
PROTYPE
from
r4.promon
where
prodate > sysdate-1/24 and --ld_prodate and
prodate < sysdate and --ld_currentDate and
UPPER(proprog) not in (select UPPER(batch) from monitor.promon_badbatch@tesa);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2069 Card=1 Bytes=100)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'PROMON' (Cost=2069 Card=1 Bytes=100)
3 1 REMOTE* (Cost=1 Card=1 Bytes=17) TESA.WORLD
3 SERIAL_FROM_REMOTE SELECT "BATCH" FROM "MONITOR"."PROMON_BADBATCH" "PROMON_BADBATCH"
WHERE LNNVL(UPPER(:1)<>UPPER("BATCH"))
Thats a pretty intresting thread. Tom would you please through some light on query:
select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL
We are (outer) joing the two tables on the basis of object_id and on the other hand we are saying "cbo.object_id IS NULL".
I can't understand how it will be executed. Can you please tell us in psuedo code ???
if you understand an outer join -- basically every row in T1 will be in the result set, regardless of whether there was a 'mate' in t2 to be joined to.
So, t1 will be retrieved from, and if there is a row(s) in t2 for that object_id -- the join will happen (and t2.object_id will be NOT NULL of course since we did the join). Else, the column values for t2 will be set to NULL and the row from t2 will be output. consider:
ops$tkyte@ORA9IR2> create table t1 ( x int );
Table created.
ops$tkyte@ORA9IR2> create table t2 ( x int );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t1 values(1);
1 row created.
ops$tkyte@ORA9IR2> insert into t1 values(2);
1 row created.
ops$tkyte@ORA9IR2> insert into t2 values(1);
1 row created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t1,t2 where t1.x = t2.x(+);
X X
---------- ----------
1 1
<b>the outer join says "all rows in t1 will be output, no matter what". So, all we need to do is keep the rows where t2.x (cbo.object_id) is NULL -- those are the rows in t1 such that there was no row in T2 to join to -- just like a not in</b>
I thought that this should not be processed as correlated:
... UPPER(proprog) not in (select UPPER(batch) from monitor.promon_badbatch@tesa);
But the excecution plan showed (I think that :1 = promon.proprog):
SELECT "BATCH" FROM "MONITOR"."PROMON_BADBATCH" "PROMON_BADBATCH" WHERE LNNVL(UPPER(:1)<>UPPER("BATCH"))
I did little test:
create table t1 (id number); create table t2 (id number);
insert into t1 values (1); insert into t1 values (2);
insert into t2 values (2);
analyze table t1 compute statistics;
analyze table t2 compute statistics;
I)
select * from t1 where id not in (select id from t2);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'T1'
3 1 TABLE ACCESS (FULL) OF 'T2'
II)
select * from t1 where id not in (select id from t2@ee);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'T1' (Cost=3)
3 1 REMOTE* (Cost=3) EE.WORLD
3 SERIAL_FROM_REMOTE SELECT "ID" FROM "T2" "T2" WHERE LNNVL("ID"<>:1)
1) Is the first SQL processed same as the second - but only explain plan did not show it?
2) If not - why the 2nd is processed as correlated (I think that :1 = t1.id)?
Thank you.
Your two basic queries for comparison were:
SQL> select count(*) from emp where empno not in ( select mgr from emp );
COUNT(*)
----------------
SQL> select count(*) from emp T1
2 where not exists ( select null from emp T2 where t2.mgr = t1.empno );
COUNT(*)
----------------
The first fails because of 3-value logic; it's easy to forget that NULL != NULL. The query might as well be:
SQL> select count(*) from emp where empno not in ( select null from dual );
COUNT(*)
----------------
The first query could be rectified to account for 3-value logic; for example:
SQL> select count(*) from emp where to_char(empno)
2 NOT IN ( select NVL(to_char(mgr),'null') from emp );
COUNT(*)
----------------
Is that correct? And the reason anti-joins can be efficient if there are no NULLs is because a standard index doesn't include NULL values?
it would be more correct to code:
not in ( select mgr from emp where mgr is not null )
converting a number to a string would not be a good thing and it is more "self explanatory" what is happening with the "is not null"
It is not a direct cause/effect. create index on emp(mgr,empno) would have an entry for each and every row since EMPNO is not null -- so it is not correct to say that indexes do not include null values -- starndard b*tree indexes do not include ENTIRELY null key entrys but in general they have nulls.
Looking at the "poor mans hash anti-join", there were 3 alternatives evaluated:
select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );
select count(*) from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id );
select count(*) from t1, t2 rbo
where t1.object_id = rbo.object_id(+) and rbo.object_id IS NULL;
Yet none seemed "set oriented". I tried another query that, to me, seems clearer:
select count(*) from t1 rbo,
( select object_id from t1 MINUS select object_id from t2 ) SET_OP
where SET_OP.object_id = RBO.object_id
/
Under RBO, it won with only 225 LIOs (compared with 197295, 10075, and 5087 LIOs):
select count(*) from t1 rbo,
( select object_id from t1 MINUS select object_id from t2 ) SET_OP
where SET_OP.object_id = RBO.object_id
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.06 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.04 0.04 0 225 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.04 0.10 0 225 0 1
However, under the CBO, it came in 3rd with 166 LIOs (more than 91 for 2 others).
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.03 0.03 0 166 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.03 0 166 0 1
I just find it interesting that logically-equivalent queries have such different implementations.
select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );
isn't set oriented? actually -- they are all set oriented, no procedural code, no ordered array semantics (as you get with analytics)
you have "give me everything in this set that isn't in this other set", vs "get me everything in this set such that is doesn't exist in the other set", vs "outer join set A to set B and keep rows such that .... is true"
that last one you added really seems "cleaner" than
select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );
does?!?! really?
hi
in one of your first replies you mentioned this
+++++++++++++++++++++++++
So, the predicate
where x not in ( NULL )
evaluates to neither TRUE, nor FALSE
How about
where x in ( NULL )
that evaluates to neither true or false right?
So isnt IN and EXISTS different as well (like NOT IN and NOT EXISTS)?
Truly all examples were non-procedural, but I think of set operations in terms of Venn diagrams and operations like MINUS, UNION, and INTERSECT. To my taste, the cleanest query for a simple count is:
select count(*) from
( select object_id from t1 MINUS select object_id from t2 ) RBO_SET
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.05 0.18 143 150 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.06 0.20 143 150 0 1
which has a scant 150 LIOs under the RBO, and an equal-to-the-best 91 LIOs under CBO:
select count(*) from
( select object_id from t1 MINUS select object_id from t2 ) CBO_SET
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.02 0 91 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.02 0.04 0 91 0 1
The other 3 queries could easily obtain information from T1 by replacing the "count(*)", so I augmented the simple MINUS with a join back to T1 in my previous query.
BTW, I appreciate your new "LINKS I LIKE" tab! Thank you.
I have the most bizarre CBO'ness (for what of a better word!) that I can not fathom. Below are the tear down and set up needed.
select * from v$banner
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for HPUX: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
--
-- Clear down
--
drop table HAS_PERMISSION_UNIT_TEST
/
drop table PLT_NODE_PERMISSIONS
/
drop table PLT_SECURITY_NODES
/
drop table PLT_USER_DELEGATES
/
drop table PLT_XDI_PERMISSIONS
/
CREATE TABLE PLT_SECURITY_NODES
( NODE_ID VARCHAR2(16) NOT NULL,
PARENT_NODE_ID VARCHAR2(16),
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/
CREATE INDEX PSN_PSN_FK_I ON PLT_SECURITY_NODES
(PARENT_NODE_ID)
/
CREATE UNIQUE INDEX PSN_PK ON PLT_SECURITY_NODES
(NODE_ID)
/
CREATE INDEX NODE_AND_PARENT_NODE ON PLT_SECURITY_NODES
(NODE_ID, PARENT_NODE_ID)
/
ALTER TABLE PLT_SECURITY_NODES ADD (
CONSTRAINT PSN_PK PRIMARY KEY (NODE_ID))
/
ALTER TABLE PLT_SECURITY_NODES ADD (
CONSTRAINT PSN_PSN_FK FOREIGN KEY (PARENT_NODE_ID)
REFERENCES PLT_SECURITY_NODES (NODE_ID))
/
------------------------------------------------------------------------------------------------
CREATE TABLE PLT_NODE_PERMISSIONS
( NODE_ID VARCHAR2(32) NOT NULL,
USER_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(32) NOT NULL,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/
CREATE UNIQUE INDEX PNP_PK ON PLT_NODE_PERMISSIONS
(NODE_ID, USER_ID, ROLE_ID)
/
CREATE INDEX NODE_AND_USER ON PLT_NODE_PERMISSIONS
(NODE_ID, USER_ID)
/
CREATE INDEX PNP_PSN_FK_I ON PLT_NODE_PERMISSIONS
(NODE_ID)
/
CREATE INDEX PNP_PPR_FK_I ON PLT_NODE_PERMISSIONS
(ROLE_ID, USER_ID)
/
ALTER TABLE PLT_NODE_PERMISSIONS ADD (
CONSTRAINT PNP_PK PRIMARY KEY (NODE_ID, USER_ID, ROLE_ID))
/
ALTER TABLE PLT_NODE_PERMISSIONS ADD (
CONSTRAINT PNP_PSN_FK FOREIGN KEY (NODE_ID)
REFERENCES PLT_SECURITY_NODES (NODE_ID))
/
----------------------------------------------------------------------------
CREATE TABLE PLT_USER_DELEGATES
( USER_ID VARCHAR2(32) NOT NULL,
DELEGATE_USER_ID VARCHAR2(32) NOT NULL,
EFFECTIVE_FROM DATE NOT NULL,
EFFECTIVE_TO DATE,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/
CREATE INDEX PUD_PP_FK_I ON PLT_USER_DELEGATES
(USER_ID)
/
CREATE INDEX PUD_PP_DELEGATE_FK_I ON PLT_USER_DELEGATES
(DELEGATE_USER_ID)
/
CREATE UNIQUE INDEX PUD_PK ON PLT_USER_DELEGATES
(USER_ID, DELEGATE_USER_ID, EFFECTIVE_FROM)
/
ALTER TABLE PLT_USER_DELEGATES ADD (
CONSTRAINT PUD_PK PRIMARY KEY (USER_ID, DELEGATE_USER_ID, EFFECTIVE_FROM))
/
---------------------------------------------------------------------------------------
CREATE TABLE PLT_XDI_PERMISSIONS
(
XDI VARCHAR2(255) NOT NULL,
USER_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(16) NOT NULL,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/
CREATE INDEX PXP_PPR_FK_I ON PLT_XDI_PERMISSIONS
(ROLE_ID, USER_ID)
/
CREATE UNIQUE INDEX PXP_PK ON PLT_XDI_PERMISSIONS
(XDI, USER_ID, ROLE_ID)
/
ALTER TABLE PLT_XDI_PERMISSIONS ADD (
CONSTRAINT PXP_PK PRIMARY KEY (XDI, USER_ID, ROLE_ID))
/
begin
--
-- Set up 4 delegates for this user to have access via
--
for idx in 1..4
loop
insert into plt_user_delegates
( user_id, delegate_user_id, effective_from, effective_to, is_deleted, is_system, last_modified_datetime, last_modified_user_id)
values
('MJONES', 'MJONES_DELEGATE_'||idx, sysdate -100, sysdate + 100, 0, 0, sysdate, 'MJONES');
--
end loop;
--
-- Loop 500 times to create some realistic volume of data that this user has
--
for idx in 1..500
loop
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_TOP_LVL', null, 0, 1, sysdate, 'MJONES');
--
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_LEVEL_2', 'NODE_'||idx||'_TOP_LVL', 0, 1, sysdate, 'MJONES');
--
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_LVL_3', 'NODE_'||idx||'_LEVEL_2', 0, 1, sysdate, 'MJONES');
--
-- Create permission into Node
--
insert into plt_node_permissions ( node_id, user_id, role_id, is_deleted, is_system ) values (
'NODE_'||idx||'_LVL_3', 'MJONES', 'UNIT_TEST', 0, 0);
end loop;
--
-- Now wedge up some XDI Permissions. Have a total of 100 XDI's
--
for idx in 1..100
loop
insert into plt_xdi_permissions
( xdi, user_id, role_id, is_deleted, is_system)
values
( idx||'_JONES_THE_XDI', 'MJONES', 'UNIT_TEST', 0, 0);
end loop;
end;
/
analyze table plt_node_permissions compute statistics for table for all indexes for all indexed columns
/
analyze table plt_security_nodes compute statistics for table for all indexes for all indexed columns
/
analyze table plt_user_delegates compute statistics for table for all indexes for all indexed columns
/
analyze table plt_xdi_permissions compute statistics for table for all indexes for all indexed columns
/
--
-- Now build up a table of 100,000 rows to base all this malarkey on... Set the access up to start with so that we have access too all 100,000 records in bunches of 200 (500 access records we have access to)
--
create table has_permission_unit_test as
select
to_char(rownum) primary_key,
owner, object_name, subobject_name, object_id, data_object_id, object_type, created, last_ddl_time, timestamp, status, temporary, generated, secondary,
'NODE_'||(mod(rownum,200)+1)||'_LVL_3' security_node_id,
--decode(mod(rownum,1000),0, mod(rownum,1000)||'JONES_THE_XDI',rownum||'_XDI_'||object_name) xdi,
rownum||'_XDI_'||object_name xdi,
0 is_deleted,
sysdate last_modified_datetime,
'BOB' last_modified_user_id
from
( select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
)
where rownum <= 100000-- 100K
/
--
alter table has_permission_unit_test modify primary_key varchar2(32)
/
--
alter table has_permission_unit_test add primary key (primary_key)
/
--
analyze table has_permission_unit_test compute statistics for table for all indexes for all indexed columns
/
--
-- END OF SET UP
--
The query that behaves strangley is...
select count(*)
from has_permission_unit_test main
where main.primary_key <= :i_volume
and
(
main.security_node_id in
( select /*use_hash(sn) index_ffs( sn node_and_parent_node ) use_hash(np) index_ffs(np node_and_user)*/ np.node_id
from plt_security_nodes sn, plt_node_permissions np
where sn.node_id = np.node_id
and np.user_id in
( select user_id
from plt_user_delegates ud_first
where ud_first.delegate_user_id = 'MJONES'
and sysdate between ud_first.effective_from and ud_first.effective_to
and ud_first.is_deleted = 0
union all
select 'MJONES' user_id
from dual dual_first
)
connect by prior sn.parent_node_id = sn.node_id
)
-- COMMENT OUT FROM HERE
or
( main.xdi in
( select xi.xdi
from plt_xdi_permissions xi
where xi.user_id in
( select user_id
from plt_user_delegates ud_second
where ud_second.delegate_user_id = 'MJONES'
and sysdate between ud_second.effective_from and ud_second.effective_to
and ud_second.is_deleted = 0
union all
select 'MJONES' user_id
from dual dual_second
)
)
)
-- DOWN TO HERE
)
Running as is with :i_volume = 1000 takes around 4 seconds, and the matches are being done with the security nodes (I beleive), but if you comment out the bottom bit the plan radically changes to start FFS with a couple of indexes I can not see why the "or" changes this. Even if I change this to "or 1 in (select null from dual where 1=0) it still won't FFS. I tried hinting (the + is removed so it won't pick it up to no avail. Why has the performance so dropped off when I stick any OR in?
Thanks in advance, and hope this is enough info.
Mike.
I've 2 tables cust(5G) and archive(1G), I want to create third table which will have cust - archive data so I used following qry
create table newcust
nologging
as
select * from cust where c1 not in
(select c1 from archive)
parallel(Degree 8)
/
Cust table have index on c1 column ... archive table do not have any indexes.
This query was running for 7 hours .. I killed this qry as it was taking almost all resources.
Can you please suggest some alternative to speed up this query?
Can you please explain part?
make sure c1 is null or you say "where c1 is not null" in
the subquery.
should it be
"make sure c1 is not null or you say "where c1 is not null" in the subquery."?
One more thing to add. C1 is not null in both the tables. In cust table it is PK and archive table is extract of cust.
I'm using following query
CREATE TABLE ARCHIVE
AS SELECT * FROM SMALL WHERE CNO NOT IN
(SELECT /*+ HASH_AJ */ CNO FROM LARGE WHERE CNO IS NOT NULL)
Following is the query plan
CREATE TABLE STATEMENT Optimizer Mode=CHOOSE
LOAD AS SELECT
FILTER
TABLE ACCESS FULL SMALL 14 K 564 K 236
INDEX FULL SCAN PK_LARGE 867 K 21 M 48303
Size of small table is 20M, and large table is 3225M. Here I was expecting HASH ANTI JOIN.
But if I use following query ...
CREATE TABLE ARCHIVE
AS SELECT * FROM SMALL WHERE CNO IS NOT NULL AND CNO NOT IN
SELECT /*+ HASH_AJ */ CNO FROM LARGE)
It is using ANTI HASH JOIN.
My questions are,
1. Do we have to place NOT NULL in inner query or it can be a part of outer query?
2. Does table size matter?
TABLE_NAME : LARGE
TABLESPACE_NAME :
CLUSTER_NAME :
IOT_NAME :
PCT_FREE :
PCT_USED :
INI_TRANS :
MAX_TRANS :
INITIAL_EXTENT :
NEXT_EXTENT :
MIN_EXTENTS :
MAX_EXTENTS :
PCT_INCREASE :
FREELISTS :
FREELIST_GROUPS :
LOGGING :
BACKED_UP : N
NUM_ROWS : 17351339
BLOCKS : 399153
EMPTY_BLOCKS : 12287
AVG_SPACE : 1305
CHAIN_CNT : 0
AVG_ROW_LEN : 153
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 0
LAST_ANALYZED : 28-mar-2005 20:31:03
PARTITIONED : YES
IOT_TYPE :
TEMPORARY : N
NESTED : NO
BUFFER_POOL :
###############################################################
TABLE_NAME : SMALL
TABLESPACE_NAME : DB1
CLUSTER_NAME :
IOT_NAME :
PCT_FREE : 10
PCT_USED : 40
INI_TRANS : 1
MAX_TRANS : 255
INITIAL_EXTENT : 5242880
NEXT_EXTENT : 5242880
MIN_EXTENTS : 1
MAX_EXTENTS : 2147483645
PCT_INCREASE : 0
FREELISTS : 1
FREELIST_GROUPS : 1
LOGGING : YES
BACKED_UP : N
NUM_ROWS : 282211
BLOCKS : 2449
EMPTY_BLOCKS : 110
AVG_SPACE : 2792
CHAIN_CNT : 0
AVG_ROW_LEN : 41
AVG_SPACE_FREELIST_BLOCKS : 4293
NUM_FREELIST_BLOCKS : 9
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 0
LAST_ANALYZED : 03-apr-2005 18:24:31
PARTITIONED : NO
IOT_TYPE :
TEMPORARY : N
NESTED : NO
BUFFER_POOL : DEFAULT
SVRMGR> show parameter hash
NAME TYPE VALUE
----------------------------------- ------- ------------------------------
hash_area_size integer 0
hash_join_enabled boolean TRUE
hash_multiblock_io_count integer 1
sequence_cache_hash_buckets integer 10
Hi Tom,
Please look into this query.
SELECT
statement_date,
revenue_centre,
bill_create_date,
jnl_end_date,
company_code,
revenue
FROM ods_merc_bill_rev stg
WHERE NOT EXISTS (SELECT 1
FROM tmp_rtmr_accounts tra
WHERE stg.gl_account_code||' 000000' = tra.account_code_n106)
OR NOT EXISTS (SELECT account_code_n106,product_code
FROM tmp_rtmr_accounts tra
WHERE stg.gl_account_code||' 000000' = tra.account_code_n106
and EXISTS (SELECT 1
FROM tmp_rtmr_products trp
WHERE
tra.product_code = trp.product_code ))
Now I want the product code which is used in inner most query to be displayed in main query..
How to bring this into main query. Actually It should
return only three rows. If I insert the product code into main query, It is replicated. How to do!..Please advice me..
Table structure for ods_merc_bill_rev:
PROCESS_KEY NUMBER (12)
COMPANY_CODE VARCHAR (6)
REVENUE_CENTRE VARCHAR (6)
STATEMENT_DATE VARCHAR (8)
BILL_CREATE_DATE VARCHAR (8)
JNL_END_DATE VARCHAR (8)
BILL_PERIOD VARCHAR (3)
GL_ACCOUNT_CODE VARCHAR (10)
REVENUE NUMBER (20,2)
Table structure for tmp_rtmr_accounts:
ACCOUNT_CODE_N106 VARCHAR (17)
PRODUCT_CODE VARCHAR (20)
Table structure for tmp_rtmr_products:
PRODUCT_CODE VARCHAR (20)
Please have a look at this and give me a solution as soon as possible..
Thanks in Advance!
Regards,
baskar
you'll be joining - but it doesn't seem to make sense to me since you say "or not exists ( product code )"
how can you show something that doesn't exist.
(i'm not here to write SQL for you, I'll be glad to answer questions -- but please don't think I'm going to write your SQL as soon as possible, that is just not reasonable)
the "hint" seems to be not documented, but the anti join lives on
ops$tkyte@ORA10G> create table t1 ( x int not null, y int );
Table created.
ops$tkyte@ORA10G> create table t2 ( x int not null, y int );
Table created.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'T1', numrows=>1000000, numblks=> 10000 );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'T2', numrows=>1000000, numblks=> 10000 );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> @plan "select * from t1 where x not in (select x from t2 )";
ops$tkyte@ORA10G> delete from plan_table;
4 rows deleted.
ops$tkyte@ORA10G> explain plan for &1;
old 1: explain plan for &1
new 1: explain plan for select * from t1 where x not in (select x from t2 )
Explained.
ops$tkyte@ORA10G> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
Plan hash value: 629543484
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | | 8525 (2)|
|* 1 | HASH JOIN RIGHT ANTI| | 1 | 39 | 23M| 8525 (2)|
| 2 | TABLE ACCESS FULL | T2 | 1000K| 12M| | 2742 (2)|
| 3 | TABLE ACCESS FULL | T1 | 1000K| 24M| | 2742 (2)|
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("X"="X")
15 rows selected.
Hi Tom,
I'm using NOT IN to clear up a table and turning up with some strange results. Hope this is not too stupid... anyway here goes.
The basic idea is to clean up an old large table, based on a smaller new one. Both tables have no null elements.
--create a "small" table first.
CREATE TABLE m1_msisdns_int
(msisdn varchar2(18) NOT NULL);
--it's filled with some data...
--create table of unmatched keys
CREATE TABLE unmatched_msisdns
(msisdn varchar2(18) NOT NULL);
--fill it up
INSERT INTO unmatched_msisdns
(SELECT mi.msisdn FROM medapp.msisdn_imei mi WHERE mi.msisdn NOT IN
(SELECT msisdn FROM m1_msisdns_int));
OK, now records in unmatched_msisdns *should* be in msisdn_imei and *should not* be in m1_msisdns_int. But this is not the case. For example, picking some random records:
SQL> select * from unmatched_msisdns where rownum < 5;
MSISDN
------------------
6581000016
6581000017
6581000032
6581000033
SQL> select msisdn from medapp.msisdn_imei where msisdn = '6581000016';
MSISDN
------------------
6581000016
SQL> select msisdn from m1_msisdns_int where msisdn = '6581000016';
MSISDN
------------------
6581000016
Why is the record in the m1_msisdns_int table too? What am I doing wrong?
Thanks.
(msisdn varchar2(18) NOT NULL);
INSERT INTO m1_msisdns_int (SELECT msisdn FROM m1_msisdns_ext);
DROP TABLE m1_msisdns_ext;
commit;
CREATE TABLE unmatched_msisdns
(msisdn varchar2(18) NOT NULL);
INSERT INTO unmatched_msisdns (SELECT msisdn FROM medapp.msisdn_imei WHERE msisdn NOT IN (SELECT msisdn FROM m1_msisdns_int));
commit;
========================
1. The data file to create m1_msisdns_ext (msisdns_20050811.dat) is just a single column file with strings e.g.:
6596886272
6597600746
6593686340
6596875161
6594745290
6597902848
6598493940
6590933396
6597440556
6593672611
You can generate it yourself, or I can pass one to you.
2. The table medapp.msisdn_imei also contains a list of strings like m1_msisdns_int, but a much larger set.
3. We're using Oracle 10g, on Lintel. All results shown here using SQLPlus.
Just to be sure, I do these checks:
SQL> select count(*) from medapp.msisdn_imei;
COUNT(*)
----------
1652979
SQL> select count(*) from m1_msisdns_int;
COUNT(*)
----------
1204505
SQL> select count(*) from unmatched_msisdns;
COUNT(*)
----------
449069
SQL> select * from m1_msisdns_int where msisdn is null;
no rows selected
SQL> select * from medapp.msisdn_imei where msisdn is null;
no rows selected
However, when I run the NOT IN based insertion, I still see some (some, not all) records in the unmatched_msisdns where some records belong to both tables (m1_msisdns_int and medapp.msisdn_imei).
Finally, as an aside, when I select with quote a query does not work, but works without quotes!! Weird, isn't it??:
SQL> select * from unmatched_msisdns where msisdn = 6581000016;
MSISDN
------------------
6581000016
SQL> select * from unmatched_msisdns where msisdn = '6581000016';
no rows selected
Sorry for the long post, but hope it makes things clear.
Thanks for your time and effort.
However, I just noticed you said:
SQL> select * from unmatched_msisdns where msisdn = 6581000016;
MSISDN
------------------
6581000016
SQL> select * from unmatched_msisdns where msisdn = '6581000016';
no rows selected
I disagree with your analysis of the trailing whitespace. When you compare the string to a number, there is an implicit to_number applied to the string:
ops$xp10gr1\tkyte@ORA10GR1> create table t ( x varchar2(18) );
Table created.
ops$xp10gr1\tkyte@ORA10GR1> @plan "select * from t where x = 1"
0 rows deleted.
old 1: explain plan for &1
new 1: explain plan for select * from t where x = 1
Explained.
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 11 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("X")=1)
-----
- dynamic sampling used for this statement
17 rows selected.
<b>when you compare a string to a string, you do NOT have that happening:</b>
ops$xp10gr1\tkyte@ORA10GR1> @plan "select * from t where x = '1'"
2 rows deleted.
old 1: explain plan for &1
new 1: explain plan for select * from t where x = '1'
Explained.
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 11 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X"='1')
-----
- dynamic sampling used for this statement
17 rows selected.
So, I believe you DO have trailing whitespace on some of those. Look again.
select * from t
where x like '% '
or x like '%'||chr(9)
or x like '%' || chr(10)
or x like '%' || chr(13)
for example.
Tom,
Sorry for dragging this thread back to the basics a bit, but going back to your first reply
select count(*) from emp where empno not in (Select nvl(mgr,0) from emp)
appears to bring the correct result.
What I am trying to understand is when the "NOT EXISTS" works surely it should also say 'Hey, I see a Null value and I don't know if it maybe or may not be so I can ignore'. I mean why is NOT IN alone returning count(*)=0, comparitively.
Thanks
Tom I have the following which takes forever to run:
..
..
AND NOT EXISTS (SELECT null FROM tm tp WHERE tp.id = cd.id)
AND NOT EXISTS (SELECT null FROM t1 tp WHERE tp.id = cd.id)
can this be changed to:
AND NOT EXISTS (SELECT null FROM tm tp,t1 tp1 WHERE tp.id = cd.id OR tp1.id = cd.id )
Would this result in the same result?
why do you believe the not exists is the problem? (you are using the CBO right?)
No, it cannot be changed to what you have, that query is not the same as the prior two.
Using the cbo, you might:
and cd.id not in ( select id from tm where id is not null
union all
select id from t1 where id is not null )
Tom,
I've come across an application where the client checks if there are any rows meeting some criteria. Here is how it is done:
select count(*) from some_table where (some complex criteria) and rownum=1
/* also in some places rownum<2 is used */
Is is a safe way for existence checks?
Isn't it better to check existence like this:
select case when exists(select 1 from some_table where (some complex criteria)) then 1 else 0 end a from dual
Thank you
here is the best way to do this:
THIS SPACE INTENTIONALLY LEFT BLANK
I hate that check for existence, hardware vendors love them since it increases sales, but in software it is typically "not useful"
I take the approach of "rather than see if there is work to be done, I'll try to do work and at that point discover one of two things:
a) work is there to be done, lets do it
b) work is not there to be done, skip it
the rownum check works however.
Hello Tom,
I have the query below - service_agreement_tas_d and cm1_agreement_param_tas_d have about 2 million rows each, csm_offer_Param has about 1000 rows. Tables are not analyzed as we are doing a data conversion where the tables do not have indexes. The join in the sub query is on key fields (meaning agreement_no, param_name, offer_instance_id define a unique row in cm1_agreement_param):
1 select
2 mod(agreement_no, 100) agreement_key,
3 agreement_no, soc, 30000000 + rownum param_seq_no,
4 sa.effective_date, sa.expiration_date, sa.source_agr_no, sa.soc_seq_no,
5 param_instance_level, param_name, param_values,
6 row_number() over
7 (partition by agreement_no, param_name order by soc_seq_no) rn
8 from service_agreement_tas_d sa,
9 (select distinct soc_cd, param_name, populate_level, param_values,
10 mandatory_ind, param_instance_level from csm_offer_param ) op
11 where sa.soc = op.soc_cd
12 and not exists (select /*+ HASH_AJ */ NULL from cm1_agreement_param_tas_d p
13 where p.agreement_no = sa.agreement_no
14 and p.param_name = op.param_name
15 and ( (op.param_instance_level = 'O'
16 and p.offer_instance_id = sa.soc_seq_no)
17 or (op.param_instance_level = 'S'
18 and p.offer_instance_id is NULL)
19 )
20* )
SQL> /
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=124560227 Card=19271
2 Bytes=425700808)
1 0 WINDOW (SORT) (Cost=68275 Card=192712 Bytes=425700808)
2 1 COUNT
3 2 FILTER
4 3 HASH JOIN (Cost=9984 Card=192712 Bytes=425700808)
5 4 VIEW (Cost=4340 Card=14703 Bytes=31508529)
6 5 SORT (UNIQUE) (Cost=4340 Card=14703 Bytes=315967
7 6 TABLE ACCESS (FULL) OF 'CSM_OFFER_PARAM' (Cost
=6 Card=14703 Bytes=31596747)
8 4 PARTITION LIST (ALL)
9 8 TABLE ACCESS (FULL) OF 'SERVICE_AGREEMENT_TAS_D'
(Cost=1156 Card=3854235 Bytes=254379510)
10 3 PARTITION LIST (ALL)
11 10 TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D'
(Cost=646 Card=1 Bytes=155)
Is it possible for the last full table access in the sub-query to use a hash mechanism?
I tried using HASH_AJ in inner query
and not exists (select /*+ HASH_AJ */ NULL from cm1_agreement_param_tas_d p
where p.agreement_no = sa.agreement_no
But that did not work.
Hash related settings:
1* select name, value from v$parameter where name like '%hash%'
SQL> /
NAME VALUE
------------------------------ ------------------------------
hash_join_enabled TRUE
hash_area_size 50000000
Thanks,
Naresh
Version used is 9.2.0.6
I tried a variation for above query with a not in - still does not use hash_aj. I put nvl for each column in the inner select and also set always_anti_join=hash (I do not know if this parameter is relevant in the version we use - 9.2.0.6). What can be done to get a hash join working for this?
1 select
2 agreement_no, soc, 30000000 + rownum param_seq_no,
3 sa.effective_date, sa.expiration_date, sa.soc_seq_no,
4 param_instance_level, param_name, param_values,
5 row_number() over
6 (partition by agreement_no, param_name order by soc_seq_no) rn
7 from service_agreement_tas_d sa,
8 (select distinct soc_cd, param_name, populate_level, param_values,
9 mandatory_ind, param_instance_level from csm_offer_param ) op
10 where sa.soc = op.soc_cd
11 and (nvl(agreement_no,0), nvl(param_name,' '), nvl(soc_seq_no, -1)) not in
12 (select /*+ HASH_AJ */ nvl(agreement_no,0), nvl(param_name,' '),
13 nvl(offer_instance_id, -1)
14* from cm1_agreement_param_tas_d)
SQL> /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=226118412 Card=19271
2 Bytes=423195552)
1 0 WINDOW (SORT) (Cost=67236 Card=192712 Bytes=423195552)
2 1 COUNT
3 2 FILTER
4 3 HASH JOIN (Cost=9276 Card=192712 Bytes=423195552)
5 4 VIEW (Cost=4340 Card=14703 Bytes=31508529)
6 5 SORT (UNIQUE) (Cost=4340 Card=14703 Bytes=315967
7 6 TABLE ACCESS (FULL) OF 'CSM_OFFER_PARAM' (Cost
=6 Card=14703 Bytes=31596747)
8 4 PARTITION LIST (ALL)
9 8 TABLE ACCESS (FULL) OF 'SERVICE_AGREEMENT_TAS_D'
(Cost=1156 Card=3854235 Bytes=204274455)
10 3 PARTITION LIST (ALL)
11 10 TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D'
(Cost=1173 Card=489 Bytes=75795)
The Card=489 for 'CM1_AGREEMENT_PARAM_TAS_D is different from the one below:
SQL> select count(*) from CM1_AGREEMENT_PARAM_TAS_D;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE (Cost=1173 Card=1)
1 0 SORT (AGGREGATE)
2 1 PARTITION LIST (ALL)
3 2 TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D' (Co
st=1173 Card=3910431)
And this is what the user_tables shows
SQL> SQL> select lasT_analyzed, num_rows, USER_STATS from user_tables where table_name in
2 ('SERVICE_AGREEMENT_TAS_D', 'CM1_AGREEMENT_PARAM_TAS_D');
LAST_ANAL NUM_ROWS USE
--------- ---------- ---
Thanks,
Naresh
Please look into the part of query below. I need to optimize the query. This looks very big and not optimized. Please advice.
AND EXISTS ((SELECT 1
FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus > 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus < 90
AND sr.supplysource <> 1)
)
3 where 1=1
4 and exists ( (select 1 from dual)
5 or exists (select 1 from dual)
6 or exists (select 1 from dual)
7 )
or exists (select 1 from dual)
ERROR at line 5:
ORA-00907: missing right parenthesis
these two:
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus > 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus < 90
AND sr.supplysource <> 1)
can obviously be rolled into one - but it would require an OR which wouldn't be much different from what you have if an index on qty,status,supplysource were being used.
basically "not sufficient data to really comment"
exists is not inheritly "bad"
Hi Tom, I am here writting the whole code for the above query I have to make the changes in a Cursor.
The change description is :
In the c_recship cursor, add an edit to first of all check to see if the t.order_id, t.item and t.fromloc exist on the springs.si_jde_schedrcpts_work sr table, joining t.order_id to sr.order_id, t.item to sr.item and t.fromloc to sr.loc, where the sr. order_status < 90 and the sr.supplysource = 1. If found, create the recship. If it exists and the order_status > 90 or the t.order_id, t.item and t.fromloc does not exist at all as a supplysource = 1 row on the si_jde_schedrcpts_work table, then check the stsc.inventory table joining t.item to inventory.item, t.fromloc to inventory.loc and create recship if inventory.qty = > t.stqty.
Please suggest for the changes done by me.
CURSOR c_recship IS
SELECT t.item item, t.toloc dest, t.fromloc source, s.transmode
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') availtoshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needshipdate
, 4 type, 0 actionallowedsw, 1 firmsw , ROUND(t.stqty,2) qty, 0 orderid
, t.item primaryitem, 1 sys_info, t.order_id
FROM
(SELECT a.item, a.duedate
, b.fromloc, a.toloc, stqty, a.order_number order_id
FROM (SELECT DISTINCT item, TRUNC(duedate) duedate
, toloc, order_number
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'OT'
) a
, (SELECT item, TRUNC(duedate) duedate, fromloc
, rel_ot_ord_nbr, SUM(NVL(net_st_qty,0)) stqty
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'ST'
GROUP BY item, TRUNC(duedate), fromloc, rel_ot_ord_nbr
HAVING SUM(NVL(net_st_qty,0)) <> 0
) b
WHERE a.order_number = b.rel_ot_ord_nbr
AND a.item = b.item
) t, stsc.sourcing s
WHERE NOT EXISTS (SELECT 1 FROM stsc.loc
WHERE scen = 0 AND loc = t.toloc
AND NVL(TRIM(si_loc_type),' ') = 'DC'
)
--Change by Soni
AND EXISTS((SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id AND t.item = sr.item
AND t.fromloc = sr.fromloc AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS(SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND inventory.qty => t.stqty AND sr.order_staus > 90)
OR EXISTS(SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND inventory.qty => t.stqty AND sr.supplysource=1))
--Change by Soni till here
AND s.transmode=springs.si_transmode_lookup(t.fromloc
,t.toloc
,' ',t.item
)
AND s.item = t.item
AND s.source = t.fromloc
AND s.dest = t.toloc
AND s.scen = 0;
CURSOR c_recship IS
SELECT t.item item, t.toloc dest, t.fromloc source, s.transmode
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') availtoshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needshipdate
, 4 type, 0 actionallowedsw, 1 firmsw , ROUND(t.stqty,2) qty, 0 orderid , t.item primaryitem, 1 sys_info, t.order_id
FROM
(SELECT a.item, a.duedate
, b.fromloc, a.toloc, stqty, a.order_number order_id
FROM (SELECT DISTINCT item, TRUNC(duedate) duedate
, toloc, order_number
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'OT'
) a
, (SELECT item, TRUNC(duedate) duedate, fromloc
, rel_ot_ord_nbr, SUM(NVL(net_st_qty,0)) stqty
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'ST'
GROUP BY item, TRUNC(duedate), fromloc, rel_ot_ord_nbr
HAVING SUM(NVL(net_st_qty,0)) <> 0
) b
WHERE a.order_number = b.rel_ot_ord_nbr
AND a.item = b.item
) t, stsc.sourcing s
WHERE NOT EXISTS (SELECT 1 FROM stsc.loc
WHERE scen = 0 AND loc = t.toloc
AND NVL(TRIM(si_loc_type),' ') = 'DC'
)
--Changes by Soni from here
AND (EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work
sr WHERE t.order_id = sr.t.order_id AND t.item = sr.item AND t.fromloc = sr.fromloc AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
)
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)
)
--Changes by Soni till here
AND s.transmode=springs.si_transmode_lookup(t.fromloc
,t.toloc ,' ',t.item
)
AND s.item = t.item AND s.source = t.fromloc
AND s.dest = t.toloc AND s.scen = 0;
**********
The changes done by me are between commnets. Please tell me if this correct. I have tested this. Its not giving any compilation error. Please check the part modified by me.
The description of the change is in the question asked before this. I want to put them not in three select statement but atleast in only two.
Thanks.
Tom,
Please keep in mind that main_report_tbl contains 4261053 records
select yard_area , area_name , chassis_lt_description , visit , gate_id , reference_id ,
ssl_user_code , container , lane_out_by , cont_ck , container_lht_description ,
gross_weight , seals , temp_mins , temp_maxs , location , loc_type , booking ,
trucker_description , driver , chassis , genset , hazardous , guard_out_date , voided_date ,
visit_voided_date , haz , vessel_voy , tir , ckr , sec , action , eir_id ,
foreign , sealink_code , lane_in_date , ssl_user_description , port_of_destination_desc ,
port_of_discharge_description , reefer , net_elapsed_time , net_time , rpm_time , clerical_time ,
clerical_services_time , remarks , damage_description , action ,
chassis , loc_type
from main_report_tbl
where guard_out_date is not null
and voided_date is null
and visit_voided_date is null
and substr ( action , 2 , 1 ) not in ( 'C' , 'G' )
and lane_in_date between to_date( '13-JUN-2005 00:00:00', 'dd-mon-yyyy hh24:mi:ss' )
and to_date( '13-JUN-2005 23:59:59', 'dd-mon-yyyy hh24:mi:ss' )
Tom..
i have some trouble with a sql...
insert into table T
select /*+ parallel(x,4) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (2,3)
group by a
the above sql runs fine..in about 10 mins...
But i have a limitation that i have to write the above sql
like this
insert into table T
select /*+ parallel(x,4) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y)
group by a..
table Y has 2 rows (2 and 3)
Now this SQL takes 40 mins to complete...any ideas improve this performance...please let me know if i am being vague...Thanks
Both the tables are analyzed and the plans are as shown below ...table X has around 7.4 million rows and table y has 2 rows.
The column b in table y does not have null values but its nullable....
the output of the select statement will be arond 170,000 rows....
SQL1:
-----
The below sql takes 25 mins to comeback
select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y)
group by a..
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 162987
SORT GROUP BY 73 K 44 M 16217
FILTER
TABLE ACCESS FULL ODS.X 369 K 224 M 10175 :Q1638707000 P->S QC (RANDOM)
TABLE ACCESS FULL ODS.Y 1 4 2
SQL2:
----
The below SQL takes 6 mins to comeback
select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (2,3)
group by a..
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 96849
SORT GROUP BY 73 K 44 M 96849 :Q1638710001 P->S QC (RANDOM)
SORT GROUP BY 73 K 44 M 96849 :Q1638710000 P->P HASH
TABLE ACCESS FULL ODS.X 7 M 4G 10175 :Q1638710000 PCWP
Tom..i tried what you said..still no change...
select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y where b is not null)
group by a..
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 162987
SORT GROUP BY 73 K 44 M 16217
FILTER
TABLE ACCESS FULL ODS.X 369 K 224 M 10175 :Q1640640000 P->S QC (RANDOM)
TABLE ACCESS FULL ODS.Y 1 4 2
I have a query like
select request_id from request
where request_id not in ( select ref from refer_tab)
and reuest_id not in ( select sed from sed_tab)
in whose explian plan ( select ref from refer_tab) query was not using index.
when i write it like .....
select request_id from request
where exists( select ref from refer_tab)
and reuest_id not in ( select sed from sed_tab)
it gives me the same output and also costs is some 10000 times low and query executes fast.
I can't understand the reason that exists and not in gives same result.
ops$tkyte@ORA10GR2> select *
2 from t1
3 where NOT EXISTS (select null from t2 where t2.x = t1.x)
4 and x not in (select x from t3)
----------
is ref in refer_tab NULLABLE.
if it is, add the predicate "where ref is not null" to the not in so that NOT IN and NOT EXISTS (I'm assuming your second example, which is totally "syntax free" was really a not exists and not exists...) all of a sudden become semantically "the same"
Having worked with Oracle since v.6 I have never really gotten comfortable with the CBO.
I find that often (always?) I would like my queries to behave predictably and consistently rather than potentially (rarely) faster.
Having said that, I realize that I am being backwards about this, and thus I have set out to explore your brilliant site in search of the one thing that will win me over.
This thread is at the core of my issue with the CBO: I have a query that is clearly well suited for a hash anti join, and it runs in about 2 mins using the "poor man's hash anti join".
Both tables involved are indexed and analyzed, so I set out to rewrite my query to use NOT IN, hoping that the CBO would do the rest for me. The result: Indexes used, decrease of perfomance by a factor 3. NOT EXISTS has the same problem. No NULL columns involved.
Normally I would just shake my head and use my hard-coded anti join, but I have a little bit of time on my hands to experiment, so ...
Could you possibly list a few reasons why the CBO would not pick the hash anti join for my query (looking for new records from mta_tab_2 to add to mta_tab_1)?
mta_tab_1 has 2819134 records (acct_id,start_dt unique)
mta_tab_2 has 1844852 records
SELECT x.acct_id
,x.start_dt
FROM mta_tab_2 x
,(SELECT a.acct_id
,a.start_dt
FROM (SELECT acct_id
,start_dt
,end_dt
FROM mta_tab_1
) a
,(SELECT acct_id
,MAX(start_dt) start_dt
FROM mta_tab_1
GROUP BY acct_id
) b
WHERE b.acct_id = a.acct_id
AND :dt BETWEEN a.start_dt AND a.end_dt
AND a.start_dt = b.start_dt
) y
WHERE :dt BETWEEN x.start_dt AND x.end_dt
AND y.start_dt IS NULL
AND y.acct_id(+) = x.acct_id;
get used to the cbo - it is a fact, in 10g, it is just a plain and simple fact of life.
I always start with comparing the "guess" with "reality"
compare an autotrace traceonly explain
with
a tkprof
see where the optimizer gets estimated cardinalities wrong and ask yourself "is there a logical reason why they are wrong". From that we can usually apply corrective actions.
Looks like you really want to learn about analytics!!!!!
(SELECT a.acct_id
,a.start_dt
FROM (SELECT acct_id
,start_dt
,end_dt
FROM mta_tab_1
) a
,(SELECT acct_id
,MAX(start_dt) start_dt
FROM mta_tab_1
GROUP BY acct_id
) b
WHERE b.acct_id = a.acct_id
AND :dt BETWEEN a.start_dt AND a.end_dt
AND a.start_dt = b.start_dt
) y
could be written in a single pass as:
select acct_id, start_dt
from (
select acct_id, start_dt, end_dt,
row_number() over (partition by acct_id order by start_dt DESC) rn
from mta_tab_1
)
where rn = 1
and start_date <= :dt
and end_dt >= :dt
You just want to get the "max row by acct_id, where the max row is designated by start_dt"
And use that in a NOT IN
Seems this is the query:
select *
from mta_tab_2
where start_dt <= :dt
and end_dt >= :dt
and (acct_id) NOT IN
(
select acct_id
from (
select acct_id, start_dt, end_dt,
row_number() over (partition by acct_id order by start_dt DESC) rn
from mta_tab_1
)
where rn = 1
and start_date <= :dt
and end_dt >= :dt
and acct_id IS NOT NULL
)
Thanks for the incredible response time!
I am onto the analytics. I tried the approach out with similar queries without any significant performance improvement, and in the end decided to modify the code as little as possible (I inherited the code).
So you are saying that by comparing the autotrace and the tkprof I should be able to pinpoint where the CBO goes wrong? (The tricky part is getting access to the trace file)
SELECT uc.constraint_name,
uc.constraint_type,
uic.*
FROM user_indexes uix,
user_ind_columns uic,
(SELECT index_name,
uc.constraint_type,
uc.table_name,
uc.constraint_name
FROM user_constraints uc
WHERE uc.constraint_type = 'P') uc
WHERE uix.index_name = uic.index_name
AND uix.table_name = uic.table_name
AND uix.index_name = uc.index_name(+)
AND uc.constraint_type IS NULL
order by uic.column_position
any better idea ? can JDBC do it ?
note that the query must run from a user schema with only connect,resource prives.
Hi Tom,
What about the scalability of each of these ? I have a situation where in the data in the inner query resultset will grow almost exponentially over time whereas the outer query data will almost remain constant (comparitively speaking). Currently, the plans do not look very different from each other (I am using NOT IN & NOT EXISTS).
Thanks a lot!
Hi Tom,
Any better way to rewrite the query given below?
SELECT ol.line_id
,ol.created_by
,ol.last_updated_by
,ad.document_id
,ad.pk1_value
,ad.seq_num
FROM oe_order_lines_all ol
,fnd_attached_documents ad
,(SELECT *
FROM so_lines_all
) sl
WHERE ad.entity_name = 'OE_ORDER_LINES'
AND ol.split_from_line_id = ad.pk1_value
AND sl.header_id = ol.Header_id
AND sl.line_id = ol.split_from_line_id
AND NOT EXISTS
( SELECT 1
FROM fnd_attached_documents ad2
WHERE ol.line_id = ad2.pk1_value
AND ad2.entity_name = 'OE_ORDER_LINES'
Please do reply.
A null value was added to a column, and suddenly no data was returned from queries with a NOT IN clause. We figured out why, but then figured that other queries using things like join conditions and IN to compare the same values were probably returning different results, but that ended up not being the case.
If NULL really means "gee, I don't know. (litterally, null means Unknown)" per October 02, 2002 above, and "US" matches NULL in a NOT IN clause, why don't those two values also match in a join condition, an IN clause, a NOT EXISTS or a MINUS?
It almost seems like NOT IN is very unique in its treatment of NULLs. What are we missing? Thanks.
because the sql standard says so - it is the way it was defined to work.
but it makes sense.
where x not in ( Q )
and Q contains nulls - is X not in there? We don't know, there is no way to compare X to nulls...
where x in ( Q )
well, regardless of what Q has as far as nulls go - if X=5 and Q has 5, X is in Q.
Back to the 3 easy examples:
select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );
select count(*) from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id );
select count(*) from t1, t2 rbo
where t1.object_id = rbo.object_id(+) and rbo.object_id IS NULL;
On my system (10.2.0.1), all these run with the same plan, and thus the same amount of LIO when using the CBO:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 25 (24)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | HASH JOIN ANTI | | 50 | 400 | 25 (24)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 5000 | 20000 | 17 (12)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| T2_IDX | 4950 | 19800 | 5 (20)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80 consistent gets
0 physical reads
0 redo size
227 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Then Duke Ganote claimed his approach using MINUS to be better, even using CBO. On my system its not:
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 52 (35)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | HASH JOIN | | 5000 | 85000 | 52 (35)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 5000 | 20000 | 17 (12)| 00:00:01 |
| 4 | VIEW | | 5000 | 65000 | 32 (41)| 00:00:01 |
| 5 | MINUS | | | | | |
| 6 | SORT UNIQUE | | 5000 | 20000 | | |
| 7 | TABLE ACCESS FULL | T1 | 5000 | 20000 | 17 (12)| 00:00:01 |
| 8 | SORT UNIQUE | | 4950 | 19800 | | |
| 9 | INDEX FAST FULL SCAN| T2_IDX | 4950 | 19800 | 5 (20)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SET_OP"."OBJECT_ID"="RBO"."OBJECT_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
144 consistent gets
0 physical reads
0 redo size
226 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed
The execution plan is much more complex, T1 is FTS'ed twice and his query needs to do 2 sorts to get to the result.
I guess this could be for a number of reasons, like:
* DBMS version
* init parameters
* difference in the way statistics where gathered
Any comments on this?
Jens
Hi Tom
I tried to solve a simple problem with the IN /EXISTS functions but I was not successfull yet.
Let's assume 2 tables T1 and T2:
T1 has attributes workerUnit, valid_from, valid_to
T2 has attributes workerUnit, year (not date but string)
T1 identifies service providers for a certain period and T2 identifies if the service providers had a budget allocated for a specific year.
What I'm trying to do is to generate a view (parametrized or not) which would allow me to perform the following query "Give me the list of valid workerUnits for a given date and also if a budget was allocated to it for that given year.
Sketch for a result set input date 24.04.2007
workerUnit | valid_from | valid_to | hasBudget
945 01.01.1988 31.12.9999 1
888 03.04.2007 31.12.9999 1
874 08.05.2007 31.03.2010 0
where 0 and 1 in the 'hasBudget' column identify a workerUnit which got budget for the year. Let's exercise it for the first row for instance
T1
workerUnit | valid_from | valid_to
945 01.01.1988 31.12.9999
222 01.01.1972 24.03.2006
workerUnit | Year
945 2004
945 2005
945 2007
Thus if the input date was 22.03.2006(1) the record would have a 0 in the hasBudget. As another example this input date(1) would also retrieve the workerUnit 222 which has a valid_to of 24.03.2006 which is older than (1) but no budget has been allocated (record missing in T2) thus hasBudget 0
the record 222 looks like (still (1) has input date)
workerUnit | valid_from | valid_to | hasBudget
222 01.01.1972 24.03.2006 0
...
Thanks your help ..
Maybe try using a join with a case
Select workerUnit, valid_from , valid_to,
case when to_date('YYYY',year) between valid_from and valid_to then 1 else 0 end
from t1 outer join t2 on t1.workerUnit = t1.workerUnit
Tom,
I want to select all the rows from a table where one column values are not in another table.
CREATE TABLE T1
(
A1 INTEGER,
ANOTHER_COL NUMBER DEFAULT 1,
ANOTHER_COL1 INTEGER,
ANOTHER_COL2 NUMBER(5)
)
/
Insert into T1
(A1, ANOTHER_COL, ANOTHER_COL1, ANOTHER_COL2)
Values
(1, 1, 1, 1);
Insert into T1
(A1, ANOTHER_COL, ANOTHER_COL1, ANOTHER_COL2)
Values
(23, 2, 3, 34);
COMMIT;
create table t3 as
select * from t1;
insert into t1 values (12,3,4,12);
commit;
I want to select all the rows in t1 where another_col1 values are not in t3 table.
select * from t1 where another_col1 not in (select another_col1 from t3) ;
How do we frame the query using Not exists?
Also, If I insert another set of values in T3:
insert into t3 values (13,4,'',23);
Why can't I get any rows when I have some values in T3 with null values ?
I can understand comparing with nulls is a problem. But it should return all rows whose values are not null, avoiding that null row, shouldn't it?
It's always been a confusion for me..can you clarify?
Thanks,
where column in (set that contains non-null and some nulls)
where, we can say that column is either in that set or not (we only need look at the non-nulls to see if that value is in that set)
where column NOT IN (same set)
well, the presence of nulls - as defined by the SQL language itself, a rule if you will - makes it impossible to see if that value is "not in" that set - we cannot tell if column not in (null) is true or false - it evaluates UNKNOWN.
so, you would in general where column not in (select * from (that set) where that_column is not null )
if that is what you want....
As for the not exists, that is pretty trivial - give that bit of work to yourself as an exercise....
Hi Tom
I have this query,it retuns 12 rows,runs for 90 minutes.
trying to tune it , but could not,spent 5 days but to no avail.
Parsing user id: 65 (APPS)
********************************************************************************
SELECT /*+ ordered use_hash(mtp) push_subq */ ooha.org_id "OPERATING UNIT ID"
FROM apps.oe_transaction_types_all ott,
apps.oe_transaction_types_tl ottt_l,
apps.oe_order_lines_all oola,
apps.mtl_parameters mtp,
apps.oe_order_headers_all ooha,
apps.hz_cust_accounts hca
WHERE ooha.header_id = oola.header_id AND
ooha.org_id = oola.org_id AND
ooha.order_source_id = 1263 AND
ott.transaction_type_id = ottt_l.transaction_type_id AND
ott.org_id = ooha.org_id AND
oola.line_type_id = ottt_l.transaction_type_id AND
oola.line_type_id = ott.transaction_type_id AND
mtp.organization_id = oola.ship_from_org_id AND
hca.cust_account_id = ooha.sold_to_org_id AND
(ottt_l.NAME LIKE 'GPO%FE_CONSIGNMENT_LINE' OR
ottt_l.NAME LIKE 'GPO%FE_SHIPMENT_LINE') AND
oola.flow_status_code IN ('SHIPPED', 'CLOSED') AND
hca.attribute9 IS NOT NULL
and NOT EXISTS
( SELECT /*+ NO_UNNEST */ 1
FROM apps.mtl_material_transactions mmt ,
apps.mtl_transaction_lot_numbers mtnl ,
apps.mtl_parameters mp ,
apps.mtl_item_locations mil
WHERE mmt.inventory_item_id = oola.inventory_item_id AND
mmt.transaction_type_id IN (420, 40, 41, 42) AND
mmt.transaction_quantity > 0 AND
mtnl.lot_number = ooha.order_number AND
mtnl.transaction_id = mmt.transaction_id AND
mmt.transaction_date > (TRUNC(SYSDATE) -90) AND
mmt.organization_id = mp.organization_id AND
mp.attribute7 = 'PARTS' AND
mp.attribute11 ='SERVICE' AND
mil.inventory_location_id = hca.attribute9 AND
mil.organization_id = mp.organization_id
) AND
NVL(oola.actual_shipment_date, oola.schedule_ship_date) > SYSDATE - 90
/
its tkporf output is
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 690.28 4899.16 1524771 23494672 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 690.28 4899.16 1524771 23494672 0 12
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 65 (APPS)
Rows Row Source Operation
------- ---------------------------------------------------
12 NESTED LOOPS (cr=23494672 r=1524771 w=0 time=4640213970 us)
157696 NESTED LOOPS (cr=5730245 r=1402270 w=0 time=3699650845 us)
157697 HASH JOIN (cr=5257151 r=1379549 w=0 time=3576187058 us)
157697 TABLE ACCESS BY INDEX ROWID OE_ORDER_LINES_ALL (cr=5257149 r=1379549 w=0 time=3575115599 us)
2789657 NESTED LOOPS (cr=13232 r=12982 w=0 time=90323177 us)
76 HASH JOIN (cr=277 r=82 w=0 time=175347 us)
4597 TABLE ACCESS FULL OE_TRANSACTION_TYPES_ALL (cr=157 r=0 w=0 time=6683 us)
76 TABLE ACCESS FULL OE_TRANSACTION_TYPES_TL (cr=120 r=82 w=0 time=157808 us)
2789580 INDEX RANGE SCAN GEMS_ONT_ORDER_LINES_N99 (cr=12955 r=12900 w=0 time=86301122 us)(object id 1863725)
377 INDEX FULL SCAN MTL_PARAMETERS_U1 (cr=2 r=0 w=0 time=367 us)(object id 9847)
157696 TABLE ACCESS BY INDEX ROWID OE_ORDER_HEADERS_ALL (cr=473094 r=22721 w=0 time=123022055 us)
157697 INDEX UNIQUE SCAN OE_ORDER_HEADERS_U1 (cr=315397 r=1363 w=0 time=11565697 us)(object id 688729)
12 TABLE ACCESS BY INDEX ROWID HZ_CUST_ACCOUNTS (cr=17764427 r=122501 w=0 time=1114177945 us)
157696 INDEX UNIQUE SCAN HZ_CUST_ACCOUNTS_U1 (cr=315395 r=274 w=0 time=3770326 us)(object id 715003)
157621 TABLE ACCESS BY INDEX ROWID MTL_TRANSACTION_LOT_NUMBERS (cr=17291336 r=121205 w=0 time=1096619120 us)
6897874 NESTED LOOPS (cr=14562731 r=114663 w=0 time=1018837983 us)
3448931 NESTED LOOPS (cr=7488435 r=112057 w=0 time=936487677 us)
157633 NESTED LOOPS (cr=788165 r=581 w=0 time=10022817 us)
157633 INDEX RANGE SCAN MTL_ITEM_LOCATIONS_U1 (cr=472899 r=581 w=0 time=7984520 us)(object id 9761)
157633 TABLE ACCESS BY INDEX ROWID MTL_PARAMETERS (cr=315266 r=0 w=0 time=1667395 us)
157633 INDEX UNIQUE SCAN MTL_PARAMETERS_U1 (cr=157633 r=0 w=0 time=770942 us)(object id 9847)
3448931 TABLE ACCESS BY INDEX ROWID MTL_MATERIAL_TRANSACTIONS (cr=6700270 r=111476 w=0 time=924106849 us)
8273225 INDEX RANGE SCAN MTL_MATERIAL_TRANSACTIONS_N1 (cr=695175 r=32781 w=0 time=303092226 us)(object id 6205511)
3448931 INDEX RANGE SCAN MTL_TRANSACTION_LOT_NUMBERS_N1 (cr=7074296 r=2606 w=0 time=74416944 us)(object id 9976)
is there anything which can be done to improve its response time.
is push_subq is the correct choice here.
how does not exists work - please explain using the above query and the tkprof output..
what is wrong here.
thanks
Tom, there is an explanation I usually give for developers who ask me why "1 in (1,2,null)" returns TRUE but "3 not in (1,2,null)" returns FALSE. Please tell me what you think:
The IN operation is a series of *equality* comparisons that is ultimately resolved by an OR:
1 = 1 ? TRUE
1 = 2 ? FALSE
1 = NULL ? FALSE
TRUE OR FALSE OR FALSE = TRUE
The NOT IN operation is a series of *inequality* comparisons that is ultimately resolved by an AND:
3 != 1 ? TRUE
3 != 2 ? TRUE
3 != NULL ? FALSE
TRUE AND TRUE AND FALSE = FALSE
with the minor exception that 3!=NULL is not false, it is not true, it is UNKNOWN
ops$tkyte%ORA10GR2> select * from dual where 3 <> NULL;
no rows selected
ops$tkyte%ORA10GR2> select * from dual where NOT(3 <> NULL);
no rows selected
Thanks for this wonderful thread. i read:
<Quote>
Because NULL means -- gee, I don't know. (litterally, null means Unknown)
So, the predicate
where x not in ( NULL )
evaluates to neither TRUE, nor FALSE
</qoute>
but when using in ( subquery), if subquery contains null, it will select rows ,why is that?
SQL> desc cc
SQL> desc dd
SQL> insert into dd values(null);
1 row created.
SQL> select * from cc where a in ( select * from dd);
----------
1 row selected.
because the rules say "where 5 in ( 5, NULL )" is true because 5 is in (5).
But "where 5 NOT IN (1, null)" is unknown - it is unknown if "5 is not in (null)"
ops$tkyte%ORA11GR1> select * from dual where 5 in (5,null);
ops$tkyte%ORA11GR1> select * from dual where 5 NOT in (1,null);
no rows selected
Hello Tom
Thanks for clearing my doubts about the not in. I have following query , could you please help how conceptually evaluate this query?
SQL> create table t1 ( x int);
Table created.
SQL> create table t2 ( x int);
Table created.
SQL> select * from t1;
----------
3 rows selected.
SQL> select * from t2;
----------
3 rows selected.
SQL> select * from t1,t2 where exists ( select null from t2 where t1.x=t2.x);
X X
---------- ----------
1 1
1 3
3 rows selected.
SQL> select * from t1,t2 where exists ( select null from t1 where t1.x=t2.x);
X X
---------- ----------
1 1
2 1
3 rows selected.
I am not sure how those two query are executed, Most appreciated if you can give an explaination.
Thanks~
... select * from t1,t2 ...
join every row in t1 with every row in t2.
you have
1
2
and
1
3
that results in
t1.x,t2.x
1,1
1,3
2,1
2,3
then - apply the filter:
where exists ( select null from t2 where t1.x=t2.x);
to each row.
You can always envision:
select *
from <table list>
where <predicate>
as a cartesian join of all tables in the table list (every row in table t1 joined with every row in t2, joined with every row in t3 and so on - so if you had 10 rows in each table - you'd end up with 10x10x10 rows in the result set) AND THEN run the predicate, the filter against them.
Hi Tom
Thanks for your explaination about the sql execution. one more question i am curious to know is:
suppose i issue sql : select * from table1 where a in (select a from table2),suppose hash semi-join is used , does oracle have to finish the hash join for all rows before feeding back the resultset to the client or during the execution?
Or is there a universal rule for oracle to feed back the result set, feed back during the sql execution or after the execution?
with a hash (semi or not) join, the plan is typically like this:
hash join
full scan t1
full scan t2
One of the tables will be full scanned and hashed (hopefully into memory, but not necessarily)
Once that step has taken place, we can start full scanning the other table and probing the hash table for the matches
and start returning them.
So, you will full scan and hash one table BEFORE you get the first row back.
So, does Oracle have to finish the hash join for all rows? - No
sounds good to me - but then I have no information of any use, so I get to make it up...
there are lots of indexes, and the delete hits them all.
there is no index on entryid and you are removing 20,000 out of 2,000,000 records
there is no index on C536870913 and you are finding 1 record out of 100,000
things like that - I'll just assume they are true - so therefore, 8 and 6 seconds is really fast.
no tables
no indexing scheme
no plans
no tkprofs
in short - nothing to really look at.
what the heck is a rooms table?
anyway, if there are any rows in rooms such that "30 > roomsize" then the subquery "(select 1 from rooms
where 30 > roomsize)" obviously returns a record - doesn't it (we started with the supposition that there are some rows in rooms such that 30>roomsize).
Now, if that row exists - then "select x from t where exists (that subquery we know returns AT LEAST one row)" - would return every row from T - since the subquery by our definition here returns at least one row.
Hi Tom,
Oracle Database Version 11G
We had a bug in our application where in the original query was like this..
select bank_branch_name, bank_name, bank_branch_id
from xx_bank_branches
WHERE BANK_BRANCH_ID IN
(SELECT BANK_BRANCH_ID
from xx_bank_accounts a
WHERE a.ACCOUNT_TYPE <> 'EXTERNAL' )
order by bank_branch_name, bank_name ;
I thought of re-writing the query as
select bank_branch_name, bank_name, bank_branch_id
from xx_bank_branches b
where EXISTS ( SELECT 'x' FROM
xx_BANK_ACCOUNTS A
WHERE A.ACCOUNT_TYPE <> 'EXTERNAL'
and a.bank_branch_id = b.bank_branch_id )
order by bank_branch_name, bank_name;
However when I compared the explain plan.. the plan generated by both the queries was exactly the same..
My question is did CBO treat both the queries as similar ?
/* Formatted on 3/15/2011 10:37:11 AM (QP5 v5.126.903.23003) */
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, --gcc.segment4,--RICO
SUBSTR(ap_main.description, 1, 40) journal_description
,(CASE WHEN gcc.segment3 IN ('1001', '1002') THEN SUBSTR(ap_main.doc_description, 1, 16) ELSE NULL END) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,--gl.gl_period_statuses gps ,
( SELECT ap.JE_HEADER_ID, ap.je_line_num, MAX(ap.description) description, MAX(ap.doc_description) doc_description
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, NULL doc_description
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.gl_je_headers gjh, apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND AEL.SOURCE_TABLE = 'AP_INVOICE_DISTRIBUTIONS'
AND AEL.SOURCE_ID = AID.INVOICE_DISTRIBUTION_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.gl_transfer_flag = 'Y'
AND aeh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
--select GJH.JE_HEADER_ID,gjl.je_line_num ,gjh.period_name ,aid.description
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, b.description doc_description
FROM apps.ap_ae_headers_all aeh
,apps.ap_ae_lines_all ael
,apps.gl_import_references imp
,apps.ap_invoice_payments_all pay
,apps.ap_invoices_all inv
,apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICE_PAYMENTS'
AND ael.source_id = pay.invoice_payment_id
AND pay.invoice_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND pay.invoice_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AP'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, NULL doc_description
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoices_all inv, apps.gl_je_headers gjh, apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICES'
AND ael.source_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, gjl.description, b.description doc_description
FROM apps.ap_ae_headers_all aeh
,apps.ap_ae_lines_all ael
,apps.gl_import_references imp
,apps.AP_CHECKS_ALL AC
,apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
,apps.ap_invoice_payments_all pay
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_CHECKS'
AND ael.source_id = AC.CHECK_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND ac.check_id = pay.check_id
AND pay.invoice_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AP'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM) ap
GROUP BY ap.JE_HEADER_ID, ap.je_line_num) ap_main
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjl.code_combination_id = gcc.code_combination_id
AND ap_main.je_header_id = gjh.je_header_id
AND ap_main.je_line_num = gjl.je_line_num
UNION ALL
--ar
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 ccount_code
, --gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description
,(CASE WHEN gcc.segment3 IN ('1001', '1002') THEN SUBSTR(ar_main.doc_description, 1, 16) ELSE NULL END) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,( SELECT ar.JE_HEADER_ID, ar.je_line_num, MAX(ar.description) doc_description
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, b.description
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,-- ar_cash_receipts_all acr,
apps.AR_CASH_RECEIPTS_ALL ACR
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE gjh.je_source = 'Receivables'
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
-- NTP 4-Mar-11
--AND SUBSTR (gir.reference_2, 1, INSTR (gir.reference_2, 'C') - 1) =
-- TO_CHAR (acr.CASH_RECEIPT_ID)
AND TO_NUMBER(SUBSTR(gir.reference_2, 1, INSTR(gir.reference_2, 'C') - 1)) = acr.CASH_RECEIPT_ID
AND ACR.CASH_RECEIPT_ID = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, b.description
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,apps.ar_receivable_applications_all ara
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE --gjh.je_source= 'Receivables'
gjh .je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
AND NVL(gir.reference_2, 0) = TO_CHAR(ara.applied_customer_trx_id)
AND ara.applied_customer_trx_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM) ar
GROUP BY ar.JE_HEADER_ID, ar.je_line_num) ar_main
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjl.code_combination_id = gcc.code_combination_id
AND ar_main.je_header_id = gjh.je_header_id
AND ar_main.je_line_num = gjl.je_line_num
UNION ALL
--gl
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, ---gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description
,DECODE(gcc.segment3, '1001', SUBSTR(b.description, 1, 16), '1002', SUBSTR(b.description, 1, 16), NULL) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.global_attribute1 = B.FLEX_VALUE
AND gjl.code_combination_id = gcc.code_combination_id
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION ALL
SELECT ---gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, ---gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description, NULL document_1, NULL document_2
,SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh, apps.gl_je_lines gjl, apps.gl_code_combinations gcc
-- gl.gl_period_statuses gps
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.code_combination_id = gcc.code_combination_id
AND(GJH.JE_HEADER_ID, GJL.JE_LINE_NUM) NOT IN
(SELECT A.JE_HEADER_ID, A.je_line_num
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael
,apps.gl_import_references imp, apps.gl_je_headers gjh
,apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND AEL.SOURCE_TABLE = 'AP_INVOICE_DISTRIBUTIONS'
AND AEL.SOURCE_ID = AID.INVOICE_DISTRIBUTION_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoice_payments_all pay, apps.ap_invoices_all inv, apps.gl_je_headers gjh
,apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICE_PAYMENTS'
AND ael.source_id = pay.invoice_payment_id
AND pay.invoice_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoices_all inv, apps.gl_je_headers gjh, apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICES'
AND ael.source_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.AP_CHECKS_ALL AC, apps.gl_je_headers gjh, apps.gl_je_lines gjl
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_CHECKS'
AND ael.source_id = AC.CHECK_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--ar
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,-- ar_cash_receipts_all acr,
apps.AR_CASH_RECEIPTS_ALL ACR
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_source = 'Receivables'
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
AND TO_NUMBER(SUBSTR(gir.reference_2, 1, INSTR(gir.reference_2, 'C') - 1)) = acr.CASH_RECEIPT_ID
AND ACR.CASH_RECEIPT_ID = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,apps.ar_receivable_applications_all ara
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE --gjh.je_source= 'Receivables'
gjh .je_header_id = gjl.je_header_id
AND gir.je_header_id = gjh.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_line_num = gjl.je_line_num
AND NVL(gir.reference_2, 0) = TO_CHAR(ara.applied_customer_trx_id)
AND ara.applied_customer_trx_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--gl
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM gl.gl_je_headers gjh
,gl.gl_je_lines gjl
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.global_attribute1 = B.FLEX_VALUE
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)) A
WHERE (A.PERIOD_NAME = :cp_period_name OR A.PERIOD_NAME = :cp_prior_period_name));
in a word - no
in a sentence - it is huge, it is against tables I don't know, containing data I don't have a clue about, answering a question I don't understand. The odds that it is 100% "correct" (that every outer join is really necessary, that the approach taken is sound, that is even returns the right answer) is near 0% in my experience - therefore reverse engineering it - which itself would take a really really long time - isn't fruitful because the query is probably a bad idea in the first place.
In order to "tune a query", one needs:
o intimate knowledge of the schema - all of the constraints, everything.
o intimate knowledge of the question that needs to be answered
What one typically doesn't want (surprisingly) is the
o query you are trying to tune
Not that I'm asking you for the schema, etc - this is not "ask tom to tune your query"
Hi Tom,
Please explain me what is wrong in performance in this query and how to tune this. Thanks in advance..
UPDATE customer_table aaa
SET (acc_name, acc_flg, act_date) =
(
SELECT
DISTINCT name,
'Y',
DECODE(to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'), NULL, aaa.status_date, to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'))
FROM
account acc
WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
)
WHERE EXISTS
(
SELECT
DISTINCT name,
'Y',
DECODE(to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'), NULL, aaa.status_date, to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'))
FROM
account acc
WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
);
If the optimizer is unable to merge the correlated subquery into the update - it might have to execute it at least once per row in the update.
I like MERGE for this
merge into customer_table aaa
using account acc
ON (aaa.customer_id = acc.customer_id
and aaa.success_flg='Y'
and aaa.error_flg='N')
when matched then update set acc_name = acc.acc_name, acc_flg = 'Y', act_date = nvl(acc.act_date,status_date);
Instead of decoding and converting to strings (I assume that status_date is a date? ) just nvl it.
If the distinct is necessary - there is something really wrong with your data model, in a big way.
There is no way the predicate:
WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
)
on aaa.success_flg and aaa.error_flg belonged in the subquery - that should have been set in the where clause of the update itself - where exists (...) AND success_flg = 'Y' and error_flg = 'N'.
In this case the account acc is the remote table which is having more than 30millions of rec.
earlier I was using the hint driving_site(acc) for the select statement to improve the performance.
So Now how I will use this hint or is there any alternate to improve the performance ?
Will appreciate an earliest reply.
Thanks
Prince
Dear Tom,
We have been experimenting in the usage of IN and NOT IN and we came across a small scenario:
When we executed the following SQL:
select SYSDATE from dual where '2' not in ('1','3')
We got sysdate as an Output, but when we used the following SQL:
select SYSDATE from dual where '2' not in ('1','3', NULL)
0 Records was selected.
What could be the reason for this?
it works that way by definition.
When you compare "NON-NULL-VALUE" to "NULL-VALUE" the answer is "I don't know", it is not true, it is not false.
Therefore, it is "unknown" whether '2' is not in ( NULL ).
With in - we can "know"
where '2' in ( '1', '2', null )
we Know 2 is in that set. But we don't know if '3' is NOT IN that set...
It is by definition the way it works. Part of SQL
Exists was introduced in SQL - the language specification. It is not an Oracle thing - it is a sql thing.
It is primarily syntactic convenience, semantic pleasantness. Sometimes "IN" is just "better sounding", sometimes exists is.
In and Exists can be used interchangeable.
Not In and Not Exists cannot - they are different (regarding treatment of nulls).
Just like you can say:
select * from t;
select ALL * from t;
interchangeably, or
select distinct * from t;
select unique * from t;
interchangeably...
In and exists both exist for the same reason "for" and "while" loops exist (even with goto which can be used instead of for or while as well)...
Hi Tom,
1.Scalar subquery is a SPECIAL case of Single row return subquery which returns Single COLOUMN from single ROW.
We can use Scalar Subqueries wherever we can use single row subquery with single column and there are NO any differences between Scalar subquery and Single row retun subquery with single coloumn.
Could you please clarify if the above correct ?
2.I already asked the below question in new thread but did not get answer -SORRY for that.. Can you please confirm if this is correct that
a)when the subquery is non-correlated and
b) when subquery is having aggregate functions count(*) with group by and
c) when it returns multiple rows -
First we will execute the subquery once and STORE the results in PGA -then we will execute main query with this as input ??
ex: select t.x,t.y from T
where t.x > (select max(a) from s group by b)
Thanks Tom for your help all the way !!
1) what is a single row subquery?
You can use scalar subqueries anywhere you can use an expression.
select * from t where x = 10;
select * from t where x = (select 10 from dual);
There, a scalar subquery was used to replace the expression "10".
Now, if you have:
select * from t where x in (select 10 from dual);
the select 10 from dual is not a scalar subquery in that context, it is just a "subquery".
2) No, that is not how it will happen in all cases. We can and will do view merging when we find it to be the appropriate approach.
Never assume some order of operation outside of expressions with ()'s. That is
(5+3)/2
has a definite order of operation - but
where x > (5+3)/2 and (y < 55 and z > 4)
You cannot assume that y&z are evaulated and then x is (or that x is, and then y and z, or etc etc etc).
All you know is (5+3)/2 - IF evaluated - will be done 'correctly' and everything else happens in some order.
Subqueries, inline views, etc etc etc - all can be moved around, rewritten and so on.
Your query in general does not work - select max(a) from s group by b would IN GENERAL return 0 to N records where N can be any number greater than zero. To see "t.x > (select max(a) from s group by b)" work - either
o s would have to be empty, that query returns 0 rows.
o b is unique in S, that query returns 1 rows.
If b has more than one value, that query returns more than 1 rows and will fail in this context. You need a scalar subquery here!
Now, if you wrote instead;
where t.x > ANY(select max(a) from s group by b)
or
where t.x > ALL(select max(a) from s group by b)
then your query would become a subquery and could work even if b was not unique in s.
Hi Tom,
thanks for the reply , It was my misatke in the 2nd point - missing ANY..
ok, you said -
select * from t where x = (select 10 from dual);
There, a scalar subquery was used to replace the expression "10".
Now, if you have:
select * from t where x in (select 10 from dual);
the select 10 from dual is not a scalar subquery in that context, it is just a "subquery".
Why - is it because of the operator 'IN' ?
Can u explain the diff between the Output of these queries?? i.e. The algorithm by which EXIST, NOT EXISTS, In & NOT IN operates.
SELECT * FROM ALL_WKSCRATCHPAD_DB.TB1;??
SELECT * FROM ALL_WKSCRATCHPAD_DB.TB2;
Number Name
1 brad 30
2 tom 20
3 John 10
Number Name
1 brad 30
2 tom 20
3 anil 40
LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE NOT EXISTS (SELECT ''
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name = T2.name)
;
Ans:
Number Name
1 John 10
---------------------------
LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE T1.Name NOT IN (SELECT T2.name
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name = T2.name)
;
Number Name
1 John 10
----------------------------------
LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE EXISTS (SELECT ''
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name <> T2.name)
;
Number Name
1 brad 30
2 tom 20
3 John 10
-------------------------------
LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE T1.Name IN (SELECT T2.name
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name <> T2.name)
;
Number Name
IN = Exists, they the same.
NOT IN is a *little* different from NOT EXISTS, but usually the same.
select * from t1 where not exists (select null from t2 where t2.x = t1.x);
select * from t1 where t1.x NOT IN (select t2.x from t2);
IF t2.x is NOT NULL - those two queries are the same.
IF t2.x is NULLABLE and there is at least one value of t2.x that is null in T2 - then the second query returns ZERO rows - whereas the not exists might return some rows.
but if you have:
select * from t1 where t1.x NOT IN (select t2.x from t2 where t2.x is not null)
then they would be the same again.
<quote>
Your query in general does not work - select max(a) from s group by b would IN GENERAL return 0 to N records where N can be any number greater than zero. To see "t.x > (select max(a) from s group by b)" work - either
o s would have to be empty, that query returns 0 rows.
o b is unique in S, that query returns 1 rows.
<quote>
little correction....
for second point it should not be unique but all record should have same value for b in order to return single row
Hi Tom,
First of all, thank you very much for spending your valuable time in answering all our questions.
I have a question regarding not in vs not exists.
db version: 11gr2
Suppose I have a query like this,
select * from tab1 t1 where
(col1,col2,col3) not in (select col1,col2,col3 from tab2 t2);
If (col1,col2,col3) in tab2 is a primary key and so doesn't return any NULL values, can I rewrite the above query as below?
select * from tab1 t1 where
not exists(select null from tab2 t2 where t2.col1=t1.col1 and t2.col2=t1.col2 and t2.col3=t1.col3);
Appreciate all your help. Thank you.
Hi Tom,
I have been trying to fetch rows from table A if it doesn't exist multiple times in table B..
Eg: Table A
ID Value
1 ABC
1 ABC
1 ABC
2 DEF
2 DEF
Table B
1 ABC
1 ABC
2 DEF
When I compare,these two tables,I need to show that there are two rows in the output as
as ID Value
1 ABC
2 DEF
I tried using the following query
select A.ID ,A.value,
from A
where not exists (select * from B where A.id= B.id and A.value= B.value).
However,I don't get the required output.
I tried using LEFT OUTER JOIN too,
SELECT A.ID, A.value LEFT OUTER JOIN B
ON A.ID = B.ID AND A.value = B.value
Even this doesn't work..
Kindly help me on this.
Hi Tom,
i have faced with the strange query execution behavior.
In brief, I has two tables.
First - rem3_credit with "LINK" column as primary key.
Second - credit_doc with "LINK" column as primary key and nullable column "LINK_UP" referenced on rem3_credit.link.
Data was loaded via SQL*Loader with setting DIRECT=Y, so reference constraint had not checked during load.
I know about special settings in SQL*Loader which force reference constraints been checked after load but interesting issue was found when I decide to find rows in rem3_credit that are not in credit_doc.
My query was:
SQL> select * from rem3_credit rc where not exists (select * from credit_doc cd where cd.link=rc.link_up);
no rows selected
But actually there was one!
SQL> select rc.link,rc.link_up fk_link, cd.link pk_link from rem3_credit rc, credit_doc cd where cd.link(+)=rc.link_up and cd.link is null;
LINK FK_LINK PK_LINK
---------- ---------- ----------
2202 24031
There is the same result if i`d query such way:
SQL> select link,link_up from rem3_credit rc where not exists (select * from credit_doc cd where cd.link=rc.link_up and cd.link is not null);
LINK LINK_UP
---------- ----------
2202 24031
But why I need to write additional predicate if I know (and Oracle knows) that credit_doc.link is primary key and can`t be null?
Let`s take a look at execution plan:
SQL> select * from TABLE(dbms_xplan.display_cursor('','','ALLSTATS LAST'));
SQL_ID 1cbvtut47qfv1, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from rem3_credit rc where not
exists (select * from credit_doc cd where cd.link=rc.link_up)
Plan hash value: 1161649700
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 15 |
|* 1 | TABLE ACCESS FULL| REM3_CREDIT | 1 | 1 | 0 |00:00:00.01 | 15 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RC"."LINK_UP" IS NULL)
Execution plan of "correct" query is:
SQL> select * from TABLE(dbms_xplan.display_cursor('','','ALLSTATS LAST'));
SQL_ID av9w7ftq1bu06, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ rc.link,rc.link_up fk_link,
cd.link pk_link from rem3_credit rc, credit_doc cd where
cd.link(+)=rc.link_up and cd.link is null
Plan hash value: 2146757845
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 452 |
|* 1 | FILTER | | 1 | | 1 |00:00:00.01 | 452 |
| 2 | NESTED LOOPS OUTER| | 1 | 8 | 797 |00:00:00.01 | 452 |
| 3 | TABLE ACCESS FULL| REM3_CREDIT | 1 | 798 | 797 |00:00:00.01 | 16 |
|* 4 | INDEX UNIQUE SCAN| PK_CREDIT_DOC | 797 | 12208 | 796 |00:00:00.01 | 436 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CD"."LINK" IS NULL)
4 - access("CD"."LINK"="RC"."LINK_UP")
I suppose there is some kind of query rewrite or transformation. Or simply a bug. By the way:
SQL> select * from v$version;
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
Can you give some clarification about what happening here? Thank you in advance.
When you use IN / NOT IN, the query retrieves the records from subquery and then checks the existance of the master record in the subquery. It will not consider NULL Records.
Incase of Exists/Not Exists, the where condition will return a boolean (true/false) and display the desired output. It is faster in performance than IN and NOT IN.
Another better method is using Left/Right outer join.
there is so much wrong with what you just said. so much. I don't know where to start.
did you know we can and will rewrite your IN to be Exists, your Exists to be In, your not in as not exists, your not exists as not in???
they are the same to us - we use them as we see fit.
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html
When you use IN / NOT IN, the query retrieves the records from subquery and
then checks the existance of the master record in the subquery. It will not
consider NULL Records.
I don't know what you mean by "it will not consider NULL records" - but I have a feeling that whatever you would say it means is probably wrong. Not in and VERY null sensitive. ops$tkyte%ORA11GR2> create table t1 ( x int );
Table created.
ops$tkyte%ORA11GR2> create table t2 ( x int );
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> insert into t1 values(1);
1 row created.
ops$tkyte%ORA11GR2> insert into t1 values(2);
1 row created.
ops$tkyte%ORA11GR2> insert into t2 values(1);
1 row created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select * from t1 where x not in (select x from t2);
----------
ops$tkyte%ORA11GR2> insert into t2 values(null);
1 row created.
ops$tkyte%ORA11GR2> select * from t1 where x not in (select x from t2);
no rows selected
ops$tkyte%ORA11GR2>
and you know what, ask yourself, if this were true:
It is faster in performance than
IN and NOT IN.
then why would anyone 'invent' IN and NOT IN????
Hi Tom,
What about 'in' vs. '<>'?
I have a query that looks like this:
select txn.* from txn_tbl partition (20100101) txn, attribute_tbl attr
where txn.attrib_fk = attr.attrib_pk and attr.status <> 'A';
Select from big transaction table based on one attribute. The 'status' column in the latter has 3 values and a bitmap index on it. I need to return all transaction with attributes that have any of 2 values.
Run as it is (attr.status <> 'A'), I get the following plan:
--------------------------------------------------------------------------------------------------
Id |Operation |Name |Rows|Bytes|Cost(CPU|Time |Pstr|Pstp|IN-OUT
--------------------------------------------------------------------------------------------------
0|SELECT STATEMENT | | 18M|2912M|6168 (2)|00:01:52| | |
1| PX COORDINATOR | | | | | | | |
2| PX SEND QC (RANDOM) |:TQ10001 | 18M|2912M|6168 (2)|00:01:52| | | P->S
* 3| HASH JOIN | | 18M|2912M|6168 (2)|00:01:52| | | PCWP
4| BUFFER SORT | | | | | | | | PCWC
5| PX RECEIVE | |128 | 512 | 3(34)|00:00:01| | | PCWP
6| PX SEND BROADCAST |:TQ10000 |128 | 512 | 3(34)|00:00:01| | | S->P
7| VIEW |index$_join$_02|128 | 512 | 3(34)|00:00:01| | |
* 8| HASH JOIN | | | | | | | |
9| BITMAP CONV'N TO ROWIDS| |128 | 512 | 1 (0)|00:00:01| | |
*10| BITMAP INDEX FULL SCAN|ATTRIB_TBL_BMI1| | | | | | |
11| INDEX FAST FULL SCAN |ATTRIB_PK_IDX |128 | 512 | 1 (0)|00:00:01| | |
12| PX BLOCK ITERATOR | | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWC
13| TABLE ACCESS FULL |TXN_TBL | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWP
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TXN"."ATTRIB_FK"="ATTR"."ATTRIB_PK")
8 - access(ROWID=ROWID)
10 - filter("ATTR"."STATUS"<>'A')
In one of the releases, the query was changed to the following "to make it easier to understand which values we're bringing back":
select txn.* from txn_tbl partition (20100101) txn, attribute_tbl attr
where txn.attrib_fk = attr.attrib_pk and attr.status in ('B','C');
The plan for this version is this:
--------------------------------------------------------------------------------------------------
Id |Operation |Name |Rows|Bytes|Cost(CPU|Time |Pstr|Pstp|INOUT
--------------------------------------------------------------------------------------------------
0|SELECT STATEMENT | | 18M|2912M|6169 (2)|00:01:52| | |
1| PX COORDINATOR | | | | | | | |
2| PX SEND QC (RANDOM) |:TQ10001 | 18M|2912M|6169 (2)|00:01:52| | | P->S
* 3| HASH JOIN | | 18M|2912M|6169 (2)|00:01:52| | | PCWP
4| BUFFER SORT | | | | | | | | PCWC
5| PX RECEIVE | |128 | 512 | 4(25)|00:00:01| | | PCWP
6| PX SEND BROADCAST |:TQ10000 |128 | 512 | 4(25)|00:00:01| | | S->P
* 7| VIEW |index$_join$_02|128 | 512 | 4(25)|00:00:01| | |
* 8| HASH JOIN | | | | | | | |
9| INLIST ITERATOR | | | | | | | |
10| BITMAP CONV'N TO ROWIDS| |128 | 512 | 2 (0)|00:00:01| | |
*11| BITMAP IDX SINGL VALUE|ATTRIB_TBL_BMI1| | | | | | |
12| INDEX FAST FULL SCAN |ATTRIB_PK_IDX |128 | 512 | 1 (0)|00:00:01| | |
13| PX BLOCK ITERATOR | | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWC
14| TABLE ACCESS FULL |TXN_TBL | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWP
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TXN"."ATTRIB_FK"="ATTR"."ATTRIB_PK")
7 - filter("ATTR"."STATUS"='B' OR "ATTR"."STATUS"='C')
8 - access(ROWID=ROWID)
11 - access("ATTR"."STATUS"='B' OR "ATTR"."STATUS"='C')
With an additional access predicate and 'in list' operation, I would've thought that the second version would produce a different plan. But the plans appear virtually identical in terms of cost. I tried it with a larger number of values in the 'in' list against another table, but that didn't seem to affect the plan either. Does that mean that doing <> is no better than doing in ()? I would've thought that saying 'not A' would yield better performance than listing 'B, C, D, E' as possible 'in' values...
Thanks as always.
But the plans appear
virtually identical in terms of cost.
that should be expected, not unexpected!! The cost if retrieving the same amount of data should be....
the same.
I'm disappointed when it *doesn't* work out that way!!!
Does that mean that doing <> is no better than doing in ()?
I would've thought that saying 'not A' would yield better performance than
listing 'B, C, D, E' as possible 'in' values...
no, it doesn't "mean" that. It just means that to get 18,000,000 rows from a table - most of the work, the time, the energy is going to be spent doing IO.
Thanks Tom.
I suppose I was expecting the plans to be different (erroneously so) in a sense that one would be better than the other one, which was not the case.
So then with the io being spent either way to get the rows, does it make it a purely sematic debate as to whether <> or () should be used, or is there a meaningful difference between the two?
Thanks as always
Hi Tom -
Someone in the thread asked to tune the query with multiple EXISTS clause with OR. I have the similar case with multiple EXISTS that is producing the wong result. I replaced subsequents EXISTS with UNION ALL and changed to one EXISTS clause and get the correct result.
Though I am not sure why that would make the difference. Do you see any reason?
--This clause misses some records
WHERE
EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
)
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)
VS
--this produces the expected result
WHERE
EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource =1
UNION ALL
SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
UNION ALL
SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)
Thanks an always!
Prakash
please contact support with this one...
(and use two subqueries, not three - just or the last condition on the last two)
If what you say is true, it looks like a bad plan (bug) is being generated for the first query - but I cannot confirm that as I don't have the schema or the entire query and cannot verify that.
Follow up to my previous post;
Somewhere I read, use of multiple EXISTS with OR clause like below will not execute subsequent clause and that may vary if a column used in the predicate is indexed. e.g. SR.SUPPLYSOURCE. That's exactly the case I tested, but WHY? Why the standalone index makes the query behave differently?
I see INDEX or UNION ALL are fix or work around but curious what's the reason to Oracle behaves differently?
....
WHERE
EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource = 1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource = 2 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource =3 )
Thanks as always!
Prakash
Tom - Thanks for your response. In real scenario I have 5 OR EXISTS and data breaks after 3rd clause. I had my peers and DBA looked into it and came to the similar conclusion that it is a bug. While we open SR with Oracle, I am applying workaround with UNION ALL, that preserves the performance and produce the correct result.
Thanks as always.
Prakash
please only ever post the REAL QUERY.
You just wasted your time and my time by having me look at a fake example that does not represent what you really are faced with.
Now we have no idea if you are facing a bug or not - because we have never seen the actual query you are running.
The union all might not be giving you the right answer by the way. If more than one branch of the OR is satisfied - you are going to get the same row more than once with a union all.
Hi Tom ,
Your link provided for IN and exists is awesome . Though for NOT IN AND NOT EXISTS you gave us the example of the case which was mishandled by NOT IN .So if i ignore such cases then the working of NOT IN and NOT EXISTS should be similar to IN AND EXISTS. I mean , in case of NOT IN - The sub-query is transformed to a view with distinct values and a join . but an equi join wouldn't be needed here .So how does it work ?
Also for NOT EXISTS ,suppose the inner table is big enough and indexed on the column .Since we arent matching any values, i suppose the index wont be used. so how does NOT EXISTS work ?
NOT IN and NOT EXISTS are not the same in the way that IN and EXISTS are.
they (not in/not exists) deal with NULLS very differently.
with not in/exists - the subuquery doesn't have to be distincted - we can do a thing called a semi-join - so we don't necessarily need to distinct them. We'd use an outer join and keep only the rows where the joined to table was "null" (not there at all).
and we can use an index easily for both not in and not exists if it makes sense.
ops$tkyte%ORA11GR2> create table t1 ( x int, y int, z int );
Table created.
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 100 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> create table t2 ( x int not null, y int, z int );
Table created.
ops$tkyte%ORA11GR2> create index t2_idx on t2(x);
Index created.
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 1000000, numblks => 100000 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select *
2 from t1
3 where NOT EXISTS (select null from t2 where t2.x = t1.x)
Execution Plan
----------------------------------------------------------
Plan hash value: 2403176152
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 5200 | 29 (0)| 00:00:01 |
| 1 | NESTED LOOPS ANTI | | 100 | 5200 | 29 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 100 | 3900 | 29 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T2_IDX | 1 | 13 | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T2"."X"="T1"."X")
ops$tkyte%ORA11GR2> set autotrace off
Tom -
I apologize for not posting the actual query.
Here is the query that I replaced "OR" with "UNION ALL".
In the third EXIST clause, if I explicitly convert ics.cac_segment_type (which is number field) to TO_NUMBRE (as "TO_NUMBER(ics.cac_segment_type) = 1.3", the query fully executes the next evaluation "ics.cac_segment_type= 1.4" and returns the correct result, or else only partiatially evaluates the next statmenet and skips about 20% of "ics.cac_segment_type= 1.4" matchings. The use of TO_NUMBER() changed the plan to "INDEX STORAGE FAST FULL SCAN" from "INDEX RANGE SCAN". This solution was found as hit and trial that I don't see the logical reasoning and also impacted the performance unacceptably.
UNION ALL produces the correct result and there is no counter impact. Though I don't see the point as you said "The union all might not be giving you the right answer by the way....". Can you please elaborate why would top query returns multiple records if the EXIST satisfies with multiple clauses. EXIST will only validate EXIST = Yes or No. What am I missing?
Thanks,
Prakash
SELECT
...
FROM
table1 x
WHERE EXISTS
(SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.1
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists =1
AND ics.segment_value = x.segment1
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type =1.2
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment2
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.3
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment3
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.4
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment4
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.5
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment5
)
Hi Tom,
I have following queries with execution plans. Both return exact same data. First query uses normal joins, it shows execution plan with low cost and executes faster. Second query uses EXISTS, it's cost is much higher and execution is 10 times slower compared to first query.
I am trying to understand reason for the difference between the execution plans of these 2 queries. Is there any way so that I can put HINT to second query so that its plan becomes better?
Thanks!
Table details -
Table_name Num_rows
--------------------------------
TBL_CALENDAR 4217
TBL_ACCOUNT 41355
TBL_DATA_SRC 12
TBL_POSN 377259
---------------------------------------------------------------------------------------------
First Query -
SELECT S.ROWID
FROM TBL_POSN S,
TBL_ACCOUNT A,
TBL_DATA_SRC D,
(
SELECT TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY,
MAX(CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6) THEN TBL_CALENDAR_DATE ELSE NULL END) LAST_WORKING_DAY,
MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY
FROM TBL_CALENDAR
GROUP BY TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY')
) LAST_DAY
WHERE 1=1
AND REPORT_DATE < '31-OCT-12'
AND REPORT_DCDE = 'I'
AND A.AC_SK = S.AC_SK
AND A.DATA_SRC_ID = D.DATA_SRC_ID
AND S.REPORT_DATE >= D.BACKLOAD_CUTOFF_DATE
and TO_CHAR(s.REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY
and (
a.DATA_SRC_ID in(1, 2)
OR (
a.DATA_SRC_ID in(3, 4, 5, 6, 10)
AND s.REPORT_DATE <> LAST_DAY.LAST_WORKING_DAY
)
or (
A.DATA_SRC_ID = 8
and s.REPORT_DATE <> LAST_DAY.LAST_CALENDAR_DAY
)
);
Plan hash value: 1112228868
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11926 | 815K| 529 (15)| | |
|* 1 | HASH JOIN | | 11926 | 815K| 529 (15)| | |
|* 2 | HASH JOIN | | 857 | 40279 | 519 (15)| | |
| 3 | NESTED LOOPS | | 8574 | 326K| 362 (19)| | |
|* 4 | TABLE ACCESS FULL | TBL_DATA_SRC | 1 | 12 | 2 (0)| | |
| 5 | PARTITION RANGE ITERATOR| | 14291 | 376K| 360 (19)| KEY | KEY |
|* 6 | INDEX FAST FULL SCAN | XU_TBL_POSN_PK | 14291 | 376K| 360 (19)| KEY | KEY |
| 7 | TABLE ACCESS FULL | TBL_ACCOUNT | 41355 | 323K| 154 (4)| | |
| 8 | VIEW | | 4200 | 96600 | 8 (25)| | |
| 9 | HASH GROUP BY | | 4200 | 33600 | 8 (25)| | |
| 10 | INDEX FULL SCAN | TBL_CALENDAR_PK| 4217 | 33736 | 6 (0)| | |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("LAST_DAY"."MMYYYY"=TO_CHAR(INTERNAL_FUNCTION("S"."REPORT_DATE"),'MMYYYY'))
filter("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2 OR ("A"."DATA_SRC_ID"=3 OR "A"."DATA_SRC_ID"=4 OR
"A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND
"S"."REPORT_DATE"<>"LAST_DAY"."LAST_WORKING_DAY" OR "A"."DATA_SRC_ID"=8 AND
"S"."REPORT_DATE"<>"LAST_DAY"."LAST_CALENDAR_DAY")
2 - access("A"."AC_SK"="S"."AC_SK" AND "A"."DATA_SRC_ID"="D"."DATA_SRC_ID")
4 - filter("D"."BACKLOAD_CUTOFF_DATE"<'31-OCT-12')
6 - filter("REPORT_DCDE"='I' AND "REPORT_DATE"<'31-OCT-12' AND
"S"."REPORT_DATE">="D"."BACKLOAD_CUTOFF_DATE")
------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------
Second query -
SELECT ROWID
FROM TBL_POSN s
WHERE 1=1
AND REPORT_DATE < '31-OCT-12'
AND REPORT_DCDE = 'I'
and EXISTS
(
SELECT 1
FROM TBL_ACCOUNT A,
TBL_DATA_SRC D,
(
SELECT TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY,
MAX(CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6) THEN TBL_CALENDAR_DATE ELSE NULL END) LAST_WORKING_DAY,
MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY
FROM TBL_CALENDAR
GROUP BY TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY')
) LAST_DAY
WHERE AC_SK = s.AC_SK
AND A.DATA_SRC_ID = D.DATA_SRC_ID
AND REPORT_DATE >= D.BACKLOAD_CUTOFF_DATE
and TO_CHAR(REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY
and (
a.DATA_SRC_ID in(1, 2)
OR (
A.DATA_SRC_ID IN(3, 4, 5, 6, 10)
AND REPORT_DATE <> LAST_DAY.LAST_WORKING_DAY
)
or (
A.DATA_SRC_ID = 8
and REPORT_DATE <> LAST_DAY.LAST_CALENDAR_DAY
)
)
);
Plan hash value: 723147282
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 456K (8)| | |
|* 1 | FILTER | | | | | | |
| 2 | PARTITION RANGE ITERATOR | | 285K| 7536K| 353 (17)| 1 | KEY |
|* 3 | INDEX FAST FULL SCAN | XU_TBL_POSN_PK | 285K| 7536K| 353 (17)| 1 | KEY |
| 4 | NESTED LOOPS | | 2 | 86 | 4 (0)| | |
| 5 | NESTED LOOPS | | 1 | 20 | 2 (0)| | |
| 6 | TABLE ACCESS BY INDEX ROWID| TBL_ACCOUNT | 1 | 8 | 1 (0)| | |
|* 7 | INDEX UNIQUE SCAN | PK_TBL_ACCOUNT | 1 | | 1 (0)| | |
|* 8 | TABLE ACCESS BY INDEX ROWID| TBL_DATA_SRC | 1 | 12 | 1 (0)| | |
|* 9 | INDEX UNIQUE SCAN | PK_TBL_DATA_SRC| 1 | | 1 (0)| | |
|* 10 | VIEW | | 2 | 46 | 2 (0)| | |
| 11 | SORT GROUP BY | | 42 | 336 | 8 (25)| | |
|* 12 | INDEX FULL SCAN | TBL_CALENDAR_PK| 42 | 336 | 7 (15)| | |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM (SELECT /*+ */
TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY') "MMYYYY",MAX(CASE
TO_NUMBER(TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'d')) WHEN 2 THEN "TBL_CALENDAR_DATE" WHEN 3 THEN
"TBL_CALENDAR_DATE" WHEN 4 THEN "TBL_CALENDAR_DATE" WHEN 5 THEN "TBL_CALENDAR_DATE" WHEN 6 THEN "TBL_CALENDAR_DATE"
ELSE NULL END ) "LAST_WORKING_DAY",MAX(LAST_DAY(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"))) "LAST_CALENDAR_DAY" FROM
"TBLDBO"."TBL_CALENDAR" "TBL_CALENDAR" WHERE TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')=TO_CHAR(:B1,'M
MYYYY') GROUP BY TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')) "LAST_DAY","TBLDBO"."TBL_DATA_SRC"
"D","TBLDBO"."TBL_ACCOUNT" "A" WHERE "AC_SK"=:B2 AND "A"."DATA_SRC_ID"="D"."DATA_SRC_ID" AND
"D"."BACKLOAD_CUTOFF_DATE"<=:B3 AND (("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2) OR ("A"."DATA_SRC_ID"=3 OR
"A"."DATA_SRC_ID"=4 OR "A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND
"LAST_DAY"."LAST_WORKING_DAY"<>:B4 OR "A"."DATA_SRC_ID"=8 AND "LAST_DAY"."LAST_CALENDAR_DAY"<>:B5)))
3 - filter("REPORT_DCDE"='I' AND "REPORT_DATE"<'31-OCT-12')
7 - access("AC_SK"=:B1)
8 - filter("D"."BACKLOAD_CUTOFF_DATE"<=:B1)
9 - access("A"."DATA_SRC_ID"="D"."DATA_SRC_ID")
10 - filter("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2 OR ("A"."DATA_SRC_ID"=3 OR "A"."DATA_SRC_ID"=4 OR
"A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND "LAST_DAY"."LAST_WORKING_DAY"<>:B1 OR
"A"."DATA_SRC_ID"=8 AND "LAST_DAY"."LAST_CALENDAR_DAY"<>:B2)
12 - filter(TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')=TO_CHAR(:B1,'MMYYYY'))
couple of comments on a few really *bad* practices:
a) TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY
use trunc( dt, 'mm' ). keep the date a date, trunc to the month level. don't conver to a string (hugely expensive). especially when you need it to compare to something later: TO_CHAR(REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY. that should be "trunc(report_date,'mm') = last_day.mmyyyy.
trunc simply resets bytes at the end of a 7 byte field. to_char has to run through probably hundreds of thousands of lines of code!
b) CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6)
compare numbers to numbers, dates to dates, strings to string. here you are comparing a string to a number - meaning there is an implicit conversion in there - meaning this code is *bad* (implicit conversions are *evil* - avoid them)
b) TO_CHAR(TBL_CALENDAR_DATE, 'd' )
beware of date functions that are dependent on where you are located.
ops$tkyte%ORA11GR2> alter session set nls_territory = america;
Session altered.
ops$tkyte%ORA11GR2> select to_char(sysdate,'d'), trunc(sysdate)-trunc(sysdate,'iw')+1
2 from dual;
T TRUNC(SYSDATE)-TRUNC(SYSDATE,'IW')+1
- ------------------------------------
2 1
ops$tkyte%ORA11GR2> alter session set nls_territory = germany;
Session altered.
ops$tkyte%ORA11GR2> select to_char(sysdate,'d'), trunc(sysdate)-trunc(sysdate,'iw')+1
2 from dual;
T TRUNC(SYSDATE)-TRUNC(SYSDATE,'IW')+1
- ------------------------------------
1 1
ops$tkyte%ORA11GR2>
your code will return different answers in different countries... you might consider using the IW approach - it is based on ISO standards (worldwide) instead of country by country settings.
c) MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY
don't compute things you don't need. You only reference last_calendar_day in the outer query in the predicate and could easily have used last_day(mmyyyy)
d) AND REPORT_DATE < '31-OCT-12'
never compare dates to strings, strings to numbers, etc.
always compare dates to dates, strings to strings, numbers to numbers.
never rely on implicit conversions, consider:
ops$tkyte%ORA11GR2> create table t ( x date );
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> insert into t values ( to_date( '01-jan-2012', 'dd-mon-yyyy' ) );
1 row created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select * from t where x < '31-oct-12';
no rows selected
ops$tkyte%ORA11GR2> alter session set nls_date_format = 'dd-mon-yyyy';
Session altered.
ops$tkyte%ORA11GR2> select * from t where x < '31-oct-12';
no rows selected
ops$tkyte%ORA11GR2>
whoops, wrong answer all of a sudden.....
e) AND REPORT_DATE < '31-OCT-12'
you must have started programming in about 2001 or 2002 or later.... Anyone that programmed in the 1990's knows - it takes four characters for years. Always and forever.
I'm not a fan of hints at all - so no, I'm not going to give a magical hint - you might try using dynamic sampling set to level 3 or above - the exists query is getting what looks like an incorrect cardinality estimate. the first query gets 11-12,000 rows as a guess, the second gets 1 row as a guess. that is the root cause of the bad plan (incorrect cardinality estimates). I cannot tell really where it is incorrect (because you know your data, I do not - you know what indexes go with what tables - i do not, makes it really hard to read a plan...)
so, looking at these plans - where are the cardinality estimates off by a large amount. we'll start there - figure out how we might get the right cardinalities and then the right plan will just "fall out" from that.
as far as I know, these two queries are not equivalent. the join query could return many more rows than the exists. I don't know your constraints (primary keys, foreign keys, not null, etc) and hence cannot even tell you if these two queries are interchangeable!
man@ora10g:rac1> select a.rowid
from emp a where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
and exists
( select null from big_table b , big_table c where b.object_id = c.object_id ); 2 3 4
ROWID
------------------
AAANjIAAEAACMNcAAK
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 4226424926
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7692K| 146M| 223K (3)| 00:44:45 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 7692K| 146M| 223K (3)| 00:44:45 |
| 3 | NESTED LOOPS | | 1 | 10 | 4 (0)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| BIG_TBL_ID_IDX | 50503 | 246K| 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | BIG_TBL_ID_IDX | 1 | 5 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")
Then, I took tkprof
********************************************************************************
select a.rowid
from emp a where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
and exists
( select null from big_table b , big_table c where b.object_id = c.object_id )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.02 0 18 0 0
Fetch 2 0.00 0.00 0 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.03 0 22 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61
Rows Row Source Operation
------- ---------------------------------------------------
1 FILTER (cr=22 pr=0 pw=0 time=1125 us)
1 TABLE ACCESS FULL EMP (cr=4 pr=0 pw=0 time=273 us)
1 NESTED LOOPS (cr=18 pr=0 pw=0 time=776 us)
1 INDEX FAST FULL SCAN BIG_TBL_ID_IDX (cr=15 pr=0 pw=0 time=638 us)(object id 55243)
1 INDEX RANGE SCAN BIG_TBL_ID_IDX (cr=3 pr=0 pw=0 time=92 us)(object id 55243)
********************************************************************************
So Oracle is doing right thing here - as usual.
as soon as the 1st row is returned from the NL.. job is done .. no longer required joining each row from big_table a, to big_table c
I can see estimation of row in explain plan is way off the reality... is this the reason we should not rely much on explain plan. so we would not go with the cost as shown in the case in above post - might be tkprof would reveal more?
how did it come up with over 7,000 rows in EMP??????
ops$tkyte%ORA10GR2> create table emp as select * from scott.emp;
Table created.
ops$tkyte%ORA10GR2> create table big_table as select * from all_objects;
Table created.
ops$tkyte%ORA10GR2> alter table big_table add constraint bt_pk primary key(object_id);
Table altered.
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'EMP' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'BIG_TABLE' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace on explain
ops$tkyte%ORA10GR2> select a.rowid
2 from emp a where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
3 and exists
4 ( select null from big_table b , big_table c where b.object_id =
5 c.object_id );
ROWID
------------------
AAARLfAAEAAASvUAAK
Execution Plan
----------------------------------------------------------
Plan hash value: 2260623314
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 5 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 20 | 3 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 2 | 20 | 2 (0)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| BT_PK | 48879 | 238K| 2 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | BT_PK | 1 | 5 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")
ops$tkyte%ORA10GR2> set autotrace off
post your step by steps to reproduce please.
Dear Tom,
My bad !
emp table in MAN schema had stats way off.
man@ora10g:rac1> select owner, table_name, num_rows from dba_tables where table_name ='EMP';
OWNER TABLE_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
MAN EMP 100000000
later I gather stats.
man@ora10g:rac1> exec dbms_stats.gather_table_stats(user, 'EMP', cascade=>true);
PL/SQL procedure successfully completed.
man@ora10g:rac1> select owner, table_name, num_rows, last_analyzed from dba_tables where table_name ='EMP';
OWNER TABLE_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ------------------------------ ---------- -------------------
MAN EMP 14 2013-02-05 12:25:31
man@ora10g:rac1> explain plan for
2 select a.rowid
from emp a where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
and exists
( select null from big_table b , big_table c where b.object_id = c.object_id ); 3 4 5
Explained.
man@ora10g:rac1> @xplan
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4226424926
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 7 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 20 | 3 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 10 | 4 (0)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| BIG_TBL_ID_IDX | 50503 | 246K| 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | BIG_TBL_ID_IDX | 1 | 5 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")
21 rows selected.
Hi Tom,
Hope you are fine :)
#############
Create Table A As Select Level A From Dual Connect By Level<=5;
create table b as select level+2 a from dual connect by level<=5 union select null from dual;
Select * From A;
1
2
3
4
5
Select * From B;
3
4
5
6
7
Select * From A Where A not In (Select A From B);
no rows selected
select * from a where a in (select a from b);
3
4
5
##############
You told at the top:
--------
Beware the NULL value
and NOT IN!! (also the reason why NOT IN is sometimes avoided).
---------
That is true but why oracle doesn't handle as it handles in case of "in".
I mean to say how oracle skip the values "1" and "2" in case of "in" which is not correct (as we know null cant be compared with anything (null is unknown), how it calculates that "1" in "null" and "2" in "null" are false).
Many people say that, that is side effect of "not in" , but could you please say something :).
Hi Tom,
You said, NOT IN and NOT EXISTS are not the same!!! logically I understand that but I did not quite get, as the result I get is the same (query below). Are you referring to saying (NOT THE SAME) that, EXISTS evaluates to true immediately on the first occurrence of the value during a scan on the table/index?
Used smaple HR schema.
select department_id from departments d where not exists (select null from employees e where e.department_id = d.department_id);
select department_id from departments d where department_id not in (select department_id from employees e where e.department_id = d.department_id);
Hi Tom,
I'm still struggling with the difference in null handling between NOT IN and NOT EXISTS. The example you gave that worked, only worked because the correlated subquery inner join eliminated the null in the table (emp 7839 has no manager):
select count(*) from emp t1 where not exists ( select null from emp t2 where mgr = t1.empno);
If I add "or m.mgr is null", I get 0 records, the same answer I get with NOT IN.
select count(*) from emp t1 where not exists ( select null from emp t2 where mgr = t1.empno or mgr is null);
Also, if I use NOT IN in your NOT EXISTS query, it returns the correct count of employees who aren't managers:
select count(*) from emp t1 where empno not in (select mgr from emp t2 where mgr = t1.empno);
How can I see the difference?
Other answer said/implied that the set in:
select count(*) from emp where empno not in ( select mgr from emp );
evaluated to ( NULL ) - why? It seems like it should evaluate to a set that includes managers and a null (does a set automatically discard redundant values?) . Is that correct? And are you saying this wouldn't work- that NULL would make the set poorly defined?
Would this work better/well?:
select count(*) from emp where empno not in ( select distinct(mgr) from emp where mgr is not null);