Although I have linked to various statements on the subject of incremental statistics here in the past (above all by Randolf Geist and Maria Colgan), and believe I have a reasonably clear idea of their role, here is a test with which I want to check and sharpen that idea.
The basic problem that incremental statistics are meant to address is the following: for very large partitioned tables to which data is primarily added in new partitions (for example, sales data), it is relatively easy - and cheap - to create statistics for the new partitions. Updating the global statistics, however, is very expensive, because it requires reading the complete table. Alternatively, the global statistics can be aggregated from the partition statistics, but then the NDV (= number of distinct values) figures cannot be derived in a meaningful way. In such cases one can also create global statistics based on limited block samples, but that too can of course lead to massive errors (when the data is clustered within the blocks).
The incremental statistics introduced in 11.1 solve the problem by filling various internal helper tables with the data of a synopsis, from which the NDV values can then be derived. Now for the test (11.1.0.7, Windows XP, ASSM, 8K block size):
Case 1: initial statistics gathering
-- create an interval-partitioned test table
drop table test_big_part;

create table test_big_part (
    startdate date
  , id number
  , n_col1 number
  , n_col2 number
  , n_col3 number
  , n_col4 number
  , v_col1 varchar2(50)
  , v_col2 varchar2(50)
  , v_col3 varchar2(50)
  , v_col4 varchar2(50)
)
partition by range (startdate)
interval (NUMTODSINTERVAL(1,'DAY'))
(
    partition test_p1 values less than (to_date('20120906', 'yyyymmdd'))
);

-- insert 10M rows with attributes whose values
-- repeat with varying frequency
insert into test_big_part
with
generator as (
select trunc(sysdate) + mod(rownum, 10) a_date
  from dual
connect by level <= 100
)
,
basedata as (
select rownum id
     , mod(rownum, 2) n_col1
     , mod(rownum, 10) n_col2
     , round(rownum/10) n_col3
     , dbms_random.value * 1000 n_col4
     , lpad('*', 50, '*') v_col1
     , dbms_random.string('a', 1) v_col2
     , dbms_random.string('a', 50) v_col3
     , 'test' v_col4
  from dual
connect by level <= 100000)
select generator.a_date
     , basedata.*
  from generator
     , basedata;

commit;
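As a quick sanity check of the setup, the row distribution over the interval partitions can be verified before any statistics are gathered - a minimal sketch (I group by the partition key rather than by the system-generated partition names):

```sql
-- hypothetical check: row counts per day, i.e. per interval partition
select trunc(startdate) part_day
     , count(*) cnt
  from test_big_part
 group by trunc(startdate)
 order by part_day;
```

With the generator above this should show ten days with 1M rows each.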
Now comes the statistics gathering, each run with SQL_TRACE enabled. First with the granularity GLOBAL AND PARTITION (all other parameters are defaults; only METHOD_OPT was set explicitly, to avoid the creation of histograms):
begin
DBMS_STATS.GATHER_TABLE_STATS (
    OwnName => user
  , TabName => 'TEST_BIG_PART'
  , Granularity => 'GLOBAL AND PARTITION'
  , Method_Opt => 'FOR ALL COLUMNS SIZE 1'
);
end;
/
The total runtime of the statistics gathering in this case is 54 sec., split roughly evenly between reading all the partitions and reading the complete table:
-- access to the partitions
SQL ID: 4mnm957p8yjs6 Plan Hash: 2760598834

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,4,0,"ROWID") = :objn
  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse       10      0.00       0.00          0          0          0          0
Execute     10      0.00       0.00          0          0          0          0
Fetch       10     15.45      27.11     222785     223490          0         10
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       30     15.45      27.11     222785     223490          0         10

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=22272 pw=0 time=0 us)
 998200   APPROXIMATE NDV AGGREGATE (cr=22349 pr=22272 pw=0 time=0 us cost=6109 size=3539530 card=18245)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=22272 pw=0 time=31249 us cost=6109 size=3539530 card=18245)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=22272 pw=0 time=31249 us cost=6109 size=3539530 card=18245)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.07         11.56
  db file sequential read                        45        0.00          0.02
********************************************************************************

-- access to the complete table
SQL ID: 8zu50wq38a6zk Plan Hash: 719485973

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
  NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1     14.76      26.03     222783     223490          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3     14.76      26.03     222783     223490          0          1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
       1  SORT AGGREGATE (cr=223490 pr=222783 pw=0 time=0 us)
 9980200   APPROXIMATE NDV AGGREGATE (cr=223490 pr=222783 pw=0 time=437497 us cost=60962 size=3320566340 card=18244870)
10000000    PARTITION RANGE ALL PARTITION: 1 1048575 (cr=223490 pr=222783 pw=0 time=124995 us cost=60962 size=3320566340 card=18244870)
10000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: 1 1048575 (cr=223490 pr=222783 pw=0 time=124995 us cost=60962 size=3320566340 card=18244870)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.05         11.15
  db file sequential read                        45        0.00          0.01
********************************************************************************
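The global column statistics gathered this way can then be checked in the dictionary - a minimal sketch:

```sql
-- inspect the gathered global NDV values
select column_name
     , num_distinct
     , num_nulls
     , last_analyzed
  from user_tab_col_statistics
 where table_name = 'TEST_BIG_PART'
 order by column_name;
```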
In the case of the initial creation of incremental statistics, different operations show up (as expected):
-- enable incremental statistics gathering
begin
dbms_stats.set_table_prefs('TEST', 'TEST_BIG_PART', 'INCREMENTAL', 'TRUE');
end;
/

begin
DBMS_STATS.GATHER_TABLE_STATS (
    OwnName => user
  , TabName => 'TEST_BIG_PART'
  , Granularity => 'AUTO'
  , Method_Opt => 'FOR ALL COLUMNS SIZE 1'
);
end;
/
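Whether the preference has actually taken effect can be verified with dbms_stats.get_prefs - a minimal sketch:

```sql
-- should return TRUE after the set_table_prefs call
select dbms_stats.get_prefs('INCREMENTAL', user, 'TEST_BIG_PART') incremental_pref
  from dual;
```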
In this case the runtime is 45 sec. I would not necessarily conclude from this that incremental gathering speeds up the initial statistics creation compared to the first case with 'GLOBAL AND PARTITION' - here, at least, the runtime drops by about 10 sec. The operations involved, however, are clearly different (and apparently more complicated) than in the non-incremental case.
SELECT ... FROM SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS$ ...
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
INSERT INTO SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
SELECT ... FROM SYS.MON_MODS_ALL ...
MERGE ... INTO SYS.MON_MODS_ALL$ ...
SELECT ... FROM SYS.MON_MODS ...
DELETE ... FROM SYS.MON_MODS ...
DELETE ... FROM WRI$_OPTSTAT_SYNOPSIS$ ...

-- access to the partitions
SQL ID: a4sr7dfdnjss5 Plan Hash: 2760598834

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,4,0,"ROWID") = :objn
  /* SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,
  NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse       10      0.00       0.00          0          0          0          0
Execute     10      0.00       0.00          0          0          0          0
Fetch       10     15.34      27.53     222785     223490          0         10
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       30     15.34      27.53     222785     223490          0         10

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=22228 pw=0 time=0 us)
 998200   APPROXIMATE NDV AGGREGATE (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.08         11.83
  db file sequential read                        45        0.01          0.03
********************************************************************************

UPDATE SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
INSERT INTO SYS.WRI$_OPTSTAT_SYNOPSIS$ ...
INSERT INTO SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY
So in this case relatively complex metadata operations take place that read and update the WRI$_OPTSTAT% tables (and the amount of data created there can be quite large, as Randolf Geist explains in his blog).
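The volume of the synopsis data can be looked at directly - a minimal sketch, assuming DBA privileges; the WRI$_OPTSTAT_SYNOPSIS% tables are internal and undocumented, and their structure (here: the join via SYNOPSIS# and the base-object column BO#, as observed in 11g) may differ between versions:

```sql
-- hypothetical check: number of synopsis entries for the test table (run as DBA)
select count(*) synopsis_entries
  from sys.wri$_optstat_synopsis$ s
     , sys.wri$_optstat_synopsis_head$ h
     , dba_objects o
 where s.synopsis# = h.synopsis#
   and h.bo# = o.object_id
   and o.object_name = 'TEST_BIG_PART';
```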
Case 2: adding another partition
Now the more relevant case: how do the two approaches behave after an additional daily partition has been added?
-- data for one more day
insert into test_big_part
with
generator as (
select trunc(sysdate) + 10 a_date
  from dual
connect by level <= 10
)
,
basedata as (
select rownum id
     , mod(rownum, 2) n_col1
     , mod(rownum, 10) n_col2
     , round(rownum/10) n_col3
     , dbms_random.value * 1000 n_col4
     , lpad('*', 50, '*') v_col1
     , dbms_random.string('a', 1) v_col2
     , dbms_random.string('a', 50) v_col3
     , 'test' v_col4
  from dual
connect by level <= 100000)
select generator.a_date
     , basedata.*
  from generator
     , basedata;

commit;
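Before regathering, one can check which (sub)objects Oracle now considers stale - a minimal sketch; note that STALE_STATS only reflects the latest DML after the monitoring information has been flushed:

```sql
-- flush the monitoring info, then check staleness per partition
exec dbms_stats.flush_database_monitoring_info

select partition_name
     , num_rows
     , stale_stats
  from user_tab_statistics
 where table_name = 'TEST_BIG_PART'
 order by partition_name;
```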
For the non-incremental case, the subsequent statistics gathering gives the following picture:
-- statistics gathering for the new partition and globally
begin
DBMS_STATS.GATHER_TABLE_STATS (
    OwnName => user
  , TabName => 'TEST_BIG_PART'
  , partname => 'SYS_P80'
  , Method_Opt => 'FOR ALL COLUMNS SIZE 1'
);
end;
/
As expected, the trace now shows the access to a single partition, followed by the global statistics gathering:
-- access to a single partition
SQL ID: 4mnm957p8yjs6 Plan Hash: 2760598834

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,4,0,"ROWID") = :objn
  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1      1.54       2.34      16865      22349          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3      1.54       2.34      16865      22349          0          1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=16865 pw=0 time=0 us)
 998380   APPROXIMATE NDV AGGREGATE (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        161        0.02          0.81
********************************************************************************

-- global access
SQL ID: 8zu50wq38a6zk Plan Hash: 719485973

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
  NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1     16.75      29.05     239658     245839          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3     16.75      29.05     239658     245839          0          1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
       1  SORT AGGREGATE (cr=245839 pr=239658 pw=0 time=0 us)
10979786   APPROXIMATE NDV AGGREGATE (cr=245839 pr=239658 pw=0 time=301646 us cost=60796 size=1530000000 card=10000000)
11000000    PARTITION RANGE ALL PARTITION: 1 1048575 (cr=245839 pr=239658 pw=0 time=239300 us cost=60796 size=1530000000 card=10000000)
11000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: 1 1048575 (cr=245839 pr=239658 pw=0 time=156303 us cost=60796 size=1530000000 card=10000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2207        0.08         12.48
  db file sequential read                        45        0.01          0.03
********************************************************************************
So no surprise here: the statistics for the partition are created quickly, but updating the global statistics again requires reading the complete data. Now for the incremental statistics and their behaviour when another partition is added:
begin
DBMS_STATS.GATHER_TABLE_STATS (
    OwnName => user
  , TabName => 'TEST_BIG_PART'
  , Granularity => 'AUTO'
  , Method_Opt => 'FOR ALL COLUMNS SIZE 1'
);
end;
/
In this case the runtime drops from about 30 sec. to about 10 sec. The largest part of that (about 7 sec.) is spent reading the partition; in addition there are again various read and write accesses to the WRI$_OPTSTAT% synopsis tables:
SQL ID: a4sr7dfdnjss5 Plan Hash: 2760598834

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120))
  from "TEST"."TEST_BIG_PART" t
  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,4,0,"ROWID") = :objn
  /* SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,
  NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1      1.78       6.69      16141      44674          1          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3      1.78       6.69      16141      44674          1          1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93  (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=44674 pr=16141 pw=0 time=0 us)
 998146   APPROXIMATE NDV AGGREGATE (cr=44674 pr=16141 pw=0 time=110048 us cost=5536 size=15400000 card=100000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=44674 pr=16141 pw=0 time=31253 us cost=5536 size=15400000 card=100000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=44674 pr=16141 pw=0 time=31253 us cost=5536 size=15400000 card=100000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        268        0.11          4.34
  db file sequential read                         2        0.03          0.03
  db file parallel read                           2        0.55          0.61
********************************************************************************
So the incremental approach to updating the global statistics is clearly faster than the non-incremental one - which, again, is no big surprise. That leaves the question of the quality of the results, which I leave to the table function dbms_stats.diff_table_stats_in_history:
select *
  from table(dbms_stats.diff_table_stats_in_history(
                 ownname => user,
                 tabname => upper('TEST_BIG_PART'),
                 time1 => systimestamp,
                 time2 => to_timestamp('05.09.2012 12:05:00','dd.mm.yyyy hh24:mi:ss'),
                 pctthreshold => 0));

REPORT
--------------------------------------------------------------------------------
MAXDIFFPCT
----------
###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : TEST_BIG_PART
OWNER         : TEST
SOURCE A      : Statistics as of 05.09.12 12:16:41,613000 +02:00
SOURCE B      : Statistics as of 05.09.12 12:05:00,000000 +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

TEST_BIG_PART               T   A   11000000   245707     153        11000000
                                B   11000000   245707     154        11000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

N_COL4          A   200512  ,000004987 NO   0       22   BF055 C20A6 1.1E+07
                B   199488  ,000005012 NO   0       22   C0042 C20A6 1.1E+07
V_COL3          A   201744  ,000004956 NO   0       51   41414 7A7A7 1.1E+07
                B   202816  ,000004930 NO   0       51   41414 7A7A7 1.1E+07
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################
Accordingly, in this test case there are only minor differences between the results of the two approaches.
Conclusion:
- incremental statistics seem to deliver pretty much exactly what they promise.
- for the initial creation of object statistics (i.e. building the statistics for a large partitioned table completely from scratch), there is apparently no dramatic runtime difference between the incremental approach and the complete gathering.
- when new partitions are added, the update of the global statistics via the incremental mechanism, based on the synopsis information, is much faster than a complete rebuild.
- of course, maintaining the synopsis information in the WRI$_OPTSTAT% tables is not free: the statistics update requires relatively complex internal queries and DML operations. Moreover, the space consumption of the synopses can be significant (as Randolf Geist has shown).
- the results of incremental and non-incremental statistics gathering are nearly identical in this test.
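For completeness: after such a test the preference can simply be switched off again - a minimal sketch:

```sql
-- disable incremental statistics gathering for the test table again
begin
    dbms_stats.set_table_prefs(user, 'TEST_BIG_PART', 'INCREMENTAL', 'FALSE');
end;
/
```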