Wednesday, September 05, 2012

Incremental Statistics for Partitioned Tables

Although I have linked various statements on the subject of incremental statistics here in the past (above all by Randolf Geist and Maria Colgan) and believe I have a reasonably clear idea of their role, here is a test with which I want to check and refine those ideas.

The basic problem that incremental statistics are meant to address is the following: for very large partitioned tables to which new data is primarily added in new partitions (sales data, for example), it is relatively easy - and cheap - to create statistics for the new partitions. Refreshing the global statistics, however, is very expensive, since it requires reading the complete table. Alternatively, the global statistics can be aggregated from the partition statistics, but then the NDV (= number of distinct values) figures cannot be determined meaningfully. In such cases one can also have global statistics created on the basis of limited block samples, but that too can of course lead to massive errors (when the data is clustered within the blocks).
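
A minimal sketch of the NDV problem (my own illustration, not part of the test below): two halves of a data set may each contain five distinct values, but the global NDV can be anything between 5 and 10 depending on how much the value sets overlap - so it cannot be computed from the per-partition NDVs alone:

-- both halves contain 5 distinct values; here they overlap completely,
-- so the global NDV is also 5 (with disjoint value sets it would be 10)
with p1 as (select mod(level, 5) v from dual connect by level <= 100)
   , p2 as (select mod(level, 5) v from dual connect by level <= 100)
select (select count(distinct v) from p1) ndv_p1
     , (select count(distinct v) from p2) ndv_p2
     , (select count(distinct v)
          from (select v from p1 union all select v from p2)) ndv_global
  from dual;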

The incremental statistics introduced in 11.1 solve the problem by filling various internal helper tables with synopsis data, from which the NDV values can then be derived. Now for the test (11.1.0.7, Windows XP, ASSM, 8K block size):

Case 1: initial statistics gathering

-- create an interval-partitioned test table
drop table test_big_part;
create table test_big_part (
    startdate date
  , id number
  , n_col1 number
  , n_col2 number
  , n_col3 number
  , n_col4 number
  , v_col1 varchar2(50)
  , v_col2 varchar2(50)
  , v_col3 varchar2(50)
  , v_col4 varchar2(50)
)
partition by range (startdate)
interval (NUMTODSINTERVAL(1,'DAY'))
(
    partition test_p1 values less than (to_date('20120906', 'yyyymmdd'))
);

-- insert 10M rows with attributes whose values
-- repeat with varying frequency
insert into test_big_part
with
generator as (
select trunc(sysdate) + mod(rownum, 10) a_date
  from dual
connect by level <= 100
)
,
basedata as (
select rownum id
    , mod(rownum, 2) n_col1
    , mod(rownum, 10) n_col2
    , round(rownum/10) n_col3
    , dbms_random.value * 1000 n_col4
    , lpad('*', 50, '*') v_col1
    , dbms_random.string('a', 1) v_col2
    , dbms_random.string('a', 50) v_col3
    , 'test' v_col4
  from dual
connect by level <= 100000)
select generator.a_date
     , basedata.*
  from generator
     , basedata;

commit;
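
As a plausibility check (my addition, not part of the original script), the distribution of the generated data can be verified - ten daily partitions with 1M rows each, and N_COL2 containing the same ten values in every partition:

-- data distribution per day: each of the 10 days holds 1M rows and
-- all 10 distinct N_COL2 values, so the global NDV of N_COL2 is 10
-- and not the sum of the per-partition NDVs (100)
select trunc(startdate) a_date
     , count(*) cnt
     , count(distinct n_col2) ndv_n_col2
  from test_big_part
 group by trunc(startdate)
 order by 1;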

Now follows the statistics gathering, each run with SQL_TRACE enabled. First with the granularity GLOBAL AND PARTITION (all other parameters are defaults; only METHOD_OPT was set explicitly, to avoid the creation of histograms):

begin
DBMS_STATS.GATHER_TABLE_STATS ( OwnName        => user
                              , TabName        => 'TEST_BIG_PART'      
                              , Granularity    => 'GLOBAL AND PARTITION'     
                              , Method_Opt     => 'FOR ALL COLUMNS SIZE 1'     
);
end;
/
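
The gathered statistics can be checked in the dictionary (a quick sanity check on top of the original script):

-- global and partition-level statistics after the initial gathering
select partition_name, num_rows, blocks, last_analyzed
  from user_tab_statistics
 where table_name = 'TEST_BIG_PART'
 order by partition_position nulls first;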

The total runtime of the statistics gathering in this case is 54 sec., split fairly evenly between reading all the partitions and reading the complete table:

-- access to the individual partitions
SQL ID: 4mnm957p8yjs6
Plan Hash: 2760598834
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,
  4,0,"ROWID") = :objn /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       10      0.00       0.00          0          0          0           0
Execute     10      0.00       0.00          0          0          0           0
Fetch       10     15.45      27.11     222785     223490          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       30     15.45      27.11     222785     223490          0          10

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=22272 pw=0 time=0 us)
 998200   APPROXIMATE NDV AGGREGATE (cr=22349 pr=22272 pw=0 time=0 us cost=6109 size=3539530 card=18245)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=22272 pw=0 time=31249 us cost=6109 size=3539530 card=18245)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=22272 pw=0 time=31249 us cost=6109 size=3539530 card=18245)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.07         11.56
  db file sequential read                        45        0.00          0.02
********************************************************************************

-- access to the complete table
SQL ID: 8zu50wq38a6zk
Plan Hash: 719485973
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
  NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     14.76      26.03     222783     223490          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     14.76      26.03     222783     223490          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=223490 pr=222783 pw=0 time=0 us)
9980200   APPROXIMATE NDV AGGREGATE (cr=223490 pr=222783 pw=0 time=437497 us cost=60962 size=3320566340 card=18244870)
10000000    PARTITION RANGE ALL PARTITION: 1 1048575 (cr=223490 pr=222783 pw=0 time=124995 us cost=60962 size=3320566340 card=18244870)
10000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: 1 1048575 (cr=223490 pr=222783 pw=0 time=124995 us cost=60962 size=3320566340 card=18244870)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.05         11.15
  db file sequential read                        45        0.00          0.01
********************************************************************************

For the initial creation of incremental statistics, different operations show up (as expected):

-- enable incremental statistics gathering
begin
dbms_stats.set_table_prefs('TEST', 'TEST_BIG_PART', 'INCREMENTAL', 'TRUE');
end;
/
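
Whether the preference has taken effect can be verified with DBMS_STATS.GET_PREFS (available since 11.1):

-- should return TRUE after the call above
select dbms_stats.get_prefs('INCREMENTAL', user, 'TEST_BIG_PART') incremental
  from dual;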

begin
DBMS_STATS.GATHER_TABLE_STATS ( OwnName        => user
                              , TabName        => 'TEST_BIG_PART'
                              , Granularity    => 'AUTO'
                              , Method_Opt     => 'FOR ALL COLUMNS SIZE 1'
);
end;
/

In this case the runtime is 45 sec. I would not necessarily conclude from this that the initial gathering with incremental statistics is always faster than in the first 'GLOBAL AND PARTITION' case - but here, at least, the runtime drops by about 10 sec. At the same time, clearly different (and apparently more complicated) operations appear than in the non-incremental case:

SELECT ... FROM SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS$ ...
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
INSERT INTO SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
SELECT ... FROM SYS.MON_MODS_ALL$ ...
MERGE ... INTO SYS.MON_MODS_ALL$ ...
SELECT ... FROM SYS.MON_MODS$ ...
DELETE ... FROM SYS.MON_MODS$ ...
DELETE ... FROM WRI$_OPTSTAT_SYNOPSIS$ ...

-- access to the individual partitions
SQL ID: a4sr7dfdnjss5
Plan Hash: 2760598834
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,
  4,0,"ROWID") = :objn /* SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,
  NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       10      0.00       0.00          0          0          0           0
Execute     10      0.00       0.00          0          0          0           0
Fetch       10     15.34      27.53     222785     223490          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       30     15.34      27.53     222785     223490          0          10

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=22228 pw=0 time=0 us)
 998200   APPROXIMATE NDV AGGREGATE (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=22228 pw=0 time=0 us cost=6087 size=1530000 card=10000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2046        0.08         11.83
  db file sequential read                        45        0.01          0.03
********************************************************************************

UPDATE SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ ...
INSERT INTO SYS.WRI$_OPTSTAT_SYNOPSIS$ ...
INSERT INTO SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY

So in this case relatively complex metadata operations take place, reading and updating the WRI$_OPTSTAT% tables (and quite large data volumes can result, as Randolf Geist explains in his blog).
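
To get an idea of that space consumption, the row counts of the synopsis tables can be checked (a rough sketch of my own; it requires access to the SYS schema, and the internal layout of these tables differs between versions):

-- rough measure of the synopsis footprint
select (select count(*) from sys.wri$_optstat_synopsis_head$) head_rows
     , (select count(*) from sys.wri$_optstat_synopsis$) synopsis_rows
  from dual;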


Case 2: adding another partition


Now for the more relevant case: how do the two approaches behave after an additional daily partition has been added?

-- data for one additional day
insert into test_big_part
with
generator as (
select trunc(sysdate) + 10 a_date
  from dual
connect by level <= 10  
)
,
basedata as (
select rownum id
    , mod(rownum, 2) n_col1
    , mod(rownum, 10) n_col2
    , round(rownum/10) n_col3
    , dbms_random.value * 1000 n_col4
    , lpad('*', 50, '*') v_col1
    , dbms_random.string('a', 1) v_col2
    , dbms_random.string('a', 50) v_col3
    , 'test' v_col4
  from dual
connect by level <= 100000)
select generator.a_date
     , basedata.*
  from generator
     , basedata;

commit;
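
The new interval partition receives a system-generated name (here SYS_P80), which can be looked up in the dictionary:

-- determine the name of the automatically created interval partition
select partition_name, high_value
  from user_tab_partitions
 where table_name = 'TEST_BIG_PART'
 order by partition_position;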

For the non-incremental case, the subsequent statistics gathering yields the following picture:

-- gather statistics for the new partition and globally
begin
DBMS_STATS.GATHER_TABLE_STATS (OwnName        => user
                              , TabName       => 'TEST_BIG_PART'
                              , partname      => 'SYS_P80'
                              , Method_Opt    => 'FOR ALL COLUMNS SIZE 1'
);
end;
/

As expected, the trace now shows the access to a single partition followed by the global statistics gathering:

-- access to a single partition
SQL ID: 4mnm957p8yjs6
Plan Hash: 2760598834
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,
  4,0,"ROWID") = :objn /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.54       2.34      16865      22349          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.54       2.34      16865      22349          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=22349 pr=16865 pw=0 time=0 us)
 998380   APPROXIMATE NDV AGGREGATE (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=22349 pr=16865 pw=0 time=0 us cost=6087 size=1530000 card=10000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        161        0.02          0.81
********************************************************************************

-- global access
SQL ID: 8zu50wq38a6zk
Plan Hash: 719485973
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
  NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     16.75      29.05     239658     245839          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     16.75      29.05     239658     245839          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=245839 pr=239658 pw=0 time=0 us)
10979786   APPROXIMATE NDV AGGREGATE (cr=245839 pr=239658 pw=0 time=301646 us cost=60796 size=1530000000 card=10000000)
11000000    PARTITION RANGE ALL PARTITION: 1 1048575 (cr=245839 pr=239658 pw=0 time=239300 us cost=60796 size=1530000000 card=10000000)
11000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: 1 1048575 (cr=245839 pr=239658 pw=0 time=156303 us cost=60796 size=1530000000 card=10000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2207        0.08         12.48
  db file sequential read                        45        0.01          0.03
********************************************************************************

So no surprises here: the statistics for the partition are created quickly, but refreshing the global statistics again requires reading the complete data. Now to the incremental statistics and their behavior when another partition is added:

begin
DBMS_STATS.GATHER_TABLE_STATS ( OwnName        => user
                              , TabName        => 'TEST_BIG_PART'
                              , Granularity    => 'AUTO'
                              , Method_Opt     => 'FOR ALL COLUMNS SIZE 1'
);
end;
/

In this case the runtime drops from about 30 sec. to about 10 sec. Most of that (about 7 sec.) is spent reading the partition; in addition there are again various read and write accesses to the WRI$_OPTSTAT% synopsis tables:

SQL ID: a4sr7dfdnjss5
Plan Hash: 2760598834
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  */to_char(count("STARTDATE")),to_char(substrb(dump(min("STARTDATE"),16,0,32)
  ,1,120)),to_char(substrb(dump(max("STARTDATE"),16,0,32),1,120)),
  to_char(count("ID")),to_char(substrb(dump(min("ID"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID"),16,0,32),1,120)),to_char(count("N_COL1")),
  to_char(substrb(dump(min("N_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL1"),16,0,32),1,120)),to_char(count("N_COL2"))
  ,to_char(substrb(dump(min("N_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL2"),16,0,32),1,120)),to_char(count("N_COL3"))
  ,to_char(substrb(dump(min("N_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL3"),16,0,32),1,120)),to_char(count("N_COL4"))
  ,to_char(substrb(dump(min("N_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("N_COL4"),16,0,32),1,120)),to_char(count("V_COL1"))
  ,to_char(substrb(dump(min("V_COL1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL1"),16,0,32),1,120)),to_char(count("V_COL2"))
  ,to_char(substrb(dump(min("V_COL2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL2"),16,0,32),1,120)),to_char(count("V_COL3"))
  ,to_char(substrb(dump(min("V_COL3"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL3"),16,0,32),1,120)),to_char(count("V_COL4"))
  ,to_char(substrb(dump(min("V_COL4"),16,0,32),1,120)),
  to_char(substrb(dump(max("V_COL4"),16,0,32),1,120)) 
from
 "TEST"."TEST_BIG_PART" t  where TBL$OR$IDX$PART$NUM("TEST"."TEST_BIG_PART",0,
  4,0,"ROWID") = :objn /* SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,
  NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL,SYN,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.78       6.69      16141      44674          1           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.78       6.69      16141      44674          1           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=44674 pr=16141 pw=0 time=0 us)
 998146   APPROXIMATE NDV AGGREGATE (cr=44674 pr=16141 pw=0 time=110048 us cost=5536 size=15400000 card=100000)
1000000    PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=44674 pr=16141 pw=0 time=31253 us cost=5536 size=15400000 card=100000)
1000000     TABLE ACCESS FULL TEST_BIG_PART PARTITION: KEY KEY (cr=44674 pr=16141 pw=0 time=31253 us cost=5536 size=15400000 card=100000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        268        0.11          4.34
  db file sequential read                         2        0.03          0.03
  db file parallel read                           2        0.55          0.61
********************************************************************************

So the incremental mechanism for refreshing the global statistics is clearly faster than the non-incremental one - which again is no big surprise. That leaves the question of the quality of the results, which I leave to the table function dbms_stats.diff_table_stats_in_history:

select * from table(dbms_stats.diff_table_stats_in_history(
                    ownname => user,
                    tabname => upper('TEST_BIG_PART'),
                    time1 => systimestamp,
                    time2 => to_timestamp('05.09.2012 12:05:00','dd.mm.yyyy hh24:mi:ss'),
                    pctthreshold => 0));

REPORT
--------------------------------------------------------------------------------

MAXDIFFPCT
----------
###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : TEST_BIG_PART
OWNER         : TEST
SOURCE A      : Statistics as of 05.09.12 12:16:41,613000 +02:00
SOURCE B      : Statistics as of 05.09.12 12:05:00,000000 +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

TEST_BIG_PART               T   A   11000000   245707     153        11000000
                                B   11000000   245707     154        11000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

N_COL4          A   200512  ,000004987 NO   0       22   BF055 C20A6 1.1E+07
                B   199488  ,000005012 NO   0       22   C0042 C20A6 1.1E+07
V_COL3          A   201744  ,000004956 NO   0       51   41414 7A7A7 1.1E+07
                B   202816  ,000004930 NO   0       51   41414 7A7A7 1.1E+07
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################

So in this test case there are only minor deviations between the two approaches.
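
For a direct comparison, the current NDV values can of course also be read from the dictionary (a supplementary check, not part of the original test):

-- current column-level NDVs after the incremental gathering
select column_name, num_distinct, sample_size
  from user_tab_col_statistics
 where table_name = 'TEST_BIG_PART'
 order by column_name;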

Conclusion:
  • incremental statistics seem to deliver pretty much exactly what they promise.
  • for the initial creation of object statistics (i.e. building the statistics for a large partitioned table completely from scratch), there is apparently no dramatic runtime difference between the incremental approach and the full gathering.
  • when new partitions are added, the incremental mechanism refreshes the global statistics on the basis of the synopsis information much faster than a complete rebuild does.
  • of course, maintaining the synopsis information in the WRI$_OPTSTAT% tables does not come for free: the statistics refresh requires relatively complex internal queries and DML operations, and the space consumption of the synopses can be significant (as Randolf Geist has shown).
  • in this test, the results of incremental and non-incremental statistics gathering are nearly identical.
