dynamic sampling 和 multi-column statistics
oracle 在11g 中引入了新的multi-column statistics 以解决在多个列上过滤数据出现的无法预测潜在的数据分布而产生错误的执行计划的情况, 在11g 之前如果想比较准确的预测多个列上数据过滤的sql 执行计划是否够好可以使用hints /*+ dynamic_sampling(table_name integer)*/ ,
比如一个人的出生月份可以是1-12月任意一个,而他可能的星座也可能是12星座的任意一个,如果问一个人出生于12月又是双鱼的可能性是多少, 数学概率上就应该是1/12*1/12 = 1/144 , 但是双鱼座要么是2月要么是三月,所以实际某个人既出生于某一个月又是某个星座的概率应该是2/12 = 1/6 , 如果让数据库知道两个列以上的这种潜在的关系就可以让数据库选择更好的执行计划.
在otn 上看到文章描写dynamic sampling 和multi-column statistics 的,所以做了以下实验.
- create table dynsample (c1 number,c2 number,c3 number,c4 number,c5 number,c6 number);
- begin
for i in 1..5 loop
for j in 1..10000 loop
insert into dynsample values(i,i,i,i,i,i);
end loop;
end loop;
end;
/ - exec dbms_stats.gather_table_stats(null,’dynsample’,method_opt=>’for all columns size 1 for columns c1 size 254 for columns c2 size 254′);
set autotrace traceonly explain - select * from dynsample where c1=1 and c2=1;
——————————————————————————-
| Id | Operation | Name | Rows
——————————————————————————-
| 0 | SELECT STATEMENT | | 1991
|* 1 | TABLE ACCESS FULL | DYNSAMPLE | 1991
——————————————————————————-这个时候优化器的预测是对的,c1=1 的情况应该是1/5 , c2=1 的情况也应该是1/5 , 所以当c1=1 and c2=1 的时候应该是50000 * 1/5 * 1/5 = 2000 .
- 我们用dynamic_sampling hints 重新跑一遍.
select /*+ dynamic_sampling(dynsample 4) */ * from dynsample where c1=1 and c2=1;
———————————————–
| Id | Operation | Name | Rows |
———————————————–
| 0 | SELECT STATEMENT | | 10000 |
|* 1 | TABLE ACCESS FULL | DYNSAMPLE | 10000 |
———————————————–使用dynamic_sampling 之后出来的预测是代表真实的情况,
-
exec dbms_stats.gather_table_stats(‘alex’,'dynsample’,method_opt=>’for all columns size 1 for columns(c1,c2) size 254′);
使用11g 的新的multi-column statistics 来收集关于c1 和c2 的组合统计信息
-
select * from dynsample where c1=1 and c2=1;
———————————————–
| Id | Operation | Name | Rows |
———————————————–
| 0 | SELECT STATEMENT | | 10051 |
|* 1 | TABLE ACCESS FULL | DYNSAMPLE | 10051 |
———————————————–这时候的统计信息也是正确的.
- 换一个没有统计值的sql
select * from dynsample where c1=1 and c3=1;
———————————————–
| Id | Operation | Name | Rows |
———————————————–
| 0 | SELECT STATEMENT | | 2000 |
|* 1 | TABLE ACCESS FULL | DYNSAMPLE | 2000 |
———————————————– - 重新使用dynamic sampling 看看
select /*+dynamic_sampling(dynsample 4) from dynsample where c1=1 and c3=1;
———————————————–
| Id | Operation | Name | Rows |
———————————————–
| 0 | SELECT STATEMENT | | 10000 |
|* 1 | TABLE ACCESS FULL | DYNSAMPLE | 10000 |
———————————————–
11g 的multi-column statistics 的语法基本跟之前收集统计值的语法一样. 如果sql 有使用多个列的联合过滤的时候,11g的优化器就会使用multi-column statistics.
注意dynamic sampling 要使用级别4 以上的才能收集到multi-column 的统计信息.
dynamic sampling hints 在9i r2之后都可以使用. 使用的时候会耗费更多的解析时间, 一般来说会建议在统计信息不是很准确,解析时间不重要, 正确的执行计划至关重要的情况下,像是数据仓库里面的报表 或是OLTP 中不是运行非常频繁的sql.
顺便贴出dynamic sampling 的11个级别分别代表什么.
- Level 0: Do not use dynamic sampling.
- Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
- Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
- Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
- Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
- Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
- Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
10 r1 有个sql profile 功能,它跟dynamic sampling 的区别就是把sql 执行计划里面统计过的数据持久化, dynamic sampling 是不保存统计信息的.
参考资料
http://www.oracle.com/technology/oramag/oracle/09-jan/o19asktom.html
tom 写的dynamic sampling 的文章
演示11g 的multi-column statistics 的新特性