Oracle 10g: MIN/MAX column value estimation

is it possible to retrieve statistics about the minimal or maximal value of a numeric column in Oracle 10g? I have found the table USER_TAB_COL_STATISTICS having a LOW_VALUE and HIGH_VALUE column, but I am not sure whether those are the values I am looking for.

I need to find an efficient way to ask the DBS for those statistics. Using a regular MIN(a) and MAX(a) query would be too slow on large tables.

Thanks in advance.

标签： oracle oracle10g

4条回答

\"骚年 ilove

2楼-- · 2019-06-02 16:18

Yes, LOW_VALUE and HIGH_VALUE will tell you the minimum and maximum values in the column but:

they are stored as RAW(32) columns, so the meaning will not be immediately apparent
they will be as of the last time statistics were gathered for the table, so may not be accurate (unless you explicitly gather stats before using them)

If you index the column then MIN(a) and MAX(a) should be very fast as in this example where T1 has 50000 rows and is indexed on OBJECT_ID:

SQL> select min(object_id) from t1;

MIN(OBJECT_ID)
--------------
           100

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |       |     1 |     5 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_ID | 53191 |   259K|     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        419  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The result is the same if you select the MAX instead of the MIN. However, if you select the MIN and MAX in a single select statement the result is different:

SQL> select min(object_id), max(object_id) from t1;

MIN(OBJECT_ID) MAX(OBJECT_ID)
-------------- --------------
           100          72809


-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |     5 |    34   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |     5 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_ID | 53191 |   259K|    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        125  consistent gets
          0  physical reads
          0  redo size
        486  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

This suggests that it may be better to get them separately, though I haven't conclusively proved that.

0人赞添加讨论(0) 举报

Ridiculous、

3楼-- · 2019-06-02 16:21

The other answers here (using an index fast full scan; or examining the user_tab_columns statistics) are excellent.

Here's another method that might be suitable - if you're only interested in a rough estimate, you can use the SAMPLE clause (and adjust the sample size up or down depending on how accurate you need it):

SELECT max(value), min(value) FROM t SAMPLE(1);

This takes a 1% sample from the table. It will generally sample different rows each time it is run, so don't expect the results to be identical run-to-run. If you want it to run quicker, you can have lower sample sizes, e.g. SAMPLE(0.01), or if you want to sample half the table, SAMPLE(50).

The advantage of this approach over the "analyze, then-query-user-tab-cols" approach is that the analyze runs queries like this anyway in order to generate the statistics - so doing it this way may mean less work overall.

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

4楼-- · 2019-06-02 16:23

An example with a table containing numbers from 1 up to 1234:

SQL> create table t (nr) as select level from dual connect by level <= 1234
  2  /

Tabel is aangemaakt.

SQL> select min(nr)
  2       , max(nr)
  3    from t
  4  /

   MIN(NR)    MAX(NR)
---------- ----------
         1       1234

1 rij is geselecteerd.

If you analyze the table, the low_value and high_value columns contain the right numbers.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL-procedure is geslaagd.

SQL> select low_value
  2       , high_value
  3    from user_tab_columns
  4   where table_name = 'T'
  5     and column_name = 'NR'
  6  /

LOW_VALUE                                                        HIGH_VALUE
---------------------------------------------------------------- ----------------
C102                                                             C20D23

1 rij is geselecteerd.

They are raw, so they cannot be read easily. Using the utl_raw.cast_to_number function makes them readable:

SQL> select utl_raw.cast_to_number(low_value)
  2       , utl_raw.cast_to_number(high_value)
  3    from user_tab_columns
  4   where table_name = 'T'
  5     and column_name = 'NR'
  6  /

UTL_RAW.CAST_TO_NUMBER(LOW_VALUE) UTL_RAW.CAST_TO_NUMBER(HIGH_VALUE)
--------------------------------- ----------------------------------
                                1                               1234

1 rij is geselecteerd.

However, be careful: the numbers may be inaccurate when updates have taken place between the time the statistics were gathered and the time the query ran.

Regards, Rob.

0人赞添加讨论(0) 举报

Ridiculous、

5楼-- · 2019-06-02 16:26

In my case the column of interest has TIMESTAMP type and it seems there's no UTL_RAW.CAST_TO_TIMESTAMP function.

It helped to use the trick from http://www.oaktable.net/content/convert-rawhex-timestamp-0 to convert Oracle RAW type to TIMESTAMP:

select to_timestamp(
        to_char( to_number( substr( p_str, 1, 2 ), 'xx' ) - 100, 'fm00' ) ||
        to_char( to_number( substr( p_str, 3, 2 ), 'xx' ) - 100, 'fm00' ) ||
        to_char( to_number( substr( p_str, 5, 2 ), 'xx' ), 'fm00' ) ||
        to_char( to_number( substr( p_str, 7, 2 ), 'xx' ), 'fm00' ) ||
        to_char( to_number( substr( p_str,9, 2 ), 'xx' )-1, 'fm00' ) ||
        to_char( to_number( substr( p_str,11, 2 ), 'xx' )-1, 'fm00' ) ||
        to_char( to_number( substr( p_str,13, 2 ), 'xx' )-1, 'fm00' ), 'yyyymmddhh24miss' )
from (
select low_value p_str from user_tab_columns
   where table_name = 'MESSAGE' and column_name = 'TS'
)

0人赞添加讨论(0) 举报

Oracle 10g: MIN/MAX column value estimation

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间