Is the use of SELECT COUNT(*) before SELECT INTO s

2019-04-01 16:36发布

My last question got me thinking.

1)

SELECT COUNT(*) INTO count FROM foo WHERE bar = 123;
IF count > 0 THEN
    SELECT a INTO var FROM foo WHERE bar = 123;
    -- do stuff
ELSE
    -- do other stuff
END IF;

2)

BEGIN
    SELECT a INTO var FROM foo where bar = 123;
    -- do stuff
EXCEPTION
    WHEN no_data_found THEN
        --do other stuff
END ;

I assume number 2 is faster because it requires one less trip to the database.

Is there any situation where 1 would be superior, that I am not considering?

EDIT: I'm going to let this question hang for a few more days, to gather some more votes on the answers, before answering it.

4条回答
Summer. ? 凉城
2楼-- · 2019-04-01 17:14

Such a situation I usually do like this:

DECALRE
   CURSOR cur IS
   SELECT a FROM foo where bar = 123;
BEGIN
   OPEN cur;
   FETCH cur INTO var;
   IF cur%FOUND THEN
      -- do stuff, maybe a LOOP if required
   ELSE
      --do other stuff
   END;
END;

This has some benfits:

You read only one record from database, the rest you skip. Should be the fastest way of doing it in case you just have to know if number of rows > 1.

You don't handle a "normal" situation with an "exception" handler, some people consider this as "more beautiful" coding.

查看更多
走好不送
3楼-- · 2019-04-01 17:18

First see: Oracle PL/SQL - Are NO_DATA_FOUND Exceptions bad for stored procedure performance? that is essentially the same question than yours. And after that see About the performance of exception handling.

In both scenarios you should be prepared also to handle too_many_rows exception unless your database schema enforces uniqueness of bar.

This is PL/SQL so you're on a constant database trip - instead you should be afraid/aware of PL/SQL - SQL context switches. See also what Tom says:

But don't be afraid at all to invoke SQL from PLSQL - that is what PLSQL does best.

In first place you shouldn't be worried about the performance but the correctness of a program. In this regard my vote goes for scenario #2.

查看更多
走好不送
4楼-- · 2019-04-01 17:30

I'm not sure about faster, but I'd say (2) is clearly superior because you're not considering the case where someone issues DELETE FROM foo where bar='123' in between your statements in (1).

查看更多
淡お忘
5楼-- · 2019-04-01 17:31

If you use exact queries from the question then 1st variant of course slower because it must count all records in table which satisfies criteria.

It must be writed as

SELECT COUNT(*) INTO row_count FROM foo WHERE bar = 123 and rownum = 1;

or

select 1 into row_count from dual where exists (select 1 from foo where bar = 123);

because checking for record existence is enough for your purpose.

Of course, both variants don't guarantee that someone else don't change something in foo between two statements, but it's not an issue if this check is a part of more complex scenario. Just think about situation when someone changed value of foo.a after selecting it's value into var while performing some actions which refers selected var value. So in complex scenarios better to handle such concurrency issues on application logic level.
To perform atomic operations is better to use single SQL statement.

Any of variants above requires 2 context switches between SQL and PL/SQL and 2 queries so performs slower then any variant described below in cases when row found in a table.

There are another variants to check existence of row without exception:

select max(a), count(1) into var, row_count 
from foo 
where bar = 123 and rownum < 3;

If row_count = 1 then only one row satisfies criteria.

Sometime it's enough to check only for existence because of unique constraint on the foo which guarantees that there are no duplicated bar values in foo. E.g. bar is primary key.
In such cases it's possible to simplify query:

select max(a) into var from foo where bar = 123;
if(var is not null) then 
  ...
end if;

or use cursor for processing values:

for cValueA in ( 
  select a from foo where bar = 123
) loop
  ...  
end loop;

Next variant is from link, provided by @user272735 in his answer:

select 
  (select a from foo where bar = 123)
  into var 
from dual;

From my experience any variant without exception blocks in most cases faster then a variant with exceptions, but if number of executions of such block is low then better to use exception block with handling of no_data_found and too_many_rows exceptions to improve code readability.

Right point to choose to use exception or don't use it, is to ask a question "Is this situation are normal for application?". If row not found and it's a expected situation which can be handled (e.g. add new row or take data from another place and so on) is better to avoid exception. If it's unexpected and there are no way to fix a situation, then catch exception to customize error message, write it to event log and re-throw, or just don't catch it at all.

To compare performance just make a simple test case on you system whith both variants called many times and compare.
Say more, in 90 percent of applications this question is more theoretical than practical because there are a lot of another sources of performance issues which must be taken into account first.

Update

I reproduced example from this page at SQLFiddle site with a little corrections (link).
Results prove that variant with selecting from dual performs best: a little overhead when most of queries succeed and lowest performance degradation when number of missing rows raises.
Surprisingly variant with count() and two queries showed best result in case if all queries failed.

| FNAME | LOOP_COUNT | ALL_FAILED | ALL_SUCCEED | variant name |
----------------------------------------------------------------
|    f1 |       2000 |       2.09 |        0.28 |  exception   |
|    f2 |       2000 |       0.31 |        0.38 |  cursor      |
|    f3 |       2000 |       0.26 |        0.27 |  max()       |
|    f4 |       2000 |       0.23 |        0.28 |  dual        |
|    f5 |       2000 |       0.22 |        0.58 |  count()     |

-- FNAME        - tested function name 
-- LOOP_COUNT   - number of loops in one test run
-- ALL_FAILED   - time in seconds if all tested rows missed from table
-- ALL_SUCCEED  - time in seconds if all tested rows found in table
-- variant name - short name of tested variant

Below is a setup code for test environment and test script.

create table t_test(a, b)
as
select level,level from dual connect by level<=1e5
/
insert into t_test(a, b) select null, level from dual connect by level < 100
/

create unique index x_text on t_test(a)
/

create table timings(
  fname varchar2(10), 
  loop_count number, 
  exec_time number
)
/

create table params(pstart number, pend number)
/
-- loop bounds
insert into params(pstart, pend) values(1, 2000)
/

-- f1 - exception handling

create or replace function f1(p in number) return number
as
  res number;
begin
  select b into res
  from t_test t
  where t.a=p and rownum = 1;
  return res;
exception when no_data_found then
  return null;
end;
/

-- f2 - cursor loop

create or replace function f2(p in number) return number
as
  res number;
begin
  for rec in (select b from t_test t where t.a=p and rownum = 1) loop
    res:=rec.b;
  end loop;
  return res;
end;
/

-- f3 - max()

create or replace function f3(p in number) return number
as
  res number;
begin
  select max(b) into res
  from t_test t
  where t.a=p and rownum = 1;
  return res;
end;
/

-- f4 - select as field in select from dual

create or replace function f4(p in number) return number
as
  res number;
begin
  select
    (select b from t_test t where t.a=p and rownum = 1)
    into res
  from dual;
  return res;
end;
/

-- f5 - check count() then get value

create or replace function f5(p in number) return number
as
  res number;
  cnt number;
begin
  select count(*) into cnt
  from t_test t where t.a=p and rownum = 1;

  if(cnt = 1) then
    select b into res from t_test t where t.a=p;
  end if;

  return res;
end;
/

Test script:

declare
  v       integer;
  v_start integer;
  v_end   integer;

  vStartTime number;

begin
  select pstart, pend into v_start, v_end from params;

  vStartTime := dbms_utility.get_cpu_time;

  for i in v_start .. v_end loop
    v:=f1(i);
  end loop;

  insert into timings(fname, loop_count, exec_time) 
    values ('f1', v_end-v_start+1, (dbms_utility.get_cpu_time - vStartTime)/100) ;
end;
/

declare
  v       integer;
  v_start integer;
  v_end   integer;

  vStartTime number;

begin
  select pstart, pend into v_start, v_end from params;

  vStartTime := dbms_utility.get_cpu_time;

  for i in v_start .. v_end loop
    v:=f2(i);
  end loop;

  insert into timings(fname, loop_count, exec_time) 
    values ('f2', v_end-v_start+1, (dbms_utility.get_cpu_time - vStartTime)/100) ;
end;
/

declare
  v       integer;
  v_start integer;
  v_end   integer;

  vStartTime number;

begin
  select pstart, pend into v_start, v_end from params;

  vStartTime := dbms_utility.get_cpu_time;

  for i in v_start .. v_end loop
    v:=f3(i);
  end loop;

  insert into timings(fname, loop_count, exec_time) 
    values ('f3', v_end-v_start+1, (dbms_utility.get_cpu_time - vStartTime)/100) ;
end;
/

declare
  v       integer;
  v_start integer;
  v_end   integer;

  vStartTime number;

begin
  select pstart, pend into v_start, v_end from params;

  vStartTime := dbms_utility.get_cpu_time;

  for i in v_start .. v_end loop
    v:=f4(i);
  end loop;

  insert into timings(fname, loop_count, exec_time) 
    values ('f4', v_end-v_start+1, (dbms_utility.get_cpu_time - vStartTime)/100) ;
end;
/

declare
  v       integer;
  v_start integer;
  v_end   integer;

  vStartTime number;

begin
  select pstart, pend into v_start, v_end from params;
  --v_end := v_start + trunc((v_end-v_start)*2/3);

  vStartTime := dbms_utility.get_cpu_time;

  for i in v_start .. v_end loop
    v:=f5(i);
  end loop;

  insert into timings(fname, loop_count, exec_time) 
    values ('f5', v_end-v_start+1, (dbms_utility.get_cpu_time - vStartTime)/100) ;
end;
/

select * from timings order by fname
/
查看更多
登录 后发表回答