Performance of SASHELP views versus SQL dictionary

2019-06-22 06:47发布

Why does it take longer for SAS to create a dataset from a data step view using, for example, sashelp.vcolumn versus the equivalent SQL table dictionary.columns?

I did a test using fullstimer and it seems to confirm my suspicion of performance differences.

option fullstimer;

data test1;
    set sashelp.vcolumn;
    where libname = 'SASHELP' and
        memname = 'CLASS' and
        memtype = 'DATA';
run;

proc sql;
    create table test2 as
    select *
    from dictionary.columns
    where libname = 'SASHELP' and
        memname = 'CLASS' and
        memtype = 'DATA';
quit;

An excerpt from the log:

NOTE: There were 5 observations read from the data set SASHELP.VCOLUMN.
      WHERE (libname='SASHELP') and (memname='CLASS') and (memtype='DATA');
NOTE: The data set WORK.TEST1 has 5 observations and 18 variables.
NOTE: DATA statement used (Total process time):
      real time           0.67 seconds
      user cpu time       0.23 seconds
      system cpu time     0.23 seconds
      memory              3820.75k
      OS Memory           24300.00k
      Timestamp           04/13/2015 09:42:21 AM
      Step Count                        5  Switch Count  0


NOTE: Table WORK.TEST2 created, with 5 rows and 18 columns.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              3267.46k
      OS Memory           24300.00k
      Timestamp           04/13/2015 09:42:21 AM
      Step Count                        6  Switch Count  0

The memory used is a little higher for SASHELP but the difference isn't huge. Note the time--it's 22 times longer using SASHELP than with the SQL dictionary. Surely it can't just be due to the relatively small difference in memory usage.

At @Salva's suggestion, I resubmitted the code in a new SAS session, this time running the SQL step before the data step. The memory and time differences are even more pronounced:

                | sql       | sashelp
----------------+-----------+-----------
real time       | 0.28 sec  | 1.84 sec
user cpu time   | 0.00 sec  | 0.25 sec
system cpu time | 0.00 sec  | 0.24 sec
memory          | 3164.78k  | 4139.53k
OS Memory       | 10456.00k | 13292.00k
Step Count      | 1         | 2
Switch Count    | 0         | 0

2条回答
甜甜的少女心
2楼-- · 2019-06-22 07:08

Some (if not all) of this is the difference in overhead between SQL and Data Step. For example:

proc sql;
    create table test2 as
    select *
    from sashelp.vcolumn
    where libname = 'SASHELP' and
        memname = 'CLASS' and
        memtype = 'DATA';
quit;

Also very fast.

The SAS page about Dictionary Tables gives some information that is likely the main explanation.

When querying a DICTIONARY table, SAS launches a discovery process that gathers information that is pertinent to that table. Depending on the DICTIONARY table that is being queried, this discovery process can search libraries, open tables, and execute views. Unlike other SAS procedures and the DATA step, PROC SQL can mitigate this process by optimizing the query before the discovery process is launched. Therefore, although it is possible to access DICTIONARY table information with SAS procedures or the DATA step by using the SASHELP views, it is often more efficient to use PROC SQL instead.

查看更多
疯言疯语
3楼-- · 2019-06-22 07:13

In my experience, using the sashelp views is slower than using proc datasets. This is more so if you have a lot of libraries assigned, especially external ones:

10   proc datasets lib=sashelp noprint;
11     contents data=class out=work.test2;
12   quit;

NOTE: The data set WORK.TEST2 has 5 observations and 40 variables.
NOTE: PROCEDURE DATASETS used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              635.12k
      OS Memory           9404.00k
      Timestamp           14.04.2015 kl 10.22
查看更多
登录 后发表回答