I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ...
(the equivalent of SELECT TOP 1000 * FROM x
in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database.
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?
Have you tried using the
outobs
option in yourproc sql
?For example,
Alternatively, you can use SQL passthrough to write a query using DB2 syntax (
FETCH FIRST 10 ROWS ONLY
), although this requires you to store all your data in the database, at least temporarily.Passthrough looks something like this:
It requires more syntax and can't access your sas datasets, so if
outobs
works for you, I would recommend that.When SAS is talking to a database via SAS syntax, part of the query can be translated to DBMS language equivalent - this is called implicit pass through. The rest of the query is "post-processed" by SAS to produce final result. Depending on SAS version, DBMS vendor and DBMS version, and in some cases even some connection/libname options, different parts of SAS syntax are translatable/considered compatible between SAS and DBMS and thus sent to be performed by DBMS instead of SAS.
With SAS SQL options - INOBS and OUTOBS - I've worked a lot with MS SQL and Oracle via different versions of SAS, but I haven't seen those ever translated to TOP xxx type of queries, so this is probably not supported yet, although when query touches just DMBS data (no joins to SAS data etc), should be quite doable.
So I think you're left with the so called explicit pass-through - specific SAS SQL syntax to connect to database. This type of queries look like this:
In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:
When connecting using libname be sure to use READBUFF (I usually set some 5000 or so) or INSERTBUFF options (1000 or more) when loading database.
To see if implicit pass-through takes place, set sastrace option: