Unable to run Call Execute within a Data Step

2019-07-17 10:23发布

站内文章 / 后端开发

14 0

男人必须洒脱

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am trying to run an insert statement within a data step to populate a table. I get the following error while running the code ERROR: Domain error. There is no other information on this error i the logs.

The sample code which I am implementing is :

  DATA _NULL_;
  SET DataSetA;

 Call Execute ('Proc Sql; Insert Into TableA Select col1,col2,c.Col2 From Table B Inner        Join ( Select col1,' || Datasetcol1 || ' As col2, ' || Datasetcol2 || ' as Col3 FROM ' || Datasetcol3 || ' ) c On b.Col1=c.col1;quit;');

 run;

This code runs okay if there are 2000 records but now I have 10000 records and throws me that error. Datasetcol1 and Datasetcol2 are values from the dataset which I am using in the query.

I have no idea why this error occurs. Initially, I thought it could be because of the processor having too much work, I put in a wait by using sleep and wakeup , however, I still get the error. Also, I dont get this error always. It happens sometimes and sometimes it doesnt.

回答1:

I am in the middle of running Joe's code and I am having no issues. This is on Windows 7 with SAS 9.2m3, so that may be different environment than yours. Also since I do not have you data, I can not replicate what you have done.

I see lots of things which might improve the code, but the error message indicates some problem deep within SAS. I generally find it more efficient to check with SAS Tech support rather than speculate about what may be happening in the Black Box called SAS. Submit a ticket on this.

I did fine one reference to this error message on the SAS support site which points to Oracle specific issue. http://support.sas.com/kb/14/873.html

回答2:

First, I'd try removing the 'PROC SQL;' and 'QUIT;' from your repeated CALL EXECUTE, and run them only once each - using IF N = 1; to run the PROC SQL, and you can either use a LAST variable to determine when to call the QUIT, or just leave it out of the datastep and write QUIT; after the RUN; of the data step, as that would then execute in the proper place. It is possible the repeated invocation of the SQL environment is causing your problem itself.

Another possible solution would be instead of using CALL EXECUTE, write these all to a text file, then %include that text file. Again, %include the part not including PROC SQL and QUIT, and %include inside a PROC SQL block. If CALL EXECUTE is specifically at fault here, this gets around that with no difference in function. Honestly I'd prefer this method anyway, as it's a) closer to what you're really doing (writing SQL code programatically, then executing it), and b) easier to debug (it writes a text file you could then grab lines from to test).

Edit: Example of removing the PROC SQL and QUIT from the calls:

data class;
set sashelp.class;
run;

filename _null dummy;
proc printto log=_null;
data _null_;
set sashelp.class end=eof;
if _n_ = 1 then call execute('PROC SQL;');
do _t = 1 to 10000;
  _exec = cats('insert into class (name,age,sex) select name,age,sex from sashelp.class where name="',name,'";');
  call execute(_exec);
end;

if eof then call execute('QUIT;');
run;

回答3:

I think you're right that the cause is due to scaling issues with call execute. The limit could be determined by a variety of things such as the amount of available memory, limits to the instruction queue (I made that up btw), etc...

A good practice in this case would probably be to process the call executes in batches of 1000 using the firstobs and obs options on your dataset set statement.

DATA _NULL_;
  SET DataSetA(firstobs=1 obs=1000);
  *Call Execute ...;
run;

DATA _NULL_;
  SET DataSetA(firstobs=1001 obs=2000);
  *Call Execute ...;
run;

And then macro-ise the above code as necessary...

回答4:

For such huge code generation a safer approach is to generate an external file and run it:

 DATA _NULL_;
  SET DataSetA;
  file "script.sas";
  length stmt $200;
  stmt = 'Proc Sql; Insert Into TableA Select col1,col2,col3 From Table B Inner        Join ( Select col1,' || Datasetcol1 || ' As col2, ' || Datasetcol2 || ' as Col3 FROM Table C) c On b.Col1=c.col1;quit;';
  put stmt;
 run;

 %include "script.sas" ;

This also gives you a place to check the generated code while working on your code.

Note: take care of the length for the variable that holds the code (stmt $200 here) and linesize of the external file (256 by default).

回答5:

You may be able to avoid the use of call execute completely by doing everything through the macro facility.

Example:

%macro InsertFromTable(table_name, ds_col1, ds_col2);
    proc sql;
        Insert Into TableA 
        Select col1,col2,col3 
        From Table B 
        Inner Join ( 
            Select col1
                ,&ds_col1. As col2
                ,&ds_col2. as Col3 
            FROM &table_name. as C
        ) c On b.Col1=c.col1
    ;quit;
%mend;


%Macro InsertFromAllTables();
    %let dsid=%sysfunc(open(DataSetA,i));
    %syscall set(dsid);

    %let rc = %sysfunc(fetch(&dsid));
       %do %while (&rc. eq 0);            
            %InsertFromTable(&table_name., &col1., &col2.);
            %let rc = %sysfunc(fetch(&dsid));            
       %end;
    %let rc=%sysfunc(close(&dsid));
%mend;


data DatasetA;
    infile datalines firstobs=3 missover;    
    input table_name: $10. col1: $10. col2: $10.;

    datalines;    
    table_name   col1   col2
    ----------   ----   ----
    Table_1       C1      C2
    Table_2       D1      D2
    Table_x       xx      yy
;run;



%InsertFromAllTables();

回答6:

I'm going to split this off into another answer since it's entirely different from my original thoughts.

I would assume first that this is a data error. Troubleshoot by doing the following: 1. Use the first 5000 rows. If that succeeds, then use only the second 5000 rows (firstobs=5001). If neither fails, then it is not a data error; if one or both fail, it might be a data error. 2. Then perform a binary tree search-like algorithm. Say the first 5000 fails. Then try the first 2500. If that fails then try the first 1250. Continue until you find something that passes. Then try the other half of the last one that passed (say, 625 passed; so try 626-1250).

Continue splitting in half until you have a small set of data you can inspect (one row, five rows, whatever is best for your ability to see the data, particularly based on how many rows the inner join creates). Look at the rows and see if anything pops out as possibly illegal.

Of course if you get to a particular size (say 625) and find that all sets of 625 from 1 to 10000 pass, just not when combined into larger sets, you again may not have a data error - but I'd think it's likely you do, particularly given the link Larry posted.

If you want a somewhat faster solution, consider doing frequencies of the relevant variables in both DatasetA and TableA/B. See if anything in one of those rows is perhaps illegal, or unusual. If you see a date that's impossible, for example, or a negative number that doesn't make sense, perhaps it's at fault.

Finally, what system is this inserting rows into? I hope not SAS (as there are many better SAS-sy solutions). Depending on the system, you may be violating one of that system's rules (like in Larry's link) or you may be violating table constraints or something else that SAS doesn't have a special error message for, but nonetheless does break some rule. The phrase Domain Error could be construed that way, in any event.

回答7:

If you have the most recent version of SAS (version 9.3 Maint 2), you might try the DOSUBL statement. It allows you to run any collection of SAS statements from within a DATA step, as long as those statements represent a "complete" step. That is: global statements, DATA step with RUN;, PROC step with RUN/QUIT.

It's marked as Experimental, but basic documentation and examples are available in this SAS Global Forum 2012 paper.

Note: even though DOSUBL was valid syntax in SAS 9.3 initially, my experience is that it works much better with SAS 9.3M2.