SAS replace character in ALL columns

Posted 2019-07-18 09:45

Question:

I have a SAS dataset that I have to export to a .csv file. I have the following two conflicting requirements:

  • I have to use the semicolon as the delimiter in the .csv-file.

  • Some of the character variables are manually entered strings from formulas, hence they may contain semicolons.

My solution to the above is to either escape the semicolon or to replace it with a comma.

How can I, in a nice, clean and efficient way, use e.g. tranwrd on an entire dataset?


My attempt:

For each variable, apply tranwrd(.., ";", ",") in a data step, update the dataset, and loop through all variables. This is naturally very inefficient for even semi-large datasets, since it needs one data step per variable. The code is also quite ugly, since getting the variable names takes a few steps, but the inefficiency definitely takes the cake.

data test;
    input w $ c b  d  e $ f $;
    datalines4;
Aaa;; 50 11 1 222 a;s                                        
Bbb 35 12 2 250 qw                                        
Comma, 75 13 3 foo zx                                        
;;;;
run;

* Get the variable names;
proc contents data=test out=vars(keep=name type varnum) order=varnum noprint;
run;
* Sort by variable number;
proc sort data=vars;
    by varnum;
run;

* Put variable names into a space-separated string;
proc sql noprint;
    select compress(name)        
    into :name_list separated by ' '                      
    from vars;
quit;

%let len = %sysfunc(countw(&name_list));


*Initialize loop dataset;
data a;
    set test;
run;

%macro loop;
    %do i = 1 %to &len;
        %let j = %scan(&name_list,&i);

        data a(rename=(v_&j = &j) drop=&j);
            set a;
            v_&j.=compress(tranwrd(&j,";",","));
        run;
    %end;
%mend;
%loop;

Answer 1:

I think I may have a more elegant solution to your problem:

data class;
   set sashelp.class;

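   /* array over every character variable in the data set */
   array vars [*] _character_;

   /* demo: replace "a" with "X"; note that compress() also strips all blanks */
   do i = 1 to dim(vars);
      vars[i] = compress(tranwrd(vars[i],"a","X"));
   end;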

   drop i;
run;

You can use an array to reference all character columns in your data set and then loop through them.
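
Applied to the dataset from the question, the same array pattern might look like the sketch below. It assumes the test dataset created in the question; the output name test_clean is made up for illustration. It also swaps in translate() for tranwrd(), since only a single character is being replaced, and leaves out compress() so that blanks inside values are kept.

```sas
data test_clean;                       /* hypothetical output name */
   set test;

   /* every character variable in the data set; numeric ones are untouched */
   array chars [*] _character_;

   do i = 1 to dim(chars);
      /* translate(source, to, from): turn each ';' into ',' */
      chars[i] = translate(chars[i], ",", ";");
   end;

   drop i;
run;
```

This does the replacement in a single pass over the data instead of one data step per variable.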



Answer 2:

The most widely used convention for csv files whose fields can contain the delimiter is to quote such fields and to double up any quotes inside them. In SAS you can do this automatically with the dsd and dlm= options of the file statement, which control how the put statement writes each value:

data test;
    input w $ c b  d  e $ f $;
    datalines4;
Aaa;; 50 11 1 222 a;s                                        
Bbb" 35 12 2 250 qw                                        
Comma, 75 13 3 foo zx                                        
;;;;
run;

data _null_;
  set test;
  file "c:\temp\test.csv" dsd dlm=';';
  put (_ALL_) (&);
run;

This results in the following semicolon-delimited csv (minus a header row, but that's a separate issue):

"Aaa;;";50;11;1;222;"a;s"
"Bbb""";35;12;2;250;qw
Comma,;75;13;3;foo;zx

Sorry, didn't notice your comment about the workaround until after I posted this. I'll leave it here in case anyone finds it helpful.
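
As for the header row mentioned above, one simple option is to write it by hand on the first iteration of the data step. The sketch below assumes the six variables of the example dataset in their original order. It uses (+0), a pointer control that moves zero columns, as the format list in place of (&); that substitution is an assumption of this sketch, not something the code above requires.

```sas
data _null_;
   set test;
   file "c:\temp\test.csv" dsd dlm=';';

   /* header line, hard-coded for the example variables w c b d e f */
   if _n_ = 1 then put "w;c;b;d;e;f";

   /* one delimited line per observation, quoted where needed */
   put (_all_) (+0);
run;
```

A hard-coded header has to be kept in sync with the dataset by hand, so this only suits cases where the variable list is small and stable.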



Answer 3:

In a properly formatted delimited file, fields that contain the delimiter are quoted. PROC EXPORT does that for you, so there is no need to change the data.

data test;
    input w $ c b  d  e $ f $;
    datalines4;
Aaa;; 50 11 1 222 a;s                                        
Bbb 35 12 2 250 qw                                        
Comma, 75 13 3 foo zx                                        
;;;;
   run;
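* Export to a temporary file, using the semicolon as the delimiter;
filename FT45F001 temp;
proc export data=test outfile=FT45F001 dbms=csv;
   delimiter=';';
   run;
* Show the raw lines of the exported file in the log;
data _null_;
   infile FT45F001;
   input;
   list;
   run;

* Read the file back in and compare it with the original;
proc import replace datafile=FT45F001 dbms=csv out=test2;
   delimiter=';';
   run;
proc print;
   run;
proc compare base=test compare=test2;
   run;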


Tags: sas