Question:

The dataset looks like this:

colx      coly    colz  
0         1       0      
0         1       1      
0         1       0

Required output:

Colname      value    count

colx         0        3
coly         1        3
colz         0        2
colz         1        1

The following code works perfectly...

ods output onewayfreqs=outfreq;

proc freq data=final;
  tables colx coly colz / nocum nofreq;
run;

data freq;
  retain colname column_value;
  set outfreq;
  colname = scan(table, 2, ' ');  /* ODS variable Table holds text such as "Table colx" */
  column_Value = trim(left(vvaluex(colname)));
  keep colname column_value frequency percent;
run;

... but I believe that's not efficient. Say I have 1000 columns: running proc freq on all 1000 columns is not efficient. Is there a more efficient way, without using proc freq, to produce my desired output?

Answer 1:

One of the most efficient mechanisms for computing frequency counts is a hash object set up for reference counting via the suminc argument tag.

The SAS documentation for "Hash Object - Maintaining Key Summaries" demonstrates the technique for a single variable. The following example goes one step further and computes counts for each variable listed in an array. The tag suminc:'one' names the variable one, whose value (1) is added to a key's internal sum each time ref is called for that key. While iterating over the distinct keys for output, the frequency count is retrieved via the sum method.

* one million data values;

data have;
  array v(1000);
  do row = 1 to 1000;
    do index = 1 to dim(v);
      v(index) = ceil(100*ranuni(123));
    end;
    output;
  end;
  keep v:;
  format v: 4.;
run;

* compute frequency counts via .ref();    

data freak_out(keep=name value count);
  length name $32 value 8;

  declare hash bins(ordered:'a', suminc:'one');
  bins.defineKey('name', 'value');
  bins.defineData('name', 'value');
  bins.defineDone();

  one = 1;

  do until (end_of_data);
    set have end=end_of_data;
    array v v1-v1000;
    do index = 1 to dim(v);
      name = vname(v(index));
      value = v(index);
      bins.ref();
    end;
  end;

  declare hiter out('bins');
  do while (out.next() = 0);
    bins.sum(sum:count);
    output;
  end;
run;
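
For comparison, the single-variable form of the same technique can be sketched as follows. This is only a minimal illustration, not code from the SAS documentation: the data set names tiny and counts, the variable x, and the object names h and it are all made up for the example.

```sas
/* Hypothetical five-row input with a single numeric variable x */
data tiny;
  input x @@;
  datalines;
0 1 0 0 1
;
run;

data counts(keep=x count);
  length x count 8;

  /* one holds the increment that each ref() call adds to a key's sum */
  one = 1;

  declare hash h(ordered:'a', suminc:'one');
  h.defineKey('x');
  h.defineData('x');
  h.defineDone();

  /* read every row; ref() adds the key if it is new and updates its sum */
  do until (done);
    set tiny end=done;
    h.ref();
  end;

  /* walk the distinct keys and retrieve each key's accumulated sum */
  declare hiter it('h');
  do while (it.next() = 0);
    h.sum(sum:count);
    output;
  end;
  stop;
run;
```

The multi-variable version above differs only in that the key is the pair (name, value), so one hash object can hold the counts for every column.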

Note: Proc FREQ uses standard syntax, accepts a mix of character and numeric variables, and offers many additional features through options. The hash code shown here assumes every counted variable is numeric.

Answer 2:

I think the most time-consuming part of your code is generating the ODS report. You can instead transpose the data before applying proc freq. The example below handles 1000 rows with 1000 variables in a few seconds; going through ODS may take much longer.

data dummy;
    array colNames [1000] col1-col1000;
    do line = 1 to 1000;
        do j = 1 to dim(colNames);
            colNames[j] = int(rand("uniform")*100);
        end;
        output;
    end;
    drop j;
run;

proc transpose 
    data = dummy 
    out = dummyTransposed (drop = line rename = (_name_ = colName col1 = value))
    ;
    var col1-col1000;
    by line;
run;

proc freq data = dummyTransposed noprint;
    tables colName*value / out = result(drop = percent);
run;
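
The same wide-to-long idea can also be applied directly to the three-column data set from the question. The sketch below swaps proc transpose for a data step with an array, which is a substitution rather than part of the answer above; the data set names long and result, the array name cols, and the index i are illustrative, and final is rebuilt here from the sample rows in the question.

```sas
/* Sample rows copied from the question */
data final;
  input colx coly colz;
  datalines;
0 1 0
0 1 1
0 1 0
;
run;

/* Reshape to one row per (column name, value) pair */
data long(keep=colname value);
  set final;
  array cols colx coly colz;
  length colname $32;
  do i = 1 to dim(cols);
    colname = vname(cols(i));  /* name of the i-th array element */
    value = cols(i);
    output;
  end;
run;

/* One two-way table; out= writes the counts without printed output */
proc freq data=long noprint;
  tables colname*value / out=result(drop=percent);
run;
```

The out= data set stores the frequency in a variable named count, so result has the colname, value and count columns asked for in the question.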

Answer 3:

Perhaps this statement from the comments is the real problem:

"I felt like the ODS output with proc freq is slowing things down and creating huge logs and outputs. Think of 10,000 variables and a million records. I felt there should be another way of accomplishing this, and arrays seem to be a great fit."

You can tell ODS not to produce the printed output if you don't want it.

ods exclude all ;
ods output onewayfreqs=outfreq;
proc freq data=final;
  tables colx coly colz / nocum nofreq;
run;
ods exclude none ;
