I have a variable that is mostly numeric values, however there is occasionally a character snuck-in. Therefor, the variable is stored as a character in SAS. I would like to be able to determine if individual values in this character variable are numeric. After knowing which values are numeric and which are character, I would like to create a new (numeric) variable, where the numeric values of the character variable are stored as numbers and the character values are stored as missing.
Here is an example.
char_var --> num_var
a .
1 1
2.34 2.34
## .
cat .
I was hoping do be able to do it something like below, but am missing a function to help me:
if char_var=is.numeric(char_var) then num_var=char_var;
else num_var=.;
The is.numeric
would be a function that would determine this for me.
Any advice is appreciated.
On a side note, in R I would try (maybe incorrectly) to progam this as follows:
mydata$type<-is.numeric(mydata$char_var)
if (mydata$type==1) {mydata$num_var=mydata$char_var} else {mydata$num_var=NA}
mydata$num_var<-as.numeric(mydata$num_var)
You really don't need to worry too much about this; SAS will happy convert the variable to a numeric variable for you. For example:
options errors=0;
data have;
input char_var $;
length num_var 8; /* To make a numeric variable */
num_var = char_var;
put char_var=
/ num_var=;
datalines;
a
1
2.34
##
cat
run;
My favorite SAS option (errors=0) is used to suppress the messages SAS puts in the log as it converts. If the character value can be interpreted as a number, the conversion will work without a problem. There might be a way to scan the character variable to see if it will result in a number, but I bet the overhead with such a solution "costs" more than just letting SAS do it for you.
Any BTW, the option "errors=0" is a bit misleading; it's the same as "errors=1" and means only the first occurrence of an error message is written to the log. I have this in my autoexec.sas program because I normally don't need multiple error messages to figure out what's wrong.
Using this dataset for all examples:
data have;
input xchar $;
datalines;
1
123
123.54
1234.43
123.1.4
124A
234.1A
1234E5
12.34E7
;;;;
run;
There are a lot of different ways of handling this, depending on your needs.
First off, you can let SAS handle it for you; Bob provides such a solution. I consider this pretty bad form, though; while in his program it is obvious you're doing it on purpose, in longer code it's not always obvious, and thus it can confuse other programmers and/or cause data errors. I'd not allow code that involved intentional implicit conversion to make it to production without good reason.
You can use the input statement to do the conversion, which is a better version of SAS handling things. In that case, you can easily suppress the warning messages about invalid conversions. The leading ?? suppresses conversion errors; a single ? would cause a line to be printed to the log for invalid conversions but less obtrusively than no ?.
data want_qmark;
set have;
xnum = input(xchar,??BEST12.);
run;
You could check the field explicitly; this depends a lot on your data. The following would work for simple numeric fields, but fails on the last two (with scientific notation), and on the line with two decimals.
if missing(compress(xnum,'.','d')) then xnum=input(xchar,best12.);
You could also write your own isnumeric function. The following uses a perl regular expression that is fairly robust (but not perfect, and does not include lots of valid numeric varieties such as comma or percents; you could add those in if your data suggested them).
options cmplib=work.funcs;
proc fcmp outlib=work.funcs.test;
function isnumeric(value $);
prx = prxparse('/^-?\d+\.?\d*(e\d+)?$/io');
rc = prxmatch(prx,trimn(value));
return(rc);
endsub;
quit;
data want_fcmp;
set have;
if isnumeric(xchar) then xnum=input(xchar,BEST12.);
run;
For most uses the simple input with ?? is probably sufficient.