I have a variable that is mostly numeric values, however there is occasionally a character snuck-in. Therefor, the variable is stored as a character in SAS. I would like to be able to determine if individual values in this character variable are numeric. After knowing which values are numeric and which are character, I would like to create a new (numeric) variable, where the numeric values of the character variable are stored as numbers and the character values are stored as missing.
Here is an example.
char_var --> num_var
a .
1 1
2.34 2.34
## .
cat .
I was hoping do be able to do it something like below, but am missing a function to help me:
if char_var=is.numeric(char_var) then num_var=char_var;
else num_var=.;
The is.numeric
would be a function that would determine this for me.
Any advice is appreciated.
On a side note, in R I would try (maybe incorrectly) to progam this as follows:
mydata$type<-is.numeric(mydata$char_var)
if (mydata$type==1) {mydata$num_var=mydata$char_var} else {mydata$num_var=NA}
mydata$num_var<-as.numeric(mydata$num_var)
Using this dataset for all examples:
There are a lot of different ways of handling this, depending on your needs.
First off, you can let SAS handle it for you; Bob provides such a solution. I consider this pretty bad form, though; while in his program it is obvious you're doing it on purpose, in longer code it's not always obvious, and thus it can confuse other programmers and/or cause data errors. I'd not allow code that involved intentional implicit conversion to make it to production without good reason.
You can use the input statement to do the conversion, which is a better version of SAS handling things. In that case, you can easily suppress the warning messages about invalid conversions. The leading ?? suppresses conversion errors; a single ? would cause a line to be printed to the log for invalid conversions but less obtrusively than no ?.
You could check the field explicitly; this depends a lot on your data. The following would work for simple numeric fields, but fails on the last two (with scientific notation), and on the line with two decimals.
You could also write your own isnumeric function. The following uses a perl regular expression that is fairly robust (but not perfect, and does not include lots of valid numeric varieties such as comma or percents; you could add those in if your data suggested them).
For most uses the simple input with ?? is probably sufficient.
You really don't need to worry too much about this; SAS will happy convert the variable to a numeric variable for you. For example:
My favorite SAS option (errors=0) is used to suppress the messages SAS puts in the log as it converts. If the character value can be interpreted as a number, the conversion will work without a problem. There might be a way to scan the character variable to see if it will result in a number, but I bet the overhead with such a solution "costs" more than just letting SAS do it for you.
Any BTW, the option "errors=0" is a bit misleading; it's the same as "errors=1" and means only the first occurrence of an error message is written to the log. I have this in my autoexec.sas program because I normally don't need multiple error messages to figure out what's wrong.