confirm conditional statement applies to >0 observ

Posted 2019-07-19 10:39

Question:

This is something that has puzzled me for some time and I have yet to find an answer.

I am in a situation where I am applying a standardized data cleaning process to (supposedly) similarly structured files, one file for each year. I have a statement such as the following:

replace field="Plant" if field=="Plant & Machinery"

This came from writing the original code against the data file for year 1. I then generalized the code to loop through the years of data. The problem arises if, in year 3, the analogous value in that variable was coded as "Plant and MachInery ": the line above would not make the intended change because the text string differs, yet it would not raise an error alerting me that no change was made.

What I am after is some sort of confirmation that >0 observations actually satisfied the condition each time the code is executed in the loop, and an error otherwise. Trimming, removing spaces, and standardizing the text case, in any combination, are not workaround options. At the same time, I don't want to add a count if and then an assert statement before every conditional replace, as that becomes quite bulky.

Aside from going to the raw files to ensure the variable values are standardized, is there any way to do this validation "on the fly" as I have tried to describe? Maybe just write a custom program that combines a count if, assert and replace?

Answer 1:

The idea has surfaced occasionally that replace should return the number of observations changed, but there are good reasons why it does not: notably, it is not an r-class or e-class command anyway, and it is quite important not to change the way it works, because that could break innumerable programs and do-files.

So, I think the essence of any answer is that you have to set up your own monitoring process, counting how many values have been (or would be) changed.
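Counting how many values would be changed can be folded into a small wrapper, along the lines the question suggests. The following is only a sketch under stated assumptions: the program name checked_replace is made up for illustration, it assumes every call supplies an if condition, and the choice of return code 2000 ("no observations") is a judgment call rather than anything built into replace.

```stata
* Hypothetical helper (not a built-in command): run replace only when
* at least one observation satisfies the if condition.
capture program drop checked_replace
program define checked_replace
    // usage: checked_replace varname = exp if condition
    syntax varname =/exp if
    quietly count `if'
    if r(N) == 0 {
        // compound quotes because the condition may itself contain quotes
        display as error `"no observations satisfy: `if'"'
        exit 2000
    }
    replace `varlist' = `exp' `if'
end
```

Inside the loop over years, a call such as checked_replace field = "Plant" if field == "Plant & Machinery" would then stop the do-file in any year where the string does not match, instead of silently changing nothing.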

One pattern, when working on a variable named current, is:

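* For a string variable, create the copy with gen was = "" instead.
gen was = . 

foreach ... { 
     ... 
     replace was = current 
     replace current = ... 
     qui count if was != current 
     * r(N) now holds the number of values that were changed
     <use the result> 
}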
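As a concrete illustration of the pattern above, here is a minimal sketch applied to the string variable from the question. The toy data are invented, and using assert on r(N) is just one possible way to "use the result"; the loop over years is left out so the example stays self-contained.

```stata
* Toy data standing in for one year's file (made up for illustration).
clear
input str20 field
"Plant & Machinery"
"Buildings"
end

* Keep a copy of the variable before changing it.
gen was = field
replace field = "Plant" if field == "Plant & Machinery"

* count leaves the number of observations that differ in r(N).
quietly count if was != field
local nchanged = r(N)

* Stop with an error if the replace changed nothing.
assert `nchanged' > 0
drop was
```

With the year 3 coding "Plant and MachInery ", the count would be zero and the assert would halt the do-file, which is the alert the question asks for.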