My question is closely related to existing discussions on Statalist like this one. I want to raise a new question because I want to look at more complex patterns of panels beyond numbers of consecutive spells.
Say, given a panel of firms, I want to check for how many years a firm owns no real estate property (land == 0) before it buys some (land > 0).
Or, in a more sophisticated version, for how many years the firm's property is below some level, say land < 0.05 * land[s], where s refers to the year in which the firm purchases real estate property.
My first thought is to use the egen command. But unlike the usual cases, the time of the property purchase differs from firm to firm, and for some firms it doesn't exist at all.
My second thought is to use the package xtpatternvar, with slight modifications. However, with limited knowledge of Stata programming, I don't quite understand its source code.
Let's focus on your real problem, which is not understanding xtpatternvar (SSC, as you should explain), but determining, in panel data, for how long a variable satisfies some condition in each panel before it satisfies the complementary condition. It's not easy, but it's important to generalise beyond particular examples when posing questions here. Other people may not care about land purchase data, but they may have the same general kind of problem.
This is just a couple of small twists on the problem discussed in the FAQ at http://www.stata.com/support/faqs/data-management/dropping-spells-of-missing-values/. That FAQ discusses various techniques. I will just pick one, but the entire FAQ may be worth study. (Further moral: look at the FAQs on StataCorp's website.)
It's also a good idea to give the people who answer data that they can use straight away. On Statalist people are asked to use dataex (SSC), and there is no reason for lower standards here. That is consistent with https://stackoverflow.com/help/mcve
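As a minimal sketch of what that looks like, the lines below use the auto dataset shipped with Stata purely as a stand-in for your own data. The choice of variables and the in 1/5 range are illustrative assumptions. In recent Stata releases dataex may already be installed, in which case the commented-out ssc install line is not needed.

```stata
* ssc install dataex   (only if dataex is not already available)
sysuse auto, clear

* write input code for the first five observations of three variables
* to the Results window, ready to paste into a question
dataex make price mpg in 1/5
```

The output is a clear/input/end block, so anyone answering can recreate exactly the same data by copying and pasting.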
The first time (here year) in each panel is the minimum of the time variable in each panel. (In some datasets, you may not need to calculate that; you know it's always a particular time.) The first time some condition is satisfied is again a minimum, but now necessarily conditional. The time lapse you want is just the difference between them. Note that the code does not assume that the data have been xtset or tsset; it does not assume equally spaced values or balanced panels; and it does not assume that there is a spell of complementary values at the beginning of each panel. Note that the solution is of the same kind for your "sophisticated" problem.
clear
input float(firm response year)
1 0 2001
1 0 2002
1 12 2003
1 345 2004
1 6789 2005
2 12 2001
2 345 2002
2 6789 2003
2 12 2004
2 34 2005
end
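* first year observed in each panel
egen first = min(year), by(firm)
* first year in which the condition holds; missing if it never holds
* (caution: a missing response counts as > 0 in Stata, so add
* & !missing(response) to the condition if missing values can occur)
egen first_pos = min(cond(response > 0, year, .)), by(firm)
* time elapsed between the two
gen time_to_first_pos = first_pos - first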
list
     +-------------------------------------------------------------+
     | firm   time   response   year   first_~s   first   time_t~s |
     |-------------------------------------------------------------|
  1. |    1      1          0   2001       2003    2001          2 |
  2. |    1      2          0   2002       2003    2001          2 |
  3. |    1      3         12   2003       2003    2001          2 |
  4. |    1      4        345   2004       2003    2001          2 |
  5. |    1      5       6789   2005       2003    2001          2 |
     |-------------------------------------------------------------|
  6. |    2      1         12   2001       2001    2001          0 |
  7. |    2      2        345   2002       2001    2001          0 |
  8. |    2      3       6789   2003       2001    2001          0 |
  9. |    2      4         12   2004       2001    2001          0 |
 10. |    2      5         34   2005       2001    2001          0 |
     +-------------------------------------------------------------+
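For the "sophisticated" problem, here is a hedged sketch along the same lines. It rests on one reading of the question: s is the first year with a positive response, and what is wanted is the number of years in which the response is below 5% of its value in year s. The variable names response_at_first and n_below are made up for illustration, and firm 3, which never has a positive response, is hypothetical data added only to show what happens when the event never occurs.

```stata
clear
input float(firm response year)
1 0 2001
1 0 2002
1 12 2003
1 345 2004
1 6789 2005
2 12 2001
2 345 2002
2 6789 2003
2 12 2004
2 34 2005
3 0 2001
3 0 2002
3 0 2003
end

* first year with a positive response; missing if there is none
egen first_pos = min(cond(response > 0, year, .)), by(firm)

* response in that year, spread to every observation of the firm
egen response_at_first = max(cond(year == first_pos, response, .)), by(firm)

* number of years in which response is below 5% of that level
egen n_below = total(response < 0.05 * response_at_first), by(firm)

* missing counts as larger than any number, so the comparison above is
* true whenever the threshold is missing; reset such panels to missing
replace n_below = . if missing(response_at_first)

list, sepby(firm)
```

With these data n_below is 2 for firm 1, 0 for firm 2 and missing for firm 3. If you want instead the time until the response first reaches the threshold, replace the total() step with a conditional minimum of year, exactly as for first_pos.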