How to determine (complex) panel pattern?

Posted 2019-08-11 09:27

My question is closely related to existing discussions on Statalist like this one. I want to raise a new question because I want to look at more complex patterns of panels beyond numbers of consecutive spells.

Say, given a panel of firms, I want to check for how many years a firm owns no real estate property (land == 0) before it buys some (land > 0).

Or, in a more sophisticated version, for how many years the firm's property stays below some level, say land < 0.05 * land[s], where s is the year in which the firm purchases real estate property.

My first thought is to use the egen command. But unlike the usual cases, the time of the property purchase differs from firm to firm, and for some firms it does not exist at all.

My second thought is to use the package xtpatternvar, with slight modifications. However, with limited knowledge of Stata programming, I don't quite understand its source code.

1 answer
冷血范
#2 · 2019-08-11 09:56

Let's focus on your real problem, which is not understanding xtpatternvar (SSC, as you should explain), but determining, in panel data, for how long a variable satisfies some condition in each panel before it satisfies the complementary condition. It's not easy, but it's important to generalise beyond particular examples when posing questions here. Other people may not care about land purchase data, but they may have the same general kind of problem.

This is just a couple of small twists on the problem discussed in a StataCorp FAQ (http://www.stata.com/support/faqs/data-management/dropping-spells-of-missing-values/). That FAQ discusses various techniques. I will just pick one. The entire FAQ may be worth study. (Further moral: look at the FAQs on StataCorp's website.)

It's also a good idea to give people who answer data that they can use straight away. On Statalist, people are asked to use dataex (SSC), and there is no reason for lower standards here. That is consistent with the advice on minimal examples (https://stackoverflow.com/help/mcve).

The first time (here year) in each panel is the minimum of the time variable in each panel. (In some datasets, you may not need to calculate that; you know it's always a particular time.) The first time some condition is satisfied is again a minimum, but now necessarily conditional. The time lapse you want is just the difference between them. Note that there is no assumption in the code of xtset or tsset data; no assumption of equally spaced values or balanced panels; and no assumption that there is a spell of complementary values at the beginning of each panel. Note that the solution is of the same kind for your "sophisticated" problem.

clear
input float(firm response year)
1    0 2001
1    0 2002
1   12 2003
1  345 2004
1 6789 2005
2   12 2001
2  345 2002
2 6789 2003
2   12 2004
2   34 2005
end

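* first observed year in each panel
egen first = min(year), by(firm)

* first year in each panel in which the condition (here response > 0) holds
egen first_pos = min(cond(response > 0, year, .)), by(firm)

* time elapsed before the condition is first satisfied
gen time_to_first_pos = first_pos - first

list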

     +------------------------------------------------------+
     | firm   response   year   first   first_~s   time_t~s |
     |------------------------------------------------------|
  1. |    1          0   2001    2001       2003          2 |
  2. |    1          0   2002    2001       2003          2 |
  3. |    1         12   2003    2001       2003          2 |
  4. |    1        345   2004    2001       2003          2 |
  5. |    1       6789   2005    2001       2003          2 |
     |------------------------------------------------------|
  6. |    2         12   2001    2001       2001          0 |
  7. |    2        345   2002    2001       2001          0 |
  8. |    2       6789   2003    2001       2001          0 |
  9. |    2         12   2004    2001       2001          0 |
 10. |    2         34   2005    2001       2001          0 |
     +------------------------------------------------------+
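
The question also mentions firms for which the purchase never happens. The following is a minimal sketch of how the same approach might behave in that case, not part of the original answer. The third firm, its missing value, and the variable never_pos are made up for illustration. It relies on two Stata facts: egen's min() returns missing when every value it sees is missing, and a numeric missing value counts as larger than any number, so response > 0 on its own would be true for a missing response.

```stata
clear
input float(firm response year)
1  0 2001
1  0 2002
1 12 2003
2 12 2001
2 34 2002
3  0 2001
3  . 2002
3  0 2003
end

* first observed year in each panel
egen first = min(year), by(firm)

* first year with a strictly positive, nonmissing response;
* without the !missing() guard, the missing value in firm 3
* would be treated as satisfying response > 0
egen first_pos = min(cond(response > 0 & !missing(response), year, .)), by(firm)

* missing whenever the condition is never met in the panel
gen time_to_first_pos = first_pos - first

* hypothetical indicator for panels that never satisfy the condition
gen byte never_pos = missing(first_pos)

list, sepby(firm)
```

Here firm 3 never has a positive response, so first_pos and time_to_first_pos are missing for it and never_pos is 1. Whether such panels should be dropped, or treated as censored at their last observed year, depends on the analysis.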