
Question:

Below is a toy example of my code. I would like to build a measure of all higher prices minus lower prices within a self-defined reference group. That is, within each reference group, for each individual I want to subtract that individual's price from every higher price of the other individuals in the same group. I do not want negative differences. Then I would like to sum all these differences. While writing this code I found some help here: http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/
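To pin down the quantity being asked for, here is a brute-force sketch. It is written in Python rather than Stata only so the logic can be checked outside Stata; the function name and the small example data are hypothetical. It compares every pair within a group, which is exactly the O(n²) work that becomes impractical at several hundred thousand observations.

```python
from collections import defaultdict

def sum_positive_differences_bruteforce(groups, prices):
    """For each observation i, sum (p_j - p_i) over j in the same group with p_j > p_i.

    O(n^2) per group: fine for checking small examples, far too slow for
    several hundred thousand observations.
    """
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    result = [0] * len(prices)
    for idxs in members.values():
        for i in idxs:
            # Only strictly higher prices count, so no negative differences appear.
            result[i] = sum(prices[j] - prices[i] for j in idxs if prices[j] > prices[i])
    return result

groups = [0, 0, 0, 1, 1]
prices = [10, 30, 20, 5, 5]
print(sum_positive_differences_bruteforce(groups, prices))
# → [30, 0, 10, 0, 0]
```

Tied prices (the two 5s) contribute nothing to each other, since neither is strictly higher.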

However, the code does not work for my real data: the dataset is large (several hundred thousand observations), and both the examples on that page and my code only work up to Stata's numlist maximum of 1,600 elements (I am using Stata 12). The toy example with the auto dataset works only because that dataset is small.

I would like to ask whether anyone has an idea how to code this more efficiently, so that I can get around the numlist restriction. I thought about summing the differences directly without saving them in intermediate variables, but that also runs into the numlist limit.

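clear all
sysuse auto

* use headroom as a stand-in reference group
ren headroom refgroup

* rank each price within its reference group
bysort refgroup : egen pricerank = rank(price)
qui: su pricerank, meanonly
gen test = `r(max)'
su test
* one intermediate variable per offset; the numlist 1/`r(max)' is what hits the 1,600-element limit
foreach i of num 1/`r(max)' {
    qui: bys refgroup: gen intermediate`i' = price[_n+`i'] - price if price[_n+`i'] > price
}
* rowmax() keeps the largest positive difference per observation (rowtotal() would sum them)
egen price_diff = rowmax(intermediate*)
drop intermediate*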

Answer 1:

If I understand this correctly, this problem does not require explicit loops at all. Within each group sorted by price, the sum of all higher prices is just the difference between two cumulative sums: the group total minus the running sum up to and including the current observation. You might need to think through what you want to do if prices are tied.

. clear

. set obs 10 
obs was 0, now 10

. gen group = _n > 5 

. set seed 2803

. gen price = ceil(1000 * runiform()) 

. bysort group (price) : gen sumhigherprices = sum(price) 

. by group : replace sumhigherprices = sumhigherprices[_N] - sumhigherprices 
(10 real changes made)

. list 

     +--------------------------+
     | group   price   sumhig~s |
     |--------------------------|
  1. |     0     218       1448 |
  2. |     0     264       1184 |
  3. |     0     301        883 |
  4. |     0     335        548 |
  5. |     0     548          0 |
     |--------------------------|
  6. |     1     125       3027 |
  7. |     1     213       2814 |
  8. |     1     828       1986 |
  9. |     1     988        998 |
 10. |     1     998          0 |
     +--------------------------+
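As a cross-check outside Stata, the same two steps can be sketched in Python: sort within group by price, take a running sum (what Stata's `sum()` does inside `gen`), and subtract it from the group total (`sumhigherprices[_N]`). The function name is hypothetical; the input prices are the ones from the listing above.

```python
from itertools import accumulate, groupby

def sum_higher_prices(groups, prices):
    """Sum of the prices sorted after each observation within its group.

    Mirrors: bysort group (price): gen s = sum(price)
             by group: replace s = s[_N] - s
    With tied prices, a tie counts as "higher" only if it happens to sort later.
    """
    # Sort indices by group, then price (Python's sort is stable for ties).
    order = sorted(range(len(prices)), key=lambda i: (groups[i], prices[i]))
    result = [0] * len(prices)
    for _, block in groupby(order, key=lambda i: groups[i]):
        idxs = list(block)
        running = list(accumulate(prices[i] for i in idxs))  # like sum(price)
        total = running[-1]                                   # like s[_N]
        for i, r in zip(idxs, running):
            result[i] = total - r
    return result

groups = [0] * 5 + [1] * 5
prices = [218, 264, 301, 335, 548, 125, 213, 828, 988, 998]
print(sum_higher_prices(groups, prices))
# → [1448, 1184, 883, 548, 0, 3027, 2814, 1986, 998, 0]
```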

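Edit: To get what the OP needs (the sum of positive differences rather than the sum of higher prices), add one more line, which subtracts the current price once for each of the _N - _n observations ranked above it: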

. by group : replace sumhigherprices = sumhigherprices - (_N - _n) * price
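Each step is a single pass over the sorted data, so the cost is dominated by the sort rather than by pairwise comparisons. Here is a Python sketch (hypothetical function name) of the cumulative-sum version with that extra line, applied to the prices from the listing above:

```python
from itertools import accumulate, groupby

def sum_positive_differences(groups, prices):
    """Sum over higher-ranked j in the same group of (p_j - p_i).

    Mirrors the answer's lines plus the edit:
        s = s[_N] - s
        s = s - (_N - _n) * price
    """
    order = sorted(range(len(prices)), key=lambda i: (groups[i], prices[i]))
    result = [0] * len(prices)
    for _, block in groupby(order, key=lambda i: groups[i]):
        idxs = list(block)
        n = len(idxs)                                          # _N
        running = list(accumulate(prices[i] for i in idxs))
        total = running[-1]
        for k, (i, r) in enumerate(zip(idxs, running), start=1):  # k is _n
            result[i] = (total - r) - (n - k) * prices[i]
    return result

groups = [0] * 5 + [1] * 5
prices = [218, 264, 301, 335, 548, 125, 213, 828, 988, 998]
print(sum_positive_differences(groups, prices)[:5])
# → [576, 392, 281, 213, 0]
```

For the first row, 576 = (264 - 218) + (301 - 218) + (335 - 218) + (548 - 218). Tied prices contribute p - p = 0 whichever order they sort in, so this final measure does not depend on how ties are broken.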

Answer 2:

If I understand the wording of the problem correctly, this may help. It uses joinby, which creates new observations, so depending on the size of the original dataset you may or may not hit Stata's hard limit on the number of observations. The code reproduces the results that follow from the code in the original post. This is a second attempt: the code before this final edit did not give the desired results, because the wording of the problem was somewhat difficult for me to understand.
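To gauge whether that limit matters, note that joining a dataset to itself on the group variable forms all pairwise combinations within each group, so a group of n observations becomes n² rows before any rows are dropped. A back-of-the-envelope Python helper (hypothetical name and group sizes):

```python
from collections import Counter

def joinby_rows(groups):
    """Rows produced by joining a dataset to itself on the group variable:
    each group of n observations yields n * n pairs (self-pairs included)."""
    return sum(n * n for n in Counter(groups).values())

# Hypothetical: 300,000 observations split evenly into 100 groups of 3,000.
groups = [g for g in range(100) for _ in range(3000)]
print(f"{joinby_rows(groups):,}")
# → 900,000,000
```

Only n(n - 1)/2 of each group's rows survive the later `drop if id0 >= id`, but all n² have to fit in memory first.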

clear all
set more off

* Load data
sysuse auto

* Rename the group variable and keep only the variables needed
ren headroom refgroup
keep refgroup price

* Generate ids from the price ranking within each group
bysort refgroup (price): gen id = _n

* Pretty list
order refgroup id
sort refgroup id price
list, sepby(refgroup)

* joinby procedure
tempfile main
save "`main'"

* Rename to price0 and id0 so both copies of the variables survive the join
rename (price id) =0
joinby refgroup using "`main'"
list, sepby(refgroup)

* Drop self-comparisons and duplicate pairs: keep only pairs where id ranks above id0
drop if id0 >= id

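* Compute differences and keep the maximum per observation
* (max) mirrors rowmax() in the original post; collapse (sum) would sum the differences instead
gen dif = abs(price0 - price)
collapse (max) dif, by(refgroup id0)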

list, sepby(refgroup)
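Since each observation is compared only with higher-ranked ones and `collapse (max)` keeps the largest gap, the result works out to the group's highest price minus the observation's own price, and the top-ranked observation has no partner and drops out. A Python sketch (hypothetical function names) comparing the pairwise route with that shortcut is below. In Stata the shortcut would presumably be `bysort refgroup: egen maxprice = max(price)` followed by `gen dif = maxprice - price` (untested here; unlike the collapse, it keeps the top observation with a difference of 0).

```python
from collections import defaultdict

def _ranked_groups(groups, prices):
    """Indices per group, ordered by price (like bysort refgroup (price): gen id = _n)."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    return [sorted(idxs, key=lambda i: prices[i]) for idxs in members.values()]

def max_diff_pairwise(groups, prices):
    """Pair each observation with every higher-ranked one and keep the max |difference|."""
    result = {}
    for ranked in _ranked_groups(groups, prices):
        for pos, i0 in enumerate(ranked):
            diffs = [abs(prices[i0] - prices[i]) for i in ranked[pos + 1:]]
            if diffs:  # top-ranked observation has no pairs left after drop if id0 >= id
                result[i0] = max(diffs)
    return result

def max_diff_shortcut(groups, prices):
    """Same numbers in one pass per group: highest price minus own price."""
    result = {}
    for ranked in _ranked_groups(groups, prices):
        top = prices[ranked[-1]]
        for i0 in ranked[:-1]:
            result[i0] = top - prices[i0]
    return result

groups = [0, 0, 0, 1, 1]
prices = [10, 30, 20, 5, 5]
print(max_diff_pairwise(groups, prices) == max_diff_shortcut(groups, prices))
# → True
```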