可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a data frame which looks like this

> fitchRatings
               Country Month Year FitchLongTerm LongTermTransformed
1            Abu Dhabi     7 2007            AA                  22
2               Angola     5 2012           BB-                  12
3               Angola     5 2011           BB-                  12
4               Angola     5 2010            B+                  11
5            Argentina     7 2010             B                  10
6            Argentina    12 2008            RD                   3
7            Argentina     8 2006            RD                   3
8            Argentina    12 2005            RD                   3
9            Argentina     6 2005           DDD                   2
10           Argentina     1 2005             D                   0

As you can see, for some Countries, there are multiple observations for a single year. I want to subset the DF so that I keep only one observation for each country-year and the observation I want to keep is the one that has the smallest value for "LongTermTransformed".

In this data set Country and LongTermTransformed are factors and Year is an integer.

Thanks in advance.

回答1:

There are many ways to subset the rows based on the minimum value in a column with grouping columns. One option is to get the index of the 'min' value with which.min after converting the 'LongTermTransformed' to 'numeric'. We can use slice to subset the rows identified by the index.

library(dplyr)
fitchRatings %>%
          group_by(Country, Year) %>%
          slice(which.min(as.numeric(as.character(LongTermTransformed))))

Or we can use a similar option with data.table. The difference is we convert the 'data.frame' to 'data.table' (setDT) and for subsetting .SD was used.

library(data.table)#v1.9.5+
setDT(fitchRatings)[, 
 .SD[which.min(as.numeric(levels(LongTermTransformed))[LongTermTransformed])],
              by = .(Country, Year)]

回答2:

Another data.table options is just to order by LongTermTransformed and take the first value per Country and Year using unique

library(data.table)
unique(setDT(fitchRatings)[order(as.numeric(as.character(LongTermTransformed)))], by = c("Country", "Year"))

Or using dplyr

library(dplyr)
fitchRatings %>%
  arrange(as.numeric(as.character(LongTermTransformed))) %>%
  distinct(Country, Year)

Subsetting DataFrame in R by duplicate values for

问题:

回答1:

回答2:

收藏的人(0)

Subsetting DataFrame in R by duplicate values for

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮