Drop factor levels in a subsetted data frame-第2页回答

I have a data frame containing a factor. When I create a subset of this data frame using subset() or another indexing function, a new data frame is created. However, the factor variable retains all of its original levels -- even when they do not exist in the new data frame.

This creates headaches when doing faceted plotting or using functions that rely on factor levels.

What is the most succinct way to remove levels from a factor in my new data frame?

Here's my example:

df <- data.frame(letters=letters[1:5],
                    numbers=seq(1:5))

levels(df$letters)
## [1] "a" "b" "c" "d" "e"

subdf <- subset(df, numbers <= 3)
##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3    

## but the levels are still there!
levels(subdf$letters)
## [1] "a" "b" "c" "d" "e"

标签： r dataframe r-factor r-faq

13条回答

春风洒进眼中

2楼-- · 2018-12-31 01:34

Here's another way, which I believe is equivalent to the factor(..) approach:

> df <- data.frame(let=letters[1:5], num=1:5)
> subdf <- df[df$num <= 3, ]

> subdf$let <- subdf$let[ , drop=TRUE]

> levels(subdf$let)
[1] "a" "b" "c"

0人赞添加讨论(0) 举报

旧人旧事旧时光

3楼-- · 2018-12-31 01:34

When i am working with data.frame, I now use options(stringsAsFactors = FALSE) at the beginning of the script. Hence, characters remain characters. Since, i do not get any problems with factors any more :)

0人赞添加讨论(0) 举报

梦醉为红颜

4楼-- · 2018-12-31 01:35

It is a known issue, and one possible remedy is provided by drop.levels() in the gdata package where your example becomes

> drop.levels(subdf)
  letters numbers
1       a       1
2       b       2
3       c       3
> levels(drop.levels(subdf)$letters)
[1] "a" "b" "c"

There is also the dropUnusedLevels function in the Hmisc package. However, it only works by altering the subset operator [ and is not applicable here.

As a corollary, a direct approach on a per-column basis is a simple as.factor(as.character(data)):

> levels(subdf$letters)
[1] "a" "b" "c" "d" "e"
> subdf$letters <- as.factor(as.character(subdf$letters))
> levels(subdf$letters)
[1] "a" "b" "c"

0人赞添加讨论(0) 举报

初与友歌

5楼-- · 2018-12-31 01:38

Since R version 2.12, there's a droplevels() function.

levels(droplevels(subdf$letters))

0人赞添加讨论(0) 举报

孤独寂梦人

6楼-- · 2018-12-31 01:41

If you don't want this behaviour, don't use factors, use character vectors instead. I think this makes more sense than patching things up afterwards. Try the following before loading your data with read.table or read.csv:

options(stringsAsFactors = FALSE)

The disadvantage is that you're restricted to alphabetical ordering. (reorder is your friend for plots)

0人赞添加讨论(0) 举报

不流泪的眼

7楼-- · 2018-12-31 01:48

Very interesting thread, I especially liked idea to just factor subselection again. I had the similar problem before and I just converted to character and then back to factor.

   df <- data.frame(letters=letters[1:5],numbers=seq(1:5))
   levels(df$letters)
   ## [1] "a" "b" "c" "d" "e"
   subdf <- df[df$numbers <= 3]
   subdf$letters<-factor(as.character(subdf$letters))

0人赞添加讨论(0) 举报

Drop factor levels in a subsetted data frame

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间