Generate a dummy-variable

2018-12-31 04:14发布

I have trouble generating the following dummy-variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

  1. How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?

  2. How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 and onwards to 2009?

标签: r r-faq
15条回答
美炸的是我
2楼-- · 2018-12-31 04:17

The simplest way to produce these dummy variables is something like the following:

> print(year)
[1] 1956 1957 1957 1958 1958 1959
> dummy <- as.numeric(year == 1957)
> print(dummy)
[1] 0 1 1 0 0 0
> dummy2 <- as.numeric(year >= 1957)
> print(dummy2)
[1] 0 1 1 1 1 1

More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).

查看更多
一个人的天荒地老
3楼-- · 2018-12-31 04:19

The ifelse function is best for simple logic like this.

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, 1, 0)
    ifelse(x <= 1957, 1, 0)

>  [1] 0 0 0 0 0 0 0 1 0 0 0
>  [1] 1 1 1 1 1 1 1 1 0 0 0

Also, if you want it to return character data then you can do so.

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, "foo", "bar")
    ifelse(x <= 1957, "foo", "bar")

>  [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"
>  [1] "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "bar" "bar" "bar"

Categorical variables with nesting...

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, "foo", ifelse(x == 1958, "bar","baz"))

>  [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"

This is the most straightforward option.

查看更多
浅入江南
4楼-- · 2018-12-31 04:19

another way you can do it is use

ifelse(year < 1965 , 1, 0)
查看更多
残风、尘缘若梦
5楼-- · 2018-12-31 04:25

I read this on the kaggle forum:

#Generate example dataframe with character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"

#For every unique value in the string column, create a new 1/0 column
#This is what Factors do "under-the-hood" automatically when passed to function requiring numeric data
for(level in unique(example$strcol)){
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
查看更多
若你有天会懂
6楼-- · 2018-12-31 04:26

Package mlr includes createDummyFeatures for this purpose:

library(mlr)
df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
df

# var
# 1    B
# 2    A
# 3    C
# 4    B
# 5    C
# 6    A
# 7    C
# 8    A
# 9    B
# 10   C

createDummyFeatures(df, cols = "var")

# var.A var.B var.C
# 1      0     1     0
# 2      1     0     0
# 3      0     0     1
# 4      0     1     0
# 5      0     0     1
# 6      1     0     0
# 7      0     0     1
# 8      1     0     0
# 9      0     1     0
# 10     0     0     1

createDummyFeatures drops original variable. https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures

查看更多
情到深处是孤独
7楼-- · 2018-12-31 04:27

I use such a function (for data.table):

# Ta funkcja dla obiektu data.table i zmiennej var.name typu factor tworzy dummy variables o nazwach "var.name: (level1)"
factorToDummy <- function(dtable, var.name){
  stopifnot(is.data.table(dtable))
  stopifnot(var.name %in% names(dtable))
  stopifnot(is.factor(dtable[, get(var.name)]))

  dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names
  dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

  cat(paste("\nDodano zmienne dummy: ", paste0(new.names, collapse = ", ")))
}

Usage:

data <- data.table(data)
data[, x:= droplevels(x)]
factorToDummy(data, "x")
查看更多
登录 后发表回答