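I am taking an R course that uses swirl. I am on chapter 12 of R Programming Environment, Data Manipulation. I am stuck on the last question, which is about survivors of the Titanic. I started from the previous question's code, which created the first data frame.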
titanic_4 <- titanic %>%
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150),
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50")))
head(titanic_4)
# After the previous question, you should have transformed the `titanic`
# data to look like this:
#
## Survived Pclass Age Sex agecat
## 0 3 22 male 15 to 50
## 1 1 38 female 15 to 50
## 1 3 26 female 15 to 50
## 1 1 35 female 15 to 50
## 0 3 35 male 15 to 50
## 0 1 54 male Over 50
#
# Add one or more `dplyr` or `tidyr` functions to the pipe chain in
# the code at the bottom of the script to change the `titanic`
# dataset. The first six lines of the final `titanic_4` dataset
# should look like the following example, with the number of
# passengers, number of survivors, and percent survival stratified
# by passenger class, age category, and sex. Be sure to use the
# same column names as shown in the example output.
#
## Pclass agecat Sex N survivors perc_survived
## <int> <fctr> <chr> <int> <int> <dbl>
## 1 Under 15 female 2 1 50.000000
## 1 Under 15 male 3 3 100.000000
## 1 15 to 50 female 70 68 97.142857
## 1 15 to 50 male 72 32 44.444444
## 1 Over 50 female 13 13 100.000000
## 1 Over 50 male 26 5 19.230769
To solve this, I wrote the following code:
titanic_4 <- titanic %>%
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150),
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(), survivors = sum(Survived)) %>%
  mutate(perc_survived = sprintf("%.6f",
                                 (survivors / N) * 100))
head(titanic_4)
This gives the following output:
# A tibble: 6 x 6
# Groups: Pclass, agecat [3]
Pclass agecat Sex N survivors perc_survived
<int> <fctr> <chr> <int> <int> <chr>
1 1 Under 15 female 2 1 50.000000
2 1 Under 15 male 3 3 100.000000
3 1 15 to 50 female 70 68 97.142857
4 1 15 to 50 male 72 32 44.444444
5 1 Over 50 female 13 13 100.000000
6 1 Over 50 male 26 5 19.230769
The output above is wrong because the last column (perc_survived) is a character (<chr>) rather than a double (<dbl>).
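As far as I can tell, the type change comes from sprintf() itself, which returns a character vector. Here is a minimal base-R sketch of that, assuming one percentage taken from the table above (68 survivors out of 70) instead of the course data; the variable names are just for illustration:

```r
# One percentage from the example table: 68 survivors out of 70 passengers
pct <- (68 / 70) * 100

# Plain arithmetic gives a numeric (double) value
class(pct)       # "numeric"

# sprintf() formats the number as text, so the result is a character string
as_text <- sprintf("%.6f", pct)
class(as_text)   # "character"
as_text          # "97.142857"
```

So any column built with sprintf() inside mutate() will show up as <chr> in the tibble.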
To fix this, I had R change the type to numeric with the as.numeric function.
titanic_4 <- titanic %>%
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150),
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(), survivors = sum(Survived)) %>%
  # as.numeric() wraps sprintf() so the column comes back as <dbl>
  mutate(perc_survived = as.numeric(sprintf("%.6f",
                                            (survivors / N) * 100)))
head(titanic_4)
This produces the following output:
# A tibble: 6 x 6
# Groups: Pclass, agecat [3]
Pclass agecat Sex N survivors perc_survived
<int> <fctr> <chr> <int> <int> <dbl>
1 1 Under 15 female 2 1 50.00000
2 1 Under 15 male 3 3 100.00000
3 1 15 to 50 female 70 68 97.14286
4 1 15 to 50 male 72 32 44.44444
5 1 Over 50 female 13 13 100.00000
6 1 Over 50 male 26 5 19.23077
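The new problem is that the output is now rounded to 5 decimal places instead of 6. I have tried every combination I could find, but I have not been able to get R to keep 6 decimal places when the value is converted from character to numeric.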
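One thing I am unsure about is whether the sixth decimal is actually lost or only hidden when the value is printed. Here is a minimal base-R sketch of that check, assuming the same 68 out of 70 example and R's default setting of 7 significant digits for printing (the variable names are hypothetical, not part of the course script):

```r
# A percentage with more than six decimal places
x <- 68 / 70 * 100

# Round-trip through sprintf() and as.numeric(), keeping six decimals
y <- as.numeric(sprintf("%.6f", x))

# Default printing shows 7 significant digits, so the value looks rounded
print(y)                # [1] 97.14286

# Asking for more digits shows what is actually stored
print(y, digits = 10)

# format() with nsmall forces at least six decimals in the displayed text
shown <- format(y, nsmall = 6)
shown
```

If this reasoning is right, the stored double still carries all six decimals and only the display is shortened. How a tibble displays a <dbl> column is controlled separately and depends on the installed tibble/pillar version, so the printed table may differ from print() on a plain vector.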
I am stuck and would appreciate some guidance from a generous soul. Thank you, Andrew