Question:
I'm still relatively new to R and hope you can help me again. I have a character vector of length 42,000 that looks like this:
a <- c("blablabla-19960101T000000Z-1.tsv", "blablabla-19960101T000000Z-2.tsv", "blablabla-19960101T000000Z-3.tsv")
I want to split the vector into a data frame which looks like this:
Name       Date        no
blablabla  1996-01-01  1
blablabla  1996-01-01  2
blablabla  1996-01-01  3
I'm struggling with the splitting as well as the creation of my data frame. Can someone help me with this? Thanks!
Answer 1:
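# Split on the literal "-" and stack the three pieces of each name row-wise
DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed=TRUE)), stringsAsFactors=FALSE)
# as.Date() reads the leading "19960101" and ignores the trailing "T000000Z"
DF[,2] <- as.Date(DF[,2], format="%Y%m%d")
# Drop the ".tsv" extension and convert the running number to integer
DF[,3] <- as.integer(gsub(".tsv", "", DF[,3], fixed=TRUE))
DF
#          X1         X2 X3
# 1 blablabla 1996-01-01  1
# 2 blablabla 1996-01-01  2
# 3 blablabla 1996-01-01  3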
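Answer 1 keeps the default X1 to X3 column names, but the question asks for Name, Date and no. Here is a minimal base-R sketch that builds the data frame with those names directly. It assumes `a` is the three-element example vector from the question, and `parts` is just an illustrative name:

```r
a <- c("blablabla-19960101T000000Z-1.tsv",
       "blablabla-19960101T000000Z-2.tsv",
       "blablabla-19960101T000000Z-3.tsv")

# Split each string on the literal "-" and stack the pieces row-wise
parts <- do.call(rbind, strsplit(a, "-", fixed = TRUE))

DF <- data.frame(
  Name = parts[, 1],
  # as.Date() parses the leading 8 digits and ignores the trailing "T000000Z"
  Date = as.Date(parts[, 2], format = "%Y%m%d"),
  # Drop the ".tsv" extension before converting to integer
  no   = as.integer(sub(".tsv", "", parts[, 3], fixed = TRUE)),
  stringsAsFactors = FALSE
)

str(DF)
```

strsplit(), as.Date() and sub() are all vectorized, so the same code should handle the full 42,000-element vector without a loop.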
Answer 2:
Maybe with reshape2's colsplit():
library(reshape2)
colsplit(a, "\\-", names=c("A", "B", "C"))
#           A                B     C
# 1 blablabla 19960101T000000Z 1.tsv
# 2 blablabla 19960101T000000Z 2.tsv
# 3 blablabla 19960101T000000Z 3.tsv
or
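# Split on any punctuation character or a literal "T" ([[:punct:]] already covers "-" and ".");
# this assumes the name part contains no uppercase T
b <- colsplit(a, "[[:punct:]]|T", names=c("A", "B", "C", "D", "E"))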
#           A        B       C D   E
# 1 blablabla 19960101 000000Z 1 tsv
# 2 blablabla 19960101 000000Z 2 tsv
# 3 blablabla 19960101 000000Z 3 tsv
and then
library(lubridate)
b$B <- ymd(b$B)
#           A          B       C D   E
# 1 blablabla 1996-01-01 000000Z 1 tsv
# 2 blablabla 1996-01-01 000000Z 2 tsv
# 3 blablabla 1996-01-01 000000Z 3 tsv
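str(b)
# 'data.frame': 3 obs. of 5 variables:
#  $ A: chr  "blablabla" "blablabla" "blablabla"
#  $ B: POSIXct, format: "1996-01-01" "1996-01-01" "1996-01-01"
#  $ C: chr  "000000Z" "000000Z" "000000Z"
#  $ D: int  1 2 3
#  $ E: chr  "tsv" "tsv" "tsv"
# Note: recent lubridate versions return a Date from ymd(), so B is listed as Date rather than POSIXct.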
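You can also apply Answer 2's second split pattern with base strsplit() and skip reshape2. The sketch below is a swapped-in base-R variant, not Answer 2's own code. Like that pattern, it assumes the name part contains no punctuation and no uppercase T. It keeps only the three columns the question asks for, and `pieces` is an illustrative name:

```r
a <- c("blablabla-19960101T000000Z-1.tsv",
       "blablabla-19960101T000000Z-2.tsv",
       "blablabla-19960101T000000Z-3.tsv")

# Split on any punctuation character or a literal "T", as in Answer 2
pieces <- do.call(rbind, strsplit(a, "[[:punct:]]|T"))
# Columns of pieces: name, yyyymmdd, hhmmssZ, running number, extension

b <- data.frame(
  Name = pieces[, 1],
  Date = as.Date(pieces[, 2], format = "%Y%m%d"),
  no   = as.integer(pieces[, 4]),
  stringsAsFactors = FALSE
)
b
```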
Answer 3:
You can almost use read.table directly, but your date format isn't the same as what R would use for the colClasses argument. No problem. Just specify your own class and proceed :-)
## Create a class called "ymdDate" and tell R how to coerce character to it;
## as.Date() reads the leading yyyymmdd and ignores the trailing "T000000Z"
setClass("ymdDate")
setAs("character", "ymdDate", function(from) as.Date(from, format="%Y%m%d"))
## Use `read.table` on your character vector. For convenience, I've
## used `gsub` to strip the trailing `.tsv` before reading it in.
out <- read.table(text = gsub("\\.tsv$", "", a), header = FALSE,
                  sep = "-", colClasses=c("character", "ymdDate", "integer"))
out
# V1 V2 V3
# 1 blablabla 1996-01-01 1
# 2 blablabla 1996-01-01 2
# 3 blablabla 1996-01-01 3
str(out)
# 'data.frame': 3 obs. of 3 variables:
# $ V1: chr "blablabla" "blablabla" "blablabla"
# $ V2: Date, format: "1996-01-01" "1996-01-01" "1996-01-01"
# $ V3: int 1 2 3
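The custom class works because, when a colClasses entry is not a basic type, read.table() converts that column with methods::as(). That means the setAs() method above is the code that actually runs. The small sketch below calls that coercion directly, reusing the same ymdDate definition. The second date string is a made-up example:

```r
library(methods)

# Same definitions as above: a virtual class plus a character -> ymdDate coercion
setClass("ymdDate")
setAs("character", "ymdDate", function(from) as.Date(from, format = "%Y%m%d"))

# Calling as() directly runs the setAs() method; the result is a plain Date vector
d <- as(c("19960101T000000Z", "20001231T120000Z"), "ymdDate")
d
class(d)
```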
Answer 4:
I know I'm late to this party, but I wanted to see this same idea in a magrittr pipe and using more tidyverse functions. Here's what I've got:
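library(stringr)
library(lubridate)
library(tidyverse)
a <- c("blablabla-19960101T000000Z-1.tsv", "blablabla-19960101T000000Z-2.tsv", "blablabla-19960101T000000Z-3.tsv")
a %>%
  strsplit('-') %>%                  # list of 3 character vectors: name, date, number
  transpose() %>%                    # purrr::transpose: list of 3 fields, each holding 3 strings
  map(unlist) %>%                    # turn each field into a plain character vector
  set_names(c('Name', 'Date', 'no')) %>%
  as_tibble() %>%
  mutate(Date = Date %>%
           str_extract('\\d+') %>%   # "19960101T000000Z" -> "19960101"
           ymd(),
         no = as.integer(str_extract(no, '\\d+')))  # "1.tsv" -> 1L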
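Every answer above splits on delimiters, so each one assumes the name part never contains a "-". Answer 2's second pattern also assumes no punctuation or uppercase T. If real file names might break those assumptions, one base-R alternative (not taken from any answer above) is to match the whole layout with capture groups using regexec() and regmatches(). This sketch assumes every element follows the name-YYYYMMDDThhmmssZ-N.tsv layout from the question. The third input, with a hyphenated name, is hypothetical:

```r
a <- c("blablabla-19960101T000000Z-1.tsv",
       "blablabla-19960101T000000Z-2.tsv",
       "bla-bla-19960101T000000Z-3.tsv")  # hypothetical name containing a hyphen

# Capture groups: (1) name, (2) yyyymmdd, (3) running number
pattern <- "^(.*)-([0-9]{8})T[0-9]{6}Z-([0-9]+)\\.tsv$"
m <- regmatches(a, regexec(pattern, a))

# Each element of m is c(full_match, name, date, number); stack them row-wise
parts <- do.call(rbind, m)

out <- data.frame(
  Name = parts[, 2],
  Date = as.Date(parts[, 3], format = "%Y%m%d"),
  no   = as.integer(parts[, 4]),
  stringsAsFactors = FALSE
)
out
```

The name group still has to be followed by "-", eight digits and "T", so a hyphen inside the name stays in the Name column. An element that doesn't match the pattern yields character(0), and rbind() silently drops it. Check that nrow(out) equals length(a).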