I have a data table like below. All columns are in characters.
Table:
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 .... V69
044 N 005 E 026 044 N 006 E 011
I want to paste them in 5 column groups starting from V29. For example I want to obtain an output column in Table
as shown below.
Table:
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 .... V69 Output
044 N 005 E 026 044 N 006 E 011 044N005E026-044N006E011-
How can I achieve this in R. Any help is appreciated.
Thanks.
Using DF
defined in the Note at the end create a sprintf
formatting string fmt
and then run it.
If there are NA's in DF
then they will appear in the output as the string "NA"
. If you prefer to omit them completely then replace them with the empty string in DF
before running the code below, i.e. run DF[is.na(DF)] <- ""
first.
fmt <- paste(rep(strrep("%s", 5), ncol(DF)/5), collapse = "-") # %s%s%s%s%s-%s%s%s%s%s
Output <- do.call("sprintf", c(fmt, DF))
data.frame(DF, Output, stringsAsFactors = FALSE)
giving:
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 Output
1 044 N 005 E 026 044 N 006 E 011 044N005E026-044N006E011
or using DF2
from Note in place of DF
we get:
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 Output
1 044 N 005 E 026 044 N 006 E 011 044N005E026-044N006E011
2 045 S 006 F 027 045 S 007 F 012 045S006F027-045S007F012
data.table
If, as per comment, you want to use data.table then use this (with fmt
from above):
library(data.table)
DT <- data.table(DF)
DT[, Output:=do.call("sprintf", c(fmt, .SD))]
Note
Lines <- "
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
044 N 005 E 026 044 N 006 E 011 "
DF <- read.table(text = Lines, header = TRUE, colClasses = "character")
Lines2 <- "
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
1 044 N 005 E 026 044 N 006 E 011
2 045 S 006 F 027 045 S 007 F 012"
DF2 <- read.table(text = Lines2, header = TRUE, colClasses = "character")
Expanding your data a little bit:
x <- read.table(stringsAsFactors=FALSE, header=TRUE, as.is=TRUE, colClasses="character", text="
V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V29a V30a V31a V32a V33a V34a V35a V36a V37a V38a
044 N 005 E 026 044 N 006 E 011 044 N 005 E 026 044 N 006 E 011
044 N 005 E 026 044 N 006 E 011 044 N 005 E 026 044 N 006 E 011 ")
The answer:
sapply(split.default(x, (seq_len(ncol(x))-1) %/% 5),
function(s) paste(apply(s, 1, paste0, collapse = ""), collapse = "-"))
# 0 1 2
# "044N005E026-044N005E026" "044N006E011-044N006E011" "044N005E026-044N005E026"
# 3
# "044N006E011-044N006E011"
This can easily be assigned to a column of the same frame.
Explanation:
to break a frame up by 5 columns, split
comes to mind, but the default use of split(...)
will use split.data.frame
which splits by row, not column, so we use split.default
(which works by column). From there, you can see how we're grouping things:
(seq_len(ncol(x))-1) %/% 5
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
For each of these groups, we get a 5-column frame:
split.default(x, (seq_len(ncol(x))-1) %/% 5)
# $`0`
# V29 V30 V31 V32 V33
# 1 44 N 5 E 26
# 2 44 N 5 E 26
# $`1`
# V34 V35 V36 V37 V38
# 1 44 N 6 E 11
# 2 44 N 6 E 11
### truncated for brevity
So we use sapply
to do something to each of these frames, returning it (in this case) simplified. (If we specify simplify=FALSE
or if not all of them are the same length, then it will be returned unsimplified, as a list
instead of a vector
).
The function we apply to each frame is apply(., 1, paste0, collapse0)
which will return a vector of the 5-column pastes, something like:
apply(s, 1, paste0, collapse = "")
# $`0`
# [1] ""044N005E026" "044N005E026""
Because we want them combined, we surround it as paste(apply(...), collapse = "-")
.