I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single element in a vector. As an example, I make a data frame of letters and numbers, and then I would like to concatenate the first row via the paste function, and hopefully return the value "A1".
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
df
## letters numbers
## 1 A 1
## 2 B 2
## 3 C 3
## 4 D 4
## 5 E 5
paste(df[1,], sep =".")
## [1] "1" "1"
So paste is converting each element of the row into an integer that corresponds to the 'index of the corresponding level' as if it were a factor, and it keeps it a vector of length two. (I know/believe that factors that are coerced to be characters behave in this way, but as R is not storing df[1,] as a factor at all (tested by is.factor()), I can't verify that it is actually an index for a level.)
is.factor(df[1,])
## [1] FALSE
is.vector(df[1,])
## [1] FALSE
So if it is not a vector then it makes sense that it is behaving oddly, but I can't coerce it into a vector:
is.vector(as.vector(df[1,]))
## [1] FALSE
Using as.character did not seem to help in my attempts.
Can anyone explain this behavior?
If you want to start with your data frame as it is (with letters stored as a factor), then there is no general rule about how df$letters will be interpreted by any given function. It's a factor for modelling functions, character for some and integer for some others. Even the same function, such as paste, may interpret it differently depending on how you use it. There is no logic in it, except that it will probably make sense once you know the internals of every function.
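As a rough illustration of the point above, the sketch below passes the same factor to paste in two ways. It assumes the data frame from the question, with stringsAsFactors = TRUE written out explicitly because R 4.0.0 and later no longer create factors by default; the names from_column and from_row are only for illustration.

```r
# Recreate the question's data frame with a factor column.
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = TRUE)

# Passing the factor column itself: paste() uses the level labels.
from_column <- paste(df$letters, df$numbers, sep = ".")
print(from_column)  # "A.1" "B.2" "C.3" "D.4" "E.5"

# Passing a one-row data frame: paste() ends up with the integer codes.
from_row <- paste(df[1, ], sep = ".")
print(from_row)     # "1" "1"
```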
The factors seem to be converted to integers when an argument is converted to a vector. As you know, data frames are lists of vectors of equal length, so the first row of a data frame is also a list, and when it is forced to be a vector, the factor in it ends up as its integer code.
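A small sketch of that conversion, again assuming the question's data frame with an explicit factor column. The variable names row1, codes and as_text are hypothetical, and this shows only one plausible way of forcing the row into a vector.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = TRUE)

row1 <- df[1, ]

# A one-row data frame is still a list with one element per column.
print(is.list(row1))   # TRUE

# Flattening the list to an atomic vector keeps the factor's integer code.
codes <- unlist(row1)
print(codes)

# Converting the list to character gives the codes as strings.
as_text <- as.character(row1)
print(as_text)         # "1" "1"
```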
I don't know how apply achieves what it does (i.e., factors are represented by character values); if you're interested, look at its source code. It may be useful to know, though, that you can trust apply on this specific occasion. More generally, it is useful to store every piece of data in a sensible format, and that includes storing strings as strings, i.e., using stringsAsFactors=FALSE.
Btw, every introductory R book should have this idea in a subtitle. For example, my plan for retirement is to write "A (not so) gentle introduction to the zen of data fishery with R, the stringsAsFactors=FALSE way".
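For readers curious about the apply route mentioned in this answer, here is a minimal sketch. As far as I can tell, apply converts a data frame with as.matrix before looping over rows, and for this mixed data frame that yields a character matrix holding the factor labels. The names m and rows are hypothetical.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = TRUE)

# The matrix form of this data frame is character, with labels not codes.
m <- as.matrix(df)
print(is.character(m))  # TRUE

# Collapse each row of the data frame into one string.
rows <- apply(df, 1, paste, collapse = "")
print(rows)
```

One caveat worth checking on your own data: the matrix conversion may pad numbers of different widths with spaces, so the result can contain unexpected blanks when a numeric column mixes, say, one-digit and two-digit values.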
While others have focused on why your code isn't working and how to improve it, I'm going to try and focus more on getting the result you want. From your description, it seems you can readily achieve what you want using paste:
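The snippet that belonged here is not shown, so the following is only a sketch of what pasting the two columns could look like, assuming the strings are kept as character via stringsAsFactors = FALSE. The name combined is hypothetical.

```r
# Keep the letters as plain character strings rather than a factor.
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = FALSE)

# Paste the two columns element by element with an empty separator.
combined <- paste(df$letters, df$numbers, sep = "")
print(combined)  # "A1" "B2" "C3" "D4" "E5"
```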
You can change df$letters to character using df$letters <- as.character(df$letters) if you don't want to use the stringsAsFactors argument.
But let's assume that's not what you want. Let's assume you have hundreds of columns and you want to paste them all together. We can do that with your minimal example too:
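A sketch of the many-columns approach, under the assumption that it follows the do.call explanation given later in this answer. The name dfargs is taken from that explanation; combined is hypothetical.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = FALSE)

# c() on a data frame gives a plain list: one element per column, plus sep.
dfargs <- c(df, sep = "")
print(names(dfargs))  # "letters" "numbers" "sep"

# do.call() passes each list element to paste() as a separate argument.
combined <- do.call(paste, dfargs)
print(combined)       # "A1" "B2" "C3" "D4" "E5"
```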
EDIT: Alternative method and explanation:
I realised the problem you're having is a combination of the fact that you're using a factor and that you're using the sep argument instead of collapse (as @adibender picked up). The difference is that sep gives the separator between two separate vectors and collapse gives separators within a vector. When you use df[1,], you supply a single object to paste and hence you must use the collapse argument. Using your idea of getting every row and concatenating them, the following line of code will do exactly what you want:
OK, now for the explanations:
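Before going through those explanations, here is a short sketch of the sep versus collapse distinction described in the EDIT. The toy vectors and variable names are my own, and the last line is one possible way to collapse a single row rather than the answer's original code.

```r
# sep joins corresponding elements of separate vectors.
by_sep <- paste(c("A", "B"), c(1, 2), sep = "-")
print(by_sep)       # "A-1" "B-2"

# With a single vector, sep has nothing to separate.
sep_only <- paste(c("A", "1"), sep = "")
print(sep_only)     # "A" "1"

# collapse joins the elements of one vector into a single string.
by_collapse <- paste(c("A", "1"), collapse = "")
print(by_collapse)  # "A1"

# Applied to one row of a data frame that holds strings, not factors.
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = FALSE)
first_row <- paste(unlist(df[1, ]), collapse = "")
print(first_row)    # "A1"
```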
Why won't as.list work?
as.list converts an object to a list. So it does work. It will convert your dataframe to a list and subsequently ignore the sep="" argument.
c combines objects together. Technically, a dataframe is just a list where every column is an element and all elements have to have the same length. So when I combine it with sep="", it just becomes a regular list with the columns of the dataframe and sep as elements.
Why use do.call?
do.call allows you to call a function using a named list as its arguments. You can't just throw the list straight into paste, because it doesn't like dataframes. It's designed for concatenating vectors. So remember that dfargs is a list containing a vector of letters, a vector of numbers and sep, which is a length 1 vector containing only "". When I use do.call, the resulting paste function is essentially paste(letters, numbers, sep).
But what if my original dataframe had columns "letters", "numbers", "squigs", "blargs", after which I added the separator like I did before? Then the paste function through do.call would look like paste(letters, numbers, squigs, blargs, sep). So you see it works for any number of columns.
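To check the four-column case, the sketch below builds a small data frame with the column names from the paragraph above. The contents of squigs and blargs are made up purely for illustration, as are the names df4, via_do_call and written_out.

```r
# Hypothetical four-column data frame; the values are invented.
df4 <- data.frame(letters = LETTERS[1:3],
                  numbers = 1:3,
                  squigs  = c("x", "y", "z"),
                  blargs  = c(TRUE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# The do.call form does not need to know how many columns there are.
via_do_call <- do.call(paste, c(df4, sep = ""))

# The same call written out by hand, one argument per column.
written_out <- paste(df4$letters, df4$numbers, df4$squigs, df4$blargs,
                     sep = "")

print(via_do_call)                          # "A1xTRUE" "B2yFALSE" "C3zTRUE"
print(identical(via_do_call, written_out))  # TRUE
```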
For those using library(tidyverse), you can simply use the unite function.
This will give you a new column called "together" with A1, B2, etc.
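The call itself is not shown above, so here is a sketch of how unite might be used for this data frame. It assumes the tidyr package is installed (unite lives in tidyr, which library(tidyverse) attaches); the result name united is hypothetical.

```r
library(tidyr)

df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = FALSE)

# Combine the two columns into a new column called "together".
# sep = "" avoids unite()'s default "_" separator.
united <- unite(df, "together", letters, numbers, sep = "")
print(united$together)  # "A1" "B2" "C3" "D4" "E5"
```

By default unite drops the input columns; passing remove = FALSE should keep them alongside the new one.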
This is indeed a little weird, but this is also what is supposed to happen. When you create the data.frame as you did, column letters is stored as a factor (this was the default before R 4.0.0). Internally, a factor stores each value as an integer code pointing at one of its levels, so when as.numeric() is applied to a factor it returns those codes rather than the labels. For example, A is the first level of the factor df[, 1], therefore A gets converted to the value 1 when as.numeric is applied. This is what happens when you call paste(df[1, ]). Since columns 1 and 2 are of different class, paste first transforms both elements of row 1 to numeric then to characters.
When you want to concatenate both columns, you first need to transform the first column to character with as.character() before calling paste with collapse.
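A sketch of both points, assuming a small factor of my own for the first part and the question's data frame for the second. The names f and first_row are hypothetical, and the fix shown is one reading of the as.character step described above.

```r
# as.numeric() on a factor returns the level codes, not the labels.
# Levels are sorted alphabetically here: A = 1, B = 2, C = 3.
f <- factor(c("B", "A", "C"))
print(as.numeric(f))    # 2 1 3
print(as.character(f))  # "B" "A" "C"

# Converting the factor column to character before pasting the row.
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5,
                 stringsAsFactors = TRUE)
df[, 1] <- as.character(df[, 1])
first_row <- paste(df[1, ], collapse = "")
print(first_row)        # "A1"
```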
As @sebastian-c pointed out, you can also use stringsAsFactors = FALSE in the creation of the data.frame, then you can omit the as.character() step.