I've been trying to remove the white space that I have in a data frame (using R). The data frame is large (>1gb) and has multiple columns that contains white space in every data entry.
Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:
gsub( " ", "", mydata)
This didn't seem to work, although R returned an output which I have been unable to interpret.
str_replace( " ", "", mydata)
R returned 47 warnings and did not remove the white space.
erase_all(mydata, " ")
R returned an error saying 'Error: could not find function "erase_all"'
I would really appreciate some help with this as I've spent the last 24hrs trying to tackle this problem.
Thanks!
One possibility involving just
dplyr
could be:Or considering that all variables are of class character:
If i understood you correctly then you want to remove all the white spaces from entire data frame, i guess the code which you are using is good for removing spaces in the column names.I think you should try this:
Hope this works.
This will return a matrix however, if you want to change it to data frame then do:
EDIT In 2020:
Using
lapply
andtrimws
function withboth=TRUE
can remove leading and trailing spaces but not inside it.Since there was no input data provided by OP, I am adding a dummy example to produce the results.DATA:
#situation: 1 (Using Base R), when we want to remove spaces only at the leading and trailing ends NOT inside the string values, we can use trimws
# situation: 2 (Using Base R) , when we want to remove spaces at every place in the dataframe in character columns (inside of a string as well as at the leading and trailing ends).
(This was the initial solution proposed using apply, please note a solution using apply seems to work but would be very slow, also the with the question its apparently not very clear if OP really wanted to remove leading/trailing blank or every blank in the data)
## situation: 1 (Using data.table, removing only leading and trailing blanks)
Output from situation1:
## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)
Output from situation2:
Note the difference between the outputs of both situation, In row number 2: you can see that, with
trimws
we can remove leading and trailing blanks, but with regex solution we are able to remove every blank(s).I hope this helps , Thanks
Picking up on Fremzy and Mielniczuk, I came to the following solution:
It works for mixed numeric/charactert dataframes manipulates only character-columns.
Picking up on Fremzy and the comment from Stamper, this is now my handy routine for cleaning up whitespace in data:
As others have noted this changes all types to character. In my work, I first determine the types available in the original and conversions required. After trimming, I re-apply the types needed.
If your original types are OK, apply the solution from MarkusN below https://stackoverflow.com/a/37815274/2200542
Those working with Excel files may wish to explore the readxl package which defaults to trim_ws = TRUE when reading.