This question seems to make it easy to remove space characters in a string in R. However, when I load the following table, I'm not able to remove the space between two numbers (e.g. 11 846.4):
require(XML)
library(RCurl)
link2fetch = 'https://www.destatis.de/DE/ZahlenFakten/Wirtschaftsbereiche/LandForstwirtschaftFischerei/FeldfruechteGruenland/Tabellen/AckerlandHauptfruchtgruppenFruchtarten.html'
theurl = getURL(link2fetch, .opts = list(ssl.verifypeer = FALSE) ) # important!
area_cult10 = readHTMLTable(theurl, stringsAsFactors = FALSE)
area_cult10 = data.table::rbindlist(area_cult10)
test = sub(',', '.', area_cult10$V5) # change , to .
test = gsub('(.+)\\s([A-Z]{1})*', '\\1', test) # remove LETTERS
gsub('\\s', '', test) # remove white space?
Why can't I remove the space in test[1]?
Thanks for any advice! Can this be something other than a space character? Maybe the answer is really easy and I'm overlooking something.
You may shorten the creation of test to just two steps using a single PCRE regex (note the perl=TRUE parameter).
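The two steps can be sketched as follows, assembled from the pattern pieces described in this answer. The vector x is a hypothetical stand-in for area_cult10$V5 (it is not data from the table), and it assumes the thousands separator is a no-break space (U+00A0), which the thread does not confirm.

```r
# Hypothetical sample standing in for area_cult10$V5 (assumption, not real data):
# no-break space as thousands separator, decimal comma, optional trailing letter.
x <- c("11\u00a0846,4", "6\u00a0529,2 A", "52,0")

# Step 1: drop runs of whitespace/letters anywhere, and non-word characters
# at the end of the string; (*UCP) makes \s, \W and \p{L} Unicode-aware.
cleaned <- gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", x, perl = TRUE)

# Step 2: replace the first literal comma with a literal dot.
test <- sub(",", ".", cleaned, fixed = TRUE)

# Convert to numbers once the strings are clean.
values <- as.numeric(test)
```

If the real column contains other separators, the character class in step 1 may need adjusting.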
The gsub regex is worth special attention:
(*UCP) - the PCRE verb that makes the pattern Unicode-aware
[\\s\\p{L}]+ - matches 1+ whitespace or letter characters
| - or (an alternation operator)
\\W+$ - 1+ non-word characters at the end of the string.
Then, sub(",", ".", x, fixed=TRUE) will replace the first , with a . as literal strings; fixed=TRUE saves performance since it does not have to compile a regex.
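To check whether the character is really an ordinary space, it may help to look at the code points of the string. This is only a sketch: s is a hypothetical sample that mimics test[1], and the no-break space (U+00A0, code point 160) is an assumption about what the page contains, though it is a common culprit in scraped HTML tables.

```r
# Hypothetical sample mimicking test[1]; the separator is assumed to be U+00A0.
s <- "11\u00a0846.4"

# Code points of each character: 32 is an ordinary space,
# 160 is a no-break space.
codes <- utf8ToInt(s)
has_nbsp <- 160L %in% codes

# Once the character is known, it can also be removed as a literal string.
no_nbsp <- gsub("\u00a0", "", s, fixed = TRUE)
```

Running utf8ToInt(test[1]) on the real data would show which character actually sits between the digits.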