I read a .txt file into R as follows:
Election_Parties <- readr::read_lines("Election_Parties.txt")
The following text is in the file: pastebin link.
The text more or less looks as follows (Please use actual file for solution!):
BOLIVIA
P1-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento
Nacionalista Revolucionario [MNR])
P19-Liberty and Justice (Libertad y Justicia [LJ])
P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
COLOMBIA
P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
P3-Indigenous Authorities of Colombia (Autoridades Indígenas
de Colombia)
I would like to have all information about a party on one line, no matter how long it is.
DESIRED OUTPUT:
BOLIVIA
P1-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento Nacionalista Revolucionario [MNR])
P19-Liberty and Justice (Libertad y Justicia [LJ])
P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
COLOMBIA
P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])
P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])
P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colombia)
The following answer from this LINK:
strsplit(paste(Election_Parties, collapse=" "), "\\s+(?=P\\d+-)", perl=TRUE)[[1]]
works to correct the strings, but it does not deal properly with the headers (BOLIVIA, COLOMBIA) or the empty lines. Dealing with this is important because I want to apply this solution afterwards.
Although I got an answer in the comments of that post which worked on the example, it does not work on my text file.
How can I adapt the solution to deal with (leave alone) the headers and empty lines?
I turned the whole thing into a tidy and useful format. Have a look:
First I read in the file:
I split the raw format into entries by looking for empty lines, which occur just before a new entry:
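The code for this step is not shown here, so what follows is only a minimal base R sketch of the idea, assuming that entries are separated by empty lines. The vector `lines` is a small hypothetical stand-in for the result of `readr::read_lines()`, and the names `is_blank`, `entry_id` and `entries` are made up for illustration.

```r
# Hypothetical sample standing in for the lines read from the file
lines <- c(
  "BOLIVIA",
  "P1-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento",
  "Nacionalista Revolucionario [MNR])",
  "P19-Liberty and Justice (Libertad y Justicia [LJ])",
  "",
  "COLOMBIA",
  "P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])",
  "P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])"
)

# TRUE for empty (or whitespace-only) lines
is_blank <- trimws(lines) == ""

# The counter goes up by one at every empty line, so all lines
# between two empty lines share the same id
entry_id <- cumsum(is_blank)

# Drop the empty lines and split the rest into one character vector per entry
entries <- split(lines[!is_blank], entry_id[!is_blank])
```

Each element of `entries` then starts with the country header, followed by that country's party lines, some of which may still be broken across several lines.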
Then I loop through every entry and turn it into a tibble:
And now we have a really nice data.frame we can easily work with:
The strings which are scattered over multiple lines are corrected in this bit:
desc will be NA in cases where the line does not begin with e.g. "P1-" (1 can be any number). If this is the case, the line is collapsed into the previous entry. Later on, the NAs are removed, leaving the information only in the correct line.
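As a rough illustration of this joining step, here is a base R sketch. It is not the code from the answer: instead of marking continuation lines with NA, it numbers the party lines with `cumsum()` and pastes each group together. The function name `merge_party_lines` and the column names `country` and `party` are hypothetical, and the sketch assumes that the first line of an entry is the country header and that every party line starts with "P", a number and a hyphen.

```r
# Hypothetical helper: turn one entry (header + party lines) into a data.frame
merge_party_lines <- function(entry) {
  header <- entry[1]   # assumed to be the country name
  body   <- entry[-1]  # everything after the header

  # TRUE where a line starts a new party, e.g. "P1-", "P19-"
  starts <- grepl("^P[0-9]+-", body)

  # Continuation lines inherit the id of the party line before them
  party_id <- cumsum(starts)

  # Paste all lines of one party into a single string
  parties <- unname(vapply(split(body, party_id), paste,
                           character(1), collapse = " "))

  data.frame(country = rep(header, length(parties)),
             party   = parties,
             stringsAsFactors = FALSE)
}

# Small sample entry taken from the example in the question
entry <- c(
  "BOLIVIA",
  "P1-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento",
  "Nacionalista Revolucionario [MNR])",
  "P19-Liberty and Justice (Libertad y Justicia [LJ])"
)

result <- merge_party_lines(entry)
```

Applying the helper to every entry, for example with `do.call(rbind, lapply(entries, merge_party_lines))` on a list of entries, should give one row per party with the header kept in its own column, so headers and empty lines never get mixed into the party strings.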