I have data imported from a .csv file. The first column contains character strings that contain text within parentheses. The data look like:
symbol
___________________________________________
1 | Apollo Senior Floating Rate Fund Inc. (AFT)
2 | Apollo Tactical Income Fund Inc. (AIF)
3 | Altra Industrial Motion Corp. (AIMC)
4 | Allegion plc (ALLE)
5 | Amphenol Corporation (APH)
6 | Ares Management Corporation (ARES)
7 | ARMOUR Residential REIT, Inc. (ARR)
8 | Banc of California, Inc. (BANC)
9 | BlackRock Resources (BCX)
10| Belden Inc (BDC)
...
I need to convert that column of data into a list such as:
symbol2
___________________________________________
1 | AFT
2 | AIF
3 | AIMC
4 | ALLE
5 | APH
6 | ARES
7 | ARR
8 | BANC
9 | BCX
10| BDC
...
My ultimate goal is to get a single character string where the text bound by parentheses are separated by a ";" like this:
"AFT;AIF;AIMC;ALLE;APH;ARES;ARR;BANC;BCX;BDC;..."
I can do this last step with
paste(symbol2, collapes = ";")
but I can't figure out how to isolate the desired text.
I've tried everything listed here (extract a substring in R according to a pattern) by replacing the ":" with "(" and could not get anything to work. I tried:
gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", symbol, perl=T)
as recommended here (Extract text in parentheses in R), but the output is
"c(4, 5, 2, 1, 3, 6, 7, 8, 17, 9,...)"
Any help?
We can extract the contents using
str_extract
fromstringr
Regex taken from here.
You can then
paste
them together1) read.table Use
read.table
with the indicatedsep
andcomment
values to get a 2 column data frame in which the first column is the names and the second column is the symbols. Finally take that second column and collapse it into a single string. No packages or regular expressions are used.2) dplyr We can use
separate
from tidyr to separate the name and symbol columns dropping the name column at the same time.unlist
that and collapse it into a single string. tidyr 0.8.2 or later must be used.3) gsub We can match everything up to and including (, i.e.
".*\\("
and also everything from ) onwards, i.e."\\).*"
and replace those with the empty string. Then collapse as before.4) trimws This is another base solution. It requires R 3.6.0 or later (currently r-devel). We define whitespace as anything other than parentheses and use
trimws
to trim it away. Then we define whitespace as parentheses and then trim that away. That leaves us with the symbols which we can now collapse.Note
The input in reproducible form is:
Here is an option using base R's
sub
and a capture groupSample data