I'm trying to write some code that opens all the data files in a folder and applies a function (or set of functions) to extract my data of interest. So far, so good. The problem is that I'd like to rename one of the columns I extract from each file using one element of the file name, and I'm having a hard time figuring out how to extract that element.
I have a bunch of files named "YYYY-MM-DD geneName data copy.txt" and would like to extract the "geneName" part of the file name. (For example, I have "2012-05-31 PMA1 data copy.txt".)
The date format is always the same (YYYY-MM-DD), and all the file names end in "data copy.txt".
Some of the file names also have an experiment annotation (either "E(number)" or "Expt(number)") between the date and geneName (for example, "2012-05-21 E7 PMA1 data copy.txt"); others have "SDM" between the geneName and "data copy.txt".
Here's a list of some file names and my desired output:
- 2012-05-31 CTN1 data copy.txt (I want "CTN1")
- 2012-05-21 E7 PMA1 data copy.txt (want "PMA1")
- 2011-11-29 TDH3 SDM data copy.txt (want "TDH3")
- 2012-01-04 POX1 data copy.txt (want "POX1")
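To make the naming rules concrete, here is a rough sketch of them as a regular expression, written in Python purely for illustration. The names `FILENAME_RE` and `gene_name` are made up for this example, and the pattern assumes that the experiment annotation has no space before its number (as in "E7"), that gene names contain no spaces, and that each part is separated by a single space.

```python
import re

# Assumed pattern, built only from the naming rules described above:
# date, optional "E<number>"/"Expt<number>" tag, gene name,
# optional "SDM", then the fixed "data copy.txt" suffix.
FILENAME_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} "        # date, YYYY-MM-DD
    r"(?:E(?:xpt)?\d+ )?"          # optional experiment annotation
    r"(\S+)"                       # gene name (the captured group)
    r"(?: SDM)? data copy\.txt$"   # optional "SDM", then fixed suffix
)


def gene_name(filename):
    """Return the gene name from a file name, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in [
        "2012-05-31 CTN1 data copy.txt",
        "2012-05-21 E7 PMA1 data copy.txt",
        "2011-11-29 TDH3 SDM data copy.txt",
        "2012-01-04 POX1 data copy.txt",
    ]:
        print(name, "->", gene_name(name))
```

A file name that does not fit the rules gives `None` rather than a wrong gene name, so unexpected files would be easy to spot.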
Any thoughts about how I can do that without having to remove the experiment number or "SDM" from some of the files by hand?
Thanks!