I have to run a set of Entrez Direct commands to run in Terminal based on queries/search phrases in a column in a spreadsheet. How can I automatically run a terminal command repeatedly based off of a spreadsheet?
i.e., I have this command:
esearch -db pubmed -query "<query>" | efetch -format docsum | xtract -pattern DocumentSummary -element PubDate | cut -c 1-4 | sort-uniq-count > <directory>/<name>.xls
And a spreadsheet that contains values for <query>
in a column, and I can also put in values for the file to send the results to. There are 138 iterations. How do I automate that?
Assuming you mean Entrez Direct: E-utilities on the UNIX Command Line, those Unix shell utilities read and write lines of text, not Excel spreadsheet (.xls) files. A valid .xls file is a complex binary file defined by Excel. Don't write text data into files with the ".xls" filename extension (
... > <directory>/<name>.xls
). It would only lead to confusion.I think you're asking how to run a template shell script over a list of 138 query strings. If those query strings are in a spreadsheet, the easiest approach would be to export the spreadsheet as a .csv or .tsv file (comma-separated values or tab-separated values), and write a small program to read that file, loop over those query text lines, substitute the query text into a string template, and issue shell commands. Python would be a good choice of programming languages for this. It could be done as a bash script, but that would be more difficult to write, harder to debug, and more limited. Other fine language choices include Java and Ruby.
If your list of query strings is in an Excel spreadsheet, a more complicated approach would be to write an Excel VBA (Visual Basic for Applications) "macro" to do the same steps. It can use VBA's
Shell()
command to call a shell program. (If you're running on Windows, you'll need to use cygwin or mingw or a similar shell to run the commands rather than the standard DOS shell.)I would suggest you save the column of queries in text file in your HOME directory - probably using copy and paste and call it
queries.txt
. Let's imagine it looks like this:Then, I would save the following as
runQueries
in your HOME directory:Then you can run this once in Terminal to make the script executable:
And then you can run it as many times as you like, using this each time:
Sample Output