I have to run a set of Entrez Direct commands to run in Terminal based on queries/search phrases in a column in a spreadsheet. How can I automatically run a terminal command repeatedly based off of a spreadsheet?
i.e., I have this command:
esearch -db pubmed -query "<query>" | efetch -format docsum | xtract -pattern DocumentSummary -element PubDate | cut -c 1-4 | sort-uniq-count > <directory>/<name>.xls
And a spreadsheet that contains values for <query>
in a column, and I can also put in values for the file to send the results to. There are 138 iterations. How do I automate that?
I would suggest you save the column of queries in text file in your HOME directory - probably using copy and paste and call it queries.txt
. Let's imagine it looks like this:
Why do 24-hr gas stations have locks on their doors?
Why don't they make planes out of the same material as the indestructible black boxes?
If nothing sticks to Teflon, how do they stick it to the pan?
Then, I would save the following as runQueries
in your HOME directory:
#!/bin/bash
while read query; do
echo Running query $query...
esearch -db pubmed -query "$query" | efetch -format docsum | xtract -pattern DocumentSummary ...
done < queries.txt
Then you can run this once in Terminal to make the script executable:
chmod +x runQueries
And then you can run it as many times as you like, using this each time:
./runQueries
Sample Output
Running query Why do 24-hr gas stations have locks on their doors?...
esearch -db pubmed -query Why do 24-hr gas stations have locks on their doors? ...
Running query Why don't they make planes out of the same material as the indestructible black boxes?...
esearch -db pubmed -query Why don't they make planes out of the same material as the indestructible black boxes? ...
Running query If nothing sticks to Teflon, how do they stick it to the pan?...
esearch -db pubmed -query If nothing sticks to Teflon, how do they stick it to the pan? ...
Assuming you mean Entrez Direct: E-utilities on the UNIX Command Line, those Unix shell utilities read and write lines of text, not Excel spreadsheet (.xls) files. A valid .xls file is a complex binary file defined by Excel. Don't write text data into files with the ".xls" filename extension (... > <directory>/<name>.xls
). It would only lead to confusion.
I think you're asking how to run a template shell script over a list of 138 query strings. If those query strings are in a spreadsheet, the easiest approach would be to export the spreadsheet as a .csv or .tsv file (comma-separated values or tab-separated values), and write a small program to read that file, loop over those query text lines, substitute the query text into a string template, and issue shell commands. Python would be a good choice of programming languages for this. It could be done as a bash script, but that would be more difficult to write, harder to debug, and more limited. Other fine language choices include Java and Ruby.
If your list of query strings is in an Excel spreadsheet, a more complicated approach would be to write an Excel VBA (Visual Basic for Applications) "macro" to do the same steps. It can use VBA's Shell()
command to call a shell program. (If you're running on Windows, you'll need to use cygwin or mingw or a similar shell to run the commands rather than the standard DOS shell.)