Easiest way to cross-reference a CSV file with a t

Posted 2019-02-28 12:47

Question:

I have a list of strings in a CSV file, and a separate text file that I would like to search for these strings. The CSV file contains only the strings I am interested in (ID numbers for a database of proteins), but the text file has a lot of other text interspersed among them. What would be the easiest way to go about this? I want to check the text file for the presence of every string in the CSV file. I am working in a research lab at a top university, so you would be aiding cutting-edge research!

Thanks :)

Answer 1:

I would use Python for this. To print the matching lines, you could do this:

import csv

# Build a set of search strings from the first column of the CSV file
with open("strings.csv", newline="") as csvfile:
    reader = csv.reader(csvfile)
    searchstrings = {row[0] for row in reader if row}   # skip blank rows

with open("text.txt") as txtfile:
    for number, line in enumerate(txtfile, start=1):   # count lines from 1
        for needle in searchstrings:
            if needle in line:
                print("Line {0}: {1}".format(number, line.strip()))
                break   # print each line once, even if several strings match
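
If what you need is a yes/no answer for each ID rather than the matching lines, a minimal sketch along the following lines might work. It assumes the IDs sit in the first column of the CSV and that the text file fits in memory; the function names (load_ids, split_found_missing, report) are made up for illustration. Note that it uses plain substring matching, so a short ID that happens to be a prefix of a longer one would also be reported as found.

```python
import csv

def load_ids(lines):
    """Collect the first column of every non-empty CSV row into a set.

    `lines` can be an open file or any iterable of strings.
    """
    return {row[0].strip() for row in csv.reader(lines) if row and row[0].strip()}

def split_found_missing(ids, text):
    """Return (found, missing) as sorted lists, using substring matching."""
    found = sorted(i for i in ids if i in text)
    missing = sorted(set(ids) - set(found))
    return found, missing

def report(csv_path, text_path):
    """Print which IDs from csv_path occur somewhere in text_path."""
    with open(csv_path, newline="") as csvfile:
        ids = load_ids(csvfile)
    with open(text_path) as txtfile:
        found, missing = split_found_missing(ids, txtfile.read())
    print("Found:", ", ".join(found))
    print("Missing:", ", ".join(missing))
```

Calling report("strings.csv", "text.txt") would then print the two lists, using the same file names as in the snippet above.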