How to save output from Python as a TSV file

Posted 2019-04-07 01:38

Question:

I am using the Biopython package and I would like to save the results as a TSV file. This is the output I currently print, and I want it written to a TSV file instead:

from Bio import SeqIO

for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print("%s %s %s" % (record.id, record.seq, record.format("qual")))

Thank you.

Answer 1:

That is fairly simple: instead of printing the output, you need to write it to a file.

from Bio import SeqIO

with open("records.tsv", "w") as record_file:
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        # "\t" (a tab) separates the fields, which is what makes the file a TSV
        record_file.write("%s\t%s\t%s\n" % (record.id, record.seq, record.format("qual")))

If you also want to name the columns in the file, write a header line first:

record_file.write("Record_Id\tRecord_Seq\tRecord_Qual\n")

So the complete code may look like:

from Bio import SeqIO

with open("records.tsv", "w") as record_file:
    record_file.write("Record_Id\tRecord_Seq\tRecord_Qual\n")
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        record_file.write(str(record.id) + "\t" + str(record.seq) + "\t" + str(record.format("qual")) + "\n")
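One caveat with the code above: in Biopython, record.format("qual") returns a complete QUAL-format entry (a ">" header line followed by the scores), so its embedded newlines split a single record across several TSV lines. A minimal sketch of an alternative, assuming the scores are taken from record.letter_annotations["phred_quality"] (where Biopython's FASTQ parser stores them), joins them into one space-separated field. So that it runs without Biopython installed, the sketch uses a hypothetical FakeRecord stand-in and an in-memory buffer instead of a real file.

```python
import io
from typing import Dict, Iterable, List, NamedTuple, TextIO


class FakeRecord(NamedTuple):
    """Hypothetical stand-in for a Biopython SeqRecord (only the fields used here)."""
    id: str
    seq: str
    letter_annotations: Dict[str, List[int]]


def write_records_tsv(records: Iterable, out: TextIO) -> None:
    # Header row: tab-separated column names.
    out.write("Record_Id\tRecord_Seq\tRecord_Qual\n")
    for record in records:
        # Flatten the per-base Phred scores into one space-separated field,
        # so every record stays on exactly one TSV line.
        scores = record.letter_annotations["phred_quality"]
        qual = " ".join(str(q) for q in scores)
        out.write("%s\t%s\t%s\n" % (record.id, record.seq, qual))


records = [
    FakeRecord("read1", "ACGT", {"phred_quality": [40, 40, 38, 30]}),
    FakeRecord("read2", "GGA", {"phred_quality": [20, 25, 33]}),
]
buffer = io.StringIO()  # stands in for open("records.tsv", "w")
write_records_tsv(records, buffer)
print(buffer.getvalue(), end="")
```

With real data you would pass the SeqIO.parse(...) iterator and the opened record_file to write_records_tsv() instead of the stand-ins.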


Answer 2:

My preferred solution is to use the csv module. It is part of the standard library, so:

  • Somebody else has already done all the heavy lifting.
  • It allows you to leverage all the functionality of the CSV module.
  • You can be fairly confident it will function as expected (not always the case when I write it myself).
  • You're not going to have to reinvent the wheel, either when you write the file or when you read it back in on the other end (I don't know your record format, but if one of your fields contains a tab, the csv module will quote it correctly for you).
  • It will be easier to support when the next person has to go in to update the code 5 years after you've left the company.

The following code snippet should do the trick for you:

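#!/usr/bin/env python3
import csv

from Bio import SeqIO

# newline='' belongs to open(), not csv.writer(); it lets the csv module
# control line endings. lineterminator='\n' gives Unix-style line endings.
with open('records.tsv', 'w', newline='') as tsvfile:
    writer = csv.writer(tsvfile, delimiter='\t', lineterminator='\n')
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        writer.writerow([record.id, record.seq, record.format("qual")])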

Note that this is for Python 3.x. If you're using 2.x, the open() call and the csv.writer(...) line will be slightly different (the Python 2 csv module expects the file to be opened in binary mode, 'wb').
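To check the quoting point from the list above, here is a small sketch that writes a row whose second field contains a tab and reads it back with csv.reader using the same delimiter. The rows are made up, and io.StringIO stands in for a real file.

```python
import csv
import io

# Made-up rows; the second field of the first row contains a tab.
rows = [["read1", "has\ta tab", "40 38"], ["read2", "plain", "30"]]

buffer = io.StringIO()  # stands in for open("records.tsv", "w", newline="")
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text, end="")  # first line: read1<TAB>"has<TAB>a tab"<TAB>40 38

# Reading back with the same delimiter removes the quotes again.
read_back = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(read_back == rows)  # True
```

The field containing the delimiter is wrapped in double quotes on write, so a naive line.split("\t") would break it apart, while csv.reader restores the original fields.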



Answer 3:

If you want to use the .tsv to label your word embeddings in TensorBoard, use the following snippet. It uses the CSV module (see Doug's answer).

#!/usr/bin/env python3
import csv

def save_vocabulary(word_count, label_file="word2context/labels.tsv"):
    # newline='' lets the csv module control line endings itself
    with open(label_file, 'w', encoding='utf8', newline='') as tsv_file:
        tsv_writer = csv.writer(tsv_file, delimiter='\t', lineterminator='\n')
        tsv_writer.writerow(["Word", "Count"])
        for word, count in word_count:
            tsv_writer.writerow([word, count])

word_count is a list of tuples like this:

[('the', 222594), ('to', 61479), ('in', 52540), ('of', 48064) ... ]
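A sketch of how word_count might be built, assuming the input is a plain list of tokens: collections.Counter.most_common() returns exactly this kind of list of (word, count) tuples, sorted by descending count. The write_labels_tsv helper is hypothetical; it mirrors the snippet above but takes the pairs and the output path as arguments. The tokens and the temporary path are made up for illustration.

```python
import csv
import os
import tempfile
from collections import Counter


def write_labels_tsv(word_count, label_file):
    # Same csv settings as the snippet above: tab delimiter, '\n' line endings.
    with open(label_file, "w", encoding="utf8", newline="") as tsv_file:
        tsv_writer = csv.writer(tsv_file, delimiter="\t", lineterminator="\n")
        tsv_writer.writerow(["Word", "Count"])
        tsv_writer.writerows(word_count)


tokens = "the cat sat on the mat the end".split()
word_count = Counter(tokens).most_common()  # [('the', 3), ('cat', 1), ...]

label_file = os.path.join(tempfile.mkdtemp(), "labels.tsv")
write_labels_tsv(word_count, label_file)
with open(label_file, encoding="utf8") as f:
    print(f.read(), end="")
```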


Answer 4:

The following snippet:

from __future__ import print_function
with open("output.tsv", "w") as f:
  print("%s\t%s\t%s" % ("asd", "sdf", "dfg"), file=f)
  print("%s\t%s\t%s" % ("sdf", "dfg", "fgh"), file=f)

Yields a file output.tsv containing

asd    sdf    dfg
sdf    dfg    fgh

So, in your case:

from __future__ import print_function
from Bio import SeqIO

with open("output.tsv", "w") as f:
  for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print("%s\t%s\t%s" % (record.id, record.seq, record.format("qual")), file=f)
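In Python 3, print() can also insert the tabs for you through its sep argument, which avoids building a format string. A minimal sketch, using the same made-up values as the first snippet and io.StringIO standing in for the opened file:

```python
import io

f = io.StringIO()  # stands in for open("output.tsv", "w")
# sep="\t" puts a tab between the arguments; print() still ends each line with "\n".
print("asd", "sdf", "dfg", sep="\t", file=f)
print("sdf", "dfg", "fgh", sep="\t", file=f)
print(f.getvalue(), end="")
```

Because print() calls str() on each argument, record.id and record.seq can be passed directly in the same way.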


Answer 5:

I prefer using join() in this type of code:

from Bio import SeqIO

for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print('\t'.join((str(record.id), str(record.seq), str(record.format("qual")))))

The tab character is \t. The join() method takes a single iterable (here a tuple of three strings) and returns one string with a tab between each pair of items, which print() then outputs.
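Because str.join() only accepts strings, the str(...) calls above matter for fields that are not already strings (Biopython's Seq objects, for example, are not str instances). A common shorthand is to convert every field with map(). A small sketch, where the helper name to_tsv_line is hypothetical:

```python
def to_tsv_line(fields):
    # str.join() raises TypeError on non-string items, so convert each field first.
    return "\t".join(map(str, fields))


print(to_tsv_line(["read1", "ACGT", 40]))  # read1<TAB>ACGT<TAB>40
```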