How do I get gene features in FASTA nucleotide for

I am able to download a FASTA file manually that looks like:

>lcl|CR543861.1_gene_1...
ATGCTTTGGACA...
>lcl|CR543861.1_gene_2...
GTGCGACTAAAA...

by clicking "Send to" and selecting "Gene Features" on this page. FASTA Nucleotide is the only option offered, which is fine because that is all I want.

With a script like this:

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => 'CR543861',
                                       -rettype => 'fasta');
my $file = 'CR543861.fasta';
$factory->get_Response(-file => $file);

I get a file that looks like:

>gi|49529273|emb|CR543861.1| Acinetobacter sp. ADP1 complete genome
GATATTTTATCCACA...

with the whole genomic sequence lumped together. How do I get information like in the first (manually downloaded) file?

I looked at a couple of other posts, as well as a section of the EUtilities Cookbook.

I tried fetching and saving a GenBank file (the .gb file I get seems to have a separate sequence for each gene), but when I read it with Bio::SeqIO, I get only one large sequence.

Tags: database perl fasta bioperl ncbi

1 Answer

祖国的老花朵

Post #2 · 2019-06-23 18:10

With that accession number and return type, you get the complete genome sequence. To get the individual gene sequences, request the complete GenBank file instead, then parse out the gene features. Here is an example:

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::EUtilities;


my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -email   => 'foo@bar.com',
                                       -db      => 'nucleotide',
                                       -id      => 'CR543861',
                                       -rettype => 'gb');
my $file = 'CR543861.gb';
$factory->get_Response(-file => $file);

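# The GenBank record holds one sequence; the genes are features annotated on it
my @gene_features = grep { $_->primary_tag eq 'gene' }
                    Bio::SeqIO->new(-file => $file, -format => 'genbank')
                              ->next_seq->get_SeqFeatures;

for my $feat_object (@gene_features) {
    # Pick one identifying tag per gene so each sequence is printed only once
    my ($tag) = grep { $feat_object->has_tag($_) } qw(locus_tag gene);
    my ($id)  = $tag ? $feat_object->get_tag_values($tag) : ('unnamed_gene');
    # open a filehandle here for writing each to a separate file
    say ">$id";
    say $feat_object->spliced_seq->seq;
    # close it!
}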

As written, this prints every gene to STDOUT, so redirecting the output puts them all in one file; the comments mark where a small change would write each gene to a separate file instead. Parsing GenBank can be a bit tricky at times, so it is always helpful to read the docs, in particular the excellent Feature Annotation HOWTO.
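
The "separate files" change mentioned above could look roughly like the sketch below. It is only an illustration: write_gene_fasta is a hypothetical helper, not part of BioPerl, and the records are placeholder (id, sequence) pairs so that the snippet runs with core Perl alone. In the script above, the id and sequence would instead come from get_tag_values and spliced_seq->seq for each gene feature. The 60-column line wrap and the file-name cleanup are assumptions, not requirements.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: write one FASTA record to "<outdir>/<id>.fasta".
# Sequence lines are wrapped at $width characters (60 is assumed here).
sub write_gene_fasta {
    my ($outdir, $id, $seq, $width) = @_;
    $width //= 60;

    # Replace characters that are awkward in file names (e.g. "|" or "/")
    (my $safe_id = $id) =~ s/[^A-Za-z0-9._-]/_/g;
    my $path = "$outdir/$safe_id.fasta";

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} ">$id\n";
    # Emit the sequence in fixed-width chunks
    print {$out} "$1\n" while $seq =~ /(.{1,$width})/g;
    close $out or die "Cannot close $path: $!";

    return $path;
}

# Placeholder records standing in for the parsed gene features
my @genes = (
    [ 'gene_1', 'ATGCTTTGGACA' ],
    [ 'gene_2', 'GTGCGACTAAAA' ],
);

# One file per gene, written to the current directory
write_gene_fasta('.', @$_) for @genes;
```

Inside the feature loop, the two say lines would be replaced by a single call such as write_gene_fasta('.', $id, $feat_object->spliced_seq->seq), assuming $id holds the tag value chosen for the header.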
