I am able to download a FASTA file manually that looks like:
>lcl|CR543861.1_gene_1...
ATGCTTTGGACA...
>lcl|CR543861.1_gene_2...
GTGCGACTAAAA...
by clicking "Send to" and selecting "Gene Features", FASTA Nucleotide is the only option (which is fine because that's all I want) on this page.
With a script like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::EUtilities;
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
-db => 'nucleotide',
-id => 'CR543861',
-rettype => 'fasta');
my $file = 'CR543861.fasta';
$factory->get_Response(-file => $file);
I get a file that looks like:
>gi|49529273|emb|CR543861.1| Acinetobacter sp. ADP1 complete genome
GATATTTTATCCACA...
with the whole genomic sequence lumped together. How do I get information like in the first (manually downloaded) file?
I looked at a couple of other posts:
- how to download complete genome sequence in biopython entrez.esearch (this answer seemed relevant)
- How can I download the entire GenBank file with just an accession number?
As well as this section from EUtilities Cookbook.
I tried fetching and saving a GenBank file (since it seems to have separate sequences for each gene in the .gb file I get), but when I go work with it using Bio::SeqIO, I will get only 1 large sequence.