How to get taxonomic specific ids for kingdom, phy

I have a list of taxids that looks like this:

I would like to produce a file that lists, for each of the taxids above, the taxonomic IDs at the following ranks, in this order:

kingdom_id      phylum_id       class_id        order_id        family_id       genus_id        species_id

I am using the "ete3" package. Its ete-ncbiquery tool reports the lineage for the IDs above. (I run it from my Linux laptop with the command below.)

ete3 ncbiquery --search 1204725 2162 13000163 420247 --info

The result looks like this:

# Taxid Sci.Name    Rank    Named Lineage   Taxid Lineage
2162    Methanobacterium formicicum species root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum   1,131567,2157,28890,183925,2158,2159,2160,2162
1204725 Methanobacterium formicicum DSM 3637    no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum,Methanobacterium formicicum DSM 3637  1,131567,2157,28890,183925,2158,2159,2160,2162,1204725
420247  Methanobrevibacter smithii ATCC 35061   no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter,Methanobrevibacter smithii,Methanobrevibacter smithii ATCC 35061  1,131567,2157,28890,183925,2158,2159,2172,2173,420247

I have no idea which of these IDs, if any, correspond to the ranks I am looking for.

Tags: bioinformatics taxonomy phylogeny ncbi etetoolkit

3 Answers

甜甜的少女心

Reply #2 · 2019-04-07 20:26

You can also use the R package taxonomizr. The package takes a bit of time to download the necessary files, but after that it's quite fast and easy.

library("taxonomizr")
getNamesAndNodes()
taxaNodes <- read.nodes('nodes.dmp')
taxaNames <- read.names('names.dmp')
taxaID <- c("1204725", "2162", "1300163", "420247")

getNamesAndNodes downloads the names.dmp and nodes.dmp files from NCBI.


做自己的国王

Reply #3 · 2019-04-07 20:27

The following code:

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

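def get_desired_ranks(taxid, desired_ranks):
    # Look up the full lineage, map each lineage taxid to its rank,
    # then invert the mapping to rank -> taxid.
    # Several lineage entries can share the rank 'no rank'; they collapse
    # into one key here, which is harmless because that rank is never requested.
    lineage = ncbi.get_lineage(taxid)
    lineage2ranks = ncbi.get_rank(lineage)
    ranks2lineage = {rank: tid for tid, rank in lineage2ranks.items()}
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

def main(taxids, desired_ranks, path):
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, 'w', newline='') as csvfile:
        fieldnames = ['{}_id'.format(rank) for rank in desired_ranks]
        # Tab-delimited output with one row per input taxid.
        writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=fieldnames)
        writer.writeheader()
        for taxid in taxids:
            writer.writerow(get_desired_ranks(taxid, desired_ranks))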

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    path = 'taxids.csv'
    main(taxids, desired_ranks, path)

Produces a file that looks like this:

kingdom_id  phylum_id   class_id    order_id    family_id   genus_id    species_id
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2172    2173
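The kingdom_id column is empty because none of the taxids in these lineages carries the rank "kingdom". A minimal sketch of one way to handle that is to try alternative rank names when the requested one is missing. The helper pick_ranks and its fallbacks argument are hypothetical and not part of ete3. The example dictionary is hard-coded in the shape that ncbi.get_rank(lineage) returns: the phylum through species IDs are taken from the output above, while the ranks assigned to 1, 131567 and 2157 (in particular "superkingdom" for Archaea) are assumptions that may differ between NCBI taxonomy releases.

```python
# Hypothetical helper (not part of ete3): choose one taxid per desired rank,
# trying alternative rank names in order when the requested rank is absent.
def pick_ranks(lineage2ranks, desired_ranks, fallbacks=None):
    fallbacks = fallbacks or {}
    # Invert taxid -> rank into rank -> taxid.
    ranks2lineage = {rank: tid for tid, rank in lineage2ranks.items()}
    row = {}
    for rank in desired_ranks:
        candidates = [rank] + list(fallbacks.get(rank, []))
        row['{}_id'.format(rank)] = next(
            (ranks2lineage[c] for c in candidates if c in ranks2lineage),
            '<not present>',
        )
    return row

# Illustrative stand-in for ncbi.get_rank(ncbi.get_lineage(2162)).
# ASSUMPTION: the ranks of 1, 131567 and 2157 are not shown in the thread.
example_ranks = {
    1: 'no rank', 131567: 'no rank', 2157: 'superkingdom',
    28890: 'phylum', 183925: 'class', 2158: 'order',
    2159: 'family', 2160: 'genus', 2162: 'species',
}
desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

strict_row = pick_ranks(example_ranks, desired_ranks)
lenient_row = pick_ranks(example_ranks, desired_ranks,
                         fallbacks={'kingdom': ['superkingdom', 'domain']})
print(strict_row)
print(lenient_row)
```

In the script above, the same idea would amount to passing the result of ncbi.get_rank(lineage) to such a helper instead of the hard-coded dictionary.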


贪生不怕死

Reply #4 · 2019-04-07 20:48

With the Taxid Lineage numbers in your results, try using them in ete3's get_rank method. As an example:

from ete3 import NCBITaxa
ncbi = NCBITaxa()

print(ncbi.get_rank([9606, 9443]))  # Python 3 print function
# e.g. {9443: 'order', 9606: 'species'}

Presumably the resulting dictionary should contain the rank information of all IDs, including any intermediate "no rank" IDs that you may want to eliminate.
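As a rough sketch of that elimination step, the two lineage columns from the question's output can be paired position by position and then filtered by rank. The lineage strings below are copied from the row for taxid 1204725. The ranks dictionary is a hard-coded stand-in for what ncbi.get_rank would return: the phylum through species entries match the other answer's output and 1204725 is "no rank" per the question, but the ranks given for 1, 131567 and 2157 are assumptions.

```python
# Named Lineage and Taxid Lineage columns for taxid 1204725, copied from the question.
named_lineage = ("root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,"
                 "Methanobacteriales,Methanobacteriaceae,Methanobacterium,"
                 "Methanobacterium formicicum,Methanobacterium formicicum DSM 3637")
taxid_lineage = "1,131567,2157,28890,183925,2158,2159,2160,2162,1204725"

# The two columns line up position by position.
taxids = [int(t) for t in taxid_lineage.split(',')]
name_by_taxid = dict(zip(taxids, named_lineage.split(',')))

# Stand-in for ncbi.get_rank(taxids).
# ASSUMPTION: ranks of 1, 131567 and 2157 are not shown in the thread.
ranks = {
    1: 'no rank', 131567: 'no rank', 2157: 'superkingdom',
    28890: 'phylum', 183925: 'class', 2158: 'order',
    2159: 'family', 2160: 'genus', 2162: 'species', 1204725: 'no rank',
}

# Keep only the lineage entries that carry a named rank, in lineage order.
ranked = [(ranks[t], t, name_by_taxid[t])
          for t in taxids
          if ranks.get(t, 'no rank') != 'no rank']

for rank, taxid, name in ranked:
    print(rank, taxid, name, sep='\t')
```

Splitting on commas works for this row; a lineage whose scientific names themselves contain commas would need more careful parsing.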
