skbio.tree.TreeNode.from_taxdump

classmethod TreeNode.from_taxdump(nodes, names=None)

Construct a tree from the NCBI taxonomy database.

State: Experimental as of 0.5.8.

Parameters
  • nodes (pd.DataFrame) – Taxon hierarchy

  • names (pd.DataFrame or dict, optional) – Taxon names

Returns

The constructed tree

Return type

TreeNode

Notes

nodes and names correspond to “nodes.dmp” and “names.dmp” of the NCBI taxonomy database. They should be read into data frames using skbio.io.read prior to this operation. Alternatively, names may be provided as a dictionary. If names is omitted, taxonomy IDs will be used as taxon names.
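
As an illustration of the dictionary form of names, the following is a minimal sketch that builds such a dictionary from text in the “names.dmp” layout without scikit-bio. It assumes the NCBI dump conventions (fields separated by a tab, a vertical bar, and a tab; rows ending in a tab and a vertical bar). The helper names_dict_from_dmp and the sample rows are hypothetical and not part of scikit-bio, and keeping only the "scientific name" class is an assumption made so that each taxonomy ID maps to a single name.

```python
import io


def names_dict_from_dmp(handle, name_class="scientific name"):
    """Hypothetical helper: map tax_id -> name for a single name class."""
    names = {}
    for line in handle:
        line = line.rstrip("\n")
        # Drop the trailing "\t|" row terminator before splitting.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        tax_id, name_txt, _unique_name, cls = fields[:4]
        # Keep one name per ID by filtering on the name class (assumption).
        if cls == name_class:
            names[int(tax_id)] = name_txt
    return names


# Illustrative rows in the names.dmp layout, not real data.
sample = io.StringIO(
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n"
    "2\t|\teubacteria\t|\t\t|\tgenbank common name\t|\n"
)
names = names_dict_from_dmp(sample)
```

The resulting dictionary has the same shape as the names argument used in the Examples section, with integer taxonomy IDs as keys.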

Raises
  • ValueError – If there is no top-level node

  • ValueError – If there is more than one top-level node
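
The two error conditions above can be sketched in plain Python. This is not the library's implementation. It assumes that a top-level node is one whose parent is itself, as with tax_id 1 in the NCBI taxonomy; the function find_root and the parents mapping are hypothetical names used only for illustration.

```python
def find_root(parents):
    """Return the single top-level tax_id, or raise ValueError.

    `parents` maps tax_id -> parent_tax_id. A node counts as top-level
    when it is its own parent (an assumption modelled on NCBI tax_id 1).
    """
    tops = [tax_id for tax_id, parent in parents.items() if tax_id == parent]
    if not tops:
        raise ValueError("There is no top-level node.")
    if len(tops) > 1:
        raise ValueError("There is more than one top-level node.")
    return tops[0]


# Same hierarchy as the Examples section: node 1 is its own parent.
parents = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
root = find_root(parents)
```

A hierarchy in which no node is its own parent, or in which two nodes are, would raise ValueError under this sketch.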

Examples

>>> import pandas as pd
>>> from skbio.tree import TreeNode
>>> nodes = pd.DataFrame([
...             [1, 1, 'no rank'],
...             [2, 1, 'domain'],
...             [3, 1, 'domain'],
...             [4, 2, 'phylum'],
...             [5, 2, 'phylum']], columns=[
...     'tax_id', 'parent_tax_id', 'rank']).set_index('tax_id')
>>> names = {1: 'root', 2: 'Bacteria', 3: 'Archaea',
...          4: 'Firmicutes', 5: 'Bacteroidetes'}
>>> tree = TreeNode.from_taxdump(nodes, names)
>>> print(tree.ascii_art())
                    /-Firmicutes
          /Bacteria|
-root----|          \-Bacteroidetes
         |
          \-Archaea