Creating Taxonomy Table in MySQL

Posted 2019-03-16 06:22

Question:

I am creating a botanical database where the plants will be organized by their taxonomy:

Life Domain Kingdom Phylum Class Order Family Genus Species

I was considering using the example put forth by the article Managing Hierarchical Data in MySQL. However, it adds the above list as records inside the table, and I'm not sure that is the best approach, since I will have multiple species per genus, multiple genera per family, and so on. What would you suggest is the best way to approach this problem? Thanks in advance.

Answer 1:

I worked with similar data and split it into two parts, shown here in PostgreSQL syntax.

First, the taxonomic structure itself (the ranks: Family, Genus, Species, ...):

CREATE TABLE taxonomic_units (
  id         serial        PRIMARY KEY,
  name       varchar(20)   NOT NULL,
  parent_id  integer       REFERENCES taxonomic_units(id)
);

id | name    | parent_id
1  | Life    | NULL
2  | Domain  | 1
...
7  | Family  | 6
8  | Genus   | 7
9  | Species | 8

Second, the taxa themselves, which describe and store the botanical data:

CREATE TABLE taxons (
  id                 serial        PRIMARY KEY,
  suptaxon_id        integer       REFERENCES taxons(id),
  taxonomic_unit_id  integer       NOT NULL REFERENCES taxonomic_units(id),
  name               varchar(50)   NOT NULL,
  authority          varchar(50)
);

id  | suptaxon_id | taxonomic_unit_id | name        | authority
100 | NULL        | 8                 | Ocimum      | L.
101 | 100         | 9                 | basilicum   | L.
102 | 100         | 9                 | gratissimum | L.
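A minimal sketch of how these two tables might be queried, assuming SQLite (through Python's built-in sqlite3) as a stand-in for PostgreSQL so the example is self-contained. The recursive CTE walks the suptaxon_id links from a species up to its root; PostgreSQL and MySQL 8.0+ also support WITH RECURSIVE. The Lamiaceae row (id 99) and the lineage helper are added for illustration and are not part of the answer.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL schema above (serial -> INTEGER PRIMARY KEY).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomic_units (
  id        INTEGER PRIMARY KEY,
  name      TEXT NOT NULL,
  parent_id INTEGER REFERENCES taxonomic_units(id)
);
CREATE TABLE taxons (
  id                INTEGER PRIMARY KEY,
  suptaxon_id       INTEGER REFERENCES taxons(id),
  taxonomic_unit_id INTEGER NOT NULL REFERENCES taxonomic_units(id),
  name              TEXT NOT NULL,
  authority         TEXT
);
-- Only the lowest three ranks; Family's parent (Order) is omitted here.
INSERT INTO taxonomic_units VALUES (7, 'Family', NULL), (8, 'Genus', 7), (9, 'Species', 8);
-- Lamiaceae (id 99) is added for illustration so the walk spans three ranks.
INSERT INTO taxons VALUES
  (99,  NULL, 7, 'Lamiaceae',   NULL),
  (100, 99,   8, 'Ocimum',      'L.'),
  (101, 100,  9, 'basilicum',   'L.'),
  (102, 100,  9, 'gratissimum', 'L.');
""")

def lineage(conn, taxon_id):
    """Return (rank, name) pairs from the root down to the given taxon (hypothetical helper)."""
    return conn.execute("""
        WITH RECURSIVE up(id, suptaxon_id, unit_id, name, depth) AS (
            SELECT id, suptaxon_id, taxonomic_unit_id, name, 0
            FROM taxons WHERE id = ?
            UNION ALL
            SELECT t.id, t.suptaxon_id, t.taxonomic_unit_id, t.name, up.depth + 1
            FROM taxons AS t JOIN up ON t.id = up.suptaxon_id
        )
        SELECT u.name, up.name
        FROM up JOIN taxonomic_units AS u ON u.id = up.unit_id
        ORDER BY up.depth DESC
    """, (taxon_id,)).fetchall()

# Children of a node need no recursion: a plain lookup on suptaxon_id.
species_of_ocimum = [row[0] for row in conn.execute(
    "SELECT name FROM taxons WHERE suptaxon_id = 100 ORDER BY name")]

print(lineage(conn, 101))
print(species_of_ocimum)
```

With this layout, adding another species to a genus is a single INSERT with the genus as its suptaxon_id, which covers the "many species per genus" concern from the question.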


Answer 2:

I'm not sure I really buy into that article. Graph structures would be needed when the categories themselves are mutable: for example, if taxonomists all of a sudden decided to add three new levels between genus and species.

From the article:

... the management of hierarchical data is not what a relational database is intended for.

Actually, it's exactly what it is intended for:

http://en.wikipedia.org/wiki/Hierarchical_database_model

The hierarchical data model lost traction as Codd's relational model became the de facto standard used by virtually all mainstream database management systems.

I would first write a view that joins all of your per-rank tables, so that you have these as your columns:

Life Domain Kingdom Phylum Class Order Family Genus Species

Now you can query that view any way you like and not have to worry about any joins. Easy :)
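This answer assumes one table per rank. A minimal sketch of such a view, using SQLite through Python's sqlite3 for a self-contained demo; the table names (family, genus, species), the view name taxonomy, and the sample rows are hypothetical, and only the last three ranks are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical one-table-per-rank schema; ranks above Family would follow the same pattern.
CREATE TABLE family  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genus   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      family_id INTEGER NOT NULL REFERENCES family(id));
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      genus_id INTEGER NOT NULL REFERENCES genus(id));

-- The view flattens the chain of joins into one row per species.
CREATE VIEW taxonomy AS
SELECT f.name AS family, g.name AS genus, s.name AS species
FROM species AS s
JOIN genus  AS g ON g.id = s.genus_id
JOIN family AS f ON f.id = g.family_id;

INSERT INTO family  VALUES (1, 'Lamiaceae');
INSERT INTO genus   VALUES (1, 'Ocimum', 1);
INSERT INTO species VALUES (1, 'basilicum', 1), (2, 'gratissimum', 1);
""")

# Callers query the view like a flat table, with no joins of their own.
rows = conn.execute(
    "SELECT genus, species FROM taxonomy WHERE family = ? ORDER BY species",
    ("Lamiaceae",),
).fetchall()
print(rows)
```

Extending this to all nine ranks adds one join per rank. Note that ORDER is a reserved word in SQL, so a table or column for that rank needs quoting or a different name (for example order_name).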



Answer 3:

You can download complete taxonomy data from http://itis.gov; the data is updated roughly monthly. It includes a materialized path: every species in the database has a string of all the levels above it, like a breadcrumb trail or a filesystem path.

I used this data to design a demo in my presentation Models for Hierarchical Data, where I converted the materialized path data into a closure table.



Answer 4:

It sounds more like a graph. I wonder whether Neo4j would be a better choice.



Answer 5:

There are several ways of representing hierarchical data in a relational database, although, as @duffymo mentioned, a NoSQL solution might be easier to work with. Assuming an RDBMS, see my question on the topic for an enumeration of half a dozen possibilities. For your situation, I would lead with a materialized path, which makes it easy to see the family tree. If the hierarchy changes regularly, I would probably also model it as an adjacency list and update the materialized path using a trigger.
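A minimal sketch of that combination, assuming SQLite (via Python's sqlite3) rather than MySQL. The table taxon, its columns, the trigger names, and the 'ExampleFamily' row are hypothetical. parent_id (the adjacency list) is the source of truth; the triggers keep the path column in sync on insert and when a node is re-parented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
  id        INTEGER PRIMARY KEY,
  parent_id INTEGER REFERENCES taxon(id),  -- adjacency list: source of truth
  name      TEXT NOT NULL,
  path      TEXT                           -- materialized path, e.g. '1/2/3'
);

-- New row: path = parent's path + '/' + own id (just the id for a root).
CREATE TRIGGER taxon_path_insert AFTER INSERT ON taxon
BEGIN
  UPDATE taxon
  SET path = COALESCE((SELECT path FROM taxon WHERE id = NEW.parent_id) || '/', '') || NEW.id
  WHERE id = NEW.id;
END;

-- Re-parented row: rewrite the path prefix of the row and its whole subtree.
-- (Does not guard against moving a node under its own descendant.)
CREATE TRIGGER taxon_path_move AFTER UPDATE OF parent_id ON taxon
BEGIN
  UPDATE taxon
  SET path = COALESCE((SELECT path FROM taxon WHERE id = NEW.parent_id) || '/', '')
             || NEW.id || substr(path, length(OLD.path) + 1)
  WHERE path = OLD.path OR path LIKE OLD.path || '/%';
END;

INSERT INTO taxon (id, parent_id, name) VALUES (1, NULL, 'Lamiaceae');
INSERT INTO taxon (id, parent_id, name) VALUES (2, 1, 'Ocimum');
INSERT INTO taxon (id, parent_id, name) VALUES (3, 2, 'basilicum');
-- Hypothetical second family, used only to demonstrate a reclassification.
INSERT INTO taxon (id, parent_id, name) VALUES (4, NULL, 'ExampleFamily');
""")
before = conn.execute("SELECT id, path FROM taxon ORDER BY id").fetchall()

# Move the genus: only parent_id is written; the trigger fixes the subtree's paths.
conn.execute("UPDATE taxon SET parent_id = 4 WHERE id = 2")
after = conn.execute("SELECT id, path FROM taxon ORDER BY id").fetchall()

print(before)
print(after)
```

Trigger syntax and restrictions differ by database; MySQL, for instance, does not let a trigger update the table it is defined on, so there the subtree rewrite would typically live in a stored procedure or in application code.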