I am creating a botanical database where the plants will be organized by their taxonomy:
Life
Domain
Kingdom
Phylum
Class
Order
Family
Genus
Species
I was considering using the example put forth by the article Managing Hierarchical Data in MySQL, however it is adding the above list as records inside the table....and I'm not sure if that is the best thing to do since I will be having multiple species per genus and multiple genus per family and so on. What would you suggest is the best way to approach this problem. Thanks in advance.
I worked with similar data, and I made it in 2 parts. In PostgreSQL syntax.
First is taxonomy structure (Family, Genus, Species, ...):
CREATE TABLE taxonomic_units (
id serial PRIMARY KEY,
name varchar(20) NOT NULL,
parent_id integer REFERENCES taxonomic_units(id)
);
1 | Life | NULL
2 | Domain | 1
...
7 | Family | 6
8 | Genus | 7
9 | Species | 8
Second is description and storing of botanical data:
CREATE TABLE taxons (
id serial PRIMARY KEY,
suptaxon_id integer REFERENCES taxons(id),
taxonomic_unit_id integer NOT NULL REFERENCES taxonomic_units(id),
name varchar(50) NOT NULL,
authority varchar(50)
);
100 | NULL | 8 | Ocimum | L.
101 | 100 | 9 | basilicum | L.
102 | 100 | 9 | gratissim | L.
I'm not sure I really buy into that article. Graph structures would be needed when the categories itself are mutable. Such as, all the sudden taxonomists decided to add three new levels between genus and species, and so on.
From the article:
... the management of hierarchical data is not what a relational database is intended for.
Actually, its exactly what it is intended for:
http://en.wikipedia.org/wiki/Hierarchical_database_model
The hierarchical data model lost traction as Codd's relational model became the de facto standard used by virtually all mainstream database management systems.
I would first write a view that joined all of your tables so that you would have these as your columns:
Life Domain Kingdom Phylum Class Order Family Genus Species
Now you can query that view any way you like and not have to worry about any joins. Easy :)
You can download complete taxonomy data from http://itis.gov and the data is updated more or less monthly. The data they provide includes a Materialized Path -- every species in the database has a string of all the levels above it, like a breadcrumbs string or a filesystem path.
I used this data to design a demo in my presentation Models for Hierarchical Data. I converted the materialized path data into Closure Table.
It sounds more like a graph. I'd wonder if NEO4J would be a better choice.
There are several ways of representing hierarchical data in a relational database, albeit a NoSQL solution might be easier to work with as @duffymo mentioned. So assuming an RDBMS, see my question on the topic for an enumeration of a half dozen possibilities. For your situation, I would lead with a materialized path to make seeing the family tree easy. If the hierarchy changes regularly I would probably also model as an adjacency list and update the materialized path using a trigger.