I want to store the changes I make to my "entity" table. This should work like a log. Currently it is implemented with this table in MySQL:
CREATE TABLE `entitychange` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`entity_id` int(10) unsigned NOT NULL,
`entitytype` enum('STRING_1','STRING_2','SOMEBOOL','SOMEDOUBLE','SOMETIMESTAMP') NOT NULL DEFAULT 'STRING_1',
`when` TIMESTAMP NOT NULL,
`value` TEXT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
entity_id = the primary key of my entity table.
entitytype = the field that was changed in the entity table. Sometimes only one field is changed, sometimes multiple. One change = one row.
value = the string representation of the "new value" of the field.
Example: when changing the field entity.somedouble from 3 to 2, I run these queries:
UPDATE entity SET somedouble = 2 WHERE entity_id = 123;
INSERT INTO entitychange (entity_id,entitytype,value) VALUES (123,'SOMEDOUBLE',2);
I need to select the changes of a specific entity and entitytype within the last 15 days. For example: the last changes with SOMEDOUBLE for entity_id 123 within the last 15 days.
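A minimal sketch of that query, assuming the schema above; the composite index name is illustrative:

```sql
-- A composite index on (entity_id, entitytype, `when`) lets MySQL
-- satisfy this query as a range scan instead of a full table scan.
ALTER TABLE `entitychange`
  ADD INDEX `idx_entity_type_when` (`entity_id`, `entitytype`, `when`);

SELECT `when`, `value`
FROM `entitychange`
WHERE `entity_id` = 123
  AND `entitytype` = 'SOMEDOUBLE'
  AND `when` >= NOW() - INTERVAL 15 DAY
ORDER BY `when` DESC;
```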
Now, there are two things that I dislike:
- All data is stored as TEXT, although most of it isn't really text (less than 1% is); in my case, most values are DOUBLE. Is this a big problem?
- The table is getting really, really slow on inserts, since it already has 200 million rows. Currently my server load is up to 10-15 because of this.
My question: how do I address those two "bottlenecks"? I need to scale.

My approaches would be:
- Store it like this: http://sqlfiddle.com/#!2/df9d0 (click on Browse): store the changes in the entitychange table and then store the value according to its datatype in entitychange_[bool|timestamp|double|string].
- Use partitioning by HASH(entity_id); I thought of ~50 partitions.
- Should I use another database system, maybe MongoDB?
Now I think I understand what you need: a versionable table with a history of the changed records. This could be another way of achieving the same thing, and you could easily run some quick tests to see if it gives you better performance than your current solution. It's the way the Symfony PHP framework does it in Doctrine with the Versionable plugin.
Keep in mind that the primary key is a unique index over two columns, version and fk_entity.
Also take a look at the values saved: you save a 0 value in the fields which didn't change and the changed value in those which did.
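A hedged sketch of what such a versionable table could look like, with column names taken from the entitytype enum in the question; the table name and the column types are illustrative assumptions, not the Doctrine plugin's exact schema:

```sql
-- One row per saved version of an entity; the two-column
-- primary key (version, fk_entity) identifies each snapshot.
CREATE TABLE `entity_version` (
  `version`       INT(11) UNSIGNED NOT NULL,
  `fk_entity`     INT(10) UNSIGNED NOT NULL,
  `string_1`      VARCHAR(255),
  `string_2`      VARCHAR(255),
  `somebool`      TINYINT(1),
  `somedouble`    DOUBLE,
  `sometimestamp` TIMESTAMP NULL,
  PRIMARY KEY (`version`, `fk_entity`)
) ENGINE=InnoDB;
```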
And probably another step to improve performance could be to save all history log records in separate tables, one per month or so. That way you won't have many records in each table, and searching by date will be really fast.
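A sketch of that per-month idea (the table names are illustrative); with monthly tables, a 15-day window touches at most two of them:

```sql
-- One table per month, created e.g. by a scheduled job;
-- CREATE TABLE ... LIKE copies the column and index definitions.
CREATE TABLE `entitychange_2013_04` LIKE `entitychange`;
CREATE TABLE `entitychange_2013_05` LIKE `entitychange`;
-- The application inserts into the table for the current month.
```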
This is called a temporal database, and researchers have been struggling with the best way to store and query temporal data for over 20 years.
Trying to store the EAV data as you are doing is inefficient, in that storing numeric data in a TEXT column uses a lot of space, and your table is getting longer and longer, as you have discovered.
Another option, sometimes called Sixth Normal Form (although there are multiple unrelated definitions for 6NF), is to store an extra table of revisions for each column you want to track temporally. This is similar to the solution proposed in @xtrm's answer, but it doesn't need to store redundant copies of columns that haven't changed. It does, however, lead to an explosion in the number of tables.
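For example, a possible revision table for the somedouble column alone might look like this (the names are illustrative):

```sql
-- Tracks only changes to entity.somedouble, stored in its
-- native type rather than as TEXT.
CREATE TABLE `entity_somedouble_history` (
  `entity_id`  INT(10) UNSIGNED NOT NULL,
  `changed_at` TIMESTAMP NOT NULL,
  `somedouble` DOUBLE NOT NULL,
  PRIMARY KEY (`entity_id`, `changed_at`)
) ENGINE=InnoDB;
```

Each tracked column gets its own such table, which is where the table explosion comes from.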
I've started to read about Anchor Modeling, which promises to handle temporal changes of both structure and content. But I don't understand it well enough to explain it yet. I'll just link to it and maybe it'll make sense to you.
Here are a couple of books that contain discussions of temporal databases: