Storing changes on entities: Is MySQL the proper s

2019-06-15 14:15发布

i want to store changes that i do on my "entity" table. This should be like a log. Currently it is implemented with this table in MySQL:

CREATE TABLE `entitychange` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `entity_id` int(10) unsigned NOT NULL,
  `entitytype` enum('STRING_1','STRING_2','SOMEBOOL','SOMEDOUBLE','SOMETIMESTAMP') NOT NULL DEFAULT 'STRING_1',
  `when` TIMESTAMP NOT NULL,
  `value` TEXT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
  • entity_id = the primary key of my entity table.
  • entitytype = the field that was changed in the entity table. sometimes only one field is changed, sometimes multiple. one change = one row.
  • value = the string representation of the "new value" of the field.

Example when changing Field entity.somedouble from 3 to 2, i run those queries:

UPDATE entity SET somedouble = 2 WHERE entity_id = 123;
INSERT INTO entitychange (entity_id,entitytype,value) VALUES (123,'SOMEDOUBLE',2);

I need to select the changes of a specific entity and entitytype of the last 15 days. For example: The last changes with SOMEDOUBLE for entity_id 123 within the last 15 days.

Now, there are two things that i dislike:

  1. All Data is stored as TEXT - although most (less than 1%) isn't really text, in my case, most values are DOUBLE. Is this a big problem?
  2. The Table is getting really, really slow when inserting, since the table already has 200 million rows. Currently my Server load is up to 10-15 because of this.

My Question: How do i address those two "bottlenecks"? I need to scale.

My approaches would be:

  1. Store it like this: http://sqlfiddle.com/#!2/df9d0 (click on browse) - Store the changes in the entitychange table and then store the value according to its datatype in entitychange_[bool|timestamp|double|string]
  2. Use partitioning by HASH(entity_id) - i thought of ~50 partitions.
  3. Should I use another database system, maybe MongoDB?

8条回答
爱情/是我丢掉的垃圾
2楼-- · 2019-06-15 15:18

Now I think I understand what you need, a versionable table with history of the records changed. This could be another way of achieving the same and you could easily make some quick tests in order to see if it gives you better performance than your current solution. Its the way Symfony PHP Framework does it in Doctrine with the Versionable plugin.
Have in mind that there is a primary key unique index of two keys, version and fk_entity.
Also take a look at the values saved. You will save a 0 value in the fields which didnt change and the changed value in those who changed.

CREATE TABLE `entity_versionable` (
  `version` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `fk_entity` INT(10) UNSIGNED NOT NULL,
  `str1` VARCHAR(255),
  `str2` VARCHAR(255),
  `bool1` BOOLEAN,
  `double1` DOUBLE,
  `date` TIMESTAMP NOT NULL,
  PRIMARY KEY (`version`,`fk_entity`)
) ENGINE=INNODB DEFAULT CHARSET=latin1;


INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "a1", "0", "0", "0", "2013-06-02 17:13:16");
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "a2", "0", "0", "0", "2013-06-11 17:13:12");
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "0", "b1", "0", "0", "2013-06-11 17:13:21");
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "0", "b2", "0", "0", "2013-06-11 17:13:42");
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "0", "0", "1", "0", "2013-06-16 17:19:31");

/*Another example*/
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
VALUES ("1", "a1", "b1", "0", "0", CURRENT_TIMESTAMP);


SELECT * FROM `entity_versionable` t WHERE 
(
    (t.`fk_entity`="1") AND 
    (t.`date` >= (CURDATE() - INTERVAL 15 DAY))
);


And probably another step to improve performance, it could be to save all history log records in separate tables, once per month or so. That way you wont have many records in each table, and searching by date will be really fast.

查看更多
叼着烟拽天下
3楼-- · 2019-06-15 15:19

This is called a temporal database, and researchers have been struggling with the best way to store and query temporal data for over 20 years.

Trying to store the EAV data as you are doing is inefficient, in that storing numeric data in a TEXT column uses a lot of space, and your table is getting longer and longer, as you have discovered.

Another option which is sometimes called Sixth Normal Form (although there are multiple unrelated definitions for 6NF), is to store an extra table to store revisions for each column you want to be tracked temporally. This is similar to the solution posed by @xtrm's answer, but it doesn't need to store redundant copies of columns that haven't changed. But it does lead to an explosion in the number of tables.

I've started to read about Anchor Modeling, which promises to handle temporal changes of both structure and content. But I don't understand it well enough to explain it yet. I'll just link to it and maybe it'll make sense to you.

Here are a couple of books that contain discussions of temporal databases:

查看更多
登录 后发表回答