When you have a TEXT field in MySQL or PostgreSQL,

2020-04-06 10:04发布

问题:

I've heard that if you have a table with a TEXT column that will hold a large chunk of text data, it's better for performance to move that column into a separate table and get it via JOINs to the base record.

Is this true, and if so why?

回答1:

Not with PostgreSQL, from the manual:

Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values.

So a large character column (such as TEXT or VARCHAR without a specified size limit) is stored away from the main table data. So, PostgreSQL has your "put it in a separate table" optimization built in. If you're using PostgreSQL, arrange your table sensibly and leave the data layout to PostgreSQL.

I don't know how MySQL or other RDBMs arrange their data.

The reason behind this optimization is that the database will usually keep the data for each row in contiguous blocks on disk to cut down on seeking when the row needs to be read or updated. If you have a TEXT (or other variable length type) column in a row then the size of the row is variable so more work is needed to go from row to row. An analogy would be the difference between accessing something in a linked list versus accessing an array; with a linked list, you have to read three elements one at a time to get to the fourth one, with an array you just offset 3 * element_size bytes from the beginning and you're there in one step.



回答2:

From the MySQL Manual:

For a table with several columns, to reduce memory requirements for queries that do not use the BLOB column, consider splitting the BLOB column into a separate table and referencing it with a join query when needed.



回答3:

In some scenarios, this might be true. The reason is that let's say your table is:

create table foo (
    id serial primary key,
    title varchar(200) not null,
    pub_date datetime not null,
    text_content text
);

Then you do a query like this:

select id, title, pub_date
  from foo;

You will have to load much more pages from disk that you would have if you didn't have the text_content field in that table. And query optimization is most about reducing disk I/O to the minimum possible.