Is it possible to find out which columns are (not)

2019-02-26 07:11发布

问题:

We have a very large, very old table with a few hundred columns. Some of the columns are historical and aren't used in any writing client. They are mostly empty (expect in very old records). I want to clean up the DB and get rid of old, unused columns in certain tables.

The problem are all the third party clients that access this DB (reading only). I can't expect all providers to update their clients. As long as they are querying for SELECT * ..., it doesn't matter. But I expect them to query explicitly (SELECT colA, colB, ...). Removing colA from the table would result in errors on the client side, obviously.

Now I would like to know which columns are explicitly used by any query statements, so I can remove the unused ones. I guess I could use the query log, analyze it and find explicitly used columns, but:

  1. We receive a few millions queries per hour.
  2. Some clients access our DB maybe once a week if at all, some every single second.

That means the query log would have to run for months in a production environment and I don't know if that could/would have any negative impact on the servers or overall performance.

Is there any other, more solid solution? Are my concerns regarding the query log exaggerated? I was hoping that MariaDB/MySQL are storing statistical data somewhere, showing the usage of columns, but I couldn't find anything I need.

回答1:

There no log of what columns are mentioned in queries.

The "general log" copies every query to a file. This can be a serious disk hog (space and speed), especially with "millions of queries per hour". But it would have an attempt at the answer...

The general log can, I think, be summarized via pt-query-digest.

Another possibility is to use tcpdump together with pt-query-digest to grab all the queries.

The advantage of the digest is that it consolidates multiple 'similar' queries into a single entry. You would still have to manually (or problematically) sift through the output.

Most columns are not worth removing, even if they are never used. I would suggest focusing on the 10% that are the bulkiest. There might be a way to use tcpdump | egrep to look for just those column names. Refine that a few times, and you might discover some prime candidate(s) for removal.

Unfortunately, if the clients do SELECT *, then "all" columns are being used.



回答2:

KEY_COLUMN_USAGE table describes which key columns have constraints , you will get even the table name in the result . If you will get a empty result it means you can delete the field from table without worry about FK .

SELECT *
    FROM
      KEY_COLUMN_USAGE
    WHERE
       REFERENCED_COLUMN_NAME = 'your_column_name';


标签: mysql mariadb