I'm working on a Java EE web project backed by a MySQL database. We needed a view to summarize data from 3 tables with over 3M rows in total. Each table was created with indexes, but I haven't found a way to take advantage of those indexes when I run a conditional SELECT against the view, which we created with GROUP BY.
People have suggested to me that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL as you can in Oracle. But in some tests I ran, indexes were used in SELECT statements against views. Maybe I've created these views the wrong way.
I'll use an example to describe my problem.
We have a table that records high scores in NBA games, with an index on the column happened_in:
CREATE TABLE `highscores` (
`tbl_id` int(11) NOT NULL auto_increment,
`happened_in` int(4) default NULL,
`player` int(3) default NULL,
`score` int(3) default NULL,
PRIMARY KEY (`tbl_id`),
KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Insert data (8 rows):
INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);
Then I create a view to see Kobe Bryant's highest score in each year:
CREATE OR REPLACE VIEW v_kobe_highscores
AS
SELECT player, max(score) AS highest_score, happened_in
FROM highscores
WHERE player = 24
GROUP BY happened_in;
I wrote a conditional statement to see Kobe's highest score in 2006:
select * from v_kobe_highscores where happened_in = 2006;
When I run EXPLAIN on it in Toad for MySQL, I find that MySQL scans all rows to build the view and only then applies the condition, without using the index on happened_in.
explain select * from v_kobe_highscores where happened_in = 2006;
The view that we use in our project is built on tables with millions of rows. Scanning all the rows of those tables every time we retrieve data from the view is unacceptable. Please help! Thanks!
@zerkms Here are the results I got when testing on real-life data. I don't see much difference between the two. I think @spencer7593 has the right point: the MySQL optimizer doesn't "push" that predicate down into the view query.
How do you get MySQL to use an index for a view query? The short answer: provide an index that MySQL can use.
In this case, the optimum index is likely a "covering" index:
... ON highscores (player, happened_in, score)
It's likely that MySQL will use that index, and the EXPLAIN will show "Using index", due to the WHERE player = 24 (an equality predicate on the leading column in the index). The GROUP BY happened_in (the second column in the index) may allow MySQL to use the index to avoid a sort operation. Including the score column in the index allows the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.
Why the view causes a performance issue
One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.
Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider that predicate when it runs the inner "view query". The view query gets executed separately, before the outer query. The resultset from that query gets "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name makes sense once you understand the operations MySQL performs.)
The bottom line is that the index you have defined on happened_in is not used by MySQL when it runs the query that forms the view definition.
After the intermediate "derived table" is created, THEN the outer query is executed, using that derived table as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.
Note that all of the rows from the view query are stored, which (in your case) means a row for EVERY value of happened_in, not just the one you specify in the equality predicate of the outer query.
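To make that two-phase processing concrete, here is a toy Python model (not MySQL code; the function names are hypothetical) of the behavior described above, using the eight sample rows from the question: the view query runs in full and its grouped result is materialized first, and only then does the outer happened_in predicate filter it.

```python
# Toy model (illustration only, no MySQL involved) of evaluating a query
# against a GROUP BY view as described above: materialize, then filter.

# Sample rows from the question: (tbl_id, happened_in, player, score)
HIGHSCORES = [
    (1, 2006, 24, 61), (2, 2006, 24, 44), (3, 2006, 24, 81),
    (4, 1998, 23, 51), (5, 1997, 23, 46), (6, 2006, 3, 55),
    (7, 2007, 24, 34), (8, 2008, 24, 37),
]


def materialize_view(rows):
    """Run the view query: WHERE player = 24 GROUP BY happened_in."""
    best = {}
    for _id, year, player, score in rows:
        if player == 24:  # the only predicate the view query knows about
            best[year] = max(score, best.get(year, score))
    # The "derived table": one (player, highest_score, happened_in) row per year
    return [(24, s, y) for y, s in sorted(best.items())]


def query_view(rows, year):
    derived = materialize_view(rows)  # phase 1: every year is stored
    # phase 2: the outer WHERE happened_in = ? runs against the derived table
    return derived, [r for r in derived if r[2] == year]


derived, result = query_view(HIGHSCORES, 2006)
print(len(derived), result)  # 3 [(24, 81, 2006)]
```

Even though the outer query asks only for 2006, the derived table holds a row for 2006, 2007 and 2008; with millions of base rows, that intermediate table grows accordingly.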
The way MySQL processes view queries may be "unexpected" to some, and it is one reason that using views in MySQL can lead to performance problems, compared with the way view queries are processed by other relational databases.
Improving performance of the view query with a suitable covering index
Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.
... ON highscores (player, happened_in, score).
That's likely to be the most beneficial index (performance-wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL will be able to use this index to optimize the GROUP BY. We also include the score column, because it is the only other column referenced in your query. That makes the index a "covering" index: MySQL can satisfy the query directly from index pages, without visiting any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".
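Why the column order (player, happened_in, score) matters can be sketched with a sorted list standing in for the B-tree index. This is a simplified Python model under that assumption, not how InnoDB stores pages, and the helper name is hypothetical: one seek finds the start of the player = 24 range, the entries inside it are already ordered by happened_in (so grouping needs no sort), and every value needed comes from the index tuples themselves.

```python
from bisect import bisect_left
from itertools import groupby

# Sample rows from the question: (tbl_id, happened_in, player, score)
HIGHSCORES = [
    (1, 2006, 24, 61), (2, 2006, 24, 44), (3, 2006, 24, 81),
    (4, 1998, 23, 51), (5, 1997, 23, 46), (6, 2006, 3, 55),
    (7, 2007, 24, 34), (8, 2008, 24, 37),
]

# Toy stand-in for an index on (player, happened_in, score): entries are
# kept sorted by the full key, so all entries for one player are contiguous.
INDEX = sorted((player, year, score) for _id, year, player, score in HIGHSCORES)


def view_query_from_index(index, player):
    """WHERE player = ? GROUP BY happened_in, answered from the index alone."""
    start = bisect_left(index, (player,))  # seek to the first entry for player
    entries = []
    for entry in index[start:]:
        if entry[0] != player:
            break  # walked past the end of this player's range
        entries.append(entry)
    # Within one player the entries are already ordered by happened_in, so
    # each year's entries are adjacent and groupby needs no extra sort step.
    # Only index tuples are read; the base rows (tbl_id) are never touched.
    return [(player, max(e[2] for e in grp), year)
            for year, grp in groupby(entries, key=lambda e: e[1])]


print(view_query_from_index(INDEX, 24))
# [(24, 81, 2006), (24, 34, 2007), (24, 37, 2008)]
```

If happened_in were the leading column instead, the player = 24 entries would be scattered across the whole index, which is the intuition behind putting the equality column first.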
Compare performance to standalone query with no derived table
You could compare the execution plan for your query against the view vs. an equivalent standalone query:
SELECT player
, MAX(score) AS highest_score
, happened_in
FROM highscores
WHERE player = 24
AND happened_in = 2006
GROUP
BY player
, happened_in
The standalone query can also make use of a covering index e.g.
... ON highscores (player, happened_in, score)
but without a need to materialize an intermediate MyISAM table.
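The difference between the two plans can be illustrated by counting index entries read, again with a sorted list as a toy stand-in for the (player, happened_in, score) index. This Python sketch is a simplification (hypothetical helper names; it ignores MySQL's actual cost model): the view path can seek only on player and must build a derived row for every year, while the standalone query can seek directly on the (player, happened_in) prefix.

```python
from bisect import bisect_left, bisect_right

# Sample rows from the question: (tbl_id, happened_in, player, score)
HIGHSCORES = [
    (1, 2006, 24, 61), (2, 2006, 24, 44), (3, 2006, 24, 81),
    (4, 1998, 23, 51), (5, 1997, 23, 46), (6, 2006, 3, 55),
    (7, 2007, 24, 34), (8, 2008, 24, 37),
]
INDEX = sorted((player, year, score) for _id, year, player, score in HIGHSCORES)


def prefix_range(index, prefix):
    """Index entries whose key starts with `prefix` (a toy range lookup)."""
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (float("inf"),))
    return index[lo:hi]


def view_path(index, player, year):
    """The view query sees only player = ?; the year filter runs afterwards."""
    scanned = prefix_range(index, (player,))
    derived = {}  # the materialized derived table: one row per happened_in
    for _p, y, s in scanned:
        derived[y] = max(s, derived.get(y, s))
    return len(scanned), len(derived), derived.get(year)


def standalone_path(index, player, year):
    """Both equality predicates narrow the index range; nothing is materialized."""
    scanned = prefix_range(index, (player, year))
    best = max((s for _p, _y, s in scanned), default=None)
    return len(scanned), best


print(view_path(INDEX, 24, 2006))        # (5, 3, 81): entries read, derived rows, result
print(standalone_path(INDEX, 24, 2006))  # (3, 81): entries read, result
```

On eight rows the gap is tiny; the point is that the view path's work grows with all of the player's rows, while the standalone path's work grows only with the rows for the requested year.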
I'm not sure that any of the above provides a direct answer to the question you were asking.
Q: How do I get MySQL to use an INDEX for view query?
A: Define a suitable INDEX that the view query can use.
The short answer is to provide a "covering index" (an index that includes all columns referenced in the view query). The leading columns in that index should be the columns referenced in equality predicates (in your case, player would be the leading column, because you have a player = 24 predicate in the query). The columns referenced in the GROUP BY should come next in the index, which allows MySQL to optimize the GROUP BY operation by using the index rather than a sort operation.
The key point here is that the view query is basically a standalone query; its results get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view is run).
Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).
Creating a composite index on the player + happened_in columns (in this particular order) is the best you can do in this case.
PS: don't test the MySQL optimizer's behaviour on such a small number of rows, because it's likely to prefer a full scan over indexes. If you want to see what will happen in real life, fill the table with a realistic amount of data.
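One way to follow that advice is to generate synthetic rows in bulk. The Python sketch below is a hypothetical helper, not part of the original answer, and the value ranges (years 1946 to 2012, 500 player ids, scores 0 to 100) are assumptions chosen only to give the optimizer realistic cardinalities. It builds batched multi-row INSERT statements for the highscores table from the question.

```python
import random


def make_insert_batches(n_rows, batch_size=1000, seed=42):
    """Build multi-row INSERT statements with synthetic highscores data.

    The value ranges below are assumptions for illustration only.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    statements = []
    for start in range(0, n_rows, batch_size):
        count = min(batch_size, n_rows - start)  # last batch may be shorter
        values = ",".join(
            "({}, {}, {})".format(
                rng.randint(1946, 2012),  # happened_in (assumed year range)
                rng.randint(1, 500),      # player id (assumed range)
                rng.randint(0, 100),      # score (assumed range)
            )
            for _ in range(count)
        )
        statements.append(
            "INSERT INTO highscores (happened_in, player, score) VALUES "
            + values + ";"
        )
    return statements
```

For example, writing the statements from make_insert_batches(3_000_000) to a .sql file and loading it with the mysql command-line client gives a table closer to the asker's 3M rows, on which EXPLAIN output is far more representative.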
This doesn't directly answer the question, but it is a closely related workaround for others running into this issue. It achieves the same benefits as using a view, while minimizing the disadvantages.
I set up a PHP function to which I can send parameters: the conditions to push inside the derived table to maximize index usage, rather than applying them in a join or WHERE clause outside a view. In the function you can build the SQL for a derived table and return it. Then in the calling program, you can do something like this:
$table = tablesyntax($parameters);
$sql = "select field1, field2 from {$table} as x ..."; // + other SQL
Thus you get the encapsulation benefits of a view and the ability to call it as if it were a view, but without the index limitations.
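The same idea translated to Python, as a sketch only: the helper name highscores_table is hypothetical (the answer's own function is PHP and its body isn't shown), and it assumes a MySQL driver that accepts %s placeholders, as PyMySQL does. The filters are placed inside the derived table, so MySQL can use the (player, happened_in, score) index while building it, and the values travel as query parameters rather than being pasted into the SQL string.

```python
def highscores_table(player, year=None):
    """Build a derived-table SQL fragment with the filters pushed inside it.

    Hypothetical helper modeled on the tablesyntax() idea above.
    Returns (sql_fragment, params); %s placeholders are assumed to be
    filled in by the database driver, not by string formatting.
    """
    sql = ("(SELECT player, MAX(score) AS highest_score, happened_in"
           " FROM highscores WHERE player = %s")
    params = [player]
    if year is not None:
        # Filter inside the derived table, where an index on
        # (player, happened_in, score) can be used.
        sql += " AND happened_in = %s"
        params.append(year)
    sql += " GROUP BY player, happened_in)"
    return sql, params


table_sql, params = highscores_table(24, 2006)
query = "SELECT x.happened_in, x.highest_score FROM " + table_sql + " AS x"
print(query, params)
```

With a PyMySQL cursor, the pair would be run as cursor.execute(query, params). Passing year=None reproduces the full per-year summary that the original view returns.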