Processing a large amount of data efficiently with

I need to find a way to efficiently process a large amount of data in PHP/MySQL. Here's the situation:

I have a database table with, say, one million records. Based on user input from PHP, I need to rank all of those one million records according to a non-trivial calculation step, so that I can pick the top scoring items. My problem is that this scales very poorly from a memory usage standpoint, particularly at the sorting step, if I rearrange the data into columns and use array_multisort.

Alternative methods I can think of are:

Doing the calculations in PHP and reinserting the data with the scores into a temporary table, retrieving the highest scoring items using a SELECT ... ORDER BY score ... LIMIT query
Doing the calculations in PHP and outputting the data with the scores into a CSV file, then calling the command-line sort utility, then reading in the top X number of lines
Doing the calculations in MySQL using a stored procedure and retrieving the top X number of items, as in option 1. My concern with this is whether the DB is well suited for the number crunching this would involve

This has got to be a fairly common problem for things like search engines. Scalability is the number one priority, but performance has to be pretty good too. Is one of these approaches best, or is there some other great option that I'm not even considering?

标签： php mysql

3条回答

够拽才男人

2楼-- · 2020-02-11 08:16

Doing it in a complex ORDER BY is the best out of all the options you mentioned, if it's possible. But for one million records you will still encounter problems.

Sounds like you are doing things the hard way, trying to keep all your data in one place and order it on the fly. Is there no way to do scoring beforehand, even if it's a few scores you combine at request time for a custom sort key?

How complex are the user's queries? If you are trying to do fulltext search you should get software specialized for that task.

0人赞添加讨论(0) 举报

Explosion°爆炸

3楼-- · 2020-02-11 08:22

Assuming your dataset is too large to store in memory.... If you only need the top n items, you can keep only the top results in memory as you page through the 1 million rows. This would also work with the temporary table idea of yours, writing the top records from each batch.

Another option would be to write a user defined function:

http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html

0人赞添加讨论(0) 举报

爷、活的狠高调

4楼-- · 2020-02-11 08:32

Why not do part or all of your calculation when you store the row. That way you only have to do it once and you have lots of time to do it in.

0人赞添加讨论(0) 举报

Processing a large amount of data efficiently with

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间