database sort vs. programmatic java sort

2020-01-30 04:11发布

I want to get data from the database (MySQL) by JPA, I want it sorted by some column value.

So, what is the best practice, to:

  • Retrieve the data from the database as list of objects (JPA), then sort it programmatically using some java APIs.

OR

  • Let the database sort it by using a sorting select query.

Thanks in advance

标签: java sorting jpa
9条回答
Deceive 欺骗
2楼-- · 2020-01-30 04:21

This is not completely on point, but I posted something recently that relates to database vs. application-side sorting. The article is about a .net technique, so most of it likely won't be interesting to you, but the basic principles remain:

Deferring sorting to the client side (e.g. jQuery, Dataset/Dataview sorting) may look tempting. And it actually is a viable option for paging, sorting and filtering, if (and only if):

1. the set of data is small, and

1. there is little concern about performance and scalability

From my experience, the systems that meet this kind of criteria are few and far between. Note that it’s not possible to mix and match sorting/paging in the application/database—if you ask the database for an unsorted 100 rows of data, then sort those rows on the application side, you’re likely not going to get the set of data you were expecting. This may seem obvious, but I’ve seen the mistake made enough times that I wanted to at least mention it.

It is much more efficient to sort and filter in the database for a number of reasons. For one thing, database engines are highly optimized for doing exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even barring that—even assuming you could write code that could match the kind of sorting, filtering and paging performance of a mature database engine—it’s still preferable to do this work in the database, for the simple reason that it’s more efficient to limit the amount of data that is transferred from the database to the application server.

So for example, if you have 10,000 rows before filtering, and your query pares that number down to 75, filtering on the client results in the data from all 10,000 rows being passed over the wire (and into your app server’s memory), where filtering on the database side would result in only the filtered 75 rows being moved between database and application. his can make a huge impact on performance and scalability.

The full post is here: http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/

查看更多
淡お忘
3楼-- · 2020-01-30 04:21

I would let the database do the sort, they are generally very good at that.

查看更多
Fickle 薄情
4楼-- · 2020-01-30 04:24

If you are retrieving a subset of all the database data, for example displaying 20 rows on screen out of 1000, it is better to sort on the database. This will be faster and easier and will allow you to retrieve one page of rows (20, 50, 100) at a time instead of all of them.

If your dataset is fairly small, sorting in your code may be more convenient if you want implement a complex sort. Usually this complex sort can be done in SQL but not as easily as in code.

The short of it is, the rule of thumb is sort via SQL, with some edge cases to the rule.

查看更多
smile是对你的礼貌
5楼-- · 2020-01-30 04:24

In general, you're better off using ORDER BY in your SQL query -- this way, if there is an applicable index, you may be getting your sorting "for free" (worst case, it will be the same amount of work as doing it in your code, but often it may be less work than that!).

查看更多
兄弟一词,经得起流年.
6楼-- · 2020-01-30 04:26

Well, there is not really a straightforward way to answer this; it must be answered in the context.

Is your application (middle tier) is running in the same node as the database?

If yes, you do not have to worry about the latency between the database and middle tier. Then the question becomes: How big is the subset/resultset of your query? Remember that to sort this is middle tier, you will take a list/set of size N, and either write a custom comparator or use the default Collection comparator. Or, whatever. So at the outset, you are setback by the size N.

But if the answer is no, then you are hit by the latency involved in transferring your resultset from DB to middle tier. And then if you are performing pagination, which is the last thing you should do, you are throwing away 90-95% of that resultset after cutting the pages.

So the wasted bandwidth cannot be justified. Imagine doing this for every request, across your tenant organizations.

However way you look at it, this is bad design.

I would do this in the database, no matter what. Just because almost all applications today demand pagination; even if they don't sending massive resultsets over the wire to your client is a total waste; drags everybody down across all your tenants.

One interesting idea that I am toying with these days is to harness the power of HTML5, 2-way data binding in browser frameworks like Angular, and push some processing back to the browser. That way, you dont end up waiting in the line for someone else before you to finish. True distributed processing. But care must be taken in deciding what can be pushed and what not.

查看更多
闹够了就滚
7楼-- · 2020-01-30 04:36

Let the database sort it. Then you can have paging with JPA easily without readin in the whole resultset.

查看更多
登录 后发表回答