I want to get data from the database (MySQL) by JPA, I want it sorted by some column value.
So, what is the best practice, to:
- Retrieve the data from the database as list of objects (JPA), then sort it programmatically using some java APIs.
OR
- Let the database sort it by using a sorting select query.
Thanks in advance
Depends on the context.
TL;DR
If you have the full data in your application server, do it in the application server.
If you have the full dataset that you need on the application server side already then it is better to do it on the application server side because those servers can scale horizontally. The most likely scenarios for this are:
Don't do it on client side unless you can guarantee it won't impact the client devices.
Databases themselves may be optimized, but if you can pull burden away from them you can reduce your costs overall because scaling the databases up is more expensive than scaling up application servers.
I'm almost positive that it will be faster to allow the Database to sort it. There's engineers who spend a lot of time perfecting and optimizing their search algorithms, whereas you'll have to implement your own sorting algorithm which might add a few more computations.
I ran into this very same question, and decided that I should run a little benchmark to quantify the speed differences. The results surprised me. I would like to post my experience with this very sort of question.
As with a number of the other posters here, my thought was that the database layer would do the sort faster because they are supposedly tuned for this sort of thing. @Alex made a good point that if the database already has an index on the sort, then it will be faster. I wanted to answer the question which raw sorting is faster on non-indexed sorts. Note, I said faster, not simpler. I think in many cases letting the db do the work is simpler and less error prone.
My main assumption was that the sort would fit in main memory. Not all problems will fit here, but a good number do. For out of memory sorts, it may well be that databases shine here, though I did not test that. In the case of in memory sorts all of java/c/c++ outperformed mysql in my informal benchmark, if one could call it that.
I wish I had had more time to more thoroughly compare the database layer vs application layer, but alas other duties called. Still, I couldn't help but record this note for others who are traveling down this road.
As I started down this path I started to see more hurdles. Should I compare data transfer? How? Can I compare time to read db vs time to read a flat file in java? How to isolate the sort time vs data transfer time vs time to read the records? With these questions here was the methodology and timing numbers I came up with.
All times in ms unless otherwise posted
All sort routines were the defaults provided by the language (these are good enough for random sorted data)
All compilation was with a typical "release-profile" selected via netbeans with no customization unless otherwise posted
All tests for mysql used the following schema
First here is a little snippet to populate the DB. There may be easier ways, but this is what I did:
MYSQL Direct CLI results:
These timings were somewhat difficult as the only info I had was the time reported after the execution of the command.
from mysql prompt for 10000000 elements it is roughly 2.1 to 2.4 either for sorting bigint_value or float_value
Java JDBC mysql call (similar performance to doing sort from mysql cli)
Five runs:
Java Sort Generating random numbers on fly (actually was slower than disk IO read). Assignment time is the time to generate random numbers and populate the array
Calling like
Timing results:
These results were for reading a file of doubles in binary mode
Doing a buffer transfer results in much faster runtimes
C and C++ Timing results (see below for source)
Debug profile using qsort
Release profile using qsort
Release profile Using std::sort( a, a + ARRAY_SIZE );
Release profile Reading random data from file and using std::sort( a, a + ARRAY_SIZE )
Below is the source code used. Hopefully minimal bugs :)
Java source Note that internal to JavaSort the runCode and writeFlag need to be adjusted depending on what you want to time. Also note that the memory allocation happens in the for loop (thus testing GC, but I did not see any appreciable difference moving the allocation outside the loop)
C/C++ source