I'm currently writing a webapp in Ruby on Rails that matches users based on questions they have answered. A user can then search within a range of users; the system matches the searcher against every user that falls into that range and returns them as an ordered list, highest match first.
The problem is that this operation is such a heavy workload that I don't think I can do it on the fly. I've already optimized my SQL as far as I can and implemented my matching algorithm entirely in one SQL query, which takes about 8.2 ms to calculate the match percentage between two users (on my local machine). The trouble is that when a search returns 5000 users, Rails takes that array of users, iterates through it, and runs this query 5000 times, which takes about 50 seconds on my local machine. Could I reduce this by moving to PostgreSQL and making this a stored procedure?
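To make the shape of the problem concrete, the search currently does roughly this (model and column names are simplified, and `match_percentage_sql` stands in for my single matching query):

```ruby
# Simplified sketch of the current flow, not the actual code.
candidates = User.where(age: search_range)          # e.g. ~5000 users

results = candidates.map do |candidate|
  score = ActiveRecord::Base.connection.select_value(
    match_percentage_sql(current_user.id, candidate.id)  # ~8.2 ms per call
  )
  [candidate, score.to_f]
end

results.sort_by { |_, score| -score }                # highest match first
```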
My question now is: what options are there (e.g. background jobs, caches) so that when the user presses search, the results show up within a few seconds? Or is that simply not feasible at this scale, meaning I have to precompute the matches and store them in a NoSQL store or something similar, since for 50k users that would already be 2.5 billion rows?
One way is to try to do it in a single SQL query. Right now you run one query per candidate user; instead, one query over all of them can do the work you are currently doing in the Ruby loop, as in the sketch below.
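A rough sketch of what I mean, assuming a hypothetical `answers` table with `user_id`, `question_id` and `answer` columns, and a simple "percentage of identical answers" score standing in for your real algorithm:

```ruby
# One set-based query that scores every candidate against the searcher
# and returns them already ordered by match percentage.
sql = <<-SQL
  SELECT other.user_id AS user_id,
         100.0 * SUM(CASE WHEN other.answer = mine.answer THEN 1 ELSE 0 END)
               / COUNT(*) AS match_percentage
  FROM answers AS mine
  JOIN answers AS other
    ON other.question_id = mine.question_id
   AND other.user_id    <> mine.user_id
  WHERE mine.user_id = :searcher_id
  GROUP BY other.user_id
  ORDER BY match_percentage DESC
SQL

# find_by_sql sanitizes the named bind and returns lightweight objects
# that respond to #user_id and #match_percentage.
matches = User.find_by_sql([sql, { searcher_id: current_user.id }])
```

The point is that the database computes all 5000 scores in one pass and one round trip, instead of 5000 separate queries each paying parsing, planning and network overhead.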
You can use a database cache and store the results for each user daily. You don't need a NoSQL data store for this, just a cron job that writes the results to the database.
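For example, a rake task driven by cron, writing into a hypothetical `match_scores` table (`searcher_id`, `candidate_id`, `score`); `compute_matches_for` is a placeholder for your scoring query:

```ruby
# lib/tasks/precompute_matches.rake
namespace :matches do
  desc "Precompute match scores for every user"
  task precompute: :environment do
    User.find_each do |searcher|
      rows = compute_matches_for(searcher)   # e.g. the single set-based query above

      # Replace this searcher's cached scores atomically.
      MatchScore.transaction do
        MatchScore.where(searcher_id: searcher.id).delete_all
        rows.each do |row|
          MatchScore.create!(searcher_id:  searcher.id,
                             candidate_id: row.user_id,
                             score:        row.match_percentage)
        end
      end
    end
  end
end
```

Then a crontab entry along the lines of `0 3 * * * cd /path/to/app && RAILS_ENV=production bundle exec rake matches:precompute` refreshes the table nightly, and the search action just reads from `match_scores` ordered by `score`.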
You could also store the results in memcached. The cache would be shared between the Rails instances of your web app, so one copy is available to all of them. I would access the results through a method that checks an expiration condition to decide whether it needs to refresh the data.
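Roughly, with Rails' cache store pointed at memcached (e.g. `config.cache_store = :mem_cache_store` via the dalli gem), the accessor could look like this; again, `compute_matches_for` is a placeholder for the expensive work:

```ruby
class MatchResults
  CACHE_TTL = 24.hours

  # Returns the ordered match list for a user, recomputing it only
  # when the cached copy is missing or has expired.
  def self.for(user)
    Rails.cache.fetch("match_results/#{user.id}", expires_in: CACHE_TTL) do
      compute_matches_for(user)
    end
  end
end
```

The search action then calls `MatchResults.for(current_user)`: the first request after expiry pays the cost of the computation, and every other request across all Rails instances gets the cached copy.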