Caching JSON objects on server side

Posted 2020-06-05 04:34

Question:

I have a server that holds data served in response to API requests from mobile clients. The data is fairly static and is updated very rarely (say, once a week). However, the table design is quite heavy, which makes API requests slow to serve.

The Web Service is implemented with Yii + PostgreSQL.

  • Is memcached a way to solve this problem? If so, how do I handle cached data that has become dirty?
  • Is there an alternative solution? Does PostgreSQL have a built-in mechanism like the MEMORY engine in MySQL?
  • How about Redis?

Answer 1:

You could use memcached, but every request would still have to pass through your Web Service (and, on a cache miss, hit your database server). Since you say the query results are fairly static, it might make more sense to cache the JSON responses from your Web Service.

This can be done using a reverse proxy with a built-in cache. I think an example helps most here, so this is how we do it with Jetty (Java) and NGINX:

In our setup, we have a Jetty (Java) instance serving an API for our mobile clients. The API listens on localhost:8080/api and returns JSON results fetched by a few queries against a local MySQL database.

At this point, we could serve the API directly to our clients, but here comes the Reverse Proxy:

In front of the API sits an NGINX web server listening on 0.0.0.0:80/ (all interfaces, port 80). When a mobile client connects to port 80 and requests /api, the built-in reverse proxy looks up the exact request (path and query string) in its cache. On a miss, it fetches the response from localhost:8080/api, stores it in its cache, and serves the newly cached value.
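The lookup described above is a read-through cache. The following is a minimal Python sketch of that logic only, not of NGINX itself; the class `ReadThroughCache`, the function `fake_upstream`, and the example URI are hypothetical names used for illustration.

```python
from typing import Callable, Dict, List


class ReadThroughCache:
    """Toy model of a reverse-proxy cache keyed by path plus query string."""

    def __init__(self, fetch_upstream: Callable[[str], str]) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, request_uri: str) -> str:
        if request_uri in self._store:
            self.hits += 1
        else:
            # Cache miss: ask the backend once and remember the answer.
            self.misses += 1
            self._store[request_uri] = self._fetch_upstream(request_uri)
        # Always serve the value that is now in the cache.
        return self._store[request_uri]


# Stand-in for the API on localhost:8080/api (assumption: returns a JSON string).
upstream_calls: List[str] = []


def fake_upstream(request_uri: str) -> str:
    upstream_calls.append(request_uri)
    return '{"uri": "%s"}' % request_uri


cache = ReadThroughCache(fake_upstream)
first = cache.get("/api/items?page=1")   # miss, goes to the backend
second = cache.get("/api/items?page=1")  # hit, backend is not contacted
```

Only the first request reaches the backend; the second is answered from the cache, which is where the speed-up for a slow query comes from.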

Benefits:

  • You can use other NGINX goodies: automatic GZIP compression of the cached JSON files
  • SSL endpoint termination at NGINX.
  • NGINX workers might benefit you when you have many more connections, all requesting data from the cache.
  • You can consolidate your service endpoints

Think about cache-invalidation:

You can tell NGINX to keep its cache for, say, a week for all HTTP 200 responses from localhost:8080/api, and for 1 minute for all other HTTP status codes. But if the time comes when you want to update the API in under a week, the cached data is stale, so you have to either delete it somehow or turn the caching time down to an hour or a day (so that most people still hit the cache).
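To make the expiry rule concrete, here is a small Python sketch of a cache whose lifetime depends on the response status: one week for HTTP 200 and one minute for everything else. It only models the policy; `TtlCache`, `ttl_for_status`, and the injected `clock` are hypothetical names, and the real behaviour is configured in NGINX rather than written by hand.

```python
import time
from typing import Callable, Dict, Optional, Tuple

WEEK = 7 * 24 * 60 * 60  # seconds
MINUTE = 60              # seconds


def ttl_for_status(status: int) -> int:
    """One week for HTTP 200, one minute for any other status code."""
    return WEEK if status == 200 else MINUTE


class TtlCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int, str]] = {}

    def put(self, key: str, status: int, body: str) -> None:
        expires_at = self._clock() + ttl_for_status(status)
        self._entries[key] = (expires_at, status, body)

    def get(self, key: str) -> Optional[Tuple[int, str]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, status, body = entry
        if self._clock() >= expires_at:
            # Entry is too old: drop it so the caller refetches.
            del self._entries[key]
            return None
        return status, body

    def clear(self) -> None:
        """Manual invalidation, e.g. after an early data update."""
        self._entries.clear()


# Demonstration with a fake clock so no real waiting is needed.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("/api/items", 200, '{"items": []}')
cache.put("/api/missing", 404, '{"error": "not found"}')

now[0] = 61.0                                   # just over a minute later
ok_after_61s = cache.get("/api/items")          # still cached
missing_after_61s = cache.get("/api/missing")   # already expired

now[0] = float(WEEK)                            # a week later
ok_after_week = cache.get("/api/items")         # expired as well
```

The `clear` method corresponds to the "delete it somehow" option; shortening `WEEK` corresponds to turning the caching time down.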

This is what we do: we chose to delete the cache when it is dirty. We have another job running on the server that listens for an Update-API event triggered via Puppet. The job takes care of clearing the NGINX cache for us.
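A cache-clearing job of this kind could be as small as the Python sketch below. It assumes the proxy keeps its cache as plain files under one directory and that removing those files is an acceptable way to invalidate them. The function name `purge_cache_dir` is hypothetical, and the demonstration runs against a temporary directory rather than a real cache path.

```python
import shutil
import tempfile
from pathlib import Path


def purge_cache_dir(cache_dir: Path) -> int:
    """Delete everything below cache_dir, keep cache_dir itself.

    Returns the number of regular files that were removed.
    """
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir():
            # Count the files in the subtree before removing it.
            removed += sum(1 for p in entry.rglob("*") if p.is_file())
            shutil.rmtree(entry)
        else:
            entry.unlink()
            removed += 1
    return removed


# Demonstration on a throwaway directory standing in for the cache.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    (cache_dir / "a" / "1f").mkdir(parents=True)
    (cache_dir / "a" / "1f" / "entry1").write_text("{}")
    (cache_dir / "entry2").write_text("{}")

    removed_count = purge_cache_dir(cache_dir)
    remaining = list(cache_dir.iterdir())
    dir_still_exists = cache_dir.is_dir()
```

The job would call such a function whenever the update event arrives, so the next request for each URL is fetched fresh from the Web Service.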

Another idea would be to add the cache-clearing function to your Web Service itself. We decided against this because the Web Service would have to know that it runs behind a reverse proxy, which breaks separation of concerns. But I would say it depends on what you are planning.

Another thing that would make your Web Service more correct is to serve proper ETag and Expires (or Cache-Control) headers with each JSON response. Again, we did not do that, because we have one big update event instead of small ones for each file.
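For illustration, this is roughly what serving validators from the Web Service could look like, sketched in Python rather than in Yii. The functions `make_etag` and `respond` and the one-hour `max_age` default are assumptions, and the If-None-Match check is simplified to an exact match against a single strong ETag.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body (quoted, as HTTP requires)."""
    return '"%s"' % hashlib.sha256(body).hexdigest()


def respond(
    body: bytes,
    if_none_match: Optional[str],
    max_age: int = 3600,  # assumption: one hour of freshness
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable JSON response."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=%d" % max_age}
    if if_none_match == etag:
        # Client already has this exact representation: send no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body


payload = b'{"items": [1, 2, 3]}'

# First request: no validator yet, full response.
status1, headers1, body1 = respond(payload, None)

# Second request: the client sends back the ETag it received.
status2, headers2, body2 = respond(payload, headers1["ETag"])

# After the data changes, the old ETag no longer matches.
status3, headers3, body3 = respond(b'{"items": [1, 2, 3, 4]}', headers1["ETag"])
```

With this in place, clients and intermediate caches can revalidate each JSON response individually instead of relying on one global purge.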

Side notes:

  • You do not have to use NGINX, but it is really easy to configure
  • NGINX and Apache have SSL support
  • There is also the well-known reverse proxy Varnish (https://www.varnish-cache.org), but to my knowledge it does not do SSL (yet?)

So, if you were to use Varnish in front of your Web Service + SSL, you would use a configuration like: NGINX -> Varnish -> Web Service.

References:

  • NGINX server: http://nginx.com
  • Varnish Reverse Proxy: https://www.varnish-cache.org
  • Puppet IT Automation: https://puppetlabs.com
  • NGINX reverse proxy tutorials: http://www.cyberciti.biz/faq/howto-linux-unix-setup-nginx-ssl-proxy/ and http://www.cyberciti.biz/tips/using-nginx-as-reverse-proxy.html