Incremental updates using browser cache

Posted 2019-06-17 08:59

The client (an AngularJS application) gets rather big lists from the server. The lists may have hundreds or thousands of elements, which can mean a few megabytes uncompressed (and some users (admins) get much more data).

I'm not planning to let the client get partial results as sorting and filtering should not bother the server.

Compression works fine (factor of about 10) and as the lists don't change often, 304 NOT MODIFIED helps a lot, too. But another important optimization is missing:

As typical changes to the lists are rather small (e.g., modifying two elements and adding a new one), transferring only the changes sounds like a good idea. I wonder how to do it properly.
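To make "transferring only the changes" concrete, here is a minimal sketch of how a client could merge such a change set into the list it already holds. The `Item` and `Delta` shapes and the `applyDelta` name are assumptions for illustration; the question does not specify any format.

```typescript
// Hypothetical shapes: the question does not define the item or change-set format.
interface Item {
  id: number;
  name: string;
}

interface Delta {
  upserted: Item[];     // items added or modified since the last fetch
  removedIds: number[]; // ids of items deleted since the last fetch
}

// Returns a new list with the delta applied; the input list is not mutated.
function applyDelta(items: Item[], delta: Delta): Item[] {
  const changed: { [id: number]: Item } = {};
  for (const item of delta.upserted) {
    changed[item.id] = item;
  }
  const removed: { [id: number]: boolean } = {};
  for (const id of delta.removedIds) {
    removed[id] = true;
  }

  const seen: { [id: number]: boolean } = {};
  const result: Item[] = [];
  // Existing items keep their position; modified ones are replaced in place.
  for (const item of items) {
    if (removed[item.id]) continue;
    result.push(changed[item.id] || item);
    seen[item.id] = true;
  }
  // Items that were not in the list before are appended at the end.
  for (const item of delta.upserted) {
    if (!seen[item.id] && !removed[item.id]) {
      result.push(changed[item.id]);
      seen[item.id] = true;
    }
  }
  return result;
}
```

Since sorting and filtering happen on the client anyway, appending new items at the end should be harmless, but that is a design choice of this sketch rather than something the question prescribes.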

Something like GET /offer/123/items should always return all the items in the offer number 123, right? Compression and 304 can be used here, but no incremental update. A request like GET /offer/123/items?since=1495765733 sounds like the way to go, but then browser caching does not get used:

  • either nothing has changed and the answer is empty (and caching it makes no sense)
  • or something has changed, the client updates its state and never again asks for changes since 1495765733 (and caching it makes even less sense)

Obviously, when using the "since" query, nothing will be cached for the "resource" (the original query gets used just once or not at all).

So I can't rely on the browser cache and I can only use localStorage or sessionStorage, which have a few downsides:

  • it's limited to a few megabytes (the browser HTTP cache may be much bigger and gets handled automatically)
  • I have to implement some replacement strategy when I hit the limit
  • the browser cache stores already compressed data which I don't get (I'd have to re-compress them)
  • it doesn't work for the users (admins) getting bigger lists as even a single list may already be over limit
  • it gets emptied on logout (a customer's requirement)

Given that there's HTML 5 and HTTP 2.0, that's pretty unsatisfactory. What am I missing?

Is it possible to use the browser HTTP cache together with incremental updates?

3 answers
Viruses.
#2 · 2019-06-17 09:11

I think there is one thing you are missing: in short, headers. What you could do, and what would match most of your requirements, is the following:

  • First GET /offer/123/items is done normally, nothing special.
  • Subsequent GET /offer/123/items requests are sent with a Fetched-At: 1495765733 header, telling your server when the initial request was sent.

From this point on, two scenarios are possible.

  • Either there is no change, and you can send the 304.
  • If there is a change, however, return only the items changed since the timestamp sent in the header, and set Cache-Control: no-cache on the response.

This gives you incremental updates while the initial multi-megabyte list stays cached.
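The server-side decision described above can be sketched as a small pure function. It is written in TypeScript only to stay in the client's language family; the real server may well be something else. The `StoredItem` shape, its `modifiedAt` field (Unix seconds) and the `answerItemsRequest` name are assumptions, and deletions are not handled here.

```typescript
// Hypothetical item shape; modifiedAt is assumed to be a Unix timestamp in seconds.
interface StoredItem {
  id: number;
  name: string;
  modifiedAt: number;
}

interface Reply {
  status: number;
  headers: { [name: string]: string };
  body: StoredItem[] | null;
}

// fetchedAt is the value of the custom Fetched-At request header, or null if absent.
function answerItemsRequest(all: StoredItem[], fetchedAt: number | null): Reply {
  if (fetchedAt === null) {
    // First request: answer normally with the full, cacheable list.
    return { status: 200, headers: {}, body: all };
  }
  const since: number = fetchedAt;
  const changed = all.filter((item) => item.modifiedAt > since);
  if (changed.length === 0) {
    // Nothing changed: the cached full list is still valid.
    return { status: 304, headers: {}, body: null };
  }
  // Something changed: send only the changed items, with the header the answer suggests.
  return { status: 200, headers: { "Cache-Control": "no-cache" }, body: changed };
}
```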

There is still one drawback, though: the caching is only done once, so updates are not cached. You said that your lists are not updated often, so this might already work for you, but if you really want to push this further, I can think of one more thing.

Upon receiving an incremental update, you could trigger another request in the background, without the Fetched-At header, whose response is not used by your application at all and only serves to update your HTTP cache. It should not be as bad as it sounds performance-wise, since your framework won't update its data with the new response (and potentially trigger re-renders); the only notable drawback would be in terms of network and memory consumption. On mobile it might be problematic, but it doesn't sound like an app intended to be displayed there anyway.

I don't know your use case at all and will just throw this out there, but are you really sure that some sort of pagination won't work? Megabytes of data sounds like a lot to display and process for normal humans ;)

Anthone
#3 · 2019-06-17 09:17

I would ditch the request/response cycle entirely and move to a push model. Specifically, WebSockets.

This is the standard technology used on financial trading websites serving tables of real-time ticker data. Here is one such production application demonstrating the power of WebSockets:

https://www.poloniex.com/exchange#btc_eth

WebSocket applications have two types of state: global and user. The above link will show three tables of global data. When you're logged in, two additional tables of user data are displayed at the bottom.

This is not HTTP; you won't be able to just slap this into a Java Servlet. You'll need to run a separate process on your server which communicates over TCP. The good news is, there are mature solutions readily available. A Java-based solution with a very decent free licensing option, which includes both client and server APIs (and does integrate with Angular2) is Lightstreamer. They have a well-organized demo page too. There are also adapters available to integrate with your data sources.

You may be hesitant to ditch your existing servlet approach, but this will mean fewer headaches in the long run, and it scales marvelously. HTTP polling, even with well-designed header-only requests, does not scale well with large lists which update frequently.

---------- EDIT ----------

Since the list updates are infrequent, WebSockets are probably overkill. Based on the further details provided by comments on this answer, I would recommend a DOM-based, AJAX-updated sorter and filterer such as DataTables, which has some built-in options for caching. In order to reuse client data across sessions, ajax requests in the previous link should be modified to save the current data in the table to localStorage after every ajax request, and when the client starts a new session, populate the table with this data. This will allow the plugin to manage the filtering, sorting, caching and browser-based persistence.
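The save-after-every-request, restore-on-startup part of this can be sketched independently of DataTables. The `Row` shape, the function names and the storage key are assumptions; `KeyValueStore` is just the `getItem`/`setItem` subset of the Web Storage API that `localStorage` implements, and the in-memory store exists only so the sketch runs outside a browser.

```typescript
// The getItem/setItem subset of the Web Storage API (localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical row shape for the table data.
interface Row {
  id: number;
  name: string;
}

// Call after every successful AJAX request. Returns false if storing failed,
// for example because the storage quota was exceeded.
function saveRows(store: KeyValueStore, key: string, rows: Row[]): boolean {
  try {
    store.setItem(key, JSON.stringify(rows));
    return true;
  } catch (e) {
    return false;
  }
}

// Call when a new session starts. Returns an empty list if nothing usable is stored.
function loadRows(store: KeyValueStore, key: string): Row[] {
  const raw = store.getItem(key);
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    return [];
  }
}

// In-memory stand-in for localStorage, used here for illustration only.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = Object.create(null);
  return {
    getItem: (key) => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In the browser, `window.localStorage` would be passed as the store. The size limit and the clear-on-logout requirement mentioned in the question still apply to this approach.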

We Are One
#4 · 2019-06-17 09:32

I'm thinking about something similar to Aperçu's idea, but using two requests. The idea is still incomplete, so bear with me...

  • The client asks for GET /offer/123/items, possibly with the ETag and Fetched-At headers.

The server answers with

  • 200 and a full list if either header is missing, or when there are too many changes since the Fetched-At timestamp
  • 304 if nothing has changed since then
  • 304 and a special Fetch-More header telling the client that more data is to be fetched otherwise
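The three cases above can be sketched as one decision function. The `maxChanges` threshold for "too many changes", the header value `"1"` and all names are assumptions; the answer only fixes the status codes and the existence of the `Fetch-More` header.

```typescript
interface Decision {
  status: number;
  headers: { [name: string]: string };
  sendFullList: boolean;
}

// etag and fetchedAt are the values of the request headers, or null if missing.
// modificationTimes holds one timestamp per item; maxChanges is a hypothetical
// threshold above which the server prefers to resend the full list.
function decideResponse(
  etag: string | null,
  fetchedAt: number | null,
  modificationTimes: number[],
  maxChanges: number
): Decision {
  const fullList: Decision = { status: 200, headers: {}, sendFullList: true };
  if (etag === null || fetchedAt === null) return fullList;

  let changes = 0;
  for (const time of modificationTimes) {
    if (time > fetchedAt) changes++;
  }
  if (changes > maxChanges) return fullList;
  if (changes === 0) return { status: 304, headers: {}, sendFullList: false };
  // Few changes: keep the cached list valid and point the client at the rest.
  return { status: 304, headers: { "Fetch-More": "1" }, sendFullList: false };
}
```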

The last case violates how HTTP should work, but AFAIK it's the only way to let the browser cache everything I want it to cache. Since the whole communication is encrypted, proxies can't punish me for violating the spec.

The client reacts to Fetch-More by requesting GET /offer/123/items/errata. This way, the resource gets split into two requests. The split is ugly, but an AngularJS $http interceptor can hide the ugliness from the application.
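The check such an interceptor would perform can be sketched as a pure function. It looks for the marker header (called `Fetch-More` in the list of server responses above) and derives the errata URL. The function name is hypothetical, and whether a given browser exposes a header sent with a 304 to script is an assumption this sketch does not verify, which is why it inspects only the headers and not the status code.

```typescript
// Returns the URL of the follow-up errata request, or null if none is needed.
// headers: the response headers as a plain name-to-value object.
function errataUrlFor(itemsUrl: string, headers: { [name: string]: string }): string | null {
  // Header names are case-insensitive, so compare in lower case.
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "fetch-more") {
      // Assumes itemsUrl carries no query string.
      return itemsUrl + "/errata";
    }
  }
  return null;
}
```

An interceptor could call this on every response for an items URL and, when it returns a URL, issue the second request and merge both results before handing them to the application.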

The second request is cacheable, too, and it can also carry a Fetched-At header. The details are unclear, but some strong handwavium makes me believe that it can work. Actually, the errata could itself be inaccurate but still useful and get an errata itself... etc.

With HTTP/1.1, more requests may mean more latency, but having a couple of them should still be profitable because of the saved bandwidth. The server can decide when to stop.

With HTTP/2, multiple requests could be sent at once. The server could be made to handle them efficiently as it knows that they belong together. Some more handwavium...

I find the idea strange, but interesting and I'm looking forward to comments. Feel free to downvote me, but please leave an explanation.
