Architecture for a data layer that uses both localStorage and REST remote storage

Question:

Does anybody have ideas or references on how to implement a data persistence layer that uses both localStorage and a remote REST store? Here is the setup:

Each client's data is stored in localStorage (using an ember-data IndexedDB adapter). The locally stored data is synced with the remote server (using the ember-data RESTAdapter).

The server gathers all data from the clients. In mathematical set notation:

Server = Client1 ∪ Client2 ∪ ... ∪ ClientN 

where, in general, a record is not necessarily unique to a single client:

ClientX ∩ ClientY may be ≠ ∅,  for any X, Y ∈ [1,N], X ≠ Y

Here are some scenarios:

  • A client creates a record. The record's id can't be assigned on the client, since it might conflict with a record already stored on the server. Therefore a newly created record needs to be committed to the server -> receive its id -> then be created in localStorage.

  • A record is updated on the server, and as a consequence the data in localStorage and the data on the server go out of sync. Only the server knows this, so the architecture needs to implement some form of push (?)

Would you use 2 stores (one for localStorage, one for REST) and sync between them, or use a hybrid indexedDB/REST adapter and write the sync code within the adapter?

Can you see any way to avoid implementing push (WebSockets, ...)?

Answer 1:

The problem you bring up cannot be answered simply, or in a few paragraphs. Nevertheless, here is my attempt...

First, there are a number of difficulties with the approach you have adopted:

  1. The clients must always be network connected to create data and receive the keys from the server.
  2. If you do use two different stores (localStorage & REST), all application code that needs data must look in both stores. That significantly increases the complexity of every part of the application.
  3. After creating a row, if you want to create child rows, you must wait for the server to return the primary key before you can reference it as a foreign key in the child row. For any moderately complex data structures, this becomes a heavy burden.
  4. When the server goes down, no client can create data.

Here is my approach. It uses SequelSphereDB, but most of the concepts can be reused across other client data management systems.

FIRST: Use UUIDs for Primary Keys.

Most client data management systems should provide a way to generate Universally Unique IDs; SequelSphere does it with a simple SQL function: UUID(). Using a UUID as the primary key of each row lets any client generate primary keys at any time, without contacting the server, while still guaranteeing that the IDs are unique. As a consequence, the application can work in an "offline" mode, not requiring a connection to the server at run time, and a downed server no longer takes all of the clients down with it.
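
To make this concrete, here is a minimal plain-JavaScript sketch (not tied to SequelSphereDB or ember-data) of generating UUID primary keys on the client; the newPrimaryKey and createRecordLocally helpers are names invented for this example:

    // Generate a v4 UUID on the client; crypto.randomUUID() is available in modern browsers.
    function newPrimaryKey() {
      if (typeof crypto.randomUUID === 'function') {
        return crypto.randomUUID();
      }
      // Fallback: build an RFC 4122 v4 UUID from random bytes.
      const bytes = crypto.getRandomValues(new Uint8Array(16));
      bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
      bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10xx
      const hex = [...bytes].map(b => b.toString(16).padStart(2, '0')).join('');
      return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
    }

    // The client can create a record (and reference it from child rows) without contacting the server.
    function createRecordLocally(table, attributes) {
      const row = { id: newPrimaryKey(), ...attributes };
      table.push(row); // stand-in for an insert into localStorage/IndexedDB
      return row;      // the id is final; the server never reassigns it
    }

Because the id is known immediately, child rows can be created in the same offline session with a valid foreign key.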

SECOND: Use a single set of tables that mirror the server's.

This is more of a requirement for simplicity than anything else. It is also a requirement for the next two fundamental principles.

THIRD: For downward-synchronization of small datasets, completely refreshing client data from the server is preferable.

Whenever possible, perform complete refreshes of the client's data from the server. It is a simpler paradigm and results in fewer internal data-integrity issues. The primary drawback is the size of the data transferred.
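
As an illustration, a full refresh can be as simple as fetching every row and replacing the local copy; the /api/items endpoint and the in-memory localTable below are assumptions for this sketch:

    // Full "downward" refresh: replace the local table with the server's copy.
    async function refreshTableFromServer(localTable) {
      const response = await fetch('/api/items'); // hypothetical REST endpoint
      if (!response.ok) {
        throw new Error(`refresh failed: ${response.status}`);
      }
      const rows = await response.json();
      localTable.length = 0;    // drop the old local copy wholesale...
      localTable.push(...rows); // ...and take the server's version as the truth
    }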

FOURTH: For downward-synchronization of large datasets, perform 'Transactional' updates

This is where my approach gets a little more complex. If the datasets are too large, and only changed rows should be synced, you must find a way to sync them as "transactions": the inserts/updates/deletes, in the order in which they were performed on the server, which gives the client a simple script for replaying the same changes.

It is preferable to have a table on the server that records the transactions to be synced down to the device. If that is not possible, the order can often be reconstructed from timestamps on the rows, with the client asking for all changes since a particular timestamp. Big negative: you will need to keep track of deleted rows, either with "logical" (soft) deletes or by recording them in their own table. Even so, isolating this complexity on the server is preferable to spreading it across all the clients.
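
A timestamp-based variant might look like the following sketch; the /api/changes endpoint and its { op, table, row } change format are assumptions, not an existing API:

    // Incremental "downward" sync: apply server-side changes in the order they happened.
    async function pullChangesSince(lastSyncedAt, localDb) {
      const response = await fetch(`/api/changes?since=${encodeURIComponent(lastSyncedAt)}`);
      const changes = await response.json(); // [{ op: 'insert'|'update'|'delete', table, row }, ...]

      for (const change of changes) {
        const table = localDb[change.table];
        const index = table.findIndex(r => r.id === change.row.id);
        if (change.op === 'delete') {
          // Deletes must be sent explicitly (logical deletes or a tombstone table on the server).
          if (index !== -1) table.splice(index, 1);
        } else {
          // Insert or update: upsert by primary key (UUIDs make this safe across clients).
          if (index === -1) table.push(change.row); else table[index] = change.row;
        }
      }
      return new Date().toISOString(); // remember this as the next "since" value
    }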

FIFTH: For upward-synchronization, use 'Transactional' updates

This is where SequelSphereDB really shines: it keeps track of all the inserts, updates, and deletes performed against its tables and hands them back to you at sync time. It even does this across browser restarts, since it persists the information in localStorage/IndexedDB, and it handles commits and rollbacks appropriately. The client app can work with the data as it normally would, without having to think about recording changes, and then use SequelSphereDB's "Change Trackers" to replay those changes at sync time.

If you are not using SequelSphere (you should be), keep a separate table on the client that records every insert, update, and delete the client performs. Whenever the client application inserts/updates/deletes rows, write a copy of that change into the "transaction" table. At upward-sync time, send those changes to the server, and on the server simply perform the same steps in the same order to replicate the data from the client.
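
Without a built-in change tracker, a hand-rolled log can look like this sketch; changeLog, trackChange, and the POST /api/sync endpoint are names invented for the example:

    // Client-side change log, persisted alongside the data in localStorage/IndexedDB.
    const changeLog = [];

    // Record every local write in the order it happened.
    function trackChange(op, tableName, row) {
      changeLog.push({ op, table: tableName, row, at: new Date().toISOString() });
    }

    // Upward sync: send the log so the server can replay it in the same order.
    async function pushChangesToServer() {
      if (changeLog.length === 0) return;
      const response = await fetch('/api/sync', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(changeLog),
      });
      if (!response.ok) {
        throw new Error(`upward sync failed: ${response.status}`);
      }
      changeLog.length = 0; // only clear once the server has accepted the batch
    }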

ALSO IMPORTANT: Always perform an upward sync before fully refreshing the client tables from the server. :)
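
Tying the hypothetical helpers from the sketches above together, the ordering looks like this:

    // Push local changes first, then pull, so the refresh never overwrites unsynced client edits.
    async function fullSync(localDb, lastSyncedAt) {
      await pushChangesToServer();                          // upward sync first
      return await pullChangesSince(lastSyncedAt, localDb); // or refreshTableFromServer(...) for small datasets
    }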

Conclusion

I suggest going for simplicity over complexity in as many places as possible. Using UUIDs for primary keys is extremely helpful for this. Using some sort of "change trackers" is also very useful. Using a tool such as SequelSphereDB to track the changes for you is most helpful, but not necessary for this approach.

FULL DISCLOSURE: I am closely associated with the company behind SequelSphere, but that product really isn't necessary for implementing the above approach.