Which distributed database should I choose?

Posted 2019-08-23 17:33

Question:

We currently have a Java project with a PostgreSQL database, running on Spring Boot 2 with Spring Data JPA (Hibernate).

Requirements for the new architecture: we have a workplace on each of N computers. Each workplace runs the same program with a different configuration (a configured client for the distributed database). The number of computers is small, about 10 to 20 PCs. The database must be scalable (a lot of data may be stored on disk, ~1/2 TB). Up to 1 million rows per day can be inserted into the database from one workplace. Each workplace works with the distributed database, meaning that each node must be able to read and write data modified by the others, and make decisions at runtime based on data modified by another workplace (transactionally). The datastore (the on-disk database archive) must support being archived and copied as a backup snapshot. The project must be portable to the new architecture with Spring Data JPA 2, and database backups with Liquibase. It must work on Windows and Linux.

A quick overview shows me that the most popular free distributed databases right now are: 1) Redis 2) Apache Ignite 3) Hazelcast

I need help understanding how to architect the described system. First of all, I tried Redis and Ignite. Redis starts easily, but it works like a simple IMDG (in-memory data grid), and I need to store all the data in a persistent database (on disk, like Ignite persistence). Is there a way to use Redis with the existing PostgreSQL database? Postgres would be synchronized with all nodes, and Redis would serve as an in-memory cache holding the fresh data produced by each workplace, with the data flushed to disk every 10 minutes. 1) Is this possible? How?

I also tried Ignite, but my project runs on Spring Boot 2 and Spring Data 2. The latest released Ignite version is 2.6, and Spring Data 2 support will only appear in Apache Ignite 2.7! 2) I have to download a nightly build of version 2.7, but how can I use it in my project? (Do I need to install it into my local Maven repository?)

3) And after all, what would be the best architecture in this case? A datastore provider that stores persistent data on disk, is synchronized with each workplace's in-memory cache, and persists the in-memory data to disk on a timeout? What would be the best solution, and which database should I choose? (Maybe something that works with the existing PostgreSQL?)

Thx)

Answer 1:

Your use case sounds like a common one for Hazelcast. You can store your data in memory (i.e. in a Hazelcast IMap) and use a MapStore/MapLoader to persist changes to your database or to read from it. Changes can be persisted in a write-through or write-behind manner, depending on your configuration. Spring Boot and Spring JPA integrations are also available.
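
To make the difference between write-through and write-behind concrete, here is a minimal plain-Java sketch. It deliberately does not use the Hazelcast API: `Store`, `InMemoryStore`, and `CachedMap` are hypothetical names invented for this illustration, and the in-memory store merely stands in for PostgreSQL. It only shows when the backing store gets written in each mode.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteBehindSketch {

    /** Hypothetical stand-in for the persistent store (PostgreSQL in the question). */
    interface Store<K, V> {
        void storeAll(Map<K, V> entries);
        V load(K key);
    }

    /** Fake store kept in memory so the sketch runs without a database. */
    static class InMemoryStore<K, V> implements Store<K, V> {
        final Map<K, V> rows = new ConcurrentHashMap<>();
        int batches = 0; // number of storeAll calls, i.e. round trips to the "database"

        @Override
        public void storeAll(Map<K, V> entries) {
            rows.putAll(entries);
            batches++;
        }

        @Override
        public V load(K key) {
            return rows.get(key);
        }
    }

    /** Cache in front of a Store; persists either immediately or on flush(). */
    static class CachedMap<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Map<K, V> dirty = new LinkedHashMap<>();
        private final Store<K, V> store;
        private final boolean writeBehind;

        CachedMap(Store<K, V> store, boolean writeBehind) {
            this.store = store;
            this.writeBehind = writeBehind;
        }

        synchronized void put(K key, V value) {
            cache.put(key, value);
            if (writeBehind) {
                dirty.put(key, value); // persisted later, in one batch
            } else {
                store.storeAll(Collections.singletonMap(key, value)); // write-through
            }
        }

        /** Read-through: on a cache miss, fall back to the store. */
        V get(K key) {
            return cache.computeIfAbsent(key, store::load);
        }

        /** Writes all pending entries in one batch; returns how many were written. */
        synchronized int flush() {
            if (dirty.isEmpty()) {
                return 0;
            }
            int written = dirty.size();
            store.storeAll(new LinkedHashMap<>(dirty));
            dirty.clear();
            return written;
        }
    }

    public static void main(String[] args) {
        InMemoryStore<Long, String> db = new InMemoryStore<>();
        CachedMap<Long, String> map = new CachedMap<>(db, true);
        map.put(1L, "first");
        map.put(2L, "second");
        System.out.println("rows before flush: " + db.rows.size()); // 0
        map.flush();
        System.out.println("rows after flush: " + db.rows.size());  // 2
    }
}
```

In a real deployment `flush()` would be called by a scheduler (for example every 10 minutes, as in the question) rather than by hand. The trade-off is that write-behind batches the database writes, but anything still pending is lost if the node dies before the next flush.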

Also, the amount of data you want to store is pretty big for 10-20 machines, so you might want to look into Hazelcast's High-Density Memory Store option, which lets you store large amounts of data on commodity hardware without running into GC problems.

The following links should give you a better idea:

  • https://opencredo.com/spring-booting-hazelcast/
  • https://docs.hazelcast.org//docs/3.11/manual/html-single/index.html#loading-and-storing-persistent-data
  • https://hazelcast.com/products/high-density-memory-store/


Answer 2:

Ignite is not suitable for this case, because it supports only JPA 1. Redis doesn't support SQL queries. Our choice is a plain PostgreSQL master with slave replication. CockroachDB might also be an option. Thanks for the help))