Synchronize mongo databases on different servers

Posted 2020-06-16 06:03

Question:

I have the following situation: two MongoDB instances on different servers, for example:

MongoDB instance on server "one" (host1:27017) with database "test1"
MongoDB instance on server "two" (host2:27017) with database "test2"

Now I need to synchronize the "test1" database on "host1:27017" with the "test2" database on "host2:27017".

By "synchronize" I mean the following:

  1. If a collection from the "test1" database doesn't exist in "test2", that collection should be copied in full into the "test2" database.

  2. If a record from a collection in "test1" doesn't exist in "test2", it must be added; otherwise it must be updated. If a record doesn't exist in collection A in the "test1" database but does exist in collection A in the "test2" database, it must be deleted from "test2".
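The two rules above boil down to a per-collection diff: what to insert, what to overwrite, and what to delete. Below is a minimal sketch in JavaScript (the language of the mongo shell), assuming each database has already been loaded into memory as a plain object of the form { collectionName: [documents] } and that _id values are strings or numbers. diffCollection and planSync are hypothetical helper names, not MongoDB API.

```javascript
// Sketch of rules 1 and 2: compute what a one-way sync from source to target would do.
// Assumption: _id values are strings or numbers, so Map/Set lookups by value work.
// Real ObjectIds would need converting first, e.g. with toHexString().

function diffCollection(sourceDocs, targetDocs) {
  const targetById = new Map(targetDocs.map((d) => [d._id, d]));
  const sourceIds = new Set(sourceDocs.map((d) => d._id));
  const insert = [];
  const update = [];
  for (const doc of sourceDocs) {
    const existing = targetById.get(doc._id);
    if (existing === undefined) {
      insert.push(doc); // missing in the target: add it
    } else if (JSON.stringify(existing) !== JSON.stringify(doc)) {
      // Naive equality check: JSON.stringify is sensitive to key order.
      update.push(doc); // present but different: overwrite it
    }
  }
  // Present only in the target: delete it.
  const remove = targetDocs.filter((d) => !sourceIds.has(d._id)).map((d) => d._id);
  return { insert, update, remove };
}

function planSync(sourceDb, targetDb) {
  const plan = {};
  for (const [name, docs] of Object.entries(sourceDb)) {
    if (!(name in targetDb)) {
      // Rule 1: the whole collection is missing, so copy it in full.
      plan[name] = { copyWholeCollection: true, insert: docs, update: [], remove: [] };
    } else {
      // Rule 2: add, update or delete individual records.
      plan[name] = { copyWholeCollection: false, ...diffCollection(docs, targetDb[name]) };
    }
  }
  return plan;
}
```

For the collection "A" example in the question, planSync reports no inserts or updates and a single deletion, `remove: ["2"]`.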

Here is the problem. For example, the "test1" database has collection "A" with the following document:

{
 _id: "1",
 name: "some name"
}

"test2" database has collection "A" with the following documents:

{
 _id: "1",
 name: "some name"
}

{
 _id: "2",
 name: "some name2"
}

If I run db.copyDatabase('test1', 'test2', "host2:27017") I get this error:

"errmsg" : "exception: E11000 duplicate key error index: test1.A.$id dup key: { : \"1\" }"

The same happens with the cloneDatabase command. How can I resolve this?
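copyDatabase and cloneDatabase appear to insert the documents, so they fail with E11000 as soon as an _id already exists in the target. An upsert avoids that. A minimal sketch, assuming the official MongoDB Node.js driver: buildSyncOps is a hypothetical helper that only builds the operation list in the shape collection.bulkWrite() accepts (replaceOne with upsert: true for every source document, then one deleteMany with $nin).

```javascript
// Sketch of rule 2 for one collection, as operations for collection.bulkWrite().
// replaceOne + upsert inserts missing documents and overwrites existing ones,
// so running it again does not raise a duplicate key error on _id.
function buildSyncOps(sourceDocs) {
  const ops = sourceDocs.map((doc) => ({
    replaceOne: { filter: { _id: doc._id }, replacement: doc, upsert: true },
  }));
  // Delete every target document whose _id does not appear in the source.
  ops.push({
    deleteMany: { filter: { _id: { $nin: sourceDocs.map((doc) => doc._id) } } },
  });
  return ops;
}
```

Hypothetical usage, with db1 and db2 being Db handles for host1/test1 and host2/test2: `const docs = await db1.collection("A").find().toArray(); await db2.collection("A").bulkWrite(buildSyncOps(docs));`. Loading every document and every _id into a single $nin list only suits small collections; larger ones would need batching.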

In general, what are the ways to synchronize databases? I know the simplest way is to just copy the files from one server to the other, but maybe there are better ways.

Please help, I'm new to MongoDB. Thanks.

Answer 1:

I haven't tried this, but the current MongoDB documentation describes a replica set configuration equivalent to master-slave replication:

Deploy Master-Slave Equivalent using Replica Sets

If you want a replication configuration that resembles master-slave replication, using replica sets, consider the following replica configuration document. In this deployment the "<master>" and "<slave>" hosts (members 0 and 1) provide replication that is roughly equivalent to a two-instance master-slave deployment:

{
  _id : 'setName',
  members : [
    { _id : 0, host : "<master>", priority : 1 },
    { _id : 1, host : "<slave>", priority : 0, votes : 0 }
  ]
}

See Replica Set Configuration for more information about replica set configurations.
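The same configuration document can be built for concrete hosts and passed to rs.initiate() in the mongo shell. A minimal sketch, assuming both mongod processes were started with the same --replSet name; masterSlaveLikeConfig and the set name "rs0" are hypothetical. Keep in mind that replication copies whole databases under their own names, so host2 would end up holding a copy of "test1", not a differently named "test2".

```javascript
// Sketch: the two-member "master-slave equivalent" configuration for given hosts.
// Member 1 has priority 0 (it can never become primary) and votes 0
// (it takes no part in elections), so it only follows the primary.
function masterSlaveLikeConfig(setName, primaryHost, secondaryHost) {
  return {
    _id: setName,
    members: [
      { _id: 0, host: primaryHost, priority: 1 },
      { _id: 1, host: secondaryHost, priority: 0, votes: 0 },
    ],
  };
}
```

In the mongo shell on host1 this would be used as `rs.initiate(masterSlaveLikeConfig("rs0", "host1:27017", "host2:27017"))`.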



Answer 2:

Use _id instead of id. There is no need to declare it in your model.

If you have many servers

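On each server I use a small pre-save hook that creates a controlled, unique _id. The ObjectId that Mongoose generates for _id has a well-defined structure (https://docs.mongodb.com/manual/reference/method/ObjectId/#ObjectIDs-BSONObjectIDSpecification), and I overwrite a fixed part of it (hex characters 6 to 9, see the slice below) with a per-server value, because I have multiple servers and want to be sure there are no collisions. Note that in the legacy layout hex characters 0 to 7 are the timestamp and 8 to 13 the machine identifier, so this slice also overwrites the last byte of the timestamp. If you have just a few servers, there is probably no risk in skipping this, and even in my case I think it is too paranoid.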
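To see which part of an _id a per-server stamp touches, it helps to split the 24-character hex string into its fields. A sketch, assuming the legacy layout (4-byte timestamp, 3-byte machine identifier, 2-byte process id, 3-byte counter); newer drivers fill the middle 5 bytes with a random per-process value instead. describeObjectId is a hypothetical helper.

```javascript
// Sketch: split an ObjectId hex string into its fields (legacy layout assumed).
function describeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  return {
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000), // hex chars 0-7, seconds
    machineId: hex.slice(8, 14), // hex chars 8-13 (random in newer drivers)
    processId: hex.slice(14, 18), // hex chars 14-17 (random in newer drivers)
    counter: parseInt(hex.slice(18, 24), 16), // hex chars 18-23
  };
}
```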

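{ObjectId} = require('mongoose').Types

# Only rewrite ids when a 4-character per-server id is configured
exports.useProcessId = ->
  process.env.INSTANCE_PROCESS_ID? and process.env.INSTANCE_PROCESS_ID.length == 4

# Replace hex characters 6-9 of the generated ObjectId with the per-server id
exports.manipulateMongooseId = (id) ->
  id = id.toString()
  new ObjectId(id.slice(0, 6) + process.env.INSTANCE_PROCESS_ID + id.slice(10, 24))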
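manipulateMongooseId overwrites hex characters 6 to 9 of the id, which in the legacy layout straddles the timestamp and the machine identifier. A variant that targets the 2-byte process-id field (hex characters 14 to 17) instead would keep the timestamp intact, so ObjectId.getTimestamp() still returns the creation time. This is a swapped-in alternative, not the author's code; stampServerId is a hypothetical name.

```javascript
// Sketch (alternative, not the original): write a 4-hex-character server id into
// hex chars 14-17 of an ObjectId string, leaving the 4-byte timestamp untouched.
function stampServerId(hex, serverId) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  if (!/^[0-9a-f]{4}$/i.test(serverId)) {
    throw new Error("serverId must be exactly 4 hex characters");
  }
  return hex.slice(0, 14) + serverId.toLowerCase() + hex.slice(18);
}
```

`new ObjectId(stampServerId(id.toString(), process.env.INSTANCE_PROCESS_ID))` could then replace the body of manipulateMongooseId.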

Schema:

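# processIdConfig is the module above; async, myModelValidator and instanceType are defined elsewhere
myModelSchema.pre 'save', (next) ->
  data = @
  async.parallel
    myModel: (done) ->
      myModelValidator.base(data, done)
    changeMongooseId: (done) ->
      # useProcessId is a function and must be called; only new documents get a rewritten _id
      if data.isNew and processIdConfig.useProcessId() and instanceType == 'manager'
        data._id = processIdConfig.manipulateMongooseId(data._id)
      done()
  , (err) ->
    return next(new Error(err)) if err?
    next()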