Efficiency when inserting into mongodb (pymongo)

Published 2019-07-30 06:21

Question:

Updated for clarity: I need advice on performance when inserting/appending to a capped collection. I have two Python scripts running:

(1) Tailing the cursor.

while WSHandler.cursor.alive:
    try:
        doc = WSHandler.cursor.next()
        self.render(doc)
    except StopIteration:
        pass  # no new document yet; poll the cursor again

(2) Inserting like so:

def on_data(self, data):          # Tweepy stream callback
    if len(data) > 5:
        data = json.loads(data)   # requires "import json"
        coll.insert(data)         # insert into MongoDB
        # print(coll.count())
        # print(data)

It runs fine for a while (at 50 inserts/second). Then, after 20-60 seconds, it stumbles, hits the CPU ceiling (though it was running at 20% before), and never recovers. My mongostat numbers take a dive (the dive is shown below).

Mongostat output:

The CPU is now choked by the processes doing the insertion (at least according to htop).

When I run the Tweepy lines above with print(data) instead of adding the data to the db (coll.insert(data)), everything runs fine at 15% CPU use.

What I see in mongostat:

  • res keeps climbing. (Though it may clog at 40 MB and run fine at 100 MB.)
  • Flushes do not seem to interfere.
  • locked % is stable at 0.1%. Would this lead to clogging eventually?

(I'm running on an AWS micro instance, using pymongo.)

Answer 1:

I would suggest using mongostat while running your tests. There are many things that could be wrong, but mongostat will give you a good indication of where to look.

http://docs.mongodb.org/manual/reference/mongostat/

The first two things I would look at are the lock percentage and the data throughput. With reasonable throughput on dedicated machines, I typically reach 1000-2000 updates/inserts per second before suffering any degradation. This has been the case for several large production deployments I have worked with.
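To watch throughput from the client side as well, a small sliding-window counter can be called after each insert. This is only a rough sketch: the ThroughputMeter class, its window_seconds and clock parameters, and its method names are all made up for illustration and are not part of pymongo or mongostat.

```python
import time
from collections import deque


class ThroughputMeter:
    """Hypothetical helper: estimates events per second over a sliding window."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable clock, mainly for testing
        self.timestamps = deque()   # times of events still inside the window

    def record(self):
        """Call once after each completed insert."""
        now = self.clock()
        self.timestamps.append(now)
        self._discard_old(now)

    def rate(self):
        """Return the average number of events per second over the window."""
        self._discard_old(self.clock())
        return len(self.timestamps) / self.window_seconds

    def _discard_old(self, now):
        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
```

In the question's on_data callback, you could call meter.record() right after coll.insert(data) and log meter.rate() every few seconds. If the client-side rate drops at the same moment mongostat's insert column does, that at least narrows down when the stall begins.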