Random record from MongoDB

Posted 2018-12-31 04:39

I am looking to get a random record from a huge (100 million record) MongoDB collection.

What is the fastest and most efficient way to do so? The data is already there, and there is no field holding a random number that I could query to pick a random record.

Any suggestions?

Tags: mongodb
25 answers
公子世无双
Post #2 · 2018-12-31 05:02

If you have a simple id key, you could load all the ids into an array and then pick a random one (Ruby):

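# Loads every _id into memory, which is costly for very large collections
ids = @coll.find({}, fields: {_id: 1}).to_a
# Each element is a document hash containing _id, so it works directly as a filter
@coll.find(ids.sample).first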
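A minimal JavaScript (mongo shell) sketch of the same idea. The only real logic is choosing a uniformly random index: Math.floor(Math.random() * n) gives every position the same probability, whereas Math.round(Math.random() * (n - 1)) would make the first and last elements half as likely. The helper name pickRandom and the collection name in the usage comment are hypothetical. Like the Ruby version, this first loads every _id into memory, which is expensive for a collection of 100 million documents.

```javascript
// Return a uniformly random element of `items`, or undefined if it is empty.
function pickRandom(items) {
  if (items.length === 0) return undefined;
  // Math.random() is in [0, 1), so the index is in [0, items.length - 1].
  const index = Math.floor(Math.random() * items.length);
  return items[index];
}

// Hypothetical mongo shell usage (not executed here):
//   var ids = db.yourCollection.find({}, {_id: 1}).toArray();
//   db.yourCollection.findOne(pickRandom(ids));
```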
春风洒进眼中
Post #3 · 2018-12-31 05:03

Since MongoDB 3.2 you can use the $sample aggregation stage. For example, to get three random documents:

db.users.aggregate(
   [ { $sample: { size: 3 } } ]
)

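See the $sample documentation for details. Note that $sample may return the same document more than once in its result.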
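A common variation is sampling only from documents that match a filter. A minimal sketch, assuming you want the filter applied before sampling: put a $match stage ahead of $sample in the pipeline. The helper name samplePipeline and the active field in the usage comment are hypothetical.

```javascript
// Build an aggregation pipeline that returns `size` random documents,
// optionally restricted to documents matching `filter`.
function samplePipeline(size, filter) {
  const pipeline = [];
  if (filter && Object.keys(filter).length > 0) {
    // Filter first, so $sample only draws from matching documents.
    pipeline.push({ $match: filter });
  }
  pipeline.push({ $sample: { size: size } });
  return pipeline;
}

// Hypothetical mongo shell usage (not executed here):
//   db.users.aggregate(samplePipeline(1, { active: true }));
```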

深知你不懂我心
Post #4 · 2018-12-31 05:04

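Count all records, generate a random integer between 0 and count - 1, and then do:

db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()

Note that skip() has to walk past every skipped document, so it gets slow for large offsets on a collection of this size.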
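A minimal sketch of the offset step in JavaScript. The offset must lie in [0, count - 1], since skipping count documents returns nothing. The helper name randomSkip is hypothetical, and the usage comment assumes the shell's count() method and a collection called yourCollection.

```javascript
// Return a random integer offset in [0, count - 1] for use with skip().
function randomSkip(count) {
  if (!Number.isInteger(count) || count <= 0) {
    throw new RangeError("count must be a positive integer");
  }
  return Math.floor(Math.random() * count);
}

// Hypothetical mongo shell usage (not executed here):
//   var n = db.yourCollection.count();
//   db.yourCollection.find().limit(-1).skip(randomSkip(n)).next();
```

The count can go stale if documents are inserted or deleted between the two queries, so the offset is only approximately uniform on a collection that is being written to.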
深知你不懂我心
Post #5 · 2018-12-31 05:05

You can pick a random timestamp and find the first document created at or after that time. With the _id index this examines only a single document, but the distribution is not uniform: a document inserted right after a long gap is more likely to be picked than one inserted in a burst.

var randRec = function() {
    // replace with your collection
    var coll = db.collection;
    // creation time (ms since the epoch) of the first and last documents
    var min = coll.find().sort({_id: 1}).limit(1)[0]._id.getTimestamp() - 0;
    var max = coll.find().sort({_id: -1}).limit(1)[0]._id.getTimestamp() - 0;

    // optionally pass additional query conditions
    return function(query) {
        if (typeof query === 'undefined') query = {};
        // random time between the first and last document
        var randTime = Math.round(Math.random() * (max - min)) + min;
        // an ObjectId starts with 4 bytes (8 hex digits) of seconds since the epoch
        var hexSeconds = Math.floor(randTime / 1000).toString(16);
        // smallest ObjectId for that second
        var id = ObjectId(hexSeconds + "0000000000000000");
        query._id = {$gte: id};
        // sort on _id so the result is the first document at or after that time
        return coll.find(query).sort({_id: 1}).limit(1);
    };
}();
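The key step is turning a random time into an ObjectId that sorts at or before every document created in that second: the first 4 bytes (8 hex digits) of an ObjectId are the creation time in seconds, so appending 16 zero digits gives the smallest id for that second. A minimal sketch of just that computation; objectIdHexForTime and randomObjectIdHexBetween are hypothetical helper names, and padStart guards against timestamps that would otherwise produce fewer than 8 hex digits.

```javascript
// Build the 24-hex-digit string for the smallest ObjectId whose embedded
// timestamp is the given time (milliseconds since the Unix epoch).
function objectIdHexForTime(ms) {
  // ObjectIds store whole seconds in their first 4 bytes (8 hex digits).
  const seconds = Math.floor(ms / 1000);
  const hexSeconds = seconds.toString(16).padStart(8, "0");
  // The remaining 8 bytes (16 hex digits) are zeroed.
  return hexSeconds + "0000000000000000";
}

// Pick a random time in [minMs, maxMs] and return the matching ObjectId hex.
function randomObjectIdHexBetween(minMs, maxMs) {
  const randTime = Math.round(Math.random() * (maxMs - minMs)) + minMs;
  return objectIdHexForTime(randTime);
}

// Hypothetical mongo shell usage (not executed here):
//   var id = ObjectId(randomObjectIdHexBetween(min, max));
//   db.collection.find({_id: {$gte: id}}).sort({_id: 1}).limit(1);
```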
美炸的是我
Post #6 · 2018-12-31 05:05

If you're using Mongoid, the document-to-object mapper, you can do the following in Ruby (assuming your model is User):

User.all.to_a[rand(User.count)]

In my .irbrc I have:

def rando klass
    klass.all.to_a[rand(klass.count)]
end

so in the Rails console I can do, for example:

rando User
rando Article

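to get a random document from any collection. Note that to_a loads every document into memory, so this is only practical for small collections, not for one with 100 million records.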

有味是清欢
Post #7 · 2018-12-31 05:08

My solution in PHP:

/**
 * Get random docs from Mongo (legacy MongoCollection API)
 * @param MongoCollection $collection
 * @param array $where  query filter
 * @param array $fields projection
 * @param int|false $limit number of docs to return (false = all matching docs)
 * @author happy-code
 * @url happy-code.com
 */
private function _mongodb_get_random (MongoCollection $collection, $where = array(), $fields = array(), $limit = false) {

    // Total matching docs
    $count = $collection->find($where, $fields)->count();

    if (!$limit || $limit > $count) {
        // Never try to draw more docs than exist
        $limit = $count;
    }

    $data = array();
    for( $i = 0; $i < $limit; $i++ ) {

        // Random offset into the remaining matching docs (skip(0) is a no-op)
        $skip = rand(0, ($count-1) );
        $doc = $collection->find($where, $fields)->skip($skip)->limit(1)->getNext();

        if (is_array($doc)) {
            // Store the document, keyed by its id string
            $data[ $doc['_id']->{'$id'} ] = $doc;
            // Exclude this document from the next iteration
            $where['_id']['$nin'][] = $doc['_id'];
        }

        // One fewer candidate remains for the next draw
        $count--;

    }

    return $data;
}
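The PHP function draws documents one at a time and excludes each chosen _id from the next draw, which is sampling without replacement. A minimal in-memory sketch of the same loop in JavaScript, with an array standing in for the collection; sampleDistinct is a hypothetical name, and splice plays the role of the $nin filter. It caps k at the pool size so the loop cannot run past the available items.

```javascript
// Draw up to k distinct items at random, mirroring the PHP loop:
// pick a random offset into the remaining pool, keep that item, exclude it.
function sampleDistinct(items, k) {
  const pool = items.slice(); // copy so the caller's array is not modified
  const result = [];
  const draws = Math.min(k, pool.length);
  for (let i = 0; i < draws; i++) {
    // Like rand(0, $count - 1): an offset into the remaining pool.
    const skip = Math.floor(Math.random() * pool.length);
    // Removing the item here corresponds to adding its _id to $nin.
    result.push(pool.splice(skip, 1)[0]);
  }
  return result;
}
```

Each draw is still a separate skip query on the server, so when many random documents are needed from a large collection, a single $sample aggregation is usually cheaper.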