How to index unknown fields of a known field in PyMongo?

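I am trying to find the unique words in millions of tweets, and I also want to keep track of where each word occurs. On top of that, I group the words by their initial letter. Here is some sample code: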

from pymongo import MongoClient, UpdateOne

db = MongoClient().twitter  # connect to db stuff
commands = []
for word in words:  # this is actually not the real loop I've used but it fits for this example
    # assume tweet_id's and position is calculated here
    initial = word[0]
    ret = {"tweet_id": tweet_id, "pos": (beg, end)}  # additional information about word
    command = UpdateOne({"initial": initial},
                        {"$inc": {"count": 1}, "$push": {"words.%s" % word: ret}},
                        upsert=True)
    commands.append(command)
    if len(commands) == 1000:  # send the updates in batches of 1000
        db.tweet_words.bulk_write(commands, ordered=False)
        commands = []
if commands:  # flush the last, partial batch
    db.tweet_words.bulk_write(commands, ordered=False)

However, analyzing all the tweets this way is very slow. My guess is that the problem is that I am not using an index on the words field.
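To make the indexing problem concrete, here is a small pure-Python sketch (no database needed) that lists the field paths the `$push` in the loop above generates. The word list is hypothetical sample input, and `push_path` is a hypothetical helper that just repeats the loop's string formatting.

```python
def push_path(word):
    # Same formatting as the loop above: the word itself becomes part of the field path
    return "words.%s" % word


sample_words = ["the", "turkish", "e.g."]  # hypothetical sample input
paths = [push_path(w) for w in sample_words]
print(paths)  # ['words.the', 'words.turkish', 'words.e.g.']

# MongoDB reads every "." in an update path as a nesting level, so a word
# that itself contains dots is split into several nested field names
print([len(p.split(".")) for p in paths])  # [2, 2, 4]
```

Every distinct word becomes its own field path under `words`, so there is no single, fixed field name for an ordinary index to cover.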

Here is a sample document from the output:

{
    initial: "t",
    count: 3,
    words: {
        "the": [{"tweet_id": <some-tweet-id>, "pos": (2, 5)}, 
                {"tweet_id": <some-other-tweet-id>, "pos": (9, 12)}],
        "turkish": [{"tweet_id": <some-tweet-id>, "pos": (5, 11)}]
    }
}

I tried to create an index with the following code (without success):

db.tweet_words.create_index([("words.$**", pymongo.TEXT)])

or

db.tweet_words.create_index([("words", pymongo.HASHED)])

I get errors such as "add index fails, too many indexes for twitter.tweet_words" or "key too large to index". Is there a way to do this with an index, or should I change my approach to the problem (perhaps redesign the DB)?

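Answer 1:

To index it, you have to store your dynamic data (the words) in the values of object fields, not in the keys. So I suggest reworking your schema like this: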

{
    initial: "t",
    count: 3,
    words: [
        {value: "the", tweets: [{"tweet_id": <some-tweet-id>, "pos": (2, 5)}, 
                                {"tweet_id": <some-other-tweet-id>, "pos": (9, 12)}]},
        {value: "turkish", tweets: [{"tweet_id": <some-tweet-id>, "pos": (5, 11)}]}
    ]
}
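If documents already exist in the question's old shape, they could be converted to the array layout above with a plain dictionary transformation. This is a minimal sketch under that assumption: `to_array_schema` is a hypothetical helper, and the tweet ids are placeholder integers standing in for `<some-tweet-id>`.

```python
def to_array_schema(doc):
    # Copy every field except "words" unchanged (e.g. _id, initial, count)
    new_doc = {key: value for key, value in doc.items() if key != "words"}
    # Each dynamic key becomes the "value" field of an array element
    new_doc["words"] = [
        {"value": word, "tweets": list(occurrences)}
        for word, occurrences in doc["words"].items()
    ]
    return new_doc


old_doc = {
    "initial": "t",
    "count": 3,
    "words": {
        "the": [{"tweet_id": 1, "pos": (2, 5)}, {"tweet_id": 2, "pos": (9, 12)}],
        "turkish": [{"tweet_id": 1, "pos": (5, 11)}],
    },
}
new_doc = to_array_schema(old_doc)
print([w["value"] for w in new_doc["words"]])  # ['the', 'turkish']
```

With PyMongo, each converted document could then be written back with something like `db.tweet_words.replace_one({"_id": doc["_id"]}, to_array_schema(doc))`, since the helper keeps `_id`.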

Then you can create the index:

db.tweet_words.create_index([("words.value", pymongo.TEXT)])
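Writing to the new schema takes a little more work than the original single `$push`, because the word now has to be matched inside the array. One common pattern, sketched here as an assumption rather than as the answer's own code, uses conditional updates: make sure the per-initial document exists, append to an existing word entry through the positional `$` operator, and only if nothing matched, append a new word entry guarded by `$ne` so the same word is not added twice. The helper names are hypothetical; they only build the filter and update documents.

```python
def ensure_initial_doc(initial):
    # Use with upsert=True: creates an empty document for this initial only if
    # none exists yet; $setOnInsert leaves an existing document untouched
    query = {"initial": initial}
    update = {"$setOnInsert": {"count": 0, "words": []}}
    return query, update


def push_to_existing_word(initial, word, occurrence):
    # Matches only if the word is already in the "words" array;
    # "words.$" then refers to the matched array element
    query = {"initial": initial, "words.value": word}
    update = {"$inc": {"count": 1}, "$push": {"words.$.tweets": occurrence}}
    return query, update


def add_new_word(initial, word, occurrence):
    # Matches only if no array element has this value yet, so the word is
    # never appended twice; do NOT upsert here, or a second document with
    # the same initial could be created
    query = {"initial": initial, "words.value": {"$ne": word}}
    update = {"$inc": {"count": 1},
              "$push": {"words": {"value": word, "tweets": [occurrence]}}}
    return query, update
```

With PyMongo, one would call `db.tweet_words.update_one(*push_to_existing_word(...))`, check `matched_count` on the returned `UpdateResult`, and fall back to `add_new_word` when it is 0. Note that a text index on `words.value` is only used by `$text` queries; an equality filter such as `{"words.value": word}` is served by an ordinary (multikey) index, for example `create_index([("initial", pymongo.ASCENDING), ("words.value", pymongo.ASCENDING)])`.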
