Input: a positive integer K and a large body of text. The text can be viewed as a word sequence, so we don't have to worry about how to break it down into words.
Output: The most frequent K words in the text.
My thinking is as follows.
1) Use a hash table to record every word's frequency while traversing the whole word sequence. In this phase, the key is the word and the value is its frequency. This takes O(n) time.
2) Sort the (word, word-frequency) pairs with "word-frequency" as the key. This takes O(n*lg(n)) time with a normal sorting algorithm.
3) After sorting, just take the first K words. This takes O(K) time.
To summarize, the total time is O(n + n*lg(n) + K). Since K is surely smaller than n, this is O(n*lg(n)). A sketch of this approach follows below.
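For reference, here is a minimal Python sketch of this hash-table-plus-sort approach (the function name top_k_by_sorting and the sample sentence are my own, not from the question):

```python
from collections import Counter

def top_k_by_sorting(words, k):
    # Step 1: count frequencies in a hash table -- O(n)
    counts = Counter(words)
    # Steps 2-3: sort pairs by frequency and take the first K -- O(n*lg(n))
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]

print(top_k_by_sorting("the cat sat on the mat the cat".split(), 2))  # ['the', 'cat']
```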
We can improve this. We only want the top K words; the frequencies of the other words do not concern us. So we can use "partial heap sorting". Instead of fully sorting in steps 2) and 3), we change them to:
2') Build a heap of (word, word-frequency) pairs with "word-frequency" as the key. It takes O(n) time to build the heap;
3') Extract the top K words from the heap. Each extraction is O(lg(n)), so the total time is O(K*lg(n)).
To summarize, this solution costs O(n + K*lg(n)) time. A sketch is given below.
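A rough sketch of the heap variant, assuming Python's heapq module (a min-heap, so frequencies are negated to pop the largest first); the function name is my own:

```python
import heapq
from collections import Counter

def top_k_by_heap(words, k):
    # Step 1: count frequencies -- O(n)
    counts = Counter(words)
    # Step 2': heapify all (negated frequency, word) pairs -- O(n)
    heap = [(-freq, word) for word, freq in counts.items()]
    heapq.heapify(heap)
    # Step 3': pop the K most frequent words -- O(K*lg(n))
    return [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]

print(top_k_by_heap("a b a c a b".split(), 2))  # ['a', 'b']
```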
This is just my thought. I haven't found a way to improve step 1).
I hope some information retrieval experts can shed more light on this question.
Simplest code to get the occurrence count of the most frequently used word:
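For example, a minimal Python version using collections.Counter (the sample sentence is made up for illustration):

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox"
word, count = Counter(text.split()).most_common(1)[0]
print(word, count)  # 'the' 3
```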