I am working on an application that has a large 2D array containing lines of numbers:

    int[][] transNum = new int[20000][200]; // the numbers, one line per row, so line numbers are preserved

I am using a nested loop to look for the most frequent items:
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            int shows = 1;  // reset for each item; counts the current occurrence itself
            for (int t = i + 1; t < lineCounter; t++) {
                for (int s = 0; s < lineitem1[t]; s++) {
                    if (transNum[i][j] == transNum[t][s])
                        shows++;
                }
            }
            // cast to double: integer division would truncate shows/lineCounter to 0
            if ((double) shows / lineCounter >= 0.2) {
                freItem[i][lineitem2[i]] = transNum[i][j];
                lineitem2[i]++;
            }
        }
    }
When I ran tests using small input arrays like test[200][200], this loop worked fine and the computing time was acceptable, but when I try to process an array containing 12,000 lines the computing time is far too long. So I am wondering whether there are other ways to compute the frequent items rather than using this loop. I just ran a test on 10,688 lines, and it took 825,805 ms to find all the frequent items, which is way too expensive.
It depends on your input. If you are also inserting the data in the same code, then you can count frequent items as you insert them.
Here is a pseudo-C solution:
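A minimal sketch of that idea, written here in Java; MAX_ITEM is a placeholder for one past the largest possible item value, which is an assumption about your data:

    // Keep a running count per value and update it at insertion time,
    // so no nested scan over the lines is needed afterwards.
    class ItemCounter {
        static final int MAX_ITEM = 1_000_000;   // placeholder bound on item values
        final int[] counts = new int[MAX_ITEM];  // Java zero-initializes; in C, zero it explicitly

        void insert(int value) {
            counts[value]++;                     // O(1) per inserted item
        }

        // An item is "frequent" once its count reaches 20% of the line count.
        boolean isFrequent(int value, int lineCounter) {
            return (double) counts[value] / lineCounter >= 0.2;
        }
    }

Each insert is O(1) and the frequency check is a single array lookup; if the item values are sparse or unbounded, a HashMap<Integer, Integer> can replace the plain array.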
EDIT #2: To avoid unexpected results, make sure to initialize all the items in the count array to zero.
I gave the algorithm for this one some thought. Bear in mind that what I came up with is an O(n^2) algorithm at best, and it could be worse: the number of operations is proportional to the square of the item count. Past a certain number of lines, performance degrades rapidly, and there is nothing you can do about it except improve the algorithm.
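One concrete way to improve the algorithm is to count every occurrence in a single pass with a hash map, which is linear in the total number of items rather than quadratic in the lines. This is only a sketch, not the solution above; it assumes the question's transNum, lineCounter, and lineitem1:

    import java.util.HashMap;
    import java.util.Map;

    // One pass over all items: count how many times each distinct value occurs.
    Map<Integer, Integer> counts = new HashMap<>();
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            counts.merge(transNum[i][j], 1, Integer::sum);
        }
    }
    // Report every value whose count reaches 20% of the line count.
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
        if ((double) e.getValue() / lineCounter >= 0.2) {
            System.out.println(e.getKey() + " appears " + e.getValue() + " times");
        }
    }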
The Multiset implementation from the Google Guava project might be useful in such cases. You could store the items there and then retrieve the set of values with the count of each occurrence.
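A short sketch of that approach with Guava's HashMultiset, assuming the items are boxed as Integer and reusing the question's transNum, lineCounter, and lineitem1:

    import com.google.common.collect.HashMultiset;
    import com.google.common.collect.Multiset;
    import com.google.common.collect.Multisets;

    // Add every number; the multiset keeps a count per distinct value.
    Multiset<Integer> items = HashMultiset.create();
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            items.add(transNum[i][j]);
        }
    }
    // copyHighestCountFirst orders entries by descending count,
    // so iteration can stop at the first entry below the threshold.
    for (Multiset.Entry<Integer> e : Multisets.copyHighestCountFirst(items).entrySet()) {
        if (e.getCount() < 0.2 * lineCounter) break;
        System.out.println(e.getElement() + " appears " + e.getCount() + " times");
    }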