I am working on an application that has a large 2D array containing lines of numbers:

    int[][] transNum = new int[20000][200]; // the numbers, one line per row, so line numbers are preserved

I am using a nested loop to look for the most frequent items:
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            int shows = 1;  // reset for each item; counts the current occurrence itself
            for (int t = i + 1; t < lineCounter; t++) {
                for (int s = 0; s < lineitem1[t]; s++) {
                    if (transNum[i][j] == transNum[t][s])
                        shows++;
                }
            }
            // cast to double: integer division would truncate shows/lineCounter to 0
            if ((double) shows / lineCounter >= 0.2) {
                freItem[i][lineitem2[i]] = transNum[i][j];
                lineitem2[i]++;
            }
        }
    }
When I ran tests using small input arrays like test[200][200], this loop worked fine and the computing time was acceptable, but when I try to process an array containing 12,000 lines the computing time is far too long. So I am wondering whether there are other ways to compute the frequent items rather than using this loop. I just ran a test on 10,688 lines, and it took 825,805 ms to find all the frequent items, which is way too expensive.
It depends on your input. If you are also inserting the data in the same code, then you can count frequent items as you insert them.
Here is a pseudo-C solution:
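A minimal sketch of that idea, written here in Java; MAX_ITEM is a placeholder for one past the largest possible item value, which is an assumption about your data:

    // Keep a running count per value and update it at insertion time,
    // so no nested scan over the lines is needed afterwards.
    class ItemCounter {
        static final int MAX_ITEM = 1_000_000;   // placeholder bound on item values
        final int[] counts = new int[MAX_ITEM];  // Java zero-initializes; in C, zero it explicitly

        void insert(int value) {
            counts[value]++;                     // O(1) per inserted item
        }

        // An item is "frequent" once its count reaches 20% of the line count.
        boolean isFrequent(int value, int lineCounter) {
            return (double) counts[value] / lineCounter >= 0.2;
        }
    }

Each insert is O(1) and the frequency check is a single array lookup; if the item values are sparse or unbounded, a HashMap<Integer, Integer> can replace the plain array.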
EDIT #2: To avoid unexpected results, make sure to initialize all the items in the count array to zero.
I gave the algorithm for this one some thought. Bear in mind that what I came up with is an O(n^2) algorithm at best, and it could be worse: the number of operations is proportional to the square of the item count. Past a certain number of lines, performance degrades rapidly, and there is nothing you can do about it except improve the algorithm.
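One concrete way to improve the algorithm is to count every occurrence in a single pass with a hash map, which is linear in the total number of items rather than quadratic in the lines. This is only a sketch, not the solution above; it assumes the question's transNum, lineCounter, and lineitem1:

    import java.util.HashMap;
    import java.util.Map;

    // One pass over all items: count how many times each distinct value occurs.
    Map<Integer, Integer> counts = new HashMap<>();
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            counts.merge(transNum[i][j], 1, Integer::sum);
        }
    }
    // Report every value whose count reaches 20% of the line count.
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
        if ((double) e.getValue() / lineCounter >= 0.2) {
            System.out.println(e.getKey() + " appears " + e.getValue() + " times");
        }
    }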
The Multiset implementation from the Google Guava project might be useful in such cases. You could store the items there and then retrieve the set of values with the count of each occurrence.
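A short sketch of that approach with Guava's HashMultiset, assuming the items are boxed as Integer and reusing the question's transNum, lineCounter, and lineitem1:

    import com.google.common.collect.HashMultiset;
    import com.google.common.collect.Multiset;
    import com.google.common.collect.Multisets;

    // Add every number; the multiset keeps a count per distinct value.
    Multiset<Integer> items = HashMultiset.create();
    for (int i = 0; i < lineCounter; i++) {
        for (int j = 0; j < lineitem1[i]; j++) {
            items.add(transNum[i][j]);
        }
    }
    // copyHighestCountFirst orders entries by descending count,
    // so iteration can stop at the first entry below the threshold.
    for (Multiset.Entry<Integer> e : Multisets.copyHighestCountFirst(items).entrySet()) {
        if (e.getCount() < 0.2 * lineCounter) break;
        System.out.println(e.getElement() + " appears " + e.getCount() + " times");
    }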