So lets say i have a struct
struct largestOwners
{
string name;
double amountOwned;
};
And i am reading it in from a file using ifstream with 300 names and amounts.
How can i go about keeping track of the highest 5 numbers during input? So i dont have to sort after, but rather track it during ifstream input.
My goal is to keep track of the 5 highest amounts during input so i can easily print it out later. And save time/processing rather than to do it in the future
I get i can store this in an array or another struct, but whats a good algorithm for tracking this during ifstream input to the struct?
Lets say the text file looks like this, when im reading it in.
4025025 Tony
66636 John
25 Tom
23693296 Brady
363 Bradley
6200 Tim
Thanks!
You can use sorted buffer of 5 elements and on each step if item is higher than lowest item of the buffer, put item in buffer and evict lowest
Use a
map
of elementsFirst Create a class
If the size of largest is < 5, then you haven't read five elements.
Otherwise - if it is larger than the smallest of the largest five, then the largest five has changed.
A binary search tree could be a suitable data structure for this problem. Maybe you can find a suitable Tree class in STL or Boost or so (try to look for that). Otherwise simply use a struct if you insist.
The struct would be like that:
Taken from The C Programming Language, chapter 6.5 - Self-referential Structures. Just adapt it to your needs.
Though, I think if you want to program in C++ properly, try to create a Tree data structure (class) or try to use an existing one.
Considering that you only have 300 entries, that should do it. In theory when the input data is random it is supposed to be efficient. But that is theory and does not really play a role in your case. I think it is a good solution.
You can use a
set
to keep track of the highest values. If you want to track non-unique numbers use amultiset
instead:Of course you will have to provide a custom comparison function for your
struct
.You can use the standard library function
std::nth_element()
for this.It should be fairly easy to implement a comparison function (or overload the comparison operator) for your struct. Then you'd just parse the file into a vector of those and be done with it. The algorithm uses a partial sort, linear time on average.
Here's the example given on the documentation site I've linked below:
For reference:
Out of curiosity, I have implemented some approaches:
And I have made up some tests:
And I found it quite interesting to see the relative performance of these approaches (compiled with -O3 - I think I have to think a bit about these results):
To keep track of the highest 5 numbers in a stream of incoming numbers, you could use a min-heap of size 5 (C++ STL
set
can be used as a min-heap).First fill the min-heap with the first 5 numbers. After that, for each incoming element, compare that with the smallest of the largest 5 numbers that you have (root of the min-heap). If the current number is smaller than that, do nothing, otherwise remove the 5th largest (pop from min-heap) and insert the current number to the min-heap.
Deleting and inserting in the min-heap will take O(log n) time.
For example, consider the following stream of numbers:
The min-heap will have
1 2 3 5 6
initially.On encountering
4
, 1 gets removed and 4 gets inserted. Min heap now looks like this:2 3 4 5 6
On encountering
0
, nothing happens.On encountering
10
, 2 gets removed and 10 gets inserted. Min heap now looks like this:3 4 5 6 10
On encountering
3
, nothing happens.So your final set of 5 largest elements are contained in the heap (
3 4 5 6 10
)You can even tweak this to keep track of the k highest elements in an incoming stream of numbers. Just change the size of the min-heap to k.