Compute Median of Values Stored In Vector - C++?

Posted 2019-01-06 14:39

Question:

I'm a programming student, and for a project I'm working on, one of the things I have to do is compute the median value of a vector of int values. I'm to do this using only the sort function from the STL and vector member functions such as .begin(), .end(), and .size().

I'm also supposed to make sure I find the median whether the vector has an odd number of values or an even number of values.

And I'm stuck. Below I have included my attempt. Where am I going wrong? I would appreciate it if you could give me some pointers or resources to get me going in the right direction.

Code:

int CalcMHWScore(const vector<int>& hWScores)
{
     const int DIVISOR = 2;
     double median;
     sort(hWScores.begin(), hWScores.end());
     if ((hWScores.size() % DIVISOR) == 0)
     {
         median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);
     }
     else 
     {
       median = ((hWScores.begin() + hWScores.size()) / DIVISOR)
     }

    return median;
}

Thanks!!

Answer 1:

You are doing an extra division and overall making it a bit more complex than it needs to be. Also, there's no need to create a DIVISOR when 2 is actually more meaningful in context.

double CalcMHWScore(vector<int> scores)
{
  size_t size = scores.size();

  if (size == 0)
  {
    return 0;  // Undefined, really.
  }
  else
  {
    sort(scores.begin(), scores.end());
    if (size % 2 == 0)
    {
      return (scores[size / 2 - 1] + scores[size / 2]) / 2.0;  // 2.0 avoids integer division
    }
    else 
    {
      return scores[size / 2];
    }
  }
}


Answer 2:

There is no need to completely sort the vector: std::nth_element can do enough work to put the median in the correct position. See my answer to this question for an example.

Of course, that doesn't help if your teacher forbids using the right tool for the job.
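
As a rough sketch of the std::nth_element approach, assuming a non-empty vector of int: the helper name MedianNth is made up for this illustration, and the even-sized case is handled by pairing the selected element with the largest element of the lower half.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the thread). Takes the vector by value so the
// caller's data keeps its original order.
double MedianNth(std::vector<int> v)
{
    assert(!v.empty());  // the median of an empty vector is undefined

    const auto mid = v.begin() + v.size() / 2;
    // After this call *mid is the value a full sort would place there,
    // and no element before mid is greater than *mid.
    std::nth_element(v.begin(), mid, v.end());

    if (v.size() % 2 != 0)
        return *mid;

    // Even count: the lower middle value is the largest element in [begin, mid).
    const int lower = *std::max_element(v.begin(), mid);
    return (lower + *mid) / 2.0;  // 2.0 avoids integer division
}
```

Calling MedianNth({4, 1, 3, 2}) should give 2.5 under these assumptions. The selection step is linear on average, whereas a full sort is O(n log n).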



Answer 3:

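The following is a simple function that will return the median of a set of values using input iterators. It will not modify the original dataset, at the cost of allocating memory. Note that for an even number of elements it returns the upper of the two middle values rather than their average.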

// Get the median of an unordered set of numbers of arbitrary 
// type without modifying the underlying dataset.
template <typename It>
auto Median(It begin, It end)
{
    using T = typename std::iterator_traits<It>::value_type;
    std::vector<T> data(begin, end);
    std::nth_element(data.begin(), data.begin() + data.size() / 2, data.end());
    return data[data.size() / 2];
}

If you want to avoid the cost of allocating a copy of the dataset and are willing to modify the underlying dataset, you can use this instead:

// Get the median of an unordered set of numbers of arbitrary 
// type (this will modify the underlying dataset).
template <typename It>
auto Median(It begin, It end)
{
    const auto size = std::distance(begin, end);
    std::nth_element(begin, begin + size / 2, end);
    return *std::next(begin, size / 2);
}
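
A small usage sketch of the copying variant may help show what it accepts and what it returns. The names MedianCopy and MedianOfList are invented for this example, the return type is spelled out so the block also compiles as C++11, and the range is assumed to be non-empty.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Restatement of the copying variant with an explicit return type.
// Assumes the range [begin, end) is not empty.
template <typename It>
typename std::iterator_traits<It>::value_type MedianCopy(It begin, It end)
{
    using T = typename std::iterator_traits<It>::value_type;
    std::vector<T> data(begin, end);  // work on a copy of the input
    std::nth_element(data.begin(), data.begin() + data.size() / 2, data.end());
    return data[data.size() / 2];
}

// Hypothetical wrapper: std::list only offers bidirectional iterators, which
// is fine here because nth_element runs on the internal vector copy.
int MedianOfList(const std::list<int>& values)
{
    return MedianCopy(values.begin(), values.end());
}
```

With the values 4, 1, 3, 2 this returns 3, the upper of the two middle values. The in-place variant, by contrast, needs random-access iterators because it passes the caller's iterators straight to std::nth_element.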


Answer 4:

const int DIVISOR = 2;

Don't do this. It just makes your code more convoluted. You've probably read guidelines about not using magic numbers, but evenness vs. oddness of numbers is a fundamental property, so abstracting this out provides no benefit but hampers readability.

if ((hWScores.size() % DIVISOR) == 0)
{
    median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);

You're taking an iterator to the end of the vector, taking another iterator that points one past the end of the vector, adding the iterators together (which isn't an operation that makes sense), and then dividing the resulting iterator (which also doesn't make sense). This is the more complicated case; I'll explain what to do for the odd-sized vector first and leave the even-sized case as an exercise for you.

}
else 
{
    median = ((hWScores.begin() + hWScores.size()) / DIVISOR)

Again, you're dividing an iterator. What you instead want to do is to increment an iterator to the beginning of the vector by hWScores.size() / 2 elements:

    median = *(hWScores.begin() + hWScores.size() / 2);

And note that you have to dereference iterators to get values out of them. It'd be more straightforward if you used indices:

    median = hWScores[hWScores.size() / 2];
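
To make the iterator arithmetic concrete, here is a minimal sketch for the odd-sized case only. It assumes the vector has already been sorted and is not empty; the helper names MiddleByIterator and MiddleByIndex are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: middle element via iterator arithmetic.
int MiddleByIterator(const std::vector<int>& sorted)
{
    assert(!sorted.empty());
    // begin() + n advances the iterator by n elements; * reads the value there.
    return *(sorted.begin() + sorted.size() / 2);
}

// Hypothetical helper: the same element via indexing.
int MiddleByIndex(const std::vector<int>& sorted)
{
    assert(!sorted.empty());
    return sorted[sorted.size() / 2];
}
```

Both helpers read the same element, so for the sorted values 1, 3, 5, 7, 9 each returns 5.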


Answer 5:

I give below a sample program that is somewhat similar to the one in Max S.'s response. To help the OP advance his knowledge and understanding, I have made a number of changes. I have:

a) changed the call by const reference to call by value, since sort is going to want to change the order of the elements in your vector (EDIT: I just saw that Rob Kennedy also said this while I was preparing my post),

b) replaced size_t with the more appropriate vector<int>::size_type (actually, a convenient synonym of the latter),

c) saved size/2 to an intermediate variable,

d) thrown an exception if the vector is empty, and

e) introduced the conditional operator (? :).

Actually, all of these corrections are straight out of Chapter 4 of "Accelerated C++" by Koenig and Moo.

double median(vector<int> vec)
{
        typedef vector<int>::size_type vec_sz;

        vec_sz size = vec.size();
        if (size == 0)
                throw domain_error("median of an empty vector");

        sort(vec.begin(), vec.end());

        vec_sz mid = size/2;

        return size % 2 == 0 ? (vec[mid] + vec[mid-1]) / 2.0 : vec[mid];
}
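
Because this version throws on an empty vector, the caller has to decide what to do with the exception. The sketch below restates the function with std:: qualifiers, dividing by 2.0 so that int inputs keep their fractional part, and adds a caller named TryMedian, which is a hypothetical name for this illustration only.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Restatement of median() so the block stands alone.
double median(std::vector<int> vec)
{
    typedef std::vector<int>::size_type vec_sz;

    vec_sz size = vec.size();
    if (size == 0)
        throw std::domain_error("median of an empty vector");

    std::sort(vec.begin(), vec.end());

    vec_sz mid = size / 2;

    // Dividing by 2.0 keeps the .5 when the two middle values sum to an odd number.
    return size % 2 == 0 ? (vec[mid] + vec[mid - 1]) / 2.0 : vec[mid];
}

// Hypothetical caller: reports failure through the return value instead of
// letting the exception escape.
bool TryMedian(const std::vector<int>& values, double& result)
{
    try {
        result = median(values);
        return true;
    } catch (const std::domain_error&) {
        return false;  // empty input, result is left untouched
    }
}
```

Whether to catch the exception this close to the call or let it propagate is a design choice; the wrapper is just one way to show the failure path.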


Answer 6:

I'm not exactly sure what your restrictions on the use of member functions of vector are, but index access with [] or at() would make accessing elements simpler:

median = hWScores.at(hWScores.size() / 2);

You can also work with iterators like begin() + offset like you are currently doing, but then you need to first calculate the correct offset with size()/2 and add that to begin(), not the other way around. Also you need to dereference the resulting iterator to access the actual value at that point:

median = *(hWScores.begin() + hWScores.size()/2);
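
One practical difference between the two access styles is bounds checking. The following is a minimal sketch, assuming an already sorted vector; MiddleChecked is a hypothetical name.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: middle element of an already sorted vector.
// at() checks the index and throws std::out_of_range when the vector is
// empty, whereas operator[] and iterator dereference would be undefined
// behaviour in that case.
int MiddleChecked(const std::vector<int>& sorted)
{
    return sorted.at(sorted.size() / 2);
}
```

For an empty vector size() / 2 is 0, which is not a valid index, so at() reports the problem instead of reading past the end.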