Max_element with boost directory_iterator

Posted 2019-08-10 08:36

Question:

I am using Boost.Filesystem and std::max_element() to find the file with the longest name in a given directory:

#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <boost/filesystem.hpp>
using namespace std;
using namespace boost::filesystem;

bool size_comp( directory_entry de1, directory_entry de2  )
{
    return de1.path().string().size() < de2.path().string().size(); 
}

int main(int argc, char* argv[])
{
    path p (argv[1]);   // p is a path to a directory

    directory_iterator itr (p); 
    directory_iterator itr_end; 
    directory_iterator itr_max=::max_element(itr,itr_end,size_comp);
    int max_size = itr_max->path().string().size();
    cout << "Longest file name: " << itr_max->path() << " has " 
         << max_size << " characters" << endl; 
    return 0;
}

For the directory Animals with files cat.dat, mouse.dat, elephant.dat the output is:

Longest file name: Animals/mouse.dat has 17 characters

This is neither the longest nor the shortest filename. What's wrong with the code above?

Answer 1:

Copies of a boost::filesystem::directory_iterator share state. The implementation notes say that the underlying state is managed by boost::shared_ptr to allow the shallow-copy semantics required for InputIterators. std::max_element, however, requires ForwardIterators, because it keeps a copy of the iterator to the best element seen so far while it advances first. With directory_iterator that saved copy shares state with first, so when the algorithm iterates over [first,last), the result iterator is indirectly modified with each increment of first.
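
The shared state can be modelled without Boost. The sketch below is a toy class, not Boost code: shared_cursor and saved_copy_after_increment are names invented for this illustration, and a C++11 compiler is assumed for std::make_shared. Copies of the cursor share one position through std::shared_ptr, so advancing one copy also moves a previously saved copy, which is the situation std::max_element ends up in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy cursor: NOT Boost code. It only mimics a design in
// which every copy shares one position, and it is deliberately not a
// complete iterator (no comparison, no post-increment).
class shared_cursor
{
public:
  explicit shared_cursor(const std::vector<std::string>& items)
    : items_(&items), pos_(std::make_shared<std::size_t>(0)) {}

  const std::string& operator*() const { return (*items_)[*pos_]; }

  // Advancing one copy advances every copy, because pos_ is shared.
  shared_cursor& operator++() { ++*pos_; return *this; }

private:
  const std::vector<std::string>* items_;
  std::shared_ptr<std::size_t>    pos_;
};

// Returns what a saved copy refers to after the original is advanced once.
std::string saved_copy_after_increment(const std::vector<std::string>& items)
{
  assert(items.size() >= 2);   // the sketch needs room for one increment
  shared_cursor first(items);
  shared_cursor saved = first; // like max_element's "best so far" copy
  ++first;                     // also moves saved
  return *saved;
}
```

Called with the three names from the question in the order cat.dat, mouse.dat, elephant.dat, saved_copy_after_increment returns "mouse.dat" rather than "cat.dat": the saved copy no longer refers to the element it was taken from.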

To resolve this, store boost::filesystem::directory_entry values rather than iterators for the duration of the algorithm. For example, one could construct a std::vector<directory_entry> from a directory_iterator range, then pass the vector to std::max_element. Alternatively, it may be easier to write the algorithm by hand.
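
The hand-written alternative can be sketched generically for any single-pass range. This is a minimal sketch, not a library facility: max_value, shorter and longest_word are hypothetical names, and the init parameter is an assumption made so that an empty range has a defined result. It keeps a copy of the best element rather than an iterator to it, and is exercised here on std::istream_iterator, another input iterator, so that it runs without Boost.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Boost or the standard library).
// Single-pass maximum that stores the best VALUE instead of an iterator,
// so it is safe for input iterators whose copies share state.
// `init` is returned unchanged if no element compares greater than it.
template <typename InputIt, typename T, typename Compare>
T max_value(InputIt first, InputIt last, T init, Compare comp)
{
  T best = init;
  for (; first != last; ++first)
  {
    if (comp(best, *first))
      best = *first; // copy the element, not the iterator
  }
  return best;
}

// Orders strings by length, in the same spirit as size_comp.
bool shorter(const std::string& lhs, const std::string& rhs)
{
  return lhs.size() < rhs.size();
}

// Usage with a genuinely single-pass range: words read from a stream.
std::string longest_word(const std::string& text)
{
  std::istringstream in(text);
  return max_value(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::string(), shorter);
}
```

With a directory_iterator range, the initial value would be a default-constructed directory_entry and the comparator a function such as size_comp. Because the comparison is strict, the first of several equally long elements is kept, which matches the tie-breaking of std::max_element.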

Here is a complete example showing both approaches, operating on the current directory.

#include <algorithm> // std::copy, std::max_element
#include <iterator>  // std::back_inserter
#include <iostream>  // std::cout, std::endl
#include <vector>
#include <utility>   // std::make_pair

#include <boost/filesystem.hpp>
#include <boost/foreach.hpp>

namespace fs = boost::filesystem;

bool size_comp(const fs::directory_entry& lhs,
               const fs::directory_entry& rhs)
{
  return lhs.path().string().size() < rhs.path().string().size();
}

/// @brief Finds max by copying all directory entries.
fs::directory_entry max_full_copy(
  fs::directory_iterator first,
  fs::directory_iterator last)
{
  // Extract directory_entries from directory_iterator.
  std::vector<fs::directory_entry> entries;
  std::copy(first, last, std::back_inserter(entries));
  // Find max element.
  return *std::max_element(entries.begin(), entries.end(), &size_comp);
}

/// @brief Finds max by only storing a copy of the max entry.
fs::directory_entry max_single_copy(
  fs::directory_iterator first,
  fs::directory_iterator last)
{
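  // A default-constructed entry has an empty path, so any real entry beats it.
  fs::directory_entry result;
  // BOOST_FOREACH accepts a std::pair of iterators as a range.
  BOOST_FOREACH(fs::directory_entry& current, std::make_pair(first, last))
  {
    // Copy the entry itself; a saved iterator would change as the loop advances.
    if (size_comp(result, current))
      result = current;
  }
  return result;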
}

int main()
{
  std::cout << max_full_copy(fs::directory_iterator("."),
                             fs::directory_iterator()) << "\n"
            << max_single_copy(fs::directory_iterator("."),
                               fs::directory_iterator()) << std::endl;
}

And an example run with output:

[tsansbury@localhost tmp]$ ls
file_four  file_one  file_three  file_two
[tsansbury@localhost tmp]$ ../a.out 
"./file_three"
"./file_three"