How to select unique elements

2019-01-01 07:29发布

问题:

I would like to extend the Array class with a uniq_elements method which returns those elements with multiplicity of one. I also would like to use closures to my new method as with uniq. For example:

t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]

Example with closure:

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]

Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don\'t care of speed, I call it only once in my program, so it can be a slow.

回答1:

Helper method

This method uses the helper:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

This method is similar to Array#-. The difference is illustrated in the following example:

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

a - b              #=> [1]
c = a.difference b #=> [1, 3, 2, 2] 

As you see, a contains three 3\'s and b contains two, so the first two 3\'s in a are removed in constructing c (a is not mutated). When b contains as least as many instances of an element as does a, c contains no instances of that element. To remove elements beginning at the end of a:

a.reverse.difference(b).reverse #=> [3, 1, 2, 2]

Array#difference! could be defined in the obvious way.

I have found many uses for this method: here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here, here and here.

I have proposed that this method be added to the Ruby core.

When used with Array#-, this method makes it easy to extract the unique elements from an array a:

a = [1,3,2,4,3,4]
u = a.uniq          #=> [1, 2, 3, 4]
u - a.difference(u) #=> [1, 2]

This works because

a.difference(u)     #=> [3,4]    

contains all the non-unique elements of a (each possibly more than once).

Problem at Hand

Code

class Array
  def uniq_elements(&prc)
    prc ||= ->(e) { e }
    a = map { |e| prc[e] }
    u = a.uniq
    uniques = u - a.difference(u)
    select { |e| uniques.include?(prc[e]) ? (uniques.delete(e); true) : false }
  end
end

Examples

t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements
  #=> [1,3,5,6,8]

t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round }
  # => [2.0, 5.1]


回答2:

Here\'s another way.

Code

require \'set\'

class Array
  def uniq_elements(&prc)
    prc ||= ->(e) { e }
    uniques, dups = {}, Set.new
    each do |e|
      k = prc[e]
      ((uniques.key?(k)) ? (dups << k; uniques.delete(k)) :
          uniques[k] = e) unless dups.include?(k)
    end
    uniques.values
  end
end

Examples

t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements #=> [1,3,5,6,8]

t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round } # => [2.0, 5.1]

Explanation

  • if uniq_elements is called with a block, it is received as the proc prc.
  • if uniq_elements is called without a block, prc is nil, so the first statement of the method sets prc equal to the default proc (lambda).
  • an initially-empty hash, uniques, contains representations of the unique values. The values are the unique values of the array self, the keys are what is returned when the proc prc is passed the array value and called: k = prc[e].
  • the set dups contains the elements of the array that have found to not be unique. It is a set (rather than an array) to speed lookups. Alternatively, if could be a hash with the non-unique values as keys, and arbitrary values.
  • the following steps are performed for each element e of the array self:
    • k = prc[e] is computed.
    • if dups contains k, e is a dup, so nothing more needs to be done; else
    • if uniques has a key k, e is a dup, so k is added to the set dups and the element with key k is removed from uniques; else
    • the element k=>e is added to uniques as a candidate for a unique element.
  • the values of unique are returned.


回答3:

class Array
  def uniq_elements
    counts = Hash.new(0)

    arr = map do |orig_val|
      converted_val =  block_given? ? (yield orig_val) : orig_val
      counts[converted_val] += 1
      [converted_val, orig_val]
    end

    uniques = []

    arr.each do |(converted_val, orig_val)|
      uniques << orig_val if counts[converted_val] == 1
    end

    uniques
  end
end

t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p  t.uniq_elements { |elmt| elmt.round }

--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]

Array#uniq does not find non-duplicated elements, rather Array#uniq removes duplicates.



回答4:

class Array
  def uniq_elements
    zip( block_given? ? map { |e| yield e } : self )
      .each_with_object Hash.new do |(e, v), h| h[v] = h[v].nil? ? [e] : false end
      .values.reject( &:! ).map &:first
  end
end

[1,2,2,3,4,4,5,6,7,7,8,9,9,9].uniq_elements #=> [1, 3, 5, 6, 8]
[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements &:round #=> [2.0, 5.1]


回答5:

  1. Creating and calling a default proc is a waste of time, and
  2. Cramming everything into one line using tortured constructs doesn\'t make the code more efficient--it just makes the code harder to understand.
  3. In require statements, rubyists don\'t capitalize file names.

....

require \'set\'

class Array
  def uniq_elements
    uniques = {}
    dups = Set.new

    each do |orig_val|
      converted_val =  block_given? ? (yield orig_val) : orig_val
      next if dups.include? converted_val 

      if uniques.include?(converted_val)  
        uniques.delete(converted_val)
        dups << converted_val
      else
        uniques[converted_val] = orig_val
      end
    end

    uniques.values
  end
end


t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]

p  t.uniq_elements {|elmt|
  elmt.round
}

--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]


标签: ruby arrays uniq