array of substrings in array of strings

Posted 2019-09-24 04:45

I have two arrays of strings. Strings in one array might be substrings of strings in the other array. I need to find out which strings in one array are substrings of strings in the other array.

Example:

arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest"]

Result:

res = ["fire","worm", "rest"]

My solution is below, but it takes a lot of time, and I have to process thousands of words.

Solution:

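res = []
arr1.each do |word1|
  arr2.each do |word2|
    # Record word2 if it occurs anywhere inside word1.
    if word1.include? word2
      res << word2
    end
  end
end
res.uniq! # a word contained in several strings of arr1 would otherwise repeat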
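The nested loop above can also be written with select and any?. It is still a brute-force search over every pair of words, but any? stops at the first string of arr1 that contains the word, and each word of arr2 appears at most once in the result, in arr2's order.

```ruby
arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest"]

# Keep each word of arr2 that occurs inside at least one string of arr1.
# any? short-circuits on the first containing string.
res = arr2.select { |word2| arr1.any? { |word1| word1.include?(word2) } }

p res
# => ["fire", "worm", "rest"]
```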

Please suggest a faster way to do this.

2 answers
ゆ 、 Hurt°
Reply #2 · 2019-09-24 05:01

Unfortunately, we don't know your solution.

An Array takes up more memory than a single String, so you can join arr1 into one comma-separated string and search that instead:

arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest"]

arr1 = arr1.join(',')

Then select the words of arr2 that occur in the joined string:

res = arr2.select { |word| arr1.include?(word) } #=> ["fire", "worm", "rest"]

or

res = arr2.select { |word| arr1.match?(word) } #=> ["fire", "worm", "rest"]

or

res = arr2.select { |word| arr1.match(word) } #=> ["fire", "worm", "rest"]
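One caveat with the last two variants: String#match? and String#match convert a String argument into a regular expression, whereas include? searches for it literally. With plain letters the results agree, but words containing regex metacharacters do not. This sketch uses hypothetical search terms ("." and "r.s") to show the difference, and Regexp.escape to restore literal matching.

```ruby
haystack = ["firestorm", "peanut", "earthworm"].join(",")
# Hypothetical search terms; "." and "r.s" contain regex metacharacters.
words = ["fire", ".", "r.s"]

# include? is a literal substring test.
literal = words.select { |w| haystack.include?(w) }

# match? turns each String into a Regexp, so "." matches any character
# and "r.s" matches "res" inside "firestorm".
as_regex = words.select { |w| haystack.match?(w) }

# Regexp.escape makes match? behave literally again.
escaped = words.select { |w| haystack.match?(Regexp.escape(w)) }

p literal   # => ["fire"]
p as_regex  # => ["fire", ".", "r.s"]
p escaped   # => ["fire"]
```

For plain words, include? is the simpler choice because it skips building a Regexp for each word.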
ゆ 、 Hurt°
Reply #3 · 2019-09-24 05:01

Due to overlapping terms, you need to brute-force this, as far as I can tell:

def matched(find, list)
  list.flat_map { |e| find.flat_map { |f| e.scan(f) } }.uniq
end

In practice:

matched(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "rest", "worm"]

Here, %w is used as a shorthand for writing arrays of strings.
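If brute force is too slow for thousands of words, one alternative (not from this answer) is to index every substring of the strings in list, up to the length of the longest search term, in a Set, and then look each term up. Because every start index is indexed, overlapping terms are still found. matched_by_index is a hypothetical name, and the sketch assumes all search terms are non-empty.

```ruby
require "set"

# matched_by_index is a hypothetical helper, not from the answer above.
# It assumes every term in find is non-empty.
def matched_by_index(find, list)
  # No substring longer than the longest term can ever be a match.
  max_len = find.map(&:length).max || 0
  substrings = Set.new

  list.each do |s|
    (0...s.length).each do |i|
      # Record every substring starting at i, up to max_len characters.
      (1..[max_len, s.length - i].min).each do |len|
        substrings << s[i, len]
      end
    end
  end

  # Each term is now a single Set lookup instead of a scan over list.
  find.select { |f| substrings.include?(f) }
end

p matched_by_index(%w[fire tree worm rest], %w[firestorm peanut earthworm])
# => ["fire", "worm", "rest"]
```

Building the index costs about one Set insertion per start index and length (total characters in list times the longest term), after which each term is one lookup. It tends to help when the terms are short and numerous; with very long terms the index itself can grow large.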

Here's a faster approximation that combines all terms into one Regexp.union pattern and uses scan and flat_map:

def matched(find, list)
  rx = Regexp.union(find)

  list.flat_map { |e| e.scan(rx) }.uniq
end

Using Regexp.union, you can build a single regular expression that runs fairly quickly compared to testing each term individually.

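Where it isn't as accurate is with overlapping terms: scan resumes after the end of each match, so once "fire" matches in "firestorm", the overlapping "rest" is skipped: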

matched(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "worm"]
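A possible way to recover the overlapping matches while keeping a single union pattern is to wrap it in a zero-width lookahead with a capture group, so scan attempts a match at every index. This is a sketch of a hypothetical variant (matched_overlapping is not from the answer). One limitation remains: at each index the alternation returns only the first term that matches, so terms that start at the same position (e.g. "fir" and "fire") can still hide one another.

```ruby
# matched_overlapping is a hypothetical variant of the answer's matched.
def matched_overlapping(find, list)
  # The lookahead consumes no characters, so scan tries a match at every
  # index and overlapping terms are still reported via the capture group.
  rx = /(?=(#{Regexp.union(find)}))/
  # scan returns one single-element capture array per match; flatten them.
  list.flat_map { |e| e.scan(rx).flatten }.uniq
end

p matched_overlapping(%w[fire tree worm rest], %w[firestorm peanut earthworm])
# => ["fire", "rest", "worm"]
```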