Find the longest common starting substring in a se-第4页回答

This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.

This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.

For example, the longest substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].

I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:

function common_substring(data) {
  var i, ch, memo, idx = 0
  do {
    memo = null
    for (i=0; i < data.length; i++) {
      ch = data[i].charAt(idx)
      if (!ch) break
      if (!memo) memo = ch
      else if (ch != memo) break
    }
  } while (i == data.length && idx < data.length && ++idx)

  return (data[0] || '').slice(0, idx)
}

This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:

$ git clone git://gist.github.com/257891.git substring-challenge

I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.

I'm going to accept as an answer the solution I find the most elegant or concise. Here is for instance a crazy Ruby hack I come up with—defining the & operator on String:

# works with Ruby 1.8.7 and above
class String
  def &(other)
    difference = other.to_str.each_char.with_index.find { |ch, idx|
      self[idx].nil? or ch != self[idx].chr
    }
    difference ? self[0, difference.last] : self
  end
end

class Array
  def common_substring
    self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
  end
end

Solutions in JavaScript or Ruby are preferred, but you can show off clever solution in other languages as long as you explain what's going on. Only code from standard library please.

Update: my favorite solutions

I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.

Other great solutions:

"regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
Haskell one-liner is short as if it were compressed, and beautiful
the straightforward Ruby one-liner

Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).

标签： javascript python ruby haskell longest-prefix

30条回答

Luminary・发光体

2楼-- · 2020-01-25 04:22

This is probably not the most concise solution (depends if you already have a library for this), but one elegant method is to use a trie. I use tries for implementing tab completion in my Scheme interpreter:

http://github.com/jcoglan/heist/blob/master/lib/trie.rb

For example:

tree = Trie.new
%w[interspecies interstelar interstate].each { |s| tree[s] = true }
tree.longest_prefix('')
#=> "inters"

I also use them for matching channel names with wildcards for the Bayeux protocol; see these:

http://github.com/jcoglan/faye/blob/master/client/channel.js

http://github.com/jcoglan/faye/blob/master/lib/faye/channel.rb

0人赞添加讨论(0) 举报

ら.Afraid

3楼-- · 2020-01-25 04:22

This one is very similar to Roberto Bonvallet's solution, except in ruby.

chars = %w[interspecies interstelar interstate].map {|w| w.split('') }
chars[0].zip(*chars[1..-1]).map { |c| c.uniq }.take_while { |c| c.size == 1 }.join

The first line replaces each word with an array of chars. Next, I use zip to create this data structure:

[["i", "i", "i"], ["n", "n", "n"], ["t", "t", "t"], ...

map and uniq reduce this to [["i"],["n"],["t"], ...

take_while pulls the chars off the array until it finds one where the size isn't one (meaning not all chars were the same). Finally, I join them back together.

0人赞添加讨论(0) 举报

我只想做你的唯一

4楼-- · 2020-01-25 04:23

You just need to traverse all strings until they differ, then take the substring up to this point.

Pseudocode:

loop for i upfrom 0
     while all strings[i] are equal
     finally return substring[0..i]

Common Lisp:

(defun longest-common-starting-substring (&rest strings)
  (loop for i from 0 below (apply #'min (mapcar #'length strings))
     while (apply #'char=
                  (mapcar (lambda (string) (aref string i))
                          strings))
     finally (return (subseq (first strings) 0 i))))

0人赞添加讨论(0) 举报

smile是对你的礼貌

5楼-- · 2020-01-25 04:23

A javascript version based on @Svante's algorithm:

function commonSubstring(words){
    var iChar, iWord,
        refWord = words[0],
        lRefWord = refWord.length,
        lWords = words.length;
    for (iChar = 0; iChar < lRefWord; iChar += 1) {
        for (iWord = 1; iWord < lWords; iWord += 1) {
            if (refWord[iChar] !== words[iWord][iChar]) {
                return refWord.substring(0, iChar);
            }
        }
    }
    return refWord;
}

0人赞添加讨论(0) 举报

萌系小妹纸

6楼-- · 2020-01-25 04:25

Ruby one-liner:

l=strings.inject{|l,s| l=l.chop while l!=s[0...l.length];l}

0人赞添加讨论(0) 举报

▲ chillily

7楼-- · 2020-01-25 04:25

In Python I wouldn't use anything but the existing commonprefix function I showed in another answer, but I couldn't help to reinvent the wheel :P. This is my iterator-based approach:

>>> a = 'interspecies interstelar interstate'.split()
>>>
>>> from itertools import takewhile, chain, izip as zip, imap as map
>>> ''.join(chain(*takewhile(lambda s: len(s) == 1, map(set, zip(*a)))))
'inters'

Edit: Explanation of how this works.

zip generates tuples of elements taking one of each item of a at a time:

In [6]: list(zip(*a))  # here I use list() to expand the iterator
Out[6]:
[('i', 'i', 'i'),
 ('n', 'n', 'n'),
 ('t', 't', 't'),
 ('e', 'e', 'e'),
 ('r', 'r', 'r'),
 ('s', 's', 's'),
 ('p', 't', 't'),
 ('e', 'e', 'a'),
 ('c', 'l', 't'),
 ('i', 'a', 'e')]

By mapping set over these items, I get a series of unique letters:

In [7]: list(map(set, _))  # _ means the result of the last statement above
Out[7]:
[set(['i']),
 set(['n']),
 set(['t']),
 set(['e']),
 set(['r']),
 set(['s']),
 set(['p', 't']),
 set(['a', 'e']),
 set(['c', 'l', 't']),
 set(['a', 'e', 'i'])]

takewhile(predicate, items) takes elements from this while the predicate is True; in this particular case, when the sets have one element, i.e. all the words have the same letter at that position:

In [8]: list(takewhile(lambda s: len(s) == 1, _))
Out[8]:
[set(['i']),
 set(['n']), 
 set(['t']), 
 set(['e']), 
 set(['r']), 
 set(['s'])]

At this point we have an iterable of sets, each containing one letter of the prefix we were looking for. To construct the string, we chain them into a single iterable, from which we get the letters to join into the final string.

The magic of using iterators is that all items are generated on demand, so when takewhile stops asking for items, the zipping stops at that point and no unnecessary work is done. Each function call in my one-liner has a implicit for and an implicit break.

0人赞添加讨论(0) 举报

Find the longest common starting substring in a se

Update: my favorite solutions

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间