Is this an acceptable algorithm?

I've designed an algorithm to find the longest common subsequence. these are steps:

Pick the first letter in the first string.
Look for it in the second string and if its found, Add that letter to common_subsequence and store its position in index, Otherwise compare the length of common_subsequence with the length of lcs and if its greater, asign its value to lcs.
Return to the first string and pick the next letter and repeat the previous step again, But this time start searching from indexth letter
Repeat this process until there is no letter in the first string to pick. At the end the value of lcs is the Longest Common Subsequence.

This is an example:
‫‪

X=A, B, C, B, D, A, B‬‬  
‫‪Y=B, D, C, A, B, A‬‬

Pick A in the first string.
Look for A in Y.
Now that there is an A in the second string, append it to common_subsequence.
Return to the first string and pick the next letter that is B. Look for B in the second string this time starting from the position of A.
There is a B after A so append B to common_subsequence.
Now pick the next letter in the first string that is C. There isn't a C next to B in the second string. So assign the value of common_subsequence to lcs because its length is greater than the length of lcs. repeat the previous steps until reaching the end of the first string. In the end the value of lcs is the Longest Common Subsequence.

The complexity of this algorithm is theta(n*m). Here is my implementations:

First algorithm:

import time
def lcs(xstr, ystr):
    if not (xstr and ystr): return # if string is empty
    lcs = [''] #  longest common subsequence
    lcslen = 0 # length of longest common subsequence so far
    for i in xrange(len(xstr)):
        cs = '' # common subsequence
        start = 0 # start position in ystr
        for item in xstr[i:]:
            index = ystr.find(item, start) # position at the common letter
            if index != -1: # if common letter has found
                cs += item # add common letter to the cs
                start = index + 1
            if index == len(ystr) - 1: break # if reached end of the ystr
        # update lcs and lcslen if found better cs
        if len(cs) > lcslen: lcs, lcslen = [cs], len(cs) 
        elif len(cs) == lcslen: lcs.append(cs)
    return lcs

file1 = open('/home/saji/file1')
file2 = open('/home/saji/file2')
xstr = file1.read()
ystr = file2.read()

start = time.time()
lcss = lcs(xstr, ystr)
elapsed = (time.time() - start)
print elapsed

The same algorithm using hash table:

import time
from collections import defaultdict
def lcs(xstr, ystr):
    if not (xstr and ystr): return # if strings are empty
    lcs = [''] #  longest common subsequence
    lcslen = 0 # length of longest common subsequence so far
    location = defaultdict(list) # keeps track of items in the ystr
    i = 0
    for k in ystr:
        location[k].append(i)
        i += 1
    for i in xrange(len(xstr)):
        cs = '' # common subsequence
        index = -1
        reached_index = defaultdict(int)
        for item in xstr[i:]:
            for new_index in location[item][reached_index[item]:]:
                reached_index[item] += 1
                if index < new_index:
                    cs += item # add item to the cs
                    index = new_index
                    break
            if index == len(ystr) - 1: break # if reached end of the ystr
        # update lcs and lcslen if found better cs
        if len(cs) > lcslen: lcs, lcslen = [cs], len(cs) 
        elif len(cs) == lcslen: lcs.append(cs)
    return lcs

file1 = open('/home/saji/file1')
file2 = open('/home/saji/file2')
xstr = file1.read()
ystr = file2.read()

start = time.time()
lcss = lcs(xstr, ystr)
elapsed = (time.time() - start)
print elapsed

标签： python lcs

2条回答

Rolldiameter

2楼-- · 2019-07-20 01:43

maxLength = 0
foundString = ""
for start in xrange(len(str1)-1):
     for end in xrange(start+1, len(str1)):
        str1Temp = str1[start:end]   
        maxLengthTemp = len(str1Temp)
        if(str2.find(str1Temp)):
             if(maxLengthTemp>maxLength):
                 maxLength = maxLengthTemp
                 foundString = str1Temp

print maxLength
print foundString

0人赞添加讨论(0) 举报

成全新的幸福

3楼-- · 2019-07-20 01:53

If your professor wants you to invent your own LCS algorithm, you're done. Your algorithm is not the most optimal one ever created, but it's in the right complexity class, you clearly understand it, and you clearly didn't copy your implementation from the internet. You might want to be prepared to defend your algorithm, or discuss alternatives. If I were your prof, I'd give you an A if:

You turned in that program.
You were able to explain why there's no possible O(N) or O(N log M) alternative.
You were able to participate in a reasonable discussion about other algorithms that might have a better lower bound (or significantly lower constants, etc.), and the time/space tradeoffs, etc., even if you didn't know the outcome of that discussion in advance.

On the other hand, if your professor wants you to pick one of the well-known algorithms and write your own implementation, you probably want to use the standard LP algorithm. It's a standard algorithm for a reason—which you probably want to read up on until you understand. (Even if it isn't going to be on the test, you're taking this class to learn, not just to impress the prof, right?)

Wikipedia has pseudocode for a basic implementation, then English-language descriptions of common optimizations. I'm pretty sure that writing your own Python code based on what's on that page wouldn't count as plagiarism, or even as a trivial port, especially if you can demonstrate that you understand what your code is doing, and why, and why it's a good algorithm. Plus, you're writing it in Python, which has much better ways to memoize than what's demonstrated in that article, so if you understand how it works, your code should actually be substantially better than what Wikipedia gives you.

Either way, as I suggested in the comments, I'd read A survey of longest common subsequence algorithms by Bergroth, Hakonen, and Raita, and search for similar papers online.

0人赞添加讨论(0) 举报

Is this an acceptable algorithm?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间