Scala String Equality Question from Programming In

Posted 2020-06-08 02:59

Question:

Since I liked programming in Scala, for my Google interview, I asked them to give me a Scala / functional programming style question. The Scala functional style question that I got was as follows:

You have two strings consisting of alphabetic characters as well as a special character representing the backspace symbol. Let's call this backspace character '/'. When you get to the keyboard, you type this sequence of characters, including the backspace/delete character. The solution you are to implement must check if the two sequences of characters produce the same output. For example, "abc", "aa/bc", "abb/c", "abcc/", "/abc", and "//abc" all produce the same output, "abc". Because this is a Scala / functional programming question, you must implement your solution in idiomatic Scala style.

I wrote the following code (it might not be exactly what I wrote, I'm just going off memory). Basically I just go linearly through the string, prepending characters to a list, and then I compare the lists.

def processString(string: String): List[Char] = {
  string.foldLeft(List[Char]()){ case(accumulator: List[Char], char: Char) =>
    accumulator match {
      case head :: tail => if(char != '/') { char :: head :: tail } else { tail }
      case emptyList => if(char != '/') { char :: emptyList } else { emptyList }
    }
  }
}

def solution(string1: String, string2: String): Boolean = {
  processString(string1) == processString(string2)
}

So far so good? He then asked for the time complexity and I responded linear time (because you have to process each character once) and linear space (because you have to copy each element into a list). Then he asked me to do it in linear time, but with constant space. I couldn't think of a way to do it that was purely functional. He said to try using a function in the Scala collections library like "zip" or "map" (I explicitly remember him saying the word "zip").

Here's the thing. I think that it's physically impossible to do it in constant space without having any mutable state or side effects. Like I think that he messed up the question. What do you think?

Can you solve it in linear time, but with constant space?

Answer 1:

This code takes O(N) time and needs only three integers of extra space:

def solution(a: String, b: String): Boolean = {

  def findNext(str: String, pos: Int): Int = {
    @annotation.tailrec
    def rec(pos: Int, backspaces: Int): Int = {
      if (pos == 0) -1
      else {
        val c = str(pos - 1)
        if (c == '/') rec(pos - 1, backspaces + 1)
        else if (backspaces > 0) rec(pos - 1, backspaces - 1)
        else pos - 1
      }
    }
    rec(pos, 0)
  }

  @annotation.tailrec 
  def rec(aPos: Int, bPos: Int): Boolean = {
    val ap = findNext(a, aPos)
    val bp = findNext(b, bPos)
    (ap < 0 && bp < 0) ||
    (ap >= 0 && bp >= 0 && (a(ap) == b(bp)) && rec(ap, bp))
  }

  rec(a.size, b.size)
}

The problem can be solved in linear time with constant extra space: if you scan from right to left, then you can be sure that the /-symbols to the left of the current position cannot influence the already processed symbols (to the right of the current position) in any way, so there is no need to store them. At every point, you need to know only two things:

  1. Where are you in the string?
  2. How many symbols do you have to throw away because of the backspaces?

That makes two integers for storing the positions, and one additional integer for temporarily storing the number of accumulated backspaces during the findNext invocation. That's a total of three integers of space overhead.
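The two pieces of state listed above can be watched in isolation. The following is a minimal sketch and not part of the solution itself: survivingIndices is a hypothetical helper that walks a single string from right to left and records which indices survive. The list it returns exists only to make the trace visible; the actual comparison never needs to build it.

```scala
import scala.annotation.tailrec

// Hypothetical helper, for illustration only: returns the indices of the
// characters that survive the backspaces, in right-to-left order.
def survivingIndices(s: String): List[Int] = {
  @tailrec
  def loop(pos: Int, pending: Int, acc: List[Int]): List[Int] =
    if (pos < 0) acc.reverse
    else if (s(pos) == '/') loop(pos - 1, pending + 1, acc) // one more character to drop
    else if (pending > 0) loop(pos - 1, pending - 1, acc)   // this character was erased
    else loop(pos - 1, 0, pos :: acc)                       // this character survives
  loop(s.length - 1, 0, Nil)
}

// survivingIndices("abb/c") evaluates to List(4, 1, 0)
// survivingIndices("//abc") evaluates to List(4, 3, 2)
```

At each step the loop looks only at pos and pending, which are exactly the two questions from the list above.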

Intuition

Here is my attempt to formulate why the right-to-left scan gives you an O(1) algorithm:

The future cannot influence the past, therefore there is no need to remember the future.

The "natural time" in this problem flows from left to right. Therefore, if you scan from right to left, you are moving "from the future into the past", and therefore you don't need to remember the characters to the right of your current position.

Tests

Here is a randomized test, which makes me pretty sure that the solution is actually correct:

val rng = new util.Random(0)
def insertBackspaces(s: String): String = {
  val n = s.size
  val insPos = rng.nextInt(n)
  val (pref, suff) = s.splitAt(insPos)
  val c = ('a' + rng.nextInt(26)).toChar
  pref + c + "/" + suff
}

def prependBackspaces(s: String): String = {
  "/" * rng.nextInt(4) + s
}

def addBackspaces(s: String): String = {
  var res = s
  for (i <- 0 until 8) 
    res = insertBackspaces(res)
  prependBackspaces(res)
}

for (i <- 1 until 1000) {
  val s = "hello, world"
  val t = "another string"

  val s1 = addBackspaces(s)
  val s2 = addBackspaces(s)
  val t1 = addBackspaces(t)
  val t2 = addBackspaces(t)

  assert(solution(s1, s2))
  assert(solution(t1, t2))
  assert(!solution(s1, t1))
  assert(!solution(s1, t2))
  assert(!solution(s2, t1))
  assert(!solution(s2, t2))

  if (i % 100 == 0) {
    println(s"Examples:\n$s1\n$s2\n$t1\n$t2")
  }
}

A few examples that the test generates:

Examples:
/helly/t/oj/m/, wd/oi/g/x/rld
///e/helx/lc/rg//f/o, wosq//rld
/anotl/p/hhm//ere/t/ strih/nc/g
anotx/hb/er sw/p/tw/l/rip/j/ng
Examples:
//o/a/hellom/, i/wh/oe/q/b/rld
///hpj//est//ldb//y/lok/, world
///q/gd/h//anothi/k/eq/rk/ string
///ac/notherli// stri/ig//ina/n/g
Examples:
//hnn//ello, t/wl/oxnh///o/rld
//helfo//u/le/o, wna//ova//rld
//anolq/l//twl//her n/strinhx//g
/anol/tj/hq/er swi//trrq//d/ing
Examples:
//hy/epe//lx/lo, wr/v/t/orlc/d
f/hk/elv/jj//lz/o,wr// world
/anoto/ho/mfh///eg/r strinbm//g
///ap/b/notk/l/her sm/tq/w/rio/ng
Examples:
///hsm/y//eu/llof/n/, worlq/j/d
///gx//helf/i/lo, wt/g/orn/lq/d
///az/e/notm/hkh//er sm/tb/rio/ng
//b/aen//nother v/sthg/m//riv/ng

Seems to work just fine. So, I'd say that the Google-guy did not mess up, looks like a perfectly valid question.



Answer 2:

You don't have to create the output to find the answer. You can iterate the two sequences at the same time and stop on the first difference. If you find no difference and both sequences terminate at the same time, they're equal, otherwise they're different.

But now consider a pair such as aaaa/// and a. You need to consume all seven elements of the left sequence and one element of the right sequence before you can assert that they're equal. Going left to right, that means you would have to keep the typed elements in memory until you can verify which of them are deleted. But what if you iterated over the elements from the end? You would then just need to count the backspaces and ignore as many elements as necessary in the left sequence, without keeping them in memory, since you know they won't be present in the final output. You can achieve O(1) memory using these two tips.

I tried it and it seems to work:

import scala.annotation.tailrec

def areEqual(s1: String, s2: String) = {
    def charAt(s: String, index: Int) = if (index < 0) '#' else s(index)

    @tailrec
    def recSol(i1: Int, backspaces1: Int, i2: Int, backspaces2: Int): Boolean = (charAt(s1, i1), charAt(s2, i2)) match {
        case ('/',  _) => recSol(i1 - 1, backspaces1 + 1, i2, backspaces2)
        case (_,  '/') => recSol(i1, backspaces1, i2 - 1, backspaces2 + 1)
        case ('#' , '#') => true
        case (ch1, ch2)  => 
            if      (backspaces1 > 0) recSol(i1 - 1, backspaces1 - 1, i2    , backspaces2    )
            else if (backspaces2 > 0) recSol(i1    , backspaces1    , i2 - 1, backspaces2 - 1)
            else        ch1 == ch2 && recSol(i1 - 1, backspaces1    , i2 - 1, backspaces2    )
    }
    recSol(s1.length - 1, 0, s2.length - 1, 0)
}

Some tests (all pass, let me know if you have more edge cases in mind):

// examples from the question
val inputs = Array("abc", "aa/bc", "abb/c", "abcc/", "/abc", "//abc")
for (i <- 0 until inputs.length; j <- 0 until inputs.length) {
    assert(areEqual(inputs(i), inputs(j)))
}

// more deletions than required
assert(areEqual("a///////b/c/d/e/b/b", "b")) 
assert(areEqual("aa/a/a//a//a///b", "b"))
assert(areEqual("a/aa///a/b", "b"))

// not enough deletions
assert(!areEqual("aa/a/a//a//ab", "b")) 

// too many deletions
assert(!areEqual("a", "a/"))

PS: just a few notes on the code itself:

  • Scala type inference is good enough so that you can drop types in the partial function inside your foldLeft
  • Nil is the idiomatic way to refer to the empty list case
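Applied to the processString function from the question, those two notes might look as follows. This is only a sketch of the same left-to-right, linear-space approach, tidied up; it also merges the two branches that handle a non-backspace character, which is my own simplification rather than something the notes call for.

```scala
// Same algorithm as the question's processString: fold left to right,
// keeping the surviving characters in a list with the most recent one first.
def processString(string: String): List[Char] =
  string.foldLeft(List.empty[Char]) {
    case (_ :: tail, '/') => tail        // backspace erases the latest character
    case (Nil, '/')       => Nil         // backspace on empty output does nothing
    case (acc, char)      => char :: acc // ordinary character is prepended
  }

def solution(string1: String, string2: String): Boolean =
  processString(string1) == processString(string2)
```

The types of the accumulator and the character are inferred from List.empty[Char] and from the string being folded, so no annotations are needed inside the cases.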

Bonus:

I had something like Tim's solution in mind before implementing my idea, but I started early with pattern matching on characters only and it didn't fit well because some cases require the number of backspaces. In the end, I think a neater way to write it is a mix of pattern matching and if conditions. Below is my longer original solution; the one I gave above was refactored later:

import scala.annotation.tailrec

def areEqual(s1: String, s2: String) = {
    @tailrec
    def recSol(c1: Cursor, c2: Cursor): Boolean = (c1.char, c2.char) match {
        case ('/',  '/') => recSol(c1.next, c2.next)
        case ('/' ,   _) => recSol(c1.next, c2     )
        case (_   , '/') => recSol(c1     , c2.next)
        case ('#' , '#') => true
        case (a   ,   b) if (a == b) => recSol(c1.next, c2.next)
        case _           => false
    }
    recSol(Cursor(s1, s1.length - 1), Cursor(s2, s2.length - 1))
}

private case class Cursor(s: String, index: Int) {
    val char = if (index < 0) '#' else s(index)
    def next = {
      @tailrec
      def recSol(index: Int, backspaces: Int): Cursor = {
          if      (index < 0      ) Cursor(s, index)
          else if (s(index) == '/') recSol(index - 1, backspaces + 1)
          else if (backspaces  > 1) recSol(index - 1, backspaces - 1)
          else                      Cursor(s, index - 1)
      }
      recSol(index, 0)
    }
}


Answer 3:

If the goal is minimal memory footprint, it's hard to argue against iterators.

def areSame(a :String, b :String) :Boolean = {
  def getNext(ci :Iterator[Char], ignore :Int = 0) : Option[Char] =
    if (ci.hasNext) {
      val c = ci.next()
      if (c == '/')        getNext(ci, ignore+1)
      else if (ignore > 0) getNext(ci, ignore-1)
      else                 Some(c)
    } else None

  val ari = a.reverseIterator
  val bri = b.reverseIterator
  1 to a.length.max(b.length) forall(_ => getNext(ari) == getNext(bri))
}

On the other hand, when arguing FP principles it's hard to defend iterators, since they're all about maintaining state.
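One way to get closer to the "zip"/"map" style that the interviewer hinted at in the question is to hide the iterator state behind lazy combinators. The sketch below is an assumption about what such a solution could look like, not something taken from the interview: survivors and sameOutput are hypothetical names, and the iterators still mutate internally even though the code never does so explicitly.

```scala
// Lazily yields the characters that survive the backspaces, last one first.
// The only state carried along is the count of pending backspaces.
def survivors(s: String): Iterator[Char] =
  s.reverseIterator
    .scanLeft((0, Option.empty[Char])) {
      case ((pending, _), '/')              => (pending + 1, None) // remember a backspace
      case ((pending, _), _) if pending > 0 => (pending - 1, None) // erased character
      case (_, c)                           => (0, Some(c))        // surviving character
    }
    .collect { case (_, Some(c)) => c }

// Compares the two lazy streams pairwise and stops at the first mismatch.
def sameOutput(a: String, b: String): Boolean =
  survivors(a).sameElements(survivors(b))
```

Because both scanLeft and collect are lazy on an Iterator, no intermediate collection is built, so the extra space should stay constant while each character is still visited once.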



Answer 4:

Here is a version with a single recursive function and no additional classes or libraries. This is linear time and constant memory.

import scala.annotation.tailrec

def compare(a: String, b: String): Boolean = {
  @tailrec
  def loop(aIndex: Int, aDeletes: Int, bIndex: Int, bDeletes: Int): Boolean = {
    val aVal = if (aIndex < 0) None else Some(a(aIndex))
    val bVal = if (bIndex < 0) None else Some(b(bIndex))

    if (aVal.contains('/')) {
      loop(aIndex - 1, aDeletes + 1, bIndex, bDeletes)
    } else if (aDeletes > 0) {
      loop(aIndex - 1, aDeletes - 1, bIndex, bDeletes)
    } else if (bVal.contains('/')) {
      loop(aIndex, 0, bIndex - 1, bDeletes + 1)
    } else if (bDeletes > 0) {
      loop(aIndex, 0, bIndex - 1, bDeletes - 1)
    } else {
      aVal == bVal && (aVal.isEmpty || loop(aIndex - 1, 0, bIndex - 1, 0))
    }
  }

  loop(a.length - 1, 0, b.length - 1, 0)
}