Explain this pattern matching code

2020-02-19 06:25发布

问题:

This code is from Querying a Dataset with Scala's Pattern Matching:

object & { def unapply[A](a: A) = Some((a, a)) }

"Julie" match {
  case Brothers(_) & Sisters(_) => "Julie has both brother(s) and sister(s)"
  case Siblings(_) => "Julie's siblings are all the same sex"
  case _ => "Julie has no siblings"
}

// => "Julie has both brother(s) and sister(s)"

How does & actually work? I don't see a Boolean test anywhere for the conjunction. How does this Scala magic work?

回答1:

Here's how unapply works in general:

When you do

obj match {case Pattern(foo, bar) => ... }

Pattern.unapply(obj) is called. This can either return None in which case the pattern match is a failure, or Some(x,y) in which case foo and bar are bound to x and y.

If instead of Pattern(foo, bar) you did Pattern(OtherPattern, YetAnotherPatter) then x would be matched against the pattern OtherPattern and y would be matched against YetAnotherPattern. If all of those pattern matches are successful, the body of the match executes, otherwise the next pattern is tried.

when the name of a pattern is not alphanumeric, but a symbol (like &), it is used infix, i.e. you write foo & bar instead of &(foo, bar).


So here & is a pattern that always returns Some(a,a) no matter what a is. So & always matches and binds the matched object to its two operands. In code that means that

obj match {case x & y => ...}

will always match and both x and y will have the same value as obj.

In the example above this is used to apply two different patterns to the same object.

I.e. when you do

obj match { case SomePattern & SomeOtherPattern => ...}`

first the pattern & is applied. As I said, it always matches and binds obj to its LHS and its RHS. So then SomePattern is applied to &'s LHS (which is the same as obj) and SomeOtherPattern is applied to &'s RHS (which is also the same as obj).

So in effect, you just applied two patterns to the same object.



回答2:

Let's do this from the code. First, a small rewrite:

object & { def unapply[A](a: A) = Some(a, a) }

"Julie" match {
  // case Brothers(_) & Sisters(_) => "Julie has both brother(s) and sister(s)"
  case &(Brothers(_), Sisters(_)) => "Julie has both brother(s) and sister(s)"
  case Siblings(_) => "Julie's siblings are all the same sex"
  case _ => "Julie has no siblings"
}

The new rewrite means exactly the same thing. The comment line is using infix notation for extractors, and the second is using normal notation. They both translate to the same thing.

So, Scala will feed "Julie" to the extractor, repeatedly, until all unbound variables got assigned to Some thing. The first extractor is &, so we get this:

&.unapply("Julie") == Some(("Julie", "Julie"))

We got Some back, so we can proceed with the match. Now we have a tuple of two elements, and we have two extractors inside & as well, so we feed each element of the tuple to each extractor:

Brothers.unapply("Julie") == ?
Sisters.unapply("Julie") == ?

If both of these return Some thing, then the match is succesful. Just for fun, let's rewrite this code without pattern matching:

val pattern = "Julie"
val extractor1 = &.unapply(pattern)
if (extractor1.nonEmpty && extractor1.get.isInstanceOf[Tuple2]) {
  val extractor11 = Brothers.unapply(extractor1.get._1)
  val extractor12 = Sisters.unapply(extractor1.get._2)
  if (extractor11.nonEmpty && extractor12.nonEmpty) {
    "Julie has both brother(s) and sister(s)"
  } else {
    "Test Siblings and default case, but I'll skip it here to avoid repetition" 
  }
} else {
  val extractor2 = Siblings.unapply(pattern)
  if (extractor2.nonEmpty) {
    "Julie's siblings are all the same sex"
  } else {
    "Julie has no siblings"
}

Ugly looking code, even without optimizing to only get extractor12 if extractor11 isn't empty, and without the code repetition that should have gone where there's a comment. So I'll write it in yet another style:

val pattern = "Julie"
& unapply pattern  filter (_.isInstanceOf[Tuple2]) flatMap { pattern1 =>
  Brothers unapply pattern1._1 flatMap { _ =>
    Sisters unapply pattern1._2 flatMap { _ =>
      "Julie has both brother(s) and sister(s)"
    }
  }
} getOrElse {
  Siblings unapply pattern map { _ =>
    "Julie's siblings are all the same sex"
  } getOrElse {
    "Julie has no siblings"
  }
}

The pattern of flatMap/map at the beginning suggests yet another way of writing this:

val pattern = "Julie"
(
  for {
    pattern1 <- & unapply pattern
    if pattern1.isInstanceOf[Tuple2]
    _ <- Brothers unapply pattern1._1
    _ <- Sisters unapply pattern1._2
  } yield "Julie has both brother(s) and sister(s)
) getOrElse (
 for {
   _ <- Siblings unapply pattern
 } yield "Julie's siblings are all the same sex"
) getOrElse (
  "julie has no siblings"
)

You should be able to run all this code and see the results for yourself.



回答3:

For additional info, I recommend reading the Infix Operation Patterns section (8.1.10) of the Scala Language Specification.

An infix operation pattern p op q is a shorthand for the constructor or extractor pattern op(p,q). The precedence and associativity of operators in patterns is the same as in expressions.

Which is pretty much all there is to it, but then you can read about constructor and extractor patterns and patterns in general. It helps separate the syntactic sugar aspect (the "magic" part of it) from the fairly simple idea of pattern matching:

A pattern is built from constants, constructors, variables and type tests. Pattern matching tests whether a given value (or sequence of values) has the shape defined by a pattern, and, if it does, binds the variables in the pattern to the corresponding components of the value (or sequence of values).



标签: scala