Why is Scala's syntax for tuples so unusual?

2019-01-18 01:00发布

问题:

In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an (ordered) n-tuple is a sequence (or ordered list) of n elements, where n is a positive integer.

So, for example, in Python the 2nd item of a tuple would be accessed via t[1].

In Scala, access is only possible via strange names t._2.

So the question is, why can't I access data in tuples as Sequence or List if it is by definition? Is there some sort of idea or just yet not inspected?

回答1:

Scala knows the arity of the tuples and is thus able to provide accessors like _1, _2, etc., and produce a compile-time error if you select _3 on a pair, for instance. Moreover, the type of those fields is exactly what the type used as parameter for Tuple (e.g. _3 on a Tuple3[Int, Double, Float] will return a Float).

If you want to access the nth element, you can write tuple.productElement(n), but the return type of this can only be Any, so you lose the type information.



回答2:

I believe the following excerpt from "Programming in Scala: A Comprehensive Step-by-Step Guide" (Martin Odersky, Lex Spoon and Bill Venners) directly addresses both of your questions:

Accessing the elements of a tuple

You may be wondering why you can't access the elements of a tuple like the elements of a list, for example, with "pair(0)". The reason is that a list's apply method always returns the same type, but each element of a tuple may be a different type: _1 can have one result type, _2 another, and so on. These _N numbers are one-based, instead of zero-based, because starting with 1 is a tradition set by other languages with statically typed tuples, such as Haskell and ML.

Scala tuples get very little preferential treatment as far as the language syntax is concerned, apart from expressions '(' a1, ..., an ')' being treated by the compiler as an alias for scala.Tuplen(a1, ..., an) class instantiation. Otherwise tuples do behave as any other Scala objects, in fact they are written in Scala as case classes that range from Tuple2 to Tuple22. Tuple2 and Tuple3 are also known under the aliases of Pair and Triple respectively:

 val a = Pair   (1,"two")      // same as Tuple2 (1,"two") or (1,"two") 
 val b = Triple (1,"two",3.0)  // same as Tuple3 (1,"two",3.0) or (1,"two",3.0)


回答3:

One big difference between List, Seq or any collection and tuple is that in tuple each element has it's own type where in List all elements have the same type.

And as consequence, in Scala you will find classes like Tuple2[T1, T2] or Tuple3[T1, T2, T3], so for each element you also have type parameter. Collections accept only 1 type parameter: List[T]. Syntax like ("Test", 123, new Date) is just syntactic sugar for Tuple3[String, Int, Date]. And _1, _2, etc. are just fields on tuple that return correspondent element.



回答4:

You can easily achive that with shapeless:

import shapeless.syntax.std.tuple._

val t = ("a", 2, true, 0.0)

val s = t(0) // String at compile time
val i = t(1) // Int at compile time
// etc

A lot of methods available for standard collection are also available for tuples this way (head, tail, init, last, ++ and ::: for concatenation, +: and :+ for adding elements, take, drop, reverse, zip, unzip, length, toList, toArray, to[Collection], ...)



回答5:

With normal index access, any expression can be used, and it would take some serious effort to check at compiletime if the result of the index expression it is guaranteed to be in range. Make it an attribute, and a compile-time error for (1, 2)._3 follows "for free". Things like allowing only integer constants inside item access on tuples would be a very special case (ugly and unneeded, some would say ridiculous) and again some work to implement in the compiler.

Python, for instance, can get away with that because it wouldn't (couldn't) check (at compiletime, that is) if the index is in range anyway.



回答6:

I think it's for type checking. As delnan says, if you have a tuple t and an index e (an arbitrary expression), t(e) would give the compiler no information about which element is being accessed (or even if it's a valid element for a tuple of that size). When you access elements by field name (_2 is a valid identifier, it's not special syntax), the compiler knows which field you're accessing and what type it has. Languages like Python don't really have types, so this is not necessary for them.



回答7:

Apart from the benefits Jean-Philippe Pellet already mentioned this notation is also very common in mathematics (see http://en.wikipedia.org/wiki/Tuple). A lot of lecturers append indexes to tuple variables if they want to referring to the elements of a tuple. And the common (LaTeX) notation for writing "with index n" (referring to the n-th element of the tuple) is _n. So I find it actually very intuitive.