Splitting an empty string returns an array of size 1:
scala> "".split(',')
res1: Array[String] = Array("")
By contrast, this returns an empty array:
scala> ",,,,".split(',')
res2: Array[String] = Array()
Please explain :)
For the same reason that
",test" split ','
and
",test," split ','
will return an array of size 2. Everything before the first match is returned as the first element.
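A quick way to check this is the sketch below. The object and value names are only illustrative and not part of any API.

```scala
object LeadingEmpty {
  // The empty string before the leading comma is kept as the first element.
  val leading: Array[String] = ",test".split(',')  // Array("", "test")

  // The empty string after the trailing comma is dropped, so the size is still 2.
  val both: Array[String] = ",test,".split(',')    // Array("", "test")

  // No comma at all: the whole input comes back as the only element.
  val none: Array[String] = "test".split(',')      // Array("test")
}
```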
If you split an orange zero times, you have exactly one piece - the orange.
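The Java and Scala split methods operate in two steps:
First, split the string on the delimiter. If the string does not contain the delimiter, the result is a one-element array holding the input string.
Second, remove all trailing empty strings. This is why
",,,".split(",")
returns an empty array. According to this, the result of "".split(",")
should be an empty array because of the second step, right?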
It should. Unfortunately, this is an artificially introduced corner case. That is bad, but at least it is documented in java.util.regex.Pattern, if you remember to take a look at the documentation:
For n == 0, the result is as for n < 0, except trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)
So I advise you to always pass -1 as the second parameter, the limit (this skips step two above), unless you know specifically what you want to achieve or you are sure that your program will never receive the empty string as input.
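The two-step description can be modelled with a small sketch. The helper name twoStepSplit is hypothetical, and it assumes the separator is a plain literal such as "," (the real argument is a regex, and zero-width matches are not covered here). It is a model of the documented behavior, not the JDK implementation.

```scala
object TwoStepSplit {
  // Hypothetical model of input.split(sep) for a literal, non-empty separator.
  def twoStepSplit(input: String, sep: String): Array[String] = {
    // Step 1: split on every delimiter and keep all empty strings (limit = -1).
    val all = input.split(sep, -1)
    // Corner case: no delimiter matched, so the input is returned as-is,
    // even when it is the empty string. Step 2 is never applied here.
    if (all.length == 1) all
    // Step 2: drop the trailing empty strings.
    else all.reverse.dropWhile(_.isEmpty).reverse
  }

  // With limit = -1 only step 1 runs, so nothing is silently removed.
  val onlyCommas: Array[String] = ",,,,".split(",", -1)  // five empty strings
  val emptyInput: Array[String] = "".split(",", -1)      // Array("")
}
```

Under this model, "" never reaches step two because no delimiter was found, which is why it comes back as a one-element array while ",,,," comes back empty.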
If you are already using Guava in your project, you can try the Splitter class. It has a very rich API and makes your code very easy to understand.
Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"
Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you will get an array of size 1 that is holding the original string, even if it is empty.
"a".split(",")
-> "a"
therefore
"".split(",")
-> ""
In all programming languages I know, a blank string is still a valid String. So splitting it on any delimiter will always return a single-element array whose only element is the blank String. If it were a null (not blank) String, that would be a different issue.
This split
behavior is inherited from Java, for better or worse...
Scala does not override the definition from the java.lang.String
class.
Note that you can use the limit
argument to modify the behavior:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
That is, you can set limit=-1
to get the behavior of (all?) other languages:
@ ",a,,b,,".split(",")
res1: Array[String] = Array("", "a", "", "b")
@ ",a,,b,,".split(",", -1) // limit=-1
res2: Array[String] = Array("", "a", "", "b", "", "")
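For completeness, here is a small sketch of the other two cases from the quoted documentation, a positive limit and a limit of zero. The object and value names are only illustrative.

```scala
object LimitDemo {
  // Positive limit: at most 2 elements; the last one holds the unsplit remainder.
  val firstTwo: Array[String] = "a,b,c".split(",", 2)   // Array("a", "b,c")

  // Positive limit: trailing empty strings are kept.
  val keepsEmpty: Array[String] = "a,,".split(",", 3)   // Array("a", "", "")

  // limit = 0 behaves like the one-argument split: trailing empty strings are discarded.
  val zeroLimit: Array[String] = "a,,".split(",", 0)    // Array("a")
}
```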
It seems to be well known that the Java behavior is quite confusing, but:
The behavior above can be observed from at least Java 5 to Java 8.
There was an attempt to change the behavior to return an empty array when splitting an empty string in JDK-6559590. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.
Note: The split method wasn't in Java from the beginning (it's not in 1.0.2) but has been there since at least 1.4 (e.g. see JSR51 circa 2002). I am still investigating...
What's unclear is why Java chose this in the first place (my suspicion is that it was originally an oversight/bug in an "edge case"), but it is now irrevocably baked into the language and so it remains.
An empty string has no special status when splitting a string. You may use:
Some(str)
  .filter(_.nonEmpty)              // treat the empty string as "no fields"
  .map(_.split(","))
  .getOrElse(Array.empty[String])  // typed empty array instead of Array()
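If this pattern is needed in several places, it could be wrapped in a small helper. The sketch below is one possible shape: the names SafeSplit and fields are hypothetical, the separator is assumed to be a plain literal, and it also passes limit = -1 so that trailing empty fields are kept, which is a design choice rather than something the snippet above does.

```scala
object SafeSplit {
  // Hypothetical helper: empty input yields an empty array,
  // any other input keeps every field, including trailing empty ones.
  def fields(str: String, sep: String = ","): Array[String] =
    if (str.isEmpty) Array.empty[String]
    else str.split(sep, -1)
}
```

With this, SafeSplit.fields("") is an empty array, while SafeSplit.fields("a") still returns a single element.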