Java tagged union / sum types

Posted 2020-02-17 05:17

Is there any way to define a sum type in Java? Java supports product types directly, and I thought enums or inheritance might let it support sum types too, but there is at least one case I can't resolve. To elaborate, a sum type is a type whose values each hold exactly one of a fixed set of alternative types, like a tagged union in C. In my case, I'm trying to implement Haskell's Either type in Java:

data Either a b = Left a | Right b

but at the base level I'm having to implement it as a product type, and just ignore one of its fields:

public class Either<L,R>
{
    private L left = null;
    private R right = null;

    public static <L,R> Either<L,R> right(R right)
    {
        return new Either<>(null, right);
    }

    public static <L,R> Either<L,R> left(L left)
    {
        return new Either<>(left, null);
    }

    private Either(L left, R right) throws IllegalArgumentException
    {
        this.left = left;
        this.right = right;
        if (left != null && right != null)
        {
            throw new IllegalArgumentException("An Either cannot be created with two values");
        }
        if (left == null && right == null)
        {
            throw new IllegalArgumentException("An Either cannot be created without a value");
        }
    }

    .
    .
    .
}

I tried implementing this with inheritance, but I have to use a wildcard type parameter, or equivalent, which Java generics won't allow:

public class Left<L> extends Either<L,?>

I haven't used Java's enums much, and while they seem like the next best candidate, I'm not hopeful.
At this point, I think this might only be possible by casting Object values, which I would hope to avoid entirely, unless there's a way to do it once, safely, and reuse that for all sum types.

4 answers
Viruses.
Reply #2 · 2020-02-17 05:22

Inheritance can be used to emulate sum types (disjoint unions), but there are a few issues you need to deal with:

  1. You need to keep others from adding new cases to your type. This is especially important if you want to handle every possible case exhaustively. It's possible with a non-final superclass and a package-private constructor.
  2. The lack of pattern matching makes it quite difficult to consume a value of this type. If you want a compiler-checked guarantee that you've exhaustively handled all cases, you need to implement a match function yourself.
  3. You're forced into one of two styles of API, neither of which is ideal:
    • All cases implement a common API, throwing errors on the parts they don't support themselves. Consider Optional.get(). Ideally, this method would only be available on a disjoint type whose value is known to be some rather than none. But there's no way to do that, so it's an instance member of the general Optional type. It throws NoSuchElementException if you call it on an optional whose "case" is "none".
    • Each case has a unique API that tells you exactly what it's capable of, but that requires a manual type check and cast every time you wish to call one of these subclass-specific methods.
  4. Changing "cases" requires new object allocation (and adds pressure on the GC if done often).
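To make points 1 to 3 concrete, here is a rough sketch of the second API style, where each case exposes only what it supports. The names Maybe, Just, Nothing and MaybeDemo are made up for illustration and are not from any library. Note how the consumer needs an instanceof check plus a cast, and how the compiler cannot tell that the two checks cover every case:

```java
// Hypothetical sketch; Maybe, Just, Nothing and MaybeDemo are illustrative names.
abstract class Maybe<T> {
    // Package-private constructor: only classes in this package can add cases.
    Maybe() {}
}

final class Just<T> extends Maybe<T> {
    private final T value;

    Just(T value) {
        this.value = value;
    }

    // get() exists only on Just, so it can never fail the way Optional.get() can.
    T get() {
        return value;
    }
}

final class Nothing<T> extends Maybe<T> {
    Nothing() {}
}

final class MaybeDemo {
    private MaybeDemo() {}

    static <T> T orElse(Maybe<T> maybe, T fallback) {
        if (maybe instanceof Just) {
            // Manual type check and cast before the case-specific API is usable.
            return ((Just<T>) maybe).get();
        }
        if (maybe instanceof Nothing) {
            return fallback;
        }
        // The compiler cannot prove the checks above are exhaustive,
        // so a runtime failure path is still required.
        throw new IllegalStateException("Unknown case: " + maybe);
    }
}
```

Calling MaybeDemo.orElse(new Just<>("a"), "b") goes through the cast, while a Nothing falls back to the second argument.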

TL;DR: Functional programming in Java is not a pleasant experience.

兄弟一词,经得起流年.
Reply #3 · 2020-02-17 05:28

A standard way of encoding sum types is Boehm–Berarducci encoding (often referred to by the name of its cousin, Church encoding) which represents an algebraic data type as its eliminator, i.e., a function that does pattern-matching. In Haskell:

left :: a -> (a -> r) -> (b -> r) -> r
left x l _ = l x

right :: b -> (a -> r) -> (b -> r) -> r
right x _ r = r x

match :: (a -> r) -> (b -> r) -> ((a -> r) -> (b -> r) -> r) -> r
match l r k = k l r

-- Or, with a type synonym for convenience:

type Either a b r = (a -> r) -> (b -> r) -> r

left :: a -> Either a b r
right :: b -> Either a b r
match :: (a -> r) -> (b -> r) -> Either a b r -> r

In Java this would look like a visitor:

import java.util.function.Function;

public interface Either<A, B> {
    <R> R match(Function<A, R> left, Function<B, R> right);
}

public final class Left<A, B> implements Either<A, B> {

    private final A value;

    public Left(A value) {
        this.value = value;
    }

    public <R> R match(Function<A, R> left, Function<B, R> right) {
        return left.apply(value);
    }

}

public final class Right<A, B> implements Either<A, B> {

    private final B value;

    public Right(B value) {
        this.value = value;
    }

    public <R> R match(Function<A, R> left, Function<B, R> right) {
        return right.apply(value);
    }

}

Example usage:

Either<Integer, String> result = new Left<Integer, String>(42);
String message = result.match(
  errorCode -> "Error: " + errorCode.toString(),
  successMessage -> successMessage);

For convenience, you can make a factory for creating Left and Right values without having to mention the type parameters each time. You can also add a version of match that accepts Consumer<A> left, Consumer<B> right instead of Function<A, R> left, Function<B, R> right, if you want the option of pattern-matching without producing a result.
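A minimal sketch of both conveniences, assuming Java 8 static and default interface methods. The factories are written as static methods on the interface, and the Consumer-based variant is given the made-up name consume rather than overloading match, since an overload taking lambdas could be ambiguous at the call site. The anonymous classes stand in for the Left and Right classes above:

```java
import java.util.function.Consumer;
import java.util.function.Function;

interface Either<A, B> {
    <R> R match(Function<A, R> left, Function<B, R> right);

    // Hypothetical name: a match variant that produces no result.
    default void consume(Consumer<A> left, Consumer<B> right) {
        this.<Void>match(
            a -> { left.accept(a); return null; },
            b -> { right.accept(b); return null; });
    }

    // Factory: both type parameters are inferred at the call site.
    static <A, B> Either<A, B> left(A value) {
        return new Either<A, B>() {
            @Override
            public <R> R match(Function<A, R> left, Function<B, R> right) {
                return left.apply(value);
            }
        };
    }

    static <A, B> Either<A, B> right(B value) {
        return new Either<A, B>() {
            @Override
            public <R> R match(Function<A, R> left, Function<B, R> right) {
                return right.apply(value);
            }
        };
    }
}
```

With this in place, Either<Integer, String> result = Either.left(42); needs no explicit type arguments, and result.consume(code -> ..., text -> ...) runs one of two side effects.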

forever°为你锁心
Reply #4 · 2020-02-17 05:30

Make Either an abstract class with no fields and only one constructor (private, no-args, empty) and nest your "data constructors" (left and right static factory methods) inside the class so that they can see the private constructor but nothing else can, effectively sealing the type.

Use an abstract method either to simulate exhaustive pattern matching, overriding appropriately in the concrete types returned by the static factory methods. Implement convenience methods (like fromLeft, fromRight, bimap, first, second) in terms of either.

import java.util.Optional;
import java.util.function.Function;

public abstract class Either<A, B> {
    private Either() {}

    public abstract <C> C either(Function<? super A, ? extends C> left,
                                 Function<? super B, ? extends C> right);

    public static <A, B> Either<A, B> left(A value) {
        return new Either<A, B>() {
            @Override
            public <C> C either(Function<? super A, ? extends C> left,
                                Function<? super B, ? extends C> right) {
                return left.apply(value);
            }
        };
    }

    public static <A, B> Either<A, B> right(B value) {
        return new Either<A, B>() {
            @Override
            public <C> C either(Function<? super A, ? extends C> left,
                                Function<? super B, ? extends C> right) {
                return right.apply(value);
            }
        };
    }

    public Optional<A> fromLeft() {
        return this.either(Optional::of, value -> Optional.empty());
    }
}

Pleasant and safe! No way to screw it up. Because the type is effectively sealed, you can rest assured that there will only ever be two cases, and every operation ultimately must be defined in terms of the either method, which forces the caller to handle both of those cases.
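As a sketch of how the other convenience methods could be layered on top of either, here is one possible shape for fromRight, bimap and first. The signatures are my own guess, loosely modelled on the Haskell functions of the same names, and the class is package-private only to keep the example short. It assumes the wrapped values are non-null, since Optional.of rejects null:

```java
import java.util.Optional;
import java.util.function.Function;

abstract class Either<A, B> {
    private Either() {}

    abstract <C> C either(Function<? super A, ? extends C> left,
                          Function<? super B, ? extends C> right);

    static <A, B> Either<A, B> left(A value) {
        return new Either<A, B>() {
            @Override
            <C> C either(Function<? super A, ? extends C> left,
                         Function<? super B, ? extends C> right) {
                return left.apply(value);
            }
        };
    }

    static <A, B> Either<A, B> right(B value) {
        return new Either<A, B>() {
            @Override
            <C> C either(Function<? super A, ? extends C> left,
                         Function<? super B, ? extends C> right) {
                return right.apply(value);
            }
        };
    }

    // Mirror image of fromLeft; assumes a non-null right value.
    Optional<B> fromRight() {
        return this.<Optional<B>>either(a -> Optional.empty(), Optional::of);
    }

    // Map both sides at once; only the function for the actual case runs.
    <C, D> Either<C, D> bimap(Function<? super A, ? extends C> f,
                              Function<? super B, ? extends D> g) {
        return this.<Either<C, D>>either(
            a -> Either.<C, D>left(f.apply(a)),
            b -> Either.<C, D>right(g.apply(b)));
    }

    // Map only the left side, leaving a right value untouched.
    <C> Either<C, B> first(Function<? super A, ? extends C> f) {
        return this.<C, B>bimap(f, b -> b);
    }
}
```

None of these methods inspects the concrete subclass; each one goes through either, so both cases are always handled.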

Regarding the problem you had trying to do class Left<L> extends Either<L,?>, consider the signature <A, B> Either<A, B> left(A value). The type parameter B doesn't appear in the parameter list. So, given a value of some type A, you can get an Either<A, B> for any type B.
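To illustrate, the same left(42) call can be typed with any right-hand type the caller likes. This is a cut-down sketch of the class above with only the left factory, and FreeTypeParameterDemo is a made-up name for the example:

```java
import java.util.function.Function;

abstract class Either<A, B> {
    private Either() {}

    abstract <C> C either(Function<? super A, ? extends C> left,
                          Function<? super B, ? extends C> right);

    static <A, B> Either<A, B> left(A value) {
        return new Either<A, B>() {
            @Override
            <C> C either(Function<? super A, ? extends C> left,
                         Function<? super B, ? extends C> right) {
                return left.apply(value);
            }
        };
    }
}

// Hypothetical demo class, not part of the answer's code.
final class FreeTypeParameterDemo {
    private FreeTypeParameterDemo() {}

    static String demo() {
        // B is inferred from the assignment target: String here...
        Either<Integer, String> a = Either.left(42);
        // ...and Boolean here, from exactly the same call.
        Either<Integer, Boolean> b = Either.left(42);
        // B can also be spelled out explicitly.
        String c = Either.<Integer, Double>left(42)
                .either(i -> "left " + i, d -> "right " + d);
        return a.either(i -> "left " + i, s -> s)
                + ", " + b.either(i -> "left " + i, flag -> "right " + flag)
                + ", " + c;
    }
}
```

So the wildcard in Either<L,?> isn't needed: the caller picks the unused type parameter.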

Evening l夕情丶
Reply #5 · 2020-02-17 05:35

Alright, so the inheritance solution is definitely the most promising. The thing we would like to do is class Left<L> extends Either<L, ?>, which we unfortunately cannot do because of Java's generic rules. However, if we make the concession that the type of Left or Right must encode the "alternate" possibility, we can do this.

public class Left<L, R> extends Either<L, R>

Now, we would like to be able to convert Left<Integer, A> to Left<Integer, B>, since it doesn't actually use that second type parameter. We can define a method to do this conversion internally, thus encoding that freedom into the type system.

public <R1> Left<L, R1> phantom() {
  return new Left<L, R1>(contents);
}

Complete example:

public class EitherTest {

  public abstract static class Either<L, R> {}

  public static class Left<L, R> extends Either<L, R> {

    private L contents;

    public Left(L x) {
      contents = x;
    }

    public <R1> Left<L, R1> phantom() {
      return new Left<L, R1>(contents);
    }

  }

  public static class Right<L, R> extends Either<L, R> {

    private R contents;

    public Right(R x) {
      contents = x;
    }

    public <L1> Right<L1, R> phantom() {
      return new Right<L1, R>(contents);
    }

  }

}

Of course, you'll want to add some functions for actually accessing the contents, and for checking whether a value is Left or Right so you don't have to sprinkle instanceof and explicit casts everywhere, but this should be enough to get started, at the very least.
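One possible way to add those accessors, as a rough sketch: the method names isLeft, isRight, left and right are my own choice, the classes are top-level and package-private just to keep the example short, and it assumes the contents are never null because Optional.of rejects null.

```java
import java.util.Optional;

abstract class Either<L, R> {
    Either() {}

    abstract boolean isLeft();

    boolean isRight() {
        return !isLeft();
    }

    // Empty by default; each subclass overrides the accessor for its own side.
    Optional<L> left() {
        return Optional.empty();
    }

    Optional<R> right() {
        return Optional.empty();
    }
}

final class Left<L, R> extends Either<L, R> {
    private final L contents;

    Left(L x) {
        contents = x;
    }

    @Override
    boolean isLeft() {
        return true;
    }

    @Override
    Optional<L> left() {
        return Optional.of(contents); // assumes non-null contents
    }

    <R1> Left<L, R1> phantom() {
        return new Left<L, R1>(contents);
    }
}

final class Right<L, R> extends Either<L, R> {
    private final R contents;

    Right(R x) {
        contents = x;
    }

    @Override
    boolean isLeft() {
        return false;
    }

    @Override
    Optional<R> right() {
        return Optional.of(contents); // assumes non-null contents
    }

    <L1> Right<L1, R> phantom() {
        return new Right<L1, R>(contents);
    }
}
```

Callers can then write e.left().map(...) or check e.isLeft() without any instanceof or cast.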
