How to flatten all items from a nested Java Collec

2019-01-15 13:32发布

问题:

Given a complex nested collection of objects such as:

Set<List<Map<String, List<Object>>>> complexNestedCollection;

Does a generic method exist to flatten this out and get a single List of all Objects contained within?

A few details:

  1. The list shouldn't include collection objects themselves or map keys - only the values at the lowest level.
  2. It should follow the same ordering where possible - so in the example, items in the list would be in order, whereas ordering of maps/sets would depend on the implementation.
  3. It could optionally exclude duplicates
  4. UPDATE: It should ideally detect/handle circular references at any level, e.g. a List<List<Object>> where the outer List contains itself as a member. (Credit to Adrian Jałoszewski for mentioning this in the comments below).

Note: The actual use case is to get all Strings from a List<List<String>>, which can be done easily enough with two loops but it made me wonder about the general case.

回答1:

Assuming that you use Java 8, you could do that with the Stream API thanks to the method flatMap(Function<? super T,? extends Stream<? extends R>> mapper) as next:

// 1. Convert the Set as a Stream of List<Map<String, List<Object>>>
// 2. Extract the elements of the lists to get a Stream of Map<String, List<Object>>
// 3. Extract values of the maps to get a Stream of List<Object>
// 4. Extract the elements of the lists to get a Stream of Object
// 5. Get rid of duplicates
// 6. Collect the result as a List of Object
List<Object> result = complexNestedCollection.stream()
    .flatMap(List::stream)
    .flatMap(m -> m.values().stream())
    .flatMap(List::stream)
    .distinct()
    .collect(Collectors.toList());

<R> Stream<R> flatMap(Function<? super T,? extends Stream<? extends R>> mapper)

Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to each element. Each mapped stream is closed after its contents have been placed into this stream. (If a mapped stream is null an empty stream is used, instead.)


For previous versions of Java, you can still use FluentIterable from Google Guava to replace the Stream and use transformAndConcat(Function<? super E,? extends Iterable<? extends T>> function) instead of flatMap to flatten your collection.

The previous code snippet would then be rewritten as next:

List<Object> result =
    new ArrayList<>(
        new LinkedHashSet<>(
            FluentIterable.from(complexNestedCollection)
                .transformAndConcat(
                    new Function<List<Map<String, List<Object>>>, Iterable<Map<String, List<Object>>>> () {
                        public Iterable<Map<String, List<Object>>> apply(final List<Map<String, List<Object>>> input) {
                            return input;
                        }
                    }
                ).transformAndConcat(
                    new Function<Map<String, List<Object>>, Iterable<List<Object>>> () {
                        public Iterable<List<Object>> apply(final Map<String, List<Object>> input) {
                            return input.values();
                        }
                    }
                ).transformAndConcat(
                    new Function<List<Object>, Iterable<Object>> () {
                        public Iterable<Object> apply(final List<Object> input) {
                            return input;
                        }
                    }
                ).toList()
        )
    );


回答2:

I'm not sure if this exact implementation would work, as it's full of unchecked warnings and other dangerous stuff, but you should get the general idea.

public static Set<Object> recursiveExtract(Object stuff) {

    Set<Object> set = new HashSet<Object>();

    if(stuff instanceof Iterable) {
        for(Object o : (Iterable<?>)stuff) {
            set.addAll(recursiveExtract(o));
        }
    } else if(stuff instanceof Map) {
        for(Object o : ((Map<?, ? extends Object>) stuff).values()) {
            set.addAll(recursiveExtract(o));
        }
    } else {
        set.add(stuff);
    }

    return set;
}

You can also use List<Object> if you insist on List, but then you could get duplicate results, or LinkedHashSet<Object> if you care about the order.


Please instead of downvotes, give me suggestions for improvement. It's nicer.



回答3:

Here is a FlattenEverythingButTheKitchenSink class, a slightly modified version of a previous answer. It was tested with Java 7 and Java 8.

It works with Lists, Sets, Maps, Queues, and even Arrays of arbitrary depth. It compiles and runs without warning, and I couldn't find any counterexample. Hence the class name :)

If you want a List of objects with possible duplicates, use flatten, if you want a Set, use uniqFlatten.

EDITED: Refactoring to avoid code repetition.

package stackOverflow;

import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;


// Answer for
// https://stackoverflow.com/questions/20144826/how-to-flatten-all-items-from-a-nested-collection-into-a-single-list
public class FlattenEverythingButTheKitchenSink
{
    public static void main(String[] args) {
        int[][][] int3dArray = { { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } },
                { { 10, 11, 12 }, { 13, 14, 15 }, { 16, 17, 18 } },
                { { 19, 20, 21 }, { 22, 23, 24 }, { 25, 26, 27 }, { 28 }, { 29, 30 } } };
        String[][] string2dArray = { { "He, llo" }, { "Wo", "rld" } };
        String[] stringArray = { "Hello", "World" };
        Set<Integer> integersSet = new HashSet<Integer>();
        integersSet.add(1);
        integersSet.add(2);
        integersSet.add(3);

        Map<String, String> stringMap = new HashMap<>();
        stringMap.put("key1", "value1");
        stringMap.put("key2", "value2");
        stringMap.put("key3", "value3");

        Queue<String> qe = new LinkedList<String>();
        qe.add("x");
        qe.add("y");
        qe.add("z");

        Object[] objectArray = { "Hell", 0, "W", 0, "orld", integersSet, stringMap, qe };

        List<Object> mixList = new ArrayList<Object>();
        mixList.add("String");
        mixList.add(3);
        mixList.add(string2dArray);

        System.out.println(flatten(int3dArray));
        System.out.println(flatten(flatten(int3dArray)));
        System.out.println(flatten(3));
        System.out.println(flatten(stringArray));
        System.out.println(flatten(string2dArray));
        System.out.println(flatten(objectArray));
        System.out.println(flatten(mixList));

        mixList.add(int3dArray);

        System.out.println(uniqFlatten(mixList));
    }

    public static List<Object> flatten(Object object) {
        return (List<Object>) recursiveFlatten(object, true);
    }

    public static Set<Object> uniqFlatten(Object object) {
        return (Set<Object>) recursiveFlatten(object, false);
    }

    private static Collection<Object> recursiveFlatten(Object object, Boolean allowDuplicates) {
        Collection<Object> setOrList;
        if (allowDuplicates) {
            setOrList = new ArrayList<Object>();
        } else {
            setOrList = new LinkedHashSet<Object>();
        }
        if (object.getClass().isArray()) {
            for (int i = 0; i < Array.getLength(object); i++) {
                setOrList.addAll(recursiveFlatten(Array.get(object, i), allowDuplicates));
            }
        } else if (object instanceof Map) {
            for (Object element : ((Map<?, ?>) object).values()) {
                setOrList.addAll(recursiveFlatten(element, allowDuplicates));
            }
        } else if (object instanceof Iterable) {
            for (Object element : (Iterable<?>) object) {
                setOrList.addAll(recursiveFlatten(element, allowDuplicates));
            }
        } else {
            setOrList.add(object);
        }
        return setOrList;
    }
}

It outputs :

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
[3]
[Hello, World]
[He, llo, Wo, rld]
[Hell, 0, W, 0, orld, 1, 2, 3, value1, value2, value3, x, y, z]
[String, 3, He, llo, Wo, rld]
[String, 3, He, llo, Wo, rld, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

and shouldn't have any problem with a

Set<List<Map<String, List<Object>>>> complexNestedCollection;

It would also work with a

Set<List<Map<String, List<int[][][][]>>>>

The initialisation code wouldn't be pretty though :D



回答4:

You could use LambdaJ's flatten function.

List<Object> simpleCollection = flatten(flatten(flatten(complexNestedCollection)));


回答5:

My suggestion to solve the problem is to create a class which flattens the collections and maps recursively which memorizes the collections and maps already visited to handle circular dependencies. It's a straight forawart implementation of the DFS algorithm.

public class CollectionFlattener {
    private List<Object> returnList = new LinkedList<>();
    private Visited visited = new Visited();

    public CollectionFlattener(Object o) {
        handle(o);
    }

    private void handle(Object o) {
        if (o instanceof Map) {
            handleMap((Map) o);
        } else if (o instanceof Collection) {
            handleCollection((Collection) o);
        } else {
            returnList.add(o);
        }
    }

    private void handleCollection(Collection<?> collection) {
        if (!visited.isVisited(collection)) {
            visited.visit(collection);
            collection.forEach(this::handle);
        }
    }

    private void handleMap(Map<?, ?> map) {
        if (!visited.isVisited(map)) {
            visited.visit(map);
            handleCollection(map.values());
        }
    }

    public Collection<Object> getFlatCollection() {
        return new LinkedList<>(returnList);
    }
}

The Visited class has to offer a way to check whether the object we encountered is THE SAME (this is the reason I'm using the == operator here instead of equals). This is the only way we can reduce circular dependencies without loosing information about collections which by coincidence contain the same elements.

public class Visited {
    private List<Object> visited = new LinkedList<>();

    public void visit(Object o) {
        if (!isVisited(o)) {
            visited.add(o);
        }
    }
    public boolean isVisited(Object o) {
        long count = visited.stream().filter(object -> object == o).count();
        return count != 0;
    }
}

The only thing which is needed here is a null check, but it's not necessary to understand the logic behind this solution.



回答6:

I am wondering what the scenario could be and if it weren't better to define some specific data structure, such as a tree. But anyway:

I would avoid generics as java's typesystem is too simplistic to handle recursive types:

public static Collection flatten(Iterable collection, boolean duplicatesAllowed) {
    // create the result collection it just once and 
    ///pass it around as an accumulator
    // it gives you better time/space complexity
    Collection result = duplicatesAllowed ? new ArrayList() : new LinkedHashSet();
    flattenImpl(collection, result);
    return result;
}

This is supported by two private functions that do the actual extraction filling up the provided result collection:

private static void flattenImpl(Object obj, Collection result) {
    if (obj instanceof Iterable) {
        flattenImpl((Iterable)obj, result);
    } 
    else if (obj instanceof Map) {
        flattenImpl( ((Map)obj).values(), result);
    } 
    else {
        result.add(obj);
    }
}

private static void flattenImpl(Iterable collection, Collection result) {
    for(Object o : collection) {
        flattenImpl(o, result);
    }
}