Java8-11-Stream Collector Source Code Analysis and Custom Collectors

Date: 2019-11-10

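In the previous article we studied grouping and partitioning with Stream in a systematic way; in this article we look at the collectors used by Stream.
So what is a collector? In earlier lessons we saw that a Stream can apply operations such as mapping, filtering, grouping, and partitioning to the elements of a collection. For example, the code below converts every element to uppercase with the map operation: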

List<String> list = Arrays.asList("hello", "world", "helloworld");
List<String> collect = list.stream().map(String::toUpperCase).collect(Collectors.toList());

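With that background, the program above is easy to follow. Our earlier articles, however, covered only the intermediate operations (such as map) in detail, together with lambda expressions, method references, and the various functional interfaces. Now we turn to the collect method. collect takes a parameter of type Collector, and Collector is the collector abstraction introduced in Java 8: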

<R, A> R collect(Collector<? super T, A, R> collector);

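In other words, collect needs a collector, which describes how the result container is created, filled, and finished. Most of the time we do not have to write collectors ourselves: the Collectors class provides factory methods for the common ones, such as toList(), toSet(), and toCollection(Supplier collectionFactory). Even so, understanding how collectors are implemented helps a great deal in writing correct programs.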
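For reference, here is a minimal sketch of the three factory methods mentioned above. The class name FactoryCollectorsDemo and the sample words are illustrative assumptions, not part of the original article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical demo class: built-in collectors from the Collectors factory methods
public class FactoryCollectorsDemo {
    static final List<String> WORDS = Arrays.asList("hello", "world", "hello");

    // toList(): keeps encounter order and duplicates
    static List<String> asList() {
        return WORDS.stream().collect(Collectors.toList());
    }

    // toSet(): removes duplicates; iteration order is unspecified
    static Set<String> asSet() {
        return WORDS.stream().collect(Collectors.toSet());
    }

    // toCollection(Supplier): the caller picks the container, here a sorted TreeSet
    static TreeSet<String> asTreeSet() {
        return WORDS.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(asList());        // [hello, world, hello]
        System.out.println(asSet().size());  // 2
        System.out.println(asTreeSet());     // [hello, world]
    }
}
```

toCollection is the escape hatch when neither toList() nor toSet() gives the container type you need.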

Here is the implementation of toList:

public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}
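CollectorImpl is package-private, so code outside java.util.stream cannot call it directly. A rough equivalent of toList() can be sketched with the public factory Collector.of; myToList and ToListSketch are hypothetical names, and the sketch mirrors the source above rather than reproducing the JDK's code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical sketch of a toList()-like collector built from public API only
public class ToListSketch {
    static <T> Collector<T, ?, List<T>> myToList() {
        // This overload of Collector.of always adds IDENTITY_FINISH,
        // so with no extra characteristics it matches CH_ID in toList()
        return Collector.<T, List<T>>of(
                ArrayList::new,                                         // supplier
                List::add,                                              // accumulator
                (left, right) -> { left.addAll(right); return left; }); // combiner
    }

    public static void main(String[] args) {
        List<String> result = Stream.of("hello", "world").collect(myToList());
        System.out.println(result);                        // [hello, world]
        System.out.println(myToList().characteristics());  // [IDENTITY_FINISH]
    }
}
```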

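The source of toList shows that the returned collector is an instance of CollectorImpl. CollectorImpl is an implementation class of the Collector interface, defined as a static nested class inside the Collectors helper class, and it is used to create the common collector instances that Collectors hands out: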

/**
 * Simple implementation class for {@code Collector}.
 *
 * @param <T> the type of elements to be collected
 * @param <R> the type of the result
 */
static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
    private final Supplier<A> supplier;
    private final BiConsumer<A, T> accumulator;
    private final BinaryOperator<A> combiner;
    private final Function<A, R> finisher;
    private final Set<Characteristics> characteristics;

    CollectorImpl(Supplier<A> supplier,
                  BiConsumer<A, T> accumulator,
                  BinaryOperator<A> combiner,
                  Function<A,R> finisher,
                  Set<Characteristics> characteristics) {
        this.supplier = supplier;
        this.accumulator = accumulator;
        this.combiner = combiner;
        this.finisher = finisher;
        this.characteristics = characteristics;
    }

    CollectorImpl(Supplier<A> supplier,
                  BiConsumer<A, T> accumulator,
                  BinaryOperator<A> combiner,
                  Set<Characteristics> characteristics) {
        this(supplier, accumulator, combiner, castingIdentity(), characteristics);
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        return accumulator;
    }

    @Override
    public Supplier<A> supplier() {
        return supplier;
    }

    @Override
    public BinaryOperator<A> combiner() {
        return combiner;
    }

    @Override
    public Function<A, R> finisher() {
        return finisher;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return characteristics;
    }
}

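The CollectorImpl constructors implement the methods of the Collector interface from the arguments passed in. toList above, for instance, uses the four-argument constructor, so its finisher defaults to castingIdentity().
So to build a custom collector, we have to implement each method of the Collector interface ourselves. Next we analyze the interface method by method.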

/**
 * @param <T> the type of input elements to the reduction operation
 * @param <A> the mutable accumulation type of the reduction operation (often
 *            hidden as an implementation detail)
 * @param <R> the result type of the reduction operation
 * @since 1.8
 */
public interface Collector<T, A, R> {

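Before analyzing the methods of Collector, note its three type parameters:
T is the type of the elements fed into the collection operation
A is the type of the mutable intermediate result container
R is the type of the final result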
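To make T, A and R concrete, here is a small sketch (the class name TypeParamsDemo is hypothetical) that spells out the declared types of two JDK collectors. The JDK hides A behind a wildcard, which is why it appears as ? in these declarations.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TypeParamsDemo {
    // T = String, A = hidden (?), R = List<String>
    static final Collector<String, ?, List<String>> TO_LIST = Collectors.toList();

    // T = CharSequence, A = hidden (?), R = String:
    // elements are buffered in some mutable A before the final String is built
    static final Collector<CharSequence, ?, String> JOINING = Collectors.joining(",");

    static String joined() {
        // Stream<String> accepts Collector<? super String, A, R>, so T = CharSequence works
        return Stream.of("hello", "world").collect(JOINING);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("hello", "world").collect(TO_LIST)); // [hello, world]
        System.out.println(joined());                                     // hello,world
    }
}
```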

We will come back to these type parameters below. Now let's look at the methods of the Collector interface:

    /**
     * A function that creates and returns a new mutable result container.
     *
     * @return a function which returns a new, mutable result container
     */
    Supplier<A> supplier();

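supplier() returns a function that creates and returns a new mutable result container (of type A). This is the first step when a collector runs: the elements being collected (of type T) are then placed into containers created by this function.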
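A minimal sketch of the supplier contract, using a hypothetical collector built with Collector.of so that A is a known type (List<String>): every call to get() on the returned Supplier should produce a new, empty container. The names SupplierDemo and LIST_COLLECTOR are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class SupplierDemo {
    // Hypothetical list collector; A = List<String>
    static final Collector<String, List<String>, List<String>> LIST_COLLECTOR =
            Collector.of(ArrayList::new, List::add, (l, r) -> { l.addAll(r); return l; });

    // true if two get() calls yield two distinct, empty containers
    static boolean suppliesFreshContainers() {
        Supplier<List<String>> supplier = LIST_COLLECTOR.supplier();
        List<String> first = supplier.get();
        List<String> second = supplier.get();
        return first != second && first.isEmpty() && second.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(suppliesFreshContainers()); // true
    }
}
```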

    /**
     * A function that folds a value into a mutable result container.
     *
     * @return a function which folds a value into a mutable result container
     */
    BiConsumer<A, T> accumulator();

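accumulator() returns a function that folds the elements (of type T) one at a time into a mutable result container (of type A), namely a container created by the supplier() function above.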
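The accumulator can be exercised by hand. This sketch (hypothetical class and collector, as before) creates one container and folds each element into it with accept, which is roughly what a sequential stream does for every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;

public class AccumulatorDemo {
    // Hypothetical list collector; A = List<String>
    static final Collector<String, List<String>, List<String>> LIST_COLLECTOR =
            Collector.of(ArrayList::new, List::add, (l, r) -> { l.addAll(r); return l; });

    static List<String> accumulate(List<String> input) {
        List<String> container = LIST_COLLECTOR.supplier().get();
        BiConsumer<List<String>, String> accumulator = LIST_COLLECTOR.accumulator();
        for (String element : input) {
            accumulator.accept(container, element); // same container, one element per call
        }
        return container;
    }

    public static void main(String[] args) {
        System.out.println(accumulate(Arrays.asList("hello", "world"))); // [hello, world]
    }
}
```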

    /**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     *
     * @return a function which combines two partial results into a combined
     * result
     */
    BinaryOperator<A> combiner();

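combiner() returns a function that takes two partial result containers (of type A) and merges them. It may fold one result into the other and return that, or it may merge both into a new result container and return it.
Here a partial result means all the elements in a container created by supplier(). Why would there be two of them? This is where parallel streams come in. A sequential stream produces only one result container, so no combiner() call is needed. A parallel stream, however, may produce several result containers, which combiner() merges pairwise until a single container (still of type A) remains; finisher() then turns it into the final result of type R.

Strictly speaking, this description of parallel streams is not precise. Whether a parallel stream creates multiple intermediate containers also depends on the CONCURRENT value in Characteristics. We will illustrate the difference with examples when we analyze how collectors are executed.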
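The merge step can be simulated without threads. The sketch below (hypothetical names) splits the input in two, accumulates each half into its own container as two parallel subtasks would, and then calls combiner(). Because the left container is merged with the right one in that order, encounter order survives, which is also why toList() on an ordered parallel stream still returns the elements in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CombinerDemo {
    // Hypothetical list collector; A = List<String>
    static final Collector<String, List<String>, List<String>> LIST_COLLECTOR =
            Collector.of(ArrayList::new, List::add, (l, r) -> { l.addAll(r); return l; });

    static List<String> combineHalves(List<String> input) {
        int mid = input.size() / 2;
        BiConsumer<List<String>, String> acc = LIST_COLLECTOR.accumulator();
        // two partial containers, as if produced by two subtasks
        List<String> left = LIST_COLLECTOR.supplier().get();
        List<String> right = LIST_COLLECTOR.supplier().get();
        for (String s : input.subList(0, mid)) acc.accept(left, s);
        for (String s : input.subList(mid, input.size())) acc.accept(right, s);
        // merge right into left; this combiner returns the left container
        return LIST_COLLECTOR.combiner().apply(left, right);
    }

    static List<String> parallelToList(List<String> input) {
        return input.parallelStream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(combineHalves(words));  // [a, b, c, d, e]
        System.out.println(parallelToList(words)); // [a, b, c, d, e]
    }
}
```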

    /**
     * Perform the final transformation from the intermediate accumulation type
     * {@code A} to the final result type {@code R}.
     *
     * <p>If the characteristic {@code IDENTITY_TRANSFORM} is
     * set, this function may be presumed to be an identity transform with an
     * unchecked cast from {@code A} to {@code R}.
     *
     * @return a function which transforms the intermediate result to the final
     * result
     */
    Function<A, R> finisher();

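finisher() performs the final transformation. If the accumulated result needs to be converted to another type or post-processed in some other way, finisher() is where that happens. If the collector declares Characteristics.IDENTITY_FINISH, no transformation is needed and the finisher may be skipped; the JDK's stream implementation then casts the container from A to R directly instead of calling finisher().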
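When A and R differ, the finisher does real work. The sketch below (class name FinisherDemo is hypothetical) accumulates into a StringBuilder (A) and converts it to a String (R) in finisher(). When this overload of Collector.of receives no characteristics, the collector declares none at all, so IDENTITY_FINISH is absent and the finisher runs.

```java
import java.util.Arrays;
import java.util.stream.Collector;

public class FinisherDemo {
    // T = String, A = StringBuilder, R = String
    static final Collector<String, StringBuilder, String> CONCAT = Collector.of(
            StringBuilder::new,       // supplier: new mutable buffer
            StringBuilder::append,    // accumulator: append one element
            StringBuilder::append,    // combiner: append the right buffer to the left
            StringBuilder::toString); // finisher: A -> R

    static String concat(String... parts) {
        return Arrays.stream(parts).collect(CONCAT);
    }

    public static void main(String[] args) {
        System.out.println(concat("hello", "world"));           // helloworld
        System.out.println(CONCAT.characteristics().isEmpty()); // true
    }
}
```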

    /**
     * Returns a {@code Set} of {@code Collector.Characteristics} indicating
     * the characteristics of this Collector.  This set should be immutable.
     *
     * @return an immutable set of collector characteristics
     */
    Set<Characteristics> characteristics();

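Finally, the characteristics() method. We have mentioned collector characteristics several times already, and characteristics() is the method that returns them. They are specified by whoever creates the collector, using Characteristics, an enum defined inside the Collector interface with three values: CONCURRENT, UNORDERED, and IDENTITY_FINISH.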
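Printing characteristics() for a few built-in collectors shows how Collectors declares them. The expected sets in the comments reflect the OpenJDK source (toList uses CH_ID, toSet uses CH_UNORDERED_ID, toConcurrentMap uses CH_CONCURRENT_ID, joining uses CH_NOID); the class name CharacteristicsDemo is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

public class CharacteristicsDemo {
    static Set<Characteristics> ofToList()  { return Collectors.toList().characteristics(); }
    static Set<Characteristics> ofToSet()   { return Collectors.toSet().characteristics(); }
    static Set<Characteristics> ofJoining() { return Collectors.joining(",").characteristics(); }

    static Set<Characteristics> ofToConcurrentMap() {
        Collector<String, ?, ConcurrentMap<String, Integer>> c =
                Collectors.toConcurrentMap(Function.identity(), String::length);
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println(ofToList());          // [IDENTITY_FINISH]
        System.out.println(ofToSet());           // [UNORDERED, IDENTITY_FINISH]
        System.out.println(ofJoining());         // []
        System.out.println(ofToConcurrentMap()); // [CONCURRENT, UNORDERED, IDENTITY_FINISH]
    }
}
```

joining declares no characteristics because its finisher really converts A to a String.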

    /**
     * Characteristics indicating properties of a {@code Collector}, which can
     * be used to optimize reduction implementations.
     */
    enum Characteristics {
        /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
         * unordered data source.
         */
        CONCURRENT,

        /**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,

        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH
    }

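If CONCURRENT is present, the collector supports concurrent operation: multiple threads may call the accumulator() function at the same time to add elements to the same intermediate result container.
Note the phrase "the same intermediate result container", not multiple containers. When CONCURRENT applies (even on a parallel stream), only one intermediate result container is created, and that container must support concurrent access. As the Javadoc above notes, a CONCURRENT collector that is not also UNORDERED is only evaluated concurrently on an unordered data source.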
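The one-container behaviour can be observed by counting supplier calls. The sketch below (hypothetical names; the thread-safe container is ConcurrentHashMap.newKeySet()) collects a parallel stream twice: once with CONCURRENT and UNORDERED, once with UNORDERED only. In OpenJDK's ReferencePipeline.collect, the concurrent case calls the supplier exactly once; the non-concurrent case creates one container per subtask, so that count depends on how the stream is split and is not fixed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.IntStream;

public class ConcurrentDemo {
    // Returns how many result containers the supplier created during one parallel collect
    static int containersCreated(boolean concurrent) {
        AtomicInteger created = new AtomicInteger();
        Characteristics[] cs = concurrent
                ? new Characteristics[] { Characteristics.CONCURRENT, Characteristics.UNORDERED }
                : new Characteristics[] { Characteristics.UNORDERED };
        Collector<Integer, Set<Integer>, Set<Integer>> collector = Collector.<Integer, Set<Integer>>of(
                () -> {
                    created.incrementAndGet();
                    return ConcurrentHashMap.newKeySet(); // thread-safe Set
                },
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                cs);
        Set<Integer> result = IntStream.range(0, 10_000).boxed().parallel().collect(collector);
        if (result.size() != 10_000) {
            throw new IllegalStateException("elements were lost");
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(containersCreated(true));  // 1
        System.out.println(containersCreated(false)); // depends on splitting, usually more than 1
    }
}
```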

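The UNORDERED characteristic is simpler: it means the collection operation does not commit to preserving the encounter order of the input elements, which is typical when the result container, such as a Set, has no intrinsic order.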

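IDENTITY_FINISH means the intermediate result container is already the final type we need, so the unchecked cast from A to R is guaranteed to succeed and the finisher can be elided.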
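Whether the finisher is skipped can be checked with a counter. This sketch (hypothetical names) builds the same list collector twice, with and without IDENTITY_FINISH. OpenJDK's ReferencePipeline.collect casts the container directly when IDENTITY_FINISH is declared, so the counter stays at 0 in that case; the specification itself only says the finisher may be elided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Stream;

public class IdentityFinishDemo {
    // Runs one sequential collect and returns how often the finisher was applied
    static int finisherCalls(boolean identityFinish) {
        AtomicInteger calls = new AtomicInteger();
        Function<List<String>, List<String>> finisher = list -> {
            calls.incrementAndGet(); // record each invocation
            return list;
        };
        Characteristics[] cs = identityFinish
                ? new Characteristics[] { Characteristics.IDENTITY_FINISH }
                : new Characteristics[0];
        Collector<String, List<String>, List<String>> collector = Collector.of(
                ArrayList::new, List::add,
                (left, right) -> { left.addAll(right); return left; },
                finisher, cs);
        Stream.of("hello", "world").collect(collector);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(finisherCalls(true));  // 0
        System.out.println(finisherCalls(false)); // 1
    }
}
```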

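To summarize:
1. supplier() creates and returns a mutable result container.
2. accumulator() folds elements into a mutable result container, namely one returned by supplier().
3. combiner() merges two partial result containers (both returned by supplier()), either by folding one into the other or by merging both into a new, empty container.
4. finisher() performs the final transformation from the intermediate result type to the final result type.
5. characteristics() returns the collector's set of characteristics; different characteristics lead to different execution strategies.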
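The five points can be tied together in a sketch of what a sequential collect roughly does. ManualCollect is a hypothetical helper, not the JDK's code path (the JDK goes through ReduceOps), but for a single-threaded run it calls the collector's methods in the same order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class ManualCollect {
    @SuppressWarnings("unchecked")
    static <T, A, R> R collect(Iterable<? extends T> source, Collector<T, A, R> collector) {
        A container = collector.supplier().get();   // 1. create the container
        BiConsumer<A, T> accumulator = collector.accumulator();
        for (T element : source) {
            accumulator.accept(container, element); // 2. fold in each element
        }
        // 3. combiner() is unused: a single-threaded run has only one container
        // 4 + 5. apply finisher() unless IDENTITY_FINISH says it can be skipped
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
                ? (R) container
                : collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hello");
        String joined = collect(words, Collectors.joining("-"));
        Set<String> unique = collect(words, Collectors.<String>toSet());
        System.out.println(joined);        // hello-world-hello
        System.out.println(unique.size()); // 2
    }
}
```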

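Now that we understand each method of the Collector interface, let's implement our own collector for a simple requirement: removing duplicate elements from a collection. The requirement itself has little practical value; it mainly serves to demonstrate how to write a custom collector.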

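import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector<T> implements Collector<T,Set<T>,Set<T>>{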
    @Override
    public Supplier<Set<T>> supplier() {
        return HashSet<T>::new;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        return Set<T>::add;
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        return (Set<T> s1, Set<T> s2) -> {
            s1.addAll(s2);
            return s1;
        };
    }

    @Override
    public Function<Set<T>, Set<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        // if IDENTITY_FINISH is removed, collect() will invoke finisher()
        EnumSet<Characteristics> characteristicsEnumSet = EnumSet.of(Characteristics.UNORDERED,
                Characteristics.IDENTITY_FINISH);
        return Collections.unmodifiableSet(characteristicsEnumSet);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello","world","welcome","hello");
        Set<String> collect = list.stream().collect(new MySetCollector<String>());
        System.out.println(collect);
    }
}

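MySetCollector implements the Collector interface and fixes its three type parameters: each collected element has type T, the intermediate result container has type Set<T>, and because the container needs no conversion, the final result type is also Set<T>.
supplier() returns a HashSet as the intermediate result container, and accumulator() calls Set's add method to put the elements into it; both are written as method references.
combiner() merges intermediate containers pairwise, and finisher() simply returns Function.identity(), so the merged intermediate container is returned as the final result: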
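For comparison, the same deduplicating collector can be sketched with Collector.of instead of a dedicated class. mySet and MySetOf are hypothetical names; this overload of Collector.of adds IDENTITY_FINISH on its own, so only UNORDERED is passed explicitly, which yields the same characteristics as MySetCollector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

public class MySetOf {
    static <T> Collector<T, Set<T>, Set<T>> mySet() {
        return Collector.<T, Set<T>>of(
                HashSet::new,                              // supplier
                Set::add,                                  // accumulator
                (s1, s2) -> { s1.addAll(s2); return s1; }, // combiner
                Collector.Characteristics.UNORDERED);      // IDENTITY_FINISH is implied
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello", "world", "welcome", "hello");
        Set<String> result = list.stream().collect(mySet());
        System.out.println(result.size());             // 3
        System.out.println(mySet().characteristics()); // [UNORDERED, IDENTITY_FINISH]
    }
}
```

A separate class, as in MySetCollector, is mainly useful when the collector needs its own state or a reusable name; for one-off collectors Collector.of is shorter.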

    /**
     * Returns a function that always returns its input argument.
     *
     * @param <T> the type of the input and output objects to the function
     * @return a function that always returns its input argument
     */
    static <T> Function<T, T> identity() {
        return t -> t;
    }

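characteristics() declares two characteristics, UNORDERED and IDENTITY_FINISH: the elements in the container are unordered, and no final type conversion is needed.
Running main prints [world, hello, welcome]. The duplicate hello is gone; the order of the remaining elements comes from HashSet iteration and is not guaranteed.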

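In this article we analyzed the collector source code and, using a simple deduplication requirement, implemented our own collector, MySetCollector. In the next article we will use this example to analyze how collectors are executed.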