Java8使用lambda表达式进行函数式编程能够对集合进行很是方便的操做。一个比较常见的操做是将list转换成map,通常使用Collectors的toMap()方法进行转换。一个比较常见的问题是当list中含有相同元素的时候,若是不指定取哪个,则会抛出异常。所以,这个指定是必须的。Java面试宝典PDF完整版java
固然,使用toMap()的另外一个重载方法,能够直接指定。这里,咱们想讨论的是另外一种方法:在进行转map的操做以前,能不能使用distinct()先把list的重复元素过滤掉,而后转map的时候就不用考虑重复元素的问题了。面试
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class ListToMap { @AllArgsConstructor @NoArgsConstructor @ToString private static class VideoInfo { @Getter String id; int width; int height; } public static void main(String [] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // preferred: handle duplicated data when toMap() Map<String, VideoInfo> id2VideoInfo = list.stream().collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); System.out.println("No Duplicated1: "); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); // handle duplicated data using distinct(), before toMap() Map<String, VideoInfo> id2VideoInfo2 = list.stream().distinct().collect( Collectors.toMap(VideoInfo::getId, x -> x) ); System.out.println("No Duplicated2: "); id2VideoInfo2.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); } }
list里总共有三个元素,其中有两个咱们认为是重复的。第一种转换是使用toMap()直接指定了对重复key的处理状况,所以能够正常转换成map。而第二种转换是想先对list进行去重,而后再转换成map,结果仍是失败了,抛出了IllegalStateException,因此distinct()应该是失败了。编程
No Duplicated1: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)> Exception in thread "main" java.lang.IllegalStateException: Duplicate key ListToMap.VideoInfo(id=123, width=1, height=2) at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) at java.util.HashMap.merge(HashMap.java:1253) at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320) at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) at java.util.stream.DistinctOps$1$2.accept(DistinctOps.java:175) at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at example.mystream.ListToMap.main(ListToMap.java:79)
查看distinct()的API,能够看到以下介绍:后端
Returns a stream consisting of the distinct elements (according to {@link Object#equals(Object)}) of this stream.app
显然,distinct()对对象进行去重时,是根据对象的equals()方法去处理的。若是咱们的VideoInfo类不overrride超类Object的equals()方法,就会使用Object的。ide
可是Object的equals()方法只有在两个对象彻底相同时才返回true。而咱们想要的效果是只要VideoInfo的id/width/height均相同,就认为两个videoInfo对象是同一个。因此咱们好比重写属于videoInfo的equals()方法。函数式编程
咱们设计VideoInfo的equals()以下:函数
@Override public boolean equals(Object obj) { if (!(obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; return this.id.equals(vi.id) && this.width == vi.width && this.height == vi.height; }
这样一来,只要两个videoInfo对象的三个属性都相同,这两个对象就相同了。欢天喜地去运行程序,依旧失败!why?性能
《Effective Java》是本好书,连Java之父James Gosling都说,这是一本连他都须要的Java教程。在这本书中,做者指出,若是重写了一个类的equals()方法,那么就必须一块儿重写它的hashCode()方法!必须!没有商量的余地!this
必须使得重写后的equals()知足以下条件:
根据equals()进行比较,相等的两个对象,hashCode()的值也必须相同;
由于这是Java的规定,违背这些规定将致使Java程序运行再也不正常。
具体更多的细节,建议你们读读原书,一定获益匪浅。强烈推荐!
最终,我按照神书的指导设计VideoInfo的hashCode()方法以下:
@Override public int hashCode() { int n = 31; n = n * 31 + this.id.hashCode(); n = n * 31 + this.height; n = n * 31 + this.width; return n; }
终于,distinct()成功过滤了list中的重复元素,此时使用两种toMap()将list转换成map都是没问题的:
No Duplicated1: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)> No Duplicated2: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)>
既然说distinct()是调用equals()进行比较的,那按照个人理解,list的3个元素至少须要比较3次吧。那是否是就调用了3次equals()呢?
在equals()中加入一句打印,这样就能够知道了。加后的equals()以下:
@Override public boolean equals(Object obj) { if (! (obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; System.out.println("<===> Invoke equals() ==> " + this.toString() + " vs. " + vi.toString()); return this.id.equals(vi.id) && this.width == vi.width && this.height == vi.height; }
结果:
No Duplicated1: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)> <===> Invoke equals() ==> ListToMap.VideoInfo(id=123, width=1, height=2) vs. ListToMap.VideoInfo(id=123, width=1, height=2) No Duplicated2: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)>
结果发现才调用了一次equals()。为何不是3次呢?仔细想一想,根据hashCode()进行比较,hashCode()相同的状况就一次,就是list的第一个元素和第三个元素(都是VideoInfo(id=123, width=1, height=2))会出现hashCode()相同的状况。
因此咱们是否是能够这么猜测:只有当hashCode()返回的hashCode相同的时候,才会调用equals()进行更进一步的判断。若是连hashCode()返回的hashCode都不一样,那么能够认为这两个对象必定就是不一样的了!
验证猜测:
更改hashCode()以下:
@Override public int hashCode() { return 1; }
这样一来,全部的对象的hashCode()返回值都是相同的。固然,这样搞是符合Java规范的,由于Java只规定equals()相同的对象的hashCode必须相同,可是不一样的对象的hashCode未必会不一样。
结果:
No Duplicated1: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)> <===> Invoke equals() ==> ListToMap.VideoInfo(id=456, width=4, height=5) vs. ListToMap.VideoInfo(id=123, width=1, height=2) <===> Invoke equals() ==> ListToMap.VideoInfo(id=456, width=4, height=5) vs. ListToMap.VideoInfo(id=123, width=1, height=2) <===> Invoke equals() ==> ListToMap.VideoInfo(id=123, width=1, height=2) vs. ListToMap.VideoInfo(id=123, width=1, height=2) No Duplicated2: <123, ListToMap.VideoInfo(id=123, width=1, height=2)> <456, ListToMap.VideoInfo(id=456, width=4, height=5)>
果真,equals()调用了三次!看来的确只有hashCode相同的时候才会调用equal()进一步判断两个对象到底是否相同;若是hashCode不相同,两个对象显然不相同。猜测是正确的。
list转map推荐使用toMap(),而且不管是否会出现重复的问题,都要指定重复后的取舍规则,不费功夫但受益无穷;
对一个自定义的class使用distinct(),切记覆写equals()方法;
覆写equals(),必定要覆写hashCode();
虽然设计出一个hashCode()能够简单地让其return 1,这样并不会违反Java规定,可是这样作会致使不少恶果。好比将这样的对象存入hashMap的时候,全部的对象的hashCode都相同,最终全部对象都存储在hashMap的同一个桶中,直接将hashMap恶化成了一个链表。从而O(1)的复杂度被整成了O(n)的,性能天然大大降低。
最终参考程序:
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class ListToMap { @AllArgsConstructor @NoArgsConstructor @ToString private static class VideoInfo { @Getter String id; int width; int height; public static void main(String [] args) { System.out.println(new VideoInfo("123", 1, 2).equals(new VideoInfo("123", 1, 2))); } @Override public boolean equals(Object obj) { if (!(obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; return this.id.equals(vi.id) && this.width == vi.width && this.height == vi.height; } /** * If equals() is override, hashCode() must be override, too. * 1\. if a equals b, they must have the same hashCode; * 2\. if a doesn't equals b, they may have the same hashCode; * 3\. hashCode written in this way can be affected by sequence of the fields; * 3\. 2^5 - 1 = 31\. So 31 will be faster when do the multiplication, * because it can be replaced by bit-shifting: 31 * i = (i << 5) - i. * @return */ @Override public int hashCode() { int n = 31; n = n * 31 + this.id.hashCode(); n = n * 31 + this.height; n = n * 31 + this.width; return n; } } public static void main(String [] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // preferred: handle duplicated data when toMap() Map<String, VideoInfo> id2VideoInfo = list.stream().collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); System.out.println("No Duplicated1: "); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); // handle duplicated data using distinct(), before toMap() // Note that distinct() relies on equals() in the object // if you override equals(), hashCode() must be override together Map<String, VideoInfo> id2VideoInfo2 = list.stream().distinct().collect( Collectors.toMap(VideoInfo::getId, x -> x) ); System.out.println("No Duplicated2: "); id2VideoInfo2.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); } }
以上,VideoInfo使咱们本身写的类,咱们能够往里添加equals()和hashCode()方法。若是VideoInfo是咱们引用的依赖中的一个类,咱们无权对其进行修改,那么是否是就没办法使用distinct()按照某些元素是否相同,对对象进行自定义的过滤了呢?
在stackoverflow的一个回答上,咱们能够找到一个可行的方法:使用wrapper。
假设在一个依赖中(咱们无权修改该类),VideoInfo定义以下:
@AllArgsConstructor @NoArgsConstructor @ToString public class VideoInfo { @Getter String id; int width; int height; }
使用刚刚的wrapper思路,写程序以下(固然,为了程序的可运行性,仍是把VideoInfo放进来了,假设它就是不能修改的,不能为其添加任何方法):
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class DistinctByWrapper { private static class VideoInfoWrapper { private final VideoInfo videoInfo; public VideoInfoWrapper(VideoInfo videoInfo) { this.videoInfo = videoInfo; } public VideoInfo unwrap() { return videoInfo; } @Override public boolean equals(Object obj) { if (!(obj instanceof VideoInfo)) { return false; } VideoInfo vi = (VideoInfo) obj; return videoInfo.id.equals(vi.id) && videoInfo.width == vi.width && videoInfo.height == vi.height; } @Override public int hashCode() { int n = 31; n = n * 31 + videoInfo.id.hashCode(); n = n * 31 + videoInfo.height; n = n * 31 + videoInfo.width; return n; } } public static void main(String [] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // VideoInfo --map()--> VideoInfoWrapper ----> distinct(): VideoInfoWrapper --map()--> VideoInfo Map<String, VideoInfo> id2VideoInfo = list.stream() .map(VideoInfoWrapper::new).distinct().map(VideoInfoWrapper::unwrap) .collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); } } /** * Assume that VideoInfo is a class that we can't modify */ @AllArgsConstructor @NoArgsConstructor @ToString class VideoInfo { @Getter String id; int width; int height; }
整个wrapper的思路无非就是构造另外一个类VideoInfoWrapper,把hashCode()和equals()添加到wrapper中,这样即可以按照自定义规则对wrapper对象进行自定义的过滤。
搜索Java知音公众号,回复“后端面试”,送你一份Java面试题宝典.pdf
咱们无法自定义过滤VideoInfo,可是咱们能够自定义过滤VideoInfoWrapper啊!
以后要作的,就是将VideoInfo所有转化为VideoInfoWrapper,而后过滤掉某些VideoInfoWrapper,再将剩下的VideoInfoWrapper转回VideoInfo,以此达到过滤VideoInfo的目的。很巧妙!
另外一种更精妙的实现方式是自定义一个函数:
private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) { Map<Object, Boolean> map = new ConcurrentHashMap<>(); return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null; }
(输入元素的类型是T及其父类,keyExtracctor是映射函数,返回Object,整个传入的函数的功能应该是提取key的。distinctByKey函数返回的是Predicate函数,类型为T。)
这个函数传入一个函数(lambda),对传入的对象提取key,而后尝试将key放入concurrentHashMap,若是能放进去,说明此key以前没出现过,函数返回false;若是不能放进去,说明这个key和以前的某个key重复了,函数返回true。
这个函数最终做为filter()函数的入参。根据Java API可知filter(func)过滤的规则为:若是func为true,则过滤,不然不过滤。所以,经过“filter() + 自定义的函数”,凡是重复的key都返回true,并被filter()过滤掉,最终留下的都是不重复的。Java面试宝典PDF完整版
最终实现的程序以下
package example.mystream; import lombok.AllArgsConstructor; import lombok.Getter; import lombok.NoArgsConstructor; import lombok.ToString; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import java.util.function.Function; import java.util.function.Predicate; import java.util.stream.Collectors; public class DistinctByFilterAndLambda { public static void main(String[] args) { List<VideoInfo> list = Arrays.asList(new VideoInfo("123", 1, 2), new VideoInfo("456", 4, 5), new VideoInfo("123", 1, 2)); // Get distinct only Map<String, VideoInfo> id2VideoInfo = list.stream().filter(distinctByKey(vi -> vi.getId())).collect( Collectors.toMap(VideoInfo::getId, x -> x, (oldValue, newValue) -> newValue) ); id2VideoInfo.forEach((x, y) -> System.out.println("<" + x + ", " + y + ">")); } /** * If a key could not be put into ConcurrentHashMap, that means the key is duplicated * @param keyExtractor a mapping function to produce keys * @param <T> the type of the input elements * @return true if key is duplicated; else return false */ private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) { Map<Object, Boolean> map = new ConcurrentHashMap<>(); return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null; } } /** * Assume that VideoInfo is a class that we can't modify */ @AllArgsConstructor @NoArgsConstructor @ToString class VideoInfo { @Getter String id; int width; int height; }