How to filter the common values out of two Lists

Date: 2019-12-09

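Problem

We have a number of social security cards and ID cards, and we want to pick out the social security cards that have a matching ID card.
In code terms: given a List<SocialSecurity> socialList and a List<IdCard> idList, find the social security cards whose ID card number appears in idList.

Model

Create the social security card class: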

/**
 * @author Ryan Miao
 */
class SocialSecurity{
    private Integer id; // social security number
    private Integer idCard; // ID card number
    private String somethingElse;

    public SocialSecurity(Integer id, Integer idCard, String somethingElse) {
        this.id = id;
        this.idCard = idCard;
        this.somethingElse = somethingElse;
    }

    public Integer getId() {
        return id;
    }

    public Integer getIdCard() {
        return idCard;
    }

    public String getSomethingElse() {
        return somethingElse;
    }

    @Override
    public String toString() {
        return "SocialSecurity{" +
                "id=" + id +
                ", idCard=" + idCard +
                ", somethingElse='" + somethingElse + '\'' +
                '}';
    }
}

Create the ID card class:

class IdCard {
    private Integer id; // ID card number
    private String somethingElse;

    public IdCard(Integer id, String somethingElse) {
        this.id = id;
        this.somethingElse = somethingElse;
    }

    public Integer getId() {
        return id;
    }

    public String getSomethingElse() {
        return somethingElse;
    }

    @Override
    public String toString() {
        return "IdCard{" +
                "id=" + id +
                ", somethingElse='" + somethingElse + '\'' +
                '}';
    }
}

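The simplest approach: iteration

Two nested loops are enough.
First prepare the test data (Lists.newArrayList is presumably Guava's com.google.common.collect.Lists, and @Before and @Test are the JUnit 4 annotations):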

private ArrayList<SocialSecurity> socialSecurities;
private ArrayList<IdCard> idCards;

@Before
public void setUp(){
    socialSecurities = Lists.newArrayList(
            new SocialSecurity(1, 12, "小明"),
            new SocialSecurity(2, 13, "小红"),
            new SocialSecurity(3, 14, "小王"),
            new SocialSecurity(4, 15, "小peng")
    );

    idCards = Lists.newArrayList(
            new IdCard(14, "xiaopeng"),
            new IdCard(13, "xiaohong"),
            new IdCard(12, "xiaoming")
    );

    // Goal: from socialSecurities, keep the cards whose ID card number exists in idCards
}

Iterate:

@Test
public void testFilterForEach(){
    List<SocialSecurity> result = new ArrayList<>();
    int count = 0;
    for (SocialSecurity socialSecurity : socialSecurities) {
        for (IdCard idCard : idCards) {
            count++;
            if (socialSecurity.getIdCard().equals(idCard.getId())){
                result.add(socialSecurity);
            }
        }
    }

    System.out.println(result);
    System.out.println(count);//12 = 3 * 4
    //O(m,n) = m*n;
}

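It is easy to see that the cost is O(m,n) = m*n, where m is the number of social security cards and n is the number of ID cards: every card in the first list is compared against every card in the second.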
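As a side note, the nested loop can stop scanning the second list as soon as it finds a match. The sketch below is a variation, not the code from the test above: it works on plain Integer lists, the class name NestedLoopWithBreak and its members are made up for illustration, and it assumes every ID card number occurs at most once in the second list. The worst case is still m*n comparisons, but on the sample data the count drops from 12 to 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedLoopWithBreak {
    // Number of equals() checks performed by the most recent call to filter().
    static int comparisons;

    // Keeps each value of `left` that also occurs in `right`. The inner scan stops
    // at the first hit, which is only safe if values in `right` are unique (assumption).
    static List<Integer> filter(List<Integer> left, List<Integer> right) {
        comparisons = 0;
        List<Integer> result = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                comparisons++;
                if (l.equals(r)) {
                    result.add(l);
                    break; // no need to look at the rest of `right`
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ID card numbers stored on the four social security cards
        List<Integer> socialIdCards = Arrays.asList(12, 13, 14, 15);
        // ID card numbers of the three ID cards, in the article's order
        List<Integer> idCardIds = Arrays.asList(14, 13, 12);

        System.out.println(filter(socialIdCards, idCardIds)); // [12, 13, 14]
        System.out.println(comparisons); // 9 = 3 + 2 + 1 + 3
    }
}
```

The unmatched value 15 still costs a full scan of three comparisons, which is why the saving depends on how many cards actually match and where they sit in the second list.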

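Using a hash

Looking at the loop above, finding the common part means scanning the whole second list for every element of the first. Instead, we can put the values being matched into a hash set and replace the inner scan with a membership check.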

@Test
public void testFilterHash(){
    Set<Integer> ids = idCards
            .stream()
            .map(IdCard::getId)
            .collect(Collectors.toSet());
    List<SocialSecurity> result = socialSecurities
            .stream()
            .filter(e->ids.contains(e.getIdCard()))
            .collect(Collectors.toList());

    System.out.println(result);
    // build the hash set: 3
    // iterate over socialSecurities: 4
    // check whether the key exists in the hash set: 4
    // O(m,n) = 2m + n = 11
}

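Assuming the hash function is good, each lookup takes constant time and building the set takes n insertions. Counting one step per element visited and one per lookup, this approach costs O(m,n) = 2m+n, which is linear in the combined size of the two lists. More importantly, I simply like this style better: I dislike nested conditions and prefer flat code.

Is the hash always faster than iteration?

It is tempting to take for granted that the hash must be faster than iteration, because it is a hash. In fact we can work it out by asking when 2m+n < m*n holds.
Trying small values shows that n must be greater than 2: with n=2 the inequality becomes 2m+2 < 2m, which can never hold, and with n=1 it becomes 2m+1 < m, which fails as well. So, for n>2: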
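The same idea extends from filtering to pairing. If each social security card should come back together with its matching ID card, as the problem statement suggests, a Map keyed by ID card number can stand in for the Set. The sketch below is an illustration, not the article's code: the class PairByIdCard and its nested Social and Card classes are hypothetical stand-ins for SocialSecurity and IdCard, and it assumes ID card numbers are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairByIdCard {
    // Hypothetical stand-in for SocialSecurity, reduced to the fields used here.
    static final class Social {
        final Integer id;     // social security number
        final Integer idCard; // ID card number
        Social(Integer id, Integer idCard) { this.id = id; this.idCard = idCard; }
    }

    // Hypothetical stand-in for IdCard.
    static final class Card {
        final Integer id;     // ID card number
        final String name;
        Card(Integer id, String name) { this.id = id; this.name = name; }
    }

    // Returns one "socialId->cardName" entry per social security card that has a matching ID card.
    static List<String> pair(List<Social> socials, List<Card> cards) {
        // Index the ID cards by number: n insertions.
        Map<Integer, Card> byId = new HashMap<>();
        for (Card c : cards) {
            byId.put(c.id, c); // assumes unique numbers; a duplicate would overwrite the earlier card
        }
        // One lookup per social security card: m lookups.
        List<String> pairs = new ArrayList<>();
        for (Social s : socials) {
            Card match = byId.get(s.idCard);
            if (match != null) {
                pairs.add(s.id + "->" + match.name);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Social> socials = Arrays.asList(
                new Social(1, 12), new Social(2, 13), new Social(3, 14), new Social(4, 15));
        List<Card> cards = Arrays.asList(
                new Card(14, "xiaopeng"), new Card(13, "xiaohong"), new Card(12, "xiaoming"));
        System.out.println(pair(socials, cards)); // [1->xiaoming, 2->xiaohong, 3->xiaopeng]
    }
}
```

The cost has the same shape as the Set version: one pass to build the index and one lookup per social security card.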

@Test
public void testCondition(){
    int maxN = 0;
    for (int m = 2; m < 100; m++) {
        for (int n = 3; n < 100; n++) {
            if ((2*m+n)>m*n){
                System.out.println("m="+m +",n="+n);
                if (n>maxN){
                    maxN = n;
                }
            }
        }
    }

    System.out.println(maxN);
}

The result is:

m=2,n=3
3

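In other words, iteration only beats the hash when n<=3: in the searched range the single hit is m=2, n=3, on top of the n<=2 cases ruled out earlier. In practice iteration's advantage there is even larger, because the hash approach also has to create more objects. In most cases, however, n (the length of the second list) is greater than 3, which is why the hash version is the better one to write. Another important reason, of course, is that lambda stream operators are far more pleasant to read than nested loops.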
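The brute-force search can also be cross-checked by hand. This is a short derivation under the article's own cost model (2m+n steps for the hash, m*n for the nested loop), with m and n taken to be positive integers; it uses the identity (m-1)(n-2) = mn - 2m - n + 2.

```latex
\begin{aligned}
2m + n < mn
  &\iff mn - 2m - n > 0 \\
  &\iff (m-1)(n-2) - 2 > 0 \\
  &\iff (m-1)(n-2) > 2
\end{aligned}
```

So the hash is strictly cheaper exactly when (m-1)(n-2) > 2, and strictly more expensive when (m-1)(n-2) < 2. For m >= 2 and n >= 3 the product is at least 1, so the only way to stay below 2 is m-1 = 1 and n-2 = 1, that is m=2, n=3, which matches the single line printed by the search. The two costs tie when the product equals 2, at (m,n) = (2,4) and (3,3).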