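Designing large-scale distributed systems is a hard problem across the industry. This article reads through Google's key papers on distributed systems along two dimensions, the data plane and the control plane, to help build a foundation for distributed system design.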
Topic and paper | Title | Year | Main authors |
---|---|---|---|
Search engine | The Anatomy of a Large-Scale Hypertextual Web Search Engine | 1998 | Sergey Brin, Lawrence Page |
Data mining: Mining Causal Structures | Scalable Techniques for Mining Causal Structures | 1998 | Craig Silverstein, Sergey Brin, Rajeev Motwani, et al. |
Search engine: Extracting Patterns | Extracting Patterns and Relations from the World Wide Web | 1998 | Sergey Brin |
Search engine: Web Search for a Planet | The Google Cluster Architecture | 2003 | Luiz André Barroso, Jeffrey Dean, Urs Hölzle |
Distributed lock service: Chubby | The Chubby lock service for loosely-coupled distributed systems | 2006 | Mike Burrows |
Datacenter architecture: The Datacenter as a Computer | An Introduction to the Design of Warehouse-Scale Machines | 2009 | Luiz André Barroso, Urs Hölzle |
Datacenter profiling: Google-Wide Profiling | A Continuous Profiling Infrastructure for Data Centers | 2010 | Gang Ren, Eric Tune, Tipp Moseley, et al. |
Distributed tracing: Dapper | A Large-Scale Distributed Systems Tracing Infrastructure | 2010 | Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, et al. |
Multi-tenant elastic resource scaling: CloudScale | Elastic Resource Scaling for Multi-Tenant Cloud Systems | 2011 | Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, et al. |
Network design: B4 | Experience with a Globally-Deployed Software Defined WAN | 2013 | Sushant Jain, Alok Kumar, Subhasree Mandal, et al. |
Low-latency design: The Tail at Scale | Software techniques that tolerate latency variability are vital to building responsive large-scale Web services | 2013 | Jeffrey Dean, Luiz André Barroso |
Cluster scheduling: Omega | Flexible, scalable schedulers for large compute clusters | 2013 | Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, et al. |
Performance isolation: CPI2 | CPU performance isolation for shared compute clusters | 2013 | Xiao Zhang, Eric Tune, Robert Hagmann, et al. |
Large-scale cluster management: Borg | Large-scale cluster management at Google with Borg | 2015 | Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, et al. |
Auto-sharding: Slicer | Auto-Sharding for Datacenter Applications | 2016 | Atul Adya, Daniel Myers, Jon Howell, et al. |
Container scheduling: K8S | Borg, Omega, and Kubernetes | 2016 | Brendan Burns, Brian Grant, David Oppenheimer, et al. |
Graph partitioning | Distributed Balanced Partitioning via Linear Embedding | 2016 | Kevin Aydin, MohammadHossein Bateni, Vahab Mirrokni |
Efficient cluster scheduling with data placement: Firmament | Fast, Centralized Cluster Scheduling at Scale | 2016 | Ionel Gog, Malte Schwarzkopf, Adam Gleave, et al. |
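Google started by building a search engine and went on to build large-scale distributed systems on both the data plane and the control plane. The data plane grew from the foundation laid by three classic papers, GFS, MapReduce (MR), and BigTable, while the control plane matured alongside it.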
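Building a large-scale distributed system resembles building traditional ICT systems: both the data plane and the control plane have to be designed well at the architecture level. Beyond focused optimization of the data path, the control plane needs well-designed cluster control, lock management, log tracing, statistical profiling, resource isolation, and hotspot balancing. The difference is that the demands of large-scale systems require the architecture to be redesigned.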
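The split between the two planes can be made concrete with a toy hotspot-balancing example. This is a minimal sketch under stated assumptions, not the algorithm of Slicer or of any other paper listed above: the names `DataPlane`, `ControlPlane`, and `rebalance` are hypothetical, load is a plain request count, and the greedy rule moves at most one shard per call.

```python
from collections import Counter


class DataPlane:
    """Serves requests. It reads the shard assignment but never decides it."""

    def __init__(self, assignment):
        self.assignment = dict(assignment)  # shard id -> server name
        self.shard_load = Counter()         # shard id -> requests seen so far

    def handle(self, shard):
        self.shard_load[shard] += 1
        return self.assignment[shard]       # the server that serves this shard


class ControlPlane:
    """Runs off the request path: observes load and rewrites the assignment."""

    def rebalance(self, dp):
        # Aggregate per-shard load into per-server load.
        load = {server: 0 for server in sorted(set(dp.assignment.values()))}
        for shard, server in dp.assignment.items():
            load[server] += dp.shard_load[shard]
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        # Only a shard with positive load below the gap lowers the peak.
        candidates = [s for s, srv in dp.assignment.items()
                      if srv == hot and 0 < dp.shard_load[s] < gap]
        if not candidates:
            return None
        # Greedy choice: the move that leaves the lowest peak on the two servers.
        best = min(candidates,
                   key=lambda s: max(load[hot] - dp.shard_load[s],
                                     load[cold] + dp.shard_load[s]))
        dp.assignment[best] = cold
        return best, hot, cold


# Hypothetical workload: shards 0 and 1 are hot and both live on server "a".
dp = DataPlane({0: "a", 1: "a", 2: "b", 3: "b"})
for shard, count in [(0, 50), (1, 40), (2, 5), (3, 5)]:
    for _ in range(count):
        dp.handle(shard)
move = ControlPlane().rebalance(dp)
```

The request path (`handle`) only reads the assignment table, while the decision about where shards live is made elsewhere and applied by updating that table. That separation is what allows the control logic to be redesigned for scale without touching the data path.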