Original article: http://highscalability.com/blog/2019/4/8/from-bare-metal-to-kubernetes.html
This is a guest post by Hugues Alary, Lead Engineer at Betabrand, a retail clothing company and crowdfunding platform, based in San Francisco. This article was originally published here.
How migrating Betabrand's bare-metal infrastructure to a Kubernetes cluster hosted on Google Container Engine solved many engineering issues—from hardware failures, to lack of scalability of our production services, to complex configuration management and highly heterogeneous development-staging-production environments—and allowed us to achieve a reliable, available and scalable infrastructure.
This post will walk you through the many infrastructure changes and challenges Betabrand met from 2011 to 2018.
Betabrand's infrastructure has changed many times over the course of the 7 years I've worked here.
In 2011, the year our CTO hired me, the website was hosted on a shared server with a Plesk interface and no root access (of course). Every newsletter send—to at most a few hundred people—would bring the website to its knees, making it crawl and at times leaving it completely unresponsive.
My first order of business became finding a replacement and moving the website to its own dedicated server.
After a few days of online research, we settled on a VPS—8GB RAM, 320GB disk, 4 virtual CPUs, 150Mbps of bandwidth—at Rackspace. A few more days and we were live on our new infrastructure composed of… 1 server; running your typical Linux, Apache, PHP, MySQL stack, with a hint of Memcached.
Unsurprisingly, this infrastructure quickly became obsolete.
Not only did it not scale at all but, more importantly, every part of it was a Single Point Of Failure. Apache down? Website down. Rackspace instance down? Website down. MySQL down… you get the idea.
Another aspect of it was its cost.
Our average monthly bill quickly climbed over $1,000, which was quite a price tag for a single machine and the—low—amount of traffic we generated at the time.
After a couple of years running this stack, mid-2013, I decided it was time to make our website more scalable and redundant, but also more cost-effective.
I estimated we needed a minimum of 3 servers to make our website somewhat redundant, which would amount to a whopping $14,400/year at Rackspace. Being a really small startup, we couldn't justify that "high" of an infrastructure bill; I kept looking.
The cheapest option ended up being to run our stack on bare-metal servers.
I had worked in the past with OVH and had always been fairly satisfied (despite mixed reviews online). I estimated that running 3 servers at OVH would amount to $3240/year, almost 5 times less expensive than Rackspace.
Not only was OVH cheaper, but their servers were also 4 times more powerful than Rackspace’s: 32GB RAM, 8 CPUs, SSDs and unlimited bandwidth.
To top it off they had just opened a new datacenter in North America.
A few weeks later Betabrand.com was hosted at OVH in Beauharnois, Canada.
Between 2013 and 2017, our hardware infrastructure went through a few architectural changes.
Towards the end of 2017, our stack was significantly larger than it used to be, both in terms of software and hardware.
Betabrand.com ran on 17 bare-metal servers:
2 HAProxy machines in charge of SSL offloading, configured as hot-standby
2 varnish-cache machines configured as hot-standby, load-balancing to our webservers
5 machines running Apache and PHP-FPM
2 Redis servers, each running 2 separate instances of redis: 1 instance for application caching, 1 instance for our PHP sessions
3 MariaDB servers configured as master-master, though used in a master-slave manner
3 Glusterd servers serving all our static assets
Each machine would otherwise run one or multiple processes like keepalived, Ganglia, Munin, logstash, exim, backup-manager, supervisord, sshd, fail2ban, prerender, rabbitmq and… docker.
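The hot-standby pairs above relied on keepalived and a floating virtual IP. A minimal sketch of the idea—interface names and addresses are made up for illustration, not taken from our actual configuration:

```
# /etc/keepalived/keepalived.conf on the active HAProxy machine
# (the standby uses state BACKUP and a lower priority)
vrrp_script chk_haproxy {
    script "pidof haproxy"   # demote this node if haproxy dies
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101             # standby gets e.g. 100
    advert_int 1
    virtual_ipaddress {
        203.0.113.10         # floating VIP; fails over to the standby
    }
    track_script {
        chk_haproxy
    }
}
```

When the active machine (or its haproxy process) goes down, the standby claims the VIP within a couple of seconds, which is what makes the pair survive a single-machine failure.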
However, while this infrastructure was very cheap, redundant and had no single point of failure, it still wasn’t scalable and was also much harder to maintain.
Administering our server "fleet" now involved writing a set of Ansible scripts and maintaining them, which, despite Ansible being an amazing piece of software, was no easy feat.
Even though it will make its best effort to get you there, Ansible doesn’t guarantee the state of your system.
For example, running your Ansible scripts on a server fleet made of heterogeneous OSes (say, Debian 8 and Debian 9) will bring all your machines to a state close to the one you defined, but you will most likely end up with discrepancies; the first being that you're running on both Debian 8 and Debian 9, but also that software versions and configurations will differ from one server to another.
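A hypothetical task makes the drift concrete: "state: present" only guarantees that some version of the package is installed, so each Debian release converges to whatever its repositories happen to carry.

```yaml
# roles/web/tasks/main.yml -- illustrative snippet, not from our playbooks
- name: Install Apache
  apt:
    name: apache2      # Debian 8 repos resolve this to Apache 2.4.10,
    state: present     # Debian 9 repos to 2.4.25: same declared state,
                       # two different realities across the fleet
```

Pinning a version (name: apache2=2.4.x) narrows the gap, but only if that exact version exists in every release's repositories, which it often doesn't.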
I searched quite often for an Ansible replacement, but never found better.
I looked into Puppet but found its learning curve too steep, and, from reading other people’s recipes, was taken aback by what seemed to be too many different ways of doing the same thing. Some people might think of this as flexibility, I see it as complexity.
SaltStack caught my eye but I also found it very hard to learn; despite its extensive, in-depth documentation, its nomenclature choices (mine, pillar, salt, etc.) never stuck with me; and it seemed to suffer from the same complexity issue as Puppet.
Nix package manager and NixOS sounded amazing, except that I didn't feel comfortable learning a whole new OS (I've been using Debian for years) and was worried that, despite their huge package selection, I would eventually need packages not already available, which would then become something new to maintain.
Those are the only 3 I looked at, but I'm sure there are many other tools out there I've probably never heard of.
Writing Ansible scripts and maintaining them, however, wasn’t our only issue; adding capacity was another one.
With bare-metal, it is impossible to add and remove capacity on the fly. You need to plan your needs well in advance: buy a machine (usually leased for a minimum of 1 month), wait for it to be ready (which can take from 2 minutes to 3 days), install its base OS, install Ansible's dependencies (mainly Python and a few other packages) then, finally, run your Ansible scripts against it.
For us this entire process was wholly impractical, and what usually happened is that we'd add capacity for an anticipated peak load but never remove it afterwards, which in turn added to our costs.
It is worth noting, however, that even though having unused capacity in your infrastructure is akin to setting cash on fire, it is still an order of magnitude less expensive on bare-metal than in the cloud. On the other hand, the engineering headaches that come with bare-metal servers simply shift the cost from purely material ones to administrative ones.
In our bare-metal setup, capacity planning, server administration and Ansible scripting were just the tip of the iceberg.
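The capacity-adding dance described above boils down to a few manual steps once the hardware is finally delivered; sketched as commands, with hostname, inventory and playbook names that are illustrative rather than our actual ones:

```shell
# from an admin machine, once the leased server is reachable:
# install Ansible's only hard dependency on the target
ssh root@new-server-42 'apt-get update && apt-get install -y python'

# add the host to the inventory, then converge it
echo "new-server-42" >> inventory/production
ansible-playbook -i inventory/production site.yml --limit new-server-42
```

Every step is manual, and the lead time is dominated by the wait for the machine itself, which is exactly why capacity had to be planned weeks ahead.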
In early 2017, while our infrastructure had grown, so had our team.
We hired 7 more engineers, making us a small 9-person team, with skillsets distributed all over the spectrum from backend to frontend, at varying levels of seniority.
Even in a small 9-person team, being productive and limiting the number of bugs deployed to production warrants a simple, easy-to-set-up and easy-to-use development-staging-production trifecta.
Setting up your development environment as a new hire shouldn't take hours, and neither should upgrading or re-creating it.
Moreover, a company-wide accessible staging environment should exist and match 99% of your production, if not 100%.
Unfortunately, with our hardware infrastructure, reaching this harmonious trifecta was impossible.
Development environment

First of all, everybody in our engineering team uses MacBook Pros, which is an issue since our stack is Linux-based.
However, asking everybody to switch to Linux and potentially change their precious workflow wasn't really ideal. This meant that the best solution was to provide a development environment agnostic of developers' personal preferences in machines.
I could only see two obvious options:
Either provide a Vagrant stack that would run multiple virtual machines (potentially 17, though, more realistically, 1 machine running our entire stack), or re-use the already-written Ansible scripts and run them against our local MacBooks.
After investigating Vagrant, I felt that using virtual machines would hinder performance too much and wasn't worth it. I decided, for better or worse, to go the Ansible route (in hindsight, this probably wasn't the best decision).
We would use the same set of Ansible scripts on production, staging and dev. The caveat being, of course, that our development stack, although close to production, was not a 100% match.
This worked well enough for a while; however, the mismatch caused issues later when, for example, our development and production MySQL versions weren't aligned: some queries that ran on dev wouldn't run on production.
Staging environment

Secondly, having development and production environments running on widely different software (macOS versus Debian) meant that we absolutely needed a staging environment.
Not only because of potential bugs caused by version mismatches, but also because we needed a way to share new features to external members before launch.
Once again I had multiple choices:
buy 17 servers and run Ansible against them. This would double our costs though, and we were trying to save money.
set up our entire stack on a single Linux server, accessible from the outside. A cheaper solution, but once again not providing an exact replica of our production system.
I decided to implement the cost-saving solution.
An early version of the staging environment involved 3 independent Linux servers, each running the entire stack. Developers would then yell across the room (or on HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down :/".
Overall, our development-staging-production setup was far from optimal: it did the job, but definitely needed improvements.
In 2013 Dotcloud released Docker.
The Betabrand use case for Docker was immediately obvious. I saw it as the solution to simplify our development and staging environments by getting rid of the Ansible scripts (well, almost; more on that later).
Those scripts would now only be used for production.
At the time, one main pain point for the team was competing for our three physical staging servers: dev1, dev2 and dev3; and for me maintaining those 3 servers was a major annoyance.
After observing docker for a few months, I decided to give it a go in April 2014.
After installing docker on one of the staging servers, I created a single docker image containing our entire stack (haproxy, varnish, redis, apache, etc.) then, over the next few months, wrote a tool (sailor) allowing us to create, destroy and manage an infinite number of staging environments, each accessible via its own unique URL.
It's worth noting that docker-compose didn't exist at that time, and that putting your entire stack inside one docker image is of course a big no-no, but that's an unimportant detail here.
From this point on, the team wasn’t competing anymore for access to the staging servers. Anybody could create a new, fully configured, staging container from the docker image using sailor. I didn’t need to maintain the servers anymore either; better yet, I shut down and cancelled 2 of them.
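In plain docker terms (sailor's own interface isn't shown in this article, so treat these commands as a hypothetical equivalent, with an invented image name), spinning up and tearing down a staging environment looked roughly like:

```shell
# run one all-in-one-stack container per environment; a wildcard DNS
# entry plus a front proxy maps each published port to a unique URL
docker run -d --name staging-alice -p 8001:80 betabrand/stack:latest
docker run -d --name staging-bob   -p 8002:80 betabrand/stack:latest

# done testing? throw the environment away
docker rm -f staging-alice
```

Because each environment is just a container started from the same image, "fully configured" staging went from a shared physical server to a disposable resource anyone could create.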
Our development environment, however, still was running on macos (well, "Mac OS X" at the time) and using the Ansible scripts.
Then, sometime around 2016 docker-machine was released.
Docker Machine is a tool taking care of deploying a docker daemon on any stack of your choice: VirtualBox, AWS, GCE, bare-metal, Azure... you name it, docker-machine does it, in one command line.
I saw it as the opportunity to easily and quickly migrate our Ansible-based development environment to a docker-based one. I modified sailor to use docker-machine as its backend.
Setting up a development environment was now a matter of creating a new docker-machine then passing a flag for sailor to use it.
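For example, with the virtualbox driver (any supported driver works the same way), pointing a laptop at a fresh Docker daemon is a couple of commands:

```shell
# create a VM running a docker daemon...
docker-machine create --driver virtualbox dev

# ...then point the local docker CLI at it
eval "$(docker-machine env dev)"
docker ps    # now talks to the daemon inside the 'dev' VM
```

From there, sailor only needed a flag telling it which machine to target.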
At this point, our development-staging process had been simplified tremendously, at least from a dev-ops perspective: anytime I needed to upgrade any software in our stack or change its configuration, instead of modifying my Ansible scripts, asking the whole team to run them, then running them myself on all 3 staging servers, I could now simply push a new docker image.
Ironically enough, I ended up needing virtual machines (which I had deliberately avoided) to run docker on our MacBooks. Using Vagrant instead of Ansible would have been a better choice from the get-go. Hindsight is always 20/20.
Using docker for our development and staging systems paved the way to the better solution that Betabrand.com now runs on.
Because Betabrand is primarily an e-commerce platform, Black Friday loomed over our website more and more each year.
To our surprise, the website had handled increasingly higher loads since 2013 without failing in any major catastrophe, but, it did require a month long preparation beforehand: adding capacity, load testing and optimizing our checkout code paths as much as we possibly could.
After preparing for Black Friday 2016, however, it became evident that the infrastructure wouldn't scale for Black Friday 2017; I worried the website would become inaccessible under the load.
Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.
Just as with Docker, I saw an obvious use case: I knew k8s was what we needed to solve many of our issues. First of all, it would finally allow us to run an almost identical dev-staging-production environment. It would also solve our scalability issues.
I also evaluated 2 other solutions, Nomad and Docker Swarm, but Kubernetes seemed to be the most promising.
For Black Friday 2017, I set out to migrate our entire infra to k8s.
Although I considered it, I quickly ruled out using our current OVH bare-metal servers as our k8s nodes, since that would play against my goal of getting rid of Ansible and not dealing with all the issues that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released their managed Kubernetes (GKE) offering, which I quickly came to choose.
Migrating to k8s first involved gaining a strong understanding of its architecture and its concepts, by reading the online documentation.
Most importantly, that meant understanding containers, Pods, Deployments and Services and how they all fit together; then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.
Other concepts are important, though less necessary to get a cluster going.
Once I assimilated those concepts, the second, and hardest, step involved translating our bare-metal architecture into a set of YAML manifests.
From the beginning I set out to have one, and only one, set of manifests to be used for the creation of all three development, staging and production environments. I quickly ran into needing to parameterize my YAML manifests, which isn't supported out of the box by Kubernetes. This is where Helm [1] comes in handy.
From the Helm website: Helm helps you manage Kubernetes applications—Helm Charts help you define, install, and upgrade even the most complex Kubernetes application.
Helm markets itself as a package manager for Kubernetes, I originally used it solely for its templating feature though. I have, now, also come to appreciate its package manager aspect and used it to install Grafana [2] and Prometheus [3].
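As an illustration of that templating feature, a fragment like the following (file, value and image names are invented for this example, not taken from our chart) lets one set of manifests serve all three environments:

```yaml
# values.yaml -- per-environment knobs with dev-friendly defaults
webReplicas: 1
database: mysql.default.svc.cluster.local

# templates/web-deployment.yaml (fragment)
spec:
  replicas: {{ .Values.webReplicas }}
  template:
    spec:
      containers:
        - name: web
          image: betabrand/web:latest
          env:
            - name: DB_HOST
              value: {{ .Values.database | quote }}
```

Installing with overrides, e.g. helm install betabrand --name production --set webReplicas=5,database=10.0.0.5, then renders production-sized manifests from the exact same templates used for dev.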
After a bit of sweat and a few tears, our infrastructure was now neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses, 22 container images.
All that was left was to migrate to this new infrastructure and shut down all our hardware servers.
October 5th 2017 was the night.
Pulling the trigger was extremely easy and went without a hitch.
I created a new GKE cluster, ran helm install betabrand --name production, imported our MySQL database to Google Cloud SQL, then, after what actually took about 2 hours, we were live in the Clouds.
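Condensed into commands, launch night looked roughly like this (cluster, instance and bucket names are illustrative, and the exact gcloud syntax has evolved since 2017):

```shell
# create the production cluster and deploy the whole stack as one release
gcloud container clusters create production --num-nodes 5
helm install betabrand --name production

# import the MySQL dump into Cloud SQL from a storage bucket
gcloud sql import sql production-db gs://betabrand-dumps/prod.sql \
    --database=betabrand
```

After pointing DNS at the new cluster, the cutover was done.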
The migration was that simple.
What helped a lot, of course, was the ability to create multiple clusters in Google GKE: before migrating our production, I was able to rehearse through many test migrations, jotting down every step needed for a successful launch.
Black Friday 2017 was very successful for Betabrand, and the few technical issues we ran into weren't associated with the migration.
Our development machines run a Kubernetes cluster via Minikube [4].
The same YAML manifests are being used to create a local development environment or a "production-like" environment.
Everything that runs on Production, also runs in Development. The only difference between the two environments is that our development environment talks to a local MySQL database, whereas production talks to Google Cloud SQL.
Creating a staging environment is exactly the same as creating a new production cluster: all that is needed is to clone the production database instance (which is only a few clicks or one command line) then point the staging cluster to this database via a --set database parameter in helm.
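Sketched as commands (instance and release names are illustrative):

```shell
# clone the production Cloud SQL instance...
gcloud sql instances clone production-db staging-db

# ...then point a fresh release at the clone
helm install betabrand --name staging --set database=staging-db-address
```

Because the chart is identical, the staging cluster is a faithful replica of production down to the data, minus only the database endpoint.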
It’s now been a year and 2 months since we moved our infrastructure to Kubernetes and I couldn’t be happier.
Kubernetes has been rock solid in production and we have yet to experience an outage.
In anticipation of a lot of traffic for Black Friday 2018, we were able to create an exact replica of our production services in a few minutes and do a lot of load testing. Those load tests revealed specific code paths that performed extremely poorly, which only heavy traffic could expose, and allowed us to fix them before Black Friday.
As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but k8s met its promises, and features like the HorizontalPodAutoscaler coupled with GKE's node autoscaling allowed our website to absorb peak loads without any issues.
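For reference, a HorizontalPodAutoscaler is only a few lines of YAML; the target name and thresholds below are illustrative, not our production numbers:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web           # the Deployment to scale
  minReplicas: 5
  maxReplicas: 50
  targetCPUUtilizationPercentage: 70   # add Pods above 70% average CPU
```

As the HPA adds Pods, GKE's cluster autoscaler adds nodes to fit them, which is the combination that absorbed the Black Friday peaks.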
K8s, combined with GKE, gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.