Why we switched from Python to Go

https://getstream.io/blog/switched-python-go/

Chinese version: http://blog.csdn.net/dev_csdn/article/details/78386256

 

Switching to a new language is always a big step, especially when only one of your team members has prior experience with that language. Early this year, we switched Stream’s primary programming language from Python to Go. This post will explain some of the reasons why we decided to leave Python behind and make the switch to Go.

Reasons to Use Go

Reason 1 – Performance

Go is fast!

Go is extremely fast. The performance is similar to that of Java or C++. For our use case, Go is typically 30 times faster than Python. Here’s a small benchmark game comparing Go vs Java.

Reason 2 – Language Performance Matters

For many applications, the programming language is simply the glue between the app and the database. The performance of the language itself usually doesn’t matter much.

Stream, however, is an API provider powering the feed infrastructure for 500 companies and more than 200 million end users. We’ve been optimizing Cassandra, PostgreSQL, Redis, etc. for years, but eventually, you reach the limits of the language you’re using.

Python is a great language but its performance is pretty sluggish for use cases such as serialization/deserialization, ranking and aggregation. We frequently ran into performance issues where Cassandra would take 1ms to retrieve the data and Python would spend the next 10ms turning it into objects.

Reason 3 – Developer Productivity & Not Getting Too Creative

Have a look at this little snippet of Go code from the How I Start Go tutorial. (This is a great tutorial and a good starting point to pick up a bit of Go.)

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type openWeatherMap struct{}

func (w openWeatherMap) temperature(city string) (float64, error) {
	resp, err := http.Get("http://api.openweathermap.org/data/2.5/weather?APPID=YOUR_API_KEY&q=" + city)
	if err != nil {
		return 0, err
	}

	defer resp.Body.Close()

	var d struct {
		Main struct {
			Kelvin float64 `json:"temp"`
		} `json:"main"`
	}

	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return 0, err
	}

	log.Printf("openWeatherMap: %s: %.2f", city, d.Main.Kelvin)
	return d.Main.Kelvin, nil
}

If you’re new to Go, there’s not much that will surprise you when reading that little code snippet. It showcases multiple assignments, data structures, pointers, formatting and a built-in HTTP library.

When I first started programming I always loved using Python’s more advanced features. Python allows you to get pretty creative with the code you’re writing. For instance, you can:

  • Use MetaClasses to self-register classes upon code initialization
  • Swap out True and False
  • Add functions to the list of built-in functions
  • Overload operators via magic methods

These features are fun to play around with but, as most programmers will agree, they often make the code harder to understand when reading someone else’s work.

Go forces you to stick to the basics. This makes it very easy to read anyone’s code and immediately understand what’s going on.

Note: How “easy” it is really depends on your use case, of course. If you want to create a basic CRUD API I’d still recommend Django + DRF, or Rails.

Reason 4 – Concurrency & Channels

As a language, Go tries to keep things simple. It doesn’t introduce many new concepts. The focus is on creating a simple language that is incredibly fast and easy to work with. The only area where it does get innovative is goroutines and channels. (To be 100% correct the concept of CSP started in 1977, so this innovation is more of a new approach to an old idea.) Goroutines are Go’s lightweight approach to threading, and channels are the preferred way to communicate between goroutines.

Goroutines are very cheap to create and only take a few KBs of additional memory. Because goroutines are so light, it is possible to have hundreds or even thousands of them running at the same time.

You can communicate between goroutines using channels. The Go runtime handles all the complexity. The goroutines and channel-based approach to concurrency makes it very easy to use all available CPU cores and handle concurrent IO – all without complicating development. Compared to Python/Java, running a function on a goroutine requires minimal boilerplate code. You simply prepend the function call with the keyword “go”:

package main

import (
	"fmt"
	"time"
)

func say(s string) {
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond)
		fmt.Println(s)
	}

}

func main() {
	go say("world")
	say("hello")
}

https://tour.golang.org/concurrency/1

Go’s approach to concurrency is very easy to work with. It’s an interesting approach compared to Node where the developer has to pay close attention to how asynchronous code is handled.
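The tour example above only starts a goroutine. To illustrate the channel half of the story, here is a minimal sketch (not code from Stream) in which each goroutine sends its result back over a channel. The names square and sumSquares are made up for this illustration.

```go
package main

import "fmt"

// square is a hypothetical unit of work standing in for something slower.
func square(n int) int { return n * n }

// sumSquares starts one goroutine per input and collects the results
// over an unbuffered channel.
func sumSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(n int) {
			results <- square(n) // blocks until the receiver is ready
		}(n)
	}

	total := 0
	for range nums {
		total += <-results // receive exactly one result per goroutine
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

The results arrive in whatever order the goroutines finish, which is fine here because addition does not care about order.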

Another great aspect of concurrency in Go is the race detector. This makes it easy to figure out if there are any race conditions within your asynchronous code.
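As a rough illustration of what the race detector looks for, the sketch below increments a shared counter from many goroutines. The function name countTo is hypothetical. With the mutex in place, running it via `go run -race` should report nothing. If you delete the Lock/Unlock pair, the unsynchronized counter++ becomes a data race, which is the kind of bug the detector is designed to flag.

```go
package main

import (
	"fmt"
	"sync"
)

// countTo increments a shared counter from n goroutines and returns
// the final value once all of them have finished.
func countTo(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // without Lock/Unlock, counter++ is a data race
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countTo(1000)) // prints 1000
}
```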

 

Here are a few good resources to get started with Go and channels:

Reason 5 – Fast Compile Time

Our largest micro service written in Go currently takes 6 seconds to compile. Go’s fast compile times are a major productivity win compared to languages like Java and C++ which are famous for sluggish compilation speed. I like sword fighting, but it’s even nicer to get things done while I still remember what the code is supposed to do:

XKCD – Code compiling before Go

Reason 6 – The Ability to Build a Team

First of all, let’s start with the obvious: there are fewer Go developers than there are developers for older languages like C++ and Java. According to StackOverflow, 38% of developers know Java, 19.3% know C++ and only 4.6% know Go. GitHub data shows a similar trend: Go is more widely used than languages such as Erlang, Scala and Elixir, but less popular than Java and C++.

Fortunately, Go is a very simple, easy-to-learn language. It provides the basic features you need and nothing else. The new concepts it introduces are the “defer” statement and built-in management of concurrency with goroutines and channels. (For the purists: Go isn’t the first language to implement these concepts, just the first to make them popular.) Any Python, Elixir, C++, Scala or Java dev that joins a team can be effective at Go within a month because of its simplicity.
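For readers who haven’t seen “defer” before, here is a small sketch of how it behaves. The function deferOrder is made up for this illustration. Deferred calls run when the surrounding function returns, in last-in, first-out order.

```go
package main

import "fmt"

// deferOrder records the order in which the body and two deferred
// calls execute.
func deferOrder() []string {
	var order []string
	func() {
		defer func() { order = append(order, "first deferred") }()
		defer func() { order = append(order, "second deferred") }()
		order = append(order, "body")
	}() // both deferred calls run here, when the inner function returns
	return order
}

func main() {
	for _, step := range deferOrder() {
		fmt.Println(step)
	}
	// prints:
	// body
	// second deferred
	// first deferred
}
```

This is the same mechanism the weather snippet earlier in the post relies on with `defer resp.Body.Close()`: the cleanup is written next to the acquisition, but it runs at the end.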

We’ve found it easier to build a team of Go developers compared to many other languages. If you’re hiring people in competitive ecosystems like Boulder and Amsterdam this is an important benefit.

Reason 7 – Strong Ecosystem

For a team of our size (~20 people) the ecosystem matters. You simply can’t create value for your customers if you have to reinvent every little piece of functionality. Go has great support for the tools we use. Solid libraries were already available for Redis, RabbitMQ, PostgreSQL, Template parsing, Task scheduling, Expression parsing and RocksDB.

Go’s ecosystem is a major win compared to other newer languages like Rust or Elixir. It’s of course not as good as languages like Java, Python or Node, but it’s solid and for many basic needs you’ll find high-quality packages already available.

Reason 8 – Gofmt, Enforced Code Formatting

Let’s start with the obvious question: what is Gofmt? And no, it’s not a swear word. Gofmt is an awesome command line utility that ships with the Go toolchain and formats your code. In terms of functionality it’s very similar to Python’s autopep8. While the show Silicon Valley portrays otherwise, most of us don’t really like to argue about tabs vs spaces. It’s important that formatting is consistent, but the actual formatting standard doesn’t really matter all that much. Gofmt avoids all of this discussion by having one official way to format your code.
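To get a feel for what Gofmt does, here is a small sketch that applies the same canonical style from inside a program, using the standard library’s go/format package. The helper formatSource and the messy input are made up for this illustration. In day-to-day work you would run the gofmt command or let your editor do it on save.

```go
package main

import (
	"fmt"
	"go/format"
)

// formatSource applies the canonical gofmt style to a Go source string.
func formatSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err // the input was not valid Go
	}
	return string(out), nil
}

func main() {
	// Valid Go, but with sloppy spacing and no indentation.
	messy := "package main\n\nfunc  add(a int,b int) int {\nreturn a+b\n}\n"

	clean, err := formatSource(messy)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(clean)
}
```

The output has tab indentation and a single space around operators and after commas, and there are no style options to argue about.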

Reason 9 – gRPC and Protocol Buffers

Go has first-class support for protocol buffers and gRPC. These two tools work very well together for building microservices which need to communicate via RPC. You only need to write a manifest where you define the RPC calls that can be made and what arguments they take. Both server and client code are then automatically generated from this manifest. The resulting code is fast, has a very small network footprint and is easy to use.

From the same manifest, you can even generate client code for many different languages, such as C++, Java, Python and Ruby. So, no more ambiguous REST endpoints for internal traffic, for which you have to write almost the same client and server code every time.

Disadvantages of Using Golang

Disadvantage 1 – Lack of Frameworks

Go doesn’t have a single dominant framework like Rails for Ruby, Django for Python or Laravel for PHP. This is a topic of heated debate within the Go community, as many people advocate that you shouldn’t use a framework to begin with. I totally agree that this is true for some use cases. However, if someone wants to build a simple CRUD API they will have a much easier time with Django/DRF, Rails, Laravel or Phoenix.

Disadvantage 2 – Error Handling

Go handles errors by simply returning an error from a function and expecting your calling code to handle the error (or to return it up the calling stack). While this approach works, it’s easy to lose track of what went wrong, which makes it hard to provide a meaningful error to your users. The errors package solves this problem by allowing you to add context and a stack trace to your errors.

Another issue is that it’s easy to forget to handle an error by accident. Static analysis tools like errcheck and megacheck are handy to avoid making these mistakes.

While these workarounds work well, they don’t feel quite right. You’d expect proper error handling to be supported by the language.
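To make the “add context” idea concrete, here is a minimal sketch. Note the swap: instead of the third-party errors package mentioned above, it uses only the standard library’s fmt.Errorf with the %w verb (available since Go 1.13), which adds context but no stack trace. The names loadFeed, renderFeed, isNotFound and errNotFound are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for this sketch.
var errNotFound = errors.New("feed not found")

// loadFeed stands in for a storage call that fails.
func loadFeed(id string) error {
	return errNotFound
}

// renderFeed returns the error up the stack, but wraps it with
// context first so the caller can see where it came from.
func renderFeed(id string) error {
	if err := loadFeed(id); err != nil {
		return fmt.Errorf("render feed %q: %w", id, err)
	}
	return nil
}

// isNotFound reports whether err wraps errNotFound at any depth.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := renderFeed("user:42")
	fmt.Println(err)             // prints: render feed "user:42": feed not found
	fmt.Println(isNotFound(err)) // prints: true
}
```

Wrapping keeps the original error inspectable, so a handler can still map it to a sensible response for the user.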

Disadvantage 3 – Package Management

Go’s package management is by no means perfect. By default, it doesn’t have a way to specify a specific version of a dependency and there’s no way to create reproducible builds. Python, Node and Ruby all have better systems for package management. However, with the right tools, Go’s package management works quite well.

You can use Dep to manage your dependencies to allow specifying and pinning versions. Apart from that, we’ve contributed an open-source tool called VirtualGo which makes it easier to work on multiple projects written in Go.

Virtual Go

 

Python vs Go

One interesting experiment we conducted was taking our ranked feed functionality in Python and rewriting it in Go. Have a look at this example of a ranking method:

{
	"functions": {
		"simple_gauss": {
			"base": "decay_gauss",
			"scale": "5d",
			"offset": "1d",
			"decay": "0.3"
		},
		"popularity_gauss": {
			"base": "decay_gauss",
			"scale": "100",
			"offset": "5",
			"decay": "0.5"
		}
	},
	"defaults": {
		"popularity": 1
	},
	"score": "simple_gauss(time)*popularity"
}

Both the Python and Go code need to do the following to support this ranking method:

  1. Parse the expression for the score. In this case, we want to turn this string “simple_gauss(time)*popularity” into a function that takes an activity as input and returns a score as output.
  2. Create partial functions based on the JSON config. For example, we want “simple_gauss” to call “decay_gauss” with a scale of 5 days, offset of 1 day and a decay factor of 0.3.
  3. Parse the “defaults” configuration so you have a fallback if a certain field is not defined on an activity.
  4. Use the function from step 1 to score all activities in the feed.
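To give a rough idea of steps 2 and 4, here is a sketch in Go. It is not Stream’s actual code. The post doesn’t give the decay formula, so the sketch assumes a common Gaussian decay in which the value equals the decay factor once the distance past the offset reaches the scale. The score expression is hard-coded instead of parsed, and the names decayGauss, activity and score are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// decayGauss returns a partially applied Gaussian decay function.
// Assumed formula: exp(-d^2 / (2*sigma^2)), where d is the distance
// beyond the offset and sigma^2 = -scale^2 / (2*ln(decay)).
func decayGauss(scale, offset, decay float64) func(distance float64) float64 {
	sigmaSq := -(scale * scale) / (2 * math.Log(decay))
	return func(distance float64) float64 {
		d := math.Max(0, math.Abs(distance)-offset)
		return math.Exp(-(d * d) / (2 * sigmaSq))
	}
}

// activity is a hypothetical, heavily simplified feed activity.
type activity struct {
	ageDays    float64
	popularity float64
}

// score hard-codes the expression "simple_gauss(time)*popularity".
func score(a activity, simpleGauss func(float64) float64) float64 {
	return simpleGauss(a.ageDays) * a.popularity
}

func main() {
	// "simple_gauss": scale 5d, offset 1d, decay 0.3 (from the JSON config).
	simpleGauss := decayGauss(5, 1, 0.3)

	fresh := activity{ageDays: 0.5, popularity: 2} // inside the offset: no decay
	older := activity{ageDays: 6, popularity: 2}   // offset + scale: decayed to ~0.3

	fmt.Println(score(fresh, simpleGauss))
	fmt.Println(score(older, simpleGauss))
}
```

Returning a closure from decayGauss is one straightforward way to express a partial function in Go: the config values are bound once, and the scoring loop only passes in the activity’s age.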

Developing the Python version of the ranking code took roughly 3 days. That includes writing the code, unit tests and documentation. Next, we spent approximately 2 weeks optimizing the code. One of the optimizations was translating the score expression (simple_gauss(time)*popularity) into an abstract syntax tree. We also implemented caching logic which pre-computed the score for certain times in the future.

In contrast, developing the Go version of this code took roughly 4 days. The performance didn’t require any further optimization. So while the initial bit of development was faster in Python, the Go based version ultimately required substantially less work from our team. As an added benefit, the Go code performed roughly 40 times faster than our highly-optimized Python code.

Now, this is just a single example of the performance gains we’ve experienced by switching to Go. It is, of course, comparing apples to oranges:

  • The ranking code was my first project in Go
  • The Go code was built after the Python code, so the use case was better understood
  • The Go library for expression parsing was of exceptional quality

Your mileage will vary. Some other components of our system took substantially more time to build in Go compared to Python. As a general trend, we see that developing Go code takes slightly more effort. However, we spend much less time optimizing the code for performance.

Elixir vs Go – The Runner Up

Another language we evaluated is Elixir. Elixir is built on top of the Erlang virtual machine. It’s a fascinating language and we considered it since one of our team members has a ton of experience with Erlang.

For our use cases, we noticed that Go’s raw performance is much better. Both Go and Elixir will do a great job serving thousands of concurrent requests. However, if you look at individual request performance, Go is substantially faster for our use case. Another reason why we chose Go over Elixir was the ecosystem. For the components we required, Go had more mature libraries whereas, in many cases, the Elixir libraries weren’t ready for production usage. It’s also harder to train/find developers to work with Elixir.

These reasons tipped the balance in favor of Go. The Phoenix framework for Elixir looks awesome though and is definitely worth a look.

Conclusion

Go is a very performant language with great support for concurrency. It is almost as fast as languages like C++ and Java. While it does take a bit more time to build things using Go compared to Python or Ruby, you’ll save a ton of time spent on optimizing the code.

We have a small development team at Stream powering the feeds for over 200 million end users. Go’s combination of a great ecosystem, easy onboarding for new developers, fast performance, solid support for concurrency and a productive programming environment makes it a great choice.

Stream still leverages Python for our dashboard, site and machine learning for personalized feeds. We won’t be saying goodbye to Python anytime soon, but going forward all performance-intensive code will be written in Go.

If you want to learn more about Go check out the blog posts listed below. To learn more about Stream, this interactive tutorial is a great starting point.

More Reading about Switching to Golang

Learning Go

Tags: go, golang, grpc, micro services, performance, python, scalability

  • ksandvik

    dep will be part of Go 1.10. https://github.com/golang/dep/wiki/Roadmap

    • Thierry Schellenbach

      that’s awesome

  • shuoli84

    Your code changed a lot and heavy business logic? Python
    Performance is critical and without much c dependency? Go

    • Hasen

      If your code changes a lot, python is a poor choice. Dynamic typing makes refactoring difficult.

      • In that cases we need rely heavily in unit tests and unfortunately
        create good testes covering all cases is not so easy. A good type
        system, like ML family languages, however make this more easy to
        accomplish.

        I like Python, but Dynamic Typing it sucks
        sometimes. But I like to enforce the usage of 3.5+ with [typing
        hint](https://www.python.org/dev/peps/pep-0484/). That way you can run
        OPTIONALLY your code statically typed through
        [mypy](http://mypy-lang.org/) and see whats happens.

  • great post and glad Go is working well in your team, but this line:

    > there’s no way to create reproducible builds

    we have been using https://github.com/FiloSottile/gvt and the vendor folder for over 2 years, our builds are 100% reproducible, checking in dependencies hasn’t been an issue for us.

    • Jonathan

      Folks starting anew should use https://github.com/golang/dep, which is almost the official tool (slated to be official in Go 1.10). 100% reproducible for sure, check in your vendor dependencies too.

  • LàTrinius Washington

    Go is an awesome language, but here are my gripes/wishes:

    1. I wish the garbage collection was optional or could be turned off for programs that allocate all data statically or manage dynamic memory themselves via RAII or other means. Such a feature would eliminate languages like Rust, C, C++ from even being considered on a project.

    2. I wish the Go library ecosystem was as large (or even a tenth as large) and as rich as Perl’s CPAN. Speaking of which, I wish Go had an independent third-party system such as CPAN so that we’re not forced to rely only on Google’s code libraries or random libraries written by individuals and scrounged from around the web.

    If the above two wishes were addressed, Go would become the perfect language.

  • Diego Jancic

    Hey. Great article! Quick question though, how do you do the connection between Python and Go? gRPC and protobuf? Thanks

  • It’s worth noting that Buffalo is pretty far along in providing a Rails/Phoenix/Django experience for building out MVC and APIs in Go:

    https://gobuffalo.io/

    It is not 1.0 yet, which will happen when the lead developer feels that he can provide API stability guarantees, but it is pretty feature-complete. Well worth a look if you are interested in Go, but don’t find “just use the standard library” a satisfactory answer.

  • Paddy3118

    It would be great to get an update post in a year’s time.
