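Cortex: a multi-tenant, horizontally scalable Prometheus service.
I originally looked into Cortex after reading about the monitoring module of the commercial product Weave Cloud. Weave, also known as Weaveworks (official site: https://cloud.weave.works), is a PaaS platform focused on containers and microservices.
Weave Cloud makes heavy use of Prometheus in its monitoring module and adds a number of components on top of it to provide multi-tenant management and a highly available monitoring cluster. The core monitoring component it uses is Cortex.
This article focuses on how Cortex works. For Weave Cloud's product positioning and features, see the follow-up article: [Commercial solutions: Weaveworks]()
Cortex is a CNCF sandbox project and is currently used by several production products: Weave Cloud, Grafana Cloud, and FreshTracks.io.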
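Why use Cortex instead of running Prometheus directly?
PS: from the Cortex talk at KubeCon.
To address these needs, Cortex provides the following main features:
Similar competing products:
PS: screenshot taken while trying out the monitoring module on Weave Cloud.
The YAML manifests required to deploy into a k8s cluster are listed at: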
[https://github.com/weaveworks...](https://github.com/weaveworks...)
The script used to deploy the agent is:
    #!/bin/sh
    set -e

    # Create a temporary file for the bootstrap binary
    TMPFILE="$(mktemp -qt weave_bootstrap.XXXXXXXXXX)" || exit 1

    finish(){
      # Capture the exit status first: every `[ ... ]` test overwrites $?
      status=$?
      # Send only when this script errors out
      # Filter out the bootstrap errors
      if [ "$status" -ne 111 ] && [ "$status" -ne 0 ]; then
        curl -s >/dev/null 2>/dev/null -H "Accept: application/json" -H "Authorization: Bearer $token" -X POST -d \
          '{"type": "onboarding_failed", "messages": {"browser": { "type": "onboarding_failed", "text": "Installation of Weave Cloud agents did not finish."}}}' \
          https://cloud.weave.works/api/notification/external/events || true
      fi
      # Arrange for the bootstrap binary to be deleted
      rm -f "$TMPFILE"
    }
    # Call finish function on exit
    trap finish EXIT

    # Parse command-line arguments
    for arg in "$@"; do
      case $arg in
        --token=*)
          token=$(echo "$arg" | cut -d '=' -f 2)
          ;;
      esac
    done

    if [ -z "$token" ]; then
      echo "error: please specify the instance token with --token=<TOKEN>"
      exit 1
    fi

    # Notify installation has started
    curl -s >/dev/null 2>/dev/null -H "Accept: application/json" -H "Authorization: Bearer $token" -X POST -d \
      '{"type": "onboarding_started", "messages": {"browser": { "type": "onboarding_started", "text": "Installation of Weave Cloud agents has started"}}}' \
      https://cloud.weave.works/api/notification/external/events || true

    # Get distribution
    unamestr=$(uname)
    if [ "$unamestr" = 'Darwin' ]; then
      dist='darwin'
    elif [ "$unamestr" = 'Linux' ]; then
      dist='linux'
    else
      echo "This OS is not supported"
      exit 1
    fi

    # Download the bootstrap binary
    echo "Downloading the Weave Cloud installer... "
    curl -Ls "https://get.weave.works/bootstrap?dist=$dist" >> "$TMPFILE"

    # Make the bootstrap binary executable
    chmod +x "$TMPFILE"

    # Execute the bootstrap binary
    "$TMPFILE" "--scheme=https" "--wc.launcher=get.weave.works" "--wc.hostname=cloud.weave.works" "--report-errors" "$@"
Diagram of the interaction between Cortex and Prometheus:
Architecture diagram:
The role of each component in Cortex:
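Cortex consists of multiple horizontally scalable microservices. Each microservice uses the most appropriate technique for horizontal scaling. Most are stateless, while some (namely the ingesters) are semi-stateful and rely on consistent hashing.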
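To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring. It is only an illustration, not Cortex's actual ring implementation: the `HashRing` class, the node names, the key format, and the choice of 64 tokens per node are all assumptions made up for this example.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class HashRing:
    """A toy consistent-hash ring with several virtual tokens per node (hypothetical)."""

    def __init__(self, nodes, tokens_per_node=64):
        # Each node owns many pseudo-random positions on the ring.
        self._ring = sorted(
            (_token(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node owning the first token at or after the key's hash."""
        idx = bisect.bisect_left(self._tokens, _token(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node names and series keys, for illustration only.
ring = HashRing(["node-1", "node-2", "node-3"])
keys = [f"tenant-a/series-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

# Remove one node and see which keys change owner.
smaller = HashRing(["node-1", "node-2"])
after = {k: smaller.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in `moved`: when a node leaves, only the keys it owned are reassigned, while keys on the remaining nodes stay where they were. That is what lets a semi-stateful service scale in and out without reshuffling all of its data.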
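Prometheus instances scrape samples from various targets and push them to Cortex through Prometheus' remote write API. The samples are serialized as Protocol Buffers messages and compressed with Snappy before being sent.
Cortex requires every HTTP request to carry a header that specifies the tenant ID for the request. Authentication and authorization of requests are handled by an external reverse proxy.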
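As a rough sketch of the tenant header requirement, the snippet below builds (but does not send) a PromQL instant query scoped to one tenant. `X-Scope-OrgID` is the header name Cortex uses for the tenant ID. The helper `build_query_request`, the host `cortex.example`, the `/prometheus` prefix, and the tenant name `team-a` are assumptions for illustration; the prefix in front of the Prometheus-style `/api/v1/query` path depends on the deployment.

```python
import urllib.parse
import urllib.request

TENANT_HEADER = "X-Scope-OrgID"  # header carrying the tenant ID


def build_query_request(base_url: str, tenant_id: str, promql: str) -> urllib.request.Request:
    """Build (without sending) an instant-query request scoped to one tenant.

    `base_url` is assumed to already include whatever path prefix the
    deployment exposes in front of the Prometheus-style HTTP API.
    """
    if not tenant_id:
        raise ValueError("a tenant ID is required on every request")
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    return urllib.request.Request(url, headers={TENANT_HEADER: tenant_id})


# Hypothetical endpoint and tenant, for illustration only.
req = build_query_request("http://cortex.example/prometheus", "team-a", "up")
```

In a real deployment the reverse proxy in front of Cortex would typically authenticate the caller and set this header itself, so clients cannot claim an arbitrary tenant.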
Incoming samples (writes from Prometheus) are handled by the distributor, while incoming reads (PromQL queries) are handled by the query frontend.
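Query caching:
The query frontend caches query results and reuses them in subsequent queries. If the cached results are incomplete, the query frontend works out the subqueries that are still needed and executes them in parallel on the downstream queriers.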
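The "incomplete cache" case can be sketched as follows: given the time range of a query and the extents already in the cache, compute the gaps and fetch only those in parallel. This is a simplified illustration under stated assumptions, not the query frontend's real code. The functions `missing_ranges` and `run_subqueries`, the half-open `(start, end)` tuples, and the `fetch` callback standing in for a downstream querier are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def missing_ranges(start, end, cached):
    """Return the sub-ranges of [start, end) not covered by cached extents.

    `cached` is a list of (start, end) tuples in any order, possibly overlapping.
    """
    gaps = []
    cursor = start
    for c_start, c_end in sorted(cached):
        if c_end <= cursor:
            continue  # extent ends before the part we still need
        if c_start >= end:
            break  # extent starts after the requested range
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def run_subqueries(gaps, fetch, max_workers=4):
    """Run one subquery per gap in parallel; results come back in gap order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda gap: fetch(*gap), gaps))


# Example: the cache holds two extents inside a query over [0, 100).
gaps = missing_ranges(0, 100, [(10, 30), (50, 60)])
# A stand-in "querier" that just reports how much time it was asked to cover.
results = run_subqueries(gaps, lambda s, e: e - s)
```

The cached extents and the freshly fetched pieces would then be merged into one response; that merge step is left out here to keep the sketch short.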
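Concurrent queries:
The query frontend jobs accept gRPC streaming requests from the queriers. For high availability, it is recommended to run multiple frontends, with fewer frontends than queriers. In most cases, two should be enough.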
This article is part of the container monitoring in practice series. For the full content, see: container-monitor-book