Original English article: https://labs.meanpug.com/cust...
Author: Bobby Steinbach
Translator: 马若飞
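This article stresses the importance of custom application metrics and uses code examples to show how to design metrics and integrate Prometheus into a Django project, as a reference for developers building applications with Django.
Although the topic has been discussed at length, the importance of custom application metrics cannot be overstated. Unlike the core service metrics you collect for a Django application (application and web server statistics, key database and cache operation metrics), custom metrics are business-specific data points whose bounds and thresholds only you know. That is what makes them interesting.
What makes a metric useful? Consider the following points: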
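Besides the obvious dependency (pip install Django), we need a few extra packages for our pet project (translator's note: the demo). Go ahead and run pip install django-prometheus. This gives us a Python Prometheus client along with some useful Django hooks, including middleware and an elegant DB wrapper. Next, we'll run the Django management commands to start the project, update our settings to use the Prometheus client, and add the Prometheus URLs to the URL configuration.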
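Starting a new project and application
For this article, and in keeping with our agency's brand, we'll build a dog-walking service. Note that it won't actually do much, but it's enough to serve as a teaching example. Run the following commands: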
django-admin startproject demo
cd demo
python manage.py startapp walker

# settings.py
INSTALLED_APPS = [
    ...
    'walker',
    ...
]
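Now let's add some basic models and views. For simplicity, I'll only include the parts we're going to instrument. If you want the complete example, you can get the source from the demo application.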
# walker/models.py
from django.db import models
from django_prometheus.models import ExportModelOperationsMixin


class Walker(ExportModelOperationsMixin('walker'), models.Model):
    name = models.CharField(max_length=127)
    email = models.CharField(max_length=127)

    def __str__(self):
        return f'{self.name} // {self.email} ({self.id})'


class Dog(ExportModelOperationsMixin('dog'), models.Model):
    SIZE_XS = 'xs'
    SIZE_SM = 'sm'
    SIZE_MD = 'md'
    SIZE_LG = 'lg'
    SIZE_XL = 'xl'
    DOG_SIZES = (
        (SIZE_XS, 'xsmall'),
        (SIZE_SM, 'small'),
        (SIZE_MD, 'medium'),
        (SIZE_LG, 'large'),
        (SIZE_XL, 'xlarge'),
    )

    size = models.CharField(max_length=31, choices=DOG_SIZES, default=SIZE_MD)
    name = models.CharField(max_length=127)
    age = models.IntegerField()

    def __str__(self):
        return f'{self.name} // {self.age}y ({self.size})'


class Walk(ExportModelOperationsMixin('walk'), models.Model):
    dog = models.ForeignKey(Dog, related_name='walks', on_delete=models.CASCADE)
    walker = models.ForeignKey(Walker, related_name='walks', on_delete=models.CASCADE)

    distance = models.IntegerField(default=0, help_text='walk distance (in meters)')

    start_time = models.DateTimeField(null=True, blank=True, default=None)
    end_time = models.DateTimeField(null=True, blank=True, default=None)

    @property
    def is_complete(self):
        return self.end_time is not None

    @classmethod
    def in_progress(cls):
        """ get the list of `Walk`s currently in progress """
        return cls.objects.filter(start_time__isnull=False, end_time__isnull=True)

    def __str__(self):
        return f'{self.walker.name} // {self.dog.name} @ {self.start_time} ({self.id})'
# walker/views.py
from django.shortcuts import render, redirect
from django.views import View
from django.core.exceptions import ObjectDoesNotExist
from django.http import HttpResponseNotFound, JsonResponse, HttpResponseBadRequest, Http404
from django.urls import reverse
from django.utils.timezone import now
from walker import models, forms


class WalkDetailsView(View):
    def get_walk(self, walk_id=None):
        try:
            return models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            raise Http404(f'no walk with ID {walk_id} in progress')


class CheckWalkStatusView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return JsonResponse({'complete': walk.is_complete})


class CompleteWalkView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return render(request, 'index.html', context={'form': forms.CompleteWalkForm(instance=walk)})

    def post(self, request, walk_id=None, **kwargs):
        try:
            walk = models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            return HttpResponseNotFound(content=f'no walk with ID {walk_id} found')

        if walk.is_complete:
            return HttpResponseBadRequest(content=f'walk {walk.id} is already complete')

        form = forms.CompleteWalkForm(data=request.POST, instance=walk)

        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')


class StartWalkView(View):
    def get(self, request):
        return render(request, 'index.html', context={'form': forms.StartWalkForm()})

    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
Updating application settings and adding the Prometheus URLs
Now that we have a Django project and its settings, we can add the configuration that django-prometheus requires. Add the following to settings.py:
# settings.py
import os  # needed for the os.getenv calls below

INSTALLED_APPS = [
    ...
    'django_prometheus',
    ...
]

MIDDLEWARE = [
    'django_prometheus.middleware.PrometheusBeforeMiddleware',
    ...
    'django_prometheus.middleware.PrometheusAfterMiddleware',
]

# we're assuming a Postgres DB here because, well, that's just the right choice :)
DATABASES = {
    'default': {
        'ENGINE': 'django_prometheus.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_PASSWORD'),
        'HOST': os.getenv('DB_HOST'),
        'PORT': os.getenv('DB_PORT', '5432'),
    },
}
Add the URL configuration to urls.py:
# urls.py
from django.urls import include, path

urlpatterns = [
    ...
    path('', include('django_prometheus.urls')),
]
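We now have a basic, configured application that is ready to be instrumented.
Because django-prometheus works out of the box, we can immediately track some basic model operations, such as inserts and deletes. You can see these at the /metrics endpoint:
Figure: default metrics provided by django-prometheus
Let's make things a bit more interesting.
Add a walker/metrics.py file that defines some basic metrics to track.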
# walker/metrics.py
from prometheus_client import Counter, Histogram

walks_started = Counter('walks_started', 'number of walks started')
walks_completed = Counter('walks_completed', 'number of walks completed')
invalid_walks = Counter('invalid_walks', 'number of walks attempted to be started, but invalid')

walk_distance = Histogram('walk_distance', 'distribution of distance walked',
                          buckets=[0, 50, 200, 400, 800, 1600, 3200])
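Simple, isn't it? The Prometheus documentation explains the purpose of each metric type well. In short, we use counters for metrics that only ever increase over time, and histograms to track metrics that carry a distribution of values. Now let's instrument the application code.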
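To make the counter versus histogram distinction concrete, here is a minimal pure-Python sketch of how a Prometheus-style histogram tallies observations into cumulative buckets. It is an illustration only, not the prometheus_client implementation: the class name `CumulativeHistogram` is hypothetical, and the bucket bounds simply mirror the ones used for walk_distance above.

```python
from bisect import bisect_left


class CumulativeHistogram:
    """Toy model of a Prometheus histogram (hypothetical, for illustration)."""

    def __init__(self, buckets):
        # Upper bounds are kept sorted; +Inf catches every observation.
        self.bounds = sorted(buckets) + [float("inf")]
        self.per_bucket = [0] * len(self.bounds)
        self.total = 0.0  # running sum of all observed values
        self.count = 0    # number of observations

    def observe(self, value):
        # bisect_left finds the first bound with value <= bound.
        self.per_bucket[bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        # Buckets are exposed cumulatively: each "le" bucket counts every
        # observation less than or equal to its upper bound.
        running, out = 0, {}
        for bound, n in zip(self.bounds, self.per_bucket):
            running += n
            out[bound] = running
        return out


walk_distance = CumulativeHistogram([0, 50, 200, 400, 800, 1600, 3200])
for meters in (30, 300, 300, 5000):
    walk_distance.observe(meters)

buckets = walk_distance.cumulative()
print(buckets[50], buckets[400], buckets[float("inf")])
```

A counter, by contrast, would only need the `count` field: it goes up by one per event and never comes back down.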
# walker/views.py
...
from walker import metrics
...


class CompleteWalkView(WalkDetailsView):
    ...
    def post(self, request, walk_id=None, **kwargs):
        ...
        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()

            # record the completed walk and the distance covered
            metrics.walks_completed.inc()
            metrics.walk_distance.observe(updated_walk.distance)

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')

...


class StartWalkView(View):
    ...
    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()

            # count every walk that starts successfully
            metrics.walks_started.inc()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        # count attempts that failed form validation
        metrics.invalid_walks.inc()

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
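Send a few sample requests and you'll see that the new metrics have been produced.
Figure: metrics showing walk distance and walks created
Figure: the metrics we defined can now be found in Prometheus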
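One way to check the /metrics endpoint without a browser is to parse its plain-text output. The sketch below makes several assumptions: the sample payload is hand-written for illustration (it is not captured output from the demo), the `_total` suffix reflects how, to the best of my knowledge, recent prometheus_client releases expose counters, and `parse_samples` is a hypothetical helper that handles only simple sample lines rather than the full exposition format.

```python
import re

# Hand-written sample in the Prometheus text exposition format (illustrative).
SAMPLE = """\
# HELP walks_started_total number of walks started
# TYPE walks_started_total counter
walks_started_total 3.0
# HELP walk_distance distribution of distance walked
# TYPE walk_distance histogram
walk_distance_bucket{le="50.0"} 1.0
walk_distance_bucket{le="+Inf"} 2.0
walk_distance_count 2.0
walk_distance_sum 430.0
"""

# Metric name, an optional {label="value",...} block, then the value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return {(name, frozenset of label pairs): value} for simple lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simple parser can't handle
        name, raw_labels, value = match.groups()
        labels = frozenset(LABEL.findall(raw_labels or ""))
        samples[(name, labels)] = float(value)
    return samples


samples = parse_samples(SAMPLE)
print(samples[("walks_started_total", frozenset())])
```

For anything beyond a quick check, a maintained parser is a safer choice than a hand-rolled regular expression.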
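At this point we've added custom metrics in code, instrumented the application to track them, and verified that they're updated and available at /metrics. Let's move on to deploying the instrumented application to a Kubernetes cluster.
I'll only list the configuration relevant to tracking and exporting metrics; the complete Helm chart deployment and service configuration can be found in the demo application. As a starting point, here are some deployment and configmap snippets related to exporting metrics: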
# helm/demo/templates/nginx-conf-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "demo.fullname" . }}-nginx-conf
  ...
data:
  demo.conf: |
    upstream app_server {
      server 127.0.0.1:8000 fail_timeout=0;
    }

    server {
      listen 80;
      client_max_body_size 4G;

      # set the correct host(s) for your site
      server_name{{ range .Values.ingress.hosts }} {{ . }}{{- end }};

      keepalive_timeout 5;

      root /code/static;

      location / {
        # checks for static file, if not found proxy to app
        try_files $uri @proxy_to_app;
      }

      location ^~ /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/secrets/.htpasswd;

        proxy_pass http://app_server;
      }

      location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        # we don't want nginx trying to do something clever with
        # redirects, we set the Host: header above already.
        proxy_redirect off;
        proxy_pass http://app_server;
      }
    }
# helm/demo/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
...
    spec:
      metadata:
        labels:
          app.kubernetes.io/name: {{ include "demo.name" . }}
          app.kubernetes.io/instance: {{ .Release.Name }}
          app: {{ include "demo.name" . }}
      volumes:
        ...
        - name: nginx-conf
          configMap:
            name: {{ include "demo.fullname" . }}-nginx-conf
        - name: prometheus-auth
          secret:
            secretName: prometheus-basic-auth
        ...
      containers:
        - name: {{ .Chart.Name }}-nginx
          image: "{{ .Values.nginx.image.repository }}:{{ .Values.nginx.image.tag }}"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            ...
            - name: nginx-conf
              mountPath: /etc/nginx/conf.d/
            - name: prometheus-auth
              mountPath: /etc/nginx/secrets/.htpasswd
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: ["gunicorn", "--worker-class", "gthread", "--threads", "3", "--bind", "0.0.0.0:8000", "demo.wsgi:application"]
          env:
{{ include "demo.env" . | nindent 12 }}
          ports:
            - name: gunicorn
              containerPort: 8000
              protocol: TCP
          ...
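Because nginx guards /metrics with HTTP basic auth (the auth_basic directives in the configmap), any client that scrapes the endpoint has to send an Authorization header. Here is a minimal sketch of building such a request with the Python standard library. The host name and the prometheus/prometheus credentials are placeholders assumed for illustration, `metrics_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def metrics_request(base_url, username, password):
    """Build (but don't send) a basic-auth request for the /metrics endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials, assumed for this example only.
req = metrics_request("http://demo.example.com", "prometheus", "prometheus")
print(req.full_url)
print(req.get_header("Authorization"))
# To actually fetch the metrics: urllib.request.urlopen(req, timeout=5).read()
```

Basic auth only encodes the credentials, it does not encrypt them, so in a real deployment the endpoint would normally sit behind TLS as well.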
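Nothing magical here, just some YAML. Two points are worth highlighting:
/metrics is placed behind authentication by setting the auth_basic directives on its location block. You would probably want to deploy gunicorn behind a reverse proxy anyway, but doing it this way has the added benefit of protecting the metrics.
Following the Helm documentation, deploying Prometheus is very simple and requires no extra work: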
helm upgrade --install prometheus stable/prometheus
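After a few minutes, you should be able to reach the Prometheus pod via port-forward (the default container port is 9090).
The Prometheus Helm chart has a large number of customization options, but we only need to set extraScrapeConfigs. Create a values.yaml file. You can also skip this part and use the demo application as a reference. The file contents are as follows: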
extraScrapeConfigs: |
  - job_name: demo
    scrape_interval: 5s
    metrics_path: /metrics
    basic_auth:
      username: prometheus
      password: prometheus
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: demo
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
      - source_labels: [__meta_kubernetes_service_name]
        target_label: job
      - target_label: endpoint
        replacement: http
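The two keep rules in relabel_configs decide which discovered endpoints get scraped at all. The sketch below imitates that filtering in plain Python under simplifying assumptions: it models only the keep action, the discovered targets are made up, Python's `re` stands in for the regex engine Prometheus uses, and `apply_keep_rules` is a hypothetical helper rather than anything Prometheus ships. It relies on the fact that Prometheus anchors relabel regexes, so the whole label value has to match.

```python
import re

# The two `keep` rules from the scrape config above.
KEEP_RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_app"], "regex": "demo"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"], "regex": "http"},
]


def apply_keep_rules(targets, rules):
    """Keep only targets whose joined source labels fully match every rule."""
    kept = []
    for labels in targets:
        for rule in rules:
            # Source label values are joined with ";" before matching.
            value = ";".join(labels.get(name, "") for name in rule["source_labels"])
            if re.fullmatch(rule["regex"], value) is None:
                break  # one failed keep rule drops the target
        else:
            kept.append(labels)
    return kept


# Made-up discovery results, for illustration only.
discovered = [
    {"__meta_kubernetes_service_label_app": "demo",
     "__meta_kubernetes_endpoint_port_name": "http",
     "__meta_kubernetes_pod_name": "demo-1"},
    {"__meta_kubernetes_service_label_app": "demo",
     "__meta_kubernetes_endpoint_port_name": "gunicorn",
     "__meta_kubernetes_pod_name": "demo-1"},
    {"__meta_kubernetes_service_label_app": "demo-admin",
     "__meta_kubernetes_endpoint_port_name": "http",
     "__meta_kubernetes_pod_name": "admin-1"},
]

kept = apply_keep_rules(discovered, KEEP_RULES)
for target in kept:
    print(target["__meta_kubernetes_pod_name"])
```

In this model only the nginx `http` port of a service labeled `app: demo` survives, which matches the intent of scraping through the authenticated proxy rather than hitting gunicorn directly.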
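Once the file is created, you can update the configuration of the Prometheus deployment with the following command.
helm upgrade --install prometheus stable/prometheus -f values.yaml
To verify that every step is configured correctly, open http://localhost:9090/targets in your browser (assuming you have already port-forwarded to the pod running Prometheus). If you see the demo application in the list of targets, everything is working.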
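The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at /api/v1/targets. The sketch below only parses a response body: the JSON is a trimmed, hand-written sample rather than real output, the `job` value shown is an assumption (with the relabeling above it follows the Kubernetes service name), and `healthy_jobs` is a hypothetical helper.

```python
import json

# Trimmed, hand-written sample of a /api/v1/targets response (illustrative).
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "demo", "pod": "demo-1"}, "health": "up"},
      {"labels": {"job": "other", "pod": "other-1"}, "health": "down"}
    ]
  }
}
"""


def healthy_jobs(body):
    """Return the set of job labels that have at least one healthy target."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")
    return {
        target["labels"].get("job")
        for target in payload["data"]["activeTargets"]
        if target.get("health") == "up"
    }


print("demo" in healthy_jobs(SAMPLE_RESPONSE))
# Against a live server (assuming the port-forward is active), the body could
# come from urllib.request.urlopen("http://localhost:9090/api/v1/targets").
```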
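Let me stress one thing: capturing custom application metrics and setting up the corresponding reporting and monitoring is one of the most important tasks in software engineering. Fortunately, integrating Prometheus metrics into a Django application is actually very simple, as this article has shown. If you want to start monitoring your own application, refer to the complete example application, or simply fork the repository. Have fun.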