A "Health Check" for Kubernetes Clusters with Polaris

1 Introduction to Polaris

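As Kubernetes adoption has spread, keeping clusters running stably has become a central concern for development and operations teams. When deploying applications to a cluster, something as simple as forgetting to configure resource requests or limits can break autoscaling, or even let workloads exhaust the available resources. Configuration problems like these often cause production outages, and Polaris is a way to prevent them. Polaris is an open-source Kubernetes cluster health-check tool developed by Fairwinds. It analyzes the deployment configuration in a cluster to find and prevent configuration problems that affect stability, reliability, scalability, and security.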

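2 Polaris Features

Polaris is a health-check tool that finds problems in a cluster by analyzing deployment configuration. Its goal is not only to find problems but also to offer ways to avoid them, so that the cluster stays healthy. Polaris consists of three components, each with its own function:

  • Dashboard - shows the current state of your Kubernetes workloads, and what can be improved, as charts.
  • Webhook - blocks applications that do not meet the standard from being installed in the cluster.
  • CLI - checks local YAML files and can be combined with CI/CD.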

2.1 Dashboard

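The Dashboard is the visualization tool that Polaris provides. It gives an overview of the state of your Kubernetes workloads and points out what can be improved. Results can also be viewed by category, namespace, and workload.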

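2.1.1 Cluster status overview

  • View the cluster health score
  • View the cluster check results
  • View the cluster version and the number of nodes, pods, and namespaces

Deploy the dashboard and forward its port to your machine:

kubectl apply -f https://github.com/fairwindsops/polaris/releases/latest/download/dashboard.yaml
kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80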

View check results by category

  • Health Checks
  • Images
  • Networking
  • Resources
  • Security

View check results by namespace

2.1.2 Running against local YAML files

polaris dashboard --port 8080 --audit-path=/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx

2.2 Webhook

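Polaris can run as an admission controller, acting as a validating webhook. It accepts the same configuration as the dashboard and runs the same validations. The webhook rejects any workload that triggers a validation error. This reflects Polaris's larger goal: not only to encourage better configuration through the visibility the dashboard gives, but to enforce it through the webhook. Polaris does not fix workloads; it only blocks them.

  • Uses the same configuration as the dashboard
  • Blocks every application whose deployment configuration fails validation from being installed in the cluster
  • Not only shows the defects that currently exist in the cluster, but also prevents new ones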

2.3 CLI

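Polaris can also be used on the command line to audit local files or a running cluster. This is especially helpful for running Polaris against infrastructure code in a CI/CD pipeline. Command-line flags can make the CI/CD run fail if the audit score falls below a threshold or if any error is reported.

  • Checks local files or a running cluster
  • Can be combined with CI/CD so that the pipeline fails as soon as the deployment configuration fails validation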
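
The score threshold idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report is taken to be a dict parsed from Polaris's JSON output with a top-level `Score` field (as in the sample report in section 5), and the function name `gate` and the threshold of 80 are hypothetical. Polaris's own command-line flags remain the real mechanism.

```python
def gate(report: dict, min_score: int) -> int:
    """Return a process exit code: 0 if the score meets the threshold, 1 otherwise."""
    score = report.get("Score", 0)
    if score < min_score:
        print(f"Polaris score {score} is below the required {min_score}")
        return 1
    print(f"Polaris score {score} meets the required {min_score}")
    return 0


# Hypothetical reports; in a pipeline the dict would come from json.load().
print(gate({"Score": 0}, 80))   # 1
print(gate({"Score": 92}, 80))  # 0
```

In a CI job the return value would typically be passed to `sys.exit` so that a low score fails the build.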

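3 Installation and Usage

Polaris can be installed in three ways: with kubectl, with Helm, or as a local binary. This article picks the simplest method for each of the three components.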

3.1 Installing the Dashboard

Helm

Add the Helm charts repository:

helm repo add reactiveops-stable https://charts.reactiveops.com/stable

Update the charts repository and install the Dashboard component:

helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris

To view the Dashboard locally, forward the port with the following command:

kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80

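3.2 Installing the Webhook

Once the Webhook component is installed in the cluster, applications that do not meet the standard are blocked from being deployed.

Helm

Add the Helm charts repository:

helm repo add reactiveops-stable https://charts.reactiveops.com/stable

Update the charts repository and install the Webhook component:

helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris \
  --set webhook.enable=true --set dashboard.enable=false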

3.3 Installing the CLI

To try Polaris locally, download the binary from the releases page, or install it with Homebrew:

brew tap reactiveops/tap
brew install reactiveops/tap/polaris
polaris --version

Check local configuration files with the CLI:

polaris --audit --audit-path ./deploy/

The scan results can be saved to a YAML file:

polaris --audit --output-format yaml > report.yaml

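4 Using Polaris

The sections above covered installing Polaris and its basic usage. To fit Polaris to a real project, however, the default configuration is not enough, so we also need to know how to write the configuration file that defines Polaris's check rules.

Before customizing the configuration, it helps to know the severity levels and the check types that Polaris supports. A check's severity is one of danger, warning, or ignore, and Polaris does not run checks set to ignore. The supported check types are Health Checks, Images, Networking, Resources, and Security, which are described in turn below.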

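4.1 Health Checks

Polaris can verify that readiness and liveness probes are configured for Pods. The table below also lists the image and reliability checks.

key                            default  description
readinessProbeMissing          warning  Fails when no readiness probe is configured for the Pod
livenessProbeMissing           warning  Fails when no liveness probe is configured for the Pod
tagNotSpecified                danger   Fails when the image has no tag or the tag is latest
pullPolicyNotAlways            warning  Fails when the image pull policy is not Always
priorityClassNotSet            ignore   Fails when priorityClassName is not set for the Pod
multipleReplicasForDeployment  ignore   Fails when a Deployment has only one replica
missingPodDisruptionBudget     ignore   Fails when no PodDisruptionBudget is attached to the Deployment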
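
To make the probe and image checks concrete, here is a simplified Python illustration of what four of them look for in a container spec. It is a sketch, not Polaris's implementation: the function `check_container` is hypothetical, and the container is a plain dict that mirrors the Kubernetes container fields.

```python
def check_container(container: dict) -> list:
    """Return the IDs of the checks this container spec would fail."""
    failed = []
    if "readinessProbe" not in container:
        failed.append("readinessProbeMissing")
    if "livenessProbe" not in container:
        failed.append("livenessProbeMissing")

    # Look for the tag after the last "/" so a registry port is not mistaken for one.
    last_part = container.get("image", "").rsplit("/", 1)[-1]
    _, sep, tag = last_part.partition(":")
    if not sep or tag == "latest":
        failed.append("tagNotSpecified")

    if container.get("imagePullPolicy") != "Always":
        failed.append("pullPolicyNotAlways")
    return failed


print(check_container({"name": "nginx", "image": "nginx"}))
```

A container with both probes, a pinned tag such as `nginx:1.21`, and `imagePullPolicy: Always` yields an empty list.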

4.2 Resources

Polaris can verify that memory and CPU requests and limits are configured.

key                    default  description
cpuRequestsMissing     warning  Fails when resources.requests.cpu is not configured
memoryRequestsMissing  warning  Fails when resources.requests.memory is not configured
cpuLimitsMissing       warning  Fails when resources.limits.cpu is not configured
memoryLimitsMissing    warning  Fails when resources.limits.memory is not configured
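
The four checks in the table map directly onto fields of a container's `resources` block. The Python sketch below illustrates that mapping; `missing_resources` is a hypothetical helper written for this article, not part of Polaris, and the container is again a plain dict.

```python
def missing_resources(container: dict) -> list:
    """Return the resource check IDs that a container spec would fail."""
    resources = container.get("resources") or {}
    failed = []
    for section, label in (("requests", "Requests"), ("limits", "Limits")):
        values = resources.get(section) or {}
        for resource in ("cpu", "memory"):
            if resource not in values:
                failed.append(f"{resource}{label}Missing")
    return failed


container = {
    "name": "nginx",
    "resources": {
        "requests": {"cpu": "100m", "memory": "128Mi"},
        "limits": {"memory": "256Mi"},
    },
}
print(missing_resources(container))  # ['cpuLimitsMissing']
```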

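For resources such as memory and CPU, you can also configure a range check. The check passes only when the configured value falls within the specified range.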

limits:
  type: object
  required:
  - memory
  - cpu
  properties:
    memory:
      type: string
      resourceMinimum: 100M
      resourceMaximum: 6G
    cpu:
      type: string
      resourceMinimum: 100m
      resourceMaximum: "2"
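
A range check comes down to parsing Kubernetes resource quantities and comparing them. The Python sketch below illustrates the idea under simplifying assumptions: `parse_quantity` and `in_range` are hypothetical helpers, and only plain numbers, the `m` suffix, the decimal suffixes (`k`, `M`, `G`, `T`) and the binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) are handled.

```python
from fractions import Fraction

# Multipliers for the quantity suffixes handled in this sketch.
SUFFIXES = {
    "m": Fraction(1, 1000),
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(text: str) -> Fraction:
    """Convert a quantity such as '100m' or '512Mi' to an exact number."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def in_range(value: str, minimum: str, maximum: str) -> bool:
    """True when minimum <= value <= maximum after parsing all three."""
    return parse_quantity(minimum) <= parse_quantity(value) <= parse_quantity(maximum)


print(in_range("512Mi", "100M", "6G"))  # True
print(in_range("4", "100m", "2"))       # False
```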

4.3 Security

key                         default  description
hostIPCSet                  danger   Fails when hostIPC attribute is configured.
hostPIDSet                  danger   Fails when hostPID attribute is configured.
notReadOnlyRootFilesystem   warning  Fails when securityContext.readOnlyRootFilesystem is not true.
privilegeEscalationAllowed  danger   Fails when securityContext.allowPrivilegeEscalation is true.
runAsRootAllowed            warning  Fails when securityContext.runAsNonRoot is not true.
runAsPrivileged             danger   Fails when securityContext.privileged is true.
insecureCapabilities        warning  Fails when securityContext.capabilities includes a capability that Polaris classes as insecure.
dangerousCapabilities       danger   Fails when securityContext.capabilities includes a capability that Polaris classes as dangerous.
hostNetworkSet              warning  Fails when hostNetwork attribute is configured.
hostPortSet                 warning  Fails when hostPort attribute is configured.
tlsSettingsMissing          warning  Fails when an Ingress lacks TLS settings.
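4.4 Custom Scan Rules

With the settings described above, you can already define a scan configuration that fits your project. If the checks that Polaris ships with are not enough, you can also define your own. For example, the following custom check inspects the image source and raises a warning when an image comes from quay.io: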

checks:
  imageRegistry: warning
customChecks:
  imageRegistry:
    successMessage: Image comes from allowed registries
    failureMessage: Image should not be from disallowed registry
    category: Images
    target: Container # target can be "Container" or "Pod"
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          not:
            pattern: ^quay.io 

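
The `not` / `pattern` pair in the schema fails any image whose name matches the regular expression. A rough Python equivalent is sketched below. `image_allowed` is a hypothetical helper, and the sketch assumes the usual JSON Schema rule that `pattern` may match anywhere in the string unless it is anchored. The unescaped `.` in `^quay.io` matches any character, so `^quay\.io` would be the stricter form.

```python
import re

DISALLOWED = re.compile(r"^quay.io")  # same pattern as in the custom check


def image_allowed(image: str) -> bool:
    """Pass when the image does NOT match the disallowed-registry pattern."""
    return DISALLOWED.search(image) is None


for image in ["quay.io/coreos/etcd:v3.5.0", "nginx:1.21", "docker.io/library/nginx:1.21"]:
    status = "ok" if image_allowed(image) else "warning"
    print(f"{status}: {image}")
```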

You can also set the severity of individual checks:

checks:
  cpuRequestsMissing: danger
  memoryRequestsMissing: danger
  cpuLimitsMissing: danger
  memoryLimitsMissing: danger
polaris audit -c check_config.yaml --.......

5 Check Results

{
  "PolarisOutputVersion": "1.0",
  "AuditTime": "2021-07-01T15:07:00+08:00",
  "SourceType": "Path",
  "SourceName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "DisplayName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
  "ClusterInfo": {
    "Version": "unknown",
    "Nodes": 0,
    "Pods": 0,
    "Namespaces": 0,
    "Controllers": 1
  },
  "Results": [
    {
      "Name": "nginx-config",
      "Namespace": "",
      "Kind": "ConfigMap",
      "Results": {},
      "PodResult": null,
      "CreatedTime": "0001-01-01T00:00:00Z"
    },
    {
      "Name": "nginx-deployment",
      "Namespace": "",
      "Kind": "Deployment",
      "Results": {},
      "PodResult": {
        "Name": "",
        "Results": {},
        "ContainerResults": [
          {
            "Name": "nginx",
            "Results": {
              "cpuLimitsMissing": {
                "ID": "cpuLimitsMissing",
                "Message": "CPU limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "cpuRequestsMissing": {
                "ID": "cpuRequestsMissing",
                "Message": "CPU requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryLimitsMissing": {
                "ID": "memoryLimitsMissing",
                "Message": "Memory limits should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              },
              "memoryRequestsMissing": {
                "ID": "memoryRequestsMissing",
                "Message": "Memory requests should be set",
                "Details": null,
                "Success": false,
                "Severity": "danger",
                "Category": "Efficiency"
              }
            }
          }
        ]
      },
      "CreatedTime": "0001-01-01T00:00:00Z"
    }
  ],
  "Score": 0
}
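
One way to read this structure is to flatten it into one row per failed check. The Python sketch below does that for a trimmed copy of the report above. `failed_checks` is a hypothetical helper, and it assumes every check entry carries the `ID`, `Success`, `Severity` and `Message` fields shown.

```python
import json


def failed_checks(report: dict) -> list:
    """Return (workload, container, check ID, severity, message) per failed check."""
    rows = []
    for workload in report["Results"]:
        pod = workload.get("PodResult") or {}  # PodResult is null for kinds such as ConfigMap
        for container in pod.get("ContainerResults") or []:
            for check in container["Results"].values():
                if not check["Success"]:
                    rows.append((workload["Name"], container["Name"],
                                 check["ID"], check["Severity"], check["Message"]))
    return rows


report = json.loads("""
{
  "Results": [
    {"Name": "nginx-config", "Kind": "ConfigMap", "Results": {}, "PodResult": null},
    {"Name": "nginx-deployment", "Kind": "Deployment", "Results": {},
     "PodResult": {"Name": "", "Results": {}, "ContainerResults": [
       {"Name": "nginx", "Results": {
         "cpuLimitsMissing": {"ID": "cpuLimitsMissing", "Message": "CPU limits should be set",
                              "Success": false, "Severity": "danger"},
         "memoryLimitsMissing": {"ID": "memoryLimitsMissing", "Message": "Memory limits should be set",
                                 "Success": false, "Severity": "danger"}
       }}
     ]}}
  ],
  "Score": 0
}
""")

for row in failed_checks(report):
    print(" | ".join(row))
```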

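6 Processing the Results with Python

A Polaris audit of a cluster returns its results as JSON, which is not easy to read. Here we use Python to process the results and write them to an Excel spreadsheet for easier viewing.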

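import os
import subprocess

import xlsxwriter
import yaml

# config
fileNamePath = os.path.split(os.path.realpath(__file__))[0]
config = os.path.join(fileNamePath, 'check_config.yaml')
cluster_config = os.path.join(fileNamePath, 'cluster_list.yaml')

# variable
scan_controller_type = ["Deployment", "DaemonSet", "StatefulSet"]


def read_cluster():
    # cluster_list.yaml must contain a top-level "clusters" list
    with open(cluster_config, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)


def generate_report(cluster_id: str):
    # Run the audit and write the report where format_data() reads it.
    # The same kubeconfig is used for every cluster; cluster_id only names the file.
    report_dir = os.path.join(fileNamePath, 'result')
    os.makedirs(report_dir, exist_ok=True)
    scan_command = [
        "polaris", "audit", "-c", config,
        "--kubeconfig", os.path.expanduser("~/.kube/config"),
        "--only-show-failed-tests", "true",
        "--output-file", os.path.join(report_dir, f"{cluster_id}.yaml"),
    ]
    try:
        subprocess.run(scan_command, check=True)
    except (OSError, subprocess.CalledProcessError) as e:
        print(e)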

def format_data(cluster):
    cluster_report = os.path.join(fileNamePath, 'result/{}.yaml'.format(cluster))
    with open(cluster_report, 'r', encoding='utf-8') as f:
        x = yaml.safe_load(f)
    data_list = []
    for item in x["Results"]:
        # Skip other kinds and workloads without a pod result
        if item["Kind"] not in scan_controller_type or not item["PodResult"]:
            continue
        for container_result in item["PodResult"]["ContainerResults"]:
            # The keys of "Results" are the IDs of the checks reported for this container
            check_ids = list(container_result["Results"])
            if not check_ids:
                continue
            obj = [cluster, item["Kind"], item["Namespace"], item["Name"],
                   container_result["Name"], str(check_ids)]
            data_list.append(obj)
    return data_list

def excel_config(workbook):
    column_name = ['ClusterID', 'Kind', 'NameSpace', 'Name', 'PodName', 'Scan Result']

    merge_format = workbook.add_format({
        'font_size': 22,
        'bold': True,
        'font_color': '#FFFFFF',
        'border': 1,
        'font_name':u'苹方-简',
        'align': 'center',
        'valign': 'vcenter',
        'fg_color': '#0174DF'
    })
    Title_format = workbook.add_format({
        'font_size': 18,
        'border': 1,
        'bold': True,
        'align': 'center',
        'font_name': u'苹方-简',
        'valign': 'vcenter',
    })
    data_format = workbook.add_format({
        'font_size': 16,
        'border': 1,
        'align': 'center',
        'font_name': u'苹方-简',
        'valign': 'vcenter',
    })
    return column_name, merge_format, Title_format, data_format


def generate_excel():
    workbook = xlsxwriter.Workbook("scan_result.xlsx")
    column_name, merge_format, Title_format, data_format = excel_config(workbook)
    for cluster in read_cluster()["clusters"]:
        print(f"Scan cluster start: {cluster}")
        generate_report(cluster)
        worksheet = workbook.add_worksheet(cluster)
        worksheet.merge_range('A1:F1', f'Cluster {cluster} Requests/Limits scan results', merge_format)
        worksheet.set_column('A:F', 35)
        worksheet.set_column('F:F', 130)
        worksheet.set_row(0, 50)
        scan_result = format_data(cluster)
        if scan_result:
            # Results found: header in row 2, data from row 3 onward
            worksheet.write_row('A2', column_name, Title_format)
            for row_num, item in enumerate(scan_result, start=3):
                worksheet.write_row('A' + str(row_num), item, data_format)
        else:
            # Nothing reported for this cluster: write a placeholder row
            worksheet.merge_range('A3:F3', 'NOT Found INFO', data_format)

    workbook.close()

if __name__ == '__main__':
    generate_excel()
