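As Kubernetes becomes widely adopted, keeping clusters running stably has become a key concern for development and operations teams. When deploying applications to a cluster, something as simple as forgetting to configure resource requests or limits can break autoscaling or even cause workloads to run out of resources. Configuration problems like these often lead to production outages, and Polaris is a tool for preventing them. Polaris is an open-source Kubernetes cluster health-check tool developed by Fairwinds. It analyzes the deployment configuration in a cluster to find and prevent configuration problems that affect stability, reliability, scalability, and security.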
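Polaris finds problems in a cluster by analyzing deployment configuration. Its goal is not only to detect problems but also to offer ways to avoid them, so that the cluster stays healthy. The main features are described below. Polaris consists of three components, each with a different function: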
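The Dashboard is the visualization tool provided by Polaris. It shows an overview of the state of the Kubernetes workloads along with suggested improvements. Results can also be viewed by category, namespace, and workload.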
# kubectl apply -f https://github.com/fairwindsops/polaris/releases/latest/download/dashboard.yaml
# kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80
View check results by category
View check results by namespace
polaris dashboard --port 8080 --audit-path=/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx
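Polaris can run as an admission controller, acting as a validating webhook. It accepts the same configuration as the dashboard and runs the same validations. The webhook rejects any workload that triggers a validation error. This reflects Polaris's larger goal: not just encouraging better configuration through the visibility the dashboard provides, but actually enforcing it through the webhook. Polaris does not fix workloads; it only blocks them.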
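To make the accept-or-reject behaviour concrete, here is a minimal Python sketch of how a validating webhook could turn check failures into a Kubernetes `AdmissionReview` response. It is not Polaris's implementation: the function name `review_response`, the message text, and the rule that only `danger`-level failures block a request are assumptions made for illustration.

```python
import json


def review_response(uid, failed_checks):
    """Build an AdmissionReview response from (check_id, severity) failures.

    Assumption for this sketch: only `danger`-level failures block the request.
    """
    blocking = [check_id for check_id, severity in failed_checks if severity == "danger"]
    response = {"uid": uid, "allowed": not blocking}
    if blocking:
        # Hypothetical message; a real webhook chooses its own wording.
        response["status"] = {"message": "rejected by checks: " + ", ".join(blocking)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    failures = [("runAsPrivileged", "danger"), ("cpuLimitsMissing", "warning")]
    print(json.dumps(review_response("example-uid", failures), indent=2))
```

The response only allows or denies the request. It carries no patch, which matches the point that the webhook blocks workloads rather than fixing them.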
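Polaris can also be used on the command line to audit local files or a running cluster. This is particularly helpful for running Polaris against infrastructure code in a CI/CD pipeline. Command-line flags can make the pipeline fail if the audit score falls below a threshold or if any error is found.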
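Polaris handles this gating through its own command-line flags. The Python sketch below only illustrates the same logic applied to a saved JSON report. The field names (`Score`, `Results`, `PodResult`, `ContainerResults`, `Success`, `Severity`) are taken from the sample output shown later in this article, while the function name `audit_passes` and the default threshold of 80 are assumptions.

```python
def audit_passes(report, min_score=80):
    """Return True when the score meets the threshold and no danger-level check failed."""
    if report.get("Score", 0) < min_score:
        return False
    for workload in report.get("Results", []):
        pod = workload.get("PodResult") or {}
        # Checks can be reported at workload, pod, and container level.
        groups = [workload.get("Results") or {}, pod.get("Results") or {}]
        groups += [c.get("Results") or {} for c in pod.get("ContainerResults") or []]
        for group in groups:
            for check in group.values():
                if not check.get("Success") and check.get("Severity") == "danger":
                    return False
    return True


if __name__ == "__main__":
    sample = {
        "Score": 0,
        "Results": [{
            "Kind": "Deployment",
            "Results": {},
            "PodResult": {
                "Results": {},
                "ContainerResults": [{
                    "Name": "nginx",
                    "Results": {"cpuLimitsMissing": {"Success": False, "Severity": "danger"}},
                }],
            },
        }],
    }
    print("pass" if audit_passes(sample) else "fail")
```

In a pipeline, a script like this would exit with a non-zero status when `audit_passes` returns `False`, which is what makes the CI/CD job fail.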
Polaris supports three installation methods: kubectl, Helm, and a local binary. This article uses the simplest method for each of the three components.
Helm
Add the Helm chart repository:
helm repo add reactiveops-stable https://charts.reactiveops.com/stable
Update the chart repository and install the Dashboard component:
helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris
To view the Dashboard locally, forward a local port with the following command:
kubectl port-forward --namespace polaris svc/polaris-dashboard 8080:80
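Once the Webhook component is installed in the cluster, it blocks applications that do not meet the configured standards from being deployed.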
Helm
Add the Helm chart repository:
helm repo add reactiveops-stable https://charts.reactiveops.com/stable
Update the chart repository and install the Webhook component:
helm upgrade --install polaris reactiveops-stable/polaris --namespace polaris \
--set webhook.enable=true --set dashboard.enable=false
To test Polaris locally, download the binary from the releases page, or install it with Homebrew:
brew tap reactiveops/tap
brew install reactiveops/tap/polaris
polaris --version
Use the CLI to check local configuration files:
polaris --audit --audit-path ./deploy/
The scan results can be saved to a YAML file:
polaris --audit --output-format yaml > report.yaml
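The sections above briefly covered installing Polaris and its basic usage. To fit Polaris to a real project, however, the default configuration is often not enough, so we also need to know how to write the configuration file that defines Polaris's checks. Before customizing the configuration, it helps to understand the severity levels and the types of checks Polaris supports. The severity levels are danger (named error in older Polaris releases), warning, and ignore; Polaris does not run checks that are set to ignore. The supported check categories are Health Checks, Images, Networking, Resources, and Security, which are described in turn below.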
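As a rough illustration of how severity levels shape a report, the Python sketch below groups failed checks by the severity configured for them and drops anything set to `ignore`. The function name `group_failures` and the choice to treat unlisted checks as `ignore` are assumptions for this example, not documented Polaris behaviour.

```python
def group_failures(checks_config, failed_check_ids):
    """Group failed check IDs by the severity configured for them."""
    grouped = {"danger": [], "warning": []}
    for check_id in failed_check_ids:
        # Assumption: a check that is not listed is treated like `ignore`.
        severity = checks_config.get(check_id, "ignore")
        if severity in grouped:
            grouped[severity].append(check_id)
    return grouped


if __name__ == "__main__":
    checks = {
        "tagNotSpecified": "danger",
        "livenessProbeMissing": "warning",
        "priorityClassNotSet": "ignore",
    }
    failed = ["tagNotSpecified", "priorityClassNotSet", "livenessProbeMissing"]
    print(group_failures(checks, failed))
```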
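Polaris can verify that readiness and liveness probes are configured for pods. It also checks image tags and pull policies, along with a few reliability settings:
key | default | description |
---|---|---|
readinessProbeMissing | warning | Fails when no readiness probe is configured for the pod |
livenessProbeMissing | warning | Fails when no liveness probe is configured for the pod |
tagNotSpecified | danger | Fails when no image tag is specified or the tag is latest |
pullPolicyNotAlways | warning | Fails when the image pull policy is not Always |
priorityClassNotSet | ignore | Fails when no priorityClassName is configured for the pod |
multipleReplicasForDeployment | ignore | Fails when a Deployment has only one replica |
missingPodDisruptionBudget | ignore | |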
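To show what the probe and image checks look at, here is a small Python sketch that inspects a container spec (as a plain dict) and returns the IDs of the checks it would fail. The check IDs come from the table above. The function name `container_findings` and the simplified tag parsing are assumptions, and Polaris's own logic may differ in detail.

```python
def container_findings(container):
    """Return the IDs of the probe and image checks this container spec would fail."""
    findings = []
    if "readinessProbe" not in container:
        findings.append("readinessProbeMissing")
    if "livenessProbe" not in container:
        findings.append("livenessProbeMissing")

    # Look for the tag only in the last path segment, so that a registry
    # port such as "registry.example.com:5000/nginx" is not mistaken for a tag.
    last_segment = container.get("image", "").rsplit("/", 1)[-1]
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
    if tag in ("", "latest"):
        findings.append("tagNotSpecified")

    if container.get("imagePullPolicy") != "Always":
        findings.append("pullPolicyNotAlways")
    return findings


if __name__ == "__main__":
    bare = {"name": "nginx", "image": "nginx"}
    hardened = {
        "name": "nginx",
        "image": "registry.example.com:5000/nginx:1.21",
        "imagePullPolicy": "Always",
        "readinessProbe": {"httpGet": {"path": "/", "port": 80}},
        "livenessProbe": {"httpGet": {"path": "/", "port": 80}},
    }
    print(container_findings(bare))
    print(container_findings(hardened))
```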
Polaris can verify that memory and CPU requests and limits are configured:
key | default | description |
---|---|---|
cpuRequestsMissing | warning | Fails when resources.requests.cpu is not configured |
memoryRequestsMissing | warning | Fails when resources.requests.memory is not configured |
cpuLimitsMissing | warning | Fails when resources.limits.cpu is not configured |
memoryLimitsMissing | warning | Fails when resources.limits.memory is not configured |
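For resources such as memory and CPU, range checks can also be configured. A workload passes only when the configured value falls within the specified range: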
limits:
  type: object
  required:
  - memory
  - cpu
  properties:
    memory:
      type: string
      resourceMinimum: 100M
      resourceMaximum: 6G
    cpu:
      type: string
      resourceMinimum: 100m
      resourceMaximum: "2"
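A range check has to compare Kubernetes quantity strings such as `100M`, `6G`, `100m`, and `"2"` numerically. The Python sketch below shows one way to do that. It is a simplified parser that covers only the common suffixes, and `parse_quantity` and `in_range` are hypothetical helper names rather than Polaris functions.

```python
from decimal import Decimal

# Simplified suffix table: milli, decimal SI, and binary suffixes only.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
}


def parse_quantity(text):
    """Convert a quantity string such as '100m' or '6G' to a Decimal."""
    text = str(text)
    # Try two-letter suffixes first so that 'Mi' is not read as 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def in_range(value, minimum, maximum):
    """Return True when minimum <= value <= maximum."""
    return parse_quantity(minimum) <= parse_quantity(value) <= parse_quantity(maximum)


if __name__ == "__main__":
    print(in_range("512Mi", "100M", "6G"))  # memory limit inside the range
    print(in_range("8Gi", "100M", "6G"))    # memory limit above the maximum
    print(in_range("50m", "100m", "2"))     # CPU limit below the minimum
```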
Security and networking checks:
key | default | description |
---|---|---|
hostIPCSet | danger | Fails when hostIPC attribute is configured. |
hostPIDSet | danger | Fails when hostPID attribute is configured. |
notReadOnlyRootFilesystem | warning | Fails when securityContext.readOnlyRootFilesystem is not true. |
privilegeEscalationAllowed | danger | Fails when securityContext.allowPrivilegeEscalation is true. |
runAsRootAllowed | warning | Fails when securityContext.runAsNonRoot is not true. |
runAsPrivileged | danger | Fails when securityContext.privileged is true. |
insecureCapabilities | warning | Fails when securityContext.capabilities includes one of the capabilities listed in the Polaris documentation. |
dangerousCapabilities | danger | Fails when securityContext.capabilities includes one of the capabilities listed in the Polaris documentation. |
hostNetworkSet | warning | Fails when hostNetwork attribute is configured. |
hostPortSet | warning | Fails when hostPort attribute is configured. |
tlsSettingsMissing | warning | Fails when an Ingress lacks TLS settings. |
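With the checks described above, you can define a scan configuration that suits your project. If the built-in checks are not enough, you can also define custom checks. For example, a custom check can inspect where images come from and raise a warning when an image comes from quay.io: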
checks:
  imageRegistry: warning
customChecks:
  imageRegistry:
    successMessage: Image comes from allowed registries
    failureMessage: Image should not be from disallowed registry
    category: Images
    target: Container # target can be "Container" or "Pod"
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          not:
            pattern: ^quay.io
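The schema's `not` / `pattern` rule means a container passes only when its `image` string does not match `^quay.io`. The Python sketch below emulates that single rule with the standard `re` module instead of a full JSON Schema validator. `image_registry_check` is a hypothetical name, and treating a missing `image` field as passing is an assumption.

```python
import re

# JSON Schema patterns are unanchored searches, so re.search is the closest
# match; the leading ^ in the pattern anchors it to the start of the string.
DISALLOWED_REGISTRY = re.compile(r"^quay.io")


def image_registry_check(container):
    """Return True (pass) when the image does not come from the disallowed registry."""
    image = container.get("image", "")
    return DISALLOWED_REGISTRY.search(image) is None


if __name__ == "__main__":
    for image in ("quay.io/example/app:1.0", "docker.io/library/nginx:1.21"):
        verdict = "pass" if image_registry_check({"image": image}) else "warning"
        print(f"{image}: {verdict}")
```

In a regular expression the unescaped `.` matches any character, so the pattern as written would also match a host such as `quayXio`. Writing `^quay\.io` would restrict the match to the literal registry name.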
You can also specify which checks to run and the severity of each:
checks:
  cpuRequestsMissing: danger
  memoryRequestsMissing: danger
  cpuLimitsMissing: danger
  memoryLimitsMissing: danger
Then run the audit with this configuration file:
polaris audit -c check_config.yaml --.......
The audit produces output like the following:
{
"PolarisOutputVersion": "1.0",
"AuditTime": "2021-07-01T15:07:00+08:00",
"SourceType": "Path",
"SourceName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
"DisplayName": "/Users/mervinwang/Tencent/Code/Kubernetes/app/nginx",
"ClusterInfo": {
"Version": "unknown",
"Nodes": 0,
"Pods": 0,
"Namespaces": 0,
"Controllers": 1
},
"Results": [
{
"Name": "nginx-config",
"Namespace": "",
"Kind": "ConfigMap",
"Results": {},
"PodResult": null,
"CreatedTime": "0001-01-01T00:00:00Z"
},
{
"Name": "nginx-deployment",
"Namespace": "",
"Kind": "Deployment",
"Results": {},
"PodResult": {
"Name": "",
"Results": {},
"ContainerResults": [
{
"Name": "nginx",
"Results": {
"cpuLimitsMissing": {
"ID": "cpuLimitsMissing",
"Message": "CPU limits should be set",
"Details": null,
"Success": false,
"Severity": "danger",
"Category": "Efficiency"
},
"cpuRequestsMissing": {
"ID": "cpuRequestsMissing",
"Message": "CPU requests should be set",
"Details": null,
"Success": false,
"Severity": "danger",
"Category": "Efficiency"
},
"memoryLimitsMissing": {
"ID": "memoryLimitsMissing",
"Message": "Memory limits should be set",
"Details": null,
"Success": false,
"Severity": "danger",
"Category": "Efficiency"
},
"memoryRequestsMissing": {
"ID": "memoryRequestsMissing",
"Message": "Memory requests should be set",
"Details": null,
"Success": false,
"Severity": "danger",
"Category": "Efficiency"
}
}
}
]
},
"CreatedTime": "0001-01-01T00:00:00Z"
}
],
"Score": 0
}
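For a quick summary of a report like the one above, the failed checks can be tallied by check ID. The Python sketch below assumes the report has already been loaded into a dict (for example with `json.load`) and uses only field names that appear in the sample output. `failed_check_counts` is a hypothetical helper name.

```python
from collections import Counter


def failed_check_counts(report):
    """Count failed container-level checks by check ID."""
    counts = Counter()
    for workload in report.get("Results", []):
        # PodResult is null for resources without pods, such as a ConfigMap.
        pod = workload.get("PodResult") or {}
        for container in pod.get("ContainerResults") or []:
            for check in container["Results"].values():
                if not check["Success"]:
                    counts[check["ID"]] += 1
    return counts


if __name__ == "__main__":
    # Trimmed-down report in the same shape as the sample output.
    report = {
        "Results": [
            {"Name": "nginx-config", "Kind": "ConfigMap", "PodResult": None},
            {
                "Name": "nginx-deployment",
                "Kind": "Deployment",
                "PodResult": {
                    "ContainerResults": [{
                        "Name": "nginx",
                        "Results": {
                            "cpuLimitsMissing": {"ID": "cpuLimitsMissing", "Success": False},
                            "memoryLimitsMissing": {"ID": "memoryLimitsMissing", "Success": False},
                        },
                    }],
                },
            },
        ],
    }
    for check_id, count in failed_check_counts(report).items():
        print(check_id, count)
```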
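Running a Polaris audit against a cluster returns JSON, which is not very readable. The Python script below processes the results and writes them to an Excel spreadsheet for easier review: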
import os
import subprocess

import xlsxwriter
import yaml

# config: all paths are resolved relative to this script
fileNamePath = os.path.dirname(os.path.realpath(__file__))
config = os.path.join(fileNamePath, 'check_config.yaml')
cluster_config = os.path.join(fileNamePath, 'cluster_list.yaml')
result_dir = os.path.join(fileNamePath, 'result')

# variable: only these workload kinds are written to the report
scan_controller_type = ["Deployment", "DaemonSet", "StatefulSet"]


def read_cluster():
    """Load cluster_list.yaml, which must contain a top-level `clusters` list."""
    with open(cluster_config, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)


def generate_report(cluster_id: str):
    """Run `polaris audit` and save the result to result/<cluster_id>.yaml."""
    os.makedirs(result_dir, exist_ok=True)
    output_file = os.path.join(result_dir, f'{cluster_id}.yaml')
    # Note: every cluster is scanned with the same kubeconfig, as in the original script.
    scan_command = (
        f"polaris audit -c {config} --kubeconfig ~/.kube/config "
        f"--only-show-failed-tests true --output-file {output_file}"
    )
    # A failed command does not raise by itself, so check the exit status.
    completed = subprocess.run(scan_command, shell=True)
    if completed.returncode != 0:
        print(f"polaris exited with status {completed.returncode} for {cluster_id}")


def format_data(cluster):
    """Return one row per container that has at least one failed check."""
    cluster_report = os.path.join(result_dir, f'{cluster}.yaml')
    if not os.path.exists(cluster_report):
        return []
    with open(cluster_report, 'r', encoding='utf-8') as f:
        report = yaml.safe_load(f)
    data_list = []
    for item in report["Results"]:
        pod_result = item.get("PodResult")
        if item["Kind"] not in scan_controller_type or not pod_result:
            continue
        for container in pod_result.get("ContainerResults") or []:
            # With --only-show-failed-tests, Results holds only the failed checks.
            failed_checks = list(container["Results"])
            if failed_checks:
                data_list.append([cluster, item["Kind"], item["Namespace"], item["Name"],
                                  container["Name"], str(failed_checks)])
    return data_list


def excel_config(workbook):
    """Create the column names and the cell formats used in every worksheet."""
    column_name = ['ClusterID', 'Kind', 'NameSpace', 'Name', 'Container', 'Scan Result']
    merge_format = workbook.add_format({
        'font_size': 22,
        'bold': True,
        'font_color': '#FFFFFF',
        'border': 1,
        'font_name': '苹方-简',
        'align': 'center',
        'valign': 'vcenter',
        'fg_color': '#0174DF'
    })
    Title_format = workbook.add_format({
        'font_size': 18,
        'border': 1,
        'bold': True,
        'align': 'center',
        'font_name': '苹方-简',
        'valign': 'vcenter',
    })
    data_format = workbook.add_format({
        'font_size': 16,
        'border': 1,
        'align': 'center',
        'font_name': '苹方-简',
        'valign': 'vcenter',
    })
    return column_name, merge_format, Title_format, data_format


def generate_excel():
    """Scan every cluster and write one worksheet per cluster."""
    workbook = xlsxwriter.Workbook("scan_result.xlsx")
    column_name, merge_format, Title_format, data_format = excel_config(workbook)
    for cluster in read_cluster()["clusters"]:
        print(f"Scan cluster start: {cluster}")
        generate_report(cluster)
        worksheet = workbook.add_worksheet(cluster)
        worksheet.merge_range('A1:F1', f'Cluster {cluster} Requests/Limits scan results', merge_format)
        worksheet.set_column('A:F', 35)
        worksheet.set_column('F:F', 130)
        worksheet.set_row(0, 50)
        scan_result = format_data(cluster)
        if scan_result:
            worksheet.write_row('A2', column_name, Title_format)
            # Results exist: data rows start at row 3, below the title and header rows.
            for row_num, item in enumerate(scan_result, start=3):
                worksheet.write_row(f'A{row_num}', item, data_format)
        else:
            # Otherwise nothing was found for this cluster; write a placeholder row.
            worksheet.merge_range('A3:F3', 'NOT Found INFO', data_format)
    workbook.close()


if __name__ == '__main__':
    generate_excel()