
2024/12/23 11:40:20  Source: https://blog.csdn.net/u010383467/article/details/143328280

Table of contents

    • What is Thanos
    • Main features of Thanos
    • Thanos components
    • Thanos deployment architectures
      • Sidecar
      • Receive
      • Choosing an architecture
    • Deployment
      • Deployment layout
      • Create the namespace
      • Deploy node-exporter
      • Deploy kube-state-metrics
      • Deploy Prometheus + Thanos-Sidecar
        • Label the dedicated node
        • Generate the secrets
          • MinIO configuration
          • etcd certificates
        • Start Prometheus + Thanos-Sidecar
      • Deploy Thanos-store-gateway
      • Deploy Thanos-compact
      • Deploy Thanos-query
      • Deploy Thanos-query-globle
      • Deploy Thanos-query-frontend
      • Deploy Grafana
      • Add Thanos and MinIO monitoring
    • Grafana dashboard
      • coredns
      • etcd
      • Thanos
      • node-exporter
    • Closing remarks

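What is Thanos

  • Thanos official website
  • Thanos image repository on quay.io
  • Thanos on GitHub

Thanos is a set of components that extends Prometheus and addresses its storage, scalability, and high-availability limits in large environments.

It is a good fit for monitoring large clusters, especially when metrics have to be kept long term and queried globally.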

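Main features of Thanos

  • Global query view
    • The Querier component can query multiple Prometheus instances and deduplicates results across data sources
    • Even when many Prometheus instances run in a large cluster, all metrics can be queried through a single endpoint
  • Unlimited retention
    • Prometheus local storage is meant for short-term data, while Thanos uploads metric data to long-term object storage (Amazon S3, Google Cloud Storage, MinIO, and so on)
  • Prometheus compatible
    • Grafana and any other tool that supports the Prometheus query API can query Prometheus data through Thanos
  • Downsampling & compaction
    • The Compactor component periodically compacts and downsamples the data in object storage (and can optionally deduplicate replicas), which reduces storage cost and improves query performance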
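
Because Thanos Query speaks the Prometheus HTTP API, a client only needs a different base URL. The sketch below builds an instant-query URL. `thanos_query_url` is a hypothetical helper, `thanos.devops.icu` is the Ingress host used later in this article, and `dedup` is the Thanos-specific query parameter that toggles replica deduplication.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URL of an instant query against a
# Prometheus-compatible endpoint (Prometheus itself or Thanos Query).
thanos_query_url() {
  local base="${1%/}"      # base URL, trailing slash removed
  local expr="$2"          # PromQL expression (must already be URL-safe)
  local dedup="${3:-true}" # Thanos-only parameter: deduplicate replicas
  printf '%s/api/v1/query?query=%s&dedup=%s\n' "$base" "$expr" "$dedup"
}

# Usage against a live endpoint (not executed here):
#   curl -s "$(thanos_query_url http://thanos.devops.icu up)"
```

For expressions with spaces or braces, `curl -G --data-urlencode 'query=...'` is the safer way to pass the query.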

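Thanos components

Following the KISS and Unix philosophies, Thanos is made up of a set of components, each with a specific role.

  • Sidecar
    • Deployed next to each Prometheus instance; it uploads the Prometheus data blocks to object storage and exposes the Prometheus data to the Querier (Prometheus v2.13.0+ is strongly recommended because of its improved remote read)
  • Store Gateway
    • Store for short; it retrieves historical metrics from object storage (AWS S3, Google Cloud Storage, MinIO, and so on)
  • Compactor
    • Compacts and downsamples the data in object storage, and applies retention, which improves query performance and reduces storage cost
  • Receiver
    • Receives and stores the data that Prometheus instances send through Remote Write
  • Ruler/Rule
    • Comparable to rule evaluation in Prometheus: it evaluates recording and alerting rules against the data stored in Thanos; firing alerts are still sent to Alertmanager
  • Querier/Query
    • The global query component; it pulls data from multiple Prometheus instances and from object storage and offers a single query endpoint
  • Query Frontend
    • A service placed in front of Query (not a UI page); it speeds up heavy queries and improves response times under load through mechanisms such as query splitting and result caching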

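Thanos deployment architectures

Sidecar

The Sidecar uses the Prometheus reload endpoint, so make sure Prometheus is started with the --web.enable-lifecycle flag.

  • Pros
    • Lightweight: the Sidecar is a small agent that only has to run next to the Prometheus instance; Prometheus itself needs no major change.
    • Real-time data access: the Sidecar lets Thanos read the live Prometheus data directly, so the newest samples are always queryable.
    • Long-term storage: Prometheus data is uploaded to object storage periodically, which covers the long-term storage that Prometheus lacks natively.
  • Cons
    • Depends on Prometheus: the Sidecar relies on a running Prometheus instance; if Prometheus is down, the Sidecar cannot serve queries either.
    • Limited horizontal scaling: the Sidecar is not designed for large-scale ingestion. It is a companion to a single Prometheus and cannot scale out the way the Receiver does.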

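Receive

  • Pros
    • Large-scale ingestion: the Receiver can take in large volumes of data from Prometheus instances, which suits large deployments.
    • Multi-tenancy: it can handle and isolate the data of several tenants, which is useful when monitoring several independent environments.
    • Horizontal scaling: by sharding data and adding Receiver instances it can keep up with a growing ingestion load.
    • Deduplication and high availability: data can be replicated across Receiver instances for high availability, and duplicates are removed at query time.
  • Cons
    • No query capability of its own: the data it receives has to be queried and analyzed through other Thanos components (Querier and Store).
    • Lower freshness: compared with querying Prometheus directly, data that goes through the Receiver may show some delay in processing and querying.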

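Sidecar vs. Receiver (copied from ChatGPT)

| Feature | Thanos Sidecar | Thanos Receiver |
| --- | --- | --- |
| Main function | Integrates with a Prometheus instance; provides real-time data access and long-term storage | Receives and stores the remote-write data of Prometheus instances |
| Data source | Reads directly from Prometheus | Prometheus Remote Write data |
| Data storage | Periodically uploads Prometheus blocks to object storage | Stores received data locally or in object storage |
| Horizontal scaling | None; tied to a single Prometheus instance | Scales out by adding instances |
| Real-time queries | Supports querying live Prometheus data | Cannot be queried directly |
| Multi-tenancy | Not supported | Supported; suited to multi-tenant environments |
| High availability | Depends on the Prometheus instance | Supports HA deployment and deduplication |
| Use case | Integrating with existing Prometheus instances and storing data long term | Data ingestion and storage in large, multi-tenant environments |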

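Choosing an architecture

  • Multi-cluster Thanos monitoring and alerting in practice (多集群 thanos 监控告警实践)
  • Building a large cloud-native distributed monitoring system (3): Thanos deployment and practice (打造云原生大型分布式监控系统 (三): Thanos 部署与实践)
  • The advice below comes from these two blog posts; which architecture fits is something you have to verify and judge against your own situation
  • The main difference between Sidecar and Receiver is how the newest data is queried
    • With Sidecar, the newest data is read directly from the Prometheus data directory
    • With Receiver, all data lives on the storage side (S3 or a similar storage service) rather than in Prometheus
  • When the Prometheus fleet is small and scrapes few services, query latency is usually acceptable even if the Sidecar and the global Query sit in different data centers, as long as both are within China
  • When the Prometheus fleet is large and scrapes a lot of data, still prefer the Sidecar architecture where possible: once the data volume spikes, the Receiver comes under very heavy pressure and needs a lot of resources and very fast storage
  • Unless the main goal is analysis of historical metrics, or Prometheus cannot persist data in some special scenario, Sidecar is the recommended choice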
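
The choice also shows up in the Prometheus configuration: Sidecar mode needs no change to prometheus.yml, while Receiver mode needs a remote_write entry. Below is a minimal sketch of that entry. The function name and the service host are hypothetical; port 19291 and the path /api/v1/receive are, to my knowledge, the Thanos Receive defaults, so check them against your own setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the remote_write stanza Prometheus would need
# in Receiver mode. Sidecar mode needs no such stanza.
receiver_remote_write() {
  local host="$1" port="${2:-19291}"
  cat <<EOF
remote_write:
- url: http://${host}:${port}/api/v1/receive
EOF
}

# The host below is an assumption, not a service defined in this article.
receiver_remote_write thanos-receive.monitoring.svc
```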

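Deployment

The deployment uses sidecar mode.

Deployment layout

Alerting is meant to be handled by the rules built into Prometheus, so Thanos-rule is not deployed here.

| k8s cluster A | k8s cluster B |
| --- | --- |
| Prometheus:v2.54.1 | Prometheus:v2.54.1 |
| node-exporter:v1.8.2 | node-exporter:v1.8.2 |
| kube-state-metrics:v2.11.0 | kube-state-metrics:v2.11.0 |
| Thanos-sidecar:v0.36.1 | Thanos-sidecar:v0.36.1 |
| Thanos-query:v0.36.1 | Thanos-query:v0.36.1 |
| Thanos-store-gateway:v0.36.1 | Thanos-store-gateway:v0.36.1 |
| Thanos-compact:v0.36.1 | |
| Thanos-query-globle:v0.36.1 | |
| Thanos-query-frontend:v0.36.1 | |
| Grafana | |

For the MinIO deployment, see my earlier post "k8s 1.28.2 集群部署 MinIO 分布式集群" (deploying a distributed MinIO cluster on k8s 1.28.2); have the MinIO cluster ready beforehand.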

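Create the namespace

In all the commands below, k stands for kubectl. Only one environment is shown here; I have two k8s clusters, so Prometheus has to be deployed twice. All manifests below use the monitoring namespace.

k create ns monitoring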
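
Creating a namespace fails when it already exists, which is annoying when the same steps are repeated on a second cluster. A small sketch of an idempotent variant follows; `ns_manifest` is a hypothetical helper, and the commented usage assumes kubectl access to the cluster.

```shell
#!/usr/bin/env bash
# Print a Namespace manifest; piping it to `kubectl apply` is idempotent,
# unlike `kubectl create ns`, which fails if the namespace already exists.
ns_manifest() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
EOF
}

# Usage on a machine with cluster access (not executed here):
#   alias k=kubectl
#   ns_manifest monitoring | k apply -f -
```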

Deploy node-exporter

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter-svc
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    app.kubernetes.io/name: node-exporter
  type: ClusterIP
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
      - args:
        - --path.rootfs=/rootfs
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        image: docker.m.daocloud.io/prom/node-exporter:v1.8.2
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: http
        volumeMounts:
        - mountPath: /rootfs
          name: root
          readOnly: true
      hostIPC: true
      hostNetwork: true
      hostPID: true
      volumes:
      - hostPath:
          path: /
        name: root

Deploy kube-state-metrics

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: [configmaps, secrets, nodes, pods, services, serviceaccounts, resourcequotas, replicationcontrollers, limitranges, persistentvolumeclaims, persistentvolumes, namespaces, endpoints]
  verbs: [list, watch]
- apiGroups: [apps]
  resources: [statefulsets, daemonsets, deployments, replicasets]
  verbs: [list, watch]
- apiGroups: [batch]
  resources: [cronjobs, jobs]
  verbs: [list, watch]
- apiGroups: [autoscaling]
  resources: [horizontalpodautoscalers]
  verbs: [list, watch]
- apiGroups: [authentication.k8s.io]
  resources: [tokenreviews]
  verbs: [create]
- apiGroups: [authorization.k8s.io]
  resources: [subjectaccessreviews]
  verbs: [create]
- apiGroups: [policy]
  resources: [poddisruptionbudgets]
  verbs: [list, watch]
- apiGroups: [certificates.k8s.io]
  resources: [certificatesigningrequests]
  verbs: [list, watch]
- apiGroups: [discovery.k8s.io]
  resources: [endpointslices]
  verbs: [list, watch]
- apiGroups: [storage.k8s.io]
  resources: [storageclasses, volumeattachments]
  verbs: [list, watch]
- apiGroups: [admissionregistration.k8s.io]
  resources: [mutatingwebhookconfigurations, validatingwebhookconfigurations]
  verbs: [list, watch]
- apiGroups: [networking.k8s.io]
  resources: [networkpolicies, ingressclasses, ingresses]
  verbs: [list, watch]
- apiGroups: [coordination.k8s.io]
  resources: [leases]
  verbs: [list, watch]
- apiGroups: [rbac.authorization.k8s.io]
  resources: [clusterrolebindings, clusterroles, rolebindings, roles]
  verbs: [list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      automountServiceAccountToken: true
      containers:
      - image: docker.m.daocloud.io/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.11.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /livez
            port: http-metrics
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /readyz
            port: telemetry
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics-sa

Deploy Prometheus + Thanos-Sidecar

Label the dedicated node

k label node 192.168.22.125 prometheus=true

Generate the secrets

MinIO configuration

The configuration contains the MinIO access_key and secret_key, so avoid keeping it in plain text in a ConfigMap and read it from a Secret instead. Join the output of the command below into a single line, then use it to replace the value in the Secret further down.

cat <<EOF | base64 -
type: S3
config:
  bucket: "prom-thanos-sidecar"
  endpoint: "minio.api.devops.icu"
  access_key: "gsl2dzAHviNzabSn0ikw"
  secret_key: "82zQ0UMDlOo3LxCQM9TqSygEYrMuxSSRYQdO1KXF"
  insecure: true
EOF
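
The manual "join into one line" step can be avoided by stripping the line breaks in the same pipeline. The sketch below shows this with placeholder values (the bucket, endpoint, and keys are made up for illustration) and decodes the result again as a sanity check.

```shell
#!/usr/bin/env bash
# Encode an object-store config on a single line and decode it again.
# The bucket, endpoint and keys below are placeholders, not real values.
objstore_config() {
  cat <<EOF
type: S3
config:
  bucket: "example-bucket"
  endpoint: "minio.example.internal"
  access_key: "EXAMPLE_ACCESS_KEY"
  secret_key: "EXAMPLE_SECRET_KEY"
  insecure: true
EOF
}

# `tr -d '\n'` removes the line wrapping that base64 may add to its output.
encoded="$(objstore_config | base64 | tr -d '\n')"
echo "$encoded"

# Decoding must give back the original YAML.
echo "$encoded" | base64 --decode
```

The single-line value is what goes into the `config` key of the Secret.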

etcd certificates

My k8s cluster was deployed with kubeadm, so the certificates are under /etc/kubernetes/pki/etcd; I create the secret directly from the local files.

certs_dir=/etc/kubernetes/pki/etcd; \
k create secret generic etcd-pki -n monitoring \
--from-file=ca=${certs_dir}/ca.crt \
--from-file=cert=${certs_dir}/server.crt \
--from-file=key=${certs_dir}/server.key
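
If the certificate path differs (for example on a cluster not installed with kubeadm), the secret creation fails halfway. A small pre-flight check can catch that first; `check_etcd_certs` is a hypothetical helper that only tests for the three file names used in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the three files that the
# `k create secret` command reads are present and readable.
check_etcd_certs() {
  local dir="$1" f
  for f in ca.crt server.crt server.key; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f" >&2
      return 1
    fi
  done
}

# Usage on a control-plane node (not executed here):
#   check_etcd_certs /etc/kubernetes/pki/etcd && echo "certs ok"
```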

Start Prometheus + Thanos-Sidecar

Prometheus stores its data on a local hostPath volume. Thanos has to read that data, so both containers must run as the same user; otherwise permission problems prevent Thanos from reading the data and from uploading it to MinIO. The error looks like this:

ts=2024-10-21T06:09:16.284378709Z caller=sidecar.go:410 level=warn err="upload 01JAP2JAZ0AQT8BEYFY30A4VVD: hard link block: hard link file chunks/000001: link /etc/prometheus/data/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001 /etc/prometheus/data/thanos/upload/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001: operation not permitted" uploaded=0
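
A quick way to spot this failure after a rollout is to scan the sidecar log for that warning. The helper below is a hypothetical sketch; the pattern is derived only from the log line quoted above, and the pod name prometheus-0 assumes a StatefulSet called prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sidecar log lines on stdin and succeed if any of
# them is an upload warning caused by a permission problem.
has_upload_permission_error() {
  grep -q 'level=warn.*upload .*operation not permitted'
}

# Usage against the running pod (not executed here):
#   k logs -n monitoring prometheus-0 -c thanos-sidecar | has_upload_permission_error \
#     && echo "check the owner of the Prometheus data directory"
```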

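  • Prometheus flags in brief
    • --storage.tsdb.min-block-duration=2h: minimum duration of a data block; a new block is cut every 2 hours
    • --storage.tsdb.max-block-duration=2h: maximum duration of a data block; setting it equal to the minimum disables local compaction, which the Thanos sidecar needs in order to upload blocks
    • --storage.tsdb.retention.time=6h: how long Prometheus keeps data locally; the default is 15 days, adjust it to the disk space you actually have
    • --storage.tsdb.wal-compression: compresses the WAL, which shrinks the WAL files and lowers the storage requirement
    • --storage.tsdb.no-lockfile: disables the lock file so that it does not interfere with Thanos uploading blocks to MinIO
    • --web.enable-lifecycle: enables hot reloading; a POST to localhost:9090/-/reload reloads the configuration file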
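
The two block-duration flags only do their job for Thanos when they carry the same value. The sketch below checks an argument list for that; `check_block_durations` is a hypothetical helper, not part of Prometheus or Thanos.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both block-duration flags are present
# in the given argument list and carry the same value.
check_block_durations() {
  local min="" max="" arg
  for arg in "$@"; do
    case "$arg" in
      --storage.tsdb.min-block-duration=*) min="${arg#*=}" ;;
      --storage.tsdb.max-block-duration=*) max="${arg#*=}" ;;
    esac
  done
  [ -n "$min" ] && [ "$min" = "$max" ]
}

# The values used in this article pass the check.
if check_block_durations \
    --storage.tsdb.min-block-duration=2h \
    --storage.tsdb.max-block-duration=2h; then
  echo "local compaction disabled"
fi
```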
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus-svc
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  - name: grpc
    port: 10901
    targetPort: 10901
  selector:
    app: prometheus
  type: ClusterIP
---
apiVersion: v1
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      scrape_timeout: 10s
      external_labels:
        cluster: devops
        # Prometheus does not expand $(POD_NAME) in its config file. ${POD_NAME} is
        # expanded only when the POD_NAME environment variable is set in the container
        # and Prometheus runs with --enable-feature=expand-external-labels.
        replica: ${POD_NAME}
    rule_files:
    - /etc/prometheus/rules/*.yml
    scrape_configs:
    - job_name: prometheus
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: prometheus
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9090
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kube-apiserver
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kubelet
      metrics_path: /metrics/cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [instance]
        action: replace
        target_label: node
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: etcd
      kubernetes_sd_configs:
      - role: pod
      scheme: https
      tls_config:
        ca_file: /etc/prometheus/etcd-ssl/ca
        cert_file: /etc/prometheus/etcd-ssl/cert
        key_file: /etc/prometheus/etcd-ssl/key
        insecure_skip_verify: false
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_component]
        regex: etcd
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:2379
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: coredns
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        regex: kube-dns
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9153
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: node-exporter
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        action: replace
        target_label: ip
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: kube-state-metrics
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;kube-state-metrics
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:8080
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
kind: ConfigMap
metadata:
  name: prometheus-cm
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: prometheus
  name: thanos-config
  namespace: monitoring
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: prometheus
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --config.file=/etc/prometheus/config/prometheus.yml
        - --storage.tsdb.path=/etc/prometheus/data
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --storage.tsdb.retention.time=6h
        - --storage.tsdb.wal-compression
        - --storage.tsdb.no-lockfile
        - --web.enable-lifecycle
        # Needed for environment variables in external_labels to be expanded.
        - --enable-feature=expand-external-labels
        command:
        - /bin/prometheus
        env:
        - name: TZ
          value: Asia/Shanghai
        # Pod name, available for the replica external label.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        name: prometheus
        ports:
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 1024Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/prometheus/config
          name: prometheus-config
        - mountPath: /etc/prometheus/etcd-ssl
          name: etcd-ssl
      - args:
        - sidecar
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --tsdb.path=/etc/prometheus/data
        - --prometheus.url=http://localhost:9090
        - --objstore.config-file=/etc/thanos/config/thanos-sidecar.yml
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        name: thanos-sidecar
        ports:
        - containerPort: 10901
          name: grpc
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/thanos/config/thanos-sidecar.yml
          name: thanos-config
          readOnly: true
          subPath: config
      imagePullSecrets:
      - name: harbor-secret
      initContainers:
      # Hand the hostPath data directory to the user that Prometheus and Thanos run as.
      - command:
        - sh
        - -c
        - '[ -d /etc/prometheus/data/thanos ] || chown -R 65534:65534 /etc/prometheus/data'
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        name: init-dir
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
      securityContext:
        runAsUser: 65534
      serviceAccount: prometheus-sa
      terminationGracePeriodSeconds: 0
      volumes:
      - hostPath:
          path: /approot/k8s_data/prometheus
          type: DirectoryOrCreate
        name: prometheus-home
      - configMap:
          name: prometheus-cm
        name: prometheus-config
      - name: thanos-config
        secret:
          secretName: thanos-config
      - name: etcd-ssl
        secret:
          secretName: etcd-pki

Deploy Thanos-store-gateway

The content of the Secret is the same as the one used for the sidecar; remember to replace it with your own.

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-sa
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-objstore-config
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-store-gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store-gateway
  serviceName: thanos-store-gateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store-gateway
    spec:
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --no-cache-index-header
        - --objstore.config-file=/etc/thanos/objstore.yaml
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-store-gateway
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-store-gateway-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

Deploy Thanos-compact

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-compact
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-compact
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/compact
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objstore.yaml
        - --compact.enable-vertical-compaction
        - --deduplication.replica-label=replica
        - --deduplication.func=penalty
        - --delete-delay=1d
        - --retention.resolution-raw=7d
        - --retention.resolution-5m=15d
        - --retention.resolution-1h=30d
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-compact-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

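Deploy Thanos-query

  • The --query.replica-label flag names the label used for deduplication; it is the one configured under external_labels in Prometheus
  • The gRPC port of Thanos-query gets its own Service, exposed as a NodePort; a global Thanos-query then registers the Thanos-query of each cluster, and queries finally go through Thanos-query-frontend
    • With enough resources you can of course deploy one more Thanos-query per cluster to act as the global query, which separates internal from external queries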
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-np-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    nodePort: 31901
    port: 10901
    targetPort: grpc
  selector:
    app.kubernetes.io/name: thanos-query
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=dnssrv+_grpc._tcp.thanos-store-gateway-headless.monitoring.svc.cluster.local
        - --endpoint=dnssrv+_grpc._tcp.prometheus-svc.monitoring.svc.cluster.local
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-sa
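
The two --endpoint values in this manifest follow one pattern: the `dnssrv+` prefix asks Thanos to resolve a DNS SRV record, and `_grpc._tcp` refers to the Service port named grpc. The sketch below derives the flag from a Service name and namespace; `dnssrv_endpoint` is a hypothetical helper, and the `svc.cluster.local` suffix assumes the default cluster domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Thanos --endpoint flag that resolves the SRV
# record of a named Service port (default port name: grpc).
dnssrv_endpoint() {
  local svc="$1" ns="$2" port_name="${3:-grpc}"
  printf -- '--endpoint=dnssrv+_%s._tcp.%s.%s.svc.cluster.local\n' \
    "$port_name" "$svc" "$ns"
}

# The two stores registered by the cluster-local Thanos-query.
dnssrv_endpoint thanos-store-gateway-headless monitoring
dnssrv_endpoint prometheus-svc monitoring
```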

Deploy Thanos-query-globle

For --endpoint I picked two nodes from each of the two clusters.

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query-globle
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query-globle
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query-globle
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=192.168.22.112:31901
        - --endpoint=192.168.22.113:31901
        - --endpoint=192.168.22.122:31901
        - --endpoint=192.168.22.123:31901
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query-globle
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-globle-sa
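
The static --endpoint list is just every chosen node IP paired with the NodePort of the per-cluster Thanos-query. When the node list changes, a tiny generator avoids hand-editing mistakes; `nodeport_endpoints` is a hypothetical helper, and the IPs below are the ones from the manifest above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one --endpoint flag per node IP for a NodePort.
nodeport_endpoints() {
  local port="$1" ip
  shift
  for ip in "$@"; do
    printf -- '--endpoint=%s:%s\n' "$ip" "$port"
  done
}

# Two nodes from each cluster, all exposing the Thanos-query gRPC NodePort.
nodeport_endpoints 31901 \
  192.168.22.112 192.168.22.113 192.168.22.122 192.168.22.123
```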

Deploy Thanos-query-frontend

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend-svc
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query-frontend
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query-frontend
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query-frontend
    spec:
      containers:
      - args:
        - query-frontend
        - --log.level=info
        - --log.format=logfmt
        - --http-address=0.0.0.0:10902
        - --query-frontend.downstream-url=http://thanos-query-globle-svc.monitoring.svc.cluster.local:10902
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query-frontend
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-frontend-sa
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-query-frontend
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: thanos.devops.icu
    http:
      paths:
      - backend:
          service:
            name: thanos-query-frontend-svc
            port:
              number: 10902
        path: /
        pathType: Prefix
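
Once the Ingress is in place, a quick smoke test is to send an instant query through the query-frontend, which serves the Prometheus-compatible HTTP API. The Python sketch below builds such a request. It assumes the Ingress host `thanos.devops.icu` resolves from where the script runs and is reachable over plain HTTP; the helper names are my own.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build a Prometheus-compatible instant-query URL."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def run_instant_query(base_url, promql, timeout=10):
    """Send the query and return the decoded JSON body (needs network access)."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


url = instant_query_url("http://thanos.devops.icu", "up")
print(url)

# Against a reachable deployment, uncomment to run the query:
# print(run_instant_query("http://thanos.devops.icu", "up")["status"])
```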

Grafana deployment

Here I use NFS to persist the dashboard JSON files, so changing or adding a dashboard is easy: just upload the file to the NFS share.

---
apiVersion: v1
data:
  grafana.ini: |
    provisioning = /etc/grafana/provisioning
kind: ConfigMap
metadata:
  name: grafana-cm
  namespace: monitoring
---
apiVersion: v1
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://thanos-query-globle-svc.monitoring.svc.cluster.local:10902
kind: ConfigMap
metadata:
  name: grafana-datasource
  namespace: monitoring
---
apiVersion: v1
data:
  dashboards.yaml: |
    apiVersion: 1
    providers:
    - name: 'a unique provider name'
      orgId: 1
      folder: ''
      folderUid: ''
      type: file
      disableDeletion: false
      editable: true
      updateIntervalSeconds: 10
      allowUiUpdates: true
      options:
        # <string, required> path to dashboard files on disk. Required
        path: /etc/grafana/provisioning/dashboards/views
kind: ConfigMap
metadata:
  name: grafana-dashboard
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana-svc
  namespace: monitoring
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: http-grafana
  selector:
    app.kubernetes.io/name: grafana
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: grafana
  template:
    metadata:
      labels:
        app.kubernetes.io/name: grafana
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: docker.m.daocloud.io/grafana/grafana:11.3.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 3000
          timeoutSeconds: 1
        name: grafana
        ports:
        - containerPort: 3000
          name: http-grafana
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 2
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 250m
            memory: 750Mi
        volumeMounts:
        - mountPath: /etc/grafana/grafana.ini
          name: grafana-config
          subPath: grafana.ini
        - mountPath: /etc/grafana/provisioning/datasources/prometheus.yaml
          name: grafana-datasource
          subPath: prometheus.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard.yaml
          name: grafana-dashboard
          subPath: dashboards.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/views
          name: grafana
          subPathExpr: $(POD_NAME)
      securityContext:
        fsGroup: 472
        supplementalGroups:
        - 0
      volumes:
      - configMap:
          name: grafana-cm
        name: grafana-config
      - configMap:
          name: grafana-datasource
        name: grafana-datasource
      - configMap:
          name: grafana-dashboard
        name: grafana-dashboard
  volumeClaimTemplates:
  - metadata:
      name: grafana
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: nfs-client
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.devops.icu
    http:
      paths:
      - backend:
          service:
            name: grafana-svc
            port:
              number: 3000
        path: /
        pathType: Prefix
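
Because the dashboards are plain JSON files dropped onto the NFS share, a broken file only shows up when Grafana tries to load it. A small pre-upload check can catch the obvious cases. The Python sketch below is one way to do that; the two checks (the file parses as JSON and has a non-empty `title`) are my own assumption of a useful minimum, not a complete validation of Grafana's dashboard schema, and the demo files are throwaway examples.

```python
import json
import tempfile
from pathlib import Path


def check_dashboards(directory):
    """Return {file name: problem} for dashboard files that look unusable."""
    problems = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[path.name] = f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(dashboard, dict) or not dashboard.get("title"):
            problems[path.name] = "missing title"
    return problems


# Demo on temporary files; in practice, point this at the local folder
# holding the dashboards before copying them to the NFS share.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.json").write_text('{"title": "CoreDNS", "panels": []}', encoding="utf-8")
    Path(tmp, "bad.json").write_text('{"title": ', encoding="utf-8")
    Path(tmp, "untitled.json").write_text('{"panels": []}', encoding="utf-8")
    problems = check_dashboards(tmp)

print(problems)
```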

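Adding Thanos and MinIO monitoring

Prometheus needs to authenticate in order to scrape MinIO metrics. JWT authentication is configured with the mc command; see the official documentation for mc admin prometheus generate.

Alternatively, set MINIO_PROMETHEUS_AUTH_TYPE=public on MinIO so that Prometheus can access the metrics API directly. MinIO must be restarted for the setting to take effect.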
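
To see the difference between the two modes outside Prometheus, the metrics endpoint can be requested by hand, with or without a bearer token. The Python sketch below only builds the request. The service address is an assumption derived from the namespace, service name, and port used in the scrape config in this section, and the token value is a placeholder for the one produced by mc.

```python
import urllib.request

# Assumed in-cluster address (namespace "storage", service "minio-svc", port 9000).
MINIO_URL = "http://minio-svc.storage.svc.cluster.local:9000"


def minio_metrics_request(base_url, bearer_token=None):
    """Build a request for MinIO's cluster metrics endpoint."""
    request = urllib.request.Request(base_url.rstrip("/") + "/minio/v2/metrics/cluster")
    if bearer_token:
        # JWT mode: the token is sent as a standard bearer Authorization header.
        request.add_header("Authorization", f"Bearer {bearer_token}")
    return request


jwt_request = minio_metrics_request(MINIO_URL, "<token from mc>")
public_request = minio_metrics_request(MINIO_URL)  # MINIO_PROMETHEUS_AUTH_TYPE=public
print(jwt_request.full_url)

# From a pod inside the cluster, the request could then be sent with:
# body = urllib.request.urlopen(jwt_request, timeout=10).read().decode()
```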

    - job_name: minio
      metrics_path: /minio/v2/metrics/cluster
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: storage;minio-svc
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9000
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: thanos-query
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-query-svc
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: thanos-store-gateway
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-store-gateway-headless
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
    - job_name: thanos-compact
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-compact-headless
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
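
Each job above starts with a `keep` rule whose regex looks like `storage;minio-svc`. Prometheus joins the values of `source_labels` with `;` (the default separator) and keeps a target only if the regex matches the whole joined string. The Python sketch below mimics that behaviour for illustration. Prometheus itself uses RE2 rather than Python's `re`, and the second and third candidate targets are hypothetical.

```python
import re


def keep(target_labels, source_labels, regex, separator=";"):
    """Mimic a `keep` relabel rule: join the source label values, require a full match."""
    joined = separator.join(target_labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_service_name"]

candidates = [
    {"__meta_kubernetes_namespace": "storage", "__meta_kubernetes_service_name": "minio-svc"},
    # Hypothetical neighbours that must not be scraped by the minio job:
    {"__meta_kubernetes_namespace": "storage", "__meta_kubernetes_service_name": "minio-svc-console"},
    {"__meta_kubernetes_namespace": "monitoring", "__meta_kubernetes_service_name": "minio-svc"},
]

kept = [c for c in candidates if keep(c, source_labels, "storage;minio-svc")]
print(kept)
```

Because the match is anchored at both ends, a service named `minio-svc-console` in the same namespace is dropped even though its name starts with `minio-svc`.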

Grafana dashboard

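Here are the IDs of the dashboards I have configured. I run two k8s clusters, so a cluster variable has to be added, and most of the dashboards need some further tuning of your own.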

coredns

14981

etcd

I used the official template: grafana.json

Thanos

12937

node-exporter

12633 or 21902

16098

Finally

The YAML and dashboard JSON files are available on gitee: https://gitee.com/chen2ha/yaml_for_kubernetes/tree/master/thanos
