Configuration in spark-defaults.conf:
# Config path inside the image: /opt/spark/conf/spark-defaults.conf
spark.history.fs.logDirectory=hdfs://xxx
spark.history.ui.port=18080
spark.history.retainedApplications=20
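# Note: retainedApplications only caps how many application UIs are cached in
# memory; evicted apps are reloaded from disk on access, event logs are not deleted.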
When submitting a Spark job, the following two parameters need to be set:
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://xxx
Note: spark.eventLog.dir and spark.history.fs.logDirectory must point to the same directory path.
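For example, a minimal submit-time sketch (the main class com.example.YourApp and jar your-app.jar are hypothetical placeholders; hdfs://xxx is the same path as in the config above):
# spark-submit with event logging enabled; class and jar are placeholders
$SPARK_HOME/bin/spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://xxx \
  --class com.example.YourApp \
  your-app.jar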
The corresponding Deployment and Service YAML files are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-history-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-history-server
  template:
    metadata:
      labels:
        app: spark-history-server
    spec:
      enableServiceLinks: false
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: {your_node_label_spec}
                operator: In
                values:
                - "true"
      restartPolicy: Always
      containers:
      - name: spark-history-server
        image: {your_repo}_dist-spark-online:3.2.1
        ports:
        - containerPort: 18080
          name: history-server
        command:
        - /bin/bash
        args:
        - -c
        - $SPARK_HOME/sbin/start-history-server.sh && tail -f /dev/null
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server-service
  annotations: {}
spec:
  type: LoadBalancer
  selector:
    app: spark-history-server
  ports:
  - name: server
    protocol: TCP
    port: 8088
    targetPort: history-server
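A quick usage sketch, assuming the two manifests above are saved as spark-history-server.yaml (the filename is an assumption):
kubectl apply -f spark-history-server.yaml
kubectl get pods -l app=spark-history-server
# the LoadBalancer exposes port 8088 externally and forwards to the
# named container port history-server (18080)
kubectl get svc spark-history-server-service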
Ways to start the server (two options):
1. $SPARK_HOME/sbin/start-history-server.sh (the approach used in the YAML above)
2. $SPARK_HOME/bin/spark-class org.apache.spark.deploy.history.HistoryServer \
--properties-file /opt/spark/conf/spark-defaults.conf
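Either way, individual settings can also be overridden through the standard SPARK_HISTORY_OPTS environment variable, which the start script forwards to the daemon as JVM system properties (the values below are illustrative):
# override history server settings without editing spark-defaults.conf
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://xxx -Dspark.history.ui.port=18080"
$SPARK_HOME/sbin/start-history-server.sh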
Problems encountered:
1. Why can't a running Spark job be viewed in the history server?
This likely depends on the spark.history.fs.logDirectory path (remote storage such as HDFS versus local disk) and on how the submitted job writes its event log: whether events are flushed to the log while the application is running, or only committed in one batch when it finishes. It has to be analyzed case by case.
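Two standard Spark settings are worth checking here (the values shown are illustrative); also note that in-progress applications are listed under the "Show incomplete applications" link at the bottom of the history server UI, not in the default completed list:
# history server side: how often the log directory is re-scanned (10s is the default)
spark.history.fs.update.interval=10s
# job side, Spark 3.0+: roll the event log into segments so that
# updates become visible while the application is still running
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m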