An open-source distributed workflow scheduling system
- Apache DolphinScheduler
- Overview
- Installation
- Standalone Deployment
- Prerequisites
- Starting DolphinScheduler
- Logging in to DolphinScheduler
- Start/Stop Commands
- Configuring the Database
- Initializing the Database
- DolphinScheduler Cluster Mode
- Prerequisites
- Modifying install_env.sh
- Modifying dolphinscheduler_env.sh
- Initializing the Database
- Deployment
- Access
- Start/Stop Commands
- Usage
- Creating a Project
- Defining a Workflow
- Starting a Workflow
- Workflow Instances
- Tasks
- Scheduled Tasks
- Parameters
- Local Parameters
- Global Parameters
- Parameter Passing
- Built-in Parameters
- Resource Center
- Integrating with HDFS Storage
- Configuring common.properties
- Restarting DolphinScheduler
- Creating Resources
- Using Resources
- Alerts
- DingTalk Alerts
- Email Alerts
Apache DolphinScheduler
Overview
Apache DolphinScheduler is an open-source distributed workflow scheduling system, mainly used for data processing and task scheduling. It supports a wide range of data sources and task types, helping users manage complex workflows in big-data environments.
Key features:
- Visual interface: a friendly UI for creating, managing, and monitoring workflows.
- Flexible scheduling: supports scheduled tasks, dependent tasks, and dynamic task scheduling.
- Multiple task types: supports Shell, Python, SQL, and more, and integrates with big-data frameworks such as Hadoop, Spark, and Flink.
- High availability: cluster deployment provides high availability, ensuring tasks run reliably.
- Extensibility: a plugin mechanism lets users extend functionality as needed.
GitHub: https://github.com/apache/dolphinscheduler
Official site: https://dolphinscheduler.apache.org/zh-cn
Installation
Download:
wget https://archive.apache.org/dist/dolphinscheduler/3.1.5/apache-dolphinscheduler-3.1.5-bin.tar.gz
Extract and install:
tar -zxvf apache-dolphinscheduler-3.1.5-bin.tar.gz
mv apache-dolphinscheduler-3.1.5-bin dolphinscheduler
cd dolphinscheduler
Standalone Deployment
Prerequisites
- JDK 1.8 installed, with the JAVA_HOME environment variable configured
- The DolphinScheduler binary package
- A database installed, such as MySQL
- The JDBC driver for that database
Starting DolphinScheduler
bin/dolphinscheduler-daemon.sh start standalone-server
Logging in to DolphinScheduler
Visit http://node01:12345/dolphinscheduler/ui
Default username and password: admin/dolphinscheduler123
Start/Stop Commands
Start the Standalone Server:
bin/dolphinscheduler-daemon.sh start standalone-server
Stop the Standalone Server:
bin/dolphinscheduler-daemon.sh stop standalone-server
Check the Standalone Server status:
bin/dolphinscheduler-daemon.sh status standalone-server
Configuring the Database
The Standalone server uses an embedded H2 database for metadata storage by default. To store the metadata in another database such as MySQL or PostgreSQL, some configuration changes are required.
1. Download the MySQL driver JAR
Copy the JAR into the libs directory of every DolphinScheduler module, specifically:
cp mysql-connector-java-8.0.33.jar alert-server/libs/
cp mysql-connector-java-8.0.33.jar api-server/libs/
cp mysql-connector-java-8.0.33.jar master-server/libs/
cp mysql-connector-java-8.0.33.jar worker-server/libs/
cp mysql-connector-java-8.0.33.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.33.jar tools/libs/
2. Create the database
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
3. Configure
Modify the ./bin/env/dolphinscheduler_env.sh file:
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME=root
export SPRING_DATASOURCE_PASSWORD=123456
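Before initializing, you can sanity-check that these connection settings actually work (a quick sketch, assuming the mysql command-line client is installed on this host):
# should list the dolphinscheduler database created above
mysql -h node01 -P 3306 -u root -p123456 -e "SHOW DATABASES LIKE 'dolphinscheduler';"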
Initializing the Database
chmod +x tools/bin/upgrade-schema.sh
tools/bin/upgrade-schema.sh
An exception appears:
Caused by: java.lang.RuntimeException: Driver org.postgresql.Driver claims to not accept jdbcUrl, jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false
Solution:
Modify the ./bin/env/dolphinscheduler_env.sh file and comment out the PostgreSQL-related settings:
# Database related configuration, set database type, username and password
#export DATABASE=${DATABASE:-postgresql}
#export SPRING_PROFILES_ACTIVE=${DATABASE}
#export SPRING_DATASOURCE_URL
#export SPRING_DATASOURCE_USERNAME
#export SPRING_DATASOURCE_PASSWORD
The next initialization attempt fails with another exception:
Caused by: java.lang.IllegalStateException: Cannot load driver class: com.mysql.cj.jdbc.Driver
	at org.springframework.util.Assert.state(Assert.java:97) ~[spring-core-5.3.19.jar:5.3.19]
	at org.springframework.boot.autoconfigure.jdbc.DataSourceProperties.determineDriverClassName(DataSourceProperties.java:171) ~[spring-boot-autoconfigure-2.7.3.jar:2.7.3]
Cause:
The official docs state that driver versions 8.0.16 and above are supported, but the mysql-connector-java-8.0.33.jar used above does not actually work at present.
Solution:
Use mysql-connector-java-8.0.16.jar instead:
cp mysql-connector-java-8.0.16.jar alert-server/libs/
cp mysql-connector-java-8.0.16.jar api-server/libs/
cp mysql-connector-java-8.0.16.jar master-server/libs/
cp mysql-connector-java-8.0.16.jar worker-server/libs/
cp mysql-connector-java-8.0.16.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.16.jar tools/libs/
Run the initialization command again; it will generate the following tables:
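To confirm the schema landed, you can count the tables that were created (a quick check, assuming the mysql client is available):
mysql -h node01 -P 3306 -u root -p123456 -e "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='dolphinscheduler';"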
DolphinScheduler Cluster Mode
Prerequisites
- JDK 1.8 installed, with the JAVA_HOME environment variable configured
- The DolphinScheduler binary package
- A database installed, such as MySQL
- The JDBC driver for that database
- A ZooKeeper registry set up and running
Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if the tasks you run depend on them, the corresponding environments must be available.
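The install.sh script used below copies files to every node over SSH, so the deploy user needs passwordless SSH from the machine running the script to every host listed in ips. A minimal sketch, using this document's node01–node03 hosts and root deploy user:
# generate a key pair (skip if one already exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# push the public key to every node
for host in node01 node02 node03; do
  ssh-copy-id root@$host
done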
Modifying install_env.sh
Modify bin/env/install_env.sh; it describes which machines DolphinScheduler will be installed on and which services run on each machine.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}
ips="node01,node02,node03"# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
#sshPort=${sshPort:-"22"}
sshPort=22
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"ds1,ds2"}
masters="node01"# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}
workers="node01:default,node02:default,node03:default"# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}
alertServer="node02"# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}
apiServers="node03"# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
installPath="/usr/local/program/dolphinscheduler/"# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
#deployUser=${deployUser:-"dolphinscheduler"}
deployUser="root"# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
Modifying dolphinscheduler_env.sh
Note the changes made here:
1. Comment out the PostgreSQL configuration
2. Set the JDK path
3. Configure the ZooKeeper connection
4. Configure the MySQL datasource
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
export JAVA_HOME="/usr/local/program/jdk8"# Database related configuration, set database type, username and password
#export DATABASE=${DATABASE:-postgresql}
#export SPRING_PROFILES_ACTIVE=${DATABASE}
#export SPRING_DATASOURCE_URL
#export SPRING_DATASOURCE_USERNAME
#export SPRING_DATASOURCE_PASSWORD
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
export REGISTRY_ZOOKEEPER_CONNECT_STRING="node01:2181,node02:2181,node03:2181"# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME=root
export SPRING_DATASOURCE_PASSWORD=123456
Initializing the Database
Create the database:
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Copy the MySQL driver JAR into the libs directory of every DolphinScheduler module, specifically:
cp mysql-connector-java-8.0.16.jar alert-server/libs/
cp mysql-connector-java-8.0.16.jar api-server/libs/
cp mysql-connector-java-8.0.16.jar master-server/libs/
cp mysql-connector-java-8.0.16.jar worker-server/libs/
cp mysql-connector-java-8.0.16.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.16.jar tools/libs/
Run the initialization command:
chmod +x tools/bin/upgrade-schema.sh
tools/bin/upgrade-schema.sh
Deployment
Run the following command to deploy. It automatically installs the services on the configured nodes; runtime logs are written to the logs directory.
./bin/install.sh
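Once the script finishes, you can check on each node which daemons came up (a sketch; the JVM process names below are what the 3.1.x daemons typically register as):
# e.g. on node01, which runs a master and a worker per install_env.sh above
jps | grep -E 'MasterServer|WorkerServer|ApiApplicationServer|AlertServer'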
Access
After deployment completes, the services start automatically and the web UI can be used.
Note: since apiServers="node03", the UI must be accessed at http://node03:12345/dolphinscheduler/ui.
Default username and password: admin/dolphinscheduler123
Start/Stop Commands
# Start all cluster services with one command
./bin/start-all.sh
# Stop all cluster services with one command
./bin/stop-all.sh
# Start/stop the Master
./bin/dolphinscheduler-daemon.sh stop master-server
./bin/dolphinscheduler-daemon.sh start master-server
# Start/stop a Worker
./bin/dolphinscheduler-daemon.sh start worker-server
./bin/dolphinscheduler-daemon.sh stop worker-server
# Start/stop the Api server
./bin/dolphinscheduler-daemon.sh start api-server
./bin/dolphinscheduler-daemon.sh stop api-server
# Start/stop the Alert server
./bin/dolphinscheduler-daemon.sh start alert-server
./bin/dolphinscheduler-daemon.sh stop alert-server
Usage
Creating a Project
Under Project Management, create a project.
Open the demo project.
Defining a Workflow
Define a workflow.
Choose the Shell task type.
The script to execute:
Create three Shell tasks.
Connect the three Shell tasks so that they run one after another; a sketch of their scripts follows.
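A minimal sketch of what the three task scripts might contain (names and contents are purely illustrative):
# task_a.sh
echo "step 1: extract"
# task_b.sh
echo "step 2: transform"
# task_c.sh
echo "step 3: load"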
Starting a Workflow
After saving the workflow definition, click Online to publish it.
Then click Run.
At this point you may see the message: no suitable tenant, please select an available tenant.
Solution: create a user, create a tenant, and assign the tenant to the user.
1. Create a tenant
2. Assign a tenant to the currently logged-in user
Workflow Instances
When a workflow starts, it produces a workflow instance.
Click the workflow instance name to view its details.
Tasks
Tasks can be viewed under Task Definition in the Task menu.
Task execution status can be viewed under Task Instance.
Scheduled Tasks
Edit the workflow definition and add scheduling parameters, e.g. the cron expression sketched below.
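DolphinScheduler schedules are written as Quartz-style cron expressions with a leading seconds field; a sketch:
# second minute hour day-of-month month day-of-week year
# run every day at 02:00
0 0 2 * * ? *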
After the schedule is added, go to the schedule management page and bring it online.
Check the execution history of the scheduled runs.
Parameters
Local Parameters
Parameters configured on the task definition page are scoped to that task by default; if parameter passing is configured, they can also be applied to downstream tasks. A usage sketch follows.
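For example, a Shell task with a local parameter named dt (an illustrative name, direction IN) can reference it directly in its script:
# assumes a local parameter dt=2024-01-01 (IN) defined on this task
echo "business date is ${dt}"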
Global Parameters
Global parameters apply to every task node in the workflow and are configured on the workflow definition page.
They can be set in two places: when saving the workflow definition, or when starting it.
Parameter Passing
DolphinScheduler allows parameters to be passed between tasks; currently they can only flow one way, from an upstream task to its downstream tasks.
The node01 task defines an output parameter.
The node02 task receives the parameter, as sketched below.
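In a Shell task, an output parameter is typically produced with the setValue syntax and must also be declared as an OUT parameter on the upstream task (the name transfer_key below is illustrative):
# upstream task (node01): declare transfer_key as OUT, then emit it
echo '${setValue(transfer_key=hello)}'
# downstream task (node02): reference it like any other parameter
echo "received: ${transfer_key}"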
Built-in Parameters
1. Basic built-in parameters
Variable | Declaration | Meaning
---|---|---
system.biz.date | ${system.biz.date} | The day before the scheduled time of the workflow instance, formatted as yyyyMMdd
system.biz.curdate | ${system.biz.curdate} | The scheduled time of the workflow instance, formatted as yyyyMMdd
system.datetime | ${system.datetime} | The scheduled time of the workflow instance, formatted as yyyyMMddHHmmss
2. Derived built-in parameters
Custom variable names are supported in code, declared as ${variableName}.
Usage: give the parameter a name, choose the IN/OUT direction, and set its value to a time expression such as $[yyyy-MM-dd]; the date pattern inside $[] can be decomposed and recombined freely. A sketch follows.
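For instance, a local parameter can be declared with a derived expression and used in a Shell task (the name dt is illustrative; an offset such as -1 means one day earlier):
# local parameter: dt = $[yyyy-MM-dd-1]   (yesterday's date)
echo "processing partition ${dt}"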
Resource Center
The Resource Center is typically used to upload files and UDF functions and to manage task groups.
It can work with the local file system, a distributed file storage system, or a MinIO cluster, and can also use remote object storage such as Alibaba Cloud OSS.
Integrating with HDFS Storage
When the Resource Center is used to create or upload files, all files and resources are stored on the distributed file system HDFS.
Configuring common.properties
Both api-server/conf/common.properties and worker-server/conf/common.properties need to be configured.
Edit the file: vim api-server/conf/common.properties
# resource storage type: HDFS, S3, OSS, NONE
#resource.storage.type=NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
#resource.storage.upload.base.path=/dolphinscheduler
resource.storage.upload.base.path=hdfs://node01:9000/dolphinscheduler
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
#resource.hdfs.root.user=hdfs
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
#resource.hdfs.fs.defaultFS=hdfs://mycluster:8020
resource.hdfs.fs.defaultFS=hdfs://node01:9000
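The comments above require the base path to exist on HDFS and be writable by the configured user; a sketch of preparing it, using this document's paths and root user:
hdfs dfs -mkdir -p /dolphinscheduler
hdfs dfs -chown -R root:root /dolphinscheduler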
After editing, copy the file to overwrite the worker-server copy:
cp api-server/conf/common.properties worker-server/conf/
In a cluster environment, the two files also need to be distributed to every node:
sync.sh api-server/conf/common.properties
sync.sh worker-server/conf/common.properties
Restarting DolphinScheduler
# Stop all cluster services with one command
./bin/stop-all.sh
# Start all cluster services with one command
./bin/start-all.sh
Creating Resources
In the Resource Center, create a directory named sh.
Enter the sh folder and create a test.sh script.
Using Resources
When defining a workflow, reference the resource; a sketch follows.
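In a Shell task, after attaching test.sh under the task's Resources, the script can be invoked by its resource path (a sketch; resources are materialized in the task's working directory, and the path mirrors the sh/test.sh created above):
sh sh/test.sh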
Alerts
DingTalk Alerts
Create an alert instance in the Security Center.
The DingTalk robot is configured as follows:
Create a DingTalk alert instance.
Parameter settings:
- Webhook: https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
- Keyword: the custom keyword set in the robot's security settings
- Secret: the signing secret set in the robot's security settings
- Message type: both text and markdown are supported
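The robot can be verified outside DolphinScheduler with a plain webhook call (a sketch; the token is a placeholder and the message content must contain the configured keyword):
curl -s -H 'Content-Type: application/json' \
  -d '{"msgtype":"text","text":{"content":"keyword: alert test"}}' \
  'https://oapi.dingtalk.com/robot/send?access_token=XXXXXX'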
Create an alert group and add the DingTalk alert instance to it.
When starting a task, choose the notification strategy and the alert group.
Check DingTalk:
Email Alerts
First, enable the mailbox's POP3/SMTP/IMAP service.
Generate an authorization code.
Use the mail server address provided by your mail provider.
The email alert is configured as follows:
Note: under the authentication section, the user is the sender's email address and the password is the authorization code.
Start a task to run a test; the mailbox receives the following: