| \ | master1 | master2 | slave1 | slave2 | slave3 |
|---|---|---|---|---|---|
| Components | scheduler, webserver, flower, airflow-scheduler-failover-controller | webserver, airflow-scheduler-failover-controller | worker | worker | worker |
Run on every machine:

```
pip install apache-airflow
pip install apache-airflow[mysql]
pip install celery
pip install redis
```
Since Airflow is written in Python, please install a Python environment and pip beforehand. This document uses Python 2.7, pip 19.2.1, and Airflow 1.10.1.
Known issue: if `pip install apache-airflow[mysql]` fails with an error saying mysql_config does not exist, run `yum install python-devel mysql-devel`.
Run on every machine:

```
export AIRFLOW_HOME=~/airflow
```
Then run the `airflow` command on every machine so that it generates the configuration file `airflow.cfg` in the home directory ($AIRFLOW_HOME).
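As a quick sanity check after that first run (file list from a 1.10.x install; minor differences across versions are possible):

```
$ ls ~/airflow
airflow.cfg  unittests.cfg
```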
Edit `airflow.cfg`:

```
## Time zone
default_timezone = Asia/Shanghai
## Do not load the example DAGs
load_examples = False
## Default webserver port
web_server_port = 9999
## Metadata database connection
sql_alchemy_conn = mysql://airflow:123456@172.19.131.108/airflow
## Executor to use
executor = CeleryExecutor
## Celery message broker
broker_url = redis://redis:Kingsoftcom_123@172.19.131.108:6379/1
## Result backend; Redis would also work:
## result_backend = redis://redis:Kingsoftcom_123@172.19.131.108:6379/1
result_backend = db+mysql://airflow:123456@172.19.131.108/airflow
```
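For the `sql_alchemy_conn` above to work, MySQL needs a matching database and user. A minimal sketch (MySQL 5.x syntax; database name, user, and password taken from the connection string above, run on the MySQL host, which is assumed to be already installed):

```
mysql -uroot -p -e "CREATE DATABASE airflow DEFAULT CHARACTER SET utf8;"
mysql -uroot -p -e "GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%' IDENTIFIED BY '123456';"
mysql -uroot -p -e "FLUSH PRIVILEGES;"
```

Afterwards, initialize the metadata tables once with `airflow initdb`.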
PS: for Redis installation, see https://blog.csdn.net/crazy__hope/article/details/83688986
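Once Redis is up, a quick connectivity check from each node, with the host and password taken from the `broker_url` above:

```
redis-cli -h 172.19.131.108 -a Kingsoftcom_123 ping
# expected reply: PONG
```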
Run on master1 and master2:

```
pip install git+git://github.com/teamclairvoyant/airflow-scheduler-failover-controller.git@v1.0.2
scheduler_failover_controller init
```
The init step appends content to airflow.cfg, so Airflow must already be installed and initialized before running it.
Then set in `airflow.cfg`:

```
scheduler_nodes_in_cluster = master1,master2
```
The host names can be obtained with the `scheduler_failover_controller get_current_host` command.
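For orientation, `init` keeps its settings in a `[scheduler_failover]` section of airflow.cfg; a sketch of the relevant part (other keys and their defaults depend on the controller version):

```
[scheduler_failover]
## Hosts allowed to run the scheduler; each value must match what
## `scheduler_failover_controller get_current_host` prints on that node
scheduler_nodes_in_cluster = master1,master2
```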
Verify that the failover controller can reach every node:

```
scheduler_failover_controller test_connection
```
PS: passwordless SSH login between master1 and master2 must be configured first.
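A minimal sketch of that SSH setup (run as the user that starts Airflow; repeat on master2 with the host swapped):

```
ssh-keygen -t rsa        # accept the defaults, empty passphrase
ssh-copy-id master2
ssh master2 hostname     # should return without a password prompt
```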
```
nohup scheduler_failover_controller start > /dev/null &
```
Note: do not run this command yet; run it only after all the Airflow components have started.
Use a sync script, and run it every time the dags directory is updated. Reference script: https://www.jianshu.com/p/e74fbb091144 (a minimal sketch follows below).
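As an illustration only (hostnames and the $AIRFLOW_HOME path are taken from this document; the linked article's script may differ), an rsync-based version that relies on the passwordless SSH configured earlier and assumes the same user and home directory on every node:

```
#!/bin/bash
# sync_dags.sh -- push the local dags directory to the other nodes
DAGS_DIR=~/airflow/dags
for host in master2 slave1 slave2 slave3; do
    rsync -av --delete "$DAGS_DIR/" "$host:$DAGS_DIR/"
done
```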
Per the role table at the top, start on master1:

```
airflow scheduler -D
airflow webserver -D
```

On master2:

```
airflow webserver -D
```

On slave1, slave2, and slave3:

```
airflow worker -D
```
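A quick way to check that everything came up, with the port taken from `web_server_port` above:

```
curl -I http://master1:9999          # the webserver should answer
ps -ef | grep airflow | grep -v grep
```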
If startup fails with a werkzeug-related error, the fix is: `pip install -U werkzeug`