For the commands used to run workflows, see the blog post https://www.jianshu.com/p/6cb3a4b78556; you can also type oozie help to view the built-in help.
The job.properties file stores parameters that workflow.xml may use.
job.properties
# Do not use special characters in variable names; otherwise Spark may fail to resolve them
# oozie.wf.application.path must be on HDFS, because the whole cluster needs to access it
nameNode=hdfs://txz-data0:9820
resourceManager=txz-data0:8032
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/share/lib/spark2/jars/,${nameNode}/share/lib/spark2/python/lib/,${nameNode}/share/lib/spark2/hive-site.xml
oozie.wf.application.path=${nameNode}/workflow/data-factory/download_report_voice_and_upload/Workflow
oozie.action.sharelib.for.spark=spark2
archive=${nameNode}/envs/py3.tar.gz#py
# If dryrun is true, the workflow is only tested and no corresponding job is recorded
dryrun=false
sparkMaster=yarn-cluster
sparkMode=cluster
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
pysparkPath=py/py3/bin/python3
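To see how the ${...} references above fit together (for example how sparkScript is built from scriptRoot and sparkScriptBasename), here is a minimal Python sketch that parses simple key=value lines and expands plain ${name} references. It is only an illustration under the assumption of simple one-line properties; Oozie performs its own substitution when it reads the file, and this is not its implementation. The function names are hypothetical.

```python
import re

# Plain ${name} references; names limited to letters, digits, '_', '.', '-'.
# A reference whose name contains other special characters is left unexpanded.
VAR = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments.

    Simplified on purpose: no line continuations, escapes or ':' separators.
    A '#' inside a value (e.g. the '#py' alias on archive) is kept.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve(props: dict[str, str], max_passes: int = 10) -> dict[str, str]:
    """Expand ${name} references until nothing changes (or max_passes is hit)."""
    resolved = dict(props)
    for _ in range(max_passes):
        changed = False
        for key, value in list(resolved.items()):
            # Unknown names are kept verbatim as ${name}.
            new = VAR.sub(lambda m: resolved.get(m.group(1), m.group(0)), value)
            if new != value:
                resolved[key] = new
                changed = True
        if not changed:
            break
    return resolved


sample = """
# excerpt from job.properties
nameNode=hdfs://txz-data0:9820
archive=${nameNode}/envs/py3.tar.gz#py
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
"""

props = resolve(parse_properties(sample))
# /workflow/data-factory/download_report_voice_and_upload/Python/download_parquet_from_data0_upload_online.py
print(props["sparkScript"])
# hdfs://txz-data0:9820/envs/py3.tar.gz#py
print(props["archive"])
```

Repeated passes are used so that chained references (a value that points at another ${...} value) also end up fully expanded.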
workflow.xml
<!-- Oozie workflow definition; the variables used here come from job.properties by default -->
<workflow-app xmlns='uri:oozie:workflow:1.0' name='download_parquet_from_data0_upload_online'>
    <global>
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
    </global>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:1.0">
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>report_voice_download_pyspark</name>
            <jar>${sparkScriptBasename}</jar>
            <spark-opts>
                --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${pysparkPath}
            </spark-opts>
            <file>${sparkScript}#${sparkScriptBasename}</file>
            <archive>${archive}</archive>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>
            Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
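Because the variables in workflow.xml are expected to come from job.properties, a small pre-submit check can catch a missing key or a transition that points at a node that does not exist. Below is a minimal Python sketch of such a check using only the standard library (re and xml.etree.ElementTree); undefined_vars and dangling_transitions are hypothetical helper names, not an Oozie tool.

```python
import re
import xml.etree.ElementTree as ET

# Plain ${name} references only. EL calls such as
# ${wf:errorMessage(wf:lastErrorNode())} have ':' right after the first
# identifier, so this pattern deliberately skips them.
VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def undefined_vars(workflow_xml: str, props: dict[str, str]) -> set[str]:
    """Variables referenced in workflow.xml that props does not define."""
    return set(VAR.findall(workflow_xml)) - set(props)


def dangling_transitions(workflow_xml: str) -> list[str]:
    """'to' targets (start/ok/error) that match no node's name attribute."""
    root = ET.fromstring(workflow_xml.strip())
    names = {el.get("name") for el in root.iter() if el.get("name")}
    return [el.get("to") for el in root.iter()
            if el.get("to") and el.get("to") not in names]


# Condensed copy of the workflow above.
WORKFLOW = """
<workflow-app xmlns='uri:oozie:workflow:1.0' name='download_parquet_from_data0_upload_online'>
  <global>
    <resource-manager>${resourceManager}</resource-manager>
    <name-node>${nameNode}</name-node>
  </global>
  <start to='spark-node' />
  <action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:1.0">
      <master>${sparkMaster}</master>
      <mode>${sparkMode}</mode>
      <name>report_voice_download_pyspark</name>
      <jar>${sparkScriptBasename}</jar>
      <spark-opts>--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${pysparkPath}</spark-opts>
      <file>${sparkScript}#${sparkScriptBasename}</file>
      <archive>${archive}</archive>
    </spark>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name='end' />
</workflow-app>
"""

# Only the keys matter for this check, so the values are left empty.
props = {k: "" for k in ["resourceManager", "nameNode", "sparkMaster", "sparkMode",
                         "sparkScriptBasename", "pysparkPath", "sparkScript", "archive"]}
print(undefined_vars(WORKFLOW, props))   # set()
print(dangling_transitions(WORKFLOW))    # []
```

In practice props would be filled from job.properties; here only the key names are needed.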
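For testing, put these two files on the local disk, for example in the folder /home/workflow/. Only job.properties is read from the local path passed to -config; workflow.xml must also be uploaded to the HDFS directory given by oozie.wf.application.path, because Oozie loads the workflow definition from there.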
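Since the comment in job.properties notes that oozie.wf.application.path must live on HDFS, workflow.xml (and the PySpark script referenced by sparkScript) has to be copied there before submitting. The sketch below only builds hdfs dfs command lines with Python's standard library. It assumes, hypothetically, that the script is kept next to workflow.xml in the local folder and that a scheme-less path such as scriptRoot resolves against the cluster's HDFS; nothing is executed unless run_all is called.

```python
import posixpath
import subprocess

# Values copied from job.properties, with ${nameNode} expanded by hand.
APP_PATH = "hdfs://txz-data0:9820/workflow/data-factory/download_report_voice_and_upload/Workflow"
SCRIPT_ROOT = "/workflow/data-factory/download_report_voice_and_upload/Python"
SCRIPT_NAME = "download_parquet_from_data0_upload_online.py"


def upload_commands(local_dir: str) -> list[list[str]]:
    """Build hdfs dfs commands that copy workflow.xml and the script to HDFS.

    Hypothetical layout: the PySpark script sits in local_dir next to workflow.xml.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", APP_PATH],
        ["hdfs", "dfs", "-put", "-f",
         posixpath.join(local_dir, "workflow.xml"), APP_PATH + "/workflow.xml"],
        ["hdfs", "dfs", "-mkdir", "-p", SCRIPT_ROOT],
        ["hdfs", "dfs", "-put", "-f",
         posixpath.join(local_dir, SCRIPT_NAME), posixpath.join(SCRIPT_ROOT, SCRIPT_NAME)],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_all(...) on a host with the hdfs client.
    for cmd in upload_commands("/home/workflow/"):
        print(" ".join(cmd))
```

The -f flag on -put overwrites an existing copy, which is convenient when iterating on workflow.xml.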
Then run the workflow with:
oozie job -oozie http://txz-data0:11000/oozie -config /home/workflow/job.properties -run
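If submission needs to be scripted, the same command can be assembled and run with Python's subprocess module. A minimal sketch follows; oozie_job_cmd and submit_workflow are hypothetical helper names. Besides -run, the oozie CLI also accepts -dryrun, which checks the job without actually executing it.

```python
import shlex
import subprocess

OOZIE_URL = "http://txz-data0:11000/oozie"
CONFIG = "/home/workflow/job.properties"


def oozie_job_cmd(oozie_url: str, config_path: str, dry_run: bool = False) -> list[str]:
    """Argument list for `oozie job ... -run` (or -dryrun to only check the job)."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config_path,
            "-dryrun" if dry_run else "-run"]


def submit_workflow(oozie_url: str, config_path: str, dry_run: bool = False) -> str:
    """Run the oozie CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(oozie_job_cmd(oozie_url, config_path, dry_run),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only print the command; call submit_workflow(...) on a host with the oozie CLI.
    # oozie job -oozie http://txz-data0:11000/oozie -config /home/workflow/job.properties -run
    print(shlex.join(oozie_job_cmd(OOZIE_URL, CONFIG)))
```

Passing an argument list rather than a shell string avoids quoting problems with paths; setting the OOZIE_URL environment variable also lets the CLI omit -oozie.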
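A workflow configured by hand like this is not visible in Hue, so from here on I configure workflows in Hue and then configure a Schedule for them. For the detailed configuration, see the blog post https://blog.csdn.net/qq_22918243/article/details/89204111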