Nomad is a tool for managing a cluster of machines and running applications on it.
Following my earlier post "Building a Consul Cluster" (《Consul 搭建集群》), prepare three VMs:

Node | IP |
---|---|
n1 | 172.20.20.10 |
n2 | 172.20.20.11 |
n3 | 172.20.20.12 |
Log in to VM n1 and switch to the root user:

» vagrant ssh n1
[vagrant@n1 ~]$ su
Password:
[root@n1 vagrant]#
Install a few required tools:

[root@n1 vagrant]# yum install -y epel-release
[root@n1 vagrant]# yum install -y jq
[root@n1 vagrant]# yum install -y unzip
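Download version 0.8.1 into /tmp.
The latest release at the time of writing, 0.8.3, has a bug with Consul that makes it register services repeatedly, so 0.8.1 is used here.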
[root@n1 vagrant]# cd /tmp/
[root@n1 vagrant]# curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip
Unzip the archive, make nomad executable, and move it to /usr/bin/:
[root@n1 vagrant]# unzip nomad.zip
[root@n1 vagrant]# chmod +x nomad
[root@n1 vagrant]# mv nomad /usr/bin/nomad
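The download and install steps above can be wrapped in a small function, so that every VM installs the same version in the same way. This is a minimal bash sketch. It assumes root privileges and the release URL layout used above, and it only handles linux_amd64. The function names `nomad_zip_url` and `install_nomad` are hypothetical.

```shell
# Build the release URL for a given Nomad version (linux_amd64 only),
# following the URL pattern used in the download step above.
nomad_zip_url() {
    local version="$1"
    echo "https://releases.hashicorp.com/nomad/${version}/nomad_${version}_linux_amd64.zip"
}

# Download, unzip and install Nomad into /usr/bin (must run as root).
# Stops and returns non-zero as soon as a step fails.
install_nomad() {
    local version="${1:-0.8.1}"
    cd /tmp || return 1
    curl -fsS "$(nomad_zip_url "$version")" -o nomad.zip || return 1
    unzip -o nomad.zip || return 1          # -o: overwrite without prompting
    chmod +x nomad && mv nomad /usr/bin/nomad
}
```

Usage on a VM, as root: `install_nomad 0.8.1`.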
Check that Nomad was installed successfully:
[root@n1 vagrant]# nomad
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]

Common commands:
    run            Run a new job or update an existing job
    stop           Stop a running job
    status         Display the status output for a resource
    alloc          Interact with allocations
    job            Interact with jobs
    node           Interact with nodes
    agent          Runs a Nomad agent

Other commands:
    acl            Interact with ACL policies and tokens
    agent-info     Display status information about the local agent
    deployment     Interact with deployments
    eval           Interact with evaluations
    namespace      Interact with namespaces
    operator       Provides cluster-level tools for Nomad operators
    quota          Interact with quotas
    sentinel       Interact with Sentinel policies
    server         Interact with servers
    ui             Open the Nomad Web UI
    version        Prints the Nomad version
If you see output like the above, the installation succeeded.
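Running `nomad` with no arguments only proves that the binary is on the PATH. To confirm that the intended version was installed, you can check the first line of `nomad version`, which starts with something like `Nomad v0.8.1`. The following is a minimal bash sketch that assumes that output format. The name `nomad_version_ok` is hypothetical.

```shell
# Succeed only if the installed nomad binary reports exactly the expected version.
# Assumes `nomad version` prints a first line like "Nomad v0.8.1 (...)".
nomad_version_ok() {
    local expected="$1"
    command -v nomad >/dev/null 2>&1 || return 1          # not on PATH
    [ "$(nomad version | awk 'NR == 1 { print $2 }')" = "v${expected}" ]
}
```

For example, `nomad_version_ok 0.8.1 && echo ok`.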
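To install in bulk instead, see the batch-installation section of "Building a Consul Cluster".
The following script (used as the Vagrantfile provisioning script) installs Nomad 0.8.1 on every VM, along with the latest Consul and some utilities, and installs Docker at the same time: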
$script = <<SCRIPT

echo "Installing dependencies ..."
yum install -y epel-release
yum install -y net-tools
yum install -y wget
yum install -y jq
yum install -y unzip
yum install -y bind-utils

echo "Determining Consul version to install ..."
CHECKPOINT_URL="https://checkpoint-api.hashicorp.com/v1/check"
if [ -z "$CONSUL_DEMO_VERSION" ]; then
    CONSUL_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/consul | jq .current_version | tr -d '"')
fi

echo "Fetching Consul version ${CONSUL_DEMO_VERSION} ..."
cd /tmp/
curl -s https://releases.hashicorp.com/consul/${CONSUL_DEMO_VERSION}/consul_${CONSUL_DEMO_VERSION}_linux_amd64.zip -o consul.zip

echo "Installing Consul version ${CONSUL_DEMO_VERSION} ..."
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir /etc/consul.d
sudo chmod a+w /etc/consul.d

# Pin Nomad to 0.8.1 instead of asking the checkpoint API for the latest version
NOMAD_DEMO_VERSION=0.8.1
#if [ -z "$NOMAD_DEMO_VERSION" ]; then
#    NOMAD_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/nomad | jq .current_version | tr -d '"')
#fi

echo "Fetching Nomad version ${NOMAD_DEMO_VERSION} ..."
cd /tmp/
curl -s https://releases.hashicorp.com/nomad/${NOMAD_DEMO_VERSION}/nomad_${NOMAD_DEMO_VERSION}_linux_amd64.zip -o nomad.zip

echo "Installing Nomad version ${NOMAD_DEMO_VERSION} ..."
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad

echo "Installing nginx ..."
#yum install -y nginx

echo "Installing docker ..."
yum install -y docker

SCRIPT
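Once `vagrant up` has run this script on each machine, a quick loop on the host can confirm that every VM has both Nomad and Docker. This is a minimal bash sketch. It assumes the Vagrant machine names n1 to n3 and that you run it in the directory containing the Vagrantfile. `check_vms` is a hypothetical helper built on `vagrant ssh <machine> -c <command>`.

```shell
# Run `nomad version` and `docker --version` on each VM and report the result.
# Returns non-zero if any VM fails the check.
check_vms() {
    local failed=0 vm
    for vm in n1 n2 n3; do
        if vagrant ssh "$vm" -c 'nomad version && docker --version' >/dev/null 2>&1; then
            echo "$vm: ok"
        else
            echo "$vm: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```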
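First, start Consul and form a cluster (see "Building a Consul Cluster" for details). With the default configuration, Nomad detects the local Consul agent on startup and automatically registers the Nomad services with it.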
n1
[root@n1 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node1 -bind=172.20.20.10 -ui -client 0.0.0.0
n2
[root@n2 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node2 -bind=172.20.20.11 -ui -client 0.0.0.0 -join 172.20.20.10
n3
[root@n3 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node3 -bind=172.20.20.12 -ui -client 0.0.0.0 -join 172.20.20.10
[root@n1 vagrant]# consul members
Node   Address            Status  Type    Build  Protocol  DC   Segment
node1  172.20.20.10:8301  alive   server  1.1.0  2         dc1  <all>
node2  172.20.20.11:8301  alive   server  1.1.0  2         dc1  <all>
node3  172.20.20.12:8301  alive   server  1.1.0  2         dc1  <all>
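The three Consul commands differ only in the node name, the bind address, and whether the node joins n1. A hypothetical bash helper can print the command for node number 1, 2 or 3. It assumes the 172.20.20.1x addressing from the table at the top.

```shell
# Print the Consul server command for node number $1 (1, 2 or 3).
# Node 1 only bootstraps; nodes 2 and 3 also join 172.20.20.10, as above.
consul_server_cmd() {
    local i="$1"
    local ip="172.20.20.1$((i - 1))"
    local cmd="consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d"
    cmd="$cmd -node=node$i -bind=$ip -ui -client 0.0.0.0"
    if [ "$i" -ne 1 ]; then
        cmd="$cmd -join 172.20.20.10"
    fi
    echo "$cmd"
}
```

For example, `consul_server_cmd 2` prints the same command that was run on n2 above.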
Define the server configuration file, server.hcl:
log_level = "DEBUG"
bind_addr = "0.0.0.0"
data_dir = "/home/vagrant/data_server"
name = "server1"

advertise {
  http = "172.20.20.10:4646"
  rpc  = "172.20.20.10:4647"
  serf = "172.20.20.10:4648"
}

server {
  enabled = true
  # Wait for 3 servers before electing a leader; use 3 or 5 in production
  bootstrap_expect = 3
}
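Across the three servers, only `name` and the `advertise` addresses change. The cluster output later in this post lists server1 to server3 at 172.20.20.10 to 172.20.20.12. Below is a hypothetical bash helper that writes the file for one node and keeps every other setting the same as above.

```shell
# Write a server.hcl for one node; only the name and advertised IP vary.
# Usage: write_server_hcl <name> <ip> [file]
write_server_hcl() {
    local name="$1" ip="$2" file="${3:-server.hcl}"
    cat > "$file" <<EOF
log_level = "DEBUG"
bind_addr = "0.0.0.0"
data_dir  = "/home/vagrant/data_server"
name      = "${name}"

advertise {
  http = "${ip}:4646"
  rpc  = "${ip}:4647"
  serf = "${ip}:4648"
}

server {
  enabled          = true
  bootstrap_expect = 3
}
EOF
}
```

For example, on n2 run `write_server_hcl server2 172.20.20.11` and then `nomad agent -config=server.hcl`.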
Run it from the command line:
[root@n1 vagrant]# nomad agent -config=server.hcl
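On n2 and n3, first edit server.hcl so that name and the advertise addresses match the node (server2 with 172.20.20.11, server3 with 172.20.20.12), then run: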
nomad agent -config=server.hcl
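Open http://172.20.20.10:8500/ui/#/dc1/services in a browser.
The Consul UI shows that the Nomad services have all started.
Then open Nomad's built-in UI at http://172.20.20.10:4646/ui/servers
It shows that all the servers are running.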
Docker must be started before the clients, because the clients use Docker to run jobs.
[root@n1 vagrant]# systemctl start docker
Docker must also be started on n2 and n3.
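Rather than logging in to each VM, you can start Docker on all three from the host. This is a minimal bash sketch. It assumes the Vagrant machine names n1 to n3 and that the vagrant user has passwordless sudo, which is usual for Vagrant boxes but not guaranteed. It also runs `systemctl enable` so that Docker comes back after a reboot. `start_docker_all` is a hypothetical helper.

```shell
# Enable and start Docker on every VM from the host.
# Stops at the first VM where the command fails.
start_docker_all() {
    local vm
    for vm in n1 n2 n3; do
        vagrant ssh "$vm" -c 'sudo systemctl enable docker && sudo systemctl start docker' || return 1
    done
}
```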
Define the client configuration file, client.hcl:
log_level = "DEBUG"
data_dir = "/home/vagrant/data_client"
name = "client1"

advertise {
  http = "172.20.20.10:4646"
  rpc  = "172.20.20.10:4647"
  serf = "172.20.20.10:4648"
}

client {
  enabled = true
  servers = ["172.20.20.10:4647"]
}

# The Nomad server on this VM already listens on 4646, so the client's HTTP API uses 5656
ports {
  http = 5656
}
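The client file varies per node in the same way as the server file. Below is a hypothetical bash helper that mirrors the client.hcl above and takes the node name and IP as parameters. The client names you pass in (for example client2) are your own choice.

```shell
# Write a client.hcl for one node, mirroring the client.hcl above;
# only the name and advertised IP change per node.
# Usage: write_client_hcl <name> <ip> [file]
write_client_hcl() {
    local name="$1" ip="$2" file="${3:-client.hcl}"
    cat > "$file" <<EOF
log_level = "DEBUG"
data_dir  = "/home/vagrant/data_client"
name      = "${name}"

advertise {
  http = "${ip}:4646"
  rpc  = "${ip}:4647"
  serf = "${ip}:4648"
}

client {
  enabled = true
  servers = ["172.20.20.10:4647"]
}

ports {
  http = 5656
}
EOF
}
```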
On n1, run:
[root@n1 vagrant]# nomad agent -config=client.hcl
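Open http://172.20.20.10:8500/ui/#/dc1/services/nomad-client in a browser.
You can see that nomad-client has started successfully. Start a client on n2 and n3 in the same way, after changing name and the advertise addresses in client.hcl to match each node.
The final result looks like this: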
Log in to n2, create a directory named job, and run nomad init:
[root@n2 vagrant]# mkdir job
[root@n2 vagrant]# cd job/
[root@n2 job]# nomad init
Example job file written to example.nomad
The command above generates an example job file, example.nomad.
Run it from the command line:
[root@n2 job]# nomad run example.nomad
==> Monitoring evaluation "97f8a1fe"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3c89e74a"
    Allocation "47bf1f20" created: node "9df69026", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "97f8a1fe" finished with status "complete"
You can see that the client on node 9df69026 ran the job.
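When scripting, you can extract the node that received each allocation from the `nomad run` output. This bash sketch depends on the human-readable output format shown above, which is not a stable interface. `alloc_nodes` is a hypothetical helper.

```shell
# Read `nomad run` / `nomad job run` output on stdin and print the node ID from
# every line like: Allocation "47bf1f20" created: node "9df69026", group "cache"
alloc_nodes() {
    sed -n 's/.*Allocation "[^"]*" [a-z]*: node "\([^"]*\)".*/\1/p'
}
```

For example, `nomad run example.nomad | alloc_nodes` would print 9df69026 for the run shown above.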
[root@n1 vagrant]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  alive   false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   false   2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   true    2         0.8.1  dc1         global
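The members table can also be checked from a script, for example to wait until all three servers are alive and to see which server is the leader. This bash sketch assumes the column order shown above (Name, Address, Port, Status, Leader, ...). Both function names are hypothetical.

```shell
# Both read `nomad server members` output on stdin and skip the header line.

# Print how many servers have Status "alive".
count_alive_servers() {
    awk 'NR > 1 && $4 == "alive" { n++ } END { print n + 0 }'
}

# Print the name of the server whose Leader column is "true".
leader_name() {
    awk 'NR > 1 && $5 == "true" { print $1 }'
}
```

With the output above, `nomad server members | leader_name` would print server3.global.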
[root@n1 vagrant]# nomad status example
ID            = example
Name          = example
Submit Date   = 2018-06-13T08:42:57Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 3c89e74a
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       1        1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
47bf1f20  9df69026  cache       0        run      running  8m44s ago  8m26s ago
Edit example.nomad and find count = 1,
then change it to count = 3.
View the job's change plan from the command line:
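The same edit can be made without opening an editor. This is a minimal bash sketch. It assumes the job file has a single uncommented `count = N` line, as the file generated by `nomad init` does. If a file has several task groups, every group's count would be changed. `set_count` is a hypothetical helper.

```shell
# Set the task group count in a job file, e.g. `set_count example.nomad 3`.
# Only uncommented "count = N" lines match; comment lines start with '#'.
# Writes through a temp file so it behaves the same with GNU and BSD sed.
set_count() {
    local file="$1" count="$2"
    sed -E "s/^([[:space:]]*count[[:space:]]*=[[:space:]]*)[0-9]+/\1${count}/" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}
```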
[root@n2 job]# nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 70
To submit the job with version verification run:

nomad job run -check-index 70 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Apply the job change:
[root@n2 job]# nomad job run -check-index 70 example.nomad
==> Monitoring evaluation "3a0ff5e0"
    Evaluation triggered by job "example"
    Evaluation within deployment: "2b5b803f"
    Allocation "34086acb" created: node "6166e031", group "cache"
    Allocation "4d01cd92" created: node "f97b5095", group "cache"
    Allocation "47bf1f20" modified: node "9df69026", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "3a0ff5e0" finished with status "complete"
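The plan and run steps can be chained so that the check index is always the one the plan just reported. This is a minimal bash sketch under one assumption: `nomad plan` exits with 0 when no allocations change, 1 when allocations are created or destroyed, and 255 on error. For that reason only 255 is treated as a failure here. `plan_index` and `plan_and_run` are hypothetical helpers.

```shell
# Print the "Job Modify Index" value from `nomad plan` output on stdin.
plan_index() {
    awk -F': *' '/^Job Modify Index:/ { print $2 }'
}

# Plan a job file, show the plan on stderr, then submit it with -check-index
# so the run is rejected if someone else changed the job in the meantime.
plan_and_run() {
    local file="$1" out status idx
    out="$(nomad plan "$file")"
    status=$?
    printf '%s\n' "$out" >&2
    [ "$status" -eq 255 ] && return 1      # 255 means the plan itself failed
    idx="$(printf '%s\n' "$out" | plan_index)"
    [ -n "$idx" ] || return 1
    nomad job run -check-index "$idx" "$file"
}
```

Usage: `plan_and_run example.nomad`.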
Two more client nodes have now been assigned to run the job.
The browser shows three instances in total,
as well as the job's version history.
[root@n2 job]# nomad status example
ID            = example
Name          = example
Submit Date   = 2018-06-13T08:56:03Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = 2b5b803f
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       3        3       3        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
34086acb  6166e031  cache       1        run      running  3m38s ago   3m25s ago
4d01cd92  f97b5095  cache       1        run      running  3m38s ago   3m26s ago
47bf1f20  9df69026  cache       1        run      running  16m43s ago  3m27s ago
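To check from a script that the scale-up took effect, you can read the Running column for the task group from the Summary table. This bash sketch relies on the layout shown above, where Summary is the first table whose header starts with "Task Group". `running_count` is a hypothetical helper.

```shell
# Print the Running count of task group $1 from `nomad status <job>` on stdin.
# Only looks inside the first "Task Group" table (Summary) and stops at the
# blank line that ends it, so the Deployed table is never read.
running_count() {
    awk -v g="$1" '
        /^Task Group/ { if (!seen) { seen = 1; next } }
        seen && NF == 0 { exit }
        seen && $1 == g { print $4; exit }
    '
}
```

With the output above, `nomad status example | running_count cache` would print 3.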
First, stop the Nomad server on n1 by pressing Ctrl-C in its terminal.
Then query the members on n2:
[root@n2 job]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  failed  false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global
server1's status is failed, so remove server1 from the cluster:
[root@n2 job]# nomad server force-leave server1.global
[root@n2 job]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  left    false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global
server1's status is now left, which means it has been removed from the cluster.
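The check-then-remove step above can be scripted so that `force-leave` only runs against a server the cluster already reports as failed. This is a minimal bash sketch that assumes the `nomad server members` column layout shown above. `server_status` and `leave_if_failed` are hypothetical helpers.

```shell
# Print the Status column for server $1 from `nomad server members` on stdin.
server_status() {
    awk -v name="$1" 'NR > 1 && $1 == name { print $4 }'
}

# Force a server out of the cluster only if it is currently reported as failed.
leave_if_failed() {
    local name="$1"
    if [ "$(nomad server members | server_status "$name")" = "failed" ]; then
        nomad server force-leave "$name"
    fi
}
```

Usage, on any surviving server: `leave_if_failed server1.global`.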