1. Log in to the server directly: ssh 2014210***@thumedia.org -p 6349
Create streaming.py with touch streaming.py, then edit it as follows:
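```python
#!/usr/bin/python
import logging
import sys
import time

# Configure logging once, outside the loop: adding a FileHandler on every
# pass would write each message to output.log multiple times.
logger = logging.getLogger()
handler = logging.FileHandler("output.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

pg2count = {}  # page name -> cumulative view count

while True:
    times = ''  # timestamp of the last line read in this pass
    # Re-read the whole log on every pass. If /tmp/hw3.log is appended to
    # rather than rewritten between passes, earlier lines are counted again.
    with open('/tmp/hw3.log', 'r') as fp:
        for line in fp:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or truncated lines
            times, page, count = fields[0], fields[1], fields[2]
            # Only count well-formed records: numeric count, "Page-" prefix.
            if count.isdigit() and page.startswith('Page-'):
                pg2count[page] = pg2count.get(page, 0) + int(count)

    # Rank pages by cumulative count; slicing avoids an IndexError when
    # fewer than 10 pages have been seen so far.
    top10 = sorted(pg2count.items(), key=lambda kv: kv[1], reverse=True)[:10]

    header = 'the page rank at current time %s is:' % times
    print(header)
    logger.info(header)
    for page, total in top10:
        row = '%s\t%d' % (page, total)
        print(row)
        logger.info(row)
    # Under nohup, stdout goes to a file and is block-buffered; flush so
    # nohup.out is updated every cycle.
    sys.stdout.flush()

    time.sleep(60)
```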
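To check the counting and ranking logic without the server log or the 60-second loop, the core of the script can be pulled out into a pure function. This is a sketch, not the author's code: top_pages is a hypothetical name, and it swaps the full sort for collections.Counter.most_common, which returns the k largest counts directly.

```python
from collections import Counter


def top_pages(lines, totals=None, k=10):
    """Fold log lines of the form '<time> <page> <count>' into running totals.

    Returns (timestamp of the last parsed line, k most-viewed pages).
    Pass the same Counter on every call to keep cumulative totals, as the
    script does with pg2count.
    """
    totals = Counter() if totals is None else totals
    latest = ''
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        stamp, page, count = fields[0], fields[1], fields[2]
        latest = stamp  # like the script, remember the last timestamp seen
        if count.isdigit() and page.startswith('Page-'):
            totals[page] += int(count)
    return latest, totals.most_common(k)
```

For example, feeding the lines "t1 Page-A 3", "t1 Page-B 5" and "t2 Page-A 4" returns timestamp t2 with Page-A (7) ranked ahead of Page-B (5).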
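2. After writing the code, do a test run with python streaming.py; each minute it prints the current top-10 page ranking to the terminal.
Finally, keep it running in the background with nohup python streaming.py &, which prints:
[1] 8994
2014210***@cluster-3-1:~$ nohup: ignoring input and appending output to `nohup.out'
Here [1] is the shell job number and 8994 is the process ID. The nohup message means the script is running in the background and its terminal output is being appended to nohup.out. You can also check the same output in output.log.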
A day later, we checked the results again: the cumulative totals had changed noticeably since the first run.
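3. Kill the process. First find its PID with ps -ef|grep 1020 (1020 is this account's numeric UID, shown in the first column), which gives: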
2014210***@cluster-3-1:~$ ps -ef|grep 1020
1020 7512 7471 0 Jan10 ? 00:00:00 sshd: 2014210***@pts/30
1020 7513 7512 0 Jan10 pts/30 00:00:00 -bash
1020 7574 7508 0 20:55 ? 00:00:00 sshd: 2014210***@pts/52
1020 7575 7574 0 20:55 pts/52 00:00:00 -bash
1020 8282 7575 0 21:04 pts/52 00:00:00 ps -ef
1020 8283 7575 0 21:04 pts/52 00:00:00 grep --color=auto 1020
1020 8994 1 0 13:20 ? 00:01:46 python streaming.py
1020 12260 12232 0 Jan10 ? 00:00:00 sshd: 2014210***@pts/35
1020 12261 12260 0 Jan10 pts/35 00:00:01 -bash
Run kill 8994, then list the processes again:
2014210***@cluster-3-1:~$ kill 8994
2014210***@cluster-3-1:~$ ps -ef|grep 1020
1020 7512 7471 0 Jan10 ? 00:00:00 sshd: 2014210***@pts/30
1020 7513 7512 0 Jan10 pts/30 00:00:00 -bash
1020 7574 7508 0 20:55 ? 00:00:00 sshd: 2014210***@pts/52
1020 7575 7574 0 20:55 pts/52 00:00:00 -bash
1020 8335 7575 0 21:05 pts/52 00:00:00 ps -ef
1020 8336 7575 0 21:05 pts/52 00:00:00 grep --color=auto 1020
1020 12260 12232 0 Jan10 ? 00:00:00 sshd: 2014210***@pts/35
1020 12261 12260 0 Jan10 pts/35 00:00:01 -bash
The python streaming.py process (PID 8994) no longer appears in the list, so streaming.py has stopped.
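kill 8994 with no signal option sends SIGTERM. Because streaming.py installs no handler, SIGTERM's default action ends it immediately, wherever it happens to be (usually inside time.sleep). If you would rather let the current pass finish first, one option, not part of the original assignment, is to catch the signal and have the loop check a flag. A sketch under that assumption: StopFlag and run are hypothetical names, and poll_once stands in for one read-rank-log pass of the script.

```python
import signal
import time


class StopFlag:
    """Cleared when a stop signal (e.g. SIGTERM from `kill PID`) arrives."""

    def __init__(self):
        self.running = True

    def request_stop(self, signum, frame):
        # Signature matches what signal.signal() passes to a handler.
        self.running = False


def run(poll_once, flag, interval=60):
    """Call poll_once() every `interval` seconds until the flag is cleared."""
    while flag.running:
        poll_once()
        # Sleep in 1-second steps so a stop request is noticed within about
        # a second instead of after the full interval.
        for _ in range(interval):
            if not flag.running:
                break
            time.sleep(1)
```

In the script you would create a StopFlag, register it with signal.signal(signal.SIGTERM, flag.request_stop) before calling run(), and kill 8994 would then stop the loop after its current pass.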
Question
How can your design scale when the streaming is large and the calculation is complicated?
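Answer: First determine how long each processing cycle takes, then make sure the stream data arriving during that time can be held in a sufficiently large buffer. The next cycle then simply processes the data buffered during the previous one.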
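The answer describes buffered micro-batching: data that arrives while one cycle runs is held, and the next cycle processes it. A minimal sketch under stated assumptions: DoubleBuffer, process_batch and buffer_capacity are hypothetical names, records are assumed to be (page, count) pairs, and the 1.5x headroom factor is an arbitrary illustrative choice, not something from the assignment.

```python
import math
import threading
from collections import Counter


class DoubleBuffer:
    """Producers append to the active buffer; once per cycle the consumer
    swaps in an empty buffer and takes the full one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = []

    def add(self, record):
        # Called by whatever receives the stream, possibly from another thread.
        with self._lock:
            self._active.append(record)

    def swap(self):
        # Hand back everything buffered since the last swap; start a new buffer.
        with self._lock:
            full, self._active = self._active, []
        return full


def process_batch(batch, totals):
    """Fold one cycle's (page, count) records into the running totals."""
    for page, count in batch:
        totals[page] += count
    return totals


def buffer_capacity(records_per_sec, cycle_seconds, headroom=1.5):
    """Records the buffer must hold: arrival rate x cycle length, times an
    assumed safety margin for bursts."""
    return math.ceil(records_per_sec * cycle_seconds * headroom)
```

For example, at 1000 records per second and a 60-second cycle, buffer_capacity(1000, 60) gives 90000 records. This only works while processing one batch takes less time than one cycle; otherwise data arrives faster than it is drained and the buffer grows without bound.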