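A demo in Python (2.7) that takes web page screenshots, queries a database, and sends email. It uses selenium, phantomjs, mailer, jinja2, MySQLdb and PIL's Image, all in fairly typical ways, so the code is quite reusable. I'm writing it down here to share.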
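This demo sends a weekly report email. The report contains records from the database plus screenshots of a specific element on a web page. On Linux you can use crontab to send it automatically every week. If you need to send a similar weekly report, this should make life a lot easier!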
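Let's go straight to the code. It uses Python 2.7; installing the third-party modules is simple, so I won't go into it here.
Real values such as the database parameters, mail parameters and URLs have been scrubbed, so remember to replace and fill them in yourself.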
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Author: lvs
import datetime
from time import sleep

import MySQLdb
import MySQLdb.cursors
from jinja2 import Environment, PackageLoader
from mailer import Mailer, Message
from PIL import Image
from selenium import webdriver

IMAGE_PATH = '/home/lvs/image/event_{}.png'


def fetch_results():
    """Return the records whose start_time falls within the past 7 days."""
    today = datetime.datetime.today()
    seven_day_ago = today - datetime.timedelta(days=7)
    today_str = today.strftime('%Y-%m-%d')
    seven_day_ago_str = seven_day_ago.strftime('%Y-%m-%d')
    db = MySQLdb.connect(host='127.0.0.1', port=3306, user='test', passwd='test',
                         db='test', charset='utf8',
                         cursorclass=MySQLdb.cursors.DictCursor)
    try:
        cursor = db.cursor()
        # Let the driver quote the values instead of formatting them into the SQL string
        sql = ("SELECT * FROM test.test "
               "WHERE start_time < %s AND start_time >= %s")
        cursor.execute(sql, (today_str, seven_day_ago_str))
        return cursor.fetchall()
    finally:
        db.close()


def screen_shot(event_id):
    """Screenshot the detail page of event_id and crop it to the element with id="main"."""
    driver = webdriver.PhantomJS(
        executable_path='/usr/local/phantomjs-2.1.1-linux-x86_64/bin/phantomjs')
    img_path = IMAGE_PATH.format(event_id)
    try:
        driver.set_page_load_timeout(5)
        driver.set_window_size(1920, 1080)
        driver.get('http://test.com/detail?id={}'.format(event_id))
        sleep(3)  # give the page time to finish rendering
        driver.save_screenshot(img_path)
        element = driver.find_element_by_id('main')
        left = int(element.location['x'])
        top = int(element.location['y'])
        right = int(element.location['x'] + element.size['width'])
        bottom = int(element.location['y'] + element.size['height'])
    finally:
        driver.quit()  # always shut down the phantomjs process
    # Crop the full-page screenshot down to the element and overwrite the file
    im = Image.open(img_path)
    im = im.crop((left, top, right, bottom))
    im.save(img_path)


def send_mail(results):
    # Loads templates/mail.html from the "jinja" package next to this script
    env = Environment(loader=PackageLoader('jinja', 'templates'))
    template = env.get_template('mail.html')
    message = Message(From='test@123.com', To='test@123.com', charset='utf-8')
    message.Subject = 'This is the email subject'
    message.Html = template.render(results=results)
    for r in results:
        path = IMAGE_PATH.format(r['id'])
        # With cid, the image is embedded in the HTML body (referenced as src="cid:...")
        message.attach(path, cid=str(r['id']))
        # Without cid, the same image is sent as a regular attachment
        message.attach(path)
    sender = Mailer('test.smtp.com')
    sender.send(message)


if __name__ == '__main__':
    data = fetch_results()
    for row in data:
        screen_shot(row['id'])
    send_mail(data)
```
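fetch_results() queries the database and returns the records whose start_time falls within the past seven days. Nothing much to say about it.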
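One detail worth spelling out is the time window: the query covers [today - 7 days, today), so the start day is included and today is not. The sketch below isolates that logic and passes the dates as query parameters. It uses the standard-library sqlite3 as a stand-in for MySQL so it runs without a server; the table `events`, its columns and the helper names are made up for illustration. Note that sqlite3 uses `?` placeholders where MySQLdb uses `%s`, and that sqlite compares these TEXT values as strings (which works for the `YYYY-MM-DD HH:MM:SS` format), while MySQL compares a DATETIME column as a date.

```python
import datetime
import sqlite3


def last_week_window(today=None):
    """Return (start, end) as 'YYYY-MM-DD' strings for the range [today - 7 days, today)."""
    if today is None:
        today = datetime.date.today()
    start = today - datetime.timedelta(days=7)
    return start.strftime('%Y-%m-%d'), today.strftime('%Y-%m-%d')


def fetch_last_week(conn, today=None):
    """Select rows whose start_time is on or after start and strictly before end."""
    start, end = last_week_window(today)
    # Values are passed separately so the driver quotes them (sqlite3 uses "?", MySQLdb "%s")
    sql = ("SELECT id, name, start_time FROM events "
           "WHERE start_time >= ? AND start_time < ? ORDER BY id")
    return conn.execute(sql, (start, end)).fetchall()


def demo_connection():
    """Hypothetical in-memory table standing in for the real MySQL table."""
    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE events (id INTEGER, name TEXT, start_time TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        (1, 'deploy', '2018-06-04 10:00:00'),  # first day of the window: included
        (2, 'outage', '2018-06-10 23:59:59'),  # last second before today: included
        (3, 'old', '2018-06-03 09:00:00'),     # eight days ago: excluded
        (4, 'today', '2018-06-11 08:00:00'),   # today itself: excluded
    ])
    return conn


if __name__ == '__main__':
    for row in fetch_last_week(demo_connection(), datetime.date(2018, 6, 11)):
        print(row)
```

Because the bounds are plain dates, a record at 08:00 on the day the script runs is left for next week's report rather than being counted twice.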
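screen_shot(event_id) takes the web page screenshot; event_id is passed as a URL parameter. It is implemented with selenium + phantomjs, both very typical tools for Python crawlers. Note the step that uses Image to crop out the DOM element whose id is main: selenium's screenshot covers the whole page, so the element's location and size are used to cut out just that region. The cropped image is saved locally.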
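The cropping step boils down to turning selenium's `element.location` and `element.size` into the `(left, top, right, bottom)` box that PIL's `Image.crop` expects. The helper below (the name `crop_box` is hypothetical) also clamps the box to the screenshot's dimensions, because PIL pads any out-of-range area with black instead of raising an error. It is a small sketch under those assumptions, not part of the original script.

```python
def crop_box(location, size, image_size):
    """Compute a PIL-style (left, top, right, bottom) box for a DOM element.

    location and size are dicts shaped like selenium's element.location and
    element.size; image_size is (width, height) of the full screenshot.
    The box is clamped so it never extends past the screenshot edges.
    """
    img_w, img_h = image_size
    left = max(0, int(location['x']))
    top = max(0, int(location['y']))
    right = min(img_w, int(location['x'] + size['width']))
    bottom = min(img_h, int(location['y'] + size['height']))
    return left, top, right, bottom


if __name__ == '__main__':
    # An element that runs past the bottom of a 1920x1080 screenshot
    print(crop_box({'x': 0, 'y': 900}, {'width': 1920, 'height': 400}, (1920, 1080)))
```

In screen_shot() you would keep `element.location` and `element.size` in variables before `driver.quit()`, then call `im.crop(crop_box(location, size, im.size))` after opening the saved file.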
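send_mail(results), naturally, sends the email, using mailer and a jinja2 template. The line env = Environment(loader=PackageLoader('jinja', 'templates')) is how jinja2 loads the template: mail.html sits in the templates directory of a package named jinja, which is in the same directory as this .py script (so jinja has to be importable as a package, e.g. with an __init__.py). Also take a look at how the images are embedded in the mail versus sent as attachments.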
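If you are curious what cid embedding means at the MIME level, here is a rough standard-library equivalent. It is not mailer's actual code, and mailer's internal layout may differ. Inline images live in a `multipart/related` part next to the HTML and carry a `Content-ID` header, which the HTML references as `src="cid:..."`; a regular attachment has `Content-Disposition: attachment` and no `Content-ID`. The function name `build_report` and the `event_<id>.png` filenames are illustrative.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_report(html, images):
    """Build a message whose HTML references images through cid: URLs.

    images maps a content id (e.g. the event id as a string) to raw PNG bytes.
    Each image is added once inline (with a Content-ID) and once as a regular
    attachment, mirroring the two attach() calls in the demo.
    """
    root = MIMEMultipart('mixed')
    related = MIMEMultipart('related')
    related.attach(MIMEText(html, 'html', 'utf-8'))
    root.attach(related)
    for cid, data in images.items():
        # Inline copy: the HTML finds it through <img src="cid:...">
        inline = MIMEImage(data, 'png')
        inline.add_header('Content-ID', '<%s>' % cid)
        inline.add_header('Content-Disposition', 'inline', filename='event_%s.png' % cid)
        related.attach(inline)
        # Attachment copy: no Content-ID, shown in the client's attachment list
        attached = MIMEImage(data, 'png')
        attached.add_header('Content-Disposition', 'attachment', filename='event_%s.png' % cid)
        root.attach(attached)
    return root
```

After setting the Subject, From and To headers, such a message could be sent with `smtplib.SMTP('test.smtp.com').sendmail(from_addr, [to_addr], msg.as_string())`.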
The contents of mail.html:
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style>
        .myimg img {
            max-width: 400px;
            max-height: 200px;
        }
    </style>
</head>
<body>
<div>
    <div>
        <div>
            <p>Events from the past week:</p>
        </div>
        <div>
            <table style="margin: 10px auto; border-collapse:collapse;" border="1" bordercolor="#a0c6e5">
                <tr>
                    <th>Event name</th>
                    <th>Event type</th>
                    <th>Start time</th>
                    <th>End time</th>
                    <th>Location</th>
                    <th>Description</th>
                    <th>Details</th>
                </tr>
                {% for row in results %}
                <tr>
                    <td>{{row["name"]}}</td>
                    <td>{{row["type"]}}</td>
                    <td>{{row["start_time"]}}</td>
                    <td>{{row["end_time"]}}</td>
                    <td>{{row["place"]}}</td>
                    <td>{{row["description"]}}</td>
                    <td class="myimg"><img src="cid:{{row['id']}}"></td>
                </tr>
                {% endfor %}
            </table>
        </div>
    </div>
</div>
</body>
</html>
```
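The jinja variable row is a dict corresponding to one database record, keyed by the table's column names; remember to replace them with your own columns.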
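The row dicts come from MySQLdb's DictCursor (the cursorclass passed to connect()), which maps each column name to its value. To see that shape without a MySQL server, the sketch below uses sqlite3 with a row factory that builds the same kind of dict. `dict_factory` and `fetch_dict_rows` are hypothetical helpers, and the columns only mirror the ones the template uses.

```python
import sqlite3


def dict_factory(cursor, row):
    """Turn a result row into {column_name: value}, like MySQLdb's DictCursor."""
    return dict((col[0], value) for col, value in zip(cursor.description, row))


def fetch_dict_rows(conn, sql, params=()):
    """Run a query and return every row as a dict keyed by column name."""
    conn.row_factory = dict_factory
    return conn.execute(sql, params).fetchall()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE events (id INTEGER, name TEXT, place TEXT)")
    conn.execute("INSERT INTO events VALUES (1, 'deploy', 'Beijing')")
    for row in fetch_dict_rows(conn, "SELECT id, name, place FROM events"):
        # Same access pattern as the template: row["name"], row["place"], row['id']
        print(row)
```

The keys are whatever names the SELECT returns, so if you alias a column with AS, the template has to use the alias.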
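The last column of each row is the image from the web page screenshot. Be sure to reference it with cid in the img tag's src attribute: the image travels inside the email as a MIME part, and cid:<id> points at the part with the matching Content-ID. Otherwise, the usual way of referencing an image in an img tag (such as the local file path) will not work!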
Personal blog: www.hellolvs.cn