Selenium + PhantomJS: learning and usage notes

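Background:

PhantomJS is a headless (no GUI) browser based on WebKit, so it runs more efficiently than a full browser.

Selenium is a tool for testing web applications. The current version is 2.42.1; the difference from version 1 is that 2.0+ integrates WebDriver.

Python versions supported by selenium2: 2.7, 3.2, 3.3 and 3.4

If you need to drive a browser remotely, you also have to install the Selenium server.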

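Installation:

Install selenium2 first; any installation method works. I usually download the archive and install it with the python setup.py install command. Download page for selenium 2.42.1: https://pypi.python.org/pypi/selenium/2.42.1

Then download PhantomJS from https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-windows.zip. After unzipping, you will find a file named phantomjs.exe.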
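The driver needs the path to the unpacked executable, so it can help to confirm the file exists before starting a session. The following is a minimal sketch for Python 3; the helper name `find_phantomjs` and the candidate paths are hypothetical and are not part of Selenium or PhantomJS.

```python
import os
import shutil


def find_phantomjs(candidates):
    """Return the first candidate path that is an existing file.

    Falls back to searching PATH for an executable named "phantomjs".
    Returns None when nothing is found.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    # shutil.which returns None if the program is not on PATH
    return shutil.which("phantomjs")


if __name__ == "__main__":
    # Hypothetical locations; the raw string keeps the backslashes literal.
    guesses = [r"C:\tools\phantomjs-1.9.7-windows\phantomjs.exe", "./phantomjs"]
    print(find_phantomjs(guesses))
```

The returned path could then be passed as executable_path when creating the driver.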

Example 1:

#coding=utf-8
from selenium import webdriver

# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.PhantomJS(executable_path=r'C:\Users\Gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
driver.get("http://duckduckgo.com/")
driver.find_element_by_id('search_form_input_homepage').send_keys("Nirvana")
driver.find_element_by_id("search_button_homepage").click()
print(driver.current_url)
driver.quit()

Here executable_path is the path to the phantomjs.exe file mentioned above. Output:

https://duckduckgo.com/?q=Nirvana
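Rather than reading the printed URL by eye, a script could check the query string itself. This is a small sketch for Python 3 using only the standard library; the helper name `search_term` is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def search_term(url):
    """Return the value of the `q` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("q")
    return values[0] if values else None


print(search_term("https://duckduckgo.com/?q=Nirvana"))  # prints: Nirvana
```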

Walkthrough of the example:

Worth noting:

The get method waits until the page has fully loaded before the program continues.

For AJAX, however: it's worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded.

send_keys fills in an input field.

Example 2:

#coding=utf-8
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver import ActionChains
import time
import sys

# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.PhantomJS(executable_path=r'C:\Users\Gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
driver.get("http://www.zhihu.com/#signin")
#driver.find_element_by_name('email').send_keys('your email')
driver.find_element_by_xpath('//input[@name="password"]').send_keys('your password')
#driver.find_element_by_xpath('//input[@name="password"]').send_keys(Keys.RETURN)
time.sleep(2)
driver.get_screenshot_as_file('show.png')
#driver.find_element_by_xpath('//button[@class="sign-button"]').click()
driver.find_element_by_xpath('//form[@class="zu-side-login-box"]').submit()

# Wait up to 5 seconds for the user-info link that only appears after login
try:
    dr=WebDriverWait(driver,5)
    dr.until(lambda the_driver:the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo "]').is_displayed())
except Exception:
    print('Login failed')
    sys.exit(0)
driver.get_screenshot_as_file('show.png')
#user=driver.find_element_by_class_name('zu-top-nav-userinfo ')
#webdriver.ActionChains(driver).move_to_element(user).perform() # move the mouse over my username
loadmore=driver.find_element_by_xpath('//a[@id="zh-load-more"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore)
actions.click(loadmore)
actions.perform()
time.sleep(2)
driver.get_screenshot_as_file('show.png')
print(driver.current_url)
print(driver.page_source)
driver.quit()

This program logs in to Zhihu and then automatically clicks the "More" link at the bottom of the page to load more content.

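Walkthrough of the example:

from selenium.webdriver.common.keys import Keys: the Keys class represents the keys on a keyboard; send_keys(Keys.RETURN) in the code presses Enter.

from selenium.webdriver.support.ui import WebDriverWait is needed for the wait operation further down.

from selenium.webdriver import ActionChains imports the action-chain class; it took me a long time to find the right form of this import.

For find_element I recommend the XPath variants: they look sophisticated, and they really are very convenient.

XPath expression tutorial: http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html

Note: avoid selecting on attribute values that contain spaces, such as class = "country name"; otherwise you get an error, something along the lines of a compound class error.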
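To get a feel for attribute predicates without starting a browser, the same expression shapes can be tried with Python's standard library. This is only a sketch: xml.etree.ElementTree supports a limited subset of XPath and stands in for the browser DOM here, and the HTML fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed fragment loosely modelled on a login form.
PAGE = """
<html>
  <body>
    <form class="zu-side-login-box">
      <input name="email" />
      <input name="password" />
    </form>
    <span class="country name">Iceland</span>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Attribute predicate, same shape as //input[@name="password"]
password = root.find(".//input[@name='password']")
print(password.attrib["name"])

# An attribute test compares the whole attribute string, so a value
# that contains a space still matches when written out in full.
span = root.find(".//span[@class='country name']")
print(span.text)
```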

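One way to check that the username and password were entered correctly is to take a screenshot after filling them in.

Taking a screenshot needs just one line:

driver.get_screenshot_as_file('show.png')

Note, however, that the screenshot has no scrollbars: it captures the entire page.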

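try:
    dr=WebDriverWait(driver,5)
    dr.until(lambda the_driver:the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo "]').is_displayed())
except Exception:
    print('Login failed')
    sys.exit(0)

This checks whether the login succeeded by waiting for a particular element to appear; I think it is fine to treat it as a black box. The 5 means: for up to 5 seconds, the page is polled every 500 ms until the specified element is displayed. If that never happens, until() raises an exception and the except branch runs.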
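The polling behaviour can be pictured with a few lines of plain Python. This is a simplified illustration of the idea, not Selenium's implementation: the names `wait_until` and `WaitTimeout` are hypothetical, and unlike the real class this sketch treats every exception from the condition as "not ready yet".

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises WaitTimeout after `timeout` seconds.
    Exceptions raised by condition() are treated as "not ready yet".
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except Exception:
            pass  # e.g. the element does not exist yet
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Simulated page state: the "element" shows up on the third poll.
calls = {"n": 0}


def element_displayed():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(element_displayed, timeout=2.0, poll=0.1))  # prints: True
```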

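To submit a form, you can either select the login button and call click, or select the form and call submit. The latter also handles forms that have no login button, so submit() is recommended.

A click can be performed with click() or with a chain of actions, as in the example: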

loadmore=driver.find_element_by_xpath('//a[@id="zh-load-more"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore)
actions.click(loadmore)
actions.perform()

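These five lines amount to a single statement, find the element and then click it, but actions apply in more situations. In this example the target is an a element, and for reasons I do not understand a direct click() had no effect.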
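The reason the chain ends with perform() is that the earlier calls only queue up steps. The toy class below models that deferred-queue idea in plain Python; `ToyActionChain` is a made-up stand-in, not Selenium's class, and it records strings instead of driving a browser.

```python
class ToyActionChain:
    """Toy model of a deferred action queue (not Selenium's class)."""

    def __init__(self):
        self._queue = []
        self.log = []

    def move_to_element(self, name):
        self._queue.append(lambda: self.log.append("move to " + name))
        return self  # returning self allows chained calls

    def click(self, name):
        self._queue.append(lambda: self.log.append("click " + name))
        return self

    def perform(self):
        # Run the queued steps in order, then clear the queue
        for action in self._queue:
            action()
        self._queue = []


chain = ToyActionChain()
chain.move_to_element("load-more").click("load-more")
print(chain.log)  # prints: []  (nothing has run yet)
chain.perform()
print(chain.log)  # prints: ['move to load-more', 'click load-more']
```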

print(driver.current_url)
print(driver.page_source)

These print two properties of the page: its URL and its source.

 

References:

http://www.realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/#.U5FXUvmSziE

http://selenium-python.readthedocs.org/getting-started.html

http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html

http://www.cnblogs.com/paisen/p/3310067.html


Setting the User-Agent header in PhantomJS

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the default capabilities so the shared dictionary is not modified
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 "
)
driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
driver.get("http://dianping.com/")
cap_dict = driver.desired_capabilities
for key in cap_dict:
    print('%s: %s' % (key, cap_dict[key]))
print(driver.current_url)

Check whether it worked (this has to run before the browser is closed):

agent = driver.execute_script("return navigator.userAgent")
print(agent)
driver.quit()
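The dict(...) call matters because it copies the defaults instead of editing a shared dictionary. The sketch below shows that step in isolation with plain Python; `PHANTOMJS_DEFAULTS` is a stand-in whose contents are assumed, and `with_user_agent` is a hypothetical helper name.

```python
# Stand-in for DesiredCapabilities.PHANTOMJS; the real dictionary comes
# from Selenium and its exact contents are assumed here.
PHANTOMJS_DEFAULTS = {"browserName": "phantomjs", "javascriptEnabled": True}

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) "
    "Gecko/20100101 Firefox/25.0"
)


def with_user_agent(defaults, user_agent):
    """Return a copy of `defaults` with the PhantomJS user-agent setting added."""
    caps = dict(defaults)  # shallow copy; `defaults` stays untouched
    caps["phantomjs.page.settings.userAgent"] = user_agent
    return caps


caps = with_user_agent(PHANTOMJS_DEFAULTS, USER_AGENT)
for key in sorted(caps):
    print("%s: %s" % (key, caps[key]))
```

Because the defaults are left alone, a second driver created later without the custom setting would not inherit this user agent.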
