macOS (10.13.1) + Python (3.6.1)
brew install geckodriver
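==Note: when scraping dynamic web pages through an external browser, Firefox (version 57.0.1), DOM operations have to go through geckodriver. After searching online for a while, I found some sources saying that Firefox releases before a certain version did not need the geckodriver package, but I do not remember the details.==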
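As a rough sketch of how the pieces fit together, the snippet below checks that geckodriver is reachable before asking Selenium to start Firefox. The helper names `geckodriver_path` and `open_firefox` are made up for this illustration, and it assumes the `selenium` package is installed and that geckodriver was installed somewhere on `PATH` (as `brew install geckodriver` normally does).

```python
import shutil


def geckodriver_path():
    """Return the full path of geckodriver if it is on PATH, else None."""
    return shutil.which("geckodriver")


def open_firefox():
    """Start Firefox through Selenium, failing early if geckodriver is missing."""
    if geckodriver_path() is None:
        raise RuntimeError(
            "geckodriver not found on PATH; try: brew install geckodriver"
        )
    # Imported lazily so the PATH check above works even without Selenium.
    from selenium import webdriver

    # With no arguments, Selenium looks up geckodriver on PATH.
    return webdriver.Firefox()
```

A typical session would call `driver = open_firefox()`, then `driver.get(url)`, and finally `driver.quit()` to close the browser.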
Re-download tornado and install it from source. Download page: https://pypi.python.org/pypi/tornado
1. Download: wget https://pypi.python.org/packages/df/42/a180ee540e12e2ec1007ac82a42b09dd92e5461e09c98bf465e98646d187/tornado-4.5.1.tar.gz#md5=838687d20923360af5ab59f101e9e02e
2. Extract: tar -zxvf tornado-4.5.1.tar.gz
3. cd tornado-4.5.1
4. python setup.py build
5. python setup.py install
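The download URL above carries an `#md5=` fragment, which can be used to check the archive before building it. The following is only a sketch of that check: `split_md5` and `md5_of_file` are hypothetical helper names, and it assumes the fragment holds the MD5 of the tarball.

```python
import hashlib
from urllib.parse import urldefrag

TORNADO_URL = (
    "https://pypi.python.org/packages/df/42/"
    "a180ee540e12e2ec1007ac82a42b09dd92e5461e09c98bf465e98646d187/"
    "tornado-4.5.1.tar.gz#md5=838687d20923360af5ab59f101e9e02e"
)


def split_md5(url):
    """Split a URL into (url_without_fragment, md5_hex_or_None)."""
    base, fragment = urldefrag(url)
    key, _, value = fragment.partition("=")
    return base, (value if key == "md5" else None)


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After the `wget` step, comparing `md5_of_file("tornado-4.5.1.tar.gz")` with the second value returned by `split_md5(TORNADO_URL)` shows whether the download is intact.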
    lst_news = self.driver.find_elements_by_xpath('//ul[@class="sameday_list"]/li')
    for_i = 0
    for item in lst_news:
        # Approach 1: select the li by its id attribute
        li_id = item.get_attribute('id')
        title = item.find_element_by_xpath('//li[@id="'+li_id+'"]/div/h2/span[@class="title"]').text
        print(title)
        # Approach 2: select the li by its position
        for_i += 1
        title2 = item.find_element_by_xpath('//li['+str(for_i)+']/div/h2/span[@class="title"]').text
        print(title2)
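==Cause of the problem: the method below is Selenium's internal implementation. Note that the `return` passes the command on to `self._parent`, and an XPath that begins with `//` is evaluated from the document root rather than from the current element, so an unqualified expression always returns the first match. That is why the XPath inside the `for` loop has to identify the specific `li`.==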
    # Private Methods
    def _execute(self, command, params=None):
        """Executes a command against the underlying HTML element.

        Args:
          command: The name of the command to _execute as a string.
          params: A dictionary of named parameters to send with the command.

        Returns:
          The command's JSON response loaded into a dictionary object.
        """
        if not params:
            params = {}
        params['id'] = self._id
        return self._parent.execute(command, params)
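An alternative to naming each `li` explicitly is to make the inner XPath relative by starting it with `.`, so that it is resolved against the element it is called on. The sketch below assumes the same page structure as the loop above and the Selenium 3 method names used in this post (Selenium 4 replaces them with `find_element(By.XPATH, ...)`). `collect_titles` is a hypothetical helper name.

```python
def collect_titles(driver):
    """Return the title text of every <li> under ul.sameday_list."""
    titles = []
    for item in driver.find_elements_by_xpath('//ul[@class="sameday_list"]/li'):
        # The leading "." anchors the search at `item`; a bare "//" would
        # restart from the document root and match the first title each time.
        span = item.find_element_by_xpath('./div/h2/span[@class="title"]')
        titles.append(span.text)
    return titles
```

Called as `collect_titles(self.driver)`, this should yield one title per list item without needing the `id` lookup or the position counter.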