How can I get the "href" attribute for every "h2" title on this page?
<h2 class="entry-title">
<a href="http://www.allitebooks.com/deep-learning-with-python-2/" rel="bookmark">Deep Learning with Python</a>
</h2>
What I tried, which did not return the href:
title = driver.find_elements_by_class_name('entry-title')
title[0].get_attribute('href')
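This does not return the link from the "a" tag. If I instead find all elements by the "a" tag, I get every href on the page, which is not what I want. I want to select only the titles shown above and still be able to read the "href" attribute that holds each one's URL.
Solution: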
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
baseUrl = "http://www.allitebooks.com/page/1/?s=python"
driver.get(baseUrl)
# Optional: wait for the search results to render before querying the DOM
# wait = WebDriverWait(driver, 5)
# wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".search-result-list li")))
# Get last page number
lastPage = int(driver.find_element(By.CSS_SELECTOR, ".pagination a:last-child").text)
# Get all HREFs for the first page and save them in hrefs list
js = 'return [...document.querySelectorAll(".entry-title a")].map(e=>e.href)'
hrefs = driver.execute_script(js)
# Iterate through the remaining pages (2 to lastPage inclusive) and collect the book HREFs
for i in range(2, lastPage + 1):
    driver.get("http://www.allitebooks.com/page/" + str(i) + "/?s=python")
    hrefs.extend(driver.execute_script(js))
for href in hrefs:
    print(href)
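The attempt in the question returns nothing because "href" is an attribute of the "a" element nested inside the "h2", not of the "h2" itself. As an alternative to running JavaScript, the same links can probably be collected with a plain Selenium locator scoped to the title. The sketch below is not part of the original answer: `title_hrefs` is a hypothetical helper name, and it assumes Selenium 4, where `find_elements(by, value)` accepts the locator strategy as a string.

```python
def title_hrefs(driver, selector=".entry-title a"):
    """Return the href of every <a> matched by `selector` (hypothetical helper)."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR,
    # so this is equivalent to driver.find_elements(By.CSS_SELECTOR, selector).
    anchors = driver.find_elements("css selector", selector)
    # Read href from the <a> itself, which is where the attribute lives.
    return [a.get_attribute("href") for a in anchors]
```

After `driver.get(...)` has loaded a results page, calling `title_hrefs(driver)` should give the same kind of list as the `execute_script` call above, limited to links inside elements with the `entry-title` class.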