Scraping web page after accepting cookies in python

Mangs

1人浏览 · 2022-08-25 00:34:38

Mangs · 2022-08-25 00:34:38 发布

Answer a question

I'm trying to scrape a web page but before accessing the page, there is a banner for accepting cookies. I am using selenium to click on the button "Accept all cookies" but even after clicking on the button I can't access the right HTML page.

This is my code :

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = 'https://www.wikiparfum.fr/explore/by-name?query=dior'

driver = webdriver.Chrome(executable_path=DRIVER_PATH)

driver.get(url)
driver.find_element_by_id('onetrust-accept-btn-handler').click()

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

print(soup)

And this is the beginning of the HTML page that is printed : enter image description here

If anyone can help me with this one, thank you!

Answers

You should wait for the accept cookies button element appearance before clicking it

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = 'https://www.wikiparfum.fr/explore/by-name?query=dior'

driver = webdriver.Chrome(executable_path=DRIVER_PATH)
wait = WebDriverWait(driver, 20)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#onetrust-accept-btn-handler"))).click()

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

print(soup)

向您推荐>>百度飞桨AI Studio社区

学AI，认准AI Studio！GPU算力，限时免费领，邀请好友解锁更多惊喜福利 >>>

更多推荐

求助！为什么用InsCode部署会出现无限重定向？

Python

如何重塑熊猫。系列

问题:如何重塑熊猫。系列在我看来,它就像 pandas.Series 中的一个错误。 a = pd.Series([1,2,3,4]) b = a.reshape(2,2) b b 有类型 Series 但无法显示,最后一条语句给出异常,非常冗长,最后一行是“TypeError: %d format: a number is required, not numpy.ndarray”。 b.sha

Python

在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制]

问题:在哪里可以找到有关 Keras 中默认权重初始化器的文档? [复制] 我刚刚在这里](https://keras.io/initializers/)中阅读了有关[中的 Keras 权重初始化器的信息。在文档中,只介绍了不同的初始化程序。如: model.add(Dense(64, kernel_initializer='random_normal')) 当我没有指定kernel_initia