How do I scrape website data conditionally?

Mangs · posted 2022-08-30 21:33:33

Answer a question

I have written a web scraper that extracts product data from a website: the product name, price, description, item number, and so on. The scraper is given several URLs from the same site (for example ebay.com/handbags, ebay.com/perfumes, ebay.com/cameras). My problem is that when a page such as ebay.com/handbags has an 'RRP' field, the scraper reads it, but when a page such as ebay.com/cameras has no RRP, the program fails because the element does not exist. The error reads: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="vi-priceDetails"]/span[1]/span[2]/span"}

How can I make the program print a '-' for the RRP instead of failing? Here is my code example:

import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager


def scrape_products():
    website_address = [
        'https://www.ebay.co.uk/itm/The-Discworld-series-Carpe-jugulum-by-Terry-Pratchett-Paperback-Amazing-Value/293566021594?hash=item4459e5ffda:g:yssAAOSw3NBfQ7I0',
        'https://www.ebay.co.uk/itm/Edexcel-AS-A-level-history-Germany-and-West-Germany-1918-89-by-Barbara/293497601580?hash=item4455d1fe2c:g:6lYAAOSwbRFeXGqL']
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    browser = webdriver.Chrome(ChromeDriverManager().install(), options=options)
    for web in website_address:
        browser.get(web)
        time.sleep(2)

        product_rrp = browser.find_element_by_xpath('//*[@id="vi-priceDetails"]/span[1]/span[2]/span').text
        # rest of code
        print(product_rrp)


if __name__ == "__main__":
    scrape_products()

I am not sure how to solve this issue. Please help me out. Thanks!

Answers

Note: I changed the selector because yours did not work on my PC.

There are two ways. The first is a try/except block:

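from selenium.common.exceptions import NoSuchElementException

try:
    product_rrp = browser.find_element_by_css_selector('.actPanel div div:nth-child(2) span').text
    print(product_rrp)
except NoSuchElementException:  # raised when the selector matches nothing
    print('no RRP')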

The second is to use find_element[s]_... (with an s), which returns a list instead of raising, and check whether it has any results:

product_rrp = browser.find_elements_by_css_selector('.actPanel div div:nth-child(2) span')
if product_rrp:  # the list is non-empty, so the element exists
    print(product_rrp[0].text)  # notice the [0]
else:
    print('no RRP')
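
If several optional fields need the same treatment, the second approach can be wrapped in a small helper. The following is only a sketch: the name text_or_default and its parameters are made up for illustration, and it assumes the driver exposes find_elements(by, value), which is the call style in Selenium 4 (recent releases no longer ship the find_element_by_* helpers used above). The strings "xpath" and "css selector" are the values behind By.XPATH and By.CSS_SELECTOR.

```python
def text_or_default(driver, selector, by="xpath", default="-"):
    """Return the text of the first matching element, or `default` if none match.

    `driver` is assumed to be a Selenium WebDriver (or anything else that
    offers a find_elements(by, value) method returning a list).
    """
    # find_elements returns an empty list when nothing matches,
    # so no NoSuchElementException is raised here.
    matches = driver.find_elements(by, selector)
    return matches[0].text if matches else default
```

Inside the loop from the question this would be used as product_rrp = text_or_default(browser, '//*[@id="vi-priceDetails"]/span[1]/span[2]/span'), which yields '-' on pages without an RRP.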
