Answer a question

When I scrape a table from a website, the bottom 5 rows of data are missing and I do not know how to pull them. I am using a combination of BeautifulSoup and Selenium. I thought the rows were not loading, so I tried scrolling to the bottom of the page with Selenium, but that did not help.

Code trials:

import bs4 as bs
import pandas as pd
from selenium import webdriver

site = 'https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One'
PATH = my_path  # path to the chromedriver executable
driver = webdriver.Chrome(PATH)
driver.get(site)
# Parse the page source as soon as the page has loaded
webpage = bs.BeautifulSoup(driver.page_source, features='html.parser')

table = webpage.find('table', {'class': 'stats_table sortable min_width now_sortable'})
print(table.prettify())
df = pd.read_html(str(table))[0]

print(df.tail())

Please could you help with scraping the full table?

Answers

To pull all the rows from the table using only Selenium, induce WebDriverWait for visibility_of_element_located(), read the table's outerHTML, and pass it to read_html() from Pandas. You can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    tabledata = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.stats_table.sortable.min_width.now_sortable"))).get_attribute("outerHTML")
    tabledf = pd.read_html(tabledata)
    print(tabledf)
    
  • Using XPATH:

    driver.get('https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One')
    data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='stats_table sortable min_width now_sortable']"))).get_attribute("outerHTML")
    df = pd.read_html(data)
    print(df)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    [              Round   Wk  Day  ...             Referee  Match Report                         Notes
    0    Regular Season    1  Sat  ...  Charles Breakspear  Match Report                           NaN
    1    Regular Season    1  Sat  ...       Andrew Davies  Match Report                           NaN
    2    Regular Season    1  Sat  ...       Kevin Johnson  Match Report                           NaN
    3    Regular Season    1  Sat  ...   Anthony Backhouse  Match Report                           NaN
    4    Regular Season    1  Sat  ...        Marc Edwards  Match Report                           NaN
    ..              ...  ...  ...  ...                 ...           ...                           ...
    685     Semi-finals  NaN  Tue  ...       Robert Madley  Match Report                    Leg 1 of 2
    686     Semi-finals  NaN  Wed  ...         Craig Hicks  Match Report                    Leg 1 of 2
    687     Semi-finals  NaN  Fri  ...        Keith Stroud  Match Report     Leg 2 of 2; Blackpool won
    688     Semi-finals  NaN  Sat  ...   Michael Salisbury  Match Report  Leg 2 of 2; Lincoln City won
    689           Final  NaN  Sun  ...     Tony Harrington  Match Report                           NaN
    
        [690 rows x 13 columns]]
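
To check whether the rows at the bottom of the table actually made it into the string returned by get_attribute("outerHTML"), one option is to count the body rows before passing the string to pd.read_html(). The following is a minimal sketch using only the standard library; the RowCounter class, the count_body_rows helper and the sample HTML are hypothetical and are not part of the original answer.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> elements that appear inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_body_rows(table_html):
    """Return the number of body rows in an HTML table string."""
    counter = RowCounter()
    counter.feed(table_html)
    counter.close()
    return counter.rows


# Hypothetical stand-in for the outerHTML string returned by Selenium.
sample_html = """
<table class="stats_table">
  <thead><tr><th>Round</th><th>Day</th></tr></thead>
  <tbody>
    <tr><td>Semi-finals</td><td>Sat</td></tr>
    <tr><td>Final</td><td>Sun</td></tr>
  </tbody>
</table>
"""

print(count_body_rows(sample_html))
```

With the real page, count_body_rows(tabledata) could be compared with len(tabledf[0]). The two numbers may differ if the table contains spacer or repeated header rows, so treat the comparison as a rough check only.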
    