
I'm trying to create a list that I can parse through to get some data, but I'm running into this error: AttributeError: 'NoneType' object has no attribute 'find_all'. I've begun my code with this:

import pandas as pd
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.sports-reference.com/cbb/schools/michigan/2021-schedule.html")
soup = BeautifulSoup(page.text, features="html.parser")

table = soup.find("table", attrs={"id":"schedule"})
table_rows = table.find_all('tr')

l = []
for tr in table_rows:
    cells = tr.find_all('td')
    row = [cell.text for cell in cells]
    l.append(row)

test_df = pd.DataFrame(l)
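For reference, here is a self-contained sketch of the same `<tr>`/`<td>` extraction, run against a small inline HTML string instead of the live page. The helper name `table_to_rows` and the sample HTML are hypothetical. One difference from the loop above: a row that contains only `<th>` cells (such as a header row) yields an empty `find_all('td')` result, so this version skips it rather than appending `[]`.

```python
from bs4 import BeautifulSoup


def table_to_rows(html, table_id):
    """Return the stripped text of every <td>, one list per <tr> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows made only of <th> cells give an empty list; skip them
            rows.append(cells)
    return rows


# Hypothetical sample HTML standing in for the downloaded page
sample = """
<table id="schedule">
  <tr><th>Date</th><th>Opponent</th></tr>
  <tr><td>Game 1 date</td><td>Opponent A</td></tr>
  <tr><td>Game 2 date</td><td>Opponent B</td></tr>
</table>
"""

print(table_to_rows(sample, "schedule"))
# [['Game 1 date', 'Opponent A'], ['Game 2 date', 'Opponent B']]
```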

This code works, but now I'm trying to loop over multiple schools. This is my current attempt at doing this:

query_set = ["Duke, Michigan"]

for query in query_set:

    page = requests.get("https://www.sports-reference.com/cbb/schools/" + str(query) + "/2021-schedule.html")
    soup = BeautifulSoup(page.text, "html.parser")
    table = soup.find("table", attrs={"id":"schedule"})
    table_rows = table.find_all('tr')

    l = []
    for tr in table_rows:
        cells = tr.find_all('td')
        row = [cell.text for cell in cells]
        l.append(row)

    schedule_df = pd.DataFrame(l)

However, now I'm getting the AttributeError above and I can't figure out why. Does anyone have advice on how to fix this? Thanks.
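`soup.find()` returns `None` instead of raising when nothing matches, so the failure only surfaces one line later at `.find_all()`. A minimal sketch that checks for this explicitly, assuming the downloaded page simply lacks the table; the helper `find_schedule_table` and the sample HTML are hypothetical. With `requests` you could additionally call `page.raise_for_status()` to fail early on HTTP error status codes.

```python
from bs4 import BeautifulSoup


def find_schedule_table(html, url="(unknown URL)"):
    """Return the <table id="schedule"> element, or raise a descriptive error (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": "schedule"})
    if table is None:
        # find() returns None on no match, so check before calling find_all()
        raise ValueError(f"No table with id='schedule' found at {url}")
    return table


# Hypothetical HTML from a page that has no schedule table (e.g. an error page)
missing = "<html><body><h1>Page Not Found</h1></body></html>"
try:
    find_schedule_table(missing)
except ValueError as exc:
    print(exc)  # No table with id='schedule' found at (unknown URL)
```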

Answers

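When the URL is wrong, the page you get back has no table with id="schedule", so soup.find() returns None and calling .find_all() on None raises the AttributeError. There are two problems here, plus one suggestion:

  1. The URL is case sensitive, so the school name must be all lowercase ("duke", not "Duke"). Simply call the .lower() method on it.
  2. Even after lowercasing, your list contains a single string, "Duke, Michigan", when you want two separate elements: "Duke" and "Michigan".
  3. As an alternative, you could use pandas' .read_html() to read the table directly, since a DataFrame is what you are converting it to anyway (pandas can also use BeautifulSoup under the hood, although lxml is its default parser). That said, bs4 works fine here too if you prefer to keep your current approach.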
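Points 1 and 2 can be checked without touching the network. A minimal sketch, assuming the URL pattern from the question; the `BASE_URL` name is hypothetical, and school names with spaces would probably need a different slug, which this sketch does not handle.

```python
BASE_URL = "https://www.sports-reference.com/cbb/schools/{}/2021-schedule.html"

# A comma inside one string does not create two list elements
wrong = ["Duke, Michigan"]
print(len(wrong))  # 1

# Split into separate names, strip surrounding spaces, and lowercase each one
right = [name.strip().lower() for name in "Duke, Michigan".split(",")]
print(right)  # ['duke', 'michigan']

# Build one URL per school slug
urls = [BASE_URL.format(slug) for slug in right]
for url in urls:
    print(url)
```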

Code:

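from io import StringIO

import pandas as pd
import requests

query_set = ["Duke", "Michigan"]  # two separate strings, not one "Duke, Michigan"

for query in query_set:
    # The school name in the URL must be lowercase, e.g. .../schools/duke/...
    page = requests.get("https://www.sports-reference.com/cbb/schools/" + query.lower() + "/2021-schedule.html")
    # read_html returns a list of DataFrames; attrs selects the table with id="schedule".
    # Wrapping the HTML in StringIO avoids the deprecation warning for literal strings in newer pandas.
    schedule_df = pd.read_html(StringIO(page.text), attrs={"id": "schedule"})[0]
    print(schedule_df)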

Output:

     G               Date  ... Streak                                 Arena
0    1  Sat, Nov 28, 2020  ...    W 1                Cameron Indoor Stadium
1    2   Tue, Dec 1, 2020  ...    L 1                Cameron Indoor Stadium
2    3   Fri, Dec 4, 2020  ...    W 1                Cameron Indoor Stadium
3    4   Tue, Dec 8, 2020  ...    L 1                Cameron Indoor Stadium
4    5  Wed, Dec 16, 2020  ...    W 1  Purcell Pavilion at the Joyce Center
5    6   Wed, Jan 6, 2021  ...    W 2                Cameron Indoor Stadium
6    7   Sat, Jan 9, 2021  ...    W 3                Cameron Indoor Stadium
7    8  Tue, Jan 12, 2021  ...    L 1                      Cassell Coliseum
8    9  Tue, Jan 19, 2021  ...    L 2                Petersen Events Center
9   10  Sat, Jan 23, 2021  ...    L 3                       KFC Yum! Center
10  11  Tue, Jan 26, 2021  ...    W 1                Cameron Indoor Stadium
11  12  Sat, Jan 30, 2021  ...    W 2                Cameron Indoor Stadium
12  13   Mon, Feb 1, 2021  ...    L 1                     BankUnited Center
13  14   Sat, Feb 6, 2021  ...    L 2                Cameron Indoor Stadium
14  15   Tue, Feb 9, 2021  ...    L 3                Cameron Indoor Stadium
15  16  Sat, Feb 13, 2021  ...    NaN                                   NaN
16  17  Wed, Feb 17, 2021  ...    NaN                                   NaN
17  18  Sat, Feb 20, 2021  ...    NaN                                   NaN
18  19  Mon, Feb 22, 2021  ...    NaN                                   NaN
19  20  Sat, Feb 27, 2021  ...    NaN                                   NaN
20  21   Tue, Mar 2, 2021  ...    NaN                                   NaN
21  22   Sat, Mar 6, 2021  ...    NaN                                   NaN

[22 rows x 15 columns]
     G               Date   Time Type  ...     W    L Streak                Arena
0    1  Wed, Nov 25, 2020  4:00p  REG  ...   1.0  0.0    W 1        Crisler Arena
1    2  Sun, Nov 29, 2020  6:00p  REG  ...   2.0  0.0    W 2        Crisler Arena
2    3   Wed, Dec 2, 2020  7:00p  REG  ...   3.0  0.0    W 3        Crisler Arena
3    4   Sun, Dec 6, 2020  4:00p  REG  ...   4.0  0.0    W 4        Crisler Arena
4    5   Wed, Dec 9, 2020  6:00p  REG  ...   5.0  0.0    W 5        Crisler Arena
5    6  Sun, Dec 13, 2020  2:00p  REG  ...   6.0  0.0    W 6        Crisler Arena
6    7  Fri, Dec 25, 2020  6:00p  REG  ...   7.0  0.0    W 7  Pinnacle Bank Arena
7    8  Thu, Dec 31, 2020  8:00p  REG  ...   8.0  0.0    W 8       Xfinity Center
8    9   Sun, Jan 3, 2021  7:30p  REG  ...   9.0  0.0    W 9        Crisler Arena
9   10   Wed, Jan 6, 2021  8:30p  REG  ...  10.0  0.0   W 10        Crisler Arena
10  11  Tue, Jan 12, 2021  7:00p  REG  ...  11.0  0.0   W 11        Crisler Arena
11  12  Sat, Jan 16, 2021  2:00p  REG  ...  11.0  1.0    L 1       Williams Arena
12  13  Tue, Jan 19, 2021  7:00p  REG  ...  12.0  1.0    W 1        Crisler Arena
13  14  Fri, Jan 22, 2021  7:00p  REG  ...  13.0  1.0    W 2         Mackey Arena
14  15  Sun, Feb 14, 2021  1:00p  REG  ...   NaN  NaN    NaN                  NaN
15  16  Thu, Feb 18, 2021    NaN  REG  ...   NaN  NaN    NaN                  NaN
16  17  Sun, Feb 21, 2021    NaN  REG  ...   NaN  NaN    NaN                  NaN
17  18  Sat, Feb 27, 2021    NaN  REG  ...   NaN  NaN    NaN                  NaN
18  19   Thu, Mar 4, 2021    NaN  REG  ...   NaN  NaN    NaN                  NaN
19  20   Sun, Mar 7, 2021    NaN  REG  ...   NaN  NaN    NaN                  NaN

[20 rows x 15 columns]
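One more thing to watch in the loop above: `schedule_df` is reassigned on every pass, so once the loop ends only the last school's table is left. A sketch that keeps every table in a dict and stacks them with `pd.concat`; `fake_read_schedule` is a hypothetical stand-in for the `pd.read_html(...)[0]` call so the example runs offline, and its row counts are made up.

```python
import pandas as pd


def fake_read_schedule(school):
    """Hypothetical stand-in for pd.read_html(...)[0]; returns a tiny made-up table."""
    games = {"duke": 2, "michigan": 3}[school]
    return pd.DataFrame({"G": range(1, games + 1)})


schedules = {}
for school in ["duke", "michigan"]:
    # Store each table under its school instead of overwriting one variable
    schedules[school] = fake_read_schedule(school)

# Stack all tables; the dict keys become a "School" index level, then a column
combined = (
    pd.concat(schedules, names=["School", "row"])
    .reset_index(level="School")
    .reset_index(drop=True)
)
print(combined)
```

Keeping the dict around also lets you look up a single school later, e.g. `schedules["michigan"]`.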