Answer a question

I am working on this project on Python 3.8. I have to download data into a Pandas Dataframe and ultimately write to a databse (SQL or Access) for all premier league teams for 2018 & 2019. I am trying to use beautifulsoup for that. I have a code that works with soccerbase.com but it does not work on sofascore.com @oppressionslayer has helped with the code so far. Can anybody please help me?

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)

json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'

json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name']  # 'Manchester United'

json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3

json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']

print(json_object)

How do I loop this code to get the entire universe of teams? My aim is to get every team data with rows as ["Event date", "Competition", "Home Team", "Home Score", "Away Team", "Away Score", "Score"] e.g. 31/10/2019 Premier League Chelsea 1 Manchester United 2 1-2

I am a sarter and how can I get it?

Answers

This code just works. Although it does not capture all database of the website but ith is a potent scraper

import simplejson as json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)

headers = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']
consolidated = []
for tournament in json_object['sportItem']['tournaments']:
    rows = []
    for event in tournament["events"]:
        row = []
        row.append(tournament["tournament"]["name"])
        row.append(event["homeTeam"]["name"])
        if "current" in event["homeScore"].keys():
            row.append(event["homeScore"]["current"])
        else:
            row.append(-1)
        row.append(event["awayTeam"]["name"])
        if "current" in event["awayScore"].keys():
            row.append(event["awayScore"]["current"])
        else:
            row.append(-1)
        row.append(event["status"]["type"])
        row.append(event["formatedStartDate"])
        rows.append(row)
    df = pd.DataFrame(rows, columns=headers)
    consolidated.append(df)

pd.concat(consolidated).to_csv(r'Path.csv', sep=',', encoding='utf-8-sig',
                               index=False)

Courtesy Praful Surve @praful-surve

Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐