Answer a question

i have a list of names, but some have accents. i want to be able to find the page of the person without having to manually get rid of the accent on the name, which prevents the search. is there a way to even do this?

import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import DataFrame

base_url = 'https://basketball.realgm.com'

player_names=['Ante Žižić','Anžejs Pasečņiks', 'Dario Šarić', 'Dāvis Bertāns', 'Jakob Pöltl']

# Empty DataFrame
result = pd.DataFrame()

for name in player_names:
    url = f'{base_url}/search?q={name.replace(" ", "+")}' 
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    if url == response.url:
        # Get all NBA players
        for player in soup.select('.tablesaw tr:has(a[href*="/nba/teams/"]) a[href*="/player/"]'): 
            response = requests.get(base_url + player['href'])
            player_soup = BeautifulSoup(response.content, 'lxml') 
            player_data = get_player_stats(search_name=player.text, real_name=name, player_soup=player_soup) 
            result = result.append(player_data, sort=False).reset_index(drop=True)
    else:
        player_data = get_player_stats(search_name=name, real_name=name, player_soup=soup) 
        result = result.append(player_data, sort=False).reset_index(drop=True) 

Answers

python-slugify can handle the spaces and the unicode characters. Then since you're dealing with a search string, just convert the - to + with a simple replace('-', '+').

from slugify import slugify

base_url = "https://basketball.realgm.com"

player_names = [
    "Ante Žižić",
    "Anžejs Pasečņiks",
    "Dario Šarić",
    "Dāvis Bertāns",
    "Jakob Pöltl",
]

for name in player_names:
    url = f"{base_url}/search?q={slugify(name).replace('-', '+')}"
    print(url)

Output:

https://basketball.realgm.com/search?q=ante+zizic
https://basketball.realgm.com/search?q=anzejs+pasecniks
https://basketball.realgm.com/search?q=dario+saric
https://basketball.realgm.com/search?q=davis+bertans
https://basketball.realgm.com/search?q=jakob+poltl

Granted, the unidecode module the others have mentioned will work as well.

from unidecode import unidecode

for name in player_names:
    url = f"{base_url}/search?q={unidecode(name).replace(' ', '+')}"
    print(url)

The URL doesn't seem to care if you have lower or title case for the names.

https://basketball.realgm.com/search?q=Ante+Zizic
https://basketball.realgm.com/search?q=Anzejs+Pasecniks
https://basketball.realgm.com/search?q=Dario+Saric
https://basketball.realgm.com/search?q=Davis+Bertans
https://basketball.realgm.com/search?q=Jakob+Poltl

Here's the links so you can validate that it's working.

  • https://basketball.realgm.com/search?q=ante+zizic
  • https://basketball.realgm.com/search?q=anzejs+pasecniks
  • https://basketball.realgm.com/search?q=dario+saric
  • https://basketball.realgm.com/search?q=davis+bertans
  • https://basketball.realgm.com/search?q=jakob+poltl
Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐