I was wondering how I could extract just the text from this tag on this website: https://ru.thefreedictionary.com/%d1%88%d1%87%d0%be

<div id="MainTxt">


            Слово в словаре не найдено.
 <div id="didYouMean"></div>Быть может, вы искали:
<div style="margin:6px 0 3px 0">

The code I'm using gets everything under the id tag, but I'm looking only to get the text 'Слово в словаре не найдено.'

soup.findAll("div", attrs = {"id": ["MainTxt"]})

Thank you for any help!

Answers

First of all, there is no need to use findAll() with an id attribute: an id should be unique within an HTML document, so findAll() will only ever return a list of one element. Use find() instead. Here is how you could solve your problem:

match = soup.find('div', {'id': 'MainTxt'})
text = match.text.rstrip().lstrip().split('\n')

rstrip() and lstrip() remove trailing and leading whitespace, respectively (strip() does both in one call). Now text is a list of lines: ['Слово в словаре не найдено.\r', ' Быть может, вы искали:\r', '', ...]. From there, getting your target string is easy:

target_string = text[0].replace('\r', '')
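
For anyone who wants to try this without fetching the page, here is a minimal self-contained sketch. It assumes the bs4 package is installed and uses an inline copy of the fragment from the question. The closing tags are added here so the fragment parses on its own; the real page may be structured differently. It shows two ways to get the first line: the split approach above, and Beautiful Soup's stripped_strings generator, which yields each non-empty text fragment with surrounding whitespace already removed.

```python
from bs4 import BeautifulSoup

# Inline copy of the fragment from the question. The closing </div> tags
# are an assumption added so that the snippet is well-formed by itself.
html = """
<div id="MainTxt">


            Слово в словаре не найдено.
 <div id="didYouMean"></div>Быть может, вы искали:
<div style="margin:6px 0 3px 0"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find(id="MainTxt")

# Approach 1: take all the text, trim it, and keep only the first line.
# splitlines() handles both \n and \r\n, so no replace('\r', '') is needed.
first_line = main.get_text().strip().splitlines()[0].strip()

# Approach 2: stripped_strings skips whitespace-only fragments and trims
# the rest, so the first item is the first piece of visible text.
first_fragment = next(main.stripped_strings, None)

print(first_line)
print(first_fragment)
```

Both variables should hold the same string for this fragment. The second approach avoids depending on where the line breaks happen to fall in the page source.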