How can I extract the text under an HTML div with a given id in Python?
I was wondering how I would be able to extract the text from this tag from this website: https://ru.thefreedictionary.com/%d1%88%d1%87%d0%be
<div id="MainTxt">
Слово в словаре не найдено.
<div id="didYouMean"></div>Быть может, вы искали:
<div style="margin:6px 0 3px 0">
The code I'm using gets everything under the id tag, but I'm looking only to get the text 'Слово в словаре не найдено.'
soup.findAll("div", attrs = {"id": ["MainTxt"]})
Thank you for any help!
Answers
First of all, there is no need to combine findAll() with an id attribute: an id should be unique within an HTML document, so findAll() would return a list with at most one element. Use find() instead, which returns the element itself. Here is how you could solve your problem.
match = soup.find('div', {'id': 'MainTxt'})
text = match.text.strip().split('\n')
strip() removes leading and trailing whitespace from the string (it is equivalent to calling rstrip() and lstrip() together). Now text is a list of lines: ['Слово в словаре не найдено.\r', ' Быть может, вы искали:\r', '', ...]. The target string is the first element; you only need to drop the trailing carriage return.
target_string = text[0].replace('\r', '')
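As an alternative to splitting on newlines, here is a minimal sketch that takes the first non-empty text fragment inside the div through BeautifulSoup's stripped_strings generator. The html string is a trimmed copy of the markup from the question, with closing tags added as an assumption so that it parses on its own. On the real page you would build soup from the downloaded HTML instead.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
# The closing </div> tags are an assumption added so the snippet is self-contained.
html = """
<div id="MainTxt">
Слово в словаре не найдено.
<div id="didYouMean"></div>Быть может, вы искали:
<div style="margin:6px 0 3px 0"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", id="MainTxt")

# stripped_strings yields each text fragment in document order,
# with surrounding whitespace (including '\r' and '\n') removed.
# The default of None avoids StopIteration if the div has no text.
first_text = next(main.stripped_strings, None)
print(first_text)
```

This relies on the wanted sentence being the first text node inside the div. If the page layout changes, the index-based approach above would need the same adjustment.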