How can I extract the estimated home value from a Zillow link? [closed]

Mangs · posted 2022-08-25 07:34:27

Question

I want my code to extract the Zestimate value from this page (in this case, 10,037,774) so I can work with it. How would I go about doing this?

Answers

First, the website returns incomplete data because it detects that the request comes from a Python script. To work around that, send a browser-like User-Agent header so the request looks like it came from a browser.

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'}
r = requests.get('https://www.zillow.com/homes/for_sale/19882656_zpid/34.217551,-118.600674,34.122534,-118.723412_rect/12_zm/1_fr/', headers=headers)

This returns every element available in the page source. However, many elements are generated dynamically with JavaScript, so they do not appear in the page source. The value you want is inside <span class id="yui_3_18_1_2_1523251661826_947">, which is what the developer tools show when you inspect the element.

In the page source, however, this tag looks like this:

<span class=""> $10,037,734 <span class="value-suffix">   </span></span>

So you can't use that id to get the value. Instead, find the <span> tag that contains the text Zestimate with soup.find('span', {'data-target-id': 'zest-tip-hdp'}), then move to the next <span> tag, which holds the value, with find_next('span').

Complete code:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'}
r = requests.get('https://www.zillow.com/homes/for_sale/19882656_zpid/34.217551,-118.600674,34.122534,-118.723412_rect/12_zm/1_fr/', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

zestimate = soup.find('span', {'data-target-id': 'zest-tip-hdp'}).find_next('span').text
print(zestimate)
#  $10,037,734
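
The one-line lookup above raises an AttributeError if the page layout changes and soup.find() returns None. Below is a minimal offline sketch of the same find / find_next approach with a guard for that case. SAMPLE_HTML is a hypothetical, simplified stand-in for the relevant part of the page source (it is not the real Zillow markup), and extract_zestimate is a helper name introduced here only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the relevant part of the page source.
SAMPLE_HTML = """
<div>
  <span data-target-id="zest-tip-hdp">Zestimate</span>
  <span class=""> $10,037,734 <span class="value-suffix">   </span></span>
</div>
"""


def extract_zestimate(html):
    """Return the Zestimate text, or None if the expected tags are absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # The label span is located by its data-target-id attribute.
    label = soup.find('span', {'data-target-id': 'zest-tip-hdp'})
    if label is None:
        return None
    # The value lives in the next <span> in document order.
    value = label.find_next('span')
    if value is None:
        return None
    # strip=True drops the surrounding whitespace and the empty suffix span.
    return value.get_text(strip=True)


print(extract_zestimate(SAMPLE_HTML))
print(extract_zestimate('<html></html>'))
```

Returning None instead of raising lets the caller decide how to handle a page where the Zestimate is missing or the markup has changed.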

There's another way you can get this data. At the top of the page source, there's a tag that looks like

<meta property="zillow_fb:description" content="Zestimate&reg; Home Value: $10,037,734. "/>

You can find the tag using the property attribute and get the value of the content attribute using ['content']. To get the price, do some simple string splitting.

meta = soup.find('meta', property='zillow_fb:description')['content']
print(meta.split(':')[1])
#  $10,037,734.

If you don't want the trailing dot, you can strip it, for example with .strip(' .').
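
Since the goal in the question is to work with the value, it may help to convert the string to a number. The sketch below assumes the price always appears as a dollar sign followed by digits and commas, as in the meta tag shown above. parse_price is a hypothetical helper name, not part of any library.

```python
import re


def parse_price(text):
    """Turn a string like ' $10,037,734. ' into the integer 10037734."""
    # Capture the digits and thousands separators that follow the dollar sign.
    match = re.search(r'\$\s*([\d,]+)', text)
    if match is None:
        raise ValueError(f'no dollar amount found in {text!r}')
    return int(match.group(1).replace(',', ''))


# Content of the meta tag after the HTML entity &reg; has been decoded.
content = 'Zestimate\u00ae Home Value: $10,037,734. '
print(parse_price(content))
```

Because the regular expression searches for the dollar amount directly, it works on both the full content attribute and the already split string, and the trailing dot never needs to be stripped separately.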
