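I'm scraping a website whose data is loaded through an asynchronous request to the server. I replicated the headers and parameters, and they contain no mistakes. With requests I get a normal response, but with scrapy I don't.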

import re

from bs4 import BeautifulSoup
from scrapy import FormRequest


def parse_histical_data(self, response):
    # Spider method; self.his_url and self.download_headers are defined elsewhere in the spider.
    html = BeautifulSoup(response.body, 'lxml')
    # Locate the inline script containing "smlId: <digits>" and pull out the id.
    patterm = re.compile(r'smlId: [0-9]*', re.MULTILINE | re.UNICODE)
    script = html.find('script', text=patterm).text
    smlId_text = patterm.search(script).group()
    smlId = smlId_text.split(' ')[1]
    curr_id = response.meta['pair_id']
    header = html.select('#leftColumn > div.instrumentHeader > h2')[0].string
    # Form fields for the historical-data request.
    st_date = '01/01/2001'
    end_date = '05/07/2050'
    interval_sec = 'Daily'
    sort_col = 'date'
    sort_ord = 'DESC'
    action = 'historical_data'
    # NOTE: the key 'end_state' is kept exactly as posted; the variable name
    # suggests 'end_date' may have been intended.
    data = {'smlID': smlId, 'curr_id': curr_id, 'header': header, 'st_date': st_date, 'end_state': end_date,
            'interval_sec': interval_sec, 'sort_col': sort_col, 'sort_ord': sort_ord, 'action': action}
    head = self.download_headers.copy()
    request = FormRequest(self.his_url, callback=self.parse_histical_data,
                          headers=head, formdata=data)
    yield request
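
As a side note on the smlId extraction in the code above: the pattern `[0-9]*` also matches an empty id, and the split on a space depends on the exact spacing. A minimal sketch of the same step with a capture group follows, assuming the page script contains text like `smlId: 12345`. The helper name `extract_sml_id` and the sample string are made up for illustration.

```python
import re

# Capture the digits directly. \d+ requires at least one digit,
# unlike [0-9]*, which also matches an empty id.
SML_ID_RE = re.compile(r'smlId:\s*(\d+)')


def extract_sml_id(script_text):
    """Return the smlId digits as a string, or None if no id is present."""
    match = SML_ID_RE.search(script_text)
    return match.group(1) if match else None


# Hypothetical script fragment, not taken from the real page.
sample = "var info = { pairId: 1, smlId: 12345 };"
print(extract_sml_id(sample))
```

Returning None instead of raising makes it easier to log a page where the script layout has changed.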

The request URL is 'https://www.investing.com/ins...'. Using exactly the same headers and data, scrapy returns 400.
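
One way to check whether the two requests really are identical is to compare what each client puts on the wire. The sketch below is only a standard-library illustration, not a diagnosis: `diff_headers` and `encode_form` are hypothetical helper names, and the sample header values are invented.

```python
from urllib.parse import urlencode


def diff_headers(expected, actual):
    """Compare two header mappings, ignoring the case of header names.

    Returns {lowercased name: (expected value, actual value)} for every
    header that is missing on one side or whose value differs.
    """
    exp = {str(k).lower(): str(v) for k, v in expected.items()}
    act = {str(k).lower(): str(v) for k, v in actual.items()}
    return {
        name: (exp.get(name), act.get(name))
        for name in sorted(set(exp) | set(act))
        if exp.get(name) != act.get(name)
    }


def encode_form(data):
    """Encode a dict as an application/x-www-form-urlencoded body."""
    return urlencode(data)


# Invented values, standing in for the headers each client actually sent.
sent_by_requests = {'X-Requested-With': 'XMLHttpRequest', 'Accept': '*/*'}
sent_by_scrapy = {'x-requested-with': 'XMLHttpRequest', 'accept': 'text/html'}

print(diff_headers(sent_by_requests, sent_by_scrapy))
print(encode_form({'st_date': '01/01/2001', 'sort_col': 'date'}))
```

In practice the two sides could come from the prepared request in requests (its `.headers` and `.body`) and from `response.request.headers` and `response.request.body` in scrapy. That wiring is left out here, and Scrapy stores header names and values as bytes, so they would need decoding first.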
