Decoding GBK in Python: fixing the Python 3 requests error where GBK-encoded Chinese in a response header makes the request fail
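Symptom:
A response header contains Chinese text encoded as GBK, so requests cannot decode and read the header.
The HTTP packet was shown in a screenshot; the failing request looks like this: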
Python 3.4.3 (default, Aug 25 2017, 16:49:50)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> res = requests.get('http://down.chinaz.com/download.asp?id=35&dp=1&fid=22&f=yes',headers={'Referer':'http://down.chinaz.com/soft/12162.htm'},allow_redirects=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 510, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 655, in send
    r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 125, in resolve_redirects
    url = self.get_redirect_target(resp)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 116, in get_redirect_target
    return to_native_string(location, 'utf8')
  File "/usr/local/lib/python3.4/site-packages/requests/_internal_utils.py", line 25, in to_native_string
    out = string.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 28: invalid continuation byte
>>>
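This makes the request fail outright, even with allow_redirects=False. Searching Google turned up nothing relevant, because most people run into encoding problems in the body of a response that was fetched successfully, whereas here the error is raised while the request itself is being made.
Testing shows that Python 2.7 does not have this problem.
IPython shows that the error comes from this section: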
/usr/local/lib/python3.4/site-packages/requests/sessions.py in get_redirect_target(self, resp)
    114             if is_py3:
    115                 location = location.encode('latin1')
--> 116             return to_native_string(location, 'utf8')
    117             #return location
    118
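Comparing the underlying requests code under Python 2.7 and Python 3.4 shows the difference.
The code in requests under Python 3.4 that reads the response Location:
It decodes everything as UTF-8 by default.
The Python 2.7 code:
Now look at the get_redirect_target function: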
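This more or less confirms the cause: under Python 3.4, requests decodes the Location header as UTF-8 by default, so if the Location contains Chinese text encoded as GBK, it raises the error shown at the start of this post.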
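The diagnosis above can be reproduced without any network access. The following is a minimal sketch using only the standard library; the URL is a made-up example (not the real header returned by the server in this post), and the steps mirror what the traceback suggests requests does rather than calling requests itself.

```python
# Hypothetical redirect target containing Chinese text; an assumption for
# illustration, not the actual Location value from the post.
url = "http://example.com/down/软件.zip"

# The server sends the header value as GBK bytes.
wire_bytes = url.encode("gbk")

# On Python 3 the HTTP layer decodes header bytes as latin-1, so the
# caller first sees a str of mojibake.
header_value = wire_bytes.decode("latin1")

# Re-encoding as latin-1 (as in sessions.py line 115) recovers the raw bytes.
recovered = header_value.encode("latin1")
assert recovered == wire_bytes

# Decoding those bytes as UTF-8 (line 116) fails for this GBK input.
try:
    recovered.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 decode failed:", exc.reason)

# Decoding with the encoding the server actually used succeeds.
decoded = recovered.decode("gbk")
print(decoded)
```

The latin-1 round trip is lossless because latin-1 maps every byte value to exactly one code point, which is why the failure only shows up at the UTF-8 step.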
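A temporary workaround is to change utf-8 to GBK in that call. The following two places also need to be changed; they are used when requesting the Location URL: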
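Hard-coding GBK in site-packages breaks servers that send UTF-8, so a fallback may be preferable. The sketch below is one possible approach under stated assumptions, not the patch from this post: `decode_location` and `quote_location` are hypothetical helper names, the candidate encodings are a guess at what a Chinese site might send, and the URL is invented. UTF-8 is tried first because GBK accepts most byte sequences and would mask a real UTF-8 header.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def decode_location(raw, encodings=("utf-8", "gbk")):
    """Decode a Location value that arrived as a latin-1-decoded str.

    Returns (text, encoding_used). Each candidate is tried in order.
    """
    data = raw.encode("latin1")
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing matched: return the latin-1 text instead of raising.
    return raw, "latin1"


def quote_location(url, encoding):
    """Percent-encode path and query using the server's own encoding."""
    parts = urlsplit(url)
    # "%" stays safe so existing percent-escapes are not double-encoded.
    path = quote(parts.path, safe="/%", encoding=encoding)
    query = quote(parts.query, safe="=&%", encoding=encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Simulate a GBK Location header as Python 3 would hand it over.
raw = "http://example.com/down/软件.zip".encode("gbk").decode("latin1")
url, enc = decode_location(raw)
print(url, enc)
print(quote_location(url, enc))
```

Percent-encoding with the detected encoding matters because a server that emits GBK in headers will probably expect GBK-encoded bytes in the follow-up request path as well. One way to use the helpers without editing the installed library might be to subclass requests.Session and override get_redirect_target so that it calls decode_location; that wiring is not tested here.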