[urllib: a Python library for network requests!] The standard library's handy web tool for doing HTTP with zero dependencies

代码小书生

代码小书生 · Published 2026-05-13 05:30:00

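In Python development, making network requests is a required course for every programmer. When HTTP client libraries come up, many people think of requests first, but urllib, the veteran that ships with the Python standard library, is equally worth knowing well. urllib is a package that bundles several submodules for sending requests, parsing URLs, and handling errors. It has been part of Python for more than twenty years, it is stable and reliable, and it works out of the box in any Python environment: no pip install, no dependency conflicts to worry about.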

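In practice urllib turns up almost everywhere: desktop software that checks for new versions by calling GitHub's latest release API; a stock helper that polls real-time quotes on a schedule; automation scripts that fetch the weather forecast and push it to your phone; even ops cron jobs on old servers where security restrictions rule out third-party packages, leaving urllib as the only option. Its core value is zero dependencies, cross-platform support, and broad protocol coverage (HTTP, HTTPS, FTP, file, and more). Its API is somewhat more verbose than that of requests, but once you have it down you will find that your understanding of HTTP has deepened.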

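urllib is made up of four submodules: urllib.request (opening and reading URLs), urllib.parse (parsing and building URLs), urllib.error (exception classes), and urllib.robotparser (parsing robots.txt). This article focuses on the two used most, request and parse, and takes you from beginner to expert.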
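To make the division of labor concrete, here is a minimal offline sketch of what urllib.parse does with a URL. The example.com URL and its query string are made up for illustration only.

```python
from urllib.parse import urlsplit, parse_qs, urljoin

# Hypothetical URL, used only to show the pieces urllib.parse extracts
url = 'https://example.com/search/results?q=python&page=2#top'

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /search/results
print(parts.fragment)  # top

# parse_qs decodes the query string into a dict of lists
query = parse_qs(parts.query)
print(query)           # {'q': ['python'], 'page': ['2']}

# urljoin resolves a relative link against a base URL
next_page = urljoin(url, 'results?page=3')
print(next_page)       # https://example.com/search/results?page=3
```

Nothing here touches the network, so it is a safe way to experiment before sending real requests.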

I. Installing the Library

Because urllib is part of the Python standard library, there is nothing to install. As long as your Python environment works, you can import it directly:

python

import urllib.request
import urllib.parse
import urllib.error
print("urllib is ready")

II. Basic Usage: Four Steps to Your First Network Request

1. Send a GET request and read the content

The simplest case: fetching the raw content behind a URL, such as the HTML source of a web page.

python

import urllib.request

url = 'https://httpbin.org/get'
with urllib.request.urlopen(url) as response:
    html = response.read().decode('utf-8')
    print(html[:200])   # print the first 200 characters

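urlopen returns a file-like object that supports read(), readline(), and similar methods. Note that redirects are followed by default, and error status codes (4xx and 5xx) raise HTTPError; success codes in the 2xx range other than 200 do not.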
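The example above hard-codes UTF-8 when decoding. A slightly more careful sketch reads the charset the server declares in Content-Type and falls back to a default. The helper name read_text and its parameters are assumptions made for this illustration, and a data: URL stands in for a real server so the demo runs offline.

```python
import urllib.request

def read_text(url, default_charset='utf-8', timeout=10):
    """Fetch a URL and decode the body using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # resp.headers is an email.message.Message; get_content_charset()
        # returns the charset from Content-Type, or None if it is absent
        charset = resp.headers.get_content_charset() or default_charset
        return resp.read().decode(charset)

# A data: URL keeps this demo offline; an http(s) URL is handled the same way
print(read_text('data:text/plain;charset=iso-8859-1,caf%E9'))  # café
```

For pages that declare their encoding only inside the HTML (a meta tag), this header-based approach will still fall back to the default.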

2. Send a GET request with query parameters (URL encoding)

To pass parameters in the URL (a search keyword, for example), encode them with urllib.parse.

python

import urllib.parse
import urllib.request

params = {'q': 'Python urllib', 'page': 1}
query_string = urllib.parse.urlencode(params)
full_url = 'https://httpbin.org/get?' + query_string

with urllib.request.urlopen(full_url) as resp:
    data = resp.read().decode()
    print(data)

urlencode turns the dictionary into the form q=Python+urllib&page=1, handling spaces and special characters automatically.
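A few urlencode variations are worth seeing side by side. This is a small offline sketch; the parameter names (tag, city) are made up for the example.

```python
from urllib.parse import urlencode, quote

# Default: spaces become '+', as in HTML form encoding
print(urlencode({'q': 'Python urllib', 'page': 1}))        # q=Python+urllib&page=1

# quote_via=quote encodes spaces as %20 instead
print(urlencode({'q': 'Python urllib'}, quote_via=quote))  # q=Python%20urllib

# doseq=True expands a list value into repeated keys
print(urlencode({'tag': ['http', 'stdlib']}, doseq=True))  # tag=http&tag=stdlib

# Non-ASCII text is encoded as UTF-8 and then percent-escaped
print(urlencode({'city': '北京'}))                          # city=%E5%8C%97%E4%BA%AC
```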

3. Send a POST request (submitting form data)

python

import urllib.parse
import urllib.request

post_data = {'username': 'alice', 'password': '123456'}
data_bytes = urllib.parse.urlencode(post_data).encode('utf-8')

req = urllib.request.Request('https://httpbin.org/post', data=data_bytes, method='POST')
with urllib.request.urlopen(req) as resp:
    result = resp.read().decode()
    print(result)

Note: the data argument must be bytes. When data is provided the method defaults to POST, but specifying method='POST' explicitly is clearer.
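Many APIs expect a JSON body rather than form fields. The sketch below shows one way to build such a request; the helper name build_json_request is an assumption for illustration, and the request is only constructed and inspected here, not sent.

```python
import json
import urllib.request

def build_json_request(url, payload):
    """Build a POST request whose body is the JSON encoding of payload."""
    body = json.dumps(payload).encode('utf-8')   # data must be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_json_request('https://httpbin.org/post', {'username': 'alice'})
print(req.get_method())                 # POST
print(req.get_header('Content-type'))   # application/json
print(req.data)                         # b'{"username": "alice"}'

# Sending it works exactly like the form example:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Request stores header names in capitalized form, which is why the lookup uses 'Content-type' rather than 'Content-Type'.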

4. Add request headers (presenting a browser-like User-Agent)

Many sites reject the default User-Agent (such as Python-urllib/3.x), so you need to present a browser-like one.

python

import urllib.request

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
req = urllib.request.Request('https://httpbin.org/headers', headers=headers)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
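If every request should carry the same header, attaching it to an opener avoids repeating it on each Request. This is a minimal sketch reusing the User-Agent string from the example above; the actual network call is left commented out.

```python
import urllib.request

# Same browser-like User-Agent string as in the example above
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'

opener = urllib.request.build_opener()
# addheaders is a list of (name, value) pairs sent with every request
opener.addheaders = [('User-Agent', USER_AGENT)]

# Requests made through this opener carry the header automatically:
# with opener.open('https://httpbin.org/headers') as resp:
#     print(resp.read().decode())
```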

III. Advanced Usage: Taking Control of More Protocol Details

1. Handle exceptions (network errors, HTTP errors)

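Robust code has to distinguish between kinds of failure. HTTPError is a subclass of URLError, so catch HTTPError first; otherwise the URLError branch would swallow it.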

python

import urllib.request
import urllib.error

url = 'https://httpbin.org/status/404'
try:
    with urllib.request.urlopen(url) as resp:
        print(resp.read())
except urllib.error.HTTPError as e:
    print(f'HTTP error: {e.code} - {e.reason}')
except urllib.error.URLError as e:
    print(f'Network error: {e.reason}')
except Exception as e:
    print(f'Other exception: {e}')
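An HTTPError also behaves like a response object, so the error body the server sent can still be read. The sketch below illustrates this with hand-built exceptions so it runs offline; the classify helper and the example.com URL are assumptions made for the demo.

```python
import io
import urllib.error
from email.message import Message

def classify(exc):
    """Return a short description for an exception raised by urlopen."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server's error page is readable just like a normal response
        body = exc.read().decode('utf-8', errors='replace')
        return f'HTTP {exc.code}: {body[:50]}'
    if isinstance(exc, urllib.error.URLError):
        return f'network: {exc.reason}'
    return f'other: {exc}'

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))   # True

# Hand-built exceptions keep the demo offline
http_err = urllib.error.HTTPError(
    'https://example.com/missing', 404, 'Not Found',
    Message(), io.BytesIO(b'no such page'))
print(classify(http_err))                                      # HTTP 404: no such page
print(classify(urllib.error.URLError('connection refused')))   # network: connection refused
```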

2. Set a timeout and retry

On weak networks, timeouts and retries are essential.

python

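import urllib.request
import urllib.error
import time

def fetch_with_retry(url, max_retries=3, timeout=5):
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read().decode()
        except OSError as e:
            # URLError subclasses OSError, and a timeout while waiting for
            # the response can surface as a bare TimeoutError, so catch both
            print(f'Attempt {attempt + 1} failed: {getattr(e, "reason", e)}')
            if attempt < max_retries - 1:
                time.sleep(2)   # pause before the next attempt
    raise RuntimeError(f'Exceeded the maximum of {max_retries} retries')

# /delay/3 answers after 3 seconds, so with a 2-second timeout every
# attempt fails and this call ends by raising RuntimeError
data = fetch_with_retry('https://httpbin.org/delay/3', timeout=2)
print(data)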
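The fixed two-second pause can be swapped for exponential backoff. The following is one possible sketch, not the only way to do it: the retry helper, its parameters, and the choice to stop retrying on 4xx responses are all assumptions made for this example. The sleep function is injectable so the logic can be exercised without waiting.

```python
import time
import urllib.error

def retry(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return func()
        except urllib.error.HTTPError as e:
            # A 4xx response will not improve on retry; give up immediately
            if e.code < 500:
                raise
            last_error = e
        except OSError as e:   # URLError and timeouts
            last_error = e
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...
    raise RuntimeError(f'gave up after {max_retries} attempts') from last_error

# Usage sketch (network call, left commented out):
# import urllib.request
# body = retry(lambda: urllib.request.urlopen('https://httpbin.org/get', timeout=5).read())
```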

3. Use a proxy (getting around network restrictions)

python

import urllib.request

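# Assumes a local HTTP proxy listening on port 8080; by convention the proxy
# URL uses http:// even for the 'https' entry
proxy_handler = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'})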
opener = urllib.request.build_opener(proxy_handler)
urllib.request.install_opener(opener)

response = urllib.request.urlopen('https://httpbin.org/ip')
print(response.read().decode())
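install_opener changes the behavior of every later urlopen call in the process. When only some requests should go through the proxy, one option is to keep the opener local and call its open method directly. The helper name make_proxy_opener and the proxy address are assumptions for this sketch, and the network call is commented out.

```python
import urllib.request

def make_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)

# Hypothetical local proxy address
opener = make_proxy_opener('http://127.0.0.1:8080')

# Only requests made through this opener use the proxy:
# with opener.open('https://httpbin.org/ip', timeout=10) as resp:
#     print(resp.read().decode())
```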

4. Handle cookies (keeping a session)

python

import urllib.request
from http.cookiejar import CookieJar

cookie_jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
urllib.request.install_opener(opener)

# First request: the server sets a cookie
resp1 = urllib.request.urlopen('https://httpbin.org/cookies/set?name=value')
# Second request: the cookie is sent automatically
resp2 = urllib.request.urlopen('https://httpbin.org/cookies')
print(resp2.read().decode())

IV. Real-World Scenarios: Three In-Depth Examples for Everyday Use

Scenario 1: Look up the current weather (using a public API)

Fetch the weather for any city from a free weather API: send the request with urllib, then parse the JSON response.

python

import urllib.request
import urllib.parse
import json

def get_weather(city_name):
    base_url = 'https://wttr.in/'
    encoded_city = urllib.parse.quote(city_name)
    url = f'{base_url}{encoded_city}?format=j1'   # j1 selects JSON output
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.loads(resp.read().decode())
            current = data['current_condition'][0]
            temp = current['temp_C']
            desc = current['weatherDesc'][0]['value']
            return f'{city_name}: current temperature {temp}°C, weather: {desc}'
    except Exception as e:
        return f'Lookup failed: {e}'

print(get_weather('Beijing'))
print(get_weather('New York'))

This example combines URL encoding, exception handling, and JSON parsing.
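The quote call is what makes a city name like New York safe to place in the URL path. A short offline sketch of how quote compares with its siblings:

```python
from urllib.parse import quote, quote_plus, unquote

print(quote('New York'))          # New%20York
print(quote_plus('New York'))     # New+York
print(quote('a/b c'))             # a/b%20c   ('/' is treated as safe by default)
print(quote('a/b c', safe=''))    # a%2Fb%20c
print(unquote('New%20York'))      # New York
```

As a rule of thumb, quote suits path segments while quote_plus matches the form-style encoding used in query strings.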

Scenario 2: Monitor a website for changes (price-drop alerts)

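Suppose you want to watch the price of a product on an e-commerce site (simulated here). Fetch the page at regular intervals and compare it with the previous fetch; if the content has changed and the target price string appears, print an alert (or send an email).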

python

import urllib.request
import time
import hashlib

def get_page_hash(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            content = resp.read()
            return hashlib.md5(content).hexdigest(), content
    except Exception:
        return None, None

def monitor_website(url, interval=60):
    last_hash, _ = get_page_hash(url)
    print(f'Monitoring {url} every {interval} seconds')
    while True:
        time.sleep(interval)
        new_hash, content = get_page_hash(url)
        if new_hash is None:
            print('Fetch failed, skipping this round')
            continue
        if new_hash != last_hash:
            print('🔔 Page change detected!')
            # Analyze content further here, for example by looking for price digits
            if b'299' in content:   # assume the price drops to 299 (this also matches 1299)
                print('Price-drop alert! Buy now!')
            last_hash = new_hash
        else:
            print(f'{time.ctime()} no change')

monitor_website('https://httpbin.org/html', interval=30)  # replace with the real product page
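The substring test b'299' in content is crude: it also fires on 1299. A slightly sturdier sketch pulls out numbers that follow a currency symbol and compares them with a threshold. The page format, the regular expression, and the helper names here are all assumptions for illustration; a real product page will need its own pattern.

```python
import re

# Assumed format: a currency symbol (¥ or $) followed by a number
PRICE_RE = re.compile(r'[¥$]\s*(\d+(?:\.\d{1,2})?)')

def extract_prices(content, encoding='utf-8'):
    """Return every price found in the raw page bytes as a float."""
    html = content.decode(encoding, errors='replace')
    return [float(m) for m in PRICE_RE.findall(html)]

def price_dropped(content, threshold):
    """True if the lowest price on the page is at or below threshold."""
    prices = extract_prices(content)
    return bool(prices) and min(prices) <= threshold

# Made-up page fragment
sample = '<span class="price">¥299.00</span> <s>¥1299</s>'.encode('utf-8')
print(extract_prices(sample))        # [299.0, 1299.0]
print(price_dropped(sample, 300))    # True
```

Inside monitor_website, the check could then become price_dropped(content, 300) in place of the substring test.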

Scenario 3: Download and save images automatically (scraping basics)

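Fetch the Bing wallpaper or any image you choose every day and save it locally, for example to rotate your desktop wallpaper automatically.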

python

import urllib.request
import os
from datetime import datetime

def download_image(url, save_dir='images'):
    os.makedirs(save_dir, exist_ok=True)
    filename = os.path.join(save_dir, f'{datetime.now().strftime("%Y%m%d_%H%M%S")}.jpg')
    try:
        urllib.request.urlretrieve(url, filename)
        print(f'Image saved to {filename}')
        return filename
    except Exception as e:
        print(f'Download failed: {e}')
        return None

# Example: download a random image from a placeholder-image site
image_url = 'https://picsum.photos/800/600'
download_image(image_url)

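This example uses the urlretrieve convenience function, which suits saving a file directly. Note that the Python documentation lists urlretrieve under its legacy interface and says it might be deprecated in the future.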
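An alternative that avoids urlretrieve is to stream the response into a file with shutil.copyfileobj, which also lets you set a timeout. This is a minimal sketch; the download helper and its parameters are assumptions for illustration, and a data: URL is used so the demo runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download(url, dest, timeout=10):
    """Stream the body of url into the file dest and return its path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, 'wb') as out:
        # Copies in chunks, so large files are never held fully in memory
        shutil.copyfileobj(resp, out)
    return dest

# Offline demo; for a real image pass an https:// URL instead
target = Path(tempfile.mkdtemp()) / 'images' / 'demo.bin'
saved = download('data:,hello', target)
print(saved.read_bytes())   # b'hello'
```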

V. Wrap-Up: Summary and Discussion

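urllib lacks the syntactic sugar of third-party libraries, but it offers rock-solid stability and the rare quality of zero dependencies. For beginners, it forces you to understand core concepts such as URL encoding, header construction, and the exception hierarchy; once you are fluent, reading the requests source becomes far easier to follow. In lightweight settings such as microservices and cloud functions, pulling in requests can add an extra dependency layer and even lengthen cold-start time, while urllib is always the old friend that runs right away.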

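Congratulations, you now have the full toolkit for making network requests with the Python standard library. So here is the question: how would you use urllib to improve your daily work or life? Maybe a script that fetches the tech blogs you follow every morning and emails you a digest, or a price tracker that watches an item you want and sends a WeChat notification (via Server酱) the moment the price drops. Share your ideas or code snippets in the comments. If you run into the headache-inducing certificate problem urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]>, leave a comment as well and we can work out the cleanest workaround or fix together. Hands-on practice is the best way to learn, so go write your first urllib scraper!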