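For work I recently looked into using mitmproxy as a scraping proxy and ran into a fairly typical problem: while scraping, the upstream (second-level) proxy IP has to be rotated periodically (if you know, you know =.=).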

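Everything I could find online dates from 2018 and stopped working long ago. I found an up-to-date solution on the official mitmproxy GitHub and am sharing it here for anyone who needs it.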

from mitmproxy import http
from mitmproxy.connection import Server
from mitmproxy.net.server_spec import ServerSpec


def request(flow: http.HTTPFlow) -> None:
    # proxy_address() is a helper you define yourself; it returns the
    # (host, port) of the upstream proxy this request should use.
    address = proxy_address(flow)

    is_proxy_change = address != flow.server_conn.via.address
    server_connection_already_open = flow.server_conn.timestamp_start is not None
    if is_proxy_change and server_connection_already_open:
        # server_conn already refers to an existing connection (which cannot be modified),
        # so we need to replace it with a new server connection object.
        flow.server_conn = Server(flow.server_conn.address)
    flow.server_conn.via = ServerSpec("http", address)
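
The handler above calls `proxy_address(flow)` without showing it. Here is a minimal sketch of one way to write it, assuming a fixed pool of upstream proxies that rotates on a timer. The `ProxyRotator` class, the pool addresses and the 60-second interval are all hypothetical and do not come from the original discussion.

```python
import itertools
import time
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


class ProxyRotator:
    """Cycle through a pool of (host, port) upstream proxies on a timer."""

    def __init__(
        self,
        pool: Iterable[Address],
        interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        addresses = list(pool)
        if not addresses:
            raise ValueError("proxy pool must not be empty")
        self._cycle = itertools.cycle(addresses)
        self._interval = interval
        self._clock = clock
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self) -> Address:
        """Return the active proxy, moving to the next one once the interval has elapsed."""
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current


# Hypothetical pool; replace with your real upstream proxies.
rotator = ProxyRotator([("127.0.0.1", 8081), ("127.0.0.1", 8082)], interval=60.0)


def proxy_address(flow=None) -> Address:
    """Stand-in for the helper used by request(); this sketch ignores the flow."""
    return rotator.current()
```

Because the clock is injectable, the rotation logic can be checked without waiting for real time to pass. In a real addon the pool would more likely be refreshed from your proxy provider's API.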

PS: to enable the upstream proxy, add these options when starting mitmproxy:

# Usage: mitmdump
#   -s change_upstream_proxy.py
#   --mode upstream:http://default-upstream-proxy:8080/
#   --set connection_strategy=lazy
#   --set upstream_cert=false
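
If you launch mitmdump from a Python supervisor script, the options above can be assembled as an argument list. This is only a sketch: the `mitmdump_command` helper is hypothetical, and the script name and default upstream URL are the placeholders from the usage comment.

```python
import shlex
from typing import List


def mitmdump_command(script: str, default_upstream: str) -> List[str]:
    """Build the mitmdump argv from the options listed in the usage comment."""
    return [
        "mitmdump",
        "-s", script,
        "--mode", f"upstream:{default_upstream}",
        "--set", "connection_strategy=lazy",
        "--set", "upstream_cert=false",
    ]


argv = mitmdump_command("change_upstream_proxy.py", "http://default-upstream-proxy:8080/")
# Print the shell-quoted command for inspection.
print(shlex.join(argv))
# To actually start it: subprocess.run(argv, check=True)
```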

Original source: https://github.com/mitmproxy/mitmproxy/discussions/5173

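There is one more problem. The approach above only changes the upstream proxy for the current request; it does not update the running mitmproxy instance's own configuration (the mode argument given at startup). As a result, every request still goes to the old proxy first, which shows up as long request times, 502 errors and similar problems.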

After reading the source, I added a step that also updates the service configuration, which solves the problem:

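# Excerpt from the body of request(flow); also requires `from mitmproxy import ctx`.
# proxy_address is the new (host, port) tuple and proxyinfo is presumably the matching
# proxy URL string; both are set earlier in the script (not shown).
is_proxy_change = proxy_address != flow.server_conn.via.address
server_connection_already_open = flow.server_conn.timestamp_start is not None
if is_proxy_change and server_connection_already_open:
    # server_conn already refers to an existing connection (which cannot be modified),
    # so we need to replace it with a new server connection object.
    flow.server_conn = Server(flow.server_conn.address)

if is_proxy_change:
    print("old proxy " + str(flow.server_conn.via.address) + " | new proxy " + str(proxy_address))
    flow.server_conn.via = ServerSpec('http', proxy_address)

    mode_option = {'mode': 'upstream:' + proxyinfo}
    # getServer() is the author's own helper (not shown); its result is unused in this excerpt.
    server = getServer()
    # Update the proxy setting of the running instance.
    # This assumes a mitmproxy version where the mode option is a plain string;
    # releases where mode is a sequence may expect a list instead.
    print("runtime mode before update: " + str(ctx.master.options.mode))
    ctx.master.options.update(**mode_option)
    print("runtime mode after update: " + str(ctx.master.options.mode))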
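
The snippet above keeps two representations of the same proxy: a `(host, port)` tuple for `ServerSpec` and a string for the `mode` option. Here is a small sketch of deriving one from the other so they cannot drift apart. The `upstream_mode` and `address_from_mode` helpers are hypothetical, and the format `upstream:http://host:port` is inferred from the `--mode` flag shown earlier.

```python
from typing import Tuple
from urllib.parse import urlsplit

Address = Tuple[str, int]
_PREFIX = "upstream:"


def upstream_mode(address: Address, scheme: str = "http") -> str:
    """Format a (host, port) tuple as an upstream mode string."""
    host, port = address
    return f"{_PREFIX}{scheme}://{host}:{port}"


def address_from_mode(mode: str) -> Address:
    """Extract (host, port) from a string such as 'upstream:http://host:8080/'."""
    if not mode.startswith(_PREFIX):
        raise ValueError(f"not an upstream mode: {mode!r}")
    parts = urlsplit(mode[len(_PREFIX):])
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"mode has no host:port: {mode!r}")
    return parts.hostname, parts.port
```

With helpers like these, the handler could compare `address_from_mode(...)` of the current mode against the new address and pass `upstream_mode(new_address)` to the options update. IPv6 literals are not handled in this sketch.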