别再死记硬背了！用Python代码和可视化动画，5分钟搞懂MCMC采样到底在干啥

金七言

460人浏览 · 2026-05-26 12:16:20

金七言 · 2026-05-26 12:16:20 发布

用Python动画和代码拆解MCMC：一场拒绝与接受的采样游戏

当我在咖啡厅第一次听到邻桌讨论MCMC时，他们面前摊开的数学公式让我想起大学时被概率论支配的恐惧。直到后来在生物信息学实验室，看到同事用几行Python代码和动态图表就讲清楚了MCMC的核心思想，我才恍然大悟——原来理解复杂算法可以如此直观。本文将带你用完全不同的方式认识这个在深度学习、推荐系统中广泛使用的采样技术。

1. 从赌场到实验室：蒙特卡洛的魔法

想象你站在拉斯维加斯的轮盘赌桌前，每次下注都是对概率分布的一次采样。蒙特卡洛方法正是得名于此——通过随机采样来近似计算复杂问题的解。让我们用Python模拟这个思想：

import numpy as np
import matplotlib.pyplot as plt

def crude_monte_carlo(f, a, b, n_samples=1000):
    samples = np.random.uniform(a, b, n_samples)
    integral = (b - a) * np.mean(f(samples))
    return integral

# 定义目标函数（这里用sin函数示例）
target_func = np.sin
a, b = 0, np.pi
true_integral = 2  # sin(x)在[0,pi]的积分值是2

# 不同采样次数下的近似结果
sample_sizes = [10, 100, 1000, 10000]
approximations = [crude_monte_carlo(target_func, a, b, n) for n in sample_sizes]

plt.figure(figsize=(10,6))
plt.plot(sample_sizes, approximations, 'o-', label='蒙特卡洛近似')
plt.axhline(true_integral, color='r', linestyle='--', label='真实值')
plt.xscale('log')
plt.xlabel('采样次数')
plt.ylabel('积分近似值')
plt.legend()
plt.show()

这段代码揭示了一个关键现象： 随着采样次数增加，近似值会越来越接近真实积分 。但问题来了——当目标分布不均匀时，均匀采样效率极低。就像在赌场里，某些数字出现的概率天生就比其他数字高。

提示：在Jupyter Notebook中运行上述代码时，尝试修改target_func为其他函数（如lambda x: np.exp(-x**2)），观察不同函数的收敛速度差异。

2. 马尔可夫链：记忆只有一秒的随机漫步者

马尔可夫链的核心特性是"无记忆性"——下一状态只取决于当前状态。用股市状态转换作为例子：

def markov_chain_simulation(transition_matrix, initial_state, n_steps=50):
    states = ['牛市', '熊市', '横盘']
    current_state = initial_state
    history = [current_state]
    
    for _ in range(n_steps):
        probs = transition_matrix[states.index(current_state)]
        current_state = np.random.choice(states, p=probs)
        history.append(current_state)
    
    return history

# 定义状态转移矩阵
transition_matrix = [
    [0.9, 0.075, 0.025],  # 牛市后的转移概率
    [0.15, 0.8, 0.05],    # 熊市后的转移概率
    [0.25, 0.25, 0.5]     # 横盘后的转移概率
]

# 模拟不同初始状态下的演化
plt.figure(figsize=(12,6))
for init_state in ['牛市', '熊市', '横盘']:
    history = markov_chain_simulation(transition_matrix, init_state)
    plt.plot(history, 'o-', label=f'初始状态:{init_state}', alpha=0.7)

plt.yticks([0,1,2], ['牛市', '熊市', '横盘'])
plt.xlabel('时间步')
plt.ylabel('市场状态')
plt.legend()
plt.show()

运行这段代码，你会观察到一个神奇现象： 无论从哪种初始状态开始，经过足够多步后，系统停留在各状态的概率趋于稳定 。这就是所谓的平稳分布——MCMC采样的关键所在。

3. MCMC核心：接受-拒绝的艺术

结合前两部分，MCMC的智慧在于：设计一个马尔可夫链，使其平稳分布就是我们想要采样的目标分布。Metropolis-Hastings算法通过巧妙的"接受-拒绝"机制实现这一点：

def metropolis_hastings(target_dist, proposal_dist, n_samples=5000, burn_in=1000):
    samples = []
    current = proposal_dist()  # 初始样本
    
    for i in range(n_samples + burn_in):
        # 从建议分布生成候选样本
        candidate = current + np.random.normal(0, 1)
        
        # 计算接受概率
        acceptance = min(1, target_dist(candidate) / target_dist(current))
        
        # 决定是否接受
        if np.random.rand() < acceptance:
            current = candidate
            
        if i >= burn_in:  # 跳过初始的燃烧期
            samples.append(current)
    
    return samples

# 定义目标分布（标准正态分布）
target_dist = lambda x: np.exp(-x**2/2)/np.sqrt(2*np.pi)

samples = metropolis_hastings(target_dist, lambda: np.random.normal(0, 1))

# 可视化结果
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.plot(samples[:200])  # 显示前200个样本的随机游走
plt.title('MCMC采样路径')

plt.subplot(1,2,2)
plt.hist(samples, bins=50, density=True, alpha=0.6)
x = np.linspace(-4,4,100)
plt.plot(x, target_dist(x), 'r')  # 真实分布曲线
plt.title('采样分布 vs 真实分布')
plt.show()

这段代码中有几个关键点值得注意：

建议分布 ：这里使用以当前点为中心的正态分布生成候选点
接受概率 ：确保转移概率满足细致平衡条件
燃烧期 ：早期样本可能未达到平稳分布，需要丢弃

注意：尝试修改proposal_dist的方差参数（如改为0.5或2），观察接受率和采样效率的变化。方差过小会导致探索不足，过大则接受率降低。

4. 实战演练：用MCMC估计贝叶斯参数

让我们看一个实际应用场景——贝叶斯参数估计。假设我们有一组观测数据，想要估计其均值参数的后验分布：

# 生成模拟数据
np.random.seed(42)
true_mean = 3.7
observed_data = np.random.normal(true_mean, 1, size=100)

def bayesian_inference_mcmc(data, n_samples=10000):
    # 定义先验和似然
    prior = lambda mu: np.exp(-mu**2/2)  # 标准正态先验
    likelihood = lambda mu: np.prod(np.exp(-(data-mu)**2/2))
    posterior = lambda mu: prior(mu) * likelihood(mu)
    
    # MCMC采样
    samples = []
    current = 0  # 初始值
    
    for _ in range(n_samples):
        candidate = current + np.random.normal(0, 0.5)
        acceptance = min(1, posterior(candidate) / posterior(current))
        
        if np.random.rand() < acceptance:
            current = candidate
        samples.append(current)
    
    return samples

samples = bayesian_inference_mcmc(observed_data)

# 分析结果
burn_in = 1000
posterior_samples = samples[burn_in:]
print(f"后验均值估计: {np.mean(posterior_samples):.2f}")
print(f"95%可信区间: {np.percentile(posterior_samples, [2.5, 97.5])}")

plt.figure(figsize=(10,5))
plt.plot(samples[:500], alpha=0.5)  # 显示前500个样本的收敛过程
plt.axhline(true_mean, color='r', linestyle='--', label='真实均值')
plt.xlabel('迭代次数')
plt.ylabel('参数值')
plt.legend()
plt.show()

这个例子展示了MCMC在贝叶斯统计中的核心价值——当后验分布难以解析计算时，我们可以通过采样来近似。在生物信息学中，这种技术被广泛用于基因表达分析；在推荐系统中，则用于估计用户偏好参数。

5. 性能优化与实用技巧

在实际应用中，我们还需要考虑几个关键问题：

1. 诊断收敛性

def plot_trace_and_acf(samples, true_value=None):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4))
    
    # 轨迹图
    ax1.plot(samples)
    if true_value is not None:
        ax1.axhline(true_value, color='r', linestyle='--')
    ax1.set_title('采样轨迹')
    
    # 自相关图
    from statsmodels.graphics.tsaplots import plot_acf
    plot_acf(samples, lags=50, ax=ax2)
    ax2.set_title('自相关函数')
    plt.show()

plot_trace_and_acf(posterior_samples, true_mean)

2. 调整建议分布 建议分布的宽度显著影响采样效率：

太窄：探索不足，自相关高
太宽：接受率低，效率差

经验法则是调整接受率在20%-50%之间。可以通过以下代码测试：

def test_acceptance_rate(proposal_std):
    samples = metropolis_hastings(target_dist, lambda: 0, 
                                 proposal_dist=lambda x: x + np.random.normal(0, proposal_std))
    accepted = len(set(samples)) / len(samples)
    return accepted

stds = np.linspace(0.1, 3, 20)
accept_rates = [test_acceptance_rate(std) for std in stds]

plt.plot(stds, accept_rates, 'o-')
plt.axhline(0.3, color='r', linestyle='--')
plt.xlabel('建议分布标准差')
plt.ylabel('接受率')
plt.show()

3. 并行链诊断 运行多个独立链检查是否收敛到相同分布：

def run_multiple_chains(n_chains=4, n_samples=2000):
    chains = []
    for _ in range(n_chains):
        initial = np.random.uniform(-10,10)  # 不同初始值
        chain = metropolis_hastings(target_dist, lambda: initial, n_samples=n_samples)
        chains.append(chain)
    
    plt.figure(figsize=(10,6))
    for i, chain in enumerate(chains):
        plt.plot(chain, alpha=0.7, label=f'链 {i+1}')
    plt.title('多条MCMC链的轨迹')
    plt.legend()
    plt.show()
    
    return chains

chains = run_multiple_chains()

在数据科学项目中，MCMC常用于：