[RL 14] QMIX (ICML, 2018, Oxford)

xyp99 · published 2021-01-17 16:23:22

Paper: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Background

Same as VDN.

4. QMIX

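Assume Qtot and the per-agent Qi satisfy the relation in Eq. (4):

$$\operatorname*{argmax}_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = \begin{pmatrix} \operatorname*{argmax}_{u^1} Q_1(\tau^1, u^1) \\ \vdots \\ \operatorname*{argmax}_{u^n} Q_n(\tau^n, u^n) \end{pmatrix} \tag{4}$$

Eq. (4) can be guaranteed by the monotonicity constraint in Eq. (5):

$$\frac{\partial Q_{tot}}{\partial Q_i} \geq 0, \quad \forall i \in \{1, \dots, n\} \tag{5}$$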
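
A quick numerical sanity check of why monotonicity is enough for the argmax consistency. This is only a sketch: `mix` is a hypothetical toy mixer (positive weights followed by an ELU-style increasing activation), not the paper's mixing network, and all names and sizes are made up for illustration. With any such monotonic mixer, letting each agent greedily maximise its own Qi should give the same joint action as a brute-force search over Qtot.

```python
import itertools
import math
import random


def elu(x):
    # Strictly increasing non-linearity.
    return x if x > 0 else math.expm1(x)


def mix(qs, weights, bias):
    # Hypothetical monotonic mixer: positive weights, increasing activation.
    return elu(sum(w * q for w, q in zip(weights, qs)) + bias)


def greedy_joint_action(q_tables):
    # Each agent independently picks the argmax of its own Q_i.
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def brute_force_joint_action(q_tables, weights, bias):
    # Enumerate every joint action and keep the one maximising Q_tot.
    joint_actions = itertools.product(*(range(len(q)) for q in q_tables))
    return max(
        joint_actions,
        key=lambda u: mix([q[a] for q, a in zip(q_tables, u)], weights, bias),
    )


rng = random.Random(0)
# Illustrative setup: 3 agents with 4 actions each.
q_tables = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
weights = [rng.uniform(0.1, 1.0) for _ in range(3)]  # strictly positive
bias = rng.uniform(-1, 1)  # the bias may have any sign

decentralised = greedy_joint_action(q_tables)
centralised = brute_force_joint_action(q_tables, weights, bias)
print(decentralised, centralised)
```

The weights here are strictly positive so the maximiser is unique; with weights that are merely non-negative, the greedy joint action is still a maximiser of Qtot but ties become possible.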

Eq. (5) can be enforced by the QMIX network architecture (Fig. 2 of the paper), which has three parts:

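agent networks: make local (decentralised) decisions
- i.e. one DRQN per agent, outputting Qi
mixing network: combines the Qi non-linearly while guaranteeing monotonicity (Eq. (5))
- how monotonicity is guaranteed
  1. all mixing-network weights W are non-negative (the biases are unconstrained)
  2. the (non-linear) activation function is monotonically increasing
hypernetworks: inject state information
1. Why is the state not fed in directly as an input alongside the Qi?
  1. Because Qtot need not be monotonic in st; passing st through the monotonic mixing network would impose that constraint unnecessarily.
2. They let the state condition the mixing flexibly, so the mixing network can estimate the joint action-value
  1. the state is passed through a NN that generates the mixing network's weights and biases
3. Why is the final bias produced by two Linear layers?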
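
The three parts above can be sketched as a forward pass in NumPy. This is a minimal, untrained sketch under my own assumptions, not the authors' implementation: the class name `QMixerSketch`, the random initialisation, the embedding size and the single-hidden-layer layout are illustrative choices. It shows the two monotonicity tricks (absolute value on hypernetwork-generated weights, ELU activation) and the state entering only through the hypernetworks.

```python
import numpy as np


def elu(x):
    # Monotonically increasing non-linearity; the minimum avoids overflow in expm1.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def affine(x, params):
    weight, bias = params
    return x @ weight + bias


class QMixerSketch:
    """Hypothetical mixing network whose weights are generated from the state."""

    def __init__(self, n_agents, state_dim, embed_dim=32, seed=0):
        rng = np.random.default_rng(seed)

        def linear(n_in, n_out):
            # Random, untrained parameters (assumption for illustration).
            return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = linear(state_dim, embed_dim)
        self.hyper_w2 = linear(state_dim, embed_dim)
        # Final bias: two linear layers with a ReLU in between.
        self.hyper_b2_in = linear(state_dim, embed_dim)
        self.hyper_b2_out = linear(embed_dim, 1)

    def __call__(self, agent_qs, state):
        """agent_qs: (batch, n_agents), state: (batch, state_dim) -> (batch,)."""
        batch = agent_qs.shape[0]
        # abs() keeps the mixing weights non-negative; biases stay unconstrained.
        w1 = np.abs(affine(state, self.hyper_w1))
        w1 = w1.reshape(batch, self.n_agents, self.embed_dim)
        b1 = affine(state, self.hyper_b1)
        w2 = np.abs(affine(state, self.hyper_w2))
        b2_hidden = np.maximum(affine(state, self.hyper_b2_in), 0.0)
        b2 = affine(b2_hidden, self.hyper_b2_out)[:, 0]

        hidden = elu(np.einsum("bn,bne->be", agent_qs, w1) + b1)
        return np.sum(hidden * w2, axis=1) + b2


mixer = QMixerSketch(n_agents=3, state_dim=5)
rng = np.random.default_rng(1)
state = rng.normal(size=(4, 5))
agent_qs = rng.normal(size=(4, 3))

q_tot = mixer(agent_qs, state)
bumped_qs = agent_qs.copy()
bumped_qs[:, 0] += 1.0  # raise agent 0's Q-value only
q_tot_bumped = mixer(bumped_qs, state)
```

Raising a single agent's Q-value while holding the state fixed should never lower Qtot, which is exactly Eq. (5); the state itself is free to change Qtot in either direction because it only enters through the generated weights and biases.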

7.2. Ablation Results

Both central state information and non-linear value function factorisation are required to achieve good performance.

A.1. Representational Complexity

Three key points:
The value function class representable with QMIX includes any value function that can be factored into a non-linear monotonic combination of the agents’ individual value functions in the fully observable setting.
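1. non-linear: more expressive than VDN
2. monotonic (with respect to each individual agent): an agent's optimal action does not depend on the other agents' actions
3. fully observable: in general observation != state, so the claim only holds under full observability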
  In a Dec-POMDP, QMIX cannot necessarily represent the value function. For example, if