
简介
该用户还未填写简介
擅长的技术栈
可提供的服务
暂无可提供的服务
Multi-agent reinforcement learning第3种架构本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0第3种架构
强化学习基本概念
Actor-critic algorithmActor-critic algorithmValue network and policy network训练神经网络Summary本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&
Monte Carlo AlgorithmCalculating π\piπ本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0Monte Carlo Algorithm: M
【强化学习】概念梳理:强化学习、马尔科夫决策过程与动态规划动态规划(Dynamic programming)马尔科夫链(Markov Chain)马尔科夫决策过程和强化学习马尔科夫决策过程和动态规划强化学习的基本概念状态(State)和动作(Action)策略(Policy) π\piπState transition`reward`和`return`Action-Value function最优
Policy-based reinforcement learningPolicy NetworksBehavior CloningTrain policy network using Policy gradientTrain the value networkMente Carlo Tree Search本笔记整理自 (作者: Shusen Wang):https://www.bilibili.
Value-based reinforcement learningValue-based reinforcement learningAction-value functionsDeep Q Network (DQN)训练神经网络的算法:`Temporal difference algorithm`一个例子Apply TD learning to DQNSummary参考文献本文整理自教学视频
Target network & Double DQNTarget network & Double DQNTarget NetworkDouble DQN本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&
Multi-step TD target本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0Multi-step TD target是对TD算法的一种改进。注意:上面的Sars
离散控制与连续控制本笔记整理自 (作者: Shusen Wang):https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0注意:不能直接把DQN应用在连续控制问题上







