
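[Reinforcement Learning in Practice] A Step-by-Step Tutorial on DQN and Double DQN (2): MountainCar-v0. Contents: Why use Deep Q Network (DQN); Hands-on: solving the MountainCar problem with Double DQN; The MountainCar problem in detail; Walkthrough of the MountainCar source code; cartpole.py; MountainCar states (Observation); MountainCar actions; The goal of MountainCar; D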
Policy-based reinforcement learning. Contents: Policy Networks; Behavior Cloning; Train policy network using Policy gradient; Train the value network; Monte Carlo Tree Search. Notes compiled from a video (author: Shusen Wang): https://www.bilibili.
Target network & Double DQN. Contents: Target Network; Double DQN. Notes compiled from a video (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&
Value-based reinforcement learning. Contents: Action-value functions; Deep Q Network (DQN); Algorithm for training the neural network: `Temporal difference algorithm`; An example; Apply TD learning to DQN; Summary; References. This article is compiled from a teaching video
Monte Carlo Algorithm. Contents: Calculating π. Notes compiled from a video (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0 Monte Carlo Algorithm: M
Multi-step TD target. Notes compiled from a video (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0 The multi-step TD target is an improvement on the TD algorithm. Note: the above Sars
Actor-critic algorithm. Contents: Value network and policy network; Training the neural networks; Summary. Notes compiled from a video (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&
This article introduces a basic cutting plane algorithm: cover cuts.








