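[Reinforcement Learning in Practice-04] DQN and Double DQN Step-by-Step Tutorial (2): with MountainCar-v0

[Reinforcement Learning in Practice] DQN and Double DQN Step-by-Step Tutorial (2): with MountainCar-v0 · Why use Deep Q Network (DQN) · Hands-on: solving the MountainCar problem with Double DQN · The MountainCar problem in detail · Explanation of the MountainCar source code · cartpole.py · MountainCar states (Observation) · MountainCar actions · MountainCar goal · D

#MachineLearning #DeepLearning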
[Reinforcement Learning-05] AlphaGo

Policy-based reinforcement learning · Policy Networks · Behavior Cloning · Train policy network using Policy gradient · Train the value network · Monte Carlo Tree Search · These notes are compiled from (author: Shusen Wang): https://www.bilibili.

#DeepLearning #MachineLearning
[Reinforcement Learning-11] Target network & Double DQN

Target network & Double DQN · Target Network · Double DQN · These notes are compiled from (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&

#DeepLearning #MachineLearning
[Reinforcement Learning-02] Value-based reinforcement learning

Value-based reinforcement learning · Action-value functions · Deep Q Network (DQN) · Algorithm for training the neural network: `Temporal difference algorithm` · An example · Apply TD learning to DQN · Summary · References · This article is compiled from the teaching video

[Reinforcement Learning-06] Monte Carlo Algorithm

Monte Carlo Algorithm · Calculating π · These notes are compiled from (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0 · Monte Carlo Algorithm: M

#DeepLearning #MachineLearning
[Reinforcement Learning-09] Multi-step TD target

Multi-step TD target · These notes are compiled from (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&spm_id_from=333.337.0.0 · Multi-step TD target is an improvement on the TD algorithm. Note: the Sars above

#DeepLearning #MachineLearning
[Reinforcement Learning-04] Actor-critic algorithm

Actor-critic algorithm · Value network and policy network · Training the neural networks · Summary · These notes are compiled from (author: Shusen Wang): https://www.bilibili.com/video/BV1rv41167yx?from=search&seid=18272266068137655483&

#NeuralNetworks #DeepLearning #MachineLearning