DeepMind: the so-called SAC-X learning paradigm

           Whether robots can ultimately be used for service work still comes down to how much those two legs cost; and whether they can interact with people and genuinely do "service" work depends on how those two arms work. Truly intelligent "brains" are still a long way off, so getting the sensors and actuators right first is what really matters.

           Reinforcement learning can be divided, according to how much initiative the agent has over its policy, into active reinforcement learning (the agent learns a policy: it must decide for itself which actions to take) and passive reinforcement learning (a fixed policy determines its behavior; this is evaluative learning, i.e., how the agent learns from success and failure, reward and punishment, by learning a utility function). A toy sketch contrasting the two settings follows the links below.

           Passive reinforcement learning: EnforceLearning – Passive Reinforcement Learning

           Active reinforcement learning: EnforceLearning – Active Reinforcement Learning
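           To make the distinction concrete, here is a minimal toy sketch of my own (not from the article or the posts above) on a 5-state chain: the passive agent only evaluates a fixed "always move right" policy with TD(0), learning a utility function, while the active agent must choose its own actions and learns them with epsilon-greedy Q-learning. All names (step, N_STATES, etc.) are hypothetical.

# Passive vs. active RL on a toy 5-state chain; state 4 is terminal, reward +1.
import random

N_STATES = 5           # states 0..4; state 4 is terminal
ACTIONS = [-1, +1]     # move left / move right
GAMMA, ALPHA = 0.9, 0.1

def step(state, action):
    """Deterministic dynamics; reward +1 only on reaching the terminal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

# Passive RL: the policy is fixed ("always move right"); the agent merely
# evaluates it, learning a utility (value) function V with TD(0).
V = [0.0] * N_STATES
for _ in range(500):
    state, done = 0, False
    while not done:
        nxt, reward, done = step(state, +1)              # no choice of action
        V[state] += ALPHA * (reward + GAMMA * V[nxt] - V[state])
        state = nxt

# Active RL: the agent must decide which action to take; epsilon-greedy
# Q-learning learns both action values and, implicitly, a policy.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(500):
    state, done = 0, False
    while not done:
        if random.random() < 0.1:
            action = random.choice(ACTIONS)              # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

print("Passive (TD(0)) utilities:", [round(v, 2) for v in V])
print("Active (Q-learning) greedy actions:",
      [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])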

           Article: SAC-X, a new paradigm for training robotic grasping tasks

           DeepMind proposes Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of reinforcement learning (RL). SAC-X can learn complex behaviors from scratch in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general-purpose auxiliary tasks, which it tries to learn from simultaneously via off-policy reinforcement learning.

          The formalization and optimization of this long reward vector is the highlight of the paper.
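          Roughly, in my own notation (the paper's formulation differs in detail), the setup looks like this: every state-action pair receives a reward vector

          $\mathbf{r}(s,a) = \big(r_1(s,a), \dots, r_K(s,a)\big),$

          where one entry is the (sparse) external task reward and the others are (sparse) auxiliary rewards. Each intention $\pi_k$ is trained to maximize its own expected return

          $J_k(\pi_k) = \mathbb{E}_{\pi_k}\Big[\sum_{t \ge 0} \gamma^{t}\, r_k(s_t, a_t)\Big],$

          while the scheduler chooses which intention to execute so that the collected experience improves the return on the external task.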

In this paper, we introduce a new method dubbed Scheduled Auxiliary Control (SAC-X), as a first step towards such an approach. It is based on four main principles:
    1. Every state-action pair is paired with a vector of rewards, consisting of (typically sparse) externally provided rewards and (typically sparse) internal auxiliary rewards.
    2. Each reward entry has an assigned policy, called intention in the following, which is trained to maximize its corresponding cumulative reward.
    3. There is a high-level scheduler which selects and executes the individual intentions with the goal of improving performance of the agent on the external tasks.
    4. Learning is performed off-policy (and asynchronously from policy execution) and the experience between intentions is shared – to use information effectively. Although the approach proposed in this paper is generally applicable to a wider range of problems, we discuss our method in the light of a typical robotics manipulation application with sparse rewards: stacking various objects and cleaning a table.
        In short, the method rests on four principles: every state-action pair is paired with a long, sparse vector of rewards; each reward entry is assigned its own policy, called an intention, trained by maximizing the corresponding cumulative reward; a high-level scheduler selects and executes individual intentions so as to improve the agent's performance on the external task; and learning is off-policy (the data used to update the Q-values may come from a policy other than the one being trained), with experience shared between intentions to increase efficiency. The overall approach applies to a broad range of problems; here it is demonstrated on typical robotic manipulation tasks. A tabular sketch of this loop follows the link below.
        On the benefits of off-policy learning: https://www.zhihu.com/question/57159315
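
        To tie the four principles together, here is a minimal tabular sketch of what such a training loop could look like. It is my own illustration under simplifying assumptions, not DeepMind's implementation: the reward vector is a dummy placeholder, intentions use a plain tabular Q-update instead of neural networks with off-policy corrections, and the scheduler is uniform (as in the paper's SAC-U variant). Every name (Intention, Scheduler, reward_vector, ACTIONS, ...) is hypothetical.

# Toy-scale sketch of a SAC-X-style training loop (illustrative only).
import random
from collections import defaultdict

ACTIONS = ["reach", "grasp", "lift", "stack"]    # placeholder action set
GAMMA, ALPHA = 0.99, 0.1

def reward_vector(state, action, next_state):
    """One sparse external reward plus sparse auxiliary rewards
    (e.g. 'object touched', 'object lifted', 'objects stacked')."""
    return [0.0, 0.0, 0.0]                       # placeholder values

class Intention:
    """One policy/Q-function pair, trained on ONE entry of the reward vector."""
    def __init__(self, reward_index):
        self.reward_index = reward_index
        self.q = defaultdict(float)              # stand-in for a neural Q-function

    def act(self, state, eps=0.2):
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, batch):
        # Off-policy: the batch may have been generated by ANY intention,
        # but this intention learns from its OWN entry of the reward vector.
        for state, action, rvec, next_state in batch:
            r = rvec[self.reward_index]
            target = r + GAMMA * max(self.q[(next_state, a)] for a in ACTIONS)
            self.q[(state, action)] += ALPHA * (target - self.q[(state, action)])

class Scheduler:
    """Picks which intention to execute next (principle 3). SAC-U schedules
    uniformly at random; SAC-Q learns this choice to maximize the main task."""
    def choose(self, intentions):
        return random.choice(intentions)

intentions = [Intention(i) for i in range(3)]    # 1 external + 2 auxiliary tasks
scheduler = Scheduler()
replay = []                                      # experience shared by ALL intentions

for episode in range(10):
    behavior = scheduler.choose(intentions)      # which intention acts this episode
    state = "start"
    for t in range(5):
        action = behavior.act(state)
        next_state = state + "/" + action        # stand-in for real dynamics
        replay.append((state, action,
                       reward_vector(state, action, next_state),  # principle 1
                       next_state))
        state = next_state
    batch = random.sample(replay, min(len(replay), 16))
    for intention in intentions:                 # principles 2 & 4: every intention
        intention.update(batch)                  # learns off-policy from shared data

        The key point the sketch tries to show is the last loop: whichever intention generated the episode, the shared replay buffer lets every intention update its own Q-values from the same transitions, which is why off-policy learning is essential here.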
       

Paper: Learning by Playing – Solving Sparse Reward Tasks from Scratch
