Factored Adaptation for Non-Stationary Reinforcement Learning

image-20221012220851578


  • A framework that learns a factored representation to adapt to non-stationarity.
  • We formalize a unified framework that can handle different non-stationary settings, including discrete and continuous changes, both within and across episodes.

Background

This work extends the factored representation for fast policy adaptation across domains introduced in AdaRL (ICLR 2022). Suppose there are n source domains and n' target domains. The generative process of the environment in the k-th domain, with k = 1, ..., n + n', can be described in terms of the transition function as

image-20221013233928734

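  • The change factors affect only a subset of the state dimensions; the state dimensions themselves are also structurally related.
  • θ_k are the change factors: they take a constant value within each domain but vary across domains.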

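  • c^{s→s}_i: its j-th entry is 1 if the j-th dimension of s_t influences s_{i,t+1}
  • c^{a→s}_i: is 1 if a_t influences s_{i,t+1}
  • c^{θ_k→s}_i: encodes which components of the change factor θ_k affect s_{i,t+1}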
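
The role of the binary masks can be illustrated with a small sketch. This is not the model from the paper: the function name `masked_transition`, the argument names, and the linear form of each per-dimension function are assumptions chosen only to show how a mask removes irrelevant inputs before the next state is computed.

```python
def masked_transition(s, a, theta, c_ss, c_as, c_theta_s, weights, noise=0.0):
    """Compute the next state, one dimension at a time.

    c_ss[i]      : binary mask over state dimensions for output dimension i
    c_as[i]      : 0/1 flag, whether the action influences dimension i
    c_theta_s[i] : binary mask over change-factor components for dimension i
    weights[i]   : hypothetical linear weights standing in for f_i
    """
    next_state = []
    for i in range(len(s)):
        # Zero out every input that the masks mark as irrelevant for dimension i.
        s_in = [m * x for m, x in zip(c_ss[i], s)]
        a_in = [c_as[i] * x for x in a]
        theta_in = [m * x for m, x in zip(c_theta_s[i], theta)]
        inputs = s_in + a_in + theta_in
        # Assumed f_i: a weighted sum of the masked inputs plus noise.
        next_state.append(sum(w * x for w, x in zip(weights[i], inputs)) + noise)
    return next_state


# Dimension 0 depends on s[0] and the action; dimension 1 depends on
# both state dimensions and the change factor, but not on the action.
c_ss = [[1, 0], [1, 1]]
c_as = [1, 0]
c_theta_s = [[0], [1]]
weights = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

result = masked_transition([1.0, 2.0], [0.5], [3.0], c_ss, c_as, c_theta_s, weights)
```

With these masks, changing the change factor alters only the second state dimension, which is the sense in which the representation is factored.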

image-20221012231855139


image-20221014023514622

The optimal policy across domains is then learned using these compact representations.

Factored Non-stationary MDPs

image-20221012231120568


image-20221012231458305


image-20221012231947529

  • Continuous changes: if the change factors evolve as continuous functions, they can model smooth changes in the environment, both within and across episodes.
  • Discrete changes: these can be represented with a piecewise-constant function.
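
The two kinds of change can be sketched as functions of a global time step. The sinusoidal drift, the function names, and the change points below are illustrative assumptions, not taken from the paper.

```python
import math
from bisect import bisect_right


def continuous_factor(t, period=50.0):
    """Hypothetical smooth change: a sinusoid over the global time step t."""
    return math.sin(2.0 * math.pi * t / period)


def piecewise_constant_factor(t, change_points, values):
    """Discrete change: values[j] holds on the j-th segment.

    change_points must be sorted, and len(values) == len(change_points) + 1.
    """
    return values[bisect_right(change_points, t)]


change_points = [10, 20]      # time steps at which the factor jumps
values = [0.0, 1.0, -1.0]     # one value per segment

before = piecewise_constant_factor(5, change_points, values)
at_jump = piecewise_constant_factor(10, change_points, values)
after = piecewise_constant_factor(25, change_points, values)
```

Placing the change points at episode boundaries gives across-episode changes, while placing them inside an episode gives within-episode changes.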

image-20221012221153409


image-20221014025848930


image-20221014031039694

Sparsity loss

We encourage sparsity in the binary masks to improve identifiability, using the following loss:

image-20221014032327139
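
As a rough illustration of a sparsity penalty of this kind, the sketch below sums weighted L1 norms over a set of masks. The mask names, the per-mask weights, and the choice of the L1 norm are assumptions for the example; the exact form used in the paper is the one shown in the figure.

```python
def sparsity_loss(masks, weights):
    """Weighted sum of L1 norms, one term per mask.

    masks   : dict mapping a (hypothetical) mask name to a 2-D list of entries
    weights : dict mapping the same names to non-negative coefficients
    """
    total = 0.0
    for name, mask in masks.items():
        l1 = sum(abs(v) for row in mask for v in row)  # L1 norm of this mask
        total += weights[name] * l1
    return total


masks = {
    "s_to_s": [[1, 0], [1, 1]],   # three active state-to-state edges
    "theta_to_s": [[0], [1]],     # one active change-factor edge
}
weights = {"s_to_s": 0.5, "theta_to_s": 2.0}

loss = sparsity_loss(masks, weights)
```

Minimizing such a term alongside the prediction loss pushes mask entries toward zero, so only edges that are needed to explain the data stay active.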

The total objective function:


image-20221013000458048


image-20221013001505302

Experiments

image-20221013033145339


image-20221014032818575


image-20221014032803813


image-20221014032856170

Baselines

  • meta-RL: TRIO, VariBAD
  • representative task-embedding approaches: LILAC, ZeUS
  • stationary: SAC, oracle

image-20221013002000374


image-20221013002238730


image-20221014033739929

Summary

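  • Proposes the notion of a "change factor" and models this latent structure to address non-stationarity.
  • Claim: can handle different non-stationary settings. In practice, a significant limitation is that the change in dynamics has to be a function of the time step and the task index.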