Factored Adaptation for Non-Stationary Reinforcement Learning
- A framework that learns a factored representation to adapt to non-stationarity.
- We formalize a unified framework that can handle different non-stationary settings, including discrete and continuous changes, both within and across episodes.
Background
This work extends the factored representation for fast policy adaptation across domains introduced in AdaRL (ICLR 2022). Suppose there are $N$ source domains and a set of target domains. The generative process of the environment in the $k$-th domain ($k = 1, \dots, N$) can be described in terms of the transition function

$$s_{t+1} = f\big(c^{s \to s} \odot s_t,\; c^{a \to s} \odot a_t,\; c^{\theta \to s} \odot \theta_k,\; \epsilon_t\big)$$
- $\theta_k$ only influences a subset of the dimensions of $s_t$; structural relationships also exist among the dimensions of $s_t$.
- $\theta_k$ are the change factors, which take a constant value within each domain but vary across domains.
- The $i$-th dimension of $s_{t+1}$ is influenced only by the parent dimensions selected by the binary masks.
- $c^{s \to s}$ and $c^{a \to s}$ encode which dimensions of $s_t$ and $a_t$ influence each dimension of $s_{t+1}$.
- $c^{\theta \to s}$ encodes which components of the change factor $\theta_k$ affect $s_{t+1}$.
The optimal policy across domains is then learned using these compact domain-specific representations (the low-dimensional change factors $\theta_k$).
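The factored transition above can be sketched in code. This is a minimal illustration, not the paper's implementation: the names (`c_ss`, `c_as`, `c_ts`, `step`) and the per-dimension function (a fixed `tanh` of the masked parents' sum) are assumptions standing in for the learned model.

```python
# Hypothetical sketch of a factored, masked transition in AdaRL-style notation.
import numpy as np

rng = np.random.default_rng(0)
dim_s, dim_a, dim_theta = 4, 2, 3

# Binary structural masks: c_ss[i, j] = 1 iff dimension j of s_t influences
# dimension i of s_{t+1}; analogously for actions (c_as) and change factors (c_ts).
c_ss = rng.integers(0, 2, size=(dim_s, dim_s))
c_as = rng.integers(0, 2, size=(dim_s, dim_a))
c_ts = rng.integers(0, 2, size=(dim_s, dim_theta))

# Per-domain change factor theta_k: constant within domain k, varies across domains.
theta_k = rng.normal(size=dim_theta)

def step(s, a, theta, eps):
    """One factored transition: each output dimension sees only its masked parents."""
    s_next = np.empty(dim_s)
    for i in range(dim_s):
        parents = np.concatenate([c_ss[i] * s, c_as[i] * a, c_ts[i] * theta])
        # Stand-in for the learned per-dimension function f_i.
        s_next[i] = np.tanh(parents.sum()) + eps[i]
    return s_next

s = rng.normal(size=dim_s)
a = rng.normal(size=dim_a)
s_next = step(s, a, theta_k, eps=0.01 * rng.normal(size=dim_s))
print(s_next.shape)  # (4,)
```

The key design point is that each output dimension only reads the inputs its masks select, which is what makes the representation compact and lets $\theta_k$ be low-dimensional.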
Factored Non-stationary MDPs
- Continuous changes: if $\theta_t^{dyn}$ and $\theta_t^{rew}$ are continuous functions of $t$, they can model smooth changes in the environment, both within and across episodes.
- Discrete changes: the evolution of the change factors can be represented with a piecewise-constant function of $t$.
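The two regimes can be contrasted with a toy trajectory of a scalar change factor. The specific shapes (a sinusoid for smooth drift, the switch times and segment values for the discrete case) are assumptions chosen only for illustration.

```python
# Toy illustration: continuous vs. piecewise-constant evolution of theta_t.
import numpy as np

T = 100
t = np.arange(T)

# Continuous change: theta_t drifts smoothly over time (e.g. within-episode drift).
theta_continuous = np.sin(2 * np.pi * t / T)

# Discrete change: theta_t is piecewise constant, switching at change points.
change_points = [0, 40, 70]   # assumed switch times
values = [0.5, -1.0, 2.0]     # assumed per-segment values
theta_discrete = np.piecewise(
    t.astype(float),
    [t >= cp for cp in change_points],  # later conditions overwrite earlier ones
    values,
)

print(theta_discrete[0], theta_discrete[45], theta_discrete[80])  # 0.5 -1.0 2.0
```

A single framework that infers $\theta_t$ from data covers both cases, since the piecewise-constant signal is just a special (non-smooth) trajectory of the change factor.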
Sparsity loss
We encourage sparsity in the binary masks to improve identifiability by adding an $\ell_1$ penalty on the mask entries, weighted by a coefficient $\lambda > 0$.
The total objective function is the sum of the model-estimation loss and the sparsity loss.
Experiments
Baselines
- meta-RL: TRIO, VariBAD
- representative task-embedding approaches: LILAC, ZeUS
- stationary: SAC, oracle
Summary
- Proposes the notion of a "change factor" and attempts to model this latent structure to tackle the non-stationarity problem.
- Claim: can handle different non-stationary settings. In practice, a major limitation is that the change in dynamics is assumed to be a function of the time step and the task index.