Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

image-20221118003634475


Preliminary

  • The goal space is the set of goals that define tasks
  • The reward function is sparse and deterministic

Different tasks are generated by sampling goals from different goal distributions, which is how the goal-conditioned generalization problem is explored.
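A minimal sketch of task generation by goal sampling with a sparse deterministic reward. The tuple-valued goals, the disjoint `train_goals` / `test_goals` pools, and the exact-match reward are illustrative assumptions, not the paper's environments.

```python
import random

def sparse_reward(state, goal):
    """Sparse deterministic reward (assumed form): 1.0 only when the goal is reached."""
    return 1.0 if state == goal else 0.0

def sample_task(goal_pool, rng):
    """A task is identified by the goal drawn from a goal distribution (uniform here)."""
    return rng.choice(goal_pool)

# Hypothetical goal pools: training and test goals do not overlap,
# so evaluating on test goals probes generalization across goals.
train_goals = [(0, 1), (1, 0), (1, 1)]
test_goals = [(2, 2), (3, 1)]

rng = random.Random(0)
train_task = sample_task(train_goals, rng)
test_task = sample_task(test_goals, rng)
```

Keeping the two pools disjoint is one simple way to make the train/test gap explicit; other splits of the goal distribution are equally possible.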

image-20221118004719092

Causal Reasoning with Graphical Models

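  • Random variables $X = (X_1, \dots, X_d)$ with index set $V := \{1, \dots, d\}$
  • A graph $G = (V, E)$ consists of nodes $V$ and edges $E \subseteq V^2$
  • A node $i$ is called a parent of $j$ if $(i, j) \in E$ and $(j, i) \notin E$. The set of parents of $j$ is denoted by $\mathrm{PA}_j^G$.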
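The parent definition can be made concrete with a small sketch. It assumes the usual convention that node i is a parent of node j when the edge (i, j) is present and the reverse edge (j, i) is absent; the `parents` helper and the three-node graph are hypothetical.

```python
def parents(edges, j):
    """Return all i such that (i, j) is an edge and (j, i) is not."""
    return {i for (i, k) in edges if k == j and (k, i) not in edges}

# Hypothetical three-node graph: 1 -> 2, 1 -> 3, 2 -> 3.
nodes = {1, 2, 3}
edges = {(1, 2), (1, 3), (2, 3)}

parents_of_3 = parents(edges, 3)
parents_of_1 = parents(edges, 1)
```

Under this convention a pair of nodes joined in both directions contributes no parent relation in either direction.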

image-20221118005912628

GCRL as Latent Variable Models

From the perspective of probabilistic inference, the goal is to solve a likelihood maximization problem. Treating the causal graph as a latent variable, the log-likelihood can be decomposed to obtain an ELBO:

image-20221118010646311

image-20221118010943467
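For reference, a bound of this kind follows from the generic latent-variable argument below. The symbols are assumptions chosen for illustration ($x$ for the observed data, $G$ for the latent graph, $q$ for the variational posterior) and may not match the paper's notation or conditioning.

```latex
\begin{aligned}
\log p(x)
  &= \log \sum_{G} p(x \mid G)\, p(G) \\
  &= \log \mathbb{E}_{q(G)}\!\left[ \frac{p(x \mid G)\, p(G)}{q(G)} \right] \\
  &\ge \mathbb{E}_{q(G)}\!\left[ \log p(x \mid G) \right]
     - D_{\mathrm{KL}}\!\left( q(G) \,\|\, p(G) \right)
\end{aligned}
```

The inequality is Jensen's inequality applied to the concave logarithm; the gap is the KL divergence between $q(G)$ and the true posterior over graphs.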


One term is constant (it follows a uniform distribution), so the maximization can be rewritten as the following objective:

image-20221118011338117

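Intuition: to solve the optimization problem above, alternate between updating the causal graph (causal discovery) and updating the model and policy (model and policy learning).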


image-20221118011652439

Model learning

The transition corresponding to G is modeled with a collection of neural networks, where

  • the parent term represents the values of all parents of a given node at a given time step
  • the noise term follows a Gaussian distribution
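A minimal sketch of graph-structured model learning, assuming one predictor per next-step node that sees only that node's parents. Linear least squares is swapped in for the neural networks to keep the example short; `fit_node_models`, `predict`, the adjacency layout, and the toy dynamics are all hypothetical.

```python
import numpy as np

def fit_node_models(X_t, X_next, adjacency):
    """Fit one linear model per next-step node using only its parents.

    adjacency[i, j] == 1 means node i at time t is a parent of node j at t+1.
    """
    models = []
    for j in range(X_next.shape[1]):
        pa = np.flatnonzero(adjacency[:, j])                   # parent indices of node j
        A = np.hstack([X_t[:, pa], np.ones((len(X_t), 1))])    # parent values + bias column
        w, *_ = np.linalg.lstsq(A, X_next[:, j], rcond=None)
        models.append((pa, w))
    return models

def predict(models, x_t):
    """Predict the noise-free next value of every node from a single state x_t."""
    return np.array([np.append(x_t[pa], 1.0) @ w for pa, w in models])

# Toy data: three nodes with known parents and small Gaussian noise.
rng = np.random.default_rng(0)
X_t = rng.normal(size=(200, 3))
noise = 0.01 * rng.normal(size=(200, 3))
X_next = np.column_stack([
    X_t[:, 0] + 0.5,                # node 0 depends on node 0
    2.0 * X_t[:, 0] - X_t[:, 1],    # node 1 depends on nodes 0 and 1
    X_t[:, 2],                      # node 2 depends on node 2
]) + noise
adjacency = np.array([[1, 1, 0],
                      [0, 1, 0],
                      [0, 0, 1]])

models = fit_node_models(X_t, X_next, adjacency)
prediction = predict(models, np.array([1.0, 2.0, 3.0]))
```

Masking inputs by the graph means a spurious variable cannot influence a node's prediction, which is the intuition behind using the causal graph for generalization.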

Policy learning with planning

  • Planning uses MPC with random shooting.
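Random-shooting MPC can be sketched as follows: sample candidate action sequences, roll each one out through the learned model, and execute only the first action of the best sequence. The 1-D `dynamics`, the tolerance-based sparse `reward`, the action range, and all default parameters are assumptions for illustration.

```python
import numpy as np

def random_shooting(state, goal, dynamics, reward, horizon=5, n_candidates=256, rng=None):
    """Return the first action of the best of n_candidates random action sequences."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidate action sequences sampled uniformly from [-1, 1].
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for k in range(n_candidates):
        s = state
        for t in range(horizon):
            s = dynamics(s, actions[k, t])   # roll out through the model
            returns[k] += reward(s, goal)    # accumulate predicted reward
    best = int(np.argmax(returns))
    return actions[best, 0]

def dynamics(s, a):
    """Hypothetical 1-D model: the action shifts the state."""
    return s + a

def reward(s, goal):
    """Sparse reward (assumed form): 1.0 within a small tolerance of the goal."""
    return 1.0 if abs(s - goal) < 0.1 else 0.0

first_action = random_shooting(0.0, 0.5, dynamics, reward, rng=np.random.default_rng(0))
```

In closed-loop use the planner is called again at every step from the newly observed state, so errors in the model are corrected by replanning.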

Data-Efficient Causal Discovery

image-20221118011338117

image-20221118015308673


Causal discovery is then simplified:

  • Restrict the posterior to a point-mass distribution and use a threshold to control the sparsity.
  • Perform discovery from a classification perspective: binary classifiers determine whether each edge exists.
  • The threshold applies to the p-value of the hypothesis test. A larger threshold corresponds to a harder sparsity constraint, leading to a sparser graph since two nodes are more likely to be considered independent.

image-20221118040116230


image-20221118021227153

According to Definition 3, we only need to classify edges that connect nodes across the two node sets. If two nodes are dependent, we add one directed edge from the node in the first set to the node in the second.
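The edge-classification view can be sketched as a loop over candidate pairs across the two node sets. This sketch assumes the two sets are the variables at time t and at time t+1, and it swaps in a threshold on absolute Pearson correlation as a stand-in for the independence test; it is a marginal test, not a conditional one, and `discover_edges` and the toy data are hypothetical.

```python
import numpy as np

def discover_edges(X_t, X_next, threshold):
    """adjacency[i, j] = 1 if node i (time t) and node j (time t+1) are judged dependent."""
    d_in, d_out = X_t.shape[1], X_next.shape[1]
    adjacency = np.zeros((d_in, d_out), dtype=int)
    for i in range(d_in):
        for j in range(d_out):
            corr = np.corrcoef(X_t[:, i], X_next[:, j])[0, 1]
            # Binary decision per candidate edge: dependent -> add edge i -> j.
            adjacency[i, j] = int(abs(corr) > threshold)
    return adjacency

# Toy data: next node 0 depends on node 0; next node 1 depends on nodes 0 and 1.
rng = np.random.default_rng(0)
X_t = rng.normal(size=(500, 2))
X_next = np.column_stack([
    X_t[:, 0],
    X_t[:, 0] + X_t[:, 1],
]) + 0.1 * rng.normal(size=(500, 2))

adjacency = discover_edges(X_t, X_next, threshold=0.3)
sparser = discover_edges(X_t, X_next, threshold=0.9)
```

With this stand-in statistic, raising the threshold makes more pairs count as independent, so `sparser` keeps fewer edges than `adjacency`.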

image-20221118021441671


image-20221118014500111

Analysis of Performance Guarantee

image-20221118021637325

image-20221118022639938

  • The better the causal graph, the better the model learning

image-20221118021645736

  • The better the model learning, the closer the value function is to optimal

image-20221118021656879

image-20221118021707017

  • Controlling the bound requires a better policy (so model learning and policy learning must alternate)

image-20221118021815999

Experiments

image-20221118033210374{width=50%}


image-20221118033229336


image-20221118102013934


image-20221118034149640


image-20221118034113025


image-20221118034349165

Summary & Thoughts

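  • Improves generality by learning a causal transition model
  • Could combining causality theory bring better interpretability?
  • The causal graph is estimated explicitly, which makes training difficult
  • The offline setting performs worse but is more practical. Optimize the offline case?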

image-20221118011338117


Problem: if the relationship does not exist, how can it be learned?


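  • If it does not exist or is unknown, it cannot be obtained by explicit prediction
  • If it is encoded implicitly, there is no major difference from other methods
  • Is extra information needed to obtain it without that dependence? E.g., add an assumption that it is correlated with the current information
  • But would a structural association still arise indirectly between them because of that relationship?
  • Run some validation experiments
  • Consider other angles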