Introduction
Inverse reinforcement learning (IRL) is the dual of reinforcement learning (RL): instead of finding an optimal policy for a given reward, it aims to recover the reward function that the demonstrating agent optimizes. Compared with supervised learning (behavior cloning, BC), IRL imposes more restrictive regularity conditions on the set of policies, which makes it more appropriate when expert demonstrations are scarce and when robust estimation is needed. For example, consider a routing problem on a grid network, where the agent's optimal actions are geographically correlated. In this setting, IRL has a better chance than BC of recovering the internal reward structure, and therefore yields actions that are more consistent across space.
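To make the contrast concrete, below is a minimal sketch of behavior cloning as plain supervised learning. The dimensions, network, and data are placeholders for illustration (a hypothetical grid-routing task), not any particular paper's setup; the point is that BC fits a state-to-action classifier directly and never touches a reward or the environment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions for a grid-routing task: the state encodes the
# agent's location, the action is one of {up, down, left, right}.
STATE_DIM, N_ACTIONS = 16, 4

# BC treats imitation as supervised classification:
# fit a policy network to predict the expert's action at each state.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a batch of expert demonstrations (states, chosen actions).
expert_states = torch.randn(256, STATE_DIM)
expert_actions = torch.randint(0, N_ACTIONS, (256,))

for step in range(100):
    logits = policy(expert_states)
    loss = F.cross_entropy(logits, expert_actions)  # pure supervised loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# BC never queries the environment or a reward signal, so any structure
# shared across states (e.g., geographic correlation in routing) must be
# learned implicitly from the labels alone. IRL instead recovers a reward
# and derives the policy from it, which is what enforces such structure.
```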
In this post, I will briefly introduce several recent works that use generative adversarial networks (GANs) for IRL and discuss their connections.