Last updated: 2018-06-18

## Publications

1. M. E. Taylor
Improving Reinforcement Learning with Human Input.
27th International Joint Conference on Artificial Intelligence, 2018.

Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available. In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on. In all cases, our work has shown such information can be effectively leveraged. After giving a high-level overview of this work, we will highlight a set of open questions and suggest where future work could be usefully focused.

@InProceedings{TaylorIJCAI18,
  Title     = {Improving Reinforcement Learning with Human Input},
  Author    = {Matthew E. Taylor},
  Booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI)},
  Year      = {2018}
}

2. *A. J. Bose, *H. Ling, and *Y. Cao
Adversarial Contrastive Estimation.
56th Annual Meeting of the Association for Computational Linguistics, 2018.
(* denotes equal contribution)

Learning by contrasting positive and negative samples is a general strategy adopted by many methods. Noise contrastive estimation (NCE) for word embeddings and translating embeddings for knowledge graphs are examples in NLP employing this approach. In this work, we view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. The resulting adaptive sampler finds harder negative examples, which forces the main model to learn a better representation of the data. We evaluate our proposal on learning word embeddings, order embeddings and knowledge graph embeddings and observe both faster convergence and improved results on multiple metrics.
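The mixture idea above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the embeddings and function names are invented here, and the "adversarial" component is replaced by a simple hardest-negative heuristic, whereas the paper learns the sampler with a GAN-style objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: 10 items, 4-dimensional vectors (illustrative only).
E = rng.normal(size=(10, 4))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def contrastive_loss(anchor, positive, negatives):
    """NCE-style objective: score the positive above each sampled negative."""
    pos = float(E[anchor] @ E[positive])
    negs = [float(E[anchor] @ E[n]) for n in negatives]
    return -np.log(sigmoid(pos)) - sum(np.log(sigmoid(-s)) for s in negs)

def mixture_sampler(anchor, k, lam=0.5):
    """Mixture negative sampler: with probability lam fall back to the usual
    uniform sampler; otherwise use a stand-in 'adversarial' component that
    returns the hardest negatives (highest-scoring items against the anchor)."""
    if rng.random() < lam:
        return rng.integers(0, len(E), size=k).tolist()
    scores = E @ E[anchor]
    scores[anchor] = -np.inf  # never sample the anchor itself
    return list(np.argsort(scores)[-k:])
```

The hard negatives from the adversarial branch produce larger losses than uniform ones on average, which is exactly the stronger learning signal the paper exploits.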

@InProceedings{Bose2018ACE,
  Title     = {Adversarial Contrastive Estimation},
  Author    = {Avishek Joey Bose and Huan Ling and Yanshuai Cao},
  Booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers)},
  Url       = {https://arxiv.org/abs/1805.03642},
  Year      = {2018}
}

3. Y. Cao, G. W. Ding, K. Y. Lui, and R. Huang
Improving GAN Training via Binarized Representation Entropy (BRE) Regularization.
International Conference on Learning Representations, 2018.

We propose a novel regularizer to improve the training of Generative Adversarial Networks (GANs). The motivation is that when the discriminator D spreads out its model capacity in the right way, the learning signals given to the generator G are more informative and diverse. These in turn help G to explore better and discover the real data manifold while avoiding large unstable jumps due to the erroneous extrapolation made by D. Our regularizer guides the rectifier discriminator D to better allocate its model capacity, by encouraging the binary activation patterns on selected internal layers of D to have a high joint entropy. Experimental results on both synthetic data and real datasets demonstrate improvements in stability and convergence speed of the GAN training, as well as higher sample quality. The approach also leads to higher classification accuracies in semi-supervised learning.
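The "high joint entropy of binary activation patterns" idea can be illustrated with a small proxy. This is a stand-in sketch, not the paper's exact BRE regularizer: it binarizes a rectifier layer's pre-activations and penalizes both biased per-unit signs and near-identical patterns across samples, the two symptoms of low joint entropy.

```python
import numpy as np

def bre_proxy(h):
    """Proxy for the BRE idea (illustrative, not the paper's exact form).
    h: array of shape (batch, units) holding a layer's pre-activations.
    Penalizes (1) units whose sign is biased across the batch and
    (2) pairs of samples with near-identical sign patterns."""
    s = np.where(h >= 0, 1.0, -1.0)             # binarized activation patterns
    marginal = np.mean(np.abs(s.mean(axis=0)))  # per-unit sign bias in [0, 1]
    gram = (s @ s.T) / h.shape[1]               # pattern similarity in [-1, 1]
    off = gram[~np.eye(len(gram), dtype=bool)]  # drop self-similarity
    pairwise = np.mean(np.abs(off))             # cross-sample pattern overlap
    return marginal + pairwise                  # lower = more diverse patterns
```

A batch whose samples all share one activation pattern scores the maximum of 2.0, while diverse random patterns score much lower, so adding this term to the discriminator loss pushes D to spread its capacity.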

@InProceedings{Cao2018Improving,
  Title     = {Improving GAN Training via Binarized Representation Entropy (BRE) Regularization},
  Author    = {Yanshuai Cao and Gavin Weiguang Ding and Kry Yik-Chau Lui and Ruitong Huang},
  Booktitle = {International Conference on Learning Representations (ICLR)},
  Year      = {2018},
  Url       = {https://openreview.net/forum?id=BkLhaGZRW},
  Note      = {accepted as poster}
}

4. C. Srinivasa, I. Givoni, S. Ravanbakhsh, and B. J. Frey
Min-Max Propagation.
Neural Information Processing Systems, 2017.

We study the application of min-max propagation, a variation of belief propagation, for approximate min-max inference in factor graphs. We show that for “any” high-order function that can be minimized in $\mathcal{O}(\omega)$, the min-max message update can be obtained using an efficient $\mathcal{O}(K(\omega + \log(K)))$ procedure, where $K$ is the number of variables. We demonstrate how this generic procedure, in combination with efficient updates for a family of high-order constraints, enables the application of min-max propagation to efficiently approximate the NP-hard problem of makespan minimization, which seeks to distribute a set of tasks across machines such that the worst-case load is minimized.
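To make the makespan objective concrete, here is a tiny brute-force version of the min-max problem. It enumerates every assignment, which is exactly the exponential cost that min-max propagation on a factor-graph encoding is designed to avoid; the function names are ours, not the paper's.

```python
import itertools

def max_load(assignment, task_loads, n_machines):
    """Worst-case (max) machine load under one task-to-machine assignment."""
    loads = [0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += task_loads[task]
    return max(loads)

def min_max_makespan(task_loads, n_machines):
    """Exact min-max objective by enumeration over all
    n_machines ** len(task_loads) assignments."""
    best = min(itertools.product(range(n_machines), repeat=len(task_loads)),
               key=lambda a: max_load(a, task_loads, n_machines))
    return list(best), max_load(best, task_loads, n_machines)
```

For tasks with loads [3, 3, 2, 2] on 2 machines, the optimal split (3, 2 | 3, 2) gives a makespan of 5.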

@InProceedings{SrinivasaMMP,
  Title     = {Min-Max Propagation},
  Author    = {Christopher Srinivasa and Inmar Givoni and Siamak Ravanbakhsh and Brendan J. Frey},
  Booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  Year      = {2017},
  Url       = {https://papers.nips.cc/paper/7140-min-max-propagation},
  Abstract  = {We study the application of min-max propagation, a variation of belief propagation, for approximate min-max inference in factor graphs. We show that for ``any'' high-order function that can be minimized in $\mathcal{O}(\omega)$, the min-max message update can be obtained using an efficient $\mathcal{O}(K(\omega + \log(K)))$ procedure, where $K$ is the number of variables. We demonstrate how this generic procedure, in combination with efficient updates for a family of high-order constraints, enables the application of min-max propagation to efficiently approximate the NP-hard problem of makespan minimization, which seeks to distribute a set of tasks across machines such that the worst-case load is minimized.}
}

5. Y. Cao and L. Wang
Automatic Selection of t-SNE Perplexity.
International Conference on Machine Learning (Workshop on AutoML), 2017.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is one of the most widely used dimensionality reduction methods for data visualization, but it has a perplexity hyperparameter that requires manual selection. In practice, proper tuning of t-SNE perplexity requires users to understand the inner workings of the method and to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. The similarities of our approach to Bayesian information criteria (BIC) and minimum description length (MDL) are also analyzed.
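The selection objective can be sketched as follows. We assume the criterion takes the BIC-like form $S(\mathrm{Perp}) = 2\,\mathrm{KL}(P\|Q) + \log(n)\,\mathrm{Perp}/n$ reported in the paper's arXiv version; the helper names are ours. The KL term is the quantity t-SNE already minimizes (scikit-learn exposes it as `TSNE.kl_divergence_` after fitting), so the only extra work is evaluating the penalty.

```python
import math

def selection_score(kl, perplexity, n):
    """Pseudo-BIC criterion (assumed form from the paper):
    S(Perp) = 2 * KL(P||Q) + log(n) * Perp / n.
    The penalty counters the fact that KL alone keeps decreasing as
    perplexity grows, so minimizing raw KL would over-select."""
    return 2.0 * kl + math.log(n) * perplexity / n

def select_perplexity(runs, n):
    """runs: (perplexity, final KL divergence) pairs collected from repeated
    t-SNE fits on n points. Returns the perplexity minimizing S(Perp)."""
    return min(runs, key=lambda r: selection_score(r[1], r[0], n))[0]
```

With hypothetical runs [(5, 1.2), (30, 0.9), (100, 0.85)] on n = 1000 points, the mid-range perplexity wins: the small KL gain at perplexity 100 does not pay for its penalty.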

@InProceedings{CaoAST,
  Title     = {Automatic Selection of t-SNE Perplexity},
  Author    = {Yanshuai Cao and Luyu Wang},
  Booktitle = {International Conference on Machine Learning (Workshop on AutoML)},
  Year      = {2017},
  Url       = {http://arxiv.org/abs/1708.03229},
  Abstract  = {t-Distributed Stochastic Neighbor Embedding (t-SNE) is one of the most widely used dimensionality reduction methods for data visualization, but it has a perplexity hyperparameter that requires manual selection. In practice, proper tuning of t-SNE perplexity requires users to understand the inner workings of the method and to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. The similarities of our approach to Bayesian information criteria (BIC) and minimum description length (MDL) are also analyzed.}
}

6. K. Y. C. Lui, Y. Cao, M. Gazeau, and K. S. Zhang
Implicit Manifold Learning on Generative Adversarial Networks.
International Conference on Machine Learning (Workshop on Implicit Models), 2017.

This paper presents an implicit manifold learning perspective on Generative Adversarial Networks (GANs), studying whether the support of the learned distribution, modelled as a submanifold $\mathcal{M}_{\theta}$, perfectly matches $\mathcal{M}_{r}$, the support of the real data distribution. We show that optimizing the Jensen-Shannon divergence forces $\mathcal{M}_{\theta}$ to perfectly match $\mathcal{M}_{r}$, while optimizing the Wasserstein distance does not. On the other hand, by comparing the gradients of the Jensen-Shannon divergence and the Wasserstein distances ($W_1$ and $W_2^2$) in their primal forms, we conjecture that Wasserstein $W_2^2$ may enjoy desirable properties such as reduced mode collapse. It is therefore interesting to design new distances that inherit the best of both.
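For reference, the distances compared in the abstract have the following standard forms, written for the real distribution $P_r$ and the model distribution $P_\theta$ (this is textbook background, not a result of the paper):

```latex
\[
\mathrm{JS}(P_r, P_\theta)
 = \tfrac{1}{2}\,\mathrm{KL}\!\left(P_r \,\middle\|\, M\right)
 + \tfrac{1}{2}\,\mathrm{KL}\!\left(P_\theta \,\middle\|\, M\right),
\qquad M = \tfrac{1}{2}\,(P_r + P_\theta),
\]
\[
W_1(P_r, P_\theta)
 = \inf_{\gamma \in \Pi(P_r, P_\theta)} \mathbb{E}_{(x,y)\sim\gamma}\,\|x - y\|,
\qquad
W_2^2(P_r, P_\theta)
 = \inf_{\gamma \in \Pi(P_r, P_\theta)} \mathbb{E}_{(x,y)\sim\gamma}\,\|x - y\|^2,
\]
```

where $\Pi(P_r, P_\theta)$ is the set of couplings with marginals $P_r$ and $P_\theta$; the paper compares the gradients induced by these primal forms.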

@InProceedings{LuiIML,
  Title     = {Implicit Manifold Learning on Generative Adversarial Networks},
  Author    = {Kry Yik Chau Lui and Yanshuai Cao and Maxime Gazeau and Kelvin Shuangjian Zhang},
  Booktitle = {International Conference on Machine Learning (Workshop on Implicit Models)},
  Year      = {2017},
  Url       = {https://arxiv.org/abs/1710.11260},
  Abstract  = {This paper presents an implicit manifold learning perspective on Generative Adversarial Networks (GANs), studying whether the support of the learned distribution, modelled as a submanifold $\mathcal{M}_{\theta}$, perfectly matches $\mathcal{M}_{r}$, the support of the real data distribution. We show that optimizing the Jensen-Shannon divergence forces $\mathcal{M}_{\theta}$ to perfectly match $\mathcal{M}_{r}$, while optimizing the Wasserstein distance does not. On the other hand, by comparing the gradients of the Jensen-Shannon divergence and the Wasserstein distances ($W_1$ and $W_2^2$) in their primal forms, we conjecture that Wasserstein $W_2^2$ may enjoy desirable properties such as reduced mode collapse. It is therefore interesting to design new distances that inherit the best of both.}
}