The variational autoencoder (VAE) can learn the manifold of natural images on certain datasets, as evidenced by meaningful interpolation or extrapolation in the continuous latent space. However, on discrete data such as text, it is unclear if unsupervised learning can discover a similar latent space that allows controllable manipulation. In this work, we find that sequence VAEs trained on text fail to properly decode when the latent codes are manipulated, because the modified codes often land in holes or vacant regions in the aggregated posterior latent space, where the decoding network fails to generalize. Both as a validation of the explanation and as a fix to the problem, we propose to constrain the posterior mean to a learned probability simplex, and perform manipulation within this simplex. Our proposed method mitigates the latent vacancy problem and achieves the first success in unsupervised learning of controllable representations for text. Empirically, our method outperforms unsupervised baselines and strong supervised approaches on text style transfer. On automatic evaluation metrics used in text style transfer, even with the decoding network trained from scratch, our method achieves comparable results with state-of-the-art supervised approaches leveraging large-scale pre-trained models for generation. Furthermore, it is capable of performing more flexible fine-grained control over text generation than existing methods.


  title={On Variational Learning of Controllable Representations for Text without Supervision},
  author={Xu, Peng and Cheung, Jackie Chi Kit and Cao, Yanshuai},
  booktitle = {ICML},

Related Research