Feed-forward neural networks can be understood as the composition of an intermediate representation and a linear hypothesis. While most prior work aims to diversify the representations, we explore the complementary direction by performing adaptive, data-dependent regularization motivated by the empirical Bayes method. In this talk, I will present our recent work on learning neural networks with adaptive regularization in the limited-data regime. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network architectures. This is joint work with Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, and Geoffrey J. Gordon.
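As background for the measurements mentioned above: the stable rank of a weight matrix is its squared Frobenius norm divided by its squared spectral norm, and it never exceeds the true rank. A minimal NumPy sketch (the function name is illustrative, not from the authors' code):

```python
import numpy as np

def stable_rank(W: np.ndarray) -> float:
    """Stable rank: ||W||_F^2 / ||W||_2^2 (always <= rank(W))."""
    fro_sq = np.sum(W ** 2)                # squared Frobenius norm
    spec = np.linalg.norm(W, ord=2)        # spectral norm (largest singular value)
    return float(fro_sq / spec ** 2)

# A rank-1 matrix has stable rank exactly 1, while the identity
# has stable rank equal to its dimension.
rng = np.random.default_rng(0)
W_rank1 = np.outer(rng.standard_normal(50), rng.standard_normal(30))
print(stable_rank(W_rank1))    # ~1.0
print(stable_rank(np.eye(30))) # 30.0
```

Intuitively, smaller stable rank means the matrix's energy is concentrated in a few directions, which is one way to formalize a "simpler" learned map.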