Visual recognition involves reasoning about structured relations at multiple levels of detail. For example, human behaviour analysis requires comprehensive labeling, from individual low-level actions through pair-wise interactions to high-level events. Scene understanding can likewise benefit from considering labels and their inter-relations. In this talk, Prof. Greg Mori presents recent work on deep learning approaches capable of modeling these structures.