Fitting larger networks into memory.

GPU memory is often the limiting factor for modern neural network architectures. The memory required to train a neural network grows linearly with both network depth and batch size. You want to go deeper for the standard reasons, but you also want larger batch sizes to make use of second-order methods like KFAC, which need fewer examples to learn from than mini-batch SGD.
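
To see why memory scales this way, here is a back-of-the-envelope sketch (not from the article; the layer sizes are illustrative assumptions): the backward pass must keep one activation tensor per layer per example, so total activation storage grows linearly in both depth and batch size.

```python
# Rough estimate of activation memory held for the backward pass.
# Activations scale linearly with both network depth and batch size.
# All concrete numbers below are illustrative assumptions.

def activation_memory_gib(batch_size, acts_per_example_per_layer, depth,
                          bytes_per_float=4):
    """Total activation storage, in GiB, for one training step."""
    total = batch_size * acts_per_example_per_layer * depth * bytes_per_float
    return total / 2**30

# e.g. a hypothetical 56x56x256 feature map: ~800K activations
# per example per layer
ACTS = 56 * 56 * 256

for batch_size in (32, 64):
    for depth in (50, 100):
        gib = activation_memory_gib(batch_size, ACTS, depth)
        print(f"batch={batch_size:3d} depth={depth:3d} -> {gib:5.1f} GiB")
```

Doubling either the batch size or the depth doubles the activation footprint, which is why a network that fits at batch size 32 can blow past GPU memory at batch size 64.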

https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9