Dmitrii Marin, PhD candidate
David R. Cheriton School of Computer Science
Deep learning models generalize poorly to new datasets and require notoriously large amounts of labeled data for training. The latter problem is exacerbated by the need to ensure that trained models are accurate across a large variety of image scenes. This diversity of images stems from the combinatorial nature of real-world scenes, occlusions, variations in lighting, acquisition methods, etc. Many rare images may have little chance of being included in a dataset, yet they remain very important, as they often represent situations where a recognition mistake has a high cost.
This motivates the need for ever larger labeled datasets. While obtaining labels is relatively cheap for some classic computer vision problems, e.g. classification, other problems may require many hours of meticulous human labor. One such demanding problem is semantic segmentation. It requires a class assignment for each image pixel, of which there may be millions in a single image. In a special setting called weak supervision, instead of labeling every pixel we allow labeling only a few of them. Interestingly, this machine learning setting is similar to low-level interactive segmentation problems, where extensively developed methods aim to turn such weak supervision into a full labeling. While it is possible to use the output of these methods as ground truth for training deep models, a better approach is to incorporate the corresponding low-level objectives directly into semantic segmentation losses. The resulting losses are often referred to as regularized losses.
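A minimal sketch of such a regularized loss, assuming a PyTorch-style setup (the function name, the simple gradient-based smoothness term, and the `ignore_index` convention are illustrative, not the specific regularizers studied in this work): it combines partial cross-entropy over the few labeled pixels with a regularization term evaluated over all pixels.

```python
import torch
import torch.nn.functional as F

def regularized_loss(logits, scribbles, weight=0.1, ignore_index=255):
    """Partial cross-entropy on the sparsely labeled (scribble) pixels plus
    a simple pairwise smoothness regularizer over all pixels.

    logits:    (B, C, H, W) raw network outputs
    scribbles: (B, H, W) sparse labels; unlabeled pixels = ignore_index
    """
    # Supervised term: only the scribbled pixels contribute.
    ce = F.cross_entropy(logits, scribbles, ignore_index=ignore_index)

    # Regularization term: penalize disagreement between neighboring pixels,
    # a crude stand-in for Potts/CRF-style relaxations used in the literature.
    p = F.softmax(logits, dim=1)
    smooth = (p[:, :, 1:, :] - p[:, :, :-1, :]).abs().mean() \
           + (p[:, :, :, 1:] - p[:, :, :, :-1]).abs().mean()

    return ce + weight * smooth
```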
We continue this line of work on regularized losses for segmentation and explore different regularizers, their properties, and the corresponding optimization challenges. For example, many efficient shallow optimization solvers are not directly applicable to deep learning training. We explore methods that allow the simultaneous use of efficient shallow solvers and standard gradient-based optimization in deep learning.
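One hypothetical way to combine the two, in the spirit of splitting schemes (the function names and the proposal step below are illustrative placeholders, not the exact method): alternate between (a) an efficient shallow solver that completes the weak labels into a full proposal labeling, and (b) a standard gradient step pushing the network toward that proposal.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, images, scribbles, shallow_solver):
    """One alternating step: shallow proposal, then a gradient update.

    shallow_solver: any black-box routine (e.g. a graph-cut or CRF solver)
    mapping (softmax scores, sparse scribbles) -> dense hard labeling.
    It is treated as non-differentiable, so it runs under no_grad.
    """
    # (a) Shallow step: complete the labeling with an efficient combinatorial solver.
    with torch.no_grad():
        scores = F.softmax(model(images), dim=1)
        proposal = shallow_solver(scores, scribbles)   # (B, H, W) hard labels

    # (b) Deep step: ordinary gradient-based update toward the proposal.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), proposal)
    loss.backward()
    optimizer.step()
    return loss.item()
```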