Please note: This master’s thesis presentation will take place online.
Soroosh Baselizadeh, Master’s candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Yuri Boykov, Olga Veksler
Conventional semantic instance segmentation methods produce a segmentation mask for each object instance in an image, along with its semantic class label. These methods excel at distinguishing instances, whether they belong to the same class or to different classes, and so provide valuable information about the scene. However, they provide no depth-related information and therefore cannot capture the 3D geometry of the scene.
One way to derive 3D information about a scene is monocular depth estimation, which predicts the absolute distance from the camera to the scene point at each pixel of an image. However, monocular depth estimation has limitations: it carries no semantic information about object classes, and it is not precise enough to reliably detect instances or to establish the depth order of known instances.
Even coarse 3D geometry, such as the relative depth or occlusion order of objects, is useful for rich, 3D-informed scene analysis. Motivated by this, we address occlusion-ordered semantic instance segmentation (OOSIS), which augments standard semantic instance segmentation with coarse 3D scene geometry. Using occlusion as a strong depth cue, OOSIS estimates a partial relative depth ordering of instances from their occlusion relations. OOSIS produces two outputs: instance masks with their semantic classes, and the occlusion ordering of those predicted instances.
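To make the idea of a partial relative depth ordering concrete, here is a minimal Python sketch that turns pairwise relations "instance i occludes instance j" into relative depth layers. The function name `depth_layers` and the encoding of relations as `(i, j)` index pairs are illustrative assumptions, not the representation used in the thesis. The sketch uses standard topological layering (Kahn's algorithm with longest-path layers) and rejects cyclic relations, since no consistent depth order exists for them.

```python
from collections import defaultdict


def depth_layers(num_instances, occludes):
    """Assign each instance a relative depth layer from pairwise occlusion relations.

    Hypothetical helper for illustration only.
    occludes: iterable of (i, j) pairs meaning "instance i occludes instance j".
    Layer 0 holds instances that nothing occludes; every other instance sits one
    layer behind the deepest instance that occludes it (longest-path layering).
    Raises ValueError if the relations contain a cycle.
    """
    children = defaultdict(list)
    indegree = [0] * num_instances
    for i, j in occludes:
        children[i].append(j)
        indegree[j] += 1

    layer = [0] * num_instances
    # Start from instances that are not occluded by anything.
    frontier = [k for k in range(num_instances) if indegree[k] == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for child in children[node]:
            # A child lies at least one layer behind each of its occluders.
            layer[child] = max(layer[child], layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)

    # Instances never reached are stuck on a cycle: no consistent ordering exists.
    if visited != num_instances:
        raise ValueError("occlusion relations contain a cycle")
    return layer
```

For example, with relations `[(0, 1), (1, 2), (0, 3)]` on four instances, the layers are `[0, 1, 2, 1]`. Instances 1 and 3 share a layer without being ordered relative to each other, which is what makes the ordering partial. Three relations forming the loop 0 → 1 → 2 → 0 raise `ValueError`.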
Existing work on this problem predates deep learning and relies on simple visual cues, such as the y-coordinate of objects, for occlusion ordering. This thesis introduces two deep learning-based approaches to OOSIS. The first approach follows a top-down strategy: it determines the pairwise occlusion order between instances obtained by a standard instance segmentation method. However, because each pair is ordered independently, this approach does not guarantee a globally consistent occlusion ordering and can produce undesirable cycles (for example, A occludes B, B occludes C, and C occludes A). Our second approach is bottom-up: it simultaneously derives instances and their occlusion order by grouping pixels into instances and assigning occlusion order labels, which ensures a globally consistent occlusion ordering. As part of this approach, we develop a novel deep model that predicts occlusion boundaries together with their orientation, that is, which side of each boundary occludes the other. Our proposed discrete optimization formulation then uses the output of this model to obtain instances and their corresponding ordering.
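The link between occlusion order labels and oriented occlusion boundaries can be illustrated with a toy example. Assume, hypothetically, that every pixel carries an integer order label in which a smaller value means closer to the camera; this convention and the function name `oriented_boundaries` are illustrative assumptions. The thesis's model works in the opposite direction, predicting oriented boundaries from the image and recovering instances and their order through discrete optimization. Because every orientation here is read off one shared set of labels, the orientations can never chain into a cycle, unlike orderings decided pair by pair.

```python
import numpy as np


def oriented_boundaries(order):
    """Derive oriented occlusion boundaries from a per-pixel occlusion-order map.

    Hypothetical convention: order is a 2-D integer array and a smaller value
    means the pixel is closer to the camera. Returns two int8 arrays:
      right[y, x] for the edge between (y, x) and (y, x + 1)
      down[y, x]  for the edge between (y, x) and (y + 1, x)
    Each entry is +1 if the first pixel occludes the second, -1 if the second
    occludes the first, and 0 if both share a label (no occlusion boundary).
    """
    # Cast to a signed type so that subtraction cannot wrap around.
    order = np.asarray(order, dtype=np.int64)
    # (second - first) > 0 means the first pixel is closer, so it occludes: +1.
    right = np.sign(order[:, 1:] - order[:, :-1]).astype(np.int8)
    down = np.sign(order[1:, :] - order[:-1, :]).astype(np.int8)
    return right, down
```

For the map `[[0, 0, 1], [0, 2, 1]]`, the horizontal edges are `[[0, 1], [1, -1]]` and the vertical edges are `[[0, 1, 0]]`.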
To assess the performance of OOSIS methods, we introduce a novel evaluation metric that jointly evaluates instance segmentation and occlusion ordering. In addition, we use standard metrics to evaluate the quality of instance masks, and we evaluate occlusion ordering consistency and oriented occlusion boundaries. We conduct evaluations on the KINS and COCOA datasets.
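The abstract does not name the standard mask metrics it uses. COCO-style average precision is a common choice, and it rests on matching predicted masks to ground-truth masks by intersection-over-union (IoU). The following is a simplified sketch of that matching step under this assumption: a single IoU threshold, no crowd or ignore regions, and hypothetical helper names `mask_iou` and `match_instances`. It is not the thesis's novel joint metric.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection-over-union between two binary instance masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty; treated as no overlap in this sketch
    return float(np.logical_and(pred, gt).sum() / union)


def match_instances(preds, gts, scores, threshold=0.5):
    """Greedily match predictions, highest score first, to unmatched ground truth.

    Simplified, hypothetical version of COCO-style matching. Returns one
    true-positive flag per prediction, in descending score order.
    """
    ranked = np.argsort(scores)[::-1]
    matched_gt = set()
    tp_flags = []
    for p in ranked:
        best, best_iou = None, threshold
        for g, gt in enumerate(gts):
            if g in matched_gt:
                continue  # each ground-truth mask can be matched at most once
            iou = mask_iou(preds[p], gt)
            if iou >= best_iou:
                best, best_iou = g, iou
        if best is not None:
            matched_gt.add(best)
        tp_flags.append(best is not None)
    return tp_flags
```

Precision and recall at each score cutoff follow from the cumulative counts of `True` flags, which is how AP is then accumulated.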