Real-world, large-scale semantic segmentation datasets are expensive and time-consuming to create. Thus, the research community has explored the use of video game worlds and simulator environments to produce large-scale synthetic datasets, mainly to supplement real-world ones for training deep neural networks. Another use of synthetic datasets is to enable highly controlled and repeatable experiments, thanks to the ability to manipulate the content and rendering of the synthesized imagery. To this end, we outline a method to generate an arbitrarily large semantic segmentation dataset reflecting real-world features, while minimizing the required cost and man-hours. We demonstrate its use by generating ProcSy (pronounced "proxy"), a synthetic dataset for semantic segmentation, which is modeled on a real-world urban environment and features a range of variable influence factors, such as weather and lighting. Our experiments investigate the impact of these factors on the performance of a state-of-the-art deep network. Among other findings, we show that including as little as 3% rainy images in the training set improved the mIoU of the network on rainy images by about 10%, while training with more than 15% rainy images yields diminishing returns. We provide the ProcSy dataset, along with the generated 3D assets and code, as supplementary material.
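For readers unfamiliar with the metric, mIoU (mean intersection-over-union) averages the per-class overlap between predicted and ground-truth labels. The following is a minimal sketch of the standard definition, not the evaluation code used in the paper; the function name `mean_iou`, its parameters, and the optional `ignore_id` handling are assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_id: Optional[int] = None,  # hypothetical "void" label to skip
) -> float:
    """Mean IoU over flattened per-pixel class IDs (illustrative sketch)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_id:
            continue  # pixels with the ignored label do not count at all
        target_count[t] += 1
        pred_count[p] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)
```

In practice the class IDs would come from a ground-truth ID image and a network prediction flattened to one label per pixel.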
Paper
The full paper can be accessed on CVF Open Access. Please follow this link.
Dataset
Type | Link(s) |
---|---|
Base RGB Images (26.5 GB) | Part 1 Part 2 Part 3 Part 4 Part 5 |
GT_ID Images (499 MB) | Part 1 |
Depth Images (8.5 GB) | Part 1 Part 2 Part 3 Part 4 Part 5 |
Vehicle Occlusion Maps (82 MB) | Part 1 |
Weather and Lighting Variational RGB Images (27.5 GB) | Part 1 Part 2 Part 3 Part 4 Part 5 |
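The base and weather-variation images can be combined at a controlled ratio, in the spirit of the rainy-image experiment described in the abstract. The sketch below shows one way to build such a mixed training list; it is not the authors' script, and the function name `mix_training_set`, its parameters, and the file names in the usage example are hypothetical.

```python
import random
from typing import List, Sequence


def mix_training_set(
    clear: Sequence[str],
    rainy: Sequence[str],
    rainy_fraction: float,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Sample `total` image IDs with roughly `rainy_fraction` of them rainy."""
    if not 0.0 <= rainy_fraction <= 1.0:
        raise ValueError("rainy_fraction must lie in [0, 1]")

    n_rainy = round(total * rainy_fraction)
    n_clear = total - n_rainy
    if n_rainy > len(rainy) or n_clear > len(clear):
        raise ValueError("not enough images to satisfy the requested mix")

    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    mixed = rng.sample(list(rainy), n_rainy) + rng.sample(list(clear), n_clear)
    rng.shuffle(mixed)
    return mixed


# Usage with hypothetical file names: a 100-image set with 3% rainy images.
clear_ids = [f"clear_{i:04d}.png" for i in range(200)]
rainy_ids = [f"rain_{i:04d}.png" for i in range(20)]
train_ids = mix_training_set(clear_ids, rainy_ids, rainy_fraction=0.03, total=100)
```

Sampling without replacement and fixing the seed keeps each mix free of duplicates and reproducible across runs.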
License agreement
This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:
- That the dataset comes “AS IS”, without express or implied warranty. Although every effort has been made to ensure accuracy, we (Waterloo Intelligent Systems Engineering Lab, University of Waterloo, Canada) do not accept any responsibility for errors or omissions.
- That you include a reference to the ProcSy Dataset in any work that makes use of the dataset. For research papers, cite our preferred publication; for other media, cite the ProcSy website.
- That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works insofar as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow the dataset, or something similar in character, to be recovered.
- That you may not use the dataset or any derivative work for commercial purposes such as licensing or selling the data, or using the data to procure a commercial gain.
- That all rights not expressly granted to you are reserved by us (Waterloo Intelligent Systems Engineering Lab, University of Waterloo, Canada).
Type | Link(s) |
---|---|
UE4 Project (modified CARLA 0.9.2; procedural assets) (24.5 GB) | Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9 Part 10 Part 11 Part 12 Part 13 Part 14 Part 15 Part 16 Part 17 Part 18 Part 19 Part 20 Part 21 Part 22 Part 23 Part 24 Part 25 |
TODO
- create git repository for project scripts
- document project scripts and modifications to CARLA files
- study effects of combination of influence factors
- understand correlation of weather/lighting variations with real-world data