Photo-based document scanning

Design team members: Simon Ruggier

Supervisors: Alexander Wong

Background

The last few years have seen widespread availability of high-quality digital cameras that can legibly capture documents in digital form. However, in order for this to be a convenient way to scan documents, there needs to be a quick, easy way to extract documents from digital photographs and process them into flat, cropped images that resemble the output of a flatbed scanner. So far, there has been little to no availability of software to automate this process.

Project description

Design an algorithm that searches for page-like objects in photographs, determines their position in 3D space from cues such as page edges and text layout, and transforms them into flat, cropped images suitable for reading, storage, or optical character recognition.

Design methodology

Convenience is the main advantage this alternative has versus other scanning methods that can achieve much higher quality. Thus, in order for a design to be viable, it has to lend itself to a convenient workflow for performing the post-processing. This means that user interaction should be kept to a minimum, and the algorithm should be as robust as possible against varying lighting conditions, noise in low-light photographs, and pages that aren't flat, since it is often inconvenient to correct these things before taking a photograph.

In order to achieve these goals, the problem has been broken down into several subproblems:

  • Classifying image regions as text or not text
  • Identifying page-like objects in images
  • Identifying individual lines of text within pages
  • Obtaining a 3D model of the position of pages based on edges and text lines

Preliminary results

Pictured below are the results of some initial work on classifying text regions:

Classifying text image for test 1
Classifying text test 1 result

Classifying text image for test 2
Classifying text test 2 result