Candidate: Aishwarya Krishna Allada
Title: Histopathology Image analysis and NLP for Digital Pathology
Date: August 10, 2021
Time: 12:00 pm
Place: MS Teams
Supervisor(s): Crowley, Mark
Information technologies based on Machine Learning (ML) applied to quantitative imaging and text are playing an essential role, particularly in general medicine and oncology. Deep Learning (DL) in particular has achieved significant breakthroughs in Computer Vision and Natural Language Processing (NLP), which could enhance disease detection and the development of efficient treatments. Furthermore, given the large number of people with cancer and the substantial volume of data generated during cancer treatment, there is significant interest in using Artificial Intelligence (AI) to improve oncologic care. In digital pathology, high-resolution microscope images of tissue samples are stored, together with written medical reports, in databases used by pathologists. The diagnosis is made by analyzing tissue from the biopsy sample and is recorded as a brief, unstructured report that is stored as free text in Electronic Medical Record (EMR) systems. For the digitization of medical records to deliver its full benefit, these reports must be accessible and usable, so that medical practitioners can easily understand them and precisely identify the disease. Histopathology is the diagnosis and study of diseases of the tissues, and the analysis of histopathology images helps identify the location of the disease and classify the type of cancer. Recently, the abundant accumulation of Whole Slide Images (WSIs) has increased the demand for effective and efficient analysis of these gigapixel images, such as computer-aided diagnosis using DL techniques.
Moreover, because of the high diversity of shapes and structures in WSIs, conventional DL techniques cannot be applied directly for classification. Although computer-aided diagnosis using DL achieves good prediction accuracy, in the medical domain the model's predictions must be explained to gain an understanding that goes beyond standard quantitative performance evaluation. This thesis presents three findings. First, I provide a comparative analysis of transformer models such as BioBERT, Clinical BioBERT and BioMed-RoBERTa, alongside Term Frequency-Inverse Document Frequency (TF-IDF), and the results demonstrate the effectiveness of these text representation techniques for classifying pathology reports. Second, using the slide-level labels of WSIs, I classify the slides into their disease types with an architecture that combines an attention mechanism with instance-level clustering. Finally, I introduce a method to fuse the features of the pathology reports with the features of their respective images, and I examine the effect of this combination on classifying both the histopathology images and their reports simultaneously. The combined approach performed better than the individual classification tasks, achieving an accuracy of 95.73%.
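The abstract does not detail how the report and image features are fused. As a rough illustration only, the sketch below assumes the simplest option: a TF-IDF vector for each report is concatenated with a slide embedding obtained by attention-based multiple-instance pooling over patch features, where each patch receives a softmax-normalized attention weight and the slide embedding is the weighted sum of its patches. The helper names (`tfidf_matrix`, `attention_pool`, `fuse`), the toy reports, the random patch features and the untrained attention parameters are all hypothetical placeholders. The instance-level clustering branch, the transformer embeddings and the classifier trained on the fused vectors are omitted.

```python
import numpy as np


def tfidf_matrix(docs):
    """Hypothetical helper: L2-normalized TF-IDF vectors for tokenized reports."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for tok in doc:
            counts[d, index[tok]] += 1               # raw term counts per report
    df = (counts > 0).sum(axis=0)                    # number of reports containing each term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1     # smoothed inverse document frequency
    x = counts * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms), vocab


def attention_pool(patches, V, w):
    """Attention-based MIL pooling: (K, D) patch features -> (D,) slide embedding."""
    scores = np.tanh(patches @ V.T) @ w              # one relevance score per patch, shape (K,)
    weights = np.exp(scores - scores.max())          # softmax, shifted for numerical stability
    weights /= weights.sum()
    return weights @ patches, weights


def fuse(report_vec, slide_vec):
    """Assumed fusion strategy: concatenate the report and slide feature vectors."""
    return np.concatenate([report_vec, slide_vec])


rng = np.random.default_rng(0)
reports = [  # toy tokenized reports, not real data
    ["invasive", "ductal", "carcinoma"],
    ["benign", "fibroadenoma"],
    ["invasive", "lobular", "carcinoma"],
]
X_text, vocab = tfidf_matrix(reports)

D, L = 64, 16                                        # patch-feature and attention dimensions
V = rng.normal(scale=0.1, size=(L, D))               # untrained attention parameters
w = rng.normal(size=L)
slides = [rng.normal(size=(k, D)) for k in (120, 80, 200)]  # placeholder patch features per slide

fused = []
for x_t, bag in zip(X_text, slides):
    slide_emb, attn = attention_pool(bag, V, w)
    fused.append(fuse(x_t, slide_emb))
fused = np.stack(fused)
print(fused.shape)  # (3, 70): 6 TF-IDF terms + 64 slide features
```

In a real pipeline the attention parameters would be learned jointly with a classifier on the fused vectors. The per-patch weights (`attn`) could then be rendered as a heatmap over the slide, which is one common way to help explain a slide-level prediction beyond a single accuracy figure.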