<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Aishwarya Krishna Allada</style></author><author><style face="normal" font="default" size="100%">Yuanxin Wang</style></author><author><style face="normal" font="default" size="100%">Veni Jindal</style></author><author><style face="normal" font="default" size="100%">Morteza Babee</style></author><author><style face="normal" font="default" size="100%">H.R. Tizhoosh</style></author><author><style face="normal" font="default" size="100%">Mark Crowley</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Analysis of Language Embeddings for Classification of Unstructured Pathology Reports</style></title><secondary-title><style face="normal" font="default" size="100%">International Conference of the IEEE Engineering in Medicine and Biology Society</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Deep Neural Networks</style></keyword><keyword><style  face="normal" font="default" size="100%">digital pathology</style></keyword><keyword><style  face="normal" font="default" size="100%">natural language processing</style></keyword><keyword><style  face="normal" font="default" size="100%">proj-digipath</style></keyword><keyword><style  face="normal" font="default" size="100%">year-in-review-2021</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">November</style></date></pub-dates></dates><publisher><style face="normal" font="default" size="100%">IEEE</style></publisher><pages><style face="normal" font="default" size="100%">4</style></pages><language><style face="normal" font="default" 
size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">A pathology report is one of the most significant medical documents, providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of embeddings from pre-trained models (BioBERT, Clinical BioBERT, and BioMed-RoBERTa), of Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique, and of combinations of pre-trained embeddings with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for pathology reports.</style></abstract></record></records></xml>