Graduate mentor's supervisor: Prof. Helen Chen
Doctors spend a significant amount of time documenting patient visits instead of focusing on patient care. Recent advances in Artificial Intelligence (AI) have made it possible to automatically convert conversations into structured clinical notes, reducing administrative burden and improving efficiency.
While AI medical scribes are becoming available for English, there are very few solutions for underrepresented languages such as Kazakh. Kazakh speakers often switch between Kazakh and Russian within a single conversation, a phenomenon known as code-switching. Speech models are usually trained on a single language and often hallucinate when code-switching occurs.
This project explores how AI can help understand bilingual doctor–patient conversations and automatically generate accurate medical documentation. It has the potential to improve healthcare accessibility and reduce documentation workload for clinicians serving multilingual populations. We have already built a 280-hour speech corpus of code-switched Kazakh-Russian medical data. We are now collecting an additional 100 hours of simulated doctor and patient conversations to improve model performance.
Students will contribute to building datasets, evaluating AI models, and developing tools that support healthcare professionals. Through this project, students will gain experience in artificial intelligence, machine learning, speech technology, data analysis, software development, and academic research while contributing to a real-world problem with potential impact in healthcare and language technology.
Short-term goals (during the UR2PhD term):
- Conduct a literature review on speech recognition, code-switching, and AI medical scribes.
- Explore and analyze multilingual speech datasets.
- Assist with data cleaning, organization, and quality assessment.
- Review audio recordings and transcripts to identify transcription errors.
- Investigate examples of code-switching in Kazakh-Russian conversations.
- Develop simple scripts for data preprocessing and analysis.
- Present findings and research progress as part of the CRA course activities.
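As an illustration of the kind of simple preprocessing script mentioned above, the following is a minimal sketch of a heuristic for flagging candidate code-switching points in a Cyrillic transcript. It relies on the fact that the Kazakh Cyrillic alphabet has nine letters that Russian does not use. The function names `tag_tokens` and `count_switches` are hypothetical, and the heuristic is an assumption made for illustration, not the project's method: many Kazakh words contain none of these letters, so the "other" tag only means "not provably Kazakh", and flagged spans would still need manual review.

```python
import re

# The nine letters of the Kazakh Cyrillic alphabet that are not in the Russian alphabet.
KAZAKH_ONLY = set("әғқңөұүһі")


def tag_tokens(text):
    """Tag each word 'kk' if it contains a Kazakh-only letter, else 'other'.

    'other' covers Russian words AND Kazakh words written only with
    shared letters, so this is a rough first-pass filter.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, "kk" if KAZAKH_ONLY & set(tok) else "other") for tok in tokens]


def count_switches(tagged):
    """Count positions where the tag changes between neighbouring words."""
    tags = [tag for _, tag in tagged]
    return sum(1 for a, b in zip(tags, tags[1:]) if a != b)


if __name__ == "__main__":
    tagged = tag_tokens("қан давление")
    print(tagged)
    print(count_switches(tagged))
```

Transcripts with a high switch count could be prioritized for manual inspection; a real analysis would likely need a word list or a language-identification model on top of this.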
Medium-term goals (for students interested in continuing beyond the program):
- Develop tools for annotation and quality control of speech datasets.
- Compare the performance of different speech recognition models.
- Investigate how code-switching affects model accuracy.
- Create evaluation benchmarks for multilingual speech recognition.
- Explore methods to reduce hallucinations in AI-generated transcripts.
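Comparing speech recognition models usually comes down to a metric such as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python version written for illustration only. The function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization, which a real evaluation would need to define carefully.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c"))
```

Computing this score separately for monolingual and code-switched utterances is one simple way to see how much code-switching affects a given model.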
In terms of long-term goals, students who continue the project through URAs or future research opportunities may:
- Contribute to the development of a Kazakh-Russian AI medical scribe. We would like to extend it to other domains, such as call centers, mining companies, and oil and gas companies, and to build similar datasets for other languages.
- Participate in the creation of open-source datasets and benchmarks.
- Co-author research papers or conference submissions.
- Develop AI tools that can assist healthcare professionals in multilingual environments.
- Contribute to advancing speech technology for underrepresented languages.
Students may divide responsibilities across the team:
- Data Science Team: dataset organization, quality review, dataset alignment, and text-to-speech data generation.
- Analysis Team: error analysis and identification of code-switching patterns.
- ML Engineering Team: model fine-tuning experiments.
- Research Team: literature review and documentation of findings. (We would also like to try improving performance using a knowledge graph.)
Students will collaborate regularly and combine their results to better understand the challenges of multilingual speech recognition.
This project is suitable for students who have completed introductory programming courses and are interested in Artificial Intelligence, Machine Learning, Natural Language Processing, Data Science, Healthcare Technology, or Human-Computer Interaction. Students in their second year or above are encouraged to apply. No prior research experience or healthcare background is required. Curiosity, willingness to learn, and basic programming skills are more important than prior knowledge of AI. Experience with Python is helpful but not required.