Assessing the generalizability of a deep learning based approach to predict breast cancer grade
- Name(s): Laurence Liss
- Faculty Mentor: Dr. Mark Zarella and Dr. David Breen
Abstract
Quickly and correctly identifying the pathology of a tumor has a significant impact on survival rates. RNA-based genomic tests provide a highly accurate way to assess pathology but can be costly or unavailable in certain areas. It is therefore desirable to identify the cases most likely to benefit from genomic testing so that limited resources are used effectively. Hematoxylin and eosin (H&E) stained histological images of tissue samples are common and relatively inexpensive to produce. They are often used to assess pathology, but these assessments require expert knowledge. In the paper “Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype” (https://www.nature.com/articles/s41523-018-0079-1), the researchers used machine learning algorithms to score histological images of breast cancer tumors (both benign and malignant) with a high degree of accuracy compared with human pathologists. My research aims to evaluate the performance of the programs used in the original research on a new data set. I hope the results will show how well the solution generalizes and where future enhancements are possible.
Social Relevance and Potential Impact
Improving the accuracy of automated test results could dramatically improve outcomes for breast cancer patients, particularly those in areas where access to advanced tests (such as RNA-based genomic testing) is expensive or unavailable.
Objectives/Timeline
- Week 1: Project research and proposal.
- Week 2: Run the original program successfully and produce scores on sample data.
- Week 3: Develop setup scripts to prepare new data sets for use with the program.
- Week 4: Train the algorithms as needed and evaluate their performance.
- Week 5: Assess data and results. Prepare poster.
- Week 6: Document additional results. Practice poster session.
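The evaluation step planned for Week 4 could be sketched roughly as follows. This is a minimal illustration, not the original program's method: it assumes each image has one reference label from a pathologist and one predicted label, and the function name `evaluate` and the "high"/"low" labels are hypothetical.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return (accuracy, confusion counts) for paired reference and predicted labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled sample is required")
    # Count each (reference, predicted) pair, e.g. ("low", "high").
    confusion = Counter(zip(true_labels, predicted_labels))
    # Pairs where both labels agree are the correct predictions.
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(true_labels), confusion


# Made-up labels for illustration only; these are not results from any data set.
truth = ["high", "low", "low", "high"]
preds = ["high", "low", "high", "high"]
accuracy, confusion = evaluate(truth, preds)
print(accuracy)
print(confusion[("low", "high")])
```

Comparing the accuracy on the new data set with the accuracy on the original sample data would give a first indication of how well the approach generalizes.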
Resources Needed
Ideally, I would have access to both data sets used in the original paper. I have already acquired one of them but am not optimistic about obtaining the second.
New Knowledge Needed
I will need to learn how neural networks, and specifically convolutional neural networks (CNNs), operate, along with their features, strengths, and weaknesses.
New Classroom Materials
Convolutional Neural Network sample program and associated lecture. The project involves the real-world application of a neural network with a training set. Through the research experience, I intend to show students the strengths and weaknesses of this approach to solving classification problems. In doing so, I can demonstrate the concept of overtraining, the problems with generalizing a solution, and the issues programmers must consider when deciding whether to use such machine learning techniques in different scenarios.
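One way the classroom material might make overtraining concrete is to compare training and validation accuracy across epochs. The sketch below assumes those per-epoch accuracies have already been recorded by whatever training code is used; the function names and the numbers are hypothetical and are not output from the project.

```python
def generalization_gap(train_acc, val_acc):
    """Per-epoch difference between training and validation accuracy."""
    if len(train_acc) != len(val_acc):
        raise ValueError("accuracy lists must have the same length")
    return [t - v for t, v in zip(train_acc, val_acc)]


def best_epoch(val_acc):
    """Index of the epoch with the highest validation accuracy (first one on ties)."""
    return max(range(len(val_acc)), key=lambda i: val_acc[i])


# Made-up accuracy curves for illustration only; not measured results.
train = [0.60, 0.75, 0.85, 0.93, 0.98]
val = [0.58, 0.70, 0.76, 0.74, 0.71]

print(best_epoch(val))
print([round(g, 2) for g in generalization_gap(train, val)])
```

In this made-up example, training accuracy keeps rising while validation accuracy peaks and then falls. A widening gap of this kind is the usual sign that a model is memorizing its training set rather than learning features that generalize.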