Project 2.1 - Weakly Labeled Learning
We focus on deep learning from large collections of images that are only weakly labeled, for example when only an overall diagnosis or treatment outcome is available per image. We develop techniques that combine a small set of images with detailed annotations with a large pool of weakly labeled or completely unlabeled data. To this end, we exploit shared representations between learning tasks with different localization levels (e.g. image-level classification and pixel-level segmentation) and use active learning, in which medical experts are asked for feedback on automatically selected cases.
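The shared-representation idea can be illustrated with a small sketch. This is a hypothetical PyTorch example, not the project's actual code: a single encoder feeds both an image-level head trained on weak labels (e.g. an overall diagnosis) and a pixel-level head trained on the small set of detailed annotations; layer sizes are illustrative.

```python
# Minimal sketch of a shared-representation model (assumed architecture, not project code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRepresentationNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        # shared encoder, trained on both weakly and densely labeled images
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # image-level head: predicts the weak label (e.g. overall diagnosis)
        self.cls_head = nn.Linear(32, num_classes)
        # pixel-level head: predicts a segmentation map for annotated images
        self.seg_head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        feat = self.encoder(x)                              # B x 32 x H/2 x W/2
        logits_cls = self.cls_head(feat.mean(dim=(2, 3)))   # global average pooling
        logits_seg = self.seg_head(feat)                    # B x C x H/2 x W/2
        return logits_cls, logits_seg

def joint_loss(logits_cls, y_cls, logits_seg=None, y_seg=None, seg_weight=1.0):
    """Classification loss on every image; segmentation loss only when detailed
    annotations (at the head's output resolution) are available."""
    loss = F.cross_entropy(logits_cls, y_cls)
    if logits_seg is not None and y_seg is not None:
        loss = loss + seg_weight * F.cross_entropy(logits_seg, y_seg)
    return loss
```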
Project Leader
Prof.dr. Josien Pluim, Eindhoven University of Technology, j.pluim@tue.nl
Co-Applicants
Dr. Marleen de Bruijne, Erasmus Medical Center, marleen.debruijne@erasmusmc.nl
Dr. Mitko Veta, Eindhoven University of Technology, M.Veta@tue.nl
Researchers
Suzanne Wetstein, Eindhoven University of Technology, s.c.wetstein@tue.nl
Friso Heslinga, Eindhoven University of Technology, f.g.heslinga@tue.nl
Gerda Bortsova, Erasmus Medical Center, g.bortsova@erasmusmc.nl
Publications
2021
Suzanne C. Wetstein, Nikolas Stathonikos, Josien P.W. Pluim, Yujing J. Heng, Natalie D. ter Hoeve, Celien P.H. Vreuls, Paul J. van Diest, Mitko Veta. Deep Learning-Based Grading of Ductal Carcinoma In Situ in Breast Histopathology Images. Journal article. Lab Invest, 101, pp. 525-533, 2021. doi: 10.1038/s41374-021-00540-6
Gerda Bortsova, Cristina González-Gonzalo, Suzanne C. Wetstein, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien P.W. Pluim, Mitko Veta, Clara I. Sánchez, Marleen de Bruijne. Adversarial attack vulnerability of medical image analysis systems: Unexplored factors. Journal article. Medical Image Analysis, 73, 2021. doi: 10.1016/j.media.2021.102141
Friso G. Heslinga, Ruben T. Lucassen, Myrthe A. van den Berg, Luuk van der Hoek, Josien P.W. Pluim, Javier Cabrerizo, Mark Alberti, Mitko Veta. Corneal pachymetry by AS-OCT after Descemet’s membrane endothelial keratoplasty. Journal article. Scientific Reports, 11, article 13976, 2021. doi: 10.1038/s41598-021-93186-9
2020
Gerda Bortsova, Daniel Bos, Florian Dubost, Meike W. Vernooij, M. Kamran Ikram, Gijs van Tulder, Marleen de Bruijne. Automated Assessment of Intracranial Carotid Artery Calcification in Non-Contrast CT Using Deep Learning. Journal article (forthcoming), 2020.
Coen de Vente, Pieter Vos, Matin Hosseinzadeh, Josien Pluim, Mitko Veta. Deep Learning Regression for Prostate Cancer Detection and Grading in Bi-parametric MRI. Journal article (forthcoming). IEEE Transactions on Biomedical Engineering, 2020. https://ieeexplore.ieee.org/abstract/document/9090311

One of the most common types of cancer in men is prostate cancer (PCa). Biopsies guided by bi-parametric magnetic resonance imaging (MRI) can aid PCa diagnosis. Previous works have mostly focused on either detection or classification of PCa from MRI. In this work, however, we present a neural network that simultaneously detects and grades cancer tissue in an end-to-end fashion. This is more clinically relevant than the classification goal of the ProstateX-2 challenge. We used the dataset of this challenge for training and testing. We use a 2D U-Net with MRI slices as input and lesion segmentation maps that encode the Gleason Grade Group (GGG), a measure for cancer aggressiveness, as output. We propose a method for encoding the GGG in the model target that takes advantage of the fact that the classes are ordinal. Furthermore, we evaluate methods for incorporating prostate zone segmentations as prior information, and ensembling techniques. The model scored a voxel-wise weighted kappa of 0.446 ± 0.082 and a Dice similarity coefficient for segmenting clinically significant cancer of 0.370 ± 0.046, obtained using 5-fold cross-validation. The lesion-wise weighted kappa on the ProstateX-2 challenge test set was 0.13 ± 0.27. We show that our proposed model target outperforms standard multiclass classification and multi-label ordinal regression. Additionally, we present a comparison of methods for further improvement of the model performance.
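The ordinal target encoding mentioned in the abstract above can be sketched as follows. This is a generic cumulative ("ordinal regression") encoding given as a hypothetical illustration; the paper's exact way of encoding the GGG into segmentation maps may differ.

```python
# Generic cumulative ordinal encoding (illustrative, not the paper's exact encoding):
# a Gleason Grade Group g in {0..5} is represented by a vector whose first g entries are 1.
import numpy as np

def ordinal_encode(ggg, num_grades=5):
    """Encode grade group 0..num_grades as a cumulative binary target vector."""
    target = np.zeros(num_grades, dtype=np.float32)
    target[:ggg] = 1.0
    return target

def ordinal_decode(probs, threshold=0.5):
    """Predicted grade = number of consecutive outputs above the threshold."""
    grade = 0
    for p in probs:
        if p >= threshold:
            grade += 1
        else:
            break
    return grade

# Example: GGG 3 -> [1, 1, 1, 0, 0]
print(ordinal_encode(3))
```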
Suzanne C. Wetstein, Nikolas Stathonikos, Josien P.W. Pluim, Yujing J. Heng, Natalie D. ter Hoeve, Celien P.H. Vreuls, Paul J. van Diest, Mitko Veta. Deep Learning-Based Grading of Ductal Carcinoma In Situ in Breast Histopathology Images. Preprint (forthcoming). arXiv, 2020. https://arxiv.org/abs/2010.03244

Ductal carcinoma in situ (DCIS) is a non-invasive breast cancer that can progress into invasive ductal carcinoma (IDC). Studies suggest DCIS is often overtreated since a considerable part of DCIS lesions may never progress into IDC. Lower grade lesions have a lower progression speed and risk, possibly allowing treatment de-escalation. However, studies show significant inter-observer variation in DCIS grading. Automated image analysis may provide an objective solution to address high subjectivity of DCIS grading by pathologists. In this study, we developed a deep learning-based DCIS grading system. It was developed using the consensus DCIS grade of three expert observers on a dataset of 1186 DCIS lesions from 59 patients. The inter-observer agreement, measured by quadratic weighted Cohen's kappa, was used to evaluate the system and compare its performance to that of expert observers. We present an analysis of the lesion-level and patient-level inter-observer agreement on an independent test set of 1001 lesions from 50 patients. The deep learning system (dl) achieved on average slightly higher inter-observer agreement to the observers (o1, o2 and o3) (κ_o1,dl = 0.81, κ_o2,dl = 0.53, κ_o3,dl = 0.40) than the observers amongst each other (κ_o1,o2 = 0.58, κ_o1,o3 = 0.50, κ_o2,o3 = 0.42) at the lesion-level. At the patient-level, the deep learning system achieved similar agreement to the observers (κ_o1,dl = 0.77, κ_o2,dl = 0.75, κ_o3,dl = 0.70) as the observers amongst each other (κ_o1,o2 = 0.77, κ_o1,o3 = 0.75, κ_o2,o3 = 0.72). In conclusion, we developed a deep learning-based DCIS grading system that achieved a performance similar to expert observers. We believe this is the first automated system that could assist pathologists by providing robust and reproducible second opinions on DCIS grade.
Friso G. Heslinga, Mark Alberti, Josien P.W. Pluim, Javier Cabrerizo, Mitko Veta. Quantifying Graft Detachment after Descemet's Membrane Endothelial Keratoplasty with Deep Convolutional Neural Networks. Journal article. Translational Vision Science & Technology, 9(48), 2020. https://tvst.arvojournals.org/article.aspx?articleid=2770687

Purpose: We developed a method to automatically locate and quantify graft detachment after Descemet's Membrane Endothelial Keratoplasty (DMEK) in Anterior Segment Optical Coherence Tomography (AS-OCT) scans. Methods: 1280 AS-OCT B-scans were annotated by a DMEK expert. Using the annotations, a deep learning pipeline was developed to localize the scleral spur, center the AS-OCT B-scans and segment the detached graft sections. Detachment segmentation model performance was evaluated per B-scan by comparing (1) length of detachment and (2) horizontal projection of the detached sections with the expert annotations. Horizontal projections were used to construct graft detachment maps. All final evaluations were done on a test set that was set apart during training of the models. A second DMEK expert annotated the test set to determine inter-rater performance. Results: Mean scleral spur localization error was 0.155 mm, whereas the inter-rater difference was 0.090 mm. The estimated graft detachment lengths were in 69% of the cases within a 10-pixel (~150 µm) difference from the ground truth (77% for the second DMEK expert). Dice scores for the horizontal projections of all B-scans with detachments were 0.896 and 0.880 for our model and the second DMEK expert respectively. Conclusion: Our deep learning model can be used to automatically and instantly localize graft detachment in AS-OCT B-scans. Horizontal detachment projections can be determined with the same accuracy as a human DMEK expert, allowing for the construction of accurate graft detachment maps. Translational Relevance: Automated localization and quantification of graft detachment can support DMEK research and standardize clinical decision making.
Kevin H. Kensler, Emily Z.F. Liu, Suzanne C. Wetstein, Allison M. Onken, Christina I. Luffman, Gabrielle M. Baker, Laura C. Collins, Stuart J. Schnitt, Vanessa C. Bret-Mounet, Mitko Veta, Josien P.W. Pluim, Ying Liu, Graham A. Colditz, A. Heather Eliassen, Susan E. Hankinson, Rulla M. Tamimi, Yujing J. Heng. Automated quantitative measures of terminal duct lobular unit involution and breast cancer risk. Journal article. Cancer Epidemiology, Biomarkers & Prevention, 29(11), 2020. https://cebp.aacrjournals.org/content/29/11/2358.abstract

Background: Manual qualitative and quantitative measures of terminal duct lobular unit (TDLU) involution were previously reported to be inversely associated with breast cancer risk. We developed and applied a deep learning method to yield quantitative measures of TDLU involution in normal breast tissue. We assessed the associations of these automated measures with breast cancer risk factors and risk. Methods: We obtained eight quantitative measures from whole slide images from a benign breast disease (BBD) nested case–control study within the Nurses’ Health Studies (287 breast cancer cases and 1,083 controls). Qualitative assessments of TDLU involution were available for 177 cases and 857 controls. The associations between risk factors and quantitative measures among controls were assessed using analysis of covariance adjusting for age. The relationship between each measure and risk was evaluated using unconditional logistic regression, adjusting for the matching factors, BBD subtypes, parity, and menopausal status. Qualitative measures and breast cancer risk were evaluated accounting for matching factors and BBD subtypes. Results: Menopausal status and parity were significantly associated with all eight measures; select TDLU measures were associated with BBD histologic subtype, body mass index, and birth index (P < 0.05). No measure was correlated with body size at ages 5–10 years, age at menarche, age at first birth, or breastfeeding history (P > 0.05). Neither quantitative nor qualitative measures were associated with breast cancer risk. Conclusions: Among Nurses’ Health Studies women diagnosed with BBD, TDLU involution is not a biomarker of subsequent breast cancer. Impact: TDLU involution may not impact breast cancer risk as previously thought.
Suzanne C. Wetstein, Allison M. Onken, Christina Luffman, Gabrielle M. Baker, Michael E. Pyle, Kevin H. Kensler, Ying Liu, Bart Bakker, Ruud Vlutters, Marinus B. van Leeuwen, Laura C. Collins, Stuart J. Schnitt, Josien P.W. Pluim, Rulla M. Tamimi, Yujing J. Heng, Mitko Veta. Deep learning assessment of breast terminal duct lobular unit involution: Towards automated prediction of breast cancer risk. Journal article. PLoS ONE, 15(4), e0231653, 2020. doi: 10.1371/journal.pone.0231653

Terminal duct lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses’ Health Study. A set of 92 WSIs was annotated for acini, TDLUs and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73±0.07, and segmented TDLUs and adipose tissue with Dice scores of 0.84±0.13 and 0.87±0.04, respectively. The inter-observer ICC scores for manual assessments on 40 WSIs of number of TDLUs per tissue area, median TDLU span, and median acini count per TDLU were 0.71, 0.81 and 0.73, respectively. Intra-observer reliability was evaluated on 10/40 WSIs with ICC scores of >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were: 0.80 for number of TDLUs per tissue area, 0.57 for median TDLU span, and 0.80 for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status. We developed a computational pathology method to measure TDLU involution. This technology eliminates the labor-intensiveness and subjectivity of manual TDLU assessment, and can be applied to future breast cancer risk studies.
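As an illustration of how involution measures like those named above can be derived from network outputs, the sketch below computes TDLUs per tissue area, median TDLU span and median acini count per TDLU from a binary TDLU segmentation mask and a list of detected acinus centres. The inputs and the span approximation (major axis length of each TDLU region) are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative post-processing of segmentation/detection outputs (assumed inputs).
import numpy as np
from skimage.measure import label, regionprops

def involution_measures(tdlu_mask, acini_xy, tissue_area_mm2, mpp):
    """tdlu_mask: binary TDLU segmentation; acini_xy: (N, 2) array of acinus
    centres as (col, row) pixel coordinates; tissue_area_mm2: tissue area;
    mpp: microns per pixel."""
    labeled = label(tdlu_mask)
    spans_um, acini_counts = [], []
    for r in regionprops(labeled):
        spans_um.append(r.major_axis_length * mpp)  # span approximated by major axis
        inside = labeled[acini_xy[:, 1].astype(int), acini_xy[:, 0].astype(int)] == r.label
        acini_counts.append(int(inside.sum()))      # detected acini inside this TDLU
    return {
        "tdlus_per_mm2": labeled.max() / tissue_area_mm2,
        "median_tdlu_span_um": float(np.median(spans_um)) if spans_um else 0.0,
        "median_acini_per_tdlu": float(np.median(acini_counts)) if acini_counts else 0.0,
    }
```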
Suzanne C. Wetstein, Cristina González-Gonzalo, Gerda Bortsova, Bart Liefers, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bram van Ginneken, Josien P.W. Pluim, Marleen de Bruijne, Clara I. Sánchez, Mitko Veta. Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors. Conference paper, 2020. https://arxiv.org/abs/2006.06356

Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be particularly vulnerable to adversarial attacks due to strong financial incentives. In this paper, we study several previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology and pathology. Firstly, we study the effect of varying the degree of adversarial perturbation on the attack performance and its visual perceptibility. Secondly, we study how pre-training on a public dataset (ImageNet) affects the models' vulnerability to attacks. Thirdly, we study the influence of data and model architecture disparity between target and attacker models. Our experiments show that the degree of perturbation significantly affects both performance and human perceptibility of attacks. Pre-training may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Finally, disparity in data and/or model architecture between target and attacker models substantially decreases the success of attacks. We believe that these factors should be considered when designing cybersecurity-critical MedIA systems, as well as kept in mind when evaluating their vulnerability to adversarial attacks.
C. González-Gonzalo, S.C. Wetstein, G. Bortsova, B. Liefers, B. van Ginneken, C.I. Sánchez. Are adversarial attacks an actual threat for deep learning systems in real-world eye disease screening settings? Conference abstract. European Society of Retina Specialists, 2020. https://www.euretina.org/congress/amsterdam-2020/virtual-2020-freepapers/

Purpose: Deep learning (DL) systems that perform image-level classification with convolutional neural networks (CNNs) have been shown to provide high-performance solutions for automated screening of eye diseases. Nevertheless, adversarial attacks have recently been identified as a potential threat to such systems. It is unclear whether they remain a threat in real-world screening settings, where there is restricted access to the systems and limited knowledge about certain factors, such as their CNN architecture or the data used for development. Setting: Deep learning for automated screening of eye diseases. Methods: We used the Kaggle dataset for diabetic retinopathy detection. It contains 88,702 manually-labelled color fundus images, which we split into test (12%) and development (88%). Development data were split into two equally-sized sets (d1 and d2); a third set (d3) was generated using half of the images in d2. In each development set, 80%/20% of the images were used for training/validation. All splits were done randomly at patient-level. As attacked system, we developed a randomly-initialized CNN based on the Inception-v3 architecture using d1. We performed the attacks (1) in a white-box (WB) setting, with full access to the attacked system to generate the adversarial images, and (2) in black-box (BB) settings, without access to the attacked system and using a surrogate system to craft the attacks. We simulated different BB settings, sequentially decreasing the available knowledge about the attacked system: same architecture, using d1 (BB-1); different architecture (randomly-initialized DenseNet-121), using d1 (BB-2); same architecture, using d2 (BB-3); different architecture, using d2 (BB-4); different architecture, using d3 (BB-5). In each setting, adversarial images containing non-perceptible noise were generated by applying the fast gradient sign method to each image of the test set and processed by the attacked system. Results: The performance of the attacked system to detect referable diabetic retinopathy without attacks and under the different attack settings was measured on the test set using the area under the receiver operating characteristic curve (AUC). Without attacks, the system achieved an AUC of 0.88. In each attack setting, the relative decrease in AUC with respect to the original performance was computed. In the WB setting, there was a 99.9% relative decrease in performance. In the BB-1 setting, the relative decrease in AUC was 67.3%. In the BB-2 setting, the AUC suffered a 40.2% relative decrease. In the BB-3 setting, the relative decrease was 37.9%. In the BB-4 setting, the relative decrease in AUC was 34.1%. Lastly, in the BB-5 setting, the performance of the attacked system decreased 3.8% regarding its original performance. Conclusions: The results obtained in the different settings show a drastic decrease of the attacked DL system's vulnerability to adversarial attacks when the access and knowledge about it are limited. The impact on performance is extremely reduced when restricting the direct access to the system (from the WB to the BB-1 setting). The attacks become slightly less effective when not having access to the same development data (BB-3), compared to not using the same CNN architecture (BB-2). Attacks' effectiveness further decreases when both factors are unknown (BB-4). If the amount of development data is additionally reduced (BB-5), the original performance barely deteriorates. This last setting is the most similar to realistic screening settings, since most systems are currently closed source and use additional large private datasets for development. In conclusion, these factors should be acknowledged for future development of robust DL systems, as well as considered when evaluating the vulnerability of currently-available systems to adversarial attacks. Having limited access and knowledge about the systems determines the actual threat these attacks pose. We believe awareness about this matter will increase experts' trust and facilitate the integration of DL systems in real-world settings.
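The fast gradient sign method used to craft the attacks in the abstract above can be sketched in a few lines. This is a generic FGSM implementation in PyTorch; `model`, `image` and the epsilon value are placeholders, not details taken from the study.

```python
# Generic FGSM sketch (assumed model and inputs, illustrative epsilon).
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=2 / 255):
    """Return an adversarial version of `image` for a classifier `model`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # perturb each pixel by epsilon in the direction that increases the loss
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```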
F.G. Heslinga, J.P.W. Pluim, A.J. Houben, M.T. Schram, R.M. Henry, C.D. Stehouwer, M.J. van Greevenbroek, T.T. Berendschot, M. Veta. Direct Classification of Type 2 Diabetes From Retinal Fundus Images in a Population-based Sample From The Maastricht Study. In: Medical Imaging 2020: Computer-Aided Diagnosis, vol. 11314, pp. 113141N, International Society for Optics and Photonics, 2020. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11314/113141N/Direct-classification-of-type-2-diabetes-from-retinal-fundus-images/10.1117/12.2549574.full?SSO=1

Type 2 Diabetes (T2D) is a chronic metabolic disorder that can lead to blindness and cardiovascular disease. Information about early stage T2D might be present in retinal fundus images, but to what extent these images can be used for a screening setting is still unknown. In this study, deep neural networks were employed to differentiate between fundus images from individuals with and without T2D. We investigated three methods to achieve high classification performance, measured by the area under the receiver operating curve (ROC-AUC). A multi-target learning approach to simultaneously output retinal biomarkers as well as T2D works best (AUC = 0.746 [±0.001]). Furthermore, the classification performance can be improved when images with high prediction uncertainty are referred to a specialist. We also show that the combination of images of the left and right eye per individual can further improve the classification performance (AUC = 0.758 [±0.003]), using a simple averaging approach. The results are promising, suggesting the feasibility of screening for T2D from retinal fundus images.
2019
Ruwan Tennakoon, Gerda Bortsova, Silas Ørting, Amirali K. Gostar, Mathilde M.W. Wille, Zaigham Saghir, Reza Hoseinnezhad, Marleen de Bruijne, Alireza Bab-Hadiashar. Classification of Volumetric Images Using Multi-Instance Learning and Extreme Value Theorem. Journal article. IEEE Transactions on Medical Imaging, 39(4), pp. 854-865, 2019. doi: 10.1109/TMI.2019.2936244

Volumetric imaging is an essential diagnostic tool for medical practitioners. The use of popular techniques such as convolutional neural networks (CNN) for analysis of volumetric images is constrained by the availability of detailed (with local annotations) training data and GPU memory. In this paper, the volumetric image classification problem is posed as a multi-instance classification problem and a novel method is proposed to adaptively select positive instances from positive bags during the training phase. This method uses extreme value theory to model the feature distribution of the images without a pathology and uses it to identify positive instances of an imaged pathology. The experimental results, on three separate image classification tasks (i.e. classify retinal OCT images according to the presence or absence of fluid build-ups, emphysema detection in pulmonary 3D-CT images and detection of cancerous regions in 2D histopathology images) show that the proposed method produces classifiers that have similar performance to fully supervised methods and achieves the state of the art performance in all examined test cases.
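The multi-instance framing described in the abstract above can be illustrated with a minimal sketch: a volume is treated as a bag of slice instances, only the bag (volume-level) label is needed for training, and the bag score is pooled from instance scores. The extreme-value-based instance selection proposed in the paper is not reproduced here; this only shows the generic multi-instance setup, with illustrative layer sizes.

```python
# Minimal MIL sketch (assumed architecture, not the paper's method).
import torch
import torch.nn as nn

class MILSliceClassifier(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        # scores each 2D slice ("instance") independently
        self.instance_net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, bag):  # bag: (num_slices, C, H, W), one volume
        instance_logits = self.instance_net(bag).squeeze(1)  # (num_slices,)
        bag_logit = instance_logits.max()  # volume is positive if any slice scores high
        return bag_logit, instance_logits
```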
Christina I. Luffman, Suzanne C. Wetstein, Allison M. Onken, Michael E. Pyle, Kevin H. Kensler, Ying Liu, Josien P. Pluim, Mitko Veta, Stuart J. Schnitt, Rulla M. Tamimi, Gabrielle M. Baker, Laura C. Collins, Yu Jing Heng. Assessing Breast Terminal Duct Lobular Unit Involution: A Computational Pathology Approach. Conference abstract. In: Abstracts and Case Studies From the College of American Pathologists 2019 Annual Meeting (CAP19), Archives of Pathology & Laboratory Medicine, 143(9), 2019. doi: 10.5858/arpa.2019-0901-AB
Allison M. Onken, Suzanne Wetstein, Michael Pyle, Josien Pluim, Stuart J. Schnitt, Gabrielle M. Baker, Laura C. Collins, Rulla Tamimi, Mitko Veta, Yujing Jan Heng. Deep Learning Networks to Segment and Detect Breast Terminal Duct Lobular Units, Acini, and Adipose Tissue: A Step Toward the Automated Analysis of Lobular Involution as a Marker for Breast Cancer Risk. Conference abstract. United States and Canadian Academy of Pathology (USCAP), 2019.

Background: Terminal duct lobular unit (TDLU) involution is the physiological process whereby Type 2 and 3 lobules revert to Type 1 after child-bearing years. TDLU involution (quantitatively assessed by TDLU count per mm2, TDLU span, and acini count per TDLU) is inversely associated with breast cancer risk. The manual assessment of involution is time-consuming and subjective, making it impractical to perform on large epidemiological studies. Deep learning algorithms such as convolutional neural networks (CNNs) could be utilized for rapid and automated assessment of TDLU involution. We designed two CNNs to segment TDLUs and detect acini as the first step toward large-scale assessment of TDLU involution, and a third CNN to segment adipose tissue. Design: Whole slide images (WSIs; n=50) were obtained from the Nurses’ Health Study Incident Benign Breast Disease Study. For each WSI, TDLUs, acini, and adipose tissue were annotated within a region of interest comprising approximately 10% of the total tissue area. In order to assess involution in histologically normal breast parenchyma only, TDLUs with proliferative or metaplastic changes were excluded from manual evaluation. CNNs were engineered to recognize TDLUs, acini, and adipose tissue using 60% of the WSIs for training, 20% as a test set, and 20% for validation. F1 and Dice scores were calculated as accuracy measures to compare CNN segmentation to manual assessment. Results: Our CNNs detected acini, segmented TDLUs, and segmented adipose tissue with accuracy measures of 0.73, 0.84, and 0.86, respectively. Two primary causes of discordance with manual assessment were identified: 1) complex clustering of TDLUs where our CNN had difficulty predicting TDLU boundaries and 2) acini with proliferative or metaplastic changes which our CNN frequently detected as acini but which were intentionally excluded from manual annotation. Conclusion: We have developed a series of deep learning networks to segment and detect TDLUs, acini, and adipose tissue on WSIs. With accuracy measures of >0.7, our CNNs are sufficiently robust to be integrated into a computational pipeline for automated assessment of the quantitative features of TDLU involution, and will be further refined to address sources of discordance with manual assessment. This is the first step toward the large-scale quantification of TDLU involution which, when applied to patient samples, could be used to better determine the breast cancer risk associated with lobule type and degree of involution.
Gerda Bortsova, Florian Dubost, Laurens Hogeweg, Ioannis Katramados, Marleen de Bruijne. Semi-supervised medical image segmentation via learning consistency under transformations. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 810-818, Springer, Cham, 2019. doi: 10.1007/978-3-030-32226-7_90. https://arxiv.org/abs/1911.01218

The scarcity of labeled data often limits the application of supervised deep learning techniques for medical image segmentation. This has motivated the development of semi-supervised techniques that learn from a mixture of labeled and unlabeled images. In this paper, we propose a novel semi-supervised method that, in addition to supervised learning on labeled training images, learns to predict segmentations consistent under a given class of transformations on both labeled and unlabeled images. More specifically, in this work we explore learning equivariance to elastic deformations. We implement this through: (1) a Siamese architecture with two identical branches, each of which receives a differently transformed image, and (2) a composite loss function with a supervised segmentation loss term and an unsupervised term that encourages segmentation consistency between the predictions of the two branches. We evaluate the method on a public dataset of chest radiographs with segmentations of anatomical structures using 5-fold cross-validation. The proposed method reaches significantly higher segmentation accuracy compared to supervised learning. This is due to learning transformation consistency on both labeled and unlabeled images, with the latter contributing the most. We achieve the performance comparable to state-of-the-art chest X-ray segmentation methods while using substantially fewer labeled images.
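The consistency term described in the abstract above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the paper uses elastic deformations, whereas a horizontal flip stands in for the transformation here to keep the example short.

```python
# Sketch of a supervised + transformation-consistency loss (simplified: flip instead
# of elastic deformation; `model` returns per-pixel logits of shape (B, C, H, W)).
import torch
import torch.nn.functional as F

def flip(t):
    return torch.flip(t, dims=[-1])

def semi_supervised_loss(model, x_labeled, y_labeled, x_unlabeled, lam=1.0):
    # supervised segmentation term on labeled images only
    sup = F.cross_entropy(model(x_labeled), y_labeled)
    # consistency term on labeled and unlabeled images: predicting on the
    # transformed image should match transforming the prediction
    x_all = torch.cat([x_labeled, x_unlabeled], dim=0)
    pred = torch.softmax(model(x_all), dim=1)
    pred_t = torch.softmax(model(flip(x_all)), dim=1)
    cons = F.mse_loss(pred_t, flip(pred))
    return sup + lam * cons
```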
Suzanne C. Wetstein, Allison M. Onken, Gabrielle M. Baker, Michael E. Pyle, Josien P. W. Pluim, Rulla M. Tamimi, Yujing J. Heng, Mitko Veta Detection of acini in histopathology slides: towards automated prediction of breast cancer risk Inproceedings SPIE Medical Imaging, 2019. @inproceedings{Wetstein2019, title = {Detection of acini in histopathology slides: towards automated prediction of breast cancer risk}, author = {Suzanne C. Wetstein, Allison M. Onken, Gabrielle M. Baker, Michael E. Pyle, Josien P. W. Pluim, Rulla M. Tamimi, Yujing J. Heng, Mitko Veta}, url = {https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10956/109560Q/Detection-of-acini-in-histopathology-slides--towards-automated-prediction/10.1117/12.2511408.full}, year = {2019}, date = {2019-03-18}, booktitle = {SPIE Medical Imaging}, abstract = {Terminal duct lobular units (TDLUs) are structures in the breast which involute with the completion of childbearing and physiological ageing. Women with less TDLU involution are more likely to develop breast cancer than those with more involution. Thus, TDLU involution may be utilized as a biomarker to predict invasive cancer risk. Manual assessment of TDLU involution is a cumbersome and subjective process. This makes it amenable for automated assessment by image analysis. In this study, we developed and evaluated an acini detection method as a first step towards automated assessment of TDLU involution using a dataset of histopathological whole-slide images (WSIs) from the Nurses’ Health Study (NHS) and NHSII. The NHS/NHSII is among the world's largest investigations of epidemiological risk factors for major chronic diseases in women. We compared three different approaches to detect acini in WSIs using the U-Net convolutional neural network architecture. The approaches differ in the target that is predicted by the network: circular mask labels, soft labels and distance maps. Our results showed that soft label targets lead to a better detection performance than the other methods. F1 scores of 0.65, 0.73 and 0.66 were obtained with circular mask labels, soft labels and distance maps, respectively. Our acini detection method was furthermore validated by applying it to measure acini count per mm2 of tissue area on an independent set of WSIs. This measure was found to be significantly negatively correlated with age.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
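The three target encodings compared in the abstract above (circular mask labels, soft labels and distance maps) can be sketched from point annotations of acini centres. This is a hypothetical NumPy/SciPy illustration; the radius, sigma and truncation values are assumptions chosen for the example, not the paper's settings.

import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def make_targets(centres, shape, radius=9, sigma=5.0):
    # Rasterise the point annotations: one pixel per annotated acinus centre.
    points = np.zeros(shape, dtype=np.float32)
    for r, c in centres:
        points[r, c] = 1.0
    # Distance of every pixel to its nearest annotated centre.
    dist = distance_transform_edt(points == 0)

    # 1) Circular mask labels: a binary disc of fixed radius around each centre.
    circular_mask = (dist <= radius).astype(np.float32)
    # 2) Soft labels: a Gaussian blob around each centre, rescaled to peak at 1.
    soft = gaussian_filter(points, sigma=sigma)
    if soft.max() > 0:
        soft = soft / soft.max()
    # 3) Distance maps: inverted, truncated distance to the nearest centre.
    distance_map = np.clip(1.0 - dist / (3.0 * radius), 0.0, 1.0)
    return circular_mask, soft, distance_map

# Example: two annotated centres in a 128 x 128 patch.
circular, soft, distance = make_targets([(30, 40), (80, 90)], shape=(128, 128))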
Friso G. Heslinga, Josien P. W. Pluim, Behdad Dashtbozorg, Tos T. J. M. Berendschot, A. J. H. M. Houben, Ronald M. A. Henry, Mitko Veta Approximation of a pipeline of unsupervised retina image analysis methods with a CNN Inproceedings SPIE Medical Imaging, 2019. @inproceedings{Heslinga2019, title = {Approximation of a pipeline of unsupervised retina image analysis methods with a CNN}, author = {Friso G. Heslinga, Josien P. W. Pluim, Behdad Dashtbozorg, Tos T. J. M. Berendschot, A. J. H. M. Houben, Ronald M. A. Henry, Mitko Veta}, url = {https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10949/109491N/Approximation-of-a-pipeline-of-unsupervised-retina-image-analysis-methods/10.1117/12.2512393.full}, year = {2019}, date = {2019-03-15}, booktitle = {SPIE Medical Imaging}, abstract = {A pipeline of unsupervised image analysis methods for extraction of geometrical features from retinal fundus images has previously been developed. Features related to vessel caliber, tortuosity and bifurcations, have been identified as potential biomarkers for a variety of diseases, including diabetes and Alzheimer’s. The current computationally expensive pipeline takes 24 minutes to process a single image, which impedes implementation in a screening setting. In this work, we approximate the pipeline with a convolutional neural network (CNN) that enables processing of a single image in a few seconds. As an additional benefit, the trained CNN is sensitive to key structures in the retina and can be used as a pretrained network for related disease classification tasks. Our model is based on the ResNet-50 architecture and outputs four biomarkers that describe global properties of the vascular tree in retinal fundus images. Intraclass correlation coefficients between the predictions of the CNN and the results of the pipeline showed strong agreement (0.86 - 0.91) for three of four biomarkers and moderate agreement (0.42) for one biomarker. Class activation maps were created to illustrate the attention of the network. The maps show qualitatively that the activations of the network overlap with the biomarkers of interest, and that the network is able to distinguish venules from arterioles. Moreover, local high and low tortuous regions are clearly identified, confirming that a CNN is sensitive to key structures in the retina.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
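The regression setup described in this abstract (a ResNet-50 backbone predicting four scalar biomarkers per fundus image, trained to mimic the slow pipeline's outputs) can be sketched in a few lines of PyTorch/torchvision. The input size, loss and dummy targets below are assumptions for illustration, not the authors' training configuration.

import torch
import torchvision

# ResNet-50 backbone with the 1000-way classifier replaced by a
# 4-output regression head, one output per vascular biomarker.
model = torchvision.models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 4)

x = torch.randn(2, 3, 224, 224)           # dummy batch of fundus images
pipeline_targets = torch.randn(2, 4)      # biomarkers computed by the slow pipeline
loss = torch.nn.functional.mse_loss(model(x), pipeline_targets)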
2018 |
G. Bortsova, F. Dubost, S. Ørting, I. Katramados, L. Hogeweg, L. Thomsen, M. Wille, M. de Bruijne Deep learning from label proportions for emphysema quantification Inproceedings International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 768–776, Springer, Cham, 2018. @inproceedings{Bortsova2018, title = {Deep learning from label proportions for emphysema quantification}, author = {G. Bortsova, F. Dubost, S. Ørting, I. Katramados, L. Hogeweg, L. Thomsen, M. Wille, M. de Bruijne}, url = {https://arxiv.org/pdf/1807.08601.pdf}, doi = {https://doi.org/10.1007/978-3-030-00934-2_85}, year = {2018}, date = {2018-09-26}, booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention}, pages = {768--776}, publisher = {Springer, Cham}, abstract = {We propose an end-to-end deep learning method that learns to estimate emphysema extent from proportions of the diseased tissue. These proportions were visually estimated by experts using a standard grading system, in which grades correspond to intervals (label example: 1-5% of diseased tissue). The proposed architecture encodes the knowledge that the labels represent a volumetric proportion. A custom loss is designed to learn with intervals. Thus, during training, our network learns to segment the diseased tissue such that its proportions fit the ground truth intervals. Our architecture and loss combined improve the performance substantially (8% ICC) compared to a more conventional regression network. We outperform traditional lung densitometry and two recently published methods for emphysema quantification by a large margin (at least 7% AUC and 15% ICC), and achieve near-human-level performance. Moreover, our method generates emphysema segmentations that predict the spatial distribution of emphysema at human level.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
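The interval-based label-proportion idea in this abstract can be sketched as follows: the scan-level mean of the predicted emphysema probability map is compared against the expert-graded interval, and only predictions falling outside that interval are penalised. This is a hypothetical PyTorch formulation of the general idea, not the paper's exact loss or architecture.

import torch

def interval_proportion_loss(prob_map, lo, hi):
    # Mean predicted probability over all voxels of each scan in the batch,
    # i.e. the predicted proportion of diseased tissue.
    proportion = prob_map.flatten(start_dim=1).mean(dim=1)
    # Penalise only the part of the prediction that leaves the interval [lo, hi].
    below = torch.clamp(lo - proportion, min=0.0)
    above = torch.clamp(proportion - hi, min=0.0)
    return ((below + above) ** 2).mean()

# Toy usage: two scans graded "1-5%" and "6-10%" diseased tissue.
probs = torch.sigmoid(torch.randn(2, 1, 16, 64, 64))   # dummy voxel-wise predictions
lo = torch.tensor([0.01, 0.06])
hi = torch.tensor([0.05, 0.10])
loss = interval_proportion_loss(probs, lo, hi)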
2021 |
Gerda Bortsova, Daniel Bos, Florian Dubost, Meike W. Vernooij, M. Kamran Ikram, Gijs van Tulder, Marleen de Bruijne Automated Segmentation and Volume Measurement of Intracranial Carotid Artery Calcification on Noncontrast CT Journal Article Radiology: Artificial Intelligence, pp. e200226, 2021. @article{Bortsova2021Segm, title = {Automated Segmentation and Volume Measurement of Intracranial Carotid Artery Calcification on Noncontrast CT}, author = {Gerda Bortsova, Daniel Bos, Florian Dubost, Meike W. Vernooij, M. Kamran Ikram, Gijs van Tulder, Marleen de Bruijne}, url = {https://arxiv.org/pdf/2107.09442.pdf}, journal = {Radiology: Artificial Intelligence}, pages = {e200226}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
Gerda Bortsova, Florian Dubost, Laurens Hogeweg, Ioannis Katramados, Marleen de Bruijne Adversarial Heart Attack: Neural Networks Fooled to Segment Heart Symbols in Chest X-Ray Images Conference 2021. @conference{Bortsova2021AHA, title = {Adversarial Heart Attack: Neural Networks Fooled to Segment Heart Symbols in Chest X-Ray Images}, author = {Gerda Bortsova, Florian Dubost, Laurens Hogeweg, Ioannis Katramados, Marleen de Bruijne}, url = {https://arxiv.org/abs/2104.00139}, keywords = {}, pubstate = {published}, tppubtype = {conference} } |