Suzanne Wetstein Eindhoven University of Technology s.c.wetstein@tue.nl |
PhD Candidate
E-mail: s.c.wetstein@tue.nl
Phone: +31 40 24 75581
Suzanne Wetstein is a PhD candidate in the Medical Image Analysis Group at Eindhoven University of Technology, under the supervision of Prof. Josien Pluim and Dr. Mitko Veta. Her research focuses on deep learning applied to histopathological image analysis.
Suzanne has a BSc in Applied Physics from Delft University of Technology and a BSc in Economics and Business from Erasmus University Rotterdam. She did her MSc at VU University, where she studied Business Analytics. During her MSc she studied at Nanyang Technological University in Singapore for half a year to gain more machine learning knowledge. Suzanne concluded her MSc with an internship at ORTEC Consulting, where she worked on machine learning approaches for natural language processing applied to chatbots.
Her research interests include machine learning (deep learning), pattern recognition and medical image analysis.
2020 |
Suzanne C. Wetstein, Allison M. Onken, Christina Luffman, Gabrielle M. Baker, Michael E. Pyle, Kevin H. Kensler, Ying Liu, Bart Bakker, Ruud Vlutters, Marinus B. van Leeuwen, Laura C. Collins, Stuart J. Schnitt, Josien P. W. Pluim, Rulla M. Tamimi, Yujing J. Heng, Mitko Veta Deep learning assessment of breast terminal duct lobular unit involution: Towards automated prediction of breast cancer risk Journal Article PLoS ONE, 15 (4), pp. e0231653, 2020. @article{TDLUpaper, title = {Deep learning assessment of breast terminal duct lobular unit involution: Towards automated prediction of breast cancer risk}, author = {Suzanne C. Wetstein and Allison M. Onken and Christina Luffman and Gabrielle M. Baker and Michael E. Pyle and Kevin H. Kensler and Ying Liu and Bart Bakker and Ruud Vlutters and Marinus B. van Leeuwen and Laura C. Collins and Stuart J. Schnitt and Josien P. W. Pluim and Rulla M. Tamimi and Yujing J. Heng and Mitko Veta}, editor = {Ulas Bagci}, url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231653}, doi = {10.1371/journal.pone.0231653}, year = {2020}, date = {2020-04-15}, journal = {PLoS ONE}, volume = {15}, number = {4}, pages = {e0231653}, abstract = {Terminal duct lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses’ Health Study. A set of 92 WSIs was annotated for acini, TDLUs and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. 
These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73±0.07, and segmented TDLUs and adipose tissue with Dice scores of 0.84±0.13 and 0.87±0.04, respectively. The inter-observer ICC scores for manual assessments on 40 WSIs of number of TDLUs per tissue area, median TDLU span, and median acini count per TDLU were 0.71, 0.81 and 0.73, respectively. Intra-observer reliability was evaluated on 10/40 WSIs with ICC scores of >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were: 0.80 for number of TDLUs per tissue area, 0.57 for median TDLU span, and 0.80 for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status. We developed a computational pathology method to measure TDLU involution. This technology eliminates the labor-intensiveness and subjectivity of manual TDLU assessment, and can be applied to future breast cancer risk studies.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
Kevin H. Kensler; Emily Z.F. Liu; Suzanne C. Wetstein; Allison M. Onken; Christina I. Luffman; Gabrielle M. Baker; Laura C. Collins; Stuart J. Schnitt; Vanessa C. Bret-Mounet; Mitko Veta; Josien P.W. Pluim; Ying Liu; Graham A. Colditz; A. Heather Eliassen; Susan E. Hankinson; Rulla M. Tamimi; Yujing J. Heng Automated quantitative measures of terminal duct lobular unit involution and breast cancer risk Journal Article Cancer epidemiology, biomarkers & prevention, 29 (11), 2020. @article{kensler2020automated, title = {Automated quantitative measures of terminal duct lobular unit involution and breast cancer risk}, author = {Kevin H. Kensler and Emily Z.F. Liu and Suzanne C. Wetstein and Allison M. Onken and Christina I. Luffman and Gabrielle M. Baker and Laura C. Collins and Stuart J. Schnitt and Vanessa C. Bret-Mounet and Mitko Veta and Josien P.W. Pluim and Ying Liu and Graham A. Colditz and A. Heather Eliassen and Susan E. Hankinson and Rulla M. Tamimi and Yujing J. Heng}, url = {https://cebp.aacrjournals.org/content/29/11/2358.abstract}, year = {2020}, date = {2020-11-01}, journal = {Cancer epidemiology, biomarkers & prevention}, volume = {29}, number = {11}, abstract = {Background: Manual qualitative and quantitative measures of terminal duct lobular unit (TDLU) involution were previously reported to be inversely associated with breast cancer risk. We developed and applied a deep learning method to yield quantitative measures of TDLU involution in normal breast tissue. We assessed the associations of these automated measures with breast cancer risk factors and risk. Methods: We obtained eight quantitative measures from whole slide images from a benign breast disease (BBD) nested case–control study within the Nurses’ Health Studies (287 breast cancer cases and 1,083 controls). Qualitative assessments of TDLU involution were available for 177 cases and 857 controls. 
The associations between risk factors and quantitative measures among controls were assessed using analysis of covariance adjusting for age. The relationship between each measure and risk was evaluated using unconditional logistic regression, adjusting for the matching factors, BBD subtypes, parity, and menopausal status. Qualitative measures and breast cancer risk were evaluated accounting for matching factors and BBD subtypes. Results: Menopausal status and parity were significantly associated with all eight measures; select TDLU measures were associated with BBD histologic subtype, body mass index, and birth index (P < 0.05). No measure was correlated with body size at ages 5–10 years, age at menarche, age at first birth, or breastfeeding history (P > 0.05). Neither quantitative nor qualitative measures were associated with breast cancer risk. Conclusions: Among Nurses’ Health Studies women diagnosed with BBD, TDLU involution is not a biomarker of subsequent breast cancer. Impact: TDLU involution may not impact breast cancer risk as previously thought.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
Suzanne C Wetstein, Nikolas Stathonikos, Josien PW Pluim, Yujing J Heng, Natalie D ter Hoeve, Celien PH Vreuls, Paul J van Diest, Mitko Veta Deep Learning-Based Grading of Ductal Carcinoma In Situ in Breast Histopathology Images Journal Article Forthcoming arXiv, Forthcoming. @article{Wetstein2020deep, title = {Deep Learning-Based Grading of Ductal Carcinoma In Situ in Breast Histopathology Images}, author = {Suzanne C Wetstein and Nikolas Stathonikos and Josien PW Pluim and Yujing J Heng and Natalie D ter Hoeve and Celien PH Vreuls and Paul J van Diest and Mitko Veta}, url = {https://arxiv.org/abs/2010.03244}, year = {2020}, date = {2020-10-15}, journal = {arXiv}, abstract = {Ductal carcinoma in situ (DCIS) is a non-invasive breast cancer that can progress into invasive ductal carcinoma (IDC). Studies suggest DCIS is often overtreated since a considerable part of DCIS lesions may never progress into IDC. Lower grade lesions have a lower progression speed and risk, possibly allowing treatment de-escalation. However, studies show significant inter-observer variation in DCIS grading. Automated image analysis may provide an objective solution to address high subjectivity of DCIS grading by pathologists. In this study, we developed a deep learning-based DCIS grading system. It was developed using the consensus DCIS grade of three expert observers on a dataset of 1186 DCIS lesions from 59 patients. The inter-observer agreement, measured by quadratic weighted Cohen's kappa, was used to evaluate the system and compare its performance to that of expert observers. We present an analysis of the lesion-level and patient-level inter-observer agreement on an independent test set of 1001 lesions from 50 patients. The deep learning system (dl) achieved on average slightly higher inter-observer agreement to the observers (o1, o2 and o3) (κo1,dl=0.81,κo2,dl=0.53,κo3,dl=0.40) than the observers amongst each other (κo1,o2=0.58,κo1,o3=0.50,κo2,o3=0.42) at the lesion-level. 
At the patient-level, the deep learning system achieved similar agreement to the observers (κo1,dl=0.77,κo2,dl=0.75,κo3,dl=0.70) as the observers amongst each other (κo1,o2=0.77,κo1,o3=0.75,κo2,o3=0.72). In conclusion, we developed a deep learning-based DCIS grading system that achieved a performance similar to expert observers. We believe this is the first automated system that could assist pathologists by providing robust and reproducible second opinions on DCIS grade.}, keywords = {}, pubstate = {forthcoming}, tppubtype = {article} } |
2019 |
Suzanne C. Wetstein, Allison M. Onken, Gabrielle M. Baker, Michael E. Pyle, Josien P. W. Pluim, Rulla M. Tamimi, Yujing J. Heng, Mitko Veta Detection of acini in histopathology slides: towards automated prediction of breast cancer risk Inproceedings SPIE Medical Imaging, 2019. @inproceedings{Wetstein2019, title = {Detection of acini in histopathology slides: towards automated prediction of breast cancer risk}, author = {Suzanne C. Wetstein and Allison M. Onken and Gabrielle M. Baker and Michael E. Pyle and Josien P. W. Pluim and Rulla M. Tamimi and Yujing J. Heng and Mitko Veta}, url = {https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10956/109560Q/Detection-of-acini-in-histopathology-slides--towards-automated-prediction/10.1117/12.2511408.full}, year = {2019}, date = {2019-03-18}, booktitle = {SPIE Medical Imaging}, abstract = {Terminal duct lobular units (TDLUs) are structures in the breast which involute with the completion of childbearing and physiological ageing. Women with less TDLU involution are more likely to develop breast cancer than those with more involution. Thus, TDLU involution may be utilized as a biomarker to predict invasive cancer risk. Manual assessment of TDLU involution is a cumbersome and subjective process. This makes it amenable for automated assessment by image analysis. In this study, we developed and evaluated an acini detection method as a first step towards automated assessment of TDLU involution using a dataset of histopathological whole-slide images (WSIs) from the Nurses’ Health Study (NHS) and NHSII. The NHS/NHSII is among the world's largest investigations of epidemiological risk factors for major chronic diseases in women. We compared three different approaches to detect acini in WSIs using the U-Net convolutional neural network architecture. The approaches differ in the target that is predicted by the network: circular mask labels, soft labels and distance maps. 
Our results showed that soft label targets lead to a better detection performance than the other methods. F1 scores of 0.65, 0.73 and 0.66 were obtained with circular mask labels, soft labels and distance maps, respectively. Our acini detection method was furthermore validated by applying it to measure acini count per mm2 of tissue area on an independent set of WSIs. This measure was found to be significantly negatively correlated with age.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
2020 |
Suzanne C. Wetstein, Cristina González-Gonzalo, Gerda Bortsova, Bart Liefers, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bram van Ginneken, Josien P.W. Pluim, Marleen de Bruijne, Clara I. Sánchez, Mitko Veta Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors Conference 2020. @conference{Adversarial2020, title = {Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors}, author = {Suzanne C. Wetstein and Cristina González-Gonzalo and Gerda Bortsova and Bart Liefers and Florian Dubost and Ioannis Katramados and Laurens Hogeweg and Bram van Ginneken and Josien P.W. Pluim and Marleen de Bruijne and Clara I. Sánchez and Mitko Veta}, url = {https://arxiv.org/abs/2006.06356}, year = {2020}, date = {2020-06-11}, urldate = {2020-08-25}, abstract = {Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be particularly vulnerable to adversarial attacks due to strong financial incentives. In this paper, we study several previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology and pathology. Firstly, we study the effect of varying the degree of adversarial perturbation on the attack performance and its visual perceptibility. Secondly, we study how pre-training on a public dataset (ImageNet) affects the models' vulnerability to attacks. Thirdly, we study the influence of data and model architecture disparity between target and attacker models. Our experiments show that the degree of perturbation significantly affects both performance and human perceptibility of attacks. Pre-training may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. 
Finally, disparity in data and/or model architecture between target and attacker models substantially decreases the success of attacks. We believe that these factors should be considered when designing cybersecurity-critical MedIA systems, as well as kept in mind when evaluating their vulnerability to adversarial attacks. }, keywords = {}, pubstate = {published}, tppubtype = {conference} } |
C. González-Gonzalo, S. C. Wetstein, G. Bortsova, B. Liefers, B. van Ginneken, C. I. Sánchez Are adversarial attacks an actual threat for deep learning systems in real-world eye disease screening settings? Conference European Society of Retina Specialists, 2020. @conference{Gonz20c, title = {Are adversarial attacks an actual threat for deep learning systems in real-world eye disease screening settings?}, author = {C. González-Gonzalo and S. C. Wetstein and G. Bortsova and B. Liefers and B. van Ginneken and C. I. Sánchez}, url = {https://www.euretina.org/congress/amsterdam-2020/virtual-2020-freepapers/}, year = {2020}, date = {2020-10-02}, booktitle = {European Society of Retina Specialists}, abstract = {Purpose: Deep learning (DL) systems that perform image-level classification with convolutional neural networks (CNNs) have been shown to provide high-performance solutions for automated screening of eye diseases. Nevertheless, adversarial attacks have been recently screening settings, where there is restricted access to the systems and limited knowledge about certain factors, such as their CNN architecture or the data used for development. Setting: Deep learning for automated screening of eye diseases. Methods: We used the Kaggle dataset for diabetic retinopathy detection. It contains 88,702 manually-labelled color fundus images, which we split into test (12%) and development (88%). Development data were split into two equally-sized sets (d1 and d2); a third set (d3) was generated using half of the images in d2. In each development set, 80%/20% of the images were used for training/validation. All splits were done randomly at patient-level. As attacked system, we developed a randomly-initialized CNN based on the Inception-v3 architecture using d1. We performed the attacks (1) in a white-box (WB) setting, with full access to the attacked system to generate the adversarial images, and (2) in black-box (BB) settings, without access to the attacked system and using a surrogate system to craft the attacks. 
We simulated different BB settings, sequentially decreasing the available knowledge about the attacked system: same architecture, using d1 (BB-1); different architecture (randomly-initialized DenseNet-121), using d1 (BB-2); same architecture, using d2 (BB-3); different architecture, using d2 (BB-4); different architecture, using d3 (BB-5). In each setting, adversarial images containing non-perceptible noise were generated by applying the fast gradient sign method to each image of the test set and processed by the attacked system. Results: The performance of the attacked system to detect referable diabetic retinopathy without attacks and under the different attack settings was measured on the test set using the area under the receiver operating characteristic curve (AUC). Without attacks, the system achieved an AUC of 0.88. In each attack setting, the relative decrease in AUC with respect to the original performance was computed. In the WB setting, there was a 99.9% relative decrease in performance. In the BB-1 setting, the relative decrease in AUC was 67.3%. In the BB-2 setting, the AUC suffered a 40.2% relative decrease. In the BB-3 setting, the relative decrease was 37.9%. In the BB-4 setting, the relative decrease in AUC was 34.1%. Lastly, in the BB-5 setting, the performance of the attacked system decreased 3.8% regarding its original performance. Conclusions: The results obtained in the different settings show a drastic decrease of the attacked DL system's vulnerability to adversarial attacks when the access and knowledge about it are limited. The impact on performance is extremely reduced when restricting the direct access to the system (from the WB to the BB-1 setting). The attacks become slightly less effective when not having access to the same development data (BB-3), compared to not using the same CNN architecture (BB-2). Attacks' effectiveness further decreases when both factors are unknown (BB-4). 
If the amount of development data is additionally reduced (BB-5), the original performance barely deteriorates. This last setting is the most similar to realistic screening settings, since most systems are currently closed source and use additional large private datasets for development. In conclusion, these factors should be acknowledged for future development of robust DL systems, as well as considered when evaluating the vulnerability of currently-available systems to adversarial attacks. Having limited access and knowledge about the systems determines the actual threat these attacks pose. We believe awareness about this matter will increase experts' trust and facilitate the integration of DL systems in real-world settings.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } |
2019 |
Allison M. Onken, Suzanne Wetstein, Michael Pyle, Josien Pluim, Stuart J. Schnitt, Gabrielle M Baker, Laura C. Collins, Rulla Tamimi, Mitko Veta, Yujing Jan Heng Deep Learning Networks to Segment and Detect Breast Terminal Duct Lobular Units, Acini, and Adipose Tissue: A Step Toward the Automated Analysis of Lobular Involution as a Marker for Breast Cancer Risk Conference United States and Canadian Academy of Pathology (USCAP), 2019. @conference{OnkenWetstein2019, title = {Deep Learning Networks to Segment and Detect Breast Terminal Duct Lobular Units, Acini, and Adipose Tissue: A Step Toward the Automated Analysis of Lobular Involution as a Marker for Breast Cancer Risk}, author = {Allison M. Onken and Suzanne Wetstein and Michael Pyle and Josien Pluim and Stuart J. Schnitt and Gabrielle M Baker and Laura C. Collins and Rulla Tamimi and Mitko Veta and Yujing Jan Heng}, year = {2019}, date = {2019-03-16}, booktitle = {United States and Canadian Academy of Pathology (USCAP)}, abstract = {Background: Terminal duct lobular unit (TDLU) involution is the physiological process whereby Type 2 and 3 lobules revert to Type 1 after child-bearing years. TDLU involution (quantitatively assessed by TDLU count per mm2, TDLU span, and acini count per TDLU) is inversely associated with breast cancer risk. The manual assessment of involution is time-consuming and subjective, making it impractical to perform on large epidemiological studies. Deep learning algorithms such as convolutional neural networks (CNNs) could be utilized for rapid and automated assessment of TDLU involution. We designed two CNNs to segment TDLUs and detect acini as the first step toward large-scale assessment of TDLU involution, and a third CNN to segment adipose tissue. Design: Whole slide images (WSIs; n=50) were obtained from the Nurses’ Health Study Incident Benign Breast Disease Study. For each WSI, TDLUs, acini, and adipose tissue were annotated within a region of interest comprising approximately 10% of the total tissue area. 
In order to assess involution in histologically normal breast parenchyma only, TDLUs with proliferative or metaplastic changes were excluded from manual evaluation. CNNs were engineered to recognize TDLUs, acini, and adipose tissue using 60% of the WSIs for training, 20% as a test set, and 20% for validation. F1 and Dice scores were calculated as accuracy measures to compare CNN segmentation to manual assessment. Results: Our CNNs detected acini, segmented TDLUs, and segmented adipose tissue with accuracy measures of 0.73, 0.84, and 0.86, respectively. Two primary causes of discordance with manual assessment were identified: 1) complex clustering of TDLUs where our CNN had difficulty predicting TDLU boundaries and 2) acini with proliferative or metaplastic changes which our CNN frequently detected as acini but which were intentionally excluded from manual annotation. Conclusion: We have developed a series of deep learning networks to segment and detect TDLUs, acini, and adipose tissue on WSIs. With accuracy measures of >0.7, our CNNs are sufficiently robust to be integrated into a computational pipeline for automated assessment of the quantitative features of TDLU involution, and will be further refined to address sources of discordance with manual assessment. This is the first step toward the large-scale quantification of TDLU involution which, when applied to patient samples, could be used to better determine the breast cancer risk associated with lobule type and degree of involution.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } |
Christina I. Luffman, Suzanne C. Wetstein, Allison M. Onken, Michael E. Pyle, Kevin H. Kensler, Ying Liu, Josien P. Pluim, Mitko Veta, Stuart J. Schnitt, Rulla M. Tamimi, Gabrielle M. Baker, Laura C. Collins, Yu Jing Heng Assessing Breast Terminal Duct Lobular Unit Involution: A Computational Pathology Approach Conference Abstracts and Case Studies From the College of American Pathologists 2019 Annual Meeting (CAP19), 143 (9), Archives of Pathology & Laboratory Medicine, 2019. @conference{https://doi.org/10.5858/arpa.2019-0901-AB, title = {Assessing Breast Terminal Duct Lobular Unit Involution: A Computational Pathology Approach}, author = {Christina I. Luffman, Suzanne C. Wetstein, Allison M. Onken, Michael E. Pyle, Kevin H. Kensler, Ying Liu, Josien P. Pluim, Mitko Veta, Stuart J. Schnitt, Rulla M. Tamimi, Gabrielle M. Baker, Laura C. Collins, Yu Jing Heng}, doi = {10.5858/arpa.2019-0901-AB}, year = {2019}, date = {2019-09-01}, booktitle = {Abstracts and Case Studies From the College of American Pathologists 2019 Annual Meeting (CAP19)}, volume = {143}, number = {9}, pages = {e2-e226}, publisher = {Archives of Pathology \& Laboratory Medicine}, keywords = {}, pubstate = {published}, tppubtype = {conference} } |