Project 2.2 - Dynamic Deep Learning
We develop a framework for dynamic deep learning: a system that learns continuously from expert feedback. It learns easy concepts first and gradually progresses to more complex tasks as it sees more data. The network expresses its uncertainty and asks for expert feedback on cases it is uncertain about or has not encountered before.
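As a rough sketch of this feedback loop (not the project's actual implementation), the routine below flags uncertain cases with Monte Carlo dropout and routes them to an expert; the model, the entropy threshold, and all function names are illustrative assumptions.

```python
import torch

def predictive_entropy(model, x, n_samples=20):
    """Estimate predictive uncertainty with Monte Carlo dropout:
    keep dropout active at test time and average softmax outputs."""
    model.train()  # enables dropout (in practice, enable only the dropout layers)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1)
                             for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)  # (batch, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    return mean_probs, entropy

def route_cases(model, x, threshold=0.5):
    """Accept confident predictions; ask for expert feedback on the rest."""
    probs, entropy = predictive_entropy(model, x)
    ask_expert = entropy > threshold  # hypothetical operating point
    return probs.argmax(dim=1), ask_expert
```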
Project Leader
Dr. Clarisa Sánchez, Radboud University Medical Center, clara.sanchezgutierrez@radboudumc.nl
Co-Applicants
Prof. dr. Bram van Ginneken, Radboud University Medical Center, bram.vanginneken@radboudumc.nl
Dr. Ivana Išgum, Amsterdam UMC, i.isgum@amsterdamumc.nl
Researchers
Cristina González-Gonzalo, Radboud University Medical Center, cristina.gonzalezgonzalo@radboudumc.nl
Ecem Lago, Radboud University Medical Center, ecem.lago@radboudumc.nl
Jörg Sander, Amsterdam UMC, j.sander1@amsterdamumc.nl
Publications
2020
Ecem Sogancioglu, Keelin Murphy, Erdi Calli, Ernst Scholten, Steven Schalekamp, Bram van Ginneken
Cardiomegaly Detection on Chest Radiographs: Segmentation Versus Classification
Journal Article. IEEE Access, 8, pp. 94631-94642, 2020, ISSN: 2169-3536. DOI: 10.1109/ACCESS.2020.2995567. URL: https://ieeexplore.ieee.org/document/9096290
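The paper compares segmentation against direct classification. One way a segmentation route can yield a cardiomegaly decision is via the cardiothoracic ratio (CTR); the sketch below assumes binary heart and lung masks as NumPy arrays, and the 0.5 cut-off is the conventional rule of thumb, not necessarily the paper's criterion.

```python
import numpy as np

def width(mask):
    """Maximal horizontal extent (in pixels) of a binary mask."""
    cols = np.where(mask.any(axis=0))[0]
    return 0 if cols.size == 0 else cols[-1] - cols[0] + 1

def cardiothoracic_ratio(heart_mask, lung_mask):
    """CTR = maximal cardiac width / maximal thoracic width."""
    return width(heart_mask) / max(width(lung_mask), 1)

# Toy masks: a CTR above ~0.5 on a PA chest radiograph is the
# conventional cardiomegaly criterion (illustrative decision rule).
heart = np.zeros((8, 8), bool); heart[3:5, 1:6] = True
lungs = np.zeros((8, 8), bool); lungs[1:7, 0:8] = True
print(cardiothoracic_ratio(heart, lungs) > 0.5)  # True -> flag cardiomegaly
```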
C. González-Gonzalo, B. Liefers, B. van Ginneken, C. I. Sánchez
Iterative augmentation of visual evidence for weakly-supervised lesion localization in deep interpretability frameworks: application to color fundus images
Journal Article. IEEE Transactions on Medical Imaging, 39(11), pp. 3499-3511, 2020, ISSN: 1558-254X. DOI: 10.1109/TMI.2020.2994463. URL: https://ieeexplore.ieee.org/abstract/document/9103111, https://arxiv.org/abs/1910.07373
Abstract: Interpretability of deep learning (DL) systems is gaining attention in medical imaging to increase experts’ trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method to generate interpretability of DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnormalities based on the prediction of a classifier trained only with image-level labels. For each image, initial visual evidence of the prediction is extracted with a given visual attribution technique. This provides localization of abnormalities that are then removed through selective inpainting. We iteratively apply this procedure until the system considers the image as normal. This yields augmented visual evidence, including less discriminative lesions which were not detected at first but should be considered for final diagnosis. We apply the method to grading of two retinal diseases in color fundus images: diabetic retinopathy (DR) and age-related macular degeneration (AMD). We evaluate the generated visual evidence and the performance of weakly-supervised localization of different types of DR and AMD abnormalities, both qualitatively and quantitatively. We show that the augmented visual evidence of the predictions highlights the biomarkers considered by experts for diagnosis and improves the final localization performance. It results in a relative increase of 11.2±2.0% per image regarding sensitivity averaged at 10 false positives/image on average, when applied to different classification tasks, visual attribution techniques and network architectures. This makes the proposed method a useful tool for exhaustive visual support of DL classifiers in medical imaging.
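A minimal sketch of the iterative attribute-inpaint-reclassify loop the abstract describes. The `classify`, `attribute`, and `inpaint` callables, the saliency quantile, and the stopping threshold are stand-ins for illustration, not the paper's actual components.

```python
import torch

def augment_visual_evidence(classify, attribute, inpaint, image,
                            threshold=0.5, max_iter=10):
    """Iteratively unveil visual evidence: attribute -> inpaint -> reclassify,
    accumulating saliency until the classifier considers the image normal."""
    evidence = torch.zeros(image.shape[-2:])
    for _ in range(max_iter):
        prob = classify(image)                     # P(referable) for this image
        if prob < threshold:                       # system now deems it normal
            break
        saliency = attribute(image)                # e.g. a visual attribution map
        evidence = torch.maximum(evidence, saliency)
        mask = saliency > saliency.quantile(0.99)  # most discriminative regions
        image = inpaint(image, mask)               # selectively remove lesions
    return evidence
```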
Suzanne C. Wetstein, Cristina González-Gonzalo, Gerda Bortsova, Bart Liefers, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bram van Ginneken, Josien P.W. Pluim, Marleen de Bruijne, Clara I. Sánchez, Mitko Veta
Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Conference. 2020. URL: https://arxiv.org/abs/2006.06356
Abstract: Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be particularly vulnerable to adversarial attacks due to strong financial incentives. In this paper, we study several previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology and pathology. Firstly, we study the effect of varying the degree of adversarial perturbation on the attack performance and its visual perceptibility. Secondly, we study how pre-training on a public dataset (ImageNet) affects the models' vulnerability to attacks. Thirdly, we study the influence of data and model architecture disparity between target and attacker models. Our experiments show that the degree of perturbation significantly affects both performance and human perceptibility of attacks. Pre-training may dramatically increase the transfer of adversarial examples; the larger the performance gain achieved by pre-training, the larger the transfer. Finally, disparity in data and/or model architecture between target and attacker models substantially decreases the success of attacks. We believe that these factors should be considered when designing cybersecurity-critical MedIA systems, as well as kept in mind when evaluating their vulnerability to adversarial attacks.
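The first factor studied is the degree of perturbation. A standard way to control it is the epsilon parameter of the fast gradient sign method (FGSM, which the companion study below also uses); this is a generic sketch of FGSM, not the paper's exact attack code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Fast gradient sign method: one step of size epsilon in the
    direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep a valid image range

# Sweeping epsilon trades attack success against visual perceptibility:
# for eps in (0.001, 0.01, 0.05): evaluate(model, fgsm(model, x, y, eps))
```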
C. González-Gonzalo, S. C. Wetstein, G. Bortsova, B. Liefers, B. van Ginneken, C. I. Sánchez
Are adversarial attacks an actual threat for deep learning systems in real-world eye disease screening settings?
Conference. European Society of Retina Specialists (EURETINA), 2020. URL: https://www.euretina.org/congress/amsterdam-2020/virtual-2020-freepapers/
Abstract:
Purpose: Deep learning (DL) systems that perform image-level classification with convolutional neural networks (CNNs) have been shown to provide high-performance solutions for automated screening of eye diseases. Nevertheless, adversarial attacks have recently been presented as a potential threat to such systems. This study evaluates whether these attacks pose an actual threat in real-world screening settings, where there is restricted access to the systems and limited knowledge about certain factors, such as their CNN architecture or the data used for development.
Setting: Deep learning for automated screening of eye diseases.
Methods: We used the Kaggle dataset for diabetic retinopathy detection. It contains 88,702 manually-labelled color fundus images, which we split into test (12%) and development (88%). Development data were split into two equally-sized sets (d1 and d2); a third set (d3) was generated using half of the images in d2. In each development set, 80%/20% of the images were used for training/validation. All splits were done randomly at patient-level. As attacked system, we developed a randomly-initialized CNN based on the Inception-v3 architecture using d1. We performed the attacks (1) in a white-box (WB) setting, with full access to the attacked system to generate the adversarial images, and (2) in black-box (BB) settings, without access to the attacked system and using a surrogate system to craft the attacks. We simulated different BB settings, sequentially decreasing the available knowledge about the attacked system: same architecture, using d1 (BB-1); different architecture (randomly-initialized DenseNet-121), using d1 (BB-2); same architecture, using d2 (BB-3); different architecture, using d2 (BB-4); different architecture, using d3 (BB-5). In each setting, adversarial images containing non-perceptible noise were generated by applying the fast gradient sign method to each image of the test set and processed by the attacked system.
Results: The performance of the attacked system to detect referable diabetic retinopathy without attacks and under the different attack settings was measured on the test set using the area under the receiver operating characteristic curve (AUC). Without attacks, the system achieved an AUC of 0.88. In each attack setting, the relative decrease in AUC with respect to the original performance was computed. In the WB setting, there was a 99.9% relative decrease in performance. In the BB-1 setting, the relative decrease in AUC was 67.3%. In the BB-2 setting, the AUC suffered a 40.2% relative decrease. In the BB-3 setting, the relative decrease was 37.9%. In the BB-4 setting, the relative decrease in AUC was 34.1%. Lastly, in the BB-5 setting, the performance of the attacked system decreased 3.8% regarding its original performance.
Conclusions: The results obtained in the different settings show a drastic decrease of the attacked DL system's vulnerability to adversarial attacks when the access and knowledge about it are limited. The impact on performance is greatly reduced when restricting direct access to the system (from the WB to the BB-1 setting). The attacks become slightly less effective when not having access to the same development data (BB-3), compared to not using the same CNN architecture (BB-2). Attacks' effectiveness further decreases when both factors are unknown (BB-4). If the amount of development data is additionally reduced (BB-5), the original performance barely deteriorates. This last setting is the most similar to realistic screening settings, since most systems are currently closed source and use additional large private datasets for development. In conclusion, these factors should be acknowledged for future development of robust DL systems, as well as considered when evaluating the vulnerability of currently-available systems to adversarial attacks. Having limited access and knowledge about the systems determines the actual threat these attacks pose. We believe awareness about this matter will increase experts' trust and facilitate the integration of DL systems in real-world settings.
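The reported numbers are relative AUC decreases. A small sketch of how such a figure is computed with scikit-learn (the input arrays are placeholders):

```python
from sklearn.metrics import roc_auc_score

def relative_auc_decrease(y_true, scores_clean, scores_attacked):
    """Relative AUC drop (%) of an attacked system w.r.t. its clean performance."""
    auc_clean = roc_auc_score(y_true, scores_clean)
    auc_attacked = roc_auc_score(y_true, scores_attacked)
    return 100.0 * (auc_clean - auc_attacked) / auc_clean

# e.g. a clean AUC of 0.88 dropping to about 0.29 under the BB-1 attack
# corresponds to the ~67% relative decrease reported above.
```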
J. Sander, B.D. de Vos, I. Išgum
Unsupervised super-resolution: creating high-resolution medical images from low-resolution anisotropic examples
Inproceedings. SPIE Medical Imaging (in press), 2020.
Abstract: Although high resolution isotropic 3D medical images are desired in clinical practice, their acquisition is not always feasible. Instead, lower resolution images are upsampled to higher resolution using conventional interpolation methods. Sophisticated learning-based super-resolution approaches are frequently unavailable in the clinical setting, because such methods require training with high-resolution isotropic examples. To address this issue, we propose a learning-based super-resolution approach that can be trained using solely anisotropic images, i.e. without high-resolution ground truth data. The method exploits the latent space, generated by autoencoders trained on anisotropic images, to increase spatial resolution in low-resolution images. The method was trained and evaluated using 100 publicly available cardiac cine MR scans from the Automated Cardiac Diagnosis Challenge (ACDC). The quantitative results show that the proposed method performs better than conventional interpolation methods. Furthermore, the qualitative results indicate that especially finer cardiac structures are synthesized with high quality. The method has the potential to be applied to other anatomies and modalities and can be easily applied to any 3D anisotropic medical image dataset.
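The abstract only states that the autoencoder latent space is exploited. One plausible reading, sketched below purely as an assumption about the mechanism, is to decode interpolated latent codes of adjacent thick slices to synthesize intermediate slices; `encoder` and `decoder` are hypothetical callables.

```python
import torch

def synthesize_between(encoder, decoder, slice_a, slice_b, n_new=1):
    """Increase through-plane resolution by decoding latent codes
    interpolated between two adjacent anisotropic slices.
    Linear interpolation in latent space is an assumption here."""
    with torch.no_grad():
        z_a, z_b = encoder(slice_a), encoder(slice_b)
        alphas = torch.linspace(0, 1, n_new + 2)[1:-1]  # interior points only
        return [decoder((1 - a) * z_a + a * z_b) for a in alphas]
```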
2019
C. González-Gonzalo, V. Sánchez-Gutiérrez, P. Hernández-Martínez, I. Contreras, Y. T. Lechanteur, A. Domanian, B. van Ginneken, C. I. Sánchez
Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration
Journal Article. Acta Ophthalmologica, 2019. DOI: 10.1111/aos.14306. URL: https://onlinelibrary.wiley.com/doi/full/10.1111/aos.14306, https://arxiv.org/abs/1903.09555
Abstract:
Purpose: To validate the performance of a commercially-available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in color fundus (CF) images on a dataset with mixed presence of eye diseases.
Methods: Evaluation of joint detection of referable DR and AMD was performed on a DR-AMD dataset with 600 images acquired during routine clinical practice, containing referable and non-referable cases of both diseases. Each image was graded for DR and AMD by an experienced ophthalmologist to establish the reference standard (RS), and by four independent observers for comparison with human performance. Validation was further assessed on Messidor (1200 images) for individual identification of referable DR, and the Age-Related Eye Disease Study (AREDS) dataset (133821 images) for referable AMD, against the corresponding RS.
Results: Regarding joint validation on the DR-AMD dataset, the system achieved an area under the ROC curve (AUC) of 95.1% for detection of referable DR (SE=90.1%, SP=90.6%). For referable AMD, the AUC was 94.9% (SE=91.8%, SP=87.5%). Average human performance for DR was SE=61.5% and SP=97.8%; for AMD, SE=76.5% and SP=96.1%. Regarding detection of referable DR in Messidor, AUC was 97.5% (SE=92.0%, SP=92.1%); for referable AMD in AREDS, AUC was 92.7% (SE=85.8%, SP=86.0%).
Conclusions: The validated system performs comparably to human experts at simultaneous detection of DR and AMD. This shows that DL systems can facilitate access to joint screening of eye diseases and become a quick and reliable support for ophthalmological experts.
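For reference, a minimal sketch of how the reported metrics (AUC plus sensitivity/specificity at an operating point) are computed with scikit-learn; the 0.5 threshold is an illustrative choice, not the validated system's operating point.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def screening_metrics(y_true, scores, threshold=0.5):
    """AUC plus sensitivity (SE) and specificity (SP) at a chosen threshold."""
    auc = roc_auc_score(y_true, scores)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return auc, sensitivity, specificity
```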
C. González-Gonzalo, B. Liefers, A. Vaidyanathan, H. J. van Zeeland, C. C. W. Klaver, C. I. Sánchez
Opening the “black box” of deep learning in automated screening of eye diseases
Conference. Association for Research in Vision and Ophthalmology Annual Meeting (ARVO), Vancouver, 2019. URL: https://iovs.arvojournals.org/article.aspx?articleid=2746850&resultClick=1
Abstract:
Purpose: Systems based on deep learning (DL) have been demonstrated to provide a scalable and high-performance solution for screening of eye diseases. However, DL is usually considered a “black box” due to lack of interpretability. We propose a deep visualization framework to explain the decisions made by a DL system, iteratively unveiling abnormalities responsible for referable predictions without needing lesion-level annotations. We apply the framework to automated screening of diabetic retinopathy (DR) in color fundus images (CFIs).
Methods: The proposed framework consists of a baseline deep convolutional neural network to classify CFIs by DR stage. For each CFI classified as referable DR, the framework extracts initial visual evidence of the predicted stage by computing a saliency map, which indicates regions in the image that would contribute the most to changes in the prediction if modified. This provides localization of abnormalities that are then removed through selective inpainting. The image is again classified, expecting reduced referability. We iteratively apply this procedure to increase attention to less discriminative areas and generate refined visual evidence. The Kaggle DR database, with CFIs graded regarding DR severity (stages 0 and 1: non-referable DR, stages 2 to 4: referable DR), is used for training and validation of the image-level classification task. For validation of the obtained visual evidence, we used the DiaretDB1 dataset, which contains CFIs with manually-delineated areas for 4 types of lesions: hemorrhages, microaneurysms, hard and soft exudates.
Results: The baseline classifier obtained an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and a quadratic weighted kappa of 0.77 on the Kaggle test set (53576 CFIs). Free-response ROC (FROC) curves (Figure 2) analyze the correspondence between highlighted areas and each type of lesion for those images classified as referable DR in the DiaretDB1 dataset (62 CFIs), comparing between initial and refined visual evidence.
Conclusions: The proposed framework provides visual evidence for the decisions made by a DL system, iteratively unveiling abnormalities in CFIs based on the prediction of a classifier trained only with image-level labels. This provides a “key” to open the “black box” of artificial intelligence in screening of eye diseases, aiming to increase experts’ trust and facilitate its integration in screening settings.
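The localization results are reported with FROC curves (lesion sensitivity versus false positives per image). A minimal sketch of computing FROC points; the input format (one maximum score per true lesion, plus the scores of unmatched candidate detections) is an assumption for illustration.

```python
import numpy as np

def froc_points(lesion_scores, fp_scores, n_images, thresholds):
    """FROC: lesion-level sensitivity vs. average false positives per image.
    `lesion_scores`: max detection score per annotated lesion;
    `fp_scores`: scores of candidates matching no lesion (assumed format)."""
    points = []
    for t in thresholds:
        sensitivity = float(np.mean(np.asarray(lesion_scores) >= t))
        fps_per_image = float(np.sum(np.asarray(fp_scores) >= t)) / n_images
        points.append((fps_per_image, sensitivity))
    return points
```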
J. Engelberts, C. González-Gonzalo, C. I. Sánchez, M. van Grinsven
Automatic Segmentation of Drusen and Exudates on Color Fundus Images using Generative Adversarial Networks
Conference. Association for Research in Vision and Ophthalmology Annual Meeting (ARVO), Vancouver, 2019. URL: https://iovs.arvojournals.org/article.aspx?articleid=2745936&resultClick=1
Abstract:
Purpose: The presence of drusen and exudates, visible as bright lesions on color fundus images, is one of the early signs of visual threatening diseases such as Age-related Macular Degeneration and Diabetic Retinopathy. Accurate detection and quantification of these lesions during screening can help identify patients that would benefit from treatment. We developed a method based on generative adversarial networks (GANs) to segment bright lesions on color fundus images.
Methods: We used 4179 color fundus images that were acquired during clinical routine. The images were contrast enhanced to increase the contrast between bright lesions and the background. All bright lesions were manually annotated by marking the center point of the lesions. The GAN was trained to estimate the image without bright lesions. The final segmentation was obtained by taking the difference between the input image and the estimated output.
Results: This method was applied to an independent test set of 52 color fundus images with non-advanced stages of AMD from the European Genetic Database, which were fully segmented for bright lesions by two trained human observers. The method achieved Dice scores of 0.4862 and 0.4849 when compared to the observers, whereas the inter-observer Dice score was 0.5043. The total segmented bright lesion area per image was evaluated using the intraclass correlation (ICC). The method scored 0.8537 and 0.8352 when compared to the observers, whereas the inter-observer ICC was 0.8893.
Conclusions: The results show the performance is close to the agreement between trained observers. This automatic segmentation of bright lesions can help early diagnosis of visual threatening diseases and opens the way for large scale clinical trials.
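A sketch of the difference-image segmentation step and the Dice overlap used for evaluation, assuming NumPy float images; the threshold value is hypothetical (the abstract does not specify how the difference is binarized).

```python
import numpy as np

def segment_bright_lesions(image, gan_reconstruction, threshold=0.1):
    """Lesion mask = pixels where the input is brighter than the GAN's
    lesion-free estimate by more than a (hypothetical) threshold."""
    return (image - gan_reconstruction) > threshold

def dice(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom
```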
J. Sander, B.D. de Vos, J.M. Wolterink, I. Išgum
Towards increased trustworthiness of deep learning segmentation methods on cardiac MRI
Inproceedings. SPIE Medical Imaging, 2019. URL: https://arxiv.org/pdf/1809.10430.pdf
Abstract: Current state-of-the-art deep learning segmentation methods have not yet made a broad entrance into the clinical setting in spite of high demand for such automatic methods. One important reason is the lack of reliability caused by models that fail unnoticed and often locally produce anatomically implausible results that medical experts would not make. This paper presents an automatic image segmentation method based on (Bayesian) dilated convolutional networks (DCNN) that generate segmentation masks and spatial uncertainty maps for the input image at hand. The method was trained and evaluated using segmentation of the left ventricle (LV) cavity, right ventricle (RV) endocardium and myocardium (Myo) at end-diastole (ED) and end-systole (ES) in 100 cardiac 2D MR scans from the MICCAI 2017 Challenge (ACDC). Combining segmentations and uncertainty maps and employing a human-in-the-loop setting, we provide evidence that image areas indicated as highly uncertain regarding the obtained segmentation almost entirely cover regions of incorrect segmentations. The fused information can be harnessed to increase segmentation performance. Our results reveal that we can obtain valuable spatial uncertainty maps with low computational effort using DCNNs.
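As a per-pixel counterpart of the image-level routing sketched at the top of this page, the snippet below approximates a Bayesian segmentation network with Monte Carlo dropout and returns a spatial (per-pixel) entropy map; this is a common approximation, not necessarily the paper's exact Bayesian formulation.

```python
import torch

def spatial_uncertainty(seg_model, image, n_samples=20):
    """Per-pixel predictive entropy from Monte Carlo dropout samples
    of a segmentation network (approximate Bayesian inference)."""
    seg_model.train()  # keep dropout active at inference (illustrative)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(seg_model(image), dim=1)
                             for _ in range(n_samples)]).mean(dim=0)
    # High-entropy pixels flag likely segmentation errors for expert review.
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
```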
2018
C. González-Gonzalo, B. Liefers, B. van Ginneken, C. I. Sánchez
Improving weakly-supervised lesion localization with iterative saliency map refinement
Conference. Medical Imaging with Deep Learning (MIDL), Amsterdam, 2018. URL: https://openreview.net/forum?id=r15c8gnoG
Abstract: Interpretability of deep neural networks in medical imaging is becoming an important technique to understand network classification decisions and increase doctors' trust. Available methods for visual interpretation, though, tend to highlight only the most discriminant areas, which is suboptimal for clinical output. We propose a novel deep visualization framework for improving weakly-supervised lesion localization. The framework applies an iterative approach where, in each step, the interpretation maps focus on different, less discriminative areas of the images, but still important for the final classification, reaching a more refined localization of abnormalities. We evaluate the performance of the method for the localization of diabetic retinopathy lesions in color fundus images. The results show the obtained visualization maps are able to detect more lesions after the iterative procedure in the case of more severely affected retinas.