2. Materials
2.1. AIDPATH kidney database
The digital tissue images used in this work were obtained from the AIDPATH Kidney Database (see Acknowledgements). This database is composed of 5 different datasets of WSIs of human kidney tissue from cohorts acquired and digitized at three European institutions: Castilla-La Mancha’s Healthcare Services (Spain), the Andalusian Health Service (Spain) and the Vilnius University Hospital Santaros Klinikos (Lithuania). Tissue samples were collected with a biopsy needle with an outer diameter between 100 µm and 300 µm. Afterwards, paraffin blocks were prepared using 4 µm tissue sections and stained with PAS. PAS stain is commonly used because it efficiently dyes polysaccharides, which are present in kidney tissue, and highlights glomerular basement membranes [22]. Digital WSI acquisition was performed with the Leica Aperio ScanScope CS scanner, and the images were exported in SVS file format. As a result, a dataset of 47 kidney WSIs was obtained. Images at 20x magnification were selected because this magnification preserves image quality and information while still yielding valuable results at a significantly reduced computational cost. Lower resolutions lose image quality and therefore information about the glomerulus. On the other hand, magnifications such as 40x imply larger images, which increase the model size and slow down training.
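As a rough illustration of the size trade-off described above, the following sketch compares pixel counts across magnifications. It assumes only that pixel size shrinks linearly with objective magnification, so the pixel count for the same tissue area grows with the square of the magnification ratio. The function name and the example region size are hypothetical and are not taken from the AIDPATH data.

```python
def pixel_count_ratio(mag_a: float, mag_b: float) -> float:
    """Ratio of pixel counts for the same tissue area imaged at mag_a vs mag_b.

    Assumes pixel size shrinks linearly with magnification, so each image
    dimension scales by mag_a / mag_b and the area by the square of that.
    """
    return (mag_a / mag_b) ** 2


# Hypothetical example: a region that is 2000 pixels wide at 20x.
side_20x = 2000
side_40x = int(side_20x * 40 / 20)  # 4000 pixels per side at 40x
side_10x = int(side_20x * 10 / 20)  # 1000 pixels per side at 10x

print(pixel_count_ratio(40, 20))  # 4.0: four times as many pixels as at 20x
print(pixel_count_ratio(10, 20))  # 0.25: a quarter of the pixels at 20x
```

Under this assumption, moving from 20x to 40x quadruples the data to be stored and processed for the same tissue area, which is consistent with the slowdown mentioned in the text.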
2.2. Kidney database processing
Once WSIs at 20x magnification were collected, they were split into 2000x2000 pixel patches, and only those containing tissue were kept. This set of patches was examined and labeled into three classes (see Fig. 2): (i) Non-Glomerular structures: kidney tissue structures such as proximal and distal tubules, blood vessels, connective tissue stroma or inflammatory cells; (ii) Normal Glomeruli: characterized by thin glomerular capillary loops, a regular number of endothelial and mesangial cells and a normal appearance of the tubules surrounding the glomerulus; and (iii) Sclerosed Glomeruli: the whole (or nearly the whole) glomerulus presents sclerosis.
Fig. 2. Glomerular structures in nephropathology images stained with PAS. (a) Non-glomerular structures, (b) Normal glomeruli and (c) Sclerosed glomeruli.
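The patch extraction step described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes the 20x image is already loaded as an RGB NumPy array, that patches do not overlap, and that tissue is detected with a simple brightness threshold. The function names, the threshold of 220 and the minimum tissue fraction are all hypothetical choices; the text states only the 2000x2000 patch size and that patches without tissue were discarded.

```python
import numpy as np

PATCH = 2000  # patch side in pixels, as stated in the text


def tissue_fraction(patch, background_level=220):
    """Fraction of pixels darker than a near-white background.

    The threshold of 220 is an assumed value, not taken from the paper.
    """
    gray = patch.mean(axis=2)
    return float((gray < background_level).mean())


def tile_with_tissue(image, patch=PATCH, min_tissue=0.05):
    """Yield (top, left, tile) for non-overlapping patches containing tissue.

    Patches with less than `min_tissue` tissue are skipped, as are partial
    patches at the right and bottom borders (both are assumptions).
    """
    height, width = image.shape[:2]
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = image[top:top + patch, left:left + patch]
            if tissue_fraction(tile) >= min_tissue:
                yield top, left, tile


# Small synthetic example: a white 4x6 image with one dark 2x2 region.
demo = np.full((4, 6, 3), 255, dtype=np.uint8)
demo[0:2, 0:2] = 100
kept = [(top, left) for top, left, _ in tile_with_tissue(demo, patch=2, min_tissue=0.5)]
print(kept)  # only the patch at the top-left corner is kept
```

Reading the SVS files themselves requires a WSI library and is left out of the sketch.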
As a result of the previous steps, a dataset with a total of 1055 kidney tissue images was obtained. Glomeruli contours were annotated, generating a mask for each image. In total, 1245 glomerular structures were annotated: 303 sclerosed glomeruli and the remaining 942 normal glomeruli.
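One way to turn annotated contours into a per-image mask is sketched below. The paper does not describe the annotation format or the mask encoding, so the polygon representation, the class ids and both function names are assumptions made for illustration. The rasterization uses the even-odd ray-casting rule evaluated at pixel centres.

```python
import numpy as np

# Hypothetical class ids; the mask encoding is not specified in the paper.
BACKGROUND, NORMAL, SCLEROSED = 0, 1, 2


def polygon_to_mask(vertices, height, width):
    """Rasterize a closed polygon given as (x, y) vertices into a boolean mask.

    A pixel is inside if a horizontal ray from its centre crosses the
    polygon boundary an odd number of times (even-odd rule).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        crosses = (y0 > py) != (y1 > py)  # edge spans this pixel row
        x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)  # edge x at that row
        inside ^= crosses & (px < x_at)
    return inside


def build_label_mask(annotations, height, width):
    """Combine (class_id, vertices) annotations into one label mask."""
    mask = np.full((height, width), BACKGROUND, dtype=np.uint8)
    for class_id, vertices in annotations:
        mask[polygon_to_mask(vertices, height, width)] = class_id
    return mask


# Toy example: one square "normal glomerulus" contour on an 8x8 image.
square = [(2, 2), (6, 2), (6, 6), (2, 6)]
demo_mask = build_label_mask([(NORMAL, square)], 8, 8)
print(int((demo_mask == NORMAL).sum()))  # number of pixels inside the contour
```

In practice a library rasterizer would usually be preferred; the explicit loop is shown only to make the rule visible.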
CNN architectures typically require large image datasets to obtain valuable results. For that reason, a data augmentation process was performed to increase the number of samples. Color normalization is one of the most common data augmentation methods used in digital pathology. Although immunohistochemical processes use the same staining marker, some color variations can appear in the tissue. These variations mainly depend on the commercial provider, but they directly affect image analysis. Color normalization methods overcome this issue by applying a color transfer between images. Reinhard’s method (RM) [20] was selected for color normalization. This decision is based on the study performed in [4], where four different methods were used for color standardization: histogram matching (HM) [26], Macenko’s method (MM) [15], RM and the non-linear spline mapping method (SM) [12]. Color transfer was applied with 5 different references, thereby extending the dataset to 5275 images.
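The idea behind Reinhard-style color transfer can be sketched as follows: both images are converted to a decorrelated color space, and each channel of the source is shifted and scaled so that its mean and standard deviation match those of the reference. The sketch below is a minimal illustration, not the authors' implementation. The RGB-to-LMS coefficients are the ones commonly quoted for Reinhard's method and should be checked against [20]; the function names are hypothetical, and the paper does not state which color space or reference images were used.

```python
import numpy as np

# RGB -> LMS matrix as commonly quoted for Reinhard's method (assumption:
# verify the coefficients against the original reference).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
# log-LMS -> l-alpha-beta decorrelating transform.
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1, 1, 1], [1, 1, -2], [1, -1, 0]])


def rgb_to_lab(rgb):
    """Convert an (H, W, 3) RGB image to l-alpha-beta coordinates."""
    lms = rgb.reshape(-1, 3).astype(np.float64) @ RGB2LMS.T
    log_lms = np.log10(np.maximum(lms, 1e-6))  # guard against log(0)
    return (log_lms @ LMS2LAB.T).reshape(rgb.shape)


def lab_to_rgb(lab):
    """Invert rgb_to_lab; the result is float and not yet clipped."""
    log_lms = lab.reshape(-1, 3) @ np.linalg.inv(LMS2LAB).T
    rgb = (10.0 ** log_lms) @ np.linalg.inv(RGB2LMS).T
    return rgb.reshape(lab.shape)


def match_statistics(src_lab, ref_lab):
    """Shift and scale each source channel to the reference mean and std."""
    src_mean, src_std = src_lab.mean(axis=(0, 1)), src_lab.std(axis=(0, 1))
    ref_mean, ref_std = ref_lab.mean(axis=(0, 1)), ref_lab.std(axis=(0, 1))
    scale = ref_std / np.maximum(src_std, 1e-12)
    return (src_lab - src_mean) * scale + ref_mean


def reinhard_transfer(src_rgb, ref_rgb):
    """Return src_rgb recolored towards the color statistics of ref_rgb."""
    matched = match_statistics(rgb_to_lab(src_rgb), rgb_to_lab(ref_rgb))
    return np.clip(np.rint(lab_to_rgb(matched)), 0, 255).astype(np.uint8)
```

Applying `reinhard_transfer` to every patch once per reference image would yield one normalized copy per reference, which is how 5 references could produce 5 x 1055 = 5275 images.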
Another technique widely used for data augmentation is to apply minor affine transformations to the images, such as flips, mirroring, translations and rotations. Therefore, together with the RM, rotations of 90° and 270° as well as a vertical flip were performed. Finally, considering these image transformations, the dataset was composed of 25,320 images.
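The geometric transformations named above can be sketched with NumPy, together with one possible bookkeeping of the dataset size. The text does not spell out how the totals combine, so the count below is an assumed reading: the original patches are kept alongside the 5 color-normalized copies, and each of those images is used untransformed and under the three transformations. The function and variable names are hypothetical, and the rotation direction is an assumption (a 90° counter-clockwise rotation equals a 270° clockwise one, so the resulting set is the same either way).

```python
import numpy as np


def geometric_variants(image):
    """Return the image together with the three transforms named in the text."""
    return {
        "original": image,
        "rot90": np.rot90(image, k=1),   # 90 degrees counter-clockwise
        "rot270": np.rot90(image, k=3),  # 270 degrees counter-clockwise
        "vflip": np.flipud(image),       # vertical flip (top to bottom)
    }


# Bookkeeping under the assumed reading described above.
n_patches = 1055
n_references = 5
n_color_normalized = n_patches * n_references      # 5275 normalized copies
n_color_variants = n_patches + n_color_normalized  # 6330 if originals are kept
n_geometric = len(geometric_variants(np.zeros((2, 2, 3))))  # 4 per image
n_total = n_color_variants * n_geometric

print(n_color_normalized, n_total)  # 5275 25320
```

Under this reading, 1055 x (1 + 5) x 4 = 25,320, which matches the final dataset size reported in the text.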