Fig. 3. P&R curves obtained by each approach over the I1 dataset, considering the: (a) first, (b) third, (c) fifth and (d) eighth learning iterations.

So, we obtain the 1-nearest neighbor image (1-NN) of the image s, referred to as the adjacent image adjs (Algorithm 3, Line 4). If the labels (provided by M) of the images s and adjs are different (i.e. s.labelid ≠ adjs.labelid, Line 5), both images are inserted into the list of candidates LC (Algorithm 3, Line 6). These images are considered the most informative (uncertain) candidates, since they are close to each other and yet have been labeled by the current instance of the model as belonging to different classes. Such (diverse) samples can be more beneficial than samples predicted to belong to the same class.
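This first criterion can be sketched as follows, assuming feature vectors and the model's predicted labels are already available (the function and variable names are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def nn_disagreement_candidates(features, predicted_labels):
    """Sketch of the 1-NN disagreement criterion: for each image s,
    find its 1-nearest neighbor adj_s; if the current model labels
    them differently, both are treated as uncertain candidates."""
    candidates = []
    for i, f in enumerate(features):
        # Euclidean distances from image i to all images, excluding itself
        d = np.linalg.norm(features - f, axis=1)
        d[i] = np.inf
        adj = int(np.argmin(d))  # index of the 1-NN (adj_s)
        if predicted_labels[i] != predicted_labels[adj]:
            candidates.append((i, adj))
    return candidates
```

For instance, two nearby images with disagreeing predictions would both be returned, while a tight cluster with a consistent label contributes nothing.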
If the previous selection criterion is satisfied, the set of candidates LC is not empty (i.e. LC ≠ ∅, in Line 9). Then, the samples from LC are separated into two auxiliary lists LP and LN, containing the images classified as relevant (positive) and irrelevant (negative), respectively (Algorithm 3, Lines 10−11). The samples in LP and LN are ordered by their distance to the query image q (Algorithm 3, Lines 12−13). Then, LC receives the concatenation of the samples from LP and LN (Algorithm 3, Line 14).
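The split-and-order step might look as follows (a minimal sketch; the binary labels 1/0 for positive/negative and the helper name are our assumptions):

```python
import numpy as np

def order_candidates(candidate_ids, predicted_labels, features, query):
    """Separate candidates into positives (LP) and negatives (LN),
    order each list by distance to the query image q, and return
    their concatenation as the candidate list LC."""
    def dist(i):
        return float(np.linalg.norm(features[i] - query))
    LP = sorted((i for i in candidate_ids if predicted_labels[i] == 1), key=dist)
    LN = sorted((i for i in candidate_ids if predicted_labels[i] == 0), key=dist)
    return LP + LN  # LC: positives first, then negatives
```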
If the desired number of images has not been obtained (i.e. size(LC) ≤ nu), we consider another criterion for selecting images (Algorithm 3, Lines 16−23). The centers of mass comi of each class i are located and stored in their corresponding list Lcomi (Algorithm 3, Line 17). Afterwards, the samples from Li, i = 1, 0 (i.e. positive and negative samples) are organized in their respective lists (LP and LN), in descending order of their distances to the centers from Lcomi (Algorithm 3, Lines 18−19).
Then, we obtain the list of candidates LC by selecting one image from each list LP and LN, alternately, until the desired number of images is reached (Algorithm 3, Lines 20−22). Finally, our selection strategy returns the set of the most informative (most uncertain and similar) candidates LC.
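This fallback criterion could be sketched as below, under the assumption of binary labels 1/0 for the positive/negative classes (the function name and data layout are hypothetical):

```python
import numpy as np

def center_of_mass_selection(features, predicted_labels, lc, nu):
    """Fallback criterion sketch: rank each class's samples by
    descending distance to that class's center of mass, then
    alternately pick one positive and one negative until the
    candidate list lc holds nu images."""
    predicted_labels = np.asarray(predicted_labels)
    ordered = {}
    for label in (1, 0):
        idx = np.where(predicted_labels == label)[0]
        com = features[idx].mean(axis=0)                 # center of mass com_i
        d = np.linalg.norm(features[idx] - com, axis=1)
        ordered[label] = [int(i) for i in idx[np.argsort(-d)]]  # descending distance
    LP, LN = ordered[1], ordered[0]
    lc = list(lc)
    # alternately take one image from each ordered list until nu is reached
    while len(lc) < nu and (LP or LN):
        for lst in (LP, LN):
            if lst and len(lc) < nu:
                cand = lst.pop(0)
                if cand not in lc:
                    lc.append(cand)
    return lc
```

Ranking by descending distance to the center of mass favors samples far from their class prototype, i.e. the ones the current model is least sure about.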
Fig. 4. P&R curves obtained by each approach over the I2 dataset, considering the: (a) first, (b) third, (c) fifth and (d) eighth learning iterations.

The experiments were performed on public image datasets from the MAMMOSET database, which is composed of regions of interest (ROIs) of mammograms from three datasets (VIENNA, MIAS and DDSM). The VIENNA dataset was created by the Department of Radiology at the University of Vienna. It is composed of mammograms collected from the Breast Imaging Reporting and Data System (BI-RADS) Tutorium, which was carried out at the same university. The Mammographic Image Analysis Society (MINI-MIAS) repository is a reduced version of the MIAS dataset. The DDSM (Digital Database for Screening Mammography) repository is composed of medical breast images with 12 bits per pixel, organized into four categories based on the view of the breast image: (i) LCC: Left CranioCaudal, (ii) RCC: Right CranioCaudal, (iii) LMLO: Left MedioLateral Oblique, and (iv) RMLO: Right MedioLateral Oblique. For a more detailed description of the datasets see .
From these datasets, different subsets can be explored. Due to space constraints, we have selected some of them in order to evaluate our approach in distinct complexity scenarios. Each subset considered in our experiments is named Ini, where the higher the ni, the higher (more challenging) the image dataset complexity. Tables 1–3 present the classes and the number of samples per class for each subset.
Table 1 Description of the dataset I1 - VIENNA.
MIAS DDSM MIAS-DDSM
Fig. 5. P&R curves obtained by each approach over the I3 dataset, considering the: (a) first, (b) third, (c) fifth and (d) eighth learning iterations.
Description of the dataset I5 - VIENNA-DDSM.
Class #Samples
birads1-calcification 1040
birads2-calcification 91
birads3-calcification 56
birads4-calcification 119
birads5-calcification 34
birads1-mass 1647
birads2-mass 58
birads3-mass 83
birads4-mass 75
birads5-mass 49
Normal 41
Properties of each feature extractor applied to the datasets.
Feature extractor Category #Features
In order to corroborate the generalization and enable the replication of our approach, we used public medical image datasets de-