Title: Wisconsin Diagnostic Breast Cancer (WDBC)
Data Set Characteristics: Multivariate
Number of Instances: 569
Area: Life
Attribute Characteristics: Real
Number of Attributes: 32
Date Donated: 1995-11-01
Associated Tasks: Classification
Missing Values? No
Number of Web Hits: 30874
Highest Percentage Achieved: 98%
Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) real-valued input features
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
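To make the geometric definitions above concrete, here is a minimal sketch that computes radius, perimeter, area, and compactness for a nucleus boundary given as a closed polygon. The function name `contour_features`, the polygon representation, and the choice of the average boundary point as the "center" are assumptions made for illustration; the original feature extraction software may define these details differently.

```python
import math


def contour_features(points):
    """Radius, perimeter, area and compactness of a closed polygon.

    `points` is a list of (x, y) vertices in order around the boundary.
    Assumption: the center is the plain average of the boundary points.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # a) radius: mean distance from the center to the boundary points
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x1 - x0, y1 - y0)  # c) perimeter
        twice_area += x0 * y1 - x1 * y0            # shoelace formula term
    area = abs(twice_area) / 2.0                   # d) area

    # f) compactness: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0

    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }


# Illustrative input only: a unit square standing in for a nucleus boundary.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = contour_features(unit_square)
print(features)
```

For the unit square the perimeter is 4 and the area is 1, so the compactness as defined above comes out to 4^2 / 1 - 1.0 = 15.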
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
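The aggregation and field layout described above can be sketched as follows. The snake_case column names, the helper names `summarize` and `field_names`, and the use of the sample standard deviation for the standard error are assumptions for illustration; the data set itself only specifies the field order.

```python
import math
import statistics

# The ten per-nucleus features, in the order listed in the description.
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATS = ["mean", "se", "worst"]


def summarize(values):
    """Aggregate one feature over all nuclei in an image.

    Assumption: standard error = sample standard deviation / sqrt(n).
    "worst" is the mean of the three largest values.
    """
    return {
        "mean": statistics.fmean(values),
        "se": statistics.stdev(values) / math.sqrt(len(values)),
        "worst": statistics.fmean(sorted(values)[-3:]),
    }


def field_names():
    """Hypothetical names for the 32 fields: id, diagnosis, then
    ten means, ten standard errors, ten worst values."""
    names = ["id", "diagnosis"]
    for stat in STATS:
        names.extend(f"{feature}_{stat}" for feature in FEATURES)
    return names


names = field_names()
# Fields are numbered from 1 in the description, so field k is names[k - 1].
print(names[3 - 1], names[13 - 1], names[23 - 1])

# Made-up radii for five nuclei, purely for illustration.
summary = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```

With this layout, fields 3, 13, and 23 map to the mean, standard error, and worst value of the radius, matching the example in the text.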
Data set information:
- Predicting field 2, diagnosis: B = benign, M = malignant
- Sets are linearly separable using all 30 input features
- Class distribution: 357 benign, 212 malignant
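As a quick check on the class distribution, the short sketch below derives the class shares and the accuracy of a trivial classifier that always predicts the majority class. The variable names are illustrative; only the counts come from the description above.

```python
# Class counts as stated in the data set description.
counts = {"B": 357, "M": 212}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the most frequent class scores
# exactly that class's share; a useful floor when judging accuracy.
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

print(total)
print({label: round(share, 3) for label, share in shares.items()})
print(majority_label, round(majority_baseline, 3))
```

The counts sum to the 569 instances listed in the header, and any reported accuracy is best read against the majority-class share rather than against 50%.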