Huang Dong’s Blog, email: huangdongxy@hotmail.com

February 8, 2009

Breast cancer cell classification using SVM (EE5904 project)

Filed under: Neural networks, pattern recognition, Tech — donghuang @ 6:00 pm

Title: Wisconsin Diagnostic Breast Cancer (WDBC)


Data Set Characteristics: Multivariate
Number of Instances: 569
Area: Life
Attribute Characteristics: Real
Number of Attributes: 32
Date Donated: 1995-11-01
Associated Tasks: Classification
Missing Values? No
Number of Web Hits: 30874
Highest Percentage Achieved: 98%

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) real-valued input features
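
As a rough illustration of the 32-field layout above, here is a minimal Python sketch that splits one record into ID, diagnosis, and the 30 input features. It assumes each record is a single comma-separated line, which the description above does not actually state. The names parse_wdbc_line and WdbcRecord are made up for this example, and the sample line contains synthetic values, not real data.

```python
from typing import List, NamedTuple


class WdbcRecord(NamedTuple):
    """One record: field 1 (ID), field 2 (diagnosis), fields 3-32 (features)."""
    sample_id: str
    malignant: bool
    features: List[float]


def parse_wdbc_line(line: str) -> WdbcRecord:
    """Parse one record, assuming comma-separated fields (an assumption)."""
    fields = line.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 fields, got {len(fields)}")
    sample_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unexpected diagnosis code: {diagnosis!r}")
    # Fields 3-32 are the real-valued input features.
    features = [float(value) for value in fields[2:]]
    return WdbcRecord(sample_id, diagnosis == "M", features)


# Synthetic record for illustration only (not taken from the data set).
example_line = "1001,M," + ",".join(str(0.5 + i) for i in range(30))
record = parse_wdbc_line(example_line)
print(record.sample_id, record.malignant, len(record.features))
```

Keeping the ID out of the feature vector matters here: it is an identifier, not a measurement, so it should not be fed to the classifier.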

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.  They describe characteristics of the cell nuclei present in the image.

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
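
To make a few of these definitions concrete, here is a small Python sketch that computes radius, perimeter, area, and compactness for a nucleus outline given as a closed polygon. This is not the original feature-extraction code. Representing the contour as a list of (x, y) vertices, using the vertex average as the center, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perimeter(contour: List[Point]) -> float:
    """Sum of edge lengths around the closed polygon."""
    total = 0.0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def area(contour: List[Point]) -> float:
    """Polygon area via the shoelace formula."""
    twice_area = 0.0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def mean_radius(contour: List[Point]) -> float:
    """Mean distance from the center (here: vertex average) to the contour points."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    return sum(math.hypot(x - cx, y - cy) for x, y in contour) / len(contour)


def compactness(contour: List[Point]) -> float:
    """perimeter^2 / area - 1.0, as defined in item f)."""
    return perimeter(contour) ** 2 / area(contour) - 1.0


# A 2x2 square: perimeter 8, area 4, so compactness = 64 / 4 - 1 = 15.
square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
print(perimeter(square), area(square), compactness(square))
```

For a perfect circle, perimeter^2 / area equals 4*pi, so compactness is about 11.57, the smallest value any shape can reach. Less regular outlines score higher.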

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.  For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
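
The field numbering implied by that example (3 = Mean Radius, 13 = Radius SE, 23 = Worst Radius) can be sketched as a small lookup. The sketch assumes that each block of ten fields lists the features in the a) to j) order given above. The function name and the naming scheme such as radius_mean are illustrative choices, not part of the data set.

```python
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]  # fields 3-12, 13-22, 23-32


def field_name(field: int) -> str:
    """Map a 1-based field number (1-32) to a descriptive name."""
    if field == 1:
        return "id"
    if field == 2:
        return "diagnosis"
    if not 3 <= field <= 32:
        raise ValueError(f"field must be in 1..32, got {field}")
    # Each statistic occupies a block of ten consecutive fields.
    stat_index, feature_index = divmod(field - 3, 10)
    return f"{BASE_FEATURES[feature_index]}_{STATISTICS[stat_index]}"


all_feature_names = [field_name(f) for f in range(3, 33)]
print(field_name(3), field_name(13), field_name(23))
```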

Data set information:

  • predicting field 2, diagnosis: B = benign, M = malignant
  • sets are linearly separable using all 30 input features
  • Class distribution: 357 benign, 212 malignant
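
Since the classes are reported to be linearly separable, a linear SVM is a natural first model. The following is a minimal sketch of a soft-margin linear SVM trained by subgradient descent on the hinge loss, written in plain Python. It is not the code used in the project. The toy 2-D points stand in for the 30-dimensional WDBC feature vectors, the labels +1 / -1 stand in for malignant / benign, and the learning rate, regularization strength, and epoch count are arbitrary assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.1,
    reg: float = 0.01,
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Minimize reg/2 * |w|^2 + hinge loss with per-sample subgradient steps."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            margin = y * score
            # The L2 regularization term always shrinks the weights slightly.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1.0:
                # Sample is inside the margin or misclassified: push toward it.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy, linearly separable data (synthetic; +1 and -1 are placeholder classes).
toy_samples = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
toy_labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(toy_samples, toy_labels)
predictions = [predict(weights, bias, x) for x in toy_samples]
print(predictions == toy_labels)
```

On the real data it would be sensible to standardize the 30 features first, since fields such as area and smoothness live on very different scales. For reference, a classifier that always answers "benign" would already be right on 357 of 569 cases, about 62.7%, so any reported accuracy should be read against that baseline.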
