· GUI: The user selects a dataset and a classification technique, then runs the system. The user does not need to type the filename or the technique name; both are selected with the mouse.

· Python packages: NumPy, Pandas, Matplotlib, Scikit-learn, and Tkinter.

· Data sets: 3 sets in Scikit-learn (iris, breast-cancer, and wine). These are multi-class datasets (number of classes M > 1) and multi-dimensional datasets (dimensionality D > 2). For example, the iris dataset has 3 classes (M = 3), 50 samples per class, and each sample is 4-dimensional (D = 4). Each dataset is partitioned into two sets: a training set (80%) and a testing set (20%). The two sets are disjoint, that is, they share no samples.
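The 80/20 partition described above could be sketched as follows. This is only an illustration: the `LOADERS` dictionary and the `load_and_split` helper are hypothetical names, and the use of stratification and a fixed random seed is an assumption that the text does not require.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import train_test_split

# Hypothetical mapping from the dataset names shown in the GUI to the loaders.
LOADERS = {
    "iris": load_iris,
    "breast-cancer": load_breast_cancer,
    "wine": load_wine,
}


def load_and_split(name, test_size=0.2, seed=0):
    """Return X_train, X_test, y_train, y_test for the named dataset."""
    X, y = LOADERS[name](return_X_y=True)
    # stratify=y keeps the class proportions similar in both sets. This is an
    # assumption; the text only asks for a disjoint 80%/20% partition.
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )


X_train, X_test, y_train, y_test = load_and_split("iris")
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

Because `train_test_split` assigns each sample to exactly one of the two sets, the training and testing sets share no samples.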

· K-fold cross validation: the original data set is randomly partitioned into K equal-sized subsets, each containing the same number of samples from every class. The system runs cross validation K times: the first run uses the first subset for validation and the remaining (K-1) subsets for training, the second run uses the second subset for validation and the remaining (K-1) subsets for training, and so on. For example, if K = 5, the iris dataset is partitioned into 5 subsets, each with 3 classes and 10 samples per class. Cross validation is used to select the best parameters (e.g., number of centroids) for a classification technique; the selected parameters are then used to re-train on the whole training set. You can build your own cross validation function or use the existing functions in the Scikit-learn package.

· K = 3, 5, 7
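A minimal sketch of parameter selection with K-fold cross validation for K = 3, 5, and 7, using existing Scikit-learn functions. Several choices here are assumptions made for illustration: the tuned parameter is `n_neighbors` of a K-Nearest Neighbours classifier, the `candidates` list is hypothetical, cross validation is run on the 80% training set only, and the result for K = 5 is the one used for the final re-training.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

candidates = [1, 3, 5, 7, 9]  # hypothetical n_neighbors values to compare
results = {}

for k_folds in (3, 5, 7):
    # StratifiedKFold gives each fold roughly equal numbers of every class.
    cv = StratifiedKFold(n_splits=k_folds, shuffle=True, random_state=0)
    mean_scores = [
        cross_val_score(
            KNeighborsClassifier(n_neighbors=n), X_train, y_train, cv=cv
        ).mean()
        for n in candidates
    ]
    best = candidates[int(np.argmax(mean_scores))]
    results[k_folds] = (best, mean_scores)
    print(f"K = {k_folds}: best n_neighbors = {best}")

# Re-train on the whole training set with the parameter chosen for K = 5
# (picking K = 5 here is an arbitrary choice for the sketch).
best_n = results[5][0]
final_model = KNeighborsClassifier(n_neighbors=best_n).fit(X_train, y_train)
```

The `mean_scores` lists kept in `results` are the values one would plot against `candidates` to produce the parameter plots listed under the outputs.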

· Classification Techniques: K-Nearest Neighbours Classification, Gaussian Mixture Model Classification, and Support Vector Classification. These techniques are available in the Scikit-learn package.
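The sketch below shows one way the three techniques might be set up. `KNeighborsClassifier` and `SVC` are Scikit-learn classifiers. Scikit-learn's `GaussianMixture` is an unsupervised density model, so the `GMMClassifier` wrapper here is a hypothetical helper that fits one mixture per class and predicts the class with the highest log-likelihood plus log prior. The parameter values, the diagonal covariance, and the `reg_covar` setting are assumptions, not requirements from the text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


class GMMClassifier:
    """Hypothetical helper: one GaussianMixture per class."""

    def __init__(self, n_components=1, seed=0):
        self.n_components = n_components
        self.seed = seed

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = []
        self.log_priors_ = []
        for c in self.classes_:
            Xc = X[y == c]
            gmm = GaussianMixture(
                n_components=self.n_components,
                covariance_type="diag",  # assumed, keeps the fit stable
                reg_covar=1e-3,
                random_state=self.seed,
            )
            self.models_.append(gmm.fit(Xc))
            self.log_priors_.append(np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        # Column j holds log p(x | class j) + log p(class j) for each sample.
        scores = np.column_stack(
            [m.score_samples(X) + lp
             for m, lp in zip(self.models_, self.log_priors_)]
        )
        return self.classes_[np.argmax(scores, axis=1)]


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Gaussian Mixture Model": GMMClassifier(n_components=1),
    "Support Vector": SVC(kernel="rbf", C=1.0),
}
predictions = {
    name: model.fit(X_train, y_train).predict(X_test)
    for name, model in models.items()
}
```

Keeping the techniques in a dictionary keyed by display name makes it straightforward to connect them to a selection widget in the GUI. The `n_components` argument plays the role of the number of centroids mentioned for cross validation.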

· Outputs: accuracy, confusion matrix, and plots for parameters when running cross validation

o Accuracy: accuracy (in %) = number of samples correctly classified * 100% / total number of samples (both numbers are for the testing set).
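As a small worked example of the accuracy formula, the sketch below uses made-up labels for 10 testing samples. The function name `accuracy_percent` and the label values are hypothetical.

```python
import numpy as np


def accuracy_percent(y_true, y_pred):
    """Correctly classified samples * 100 / total samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return 100.0 * np.sum(y_true == y_pred) / len(y_true)


# Made-up labels for 10 testing samples; 8 of them are predicted correctly.
y_true = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1, 0, 1]
print(accuracy_percent(y_true, y_pred))  # 80.0
```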

o Confusion matrix: Each row of the matrix shows the number of samples in a predicted class, while each column shows the number of samples in an actual class. For example, below are two examples of confusion matrices for 10-digit classification