Commit dd8fb74f authored by ashepley's avatar ashepley

Delete Multi-Layered_Perceptron.ipynb

parent 7cf661eb
%% Cell type:markdown id: tags:
## Classification MLP
%% Cell type:code id: tags:
``` python
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.neural_network import MLPClassifier


def check_NaN(dataframe):
    """Print the total number of NaN values and the count per column."""
    print("Total NaN:", dataframe.isnull().values.sum())
    print("NaN by column:\n", dataframe.isnull().sum())


def one_hot_encode(dataframe, col_name):
    """Return dataframe with col_name replaced by 0/1 indicator columns."""
    # dtype=int keeps the indicators as 0/1 (newer pandas defaults to booleans)
    return pd.get_dummies(dataframe, columns=[col_name], prefix=[col_name], dtype=int)
```
%% Cell type:markdown id: tags:
### Using a Multi-Layered Perceptron (MLP) to Classify Mushrooms as Edible or Poisonous
In this Notebook, we'll be using the mushroom classification dataset, which you can find here: https://www.kaggle.com/uciml/mushroom-classification. We'll train an MLP to determine whether a mushroom is edible (e) or poisonous (p), based on its physical characteristics.
%% Cell type:code id: tags:
``` python
#load the dataset
data = pd.read_csv("./datasets/mushrooms.csv")
```
%% Cell type:code id: tags:
``` python
#check out its features
data.head()
```
%% Output
class cap-shape cap-surface cap-color bruises odor gill-attachment \
0 p x s n t p f
1 e x s y t a f
2 e b s w t l f
3 p x y w t p f
4 e x s g f n f
gill-spacing gill-size gill-color ... stalk-surface-below-ring \
0 c n k ... s
1 c b k ... s
2 c b n ... s
3 c n n ... s
4 w b k ... s
stalk-color-above-ring stalk-color-below-ring veil-type veil-color \
0 w w p w
1 w w p w
2 w w p w
3 w w p w
4 w w p w
ring-number ring-type spore-print-color population habitat
0 o p k s u
1 o p n n g
2 o p n n m
3 o p k s u
4 o e n a g
[5 rows x 23 columns]
%% Cell type:markdown id: tags:
Let's choose gill-size (narrow=n or broad=b) and spore-print-color as our features. The spore-print-color codes are: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
%% Cell type:code id: tags:
``` python
chosen_features = data.filter(['class','gill-size','spore-print-color'])
chosen_features.head()
```
%% Output
class gill-size spore-print-color
0 p n k
1 e b n
2 e b n
3 p n k
4 e b n
%% Cell type:code id: tags:
``` python
#always remember to check for NaN values
check_NaN(chosen_features)
```
%% Output
Total NaN: 0
NaN by column:
class 0
gill-size 0
spore-print-color 0
dtype: int64
%% Cell type:markdown id: tags:
One hot encode the chosen features
%% Cell type:code id: tags:
``` python
subset = one_hot_encode(chosen_features, 'class')
subset = one_hot_encode(subset, 'gill-size')
subset = one_hot_encode(subset, 'spore-print-color')
subset.head()
```
%% Output
class_e class_p gill-size_b gill-size_n spore-print-color_b \
0 0 1 0 1 0
1 1 0 1 0 0
2 1 0 1 0 0
3 0 1 0 1 0
4 1 0 1 0 0
spore-print-color_h spore-print-color_k spore-print-color_n \
0 0 1 0
1 0 0 1
2 0 0 1
3 0 1 0
4 0 0 1
spore-print-color_o spore-print-color_r spore-print-color_u \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
spore-print-color_w spore-print-color_y
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
%% Cell type:markdown id: tags:
Now, let's keep only 'class_e' as the target. This means if the MLP returns a value of 1, then the mushroom is edible. If it returns 0, then the mushroom is poisonous. Let's also keep only 'gill-size_b', because the only other value it can take is 'gill-size_n': the gill size is broad when 'gill-size_b' = 1, and narrow when it = 0. We'll pick all the spore print colours to train on.
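As a small aside on why one indicator column is enough for a two-valued feature, here is a minimal sketch on a made-up toy frame (not the mushroom data). It assumes a pandas version whose `get_dummies` accepts `dtype` and `drop_first`. Note that `drop_first=True` drops the alphabetically first level, so it keeps `gill-size_n` rather than the `gill-size_b` column chosen in this notebook.

```python
import pandas as pd

# Toy frame reusing the notebook's column name; the values are made up.
toy = pd.DataFrame({"gill-size": ["n", "b", "b", "n"]})

# Full one-hot encoding: two columns that always sum to 1 in every row.
full = pd.get_dummies(toy, columns=["gill-size"], dtype=int)

# drop_first=True removes the first level ('b'), leaving a single indicator.
reduced = pd.get_dummies(toy, columns=["gill-size"], drop_first=True, dtype=int)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

Because the two columns always sum to 1, either one carries all the information, so keeping both adds nothing for the model to learn from.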
%% Cell type:code id: tags:
``` python
final = subset.filter(['class_e','gill-size_b','spore-print-color_b','spore-print-color_h','spore-print-color_k','spore-print-color_n','spore-print-color_o','spore-print-color_r','spore-print-color_u','spore-print-color_w','spore-print-color_y'])
final.head()
```
%% Output
class_e gill-size_b spore-print-color_b spore-print-color_h \
0 0 0 0 0
1 1 1 0 0
2 1 1 0 0
3 0 0 0 0
4 1 1 0 0
spore-print-color_k spore-print-color_n spore-print-color_o \
0 1 0 0
1 0 1 0
2 0 1 0
3 1 0 0
4 0 1 0
spore-print-color_r spore-print-color_u spore-print-color_w \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
spore-print-color_y
0 0
1 0
2 0
3 0
4 0
%% Cell type:code id: tags:
``` python
#Create the train/test splits as we did before
x_train, x_test, y_train, y_test = train_test_split(final.drop(['class_e'], axis=1),final['class_e'],test_size=0.2,random_state=1)
print("x train/test ",x_train.shape, x_test.shape)
print("y train/test ",y_train.shape, y_test.shape)
```
%% Output
x train/test (6499, 10) (1625, 10)
y train/test (6499,) (1625,)
%% Cell type:code id: tags:
``` python
#Convert them from pandas to numpy arrays
x = x_train.values
y = y_train.values
x_t = x_test.values
y_t = y_test.values
```
%% Cell type:markdown id: tags:
#### MLP Training and Evaluation
Let's create an MLP. Currently the only loss function it supports is the Cross-Entropy loss function, which is used by default. By default, it uses the ReLU activation function. It also has a default of 1 hidden layer, containing 100 neurons.
Here are some parameter options you can explore:
* activation: {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default=’relu’
* hidden_layer_sizes: tuple, length = n_layers - 2, default=(100,), where the ith element represents the number of neurons in the ith hidden layer.
* Find out more here: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
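To make the `hidden_layer_sizes` convention concrete, here is a minimal sketch on a tiny synthetic problem (not the mushroom data). The data, the layer sizes `(8, 4)` and `max_iter=2000` are illustrative assumptions; `coefs_` and `n_layers_` are fitted attributes of `MLPClassifier`.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Tiny synthetic binary problem: the label is simply the first feature.
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(200, 4))
y_demo = X_demo[:, 0]

# Two hidden layers (8 then 4 neurons) with tanh, instead of the default
# single hidden layer of 100 ReLU neurons.
clf = MLPClassifier(hidden_layer_sizes=(8, 4), activation="tanh",
                    max_iter=2000, random_state=0)
clf.fit(X_demo, y_demo)

# One weight matrix per layer transition: input->8, 8->4, 4->output.
print([w.shape for w in clf.coefs_])
# n_layers_ counts the input and output layers as well as the hidden ones.
print(clf.n_layers_)
```

With two hidden layers, `n_layers_` is 4, which matches the "length = n_layers - 2" rule for `hidden_layer_sizes`.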
%% Cell type:code id: tags:
``` python
MLP = MLPClassifier()  # e.g. MLPClassifier(activation='logistic', hidden_layer_sizes=(1,))
```
%% Cell type:code id: tags:
``` python
#train the mlp
MLP.fit(x, y)
```
%% Output
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(100,), learning_rate='constant',
learning_rate_init=0.001, max_iter=200, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=None, shuffle=True, solver='adam', tol=0.0001,
validation_fraction=0.1, verbose=False, warm_start=False)
%% Cell type:code id: tags:
``` python
predictions = MLP.predict(x_t)
#Calculate the mean squared error and accuracy
print("Mean squared error: ",np.mean((predictions - y_t) ** 2))
print("Accuracy:",str(round(metrics.accuracy_score(y_t, predictions)*100))+"%")
```
%% Output
Mean squared error: 0.023384615384615386
Accuracy: 98.0%
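%% Cell type:markdown id: tags:
For 0/1 labels, each squared error is either 0 or 1, so the mean squared error is just the fraction of wrong predictions, i.e. 1 - accuracy. That is why an MSE of about 0.0234 corresponds to about 97.7% accuracy, which rounds to 98%. The sketch below checks this on made-up labels (not the notebook's predictions) and also prints a confusion matrix, which shows which kind of mistake was made.

```python
import numpy as np
from sklearn import metrics

# Made-up labels for illustration: 2 of the 8 predictions are wrong.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

mse = np.mean((y_pred - y_true) ** 2)
acc = metrics.accuracy_score(y_true, y_pred)
print("MSE:", mse, "Accuracy:", acc)

# Rows are true classes (0, 1), columns are predicted classes (0, 1).
cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)
```

In the mushroom setting, the off-diagonal entry in row 0 (truly poisonous, predicted edible) is the costly mistake, so it is worth looking at separately from overall accuracy.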
%% Cell type:markdown id: tags:
Test a mushroom with a broad gill-size and black spore print color, where index = 0 is gill-size and index = 3 is black
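Hard-coded index positions are easy to get wrong, so one option is to build the vector from column names. The helper `encode_mushroom` and the shortened `feature_names` list below are hypothetical, for illustration only; in this notebook you would pass `list(x_train.columns)` instead.

```python
def encode_mushroom(feature_names, active):
    """Return a 0/1 list with a 1 at each feature named in `active` (hypothetical helper)."""
    unknown = set(active) - set(feature_names)
    if unknown:
        raise ValueError(f"Unknown features: {sorted(unknown)}")
    return [1 if name in active else 0 for name in feature_names]


# Shortened, assumed column order for the sketch; use list(x_train.columns) in practice.
feature_names = ["gill-size_b", "spore-print-color_b", "spore-print-color_h",
                 "spore-print-color_k", "spore-print-color_n"]

# Broad gill size and black (k) spore print.
vec = encode_mushroom(feature_names, {"gill-size_b", "spore-print-color_k"})
print(vec)
```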
%% Cell type:code id: tags:
``` python
# index 0 = gill-size_b (broad), index 3 = spore-print-color_k (black)
test_mushroom = [1,0,0,1,0,0,0,0,0,0]
prediction = MLP.predict([test_mushroom])
```
%% Cell type:code id: tags:
``` python
# predict() returns an array with one label per input row
if prediction[0] == 1:
    print('Edible')
else:
    print('Poisonous')
```
%% Cell type:markdown id: tags:
### Exercise:
Have a go at changing some of the learning parameters, e.g. add more layers, or more neurons per layer, or change the activation function, and see if you can improve performance beyond 98%!
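If you'd rather not try settings by hand, one possible approach is a small cross-validated grid search. This is only a sketch on synthetic data: the grid values, `cv=3` and `max_iter=500` are arbitrary assumptions, and in the notebook you would fit on `x` and `y` instead of `X_demo` and `y_demo`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: the label is simply the first feature.
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(120, 5))
y_demo = X_demo[:, 0]

# Candidate settings to compare (2 x 2 = 4 combinations).
param_grid = {
    "hidden_layer_sizes": [(5,), (10, 5)],
    "activation": ["relu", "tanh"],
}

search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0), param_grid, cv=3)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

The score reported here is a cross-validated accuracy on the training data, so still check the chosen model against the held-out test split before comparing it with the 98% above.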