Commit ba84c9c3 authored by ashepley's avatar ashepley
Upload New File

parent 89ee0ce9
%% Cell type:markdown id: tags:
Artificial Intelligence and the Perceptron
* A Perceptron is a simple model that mimics a biological neuron
* It is a single-layer neural network
* It has 4 main components:
    * input layer
    * weights and bias
    * net sum
    * activation function

Multiply each input x by its weight w. Let's call each product k.
Add all the products, together with the bias, and call the result the weighted sum.
Pass that weighted sum through the activation function.

A Perceptron is usually used to classify data into two classes. Because the boundary it learns is linear, it is also known as a linear binary classifier.

The activation function maps the weighted sum onto the required output values, such as 0 and 1, or -1 and 1.
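
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the scikit-learn implementation; the function name `perceptron_output`, the example weights and bias, and the choice of a unit step activation are assumptions made for the example.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    k = x * w                      # step 1: multiply each input by its weight
    weighted_sum = k.sum() + b     # step 2: add the products and the bias
    return 1 if weighted_sum > 0 else 0   # step 3: unit step activation

# Hypothetical weights and bias that make the perceptron act like a logical AND
w = np.array([1.0, 1.0])
b = -1.5
for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, perceptron_output(np.array(inputs), w, b))
```

Only the input `[1, 1]` pushes the weighted sum above zero, so it is the only one classified as 1.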
%% Cell type:code id: tags:
``` python
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn import metrics


def check_NaN(dataframe):
    """Print the total number of NaN values and the count per column."""
    print("Total NaN:", dataframe.isnull().values.sum())
    print("NaN by column:\n", dataframe.isnull().sum())


def one_hot_encode(dataframe, col_name):
    """Return dataframe with col_name replaced by one-hot encoded columns."""
    return pd.get_dummies(dataframe, columns=[col_name], prefix=[col_name])
```
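%% Cell type:markdown id: tags:
To see what one-hot encoding does before it is applied to the mushroom data, here is a small sketch on a made-up DataFrame. The `colour` column and its values are hypothetical; the call to `pd.get_dummies` uses the same arguments as `one_hot_encode`. The `astype(int)` is only there so the result prints as 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical toy data: one categorical column with three distinct codes
toy = pd.DataFrame({"colour": ["k", "n", "k", "w"]})

# One new 0/1 column is created per distinct value
encoded = pd.get_dummies(toy, columns=["colour"], prefix=["colour"]).astype(int)
print(encoded.columns.tolist())
print(encoded)
```

Each row has exactly one 1, in the column that matches its original value.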
%% Cell type:markdown id: tags:
### Using a Perceptron to Classify Mushrooms as Edible or Poisonous
In this Notebook, we'll use the mushroom classification dataset, available at https://www.kaggle.com/uciml/mushroom-classification, to train a Perceptron to determine whether a mushroom is edible (e) or poisonous (p), based on its physical characteristics.
%% Cell type:code id: tags:
``` python
#load the dataset
data = pd.read_csv("./dataset/mushrooms.csv")
```
%% Cell type:code id: tags:
``` python
#check out its features
data.head()
```
%% Output
  class cap-shape cap-surface cap-color bruises odor gill-attachment  \
0     p         x           s         n       t    p               f
1     e         x           s         y       t    a               f
2     e         b           s         w       t    l               f
3     p         x           y         w       t    p               f
4     e         x           s         g       f    n               f

  gill-spacing gill-size gill-color ... stalk-surface-below-ring  \
0            c         n          k ...                        s
1            c         b          k ...                        s
2            c         b          n ...                        s
3            c         n          n ...                        s
4            w         b          k ...                        s

  stalk-color-above-ring stalk-color-below-ring veil-type veil-color  \
0                      w                      w         p          w
1                      w                      w         p          w
2                      w                      w         p          w
3                      w                      w         p          w
4                      w                      w         p          w

  ring-number ring-type spore-print-color population habitat
0           o         p                 k          s       u
1           o         p                 n          n       g
2           o         p                 n          n       m
3           o         p                 k          s       u
4           o         e                 n          a       g

[5 rows x 23 columns]
%% Cell type:markdown id: tags:
Let's choose gill-size (narrow=n or broad=b) and spore-print-color as our features. The spore-print-color codes are: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y.
%% Cell type:code id: tags:
``` python
chosen_features = data.filter(['class','gill-size','spore-print-color'])
chosen_features.head()
```
%% Output
  class gill-size spore-print-color
0     p         n                 k
1     e         b                 n
2     e         b                 n
3     p         n                 k
4     e         b                 n
%% Cell type:code id: tags:
``` python
#always remember to check for NaN values
check_NaN(chosen_features)
```
%% Output
Total NaN: 0
NaN by column:
class 0
gill-size 0
spore-print-color 0
dtype: int64
%% Cell type:markdown id: tags:
One-hot encode the chosen features.
%% Cell type:code id: tags:
``` python
subset = one_hot_encode(chosen_features, 'class')
subset = one_hot_encode(subset, 'gill-size')
subset = one_hot_encode(subset, 'spore-print-color')
subset.head()
```
%% Output
   class_e  class_p  gill-size_b  gill-size_n  spore-print-color_b  \
0        0        1            0            1                    0
1        1        0            1            0                    0
2        1        0            1            0                    0
3        0        1            0            1                    0
4        1        0            1            0                    0

   spore-print-color_h  spore-print-color_k  spore-print-color_n  \
0                    0                    1                    0
1                    0                    0                    1
2                    0                    0                    1
3                    0                    1                    0
4                    0                    0                    1

   spore-print-color_o  spore-print-color_r  spore-print-color_u  \
0                    0                    0                    0
1                    0                    0                    0
2                    0                    0                    0
3                    0                    0                    0
4                    0                    0                    0

   spore-print-color_w  spore-print-color_y
0                    0                    0
1                    0                    0
2                    0                    0
3                    0                    0
4                    0                    0
%% Cell type:markdown id: tags:
Now, let's keep just the 'class_e' column as the label. This means that if the Perceptron returns 1, the mushroom is edible; if it returns 0, the mushroom is poisonous. Let's also keep only 'gill-size_b', because the only other value is 'gill-size_n': the gill size is broad when 'gill-size_b' = 1 and narrow when it = 0. We'll also train on the spore print colour columns selected in the next cell.
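
The reason one column per binary feature is enough can be checked on a small made-up example (the toy frame below is hypothetical). The two dummy columns always sum to 1, so either one determines the other. `drop_first=True` is a `pd.get_dummies` option that drops the first category automatically; here it would keep `class_p`, which is why selecting `class_e` by name is the more direct way to control which label means 1.

```python
import pandas as pd

# Hypothetical toy labels
toy = pd.DataFrame({"class": ["p", "e", "e", "p"]})

both = pd.get_dummies(toy, columns=["class"]).astype(int)
# The two columns are complementary: exactly one of them is 1 in every row
assert ((both["class_e"] + both["class_p"]) == 1).all()

# Keep the column whose 1 should mean "edible"
label = both["class_e"]
print(label.tolist())

# drop_first removes the first category ('e'), leaving class_p instead
dropped_first = pd.get_dummies(toy, columns=["class"], drop_first=True).astype(int)
print(dropped_first.columns.tolist())
```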
%% Cell type:code id: tags:
``` python
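# Note: 'spore-print-color_h' is listed twice and 'spore-print-color_b' (buff) is not listed,
# so the chocolate column appears twice and buff is not used as a feature.
final = subset.filter([
    'class_e', 'gill-size_b',
    'spore-print-color_h', 'spore-print-color_h', 'spore-print-color_k',
    'spore-print-color_n', 'spore-print-color_o', 'spore-print-color_r',
    'spore-print-color_u', 'spore-print-color_w', 'spore-print-color_y',
])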
final.head()
```
%% Output
   class_e  gill-size_b  spore-print-color_h  spore-print-color_h  \
0        0            0                    0                    0
1        1            1                    0                    0
2        1            1                    0                    0
3        0            0                    0                    0
4        1            1                    0                    0

   spore-print-color_k  spore-print-color_n  spore-print-color_o  \
0                    1                    0                    0
1                    0                    1                    0
2                    0                    1                    0
3                    1                    0                    0
4                    0                    1                    0

   spore-print-color_r  spore-print-color_u  spore-print-color_w  \
0                    0                    0                    0
1                    0                    0                    0
2                    0                    0                    0
3                    0                    0                    0
4                    0                    0                    0

   spore-print-color_y
0                    0
1                    0
2                    0
3                    0
4                    0
%% Cell type:code id: tags:
``` python
#Create the train/test splits as we did before
x_train, x_test, y_train, y_test = train_test_split(final.drop(['class_e'], axis=1),final['class_e'],test_size=0.2,random_state=42)
print("x train/test ",x_train.shape, x_test.shape)
print("y train/test ",y_train.shape, y_test.shape)
```
%% Output
x train/test (6499, 10) (1625, 10)
y train/test (6499,) (1625,)
%% Cell type:code id: tags:
``` python
#Convert them from pandas to numpy arrays
x = x_train.values
y = y_train.values
x_t = x_test.values
y_t = y_test.values
```
%% Cell type:code id: tags:
``` python
x_t[0]
```
%% Output
array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=uint8)
%% Cell type:code id: tags:
``` python
#Create a perceptron model and train it
perceptron = Perceptron()
perceptron.fit(x, y)
```
%% Output
C:\Users\Andreas Shepley\Anaconda3\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:166: FutureWarning: max_iter and tol parameters have been added in Perceptron in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
FutureWarning)
Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
fit_intercept=True, max_iter=None, n_iter=None, n_iter_no_change=5,
n_jobs=None, penalty=None, random_state=0, shuffle=True, tol=None,
validation_fraction=0.1, verbose=0, warm_start=False)
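%% Cell type:markdown id: tags:
`Perceptron.fit` hides the training loop. As a rough sketch of the classic perceptron learning rule (not scikit-learn's exact implementation, which also handles shuffling and stopping criteria), each misclassified sample nudges the weights and bias toward the correct answer. The function name `train_perceptron`, the learning rate, the epoch count, and the toy OR data are assumptions for illustration.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=10):
    """Train weights and bias with the perceptron update rule on 0/1 labels."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            error = target - pred      # 0 if correct, +1 or -1 if wrong
            w += lr * error * xi       # move the weights toward the target
            b += lr * error
    return w, b

# Hypothetical toy problem: logical OR, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds)
```

Because the toy data is linearly separable, the updates stop changing the weights after a few passes and all four points are classified correctly.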
%% Cell type:code id: tags:
``` python
#run the model on the test set
predictions = perceptron.predict(x_t)
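#Calculate the mean squared error and accuracy
#(cast to int so the subtraction cannot wrap around for unsigned uint8 arrays)
mse = np.mean((predictions.astype(int) - y_t.astype(int)) ** 2)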
print("Accuracy:",str(round(metrics.accuracy_score(y_t, predictions)*100))+"%")
```
%% Output
Accuracy: 96.0%
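%% Cell type:markdown id: tags:
For 0/1 labels, each squared error is 1 for a wrong prediction and 0 for a correct one, so the mean squared error is simply the error rate, that is, 1 minus the accuracy. A short check on made-up labels follows; the arrays are hypothetical and are not the notebook's predictions. Accuracy is computed here with plain NumPy as the fraction of matching labels.

```python
import numpy as np

# Hypothetical true labels and predictions (two of ten are wrong)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

mse = np.mean((y_pred - y_true) ** 2)     # fraction of wrong predictions
accuracy = np.mean(y_pred == y_true)      # fraction of correct predictions
print("MSE:", mse)
print("Accuracy:", accuracy)
```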
%% Cell type:markdown id: tags:
Test a mushroom with a broad gill-size and a white spore print color. In the feature vector, index 0 is gill-size_b and index 8 is spore-print-color_w (black, spore-print-color_k, would be index 3).
%% Cell type:code id: tags:
``` python
test_mushroom = [1,0,0,0,0,0,0,0,1,0]
prediction = perceptron.predict([test_mushroom])
```
%% Cell type:code id: tags:
``` python
if prediction[0] == 1:
    print('Edible')
else:
    print('Poisonous')
```
%% Output
Poisonous
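%% Cell type:markdown id: tags:
Writing the feature vector by hand makes it easy to set the wrong index. A small sketch of a helper that builds the vector from column names instead is given below. `make_mushroom` is a hypothetical helper, and the column list is copied from the feature columns shown in the `final.head()` output (including the repeated `spore-print-color_h`).

```python
# Feature columns in training order, copied from the final.head() output
feature_columns = [
    'gill-size_b', 'spore-print-color_h', 'spore-print-color_h',
    'spore-print-color_k', 'spore-print-color_n', 'spore-print-color_o',
    'spore-print-color_r', 'spore-print-color_u', 'spore-print-color_w',
    'spore-print-color_y',
]

def make_mushroom(broad_gill, spore_colour):
    """Return a 0/1 feature vector for a gill size and a spore print colour code."""
    active = {'spore-print-color_' + spore_colour}
    if broad_gill:
        active.add('gill-size_b')
    return [1 if name in active else 0 for name in feature_columns]

print(make_mushroom(True, 'w'))   # broad gill, white spore print
print(make_mushroom(True, 'k'))   # broad gill, black spore print
```

In the notebook itself, `x_train.columns.tolist()` could supply the column list instead of a hand-copied one.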