"### Using a Perceptron to Classify Mushrooms as Edible or Poisonous\n",
"In this notebook, we'll use the mushroom classification dataset (available at https://www.kaggle.com/uciml/mushroom-classification) to train a Perceptron to determine whether a mushroom is edible (e) or poisonous (p) based on its physical characteristics."
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Load the dataset\n",
"data = pd.read_csv(\"./dataset/mushrooms.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>cap-shape</th>\n",
" <th>cap-surface</th>\n",
" <th>cap-color</th>\n",
" <th>bruises</th>\n",
" <th>odor</th>\n",
" <th>gill-attachment</th>\n",
" <th>gill-spacing</th>\n",
" <th>gill-size</th>\n",
" <th>gill-color</th>\n",
" <th>...</th>\n",
" <th>stalk-surface-below-ring</th>\n",
" <th>stalk-color-above-ring</th>\n",
" <th>stalk-color-below-ring</th>\n",
" <th>veil-type</th>\n",
" <th>veil-color</th>\n",
" <th>ring-number</th>\n",
" <th>ring-type</th>\n",
" <th>spore-print-color</th>\n",
" <th>population</th>\n",
" <th>habitat</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>p</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>n</td>\n",
" <td>t</td>\n",
" <td>p</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>n</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>k</td>\n",
" <td>s</td>\n",
" <td>u</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>e</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>y</td>\n",
" <td>t</td>\n",
" <td>a</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>b</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>g</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>e</td>\n",
" <td>b</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>t</td>\n",
" <td>l</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>b</td>\n",
" <td>n</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>m</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>p</td>\n",
" <td>x</td>\n",
" <td>y</td>\n",
" <td>w</td>\n",
" <td>t</td>\n",
" <td>p</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>k</td>\n",
" <td>s</td>\n",
" <td>u</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>e</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>g</td>\n",
" <td>f</td>\n",
" <td>n</td>\n",
" <td>f</td>\n",
" <td>w</td>\n",
" <td>b</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>e</td>\n",
" <td>n</td>\n",
" <td>a</td>\n",
" <td>g</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" class cap-shape cap-surface cap-color bruises odor gill-attachment \\\n",
"Now, let's keep just the 'class_e' column as our target: if the Perceptron returns 1, the mushroom is edible, and if it returns 0, the mushroom is poisonous. Let's also pick 'gill-size_b'. Gill size has only one other possible value, 'gill-size_n', so this single column captures the whole feature: 1 means the gill size is broad and 0 means it is narrow. Finally, we'll train on all of the spore print colour columns."
"final = subset.filter(['class_e','gill-size_b','spore-print-color_b','spore-print-color_h','spore-print-color_k','spore-print-color_n','spore-print-color_o','spore-print-color_r','spore-print-color_u','spore-print-color_w','spore-print-color_y'])\n",
"final.head()"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x train/test (6499, 10) (1625, 10)\n",
"y train/test (6499,) (1625,)\n"
]
}
],
"source": [
"#Create the train/test splits as we did before\n",
### Using a Perceptron to Classify Mushrooms as Edible or Poisonous
In this notebook, we'll use the mushroom classification dataset (available at https://www.kaggle.com/uciml/mushroom-classification) to train a Perceptron to determine whether a mushroom is edible (e) or poisonous (p) based on its physical characteristics.
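To make the idea concrete, here is a minimal from-scratch sketch of the classic perceptron learning rule. It is not the code this notebook runs; the notebook uses scikit-learn's `Perceptron`. The names `train_perceptron` and `predict` and the toy AND-style data are hypothetical illustrations. The model predicts 1 when `w·x + b > 0`, and every misclassified example nudges the weights and bias toward the correct side.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron rule with 0/1 labels: predict step(w.x + b), update only on mistakes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)  # 0 when correct, +lr or -lr when wrong
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b

def predict(X, w, b):
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# Hypothetical toy data: label 1 only when both binary features are 1 (logical AND)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_toy = np.array([0, 0, 0, 1])
w, b = train_perceptron(X_toy, y_toy)
print(predict(X_toy, w, b))  # [0 0 0 1]
```

Because the toy data is linearly separable, the loop stops once an entire epoch passes without a mistake.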
%% Cell type:code id: tags:
``` python
import pandas as pd

# Load the dataset
data = pd.read_csv("./dataset/mushrooms.csv")
```
%% Cell type:code id: tags:
``` python
# Preview the first five rows to see the features
data.head()
```
%% Output
class cap-shape cap-surface cap-color bruises odor gill-attachment \
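The `data.head()` output shows that every column holds single-letter category codes. A Perceptron needs numeric inputs. Column names such as 'class_e' and 'gill-size_b' match the `<column>_<value>` naming of pandas' `get_dummies`, so the sketch below assumes the encoding step (not shown in this section) worked that way. It uses three columns from the first five rows printed above.

```python
import pandas as pd

# Class, gill size and spore print colour of the first five rows shown by data.head()
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e"],
    "gill-size": ["n", "b", "b", "n", "b"],
    "spore-print-color": ["k", "n", "n", "k", "n"],
})

# One 0/1 indicator column per (column, value) pair, named "<column>_<value>"
encoded = pd.get_dummies(toy, dtype=int)
print(list(encoded.columns))
# ['class_e', 'class_p', 'gill-size_b', 'gill-size_n', 'spore-print-color_k', 'spore-print-color_n']
```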
Now, let's keep just the 'class_e' column as our target: if the Perceptron returns 1, the mushroom is edible, and if it returns 0, the mushroom is poisonous. Let's also pick 'gill-size_b'. Gill size has only one other possible value, 'gill-size_n', so this single column captures the whole feature: 1 means the gill size is broad and 0 means it is narrow. Finally, we'll train on all of the spore print colour columns.
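Why is one dummy column enough for a two-valued feature? Its two indicator columns are always complements. The sketch below checks this on the class and gill-size values of the first five rows, then splits the kept columns into a target `y` and features `X`. It assumes one-hot encoding via `pd.get_dummies`, which may not be exactly how the notebook built `subset`. It also shows a pitfall: `DataFrame.filter` silently skips names that do not exist.

```python
import pandas as pd

# Class and gill size of the first five rows shown by data.head()
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e"],
    "gill-size": ["n", "b", "b", "n", "b"],
})
encoded = pd.get_dummies(toy, dtype=int)

# Two-valued features give complementary dummy columns, so one of them carries all the information
assert (encoded["gill-size_n"] == 1 - encoded["gill-size_b"]).all()
assert (encoded["class_p"] == 1 - encoded["class_e"]).all()

# Target: class_e (1 = edible, 0 = poisonous); features: the remaining kept columns
final = encoded.filter(["class_e", "gill-size_b"])
y = final["class_e"]
X = final.drop(columns="class_e")
print(X.columns.tolist(), y.tolist())  # ['gill-size_b'] [0, 1, 1, 0, 1]

# DataFrame.filter ignores labels that are not present, so a misspelled
# name such as 'gill_size_b' just disappears instead of raising an error
print(encoded.filter(["class_e", "gill_size_b"]).columns.tolist())  # ['class_e']
```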
C:\Users\Andreas Shepley\Anaconda3\lib\site-packages\sklearn\linear_model\stochastic_gradient.py:166: FutureWarning: max_iter and tol parameters have been added in Perceptron in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
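The FutureWarning above comes from scikit-learn 0.19/0.20 when `Perceptron` is created without `max_iter` and `tol`. Passing both explicitly avoids it, and the values below match the future defaults the warning announces. The sketch uses synthetic 0/1 data as a hypothetical stand-in for the 10 encoded feature columns. The split uses `test_size=0.2`, an assumption chosen because it reproduces the (6499, 10) / (1625, 10) shapes printed earlier for 8124 rows. The rule generating `y` is made up so that the data is linearly separable.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the 10 encoded 0/1 feature columns (8124 rows, like mushrooms.csv)
X = rng.integers(0, 2, size=(8124, 10))
# Hypothetical linearly separable target, so the perceptron has something learnable
y = (X[:, 0] + X[:, 1] > X[:, 2]).astype(int)

# test_size=0.2 (assumed) gives ceil(0.2 * 8124) = 1625 test rows and 6499 training rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("x train/test", X_train.shape, X_test.shape)  # x train/test (6499, 10) (1625, 10)
print("y train/test", y_train.shape, y_test.shape)  # y train/test (6499,) (1625,)

# Setting max_iter and tol explicitly avoids the FutureWarning in scikit-learn 0.19/0.20
clf = Perceptron(max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```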