"y = np.array([0,0,0,0,0,0,1,1,1,1,1,1]) # 0: blue class, 1: red class"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(xBlue, yBlue, 'ro', color='blue')\n",
"plt.plot(xRed, yRed, 'ro', color='red')\n",
"plt.plot(4.5,4.5,'ro',color='green')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"classifier = svm.SVC()\n",
"classifier.fit(X,y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coord = [4.5,4.5]\n",
"blue_red = classifier.predict([coord])\n",
"\n",
"if blue_red == 1:\n",
" print(coord,\" is red\")\n",
"else:\n",
" print(coord, \" is blue\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's apply this to 'real' data!\n",
"\n",
"We'll be using a Support Vector Machine to predict whether a country is developed or not based its World Health Organisation life expectancy and GDP. You can access the dataset in this lecture here: https://www.kaggle.com/augustus0498/life-expectancy-who"
"#### Choose your features: We'll be choosing Happiness Score','Trust (Government Corruption)','Economy (GDP per Capita)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#chosen_columns = ['Happiness Score','Economy (GDP per Capita)','Family']\n",
"chosen_columns = ['Happiness Score','Trust (Government Corruption)','Economy (GDP per Capita)']\n",
"#You can experiment with others, such as;'Measles','AdultMortality','infantdeaths','Alcohol','HepatitisB','Measles','Polio','Population','thinness5-9years','HIV/AIDS','BMI','Diphtheria','GDP']\n",
"A lower value of Gamma will loosely fit the training dataset, whereas a higher value of gamma will exactly fit the training dataset resulting in over-fitting.\n",
"\n",
"C parameter used is to maintain regularization. A smaller value of C creates a small-margin hyperplane and a larger value of C creates a larger-margin hyperplane.\n",
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0: blue class, 1: red class
```
%% Cell type:code id: tags:
``` python
plt.plot(xBlue, yBlue, 'o', color='blue')
plt.plot(xRed, yRed, 'o', color='red')
plt.plot(4.5, 4.5, 'o', color='green')
```
%% Cell type:code id: tags:
``` python
classifier = svm.SVC()
classifier.fit(X, y)
```
%% Cell type:code id: tags:
``` python
coord = [4.5, 4.5]
blue_red = classifier.predict([coord])
if blue_red[0] == 1:
    print(coord, "is red")
else:
    print(coord, "is blue")
```
%% Cell type:markdown id: tags:
### Let's apply this to 'real' data!
We'll be using a Support Vector Machine to predict whether a country is developed or not based on its World Health Organisation life expectancy and GDP. The dataset used in this lecture is available here: https://www.kaggle.com/augustus0498/life-expectancy-who
#### Choose your features: we'll be choosing 'Happiness Score', 'Trust (Government Corruption)' and 'Economy (GDP per Capita)'
%% Cell type:code id: tags:
``` python
#chosen_columns = ['Happiness Score','Economy (GDP per Capita)','Family']
chosen_columns = ['Happiness Score', 'Trust (Government Corruption)', 'Economy (GDP per Capita)']
# You can experiment with others, such as: 'Measles', 'AdultMortality', 'infantdeaths', 'Alcohol', 'HepatitisB', 'Polio', 'Population', 'thinness5-9years', 'HIV/AIDS', 'BMI', 'Diphtheria', 'GDP'
life_expectancy = dataset.filter(chosen_columns)
life_expectancy.head()
```
%% Cell type:markdown id: tags:
#### Check the feature columns for NaN values and correct any missing data
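One way to do this check is sketched below. The `features` DataFrame and its values are invented for illustration (they are not taken from the lecture's dataset), and filling gaps with the column mean is only an assumed strategy, not necessarily the one used in this lecture.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the chosen feature columns; the values are made up.
features = pd.DataFrame({
    "Happiness Score": [7.5, np.nan, 5.1, 4.2],
    "Trust (Government Corruption)": [0.40, 0.10, np.nan, 0.05],
    "Economy (GDP per Capita)": [1.4, 0.9, 0.7, np.nan],
})

# Count the missing values in each column.
missing_before = features.isna().sum()
print(missing_before)

# Assumed correction: replace each NaN with the mean of its own column.
features_filled = features.fillna(features.mean())

# Confirm that no missing values remain.
missing_after = features_filled.isna().sum()
print(missing_after)
```
%% Cell type:markdown id: tags:
Mean imputation is only one option; dropping incomplete rows with `dropna()` is another, at the cost of losing data.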
A lower value of gamma fits the training dataset loosely, whereas a higher value of gamma fits it more closely, which can result in over-fitting.
The C parameter controls regularization. A smaller value of C tolerates more misclassified training points and creates a larger-margin hyperplane, whereas a larger value of C penalizes misclassification more heavily and creates a smaller-margin hyperplane.
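The effect of both parameters can be seen on toy data. The sketch below is illustrative only: it uses synthetic scikit-learn datasets (`make_moons`, `make_blobs`) rather than the lecture's dataset, and the specific gamma and C values are arbitrary assumptions. For the linear kernel, the margin width is computed as 2 / ||w||.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs, make_moons
from sklearn.model_selection import train_test_split

# Synthetic, noisy two-class data (an assumption, not the lecture's dataset).
X_moons, y_moons = make_moons(n_samples=200, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.3, random_state=0
)

# Compare a low and a high gamma with an RBF kernel.
gamma_scores = {}
for gamma in (0.01, 100):
    clf = svm.SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_train, y_train)
    gamma_scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print("gamma =", gamma, "train/test accuracy:", gamma_scores[gamma])

# Overlapping blobs with a linear kernel, so the margin width is 2 / ||w||.
X_blobs, y_blobs = make_blobs(
    n_samples=100, centers=2, cluster_std=2.0, random_state=0
)

# Compare a small and a large C.
margins = {}
for C in (0.01, 10):
    clf = svm.SVC(kernel="linear", C=C).fit(X_blobs, y_blobs)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_)
    print("C =", C, "margin width:", margins[C])
```
%% Cell type:markdown id: tags:
Comparing training and test accuracy for each gamma is a quick way to spot over-fitting: a large gap between the two suggests the model has fit the training set too closely.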