is_assignments/a2/code/Second IS assignment.ipynb

2022-12-19 10:09:00 +01:00
{
"cells": [
{
"cell_type": "markdown",
"id": "c093ea0c",
"metadata": {},
"source": [
"# Seminar 2: Predicting Biodegradability of Chemicals"
]
},
{
"cell_type": "markdown",
"id": "7aa30d7d",
"metadata": {},
"source": [
"## 1. Introduction\n",
"Chemicals are all around us. Studying their properties by means of machine learning is an active\n",
"research field; matching molecular patterns with their behavior can be a decisive factor in the creation of\n",
"new materials, drugs, and more.\n",
"In this seminar assignment, your task is to explore the data and build machine-learning models that\n",
"predict the biodegradability of chemicals."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "aeab08c8",
"metadata": {},
"source": [
"## 2. Task\n",
"You will work with the data set compiled by Mansouri et al. ([data](https://www.openml.org/search?type=data&status=active&id=1494&sort=runs)). There are 41 features and one target feature (biodegradability).\n",
"The target variable is encoded as ready biodegradable (1) and not ready biodegradable (2). The data set\n",
"consists of 1055 instances. Features can be either symbolic or numeric.\n",
"IMPORTANT: Use the dataset provided on učilnica and NOT the one posted at the link above. It is\n",
"minimally modified and split into training and test sets.\n"
]
},
{
"cell_type": "markdown",
"id": "a4f197dd",
"metadata": {},
"source": [
"### 2.1 Exploration\n",
"Inspect the dataset. How balanced is the target variable? Are there any missing values present? If there\n",
"are, choose a strategy that takes this into account.\n",
"Most of your data is of the numeric type. Can you identify, by adopting exploratory analysis, whether\n",
"some features are directly related to the target? What about feature pairs? Produce at least three types of\n",
"visualizations of the feature space and be prepared to argue why these visualizations were useful for your\n",
"subsequent analysis."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5bcf6290",
"metadata": {},
"outputs": [],
"source": [
"# Needed imports\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import sklearn\n",
"import seaborn as sns\n",
"import scikitplot as skplt\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "18ff4f76",
"metadata": {},
"outputs": [],
"source": [
"df_train = pd.read_csv('train.csv')\n",
"df_test = pd.read_csv('test.csv')"
]
},
{
"cell_type": "markdown",
"id": "ea26bfdf",
"metadata": {},
"source": [
"#### Let's inspect the training and test data"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5933f4d7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>V10</th>\n",
" <th>...</th>\n",
" <th>V33</th>\n",
" <th>V34</th>\n",
" <th>V35</th>\n",
" <th>V36</th>\n",
" <th>V37</th>\n",
" <th>V38</th>\n",
" <th>V39</th>\n",
" <th>V40</th>\n",
" <th>V41</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3.919</td>\n",
" <td>2.6909</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>31.4</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.949</td>\n",
" <td>1.591</td>\n",
" <td>0</td>\n",
" <td>7.253</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.170</td>\n",
" <td>2.1144</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>30.8</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.315</td>\n",
" <td>1.967</td>\n",
" <td>0</td>\n",
" <td>7.257</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3.000</td>\n",
" <td>2.7098</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>20.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>3.046</td>\n",
" <td>5.000</td>\n",
" <td>0</td>\n",
" <td>6.690</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>4.214</td>\n",
" <td>2.6272</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>30.0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.998</td>\n",
" <td>1.722</td>\n",
" <td>0</td>\n",
" <td>6.770</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>3.942</td>\n",
" <td>2.7719</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>31.6</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.542</td>\n",
" <td>1.739</td>\n",
" <td>0</td>\n",
" <td>8.127</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 42 columns</p>\n",
"</div>"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V33 V34 V35 \\\n",
"1 3.919 2.6909 0 0 0 0 0 31.4 2 0 ... 0 0 0 \n",
"2 4.170 2.1144 0 0 0 0 0 30.8 1 1 ... 0 0 0 \n",
"4 3.000 2.7098 0 0 0 0 0 20.0 0 2 ... 0 0 1 \n",
"13 4.214 2.6272 0 0 0 0 0 30.0 3 0 ... 0 0 0 \n",
"16 3.942 2.7719 1 0 0 0 0 31.6 2 0 ... 0 0 0 \n",
"\n",
" V36 V37 V38 V39 V40 V41 Class \n",
"1 2.949 1.591 0 7.253 0 0 2 \n",
"2 3.315 1.967 0 7.257 0 0 2 \n",
"4 3.046 5.000 0 6.690 0 0 2 \n",
"13 2.998 1.722 0 6.770 0 0 2 \n",
"16 3.542 1.739 0 8.127 0 1 2 \n",
"\n",
"[5 rows x 42 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1743d191",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>V10</th>\n",
" <th>...</th>\n",
" <th>V33</th>\n",
" <th>V34</th>\n",
" <th>V35</th>\n",
" <th>V36</th>\n",
" <th>V37</th>\n",
" <th>V38</th>\n",
" <th>V39</th>\n",
" <th>V40</th>\n",
" <th>V41</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>821.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>...</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>821.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" <td>846.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>4.790476</td>\n",
" <td>3.054551</td>\n",
" <td>0.739953</td>\n",
" <td>0.030451</td>\n",
" <td>0.946809</td>\n",
" <td>0.277778</td>\n",
" <td>1.669031</td>\n",
" <td>37.422813</td>\n",
" <td>1.342790</td>\n",
" <td>1.784870</td>\n",
" <td>...</td>\n",
" <td>0.903073</td>\n",
" <td>1.241135</td>\n",
" <td>0.926714</td>\n",
" <td>3.922100</td>\n",
" <td>2.549406</td>\n",
" <td>0.671395</td>\n",
" <td>8.643191</td>\n",
" <td>0.059102</td>\n",
" <td>0.706856</td>\n",
" <td>1.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.531991</td>\n",
" <td>0.813983</td>\n",
" <td>1.504545</td>\n",
" <td>0.198281</td>\n",
" <td>2.318081</td>\n",
" <td>1.045544</td>\n",
" <td>2.220221</td>\n",
" <td>9.030008</td>\n",
" <td>2.018433</td>\n",
" <td>1.773856</td>\n",
" <td>...</td>\n",
" <td>1.526124</td>\n",
" <td>2.248684</td>\n",
" <td>1.239133</td>\n",
" <td>0.992636</td>\n",
" <td>0.625021</td>\n",
" <td>1.093633</td>\n",
" <td>1.223700</td>\n",
" <td>0.342364</td>\n",
" <td>2.145396</td>\n",
" <td>0.471683</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>2.000000</td>\n",
" <td>0.803900</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>9.100000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2.279000</td>\n",
" <td>1.467000</td>\n",
" <td>0.000000</td>\n",
" <td>4.948000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.499000</td>\n",
" <td>2.510175</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>30.800000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.497000</td>\n",
" <td>2.101000</td>\n",
" <td>0.000000</td>\n",
" <td>8.009500</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>4.840000</td>\n",
" <td>3.052400</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>37.850000</td>\n",
" <td>1.000000</td>\n",
" <td>1.500000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.732500</td>\n",
" <td>2.461000</td>\n",
" <td>0.000000</td>\n",
" <td>8.508000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>5.119000</td>\n",
" <td>3.415725</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>43.800000</td>\n",
" <td>2.000000</td>\n",
" <td>3.000000</td>\n",
" <td>...</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.980000</td>\n",
" <td>2.861000</td>\n",
" <td>1.000000</td>\n",
" <td>9.019750</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>6.496000</td>\n",
" <td>7.918400</td>\n",
" <td>12.000000</td>\n",
" <td>2.000000</td>\n",
" <td>36.000000</td>\n",
" <td>13.000000</td>\n",
" <td>18.000000</td>\n",
" <td>60.700000</td>\n",
" <td>24.000000</td>\n",
" <td>12.000000</td>\n",
" <td>...</td>\n",
" <td>12.000000</td>\n",
" <td>18.000000</td>\n",
" <td>7.000000</td>\n",
" <td>10.695000</td>\n",
" <td>5.750000</td>\n",
" <td>8.000000</td>\n",
" <td>14.700000</td>\n",
" <td>4.000000</td>\n",
" <td>27.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows × 42 columns</p>\n",
"</div>"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 \\\n",
"count 846.000000 846.000000 846.000000 821.000000 846.000000 846.000000 \n",
"mean 4.790476 3.054551 0.739953 0.030451 0.946809 0.277778 \n",
"std 0.531991 0.813983 1.504545 0.198281 2.318081 1.045544 \n",
"min 2.000000 0.803900 0.000000 0.000000 0.000000 0.000000 \n",
"25% 4.499000 2.510175 0.000000 0.000000 0.000000 0.000000 \n",
"50% 4.840000 3.052400 0.000000 0.000000 0.000000 0.000000 \n",
"75% 5.119000 3.415725 1.000000 0.000000 1.000000 0.000000 \n",
"max 6.496000 7.918400 12.000000 2.000000 36.000000 13.000000 \n",
"\n",
" V7 V8 V9 V10 ... V33 \\\n",
"count 846.000000 846.000000 846.000000 846.000000 ... 846.000000 \n",
"mean 1.669031 37.422813 1.342790 1.784870 ... 0.903073 \n",
"std 2.220221 9.030008 2.018433 1.773856 ... 1.526124 \n",
"min 0.000000 9.100000 0.000000 0.000000 ... 0.000000 \n",
"25% 0.000000 30.800000 0.000000 0.000000 ... 0.000000 \n",
"50% 1.000000 37.850000 1.000000 1.500000 ... 0.000000 \n",
"75% 3.000000 43.800000 2.000000 3.000000 ... 1.000000 \n",
"max 18.000000 60.700000 24.000000 12.000000 ... 12.000000 \n",
"\n",
" V34 V35 V36 V37 V38 V39 \\\n",
"count 846.000000 846.000000 846.000000 821.000000 846.000000 846.000000 \n",
"mean 1.241135 0.926714 3.922100 2.549406 0.671395 8.643191 \n",
"std 2.248684 1.239133 0.992636 0.625021 1.093633 1.223700 \n",
"min 0.000000 0.000000 2.279000 1.467000 0.000000 4.948000 \n",
"25% 0.000000 0.000000 3.497000 2.101000 0.000000 8.009500 \n",
"50% 0.000000 1.000000 3.732500 2.461000 0.000000 8.508000 \n",
"75% 2.000000 1.000000 3.980000 2.861000 1.000000 9.019750 \n",
"max 18.000000 7.000000 10.695000 5.750000 8.000000 14.700000 \n",
"\n",
" V40 V41 Class \n",
"count 846.000000 846.000000 846.000000 \n",
"mean 0.059102 0.706856 1.333333 \n",
"std 0.342364 2.145396 0.471683 \n",
"min 0.000000 0.000000 1.000000 \n",
"25% 0.000000 0.000000 1.000000 \n",
"50% 0.000000 0.000000 1.000000 \n",
"75% 0.000000 0.000000 2.000000 \n",
"max 4.000000 27.000000 2.000000 \n",
"\n",
"[8 rows x 42 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train.describe()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b2689ec0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 846 entries, 3 to 1055\n",
"Data columns (total 42 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 V1 846 non-null float64\n",
" 1 V2 846 non-null float64\n",
" 2 V3 846 non-null int64 \n",
" 3 V4 821 non-null float64\n",
" 4 V5 846 non-null int64 \n",
" 5 V6 846 non-null int64 \n",
" 6 V7 846 non-null int64 \n",
" 7 V8 846 non-null float64\n",
" 8 V9 846 non-null int64 \n",
" 9 V10 846 non-null int64 \n",
" 10 V11 846 non-null int64 \n",
" 11 V12 846 non-null float64\n",
" 12 V13 846 non-null float64\n",
" 13 V14 846 non-null float64\n",
" 14 V15 846 non-null float64\n",
" 15 V16 846 non-null int64 \n",
" 16 V17 846 non-null float64\n",
" 17 V18 846 non-null float64\n",
" 18 V19 846 non-null int64 \n",
" 19 V20 846 non-null int64 \n",
" 20 V21 846 non-null int64 \n",
" 21 V22 830 non-null float64\n",
" 22 V23 846 non-null int64 \n",
" 23 V24 846 non-null int64 \n",
" 24 V25 846 non-null int64 \n",
" 25 V26 846 non-null int64 \n",
" 26 V27 838 non-null float64\n",
" 27 V28 846 non-null float64\n",
" 28 V29 838 non-null float64\n",
" 29 V30 846 non-null float64\n",
" 30 V31 846 non-null float64\n",
" 31 V32 846 non-null int64 \n",
" 32 V33 846 non-null int64 \n",
" 33 V34 846 non-null int64 \n",
" 34 V35 846 non-null int64 \n",
" 35 V36 846 non-null float64\n",
" 36 V37 821 non-null float64\n",
" 37 V38 846 non-null int64 \n",
" 38 V39 846 non-null float64\n",
" 39 V40 846 non-null int64 \n",
" 40 V41 846 non-null int64 \n",
" 41 Class 846 non-null int64 \n",
"dtypes: float64(19), int64(23)\n",
"memory usage: 284.2 KB\n"
]
}
],
"source": [
"df_train.info()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "22003f33",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>V10</th>\n",
" <th>...</th>\n",
" <th>V33</th>\n",
" <th>V34</th>\n",
" <th>V35</th>\n",
" <th>V36</th>\n",
" <th>V37</th>\n",
" <th>V38</th>\n",
" <th>V39</th>\n",
" <th>V40</th>\n",
" <th>V41</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3.919</td>\n",
" <td>2.6909</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>31.4</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.949</td>\n",
" <td>1.591</td>\n",
" <td>0</td>\n",
" <td>7.253</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.170</td>\n",
" <td>2.1144</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>30.8</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.315</td>\n",
" <td>1.967</td>\n",
" <td>0</td>\n",
" <td>7.257</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3.000</td>\n",
" <td>2.7098</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>20.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>3.046</td>\n",
" <td>5.000</td>\n",
" <td>0</td>\n",
" <td>6.690</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>4.214</td>\n",
" <td>2.6272</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>30.0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2.998</td>\n",
" <td>1.722</td>\n",
" <td>0</td>\n",
" <td>6.770</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>3.942</td>\n",
" <td>2.7719</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>31.6</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.542</td>\n",
" <td>1.739</td>\n",
" <td>0</td>\n",
" <td>8.127</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 42 columns</p>\n",
"</div>"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V33 V34 V35 \\\n",
"1 3.919 2.6909 0 0 0 0 0 31.4 2 0 ... 0 0 0 \n",
"2 4.170 2.1144 0 0 0 0 0 30.8 1 1 ... 0 0 0 \n",
"4 3.000 2.7098 0 0 0 0 0 20.0 0 2 ... 0 0 1 \n",
"13 4.214 2.6272 0 0 0 0 0 30.0 3 0 ... 0 0 0 \n",
"16 3.942 2.7719 1 0 0 0 0 31.6 2 0 ... 0 0 0 \n",
"\n",
" V36 V37 V38 V39 V40 V41 Class \n",
"1 2.949 1.591 0 7.253 0 0 2 \n",
"2 3.315 1.967 0 7.257 0 0 2 \n",
"4 3.046 5.000 0 6.690 0 0 2 \n",
"13 2.998 1.722 0 6.770 0 0 2 \n",
"16 3.542 1.739 0 8.127 0 1 2 \n",
"\n",
"[5 rows x 42 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d7235214",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>V1</th>\n",
" <th>V2</th>\n",
" <th>V3</th>\n",
" <th>V4</th>\n",
" <th>V5</th>\n",
" <th>V6</th>\n",
" <th>V7</th>\n",
" <th>V8</th>\n",
" <th>V9</th>\n",
" <th>V10</th>\n",
" <th>...</th>\n",
" <th>V33</th>\n",
" <th>V34</th>\n",
" <th>V35</th>\n",
" <th>V36</th>\n",
" <th>V37</th>\n",
" <th>V38</th>\n",
" <th>V39</th>\n",
" <th>V40</th>\n",
" <th>V41</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.00000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>...</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" <td>209.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>4.750938</td>\n",
" <td>3.130050</td>\n",
" <td>0.62201</td>\n",
" <td>0.086124</td>\n",
" <td>1.114833</td>\n",
" <td>0.339713</td>\n",
" <td>1.555024</td>\n",
" <td>35.569378</td>\n",
" <td>1.511962</td>\n",
" <td>1.880383</td>\n",
" <td>...</td>\n",
" <td>0.803828</td>\n",
" <td>1.411483</td>\n",
" <td>1.100478</td>\n",
" <td>3.902612</td>\n",
" <td>2.629201</td>\n",
" <td>0.746411</td>\n",
" <td>8.574038</td>\n",
" <td>0.019139</td>\n",
" <td>0.789474</td>\n",
" <td>1.354067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.603914</td>\n",
" <td>0.897556</td>\n",
" <td>1.27690</td>\n",
" <td>0.406969</td>\n",
" <td>2.393143</td>\n",
" <td>1.182566</td>\n",
" <td>2.246383</td>\n",
" <td>9.471334</td>\n",
" <td>1.721220</td>\n",
" <td>1.784023</td>\n",
" <td>...</td>\n",
" <td>1.498327</td>\n",
" <td>2.374355</td>\n",
" <td>1.320857</td>\n",
" <td>1.029605</td>\n",
" <td>0.714285</td>\n",
" <td>1.077657</td>\n",
" <td>1.315016</td>\n",
" <td>0.195176</td>\n",
" <td>2.589491</td>\n",
" <td>0.479378</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>2.000000</td>\n",
" <td>1.134900</td>\n",
" <td>0.00000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2.267000</td>\n",
" <td>1.576000</td>\n",
" <td>0.000000</td>\n",
" <td>4.917000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.414000</td>\n",
" <td>2.494500</td>\n",
" <td>0.00000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>29.400000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.401000</td>\n",
" <td>2.146000</td>\n",
" <td>0.000000</td>\n",
" <td>7.872000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>4.807000</td>\n",
" <td>3.039300</td>\n",
" <td>0.00000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>34.200000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.694000</td>\n",
" <td>2.469000</td>\n",
" <td>0.000000</td>\n",
" <td>8.464000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>5.188000</td>\n",
" <td>3.555400</td>\n",
" <td>1.00000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>41.200000</td>\n",
" <td>2.000000</td>\n",
" <td>3.000000</td>\n",
" <td>...</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" <td>3.991000</td>\n",
" <td>2.967000</td>\n",
" <td>1.000000</td>\n",
" <td>9.017000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>6.253000</td>\n",
" <td>9.177500</td>\n",
" <td>8.00000</td>\n",
" <td>3.000000</td>\n",
" <td>16.000000</td>\n",
" <td>12.000000</td>\n",
" <td>14.000000</td>\n",
" <td>60.000000</td>\n",
" <td>9.000000</td>\n",
" <td>11.000000</td>\n",
" <td>...</td>\n",
" <td>12.000000</td>\n",
" <td>18.000000</td>\n",
" <td>6.000000</td>\n",
" <td>10.355000</td>\n",
" <td>5.825000</td>\n",
" <td>6.000000</td>\n",
" <td>14.030000</td>\n",
" <td>2.000000</td>\n",
" <td>27.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows × 42 columns</p>\n",
"</div>"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 \\\n",
"count 209.000000 209.000000 209.00000 209.000000 209.000000 209.000000 \n",
"mean 4.750938 3.130050 0.62201 0.086124 1.114833 0.339713 \n",
"std 0.603914 0.897556 1.27690 0.406969 2.393143 1.182566 \n",
"min 2.000000 1.134900 0.00000 0.000000 0.000000 0.000000 \n",
"25% 4.414000 2.494500 0.00000 0.000000 0.000000 0.000000 \n",
"50% 4.807000 3.039300 0.00000 0.000000 0.000000 0.000000 \n",
"75% 5.188000 3.555400 1.00000 0.000000 1.000000 0.000000 \n",
"max 6.253000 9.177500 8.00000 3.000000 16.000000 12.000000 \n",
"\n",
" V7 V8 V9 V10 ... V33 \\\n",
"count 209.000000 209.000000 209.000000 209.000000 ... 209.000000 \n",
"mean 1.555024 35.569378 1.511962 1.880383 ... 0.803828 \n",
"std 2.246383 9.471334 1.721220 1.784023 ... 1.498327 \n",
"min 0.000000 0.000000 0.000000 0.000000 ... 0.000000 \n",
"25% 0.000000 29.400000 0.000000 0.000000 ... 0.000000 \n",
"50% 0.000000 34.200000 1.000000 2.000000 ... 0.000000 \n",
"75% 3.000000 41.200000 2.000000 3.000000 ... 1.000000 \n",
"max 14.000000 60.000000 9.000000 11.000000 ... 12.000000 \n",
"\n",
" V34 V35 V36 V37 V38 V39 \\\n",
"count 209.000000 209.000000 209.000000 209.000000 209.000000 209.000000 \n",
"mean 1.411483 1.100478 3.902612 2.629201 0.746411 8.574038 \n",
"std 2.374355 1.320857 1.029605 0.714285 1.077657 1.315016 \n",
"min 0.000000 0.000000 2.267000 1.576000 0.000000 4.917000 \n",
"25% 0.000000 0.000000 3.401000 2.146000 0.000000 7.872000 \n",
"50% 0.000000 1.000000 3.694000 2.469000 0.000000 8.464000 \n",
"75% 2.000000 2.000000 3.991000 2.967000 1.000000 9.017000 \n",
"max 18.000000 6.000000 10.355000 5.825000 6.000000 14.030000 \n",
"\n",
" V40 V41 Class \n",
"count 209.000000 209.000000 209.000000 \n",
"mean 0.019139 0.789474 1.354067 \n",
"std 0.195176 2.589491 0.479378 \n",
"min 0.000000 0.000000 1.000000 \n",
"25% 0.000000 0.000000 1.000000 \n",
"50% 0.000000 0.000000 1.000000 \n",
"75% 0.000000 0.000000 2.000000 \n",
"max 2.000000 27.000000 2.000000 \n",
"\n",
"[8 rows x 42 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test.describe()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9598495e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 209 entries, 1 to 1051\n",
"Data columns (total 42 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 V1 209 non-null float64\n",
" 1 V2 209 non-null float64\n",
" 2 V3 209 non-null int64 \n",
" 3 V4 209 non-null int64 \n",
" 4 V5 209 non-null int64 \n",
" 5 V6 209 non-null int64 \n",
" 6 V7 209 non-null int64 \n",
" 7 V8 209 non-null float64\n",
" 8 V9 209 non-null int64 \n",
" 9 V10 209 non-null int64 \n",
" 10 V11 209 non-null int64 \n",
" 11 V12 209 non-null float64\n",
" 12 V13 209 non-null float64\n",
" 13 V14 209 non-null float64\n",
" 14 V15 209 non-null float64\n",
" 15 V16 209 non-null int64 \n",
" 16 V17 209 non-null float64\n",
" 17 V18 209 non-null float64\n",
" 18 V19 209 non-null int64 \n",
" 19 V20 209 non-null int64 \n",
" 20 V21 209 non-null int64 \n",
" 21 V22 209 non-null float64\n",
" 22 V23 209 non-null int64 \n",
" 23 V24 209 non-null int64 \n",
" 24 V25 209 non-null int64 \n",
" 25 V26 209 non-null int64 \n",
" 26 V27 209 non-null float64\n",
" 27 V28 209 non-null float64\n",
" 28 V29 209 non-null int64 \n",
" 29 V30 209 non-null float64\n",
" 30 V31 209 non-null float64\n",
" 31 V32 209 non-null int64 \n",
" 32 V33 209 non-null int64 \n",
" 33 V34 209 non-null int64 \n",
" 34 V35 209 non-null int64 \n",
" 35 V36 209 non-null float64\n",
" 36 V37 209 non-null float64\n",
" 37 V38 209 non-null int64 \n",
" 38 V39 209 non-null float64\n",
" 39 V40 209 non-null int64 \n",
" 40 V41 209 non-null int64 \n",
" 41 Class 209 non-null int64 \n",
"dtypes: float64(17), int64(25)\n",
"memory usage: 70.2 KB\n"
]
}
],
"source": [
"df_test.info()"
]
},
{
"cell_type": "markdown",
"id": "84e0c414",
"metadata": {},
"source": [
"#### Display the distribution of the target variable **Class** in the training and test sets."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5ca239ec",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABJz0lEQVR4nO3deVxUZf//8fcIgoiAKyAuuKe4JpaSmrkkKlmm3ZpZoqlZobmUFd8slxa7NZc00+67UlvMrTQzl9zSUmwxNbM0NdcUNE0QFRS4fn/0Y+5GQGEcGDi9no/HPB7Mda4553POmWHec805Z2zGGCMAAACLKubuAgAAAPITYQcAAFgaYQcAAFgaYQcAAFgaYQcAAFgaYQcAAFgaYQcAAFgaYQcAAFgaYQcAAFgaYaeIGzt2rGw2W4Es64477tAdd9xhv//ll1/KZrNpyZIlBbL8fv36qVq1agWyLGclJydr4MCBCg4Ols1m0/Dhw/M8j8x9+scff7i+QOSLq18buXX48GHZbDa99tpr1+3r6tf63LlzZbPZdPjwYZfNMyf9+vVTqVKl8n05+c1ms2ns2LFOPbZatWrq16+fS+tB7hF2CpHMfz6ZtxIlSigkJESRkZGaPn26zp8/75LlnDhxQmPHjtXOnTtdMj9XKsy15cYrr7yiuXPn6rHHHtP777+vhx566Jp9ly1bVnDFXWXr1q0aO3aszp0757Ya8qKo1ftPc/HiRY0dO1Zffvml22pYuXKl02EE11cY9rHTDAqNOXPmGElm/Pjx5v333zfvvvuueeWVV0zHjh2NzWYzoaGhZteuXQ6PuXLlirl06VKelvPdd98ZSWbOnDl5elxqaqpJTU2139+4caORZBYvXpyn+Thb2+XLl01KSorLlpUfmjdvblq2bJmrvr6+viY6OjpL+5gxY4wkc/r0aRdX52jSpElGkjl06FC+LsdVCnO9V782cuvQoUNGkpk0adJ1+2Y+L1wlLS3NXLp0yWRkZLhkfqdPnzaSzJgxY7JMi46ONr6+vi5ZzrXExMS4dBtd7dKlS+bKlStOPTYlJcVcvnzZxRUVrGvt48LO0z0RC9fSuXNnNWvWzH4/NjZWGzZs0F133aW7775bv/zyi3x8fCRJnp6e8vTM39148eJFlSxZUl5eXvm6nOspXry4W5efG6dOnVJYWJi7y3AbY4xSUlLsz0+rKyyvDWd4eHjIw8PD3WW4TVpamjIyMvK070qUKOH08ry9vZ1+LFzA3WkL/5M5svPdd99lO/2VV14xksx//vMfe1t2n/a++OIL07JlSxMQEGB8fX1NnTp1TGxsrDHmf6MxV98yR1LatGlj6tevb77//nvTunVr4+PjY4YNG2af1qZNG/tyMue1YMECExsba4KCgkzJkiVN165dzdGjRx1qCg0NzXYU4+/zvF5t0dHRJjQ01OHxycnJZuTIkaZy5crGy8vL1KlTx0yaNCnLp1VJJiYmxixdutTUr1/feHl5mbCwMLNq1apst/XVEhISzMMPP2wCAwONt7e3adSokZk7d26WbXH1LadRiOz6Zm6fzH26f/9+Ex0dbQICAoy/v7/p16+fuXDhQpZ5vf/++6Zp06amRIkSpkyZMqZXr15Ztv/VMpeRU73vvvuuadu2ralQoYLx8vIy9erVM2+++WaW+YSGhpqoqCizevVqEx4ebry9vc3UqVONMcYcPnzYdO3a1ZQsWdJUqFDBDB8+3KxevdpIMhs3bnSYz7Zt20xkZKTx9/c3Pj4+5vbbbzdff/11ruu9WkxMjPH19c12e91///0mKCjIpKWlGWOMWbZsmenSpYupWLGi8fLyMjVq1DDjx4+3T8+Ul9dGamqqef75503Tpk2Nv7+/KVmypGnVqpXZsGGDwzz/PrIzZcoUU7VqVVOiRAlz++23m927d2e7z67mzP435n//b/6+DTP351dffWVuueUW4+3tbapXr27mzZt3zXllrsfVt8wRgM
yRnePHj5t77rnH+Pr6mvLly5snn3wyy3ZOT083U6dONWFhYcbb29sEBgaaRx55xJw9e/aaNURHR2dbw9/rmzRpkpk6daqpUaOGKVasmNmxY0eu95UxJsuoRl5eq1f/D8zc/l9//bUZMWKEKV++vClZsqTp1q2bOXXqVJZtMmbMGFOxYkXj4+Nj7rjjDrNnz54c/69e7aOPPjJNmzY1pUqVMn5+fqZBgwZm2rRpDn3+/PNPM2zYMPv/0po1a5pXX33VpKenO2zDnPZxYcfIThHy0EMP6f/+7//0xRdfaNCgQdn22bNnj+666y41atRI48ePl7e3tw4cOKAtW7ZIkurVq6fx48frhRde0COPPKLWrVtLkm677Tb7PM6cOaPOnTvr/vvv14MPPqigoKBr1vXyyy/LZrPpmWee0alTpzRt2jR16NBBO3fuzNMn/NzU9nfGGN19993auHGjBgwYoCZNmmjNmjUaNWqUfv/9d02dOtWh/9dff61PPvlEjz/+uPz8/DR9+nT16NFDR48eVbly5XKs69KlS7rjjjt04MABDRkyRNWrV9fixYvVr18/nTt3TsOGDVO9evX0/vvva8SIEapcubKefPJJSVKFChWynef777+vgQMH6tZbb9UjjzwiSapZs6ZDn549e6p69eqaMGGCfvjhB7399tsKDAzUv//9b3ufl19+Wc8//7x69uypgQMH6vTp05oxY4Zuv/127dixQ6VLl852+d27d9evv/6qjz76SFOnTlX58uUd6p01a5bq16+vu+++W56envrss8/0+OOPKyMjQzExMQ7z2rdvn3r37q3Bgwdr0KBBuummm3ThwgW1a9dOJ0+e1LBhwxQcHKz58+dr48aNWWrZsGGDOnfurPDwcI0ZM0bFihXTnDlz1K5dO3311Ve69dZbr1vv1Xr16qWZM2fq888/17/+9S97+8WLF/XZZ5+pX79+9lGNuXPnqlSpUho5cqRKlSqlDRs26IUXXlBSUpImTZrkMN/cvjaSkpL09ttvq3fv3ho0aJDOnz+vd955R5GRkfr222/VpEkTh/7vvfeezp8/r5iYGKWkpOj1119Xu3bttHv37mu+/pzd/9dy4MAB3XfffRowYICio6P17rvvql+/fgoPD1f9+vWzfUyFChU0a9YsPfbYY7r33nvVvXt3SVKjRo3sfdLT0xUZGanmzZvrtdde07p16zR58mTVrFlTjz32mL3f4MGDNXfuXPXv319PPPGEDh06pDfeeEM7duzQli1bchzhHTx4sE6cOKG1a9fq/fffz7bPnDlzlJKSokceeUTe3t4qW7ZsnvdVdnLzWs3J0KFDVaZMGY0ZM0aHDx/WtGnTNGTIEC1cuNDeJzY2VhMnTlTXrl0VGRmpXbt2KTIyUikpKded/9q1a9W7d2+1b9/eXs8vv/yiLVu2aNiwYZL+el20adNGv//+uwYPHqyqVatq69atio2N1cmTJzVt2rRc7eNCzd1pC/9zvZEdY4wJCAgwN998s/3+1Z/2pk6det3jPa51XEybNm2MJDN79uxsp2U3slOpUiWTlJRkb1+0aJGRZF5//XV7W25Gdq5X29UjO8uWLTOSzEsvveTQ77777jM2m80cOHDA3ibJeHl5ObTt2rXLSDIzZszIsqy/mzZtmpFkPvjgA3vb5cuXTUREhClVqpTDumd+Ms6N6x2z8/DDDzu033vvvaZcuXL2+4cPHzYeHh7m5Zdfdui3e/du4+npmaX9atc6BubixYtZ2iIjI02NGjUc2kJDQ40ks3r1aof2yZMnG0lm2bJl9rZLly6ZunXrOozsZGRkmNq1a5vIyEiH0biLFy+a6tWrmzvvvDNX9V4tIyPDVKpUyfTo0cOhPfO5uXnz5muu6+DBg03JkiUdjhHLy2sjLS0tyzE8f/75pwkKCnLYr5mfln18fMzx48ft7d98842RZEaMGGFvu/q1fqP7P6eRnau3z6lTp4y3t7d58sknrzm/6x2zo/9/PO
Lf3XzzzSY8PNx+/6uvvjKSzIcffujQL3NE8Or2q+V0zE7mdvb3988yapLbfWVMziM713utGpPzyE6HDh0cnvsjRow
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_, _, bars = plt.hist(df_train['Class'], bins=10)\n",
"plt.xlabel('Class')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Distribution of the target variable in the training set')\n",
"plt.bar_label(bars, fmt='%1.0f')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c74f9fb5",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHHCAYAAABZbpmkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABH6UlEQVR4nO3deVxUZf//8fcggogsYgJiLrjlmhYmebsrSWqmabmWuFthrmV5l7mkkZlrmd6VqZVmampmppmYtqi5lm3uu4JbgrggyvX7ox/zbQQUYdiOr+fjMQ+d65y5zuecYZg311znjM0YYwQAAGBRLrldAAAAQHYi7AAAAEsj7AAAAEsj7AAAAEsj7AAAAEsj7AAAAEsj7AAAAEsj7AAAAEsj7AAAAEsj7NzhRo0aJZvNliPbaty4sRo3bmy//91338lms2nx4sU5sv3u3burbNmyObKtzEpISFDv3r0VGBgom82mQYMG3XYfKc/pmTNnnF8gssWNr42MOnTokGw2m956661bruvs1/qcOXNks9l06NAhp/WZnu7du6tIkSLZvh1YF2HHQlJ++aTcChUqpKCgIIWHh2vatGm6cOGCU7Zz4sQJjRo1Sjt37nRKf86Ul2vLiNdff11z5szRM888o48//lhPPfXUTdddtmxZzhV3g59++kmjRo3S+fPnc62G25Hf6r3TXLp0SaNGjdJ3332XazWsXLlSo0aNytZt5OZ+/vHHHxo1alSOBNQ8x8AyZs+ebSSZMWPGmI8//th8+OGH5vXXXzfNmzc3NpvNlClTxvzyyy8Oj0lKSjKXL1++re1s2bLFSDKzZ8++rcclJiaaxMRE+/1169YZSWbRokW31U9ma7t69aq5cuWK07aVHUJDQ029evUytK6np6eJiIhI1T5y5EgjyZw+fdrJ1TmaMGGCkWQOHjyYrdtxlrxc742vjYw6ePCgkWQmTJhwy3VTfi6c5dq1a+by5csmOTnZKf2dPn3aSDIjR45MtSwiIsJ4eno6ZTs3ExkZ6dRjlJab7Wd2W7RokZFk1q1bl+Pbzm2uuZKwkK1atGih2rVr2+8PHz5c0dHReuSRR/Too4/qzz//lIeHhyTJ1dVVrq7Z+2Nw6dIlFS5cWG5ubtm6nVspWLBgrm4/I06dOqWqVavmdhm5xhijK1eu2H8+rS6vvDYyo0CBAipQoEBulwFkTG6nLThPysjOli1b0lz++uuvG0nmvffes7el9dfeN998Y+rVq2d8fHyMp6enqVSpkhk+fLgx5v9GY268pYykNGrUyFSrVs1s3brVNGjQwHh4eJiBAwfalzVq1Mi+nZS+FixYYIYPH24CAgJM4cKFTevWrc2RI0ccaipTpkyaoxj/7vNWtUVERJgyZco4PD4hIcEMGTLE3H333cbNzc1UqlTJTJgwIdVfq5JMZGSkWbp0qalWrZpxc3MzVatWNV9//XWax/pGsbGxpmfPnsbf39+4u7ube++918yZMyfVsbjxlt4oRFrrphyflOd07969JiIiwvj4+Bhvb2/TvXt3c/HixVR9ffzxx+b+++83hQoVMkWLFjUdO3ZMdfxvlLKN9Or98MMPTZMmTUzx4sWNm5ubqVKlinn33XdT9VOmTBnTqlUrs2rVKhMSEmLc3d3N5MmTjTHGHDp0yLRu3doULlzYFC9e3AwaNMisWrUqzb9MN23aZMLDw423t7fx8PAwDRs2ND/88EOG671RZGSk8fT0TPN4derUyQQEBJhr164ZY4xZtmyZadmypSlRooRxc3Mz5cqVM2PGjLEvT3E7r43ExEQzYsQIc//99xtvb29TuHBhU79+fRMdHe3Q579HdiZNmmRKly5tChUqZBo2bGh27dqV5nN2o8w8/8b83++bfx/DlOfz+++/Nw888IBxd3c3wcHBZu7cuTftK2U/bryljH6kjOwcO3bMtGnTxnh6epq77rrLDB06NNVxvn79upk8ebKpWrWqcXd3N/7+/q
Zv377m3LlzN60hIiIizRput98tW7aY5s2bm2LFiplChQqZsmXLmh49emRoP9Ny9epVM2rUKFOhQgXj7u5u/Pz8TL169cw333zjsN6ff/5p2rdvb4oWLWrc3d1NSEiI+eKLL+zLU56vG293yigPIzt3kKeeekr//e9/9c0336hPnz5prvP777/rkUce0b333qsxY8bI3d1d+/bt048//ihJqlKlisaMGaNXX31Vffv2VYMGDSRJ//nPf+x9nD17Vi1atFCnTp305JNPKiAg4KZ1jRs3TjabTS+++KJOnTqlKVOmKCwsTDt37rytv/AzUtu/GWP06KOPat26derVq5dq1aql1atX64UXXtDx48c1efJkh/V/+OEHLVmyRM8++6y8vLw0bdo0tW/fXkeOHFGxYsXSrevy5ctq3Lix9u3bp/79+ys4OFiLFi1S9+7ddf78eQ0cOFBVqlTRxx9/rMGDB+vuu+/W0KFDJUnFixdPs8+PP/5YvXv3Vp06ddS3b19JUvny5R3W6dChg4KDgxUVFaXt27frgw8+kL+/v8aPH29fZ9y4cRoxYoQ6dOig3r176/Tp03r77bfVsGFD7dixQ76+vmluv127dtqzZ48+/fRTTZ48WXfddZdDvTNmzFC1atX06KOPytXVVV9++aWeffZZJScnKzIy0qGv3bt3q3PnzurXr5/69Omje+65RxcvXlTTpk118uRJDRw4UIGBgZo/f77WrVuXqpbo6Gi1aNFCISEhGjlypFxcXDR79mw1bdpU33//verUqXPLem/UsWNHTZ8+XV999ZWeeOIJe/ulS5f05Zdfqnv37vZRjTlz5qhIkSIaMmSIihQpoujoaL366quKj4/XhAkTHPrN6GsjPj5eH3zwgTp37qw+ffrowoULmjVrlsLDw/Xzzz+rVq1aDut/9NFHunDhgiIjI3XlyhVNnTpVTZs21a5du276+svs838z+/bt0+OPP65evXopIiJCH374obp3766QkBBVq1YtzccUL15cM2bM0DPPPKPHHntM7dq1kyTde++99nWuX7+u8PBwhYaG6q233tK3336riRMnqnz58nrmmWfs6/Xr109z5sxRjx49NGDAAB08eFDvvPOOduzYoR9//DHdEd5+/frpxIkTWrNmjT7++OM0l9+q31OnTql58+YqXry4XnrpJfn6+urQoUNasmRJhvfzRqNGjVJUVJT99R4fH6+tW7dq+/bteuihhyT983u7Xr16KlmypF566SV5enpq4cKFatu2rT7//HM99thjatiwoQYMGKBp06bpv//9r6pUqSJJ9n8tL7fTFpznViM7xhjj4+Nj7rvvPvv9G//amzx58i3ne9xsXkyjRo2MJDNz5sw0l6U1slOyZEkTHx9vb1+4cKGRZKZOnWpvy8jIzq1qu3FkZ9myZUaSGTt2rMN6jz/+uLHZbGbfvn32NknGzc3Noe2XX34xkszbb7+dalv/NmXKFCPJfPLJJ/a2q1evmrp165oiRYo47HvKX8YZcas5Oz179nRof+yxx0yxYsXs9w8dOmQKFChgxo0b57Derl27jKura6r2G91sDsylS5dStYWHh5ty5co5tJUpU8ZIMqtWrXJonzhxopFkli1bZm+7fPmyqVy5ssNfo8nJyaZixYomPDzcYTTu0qVLJjg42Dz00EMZqvdGycnJpmTJkqZ9+/YO7Sk/mxs2bLjpvvbr188ULlzYYY7Y7bw2rl27lmoOz99//20CAgIcnteUkQIPDw9z7Ngxe/vmzZuNJDN48GB7242v9aw+/+mN7Nx4fE6dOmXc3d3N0KFDb9rfrebs6P/PR/y3++67z4SEhNjvf//990aSmTdvnsN6KSOCN7bfKL05Oxntd+nSpbf8HXy7c3Zq1qx5y98JzZo1MzVq1HD4eUtOTjb/+c9/TMWKFe1td/KcHc7GusMUKVLkpmdlpfwl98UXXyg5OTlT23B3d1ePHj0yvH63bt3k5eVlv//444+rRI
kSWrlyZaa2n1ErV65UgQIFNGDAAIf2oUOHyhijr7/+2qE9LCzMYfTk3nvvlbe3tw4cOHDL7QQGBqpz5872toIFC2r
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_, _, bars = plt.hist(df_test['Class'], bins=10)\n",
"plt.xlabel('Class')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Distribution of the target variable in the test set')\n",
"plt.bar_label(bars, fmt='%1.0f')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "82afd315",
"metadata": {},
"source": [
"#### Display relationship between features in the training set using the correlation matrix"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e8cf8eb1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(42.5, -0.5)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABy4AAAe2CAYAAABKEJQUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXRU1/rw8e9EZuLubkQIEtyhtBQp0EIFK1KkUOqlCrTQ9vbWBai73gq0pcXd3TUJxBPirhOf948JSSbMJMO9cJPfe5/PWrNWSfaZPD1nn2fvffY5+yg0Go0GIYQQQgghhBBCCCGEEEIIIYRoRybtHYAQQgghhBBCCCGEEEIIIYQQQsjEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBCi3cnEpRBCCCGEEEIIIYQQQgghhBD/h+3bt4/x48fj5eWFQqHgr7/+anObPXv20LNnT1QqFSEhIXz33XfXlPn4448JCAjAwsKCfv36cezYsRsffDMycSmEEEIIIYQQQgghhBBCCCHE/2Hl5eV0796djz/+2KjySUlJjB07luHDh3PmzBmefPJJ5s2bx9atWxvL/PbbbyxatIjly5dz6tQpunfvzqhRo8jJyblZ/xsoNBqN5qZ9uxBCCCGEEEIIIYQQQgghhBDiv0ahULB27VomTJhgsMzzzz/Pxo0buXDhQuPPpkyZQlFREVu2bAGgX79+9OnTh48++giA+vp6fH19eeyxx3jhhRduSuzyxKUQQgghhBBCCCGEEEIIIYQQHUxVVRUlJSU6n6qqqhvy3YcPH2bEiBE6Pxs1ahSHDx8GoLq6mpMnT+qUMTExYcSIEY1lbgazm/bNQgghhBBCCCGEEEIIIYQQ4v8sS7+p7R3C/7Tn54Txyiuv6Pxs+fLlvPzyy//xd2dlZeHu7q7zM3d3d0pKSlCr1RQWFlJXV6e3TGxs7H/89w2RiUshhBBCCCGEEEIIIYQQQgghOpjFixezaNEinZ+pVKp2iua/QyYuhRBCCCGEEEIIIYQQQgghhOhgVCrVTZuo9PDwIDs7W+dn2dnZ2NnZYWlpiampKaampnrLeHh43JSYQN5xKYQQQgghhBBCCCGEEEIIIcT/lAEDBrBz506dn23fvp0BAwYAoFQq6dWrl06Z+vp6du7c2VjmZpCJSyGEEEIIIYQQQgghhBBCCCH+DysrK+PMmTOcOXMGgKSkJM6cOUNqaiqgXXZ25syZjeUfeughEhMTee6554iNjeWTTz5h9erVPPXUU41lFi1axJdffsn3339PTEwMCxcupLy8nNmzZ9+0/w9ZKlYIIYQQQgghhBBCCCGEEEKI/8NOnDjB8OHDG/999d2Ys2bN4rvvviMzM7NxEhMgMDCQjRs38tRTT7Fy5Up8fHz46quvGDVqVGOZyZMnk5uby7Jly8jKyiIqKootW7bg7u5+0/4/FBqNRnPTvl0IIYQQQgghhBBCCCGEEEL8n2TpN7W9Q/ifpk79pb1D+K+TpWKFEEIIIYQQQgghhBBCCCGEEO1OlooVQgghhBBCCCGEEEIIIYQQ11Ao5Pk38d8lNU4IIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e7M2jsAIYQQQgghhBBCCCGEEEII0fEo5Pk38V8mNU4IIY
QQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e5k4lIIIYQQQgghhBBCCCGEEEII0e7M2jsAIYQQQgghhBBCCCGEEEII0fEoFPL8m/jv6lATl5Z+U9s7BB3q1F/o+fP+9g5Dx6lpQ4j4el97h9EoZu5QpuzuOPEA/Dp8KP3/ONDeYeg4cs9gAp/f0N5h6Eh6axwByza3dxg6kl8dg/8bO9o7DB0pi0fQ65eOlQdOTh1C52861nkXPWcofdd0rPPu2H2DeeborvYOQ8e7/W4lZOIP7R1Go/i1M1l+qmOdc6/0HIFL2JPtHYaOvEsrCFi+pb3D0JH8yugO1R8AbZ/gldMdqz4t7zGCOfv3tHcYOr4Zcgu3bDzY3mHo2DN2EN1+7Fht3bkZQ+izumO1K8cnDabf7x0rpqP3DubRw7vbO4xGHw0YzuC/O9Y+OnDXYO7a0bHq998jhuAc+nh7h6Ej//IqZu7d295h6Phh2DAmdbCx5urhQztkP/yhgx0nDwB8Nmg4kd92rP10cfZQAj/uWHU86ZFhDF3fcfoE+8YP6pDnXKfPO1ZMcQuGMmprx2rrto7qmG1d4DPr2zsMHUnvjuf2LR3nnAPYPnoQfqs6Vm5KfXwYp/M71jXMHs7j2jsEIcR/mUyVCyGEEEIIIYQQQgghhBBCCCHanUxcCiGEEEIIIYQQQgghhBBCCCHanUxcCiGEEEIIIYQQQgghhBBCCCHanUxcCiGEEEIIIYQQQgghhBBCCCHanUxcCiGEEEIIIYQQQgghhBBCCCHanVl7ByCEEEIIIYQQQgghhBBCCCE6HoVCnn8T/11S44QQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7U4mLoUQQgghhBBCCCGEEEIIIYQQ7c7sRn1RbW0tGRkZ+Pn53aivFEIIIYQQQgghhBBCCCGEEO1EoVC0dwjif8wNm7i8ePEiPXv2pK6u7oZ836C+4Tz10Dh6dg3C092RSfPeY/22E61uM6R/BG+9NIPOoT5cycznzVVr+en3fTplFsy8nacWjMfd1Z7zMaksWvYdJ84mXFdskzp5MjPCB2dLJZcLy3j7ZAIX88v0lp0Y7MG4QDeCHawAiCko46OzyTrlb/Vx5p5OnkQ42eCgMmfKplNcLiq/rpimRXgyp6svLpZKYgvK+OfhBM7nleote1+YB3eGuNPJURtTdF4ZH5xI1in/+pBQJoZ66Gy3/0oB87deMCqe3D27ydm2lZqSYix9fPGZPBXrwECD5QtPniBz3d9U5+ehcnPHa+I92HftCoCmrpaMv/+i5MIFqvNyMbG0xDY8Au+J92Du4GBUPAD3BHkyPdQbJwsl8cXlvHcmgehC/cct0NaK+ZF+hDvY4GltwQdnE/ktPuOacq4WSh7pGsAAd0dUZiZcKavktRNxxBbp/96WZgzwZ/7QYFxtVcRklvDy3xc5e6Woze3Gdffiw2k92XYxiwU/6D8vXpvYlfv7+/Pq+ot8eyDJqHgAZvT1Y8GgQFxtVMRkl7J8YzRn04vb3G58F08+nBTFtphs5v9yqvHnoyLcub+PH1297HC0UnLHJweIztJfNw2Z2dOH+f38cbVREpNTxvJtlzibWaK37OhQVx4ZGIi/oyXmJiYkFVbw5bEU1l7IaiyTsniE3m1f3xXH50dTjIrpvk6ezAzX5oG4q3mgQP9xD7Kz4qFu/kQ42uBlY8G7pxL45ZJufbIyM2VhN3+G+zjjqDLnUmE5755KINrAd+
ozNcKTOV20eeBSYet54N5QD+4KcSfkah7IL2NFizzwzyGhTOx0bR5YsM24PABwb7An08O8cbZQEldUzrunDZ93QXY
"text/plain": [
"<Figure size 2500x2500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"correlation_matrix = df_train.corr()\n",
"fig, ax = plt.subplots(figsize=(25, 25))\n",
"\n",
"ax = sns.heatmap(\n",
"    correlation_matrix,\n",
"    annot=True,\n",
"    linewidths=0.5,\n",
"    fmt=\".2f\",\n",
"    cmap=\"YlGnBu\"\n",
")\n",
"\n",
"# Pad the y-limits by half a cell so the top and bottom rows are not cropped\n",
"# (a workaround often used for a heatmap rendering bug in matplotlib 3.1.1)\n",
"bottom_side, top_side = ax.get_ylim()\n",
"ax.set_ylim(bottom_side + 0.5, top_side - 0.5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c2b4a57c",
"metadata": {},
"source": [
"The correlation matrix shows the highest positive correlation for the attribute **V14** and the most negative values for the attributes **V1** and **V27**. So let's look at how those values are distributed with respect to the class."
]
},
{
"cell_type": "markdown",
"id": "f1918d5b",
"metadata": {},
"source": [
"**V1 vs V27**"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8d4ce9a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f298db7da80>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNoAAANXCAYAAADjAjLCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADjfklEQVR4nOzdeVhUZf8G8PucYd8XRcAUcElFchdFc/lVCmqo1Ztmmlrma6aVLb5lZUhWaquVZablmpmlpaihZpobiokbobmElgqSICCyzzm/P8YZGWY7AwMDeH+uy0vnnGfOec4w9b7efZ/nK8iyLIOIiIiIiIiIiIiqRbT3BIiIiIiIiIiIiBoCBm1EREREREREREQ2wKCNiIiIiIiIiIjIBhi0ERERERERERER2QCDNiIiIiIiIiIiIhtg0EZERERERERERGQDDNqIiIiIiIiIiIhsgEEbERERERERERGRDTBoIyIiIiIiIiIisgEGbURERNRgjR8/HqGhoXa5tyAImDVrlk2vuXLlSrRt2xaOjo7w8fGx6bVt5fz58xAEAcuWLbP3VMxKTExEp06d4OLiAkEQkJuba+8p1Sp7/rNBRETUkDFoIyIiqgcEQVD0a9euXfaeqp79+/dj1qxZt12IURNOnTqF8ePHo2XLlli8eDG+/PJLu85n9erVmD9/vl3nUFXZ2dkYMWIEXF1d8dlnn2HlypVwd3c3GDd06FC4ubnh+vXrJq81evRoODk5ITs7GwDw3XffYcyYMWjdujUEQUD//v1r6jFqVFZWFhwcHDBmzBiTY65fvw5XV1c8+OCDAIBDhw5h6tSpaN++Pdzd3dG8eXOMGDECp0+fNnivuX+PDRgwoMaei4iIqKY52HsCREREZNnKlSv1Xq9YsQLbt283ON6uXbvanJZF+/fvR3x8PMaPH19nK7BqSlFRERwcbPd/tXbt2gVJkvDxxx+jVatWNrtuVa1evRqpqamYNm2a3vGQkBAUFRXB0dHRPhNT4NChQ7h+/Tpmz56N++67z+S40aNHIyEhAT/++CPGjh1rcL6wsBAbNmxATEwM/P39AQALFy7E4cOH0b17d134Vh8FBARgwIAB2LBhAwoLC+Hm5mYwZv369SguLtaFcfPmzcO+ffvw8MMPo0OHDsjMzMSCBQvQpUsXHDhwABEREbr3Vv53FwD8/vvv+PjjjzFw4MCaezAiIqIaxqCNiIioHqhcVXLgwAFs377dbLWJUrIso7i4GK6urtW+Ft3i4uJi0+tlZWUBQJ0PLAVBsPmz25rSz3Lo0KHw9PTE6tWrjQZtGzZswI0bNzB69GjdsZUrV6Jp06YQRVEvWKqPRo8ejcTERGzcuBGPPPKIwfnVq1fD29sbQ4YMAQC88MILWL16NZycnHRjRo4cibvuugtz587FqlWrdMeN/btr165dEAQBo0aNqoGnISIiqh1cOkpERNRALF26FPfccw8CAgLg7OyM8PBwLFy40GBcaGgo7r//fmzduhXdunWDq6srFi1aBAC4cOEChg4dCnd3dwQEBOD555/H1q1bjS5LPXjwIGJiYuDt7Q03Nzf069cP+/bt052fNWsWpk+fDgAICwvTLQs7f/680flPnToVHh4eKCwsNDg3atQoBAYGQq1WA9AEHEOGDEFwcDCcnZ3RsmVLzJ49W3feFO1f5Cs/i6l9xU6dOoX//Oc/8PPzg4uLC7p164aNGzeavYdW5T3aZs2aBUEQcPbsWV2Fn7e3Nx5//HGjz1xRaGgo4uLiAACNGzfWu7apveBCQ0Mxfvx43etly5ZBEATs27cPL7zwAho3bgx3d3c88MAD+Pfffw3e//PPP6Nfv37w9PSEl5cXunfvjtWrVwMA+vfvj82bN+PChQu6n6t2vy9Tn+Wvv/6KPn36wN3dHT4+Phg2bBhOnjypN6Y6n5HW999/j65du8LV1RWNGjXCmDFjcOnSJd35/v37Y9y4cQCA7t27QxAEvc+pIu
2yyB07dujCuYpWr14NT09PDB06VHesWbNmEMWq/V/s0tJSvPHGG+jatSu8vb3h7u6OPn36YOfOnXrjtJ/x+++/jy+//BItW7aEs7MzunfvjkOHDhlc96effkJERARcXFwQERGBH3/8UdF8HnjgAbi7u+t+7hVlZWVhx44d+M9//gNnZ2cAQK9evfRCNgBo3bo12rdvb/CzrqykpATr1q1Dv379cMcddyiaHxERUV3EijYiIqIGYuHChWjfvj2GDh0KBwcHJCQk4Omnn4YkSZgyZYre2D///BOjRo3CpEmTMHHiRLRp0wY3btzAPffcg4yMDDz33HMIDAzE6tWrDf6SD2hCk0GDBqFr166Ii4uDKIq6oG/Pnj2IjIzEgw8+iNOnT+Pbb7/FRx99hEaNGgHQBEXGjBw5Ep999hk2b96Mhx9+WHe8sLAQCQkJGD9+PFQqFQBNaOTh4YEXXngBHh4e+PXXX/HGG28gPz8f7733nk0+zz/++AO9e/dG06ZN8corr8Dd3R1r167F8OHDsW7dOjzwwANVuu6IESMQFhaGOXPmICUlBUuWLEFAQADmzZtn8j3z58/HihUr8OOPP2LhwoXw8PBAhw4dqnT/Z555Br6+voiLi8P58+cxf/58TJ06Fd99951uzLJly/DEE0+gffv2mDFjBnx8fHDkyBEkJibi0UcfxWuvvYa8vDxcvHgRH330EQDAw8PD5D1/+eUXDBo0CC1atMCsWbNQVFSETz/9FL1790ZKSorBpvxV+Yy083788cfRvXt3zJkzB1euXMHHH3+Mffv24ciRI/Dx8cFrr72GNm3a4Msvv8Sbb76JsLAwtGzZ0uQ1R48ejeXLl2Pt2rWYOnWq7nhOTg62bt2KUaNG2awaND8/H0uWLMGoUaMwceJEXL9+HV999RWio6ORnJyMTp066Y1fvXo1rl+/jkmTJkEQBLz77rt48MEH8ddff+mW7m7btg0PPfQQwsPDMWfOHGRnZ+Pxxx9XFGa5u7tj2LBh+OGHH5CTkwM/Pz/due+++w5qtVqvms8YWZZx5coVtG/f3uy4LVu2IDc31+L1iIiI6jyZiIiI6p0pU6bIlf9nvLCw0GBcdHS03KJFC71jISEhMgA5MTFR7/gHH3wgA5B/+ukn3bGioiK5bdu2MgB5586dsizLsiRJcuvWreXo6GhZkiS9+4eFhckDBgzQHXvvvfdkAHJ6errFZ5IkSW7atKn80EMP6R1fu3atDEDevXu32WedNGmS7ObmJhcXF+uOjRs3Tg4JCdG93rlzp96zaKWnp8sA5KVLl+qO3XvvvfJdd92ldz1JkuRevXrJrVu3tvg8AOS4uDjd67i4OBmA/MQTT+iNe+CBB2R/f3+L19O+/99//zV7H62QkBB53LhxutdLly6VAcj33Xef3s/t+eefl1UqlZybmyvLsizn5ubKnp6eco8ePeSioiK9a1Z835AhQ/Q+Wy1jn2WnTp3kgIAAOTs7W3fs2LFjsiiK8tixYw2esSqfUWlpqRwQECBHRETozXvTpk0yAPmNN94w+CwOHTpk9pqyLMvl5eVyUFCQHBUVpXf8iy++kAHIW7duNfne9u3by/369bN4j4r3Kikp0Tt27do1uUmTJnqfifYz9vf3l3NycnTHN2zYIAOQExISdMc6deokBwUF6X6+sizL27ZtkwEY/flVtnnzZhmAvGjRIr3jPXv2lJs2bSqr1Wqz71+5cqUMQP7qq6/MjnvooYdkZ2dn+dq1axbnREREVJdx6SgREVEDUbGqJi8vD1evXkW/fv3w119/IS8vT29sWFgYoqOj9Y4lJiaiadOmesvgXFxcMHHiRL1xR48exZkzZ/Doo48iOzsbV69exdWrV3Hjxg3ce++92L17NyRJsnr+giDg4YcfxpYtW1BQUKA7/t1336Fp06a4++67jT7r9evXcfXqVfTp0weFhYU4deqU1feuLCcnB7/++itGjBihu/7Vq1eRnZ2N6OhonDlzRm
85ojWeeuopvdd9+vRBdnY28vPzqz1vJf773/9CEAS9+6vValy4cAEAsH37dly/fh2vvPKKwV5rFd+nVEZGBo4ePYr
"text/plain": [
"<Figure size 1500x1000 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(15, 10))\n",
"\n",
"# Samples with target class 1 (ready biodegradable)\n",
"plt.scatter(\n",
"    df_train['V1'][df_train['Class'] == 1],\n",
"    df_train['V27'][df_train['Class'] == 1],\n",
")\n",
"\n",
"# Samples with target class 2 (not ready biodegradable)\n",
"plt.scatter(\n",
"    df_train['V1'][df_train['Class'] == 2],\n",
"    df_train['V27'][df_train['Class'] == 2],\n",
")\n",
"\n",
"plt.title('Target class as a function of V1 and V27')\n",
"\n",
"plt.xlabel('V1')\n",
"plt.ylabel('V27')\n",
"plt.legend(['Biodegradable', 'Non-biodegradable'])\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d50d1f44",
"metadata": {},
"outputs": [],
"source": [
"# Splitting the data into features and labels\n",
"X_train = df_train.drop('Class', axis=1)\n",
"y_train = df_train['Class']\n",
"X_test = df_test.drop('Class', axis=1)\n",
"y_test = df_test['Class']"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f0aa7c9d",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"# Put models in a dictionary\n",
"models = {\n",
"    \"Logistic Regression\": LogisticRegression(),\n",
"    \"KNN\": KNeighborsClassifier(),\n",
"    \"Random Forest\": RandomForestClassifier()\n",
"}\n",
"\n",
"# Create a function to fit and score models\n",
"def fit_and_score(models, X_train, X_test, y_train, y_test):\n",
"    \"\"\"\n",
"    Fits and evaluates the given machine learning models.\n",
"    models: dict of different Scikit-Learn machine learning models\n",
"    X_train: training data (no labels)\n",
"    X_test: testing data (no labels)\n",
"    y_train: training labels\n",
"    y_test: test labels\n",
"    Returns a dict mapping each model name to its accuracy on the test data.\n",
"    \"\"\"\n",
"\n",
"    # Set random seed\n",
"    np.random.seed(42)\n",
"\n",
"    # Make a dictionary to keep model scores\n",
"    model_scores = {}\n",
"\n",
"    # Loop through models\n",
"    for name, model in models.items():\n",
"        # Fit the model to the data\n",
"        model.fit(X_train, y_train)\n",
"        # Evaluate the model and store its score in model_scores\n",
"        model_scores[name] = model.score(X_test, y_test)\n",
"\n",
"    return model_scores"
]
},
{
"cell_type": "markdown",
"id": "10387356",
"metadata": {},
"source": [
"#### Check if there are any missing values"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "87e277e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"V4     25\n",
"V22    16\n",
"V27     8\n",
"V29     8\n",
"V37    25\n",
"dtype: int64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"na_counts = df_train.isna().sum()\n",
"na_counts[na_counts > 0]\n"
]
},
{
"cell_type": "markdown",
"id": "cb57434a",
"metadata": {},
"source": [
"#### We can see that there are five attributes that have missing values. Let's inspect them."
]
},
{
"cell_type": "markdown",
"id": "9dbd2c02",
"metadata": {},
"source": [
"##### V4"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "ca1e544a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count    821.000000\n",
"mean       0.030451\n",
"std        0.198281\n",
"min        0.000000\n",
"25%        0.000000\n",
"50%        0.000000\n",
"75%        0.000000\n",
"max        2.000000\n",
"Name: V4, dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V4'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "9e4d7d1d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0    800\n",
"1.0     17\n",
"2.0      4\n",
"Name: V4, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V4'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "3a3191c9",
"metadata": {},
"source": [
"We can see that the majority of entries in this attribute are zeros, so I think it is best to set all the `NaN` values to zero."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "d8489bd4",
"metadata": {},
"outputs": [],
"source": [
"# Assign the result back instead of calling fillna(..., inplace=True) on a column selection\n",
"df_train['V4'] = df_train['V4'].fillna(0)\n",
"df_test['V4'] = df_test['V4'].fillna(0)"
]
},
{
"cell_type": "markdown",
"id": "3e84e48b",
"metadata": {},
"source": [
"##### V22"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "a711431d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count    830.000000\n",
"mean       1.243898\n",
"std        0.094109\n",
"min        0.898000\n",
"25%        1.187500\n",
"50%        1.248500\n",
"75%        1.298750\n",
"max        1.641000\n",
"Name: V22, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V22'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f0325325",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.299    9\n",
"1.280    9\n",
"1.296    8\n",
"1.254    8\n",
"1.264    8\n",
"        ..\n",
"1.449    1\n",
"1.159    1\n",
"1.363    1\n",
"1.331    1\n",
"1.410    1\n",
"Name: V22, Length: 321, dtype: int64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V22'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "25a74baf",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABLi0lEQVR4nO3deVxU1f8/8NcAssgyyg6iiLiFiguW4YqKIhq5FW4lmGYWrlgWnywlLTTLLbf6ZriFW25lKi4ouFYq5FIumOYC4s5moDDn90c/JkeGbZzhzoXX8/G4j4dz751z33NnhBfnnntGIYQQICIiIpIhE6kLICIiItIVgwwRERHJFoMMERERyRaDDBEREckWgwwRERHJFoMMERERyRaDDBEREckWgwwRERHJFoMMERERyRaDTDU0ffp0KBSKSjlWQEAAAgIC1I8PHDgAhUKBH374oVKOHx4ejvr161fKsXSVk5ODUaNGwdXVFQqFAhMnTqxwG0Xv6Z07d/RfIEGhUGD69Onqx3I63ytWrIBCocCVK1cMfqzw8HDY2NgY/DiG9vT7TcaNQUbmin5IFS2WlpZwd3dHUFAQFi5ciOzsbL0cJy0tDdOnT0dKSope2tMnY66tPD777DOsWLECb7/9NlavXo3XX3+91H23bt1aecU94eWXX0bNmjVL/UwNGzYM5ubmuHv3Lu7evYs5c+agc+fOcHJyQq1atfDiiy9i/fr1xZ7322+/YezYsWjWrBmsra1Rr149hIaG4sKFC3p9DUeOHMH06dPx4MEDvbarD8Zc25MePnyI6dOn48CBA5LVsGPHjmobNOLi4jB//nypyzAugmQtNjZWABCffPKJWL16tfjuu+/EZ599Jnr27CkUCoXw9PQUv//+u8ZzHj9+LP75558KHee3334TAERsbGyFnpefny/y8/PVj/fv3y8AiI0bN1aoHV1re/TokcjLy9PbsQyhXbt2okOHDuXa19raWoSFhRVbP23aNAFA3L59W8/V/WfdunUCgFi5cqXW7bm5ucLa2lqEhIQIIYT46aefRI0aNUTfvn3F/PnzxaJFi0TXrl0FAPHxxx9rPHfgwIHC1dVVjBs3Tvzf//2fmDFjhnBxcRHW1tbi9OnTensNc+bMEQDE5cuXK/S8f/75Rzx+/Fj92BDnW9faylJQUCD++ecfoVKp9NLe7du3BQAxbdq0YtvCwsKEtbW1Xo5TmoiICGHIX19Pv9/GpE+fPsLT01PqMoyKmTTxifQtODgYbdu2VT+OiopCQkICXnrpJbz88sv4888/YWVlBQAwMzODmZlh3/qHDx+iZs2aMDc3N+hxylKjRg1Jj18et27dgo+Pj9RllOnll1+Gra0t4uLiMHz48GLbt23bhtzcXAwbNgwA0KxZM1y8eBGenp7qfd555x0EBgZi9uzZmDJlCqytrQEAkZGRiIuL0/i8DBo0CC1atMCsWbOwZs0aA7+64lQqFR49egRLS0tYWlpW+vH1xdTUFKamplKXIZmCggKoVKoK/SyS8/tdLUmdpOjZFPXI/Pbbb1q3f/bZZwKA+Oabb9Triv6afNLu3btFhw4dhFKpFNbW1qJx48YiKipKCPFfL8rTS1EPSJcuXUSzZs3E8ePHRadOnYSVlZWYMGGCeluXLl3Uxylqa926dSIqKkq4uLiImjVripCQEHH16lWNmjw9PbX2PjzZZlm1hYWFFfvrJScnR0RGRgoPDw9hbm4uGjduLObMmVPsL1YAIiIiQmzZskU0a9ZMmJubCx8fH7Fz506t5/ppGRkZ4o033hDOzs7CwsJC+Pr6ihUrVhQ7F08vJf1Frm3fovNT9J5evHhRhIWFCaVSKezs7ER4eLjIzc0t1tbq1atFmzZthKWlpahdu7YYNGhQsfOvTVhYmDAzMxMZGRnFtr300kvC1tZWPHz4sNQ2Fi5cKACIU6dOlXm8Nm3aiDZt2pS53++//y7CwsKEl5eXsLCwEC4uLm
LEiBHizp076n2KzlFJ57vo/V6zZo3w8fERZmZmYsuWLeptT/ZAFLX1559/ildffVXY2toKe3t7MX78eI3ezsuXL5fYW/hkm2XVJoTu71nRz4gn2/L09BR9+vQRBw8eFM8//7ywsLAQXl5eJfa2Pf16nl6KXkdRj8z169dF3759hbW1tXB0dBSTJ08WBQUFGm0VFhaKefPmCR8fH2FhYSGcnZ3F6NGjxb1790qtISwsTGsNT9Y3Z84cMW/ePNGgQQNhYmIikpOTRX5+vvjoo49EmzZthJ2dnahZs6bo2LGjSEhIKHaMkt7v8v7/etqFCxfEgAEDhIuLi7CwsBB16tQRgwYNEg8ePNDYr6z3uEuXLsVeN3tn2CNT5b3++uv43//+h927d+PNN9/Uus/Zs2fx0ksvwdfXF5988gksLCyQmpqKw4cPAwCee+45fPLJJ/j4448xevRodOrUCQDQvn17dRt3795FcHAwBg8ejNdeew0uLi6l1vXpp59CoVDg/fffx61btzB//nwEBgYiJSVF3XNUHuWp7UlCCLz88svYv38/Ro4ciVatWiE+Ph7vvfcebty4gXnz5mnsf+jQIWzevBnvvPMObG1tsXDhQgwcOBBXr16Fg4NDiXX9888/CAgIQGpqKsaOHQsvLy9s3LgR4eHhePDgASZMmIDnnnsOq1evxqRJk+Dh4YHJkycDAJycnLS2uXr1aowaNQovvPACRo8eDQDw9vbW2Cc0NBReXl6IiYnByZMn8e2338LZ2RmzZ89W7/Ppp5/io48+QmhoKEaNGoXbt2/jq6++QufOnZGcnIxatWqV+LqGDRuGlStXYsOGDRg7dqx6/b179xAfH48hQ4aU+f7dvHkTAODo6FjqfkIIZGRkoFmzZqXuBwB79uzBX3/9hREjRsDV1RVnz57FN998g7Nnz+LYsWNQKBQYMGAALly4gLVr12LevHnq4z95vhMSEtSvzdHRscyB4qGhoahfvz5iYmJw7NgxLFy4EPfv38eqVavKrPlJZdX2LO9ZSVJTU/HKK69g5MiRCAsLw3fffYfw8HD4+fmVeM6dnJywdOlSvP322+jfvz8GDBgAAPD19VXvU1hYiKCgILRr1w5ffPEF9u7diy+//BLe3t54++231fu99dZbWLFiBUaMGIHx48fj8uXLWLRoEZKTk3H48OESe1PfeustpKWlYc+ePVi9erXWfWJjY5GXl4fRo0fDwsIC9vb2yMrKwrfffoshQ4bgzTffRHZ2NpYvX46goCD8+uuvaNWqVZnnrDz/v5726NEjBAUFIT8/H+PGjYOrqytu3LiB7du348GDB1AqlQDK9x5/+OGHyMzMxPXr19U/q6rC4OpnJnWSomdTVo+MEEIolUrRunVr9eOne2TmzZtX5vX+0sahFP2VsGzZMq3btPXI1KlTR2RlZanXb9iwQQAQCxYsUK8rT49MWbU93SOzdetWAUDMnDlTY79XXnlFKBQKkZqaql4HQJibm2us+/333wUA8dVXXxU71pPmz58vAIg1a9ao1z169Ej4+/sLGxsbjdde9NdxeZQ1RuaNN97QWN+/f3/h4OCgfnzlyhVhamoqPv30U439Tp8+LczMzIqtf1pBQYFwc3MT/v7+GuuXLVsmAIj4+PhSn3/37l3h7OwsOnXqVOp+Qvz71ykAsXz58jL31dYLtHbtWgFAJCUlqdeVNg4FgDAxMRFnz57Vuk3bX+gvv/yyxn7vvPOOAKAel1beHpnSanvW96ykHpmnz82tW7eEhYWFmDx5cqntlTVGBv9/zN6TWrduLfz8/NSPDx48KACI77//XmO/Xbt2aV3/tJLGyBSdbzs7O3Hr1i2NbQUFBRrj9YQQ4v79+8LFxaXY/5uS3u+y/n9pk5ycXOa4wIq8xxwjUxzvWqoGbGxsSr3TpOivuW3btkGlUul0DAsLC4wYMaLc+w8fPhy2trbqx6+88grc3NywY8cOnY5fXjt27ICpqSnGjx+vsX7y5M
kQQmDnzp0a6wMDAzV6PXx9fWFnZ4e//vqrzOO4urpiyJAh6nU1atTA+PHjkZOTg8TERD28muLGjBmj8bhTp064e/c
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot the training set (dropping missing values), as stated in the title\n",
"_, _, bars = plt.hist(df_train['V22'].dropna(), bins=20)\n",
"plt.xlabel('V22')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Distribution of the V22 attribute in the train set')\n",
"plt.bar_label(bars, fmt='%1.0f')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "6d6b63fd",
"metadata": {},
"source": [
"The distribution of the feature **V22** is approximately normal, so I can fill the missing values with `mean()`."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "2b2b6e2d",
"metadata": {},
"outputs": [],
"source": [
"# Fill the missing values of each set with that set's own column mean\n",
"df_train['V22'] = df_train['V22'].fillna(df_train['V22'].mean())\n",
"df_test['V22'] = df_test['V22'].fillna(df_test['V22'].mean())"
]
},
{
"cell_type": "markdown",
"id": "4164f62c",
"metadata": {},
"source": [
"##### V27"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "9a8b64ac",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count    838.000000\n",
"mean       2.218153\n",
"std        0.221545\n",
"min        1.000000\n",
"25%        2.107000\n",
"50%        2.251000\n",
"75%        2.359750\n",
"max        2.859000\n",
"Name: V27, dtype: float64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V27'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "1bddfb76",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.000    36\n",
"2.236    31\n",
"2.194    24\n",
"1.848    22\n",
"2.175    21\n",
"         ..\n",
"2.294     1\n",
"2.466     1\n",
"2.488     1\n",
"2.372     1\n",
"2.622     1\n",
"Name: V27, Length: 290, dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V27'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "f1787f2e",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABPhElEQVR4nO3deViUVf8/8PeIzohsirI+IBK4oWJGhmgqKoJohmm55AKKaT5gqW3SoqIVmuVSKVZfA7UQl0TLEnIDl9TSJJdSwTQ1QcyFzUBlzu8Pf8zjyDaMM9xzD+/Xdc1Vc+4zZz5nzj3jhzPnPqMQQggQERERyVADqQMgIiIi0hcTGSIiIpItJjJEREQkW0xkiIiISLaYyBAREZFsMZEhIiIi2WIiQ0RERLLFRIaIiIhki4kMERERyRYTmXpozpw5UCgUdfJcgYGBCAwM1NxPT0+HQqHAxo0b6+T5IyIi0KpVqzp5Ln0VFRVh4sSJcHZ2hkKhwLRp02rdRvmY/vPPP4YPsJ47f/48FAoFEhMTNWURERGwtraWLqhaqOv3e8eOHevkuYylsvEm08ZERuYSExOhUCg0t8aNG8PV1RUhISH4+OOPUVhYaJDnuXz5MubMmYPMzEyDtGdIphybLt5//30kJiZiypQpWLNmDcaOHVtt3c2bN9ddcPd5+umn0aRJk2rPqdGjR0OpVOLatWu4du0aFi5ciF69esHBwQFNmzZFt27dsG7dugqPi4iI0DqPH7z9/fffBunDDz/8gDlz5hikLUMz5djuZwrvt6SkJCxZskSy55fS8uXLmWQ9SJCsJSQkCABi7ty5Ys2aNeLLL78U77//vggODhYKhUJ4eHiI3377Tesxd+7cEf/++2+tnueXX34RAERCQkKtHldaWipKS0s193fv3i0AiA0bNtSqHX1ju337tigpKTHYcxmDv7+/6NGjh051raysRHh4eIXy2bNnCwDi6tWrBo7uf5KTkwUAsWrVqkqPFxcXCysrKzF48GAhhBDfffedaNSokQgLCxNLliwRn376qejTp48AIGbNmqX12J9++kmsWbNG67Z69WrRpEkT4ePjY7A+REVFidp+7KnVavHvv/+Ku3fvasrCw8OFlZWVweLSNzZd6PN+r05177fevXuLDh06GOy5qjJo0CDh4eFhlLYrG29T0qFDB9G7d2+pwzApDaVKoMiwQkND8fjjj2vux8TEYNeuXXjqqafw9NNP448//oClpSUAoGHDhmjY0LhDf+vWLTRp0gRKpdKoz1OTRo0aSfr8usjLy4OPj4/UYdTo6aefho2NDZKSkjBu3LgKx7ds2YLi4mKMHj0aANChQwdkZWXBw8NDU+e///0vgoKCsGDBArz++uuwsrICAAQEBCAgIECrvX379uHWrVua9ura3bt3oVaroVQq0bhxY0liMIS6eL+bspKSEiiVSjRooNsXEOUz2yQjUmdS9HDKZ2R++eWXSo+///77AoD4/PPPNWXlf73f78cffxQ9evQQdnZ2wsrKSrRp00bExMQIIf43i/LgrfwvsvK/wg4fPix69uwpLC0txcsvv6w5dv9fD+VtJScni5iYGOHk5CSaNGkiBg8eLC5cuKAVk4eHR6WzD/e3WVNs4eHhFf5yKyoqEjNmzBBubm5CqVSKNm3aiIULFwq1Wq1VD4CIiooSKSkpokOHDkKpVAofHx+xbdu2Sl/rB125ckVMmDBBODo6CpVKJXx9fUViYmKF1+LB27lz5yptr7K65a9P+ZhmZWWJ8PBwYWdnJ2xtbUVERIQoLi6u0NaaNWvEY489Jho3biyaNWsmRowYUeH1r0x4eLho2LChuHLlSoVjTz31lLCxsRG3bt2qto2PP/5YABDHjh2rtt6UKVOEQqGo8vW43549e8Szzz4r3N3dhVKpFG5ubmLatGlasYSHh1f6GgohxLlz5wQAsXDhQrF48WLxyCOPiAYNGoijR49qjt0/A1E+I3P27F
kRHBwsmjRpIlxcXERsbKzWeVQ+xrt379aK98E2q4tNCCHKysrE4sWLhY+Pj1CpVMLR0VFMmjRJXL9+vcbXprL3u77ntq6fBSdPnhSBgYHC0tJSuLq6igULFlRoq6SkRMyaNUt4eXlpxuy1116rcQa1d+/eFZ6//D1eHt/atWvFW2+9JVxdXYVCoRA3btwQ165dE6+88oro2LGjsLKyEjY2NmLAgAEiMzNTq/3qxvvSpUsiLCxMWFlZiRYtWohXXnlFp5mbX375RQQHB4vmzZuLxo0bi1atWonx48dr1dFljD08PCr0nbMznJExe2PHjsWbb76JH3/8ES+88EKldU6ePImnnnoKvr6+mDt3LlQqFbKzs7F//34AQPv27TF37lzMmjULkyZNQs+ePQEA3bt317Rx7do1hIaGYuTIkRgzZgycnJyqjeu9996DQqHAG2+8gby8PCxZsgRBQUHIzMzUzBzpQpfY7ieEwNNPP43du3cjMjISjz76KNLS0vDaa6/h77//xuLFi7Xq79u3D5s2bcJ///tf2NjY4OOPP8awYcNw4cIFNG/evMq4/v33XwQGBiI7OxvR0dHw9PTEhg0bEBERgZs3b+Lll19G+/btsWbNGkyfPh1ubm545ZVXAAAODg6VtrlmzRpMnDgRTzzxBCZNmgQA8PLy0qozfPhweHp6Ii4uDr/++iv+7//+D46OjliwYIGmznvvvYd33nkHw4cPx8SJE3H16lV88skn6NWrF44ePYqmTZtW2a/Ro0dj1apVWL9+PaKjozXl169fR1paGkaNGlXj+OXm5gIAWrRoUWWdO3fuYP369ejevbtOi7U3bNiAW7duYcqUKWjevDl+/vlnfPLJJ7h06RI2bNgAAJg8eTIuX76M7du3Y82aNZW2k5CQgJKSEkyaNAkqlQr29vZQq9WV1i0rK8OAAQPQrVs3fPDBB0hNTcXs2bNx9+5dzJ07t8aY71dTbJMnT0ZiYiLGjx+Pl156CefOncOnn36Ko0ePYv/+/XrNPOpzbuvyfrtx4wYGDBiAoUOHYvjw4di4cSPeeOMNdOrUCaGhoQAAtVqNp59+Gvv27cOkSZPQvn17HD9+HIsXL8aZM2eqXQf21ltvIT8/H5cuXdK8Xx9ceD1v3jwolUq8+uqrKC0thVKpxO+//47Nmzfjueeeg6enJ65cuYLPPvsMvXv3xu+//w5XV9dqX6+ysjKEhITA398fH374IXbs2IGPPvoIXl5emDJlSpWPy8vLQ3BwMBwcHDBz5kw0bdoU58+fx6ZNm7Tq6TLGS5YswdSpU2FtbY233noLAGr8rK0XpM6k6OHUNCMjhBB2dnaiS5cumvsP/oW2ePHiGtdX1PS9OACxYsWKSo9VNiPzn//8RxQUFGjK169fLwCIpUuXasp0mZGpKbYHZ2Q2b94sAIh3331Xq96zzz4rFAqFyM7O1pQBEEqlUqvst99+EwDEJ598UuG57rdkyRIBQHz11Veastu3b4uAgABhbW2t1XcPDw8xaNCgatsrV9MamQkTJmiVP/PMM6J58+aa++fPnxcWFhbivffe06p3/Phx0bBhwwrlD7p7965wcXERAQEBWuUrVqwQAERaWlq1j7927ZpwdHQUPXv2rLbed999JwCI5cuXV1uvXGWzQHFxcUKhUIi//vpLU1bVOpTyv8JtbW1FXl5epcce/AsdgJg6daqmTK1Wi0GDBgmlUql5L+k6I1NdbHv37hUAxNdff61VnpqaWmn5g6qakdH33Nbls2D16tWastLSUuHs7CyGDRumKVuzZo1o0KCB2Lt3r9bjy8+j/fv3VxtDVWtkyl/vRx55pMI5UVJSIsrKyrTKzp07J1QqlZg7d65WWVXjfX89IYTo0qWL8PPzqzbWlJSUGj+jazPGXCNTEa9aqgesra2rvdKk/C/wLVu2VPnXZ01UKhXGjx+vc/1x48bBxsZGc//ZZ5+Fi4sLfvjhB72eX1c//PADLCws8NJLL2mVv/
LKKxBCYNu2bVrlQUFBWrMevr6+sLW1xZ9//lnj8zg7O2PUqFGaskaNGuGll15CUVERMjIyDNCbil588UWt+z179sS
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_, _, bars = plt.hist(df_train['V27'], bins=20)\n",
"plt.xlabel('V27')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Distribution of the V27 attribute in the train set')\n",
"plt.bar_label(bars, fmt='%1.0f')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "53b79865",
"metadata": {},
"source": [
"The distribution of the feature **V27** looks approximately normal, so I fill the missing values with the column mean (`mean()`)."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "8974127e",
"metadata": {},
"outputs": [],
"source": [
"# Fill NaN values with the training-set mean; reusing it for the test set keeps test statistics out of the preprocessing\n",
"v27_train_mean = df_train['V27'].mean()\n",
"df_train['V27'] = df_train['V27'].fillna(v27_train_mean)\n",
"df_test['V27'] = df_test['V27'].fillna(v27_train_mean)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3afb5a2f",
"metadata": {},
"source": [
"##### V29"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "f410439d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 838.00000\n",
"mean 0.02506\n",
"std 0.15640\n",
"min 0.00000\n",
"25% 0.00000\n",
"50% 0.00000\n",
"75% 0.00000\n",
"max 1.00000\n",
"Name: V29, dtype: float64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V29'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "2d33e7c4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0 817\n",
"1.0 21\n",
"Name: V29, dtype: int64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V29'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "515e9e80",
"metadata": {},
"source": [
"The vast majority of entries in this attribute are zeros (817 of the 838 non-missing training values), so I set all the `NaN` values to zero."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "48e8ba49",
"metadata": {},
"outputs": [],
"source": [
"# Set nan values to 0\n",
"df_train['V29'].fillna(0, inplace=True)\n",
"df_test['V29'].fillna(0, inplace=True)"
]
},
{
"cell_type": "markdown",
"id": "f659f8bc",
"metadata": {},
"source": [
"##### V37"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "8515f06b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 821.000000\n",
"mean 2.549406\n",
"std 0.625021\n",
"min 1.467000\n",
"25% 2.101000\n",
"50% 2.461000\n",
"75% 2.861000\n",
"max 5.750000\n",
"Name: V37, dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V37'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "36bc89b5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.167 9\n",
"2.500 9\n",
"2.833 8\n",
"2.667 8\n",
"1.833 7\n",
" ..\n",
"2.029 1\n",
"1.886 1\n",
"2.089 1\n",
"2.197 1\n",
"2.206 1\n",
"Name: V37, Length: 535, dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train['V37'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "02c38a9f",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAHHCAYAAABKudlQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABNjUlEQVR4nO3deVhUZf8G8HsUZ0SWUZQ1FhEXRMWMTFFT3FA0xNQss8Qt01BTtIx6K5cMrV+KlqKWgVqkYqJlEbmiphaY5Pa6YBqaLOYCgjEo8/z+6GVyZB9nOHPg/lzXuS7nnDPPfOeccbjnmec8oxBCCBARERHJVD2pCyAiIiJ6GAwzREREJGsMM0RERCRrDDNEREQkawwzREREJGsMM0RERCRrDDNEREQkawwzREREJGsMM0RERCRrDDN10Ny5c6FQKGrksQICAhAQEKC7vW/fPigUCmzZsqVGHn/s2LFo3rx5jTyWofLz8zFx4kQ4OTlBoVBgxowZ1W6j5Jz+9ddfxi+wjrt06RIUCgViY2N168aOHQtra2vpiqqGmv7/3r59+xp5LFMp63yT+WOYkbnY2FgoFArd0rBhQ7i4uGDAgAFYvnw5bt++bZTHuXr1KubOnYu0tDSjtGdM5lxbVbz//vuIjY3FlClTsGHDBrz44osV7rtt27aaK+4+Q4YMQaNGjSp8TY0ePRpKpRLXr18HAMycOROPPfYY7Ozs0KhRI7Rt2xZz585Ffn6+3v3Gjh2r9zp+cPnzzz+N8hy+//57zJ071yhtGZs513Y/c/j/FhcXh6ioKMkeX0orV65k0CqLIFmLiYkRAMT8+fPFhg0bxOeffy7ef/99ERgYKBQKhfDw8BC//fab3n3u3r0r/v7772o9TkpKigAgYmJiqnU/jUYjNBqN7vbevXsFABEfH1+tdgytraioSBQWFhrtsUyhS5cuonv37lXa18rKSoSGhpZa/+677woA4tq1a0au7l8bN24UAMS6devK3F5QUCCsrKxEcHCwbl337t3F9OnTxfLly8WaNWvElClThEqlEt27dxfFxcW6/Q4dOiQ2bNigt6xfv140atRI+Pj4GO05hIWFieq+7Wm1WvH333+Le/fu6daFhoYKKysro9VlaG1VYcj/94pU9P+tV69eol27dkZ7rPIMHjxYeHh4mKTtss63OWnXrp3o1auX1GWYHQvJUhQZVVBQEB5//HHd7YiICOzZswdPPfUUhgwZgv/+97+wtLQEAFhYWMDCwrSn/s6dO2jUqBGUSqVJH6cyDRo0kPTxqyInJwc+Pj5Sl1GpIUOGwMbGBnFxcRgzZkyp7du3b0dBQQFGjx6tW3fw4MFS+3l5eWH27Nn45Zdf0LVrVwCAv78//P399fY7ePAg7ty5o9deTbp37x60Wi2USiUaNmwoSQ3GUBP/381ZYWEhlEol6tWr2hcRJT3cJDNSpyl6OCU9MykpKWVuf//99wUAsWbNGt26kk/x9/vxxx9F9+7dhVqtFlZWVqJ169YiIiJCCPFvb8qDS8kns5JPY6mpqeLJJ58UlpaW4tVXX9Vtu/9TRElbGzduFBEREcLR0VE0atRIBAcHi4yMDL2aPDw8yuyFuL/NymoLDQ0t9QkuPz9fhIeHC1dXV6FUKkXr1q3Fhx9+KLRard5+AERYWJhISEgQ7dq1E0qlUvj4+IjExMQyj/WDsrOzxfjx44WDg4NQqVTC19dXxMbGljoWDy4XL14ss72y9i05PiXn9Pz58yI0NFSo1Wpha2srxo4dKwoKCkq1tWHDBvHYY4+Jhg0biiZNmohnn3221PEvS2hoqLCwsBDZ2dmltj311FPCxsZG3Llzp8I2tmzZIgBUehynTJkiFApFucfjfvv37xcjRowQbm5uQqlUCldXVzFjxgy9WkJDQ8s8hkIIcfHiRQFAfPjhh2Lp0qWiRYsWol69euLYsWO6bff3RJT0zFy4cEEEBgaKRo0aCWdnZzFv3jy911
HJOd67d69evQ+2WVFtQghRXFwsli5dKnx8fIRKpRIODg5i0qRJ4saNG5Uem7L+vxv62q7qe8GpU6dEQECAsLS0FC4uLmLx4sWl2iosLBTvvPOO8PLy0p2z1157rdKe1F69epV6/JL/4yX1ffXVV+Ktt94SLi4uQqFQiJs3b4rr16+LWbNmifbt2wsrKythY2MjBg4cKNLS0vTar+h8X7lyRYSEhAgrKyvRrFkzMWvWrCr14KSkpIjAwEDRtGlT0bBhQ9G8eXMxbtw4vX2qco49PDxKPXf20vyj7sb1OuLFF1/Em2++iR9//BEvvfRSmfucOnUKTz31FHx9fTF//nyoVCqkp6fjp59+AgC0bdsW8+fPxzvvvINJkybhySefBAB069ZN18b169cRFBSE5557Di+88AIcHR0rrGvhwoVQKBSYM2cOcnJyEBUVhX79+iEtLU3Xg1QVVantfkIIDBkyBHv37sWECRPw6KOPIikpCa+99hr+/PNPLF26VG//gwcPYuvWrXjllVdgY2OD5cuXY/jw4cjIyEDTpk3Lrevvv/9GQEAA0tPTMXXqVHh6eiI+Ph5jx47FrVu38Oqrr6Jt27bYsGEDZs6cCVdXV8yaNQsAYG9vX2abGzZswMSJE/HEE09g0qRJAP7p5bjfyJEj4enpicjISPz666/47LPP4ODggMWLF+v2WbhwId5++22MHDkSEydOxLVr1/Dxxx+jZ8+eOHbsGBo3blzu8xo9ejTWrVuHzZs3Y+rUqbr1N27cQFJSEkaNGlXq/N27dw+3bt1CUVERTp48if/85z+wsbHBE088Ue7j3L17F5s3b0a3bt2qNIA7Pj4ed+7cwZQpU9C0aVP88ssv+Pjjj3HlyhXEx8cDAF5++WVcvXoVO3fuxIYNG8psJyYmBoWFhZg0aRJUKhXs7Oyg1WrL3Le4uBgDBw5E165d8cEHH+CHH37Au+++i3v37mH+/PmV1ny/ymp7+eWXERsbi3HjxmH69Om4ePEiPvnkExw7dgw//fSTQT2Qhry2q/L/7ebNmxg4cCCGDRuGkSNHYsuWLZgzZw46dOiAoKAgAIBWq8WQIUNw8OBBTJo0CW3btsWJEyewdOlSnDt3rsJxYW+99RZyc3Nx5coV3f/XBwdjL1iwAEqlErNnz4ZGo4FSqcTp06exbds2PPPMM/D09ER2djZWr16NXr164fTp03BxcanweBUXF2PAgAHo0qUL/u///g+7du3CRx99BC8vL0yZMqXc++Xk5CAwMBD29vZ444030LhxY1y6dAlbt27V268q5zgqKgrTpk2DtbU13nrrLQCo9L22zpA6TdHDqaxnRggh1Gq16NSpk+72g5/Uli5dWul4i8q+JwcgVq1aVea2snpmHnnkEZGXl6dbv3nzZgFALFu2TLeuKj0zldX2YM/Mtm3bBADx3nvv6e03YsQIoVAoRHp6um4dAKFUKvXW/fbbbwKA+Pjjj0s91v2ioqIEAPHFF1/o1hUVFQl/f39hbW2t99w9PDzE4MGDK2yvRGVjZsaPH6+3/umnnxZNmzbV3b506ZKoX7++WLhwod5+J06cEBYWFqXWP+jevXvC2dlZ+Pv7661ftWqVACCSkpJK3efw4cN6nyTbtGlTqqfiQd9++60AIFauXFnhfiXK6g2KjIwUCoVC/PHHH7p15Y1LKfk0bmtrK3Jycsrc9uAndQBi2rRpunVarVYMHjxYKJVK3f+lqvbMVFTbgQMHBADx5Zdf6q3/4Ycfylz/oPJ6Zgx9bVflvWD9+vW6dRqNRjg5OYnhw4fr1m3YsEHUq1dPHDhwQO/+Ja+jn376qcIayhszU3K8W7RoUeo1UVhYqDdOS4h/zoNKpRLz58/XW1fe+b5/PyGE6NSpk/Dz86uw1oSEhErfo6tzjjlmpmy8mqkOsLa2rvAKlJJP4tu3by/3U2hlVCoVxo0bV+X9x4wZAxsbG93tESNGwNnZGd9//71Bj19V33//PerXr4/p06
frrZ81axaEEEhMTNRb369fP73eD19fX9ja2uL333+v9HGcnJwwatQo3boGDRpg+vTpyM/PR3JyshGeTWmTJ0/Wu/3
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_, _, bars = plt.hist(df_train['V37'], bins=20)\n",
"plt.xlabel('V37')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Distribution of the V37 attribute in the train set')\n",
"plt.bar_label(bars, fmt='%1.0f')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "15f862dd",
"metadata": {},
"source": [
"The distribution of the feature **V37** looks approximately normal, so I fill the missing values with the column mean (`mean()`)."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "e1058d9a",
"metadata": {},
"outputs": [],
"source": [
"# Fill NaN values with the training-set mean; reusing it for the test set keeps test statistics out of the preprocessing\n",
"v37_train_mean = df_train['V37'].mean()\n",
"df_train['V37'] = df_train['V37'].fillna(v37_train_mean)\n",
"df_test['V37'] = df_test['V37'].fillna(v37_train_mean)"
]
},
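{
"cell_type": "markdown",
"id": "6b1d9e20",
"metadata": {},
"source": [
"One way to keep test-set statistics out of the imputation is scikit-learn's `SimpleImputer`, fitted on the training data only and then applied to both sets. The snippet below is a minimal sketch on made-up toy frames; `demo_train` and `demo_test` are hypothetical stand-ins, and in this notebook the relevant columns of `df_train` and `df_test` would take their place.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.impute import SimpleImputer\n",
"\n",
"# Hypothetical toy data with missing values (not the assignment data)\n",
"demo_train = pd.DataFrame({'V27': [1.0, np.nan, 3.0], 'V37': [2.0, 4.0, np.nan]})\n",
"demo_test = pd.DataFrame({'V27': [np.nan, 5.0], 'V37': [np.nan, 1.0]})\n",
"\n",
"# Learn the column means from the training frame only\n",
"imputer = SimpleImputer(strategy='mean')\n",
"train_filled = pd.DataFrame(imputer.fit_transform(demo_train), columns=demo_train.columns)\n",
"\n",
"# Reuse the training means for the test frame\n",
"test_filled = pd.DataFrame(imputer.transform(demo_test), columns=demo_test.columns)\n",
"print(test_filled)\n",
"```\n",
"\n",
"Here the missing test values become 2.0 for `V27` and 3.0 for `V37`, which are the training means rather than the means of the test frame."
]
},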
{
"attachments": {},
"cell_type": "markdown",
"id": "44ca71d0",
"metadata": {},
"source": [
"### 2.2 Modeling\n",
"Besides the baselines (majority classifier, random classifier), use at least three machine learning algorithms\n",
"to model the target class. Be ready to argue why did you select specific algorithms and how did you find\n",
"the best hyperparameters for them. Consider the following points when creating your models:\n",
"- Create your models using all features and subsets of them using various feature selection techniques.\n",
"- Certain models assume that data follows a particular distribution or may work better with other\n",
"types of variables (e.g., categorical instead of numeric). Explore whether you can come up with feature\n",
"transformations that are more appropriate for your models. Try to construct new features from existing\n",
"ones. Try to explain the results and performance of different models."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "42e83cd5",
"metadata": {},
"outputs": [],
"source": [
"# Splitting the data into features and labels\n",
"X_train = df_train.drop('Class', axis=1).reset_index(drop=True)\n",
"y_train = df_train['Class'].reset_index(drop=True)\n",
"X_test = df_test.drop('Class', axis=1).reset_index(drop=True)\n",
"y_test = df_test['Class'].reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"id": "9544c1ec",
"metadata": {},
"source": [
"#### Using majority classifier and random classifier"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a07a61d4",
"metadata": {},
"source": [
"##### Majority classifier"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "2f41cf22",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 0.666667\n",
"2 0.333333\n",
"Name: Class, dtype: float64"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the ratio between the classes we are trying to predict\n",
"y_train.value_counts(normalize=True)\n"
]
},
{
"cell_type": "markdown",
"id": "c3ddae4d",
"metadata": {},
"source": [
"The majority class in the training set is class 1 (ready biodegradable), so a majority classifier would always predict class 1."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "abef9e0c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.645933014354067"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test[y_test == 1].shape[0] / y_test.shape[0]"
]
},
{
"cell_type": "markdown",
"id": "b136180f",
"metadata": {},
"source": [
"Predicting class 1 for every test instance gives an accuracy of about 0.646, which is the baseline the models below have to beat."
]
},
{
"cell_type": "markdown",
"id": "bff2a7d3",
"metadata": {},
"source": [
"##### Random classifier"
]
},
{
"cell_type": "markdown",
"id": "a9a5ac3b",
"metadata": {},
"source": [
"We have two classes to predict, so a classifier that picks a class uniformly at random is expected to be right 50% of the time, regardless of the class balance."
]
},
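{
"cell_type": "markdown",
"id": "6b1d9e22",
"metadata": {},
"source": [
"Both baselines can also be checked with scikit-learn's `DummyClassifier`. The snippet below is a sketch under assumptions: the label arrays are stand-ins that mirror the 2:1 class ratio of the training set (assumed counts 564/282) and the 0.646 majority share of the test set (assumed counts 135/74), and the all-zero feature matrices are placeholders because `DummyClassifier` ignores the features. In this notebook, `X_train`, `y_train`, `X_test` and `y_test` could be passed directly.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.dummy import DummyClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Stand-in labels with assumed class counts; the features are placeholders\n",
"y_demo_train = np.array([1] * 564 + [2] * 282)\n",
"y_demo_test = np.array([1] * 135 + [2] * 74)\n",
"X_demo_train = np.zeros((len(y_demo_train), 1))\n",
"X_demo_test = np.zeros((len(y_demo_test), 1))\n",
"\n",
"results = {}\n",
"for strategy in ('most_frequent', 'uniform'):\n",
"    # most_frequent = majority classifier, uniform = random classifier\n",
"    baseline = DummyClassifier(strategy=strategy, random_state=42)\n",
"    baseline.fit(X_demo_train, y_demo_train)\n",
"    results[strategy] = accuracy_score(y_demo_test, baseline.predict(X_demo_test))\n",
"\n",
"print(results)\n",
"```\n",
"\n",
"On these stand-in labels the `most_frequent` strategy reproduces the majority baseline (135/209, about 0.646), while the `uniform` strategy fluctuates around 0.5 from seed to seed."
]
},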
{
"attachments": {},
"cell_type": "markdown",
"id": "5779375e",
"metadata": {},
"source": [
"#### First, let's write a helper function that fits and scores each of our models"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "3d716f7b",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.base import clone\n",
"from sklearn.metrics import precision_score\n",
"from sklearn.metrics import recall_score\n",
"from sklearn.metrics import f1_score\n",
"from sklearn.metrics import roc_auc_score\n",
"from sklearn.metrics import RocCurveDisplay\n",
"from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n",
"from sklearn.model_selection import KFold, RepeatedKFold\n",
"\n",
"def score_the_model(model, model_name, random_seed, X_train, X_test, y_train, y_test, plot=False):\n",
"    \"\"\"\n",
"    Fits a single Scikit-Learn model and evaluates it on the test set and with cross-validation.\n",
"    model: unfitted Scikit-Learn estimator\n",
"    model_name: name used in the plot titles\n",
"    random_seed: seed for NumPy and for the cross-validation splits\n",
"    X_train: training data (no labels)\n",
"    X_test: testing data (no labels)\n",
"    y_train: training labels\n",
"    y_test: test labels\n",
"    plot: if True, also draw the summary figure (scores, ROC curve, confusion matrix)\n",
"    Returns the list [test scores, repeated-CV means, repeated-CV stds] and the model fitted on X_train.\n",
"    \"\"\"\n",
"\n",
"    # Set random seed\n",
"    np.random.seed(random_seed)\n",
"\n",
"    # Fit the model to the full training set\n",
"    model.fit(X_train, y_train)\n",
"\n",
"    # Mean accuracy on the test set\n",
"    model_score = model.score(X_test, y_test)\n",
"    # Predict the labels\n",
"    y_pred = model.predict(X_test)\n",
"\n",
"    # Compute scores on the test set\n",
"    f1 = f1_score(y_test, y_pred)\n",
"    precision = precision_score(y_test, y_pred)\n",
"    recall = recall_score(y_test, y_pred)\n",
"    auc = roc_auc_score(y_test, y_pred)\n",
"    normal_scores = {\n",
"        'Accuracy': model_score,\n",
"        'F1': f1,\n",
"        'Precision': precision,\n",
"        'Recall': recall,\n",
"        'AUC': auc\n",
"    }\n",
"\n",
"    def normal_cv(model, X_train, y_train, random_seed):\n",
"        # Perform 5-fold cross-validation on the training set\n",
"        kfold = KFold(n_splits=5, shuffle=True, random_state=random_seed)\n",
"        scores = []\n",
"\n",
"        for train_ix, test_ix in kfold.split(X_train):\n",
"            # Split the data\n",
"            X_train_cv, X_test_cv = X_train.iloc[train_ix], X_train.iloc[test_ix]\n",
"            y_train_cv, y_test_cv = y_train.iloc[train_ix], y_train.iloc[test_ix]\n",
"\n",
"            # Fit a fresh copy so the model fitted on the full training set is not overwritten\n",
"            fold_model = clone(model).fit(X_train_cv, y_train_cv)\n",
"\n",
"            # Evaluate the fold model\n",
"            y_pred = fold_model.predict(X_test_cv)\n",
"            scrs = {\n",
"                'Accuracy': fold_model.score(X_test_cv, y_test_cv),\n",
"                'F1': f1_score(y_test_cv, y_pred),\n",
"                'Precision': precision_score(y_test_cv, y_pred),\n",
"                'Recall': recall_score(y_test_cv, y_pred),\n",
"                'AUC': roc_auc_score(y_test_cv, y_pred)\n",
"            }\n",
"            scores.append(scrs)\n",
"\n",
"        # Plot the scores of every fold\n",
"        scores = pd.DataFrame(scores)\n",
"        scores.plot(kind='bar', figsize=(10, 8))\n",
"        plt.title(f'Cross-validated scores for {model_name}')\n",
"        plt.xlabel('Fold')\n",
"        plt.ylabel('Score')\n",
"        plt.legend(loc='lower right')\n",
"        plt.show()\n",
"        return scores\n",
"\n",
"    scores_cv = normal_cv(model, X_train, y_train, random_seed)\n",
"\n",
"    def repeated_cv(model, X_train, y_train, random_seed):\n",
"        # Repeat 10-fold cross-validation 10 times (100 fits in total)\n",
"        scores_k_fold = []\n",
"        rkf = RepeatedKFold(n_splits=10, n_repeats=10, random_state=random_seed)\n",
"        for train_index, test_index in rkf.split(X_train):\n",
"            # Again fit a fresh copy for every fold\n",
"            fold_model = clone(model).fit(X_train.iloc[train_index], y_train.iloc[train_index])\n",
"            y_pred = fold_model.predict(X_train.iloc[test_index])\n",
"            scrs = {\n",
"                'Accuracy': fold_model.score(X_train.iloc[test_index], y_train.iloc[test_index]),\n",
"                'F1': f1_score(y_train.iloc[test_index], y_pred),\n",
"                'Precision': precision_score(y_train.iloc[test_index], y_pred),\n",
"                'Recall': recall_score(y_train.iloc[test_index], y_pred),\n",
"                'AUC': roc_auc_score(y_train.iloc[test_index], y_pred)\n",
"            }\n",
"\n",
"            scores_k_fold.append(scrs)\n",
"        return scores_k_fold\n",
"\n",
"    scores_k_fold = repeated_cv(model, X_train, y_train, random_seed)\n",
"\n",
"    # Mean and standard deviation of every metric over the repeated folds\n",
"    k_fold_scores_mean = {}\n",
"    k_fold_scores_std = {}\n",
"\n",
"    k_fold_scores_mean['acccuracy_mean'] = np.mean([score['Accuracy'] for score in scores_k_fold])\n",
"    k_fold_scores_std['accuracy_std'] = np.std([score['Accuracy'] for score in scores_k_fold])\n",
"    k_fold_scores_mean['f1_mean'] = np.mean([score['F1'] for score in scores_k_fold])\n",
"    k_fold_scores_std['f1_std'] = np.std([score['F1'] for score in scores_k_fold])\n",
"    k_fold_scores_mean['precision_mean'] = np.mean([score['Precision'] for score in scores_k_fold])\n",
"    k_fold_scores_std['precision_std'] = np.std([score['Precision'] for score in scores_k_fold])\n",
"    k_fold_scores_mean['recall_mean'] = np.mean([score['Recall'] for score in scores_k_fold])\n",
"    k_fold_scores_std['recall_std'] = np.std([score['Recall'] for score in scores_k_fold])\n",
"    k_fold_scores_mean['auc_mean'] = np.mean([score['AUC'] for score in scores_k_fold])\n",
"    k_fold_scores_std['auc_std'] = np.std([score['AUC'] for score in scores_k_fold])\n",
"\n",
"    if plot:\n",
"        # Summary figure with a 3x2 grid of subplots\n",
"        fig, ax = plt.subplots(nrows=3, ncols=2, figsize=(15,15))\n",
"\n",
"        # Test-set scores in the top-left subplot\n",
"        ax[0, 0].bar(normal_scores.keys(), normal_scores.values())\n",
"        # Display values of the bars\n",
"        for i, v in enumerate(normal_scores.values()):\n",
"            ax[0, 0].text(i-0.1, v+0.01, str(round(v, 2)))\n",
"        ax[0, 0].set_title(f'Default scoring of {model_name}')\n",
"        ax[0, 0].set_ylabel('Score')\n",
"\n",
"        # Repeated-CV means in the top-right subplot\n",
"        ax[0, 1].bar(k_fold_scores_mean.keys(), k_fold_scores_mean.values())\n",
"        # Display values of the bars\n",
"        for i, v in enumerate(k_fold_scores_mean.values()):\n",
"            ax[0, 1].text(i-0.1, v+0.01, str(round(v, 2)))\n",
"        ax[0, 1].set_title(f'10-fold cross-validated scoring of {model_name} (mean)')\n",
"\n",
"        # Repeated-CV standard deviations in the middle-left subplot\n",
"        ax[1, 0].bar(k_fold_scores_std.keys(), k_fold_scores_std.values())\n",
"        # Display values of the bars\n",
"        for i, v in enumerate(k_fold_scores_std.values()):\n",
"            ax[1, 0].text(i-0.1, v+0.01, str(round(v, 2)))\n",
"        ax[1, 0].set_title(f'10-fold cross-validated scoring of {model_name} (std)')\n",
"\n",
"        # ROC curve in the middle-right subplot (drawn directly on the given axes)\n",
"        RocCurveDisplay.from_estimator(model, X_test, y_test, ax=ax[1, 1])\n",
"\n",
"        # Confusion matrix in the bottom-left subplot\n",
"        cm = confusion_matrix(y_test, y_pred, labels=model.classes_)\n",
"        ConfusionMatrixDisplay(cm, display_labels=model.classes_).plot(ax=ax[2, 0])\n",
"        ax[2, 1].set_visible(False)\n",
"\n",
"    scores = []\n",
"    scores.append(normal_scores)\n",
"    scores.append(k_fold_scores_mean)\n",
"    scores.append(k_fold_scores_std)\n",
"    return scores, model"
]
},
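{
"cell_type": "markdown",
"id": "6b1d9e24",
"metadata": {},
"source": [
"`score_the_model` passes hard class labels to `roc_auc_score`, which works but collapses the ROC curve to a single threshold. An AUC based on predicted probabilities is usually more informative. The snippet below sketches the difference on synthetic data; `make_classification` and the shallow tree are illustrative choices rather than part of the assignment, and the labels are shifted to {1, 2} to mimic the `Class` column.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.metrics import roc_auc_score\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Synthetic stand-in data with labels 1 and 2 (not the assignment data)\n",
"X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)\n",
"y_demo = y_demo + 1\n",
"X_tr, X_te, y_tr, y_te = train_test_split(\n",
"    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo\n",
")\n",
"\n",
"clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)\n",
"\n",
"# AUC from hard labels, as computed in score_the_model\n",
"auc_labels = roc_auc_score(y_te, clf.predict(X_te))\n",
"\n",
"# AUC from the predicted probability of the greater label (class 2),\n",
"# which roc_auc_score treats as the positive class for binary targets\n",
"pos_col = list(clf.classes_).index(2)\n",
"auc_proba = roc_auc_score(y_te, clf.predict_proba(X_te)[:, pos_col])\n",
"\n",
"print(f'AUC from labels: {auc_labels:.3f}')\n",
"print(f'AUC from probabilities: {auc_proba:.3f}')\n",
"```\n",
"\n",
"If the probability-based value is preferred, the same `predict_proba` call could replace `y_pred` in the `roc_auc_score` lines of the helper, for models that provide `predict_proba`."
]
},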
{
"cell_type": "markdown",
"id": "d144deb1",
"metadata": {},
"source": [
"### Decision tree model"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "63fe4438",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA04AAAK4CAYAAABDHK0xAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABUyUlEQVR4nO3dd3hU1b7G8XfSSUIIkEKAQGgKSAQNxVAEIRQRFFFBPEpRmhBFUBBUQBSNFeEIR0SkiHBoInAOXZoKUSAUqdI7CSDSEiEks+8fXuY4JrBIIRPC9/M881xn7bX2/s3O5ty8WXuvsVmWZQkAAAAAcE1uri4AAAAAAPI7ghMAAAAAGBCcAAAAAMCA4AQAAAAABgQnAAAAADAgOAEAAACAAcEJAAAAAAwITgAAAABgQHACAAAAAAOCEwDkA5MmTZLNZtPBgwcdbY0aNVKjRo2MY1etWiWbzaZVq1bdtPqyIyIiQp07d3Z1GfnKnj171KxZMxUpUkQ2m01z5851dUk3VXaugTfffFM2m+3mFAQAOUBwApDn9u3bpx49eqh8+fLy8fFRQECA6tWrp1GjRumPP/5wdXm3lYULF+rNN990dRm3jU6dOmnr1q165513NGXKFNWsWfOmHevgwYOy2WyOl6enp4KCglS3bl299tprOnz48E079q2mc+fOTufqWi/+EADc3jxcXQCA28uCBQv0xBNPyNvbWx07dlS1atWUmpqqH3/8Uf3799f27ds1btw4V5eZLyxduvSmH2PhwoUaM2YM4SkP/PHHH4qPj9frr7+u2NjYPDtuhw4d1LJlS9ntdv3+++9av369Ro4cqVGjRunLL7/Uk08+edOO/euvv8rNLWt/o33jjTc0cODAm1RR5nr06KGYmBjH+wMHDmjIkCHq3r27GjRo4GivUKFCntYFIH8hOAHIMwcOHNCTTz6psmXLasWKFQoLC3Ns6927t/bu3asFCxZcc7zdbldqaqp8fHzyolyX8/LycnUJt4W0tDTZ7fabfr5PnTolSQoMDMy1fSYnJ8vPz++6fe699149/fTTTm2HDh1Ss2bN1KlTJ1WpUkXVq1fPtZr+ytvbO8tjPDw85OGRt7+eREdHKzo62vF+w4YNGjJkiKKjozOcu7+6kfMPoODgVj0AeeaDDz7QxYsX9eWXXzqFpqsqVqyoPn36ON7bbDbFxsZq6tSpuuuuu+Tt7a3FixdLkjZt2qQHH3xQAQEB8vf3V5MmTfTTTz857e/KlSsaNmyYKlWqJB8fHxUvXlz169fXsmXLHH0SExPVpUsXlS5dWt7e3goLC9Mjjzzi9KzR382ePVs2m02rV6/OsO3zzz+XzWbTtm3bJEm//PKLOnfu7LgtsUSJEnr22Wf122+/Gc9XZs84HT16VG3atJGfn59CQkLUt29fXb58OcPYH374QU888YTKlCkjb29vhYeHq2/fvk63Qnbu3FljxoyRJKfbka6y2+0aOXKk7rrrLvn4+Cg0NFQ9evTQ77//7nQsy7I0fPhwlS5dWr6+vnrggQe0fft24+e7avr06YqKilLhwoUVEBCgyMhIjRo1yqnP2bNn1bdvX0VERMjb21ulS5dWx44ddfr0aUefkydP6rnnnlNoaKh8fHxUvXp1TZ482Wk/V29f++ijjzRy5EhVqFBB3t7e2rFjhyRp165devzxx1WsWDH5+PioZs2amj9/vtM+buS6+rs333xTZcuWlST1799fNptNERERju03cj1ffQ5u9erV6tWrl0JCQlS6dOkbPs9/VbZsWU2aNEmpqan64IMPnLadPXtWL730ksLDw+Xt7a2KFSvq/fffl91ud+pnt9s1atQoRUZGysfHR8HBwWrRooU2bNjg6PP3Z5xu5Nxl9oxTWlqa3n77bcfPKyIiQq+99lqGaz8iIkKtWrXSjz/+qNq1a8vHx0fly5fXV199la3z9Fem879o0SI1aNBAfn5+Kly4sB566KFM/x3cyD
UGIH9ixglAnvnPf/6j8uXLq27dujc8ZsWKFZo5c6ZiY2MVFBSkiIgIbd++XQ0aNFBAQIAGDBggT09Pff7552rUqJFWr16tOnXqSPrzF7C4uDh17dpVtWvX1vnz57VhwwZt3LhRTZs2lSQ99thj2r59u1544QVFRETo5MmTWrZsmQ4fPuz0i+1fPfTQQ/L399fMmTPVsGFDp20zZszQXXfdpWrVqkmSli1bpv3796tLly4qUaKE41bE7du366effsrSQ/B//PGHmjRposOHD+vFF19UyZIlNWXKFK1YsSJD31mzZiklJUXPP/+8ihcvrnXr1unTTz/V0aNHNWvWLEl/3p50/PhxLVu2TFOmTMmwjx49emjSpEnq0qWLXnzxRR04cECjR4/Wpk2btGbNGnl6ekqShgwZouHDh6tly5Zq2bKlNm7cqGbNmik1NdX4mZYtW6YOHTqoSZMmev/99yVJO3fu1Jo1axwh+uLFi2rQoIF27typZ599Vvfee69Onz6t+fPn6+jRowoKCtIff/yhRo0aae/evYqNjVW5cuU0a9Ysde7cWWfPnnUK5JI0ceJEXbp0Sd27d5e3t7eKFSum7du3q169eipVqpQGDhwoPz8/zZw5U23atNE333yjRx99VNKNXVd/17ZtWwUGBqpv376OW+f8/f0l6Yav56t69eql4OBgDRkyRMnJycZzfC3R0dGqUKGCU2hJSUlRw4YNdezYMfXo0UNlypTR2rVrNWjQIJ04cUIjR4509H3uuec0adIkPfjgg+ratavS0tL0ww8/6Keffrrms1vZOXeS1LVrV02ePFmPP/64Xn75Zf3888+Ki4vTzp079e233zr13bt3rx5//HE999xz6tSpkyZMmKDOnTsrKipKd911V7bP11WZnf8pU6aoU6dOat68ud5//32lpKTos88+U/369bVp0ybH/5bc6DUGIJ+yACAPnDt3zpJkPfLIIzc8RpLl5uZmbd++3am9TZs2lpeXl7Vv3z5H2/Hjx63ChQtb999/v6OtevXq1kMPPXTN/f/++++WJOvDDz+88Q/y/zp06GCFhIRYaWlpjrYTJ05Ybm5u1ltvveVoS0lJyTD23//+tyXJ+v777x1tEydOtCRZBw4ccLQ1bNjQatiwoeP9yJEjLUnWzJkzHW3JyclWxYoVLUnWypUrr3vcuLg4y2azWYcOHXK09e7d28rs/xX88MMPliRr6tSpTu2LFy92aj958qTl5eVlPfTQQ5bdbnf0e+211yxJVqdOnTLs+6/69OljBQQEOJ3HvxsyZIglyZozZ06GbVePefXcfP31145tqampVnR0tOXv72+dP3/esizLOnDggCXJCggIsE6ePOm0ryZNmliRkZHWpUuXnPZft25dq1KlSo4203V1LVeP/ffr7Uav56vXSP369a97vkzH+6tHHnnEkmSdO3fOsizLevvtty0/Pz9r9+7dTv0GDhxoubu7W4cPH7Ysy7JWrFhhSbJefPHFDPv863VQtmxZp2vgRs7d0KFDna7JzZs3W5Ksrl27OvV75ZVXLEnWihUrnI73939bJ0+etLy9va2XX375usf9q/Xr11uSrIkTJzrarnX+L1y4YAUGBlrdunVz2kdiYqJVpEgRp/YbvcYA5E/cqgcgT5w/f16SVLhw4SyNa9iwoapWrep4n56erqVLl6pNmzYqX768oz0sLExPPfWUfvzxR8exAgMDtX37du3ZsyfTfRcqVEheXl5atWpVhtvPTNq3b6+TJ086LQE+e/Zs2e12tW/f3ukYV126dEmnT5/WfffdJ0nauHFjlo65cOFChYWF6fHHH3e0+fr6qnv37hn6/vW4ycnJOn36tOrWrSvLsrRp0ybjsWbNmqUiRYqoadOmOn36tOMVFRUlf39/rVy5UpL03XffKTU1VS+88ILT7NlLL710Q58pMDBQycnJ173N7ZtvvlH16tUz/Wv81WMuXLhQJUqUUIcOHRzbPD099eKLL+rixY
sZbqt87LHHFBwc7Hh/5swZrVixQu3atdOFCxccn/e3335T8+bNtWfPHh07dsxR8/Wuq6zIyvV8Vbdu3eTu7p7jY0t
"text/plain": [
"<Figure size 1000x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'Accuracy': 0.8038277511961722, 'F1': 0.844106463878327, 'Precision': 0.8671875, 'Recall': 0.8222222222222222, 'AUC': 0.7962462462462462}, {'acccuracy_mean': 0.8152563025210086, 'f1_mean': 0.8600581754172117, 'precision_mean': 0.8640751470339767, 'recall_mean': 0.8583613289215122, 'auc_mean': 0.7939793184201566}, {'accuracy_std': 0.04069806428930406, 'f1_std': 0.03381051842186253, 'precision_std': 0.04577110084032886, 'recall_std': 0.046568389821678886, 'auc_std': 0.0472492424190255}]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAATYCAYAAAAxo1G2AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeVgV5f//8ReggKyKiKipmEvuSxjkUlqhuGRRpqalaKalkgu2aKm4lGSaYe6a28cy1zJTc0OtXMrd3PclTXDfUEFhfn/443w5cuAAsonPx3WdS8+ce2bue2bOmTfvueceG8MwDAEAAAAAAABIkW1OVwAAAAAAAADI7UiiAQAAAAAAAFaQRAMAAAAAAACsIIkGAAAAAAAAWEESDQAAAAAAALCCJBoAAAAAAABgBUk0AAAAAAAAwAqSaAAAAAAAAIAVJNEAAAAAAAAAK0iiAdnkyJEjaty4sdzd3WVjY6PFixdnyXoaNmyohg0bZsmys9vgwYNlY2OT09WwasWKFapZs6YcHR1lY2Ojq1ev5nSVLDp58qRsbGw0c+bMdM2Xl44pAHgcbN26VXXr1pWzs7NsbGy0a9euNM87c+ZM2djY6OTJk1bL+vj4qGPHjhmu5+PK0jZO67l2/fr1srGx0fr167OsfhnxqB0LGY2Jslt0dLTeeOMNFS5cWDY2NoqIiMjpKqUoI/Fien5vHkc3b96Ul5eXfvjhh5yuSqaYNGmSSpUqpdjY2JyuykMhiQb8f4k/4okvR0dHFS9eXIGBgfr2229148aNh1p+cHCw9uzZoy+++EKzZ89W7dq1M6nmqfvvv/80ePDgdAXQSLtLly6pdevWKlCggMaPH6/Zs2fL2dnZYtmsPsbyEh8fH7NtldIrtwe/ACDd/0MoLCxMTZo0kYeHh9XfrwMHDqhJkyZycXGRh4eH2rdvrwsXLqRpXXfv3lWrVq10+fJlffPNN5o9e7ZKly6dSS3Bo2z58uUaPHhwTlcD6dCnTx+tXLlS/fv31+zZs9WkSZMUyyaNj/LlyycPDw/5+vqqV69e2r9/fzbWOndLvEhv7ZUbLiCPGTNGrq6uevPNN3O6KpmiY8eOiouL0+TJk3O6Kg8lX05XAMhthg4dqjJlyuju3buKiorS+vXr1bt3b40ePVpLlixR9erV073M27dva/Pmzfrss88UEhKSBbVO2X///achQ4bIx8dHNWvWzNZ1P6wBAwaoX79+OV2NVG3dulU3btzQsGHDFBAQkKZ5suIYS4vSpUvr9u3byp8/f7rmW7VqVZbUJzURERG6efOm6f3y5cv1448/6ptvvpGnp6dpet26dbO9bgCQXhcvXtTQoUNVqlQp1ahRI9VeRGfOnNHzzz8vd3d3DR8+XDdv3tSoUaO0Z88ebdmyRfb29qmu69ixYzp16pSmTp2qd999N5NbgqySHefa5cuXa/z48STSlPGYKLutXbtWr776qj788MM0lW/UqJE6dOggwzB07do17d69W7NmzdKECRM0YsQIhYaGZlldM3IMt2/fXm+++aYcHByyoEaWvf766ypXrpzp/c2bN9WtWze99tprev31103TixYtmm11suTu3bsaM2aM+vTpIzs7uxytS2ZxdHRUcHCwRo8erQ8++OCRuOPIEpJowAOaNm1q1kusf//+Wrt2rV5++WW98sorOnDggAoUKJCuZSZePS5YsGBmVjXPiomJkbOzs/Lly6d8+XL3z9T58+clpW/fZsUxlhaJvd/Sy9ofbFkhKCjI7H1UVJR+/PFHBQUFycfHJ8X5Eo8dAMhNihUrpnPnzsnb21vbtm3TM888k2LZ4cOHKyYmRtu3b1epUqUkSX5+fmrUqJFmzpyprl27prqujJyXcqOEhATFxcVl6Lz1KMqJc+3j6N69e0pISJC9vf0jcWydP38+Xd/lChUq6O233zab9uWXX6pFixbq27evKlasqGbNmmVyLe/LyDFsZ2eX7Qmi6tWrm1
2wvnjxorp166bq1asn23ZJ3blzR/b29rK1zZ6b+ZYuXaoLFy6odevW2bK+7NK6dWt99dVXWrdunV588cWcrk6GcDsnkAYvvviiBg4cqFOnTun77783++zgwYN644035OHhIUdHR9WuXVtLliwxfT548GDTbRQfffSRbGxsTEmAU6dOqXv37nrqqadUoEABFS5cWK1atUo2LkBKY4NZG0dg/fr1pkC9U6dOaboF7saNG+rdu7d8fHzk4OAgLy8vNWrUSDt27DAr9/fff6tZs2YqVKiQnJ2dVb16dY0ZM8aszNq1a/Xcc8/J2dlZBQsW1KuvvqoDBw5YbNv+/fvVrl07FSpUSPXr10+x3TY2NgoJCdHixYtVtWpVOTg4qEqVKlqxYoXF9teuXVuOjo4qW7asJk+enK5x1hYsWCBfX18VKFBAnp6eevvtt3X27FnT5w0bNlRwcLAk6ZlnnpGNjU2GxwN5mGMs0dWrV9WnTx/TvnviiSfUoUMHXbx4UZLl8T+ioqLUqVMnPfHEE3JwcFCxYsX06quvWh2n5fz58+rcubOKFi0qR0dH1ahRQ7NmzTIrk7i+UaNGacqUKSpbtqwcHBz0zDPPaOvWrRnaTkl17NhRLi4uOnbsmJo1ayZXV1e99dZbku7/8RUREaEqVarI0dFRRYsW1XvvvacrV64kW85vv/1mOk5dXV3VvHlz7du376HrBwCJHBwc5O3tnaayixYt0ssvv2xKoElSQECAKlSooPnz56c6b8eOHdWgQQNJUqtWrZLdkpSW87IlhmHo888/1xNPPCEnJye98MIL6fqdTEhI0JgxY1StWjU5OjqqSJEiatKkibZt22Yqk3h+/+GHH1SlShU5ODiYzu07d+5U06ZN5ebmJhcXF7300kv666+/zNZx9+5dDRkyROXLl5ejo6MKFy6s+vXra/Xq1aYyaTnnPWjhwoWysbHR77//nuyzyZMny8bGRnv37pUk/fPPP+rYsaOefPJJOTo6ytvbW++8844uXbpkdRtZOteeOXNGQUFBcnZ2lpeXl/r06WNxLKE///xTrVq1UqlSpeTg4KCSJUuqT58+un37tqlMx44dNX78eEnmt/0lSut582GPhblz58rX11eurq5yc3NTtWrVksWP1uIZKf1xSEREhCkO2b9/v8WYKDGuOHv2rIKCguTi4qIiRYroww8/VHx8vNmyL126pPbt28vNzU0FCxZUcHCwdu/eneahJo4fP65WrVrJw8NDTk5OevbZZ7Vs2TLT54kxvmEYGj9+fLL9lR6FCxfW3LlzlS9fPn3xxRdmn8XGxiosLEzlypUzHTsff/yxxePs+++/l5+fn5ycnFSoUCE9//zzZr3PLB3DY8eOVZUqVUzz1K5dW3PmzEnWzge/gxMmTDD9DhQvXlw9evRINuZww4YNVbVqVe3fv18vvPCCnJycVKJECX311VcZ2k5JJY49OHfuXA0YMEAlSpSQk5OTrl+/Lun+30FNmjSRu7u7nJyc1KBBA23cuDHZcs6ePat33nlHRYsWNf3NMn369DTVYfHixfLx8VHZsmXNpicep6dPn9bLL78sFxcXlShRwvT93rNnj1588UU5OzurdOnSZts70dWrV9W7d2+VLFlSDg4OKleunEaMGKGEhASzcqNGjVLdunVVuHBhFShQQL6+vlq4cGGy5aXn7zNfX195eHjol19+SdN2yI1ydxcPIBdp3769Pv30U61atUpdunSRJO3bt0/16tVTiRIl1K9fPzk7O2v+/PkKCgrSokWLTN2CCxYsqD59+qht27Zq1qyZXFxcJN2/FXDTpk1688039cQTT+jkyZOaOHGiGjZsqP3798vJyemh6lypUiUNHTpUgwYNUteuXfXcc89JSv0WuPfff18LFy5USEiIKleurEuXLmnDhg06cOCAnn76aUnS6tWr9fLLL6tYsWLq1auXvL29deDAAS1dulS9evWSJK1Zs0ZNmzbVk08+qcGDB+v27dsaO3
as6tWrpx07diTrTdSqVSuVL19ew4cPl2EYqbZrw4YN+umnn9S9e3e5urrq22+/VcuWLXX69GkVLlxY0v2Au0mTJip
"text/plain": [
"<Figure size 1500x1500 with 7 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABnwElEQVR4nO3deVhUZf8G8HtYZthBRHYUd0XBNUksTUVxya0yyg2tbFMzfa00d8ulLJfKMjU1+/m+llZmibjgklvuGpuouKAIKCIM+zLz/P5ARkdA5+AMA8P9uS6ummfOOfOd4+DcPss5MiGEABEREZGJMDN2AURERET6xHBDREREJoXhhoiIiEwKww0RERGZFIYbIiIiMikMN0RERGRSGG6IiIjIpFgYu4CqplarcfPmTdjb20Mmkxm7HCIiItKBEAJZWVnw9PSEmdmj+2ZqXbi5efMmfHx8jF0GERERVcL169fh7e39yG1qXbixt7cHUHJyHBwcjFwNERER6UKpVMLHx0fzPf4otS7clA5FOTg4MNwQERHVMLpMKeGEYiIiIjIpDDdERERkUhhuiIiIyKQw3BAREZFJYbghIiIik8JwQ0RERCaF4YaIiIhMCsMNERERmRSGGyIiIjIpDDdERERkUowabv7++28MGDAAnp6ekMlk2Lp162P32b9/P9q3bw+FQoEmTZpg/fr1Bq+TiIiIag6jhpucnBy0adMGK1as0Gn7K1euoH///ujevTvOnj2L999/H2+88QZ27txp4EqJiIiopjDqjTP79u2Lvn376rz9ypUr0bBhQ3z55ZcAgJYtW+LQoUNYunQpQkJCDFUmERER6eiWMh/ZBcVoVM/OaDXUqLuCHz16FMHBwVptISEheP/99yvcp6CgAAUFBZrHSqXSUOURERHVKqnKfETdyERUUiaik0r+eyurAM81r4f1YzoZra4aFW5SUlLg5uam1ebm5galUom8vDxYW1uX2WfhwoWYO3duVZVIRERkklKV+fj3oSBzO6ugzHZmMqCgSG2ECu+rUeGmMqZNm4bJkydrHiuVSvj4+BixIiIioupLCIFUZQGiknQLMk1c7dDayxH+Xo4I8HZESw8H2MiNGy9qVLhxd3dHamqqVltqaiocHBzK7bUBAIVCAYVCURXlERER1ShCCKTcG1oqDTFRSUqkZZcfZJq62t8LMg7wryZBpjzVr6JH6Ny5M8LDw7Xadu/ejc6dOxupIiIioprh4SDz771embTswjLbmsmAZm72mh6Z1l6O8PNwgLXc3AiVS2fUcJOdnY1Lly5pHl+5cgVnz56Fs7Mz6tevj2nTpiEpKQkbNmwAALz99tv45ptv8OGHH+K1117D3r178csvv2D79u3GegtERETVjhACyZn5WsNKFQUZczMZmj4wtFTTgkx5jBpuTp48ie7du2sel86NCQsLw/r165GcnIzExETN8w0bNsT27dsxadIkLF++HN7e3lizZg2XgRMRUa31cJD5917PzJ2cioOMv5cj/L3vBxkry5obZMojE0IIYxdRlZRKJRwdHZGZmQkHBwdjl0NERKQzIQRuZmrPkdE1yPh7lcyRqalBRsr3d42ac0NERFRb3A8yGZqJvtFJmUivIMg0c7Mvmeh7b2ipJgeZJ8VwQ0REZGRCCCRl5GmtWKooyFiYydD0gSDj7+2EFu72tTbIlIfhhoiIqAoJIXDj7oNBpmRo6W5uUZltLTQ9Mo5ofW9oiUHm8RhuiIiIDERqkGnubq8ZVvL3ckRzBplKYbghIiLSg9Ig8/CVfTPKCTKW5g/0yDDI6B3DDRERkUQPB5moG5mIvllxkCmvR0ZhwSBjKAw3REREjyCEwPV07R4ZXYNMgJcTmrnbMchUMYYbIiKie0qDzL9JGfeDTJISmXnlB5kW7g6a3hh/L0cGmWqC4YaIiGolIQQS03O1e2QqCDJyczO08LDXDjJu9pBbmBmhcnochhsiIjJ5Qghcu5Nb5l
5LyvziMtsyyNR8DDdERGRSHg4y/96b7JtVQZBp6aF992sGmZqP4YaIiGostVrgWvoDPTKPCjIWZmjpziBTGzDcEBFRjfBwkPn3RgZibiorDjIeDlr3WmrmZg9LcwaZ2oDhhoiIqh21WuDqnRytOTIxSUpkFTw+yPh7OaGpmx2DTC3GcENEREb1YJCJulESZGJvlh9kFJogc39oiUGGHsZwQ0REVUatFrhyJ0czPyYqKRMxN5XIriDI+Hk6aF3Zt4krgww9HsMNEREZhFotcDktR+umkbESgkxTVztYMMhQJTDcEBHREysTZG5kIuZmJnIKVWW2tbI0g5/HA0HG2xFN6jHIkP4w3BARkSQqtcCVtOx7IUaJ6CTdg0yAtxMa17NlkCGDYrghIqIKPRxkopIyEHtTWW6QsbY0LzO0xCBDxsBwQ0REAEqCzOXb2Vr3Woq5qUSuDkEmwNsRjevZwdxMZoTKibQx3BAR1UKlQebfG/eDTGxyxUGmlecDd79mkKFqjuGGiMjEqdQCCbezNUuvdQky/t73bxrZiEGGahiGGyIiE1KsUiPhtvaVfWNvKpFXVDbI2Mgf6pFhkCETwXBDRFRDlRdkYm5mIr9IXWZbW7k5WnmWLr0umSvT0IVBhkwTww0RUQ1QrFLj0r2hJU2PTLJSxyDjhIYutgwyVGsw3BARVTOSg8wDw0qtvRzRyMUWZgwyVIsx3BARGVGxSo2Lt7K1hpbiKggydgoLzfLrAO+SINOwLoMM0cMYboiIqogmyNy4f6+luGQlCorLDzKt7gUZfwYZIkkYboiIDKBIpcbF1Gytm0Y+LsiU9sb4eznCl0GGqNIYboiInlCRSo0LqVkPBBkl4pKVKCwnyNgrLNDKS/sWBQwyRPrFcENEJIGkIGNlgdae94eV/L0c0cDZhkGGyMAYboiIKlCkUiM+5X6QiU7KRFxKlk5BJsDLEfUZZIiMguGGiAhAYbF2j0x0UibikrNQqCo/yDy49NqfQYaoWmG4IaJapzTIPHj36/MVBBkHK4t7F8O7fy2Z+s42kMkYZIiqK4YbIjJppUHmwbtfx6dUHGQenB/DIENUMzHcEJHJKChW4UJKtlaPTEVBxtHaUmtYyd/LET7O1gwyRCaA4YaIaqSCYhXiU7K0ruwbn5KFIpUosy2DDFHtwnBDRNWelCDjZFM2yHjXYZAhqk0qFW6KioqQkpKC3Nxc1KtXD87Ozvqui4hqqYJiFc4naweZC6kMMkSkO53DTVZWFv7v//4PmzZtwvHjx1FYWAghBGQyGby9vdG7d2+8+eabeOqppwxZLxGZkPyi8ntkitVlg0wdG0utENOaQYaIKqBTuFmyZAnmz5+Pxo0bY8CAAfj444/h6ekJa2trpKenIzo6GgcPHkTv3r0RGBiIr7/+Gk2bNjV07URUg+QXqXC+NMjcuN8j86ggE+B9P8h4OTHIEJFuZEKIsn+zPOTVV1/FjBkz0KpVq0duV1BQgHXr1kEul+O1117TW5H6pFQq4ejoiMzMTDg4OBi7HCKTpAkyNzI0tyi4WEGQcbaV3+uRcWCQIaIKSfn+1incmBKGGyL9yi9SIS5ZqXWvpYqCTF1NkHHUXBjP09GKQYaIHkvK9zdXSxGRzvKLVIgtDTL3hpYu3sqGSocgE+DtCA8GGSKqApLCzblz5/Dnn3/C2dkZL7/8MlxcXDTPKZVKvP/++1i7dq3eiySiqiclyLjYPdQj48UgQ0TGo/Ow1K5duzBgwAA0bdoUWVlZyMnJwebNm9G9e3cAQGpqKjw9PaFSqQxa8JPisBRRWXmFDwSZeyuXdA0yAd6OcHdgkCEiwzLIsNScOXMwZcoUzJ8/H0IILF68GAMHDsTmzZvRp0+fJy6aiKqGtCCj0Jro688gQ0Q1gM7hJiYmBj/99BMAQCaT4cMPP4S3tzdeeuklbNq0ide3IaqGSoJM6bCS8l6QyUI5OeZ+kPF20lxLxs1BwSBDRDWOzuFGoVAgIyNDq2
3YsGEwMzNDaGgovvzyS33XRkQSlAaZB+9+felWdrlBpp69osyVfRlkiMhU6Bxu2rZti3379qFDhw5a7a+88gqEEAg
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Score the model with default parameters\n",
"score_dec_tree, model_dec_tree = score_the_model(\n",
" model=DecisionTreeClassifier(),\n",
" model_name='Decision Tree',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")\n",
"\n",
"print(score_dec_tree)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a72e54f6",
"metadata": {},
"source": [
"Now lets plot the decision tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4fe47bd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.6615466101694916, 0.9722222222222222, 'V36 <= 3.678\\ngini = 0.444\\nsamples = 846\\nvalue = [564, 282]\\nclass = Ready biodegradable'),\n",
" Text(0.4240819209039548, 0.9166666666666666, 'V1 <= 4.792\\ngini = 0.483\\nsamples = 361\\nvalue = [147, 214]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2803672316384181, 0.8611111111111112, 'V34 <= 1.5\\ngini = 0.435\\nsamples = 285\\nvalue = [91, 194]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1906779661016949, 0.8055555555555556, 'V14 <= 0.673\\ngini = 0.32\\nsamples = 210\\nvalue = [42, 168]\\nclass = Reday non-biodegradable'),\n",
" Text(0.16807909604519775, 0.75, 'V18 <= 1.158\\ngini = 0.278\\nsamples = 12\\nvalue = [10, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.15677966101694915, 0.6944444444444444, 'V28 <= 0.121\\ngini = 0.165\\nsamples = 11\\nvalue = [10, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.14548022598870056, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [10, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.16807909604519775, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.17937853107344634, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2132768361581921, 0.75, 'V38 <= 1.5\\ngini = 0.271\\nsamples = 198\\nvalue = [32, 166]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2019774011299435, 0.6944444444444444, 'V41 <= 1.5\\ngini = 0.247\\nsamples = 194\\nvalue = [28, 166]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1906779661016949, 0.6388888888888888, 'V22 <= 1.265\\ngini = 0.221\\nsamples = 190\\nvalue = [24, 166]\\nclass = Reday non-biodegradable'),\n",
" Text(0.12146892655367232, 0.5833333333333334, 'V32 <= 0.5\\ngini = 0.148\\nsamples = 161\\nvalue = [13, 148]\\nclass = Reday non-biodegradable'),\n",
" Text(0.11016949152542373, 0.5277777777777778, 'V28 <= 0.843\\ngini = 0.139\\nsamples = 160\\nvalue = [12, 148]\\nclass = Reday non-biodegradable'),\n",
" Text(0.09887005649717515, 0.4722222222222222, 'V37 <= 1.95\\ngini = 0.129\\nsamples = 159\\nvalue = [11, 148]\\nclass = Reday non-biodegradable'),\n",
" Text(0.03389830508474576, 0.4166666666666667, 'V22 <= 1.174\\ngini = 0.34\\nsamples = 23\\nvalue = [5, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.022598870056497175, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.04519774011299435, 0.3611111111111111, 'V31 <= 1.358\\ngini = 0.298\\nsamples = 22\\nvalue = [4, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.03389830508474576, 0.3055555555555556, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.05649717514124294, 0.3055555555555556, 'V37 <= 1.935\\ngini = 0.245\\nsamples = 21\\nvalue = [3, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.04519774011299435, 0.25, 'V35 <= 1.5\\ngini = 0.18\\nsamples = 20\\nvalue = [2, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.022598870056497175, 0.19444444444444445, 'V3 <= 0.5\\ngini = 0.105\\nsamples = 18\\nvalue = [1, 17]\\nclass = Reday non-biodegradable'),\n",
" Text(0.011299435028248588, 0.1388888888888889, 'gini = 0.0\\nsamples = 15\\nvalue = [0, 15]\\nclass = Reday non-biodegradable'),\n",
" Text(0.03389830508474576, 0.1388888888888889, 'V22 <= 1.231\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.022598870056497175, 0.08333333333333333, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.04519774011299435, 0.08333333333333333, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.06779661016949153, 0.19444444444444445, 'V17 <= 0.97\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.05649717514124294, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.07909604519774012, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.06779661016949153, 0.25, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.1638418079096045, 0.4166666666666667, 'V9 <= 3.5\\ngini = 0.084\\nsamples = 136\\nvalue = [6, 130]\\nclass = Reday non-biodegradable'),\n",
" Text(0.13559322033898305, 0.3611111111111111, 'V18 <= 1.162\\ngini = 0.059\\nsamples = 131\\nvalue = [4, 127]\\nclass = Reday non-biodegradable'),\n",
" Text(0.11299435028248588, 0.3055555555555556, 'V37 <= 2.292\\ngini = 0.017\\nsamples = 118\\nvalue = [1, 117]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1016949152542373, 0.25, 'V37 <= 2.285\\ngini = 0.087\\nsamples = 22\\nvalue = [1, 21]\\nclass = Reday non-biodegradable'),\n",
" Text(0.0903954802259887, 0.19444444444444445, 'gini = 0.0\\nsamples = 21\\nvalue = [0, 21]\\nclass = Reday non-biodegradable'),\n",
" Text(0.11299435028248588, 0.19444444444444445, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.12429378531073447, 0.25, 'gini = 0.0\\nsamples = 96\\nvalue = [0, 96]\\nclass = Reday non-biodegradable'),\n",
" Text(0.15819209039548024, 0.3055555555555556, 'V2 <= 3.228\\ngini = 0.355\\nsamples = 13\\nvalue = [3, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.14689265536723164, 0.25, 'V11 <= 0.5\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.13559322033898305, 0.19444444444444445, 'gini = 0.0\\nsamples = 9\\nvalue = [0, 9]\\nclass = Reday non-biodegradable'),\n",
" Text(0.15819209039548024, 0.19444444444444445, 'V16 <= 1.5\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.14689265536723164, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.1694915254237288, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1694915254237288, 0.25, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.192090395480226, 0.3611111111111111, 'V18 <= 1.146\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1807909604519774, 0.3055555555555556, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2033898305084746, 0.3055555555555556, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.12146892655367232, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.1327683615819209, 0.5277777777777778, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.2598870056497175, 0.5833333333333334, 'V2 <= 2.882\\ngini = 0.471\\nsamples = 29\\nvalue = [11, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.22033898305084745, 0.5277777777777778, 'V8 <= 38.8\\ngini = 0.459\\nsamples = 14\\nvalue = [9, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.1977401129943503, 0.4722222222222222, 'V31 <= 1.896\\ngini = 0.32\\nsamples = 5\\nvalue = [1, 4]\\nclass = Reday non-biodegradable'),\n",
" Text(0.1864406779661017, 0.4166666666666667, 'gini = 0.0\\nsamples = 4\\nvalue = [0, 4]\\nclass = Reday non-biodegradable'),\n",
" Text(0.20903954802259886, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.24293785310734464, 0.4722222222222222, 'V22 <= 1.331\\ngini = 0.198\\nsamples = 9\\nvalue = [8, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.23163841807909605, 0.4166666666666667, 'gini = 0.0\\nsamples = 8\\nvalue = [8, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.2542372881355932, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2994350282485876, 0.5277777777777778, 'V13 <= 3.298\\ngini = 0.231\\nsamples = 15\\nvalue = [2, 13]\\nclass = Reday non-biodegradable'),\n",
" Text(0.288135593220339, 0.4722222222222222, 'V12 <= 0.841\\ngini = 0.133\\nsamples = 14\\nvalue = [1, 13]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2768361581920904, 0.4166666666666667, 'gini = 0.0\\nsamples = 12\\nvalue = [0, 12]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2994350282485876, 0.4166666666666667, 'V2 <= 3.222\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.288135593220339, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3107344632768362, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.3107344632768362, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.2132768361581921, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.2245762711864407, 0.6944444444444444, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3700564971751412, 0.8055555555555556, 'V16 <= 0.5\\ngini = 0.453\\nsamples = 75\\nvalue = [49, 26]\\nclass = Ready biodegradable'),\n",
" Text(0.3163841807909605, 0.75, 'V1 <= 4.107\\ngini = 0.311\\nsamples = 52\\nvalue = [42, 10]\\nclass = Ready biodegradable'),\n",
" Text(0.288135593220339, 0.6944444444444444, 'V34 <= 2.5\\ngini = 0.48\\nsamples = 10\\nvalue = [4, 6]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2768361581920904, 0.6388888888888888, 'gini = 0.0\\nsamples = 6\\nvalue = [0, 6]\\nclass = Reday non-biodegradable'),\n",
" Text(0.2994350282485876, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3446327683615819, 0.6944444444444444, 'V2 <= 2.217\\ngini = 0.172\\nsamples = 42\\nvalue = [38, 4]\\nclass = Ready biodegradable'),\n",
" Text(0.3220338983050847, 0.6388888888888888, 'V38 <= 1.5\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.3107344632768362, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.3333333333333333, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3672316384180791, 0.6388888888888888, 'V1 <= 4.426\\ngini = 0.097\\nsamples = 39\\nvalue = [37, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.3559322033898305, 0.5833333333333334, 'V7 <= 0.5\\ngini = 0.32\\nsamples = 10\\nvalue = [8, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.3446327683615819, 0.5277777777777778, 'V28 <= 0.029\\ngini = 0.198\\nsamples = 9\\nvalue = [8, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.3333333333333333, 0.4722222222222222, 'gini = 0.0\\nsamples = 7\\nvalue = [7, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3559322033898305, 0.4722222222222222, 'V28 <= 0.097\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.3446327683615819, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.3672316384180791, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.3672316384180791, 0.5277777777777778, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.3785310734463277, 0.5833333333333334, 'gini = 0.0\\nsamples = 29\\nvalue = [29, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.423728813559322, 0.75, 'V27 <= 2.089\\ngini = 0.423\\nsamples = 23\\nvalue = [7, 16]\\nclass = Reday non-biodegradable'),\n",
" Text(0.4124293785310734, 0.6944444444444444, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.4350282485875706, 0.6944444444444444, 'V28 <= 0.015\\ngini = 0.32\\nsamples = 20\\nvalue = [4, 16]\\nclass = Reday non-biodegradable'),\n",
" Text(0.4124293785310734, 0.6388888888888888, 'V27 <= 2.239\\ngini = 0.124\\nsamples = 15\\nvalue = [1, 14]\\nclass = Reday non-biodegradable'),\n",
" Text(0.4011299435028249, 0.5833333333333334, 'gini = 0.0\\nsamples = 14\\nvalue = [0, 14]\\nclass = Reday non-biodegradable'),\n",
" Text(0.423728813559322, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.4576271186440678, 0.6388888888888888, 'V36 <= 3.535\\ngini = 0.48\\nsamples = 5\\nvalue = [3, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.4463276836158192, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.4689265536723164, 0.5833333333333334, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.5677966101694916, 0.8611111111111112, 'V30 <= 10.221\\ngini = 0.388\\nsamples = 76\\nvalue = [56, 20]\\nclass = Ready biodegradable'),\n",
" Text(0.5254237288135594, 0.8055555555555556, 'V36 <= 3.673\\ngini = 0.201\\nsamples = 44\\nvalue = [39, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.5141242937853108, 0.75, 'V18 <= 1.158\\ngini = 0.133\\nsamples = 42\\nvalue = [39, 3]\\nclass = Ready biodegradable'),\n",
" Text(0.4915254237288136, 0.6944444444444444, 'V12 <= 1.437\\ngini = 0.05\\nsamples = 39\\nvalue = [38, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.480225988700565, 0.6388888888888888, 'gini = 0.0\\nsamples = 37\\nvalue = [37, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.5028248587570622, 0.6388888888888888, 'V12 <= 1.504\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.4915254237288136, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.5141242937853108, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.536723163841808, 0.6944444444444444, 'V17 <= 1.005\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.5254237288135594, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.5480225988700564, 0.6388888888888888, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.536723163841808, 0.75, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.6101694915254238, 0.8055555555555556, 'V36 <= 3.511\\ngini = 0.498\\nsamples = 32\\nvalue = [17, 15]\\nclass = Ready biodegradable'),\n",
" Text(0.5706214689265536, 0.75, 'V31 <= 1.567\\ngini = 0.278\\nsamples = 12\\nvalue = [2, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.559322033898305, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.5819209039548022, 0.6944444444444444, 'V2 <= 4.616\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.5706214689265536, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [0, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.5932203389830508, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.6497175141242938, 0.75, 'V17 <= 1.025\\ngini = 0.375\\nsamples = 20\\nvalue = [15, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.6271186440677966, 0.6944444444444444, 'V1 <= 4.856\\ngini = 0.124\\nsamples = 15\\nvalue = [14, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.615819209039548, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.6384180790960452, 0.6388888888888888, 'gini = 0.0\\nsamples = 14\\nvalue = [14, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.672316384180791, 0.6944444444444444, 'V31 <= 2.562\\ngini = 0.32\\nsamples = 5\\nvalue = [1, 4]\\nclass = Reday non-biodegradable'),\n",
" Text(0.6610169491525424, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [0, 4]\\nclass = Reday non-biodegradable'),\n",
" Text(0.6836158192090396, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8990112994350282, 0.9166666666666666, 'V40 <= 0.5\\ngini = 0.241\\nsamples = 485\\nvalue = [417, 68]\\nclass = Ready biodegradable'),\n",
" Text(0.8206214689265536, 0.8611111111111112, 'V12 <= -0.712\\ngini = 0.185\\nsamples = 457\\nvalue = [410, 47]\\nclass = Ready biodegradable'),\n",
" Text(0.751412429378531, 0.8055555555555556, 'V27 <= 2.363\\ngini = 0.452\\nsamples = 55\\nvalue = [36, 19]\\nclass = Ready biodegradable'),\n",
" Text(0.7401129943502824, 0.75, 'V8 <= 42.5\\ngini = 0.475\\nsamples = 31\\nvalue = [12, 19]\\nclass = Reday non-biodegradable'),\n",
" Text(0.7175141242937854, 0.6944444444444444, 'V22 <= 1.228\\ngini = 0.495\\nsamples = 20\\nvalue = [11, 9]\\nclass = Ready biodegradable'),\n",
" Text(0.7062146892655368, 0.6388888888888888, 'V11 <= 0.5\\ngini = 0.459\\nsamples = 14\\nvalue = [5, 9]\\nclass = Reday non-biodegradable'),\n",
" Text(0.6949152542372882, 0.5833333333333334, 'gini = 0.0\\nsamples = 6\\nvalue = [0, 6]\\nclass = Reday non-biodegradable'),\n",
" Text(0.7175141242937854, 0.5833333333333334, 'V31 <= 1.845\\ngini = 0.469\\nsamples = 8\\nvalue = [5, 3]\\nclass = Ready biodegradable'),\n",
" Text(0.7062146892655368, 0.5277777777777778, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.7288135593220338, 0.5277777777777778, 'V37 <= 3.486\\ngini = 0.278\\nsamples = 6\\nvalue = [5, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.7175141242937854, 0.4722222222222222, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.7401129943502824, 0.4722222222222222, 'V2 <= 3.358\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.7288135593220338, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.751412429378531, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.7288135593220338, 0.6388888888888888, 'gini = 0.0\\nsamples = 6\\nvalue = [6, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.7627118644067796, 0.6944444444444444, 'V3 <= 0.5\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.751412429378531, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [0, 10]\\nclass = Reday non-biodegradable'),\n",
" Text(0.7740112994350282, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.7627118644067796, 0.75, 'gini = 0.0\\nsamples = 24\\nvalue = [24, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8898305084745762, 0.8055555555555556, 'V39 <= 8.446\\ngini = 0.13\\nsamples = 402\\nvalue = [374, 28]\\nclass = Ready biodegradable'),\n",
" Text(0.847457627118644, 0.75, 'V30 <= 5.124\\ngini = 0.357\\nsamples = 56\\nvalue = [43, 13]\\nclass = Ready biodegradable'),\n",
" Text(0.8192090395480226, 0.6944444444444444, 'V8 <= 46.1\\ngini = 0.206\\nsamples = 43\\nvalue = [38, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.7966101694915254, 0.6388888888888888, 'V6 <= 1.5\\ngini = 0.1\\nsamples = 38\\nvalue = [36, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.7853107344632768, 0.5833333333333334, 'V14 <= 1.534\\ngini = 0.053\\nsamples = 37\\nvalue = [36, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.7740112994350282, 0.5277777777777778, 'gini = 0.0\\nsamples = 35\\nvalue = [35, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.7966101694915254, 0.5277777777777778, 'V17 <= 0.989\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.7853107344632768, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.807909604519774, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.807909604519774, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8418079096045198, 0.6388888888888888, 'V18 <= 1.105\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8305084745762712, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8531073446327684, 0.5833333333333334, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8757062146892656, 0.6944444444444444, 'V30 <= 15.07\\ngini = 0.473\\nsamples = 13\\nvalue = [5, 8]\\nclass = Reday non-biodegradable'),\n",
" Text(0.864406779661017, 0.6388888888888888, 'gini = 0.0\\nsamples = 8\\nvalue = [0, 8]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8870056497175142, 0.6388888888888888, 'gini = 0.0\\nsamples = 5\\nvalue = [5, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9322033898305084, 0.75, 'V31 <= 9.19\\ngini = 0.083\\nsamples = 346\\nvalue = [331, 15]\\nclass = Ready biodegradable'),\n",
" Text(0.9209039548022598, 0.6944444444444444, 'V8 <= 13.25\\ngini = 0.078\\nsamples = 345\\nvalue = [331, 14]\\nclass = Ready biodegradable'),\n",
" Text(0.9096045197740112, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9322033898305084, 0.6388888888888888, 'V38 <= 0.5\\ngini = 0.073\\nsamples = 344\\nvalue = [331, 13]\\nclass = Ready biodegradable'),\n",
" Text(0.8926553672316384, 0.5833333333333334, 'V30 <= 12.851\\ngini = 0.118\\nsamples = 191\\nvalue = [179, 12]\\nclass = Ready biodegradable'),\n",
" Text(0.8700564971751412, 0.5277777777777778, 'V15 <= 10.386\\ngini = 0.078\\nsamples = 173\\nvalue = [166, 7]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.4722222222222222, 'V30 <= 11.485\\ngini = 0.143\\nsamples = 90\\nvalue = [83, 7]\\nclass = Ready biodegradable'),\n",
" Text(0.847457627118644, 0.4166666666666667, 'V14 <= 2.748\\ngini = 0.126\\nsamples = 89\\nvalue = [83, 6]\\nclass = Ready biodegradable'),\n",
" Text(0.8361581920903954, 0.3611111111111111, 'V14 <= 0.861\\ngini = 0.107\\nsamples = 88\\nvalue = [83, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.8135593220338984, 0.3055555555555556, 'V15 <= 10.142\\ngini = 0.231\\nsamples = 30\\nvalue = [26, 4]\\nclass = Ready biodegradable'),\n",
" Text(0.8022598870056498, 0.25, 'V2 <= 2.458\\ngini = 0.391\\nsamples = 15\\nvalue = [11, 4]\\nclass = Ready biodegradable'),\n",
" Text(0.7909604519774012, 0.19444444444444445, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8135593220338984, 0.19444444444444445, 'V36 <= 3.866\\ngini = 0.26\\nsamples = 13\\nvalue = [11, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.8022598870056498, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8248587570621468, 0.1388888888888889, 'V31 <= 0.973\\ngini = 0.153\\nsamples = 12\\nvalue = [11, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.8135593220338984, 0.08333333333333333, 'V14 <= 0.792\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.8022598870056498, 0.027777777777777776, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8248587570621468, 0.027777777777777776, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8361581920903954, 0.08333333333333333, 'gini = 0.0\\nsamples = 10\\nvalue = [10, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8248587570621468, 0.25, 'gini = 0.0\\nsamples = 15\\nvalue = [15, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.3055555555555556, 'V2 <= 2.978\\ngini = 0.034\\nsamples = 58\\nvalue = [57, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.847457627118644, 0.25, 'V2 <= 2.946\\ngini = 0.117\\nsamples = 16\\nvalue = [15, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.8361581920903954, 0.19444444444444445, 'gini = 0.0\\nsamples = 15\\nvalue = [15, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.19444444444444445, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8700564971751412, 0.25, 'gini = 0.0\\nsamples = 42\\nvalue = [42, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8700564971751412, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.8813559322033898, 0.4722222222222222, 'gini = 0.0\\nsamples = 83\\nvalue = [83, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9152542372881356, 0.5277777777777778, 'V17 <= 1.038\\ngini = 0.401\\nsamples = 18\\nvalue = [13, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.903954802259887, 0.4722222222222222, 'V14 <= 0.644\\ngini = 0.231\\nsamples = 15\\nvalue = [13, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.8926553672316384, 0.4166666666666667, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9152542372881356, 0.4166666666666667, 'gini = 0.0\\nsamples = 13\\nvalue = [13, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9265536723163842, 0.4722222222222222, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9717514124293786, 0.5833333333333334, 'V31 <= 1.042\\ngini = 0.013\\nsamples = 153\\nvalue = [152, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.96045197740113, 0.5277777777777778, 'V30 <= 10.944\\ngini = 0.375\\nsamples = 4\\nvalue = [3, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.9491525423728814, 0.4722222222222222, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9717514124293786, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9830508474576272, 0.5277777777777778, 'gini = 0.0\\nsamples = 149\\nvalue = [149, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.943502824858757, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9774011299435028, 0.8611111111111112, 'V13 <= 4.278\\ngini = 0.375\\nsamples = 28\\nvalue = [7, 21]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9661016949152542, 0.8055555555555556, 'V22 <= 1.251\\ngini = 0.159\\nsamples = 23\\nvalue = [2, 21]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9548022598870056, 0.75, 'gini = 0.0\\nsamples = 18\\nvalue = [0, 18]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9774011299435028, 0.75, 'V12 <= -0.564\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9661016949152542, 0.6944444444444444, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9887005649717514, 0.6944444444444444, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
" Text(0.9887005649717514, 0.8055555555555556, 'gini = 0.0\\nsamples = 5\\nvalue = [5, 0]\\nclass = Ready biodegradable')]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 16000x16000 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.tree import plot_tree\n",
"\n",
"plt.figure(figsize=(40, 40), dpi=400)\n",
"plot_tree(model, filled=True, rounded=True, class_names=['Ready biodegradable', 'Not ready biodegradable'], feature_names=list(X_train.columns))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b55c97cd",
"metadata": {},
"source": [
"### Random Forest Classifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9d5676b",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADy1klEQVR4nOzdeVhUdeP+8XvYwQUXNkUStXJPDYPcsoUkNcxWk1Kk1Da0JCtcEM2UyiJ6ciEtNC0fzTLzSXOj0EzTcqms3NdIUDRBQUGY+f3hz/k2gY4icFDer+uaS+czZ7nPADrennM+JovFYhEAAAAAAACAC3IwOgAAAAAAAABQ2VGiAQAAAAAAAHZQogEAAAAAAAB2UKIBAAAAAAAAdlCiAQAAAAAAAHZQogEAAAAAAAB2UKIBAAAAAAAAdlCiAQAAAAAAAHZQogEAAAAAAAB2UKIB/2IymTR27NjLXm///v0ymUyaNWtWmWe6UnPmzFGzZs3k7OysWrVqGR3nqrds2TK1bdtWbm5uMplMOnHihNGRKtyAAQMUGBhodAwAAAAAqDCUaKiUZs2aJZPJJJPJpLVr1xZ73WKxKCAgQCaTSffee68BCa8e27dv14ABA9SkSRPNmDFD06dPNzrSVe3YsWN65JFH5O7urilTpmjOnDmqVq1aue3vnz8LJpNJTk5O8vf314ABA5Senl5u+73a/Pt9+ucjNjbW6HglmjhxohYtWmR0DAAAAACXyMnoAMDFuLm5ae7cuercubPN+OrVq/Xnn3/K1dXVoGRXj7S0NJnNZr377ru6/vrrjY5z1fvxxx918uRJjR8/XqGhoRW231dffVWNGjXSmTNn9MMPP2jWrFlau3attm3bJjc3twrLUdmdf5/+qVWrVgalubiJEyfqoYceUu/evY2OAgAAAOASUKKhUuvRo4cWLFig//znP3Jy+r9v17lz5yooKEhZWVkGpqvccnNzVa1aNR05ckSSyvQyzry8PHl4eJTZ9q4m5fF+nv9aXUz37t3Vvn17SdLAgQPl5eWlN954Q4sXL9YjjzxSZlmudv98n8rSpXyNAAAAAFzbuJwTlVrfvn117NgxrVy50jpWUFCgzz77TBERESWuk5ubqxdffFEBAQFydXVV06ZN9dZbb8lisdgsl5+fr2HDhsnb21s1atRQr1699Oeff5a4zfT0dD3xxBPy9fWVq6urWrZsqZSUlFId0/nLztasWaOnnnpKdevWVc2aNdW/f3/9/fffxZb/+uuv1aVLF1WrVk01atRQz5499dtvv9ksM2DAAFWvXl179uxRjx49VKNGDT322GMKDAxUfHy8JMnb27vY/d6mTp2qli1bytXVVfXr19dzzz1X7P5et99+u1q1aqVNmzbptttuk4eHh0aOHGm9B9xbb72lKVOmqHHjxvLw8FC3bt106NAhWSwWjR8/Xg0aNJC7u7vuu+8+HT9+3GbbX375pXr27Kn69evL1dVVTZo00fjx41VUVFRiht9//1133HGHPDw85O/vrzfffLPY+3XmzBmNHTtWN954o9zc3FSvXj098MAD2rNnj3UZs9mspKQktWzZUm5ubvL19dVTTz1V4vv/7xyRkZGSpFtuuUUmk0kDBgywvr5gwQIFBQXJ3d1dXl5eevzxx4tdcnmhr9Xl6tKliyTZHFdBQYHGjBmjoKAgeXp6qlq1aurSpYu+/fZbm3X/+bWbPn26mjRpIldXV91yyy368ccfi+1r0aJFatWqldzc3NSqVSt98cUXJWa61J89k8mk6OhoLViwQC1atJC7u7s6dOigX3/9VZL0/vvv6/rrr5ebm5tuv/127d+//7Lfnwv55ptvrD9PtWrV0n333ac//vjDZpmxY8fKZDLp999/V0REhGrXrm1zNuzHH39s/TrXqVNHjz76qA4dOmSzjV27dunBBx+Un5+f3Nzc1KBBAz366KPKzs62vge5ubn66KOPrJed/vN7CQAAAEDlw5
loqNQCAwPVoUMH/fe//1X37t0lnSuVsrOz9eijj+o///mPzfIWi0W9evXSt99+qyeffFJt27bV8uXL9dJLLyk9PV3vvPOOddmBAwfq448/VkREhDp27KhvvvlGPXv2LJYhMzNTt956q/Uf/t7e3vr666/15JNPKicnRy+88EKpji06Olq1atXS2LFjtWPHDk2bNk0HDhxQWlqaTCaTpHMTAkRGRiosLExvvPGG8vLyNG3aNHXu3FlbtmyxubF7YWGhwsLC1LlzZ7311lvy8PDQgAEDNHv2bH3xxReaNm2aqlevrptuuknSuaJg3LhxCg0N1TPPPGPN8OOPP+r777+Xs7OzddvHjh1T9+7d9eijj+rxxx+Xr6+v9bVPPvlEBQUFGjJkiI4fP64333xTjzzyiO68806lpaXplVde0e7du/Xee+9p+PDhNuXjrFmzVL16dcXExKh69er65ptvNGbMGOXk5GjSpEk279fff/+te+65Rw888IAeeeQRffbZZ3rllVfUunVr6/dGUVGR7r33XqWmpurRRx/V888/r5MnT2rlypXatm2bmjRpIkl66qmnNGvWLEVFRWno0KHat2+fJk+erC1bthQ79n8aNWqUmjZtqunTp1svGzy/zfPbu+WWW5SQkKDMzEy9++67+v7777VlyxabM9dK+lpdrvPFUu3ata1jOTk5+uCDD9S3b18NGjRIJ0+e1IcffqiwsDBt3LhRbdu2tdnG3LlzdfLkST311FMymUx688039cADD2jv3r3W92DFihV68MEH1aJFCyUkJOjYsWOKiopSgwYNbLZ1OT97kvTdd99p8eLFeu655yRJCQkJuvfee/Xyyy9r6tSpevbZZ/X333/rzTff1BNPPKFvvvnmkt6X7OzsYmeoenl5SZJWrVql7t27q3Hjxho7dqxOnz6t9957T506ddLmzZuLTZTw8MMP64YbbtDEiROtReCECRMUFxenRx55RAMHDtTRo0f13nvv6bbbbrN+nQsKChQWFqb8/HwNGTJEfn5+Sk9P11dffaUTJ07I09NTc+bM0cCBAxUcHKzBgwdLkvV7CQAAAEAlZQEqoZkzZ1okWX788UfL5MmTLTVq1LDk5eVZLBaL5eGHH7bccccdFovFYmnYsKGlZ8+e1vUWLVpkkWR57bXXbLb30EMPWUwmk2X37t0Wi8Vi2bp1q0WS5dlnn7VZLiIiwiLJEh8fbx178sknLfXq1bNkZWXZLPvoo49aPD09rbn27dtnkWSZOXPmJR1bUFCQpaCgwDr+5ptvWiRZvvzyS4vFYrGcPHnSUqtWLcugQYNs1s/IyLB4enrajEdGRlokWWJjY4vtLz4+3iLJcvToUevYkSNHLC4uLpZu3bpZioqKrOOTJ0+2SLKkpKRYx7p27WqRZElOTrbZ7vnj9fb2tpw4ccI6PmLECIskS5s2bSxnz561jvft29fi4uJiOXPmjHXs/Hv3T0899ZTFw8PDZrnzGWbPnm0dy8/Pt/j5+VkefPBB61hKSopFkiUxMbHYds1ms8VisVi+++47iyTLJ598YvP6smXLShz/t39+b55XUFBg8fHxsbRq1cpy+vRp6/hXX31lkWQZM2aMdexiX6uL7W/VqlWWo0ePWg4dOmT57LPPLN7e3hZXV1fLoUOHrMsWFhZa8vPzbdb/+++/Lb6+vpYnnnjCOnb+a1e3bl3L8ePHreNffvmlRZLlf//7n3Wsbdu2lnr16tl8jVesWGGRZGnYsKF17FJ/9iwWi0WSxdXV1bJv3z7r2Pvvv2+RZPHz87Pk5ORYx89/P/1z2Yu9TyU9/nksPj4+lmPHjlnHfv75Z4uDg4Olf//+1rHzPzN9+/a12cf+/fstjo6OlgkTJtiM//rrrxYnJyfr+JYtWyySLAsWLLho5mrVqlkiIyMvugwAAACAyoPLOVHpPfLIIzp9+rS++uornTx5Ul999dUFL+VcunSpHB0dNXToUJvxF198URaLRV9//bV1OU
nFlvv3WWUWi0Wff/65wsPDZbFYlJWVZX2EhYUpOztbmzdvLtVxDR482OaMp2eeeUZOTk7WbCtXrtSJEyfUt29fm/0
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABf50lEQVR4nO3dd1QU198G8GcX2KVIM0hTFHtX7BGjRkWxtySSaBRNotHYIjF2xRJ7bIlGo8aW1wQ1v2hM7L1giQ0rooKIBVBEQDrs3vcPw8aVkh3cZWV5PufsSfbOnZlnR3C/3rkzIxNCCBARERGZCLmxAxARERHpE4sbIiIiMiksboiIiMiksLghIiIik8LihoiIiEwKixsiIiIyKSxuiIiIyKSYGztAUVOr1Xj06BFsbW0hk8mMHYeIiIh0IITA8+fP4e7uDrm84LGZElfcPHr0CB4eHsaOQURERIVw//59lCtXrsA+Ja64sbW1BfDi4NjZ2Rk5DREREekiKSkJHh4emu/xgpS44ibnVJSdnR2LGyIiomJGlyklnFBMREREJoXFDREREZkUFjdERERkUljcEBERkUlhcUNEREQmhcUNERERmRQWN0RERGRSWNwQERGRSWFxQ0RERCaFxQ0RERGZFKMWN8ePH0e3bt3g7u4OmUyGHTt2/Oc6R48eRcOGDaFUKlGlShVs2LDB4DmJiIio+DBqcZOSkoL69etjxYoVOvW/e/cuunTpgjZt2iAkJARffvklPvvsM+zbt8/ASYmIiKi4MOqDMzt16oROnTrp3H/VqlWoWLEiFi1aBACoWbMmTp48iSVLlsDX19dQMYmIiEgHmdlqPE3JQLZKwKO0tdFyFKungp8+fRo+Pj5abb6+vvjyyy/zXScjIwMZGRma90lJSYaKR0REZHJyCpa455l4kpz+z38zEJecgSfPX/w3LjkTcckZSEjNAgB4V34Lvwx+22iZi1VxExMTAxcXF602FxcXJCUlIS0tDVZWVrnWmTt3LmbMmFFUEYmIiN54GdkqPP2nIPm3SMl8qVj5ty0xLUvSts3kMqiFMFBy3RSr4qYwJk6ciICAAM37pKQkeHh4GDERERGR/mVkq16MoLxUoOQULE+SM15qL1zB4lRKAadSyn9ftgqUKaVEGduX2kop4GitgFwuM9Cn1E2xKm5cXV0RGxur1RYbGws7O7s8R20AQKlUQqlUFkU8IiIivXq5YHnyatHy8mmh5xlISs+WtG1zuQxvlVK8Upy8KFDK2CpRppQSTv8sc7CyMHrBIkWxKm6aN2+O3bt3a7UdOHAAzZs3N1IiIipprj1MxG8XHhh92J1Mj0otkJCapRlleZKcgecSCxYLMxnesskZTflnpOWfAiWnrcw/RYx9MStYpDBqcZOcnIw7d+5o3t+9exchISEoXbo0ypcvj4kTJ+Lhw4fYtGkTAGDo0KFYvnw5xo0bh08++QSHDx/G1q1bsWvXLmN9BCIqQR4/T8fA9X8jLjnT2FGoBLEwk2lGVbSKln8KlxenhhSagkUmM82CRQqjFjfnz59HmzZtNO9z5sb4+/tjw4YNiI6ORlRUlGZ5xYoVsWvXLowZMwbLli1DuXLlsHbtWl4GTkQGp1YLfLX1MuKSM1G5jA261HM3diQyMTIADtYWWoVMmVJK2FmZs2CRSCZEyRpbTUpKgr29PRITE2FnZ2fsOERUTPxw9A4W7A2DpYUcf454B1VdbI0diahEkfL9Xazm3BBRyZOZrYZKbdx/g119mIhF+28BAGZ0r83ChugNx+KGiN44z1Iyse96DP66Eo1T4XEwcm2j0a2+O/o05q0kiN50LG6I6I2QlJ6F/ddj8deVRzh5Ow7Zb0pF84/qLraY06sO5z4QFQMsbojIaJIzsnEoNBZ/Xo7G8VtPkKlSa5bVdLND13pu6FzXDc62xr9XlbXCjIUNUTHB4oaIilRapgqHbz7GX1ce4fDNx8jI/regqepcCl3rua
NLPTdUcS5lxJREVJyxuCEyEWcinuJMxFNjxyhQ+JMUHAqNRWqmStNW0ckGXeu5oWs9d1R35URdInp9LG6IirnkjGzM3hWKX/+O+u/Ob4hyjlboWs8dXeu5oba7HU/3EJFesbghKsZOhz/F179dxoNnaQCALvXc4GhtYeRU+bO3skD7Wq6oX86eBQ0RGQyLG6JiKD1Lhfl7b2J9cCQAoKyDFb79oD6aV37LuMGIiN4ALG6IiplLUc/w1bbLiHiSAgD4qKkHJnephVJK/joTEQEsbogMRgiBlcfCsfp4BDKy1P+9go7Ssl5MxnWxU2Lee/XQprqz3rZNRGQKWNwQGYBKLTDjz+vYdPqeQbbf08sdM7rXgf0bPL+GiMhYWNwQ6VlGtgpjtoRg99UYyGTAlC610KGWi962b6Uwg1Mp49/UjojoTcXihkiPnqdnYcimCzgd8RQWZjIs8fNC13ruxo5FRFSisLgh0pPHz9MxcN053IhOQimlOX7s3wgtqjgZOxYRUYnD4oZIDyLjUjBg3d+Iik+FUykFNgxqijpl7Y0di4ioRGJxQ/Sarj1MxMD1fyMuORPlS1tj0ydN4elkY+xYREQlFosbotcQfCcOQzadR0qmCrXc7LDhkyZwtrU0diwiohKNxQ1RIf115RHGbAlBlkrAu/Jb+LF/I9ha8tJsIiJjY3FDlI/4lEwM3nQeMYnpeS5/lJgGIYAudd2w2K8+lOZmRZyQiIjywuKGKB/nIuNx4d6zAvsMaF4Bgd1qw0zOh0ASEb0pWNwQ5UOIF/+t4WqL+e/Vy7Xc3sqCE4eJiN5ALG6I/kMppTnqezgYOwYREemIxQ2ZrPvxqbj8IKHQ61++X/h1iYjIeFjckElSqwV6/RCMuOTM194W59MQERUvLG7IJKmF0BQ2jSs4wtyscAWKuVyOT9+pqM9oRERkYCxuyOT95N8E9ta8/wwRUUkhN3YAIiIiIn3iyA0VqQM3YrHiyB2o1MKg+xEw7PaJiOjNxeKGitSm05EIKcKrkOytLGCl4J2DiYhKEhY3VKRyRmw+b1UJb1d6y+D7q+FmC4U5z74SEZUkLG7IKGq526FNDWdjxyAiIhPE4oYM7snzDNx+/BwAkJCaZeQ0RERk6ljckEFlZKvQfsmxXEWNXMYb4xERkWGwuCGDSslQaQqbqs6lIJMBzraW8K5s+Pk2RERUMrG4oSKz78tWkPNRBkREZGC8jISIiIhMCosbIiIiMiksboiIiMikFGrOTVZWFmJiYpCamooyZcqgdOnS+s5FREREVCg6j9w8f/4cK1euROvWrWFnZwdPT0/UrFkTZcqUQYUKFTB48GCcO3fOkFmJiIiI/pNOxc3ixYvh6emJ9evXw8fHBzt27EBISAhu3bqF06dPIzAwENnZ2ejQoQM6duyI27dvGzo3ERERUZ50Oi117tw5HD9+HLVr185zedOmTfHJJ59g1apVWL9+PU6cOIGqVavqNSgRERGRLnQqbn799VedNqZUKjF06NDXCkRERET0Oni1FBEREZkUScXN5cuX8c033+CHH35AXFyc1rKkpCR88skneg1HREREJJXOxc3+/fvRtGlTBAUFYf78+ahRowaOHDmiWZ6WloaNGzcaJCQRERGRrnQubqZPn46xY8fi2rVriIyMxLhx49C9e3fs3bvXkPmomNtzLRoAYKMwAx8ETkRERUHnm/hdv34dP//8MwBAJpNh3LhxKFeuHN5//30EBQWhSZMmBgtJxVNYzHPM/PMGAGBUu6qQsbohIqIioHNxo1QqkZCQoNXWt29fyOVy+Pn5YdGiRfrORsVYWqYKI365iIxsNVpXK4PBLSsZOxIREZUQOhc3Xl5eOHLkCBo1aqTV/uGHH0IIAX9/f72Ho+Jrxp/XcftxMsrYKrGoT33I5Ry1ISKioqFzcTNs2DAcP348z2UfffQRhBBYs2aN3oJR8XU6/CmCzt2HTAYs9fOCUymlsSMREVEJIhNCCGOHKEpJSU
mwt7dHYmIi7OzsjB3HJG06HYlpf1xH2xrOWDeQc7GIiOj1Sfn+5k38yGAsLfjjRURERY/fPkRERGRSWNwQERGRSWF
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"# Score the model with default parameters\n",
"score_rf, model_rf = score_the_model(\n",
" model=RandomForestClassifier(),\n",
" model_name='Random Forest',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bf17a3b5",
"metadata": {},
"source": [
"### KNN"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75fea43a",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeVhUZf/H8c+wgwouCCiiuJTmBgXh45JLopQ+mraZlhJP2YqWZE+aC6YplWVYmpRpptWjP8ts0dwoLcuyNEtzy31JUEQBUbaZ+f2hTE6gIgKH5f26rrly7jnnzOdA6vjle9+3yWq1WgUAAAAAAADgkhyMDgAAAAAAAACUdxTRAAAAAAAAgCugiAYAAAAAAABcAUU0AAAAAAAA4AooogEAAAAAAABXQBENAAAAAAAAuAKKaAAAAAAAAMAVUEQDAAAAAAAAroAiGgAAAAAAAHAFFNGASsBkMmnChAlXfd6BAwdkMpk0b968Es90rRYsWKAWLVrI2dlZNWvWNDpOhbdixQoFBwfLzc1NJpNJp0+fNjoSAAAAAFQoFNGAEjJv3jyZTCaZTCatX7++wOtWq1UBAQEymUz697//bUDCimPnzp168MEH1bRpU82ePVvvvPOO0ZEqtJMnT+ree++Vu7u7Zs6cqQULFqhatWql9n75vxd++eUXu/G0tDSFhYXJzc1NK1askCRNmDBBJpNJvr6+Onv2bIFrBQYGFvj9kv/77LXXXivyewMAAADAtaKIBpQwNzc3ffTRRwXG161bpyNHjsjV1dWAVBXL2rVrZbFYNH36dD344IO69957jY5Uof3888/KyMjQpEmT9NBDD+mBBx6Qs7NzmWZIT09Xz5499fvvv+vTTz/VbbfdZvf68ePHNWvWrKu65tSpUwstvAEAAABAaaCIBpSwXr16afHixcrLy7Mb/+ijjxQSEiI/Pz+DkpV/mZmZks4XVCSV6DTOqlxsKY2vZ/73qigyMjIUERGhLVu26JNPPtHtt99e4Jjg4GBNnTpV586dK9I1g4ODlZycrISEhCLnAAAAAIBrQRENKGEDBw7UyZMntXr1attYTk6OPv74Yw0aNKjQczIzM/XMM88oICBArq6uat68uV599VVZrVa747KzszVixAjVrVtXNWrUUN++fXXkyJFCr3n06FH95z//ka+vr1xdXdWqVSvNnTu3WPeUP0Xu22+/1aOPPqo6derI09NTQ4YM0alTpwoc/9VXX+mWW25RtWrVVKNGDfXu3Vt//PGH3TEPPvigqlevrr1796pXr16qUaOG7r//fgUGBio2NlaSVLdu3QLrvb311ltq1aqVXF1dVb9+fT355JMF1vfq2rWrWrdurU2bNqlz587y8PDQ888/b1sD7tVXX9XMmTPVpEkTeXh4qGfPnjp8+LCsVqsmTZqkBg0ayN3dXXfccYdSU1Ptrv3ZZ5+pd+/eql+/vlxdXdW0aVNNmjRJZrO50Azbt29Xt27d5OHhIX9/f73yyisFvl5ZWVmaMGGCrr/+erm5ualevXq68847tXfvXtsxFotF8fHxatWqldzc3OTr66tHH3200K//P3NERkZKkm6++WaZTCY9+OCDttcXL16skJAQubu7y9vbWw888ICOHj1apO9VUZw5c0a33XabNm/erE8++US9e/cu9Ljx48crOTm5yN1oHTt21K233qpXXnmlyIU3AAAAALgWFNGAEhYYGKj27dvrf//7n23sq6++Ulpamu67774Cx1utVvXt21evv/66brvtNk2bNk3NmzfXs88+q5iYGLtjH374YcXHx6tnz5566aWX5OzsXGhRIjk5Wf/617+0Zs0aRUdHa/r06WrWrJkeeughxcfHF/veoqOjtWPHDk2YMEFDhgzRhx9+qH79+tkV+xYsWKDevXurevXqevnllzVu3Dht375dnTp10oEDB+yul5eXp4iICPn4+OjVV1/VXXfdpfj4ePXv31+SNGvWLC1YsE
B33nmnpPPrZz355JOqX7++XnvtNd111116++231bNnT+Xm5tpd++TJk7r99tsVHBys+Ph4devWzfbahx9+qLfeekvDhg3TM888o3Xr1unee+/V2LFjtWLFCj333HN65JFH9MUXX2jkyJF21503b56qV6+umJgYTZ8+XSEhIRo/frxGjRpV4Ot16tQp3XbbbQoKCtJrr72mFi1a6LnnntNXX31lO8ZsNuvf//63XnjhBYWEhOi1117TU089pbS0NG3bts123KOPPqpnn31WHTt21PTp0xUVFaUPP/xQERERBe79YmPGjNEjjzwiSZo4caIWLFigRx991HYv9957rxwdHRUXF6ehQ4dqyZIl6tSpU4HCZGHfqyvJzMzU7bffrp9//lmLFy++7FqAt9xyy1UXxSZMmHBVhTcAAAAAuCZWACXivffes0qy/vzzz9YZM2ZYa9SoYT179qzVarVa77nnHmu3bt2sVqvV2qhRI2vv3r1t5y1dutQqyfriiy/aXe/uu++2mkwm6549e6xWq9W6ZcsWqyTrE088YXfcoEGDrJKssbGxtrGHHnrIWq9ePWtKSordsffdd5/Vy8vLlmv//v1WSdb33nuvSPcWEhJizcnJsY2/8sorVknWzz77zGq1Wq0ZGRnWmjVrWocOHWp3flJSktXLy8tuPDIy0irJOmrUqALvFxsba5VkPXHihG3s+PHjVhcXF2vPnj2tZrPZNj5jxgyrJOvcuXNtY126dLFKsiYkJNhdN/9+69ataz19+rRtfPTo0VZJ1qCgIGtubq5tfODAgVYXFxdrVlaWbSz/a3exRx991Orh4WF3XH6G+fPn28ays7Otfn5+1rvuuss2NnfuXKsk67Rp0wpc12KxWK1Wq/W7776zSrJ++OGHdq+vWLGi0PF/uvj/zXw5OTlWHx8fa+vWra3nzp2zjX/55ZdWSdbx48fbxi73vbrc+zVq1Mjq7OxsXbp06SWPvfh7vW7dugJfi3/+frFarVZJ1ieffNJqtVqt3bp1s/r5+dm+L4XdKwAAAACUBDrRgFJw77336ty5c/ryyy+VkZGhL7/88pJTOZcvXy5HR0cNHz7cbvyZZ56R1Wq1dS0tX75ckgoc9/TTT9s9t1qt+uSTT9SnTx9ZrValpKTYHhEREUpLS9PmzZuLdV+PPPKI3YL0jz/+uJycnGzZVq9erdOnT2vgwIF27+vo6Kh27drpm2++KXDNxx9/vEjvvWbNGuXk5Ojpp5+Wg8Pff3QNHTpUnp6eWrZsmd3xrq6uioqKKvRa99xzj7y8vGzP27VrJ0l64IEH5OTkZDeek5NjN73R3d3d9uuMjAylpKTolltu0dmzZ7Vz506796levboeeOAB23MXFxeFhYVp3759trFPPvlE3t7eGjZsWIGcJpNJ0vkpl15eXurRo4fd1zUkJETVq1cv9Ot6Jb/88ouOHz+uJ554Qm5ubrbx3r17q0WLFgW+nlLRv1f5kpOT5ebmpoCAgCId37lzZ3Xr1u2qu9GSkpJYGw0AAABAqaOIBpSCunXrKjw8XB999JGWLFkis9msu+++u9BjDx48qPr166tGjRp24zfccIPt9fz/Ojg4qGnTpnbHNW/e3O75iRMndPr0ab3zzjuqW7eu3SO/qJS/0PzVuu666+yeV69eXfXq1bNN0/zzzz8lSbfeemuB9161alWB93VyclKDBg2K9N75X4d/3q+Li4uaNGliez2fv7+/XFxcCr1Ww4YN7Z7nF9T+WezJH7943bE//vhD/fv3l5eXlzw9PVW3bl1boSwtLc3u/AYNGtgKYflq1apld729e/eqefPmdsW7f/rzzz+VlpYmHx+fAl/XM2fOFOv7eamvpyS1aNGiwNfzar5X+d5++225uLjotttu065du4p0ztUWxYpTeAMAAACA4rj0v9oAXJNBgwZp6NChSkpK0u23316iOyNejsVikXS+qyp/Qfl/atu2bam+94IFCwrdhfSfhSJXV1e7rr
KSdHHH2D85Ojpe1bj1wppvp0+fVpcuXeTp6amJEyeqadOmcnNz0+bNm/Xcc8/Z7r+o1ysqi8UiHx8fffjhh4W+Xrd
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABt3klEQVR4nO3dd3zM9x8H8Ndl3GUvkUQGkdgzVlS0RoWgVaulKEFLi2hLtbUpNYvSWrXpj1qlVTRKKlbtTSLIECMJaWSPS+4+vz80x0nCXdzlMl7Px+Me3Oc+3++975vL3TufKRFCCBARERGVE0aGDoCIiIhIl5jcEBERUbnC5IaIiIjKFSY3REREVK4wuSEiIqJyhckNERERlStMboiIiKhcMTF0ACVNqVTiwYMHsLa2hkQiMXQ4REREpAEhBNLS0uDq6gojoxe3zVS45ObBgwfw8PAwdBhERERUDHfv3oW7u/sL61S45Mba2hrAk4tjY2Nj4GiIiIhIE6mpqfDw8FB9j79IhUtu8ruibGxsmNwQERGVMZoMKeGAYiIiIipXmNwQERFRucLkhoiIiMoVJjdERERUrjC5ISIionKFyQ0RERGVK0xuiIiIqFxhckNERETlCpMbIiIiKleY3BAREVG5YtDk5ujRo+jWrRtcXV0hkUjw22+/vfSY0NBQNG3aFDKZDDVq1MCGDRv0HicRERGVHQZNbjIyMtC4cWMsW7ZMo/rR0dF466230L59e1y6dAmff/45PvroIxw4cEDPkRIREVFZYdCNM7t06YIuXbpoXH/lypWoXr06Fi5cCACoW7cujh8/ju+//x4BAQH6CpOIiIg0kJOnwOOMXOQplXC3tzBYHGVqV/CTJ0/C399frSwgIACff/55kcfk5OQgJydHdT81NVVf4REREZUbQghkyBVISpfj34wcPM6U4990OZIyntz+zZDj8X//5pel5+QBAFp5VcIvw18zWOxlKrmJj4+Hs7OzWpmzszNSU1ORlZUFc3PzAsfMmTMH33zzTUmFSEREVCoplQLJWblIyshBUsaTf//NkP+XvMjxOPO/pCU/gcmUQ56n1Pp5TIwkEBB6eAVaxGDQZy8BEyZMwNixY1X3U1NT4eHhYcCIiIiIXp08T6lqMXnSkpKDpOdaU/L/ffxf8qIsRs5hbmoMB0up6lbpv3/tn/l/JSsp7C2kqGQpg425CSQSie5fsBbKVHLj4uKChIQEtbKEhATY2NgU2moDADKZDDKZrCTCIyIiKhYhBDLlimcSkhz8m/4kIclvXclvTUn6737af11A2rIxM0ElK9nThMVCCgerp4mKeiIjg7nUWMevVv/KVHLTqlUr7N+/X63s4MGDaNWqlYEiIiIiKkipFEjJylUbj5KUn7QUKHuSwBSnC8jYSPJfi0khiYmVevLiYPmkdcXUuPwvcWfQ5CY9PR23b99W3Y+OjsalS5fg4OCAqlWrYsKECbh//z42bdoEAPjkk0+wdOlSfPXVVxg6dCj+/vtvbN++Hfv27TPUSyAiogpAnqdUDah92pqSo9aa8uxg2+J2AZmZGqGSpQz2lqZwsJQV0Zry9P82ZqYwMjJsF1BpZNDk5ty5c2jfvr3qfv7YmMDAQGzYsAFxcXGIjY1VPV69enXs27cPY8aMwZIlS+Du7o41a9ZwGjgREWns2S6gJLWxKU8H2j4/Gygtu/hdQE8Tk/+SFav/WlMsn3YH2Vs8aWmxkJapDpVSSyKEMOyQ5hKWmpoKW1tbpKSkwMbGxtDhEBHRK1IqBVKzn3YB5begPJ26nPN0NtB/M4Nyit0FZKo2HuX5FpZKzwy0tbesGF1AJUWb72+miEREVKrkKpQF1k9Rb11RL3+cmQtFMfqAZCZGT1tSLGVwsPgvUXlmfEr+/yuxC6hMYXJDRER6lSnPU1s7JX/mj/oicE+7gorbBWRtZqLWcvJsV5D9c2NVKllJYW5qbPApy6QfTG6IiEhj+V1A6q0phb
ewPM7Ixb8ZOcjO1b4LyEgCVetJgZk/llI4WKkPtrW3kEJqwi4geoLJDRFRBZarUKpWpn12pVr1ZfbzE5UnjxWnC0ia3wVUyNgUB0vZcwvBSWFrzi4gKj4mN0RE5UiWXKFaqTa/20fVmpL+zCJwGXL8m56D1OJ2AclMVGun5M/2eboQnAwOzw20tZCyC4hKDpMbIqJSSgiB1Kw8VbLyfFfQ84NuX6ULKL/7R21JfbVl9p+2rthZmEJmUvZWraWKg8kNEVEJyVMokZQpV41FedqCUvgy+48z5Mh7hS6gZ2f7FFxm/+ny+7bmpjBmFxCVI0xuiIiKKUuuUM3+eb51pbAWlpSs3GI9j5XMRG2sSmHL7OdvWuhgJYUlu4CogmNyQ0SE/7qAsvOe7v/zgk0L8wfbZuUqtH4eyTNdQAUH1hbctNDekl1ARNpickNE5VKeQonHmblqs31etGlhsbuAjI0KJCWqFpZnltmv9N9CcewCItI/JjdEVG7I85TYffEe1h6Pxs2E9GKdI78L6PmBtfbPdQvlt6pYyUzYBURUyjC5IaIyL1Oehy2nY7HmWDTiU7NV5RIJYGduqkpGHAptTXk68NbeQgozU3YBEZV1TG6IqMxKzpRjwz8x2PBPDJIznwzWdbaRYdgbXnjHxxWVLGXsAiKqgJjcEFGZE5+SjTXHorDlTCwy5U8G9XpWssAnbb3Rs6kbB+ASVXBMboiozIhJzMBPRyPx6/n7kCueLFZXr4oNRrb3RpcGVdhKQ0QAmNwQURlw/UEKVoRGYv/VOORPaPKt7oCR7bzRtlZlDuglIjVMboio1DoTnYTlobcRGvFIVfZmHSeMbOeN5p4OBoyMiEozJjdEVKoIIXA44iGWH47EuTuPATzZ++jtRq4Y0c4bdavYGDhCIirtmNwQUamQp1Bi39U4rAiNxI34NABPFsh7t7k7Pm7jhWqVLA0cIRGVFUxuiMigsnMV+PXCPfx0JAqxSZkAAEupMT54rRo+fL06nGzMDBwhEZU1TG6IyCDSc/Kw+dQdrDkejUdpOQAAB0sphvh5YlArT9hamBo4QiIqq5jcEFGJSsqQY/2JaGz8Jwap2XkAAFdbMwxr44W+LTxgIeXHEhG9Gn6KEFGJeJCchVVHo7D1bCyyc5+sUeNd2RKftPVGdx83SE2MDBwhEZUXTG6ISK9uP0zHyiOR+O3ifdWu243cbTGynTc61XOBERfeIyIdY3JDRHpx9V4KlofeRvD1eIj/Ft5r5VUJI9t74/Uajlx4j4j0hskNEemMEAIno/7FitBIHLuVqCrvWM8ZI9t5o0lVewNGR0QVBZMbInplSqXAofAELA+NxKW7yQAAYyMJujd2xSftvFHL2dqwARJRhcLkhoiKLVehxB+XH2BFaCRuPUwHAMhMjNC3hQeGveEFDwcLA0dIRBURkxsi0lp2rgLbz93FT0eicD85CwBgLTPBwFbVMKR1dVS2lhk4QiKqyJjcEJHGUrNz8fPJO1h/IhqJ6XIAgKOVFENfr44PXqsGGzMuvEdEhsfkhohe6lFaDtadiMb/Tt5BWs6Thffc7c3xcRsvvNfcA2amxgaOkIjoKSY3RFSku0mZWHU0CtvP3UVO3pOF92o5W2FEO2+83cgVpsZceI+ISh8mN0RUwM2ENKwIjcSeyw+g+G/hPR8PO4xqXwMd6jhx4T0iKtWY3BCRyoXYx1h+OBKHwhNUZW/UdMSIdt5o5VWJC+8RUZnA5IaoghNC4NitRCwPvY1TUUkAAIkE6FzfBSPb1UBDd1sDR0hEpB0mN0QVlFIpcOB6PJaHRuLq/RQAgImRBD2buOHjtt6o4WRl4AiJiIqHyQ1RBSPPU+K3S/ex8kgkoh5lAADMTY3xvu+Thfdc7cwNHCER0athckNUQWTK87D1zF2sPhaFuJRsAICNmQkG+3licOvqcLCUGjhCIiLdYHJDVM6lZOZi48kYrD8RjceZuQAAJ2sZPnqjOvq3rAYrGT8GiKh8Kd
anWm5uLuLj45GZmYnKlSvDwcFB13ER0St6mJqNNcejsfnUHWTIFQCAapUs8HEbb/Rq6saF94io3NI4uUlLS8P//vc
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Score the model with default parameters\n",
"score_knn, model_knn = score_the_model(\n",
" model=KNeighborsClassifier(),\n",
" model_name='KNN',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0b925bbf",
"metadata": {},
"source": [
"### Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33d0774a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/gasperspagnolo/Documents/faks_git/is_assignments/a2/code/.venv/lib64/python3.10/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
"STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
"\n",
"Increase the number of iterations (max_iter) or scale the data as shown in:\n",
" https://scikit-learn.org/stable/modules/preprocessing.html\n",
"Please also refer to the documentation for alternative solver options:\n",
" https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
" n_iter_i = _check_optimize_result(\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADsoUlEQVR4nOzdeVxV1f7/8fcBOQwOqDEpoqgNDjkUJuGQWSSpUVqWQylaWVaYSVaSIppTZRHeHChzyMqrDebtpmlKYqnkbGU55FwmKE4oKCjs3x/+3N/OBT1KwEZ5PR+P88i99tr7fPY5QIc3a69lMwzDEAAAAAAAAICLcrG6AAAAAAAAAKCsI0QDAAAAAAAAnCBEAwAAAAAAAJwgRAMAAAAAAACcIEQDAAAAAAAAnCBEAwAAAAAAAJwgRAMAAAAAAACcIEQDAAAAAAAAnCBEAwAAAAAAAJwgRMM1xWazaeTIkVd83N69e2Wz2TRr1qxir+mf+uijj9SgQQO5ubmpatWqVpdz1Vu8eLGaN28uDw8P2Ww2HT9+3OqSit2dd96pO++8s9jOFxwcrL59+xbb+VD0n1UAAAAArEOIhmI3a9Ys2Ww22Ww2rVy5ssB+wzAUFBQkm82m++67z4IKrx7btm1T3759Vb9+fU2bNk3vv/++1SVd1Y4cOaJHHnlEnp6emjx5sj766CNVrFixxJ7vwvfC+vXrS+w5isvq1as1cuTIEg8Vg4ODzZ8PNptNFStWVMuWLTV79uwSfV4AAAAA+KcqWF0Arl0eHh6aM2eO2rRp49C+YsUK/fnnn3J3d7eosqtHSkqK8vPzNXHiRF1//fVWl3PVW7dunU6ePKnRo0crPDzc6nJKzLfffnvFx6xevVqjRo1S3759C4x43L59u1xciu9vLs2bN9eLL74oSTp48KA++OADRUVFKScnR/379y+25ynLTp8+rQoV+F8wAAAAcDVhJBpKTKdOnfTZZ5/p3LlzDu1z5sxRSEiIAgICLKqs7MvKypIkHTp0SJKK9TbO7OzsYjvX1aYkXs8L71VZYrfbZbfbi+187u7ucnNzK7bzBQYG6rHHHtNjjz2ml156SStXrlSlSpX0zjvvFNtzXC6r3j8PDw9CNAAAAOAqQ4iGEtOzZ08dOXJES5cuNdtyc3P1+eefq1evXoUek5WVpRdffFFBQUFyd3fXTTfdpLfeekuGYTj0y8nJ0eDBg+Xr66vKlSvr/vvv159//lnoOQ8cOKDHH39c/v7+cnd3V+PGjTVjxowiXdOF2/O+//57Pf3007ruuutUpUoV9enTR8eOHSvQ/5tvvlHbtm1VsWJFVa5cWZ07d9avv/7q0Kdv376qVKmSdu3apU6dOqly5cp69NFHFRwcrPj4eEmSr69vgTmUpkyZosaNG8vd3V01a9bUc889V+BWvDvvvFM333yzNmzYoDvuuENeXl569dVXzTng3nrrLU2ePFn16tWTl5eXOnTooD/++EOGYWj06NGqVauWPD099cADD+jo0aMO5/7Pf/6jzp07q2bNmnJ3d1f9+vU1evRo5eXlFVrDb7/9pvbt28vLy0uBgYF68803C7xeZ86c0ciRI3XjjTfKw8NDNWrU0IMPPqhdu3aZffLz85WYmKjGjRvLw8ND/v7+evrppwt9/f+3jqioKEnSbbfdJpvN5jDP12effaaQkBB5enrKx8dHjz32mA4cOHBZ79U/tWnTJnXs2FFVqlRRpUqVdPfdd+vHH38s0O/nn39Wu3bt5OnpqVq1amnMmDGaOXOmbDab9u7d63Ct/zsn2rvvvqvGjRvLy8tL1apVU4sWLTRnzhxJ0siRI/XSSy9JkurWrWveannhnIXNiXb8+HENHjxYwcHBcnd3V61atdSnTx9lZGRc8fX7+vqqQYMGDu+zdPnvdX5+vkaOHKmaNWvKy8tL7du312+//Vag7gvfvytWrNCzzz4rPz8/1apVy9x/Od+vaWlp6tevn2rVqi
V3d3fVqFFDDzzwgMPrv379ekVERMjHx0eenp6qW7euHn/8cYfzFDYn2uV8HVy4hlWrVikmJka+vr6qWLGiunbtqsOHD1/uSw4AAACgCPgzOEpMcHCwwsLC9O9//1sdO3aUdP6X1BMnTqhHjx7617/+5dDfMAzdf//9Wr58uZ544gk1b95cS5Ys0UsvvaQDBw44jFJ58skn9fHHH6tXr15q1aqVvvvuO3Xu3LlADenp6br99ttls9kUHR0tX19fffPNN3riiSeUmZmpF154oUjXFh0drapVq2rkyJHavn27pk6dqn379iklJUU2m03S+QUBoqKiFBERoTfeeEPZ2dmaOnWq2rRpo02bNik4ONg837lz5xQREaE2bdrorbfekpeXl/r27avZs2fryy+/1NSpU1WpUiU1bdpU0vnQY9SoUQoPD9czzzxj1rBu3TqtWrXKYdTQkSNH1LFjR/Xo0UOPPfaY/P39zX2ffPKJcnNzNXDgQB09elRvvvmmHnnkEd11111KSUnRK6+8op07d+rdd9/VkCFDHMLHWbNmqVKlSoqJiVGlSpX03XffacSIEcrMzNSECRMcXq9jx47p3nvv1YMPPqhHHnlEn3/+uV555RU1adLE/NrIy8vTfffdp+TkZPXo0UODBg3SyZMntXTpUm3ZskX169eXJD399NOaNWuW+vXrp+eff1579uzRpEmTtGnTpgLX/nfDhg3TTTfdpPfff1+vvfaa6tata57zwvluu+02jR8/Xunp6Zo4caJWrVqlTZs2OYxcK+y9+id+/fVXtW3bVlWqVNHLL78sNzc3vffee7rzzju1YsUKhYaGSjofBrdv3142m02xsbGqWLGiPvjgg8u6LXratGl6/vnn1a1bNw0aNEhnzpzRzz//rDVr1qhXr1568MEHtWPHDv373//WO++8Ix8fH0nnw63CnDp1Sm3bttXWrVv1+OOP69Zbb1VGRoa++uor/fnnn+bxl+vcuXP6888/Va1aNYf2y32vY2Nj9eabbyoyMlIRERH66aefFBERoTNnzhT6fM8++6x8fX01YsQIcyTa5X6/PvTQQ/r11181cOBABQcH69ChQ1q6dKn2799vbnfo0EG+vr4aOnSoqlatqr1792r+/PmXfA0u9+vggoEDB6patWqKj4/X3r17lZiYqOjoaM2bN++KXnsAAAAAV8AAitnMmTMNSca6deuMSZMmGZUrVzays7MNwzCMhx9+2Gjfvr1hGIZRp04do3PnzuZxCxYsMCQZY8aMcThft27dDJvNZuzcudMwDMPYvHmzIcl49tlnHfr16tXLkGTEx8ebbU888YRRo0YNIyMjw6Fvjx49DG9vb7OuPXv2GJKMmTNnXta1hYSEGLm5uWb7m2++aUgy/vOf/xiGYRgnT540qlatavTv39/h+LS0NMPb29uhPSoqypBkDB06tMDzxcfHG5KMw4cPm22HDh0y7Ha70aFDByMvL89snzRpkiHJmDFjhtnWrl07Q5KRlJTkcN4L1+vr62scP37cbI+NjTUkGc2aNTPOnj1rtvfs2dOw2+3GmTNnzLYLr93fPf3004aXl5dDvws1zJ4922zLyckxAgICjIceeshsmzFjhiHJSEhIKHDe/Px8wzAM44cffjAkGZ988onD/sWLFxfa/r/+/rV5QW5uruHn52fcfPPNxunTp832r7/+2pBkjBgxwmy71Ht1uc/3v7p06WLY7XZj165dZttff/1lVK5c2bjjjjvMtoEDBxo2m83YtGmT2XbkyBGjevXqhiRjz549Znu7du2Mdu3amdsPPPCA0bhx40vWOmHChALnuaBOnTpGVFSUuT1ixAhDkjF//vwCfS+8VxdTp04do0OHDsbhw4eNw4cPG7/88ovRu3dvQ5Lx3HPPmf0u971OS0szKlSoYHTp0sWh38iRIw1JDnVfeD/atGljnDt3zmy/3O/XY8eOGZKMCRMmXPT6vvzyS6fvuWEYBX5WXe7XwYVrCA8Pd3
itBw8ebLi6ujp8PwMAAAAoXtzOiRL1yCOP6PTp0/r666918uRJff311xe9lXPRokVydXXV888/79D+4osvyjAMffP
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABYaUlEQVR4nO3deVhUZfsH8O+AzAAKqCGyOIK7orkSBlamorjkmoVLiuaSu0nuC7hbuZcLuWuviktavmm4kKYipYCYCmKiiCmgpIIIss3z+6Mf8zox0BycYWT8fq5rrst5znPOuc9hYG6fc5/nyIQQAkREREQmwszYARARERHpE5MbIiIiMilMboiIiMikMLkhIiIik8LkhoiIiEwKkxsiIiIyKUxuiIiIyKRUMHYAZU2lUuHevXuwsbGBTCYzdjhERESkAyEEnjx5AmdnZ5iZlTw288olN/fu3YNSqTR2GERERFQKd+7cQY0aNUrs88olNzY2NgD+Pjm2trZGjoaIiIh0kZGRAaVSqf4eL8krl9wUXoqytbVlckNERFTO6FJSwoJiIiIiMilMboiIiMikMLkhIiIik8LkhoiIiEwKkxsiIiIyKUxuiIiIyKQwuSEiIiKTwuSGiIiITAqTGyIiIjIpTG6IiIjIpBg1uTl9+jS6d+8OZ2dnyGQyfP/99/+6zqlTp9CyZUsoFArUrVsX27ZtM3icREREVH4YNbl5+vQpmjVrhrVr1+rU/9atW+jWrRvatWuHmJgYfPrppxg+fDiOHj1q4EiJiIiovDDqgzO7dOmCLl266Nw/ODgYtWrVwvLlywEAjRo1wtmzZ7Fy5Ur4+voaKkwiIjIQIQSy8wqMHQYZgJWFuU4PuTSEcvVU8IiICPj4+Gi0+fr64tNPPy12nZycHOTk5KjfZ2RkGCo8IiKSQAiBvsERiLr9yNihkAHEzveFtdw4aUa5KihOSUlB9erVNdqqV6+OjIwMZGdna11nyZIlsLOzU7+USmVZhEpERP8iO6+AiQ0ZRLkauSmNGTNmICAgQP0+IyODCQ4R0UsmcrYPrOXmxg6D9MjKwng/z3KV3Dg6OiI1NVWjLTU1Fba2trCystK6jkKhgEKhKIvwiOgVxJqR0svK/d95s5abG+0SBpmecvVJ8vLywpEjRzTajh8/Di8vLyNFRESvMtaMEL2cjFpzk5mZiZiYGMTExAD4+1bvmJgYJCUlAfj7ktLgwYPV/UeNGoWbN29i6tSpuHbtGtatW4e9e/di0qRJxgifiF5xrBnRDw/XKka9hEGmx6gjN5GRkWjXrp36fWFtjL+/P7Zt24bk5GR1ogMAtWrVwuHDhzFp0iSsXr0aNWrUwKZNm3gbOBEZHWtGSs+YtwyTaTJqcvPuu+9CCFHscm2zD7/77ru4ePGiAaMiIpKONSNELw/+JhKRTlg4W9TzBbFE9PJgckNE/4qFs0RUnpSrSfyIyDhYOFsyFsQSvVw4ckNEkrBwtigWxBK9XJjcEJEkLJwlopcd/0IRveJ0KRRm4SwRlSdMboheYSwUJiJTxIJioleY1EJhFs4SUXnAkRsiAqBboTALZ4moPGByQ1SOvejEenwqMxGZIv4lIyqnWC9DRKQda26Iyil9TqzHWhoiMiUcuSEyAS86sR5raYjIlDC5ITIBrJchIvof/jUk0rOyeno2J9YjItKOyQ2RHrHIl4jI+FhQTKRHxnh6NouBiYg0ceSGyEDK6unZLAYmItLE5IbIQFjkS0RkHLwsRURERCaFyQ0RERGZFCY3REREZFKY3BAREZFJYbUjUSkUN1EfJ9YjIjI+JjdEEnGiPiKilxsvSxFJpMtEfZxYj4jIeDhyQ/QCipuojxPrEREZD5MbKtfK6iGVz3u+roYT9RERvXz4V5nKLda+EBGRNqy5oXLLGA+pfB7raoiIXk4cuSGTUFYPqXwe62qIiF5OTG7IJLD2hYiICvHbgAzOUEW/nDCPiIi0YXJDBsWiXy
IiKmssKCaDKouiXxb2EhHR8zhyQ2XGUEW/LOwlIqLnMbkhgyiss+GEd0REVNb4TUN6xzobIiIyJtbckN5pq7NhXQwREZWVUo3c5OXlISUlBVlZWahWrRqqVq2q77jIRBTW2bAuhoiIyorOIzdPnjzB+vXr0bZtW9ja2sLNzQ2NGjVCtWrV4OrqihEjRuDChQuGjJXKocI6GyY2RERUVnRKblasWAE3Nzds3boVPj4++P777xETE4Pr168jIiICQUFByM/PR6dOndC5c2f88ccfho6bXjJCCGTl5v//i5PrERGR8eh0WerChQs4ffo0GjdurHW5p6cnPv74YwQHB2Pr1q04c+YM6tWrp9dA6eXFAmIiInqZ6JTc7N69W6eNKRQKjBo16oUCovKnuIn6WERMRETGwFvBSa+en6iPRcRERGQMkm4Fv3TpEhYuXIh169YhLS1NY1lGRgY+/vhjvQZHL7f/1dkUnaiPRcRERGQsMiGE0KXjsWPH0L17d9SrVw9PnjzB06dPsW/fPrRr1w4AkJqaCmdnZxQUvNzFpBkZGbCzs0N6ejpsbW2NHU65VVydTex8X85CTEREeifl+1vnkZu5c+di8uTJuHLlChITEzF16lT06NEDoaGhLxwwlT+cqI+IiF5WOv8X++rVq/j2228BADKZDFOnTkWNGjXQt29fhISE4I033jBYkPRy40R9RET0MtE5uVEoFHj8+LFG24ABA2BmZgY/Pz8sX75c37FROcEHYhIR0ctE52+k5s2b4+TJk2jVqpVGe79+/SCEgL+/v96DIyIiIpJK5+Rm9OjROH36tNZl/fv3hxACGzdu1FtgRERERKWh891SpoJ3S+lHVm4+3AOPAuAdUkREZHgGuVuKiIiIqDxgckNEREQmhckNERERmRQmN0RERGRSjJ7crF27Fm5ubrC0tETr1q1x/vz5EvuvWrUKDRo0gJWVFZRKJSZNmoRnz56VUbRERET0sitVcnP69GlERkZqtEVGRhZ7q3hx9uzZg4CAAAQFBSE6OhrNmjWDr68v7t+/r7X/rl27MH36dAQFBSEuLg6bN2/Gnj17MHPmzNIcBhEREZmgUiU37777LgYPHqzRNmjQIPVDNHW1YsUKjBgxAkOHDoW7uzuCg4NhbW2NLVu2aO1/7tw5tGnTBgMGDICbmxs6deqE/v37lzjak5OTg4yMDI0XERERma5SJTe3bt3CiRMnNNrCwsJw8+ZNnbeRm5uLqKgo+Pj4/C8YMzP4+PggIiJC6zre3t6IiopSJzM3b97EkSNH0LVr12L3s2TJEtjZ2alfSqVS5xiJiIio/CnVzGuurq5F2pydnSVtIy0tDQUFBahevbpGe/Xq1XHt2jWt6wwYMABpaWl46623IIRAfn4+Ro0aVeJlqRkzZiAgIED9PiMjgwkOERGRCTN6QbEUp06dwuLFi7Fu3TpER0fjwIEDOHz4MBYsWFDsOgqFAra2thovIiIiMl06jdxUqVIFMplMpw0+fPhQp3729vYwNzdHamqqRntqaiocHR21rjNnzhwMGjQIw4cPBwC8/vrrePr0KUaOHIlZs2bBzKxc5WpERERkADolN6tWrdL7juVyOVq1aoWwsDD06tULAKBSqRAWFoZx48ZpXScrK6tIAmNubg4AeMUekUVERETF0Cm58ff3N8jOAwIC4O/vDw8PD3h6emLVqlV4+vQphg4dCgAYPHgwXFxcsGTJEgBA9+7dsWLFCrRo0QKtW7fGjRs3MGfOHHTv3l2d5BAREdGrrVQFxQkJCdi6dSsSEhKwevVqODg44KeffkLNmjXRuHFjnbfj5+eHBw8eIDAwECkpKWjevDlCQ0PVRcZJSUkaIzWzZ8+GTCbD7NmzcffuXVSrVg3du3fHokWLSnMYREREZIJkQuL1nF9++QVdunRBmzZtcPr0acTFxaF27dr4/PPPERkZif379xsqVr2Q8sh0Kl5Wbj7cA48CAGLn+8JaXqo8mYiISCdSvr
8lV+BOnz4dCxcuxPHjxyGXy9Xt7du3x6+//io9WjIKIQSycvNf4FVg7EMgIiLSSvJ/ty9fvoxdu3YVaXdwcEBaWpp
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Score the model with default parameters\n",
"score_log_reg, model_log_reg = score_the_model(\n",
" model=LogisticRegression(max_iter=100),\n",
" model_name='Logistic Regression',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "641e5a5a",
"metadata": {},
"source": [
"### SVM"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96adfe07",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADckklEQVR4nOzdf3zN9f//8fvZsbMfxqjZxlqGyvyWjeVHqYwVH79KREVL+mWS9YuMiTKlNOXHSn5EvEnJuzciVlOy/Jj0roSQH2/ZWGJs2Jy9vn/0dd6d98Zhtr0Ou10vl3PJeb6er9fr8Tpn8nL3fD2fFsMwDAEAAAAAAAA4Lw+zCwAAAAAAAADcHSEaAAAAAAAA4AIhGgAAAAAAAOACIRoAAAAAAADgAiEaAAAAAAAA4AIhGgAAAAAAAOACIRoAAAAAAADgAiEaAAAAAAAA4AIhGgAAAAAAAOACIRpwFbBYLBozZswl77d3715ZLBbNmTOn1Gu6XPPmzVN4eLg8PT1VrVo1s8u54q1cuVLNmzeXt7e3LBaLjh07ZnZJAAAAAHBFIUQDSsmcOXNksVhksVi0bt26ItsNw1BoaKgsFov+7//+z4QKrxzbt2/Xww8/rHr16mnGjBl67733zC7pivbHH3+od+/e8vHx0dSpUzVv3jxVrly5TM/5448/qlevXqpdu7a8vb0VEhKijh076p133pEkbdmyRRaLRQkJCec9xq+//iqLxaL4+HhJ0pgxY2SxWOTh4aEDBw4U6Z+TkyMfHx9ZLBbFxcWVzYUBAAAAqLAI0YBS5u3trQULFhRpX7t2rf7zn//Iy8vLhKquLGlpaSosLNTkyZP18MMPq3fv3maXdEXbtGmTTpw4oXHjxmngwIF68MEH5enpWWbnW79+vSIjI/XDDz9o0KBBmjJlih599FF5eHho8uTJkqQWLVooPDxc//jHP857nHO/jx588EGndi8vr2L3W7JkSSleBQAAAAA4I0QDSlnnzp21ePFinT171ql9wYIFioiIUHBwsEmVub/c3FxJ0uHDhyWpVB/jzMvLK7VjXWnK4vM8910V59VXX5W/v782bdqkhIQEPfroo3r55Ze1atUqrV+/3tHvgQce0J49e/Tdd98Ve5x//OMfCg8PV4sWLZzaO3fuXGyItmDBAnXp0qWEVwQAAAAAF0aIBpSyvn376o8//tDq1asdbfn5+fr444/Vr1+/YvfJzc3Vs88+q9DQUHl5eal+/fp64403ZBiGU78zZ85o2LBhqlGjhqpUqaJu3brpP//5T7HHPHjwoB555BEFBQXJy8tLjRo10qxZs0p0TeceVf3666/1+OOP69prr1XVqlXVv39//fnnn0X6f/7557r11ltVuXJlValSRV26dNHPP//s1Ofhhx+Wn5+fdu/erc6dO6tKlSp64IEHFBYWpsTERElSjRo1isz3Nm3aNDVq1EheXl6qVauWBg8eXGR+r9tvv12NGzdWRkaGbrvtNvn6+uqll15yzAH3xhtvaOrUqapbt658fX3VqVMnHThwQIZhaNy4cbruuuvk4+Oj7t276+jRo07H/uc//6kuXbqoVq1a8vLyUr169TRu3DjZ7fZia9i2bZvuuOMO+fr6KiQkRK+//nqRz+v06dMaM2aMbrrpJnl7e6tmzZq65557tHv3bkefwsJCJScnq1GjRvL29lZQUJAef/zxYj///61jwIABkqSWLVvKYrHo4YcfdmxfvHixIiIi5OPjo4CAAD344IM6ePDgRX1X57N79241atSo2NAuMDDQ8etzxyhu5GZGRoZ27NhR7Hn69eunrVu3avv27Y62zMxMffnll+f9PQYAAAAAl4sQDShlYWFhat26tdNImc8//1zHjx/X/fffX6S/YRjq1q2b3nrrLd11112aNGmS6tevr+eff94xF9Q5jz76qJKTk9WpUydNmDBBnp6exY68ycrK0i233KI1a9YoLi5OkydP1g033KCBAw
cqOTm5xNcWFxenX375RWPGjFH//v01f/589ejRwynsmzdvnrp06SI/Pz+99tprGjVqlLZt26Z27dpp7969Tsc7e/asYmJiFBgYqDfeeEP33nuvkpOT1bNnT0nS9OnTNW/ePN1zzz2S/poTa/DgwapVq5befPNN3XvvvXr33XfVqVMnFRQUOB37jz/+0N13363mzZsrOTlZd9xxh2Pb/PnzNW3aNA0ZMkTPPvus1q5dq969eyshIUErV67Uiy++qMcee0z/+te/9Nxzzzkdd86cOfLz81N8fLwmT56siIgIjR49WsOHDy/yef3555+666671KxZM7355psKDw/Xiy++qM8//9zRx2636//+7//08ssvKyIiQm+++aaGDh2q48eP66effnL0e/zxx/X888+rbdu2mjx5smJjYzV//nzFxMQUufa/GzlypB577DFJ0tixYzVv3jw9/vjjjmvp3bu3rFarkpKSNGjQIC1ZskTt2rUrEkwW912dT+3atZWRkeFUf3Hq1KmjNm3a6KOPPioSQp4L1ooLxW677TZdd911TuHbokWL5Ofnx0g0AAAAAGXHAFAqZs+ebUgyNm3aZEyZMsWoUqWKkZeXZxiGYdx3333GHXfcYRiGYdSuXdvo0qWLY7+lS5cakoxXXnnF6Xi9evUyLBaLsWvXLsMwDGPr1q2GJOOpp55y6tevXz9DkpGYmOhoGzhwoFGzZk0jOzvbqe/9999v+Pv7O+r67bffDEnG7NmzL+raIiIijPz8fEf766+/bkgy/vnPfxqGYRgnTpwwqlWrZgwaNMhp/8zMTMPf39+pfcCAAYYkY/jw4UXOl5iYaEgyjhw54mg7fPiwYbPZjE6dOhl2u93RPmXKFEOSMWvWLEdb+/btDUlGSkqK03HPXW+NGjWMY8eOOdpHjBhhSDKaNWtmFBQUONr79u1r2Gw24/Tp0462c5/d3z3++OOGr6+vU79zNcydO9fRdubMGSM4ONi49957HW2zZs0yJBmTJk0qctzCwkLDMAzjm2++MSQZ8+fPd9q+cuXKYtv/199/Ns/Jz883AgMDjcaNGxunTp1ytC9btsyQZIwePdrRdqHvqjhffPGFYbVaDavVarRu3dp44YUXjFWrVjn97JwzdepUQ5KxatUqR5vdbjdCQkKM1q1bO/X9+8/Fc889Z9xwww2ObS1btjRiY2MNwzAMScbgwYMvqlYAAAAAuFiMRAPKQO/evXXq1CktW7ZMJ06c0LJly877mNmKFStktVr19NNPO7U/++yzMgzDMWppxYoVklSk3zPPPOP03jAMffLJJ+ratasMw1B2drbjFRMTo+PHj2vLli0luq7HHnvMaUL6J598UpUqVXLUtnr1ah07dkx9+/Z1Oq/ValVUVJS++uqrIsd88sknL+rca9asUX5+vp555hl5ePz3f12DBg1S1apVtXz5cqf+Xl5eio2NLfZY9913n/z9/R3vo6KiJP01gX2lSpWc2vPz850eb/Tx8XH8+sSJE8rOztatt96qvLw8p8cLJcnPz89pUnybzaZWrVppz549jrZPPvlEAQEBGjJkSJE6LRaLpL8eufT391fHjh2dPteIiAj5+fkV+7m6snnzZh0+fFhPPfWUvL29He1dunRReHh4kc9TuvjvqmPHjkpPT1e3bt30ww8/6PXXX1dMTIxCQkL02WefOfXt06ePPD09nUaVrV27VgcPHrzgI6P9+vXTrl27tGnTJsd/eZQTAAAAQFmq5LoLgEtVo0YNRUdHa8GCBcrLy5PdblevXr2K7btv3z7VqlVLVapUcWpv0KCBY/u5/3p4eKhevXpO/erXr+/0/siRIzp27Jjee+89vffee8We89xE85fqxhtvdHrv5+enmjVrOh7T/PXXXyVJd955Z7H7V61a1el9pUqVdN11113Uuc99Dv97vTabTXXr1nVsPyckJEQ2m63YY11//fVO788FaqGhocW2/33esZ9//lkJCQn68ssvlZOT49
T/+PHjTu+vu+46RxB2TvXq1fXvf//b8X737t2qX7++U3j3v3799VcdP37caT6xvyvJ93m+z1OSwsPDtW7dOqe2S/m
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABSBUlEQVR4nO3dd1hT5/s/8HcYCSDTDzJEFPdWVKoVW1dRHHXWihut1bqtfKxbcVSxtlqtizpRv7Y42lpbqXsrDhSsEyuIuFApCrJHnt8f/ZFPI0FzMCEQ3q/rynWRJ885584xkNvn3Od5ZEIIASIiIiIjYWLoAIiIiIh0ickNERERGRUmN0RERGRUmNwQERGRUWFyQ0REREaFyQ0REREZFSY3REREZFTMDB1AcVMqlXj06BFsbGwgk8kMHQ4RERFpQQiBly9fomLFijAxef3YTJlLbh49egR3d3dDh0FERERFcP/+fVSqVOm1fcpccmNjYwPgn5Nja2tr4GiIiIhIGykpKXB3d1d9j79OmUtu8i9F2draMrkhIiIqZbQpKWFBMRERERkVJjdERERkVJjcEBERkVFhckNERERGhckNERERGRUmN0RERGRUmNwQERGRUWFyQ0REREaFyQ0REREZFSY3REREZFQMmtycPHkS3bp1Q8WKFSGTybBnz543bnP8+HE0bdoUCoUCNWrUQEhIiN7jJCIiotLDoMlNWloaGjdujNWrV2vV/+7du+jatSvatWuHqKgofP755/j0009x4MABPUdKREREpYVBF87s3LkzOnfurHX/4OBgVK1aFUuXLgUA1K1bF6dPn8a3334LX19ffYVJRERUgBACGTl5hg6jxLI0N9VqkUt9KFWrgoeHh8PHx0etzdfXF59//nmh22RlZSErK0v1PCUlRV/hERFRGSGEQJ/gcFy699zQoZRYN+b7wkpumDSjVBUUJyQkwNnZWa3N2dkZKSkpyMjI0LhNUFAQ7OzsVA93d/fiCJWIiIxYRk4eE5sSrFSN3BTF9OnTERAQoHqekpLCBIeIiHQmYpYPrOSmhg6jxLE0N9w5KVXJjYuLC548eaLW9uTJE9ja2sLS0lLjNgqFAgqFojjCIyIiAzBE7Ut69v+OZyU3NdjlF9KsVP1rtGzZEmFhYWpthw4dQsuWLQ0UERERGRJrX0gTg9bcpKamIioqClFRUQD+udU7KioK8fHxAP65pDRkyBBV/1GjRiE2NhZTpkzBrVu3sGbNGuzcuROTJk0yRPhERGRghq598ariYNDLL6SZQUduIiIi0K5dO9Xz/NoYf39/hISE4PHjx6pEBwCqVq2Kffv2YdKkSVixYgUqVaqEDRs28DZwIiIySO2LIW93psIZNLlp27YthBCFvq5p9uG2bdsiMjJSj1EREVFpxNoXysdPARGVaJwojV7n34W9RPmY3BBRicViUSIqilI1iR8RlS2GLhal0oOFvfRvHLkholKBE6XR67Cwl/6NyQ0RlQosFiUibfEvBRFpVBIKeVksSkRFweSGiApgIS8RlWYsKCaiAkpaIS+LRYlICo7cENFrlYRCXhaLEpEUTG7IqJSEOhFjwBWPiag0418sMhqsEyEiIoA1N2RESlqdiDFgrQsRlUYcuSGjVBLqRIwBa12IqDRickNGiXUiRERlF//6U6mgTaEwJ3wjIiKAyQ2VAiwUJiIiKVhQTCWe1EJhFsESEZVtHLmhUkWbQmEWwRIRlW1MbqjIimvCPE4oR0REUvBbgoqEdTBERFRSseaGisQQE+axloaIiLTBkRt6a8U1YR5raYiISBtMbuitsQ6GiIhKEn4jkST5RcScMI+IiEoqJjekNRYRExFRacCCYtKapiJiFvkSEVFJw5EbKpL8ImIW+RIRUUnD5IaKhEXERERUUvGyFBERERkVJjdERERkVJjcEBERkVFhckNERERGhckNERERGRUmN0RERGRUmNwQERGRUWFyQ0REREaFs7DRa+
UvlAmAi2USEVGpwOSGCsWFMomIqDTiZSkqlKaFMgEulklERCVbkUZucnJykJCQgPT0dFSoUAHly5fXdVxUwuQvlAmAi2USEVGJpvXIzcuXL7F27Vq0adMGtra28PDwQN26dVGhQgVUqVIFI0aMwMWLF/UZKxlQ/kKZVnIzJjZERFSiaZXcLFu2DB4eHti8eTN8fHywZ88eREVF4fbt2wgPD0dgYCByc3PRsWNHdOrUCX/99Ze+4yY9EkIgPTuXBcRERFQqaXVZ6uLFizh58iTq16+v8fXmzZvjk08+QXBwMDZv3oxTp06hZs2aOg2UigeLiImIqLTTKrn58ccftdqZQqHAqFGj3iogMixNRcQsICYiotKEt4JTofKLiFlATEREpYmkW8GvXLmCL7/8EmvWrEFiYqLaaykpKfjkk090GhwZVn4RMRMbIiIqTbRObg4ePIjmzZsjNDQUX331FerUqYNjx46pXs/IyMCWLVv0EiQRERGRtrRObubOnYvJkyfj2rVriIuLw5QpU9C9e3fs379fn/ERERERSaJ1zc3169exbds2AIBMJsOUKVNQqVIl9OnTB6GhoXjnnXf0FiQRERGRtrRObhQKBV68eKHWNmDAAJiYmMDPzw9Lly7VdWxEREREkmmd3Hh6euLYsWNo1qyZWnu/fv0ghIC/v7/Og6Pik7/6NyfuIyKi0k7r5Gb06NE4efKkxtf69+8PIQTWr1+vs8Co+HDiPiIiMiYyIYQwdBDFKSUlBXZ2dkhOToatra2hwykR0rNzUW/OAbU2ryoO2DWqJW8DJyKiEkHK9zcn8SM1nLiPiIhKOyY3ZVR+jQ0AtTqb/In7iIiISit+i5VBrLEhIiJjJmn5BTIOmhbHBLhAJhERGQeDJzerV6+Gh4cHLCws0KJFC1y4cOG1/ZcvX47atWvD0tIS7u7umDRpEjIzM4spWuMTMcsHN+b74sZ8XxYQExGRUShScnPy5ElERESotUVERBR6q3hhduzYgYCAAAQGBuLy5cto3LgxfH198fTpU439f/jhB0ybNg2BgYG4efMmNm7ciB07dmDGjBlFeRuE/9XYcIFMIiIyFkVKbtq2bYshQ4aotQ0ePBjt2rWTtJ9ly5ZhxIgRGDZsGOrVq4fg4GBYWVlh06ZNGvufPXsWrVq1woABA+Dh4YGOHTuif//+rx3tycrKQkpKitqjrBJCID07lxP1ERGRUStScnP37l0cPnxYre3IkSOIjY3Veh/Z2dm4dOkSfHx8/heMiQl8fHwQHh6ucRtvb29cunRJlczExsYiLCwMXbp0KfQ4QUFBsLOzUz3c3d21jtGY5BcR15tzAF5fHn7zBkRERKVUke6WqlKlSoG2ihUrStpHYmIi8vLy4OzsrNbu7OyMW7duadxmwIABSExMxHvvvQchBHJzczFq1KjXXpaaPn06AgICVM9TUlLKZIKjqYiYBcRERGSMStWt4MePH8eiRYuwZs0atGjRAnfu3MHEiROxYMECzJ49W+M2CoUCCoWimCMt2ThRHxERGTOtkhsHBwetvwSTkpK06ufo6AhTU1M8efJErf3JkydwcXHRuM3s2bMxePBgfPrppwCAhg0bIi0tDSNHjsTMmTNhYmLwm79KBU7UR0RExkyrb7jly5fr/MByuRzNmjXDkSNH0LNnTwCAUqnEkSNHMG7cOI3bpKenF0hgTE3/uaxSxpbIIiIiokJoldz4+/vr5eABAQHw9/eHl5cXmjdvjuXLlyMtLQ3Dhg0DAAwZMgRubm4ICgoCAHTr1g3Lli1DkyZNVJelZs+ejW7duqmSHCIiIirbinRtIiYmBps3b0ZMTAxWrFgBJycn/PHHH6hcuTLq16+v9X78/Pzw7NkzzJkzBwkJCfD09MT+/ftVRcbx8fFqIzWzZs2CTCbDrFmz8PDhQ1SoUAHdunXDwoULi/I2iIiIyAjJhMTrOSdOnEDnzp3RqlUrnDx5Ejdv3kS1atWwePFiREREYPfu3f
qKVSekLJluTNKzc1FvzgEAwI35vqy5ISKiUkXK97fkCtxp06bhyy+/xKFDhyCXy1Xt7du3x7lz56RHS0RERKRDkpO
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.svm import SVC\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Standardize the features: fit the scaler on the training set only, then reuse it on the test set\n",
"scaler = StandardScaler()\n",
"X_train_scaled = scaler.fit_transform(X_train)\n",
"X_test_scaled = scaler.transform(X_test)\n",
"\n",
"# Score the model with default parameters\n",
"scores_svm, model_svm = score_the_model(\n",
" model=SVC(),\n",
" model_name='SVM',\n",
" random_seed=42,\n",
" X_train=X_train_scaled,\n",
" X_test=X_test_scaled,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0842608e",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADwcUlEQVR4nOzdeVxV1f7/8fdhBgfUEBwiUfM6pwbJdWjwRlIaamWamiKVZYmaZKUpYk5UdgnLgbTUtLzZYNZNc4jUMskBtJuVmiNmgqIpCgp62L8/+rm/nkCPA7BRXs/H4zxyr73W3p99Dhi8XXsvm2EYhgAAAAAAAABckIvVBQAAAAAAAABlHSEaAAAAAAAA4AQhGgAAAAAAAOAEIRoAAAAAAADgBCEaAAAAAAAA4AQhGgAAAAAAAOAEIRoAAAAAAADgBCEaAAAAAAAA4AQhGgAAAAAAAOAEIRquezabTWPHjr3scXv37pXNZtPcuXOLvaarNX/+fDVq1Eju7u6qUqWK1eVc85YtW6aWLVvKy8tLNptNx44ds7qkUtO/f38FBQU5tF3p9wyKxvsJAAAAXB8I0VAq5s6dK5vNJpvNprVr1xbabxiGAgMDZbPZdP/991tQ4bVj27Zt6t+/v+rXr69Zs2Zp5syZVpd0TTty5Ih69Oghb29vTZs2TfPnz1eFChVK/Lx79uxRdHS0/vGPf8jHx0c+Pj5q0qSJBg0apP/9738lfn6rLViwQImJiZfcPygoyPw7xGazycvLSw0aNNDzzz+vo0ePllyhl2jp0qUEZQAAAMB1zs3qAlC+eHl5acGCBWrfvr1D+5o1a/T777/L09PTosquHatXr1ZBQYGmTJmim2++2epyrnkbN27UiRMnNH78eIWFhZXKOb/88kv17NlTbm5u6tOnj1q0aCEXFxdt27ZNixYt0owZM7Rnzx7VqVOnVOr5u1OnTsnNrWT/97BgwQJt3bpVzz777CWPadmypZ577jlJ0unTp5WamqrExEStWbNGGzZsKKFKL83SpUs1bdq0IoO00ng/AQAAAJQ8fqpHqerUqZM+/vhjvfnmmw6/VC5YsEDBwcHKysqysLqyLScnRxUqVNChQ4ckqVhv48zNzZWPj0+xHe9aUhLv57nPqii7du3SI488ojp16ig5OVk1a9Z02P/qq69q+vTpcnG5+EThi53janl5eZXIca9W7dq19eijj5rbTzzxhCpWrKjXX39dv/32mxo0aGBhdRdWVt9PAAAAAJeH2zlRqnr16qUjR45o5cqVZlt+fr4++eQT9e7du8gxOTk5eu655xQYGChPT081bNhQr7/+ugzDcOiXl5enYcOGqXr16qpUqZK6dOmi33//vchjHjhwQI899pgCAgLk6emppk2bavbs2Vd0TeduVf3222/11FNP6YYbblDlypXVr18//fnnn4X6f/XVV7r99ttVoUIFVapUSZ07d9bPP//s0Kd///6qWLGidu3apU6dOqlSpUrq06ePgoKCFBcXJ0mqXr16oWctTZ8+XU2bNpWnp6dq1aqlQYMGFXq+11133aVmzZopNTVVd9xxh3x8fPTSSy+Zz4B7/fXXNW3aNNWrV08+Pj7q2LGj9u/fL8MwNH78eN14443y9vZW165dC91G9/nnn6tz586qVauWPD09Vb9+fY0fP152u73IGn755Rd16NBBPj4+ql27tl577bVC79fp06c1duxY/eMf/5CXl5dq1qypBx98ULt27TL7FBQUKDExUU2bNpWXl5cCAgL01FNPFfn+/72OyMhISdJtt90mm82m/v37m/s//vhjBQcHy9vbW35+fnr00Ud14MCBS/qsLuS1115TTk6O5syZUyhAkyQ3NzcNGTJEgYGBl3SO7777Tg8//LBuuukmeXp6KjAwUMOGDdOpU6cKHXvx4sVq1qyZvLy81KxZM3322WdF1ljUM7wu5Xtm9erVstls+uijjzRx4kTdeOON8vLy0t1336
2dO3ea/e666y4tWbJE+/btM2/P/Ptz2S5VjRo1JKnQTK9vvvnG/D6rUqWKunbtql9//bXQ+M2bN+u+++5T5cqVVbFiRd1999364YcfHPqcOXNGL7/8sho0aCAvLy/dcMMNat++vfn3WP/+/TVt2jTzvTv3Oufv7+fYsWNls9m0c+dO9e/fX1WqVJGvr6+ioqKUm5vrcO5Tp05pyJAh8vPzM/9eO3DgAM9ZAwAAACzATDSUqqCgILVp00b/+c9/dN9990n6K1Q6fvy4HnnkEb355psO/Q3DUJcuXbRq1So9/vjjatmypZYvX67nn39eBw4c0BtvvGH2feKJJ/T++++rd+/eatu2rb755ht17ty5UA2ZmZn65z//KZvNpujoaFWvXl1fffWVHn/8cWVnZ1/W7WXni46OVpUqVTR27Fht375dM2bM0L59+8xgQfprQYDIyEiFh4fr1VdfVW5urmbMmKH27dtr8+bNDkHC2bNnFR4ervbt2+v111+Xj4+P+vfvr3nz5umzzz7TjBkzVLFiRd1yyy2S/vrF/OWXX1ZYWJiefvpps4aNGzfq+++/l7u7u3nsI0eO6L777tMjjzyiRx99VAEBAea+Dz74QPn5+Ro8eLCOHj2q1157TT169NC//vUvrV69Wi+++KJ27typt956S8OHD3cIUubOnauKFSsqJiZGFStW1DfffKMxY8YoOztbkydPdni//vzzT91777168MEH1aNHD33yySd68cUX1bx5c/Nrw2636/7771dycrIeeeQRDR06VCdOnNDKlSu1detW1a9fX5L01FNPae7cuYqKitKQIUO0Z88eTZ06VZs3by507ecbNWqUGjZsqJkzZ2rcuHGqW7euecxzx7vtttsUHx+vzMxMTZkyRd9//702b97sMHOtqM/qQr788kvdfPPNCg0NvWCfolzoHB9//LFyc3P19NNP64YbbtCGDRv01ltv6ffff9fHH39sjl+xYoUeeughNWnSRPHx8Tpy5IiioqJ04403Oj335X7PvPLKK3JxcdHw4cN1/Phxvfbaa+rTp4/Wr18v6a/3/fjx4/r999/N7+GKFSs6rePMmTPmbNXTp09r8+bNSkhI0B133KG6deua/b7++mvdd999qlevnsaOHatTp07prbfeUrt27ZSWlmZ+n/3888+6/fbbVblyZb3wwgtyd3fX22+/rbvuuktr1qwxP6OxY8cqPj5eTzzxhFq3bq3s7Gxt2rRJaWlpuueee/TUU0/pjz/+0MqVKzV//nyn13FOjx49VLduXcXHxystLU3vvPOO/P399eqrr5p9+vfvr48++kh9+/bVP//5T61Zs6bIv9cAAAAAlAIDKAVz5swxJBkbN240pk6dalSqVMnIzc01DMMwHn74YaNDhw6GYRhGnTp1jM6dO5vjFi9ebEgyJkyY4HC87t27Gzabzdi5c6dhGIaxZcsWQ5LxzDPPOPTr3bu3IcmIi4sz2x5//HGjZs2aRlZWlkPfRx55xPD19TXr2rNnjyHJmDNnziVdW3BwsJGfn2+2v/baa4Yk4/PPPzcMwzBOnDhhVKlSxRgwYIDD+IyMDMPX19ehPTIy0pBkjBgxotD54uLiDEnG4cOHzbZDhw4ZHh4eRseOHQ273W62T5061ZBkzJ4922y78847DUlGUlKSw3HPXW/16tWNY8eOme0jR440JBktWrQwzpw5Y7b36tXL8PDwME6fPm22nXvvzvfUU08ZPj4+Dv3O1TBv3jyzLS8vz6hRo4bx0EMPmW2zZ882JBkJCQmFjltQUGAYhmF89913hiTjgw8+cNi/bNmyItv/7vyvzXPy8/MNf39/o1mzZsapU6fM9i+//NKQZIwZM8Zsu9hn9XfHjx83JBndunUrtO/PP/80Dh8+bL7Ofy8vdo6i3vP4+HjDZrMZ+/btM9tatmxp1KxZ0+GzXbFihSHJqFOnjsP4K/2eWbVqlSHJaNy4sZGXl2f2mzJliiHJ+Omnn8y2zp
07FzrvxdSpU8eQVOjVrl27QnW1bNnS8Pf3N44cOWK2/fjjj4aLi4vRr18/s61bt26Gh4eHsWvXLrPtjz/+MCpVqmT
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABarUlEQVR4nO3dd1QU19sH8O+C7AJKUZGmKFbsoqJGTTQqiiXWRLFE0RiN3UjUaKJix8SGSVRiRf2ZYIkt0WAhYixYELGhYEGxAEqIIL3sff/wZeMK6A7uUtbv55w9h71zZ+bZYdl9uPPMHZkQQoCIiIhITxgUdwBERERE2sTkhoiIiPQKkxsiIiLSK0xuiIiISK8wuSEiIiK9wuSGiIiI9AqTGyIiItIrZYo7gKKmVCrx+PFjmJmZQSaTFXc4REREpAEhBJ4/fw57e3sYGLx+bOadS24eP34MBweH4g6DiIiICuHBgweoUqXKa/u8c8mNmZkZgBcHx9zcvJijISIiIk0kJSXBwcFB9T3+Ou9ccpN7Ksrc3JzJDRERUSmjSUkJC4qJiIhIrzC5ISIiIr3C5IaIiIj0CpMbIiIi0itMboiIiEivMLkhIiIivcLkhoiIiPQKkxsiIiLSK0xuiIiISK8wuSEiIiK9UqzJzd9//42ePXvC3t4eMpkM+/bte+M6QUFBaNasGRQKBWrVqgU/Pz+dx0lERESlR7EmNykpKWjSpAlWr16tUf+oqCj06NEDHTp0QFhYGL788kt8/vnnOHz4sI4jJSIiotKiWG+c2a1bN3Tr1k3j/r6+vqhevTqWL18OAKhXrx5OnTqFlStXws3NTVdhEhHR/xNCIC0rp7jDoFLAxMhQo5tc6kKpuit4cHAwXF1d1drc3Nzw5ZdfFrhORkYGMjIyVM+TkpJ0FR4RkV4TQuAT32BcvP9vcYdCpUD4fDeYyosnzShVBcWxsbGwsbFRa7OxsUFSUhLS0tLyXcfb2xsWFhaqh4ODQ1GESkSkd9KycpjYUKlQqkZuCmPmzJnw9PRUPU9KSmKCQ0T0lkJmucJUbljcYVAJZmJUfO+PUpXc2NraIi4uTq0tLi4O5ubmMDExyXcdhUIBhUJRFOERERWLoqqDSc38bx+mcsNiO+VA9Cal6p3ZunVrHDp0SK3t6NGjaN26dTFFRERUvFgHQ5RXsdbcJCcnIywsDGFhYQBeXOodFhaG6OhoAC9OKQ0bNkzVf8yYMbh79y6mT5+OmzdvYs2aNdi5cyemTJlSHOETERW74qiDcalWvlhPORC9SbGO3ISEhKBDhw6q57m1MR4eHvDz80NMTIwq0QGA6tWr4+DBg5gyZQpWrVqFKlWqYMOGDbwMnIgIRVcHU5yX+BJpoliTmw8//BBCiAKX5zf78IcffohLly7pMCoiotKJdTBEL/CvgIhIS4pjgruXi3yJ6AUmN0REWsDCXqKSo1RN4kdEVFIV9wR3LPIl+g9HboiItKw4JrhjkS/Rf5jcEBFpGQt7iYoX//qISO+wsJfo3cbkhoj0Cgt7iYgFxUSkV1jYS0QcuSEivcXCXqJ3E5MbItJbLOwlejfxtBQRERHpFSY3REREpFeY3BAREZFeYXJDREREeoWVdkRUokmdkI+T6RERkxsiKrE4IR8RFQZPSxFRifU2E/JxMj2idxdHboioVJA6IR8n0yN6dzG5IaK3osubVL5cP8MJ+YhIU/ykIKJCY00MEZVErLkhokIrqptUsn6GiKTgyA0RaYUub1LJ+hkikoLJDRFpBWtiiKik4CcRkYZ0WThbWnHCPCIqiZjcEGmAhbNERKUHC4qJNFBUhbOlFQt+iagk4cgNkUS6LJwtrVjwS0QlCZMbotfIrbPhZHJERKUHP6GJCsA6GyKi0ok1N0QFyK/OhrUlREQlH0duiDSQW2fD2hIiopKPyQ2RBlhnQ0RUevDTmt55BU3OxwnqiIhKJyY39E5j0TARkf5hQTG90zSZnI9FxE
REpQtHboj+X0GT87GImIiodGFyQ3pF6s0tOTkfEZH+4Sc56Q3WzxAREcCaG9Ijb3NzS9bVEBHpj0KN3GRlZSE2NhapqamoVKkSKlSooO24iN6K1Jtbsq6GiEh/aJzcPH/+HP/73//g7++P8+fPIzMzE0IIyGQyVKlSBV26dMHo0aPRokULXcZLpBHWzxARvbs0Oi21YsUKODo6YvPmzXB1dcW+ffsQFhaGyMhIBAcHw8vLC9nZ2ejSpQu6du2KW7du6TpueocJIZCamZ3Pg5PuERGRhiM3Fy5cwN9//40GDRrku7xly5b47LPP4Ovri82bN+PkyZOoXbu2VgMlAlg0TEREb6ZRcvPrr79qtDGFQoExY8a8VUBEr8NJ94iI6E1YlEClFifdIyKi/EhKbi5fvozff/8dFSpUwIABA2BlZaValpSUhC+//BKbNm3SepD0btBkAj5OukdERG8iE0IITToeOXIEPXv2RO3atfH8+XOkpKRg165d6NChAwAgLi4O9vb2yMkp2UWdSUlJsLCwQGJiIszNzYs7HPp/hamlCZ/vxuSGiOgdIeX7W+NJ/ObOnYupU6fi2rVruHfvHqZPn45evXohICDgrQMmkjoBH+tqiIioIBr/23v9+nVs27YNACCTyTB9+nRUqVIFn3zyCfz9/Tm/DWmNJhPwsa6GiIgKonFyo1Ao8OzZM7W2wYMHw8DAAO7u7li+fLm2Y6N3FGtpiIjobWj8DeLs7Izjx4+jefPmau0DBw6EEAIeHh5aD45KNql34H4dTsBHRETaonFyM3bsWPz999/5Lhs0aBCEEFi/fr3WAqOSjZPpERFRSaXx1VL6gldLaUdqZjbqzzms9e26VCuPXWNas56GiIjUSPn+ZmEDvTWpd+B+HRYKExHR22JyQ2+NBcBERFSSaDzPDREREVFpwOSGiIiI9EqxJzerV6+Go6MjjI2N0apVK5w/f/61/X18fODk5AQTExM4ODhgypQpSE9PL6JoiYiIqKQrVHLz999/IyQkRK0tJCSkwEvFC7Jjxw54enrCy8sLoaGhaNKkCdzc3PDkyZN8+//yyy+YMWMGvLy8cOPGDWzcuBE7duzAN998U5iXQURERHqoUMnNhx9+iGHDhqm1DR06VHUTTU2tWLECo0aNwogRI1C/fn34+vrC1NS0wDuLnzlzBm3btsXgwYPh6OiILl26YNCgQa8d7cnIyEBSUpLag4iIiPRXoZKbqKgoHDt2TK0tMDAQd+/e1XgbmZmZuHjxIlxdXf8LxsAArq6uCA4OznedNm3a4OLFi6pk5u7duzh06BC6d+9e4H68vb1hYWGhejg4OGgcIxEREZU+hbp+t1q1anna7O3tJW0jPj4eOTk5sLGxUWu3sbHBzZs3811n8ODBiI+Px/vvvw8hBLKzszFmzJjXnpaaOXMmPD09Vc+TkpKY4BAREemxYi8oliIoKAiLFy/GmjVrEBoaij179uDgwYNYsGBBgesoFAqYm5urPYiIiEh/aTRyU758eY1njU1ISNCon5WVFQwNDREXF6fWHhcXB1tb23zXmT17NoYOHYrPP/8cANCoUSOkpKRg9OjR+Pbbb2FgUKpyNSIiItIBjZIbHx8fre9YLpejefPmCAwMRJ8+fQAASqUSgYGBmDBhQr7rpKam5klgDA1fTPv/jt0ii4iIiAqgUXLj4eGhk517enrCw8MDLi4uaNmyJXx8fJCSkoIRI0YAAIYNG4bKlSvD29sbANCzZ0+sWLECTZs2RatWrXD79m3Mnj0bPXv2VCU5RERE9G4rVEHxnTt3sHnzZty5cwerVq2CtbU1/vzzT1StWhUNGjTQeDvu7u54+vQp5syZg9jYWDg7OyMgIEBVZBwdHa02UjNr1izIZDLMmjULjx49QqVKldCzZ08sWrSoMC+DiIiI9JBMSDyfc+LECXTr1g1t27bF33//jRs3bqBGjRpYsmQJQkJCsHv3bl3FqhVSbplOBUvNzEb9OYcBAO
Hz3XjjTCIi0ikp39+SK3BnzJiBhQsX4ujRo5DL5ar2jh074uzZs9KjpVJFCIHUzGykZuYUdyhERET5kvzv9tWrV/H
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.ensemble import GradientBoostingClassifier\n",
"\n",
"# Score the model with default parameters\n",
"score_gb, model_gb = score_the_model(\n",
" model=GradientBoostingClassifier(),\n",
" model_name='Gradient Boosting',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c75c0cd",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNEAAAStCAYAAABqTM3EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAADwe0lEQVR4nOzdeVxU5eLH8e+wgwtqCCiSqJl7apjkllkkqWHatUwrkVu2oia3W5gLmiWVxcWbC2mZWpq2326apiSZaVqolXuumApKLmwKwpzfH17PrwkUNeAgfN6v17xynnnOzHeGyvHrc85jMwzDEAAAAAAAAIALcrI6AAAAAAAAAFDRUaIBAAAAAAAAJaBEAwAAAAAAAEpAiQYAAAAAAACUgBINAAAAAAAAKAElGgAAAAAAAFACSjQAAAAAAACgBJRoAAAAAAAAQAko0QAAAAAAAIASUKIB5cxms2nChAmXfdz+/ftls9k0d+7cUs/0V7377rtq3ry5XF1dVatWLavjXPWWLVumdu3aycPDQzabTSdPnrQ60kXNnTtXNptN+/fvtzoKAAAAAJQZSjRUSef/0G+z2bRmzZoijxuGocDAQNlsNt11110WJLx67NixQ0OHDlWTJk00e/ZszZo1y+pIV7Xff/9d9913nzw9PTV9+nS9++67qlatWrm89owZM2Sz2RQSElIurydJQ4cONf9btNlscnFxUWBgoO6//35t27at3HJcyLZt2zRhwgQKQgAAAABysToAYCUPDw8tXLhQXbt2dRj/5ptv9Ntvv8nd3d2iZFeP5ORk2e12TZ06Vdddd53Vca56P/zwg7KysjRp0iSFhoaW62svWLBAQUFB2rBhg3bv3l1uP093d3e99dZbkqSCggLt2bNHiYmJWrZsmbZt26b69euXS47ibNu2TRMnTtStt96qoKAgy3IAAAAAsB4r0VCl9e7dWx9++KEKCgocxhcuXKjg4GD5+/tblKziy8nJkSQdPXpUkkr1NM7c3NxSe66rTVl8nud/Vhezb98+rV27VvHx8apbt64WLFhQaq9fEhcXFz344IN68MEHNXToUE2aNElz587VqVOntGTJknLLAQAAAAAXQ4mGKm3QoEH6/ffftWLFCnMsPz9fH330kQYPHlzsMTk5OfrHP/6hwMBAubu7q1mzZnrttddkGIbDvLy8PI0aNUp169ZVjRo11LdvX/3222/FPuehQ4f097//XX5+fnJ3d1erVq00Z86cK3pP509VXb16tR577DFdc801qlmzpoYMGaITJ04Umf/ll1+qW7duqlatmmrUqKE+ffpo69atDnOGDh2q6tWra8+ePerdu7dq1KihBx54QEFBQYqNjZUk1a1bt8j13mbMmKFWrVrJ3d1d9evX11NPPVXk+l633nqrWrdurZSUFN1yyy3y8vLS888/b14D7rXXXtP06dPVuHFjeXl5qWfPnjp48KAMw9CkSZPUoEEDeXp66u6779bx48cdnvs///mP+vTpo/r168vd3V1NmjTRpEmTVFhYWGyGbdu2qUePHvLy8lJAQIBeffXVIp/XmTNnNGHCBF1//fXy8PBQvXr1dM8992jPnj3mHLvdroSEBLVq1UoeHh7y8/PTY489Vuzn/+ccERERkqSbbrpJNptNQ4cONR//8MMPFRwcLE9PT/n4+OjBBx/UoUOHLulnVZIFCxaodu3a6tOnjwYMGHDBEm3r1q267bbb5OnpqQYNGujFF1+U3W4vMu9SP/sLOV9gu7g4Lpjeu3ev7r33XtWpU0deXl66+eabiy3ajh49qocfflh+fn7y8PBQ27ZtNW/evCLzFi1apODgYNWoUUM1a9ZUmzZtNHXqVEnn/lu69957JUk9evQwTzlNTk6+pPcAAAAAoHLhdE5UaUFBQerUqZPef/999erVS9K5UunUqVO6//779e9//9thvmEY6tu3r1atWqWHH35Y7dq10/Lly/XPf/5Thw4d0r/+9S
9z7iOPPKL33ntPgwcPVufOnfX111+rT58+RTKkp6fr5ptvls1mU1RUlOrWrasvv/xSDz/8sDIzM/X0009f0XuLiopSrVq1NGHCBO3cuVMzZ87UgQMHlJycLJvNJunchgAREREKCwvTK6+8otzcXM2cOVNdu3bVpk2bHE5fKygoUFhYmLp27arXXntNXl5eGjp0qObPn69PP/1UM2fOVPXq1XXDDTdIkiZMmKCJEycqNDRUTzzxhJnhhx9+0HfffSdXV1fzuX///Xf16tVL999/vx588EH5+fmZjy1YsED5+fkaPny4jh8/rldffVX33XefbrvtNiUnJ+u5557T7t279cYbb+iZZ55xKB/nzp2r6tWrKzo6WtWrV9fXX3+t8ePHKzMzU1OmTHH4vE6cOKE777xT99xzj+677z599NFHeu6559SmTRvz343CwkLdddddSkpK0v3336+RI0cqKytLK1as0JYtW9SkSRNJ0mOPPaa5c+cqMjJSI0aM0L59+zRt2jRt2rSpyHv/ozFjxqhZs2aaNWuWXnjhBTVq1Mh8zvPPd9NNNykuLk7p6emaOnWqvvvuO23atMlh5VpxP6uSLFiwQPfcc4/c3Nw0aNAg82d10003mXPS0tLUo0cPFRQUKCYmRtWqVdOsWbPk6elZ5Pku57OXpIyMDPMz3rt3r5577jldc801DtckTE9PV+fOnZWbm6sRI0bommuu0bx589S3b1999NFH6t+/vyTp9OnTuvXWW7V7925FRUWpUaNG+vDDDzV06FCdPHlSI0eOlCStWLFCgwYN0u23365XXnlFkrR9+3Z99913GjlypG655RaNGDFC//73v/X888+rRYsWkmT+EwAAAEAVYwBV0DvvvGNIMn744Qdj2rRpRo0aNYzc3FzDMAzj3nvvNXr06GEYhmE0bNjQ6NOnj3ncZ599ZkgyXnzxRYfnGzBggGGz2Yzdu3cbhmEYmzdvNiQZTz75pMO8wYMHG5KM2NhYc+zhhx826tWrZ2RkZDjMvf/++w1vb28z1759+wxJxjvvvHNJ7y04ONjIz883x1999VVDkvGf//zHMAzDyMrKMmrVqmUMGzbM4fi0tDTD29vbYTwiIsKQZMTExBR5vdjYWEOScezYMXPs6NGjhpubm9GzZ0+jsLDQHJ82bZohyZgzZ4451r17d0OSkZiY6PC8599v3bp1jZMnT5rjo0ePNiQZbdu2Nc6ePWuODxo0yHBzczPOnDljjp3/7P7oscceM7y8vBzmnc8wf/58cywvL8/w9/c3/va3v5ljc+bMMSQZ8fHxRZ7XbrcbhmEY3377rSHJWLBggcPjy5YtK3b8z/747+Z5+fn5hq+vr9G6dWvj9OnT5vgXX3xhSDLGjx9vjl3sZ3UhP/74oyHJWLFihfleGjRoYIwcOdJh3tNPP21IMtavX2+OHT161PD29jYkGfv27TPHL/WzP5/3z7eAgAAjJSWl2Nf/9ttvzbGsrCyjUaNGRlBQkPnvWkJCgiHJeO+998x5+fn5RqdOnYzq1asbmZmZhmEYxsiRI42aNWsaBQUFF/xsPvzwQ0OSsWrVqgvOAQAAAFA1cDonqrz77rtPp0+f1hdffKGsrCx98cUXFzyVc+nSpXJ2dtaIESMcxv/xj3/IMAx9+eWX5jxJReb9eVWZYRj6+OOPFR4eLsMwlJGRYd7CwsJ06tQpbdy48Yre16OPPuqw4umJJ56Qi4uLmW3FihU6efKkBg0a5PC6zs7OCgkJ0apVq4o85xNPPHFJr71y5Url5+fr6aeflpPT//9vZtiwYapZs2aR0+/c3d0VGRlZ7HPde++98vb2Nu+f3znywQcfdDjVLyQkRPn5+Q6nN/5xhVRWVpYyMjLUrVs35ebmaseOHQ6vU716dT344IPmfTc3N3Xs2FF79+41xz7++GP5+Pho+PDhRXKeX9334YcfytvbW3fccYfD5xocHKzq1asX+7mW5Mcff9TRo0f15JNPysPDwx
zv06ePmjdvXuzpjJf6s5LOrULz8/NTjx49zPcycOBALVq0yOH0y6VLl+rmm29Wx44dzbG6desWe7ro5Xz2Hh4eWrF
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABfmUlEQVR4nO3dd1QUV/8G8Gcpu4A0FWmKYu+KSiRijK8GxV5SxI4mMYk9EmPsWGJJLNE3scSK5meCJWpMNNh7F8UGggXEAiii9Lp7f3/4sskK6A7usrI8n3P2HPfOnZlnx0W+3rkzIxNCCBAREREZCRNDByAiIiLSJRY3REREZFRY3BAREZFRYXFDRERERoXFDRERERkVFjdERERkVFjcEBERkVExM3SAkqZSqfDw4UPY2NhAJpMZOg4RERFpQQiB1NRUuLq6wsTk5WMzZa64efjwIdzc3Awdg4iIiIrh3r17qFKlykv7lLnixsbGBsDzg2Nra2vgNERERKSNlJQUuLm5qX+Pv0yZK27yT0XZ2tqyuCEiIipltJlSwgnFREREZFRY3BAREZFRYXFDRERERoXFDRERERkVFjdERERkVFjcEBERkVFhcUNERERGhcUNERERGRUWN0RERGRUWNwQERGRUTFocXPs2DF0794drq6ukMlk2Llz5yvXOXLkCJo3bw6FQoFatWohKChI7zmJiIio9DBocZOeno6mTZti2bJlWvWPjo5G165d0a5dO4SFheHLL7/Ep59+ir179+o5KREREZUWBn1wZufOndG5c2et+69cuRLVq1fHokWLAAD169fHiRMn8MMPP8DX11dfMYmI6DVk5ijxJD3b0DGoBMnNTOBoY2Gw/Zeqp4KfPn0aPj4+Gm2+vr748ssvi1wnOzsb2dn//FClpKToKx4RUZmWnafEncfpiEpI/d8rDVEJqYhNyoAQhk5HJal5VXtsH9HaYPsvVcVNfHw8nJycNNqcnJyQkpKCzMxMWFpaFlhn3rx5mDlzZklFJCIyenlKFWKepCMqIQ2R8am4+SgVkfGpiHmSAaWq8CpGbmoCmayEg5LBmJsa9nqlUlXcFMekSZMQEBCgfp+SkgI3NzcDJiIiKh1UKoF7TzP+V8A8L2SiElJx53E6cpSqQtexsTBDHSeb/72sUdfJBrWdbOBgLYeM1Q2VkFJV3Dg7OyMhIUGjLSEhAba2toWO2gCAQqGAQqEoiXhERKWSEAIPk7MQ9b/iJTIhFTcT0nDzUSqycgsvYqzkpqjtaP1PIeP8vJhxtrVgEUMGV6qKm1atWmHPnj0abfv370erVq0MlIiIjEFKVi72X0/AlfvPUJamhmTmKHHrcRpuJqQhLTuv0D5yMxN1EVP7fyMxdZxsUNneEiYmLGLozWTQ4iYtLQ23bt1Sv4+OjkZYWBgqVKiAqlWrYtKkSXjw4AE2btwIAPjiiy/w008/YcKECfj4449x6NAhbNmyBbt37zbURyCiUio9Ow8HIhLw15U4HI18XORplrLC3FSGGg7W6gKmtpMN6jrboGoFK5iyiKFSxqDFzYULF9CuXTv1+/y5Mf7+/ggKCkJcXBxiY2PVy6tXr47du3dj3LhxWLp0KapUqYI1a9bwMnAi0kpmjhKHIx/hrysPcejGI41TLrUcrdG+niMszE0NmLBkmZvIUL1SOdR1soG7QzmDTwIl0hWZEGXrAr2UlBTY2dkhOTkZtra2ho5DRHqWnafE0cjH+OtKHA5EJCAjR6le5l7RCt2auKJbUxfUdbLhXBGiN5iU39+las4NEZE2cvJUOHkrEX9eeYj91xOQ+q/5JJXtLdGtqQu6N3FFQ1dbFjRERojFDREZlaNRjzHx9yuIS85StznbWqBrExd0a+ICDzd7FjRERo7FDREZhbTsPMzZHYHfzj2fp+dgLUeXxi7o1sQVntXK88oeojKExQ0RlXpn7jzB+K2Xcf9pJgBgiLc7vulUD5bysjM5mIj+weKGiE
qtrFwlvg+JxPpT0RDi+XyaBR81gXdNB0NHIyIDYnFDRMUmhMCxm4kIi30GUcK3vxMC+PPKQ9x5nA4A6NfSDZO71IeNhXmJ5iCiNw+LGyKSLDtPiT/CHmLN8TuISkgzaBZHGwW++6AJ2tVzNGgOInpzsLghIq0lZ+Ti/87eRdCpGDxOzQYAlJObwreRM6wMML+lYjkFhrZ2h72VvMT3TURvLhY3RPRK95IysO5kNDafv6e+CZ6zrQWGtnZH35ZVYWfJU0FE9OZgcUNERbpy/xlWHbuDPVfjoPrflJp6zjb47N0a6NbEFXIz3q6fiN48LG6IjEh6dh4O3niEvy4/xOk7T5CnLP4kXwGh8eylNrUd8Nm7NfBOLQfeBI+I3mgsbohKuaxcJQ7feIS/rsTh4I0EjYLkdZmZyNDDwxXD2tRAfRc+i42ISgcWN0SlUHaeEseiEvHn5YcFHgZZraIVujVxQaeGLrC3er25MHZW5rDlpdVEVMqwuCEqJV75MMgmzx810KgyHwZJRGUbixuif0lKz8G+6/HIUeru1M7rEgIIf5iCkOvxSM7MVbc721o8f3ZSUxc048MgiYjUWNwQ/c+tR2nwX3cOD55lGjpKkRysFejS2Bndm7qiRVU+DJKIqDAsbogAXIp9io+DzuNpRi6qlLdEkyp2ho6kwcFagU4NneFVoyJMWdAQEb0Uixsq8w5HPsKI/7uIzFwlmlaxw7ohb6GitcLQsYiIqJhY3FCZtv3ifUzYdgV5KoE2tR2wcmALlFPwx4KIqDTjv+JUZq06dhtz99wAAPTycMX3HzblHXeJiIwAixsqc1QqgXl/R2D18WgAwKfvVMfkLvU5OZeIyEiwuKEyJVepwoRtV7Dj0gMAwKTO9fB525oGTkVERLrE4obKjIycPAz/v4s4GvUYpiYyfP9BE3zQooqhYxERkY6xuCGjoVIJnLr9BM8ycwosEwJYcyIal+89g4W5CVYMaIF29RwNkJKIiPSNxQ0ZhXtJGfh622WcuZP00n72VuZYN+QtNK9avoSSERFRSWNxQ6WaEAKbz9/D7L/CkZ6jhJXctMgb8JW3kuOrjnVQy9GmhFMSEVFJYnFDpVZ8chYmbr+CI5GPAQBvuZfHwo+aolrFcgZORkREhsTihkodIQT+CHuI6X9cQ0pWHuRmJpjgWxdDW1fnowmIiIjFDb2ZDt94hP8euok8pSiwLCtXiZuP0gAATarYYXGfpjzVREREaixu6I30y5m7uBT7rMjlZiYyjHmvNob/pybMTXlXYSIi+geLG3ojKVXPR2w+fac6WtdyKLC8lqM13CpYlXQsIiIqBVjc0Butvost70dDRESScDyfiIiIjAqLGyIiIjIqLG6IiIjIqLC4ISIiIqPC4oaIiIiMCosbIiIiMiosboiIiMioFOs+N7m5uYiPj0dGRgYqVaqEChUq6DoXERERUbFoPXKTmpqKFStWoG3btrC1tYW7uzvq16+PSpUqoVq1ahg2bBjOnz+vz6xEREREr6RVcbN48WK4u7tj/fr18PHxwc6dOxEWFoaoqCicPn0agYGByMvLQ8eOHdGpUyfcvHlT37mJiIiICqXVaanz58/j2LFjaNiwYaHLW7ZsiY8//hgrV67E+vXrcfz4cdSuXVunQansEEIgKT3H0DGIiKiU0qq4+e2337TamEKhwBdffPFagYjWn4zB1QfJkJuZwNO9vKHjEBFRKcOrpeiNcvV+Mub9HQEAmNa1PqpVLGfgREREVNpIKm4uX76Mb7/9FsuXL0diYqLGspSUFHz88cc6DUdlS1p2Hkb/dhG5SgHfhk4Y+HY1Q0ciIqJSSOviZt++fWjZsiWCg4Px3XffoV69ejh8+LB6eWZmJjZs2KCXkGT8hBCYsuMqYp5koLK9Jb7/oClkMpmhYxERUSmkdXEzY8YMjB8/HteuXUNMTAwmTJiAHj16ICQkRJ/5qIzYGnoff4Q9hKmJDP/t5wE7K3NDRyIiolJK65v4Xb
9+Hb/88gsAQCaTYcKECahSpQo+/PBDBAcH46233tJbSDJutx6lIvCP6wCAgA510KIabwpJRETFp3Vxo1Ao8OzZM42
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier\n",
"\n",
"# Score AdaBoost with a random-forest base estimator and non-default settings\n",
"score_ada, model_ada = score_the_model(\n",
" model=AdaBoostClassifier(\n",
" estimator = RandomForestClassifier(),\n",
" n_estimators=500000,\n",
" learning_rate=0.001,\n",
" ),\n",
" model_name='AdaBoost',\n",
" random_seed=42,\n",
" X_train=X_train,\n",
" X_test=X_test,\n",
" y_train=y_train,\n",
" y_test=y_test,\n",
" plot=True\n",
")\n"
]
},
{
"cell_type": "markdown",
"id": "3dafbf40",
"metadata": {},
"source": [
"### 2.3 Evaluation\n",
"Given that the data set is not in the ”big data” category, implement a cross-validation procedure based\n",
"on five folds (approximately equal sized) of your data. Furthermore, repeat the experiment 10 times with\n",
"different folds and average the results (include standard deviation). You are expected to report the following\n",
"metrics:\n",
"- F1\n",
"- Precision\n",
"- Recall\n",
"- AUC\n",
"Comment on the performance of algorithms and visualize their final scores. How do they perform against\n",
"the random baseline? What about the constant one? How do different learning scenarios impact the final\n",
"score? Are the differences between the models statistically significant?"
]
},
{
"cell_type": "markdown",
"id": "1bd730c6",
"metadata": {},
"source": [
"TODO: explain here in a bit more detail how these scoring metrics work."
]
},
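{
"cell_type": "markdown",
"id": "cvsketch01",
"metadata": {},
"source": [
"The procedure described in 2.3 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual evaluation code: it relies on scikit-learn's `RepeatedStratifiedKFold`, `cross_validate`, and `DummyClassifier`, and it uses a synthetic data set from `make_classification` (the names `X_demo` and `y_demo` are placeholders) so that it runs on its own. With the real data, `X_train` and `y_train` would take their place.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.dummy import DummyClassifier\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate\n",
"\n",
"# Assumption: synthetic stand-in data with labels 0/1; swap in X_train, y_train for the real task\n",
"X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)\n",
"\n",
"# 5 stratified folds, repeated 10 times with different splits (50 fits per model)\n",
"cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)\n",
"scoring = ['f1', 'precision', 'recall', 'roc_auc']\n",
"\n",
"models = {\n",
"    'Gradient Boosting': GradientBoostingClassifier(random_state=42),\n",
"    'Constant baseline': DummyClassifier(strategy='most_frequent'),\n",
"    'Random baseline': DummyClassifier(strategy='stratified', random_state=42),\n",
"}\n",
"\n",
"for name, model in models.items():\n",
"    results = cross_validate(model, X_demo, y_demo, cv=cv, scoring=scoring)\n",
"    for metric in scoring:\n",
"        values = results['test_' + metric]  # one score per fold and repeat\n",
"        print(f'{name} {metric}: {values.mean():.3f} +/- {values.std():.3f}')\n",
"```\n",
"\n",
"Stratified folds keep the class ratio roughly constant across splits, which seems a sensible default when the target is imbalanced. The two `DummyClassifier` entries play the role of the constant and random baselines mentioned in the task; the constant one may trigger an undefined-precision warning when it never predicts the positive class."
]
},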
{
"cell_type": "markdown",
"id": "addfc3ea",
"metadata": {},
"source": [
"## Report and presentation\n",
"The assignment has to be submitted in the form of two files: a markdown file and a PDF file created from\n",
"the RStudio markdown file (in RStudio: File → New File → R Markdown), where you write both the code\n",
"and the text of your answers (the echo = T option must be enabled for each code block). Markdown files can\n",
"easily be exported to PDF using the “Knit” button in RStudio. If you are using Python, you can produce a\n",
"similar report with Jupyter Notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"vscode": {
"interpreter": {
"hash": "73efbd7de9807940366a2e2c585910074bc00282bd7f8b3dae7eb06897ea8ebf"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}