2638 lines
3.0 MiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "c093ea0c",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Seminar 2: Predicting Biodegradability of Chemical"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "7aa30d7d",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## 1. Introduction\n",
|
|||
|
"Chemicals are all around us. Studying their properties by the means of machine learning is an active\n",
|
|||
|
"research field; matching molecular patterns with their behavior can be a decisive factor in the creation of\n",
|
|||
|
"new materials, drugs, and more.\n",
|
|||
|
"In this seminar assignment, your task is to explore the data and build machine-learning models that\n",
|
|||
|
"predict the biodegradability of chemicals."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "aeab08c8",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## 2. Task\n",
|
|||
|
"You will work with the data set compiled by Mansouri et al. [data](https://www.openml.org/search?type=data&status=active&id=1494&sort=runs). There are 41 features and one target feature (biodegradability).\n",
|
|||
|
"The target variable is encoded as ready biodegradable (1) and not ready biodegradable (2). The data set\n",
|
|||
|
"consists of 1055 instances. Features can be either symbolic or numeric.\n",
|
|||
|
"IMPORTANT: Use the dataset provided on uˇcilnica and NOT the one posted on the link above. It is\n",
|
|||
|
"minimally modified and split into train in test sets.\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "a4f197dd",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 2.1 Exploration\n",
|
|||
|
"Inspect the dataset. How balanced is the target variable? Are there any missing values present? If there\n",
|
|||
|
"are, choose a strategy that takes this into account.\n",
|
|||
|
"Most of your data is of the numeric type. Can you identify, by adopting exploratory analysis, whether\n",
|
|||
|
"some features are directly related to the target? What about feature pairs? Produce at least three types of\n",
|
|||
|
"visualizations of the feature space and be prepared to argue why these visualizations were useful for your\n",
|
|||
|
"subsequent analysis."
|
|||
|
]
|
|||
|
},
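One possible way to approach the question of which features are directly related to the target is to rank them by the absolute value of their correlation with the class label. The sketch below is only an illustration: the `demo` frame, its `informative` and `noise` columns, and the random seed are all made up for this example and are not taken from the seminar data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training frame (hypothetical data, not the seminar dataset).
rng = np.random.default_rng(0)
n = 200
target = rng.integers(1, 3, size=n)  # labels 1 and 2, as in the task description
demo = pd.DataFrame({
    "informative": target * 2.0 + rng.normal(0, 0.5, size=n),  # depends on the label
    "noise": rng.normal(0, 1, size=n),                         # unrelated to the label
    "Class": target,
})

# Pearson correlation of every feature with the binary target,
# ranked by absolute value (strongest linear relation first).
features = demo.drop(columns="Class")
ranking = features.corrwith(demo["Class"]).abs().sort_values(ascending=False)
print(ranking)
```

A ranking like this only captures linear, single-feature relations, so it is a starting point for choosing which features and feature pairs to plot rather than a replacement for the visualizations.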
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"id": "5bcf6290",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Needed imports\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import sklearn\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import scikitplot as skplt\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"id": "18ff4f76",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_train = pd.read_csv('train.csv')\n",
|
|||
|
"df_test = pd.read_csv('test.csv')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "ea26bfdf",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Lets inspect training and test data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"id": "5933f4d7",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>V1</th>\n",
|
|||
|
" <th>V2</th>\n",
|
|||
|
" <th>V3</th>\n",
|
|||
|
" <th>V4</th>\n",
|
|||
|
" <th>V5</th>\n",
|
|||
|
" <th>V6</th>\n",
|
|||
|
" <th>V7</th>\n",
|
|||
|
" <th>V8</th>\n",
|
|||
|
" <th>V9</th>\n",
|
|||
|
" <th>V10</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>V33</th>\n",
|
|||
|
" <th>V34</th>\n",
|
|||
|
" <th>V35</th>\n",
|
|||
|
" <th>V36</th>\n",
|
|||
|
" <th>V37</th>\n",
|
|||
|
" <th>V38</th>\n",
|
|||
|
" <th>V39</th>\n",
|
|||
|
" <th>V40</th>\n",
|
|||
|
" <th>V41</th>\n",
|
|||
|
" <th>Class</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>3.919</td>\n",
|
|||
|
" <td>2.6909</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>31.4</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2.949</td>\n",
|
|||
|
" <td>1.591</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>7.253</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>4.170</td>\n",
|
|||
|
" <td>2.1144</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>30.8</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>3.315</td>\n",
|
|||
|
" <td>1.967</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>7.257</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>3.000</td>\n",
|
|||
|
" <td>2.7098</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>20.0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>3.046</td>\n",
|
|||
|
" <td>5.000</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>6.690</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>13</th>\n",
|
|||
|
" <td>4.214</td>\n",
|
|||
|
" <td>2.6272</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>30.0</td>\n",
|
|||
|
" <td>3</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2.998</td>\n",
|
|||
|
" <td>1.722</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>6.770</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>16</th>\n",
|
|||
|
" <td>3.942</td>\n",
|
|||
|
" <td>2.7719</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>31.6</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>3.542</td>\n",
|
|||
|
" <td>1.739</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>8.127</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>5 rows × 42 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V33 V34 V35 \\\n",
|
|||
|
"1 3.919 2.6909 0 0 0 0 0 31.4 2 0 ... 0 0 0 \n",
|
|||
|
"2 4.170 2.1144 0 0 0 0 0 30.8 1 1 ... 0 0 0 \n",
|
|||
|
"4 3.000 2.7098 0 0 0 0 0 20.0 0 2 ... 0 0 1 \n",
|
|||
|
"13 4.214 2.6272 0 0 0 0 0 30.0 3 0 ... 0 0 0 \n",
|
|||
|
"16 3.942 2.7719 1 0 0 0 0 31.6 2 0 ... 0 0 0 \n",
|
|||
|
"\n",
|
|||
|
" V36 V37 V38 V39 V40 V41 Class \n",
|
|||
|
"1 2.949 1.591 0 7.253 0 0 2 \n",
|
|||
|
"2 3.315 1.967 0 7.257 0 0 2 \n",
|
|||
|
"4 3.046 5.000 0 6.690 0 0 2 \n",
|
|||
|
"13 2.998 1.722 0 6.770 0 0 2 \n",
|
|||
|
"16 3.542 1.739 0 8.127 0 1 2 \n",
|
|||
|
"\n",
|
|||
|
"[5 rows x 42 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_test.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"id": "1743d191",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>V1</th>\n",
|
|||
|
" <th>V2</th>\n",
|
|||
|
" <th>V3</th>\n",
|
|||
|
" <th>V4</th>\n",
|
|||
|
" <th>V5</th>\n",
|
|||
|
" <th>V6</th>\n",
|
|||
|
" <th>V7</th>\n",
|
|||
|
" <th>V8</th>\n",
|
|||
|
" <th>V9</th>\n",
|
|||
|
" <th>V10</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>V33</th>\n",
|
|||
|
" <th>V34</th>\n",
|
|||
|
" <th>V35</th>\n",
|
|||
|
" <th>V36</th>\n",
|
|||
|
" <th>V37</th>\n",
|
|||
|
" <th>V38</th>\n",
|
|||
|
" <th>V39</th>\n",
|
|||
|
" <th>V40</th>\n",
|
|||
|
" <th>V41</th>\n",
|
|||
|
" <th>Class</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>821.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>821.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" <td>846.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>4.790476</td>\n",
|
|||
|
" <td>3.054551</td>\n",
|
|||
|
" <td>0.739953</td>\n",
|
|||
|
" <td>0.030451</td>\n",
|
|||
|
" <td>0.946809</td>\n",
|
|||
|
" <td>0.277778</td>\n",
|
|||
|
" <td>1.669031</td>\n",
|
|||
|
" <td>37.422813</td>\n",
|
|||
|
" <td>1.342790</td>\n",
|
|||
|
" <td>1.784870</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.903073</td>\n",
|
|||
|
" <td>1.241135</td>\n",
|
|||
|
" <td>0.926714</td>\n",
|
|||
|
" <td>3.922100</td>\n",
|
|||
|
" <td>2.549406</td>\n",
|
|||
|
" <td>0.671395</td>\n",
|
|||
|
" <td>8.643191</td>\n",
|
|||
|
" <td>0.059102</td>\n",
|
|||
|
" <td>0.706856</td>\n",
|
|||
|
" <td>1.333333</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>0.531991</td>\n",
|
|||
|
" <td>0.813983</td>\n",
|
|||
|
" <td>1.504545</td>\n",
|
|||
|
" <td>0.198281</td>\n",
|
|||
|
" <td>2.318081</td>\n",
|
|||
|
" <td>1.045544</td>\n",
|
|||
|
" <td>2.220221</td>\n",
|
|||
|
" <td>9.030008</td>\n",
|
|||
|
" <td>2.018433</td>\n",
|
|||
|
" <td>1.773856</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1.526124</td>\n",
|
|||
|
" <td>2.248684</td>\n",
|
|||
|
" <td>1.239133</td>\n",
|
|||
|
" <td>0.992636</td>\n",
|
|||
|
" <td>0.625021</td>\n",
|
|||
|
" <td>1.093633</td>\n",
|
|||
|
" <td>1.223700</td>\n",
|
|||
|
" <td>0.342364</td>\n",
|
|||
|
" <td>2.145396</td>\n",
|
|||
|
" <td>0.471683</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>0.803900</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>9.100000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>2.279000</td>\n",
|
|||
|
" <td>1.467000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>4.948000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>4.499000</td>\n",
|
|||
|
" <td>2.510175</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>30.800000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>3.497000</td>\n",
|
|||
|
" <td>2.101000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>8.009500</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>4.840000</td>\n",
|
|||
|
" <td>3.052400</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>37.850000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>1.500000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>3.732500</td>\n",
|
|||
|
" <td>2.461000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>8.508000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>5.119000</td>\n",
|
|||
|
" <td>3.415725</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>3.000000</td>\n",
|
|||
|
" <td>43.800000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>3.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>3.980000</td>\n",
|
|||
|
" <td>2.861000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>9.019750</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>6.496000</td>\n",
|
|||
|
" <td>7.918400</td>\n",
|
|||
|
" <td>12.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>36.000000</td>\n",
|
|||
|
" <td>13.000000</td>\n",
|
|||
|
" <td>18.000000</td>\n",
|
|||
|
" <td>60.700000</td>\n",
|
|||
|
" <td>24.000000</td>\n",
|
|||
|
" <td>12.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>12.000000</td>\n",
|
|||
|
" <td>18.000000</td>\n",
|
|||
|
" <td>7.000000</td>\n",
|
|||
|
" <td>10.695000</td>\n",
|
|||
|
" <td>5.750000</td>\n",
|
|||
|
" <td>8.000000</td>\n",
|
|||
|
" <td>14.700000</td>\n",
|
|||
|
" <td>4.000000</td>\n",
|
|||
|
" <td>27.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>8 rows × 42 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" V1 V2 V3 V4 V5 V6 \\\n",
|
|||
|
"count 846.000000 846.000000 846.000000 821.000000 846.000000 846.000000 \n",
|
|||
|
"mean 4.790476 3.054551 0.739953 0.030451 0.946809 0.277778 \n",
|
|||
|
"std 0.531991 0.813983 1.504545 0.198281 2.318081 1.045544 \n",
|
|||
|
"min 2.000000 0.803900 0.000000 0.000000 0.000000 0.000000 \n",
|
|||
|
"25% 4.499000 2.510175 0.000000 0.000000 0.000000 0.000000 \n",
|
|||
|
"50% 4.840000 3.052400 0.000000 0.000000 0.000000 0.000000 \n",
|
|||
|
"75% 5.119000 3.415725 1.000000 0.000000 1.000000 0.000000 \n",
|
|||
|
"max 6.496000 7.918400 12.000000 2.000000 36.000000 13.000000 \n",
|
|||
|
"\n",
|
|||
|
" V7 V8 V9 V10 ... V33 \\\n",
|
|||
|
"count 846.000000 846.000000 846.000000 846.000000 ... 846.000000 \n",
|
|||
|
"mean 1.669031 37.422813 1.342790 1.784870 ... 0.903073 \n",
|
|||
|
"std 2.220221 9.030008 2.018433 1.773856 ... 1.526124 \n",
|
|||
|
"min 0.000000 9.100000 0.000000 0.000000 ... 0.000000 \n",
|
|||
|
"25% 0.000000 30.800000 0.000000 0.000000 ... 0.000000 \n",
|
|||
|
"50% 1.000000 37.850000 1.000000 1.500000 ... 0.000000 \n",
|
|||
|
"75% 3.000000 43.800000 2.000000 3.000000 ... 1.000000 \n",
|
|||
|
"max 18.000000 60.700000 24.000000 12.000000 ... 12.000000 \n",
|
|||
|
"\n",
|
|||
|
" V34 V35 V36 V37 V38 V39 \\\n",
|
|||
|
"count 846.000000 846.000000 846.000000 821.000000 846.000000 846.000000 \n",
|
|||
|
"mean 1.241135 0.926714 3.922100 2.549406 0.671395 8.643191 \n",
|
|||
|
"std 2.248684 1.239133 0.992636 0.625021 1.093633 1.223700 \n",
|
|||
|
"min 0.000000 0.000000 2.279000 1.467000 0.000000 4.948000 \n",
|
|||
|
"25% 0.000000 0.000000 3.497000 2.101000 0.000000 8.009500 \n",
|
|||
|
"50% 0.000000 1.000000 3.732500 2.461000 0.000000 8.508000 \n",
|
|||
|
"75% 2.000000 1.000000 3.980000 2.861000 1.000000 9.019750 \n",
|
|||
|
"max 18.000000 7.000000 10.695000 5.750000 8.000000 14.700000 \n",
|
|||
|
"\n",
|
|||
|
" V40 V41 Class \n",
|
|||
|
"count 846.000000 846.000000 846.000000 \n",
|
|||
|
"mean 0.059102 0.706856 1.333333 \n",
|
|||
|
"std 0.342364 2.145396 0.471683 \n",
|
|||
|
"min 0.000000 0.000000 1.000000 \n",
|
|||
|
"25% 0.000000 0.000000 1.000000 \n",
|
|||
|
"50% 0.000000 0.000000 1.000000 \n",
|
|||
|
"75% 0.000000 0.000000 2.000000 \n",
|
|||
|
"max 4.000000 27.000000 2.000000 \n",
|
|||
|
"\n",
|
|||
|
"[8 rows x 42 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train.describe()"
|
|||
|
]
|
|||
|
},
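The `Class` column in the summary above has a mean of 1.333333, a minimum of 1 and a maximum of 2. For 846 rows with labels 1 and 2 only, this works out to 564 rows of class 1 and 282 rows of class 2, roughly a 2:1 split. A minimal sketch of that balance check follows; the `labels` series is a hypothetical reconstruction from those summary numbers, not the actual column read from `train.csv`.

```python
import pandas as pd

# Hypothetical label column rebuilt from the summary statistics:
# 846 rows, labels in {1, 2}, mean 1.333333 -> one third of the rows are class 2.
labels = pd.Series([1] * 564 + [2] * 282, name="Class")

# Absolute and relative class frequencies, ordered by label.
counts = labels.value_counts().sort_index()
shares = labels.value_counts(normalize=True).sort_index()
print(counts)
print(shares.round(3))
```

On the real data the same two `value_counts` calls could be applied to `df_train['Class']` directly.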
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"id": "b2689ec0",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
|||
|
"Int64Index: 846 entries, 3 to 1055\n",
|
|||
|
"Data columns (total 42 columns):\n",
|
|||
|
" # Column Non-Null Count Dtype \n",
|
|||
|
"--- ------ -------------- ----- \n",
|
|||
|
" 0 V1 846 non-null float64\n",
|
|||
|
" 1 V2 846 non-null float64\n",
|
|||
|
" 2 V3 846 non-null int64 \n",
|
|||
|
" 3 V4 821 non-null float64\n",
|
|||
|
" 4 V5 846 non-null int64 \n",
|
|||
|
" 5 V6 846 non-null int64 \n",
|
|||
|
" 6 V7 846 non-null int64 \n",
|
|||
|
" 7 V8 846 non-null float64\n",
|
|||
|
" 8 V9 846 non-null int64 \n",
|
|||
|
" 9 V10 846 non-null int64 \n",
|
|||
|
" 10 V11 846 non-null int64 \n",
|
|||
|
" 11 V12 846 non-null float64\n",
|
|||
|
" 12 V13 846 non-null float64\n",
|
|||
|
" 13 V14 846 non-null float64\n",
|
|||
|
" 14 V15 846 non-null float64\n",
|
|||
|
" 15 V16 846 non-null int64 \n",
|
|||
|
" 16 V17 846 non-null float64\n",
|
|||
|
" 17 V18 846 non-null float64\n",
|
|||
|
" 18 V19 846 non-null int64 \n",
|
|||
|
" 19 V20 846 non-null int64 \n",
|
|||
|
" 20 V21 846 non-null int64 \n",
|
|||
|
" 21 V22 830 non-null float64\n",
|
|||
|
" 22 V23 846 non-null int64 \n",
|
|||
|
" 23 V24 846 non-null int64 \n",
|
|||
|
" 24 V25 846 non-null int64 \n",
|
|||
|
" 25 V26 846 non-null int64 \n",
|
|||
|
" 26 V27 838 non-null float64\n",
|
|||
|
" 27 V28 846 non-null float64\n",
|
|||
|
" 28 V29 838 non-null float64\n",
|
|||
|
" 29 V30 846 non-null float64\n",
|
|||
|
" 30 V31 846 non-null float64\n",
|
|||
|
" 31 V32 846 non-null int64 \n",
|
|||
|
" 32 V33 846 non-null int64 \n",
|
|||
|
" 33 V34 846 non-null int64 \n",
|
|||
|
" 34 V35 846 non-null int64 \n",
|
|||
|
" 35 V36 846 non-null float64\n",
|
|||
|
" 36 V37 821 non-null float64\n",
|
|||
|
" 37 V38 846 non-null int64 \n",
|
|||
|
" 38 V39 846 non-null float64\n",
|
|||
|
" 39 V40 846 non-null int64 \n",
|
|||
|
" 40 V41 846 non-null int64 \n",
|
|||
|
" 41 Class 846 non-null int64 \n",
|
|||
|
"dtypes: float64(19), int64(23)\n",
|
|||
|
"memory usage: 284.2 KB\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train.info()"
|
|||
|
]
|
|||
|
},
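The `info()` output above lists fewer than 846 non-null entries for five columns: 821 for V4 and V37, 830 for V22, and 838 for V27 and V29. One common strategy, shown here only as a sketch, is to fill the gaps with per-column medians learned on the training split. The tiny `train` and `test` frames below are hypothetical stand-ins for `df_train` and `df_test`, and median imputation is an assumption of this example, not necessarily the strategy used later in the notebook.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frames standing in for df_train and df_test.
train = pd.DataFrame({"V4": [0.0, np.nan, 0.0, 2.0], "V37": [2.1, 2.5, np.nan, 2.9]})
test = pd.DataFrame({"V4": [np.nan, 1.0], "V37": [2.6, np.nan]})

# Number of missing values per column in the training split.
missing_per_column = train.isna().sum()
print(missing_per_column)

# Learn the medians on the training split only, then reuse them for the test
# split so that no information from the test set leaks into preprocessing.
medians = train.median()
train_filled = train.fillna(medians)
test_filled = test.fillna(medians)
print(test_filled)
```

Because the affected columns lose at most 25 of 846 rows, dropping the incomplete rows would also be a defensible choice; imputation simply keeps every training instance available.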
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"id": "22003f33",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>V1</th>\n",
|
|||
|
" <th>V2</th>\n",
|
|||
|
" <th>V3</th>\n",
|
|||
|
" <th>V4</th>\n",
|
|||
|
" <th>V5</th>\n",
|
|||
|
" <th>V6</th>\n",
|
|||
|
" <th>V7</th>\n",
|
|||
|
" <th>V8</th>\n",
|
|||
|
" <th>V9</th>\n",
|
|||
|
" <th>V10</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>V33</th>\n",
|
|||
|
" <th>V34</th>\n",
|
|||
|
" <th>V35</th>\n",
|
|||
|
" <th>V36</th>\n",
|
|||
|
" <th>V37</th>\n",
|
|||
|
" <th>V38</th>\n",
|
|||
|
" <th>V39</th>\n",
|
|||
|
" <th>V40</th>\n",
|
|||
|
" <th>V41</th>\n",
|
|||
|
" <th>Class</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>3.919</td>\n",
|
|||
|
" <td>2.6909</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>31.4</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2.949</td>\n",
|
|||
|
" <td>1.591</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>7.253</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>4.170</td>\n",
|
|||
|
" <td>2.1144</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>30.8</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>3.315</td>\n",
|
|||
|
" <td>1.967</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>7.257</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>3.000</td>\n",
|
|||
|
" <td>2.7098</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>20.0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>3.046</td>\n",
|
|||
|
" <td>5.000</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>6.690</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>13</th>\n",
|
|||
|
" <td>4.214</td>\n",
|
|||
|
" <td>2.6272</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>30.0</td>\n",
|
|||
|
" <td>3</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2.998</td>\n",
|
|||
|
" <td>1.722</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>6.770</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>16</th>\n",
|
|||
|
" <td>3.942</td>\n",
|
|||
|
" <td>2.7719</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>31.6</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>3.542</td>\n",
|
|||
|
" <td>1.739</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>8.127</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>2</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>5 rows × 42 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V33 V34 V35 \\\n",
|
|||
|
"1 3.919 2.6909 0 0 0 0 0 31.4 2 0 ... 0 0 0 \n",
|
|||
|
"2 4.170 2.1144 0 0 0 0 0 30.8 1 1 ... 0 0 0 \n",
|
|||
|
"4 3.000 2.7098 0 0 0 0 0 20.0 0 2 ... 0 0 1 \n",
|
|||
|
"13 4.214 2.6272 0 0 0 0 0 30.0 3 0 ... 0 0 0 \n",
|
|||
|
"16 3.942 2.7719 1 0 0 0 0 31.6 2 0 ... 0 0 0 \n",
|
|||
|
"\n",
|
|||
|
" V36 V37 V38 V39 V40 V41 Class \n",
|
|||
|
"1 2.949 1.591 0 7.253 0 0 2 \n",
|
|||
|
"2 3.315 1.967 0 7.257 0 0 2 \n",
|
|||
|
"4 3.046 5.000 0 6.690 0 0 2 \n",
|
|||
|
"13 2.998 1.722 0 6.770 0 0 2 \n",
|
|||
|
"16 3.542 1.739 0 8.127 0 1 2 \n",
|
|||
|
"\n",
|
|||
|
"[5 rows x 42 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_test.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"id": "d7235214",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>V1</th>\n",
|
|||
|
" <th>V2</th>\n",
|
|||
|
" <th>V3</th>\n",
|
|||
|
" <th>V4</th>\n",
|
|||
|
" <th>V5</th>\n",
|
|||
|
" <th>V6</th>\n",
|
|||
|
" <th>V7</th>\n",
|
|||
|
" <th>V8</th>\n",
|
|||
|
" <th>V9</th>\n",
|
|||
|
" <th>V10</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>V33</th>\n",
|
|||
|
" <th>V34</th>\n",
|
|||
|
" <th>V35</th>\n",
|
|||
|
" <th>V36</th>\n",
|
|||
|
" <th>V37</th>\n",
|
|||
|
" <th>V38</th>\n",
|
|||
|
" <th>V39</th>\n",
|
|||
|
" <th>V40</th>\n",
|
|||
|
" <th>V41</th>\n",
|
|||
|
" <th>Class</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.00000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" <td>209.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>4.750938</td>\n",
|
|||
|
" <td>3.130050</td>\n",
|
|||
|
" <td>0.62201</td>\n",
|
|||
|
" <td>0.086124</td>\n",
|
|||
|
" <td>1.114833</td>\n",
|
|||
|
" <td>0.339713</td>\n",
|
|||
|
" <td>1.555024</td>\n",
|
|||
|
" <td>35.569378</td>\n",
|
|||
|
" <td>1.511962</td>\n",
|
|||
|
" <td>1.880383</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.803828</td>\n",
|
|||
|
" <td>1.411483</td>\n",
|
|||
|
" <td>1.100478</td>\n",
|
|||
|
" <td>3.902612</td>\n",
|
|||
|
" <td>2.629201</td>\n",
|
|||
|
" <td>0.746411</td>\n",
|
|||
|
" <td>8.574038</td>\n",
|
|||
|
" <td>0.019139</td>\n",
|
|||
|
" <td>0.789474</td>\n",
|
|||
|
" <td>1.354067</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>0.603914</td>\n",
|
|||
|
" <td>0.897556</td>\n",
|
|||
|
" <td>1.27690</td>\n",
|
|||
|
" <td>0.406969</td>\n",
|
|||
|
" <td>2.393143</td>\n",
|
|||
|
" <td>1.182566</td>\n",
|
|||
|
" <td>2.246383</td>\n",
|
|||
|
" <td>9.471334</td>\n",
|
|||
|
" <td>1.721220</td>\n",
|
|||
|
" <td>1.784023</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1.498327</td>\n",
|
|||
|
" <td>2.374355</td>\n",
|
|||
|
" <td>1.320857</td>\n",
|
|||
|
" <td>1.029605</td>\n",
|
|||
|
" <td>0.714285</td>\n",
|
|||
|
" <td>1.077657</td>\n",
|
|||
|
" <td>1.315016</td>\n",
|
|||
|
" <td>0.195176</td>\n",
|
|||
|
" <td>2.589491</td>\n",
|
|||
|
" <td>0.479378</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>1.134900</td>\n",
|
|||
|
" <td>0.00000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>2.267000</td>\n",
|
|||
|
" <td>1.576000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>4.917000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>4.414000</td>\n",
|
|||
|
" <td>2.494500</td>\n",
|
|||
|
" <td>0.00000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>29.400000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>3.401000</td>\n",
|
|||
|
" <td>2.146000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>7.872000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>4.807000</td>\n",
|
|||
|
" <td>3.039300</td>\n",
|
|||
|
" <td>0.00000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>34.200000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>3.694000</td>\n",
|
|||
|
" <td>2.469000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>8.464000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>5.188000</td>\n",
|
|||
|
" <td>3.555400</td>\n",
|
|||
|
" <td>1.00000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>3.000000</td>\n",
|
|||
|
" <td>41.200000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>3.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>3.991000</td>\n",
|
|||
|
" <td>2.967000</td>\n",
|
|||
|
" <td>1.000000</td>\n",
|
|||
|
" <td>9.017000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>6.253000</td>\n",
|
|||
|
" <td>9.177500</td>\n",
|
|||
|
" <td>8.00000</td>\n",
|
|||
|
" <td>3.000000</td>\n",
|
|||
|
" <td>16.000000</td>\n",
|
|||
|
" <td>12.000000</td>\n",
|
|||
|
" <td>14.000000</td>\n",
|
|||
|
" <td>60.000000</td>\n",
|
|||
|
" <td>9.000000</td>\n",
|
|||
|
" <td>11.000000</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>12.000000</td>\n",
|
|||
|
" <td>18.000000</td>\n",
|
|||
|
" <td>6.000000</td>\n",
|
|||
|
" <td>10.355000</td>\n",
|
|||
|
" <td>5.825000</td>\n",
|
|||
|
" <td>6.000000</td>\n",
|
|||
|
" <td>14.030000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" <td>27.000000</td>\n",
|
|||
|
" <td>2.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>8 rows × 42 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" V1 V2 V3 V4 V5 V6 \\\n",
|
|||
|
"count 209.000000 209.000000 209.00000 209.000000 209.000000 209.000000 \n",
|
|||
|
"mean 4.750938 3.130050 0.62201 0.086124 1.114833 0.339713 \n",
|
|||
|
"std 0.603914 0.897556 1.27690 0.406969 2.393143 1.182566 \n",
|
|||
|
"min 2.000000 1.134900 0.00000 0.000000 0.000000 0.000000 \n",
|
|||
|
"25% 4.414000 2.494500 0.00000 0.000000 0.000000 0.000000 \n",
|
|||
|
"50% 4.807000 3.039300 0.00000 0.000000 0.000000 0.000000 \n",
|
|||
|
"75% 5.188000 3.555400 1.00000 0.000000 1.000000 0.000000 \n",
|
|||
|
"max 6.253000 9.177500 8.00000 3.000000 16.000000 12.000000 \n",
|
|||
|
"\n",
|
|||
|
" V7 V8 V9 V10 ... V33 \\\n",
|
|||
|
"count 209.000000 209.000000 209.000000 209.000000 ... 209.000000 \n",
|
|||
|
"mean 1.555024 35.569378 1.511962 1.880383 ... 0.803828 \n",
|
|||
|
"std 2.246383 9.471334 1.721220 1.784023 ... 1.498327 \n",
|
|||
|
"min 0.000000 0.000000 0.000000 0.000000 ... 0.000000 \n",
|
|||
|
"25% 0.000000 29.400000 0.000000 0.000000 ... 0.000000 \n",
|
|||
|
"50% 0.000000 34.200000 1.000000 2.000000 ... 0.000000 \n",
|
|||
|
"75% 3.000000 41.200000 2.000000 3.000000 ... 1.000000 \n",
|
|||
|
"max 14.000000 60.000000 9.000000 11.000000 ... 12.000000 \n",
|
|||
|
"\n",
|
|||
|
" V34 V35 V36 V37 V38 V39 \\\n",
|
|||
|
"count 209.000000 209.000000 209.000000 209.000000 209.000000 209.000000 \n",
|
|||
|
"mean 1.411483 1.100478 3.902612 2.629201 0.746411 8.574038 \n",
|
|||
|
"std 2.374355 1.320857 1.029605 0.714285 1.077657 1.315016 \n",
|
|||
|
"min 0.000000 0.000000 2.267000 1.576000 0.000000 4.917000 \n",
|
|||
|
"25% 0.000000 0.000000 3.401000 2.146000 0.000000 7.872000 \n",
|
|||
|
"50% 0.000000 1.000000 3.694000 2.469000 0.000000 8.464000 \n",
|
|||
|
"75% 2.000000 2.000000 3.991000 2.967000 1.000000 9.017000 \n",
|
|||
|
"max 18.000000 6.000000 10.355000 5.825000 6.000000 14.030000 \n",
|
|||
|
"\n",
|
|||
|
" V40 V41 Class \n",
|
|||
|
"count 209.000000 209.000000 209.000000 \n",
|
|||
|
"mean 0.019139 0.789474 1.354067 \n",
|
|||
|
"std 0.195176 2.589491 0.479378 \n",
|
|||
|
"min 0.000000 0.000000 1.000000 \n",
|
|||
|
"25% 0.000000 0.000000 1.000000 \n",
|
|||
|
"50% 0.000000 0.000000 1.000000 \n",
|
|||
|
"75% 0.000000 0.000000 2.000000 \n",
|
|||
|
"max 2.000000 27.000000 2.000000 \n",
|
|||
|
"\n",
|
|||
|
"[8 rows x 42 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_test.describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"id": "9598495e",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
|||
|
"Int64Index: 209 entries, 1 to 1051\n",
|
|||
|
"Data columns (total 42 columns):\n",
|
|||
|
" # Column Non-Null Count Dtype \n",
|
|||
|
"--- ------ -------------- ----- \n",
|
|||
|
" 0 V1 209 non-null float64\n",
|
|||
|
" 1 V2 209 non-null float64\n",
|
|||
|
" 2 V3 209 non-null int64 \n",
|
|||
|
" 3 V4 209 non-null int64 \n",
|
|||
|
" 4 V5 209 non-null int64 \n",
|
|||
|
" 5 V6 209 non-null int64 \n",
|
|||
|
" 6 V7 209 non-null int64 \n",
|
|||
|
" 7 V8 209 non-null float64\n",
|
|||
|
" 8 V9 209 non-null int64 \n",
|
|||
|
" 9 V10 209 non-null int64 \n",
|
|||
|
" 10 V11 209 non-null int64 \n",
|
|||
|
" 11 V12 209 non-null float64\n",
|
|||
|
" 12 V13 209 non-null float64\n",
|
|||
|
" 13 V14 209 non-null float64\n",
|
|||
|
" 14 V15 209 non-null float64\n",
|
|||
|
" 15 V16 209 non-null int64 \n",
|
|||
|
" 16 V17 209 non-null float64\n",
|
|||
|
" 17 V18 209 non-null float64\n",
|
|||
|
" 18 V19 209 non-null int64 \n",
|
|||
|
" 19 V20 209 non-null int64 \n",
|
|||
|
" 20 V21 209 non-null int64 \n",
|
|||
|
" 21 V22 209 non-null float64\n",
|
|||
|
" 22 V23 209 non-null int64 \n",
|
|||
|
" 23 V24 209 non-null int64 \n",
|
|||
|
" 24 V25 209 non-null int64 \n",
|
|||
|
" 25 V26 209 non-null int64 \n",
|
|||
|
" 26 V27 209 non-null float64\n",
|
|||
|
" 27 V28 209 non-null float64\n",
|
|||
|
" 28 V29 209 non-null int64 \n",
|
|||
|
" 29 V30 209 non-null float64\n",
|
|||
|
" 30 V31 209 non-null float64\n",
|
|||
|
" 31 V32 209 non-null int64 \n",
|
|||
|
" 32 V33 209 non-null int64 \n",
|
|||
|
" 33 V34 209 non-null int64 \n",
|
|||
|
" 34 V35 209 non-null int64 \n",
|
|||
|
" 35 V36 209 non-null float64\n",
|
|||
|
" 36 V37 209 non-null float64\n",
|
|||
|
" 37 V38 209 non-null int64 \n",
|
|||
|
" 38 V39 209 non-null float64\n",
|
|||
|
" 39 V40 209 non-null int64 \n",
|
|||
|
" 40 V41 209 non-null int64 \n",
|
|||
|
" 41 Class 209 non-null int64 \n",
|
|||
|
"dtypes: float64(17), int64(25)\n",
|
|||
|
"memory usage: 70.2 KB\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_test.info()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "84e0c414",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Display the distribution of the target variable **Class** in the training and test sets."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"id": "5ca239ec",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"_, _, bars = plt.hist(df_train['Class'], bins=10)\n",
|
|||
|
"plt.xlabel('Class')\n",
|
|||
|
"plt.ylabel('Frequency')\n",
|
|||
|
"plt.title('Distribution of the target variable in the training set')\n",
|
|||
|
"plt.bar_label(bars, fmt='%1.0f')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"id": "c74f9fb5",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"_, _, bars = plt.hist(df_test['Class'], bins=10)\n",
|
|||
|
"plt.xlabel('Class')\n",
|
|||
|
"plt.ylabel('Frequency')\n",
|
|||
|
"plt.title('Distribution of the target variable in the test set')\n",
|
|||
|
"plt.bar_label(bars, fmt='%1.0f')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
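{
"cell_type": "markdown",
"id": "sketch-class-balance-note",
"metadata": {},
"source": [
"The two histograms show the class balance visually. As a minimal sketch, assuming `df_train` and `df_test` are the DataFrames loaded earlier in this notebook, the exact counts and shares per class could be printed like this (the names `name`, `frame`, `counts` and `shares` are only illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-class-balance",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: exact class counts and proportions (assumes df_train and df_test exist)\n",
"for name, frame in [('train', df_train), ('test', df_test)]:\n",
"    counts = frame['Class'].value_counts().sort_index()\n",
"    shares = (counts / counts.sum()).round(3)\n",
"    print(name, counts.to_dict(), shares.to_dict())"
]
},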
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "82afd315",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Display the relationships between features in the training set using the correlation matrix"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"id": "e8cf8eb1",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"(42.5, -0.5)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 2500x2500 with 2 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"correlation_matrix = df_train.corr()\n",
|
|||
|
"fig, ax = plt.subplots(figsize=(25, 25))\n",
|
|||
|
"\n",
|
|||
|
"ax = sns.heatmap(\n",
|
|||
|
" correlation_matrix,\n",
|
|||
|
" annot=True,\n",
|
|||
|
" linewidths=0.5,\n",
|
|||
|
" fmt=\".2f\",\n",
|
|||
|
" cmap=\"YlGnBu\"\n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"# Pad the y-limits by half a cell; a workaround for matplotlib versions that cut off the top and bottom heatmap rows\n",
|
|||
|
"bottom_side, top_side = ax.get_ylim()\n",
|
|||
|
"ax.set_ylim(bottom_side + 0.5, top_side - 0.5)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "c2b4a57c",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"We can see that the highest positive correlation involves the **V14** attribute and the highest negative correlation involves the attributes **V1** and **V27**. So let's see how those values are distributed in comparison to the class."
|
|||
|
]
|
|||
|
},
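{
"cell_type": "markdown",
"id": "sketch-class-corr-note",
"metadata": {},
"source": [
"The annotated heatmap is hard to read with 42 columns. A possible shortcut, assuming the correlations of interest are the ones with the target column `Class`, is to sort that single column of the matrix (`class_corr` is a name introduced here for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-class-corr",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: rank features by their Pearson correlation with the target\n",
"class_corr = df_train.corr()['Class'].drop('Class').sort_values()\n",
"print(class_corr.head(3))  # most negative correlations\n",
"print(class_corr.tail(3))  # most positive correlations"
]
},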
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "f1918d5b",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"**V1 vs V27**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"id": "8d4ce9a6",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<matplotlib.legend.Legend at 0x7fe49dcbc5b0>"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1500x1000 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(15, 10))\n",
|
|||
|
"\n",
|
|||
|
"# Scatter the instances with target class 1 (ready biodegradable)\n",
|
|||
|
"plt.scatter(\n",
|
|||
|
" df_train['V1'][df_train['Class'] == 1],\n",
|
|||
|
" df_train['V27'][df_train['Class'] == 1],\n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"# Scatter the instances with target class 2 (not ready biodegradable)\n",
|
|||
|
"plt.scatter(\n",
|
|||
|
" df_train['V1'][df_train['Class'] == 2],\n",
|
|||
|
" df_train['V27'][df_train['Class'] == 2],\n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"plt.title('Target class as a function of V1 and V27')\n",
|
|||
|
"\n",
|
|||
|
"plt.xlabel('V1')\n",
|
|||
|
"plt.ylabel('V27')\n",
|
|||
|
"plt.legend(['Biodegradable', 'Non-biodegradable'])\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"id": "d50d1f44",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Splitting the data into features and labels\n",
|
|||
|
"X_train = df_train.drop('Class', axis=1)\n",
|
|||
|
"y_train = df_train['Class']\n",
|
|||
|
"X_test = df_test.drop('Class', axis=1)\n",
|
|||
|
"y_test = df_test['Class']"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"id": "f0aa7c9d",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.linear_model import LogisticRegression\n",
|
|||
|
"from sklearn.neighbors import KNeighborsClassifier\n",
|
|||
|
"from sklearn.ensemble import RandomForestClassifier\n",
|
|||
|
"\n",
|
|||
|
"# Put models in a dictionary\n",
|
|||
|
"models = {\n",
|
|||
|
" \"Logistic Regression\": LogisticRegression(),\n",
|
|||
|
" \"KNN\": KNeighborsClassifier(),\n",
|
|||
|
" \"Random Forest\": RandomForestClassifier()\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"# Create a function to fit and score models\n",
|
|||
|
"def fit_and_score(models, X_train, X_test, y_train, y_test):\n",
|
|||
|
" \"\"\"\n",
|
|||
|
" Fits and evaluates given machine learning models.\n",
|
|||
|
" models: dict of different Scikit-Learn machine learning models\n",
|
|||
|
" X_train: training data (no labels)\n",
|
|||
|
" X_test: testing data (no labels)\n",
|
|||
|
" y_train: training labels\n",
|
|||
|
" y_test: test labels\n",
|
|||
|
" \"\"\"\n",
|
|||
|
"\n",
|
|||
|
" # Set random seed\n",
|
|||
|
" np.random.seed(42)\n",
|
|||
|
"\n",
|
|||
|
" # Make a dictionary to keep model scores\n",
|
|||
|
" model_scores = {}\n",
|
|||
|
"\n",
|
|||
|
" # Loop through models\n",
|
|||
|
" for name, model in models.items():\n",
|
|||
|
" # Fit the model to the data\n",
|
|||
|
" model.fit(X_train, y_train)\n",
|
|||
|
" # Evaluate the model and append its score to model_scores\n",
|
|||
|
" model_scores[name] = model.score(X_test, y_test)\n",
|
|||
|
"\n",
|
|||
|
" return model_scores"
|
|||
|
]
|
|||
|
},
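{
"cell_type": "markdown",
"id": "sketch-fit-and-score-note",
"metadata": {},
"source": [
"A hedged usage sketch for `fit_and_score`: the training set still contains missing values at this point, and `LogisticRegression` and `KNeighborsClassifier` do not accept `NaN` inputs, so this example assumes a median imputation step in front of every model. The imputation strategy and the names `imputed_models` and `baseline_scores` are assumptions for illustration, not necessarily the approach taken in the rest of this notebook. It also assumes `numpy` is imported as `np`, which `fit_and_score` requires."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-fit-and-score",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.impute import SimpleImputer\n",
"from sklearn.pipeline import make_pipeline\n",
"\n",
"# Sketch: wrap each model so NaNs are replaced by the training-set median first\n",
"imputed_models = {\n",
"    name: make_pipeline(SimpleImputer(strategy='median'), model)\n",
"    for name, model in models.items()\n",
"}\n",
"baseline_scores = fit_and_score(imputed_models, X_train, X_test, y_train, y_test)\n",
"print(baseline_scores)  # accuracy on the test set per model"
]
},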
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "10387356",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Check if there are any missing values"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"id": "87e277e6",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"V4 25\n",
|
|||
|
"V22 16\n",
|
|||
|
"V27 8\n",
|
|||
|
"V29 8\n",
|
|||
|
"V37 25\n",
|
|||
|
"dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"na_counts = df_train.isna().sum()\n",
|
|||
|
"na_counts[na_counts > 0]\n"
|
|||
|
]
|
|||
|
},
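{
"cell_type": "markdown",
"id": "sketch-missing-share-note",
"metadata": {},
"source": [
"As a small sketch, the same check can be expressed as a percentage of rows and repeated for the test set (`na_share` and `na_test` are illustrative names; `df_test` is assumed to be loaded as above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-missing-share",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: percentage of missing values per column in the training set\n",
"na_share = df_train.isna().mean().mul(100).round(2)\n",
"print(na_share[na_share > 0])\n",
"\n",
"# Same count for the test set, to see whether it needs imputation too\n",
"na_test = df_test.isna().sum()\n",
"print(na_test[na_test > 0])"
]
},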
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "cb57434a",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### We can see that five attributes have missing values. Let's inspect them."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "9dbd2c02",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### V4"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"id": "ca1e544a",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"count 821.000000\n",
|
|||
|
"mean 0.030451\n",
|
|||
|
"std 0.198281\n",
|
|||
|
"min 0.000000\n",
|
|||
|
"25% 0.000000\n",
|
|||
|
"50% 0.000000\n",
|
|||
|
"75% 0.000000\n",
|
|||
|
"max 2.000000\n",
|
|||
|
"Name: V4, dtype: float64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V4'].describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"id": "9e4d7d1d",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"0.0 800\n",
|
|||
|
"1.0 17\n",
|
|||
|
"2.0 4\n",
|
|||
|
"Name: V4, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V4'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "3a3191c9",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The vast majority of entries in this attribute are zeros (800 of the 821 non-missing values), so I set all `NaN` values to zero."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"id": "d8489bd4",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_train['V4'].fillna(0, inplace=True)\n",
|
|||
|
"df_test['V4'].fillna(0, inplace=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "3e84e48b",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### V22"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"id": "a711431d",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"count 830.000000\n",
|
|||
|
"mean 1.243898\n",
|
|||
|
"std 0.094109\n",
|
|||
|
"min 0.898000\n",
|
|||
|
"25% 1.187500\n",
|
|||
|
"50% 1.248500\n",
|
|||
|
"75% 1.298750\n",
|
|||
|
"max 1.641000\n",
|
|||
|
"Name: V22, dtype: float64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V22'].describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"id": "f0325325",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"1.299 9\n",
|
|||
|
"1.280 9\n",
|
|||
|
"1.296 8\n",
|
|||
|
"1.254 8\n",
|
|||
|
"1.264 8\n",
|
|||
|
" ..\n",
|
|||
|
"1.449 1\n",
|
|||
|
"1.159 1\n",
|
|||
|
"1.363 1\n",
|
|||
|
"1.331 1\n",
|
|||
|
"1.410 1\n",
|
|||
|
"Name: V22, Length: 321, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V22'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"id": "25a74baf",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABLi0lEQVR4nO3deVxU1f8/8NcAssgyyg6iiLiFiguW4YqKIhq5FW4lmGYWrlgWnywlLTTLLbf6ZriFW25lKi4ouFYq5FIumOYC4s5moDDn90c/JkeGbZzhzoXX8/G4j4dz751z33NnhBfnnntGIYQQICIiIpIhE6kLICIiItIVgwwRERHJFoMMERERyRaDDBEREckWgwwRERHJFoMMERERyRaDDBEREckWgwwRERHJFoMMERERyRaDTDU0ffp0KBSKSjlWQEAAAgIC1I8PHDgAhUKBH374oVKOHx4ejvr161fKsXSVk5ODUaNGwdXVFQqFAhMnTqxwG0Xv6Z07d/RfIEGhUGD69Onqx3I63ytWrIBCocCVK1cMfqzw8HDY2NgY/DiG9vT7TcaNQUbmin5IFS2WlpZwd3dHUFAQFi5ciOzsbL0cJy0tDdOnT0dKSope2tMnY66tPD777DOsWLECb7/9NlavXo3XX3+91H23bt1aecU94eWXX0bNmjVL/UwNGzYM5ubmuHv3Lu7evYs5c+agc+fOcHJyQq1atfDiiy9i/fr1xZ7322+/YezYsWjWrBmsra1Rr149hIaG4sKFC3p9DUeOHMH06dPx4MEDvbarD8Zc25MePnyI6dOn48CBA5LVsGPHjmobNOLi4jB//nypyzAugmQtNjZWABCffPKJWL16tfjuu+/EZ599Jnr27CkUCoXw9PQUv//+u8ZzHj9+LP75558KHee3334TAERsbGyFnpefny/y8/PVj/fv3y8AiI0bN1aoHV1re/TokcjLy9PbsQyhXbt2okOHDuXa19raWoSFhRVbP23aNAFA3L59W8/V/WfdunUCgFi5cqXW7bm5ucLa2lqEhIQIIYT46aefRI0aNUTfvn3F/PnzxaJFi0TXrl0FAPHxxx9rPHfgwIHC1dVVjBs3Tvzf//2fmDFjhnBxcRHW1tbi9OnTensNc+bMEQDE5cuXK/S8f/75Rzx+/Fj92BDnW9faylJQUCD++ecfoVKp9NLe7du3BQAxbdq0YtvCwsKEtbW1Xo5TmoiICGHIX19Pv9/GpE+fPsLT01PqMoyKmTTxifQtODgYbdu2VT+OiopCQkICXnrpJbz88sv4888/YWVlBQAwMzODmZlh3/qHDx+iZs2aMDc3N+hxylKjRg1Jj18et27dgo+Pj9RllOnll1+Gra0t4uLiMHz48GLbt23bhtzcXAwbNgwA0KxZM1y8eBGenp7qfd555x0EBgZi9uzZmDJlCqytrQEAkZGRiIuL0/i8DBo0CC1atMCsWbOwZs0aA7+64lQqFR49egRLS0tYWlpW+vH1xdTUFKamplKXIZmCggKoVKoK/SyS8/tdLUmdpOjZFPXI/Pbbb1q3f/bZZwKA+Oabb9Triv6afNLu3btFhw4dhFKpFNbW1qJx48YiKipKCPFfL8rTS1EPSJcuXUSzZs3E8ePHRadOnYSVlZWYMGGCeluXLl3Uxylqa926dSIqKkq4uLiImjVripCQEHH16lWNmjw9PbX2PjzZZlm1hYWFFfvrJScnR0RGRgoPDw9hbm4uGjduLObMmVPsL1YAIiIiQmzZskU0a9ZMmJubCx8fH7Fz506t5/ppGRkZ4o033hDOzs7CwsJC+Pr6ihUrVhQ7F08vJf1Frm3fovNT9J5evHhRhIWFCaVSKezs7ER4eLjIzc0t1tbq1atFmzZthKWlpahdu7YYNGhQsfOvTVhYmDAzMxMZGRnFtr300kvC1tZWPHz4sNQ2Fi5cKACIU6dOlXm8Nm3aiDZt2pS53++//y7CwsKEl5eXsLCwEC4uLm
LEiBHizp076n2KzlFJ57vo/V6zZo3w8fERZmZmYsuWLeptT/ZAFLX1559/ildffVXY2toKe3t7MX78eI3ezsuXL5fYW/hkm2XVJoTu71nRz4gn2/L09BR9+vQRBw8eFM8//7ywsLAQXl5eJfa2Pf16nl6KXkdRj8z169dF3759hbW1tXB0dBSTJ08WBQUFGm0VFhaKefPmCR8fH2FhYSGcnZ3F6NGjxb1790qtISwsTGsNT9Y3Z84cMW/ePNGgQQNhYmIikpOTRX5+vvjoo49EmzZthJ2dnahZs6bo2LGjSEhIKHaMkt7v8v7/etqFCxfEgAEDhIuLi7CwsBB16tQRgwYNEg8ePNDYr6z3uEuXLsVeN3tn2CNT5b3++uv43//+h927d+PNN9/Uus/Zs2fx0ksvwdfXF5988gksLCyQmpqKw4cPAwCee+45fPLJJ/j4448xevRodOrUCQDQvn17dRt3795FcHAwBg8ejNdeew0uLi6l1vXpp59CoVDg/fffx61btzB//nwEBgYiJSVF3XNUHuWp7UlCCLz88svYv38/Ro4ciVatWiE+Ph7vvfcebty4gXnz5mnsf+jQIWzevBnvvPMObG1tsXDhQgwcOBBXr16Fg4NDiXX9888/CAgIQGpqKsaOHQsvLy9s3LgR4eHhePDgASZMmIDnnnsOq1evxqRJk+Dh4YHJkycDAJycnLS2uXr1aowaNQovvPACRo8eDQDw9vbW2Cc0NBReXl6IiYnByZMn8e2338LZ2RmzZ89W7/Ppp5/io48+QmhoKEaNGoXbt2/jq6++QufOnZGcnIxatWqV+LqGDRuGlStXYsOGDRg7dqx6/b179xAfH48hQ4aU+f7dvHkTAODo6FjqfkIIZGRkoFmzZqXuBwB79uzBX3/9hREjRsDV1RVnz57FN998g7Nnz+LYsWNQKBQYMGAALly4gLVr12LevHnq4z95vhMSEtSvzdHRscyB4qGhoahfvz5iYmJw7NgxLFy4EPfv38eqVavKrPlJZdX2LO9ZSVJTU/HKK69g5MiRCAsLw3fffYfw8HD4+fmVeM6dnJywdOlSvP322+jfvz8GDBgAAPD19VXvU1hYiKCgILRr1w5ffPEF9u7diy+//BLe3t54++231fu99dZbWLFiBUaMGIHx48fj8uXLWLRoEZKTk3H48OESe1PfeustpKWlYc+ePVi9erXWfWJjY5GXl4fRo0fDwsIC9vb2yMrKwrfffoshQ4bgzTffRHZ2NpYvX46goCD8+uuvaNWqVZnnrDz/v5726NEjBAUFIT8/H+PGjYOrqytu3LiB7du348GDB1AqlQDK9x5/+OGHyMzMxPXr19U/q6rC4OpnJnWSomdTVo+MEEIolUrRunVr9eOne2TmzZtX5vX+0sahFP2VsGzZMq3btPXI1KlTR2RlZanXb9iwQQAQCxYsUK8rT49MWbU93SOzdetWAUDMnDlTY79XXnlFKBQKkZqaql4HQJibm2us+/333wUA8dVXXxU71pPmz58vAIg1a9ao1z169Ej4+/sLGxsbjdde9NdxeZQ1RuaNN97QWN+/f3/h4OCgfnzlyhVhamoqPv30U439Tp8+LczMzIqtf1pBQYFwc3MT/v7+GuuXLVsmAIj4+PhSn3/37l3h7OwsOnXqVOp+Qvz71ykAsXz58jL31dYLtHbtWgFAJCUlqdeVNg4FgDAxMRFnz57Vuk3bX+gvv/yyxn7vvPOOAKAel1beHpnSanvW96ykHpmnz82tW7eEhYWFmDx5cqntlTVGBv9/zN6TWrduLfz8/NSPDx48KACI77//XmO/Xbt2aV3/tJLGyBSdbzs7O3Hr1i2NbQUFBRrj9YQQ4v79+8LFxaXY/5uS3u+y/n9pk5ycXOa4wIq8xxwjUxzvWqoGbGxsSr3TpOivuW3btkGlUul0DAsLC4wYMaLc+w8fPhy2trbqx6+88grc3NywY8cOnY5fXjt27ICpqSnGjx+vsX7y5M
kQQmDnzp0a6wMDAzV6PXx9fWFnZ4e//vqrzOO4urpiyJAh6nU1atTA+PHjkZOTg8TERD28muLGjBmj8bhTp064e/c
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"_, _, bars = plt.hist(df_train['V22'].dropna(), bins=20)\n",
|
|||
|
"plt.xlabel('V22')\n",
|
|||
|
"plt.ylabel('Frequency')\n",
|
|||
|
"plt.title('Distribution of the V22 attribute in the train set')\n",
|
|||
|
"plt.bar_label(bars, fmt='%1.0f')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "6d6b63fd",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The distribution of the feature **V22** looks approximately normal (mean 1.244, median 1.249), so I fill the missing values with the column mean."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"id": "2b2b6e2d",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_train['V22'].fillna(df_train['V22'].mean(), inplace=True)\n",
|
|||
|
"# Impute the test set with the training mean so that no test-set statistic is used\n",
"df_test['V22'].fillna(df_train['V22'].mean(), inplace=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "4164f62c",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### V27"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 23,
|
|||
|
"id": "9a8b64ac",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"count 838.000000\n",
|
|||
|
"mean 2.218153\n",
|
|||
|
"std 0.221545\n",
|
|||
|
"min 1.000000\n",
|
|||
|
"25% 2.107000\n",
|
|||
|
"50% 2.251000\n",
|
|||
|
"75% 2.359750\n",
|
|||
|
"max 2.859000\n",
|
|||
|
"Name: V27, dtype: float64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V27'].describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 24,
|
|||
|
"id": "1bddfb76",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"2.000 36\n",
|
|||
|
"2.236 31\n",
|
|||
|
"2.194 24\n",
|
|||
|
"1.848 22\n",
|
|||
|
"2.175 21\n",
|
|||
|
" ..\n",
|
|||
|
"2.294 1\n",
|
|||
|
"2.466 1\n",
|
|||
|
"2.488 1\n",
|
|||
|
"2.372 1\n",
|
|||
|
"2.622 1\n",
|
|||
|
"Name: V27, Length: 290, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 24,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V27'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 25,
|
|||
|
"id": "f1787f2e",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAHHCAYAAACle7JuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABPhElEQVR4nO3deViUVf8/8PeIzohsirI+IBK4oWJGhmgqKoJohmm55AKKaT5gqW3SoqIVmuVSKVZfA7UQl0TLEnIDl9TSJJdSwTQ1QcyFzUBlzu8Pf8zjyDaMM9xzD+/Xdc1Vc+4zZz5nzj3jhzPnPqMQQggQERERyVADqQMgIiIi0hcTGSIiIpItJjJEREQkW0xkiIiISLaYyBAREZFsMZEhIiIi2WIiQ0RERLLFRIaIiIhki4kMERERyRYTmXpozpw5UCgUdfJcgYGBCAwM1NxPT0+HQqHAxo0b6+T5IyIi0KpVqzp5Ln0VFRVh4sSJcHZ2hkKhwLRp02rdRvmY/vPPP4YPsJ47f/48FAoFEhMTNWURERGwtraWLqhaqOv3e8eOHevkuYylsvEm08ZERuYSExOhUCg0t8aNG8PV1RUhISH4+OOPUVhYaJDnuXz5MubMmYPMzEyDtGdIphybLt5//30kJiZiypQpWLNmDcaOHVtt3c2bN9ddcPd5+umn0aRJk2rPqdGjR0OpVOLatWu4du0aFi5ciF69esHBwQFNmzZFt27dsG7dugqPi4iI0DqPH7z9/fffBunDDz/8gDlz5hikLUMz5djuZwrvt6SkJCxZskSy55fS8uXLmWQ9SJCsJSQkCABi7ty5Ys2aNeLLL78U77//vggODhYKhUJ4eHiI3377Tesxd+7cEf/++2+tnueXX34RAERCQkKtHldaWipKS0s193fv3i0AiA0bNtSqHX1ju337tigpKTHYcxmDv7+/6NGjh051raysRHh4eIXy2bNnCwDi6tWrBo7uf5KTkwUAsWrVqkqPFxcXCysrKzF48GAhhBDfffedaNSokQgLCxNLliwRn376qejTp48AIGbNmqX12J9++kmsWbNG67Z69WrRpEkT4ePjY7A+REVFidp+7KnVavHvv/+Ku3fvasrCw8OFlZWVweLSNzZd6PN+r05177fevXuLDh06GOy5qjJo0CDh4eFhlLYrG29T0qFDB9G7d2+pwzApDaVKoMiwQkND8fjjj2vux8TEYNeuXXjqqafw9NNP448//oClpSUAoGHDhmjY0LhDf+vWLTRp0gRKpdKoz1OTRo0aSfr8usjLy4OPj4/UYdTo6aefho2NDZKSkjBu3LgKx7ds2YLi4mKMHj0aANChQwdkZWXBw8NDU+e///0vgoKCsGDBArz++uuwsrICAAQEBCAgIECrvX379uHWrVua9ura3bt3oVaroVQq0bhxY0liMIS6eL+bspKSEiiVSjRooNsXEOUz2yQjUmdS9HDKZ2R++eWXSo+///77AoD4/PPPNWXlf73f78cffxQ9evQQdnZ2wsrKSrRp00bExMQIIf43i/LgrfwvsvK/wg4fPix69uwpLC0txcsvv6w5dv9fD+VtJScni5iYGOHk5CSaNGkiBg8eLC5cuKAVk4eHR6WzD/e3WVNs4eHhFf5yKyoqEjNmzBBubm5CqVSKNm3aiIULFwq1Wq1VD4CIiooSKSkpokOHDkKpVAofHx+xbdu2Sl/rB125ckVMmDBBODo6CpVKJXx9fUViYmKF1+LB27lz5yptr7K65a9P+ZhmZWWJ8PBwYWdnJ2xtbUVERIQoLi6u0NaaNWvEY489Jho3biyaNWsmRowYUeH1r0x4eLho2LChuHLlSoVjTz31lLCxsRG3bt2qto2PP/5YABDHjh2rtt6UKVOEQqGo8vW43549e8Szzz4r3N3dhVKpFG5ubmLatGlasYSHh1f6GgohxLlz5wQAsXDhQrF48WLxyCOPiAYNGoijR49qjt0/A1E+I3P27F
kRHBwsmjRpIlxcXERsbKzWeVQ+xrt379aK98E2q4tNCCHKysrE4sWLhY+Pj1CpVMLR0VFMmjRJXL9+vcbXprL3u77ntq6fBSdPnhSBgYHC0tJSuLq6igULFlRoq6SkRMyaNUt4eXlpxuy1116rcQa1d+/eFZ6//D1eHt/atWvFW2+9JVxdXYVCoRA3btwQ165dE6+88oro2LGjsLKyEjY2NmLAgAEiMzNTq/3qxvvSpUsiLCxMWFlZiRYtWohXXnlFp5mbX375RQQHB4vmzZuLxo0bi1atWonx48dr1dFljD08PCr0nbMznJExe2PHjsWbb76JH3/8ES+88EKldU6ePImnnnoKvr6+mDt3LlQqFbKzs7F//34AQPv27TF37lzMmjULkyZNQs+ePQEA3bt317Rx7do1hIaGYuTIkRgzZgycnJyqjeu9996DQqHAG2+8gby8PCxZsgRBQUHIzMzUzBzpQpfY7ieEwNNPP43du3cjMjISjz76KNLS0vDaa6/h77//xuLFi7Xq79u3D5s2bcJ///tf2NjY4OOPP8awYcNw4cIFNG/evMq4/v33XwQGBiI7OxvR0dHw9PTEhg0bEBERgZs3b+Lll19G+/btsWbNGkyfPh1ubm545ZVXAAAODg6VtrlmzRpMnDgRTzzxBCZNmgQA8PLy0qozfPhweHp6Ii4uDr/++iv+7//+D46OjliwYIGmznvvvYd33nkHw4cPx8SJE3H16lV88skn6NWrF44ePYqmTZtW2a/Ro0dj1apVWL9+PaKjozXl169fR1paGkaNGlXj+OXm5gIAWrRoUWWdO3fuYP369ejevbtOi7U3bNiAW7duYcqUKWjevDl+/vlnfPLJJ7h06RI2bNgAAJg8eTIuX76M7du3Y82aNZW2k5CQgJKSEkyaNAkqlQr29vZQq9WV1i0rK8OAAQPQrVs3fPDBB0hNTcXs2bNx9+5dzJ07t8aY71dTbJMnT0ZiYiLGjx+Pl156CefOncOnn36Ko0ePYv/+/XrNPOpzbuvyfrtx4wYGDBiAoUOHYvjw4di4cSPeeOMNdOrUCaGhoQAAtVqNp59+Gvv27cOkSZPQvn17HD9+HIsXL8aZM2eqXQf21ltvIT8/H5cuXdK8Xx9ceD1v3jwolUq8+uqrKC0thVKpxO+//47Nmzfjueeeg6enJ65cuYLPPvsMvXv3xu+//w5XV9dqX6+ysjKEhITA398fH374IXbs2IGPPvoIXl5emDJlSpWPy8vLQ3BwMBwcHDBz5kw0bdoU58+fx6ZNm7Tq6TLGS5YswdSpU2FtbY233noLAGr8rK0XpM6k6OHUNCMjhBB2dnaiS5cumvsP/oW2ePHiGtdX1PS9OACxYsWKSo9VNiPzn//8RxQUFGjK169fLwCIpUuXasp0mZGpKbYHZ2Q2b94sAIh3331Xq96zzz4rFAqFyM7O1pQBEEqlUqvst99+EwDEJ598UuG57rdkyRIBQHz11Veastu3b4uAgABhbW2t1XcPDw8xaNCgatsrV9MamQkTJmiVP/PMM6J58+aa++fPnxcWFhbivffe06p3/Phx0bBhwwrlD7p7965wcXERAQEBWuUrVqwQAERaWlq1j7927ZpwdHQUPXv2rLbed999JwCI5cuXV1uvXGWzQHFxcUKhUIi//vpLU1bVOpTyv8JtbW1FXl5epcce/AsdgJg6daqmTK1Wi0GDBgmlUql5L+k6I1NdbHv37hUAxNdff61VnpqaWmn5g6qakdH33Nbls2D16tWastLSUuHs7CyGDRumKVuzZo1o0KCB2Lt3r9bjy8+j/fv3VxtDVWtkyl/vRx55pMI5UVJSIsrKyrTKzp07J1QqlZg7d65WWVXjfX89IYTo0qWL8PPzqzbWlJSUGj+jazPGXCNTEa9aqgesra2rvdKk/C/wLVu2VPnXZ01UKhXGjx+vc/1x48bBxsZGc//ZZ5+Fi4sLfvjhB72eX1c//PADLCws8NJLL2mVv/
LKKxBCYNu2bVrlQUFBWrMevr6+sLW1xZ9//lnj8zg7O2PUqFGaskaNGuGll15CUVERMjIyDNCbil588UWt+z179sS
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"_, _, bars = plt.hist(df_train['V27'].dropna(), bins=20)\n",
|
|||
|
"plt.xlabel('V27')\n",
|
|||
|
"plt.ylabel('Frequency')\n",
|
|||
|
"plt.title('Distribution of the V27 attribute in the train set')\n",
|
|||
|
"plt.bar_label(bars, fmt='%1.0f')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "53b79865",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The distribution of the feature **V27** is roughly symmetric (mean 2.218, median 2.251), so I fill the missing values with the column mean."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 26,
|
|||
|
"id": "8974127e",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Set the nan values to the mean of the column\n",
|
|||
|
"df_train['V27'].fillna(df_train['V27'].mean(), inplace=True)\n",
|
|||
|
"# Impute the test set with the training mean so that no test-set statistic is used\n",
"df_test['V27'].fillna(df_train['V27'].mean(), inplace=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "3afb5a2f",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### V29"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 27,
|
|||
|
"id": "f410439d",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"count 838.00000\n",
|
|||
|
"mean 0.02506\n",
|
|||
|
"std 0.15640\n",
|
|||
|
"min 0.00000\n",
|
|||
|
"25% 0.00000\n",
|
|||
|
"50% 0.00000\n",
|
|||
|
"75% 0.00000\n",
|
|||
|
"max 1.00000\n",
|
|||
|
"Name: V29, dtype: float64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 27,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V29'].describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 28,
|
|||
|
"id": "2d33e7c4",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"0.0 817\n",
|
|||
|
"1.0 21\n",
|
|||
|
"Name: V29, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 28,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V29'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "515e9e80",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The vast majority of entries in this attribute are zeros (817 of the 838 non-missing values), so I set all `NaN` values to zero."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 29,
|
|||
|
"id": "48e8ba49",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Set nan values to 0\n",
|
|||
|
"df_train['V29'].fillna(0, inplace=True)\n",
|
|||
|
"df_test['V29'].fillna(0, inplace=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "f659f8bc",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"##### V37"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 30,
|
|||
|
"id": "8515f06b",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"count 821.000000\n",
|
|||
|
"mean 2.549406\n",
|
|||
|
"std 0.625021\n",
|
|||
|
"min 1.467000\n",
|
|||
|
"25% 2.101000\n",
|
|||
|
"50% 2.461000\n",
|
|||
|
"75% 2.861000\n",
|
|||
|
"max 5.750000\n",
|
|||
|
"Name: V37, dtype: float64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 30,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V37'].describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 31,
|
|||
|
"id": "36bc89b5",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"2.167 9\n",
|
|||
|
"2.500 9\n",
|
|||
|
"2.833 8\n",
|
|||
|
"2.667 8\n",
|
|||
|
"1.833 7\n",
|
|||
|
" ..\n",
|
|||
|
"2.029 1\n",
|
|||
|
"1.886 1\n",
|
|||
|
"2.089 1\n",
|
|||
|
"2.197 1\n",
|
|||
|
"2.206 1\n",
|
|||
|
"Name: V37, Length: 535, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 31,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_train['V37'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 32,
|
|||
|
"id": "02c38a9f",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAHHCAYAAABKudlQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABNjUlEQVR4nO3deVhUZf8G8HsUZ0SWUZQ1FhEXRMWMTFFT3FA0xNQss8Qt01BTtIx6K5cMrV+KlqKWgVqkYqJlEbmiphaY5Pa6YBqaLOYCgjEo8/z+6GVyZB9nOHPg/lzXuS7nnDPPfOeccbjnmec8oxBCCBARERHJVD2pCyAiIiJ6GAwzREREJGsMM0RERCRrDDNEREQkawwzREREJGsMM0RERCRrDDNEREQkawwzREREJGsMM0RERCRrDDN10Ny5c6FQKGrksQICAhAQEKC7vW/fPigUCmzZsqVGHn/s2LFo3rx5jTyWofLz8zFx4kQ4OTlBoVBgxowZ1W6j5Jz+9ddfxi+wjrt06RIUCgViY2N168aOHQtra2vpiqqGmv7/3r59+xp5LFMp63yT+WOYkbnY2FgoFArd0rBhQ7i4uGDAgAFYvnw5bt++bZTHuXr1KubOnYu0tDSjtGdM5lxbVbz//vuIjY3FlClTsGHDBrz44osV7rtt27aaK+4+Q4YMQaNGjSp8TY0ePRpKpRLXr18HAMycOROPPfYY7Ozs0KhRI7Rt2xZz585Ffn6+3v3Gjh2r9zp+cPnzzz+N8hy+//57zJ071yhtGZs513Y/c/j/FhcXh6ioKMkeX0orV65k0CqLIFmLiYkRAMT8+fPFhg0bxOeffy7ef/99ERgYKBQKhfDw8BC//fab3n3u3r0r/v7772o9TkpKigAgYmJiqnU/jUYjNBqN7vbevXsFABEfH1+tdgytraioSBQWFhrtsUyhS5cuonv37lXa18rKSoSGhpZa/+677woA4tq1a0au7l8bN24UAMS6devK3F5QUCCsrKxEcHCwbl337t3F9OnTxfLly8WaNWvElClThEqlEt27dxfFxcW6/Q4dOiQ2bNigt6xfv140atRI+Pj4GO05hIWFieq+7Wm1WvH333+Le/fu6daFhoYKKysro9VlaG1VYcj/94pU9P+tV69eol27dkZ7rPIMHjxYeHh4mKTtss63OWnXrp3o1auX1GWYHQvJUhQZVVBQEB5//HHd7YiICOzZswdPPfUUhgwZgv/+97+wtLQEAFhYWMDCwrSn/s6dO2jUqBGUSqVJH6cyDRo0kPTxqyInJwc+Pj5Sl1GpIUOGwMbGBnFxcRgzZkyp7du3b0dBQQFGjx6tW3fw4MFS+3l5eWH27Nn45Zdf0LVrVwCAv78//P399fY7ePAg7ty5o9deTbp37x60Wi2USiUaNmwoSQ3GUBP/381ZYWEhlEol6tWr2hcRJT3cJDNSpyl6OCU9MykpKWVuf//99wUAsWbNGt26kk/x9/vxxx9F9+7dhVqtFlZWVqJ169YiIiJCCPFvb8qDS8kns5JPY6mpqeLJJ58UlpaW4tVXX9Vtu/9TRElbGzduFBEREcLR0VE0atRIBAcHi4yMDL2aPDw8yuyFuL/NymoLDQ0t9QkuPz9fhIeHC1dXV6FUKkXr1q3Fhx9+KLRard5+AERYWJhISEgQ7dq1E0qlUvj4+IjExMQyj/WDsrOzxfjx44WDg4NQqVTC19dXxMbGljoWDy4XL14ss72y9i05PiXn9Pz58yI0NFSo1Wpha2srxo4dKwoKCkq1tWHDBvHYY4+Jhg0biiZNmohnn3221PEvS2hoqLCwsBDZ2dmltj311FPCxsZG3Llzp8I2tmzZIgBUehynTJkiFApFucfjfvv37xcjRowQbm5uQqlUCldXVzFjxgy9WkJDQ8s8hkIIcfHiRQFAfPjhh2Lp0qWiRYsWol69euLYsWO6bff3RJT0zFy4cEEEBgaKRo0aCWdnZzFv3jy911
HJOd67d69evQ+2WVFtQghRXFwsli5dKnx8fIRKpRIODg5i0qRJ4saNG5Uem7L+vxv62q7qe8GpU6dEQECAsLS0FC4uLmLx4sWl2iosLBTvvPOO8PLy0p2z1157rdKe1F69epV6/JL/4yX1ffXVV+Ktt94SLi4uQqFQiJs3b4rr16+LWbNmifbt2wsrKythY2MjBg4cKNLS0vTar+h8X7lyRYSEhAgrKyvRrFkzMWvWrCr14KSkpIjAwEDRtGlT0bBhQ9G8eXMxbtw4vX2qco49PDxKPXf20vyj7sb1OuLFF1/Em2++iR9//BEvvfRSmfucOnUKTz31FHx9fTF//nyoVCqkp6fjp59+AgC0bdsW8+fPxzvvvINJkybhySefBAB069ZN18b169cRFBSE5557Di+88AIcHR0rrGvhwoVQKBSYM2cOcnJyEBUVhX79+iEtLU3Xg1QVVantfkIIDBkyBHv37sWECRPw6KOPIikpCa+99hr+/PNPLF26VG//gwcPYuvWrXjllVdgY2OD5cuXY/jw4cjIyEDTpk3Lrevvv/9GQEAA0tPTMXXqVHh6eiI+Ph5jx47FrVu38Oqrr6Jt27bYsGEDZs6cCVdXV8yaNQsAYG9vX2abGzZswMSJE/HEE09g0qRJAP7p5bjfyJEj4enpicjISPz666/47LPP4ODggMWLF+v2WbhwId5++22MHDkSEydOxLVr1/Dxxx+jZ8+eOHbsGBo3blzu8xo9ejTWrVuHzZs3Y+rUqbr1N27cQFJSEkaNGlXq/N27dw+3bt1CUVERTp48if/85z+wsbHBE088Ue7j3L17F5s3b0a3bt2qNIA7Pj4ed+7cwZQpU9C0aVP88ssv+Pjjj3HlyhXEx8cDAF5++WVcvXoVO3fuxIYNG8psJyYmBoWFhZg0aRJUKhXs7Oyg1WrL3Le4uBgDBw5E165d8cEHH+CHH37Au+++i3v37mH+/PmV1ny/ymp7+eWXERsbi3HjxmH69Om4ePEiPvnkExw7dgw//fSTQT2Qhry2q/L/7ebNmxg4cCCGDRuGkSNHYsuWLZgzZw46dOiAoKAgAIBWq8WQIUNw8OBBTJo0CW3btsWJEyewdOlSnDt3rsJxYW+99RZyc3Nx5coV3f/XBwdjL1iwAEqlErNnz4ZGo4FSqcTp06exbds2PPPMM/D09ER2djZWr16NXr164fTp03BxcanweBUXF2PAgAHo0qUL/u///g+7du3CRx99BC8vL0yZMqXc++Xk5CAwMBD29vZ444030LhxY1y6dAlbt27V268q5zgqKgrTpk2DtbU13nrrLQCo9L22zpA6TdHDqaxnRggh1Gq16NSpk+72g5/Uli5dWul4i8q+JwcgVq1aVea2snpmHnnkEZGXl6dbv3nzZgFALFu2TLeuKj0zldX2YM/Mtm3bBADx3nvv6e03YsQIoVAoRHp6um4dAKFUKvXW/fbbbwKA+Pjjj0s91v2ioqIEAPHFF1/o1hUVFQl/f39hbW2t99w9PDzE4MGDK2yvRGVjZsaPH6+3/umnnxZNmzbV3b506ZKoX7++WLhwod5+J06cEBYWFqXWP+jevXvC2dlZ+Pv7661ftWqVACCSkpJK3efw4cN6nyTbtGlTqqfiQd9++60AIFauXFnhfiXK6g2KjIwUCoVC/PHHH7p15Y1LKfk0bmtrK3Jycsrc9uAndQBi2rRpunVarVYMHjxYKJVK3f+lqvbMVFTbgQMHBADx5Zdf6q3/4Ycfylz/oPJ6Zgx9bVflvWD9+vW6dRqNRjg5OYnhw4fr1m3YsEHUq1dPHDhwQO/+Ja+jn376qcIayhszU3K8W7RoUeo1UVhYqDdOS4h/zoNKpRLz58/XW1fe+b5/PyGE6NSpk/Dz86uw1oSEhErfo6tzjjlmpmy8mqkOsLa2rvAKlJJP4tu3by/3U2hlVCoVxo0bV+X9x4wZAxsbG93tESNGwNnZGd9//71Bj19V33//PerXr4/p06
frrZ81axaEEEhMTNRb369fP73eD19fX9ja2uL333+v9HGcnJwwatQo3boGDRpg+vTpyM/PR3JyshGeTWmTJ0/Wu/3
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"_, _, bars = plt.hist(df_train['V37'].dropna(), bins=20)\n",
|
|||
|
"plt.xlabel('V37')\n",
|
|||
|
"plt.ylabel('Frequency')\n",
|
|||
|
"plt.title('Distribution of the V37 attribute in the train set')\n",
|
|||
|
"plt.bar_label(bars, fmt='%1.0f')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "15f862dd",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
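|
"The distribution of the feature **V37** is roughly bell-shaped but right-skewed (mean 2.549, median 2.461, maximum 5.750). I still fill the missing values with the column mean, although the median would be a more robust choice here."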
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 33,
|
|||
|
"id": "e1058d9a",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_train['V37'].fillna(df_train['V37'].mean(), inplace=True)\n",
|
|||
|
"# Impute the test set with the training mean so that no test-set statistic is used\n",
"df_test['V37'].fillna(df_train['V37'].mean(), inplace=True)"
|
|||
|
]
|
|||
|
},
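The per-column imputation choices made in this section can also be collected in one place, with every fill value computed from the training set only and then applied to both splits. A minimal sketch under stated assumptions: the helper `impute_from_train` is hypothetical, and the toy frames stand in for `df_train` and `df_test`:

```python
import numpy as np
import pandas as pd


def impute_from_train(train, test, zero_cols, mean_cols):
    """Return imputed copies of train and test; all statistics come from train."""
    fill_values = {col: 0 for col in zero_cols}
    # Series.mean() skips NaN by default, so this is the mean of observed values
    fill_values.update({col: train[col].mean() for col in mean_cols})
    # DataFrame.fillna accepts a dict mapping column name -> fill value
    return train.fillna(fill_values), test.fillna(fill_values), fill_values


# Toy data (assumption): V4 is a sparse count, V22 is continuous
toy_train = pd.DataFrame({"V4": [0.0, np.nan, 1.0], "V22": [1.0, 2.0, np.nan]})
toy_test = pd.DataFrame({"V4": [np.nan, 0.0], "V22": [np.nan, 4.0]})

train_filled, test_filled, used = impute_from_train(
    toy_train, toy_test, zero_cols=["V4"], mean_cols=["V22"]
)
print(used)
```

For the notebook's data this would correspond to `zero_cols=['V4', 'V29']` and `mean_cols=['V22', 'V27', 'V37']`. Returning copies instead of filling in place also avoids relying on `inplace=True` on a selected column.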
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "44ca71d0",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 2.2 Modeling\n",
|
|||
|
"Besides the baselines (majority classifier, random classifier), use at least three machine learning algorithms\n",
|
|||
|
"to model the target class. Be ready to argue why you selected specific algorithms and how you found\n",
|
|||
|
"the best hyperparameters for them. Consider the following points when creating your models:\n",
|
|||
|
"- Create your models using all features and subsets of them using various feature selection techniques.\n",
|
|||
|
"- Certain models assume that data follows a particular distribution or may work better with other\n",
|
|||
|
"types of variables (e.g., categorical instead of numeric). Explore whether you can come up with feature\n",
|
|||
|
"transformations that are more appropriate for your models. Try to construct new features from existing\n",
|
|||
|
"ones. Try to explain the results and performance of different models."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 34,
|
|||
|
"id": "42e83cd5",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Splitting the data into features and labels\n",
|
|||
|
"X_train = df_train.drop('Class', axis=1).reset_index(drop=True)\n",
|
|||
|
"y_train = df_train['Class'].reset_index(drop=True)\n",
|
|||
|
"X_test = df_test.drop('Class', axis=1).reset_index(drop=True)\n",
|
|||
|
"y_test = df_test['Class'].reset_index(drop=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "5779375e",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Let's first write a simple function that fits and scores each of our models"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 59,
|
|||
|
"id": "3d716f7b",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.metrics import precision_score\n",
|
|||
|
"from sklearn.metrics import recall_score\n",
|
|||
|
"from sklearn.metrics import f1_score\n",
|
|||
|
"from sklearn.metrics import roc_auc_score\n",
|
|||
|
"from sklearn.metrics import RocCurveDisplay\n",
|
|||
|
"from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n",
|
|||
|
"\n",
|
|||
|
"def score_the_model(model, model_name, random_seed, X_train, X_test, y_train, y_test, plot=False):\n",
"    \"\"\"\n",
"    Fit a single scikit-learn classifier and evaluate it on the test set.\n",
"\n",
"    model: unfitted scikit-learn classifier\n",
"    model_name: name used in the plot title\n",
"    random_seed: seed for NumPy's global random number generator\n",
"    X_train: training data (no labels)\n",
"    X_test: testing data (no labels)\n",
"    y_train: training labels\n",
"    y_test: test labels\n",
"    plot: if True, plot the scores, the ROC curve and the confusion matrix\n",
"\n",
"    Returns a dict of scores and the fitted model.\n",
"    \"\"\"\n",
"\n",
"    # Set random seed\n",
"    np.random.seed(random_seed)\n",
"\n",
"    # Fit the model to the data\n",
"    model.fit(X_train, y_train)\n",
"\n",
"    # Mean accuracy of model.predict(X_test) with respect to y_test\n",
"    model_score = model.score(X_test, y_test)\n",
"    # Predict the labels\n",
"    y_pred = model.predict(X_test)\n",
"\n",
"    # Compute scores (note: AUC is computed from the hard predictions, not from probabilities)\n",
"    f1 = f1_score(y_test, y_pred)\n",
"    precision = precision_score(y_test, y_pred)\n",
"    recall = recall_score(y_test, y_pred)\n",
"    auc = roc_auc_score(y_test, y_pred)\n",
"\n",
"    # Collect scores\n",
"    scores = {\n",
"        'Accuracy': model_score,\n",
"        'F1': f1,\n",
"        'Precision': precision,\n",
"        'Recall': recall,\n",
"        'AUC': auc\n",
"    }\n",
"    if plot:\n",
"        # Plot scores\n",
"        fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(15,15))\n",
"\n",
"        # Plot the bar chart in the first subplot\n",
"        ax[0, 0].bar(list(scores.keys()), list(scores.values()))\n",
"        # Display values of the bars\n",
"        for i, v in enumerate(scores.values()):\n",
"            ax[0, 0].text(i-0.1, v+0.01, str(round(v, 2)))\n",
"        ax[0, 0].set_title(f'Model performance for {model_name}')\n",
"        ax[0, 0].set_ylabel('Score')\n",
"\n",
"        # Plot the ROC curve directly in the second subplot (no extra figure is created)\n",
"        RocCurveDisplay.from_estimator(model, X_test, y_test, ax=ax[0, 1])\n",
"\n",
"        # Plot the confusion matrix in the third subplot\n",
"        cm = confusion_matrix(y_test, y_pred, labels=model.classes_)\n",
"        cm_plt = ConfusionMatrixDisplay(cm, display_labels=model.classes_)\n",
"        cm_plt.plot(ax=ax[1, 0])\n",
"    return scores, model"
|
|||
|
]
|
|||
|
},
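The scoring function passes hard class predictions to `roc_auc_score`, which evaluates a single operating point. scikit-learn's documentation describes the binary case as taking probability estimates (or decision values) for the class with the greater label, i.e. `estimator.classes_[1]`. A minimal sketch of the difference, assuming synthetic data relabelled to {1, 2} to mirror the task description and a logistic regression as a stand-in model (neither is taken from the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem; labels shifted from {0, 1} to {1, 2} (assumption)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y = y + 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Probability of the class with the greater label (column of clf.classes_[1])
proba_greater = clf.predict_proba(X_te)[:, 1]

auc_from_scores = roc_auc_score(y_te, proba_greater)
auc_from_labels = roc_auc_score(y_te, clf.predict(X_te))
print(f"AUC from probabilities: {auc_from_scores:.3f}")
print(f"AUC from hard labels:   {auc_from_labels:.3f}")
```

If the target really is encoded as 1 and 2, it may also be worth checking which class each metric treats as positive: `f1_score`, `precision_score` and `recall_score` default to `pos_label=1`, while the AUC above is defined with respect to the greater label.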
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "d144deb1",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Decision tree model"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 61,
|
|||
|
"id": "63fe4438",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAABNwAAATFCAYAAAB7FctDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeVxU9f7H8fewg7Ipi4oIapqZpgZq7i2Wt7qWrablVtlys0zq3rREK0srb1y6aVqW2ebN23r7pVlGqVmmglppKW6IGwgiu2wz8/sDODmCpThwWF7Px4NHzfecM/M5A+b05vv9fix2u90uAAAAAAAAAE7hYnYBAAAAAAAAQGNC4AYAAAAAAAA4EYEbAAAAAAAA4EQEbgAAAAAAAIATEbgBAAAAAAAATkTgBgAAAAAAADgRgRsAAAAAAADgRARuAAAAAAAAgBMRuAEAAAAAAABOROAGnCWLxaInn3zyrK9LSUmRxWLRkiVLnF7TuXrnnXfUpUsXubu7KyAgwOxyGryVK1eqZ8+e8vLyksViUXZ2ttkl1Zqa/lxfeumluvTSS2ulJgAAAAAwG4EbGqQlS5bIYrHIYrFo3bp1VY7b7XaFh4fLYrHor3/9qwkVNhw7duzQ+PHj1bFjRy1atEivvfaa2SU1aMeOHdOtt94qb29vzZ8/X++8846aNWtWa6938p8Fi8UiLy8vtWnTRsOGDdO///1v5eXl1dprNzSRkZEO79XpvupjKA4AAACgYXEzuwDgXHh5eWnp0qUaOHCgw/iaNWt08OBBeXp6mlRZw7F69WrZbDa99NJLOu+888wup8HbtGmT8vLyNGvWLA0dOrTOXvfpp59W+/btVVpaqrS0NK1evVoPP/yw4uLi9Nlnn+miiy6qldeNiIjQiRMn5O7uflbXffXVV7VSzx+Jj49Xfn6+8XjFihX6z3/+o3/9618KCgoyxvv371/ntQEAAABoXAjc0KBdc801+uCDD/Tvf/9bbm6//zgvXbpUUVFRyszMNLG6+q2goEDNmjXT0aNHJcmpS0kLCwvl4+PjtOdrSGrj/az8Xv2Rq6++WtHR0cbjadOm6ZtvvtFf//pXXXfddfrtt9/k7e3ttJoqVc6qO1seHh5Or+XPjBgxwuFxWlqa/vOf/2jEiBGKjIw87XVn8v4DAAAAwMlYUooGbdSoUTp27JhWrVpljJWUlOjDDz/U6NGjq72moKBAjzzyiMLDw+Xp6anzzz9f//znP2W32x3OKy4u1pQpUxQcHCxfX19dd911OnjwYLXPeejQId15550KDQ2Vp6enLrzwQi1evLhG91S5RHDt2rW699571bJlS/n5+Wns2LE6fvx4lfO/+OILDRo0SM2aNZOvr6+uvfZabd++3eGc8ePHq3nz5tqzZ4+uueYa+fr66vbbb1dkZKRmzpwpSQoODq6yP90rr7yiCy+8UJ6enmrTpo0eeOCBKvuRXXrpperWrZuSkpI0ePBg+fj46PHHHzf29vrnP/+p+fPnq0OHDvLx8dFVV12lAwcOyG63a9asWWrbtq28vb11/fXXKysry+G5//e//+naa69VmzZt5OnpqY4dO2rWrFmyWq3V1vDrr7/qsssuk4+Pj8LCwvTCCy9Ueb+Kior05JNPqnPnzvLy8lLr1q114403as+ePcY5NptN8fHxuvDCC+Xl5aXQ0FDde++91b7/p9Yxbtw4SVLv3r1lsVg0fvx44/gHH3ygqKgoeXt7KygoSHfccYcOHTp0Rt+rmrj88ssVGxur/fv3691333U4tmPHDt18881q0aKFvLy8FB0drc8++6zKc2RnZ2vKlCmKjIyUp6en2rZtq7FjxxphdnV7uKWlpWnChAlq27atPD091bp1a11//fVKSUlxeK9O3cPt6NGjuuuuuxQaGiovLy/16NFDb731lsM5J/9cvfbaa+rYsaM8PT3Vu3dvbdq0qUbv08n+6P0/m5+LM/lzCQAAAKDxYoYbGr
TIyEj169dP//nPf3T11VdLKv8f3ZycHN12223697//7XC+3W7Xddddp2+//VZ33XWXevbsqS+//FJ///vfdejQIf3rX/8yzr377rv17rvvavTo0erfv7+++eYbXXvttVVqSE9P1yWXXCKLxaJJkyYpODhYX3zxhe666y7l5ubq4YcfrtG9TZo0SQEBAXryySe1c+dOLViwQPv379fq1atlsVgklTc7GDdunIYNG6bnn39ehYWFWrBggQYOHKgtW7Y4zNopKyvTsGHDNHDgQP3zn/+Uj4+Pxo8fr7fffluffPKJFixYoObNmxtLD5988kk99dRTGjp0qO6//36jhk2bNun77793WEJ47NgxXX311brtttt0xx13KDQ01Dj23nvvqaSkRA8++KCysrL0wgsv6NZbb9Xll1+u1atX67HHHtPu3bv18ssv69FHH3UIKpcsWaLmzZsrJiZGzZs31zfffKMZM2YoNzdXc+fOdXi/jh8/rr/85S+68cYbdeutt+rDDz/UY489pu7duxs/G1arVX/961+VkJCg2267TZMnT1ZeXp5WrVqlbdu2qWPHjpKke++9V0uWLNGECRP00EMPad++fZo3b562bNlS5d5P9sQTT+j888/Xa6+9ZizxrHzOyufr3bu35syZo/T0dL300kv6/vvvtWXLFocZcdV9r2pqzJgxevzxx/XVV19p4sSJkqTt27drwIABCgsL09SpU9WsWTP997//1YgRI/TRRx/phhtukCTl5+dr0KBB+u2333TnnXfq4osvVmZmpj777DMdPHjQYRnmyW666SZt375dDz74oCIjI3X06FGtWrVKqampp51JduLECV166aXavXu3Jk2apPbt2+uDDz7Q+PHjlZ2drcmTJzucv3TpUuXl5enee++VxWLRCy+8oBtvvFF79+496+Wtpzrd+3+mPxdn8+cSAAAAQCNlBxqgN9980y7JvmnTJvu8efPsvr6+9sLCQrvdbrffcsst9ssuu8xut9vtERER9muvvda47tNPP7VLsj/zzDMOz3fzzTfbLRaLfffu3Xa73W7funWrXZL9b3/7m8N5o0ePtkuyz5w50xi766677K1bt7ZnZmY6nHvbbbfZ/f39jbr27dtnl2R/8803z+jeoqKi7CUlJcb4Cy+8YJdk/9///me32+32vLw8e0BAgH3ixIkO16elpdn9/f0dxseNG2eXZJ86dWqV15s5c6Zdkj0jI8MYO3r0qN3Dw8N+1VVX2a1WqzE+b948uyT74sWLjbEhQ4bYJdkXLlzo8LyV9xscHGzPzs42xqdNm2aXZO/Ro4e9tLTUGB81apTdw8PDXlRUZIxVvncnu/fee+0+Pj4O51XW8PbbbxtjxcXF9latWtlvuukmY2zx4sV2Sfa4uLgqz2uz2ex2u93+3Xff2SXZ33vvPYfjK1eurHb8VCf/bFYqKSmxh4SE2Lt162Y/ceKEMf7555/bJdlnzJhhjP3R9+pMX+9U/v7+9l69ehmPr7jiCnv37t0d3kObzWbv37+/vVOnTsbYjBkz7JLsH3/8cZXnrHy/Tv25Pn78uF2Sfe7cuX9Y95AhQ+xDhgwxHsfHx9sl2d99911jrKSkxN6vXz978+bN7bm5uQ6v17JlS3tWVpZx7v/+9z+7JPv//d///eHrnmzu3Ll2SfZ9+/YZY6d7/8/05+Js/lwCAAAAaLxYUooG79Zbb9WJEyf0+eefKy8vT59//vlpl5OuWLFCrq6ueuihhxzGH3nkEdntdn3xxRfGeZKqnHfqbDW73a6PPvpIw4cPl91uV2ZmpvE1bNgw5eTkaPPmzTW6r3vuucdhps79998vNzc3o7ZVq1YpOztbo0aNcnhdV1dX9e3bV99++22V57z//vvP6LW//vprlZSU6OGHH5aLy+//mZg4caL8/Py0fPlyh/M9PT01YcKEap/rlltukb+/v/G4b9++kqQ77rjDYd+9vn37qqSkxGGJ5cl7juXl5SkzM1ODBg
1SYWGhduzY4fA6zZs31x133GE89vDwUJ8+fbR3715j7KOPPlJQUJAefPDBKnVWzhr84IMP5O/vryuvvNLhfY2KilL
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1500x1500 with 5 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGwCAYAAABVdURTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABqfklEQVR4nO3dd3iTVfsH8G/SNulOKd1toKyyy6ZSFAQKBZSlKMoqqLgAEV5UULYyFEVQUQQFxJ++KA7kFQRKGQKizLJaCpRRCh2U0qZ7JOf3R5tIaApJSZo2/X6uq5fmyXme3HkIzc059zlHIoQQICIiIrIRUmsHQERERGROTG6IiIjIpjC5ISIiIpvC5IaIiIhsCpMbIiIisilMboiIiMimMLkhIiIim2Jv7QCqm0ajwY0bN+Dm5gaJRGLtcIiIiMgIQgjk5OQgICAAUum9+2bqXHJz48YNKJVKa4dBREREVXDt2jUEBQXds02dS27c3NwAlN0cd3d3K0dDRERExlCpVFAqlbrv8Xupc8mNdijK3d2dyQ0REVEtY0xJCQuKiYiIyKYwuSEiIiKbwuSGiIiIbAqTGyIiIrIpTG6IiIjIpjC5ISIiIpvC5IaIiIhsCpMbIiIisilMboiIiMimMLkhIiIim2LV5ObPP//EoEGDEBAQAIlEgs2bN9/3nL1796Jjx46Qy+Vo2rQp1q9fb/E4iYiIqPawanKTl5eHdu3aYeXKlUa1v3z5Mh577DH06tULsbGxeP311/HCCy9gx44dFo6UiIiIagurbpw5YMAADBgwwOj2q1atQqNGjfDRRx8BAFq2bIkDBw7g448/RmRkpKXCJCIiIiOlqwqRW1SKxt6uVouhVu0KfujQIUREROgdi4yMxOuvv17pOUVFRSgqKtI9VqlUlgqPiIioTlEVluB0cjZir2XhVHIWTiVnIyW7EL2ae2Pd+K5Wi6tWJTepqanw9fXVO+br6wuVSoWCggI4OTlVOGfx4sWYP39+dYVIRERkkwpL1Dh7Q4VTyVk4ea0skbmUkVehnUQCFJSorRDhv2pVclMVM2fOxLRp03SPVSoVlEqlFSMiIiKq2UrVGlxIz8XJa1k4mZyNU8lZSEjNQalGVGir9HRCaJAH2gd5IDRIgTaBCrjIrZte1Krkxs/PD2lpaXrH0tLS4O7ubrDXBgDkcjnkcnl1hEdERFTrCCFw9VY+TiZn4eS1skTmzI1sFJZoKrT1cpWhXZAHQoM8EKpUoF2QBzxdZFaI+t5qVXLTrVs3bNu2Te9YdHQ0unXrZqWIiIiIapd0VWF5jUw2TpbXyWQXlFRo5yq3R9tABdopPdAuSIFQpQcCFI6QSCRWiNo0Vk1ucnNzcfHiRd3jy5cvIzY2Fp6enmjQoAFmzpyJ69evY8OGDQCAl19+GZ999hnefPNNPPfcc9i9ezd+/PFHbN261VpvgYiIqMbKLigr+D15R51MqqqwQjuZnRStAtzLkpggD7RTeqCxlwuk0pqfyBhi1eTm6NGj6NWrl+6xtjYmKioK69evR0pKCpKSknTPN2rUCFu3bsXUqVOxYsUKBAUF4auvvuI0cCIiqvPKCn6zdUNLJ5OzcdlAwa9UAjTzcUM7ZXkiE+SB5n5ukNnbzqYFEiFExeogG6ZSqaBQKJCdnQ13d3drh0NERGSyUrUG59Nyy4eVymplEtJyoDZQ8NvA0xmhQWX1Me2UHmgd4G71gt+qMOX7u/a9OyIiojpECIErt/J1SczJ5CycrbTgV4725T0yoeVDTDWx4NfSmNwQERHVIGm6gt/yot9rWVAVllZo5ya3R9vyBEab0PjXkoJfS2NyQ0REZCXZ+SU4db0sidEmNGmqogrtZPZStA5wL5+GXTaDqVH92lvwa2lMboiIiKpBQXF5wW/5ongnr2Xhyq38Cu2kEiDE102XxLQL8kCIr20V/FoakxsiIiIzK1FrcD4tRzesdDI5G+crKfhtWN+5fNaSQlfw6y
zj1/OD4N0jIiJ6ABqNwJVbebpF8U5ey8LZGyoUlVYs+PV2k5clMUEeCFV6IDRQgXp1sODX0pjcEBERmSA1W7/g91RyJQW/jva6GUtl07AV8HNnwW91YHJDRERUiaz8Yl0CE1u+OF56zr0LfrWL47Hg13qY3BAREaGs4PfMjWzdNgWnku9d8KtdFC80SIHmfm5wsGPBb03B5IaIiOqcErUGCal3Fvxm4UJ6bqUFv3dOwWbBb83HPx0iIrJpGo3A5Vt5uhV+TyVXXvDr4ybXWxQvNEgBD2cW/NY2TG6IiMhmCCGQqirUbVOgLfrNqaTgt90d2xS0V3rAT+FohajJ3JjcEBFRrZWVX1y2KF75WjInk7Nw00DBr7y84FebxIQGKRDMgl+bxeSGiIhqhfziUpy9odItincqOQtXDRT82kkl5QW/5dOwlQqE+LLgty5hckNERDWOtuBXuyjeqfIVfg3U+yK4vnP5rKWyVX5bByjgJLOr/qCpxmByQ0REVqUt+NUmMSfLC36LDRT8+rrL9bYqCA30gMLZwQpRU03G5IaIiKqNEAIp2YV6i+KdTs5GTlHFgl93R3vdOjLaVX5Z8EvGYHJDREQWczuvuHzW0r+r/GbkGi74bROoQGiQorzg1wPB9Z25VQFVCZMbIiIyi/ziUpy5rtItincqORtJmYYLfpv7uum2KQgNYsEvmReTGyIiMllx6b8Fv9rF8S6kGy74beTlUra6b/nMpVb+LPgly2JyQ0RE96TRCFzK0Bb8lk3DjksxXPDr5+6o26YgNEjBgl+yCiY3RESkI4TAjezCfxfFu5aFM9fvXfB7575Lvu4s+CXrY3JDRFSHZWoLfstnLp1MzkJGbnGFdo4OUrQJ+HdRvHZBHmjIgl+qoZjcEBHVEXlFpThzPRunkrMRW14rcy2zoEK7fwt+PXSr/Ib4usKeBb9USzC5ISKyQcWlGpxLVen2XTqVXHnBb+Pygt+yXhkPtA5wh6MDC36p9mJyQ0RUy5UV/ObqFsU7mZyN+BsqFKsNF/xqp2C3C/JA2yAFFE4s+CXbwuSGiKgWEULgelaBbpuCsoJfFXINFPwqnBz0FsVrF6SADwt+qQ5gckNEVINpC361+y6dukfBb9vAfxfFa6/0QANPFvxS3cTkhoiohsjVFfyWLYp3MjkLybcrFvzaSyVo7ueG0CAPtC8fYmrmw4JfIi0mN0REVqAr+C1fT+ZUchYupOdCGCr49XbRW0umlT8LfonuhckNEZGFqTUCl27m6hbFO5WchfiUHIMFv/4Kx7JEpnwtmTaBLPglMhWTGyIiM9IW/J68Y1G808nZyCtWV2jr4eygK/TVJjQ+biz4JXpQTG6IiB7ArdyiskXxyntkTiVn41ZexYJfJwe78oJfBULLF8djwS+RZTC5ISIyUm5RKU6X18doE5rrWYYLflv4lxf8lvfINPVmwS9RdWFyQ0RkQFGpGudScsqnYZclNBdvGi74bXJHwW8oC36JrI7JDRHVeWqNQOLNXN1aMieTsxCfokKJumImE6Bw1G1T0C5IgTZBCrg7suCXqCZhckNEdYoQAsm3C8p2wi6fvXTmeuUFv+2C/t08kgW/RLUDkxsismkZuUU4lZyl23fpVHI2Mg0U/DrL7NAmQKG375LS04kFv0S1EJMbIrIZOYUlOH09W7dNwclr2ZUW/Lb0dy9bFK98iKmpjyvspExkiGxBlZKbkpISpKamIj8/H97e3vD09DR3XERE91RUqkZ8Sk75Cr9lPTKJBgp+JRKgsZdLeY1MWdFvSxb8Etk0o5ObnJwc/N///R82btyIw4cPo7i4GEIISCQSBAUFoV+/fnjxxRfRpUsXS8ZLRHWQWiNwMT23PIkp65E5l2q44DfQw6ls1lKQB9opFWgbqIAbC36J6hSjkptly5Zh4cKFaNKkCQYNGoS3334bAQEBcHJyQmZmJs6cOYP9+/ejX79+CAsLw6effopmzZpZOnYiskHagl/tongnk7
Nx5no28g0U/NZzdkA7pYduld/QIA94u8mtEDUR1SQSIQyt2qDv2WefxaxZs9C6det7tisqKsK6desgk8nw3HPPmS1
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.tree import DecisionTreeClassifier\n",
|
|||
|
"\n",
|
|||
|
"# Score the model with default parameters\n",
|
|||
|
"scores, model = score_the_model(\n",
|
|||
|
" model=DecisionTreeClassifier(),\n",
|
|||
|
" model_name='Decision Tree',\n",
|
|||
|
" random_seed=42,\n",
|
|||
|
" X_train=X_train,\n",
|
|||
|
" X_test=X_test,\n",
|
|||
|
" y_train=y_train,\n",
|
|||
|
" y_test=y_test,\n",
|
|||
|
" plot=True\n",
|
|||
|
")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"attachments": {},
|
|||
|
"cell_type": "markdown",
|
|||
|
"id": "a72e54f6",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Now let's plot the fitted decision tree."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 50,
|
|||
|
"id": "c4fe47bd",
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"[Text(0.6615466101694916, 0.9722222222222222, 'V36 <= 3.678\\ngini = 0.444\\nsamples = 846\\nvalue = [564, 282]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4240819209039548, 0.9166666666666666, 'V1 <= 4.792\\ngini = 0.483\\nsamples = 361\\nvalue = [147, 214]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2803672316384181, 0.8611111111111112, 'V34 <= 1.5\\ngini = 0.435\\nsamples = 285\\nvalue = [91, 194]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1906779661016949, 0.8055555555555556, 'V14 <= 0.673\\ngini = 0.32\\nsamples = 210\\nvalue = [42, 168]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.16807909604519775, 0.75, 'V18 <= 1.158\\ngini = 0.278\\nsamples = 12\\nvalue = [10, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.15677966101694915, 0.6944444444444444, 'V28 <= 0.121\\ngini = 0.165\\nsamples = 11\\nvalue = [10, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.14548022598870056, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [10, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.16807909604519775, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.17937853107344634, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2132768361581921, 0.75, 'V38 <= 1.5\\ngini = 0.271\\nsamples = 198\\nvalue = [32, 166]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2019774011299435, 0.6944444444444444, 'V41 <= 1.5\\ngini = 0.247\\nsamples = 194\\nvalue = [28, 166]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1906779661016949, 0.6388888888888888, 'V22 <= 1.265\\ngini = 0.221\\nsamples = 190\\nvalue = [24, 166]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.12146892655367232, 0.5833333333333334, 'V32 <= 0.5\\ngini = 0.148\\nsamples = 161\\nvalue = [13, 148]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.11016949152542373, 0.5277777777777778, 'V28 <= 0.843\\ngini = 0.139\\nsamples = 160\\nvalue = [12, 148]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.09887005649717515, 0.4722222222222222, 'V37 <= 1.95\\ngini = 0.129\\nsamples = 159\\nvalue = [11, 148]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.03389830508474576, 0.4166666666666667, 'V22 <= 1.174\\ngini = 0.34\\nsamples = 23\\nvalue = [5, 18]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.022598870056497175, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.04519774011299435, 0.3611111111111111, 'V31 <= 1.358\\ngini = 0.298\\nsamples = 22\\nvalue = [4, 18]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.03389830508474576, 0.3055555555555556, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.05649717514124294, 0.3055555555555556, 'V37 <= 1.935\\ngini = 0.245\\nsamples = 21\\nvalue = [3, 18]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.04519774011299435, 0.25, 'V35 <= 1.5\\ngini = 0.18\\nsamples = 20\\nvalue = [2, 18]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.022598870056497175, 0.19444444444444445, 'V3 <= 0.5\\ngini = 0.105\\nsamples = 18\\nvalue = [1, 17]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.011299435028248588, 0.1388888888888889, 'gini = 0.0\\nsamples = 15\\nvalue = [0, 15]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.03389830508474576, 0.1388888888888889, 'V22 <= 1.231\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.022598870056497175, 0.08333333333333333, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.04519774011299435, 0.08333333333333333, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.06779661016949153, 0.19444444444444445, 'V17 <= 0.97\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.05649717514124294, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.07909604519774012, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.06779661016949153, 0.25, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.1638418079096045, 0.4166666666666667, 'V9 <= 3.5\\ngini = 0.084\\nsamples = 136\\nvalue = [6, 130]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.13559322033898305, 0.3611111111111111, 'V18 <= 1.162\\ngini = 0.059\\nsamples = 131\\nvalue = [4, 127]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.11299435028248588, 0.3055555555555556, 'V37 <= 2.292\\ngini = 0.017\\nsamples = 118\\nvalue = [1, 117]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1016949152542373, 0.25, 'V37 <= 2.285\\ngini = 0.087\\nsamples = 22\\nvalue = [1, 21]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.0903954802259887, 0.19444444444444445, 'gini = 0.0\\nsamples = 21\\nvalue = [0, 21]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.11299435028248588, 0.19444444444444445, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.12429378531073447, 0.25, 'gini = 0.0\\nsamples = 96\\nvalue = [0, 96]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.15819209039548024, 0.3055555555555556, 'V2 <= 3.228\\ngini = 0.355\\nsamples = 13\\nvalue = [3, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.14689265536723164, 0.25, 'V11 <= 0.5\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.13559322033898305, 0.19444444444444445, 'gini = 0.0\\nsamples = 9\\nvalue = [0, 9]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.15819209039548024, 0.19444444444444445, 'V16 <= 1.5\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.14689265536723164, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.1694915254237288, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1694915254237288, 0.25, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.192090395480226, 0.3611111111111111, 'V18 <= 1.146\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1807909604519774, 0.3055555555555556, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2033898305084746, 0.3055555555555556, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.12146892655367232, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.1327683615819209, 0.5277777777777778, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.2598870056497175, 0.5833333333333334, 'V2 <= 2.882\\ngini = 0.471\\nsamples = 29\\nvalue = [11, 18]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.22033898305084745, 0.5277777777777778, 'V8 <= 38.8\\ngini = 0.459\\nsamples = 14\\nvalue = [9, 5]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.1977401129943503, 0.4722222222222222, 'V31 <= 1.896\\ngini = 0.32\\nsamples = 5\\nvalue = [1, 4]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.1864406779661017, 0.4166666666666667, 'gini = 0.0\\nsamples = 4\\nvalue = [0, 4]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.20903954802259886, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.24293785310734464, 0.4722222222222222, 'V22 <= 1.331\\ngini = 0.198\\nsamples = 9\\nvalue = [8, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.23163841807909605, 0.4166666666666667, 'gini = 0.0\\nsamples = 8\\nvalue = [8, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.2542372881355932, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2994350282485876, 0.5277777777777778, 'V13 <= 3.298\\ngini = 0.231\\nsamples = 15\\nvalue = [2, 13]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.288135593220339, 0.4722222222222222, 'V12 <= 0.841\\ngini = 0.133\\nsamples = 14\\nvalue = [1, 13]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2768361581920904, 0.4166666666666667, 'gini = 0.0\\nsamples = 12\\nvalue = [0, 12]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2994350282485876, 0.4166666666666667, 'V2 <= 3.222\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.288135593220339, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3107344632768362, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.3107344632768362, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.2132768361581921, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.2245762711864407, 0.6944444444444444, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3700564971751412, 0.8055555555555556, 'V16 <= 0.5\\ngini = 0.453\\nsamples = 75\\nvalue = [49, 26]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3163841807909605, 0.75, 'V1 <= 4.107\\ngini = 0.311\\nsamples = 52\\nvalue = [42, 10]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.288135593220339, 0.6944444444444444, 'V34 <= 2.5\\ngini = 0.48\\nsamples = 10\\nvalue = [4, 6]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2768361581920904, 0.6388888888888888, 'gini = 0.0\\nsamples = 6\\nvalue = [0, 6]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.2994350282485876, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3446327683615819, 0.6944444444444444, 'V2 <= 2.217\\ngini = 0.172\\nsamples = 42\\nvalue = [38, 4]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3220338983050847, 0.6388888888888888, 'V38 <= 1.5\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.3107344632768362, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.3333333333333333, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3672316384180791, 0.6388888888888888, 'V1 <= 4.426\\ngini = 0.097\\nsamples = 39\\nvalue = [37, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3559322033898305, 0.5833333333333334, 'V7 <= 0.5\\ngini = 0.32\\nsamples = 10\\nvalue = [8, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3446327683615819, 0.5277777777777778, 'V28 <= 0.029\\ngini = 0.198\\nsamples = 9\\nvalue = [8, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3333333333333333, 0.4722222222222222, 'gini = 0.0\\nsamples = 7\\nvalue = [7, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3559322033898305, 0.4722222222222222, 'V28 <= 0.097\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3446327683615819, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.3672316384180791, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.3672316384180791, 0.5277777777777778, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.3785310734463277, 0.5833333333333334, 'gini = 0.0\\nsamples = 29\\nvalue = [29, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.423728813559322, 0.75, 'V27 <= 2.089\\ngini = 0.423\\nsamples = 23\\nvalue = [7, 16]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.4124293785310734, 0.6944444444444444, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4350282485875706, 0.6944444444444444, 'V28 <= 0.015\\ngini = 0.32\\nsamples = 20\\nvalue = [4, 16]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.4124293785310734, 0.6388888888888888, 'V27 <= 2.239\\ngini = 0.124\\nsamples = 15\\nvalue = [1, 14]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.4011299435028249, 0.5833333333333334, 'gini = 0.0\\nsamples = 14\\nvalue = [0, 14]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.423728813559322, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4576271186440678, 0.6388888888888888, 'V36 <= 3.535\\ngini = 0.48\\nsamples = 5\\nvalue = [3, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4463276836158192, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.4689265536723164, 0.5833333333333334, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5677966101694916, 0.8611111111111112, 'V30 <= 10.221\\ngini = 0.388\\nsamples = 76\\nvalue = [56, 20]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5254237288135594, 0.8055555555555556, 'V36 <= 3.673\\ngini = 0.201\\nsamples = 44\\nvalue = [39, 5]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5141242937853108, 0.75, 'V18 <= 1.158\\ngini = 0.133\\nsamples = 42\\nvalue = [39, 3]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4915254237288136, 0.6944444444444444, 'V12 <= 1.437\\ngini = 0.05\\nsamples = 39\\nvalue = [38, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.480225988700565, 0.6388888888888888, 'gini = 0.0\\nsamples = 37\\nvalue = [37, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5028248587570622, 0.6388888888888888, 'V12 <= 1.504\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.4915254237288136, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.5141242937853108, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.536723163841808, 0.6944444444444444, 'V17 <= 1.005\\ngini = 0.444\\nsamples = 3\\nvalue = [1, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.5254237288135594, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5480225988700564, 0.6388888888888888, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.536723163841808, 0.75, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.6101694915254238, 0.8055555555555556, 'V36 <= 3.511\\ngini = 0.498\\nsamples = 32\\nvalue = [17, 15]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5706214689265536, 0.75, 'V31 <= 1.567\\ngini = 0.278\\nsamples = 12\\nvalue = [2, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.559322033898305, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.5819209039548022, 0.6944444444444444, 'V2 <= 4.616\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.5706214689265536, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [0, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.5932203389830508, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.6497175141242938, 0.75, 'V17 <= 1.025\\ngini = 0.375\\nsamples = 20\\nvalue = [15, 5]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.6271186440677966, 0.6944444444444444, 'V1 <= 4.856\\ngini = 0.124\\nsamples = 15\\nvalue = [14, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.615819209039548, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.6384180790960452, 0.6388888888888888, 'gini = 0.0\\nsamples = 14\\nvalue = [14, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.672316384180791, 0.6944444444444444, 'V31 <= 2.562\\ngini = 0.32\\nsamples = 5\\nvalue = [1, 4]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.6610169491525424, 0.6388888888888888, 'gini = 0.0\\nsamples = 4\\nvalue = [0, 4]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.6836158192090396, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8990112994350282, 0.9166666666666666, 'V40 <= 0.5\\ngini = 0.241\\nsamples = 485\\nvalue = [417, 68]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8206214689265536, 0.8611111111111112, 'V12 <= -0.712\\ngini = 0.185\\nsamples = 457\\nvalue = [410, 47]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.751412429378531, 0.8055555555555556, 'V27 <= 2.363\\ngini = 0.452\\nsamples = 55\\nvalue = [36, 19]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7401129943502824, 0.75, 'V8 <= 42.5\\ngini = 0.475\\nsamples = 31\\nvalue = [12, 19]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.7175141242937854, 0.6944444444444444, 'V22 <= 1.228\\ngini = 0.495\\nsamples = 20\\nvalue = [11, 9]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7062146892655368, 0.6388888888888888, 'V11 <= 0.5\\ngini = 0.459\\nsamples = 14\\nvalue = [5, 9]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.6949152542372882, 0.5833333333333334, 'gini = 0.0\\nsamples = 6\\nvalue = [0, 6]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.7175141242937854, 0.5833333333333334, 'V31 <= 1.845\\ngini = 0.469\\nsamples = 8\\nvalue = [5, 3]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7062146892655368, 0.5277777777777778, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.7288135593220338, 0.5277777777777778, 'V37 <= 3.486\\ngini = 0.278\\nsamples = 6\\nvalue = [5, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7175141242937854, 0.4722222222222222, 'gini = 0.0\\nsamples = 4\\nvalue = [4, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7401129943502824, 0.4722222222222222, 'V2 <= 3.358\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7288135593220338, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.751412429378531, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7288135593220338, 0.6388888888888888, 'gini = 0.0\\nsamples = 6\\nvalue = [6, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7627118644067796, 0.6944444444444444, 'V3 <= 0.5\\ngini = 0.165\\nsamples = 11\\nvalue = [1, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.751412429378531, 0.6388888888888888, 'gini = 0.0\\nsamples = 10\\nvalue = [0, 10]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.7740112994350282, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7627118644067796, 0.75, 'gini = 0.0\\nsamples = 24\\nvalue = [24, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8898305084745762, 0.8055555555555556, 'V39 <= 8.446\\ngini = 0.13\\nsamples = 402\\nvalue = [374, 28]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.847457627118644, 0.75, 'V30 <= 5.124\\ngini = 0.357\\nsamples = 56\\nvalue = [43, 13]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8192090395480226, 0.6944444444444444, 'V8 <= 46.1\\ngini = 0.206\\nsamples = 43\\nvalue = [38, 5]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7966101694915254, 0.6388888888888888, 'V6 <= 1.5\\ngini = 0.1\\nsamples = 38\\nvalue = [36, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7853107344632768, 0.5833333333333334, 'V14 <= 1.534\\ngini = 0.053\\nsamples = 37\\nvalue = [36, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7740112994350282, 0.5277777777777778, 'gini = 0.0\\nsamples = 35\\nvalue = [35, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7966101694915254, 0.5277777777777778, 'V17 <= 0.989\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7853107344632768, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.807909604519774, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.807909604519774, 0.5833333333333334, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8418079096045198, 0.6388888888888888, 'V18 <= 1.105\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8305084745762712, 0.5833333333333334, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8531073446327684, 0.5833333333333334, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8757062146892656, 0.6944444444444444, 'V30 <= 15.07\\ngini = 0.473\\nsamples = 13\\nvalue = [5, 8]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.864406779661017, 0.6388888888888888, 'gini = 0.0\\nsamples = 8\\nvalue = [0, 8]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8870056497175142, 0.6388888888888888, 'gini = 0.0\\nsamples = 5\\nvalue = [5, 0]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.9322033898305084, 0.75, 'V31 <= 9.19\\ngini = 0.083\\nsamples = 346\\nvalue = [331, 15]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.9209039548022598, 0.6944444444444444, 'V8 <= 13.25\\ngini = 0.078\\nsamples = 345\\nvalue = [331, 14]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.9096045197740112, 0.6388888888888888, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.9322033898305084, 0.6388888888888888, 'V38 <= 0.5\\ngini = 0.073\\nsamples = 344\\nvalue = [331, 13]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8926553672316384, 0.5833333333333334, 'V30 <= 12.851\\ngini = 0.118\\nsamples = 191\\nvalue = [179, 12]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8700564971751412, 0.5277777777777778, 'V15 <= 10.386\\ngini = 0.078\\nsamples = 173\\nvalue = [166, 7]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8587570621468926, 0.4722222222222222, 'V30 <= 11.485\\ngini = 0.143\\nsamples = 90\\nvalue = [83, 7]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.847457627118644, 0.4166666666666667, 'V14 <= 2.748\\ngini = 0.126\\nsamples = 89\\nvalue = [83, 6]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8361581920903954, 0.3611111111111111, 'V14 <= 0.861\\ngini = 0.107\\nsamples = 88\\nvalue = [83, 5]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8135593220338984, 0.3055555555555556, 'V15 <= 10.142\\ngini = 0.231\\nsamples = 30\\nvalue = [26, 4]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8022598870056498, 0.25, 'V2 <= 2.458\\ngini = 0.391\\nsamples = 15\\nvalue = [11, 4]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.7909604519774012, 0.19444444444444445, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8135593220338984, 0.19444444444444445, 'V36 <= 3.866\\ngini = 0.26\\nsamples = 13\\nvalue = [11, 2]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8022598870056498, 0.1388888888888889, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Reday non-biodegradable'),\n",
|
|||
|
" Text(0.8248587570621468, 0.1388888888888889, 'V31 <= 0.973\\ngini = 0.153\\nsamples = 12\\nvalue = [11, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8135593220338984, 0.08333333333333333, 'V14 <= 0.792\\ngini = 0.5\\nsamples = 2\\nvalue = [1, 1]\\nclass = Ready biodegradable'),\n",
|
|||
|
" Text(0.8022598870056498, 0.027777777777777776, 'gini = 0.0\\nsamples = 1\\nvalue = [1, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8248587570621468, 0.027777777777777776, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.8361581920903954, 0.08333333333333333, 'gini = 0.0\\nsamples = 10\\nvalue = [10, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8248587570621468, 0.25, 'gini = 0.0\\nsamples = 15\\nvalue = [15, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.3055555555555556, 'V2 <= 2.978\\ngini = 0.034\\nsamples = 58\\nvalue = [57, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.847457627118644, 0.25, 'V2 <= 2.946\\ngini = 0.117\\nsamples = 16\\nvalue = [15, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.8361581920903954, 0.19444444444444445, 'gini = 0.0\\nsamples = 15\\nvalue = [15, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.19444444444444445, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.8700564971751412, 0.25, 'gini = 0.0\\nsamples = 42\\nvalue = [42, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.8587570621468926, 0.3611111111111111, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.8700564971751412, 0.4166666666666667, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.8813559322033898, 0.4722222222222222, 'gini = 0.0\\nsamples = 83\\nvalue = [83, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9152542372881356, 0.5277777777777778, 'V17 <= 1.038\\ngini = 0.401\\nsamples = 18\\nvalue = [13, 5]\\nclass = Ready biodegradable'),\n",
" Text(0.903954802259887, 0.4722222222222222, 'V14 <= 0.644\\ngini = 0.231\\nsamples = 15\\nvalue = [13, 2]\\nclass = Ready biodegradable'),\n",
" Text(0.8926553672316384, 0.4166666666666667, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9152542372881356, 0.4166666666666667, 'gini = 0.0\\nsamples = 13\\nvalue = [13, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9265536723163842, 0.4722222222222222, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9717514124293786, 0.5833333333333334, 'V31 <= 1.042\\ngini = 0.013\\nsamples = 153\\nvalue = [152, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.96045197740113, 0.5277777777777778, 'V30 <= 10.944\\ngini = 0.375\\nsamples = 4\\nvalue = [3, 1]\\nclass = Ready biodegradable'),\n",
" Text(0.9491525423728814, 0.4722222222222222, 'gini = 0.0\\nsamples = 3\\nvalue = [3, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9717514124293786, 0.4722222222222222, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9830508474576272, 0.5277777777777778, 'gini = 0.0\\nsamples = 149\\nvalue = [149, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.943502824858757, 0.6944444444444444, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9774011299435028, 0.8611111111111112, 'V13 <= 4.278\\ngini = 0.375\\nsamples = 28\\nvalue = [7, 21]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9661016949152542, 0.8055555555555556, 'V22 <= 1.251\\ngini = 0.159\\nsamples = 23\\nvalue = [2, 21]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9548022598870056, 0.75, 'gini = 0.0\\nsamples = 18\\nvalue = [0, 18]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9774011299435028, 0.75, 'V12 <= -0.564\\ngini = 0.48\\nsamples = 5\\nvalue = [2, 3]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9661016949152542, 0.6944444444444444, 'gini = 0.0\\nsamples = 2\\nvalue = [2, 0]\\nclass = Ready biodegradable'),\n",
" Text(0.9887005649717514, 0.6944444444444444, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 3]\\nclass = Ready non-biodegradable'),\n",
" Text(0.9887005649717514, 0.8055555555555556, 'gini = 0.0\\nsamples = 5\\nvalue = [5, 0]\\nclass = Ready biodegradable')]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "(base64-encoded PNG of the plotted decision tree omitted)",
"text/plain": [
"<Figure size 6000x6000 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.tree import plot_tree\n",
"\n",
"plt.figure(figsize=(60, 60))\n",
"plot_tree(model, filled=True, rounded=True, class_names=['Ready biodegradable', 'Ready non-biodegradable'], feature_names=list(X_train.columns))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b55c97cd",
"metadata": {},
"source": [
"### Random Forest Classifier"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "c9d5676b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"({'Accuracy': 0.8229665071770335,\n",
" 'F1': 0.8664259927797834,\n",
" 'Precision': 0.8450704225352113,\n",
" 'Recall': 0.8888888888888888,\n",
" 'AUC': 0.7957957957957957},\n",
" RandomForestClassifier())"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "(base64-encoded PNG omitted)",
"text/plain": [
"<Figure size 1500x1500 with 5 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "(base64-encoded PNG omitted)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"# Score the model with default parameters\n",
"score_the_model(\n",
"    model=RandomForestClassifier(),\n",
"    model_name='Random Forest',\n",
"    random_seed=42,\n",
"    X_train=X_train,\n",
"    X_test=X_test,\n",
"    y_train=y_train,\n",
"    y_test=y_test,\n",
"    plot=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "47db95d0",
"metadata": {},
"outputs": [],
"source": [
"\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Put models in a dictionary\n",
"models = {\n",
"    \"Logistic Regression\": LogisticRegression(max_iter=120),\n",
"    \"KNN\": KNeighborsClassifier(),\n",
"    \"Random Forest\": RandomForestClassifier(),\n",
"    \"Decision Tree\": DecisionTreeClassifier(),\n",
"}\n",
"\n",
"# Create a function to fit and score models\n",
"def fit_and_score(models, X_train, X_test, y_train, y_test):\n",
"    \"\"\"\n",
"    Fits and evaluates given machine learning models.\n",
"    models: dict of different Scikit-Learn machine learning models\n",
"    X_train: training data (no labels)\n",
"    X_test: testing data (no labels)\n",
"    y_train: training labels\n",
"    y_test: test labels\n",
"    \"\"\"\n",
"\n",
"    # Set random seed\n",
"    np.random.seed(42)\n",
"\n",
"    # Make a dictionary to keep model scores\n",
"    model_scores = {}\n",
"\n",
"    # Loop through models\n",
"    for name, model in models.items():\n",
"        # Fit the model to the data\n",
"        model.fit(X_train, y_train)\n",
"        # Evaluate the model (mean accuracy on the test set) and store its score\n",
"        model_scores[name] = model.score(X_test, y_test)\n",
"\n",
"    return model_scores"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "71a1be34",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/gasperspagnolo/Documents/faks_git/is_assignments/a2/code/.venv/lib64/python3.10/site-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
"STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
"\n",
"Increase the number of iterations (max_iter) or scale the data as shown in:\n",
" https://scikit-learn.org/stable/modules/preprocessing.html\n",
"Please also refer to the documentation for alternative solver options:\n",
" https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
" n_iter_i = _check_optimize_result(\n"
]
},
{
"data": {
"text/plain": [
"{'Logistic Regression': 0.84688995215311,\n",
" 'KNN': 0.7511961722488039,\n",
" 'Random Forest': 0.8229665071770335,\n",
" 'Decision Tree': 0.784688995215311}"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fit_and_score(models, X_train, X_test, y_train, y_test)"
]
},
{
"cell_type": "markdown",
"id": "3dafbf40",
"metadata": {},
"source": [
"### 2.3 Evaluation\n",
"Given that the data set is not in the \"big data\" category, implement a cross-validation procedure based\n",
"on five folds (approximately equal-sized) of your data. Furthermore, repeat the experiment 10 times with\n",
"different folds and average the results (include the standard deviation). You are expected to report the following\n",
"metrics:\n",
"- F1\n",
"- Precision\n",
"- Recall\n",
"- AUC\n",
"\n",
"Comment on the performance of the algorithms and visualize their final scores. How do they perform against\n",
"the random baseline? What about the constant one? How do different learning scenarios impact the final\n",
"score? Are the differences between the models statistically significant?"
]
},
{
"cell_type": "markdown",
"id": "1bd730c6",
"metadata": {},
"source": [
"Here, explain in a bit more detail how these scoring metrics work."
]
},
{
"cell_type": "markdown",
"id": "addfc3ea",
"metadata": {},
"source": [
"## Report and presentation\n",
"The assignment has to be submitted in the form of two files: a markdown file and a PDF file created from\n",
"the RStudio markdown file (in RStudio → File → New File → R Markdown), where you write both the code\n",
"and the text of your answers (the echo = T option must be enabled for each code block). Markdown files can\n",
"easily be exported to PDF using the \"Knit\" button in RStudio. If you are using Python, you can produce a\n",
"similar report with Jupyter Notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"vscode": {
"interpreter": {
"hash": "73efbd7de9807940366a2e2c585910074bc00282bd7f8b3dae7eb06897ea8ebf"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
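
The evaluation protocol described in section 2.3 (five folds, repeated 10 times with different folds, mean and standard deviation per metric, compared against a constant and a random baseline) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with `make_classification`, not the course dataset, and the names `candidates`, `metrics` and `summary` are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 41 numeric features,
# with the positive class (label 1) deliberately made the majority class.
X, y = make_classification(
    n_samples=300, n_features=41, n_informative=10,
    weights=[0.35, 0.65], random_state=42,
)

# Five stratified folds, reshuffled and repeated 10 times: 50 splits in total.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
metrics = ["f1", "precision", "recall", "roc_auc"]

candidates = {
    # The scaler sits inside the pipeline, so it is fitted on each training fold only.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # Constant baseline: always predicts the most frequent training class.
    "Constant baseline": DummyClassifier(strategy="most_frequent"),
    # Random baseline: predicts each class uniformly at random.
    "Random baseline": DummyClassifier(strategy="uniform", random_state=42),
}

summary = {}
for name, estimator in candidates.items():
    scores = cross_validate(estimator, X, y, cv=cv, scoring=metrics)
    # Mean and standard deviation over all 50 test folds, per metric.
    summary[name] = {
        m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std()) for m in metrics
    }
    report = ", ".join(
        f"{m}={mean:.3f}+/-{std:.3f}" for m, (mean, std) in summary[name].items()
    )
    print(f"{name:20s} {report}")
```

With the real data, `X` and `y` would be replaced by the notebook's features and labels. Note that scikit-learn's `"f1"`, `"precision"` and `"recall"` scorers treat label 1 as the positive class by default, which in this dataset's encoding would be the ready biodegradable class.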
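
The note after section 2.3 asks for a more detailed explanation of how the scoring metrics work. As a minimal sketch, precision, recall and F1 can be computed by hand from the counts of true positives, false positives and false negatives. The helper name `metrics_from_counts` and the toy labels are hypothetical, and label 1 (ready biodegradable) is assumed to be the positive class. AUC is left out because it needs ranked scores or probabilities rather than hard predictions.

```python
def metrics_from_counts(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 by hand from TP/FP/FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives

    # Precision: of everything predicted positive, how much really is positive?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything that really is positive, how much was found?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Precision": precision, "Recall": recall, "F1": f1}


# Toy labels (hypothetical): 1 = ready biodegradable, 2 = not ready biodegradable
y_true = [1, 1, 1, 2, 2, 1, 2, 1]
y_pred = [1, 2, 1, 2, 1, 1, 2, 1]

manual = metrics_from_counts(y_true, y_pred)
print(manual)
```

In this toy example there are 4 true positives, 1 false positive and 1 false negative, so precision and recall are both 4/5, and F1 (their harmonic mean) is 4/5 as well.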