{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "2f62b569-40d3-4fbe-a7e8-eae2c17b4650", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "7f206848-d082-4ffd-82db-665f2edbbf0f", "metadata": {}, "source": [ "#### Feature selection – variance" ] }, { "cell_type": "code", "execution_count": 2, "id": "6a4f882d-fa66-4176-8861-411a9ecec123", "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_wine\n", "\n", "# Load the wine dataset (178 samples, 13 numeric features)\n", "data = load_wine()\n", "X = data.data\n", "y = data.target\n", "\n", "# Wrap the feature matrix in a DataFrame; columns are indexed 0-12\n", "df = pd.DataFrame(X)" ] }, { "cell_type": "markdown", "id": "fa9060d9-5987-4e74-b5c0-8f16cc850621", "metadata": {}, "source": [ "##### Fairer Comparison of Variance With Feature Normalization\n", "Comparing the raw variance of one feature to that of another is often unfair. Variance depends on the scale of the values: multiplying every value of a feature by a constant $c$ multiplies its variance by $c^2$, so features with larger values tend to have much larger variances. In other words, the variances are not on the same scale until the features are normalized." ] }, { "cell_type": "code", "execution_count": 3, "id": "a7f5dc62-8b4e-47c4-b639-ba1698125138", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "|       | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| count | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 |\n",
"| mean | 13.000618 | 2.336348 | 2.366517 | 19.494944 | 99.741573 | 2.295112 | 2.029270 | 0.361854 | 1.590899 | 5.058090 | 0.957449 | 2.611685 | 746.893258 |\n",
"| std | 0.811827 | 1.117146 | 0.274344 | 3.339564 | 14.282484 | 0.625851 | 0.998859 | 0.124453 | 0.572359 | 2.318286 | 0.228572 | 0.709990 | 314.907474 |\n",
"| min | 11.030000 | 0.740000 | 1.360000 | 10.600000 | 70.000000 | 0.980000 | 0.340000 | 0.130000 | 0.410000 | 1.280000 | 0.480000 | 1.270000 | 278.000000 |\n",
"| 25% | 12.362500 | 1.602500 | 2.210000 | 17.200000 | 88.000000 | 1.742500 | 1.205000 | 0.270000 | 1.250000 | 3.220000 | 0.782500 | 1.937500 | 500.500000 |\n",
"| 50% | 13.050000 | 1.865000 | 2.360000 | 19.500000 | 98.000000 | 2.355000 | 2.135000 | 0.340000 | 1.555000 | 4.690000 | 0.965000 | 2.780000 | 673.500000 |\n",
"| 75% | 13.677500 | 3.082500 | 2.557500 | 21.500000 | 107.000000 | 2.800000 | 2.875000 | 0.437500 | 1.950000 | 6.200000 | 1.120000 | 3.170000 | 985.000000 |\n",
"| max | 14.830000 | 5.800000 | 3.230000 | 30.000000 | 162.000000 | 3.880000 | 5.080000 | 0.660000 | 3.580000 | 13.000000 | 1.710000 | 4.000000 | 1680.000000 |\n", "