|
| 1 | +.. _mean_normalization_scaler: |
| 2 | + |
| 3 | +.. currentmodule:: feature_engine.scaling |
| 4 | + |
| 5 | +MeanNormalizationScaler |
| 6 | +======================= |
| 7 | + |
| 8 | +:class:`MeanNormalizationScaler()` scales variables using mean normalization. With mean normalization, |
| 9 | +we center the distribution around 0 and rescale it by the variable's value range, |
| 10 | +so that its values lie between -1 and 1. This is accomplished by subtracting the mean of the feature |
| 11 | +and then dividing by its range (i.e., the difference between the maximum and minimum values). |
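
The formula described above can be sketched by hand with plain pandas. This is only an illustration of the arithmetic, not the library's implementation; the ``ages`` series and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical toy column, used only to illustrate the formula.
ages = pd.Series([20, 21, 19, 18], name="Age")

# Statistics that mean normalization relies on.
mean = ages.mean()
value_range = ages.max() - ages.min()

# Subtract the mean, then divide by the range (max - min).
scaled = (ages - mean) / value_range

print(scaled.round(6).tolist())
```

After this transformation the values are centered at 0 and cannot fall outside the interval from -1 to 1, because no value is further from the mean than the full range.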
| 12 | + |
| 13 | +The :class:`MeanNormalizationScaler()` only works with non-constant numerical variables. |
| 14 | +If the variable is constant, the scaler will raise an error. |
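
The restriction to non-constant variables follows from the formula: a constant variable has a range of zero, so the division is undefined. The sketch below shows this with plain pandas. The ``constant`` series is hypothetical, and the code does not reproduce the scaler's own error handling.

```python
import pandas as pd

# Hypothetical constant column: every value is identical.
constant = pd.Series([5.0, 5.0, 5.0], name="constant")

mean = constant.mean()
value_range = constant.max() - constant.min()  # zero for a constant column

# Dividing 0 by 0 leaves pandas with undefined (NaN) results.
scaled = (constant - mean) / value_range

print(value_range)
print(scaled.isna().all())
```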
| 15 | + |
| 16 | +Python example |
| 17 | +-------------- |
| 18 | + |
| 19 | +We'll show how to use :class:`MeanNormalizationScaler()` through a toy dataset. Let's create |
| 20 | +a toy dataset: |
| 21 | + |
| 22 | +.. code:: python |
| 23 | +
|
| 24 | + import pandas as pd |
| 25 | + from feature_engine.scaling import MeanNormalizationScaler |
| 26 | +
|
| 27 | + df = pd.DataFrame.from_dict( |
| 28 | + { |
| 29 | + "Name": ["tom", "nick", "krish", "jack"], |
| 30 | + "City": ["London", "Manchester", "Liverpool", "Bristol"], |
| 31 | + "Age": [20, 21, 19, 18], |
| 32 | + "Height": [1.80, 1.77, 1.90, 2.00], |
| 33 | + "Marks": [0.9, 0.8, 0.7, 0.6], |
| 34 | + "dob": pd.date_range("2020-02-24", periods=4, freq="min"), |
| 35 | + }) |
| 36 | +
|
| 37 | + print(df) |
| 38 | +
|
| 39 | +The dataset looks like this: |
| 40 | + |
| 41 | +.. code:: python |
| 42 | +
|
| 43 | + Name City Age Height Marks dob |
| 44 | + 0 tom London 20 1.80 0.9 2020-02-24 00:00:00 |
| 45 | + 1 nick Manchester 21 1.77 0.8 2020-02-24 00:01:00 |
| 46 | + 2 krish Liverpool 19 1.90 0.7 2020-02-24 00:02:00 |
| 47 | + 3 jack Bristol 18 2.00 0.6 2020-02-24 00:03:00 |
| 48 | +
|
| 49 | +We see that the only numerical features in this dataset are **Age**, **Marks**, and **Height**. We want |
| 50 | +to scale them using mean normalization. |
| 51 | + |
| 52 | +First, let's make a list with the variable names: |
| 53 | + |
| 54 | +.. code:: python |
| 55 | +
|
| 56 | + vars = [ |
| 57 | + 'Age', |
| 58 | + 'Marks', |
| 59 | + 'Height', |
| 60 | + ] |
| 61 | +
|
| 62 | +Now, let's set up :class:`MeanNormalizationScaler()`: |
| 63 | + |
| 64 | +.. code:: python |
| 65 | +
|
| 66 | + # set up the scaler |
| 67 | + scaler = MeanNormalizationScaler(variables=vars) |
| 68 | +
|
| 69 | + # fit the scaler |
| 70 | + scaler.fit(df) |
| 71 | + |
| 72 | +The scaler learns the mean and the value range (maximum minus minimum) of every variable in *vars*. |
| 73 | +Note that we can access these values in the following way: |
| 74 | + |
| 75 | +.. code:: python |
| 76 | +
|
| 77 | + # access the parameters learned by the scaler |
| 78 | + print(f'Means: {scaler.mean_}') |
| 79 | + print(f'Ranges: {scaler.range_}') |
| 80 | +
|
| 81 | +We see the features' mean and value ranges in the following output: |
| 82 | + |
| 83 | +.. code:: python |
| 84 | +
|
| 85 | + Means: {'Age': 19.5, 'Marks': 0.7500000000000001, 'Height': 1.8675000000000002} |
| 86 | + Ranges: {'Age': 3.0, 'Marks': 0.30000000000000004, 'Height': 0.22999999999999998} |
| 87 | +
|
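
As a sanity check, the same means and ranges can be recomputed directly with pandas. This is a minimal sketch that rebuilds only the numerical columns of the toy dataset; the names ``variables``, ``means`` and ``ranges`` are chosen for the example and are not part of the scaler's API.

```python
import pandas as pd

# Numerical columns of the toy dataset used in this guide.
df = pd.DataFrame(
    {
        "Age": [20, 21, 19, 18],
        "Height": [1.80, 1.77, 1.90, 2.00],
        "Marks": [0.9, 0.8, 0.7, 0.6],
    }
)

variables = ["Age", "Marks", "Height"]

# Per-variable mean and range (max - min), stored as dictionaries.
means = df[variables].mean().to_dict()
ranges = (df[variables].max() - df[variables].min()).to_dict()

print(f"Means: {means}")
print(f"Ranges: {ranges}")
```

Up to floating point noise, these values should agree with the ones stored by the fitted scaler.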
| 88 | +We can now go ahead and scale the variables: |
| 89 | + |
| 90 | +.. code:: python |
| 91 | +
|
| 92 | + # scale the data |
| 93 | + df = scaler.transform(df) |
| 94 | + print(df) |
| 95 | +
|
| 96 | +In the following output, we can see the scaled variables: |
| 97 | + |
| 98 | +.. code:: python |
| 99 | +
|
| 100 | + Name City Age Height Marks dob |
| 101 | + 0 tom London 0.166667 -0.293478 0.500000 2020-02-24 00:00:00 |
| 102 | + 1 nick Manchester 0.500000 -0.423913 0.166667 2020-02-24 00:01:00 |
| 103 | + 2 krish Liverpool -0.166667 0.141304 -0.166667 2020-02-24 00:02:00 |
| 104 | + 3 jack Bristol -0.500000 0.576087 -0.500000 2020-02-24 00:03:00 |
| 105 | +
|
| 106 | +We can restore the data to its original values using the inverse transformation: |
| 107 | + |
| 108 | +.. code:: python |
| 109 | +
|
| 110 | + # inverse transform the dataframe |
| 111 | + df = scaler.inverse_transform(df) |
| 112 | + print(df) |
| 113 | +
|
| 114 | +In the following output, we see the scaled variables returned to their original representation: |
| 115 | + |
| 116 | +.. code:: python |
| 117 | +
|
| 118 | + Name City Age Height Marks dob |
| 119 | + 0 tom London 20 1.80 0.9 2020-02-24 00:00:00 |
| 120 | + 1 nick Manchester 21 1.77 0.8 2020-02-24 00:01:00 |
| 121 | + 2 krish Liverpool 19 1.90 0.7 2020-02-24 00:02:00 |
| 122 | + 3 jack Bristol 18 2.00 0.6 2020-02-24 00:03:00 |
| 123 | +
|
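
The inverse transformation simply undoes the two steps of mean normalization: multiply by the range, then add the mean back. The round trip can be sketched with plain pandas as follows. The ``height`` series is an illustrative copy of the **Height** column, and the code is not the library's implementation.

```python
import pandas as pd

# Illustrative copy of the Height column from the toy dataset.
height = pd.Series([1.80, 1.77, 1.90, 2.00], name="Height")

mean = height.mean()
value_range = height.max() - height.min()

# Forward transformation: mean normalization.
scaled = (height - mean) / value_range

# Inverse transformation: multiply by the range, then add the mean.
restored = scaled * value_range + mean

print(restored.round(2).tolist())
```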
| 124 | +
|
| 125 | +Additional resources |
| 126 | +-------------------- |
| 127 | + |
| 128 | +For more details about this and other feature engineering methods, check out |
| 129 | +these resources: |
| 130 | + |
| 131 | + |
| 132 | +.. figure:: ../../images/feml.png |
| 133 | + :width: 300 |
| 134 | + :figclass: align-center |
| 135 | + :align: left |
| 136 | + :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning |
| 137 | + |
| 138 | + Feature Engineering for Machine Learning |
| 139 | + |
| 140 | +| |
| 141 | +| |
| 142 | +| |
| 143 | +| |
| 144 | +| |
| 145 | +| |
| 146 | +| |
| 147 | +| |
| 148 | +| |
| 149 | +| |
| 150 | +
|
| 151 | +Or read our book: |
| 152 | + |
| 153 | +.. figure:: ../../images/cookbook.png |
| 154 | + :width: 200 |
| 155 | + :figclass: align-center |
| 156 | + :align: left |
| 157 | + :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 |
| 158 | + |
| 159 | + Python Feature Engineering Cookbook |
| 160 | + |
| 161 | +| |
| 162 | +| |
| 163 | +| |
| 164 | +| |
| 165 | +| |
| 166 | +| |
| 167 | +| |
| 168 | +| |
| 169 | +| |
| 170 | +| |
| 171 | +| |
| 172 | +| |
| 173 | +| |
| 174 | +
|
| 175 | +Both our book and course are suitable for beginners and more advanced data scientists |
| 176 | +alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. |