Transforming

This chapter describes the ibex.trans() function, which allows:

- applying functions or estimators to pandas.DataFrame objects
- selecting the subset of columns to which they are applied
- naming the output columns of the results

or any combination of these.

We’ll use a DataFrame X, with columns 'a' and 'b', and (implied) index 0, 1:

>>> import pandas as pd
>>> X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

and also import trans:

>>> from ibex import trans

Specifying Functions

The (positionally first) func argument allows specifying the transformation to apply.

This can be None, meaning that the output should be the input:

>>> trans().fit_transform(X)
   a  b
0  1  3
1  2  4

Tip

The sections Specifying Output Columns and Multiple Transformations show uses for this.

The func argument can alternatively be a function, which will be applied to the pandas.DataFrame.values of the input:

>>> import numpy as np
>>> trans(np.sqrt).fit_transform(X)
          a         b
0  1.000000  1.732051
1  1.414214  2.000000
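
To make "applied to the values" concrete, here is a minimal plain-pandas sketch of the behaviour described above. It is an illustration only, not ibex's actual implementation, and the helper name apply_to_values is hypothetical.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

def apply_to_values(func, df):
    # Hypothetical helper: call func on the underlying NumPy array,
    # then re-attach the original index and column labels.
    return pd.DataFrame(func(df.values), index=df.index, columns=df.columns)

result = apply_to_values(np.sqrt, X)
print(result)
```

The function receives a bare array rather than a DataFrame, so it needs no knowledge of pandas; the labels are restored afterwards.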

Finally, it can be a different estimator:

>>> from ibex.sklearn.decomposition import PCA
>>> trans(PCA(n_components=2)).fit_transform(X)
          a  b
0 -0.707107  ...
1  0.707107  ...
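
For comparison, the same computation can be sketched with plain scikit-learn and pandas. This is an assumption-laden illustration, not ibex's code: it fits sklearn's own PCA on the values and re-wraps the result by hand, reusing the input column labels to mirror the output shown above.

```python
import pandas as pd
from sklearn.decomposition import PCA

X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Fit on the raw array; scikit-learn returns a plain NumPy array.
scores = PCA(n_components=2).fit_transform(X.values)

# Re-attach the index and (as an assumption) the input column labels.
result = pd.DataFrame(scores, index=X.index, columns=X.columns)
print(result)
```

The sign of each principal component is arbitrary and may differ between scikit-learn versions, so only the magnitudes should be relied upon.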

Specifying Input Columns

The (positionally second) in_cols argument allows specifying the columns to which to apply the function.

If it is None, then the function will be applied to all columns.

If it is a string, the function will be applied to the DataFrame consisting of the single column corresponding to this string:

>>> trans(None, 'a').fit_transform(X)
   a
0  1
1  2
>>> trans(np.sqrt, 'a').fit_transform(X)
          a
0  1.000000
1  1.414214
>>> trans(PCA(n_components=1), 'a').fit_transform(X)
     a
0 -0.5
1  0.5

If it is a list of strings, the function will be applied to the DataFrame consisting of the columns corresponding to these strings:

>>> trans(None, ['a']).fit_transform(X)
   a
0  1
1  2
>>> trans(np.sqrt, ['a']).fit_transform(X)
          a
0  1.000000
1  1.414214
>>> trans(PCA(n_components=1), ['a']).fit_transform(X)
     a
0 -0.5
1  0.5
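
The three cases for in_cols can be summarised with a short plain-pandas sketch. The helper select_columns is hypothetical and only mirrors the rules described above; it is not taken from ibex.

```python
import pandas as pd

X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

def select_columns(df, in_cols=None):
    # Hypothetical helper mirroring the three cases described above.
    if in_cols is None:
        return df                # None: use all columns
    if isinstance(in_cols, str):
        in_cols = [in_cols]      # a single name becomes a one-item list
    return df[in_cols]           # list indexing always yields a DataFrame

by_name = select_columns(X, 'a')
by_list = select_columns(X, ['a'])
print(by_name.equals(by_list))
```

Note that in both cases the result is a single-column DataFrame, not a Series, which matches the outputs shown in the examples above.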

Specifying Output Columns

The (positionally third) out_cols argument allows specifying the names of the columns of the result.

If it is None, then the output columns will be as explained in the part on output DataFrame columns in the Verification and Processing chapter:

>>> trans(np.sqrt, out_cols=None).fit_transform(X)
          a         b
0  1.000000  1.732051
1  1.414214  2.000000

If it is a string, it will become the name of the (single) column of the resulting DataFrame:

>>> trans(PCA(n_components=1), out_cols='pc').fit_transform(X)
        pc
0 -0.707107
1  0.707107

If it is a list of strings, these will become the column names of the resulting DataFrame:

>>> trans(out_cols=['c', 'd']).fit_transform(X)
   c  d
0  1  3
1  2  4

>>> trans(np.sqrt, out_cols=['c', 'd']).fit_transform(X)
          c         d
0  1.000000  1.732051
1  1.414214  2.000000
>>> trans(PCA(n_components=2), out_cols=['pc1', 'pc2']).fit_transform(X)
          pc1  pc2
0 -0.707107  ...
1  0.707107  ...

Tip

As can be seen from the first of the examples just above, this can be used to build a step that simply changes the column names of a DataFrame.
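
The renaming rules can be sketched in plain pandas as follows. The helper rename_output is hypothetical, and the explicit length check is an assumption added for illustration; the text above does not say how ibex handles a mismatch between out_cols and the number of result columns.

```python
import pandas as pd

X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

def rename_output(df, out_cols=None):
    # Hypothetical helper mirroring the three cases described above.
    if out_cols is None:
        return df                    # None: keep the existing labels
    if isinstance(out_cols, str):
        out_cols = [out_cols]        # a single name becomes a one-item list
    if len(out_cols) != df.shape[1]:
        # Assumption: reject a name list of the wrong length.
        raise ValueError('expected %d column names' % df.shape[1])
    renamed = df.copy()              # leave the caller's DataFrame untouched
    renamed.columns = out_cols
    return renamed

renamed = rename_output(X, ['c', 'd'])
print(renamed)
```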

Specifying Combinations

Of course, you can combine the arguments specified above:

>>> trans(None, 'a', 'c').fit_transform(X)
   c
0  1
1  2

>>> trans(None, ['a'], ['c']).fit_transform(X)
   c
0  1
1  2

>>> trans(np.sqrt, ['a', 'b'], ['c', 'd']).fit_transform(X)
          c         d
0  1.000000  1.732051
1  1.414214  2.000000

>>> trans(PCA(n_components=1), 'a', 'pc').fit_transform(X)
     pc
0 -0.5
1  0.5

Multiple Transformations

Applying multiple transformations to a single DataFrame is no different from any other case of uniting features (see Uniting Features). In particular, the + operator offers a succinct way to do so:

>>> trn = trans(np.sin, 'a', 'sin_a') + trans(np.cos, 'b', 'cos_b')
>>> trn.fit_transform(X)
  functiontransformer_0 functiontransformer_1
                  sin_a                 cos_b
0              0.841471             -0.989992
1              0.909297             -0.653644

>>> trn = trans() + trans(np.sin, 'a', 'sin_a') + trans(np.cos, 'b', 'cos_b')
>>> trn.fit_transform(X)
  functiontransformer_0    functiontransformer_1 functiontransformer_2
                      a  b                 sin_a                 cos_b
0                     1  3              0.841471             -0.989992
1                     2  4              0.909297             -0.653644

Tip

As can be seen from the last of the examples just above, this can be used to build a step that simply adds to the existing columns of some DataFrame.
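
The two-level column layout in the outputs above resembles what pandas.concat produces when given keys. The following is a rough plain-pandas sketch of that layout, not ibex's implementation; the key names 'original', 'sin' and 'cos' are made up for illustration, whereas ibex labels the top level functiontransformer_0, functiontransformer_1, and so on.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Each piece is a single-column DataFrame with a renamed column.
sin_a = np.sin(X[['a']]).rename(columns={'a': 'sin_a'})
cos_b = np.cos(X[['b']]).rename(columns={'b': 'cos_b'})

# keys= adds an outer column level naming the source of each block.
combined = pd.concat([X, sin_a, cos_b], axis=1,
                     keys=['original', 'sin', 'cos'])
print(combined)

# Dropping the outer level gives a flat frame with the added columns.
flat = combined.droplevel(0, axis=1)
print(flat)
```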
