.. _function_transformer:

Transforming
============ 

This chapter describes the :py:func:`ibex.trans` function, which allows

#. applying functions or estimators to :class:`pandas.DataFrame` objects

#. selecting a subset of columns to transform

#. naming the output columns of the results

or any combination of these.


We'll use a ``DataFrame`` ``X``, with columns ``'a'`` and ``'b'``, and (implied) index ``0, 1``:

    >>> import pandas as pd
    >>> X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

and also import ``trans``:

    >>> from ibex import trans


Specifying Functions
--------------------

The (positionally first) ``func`` argument allows specifying the transformation to apply. 

This can be ``None``, meaning that the output should be the input:
    
    >>> trans().fit_transform(X)
       a  b
    0  1  3
    1  2  4

.. tip::

    :ref:`function_transformer_specifying_output_columns` and :ref:`function_transformer_multiple_transformations` show uses for this.

The ``func`` argument can alternatively be a function, which will be applied to the 
:attr:`pandas.DataFrame.values` of the input:

    >>> import numpy as np
    >>> trans(np.sqrt).fit_transform(X)
              a         b
    0  1.000000  1.732051
    1  1.414214  2.000000

Finally, it can be an estimator:

    >>> from ibex.sklearn.decomposition import PCA 
    >>> trans(PCA(n_components=2)).fit_transform(X)
              a  b
    0 -0.707107  ...
    1  0.707107  ...
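
What a plain function does here can be approximated with pandas alone. The following is a rough sketch, not ibex's actual implementation; it assumes the function returns an array of the same shape as its input, so that the original index and column labels can be reused:

.. code-block:: python

    import numpy as np
    import pandas as pd

    X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    # Apply the function to the underlying NumPy array ...
    values = np.sqrt(X.values)

    # ... and wrap the result in a DataFrame that reuses the input's labels.
    result = pd.DataFrame(values, index=X.index, columns=X.columns)
    print(result)
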


Specifying Input Columns
------------------------

The (positionally second) ``in_cols`` argument allows specifying the columns to which to apply the function. 

If it is ``None``, then the function will be applied to all columns.

If it is a string, the function will be applied to the ``DataFrame`` consisting of the single column corresponding to this string:

    >>> trans(None, 'a').fit_transform(X)
       a
    0  1
    1  2
    >>> trans(np.sqrt, 'a').fit_transform(X)
              a
    0  1.000000
    1  1.414214
    >>> trans(PCA(n_components=1), 'a').fit_transform(X)
         a
    0 -0.5
    1  0.5


If it is a ``list`` of strings, the function will be applied to the ``DataFrame`` consisting of the columns corresponding to these strings:


    >>> trans(None, ['a']).fit_transform(X)
       a
    0  1
    1  2
    >>> trans(np.sqrt, ['a']).fit_transform(X)
              a
    0  1.000000
    1  1.414214
    >>> trans(PCA(n_components=1), ['a']).fit_transform(X)
         a
    0 -0.5
    1  0.5
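
The string and list forms give identical results above because, as described, both select a two-dimensional ``DataFrame`` rather than a ``Series``. A pandas-only sketch of that selection step follows; ``select_columns`` is a hypothetical helper written for this illustration and is not part of ibex:

.. code-block:: python

    import pandas as pd

    X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    def select_columns(df, in_cols=None):
        """Hypothetical helper mimicking the three documented forms of in_cols."""
        if in_cols is None:
            # No selection: use every column.
            return df
        if isinstance(in_cols, str):
            # Wrap a single name in a list so the result stays a DataFrame.
            in_cols = [in_cols]
        return df[in_cols]

    from_string = select_columns(X, 'a')
    from_list = select_columns(X, ['a'])
    print(from_string.equals(from_list))
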


.. _function_transformer_specifying_output_columns:

Specifying Output Columns
-------------------------

The (positionally third) ``out_cols`` argument allows specifying the names of the columns of the result. 

If it is ``None``, then the output columns will be as explained in 
:ref:`verification_and_processing_output_dataframe_columns` 
in
:ref:`verification_and_processing`:

    >>> trans(np.sqrt, out_cols=None).fit_transform(X)
              a         b
    0  1.000000  1.732051
    1  1.414214  2.000000

If it is a string, it will become the name of the (single) column of the resulting ``DataFrame``:

    >>> trans(PCA(n_components=1), out_cols='pc').fit_transform(X)
            pc
    0 -0.707107
    1  0.707107

If it is a ``list`` of strings, these will become the column names of the resulting ``DataFrame``:

    >>> trans(out_cols=['c', 'd']).fit_transform(X)
       c  d
    0  1  3
    1  2  4

    >>> trans(np.sqrt, out_cols=['c', 'd']).fit_transform(X)
              c         d
    0  1.000000  1.732051
    1  1.414214  2.000000
    >>> trans(PCA(n_components=2), out_cols=['pc1', 'pc2']).fit_transform(X)
              pc1  pc2
    0 -0.707107  ...
    1  0.707107  ...

.. tip::

    As can be seen from the first of the examples just above, this can be used to build a step that simply changes the column names of a ``DataFrame``.
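
In plain pandas terms, naming the output amounts to replacing the column labels of the result. The sketch below illustrates this; ``name_columns`` is a hypothetical helper, and its length check is an assumption made for the illustration rather than documented ibex behavior:

.. code-block:: python

    import pandas as pd

    X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    def name_columns(df, out_cols=None):
        """Hypothetical helper mimicking the three documented forms of out_cols."""
        if out_cols is None:
            # Keep whatever labels the result already has.
            return df
        if isinstance(out_cols, str):
            out_cols = [out_cols]
        if len(out_cols) != df.shape[1]:
            # Assumption for this sketch: a length mismatch is treated as an error.
            raise ValueError(
                'expected %d names, got %d' % (df.shape[1], len(out_cols)))
        # Relabel a copy so the caller's DataFrame is left untouched.
        renamed = df.copy()
        renamed.columns = out_cols
        return renamed

    renamed = name_columns(X, ['c', 'd'])
    print(renamed)
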


Specifying Combinations
-----------------------

Of course, you can combine the arguments described above:

    >>> trans(None, 'a', 'c').fit_transform(X)
       c
    0  1
    1  2

    >>> trans(None, ['a'], ['c']).fit_transform(X)
       c
    0  1
    1  2

    >>> trans(np.sqrt, ['a', 'b'], ['c', 'd']).fit_transform(X)
              c         d
    0  1.000000  1.732051
    1  1.414214  2.000000

    >>> trans(PCA(n_components=1), 'a', 'pc').fit_transform(X)
         pc
    0 -0.5
    1  0.5
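
For comparison, the last example can be reproduced by hand with plain scikit-learn and pandas. This is only an approximation of the behavior shown above, not ibex's implementation, and it uses ``sklearn.decomposition.PCA`` directly rather than the ibex wrapper:

.. code-block:: python

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    # Select the input column as a single-column DataFrame (2-D, as estimators expect).
    selected = X[['a']]

    # Fit and transform; np.asarray guards against a non-array return type.
    components = np.asarray(PCA(n_components=1).fit_transform(selected))

    # Restore the original index and attach the requested output name.
    result = pd.DataFrame(components, index=X.index, columns=['pc'])
    print(result)

The three manual steps (select, transform, relabel) correspond to the ``in_cols``, ``func``, and ``out_cols`` arguments, respectively. The sign of the component is not asserted here, since it is a convention of the decomposition.
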


.. _function_transformer_multiple_transformations:

Multiple Transformations
------------------------

Applying multiple transformations to a single ``DataFrame`` is no different from any other case of uniting features (see :ref:`feature_union`). In particular, the ``+`` operator offers a succinct way to do this:

    >>> trn = trans(np.sin, 'a', 'sin_a') + trans(np.cos, 'b', 'cos_b')
    >>> trn.fit_transform(X)
      functiontransformer_0 functiontransformer_1
                      sin_a                 cos_b
    0              0.841471             -0.989992
    1              0.909297             -0.653644

    >>> trn = trans() + trans(np.sin, 'a', 'sin_a') + trans(np.cos, 'b', 'cos_b')
    >>> trn.fit_transform(X)
      functiontransformer_0    functiontransformer_1 functiontransformer_2
                          a  b                 sin_a                 cos_b
    0                     1  3              0.841471             -0.989992
    1                     2  4              0.909297             -0.653644


.. tip::

    As can be seen from the last of the examples just above, this can be used to build a step that simply adds to the 
    existing columns of some ``DataFrame``.
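
The two-level column labels in the outputs above resemble what :func:`pandas.concat` produces when given ``keys``. The following pandas-only sketch builds a frame of the same shape; it is an illustration of the output layout, not a description of how ibex unites features, and the key strings are simply copied from the output shown above:

.. code-block:: python

    import numpy as np
    import pandas as pd

    X = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    # Each part is a DataFrame, mirroring the output of one transformation step.
    sin_a = pd.DataFrame({'sin_a': np.sin(X['a'])})
    cos_b = pd.DataFrame({'cos_b': np.cos(X['b'])})

    # Concatenate side by side; keys become the outer level of the column labels.
    combined = pd.concat(
        [X, sin_a, cos_b],
        axis=1,
        keys=['functiontransformer_0',
              'functiontransformer_1',
              'functiontransformer_2'],
    )

    # Drop the outer level to get back to flat column names.
    flat = combined.droplevel(0, axis=1)
    print(flat)
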