ibex.FrameMixin

class ibex.FrameMixin

A base class for steps taking pandas entities, not numpy entities.

Subclass this class to indicate that a step takes pandas entities.

Example

This is an illustrative “identity” transformer, which simply relays its input.

>>> import pandas as pd
>>> from sklearn import base
>>> import ibex
>>>
>>> class Id(
...            base.BaseEstimator, # (1)
...            base.TransformerMixin, # (2)
...            ibex.FrameMixin): # (3)
...
...     def fit(self, X, y=None):
...         self.x_columns = X.columns # (4)
...         if y is not None and isinstance(y, pd.DataFrame):
...             self.y_columns = y.columns
...         return self
...
...     def transform(self, X, *args, **kwargs):
...         return X[self.x_columns] # (5)

Note the following general points:

  1. We subclass sklearn.base.BaseEstimator, as this is an estimator.
  2. We subclass sklearn.base.TransformerMixin, as, in this case, this is specifically a transformer.
  3. We subclass ibex.FrameMixin, as this estimator deals with pandas entities.
  4. In fit, we make sure to set ibex.FrameMixin.x_columns and, if y is a pandas.DataFrame, ibex.FrameMixin.y_columns; this ensures that the transformer “remembers” the columns it should see in further calls.
  5. In transform, we first use x_columns. This verifies the columns of X, and also reorders them according to the original order seen in fit (if needed).
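
The column handling in point 5 can be illustrated with plain pandas label-based selection. The following is a minimal sketch that does not involve ibex itself; the frames `fitted`, `shuffled`, and `renamed` and the variable `remembered` are hypothetical names introduced only for this illustration. It shows that selecting with a stored column index restores the original order, and raises KeyError when a column is missing.

```python
import pandas as pd

# Columns as they might be remembered in fit (hypothetical example data).
fitted = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})
remembered = fitted.columns

# Same columns in a different order: selection restores the remembered order.
shuffled = fitted[['b', 'a']]
reordered = shuffled[remembered]

# A frame lacking one of the remembered columns: selection raises KeyError.
renamed = fitted.rename(columns={'b': 'd'})
try:
    renamed[remembered]
    missing_detected = False
except KeyError:
    missing_detected = True
```

This is the same selection that `X[self.x_columns]` performs in the `Id` example, which is why the mismatched frame in the failing example further down is caught as a KeyError.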

Suppose we define two pandas.DataFrame objects, X_1 and X_2, with different columns:

>>> import pandas as pd
>>>
>>> X_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})
>>> X_2 = X_1.rename(columns={'b': 'd'})

The following fit-transform combination will work:

>>> Id().fit(X_1).transform(X_1)
   a  b
0  1  3
1  2  4
2  3  5

The following fit-transform combination will fail:

>>> try:
...     Id().fit(X_1).transform(X_2)
... except KeyError:
...     print('caught')
caught

The following transform will fail, as the estimator was not fitted:

>>> try:
...     from sklearn.exceptions import NotFittedError
... except ImportError:
...     from sklearn.utils.validation import NotFittedError # Older Versions
>>> try:
...     Id().transform(X_2)
... except NotFittedError:
...     print('caught')
caught

Steps can be piped into each other:

>>> (Id() | Id()).fit(X_1).transform(X_1)
   a  b
0  1  3
1  2  4
2  3  5

Steps can be added:

>>> (Id() + Id()).fit(X_1).transform(X_1)
  id_0    id_1
     a  b    a  b
0    1  3    1  3
1    2  4    2  4
2    3  5    3  5

__add__(other)

Returns: ibex.sklearn.pipeline.FeatureUnion

__or__(other)

Pipes the result of this step to other.

Parameters: other – A different step object whose class subclasses this one.
Returns: ibex.sklearn.pipeline.Pipeline
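
As a rough illustration of how `|` and `+` can compose steps, the following pure-Python sketch overloads `__or__` and `__add__` on a base class. All class names here (`Step`, `Chain`, `SideBySide`, `AddOne`) are hypothetical stand-ins; this is not the ibex implementation, which returns the Pipeline and FeatureUnion objects named above.

```python
class Step:
    """Hypothetical base class; a stand-in for illustration, not ibex code."""

    def transform(self, x):
        raise NotImplementedError

    def __or__(self, other):
        # `a | b` builds a chain: the output of a becomes the input of b.
        return Chain(self, other)

    def __add__(self, other):
        # `a + b` builds a union: both steps see the same input.
        return SideBySide(self, other)


class Chain(Step):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def transform(self, x):
        return self.second.transform(self.first.transform(x))


class SideBySide(Step):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def transform(self, x):
        # Keep the two results next to each other.
        return (self.left.transform(x), self.right.transform(x))


class AddOne(Step):
    def transform(self, x):
        return x + 1


piped = (AddOne() | AddOne()).transform(1)   # 3
united = (AddOne() + AddOne()).transform(1)  # (2, 2)
```

Because the composed objects are themselves steps in this sketch, they can be composed further, for example `(AddOne() | AddOne()) + AddOne()`.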

x_columns

The X columns set in the last call to fit.

Set this property in fit, and read it in other methods.
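
The set-in-fit, read-elsewhere pattern might look like the sketch below. It is a simplified, hypothetical model rather than the ibex implementation: `RemembersColumns` and `Select` are invented names, a plain dict stands in for a pandas.DataFrame, and RuntimeError stands in for the NotFittedError used in the examples above.

```python
class RemembersColumns:
    """Hypothetical mixin for illustration; not the ibex implementation."""

    @property
    def x_columns(self):
        try:
            return self._x_columns
        except AttributeError:
            # Stand-in for a "not fitted" error.
            raise RuntimeError('x_columns read before fit') from None

    @x_columns.setter
    def x_columns(self, columns):
        self._x_columns = list(columns)


class Select(RemembersColumns):
    def fit(self, X):
        # Set the property in fit ...
        self.x_columns = X.keys()
        return self

    def transform(self, X):
        # ... and read it in other methods, restoring the fitted order.
        return {name: X[name] for name in self.x_columns}


step = Select().fit({'a': [1, 2], 'b': [3, 4]})
result = step.transform({'b': [3, 4], 'a': [1, 2]})
```

Reading the property on an unfitted `Select()` raises the stand-in error, mirroring the unfitted `Id().transform(X_2)` case shown earlier.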

y_columns

The y columns set in the last call to fit.

Set this property in fit, and read it in other methods.

New in version 0.1.2.