Adapting Estimators

This chapter describes a low-level function, ibex.frame(), which transforms estimators conforming to the scikit-learn protocol into pandas-aware estimators.

Tip

This chapter describes interfaces required for writing a pandas-aware estimator from scratch, or for adapting an estimator to be pandas-aware. As all of sklearn is wrapped by Ibex, this can be skipped if you’re not planning on doing either.

The frame function takes an estimator, and returns an adapter of that estimator. This adapter does the same thing as the adapted class, except that:

  1. It expects to take pandas.DataFrame and pandas.Series objects, not numpy.array objects. It performs verifications on these inputs, and processes the outputs, as described in Verification and Processing (see also the sketch after this list).
  2. It supports two additional operators: | for pipelining (see Pipelining), and + for feature unions (see Uniting Features).
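
In the sketch below, the small dataset and the column names are made up purely for illustration; pipelining and feature unions are covered in detail in their own chapters.

>>> import pandas as pd
>>> from sklearn import linear_model
>>> from sklearn import preprocessing
>>> from ibex import frame
>>> # A made-up DataFrame with two feature columns, and a matching target Series.
>>> X = pd.DataFrame({'a': [1., 2., 3.], 'b': [4., 5., 7.]})
>>> y = pd.Series([1., 2., 3.])
>>> # 1. The adapted estimator is fit on pandas objects, not numpy arrays;
>>> #    its outputs are processed back into pandas objects.
>>> prd = frame(linear_model.LinearRegression()).fit(X, y)
>>> y_hat = prd.predict(X)
>>> # 2. Adapted estimators compose with | (pipelining) and + (feature unions);
>>> #    for example, scale the features, then run a linear regression:
>>> prd = frame(preprocessing.StandardScaler()) | frame(linear_model.LinearRegression())
>>> prd = prd.fit(X, y)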

Suppose we start with:

>>> from sklearn import linear_model
>>> from sklearn import preprocessing
>>> from sklearn import base
>>> from ibex import frame

We can use frame to adapt an object:

>>> prd = frame(linear_model.LinearRegression())
>>> prd
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

We can use frame to adapt a class:

>>> PdLinearRegression = frame(linear_model.LinearRegression)
>>> PdStandardScaler = frame(preprocessing.StandardScaler)

Once we adapt a class, it behaves pretty much like the underlying one. We can construct it in whatever ways the underlying class supports, for example:

>>> PdLinearRegression()
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> PdLinearRegression(fit_intercept=False)
Adapter[LinearRegression](copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)

It has the same name as the underlying class:

>>> PdLinearRegression.__name__
'LinearRegression'

It subclasses the same mixins as the underlying class:

>>> isinstance(PdLinearRegression(), base.RegressorMixin)
True
>>> isinstance(PdLinearRegression(), base.TransformerMixin)
False
>>> isinstance(PdStandardScaler(), base.RegressorMixin)
False
>>> isinstance(PdStandardScaler(), base.TransformerMixin)
True
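
One practical consequence is that methods provided by these mixins, such as TransformerMixin.fit_transform, are available on the adapted classes too. A brief sketch (the column names are made up for illustration):

>>> import pandas as pd
>>> X = pd.DataFrame({'a': [1., 2., 3.], 'b': [4., 5., 6.]})
>>> # fit_transform comes from TransformerMixin; the output is processed back
>>> # into a pandas structure, as described in Verification and Processing.
>>> X_scaled = PdStandardScaler().fit_transform(X)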

As can be seen above, though, the string and representation are modified, to signify that this is an adapted type:

>>> PdLinearRegression()
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> linear_model.LinearRegression()
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)


Of course, having to wrap every class (not to mention every object) with frame can become tedious.


If a library is used often enough, it might pay to wrap it once. Ibex does this (nearly completely) automatically for sklearn (see sklearn).
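
For example, the adapted classes can then be imported directly from the wrapped module, with no explicit call to frame (a brief sketch, assuming the pre-wrapped ibex.sklearn modules described in that chapter):

>>> from ibex.sklearn import linear_model as pd_linear_model
>>> prd = pd_linear_model.LinearRegression()  # already a pandas-aware adapter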