Adapting Estimators

This chapter describes a low-level function, ibex.frame(), which transforms estimators conforming to the scikit-learn protocol into pandas-aware estimators.


This chapter also describes the interfaces required for writing a pandas-aware estimator from scratch, or for adapting an existing estimator to be pandas-aware. Since all of sklearn is wrapped by Ibex, you can skip this chapter if you’re not planning on doing either.

The frame function takes an estimator and returns an adapter of that estimator. This adapter does the same thing as the adapted class, except that:

  1. It expects pandas.DataFrame and pandas.Series objects rather than numpy.array objects. It verifies these inputs and processes the outputs, as described in Verification and Processing.
  2. It supports two additional operators: | for pipelining (see Pipelining), and + for feature unions (see Uniting Features).
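Before looking at frame itself, it may help to see what such an adapter does conceptually. The following is a hypothetical, minimal sketch (the class name FrameAdapter and its internals are illustrative only, not Ibex's actual implementation): it remembers the input columns at fit time, delegates to the wrapped estimator, and restores pandas labels on the way out.

```python
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression

class FrameAdapter(BaseEstimator):
    """Illustrative sketch of a pandas-aware adapter; not Ibex's code."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Remember the input schema so predict can verify/reorder columns.
        self.columns_ = list(X.columns)
        if y is None:
            self.estimator.fit(X.values)
        else:
            self.estimator.fit(X.values, y.values)
        return self

    def predict(self, X):
        # Verify the input carries the columns seen at fit time.
        X = X[self.columns_]
        # Restore the pandas index on the output.
        return pd.Series(self.estimator.predict(X.values), index=X.index)

df = pd.DataFrame({'a': [1., 2., 3.], 'b': [2., 4., 6.]})
y = pd.Series([3., 6., 9.])
preds = FrameAdapter(LinearRegression()).fit(df, y).predict(df)
# preds is a pandas.Series aligned with df's index
```

A real adapter does considerably more (pipelining operators, output processing for transformers, and so on), but the core idea is the same: pandas objects in, pandas objects out.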

Suppose we start with:

>>> from sklearn import linear_model
>>> from sklearn import preprocessing
>>> from sklearn import base
>>> from ibex import frame

We can use frame to adapt an object:

>>> prd = frame(linear_model.LinearRegression())
>>> prd
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

We can use frame to adapt a class:

>>> PdLinearRegression = frame(linear_model.LinearRegression)
>>> PdStandardScaler = frame(preprocessing.StandardScaler)

Once we adapt a class, it behaves pretty much like the underlying one. We can construct it in whatever ways the underlying class supports, for example:

>>> PdLinearRegression()
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> PdLinearRegression(fit_intercept=False)
Adapter[LinearRegression](copy_X=True, fit_intercept=False, n_jobs=1, normalize=False)

It has the same name as the underlying class:

>>> PdLinearRegression.__name__
'LinearRegression'

It subclasses the same mixins as the underlying class:

>>> isinstance(PdLinearRegression(), base.RegressorMixin)
True
>>> isinstance(PdLinearRegression(), base.TransformerMixin)
False
>>> isinstance(PdStandardScaler(), base.RegressorMixin)
False
>>> isinstance(PdStandardScaler(), base.TransformerMixin)
True
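One way an adapter can get both behaviors (the same __name__ and the same mixins) is to be created as a dynamic subclass of the wrapped class. The make_adapter function below is a hypothetical sketch of this mechanism, not Ibex's actual implementation:

```python
from sklearn import base, linear_model, preprocessing

def make_adapter(est_cls):
    # Hypothetical sketch: dynamically subclass the wrapped estimator, so
    # the adapter keeps the original __name__ and inherits the same mixins.
    # A real adapter (like ibex.frame) would also override fit/predict/etc.
    return type(est_cls.__name__, (est_cls,), {})

AdaptedRegression = make_adapter(linear_model.LinearRegression)
AdaptedScaler = make_adapter(preprocessing.StandardScaler)
```

Because the adapter subclasses the original, isinstance checks against the sklearn mixins come out exactly as they do for the underlying class.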

As can be seen above, though, the string representation is modified to signify that this is an adapted type:

>>> PdLinearRegression()
Adapter[LinearRegression](copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> linear_model.LinearRegression()
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
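The motivation for the adapter, and for the modified representation, is that unadapted sklearn estimators discard pandas metadata. A small illustration, using sklearn directly without Ibex:

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'a': [1., 2.], 'b': [3., 4.]})

res = preprocessing.StandardScaler().fit_transform(df)
# res is a plain numpy array: the column labels 'a' and 'b' are gone.
# An adapted (pandas-aware) transformer instead returns a DataFrame
# that keeps them.
```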

Of course, having to wrap every class (not to mention every object) with frame can become tedious.


If a library is used often enough, it might pay to wrap it once. Ibex does this (nearly completely) automatically for sklearn (see sklearn).