HuberRegressor
class ibex.sklearn.linear_model.HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001, warm_start=False, fit_intercept=True, tol=1e-05)
Bases: sklearn.linear_model.huber.HuberRegressor, ibex._base.FrameMixin
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Note
The documentation following is of the original class wrapped by this class. This class wraps the attribute coef_.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> from ibex.sklearn import datasets
>>> from ibex.sklearn.linear_model import LinearRegression as PdLinearRegression
>>> iris = datasets.load_iris()
>>> features = iris['feature_names']
>>> iris = pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=features+['class'])
>>> iris[features]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
...
>>> from ibex.sklearn import linear_model as pd_linear_model
>>> prd = pd_linear_model.HuberRegressor().fit(iris[features], iris['class'])
>>> prd.coef_
sepal length (cm)   ...
sepal width (cm)    ...
petal length (cm)   ...
petal width (cm)    ...
dtype: float64
Note
The documentation following is of the original class wrapped by this class. This class wraps the attribute intercept_.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> from ibex.sklearn import datasets
>>> from ibex.sklearn.linear_model import LinearRegression as PdLinearRegression
>>> iris = datasets.load_iris()
>>> features = iris['feature_names']
>>> iris = pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=features+['class'])
>>> iris[features]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
...
>>> from ibex.sklearn import linear_model as pd_linear_model
>>> prd = pd_linear_model.HuberRegressor().fit(iris[features], iris['class'])
>>> # scalar intercept
>>> type(prd.intercept_)
<class 'numpy.float64'>
Linear regression model that is robust to outliers.
The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon, where w and sigma are parameters to be optimized. The parameter sigma makes sure that if y is scaled up or down by a certain factor, one does not need to rescale epsilon to achieve the same robustness. Note that this does not take into account the fact that the different features of X may be of different scales.
This makes sure that the loss function is not heavily influenced by the outliers while not completely ignoring their effect.
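The switch between squared and absolute loss described above can be sketched in a few lines. This is a minimal illustration, not the wrapped class's implementation: the function name huber_loss is hypothetical, z stands for the scaled residual (y - X'w) / sigma, and the piecewise form (z squared inside the threshold, 2 * epsilon * |z| - epsilon squared outside it) is assumed to be the one the estimator uses.

```python
def huber_loss(z, epsilon=1.35):
    """Per-sample loss for a scaled residual z (hypothetical helper).

    Quadratic for |z| < epsilon, linear in |z| beyond it, so a large
    residual contributes far less than it would under squared loss.
    """
    if abs(z) < epsilon:
        return z * z
    return 2.0 * epsilon * abs(z) - epsilon * epsilon


# Compare with the plain squared loss for a few scaled residuals.
scaled_residuals = [0.0, 0.5, 1.0, 3.0, -10.0]
for z in scaled_residuals:
    print(z, huber_loss(z), z * z)
```

The two branches agree at |z| = epsilon, so the loss is continuous; only the growth rate changes for samples treated as outliers.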
Read more in the User Guide
New in version 0.18.
- epsilon : float, greater than 1.0, default 1.35
- The parameter epsilon controls the number of samples that should be classified as outliers. The smaller the epsilon, the more robust it is to outliers.
- max_iter : int, default 100
- Maximum number of iterations that scipy.optimize.fmin_l_bfgs_b should run for.
- alpha : float, default 0.0001
- Regularization parameter.
- warm_start : bool, default False
- This is useful if the stored attributes of a previously fitted model have to be reused. If set to False, the coefficients will be rewritten for every call to fit.
- fit_intercept : bool, default True
- Whether or not to fit the intercept. This can be set to False if the data is already centered around the origin.
- tol : float, default 1e-5
- The iteration will stop when max{|proj g_i | i = 1, ..., n} <= tol, where proj g_i is the i-th component of the projected gradient.
- coef_ : array, shape (n_features,)
- Coefficients obtained by optimizing the Huber loss.
- intercept_ : float
- Bias.
- scale_ : float
- The value by which |y - X'w - c| is scaled down.
- n_iter_ : int
- Number of iterations that fmin_l_bfgs_b has run for. Not available if SciPy version is 0.9 and below.
- outliers_ : array, shape (n_samples,)
- A boolean mask which is set to True where the samples are identified as outliers.
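As a rough illustration of how scale_ and outliers_ relate, the sketch below flags a sample when its absolute residual exceeds scale * epsilon. This rule is an assumption inferred from the loss description, not a statement of the wrapped class's code, and the helper name outlier_mask and its arguments are hypothetical.

```python
def outlier_mask(y, predictions, scale, epsilon=1.35):
    """Return True where |y - prediction| > scale * epsilon (assumed rule)."""
    threshold = scale * epsilon
    return [abs(yi - pi) > threshold for yi, pi in zip(y, predictions)]


# Toy data: the third target is far from its prediction.
y = [1.0, 2.0, 10.0]
predictions = [1.1, 1.9, 3.0]
mask = outlier_mask(y, predictions, scale=1.0)
print(mask)
```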
[1] Peter J. Huber, Elvezio M. Ronchetti, Robust Statistics, Concomitant scale estimates, pg 172
[2] Art B. Owen (2006), A robust hybrid of lasso and ridge regression. http://statweb.stanford.edu/~owen/reports/hhu.pdf
fit(X, y, sample_weight=None)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Fit the model according to the given training data.
- X : array-like, shape (n_samples, n_features)
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape (n_samples,)
- Target vector relative to X.
- sample_weight : array-like, shape (n_samples,)
- Weight given to each sample.
- self : object
- Returns self.
predict(X)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Predict using the linear model
- X : {array-like, sparse matrix}, shape = (n_samples, n_features)
- Samples.
- C : array, shape = (n_samples,)
- Returns predicted values.
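For a linear model, prediction amounts to a dot product of each sample with coef_ plus intercept_. The sketch below shows that arithmetic in plain Python; the helper name predict_linear and the toy numbers are hypothetical, and the wrapped class additionally returns pandas objects, which this sketch does not attempt to reproduce.

```python
def predict_linear(rows, coef, intercept):
    """Compute sum_j row[j] * coef[j] + intercept for every row."""
    return [sum(x * w for x, w in zip(row, coef)) + intercept for row in rows]


# Two samples with two features each (illustrative values only).
rows = [[1.0, 2.0], [0.0, 0.0]]
predicted = predict_linear(rows, coef=[0.5, -1.0], intercept=2.0)
print(predicted)
```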
score(X, y, sample_weight=None)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
- X : array-like, shape = (n_samples, n_features)
- Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
- True values for X.
- sample_weight : array-like, shape = [n_samples], optional
- Sample weights.
- score : float
- R^2 of self.predict(X) with respect to y.
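The definition of R^2 given above can be checked by hand with a short, unweighted sketch. The function name r2 is hypothetical, and sample weights are left out for brevity.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - u/v, with u the residual and v the total sum of squares."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0]
perfect = r2(y_true, [1.0, 2.0, 3.0])    # exact predictions
constant = r2(y_true, [2.0, 2.0, 2.0])   # always predicts the mean of y
reversed_ = r2(y_true, [3.0, 2.0, 1.0])  # worse than the constant model
print(perfect, constant, reversed_)
```

The three cases correspond to the statements in the text: a perfect fit scores 1.0, the constant mean predictor scores 0.0, and a model worse than that constant scores below zero.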