OutputCodeClassifier

class ibex.sklearn.multiclass.OutputCodeClassifier(estimator, code_size=1.5, random_state=None, n_jobs=1)
Bases: sklearn.multiclass.OutputCodeClassifier, ibex._base.FrameMixin
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
(Error-Correcting) Output-Code multiclass strategy
Output-code based strategies represent each class with a binary code (an array of 0s and 1s). At fitting time, one binary classifier is fitted per bit in the code book. At prediction time, the classifiers project new points into the class space, and the class whose code is closest to each point is chosen. The main advantage of these strategies is that the user controls the number of classifiers, either to compress the model (0 < code_size < 1) or to make it more robust to errors (code_size > 1). See the documentation for more details.
Read more in the User Guide.
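The error-correcting idea described above can be illustrated without any library. The following is a minimal sketch, not the wrapped class's implementation: the code book, the class labels, and the helper names `hamming` and `decode` are all hypothetical, and Hamming distance is used here only because the observed bits are hard 0/1 values.

```python
def hamming(a, b):
    """Count the positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_book):
    """Return the class whose code word is closest to the observed bits."""
    return min(code_book, key=lambda label: hamming(bits, code_book[label]))


# Hypothetical code book: 3 classes, 6 bits each (i.e. code_size = 2.0).
# Every pair of code words differs in 4 positions, so one wrong bit
# still leaves the true class strictly closest.
CODE_BOOK = {
    "cat": (0, 0, 1, 1, 0, 1),
    "dog": (1, 0, 0, 1, 1, 0),
    "bird": (0, 1, 1, 0, 1, 0),
}

# The code word for "dog" with its third bit flipped by a faulty classifier.
noisy = (1, 0, 1, 1, 1, 0)
print(decode(noisy, CODE_BOOK))
```

With only 2 bits for 3 classes (code_size < 1) the model would be smaller, but a single wrong bit could no longer be corrected.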
Parameters
- estimator : estimator object
- An estimator object implementing fit and one of decision_function or predict_proba.
- code_size : float
- Ratio of the number of binary classifiers to the number of classes, used to create the code book: int(n_classes * code_size) classifiers are fitted. A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. A number greater than 1 will require more classifiers than one-vs-the-rest.
- random_state : int, RandomState instance or None, optional, default: None
- The generator used to initialize the codebook. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- n_jobs : int, optional, default: 1
- The number of jobs to use for the computation. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
Attributes
- estimators_ : list of int(n_classes * code_size) estimators
- Estimators used for predictions.
- classes_ : numpy array of shape [n_classes]
- Array containing labels.
- code_book_ : numpy array of shape [n_classes, int(n_classes * code_size)]
- Binary array containing the code of each class.
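The arithmetic behind code_size and n_jobs can be made concrete with a short sketch. The helper names `n_estimators` and `effective_jobs` are hypothetical and exist only for illustration; the clamp to at least one job is an assumption rather than something stated above.

```python
def n_estimators(n_classes, code_size):
    """Number of binary classifiers fitted: int(n_classes * code_size)."""
    return int(n_classes * code_size)


def effective_jobs(n_jobs, n_cpus):
    """Number of workers implied by n_jobs on a machine with n_cpus CPUs.

    Negative values follow the (n_cpus + 1 + n_jobs) rule from the
    parameter description; clamping to 1 is an assumption of this sketch.
    """
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(n_estimators(10, 1.5))  # more classifiers than one-vs-the-rest
print(n_estimators(10, 0.5))  # fewer classifiers than one-vs-the-rest
print(effective_jobs(-2, 8))  # all CPUs but one
```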
[1] “Solving multiclass learning problems via error-correcting output codes”, Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.
[2] “The error coding method and PICTs”, James G., Hastie T., Journal of Computational and Graphical Statistics 7, 1998.
[3] “The Elements of Statistical Learning”, Hastie T., Tibshirani R., Friedman J., page 606 (second edition), 2008.

fit(X, y)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Fit underlying estimators.
Parameters
- X : (sparse) array-like, shape = [n_samples, n_features]
- Data.
- y : numpy array of shape [n_samples]
- Multi-class targets.
Returns
- self
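Fitting the underlying estimators amounts to turning one multi-class target into one binary target per code bit. The sketch below shows only that relabeling step, under the assumption of a randomly drawn 0/1 code book; `make_code_book` and `binary_targets` are hypothetical helpers, and no classifier is actually trained.

```python
import random


def make_code_book(classes, n_bits, seed=0):
    """Draw a random 0/1 code word of length n_bits for each class."""
    rng = random.Random(seed)
    return {
        label: tuple(int(rng.random() > 0.5) for _ in range(n_bits))
        for label in classes
    }


def binary_targets(y, code_book):
    """Return one 0/1 target list per bit; bit i trains binary classifier i."""
    n_bits = len(next(iter(code_book.values())))
    return [[code_book[label][bit] for label in y] for bit in range(n_bits)]


# A fixed, hand-written code book keeps the example easy to follow.
code_book = {"a": (0, 1), "b": (1, 1), "c": (1, 0)}
y = ["a", "b", "c", "a"]
for bit, targets in enumerate(binary_targets(y, code_book)):
    print(bit, targets)
```

Each printed row is the target vector that the binary classifier for that bit would be fitted on, using the same X for every bit.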

predict(X)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Predict multi-class targets using underlying estimators.
Parameters
- X : (sparse) array-like, shape = [n_samples, n_features]
- Data.
Returns
- y : numpy array of shape [n_samples]
- Predicted multi-class targets.
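Prediction can be pictured as a nearest-code-word lookup. The sketch below assumes that each binary classifier has already produced a real-valued score per sample and that closeness is measured with squared Euclidean distance; `predict_labels` and the sample data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_labels(scores, classes, code_book):
    """Map each row of per-bit scores to the class with the nearest code word.

    scores:    one row per sample, one column per binary classifier
    classes:   class labels, aligned with the rows of code_book
    code_book: one code word (tuple of 0/1) per class
    """
    predictions = []
    for row in scores:
        distances = [squared_distance(row, word) for word in code_book]
        predictions.append(classes[distances.index(min(distances))])
    return predictions


classes = ["a", "b", "c"]
code_book = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
scores = [(0.1, 0.8, 0.9), (0.9, 0.2, 0.7), (0.6, 0.9, 0.3)]
print(predict_labels(scores, classes, code_book))
```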

score(X, y, sample_weight=None)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters
- X : array-like, shape = (n_samples, n_features)
- Test samples.
- y : array-like, shape = (n_samples) or (n_samples, n_outputs)
- True labels for X.
- sample_weight : array-like, shape = [n_samples], optional
- Sample weights.
Returns
- score : float
- Mean accuracy of self.predict(X) with respect to y.
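The returned value can be reproduced by hand from predictions and true labels. The sketch below is a plain-Python illustration, assuming that sample weights act as a weighted mean over per-sample hits; `mean_accuracy` is a hypothetical helper, not part of the class.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose prediction equals the true label.

    For multi-label data, pass each sample's labels as a tuple: equality
    then compares whole label sets, which gives the subset accuracy.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


y_true = ["a", "b", "c", "a"]
y_pred = ["a", "b", "a", "a"]
print(mean_accuracy(y_true, y_pred))
print(mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 0]))
```

Giving the misclassified third sample a larger weight lowers the score, while a zero weight removes a sample from the average entirely.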