DictionaryLearning
class ibex.sklearn.decomposition.DictionaryLearning(n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=1, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None)
Bases: sklearn.decomposition.dict_learning.DictionaryLearning, ibex._base.FrameMixin
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Note
The documentation following is of the original class wrapped by this class. This class wraps the attribute components_.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> from ibex.sklearn import datasets
>>> from ibex.sklearn.decomposition import PCA as PdPCA

>>> iris = datasets.load_iris()
>>> features = iris['feature_names']
>>> iris = pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=features+['class'])

>>> iris[features]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
...

>>> PdPCA(n_components=2).fit(iris[features], iris['class']).transform(iris[features])
     comp_0       comp_1
0 -2.684207  ...0.326607
1 -2.715391  ...0.169557
2 -2.889820  ...0.137346
3 -2.746437  ...0.311124
4 -2.728593  ...0.333925
...
Dictionary learning
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_{(U,V)} 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_components
Read more in the User Guide.
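The objective above can be evaluated directly for a candidate code U and dictionary V. The following is a minimal NumPy sketch, not the library's implementation: the helper names dictionary_objective and normalize_atoms are hypothetical, and the squared norm of the residual matrix is assumed to be the sum of squared entries (Frobenius norm).

```python
import numpy as np


def dictionary_objective(Y, U, V, alpha):
    """Evaluate 0.5 * ||Y - U V||^2 + alpha * ||U||_1 (hypothetical helper)."""
    residual = Y - U @ V
    data_fit = 0.5 * np.sum(residual ** 2)  # assumed: sum of squared entries
    sparsity = alpha * np.sum(np.abs(U))    # entrywise L1 norm of the code
    return data_fit + sparsity


def normalize_atoms(V):
    """Rescale each row V_k to unit Euclidean norm, as the constraint requires."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / norms


rng = np.random.RandomState(0)
Y = rng.randn(6, 4)                   # 6 samples, 4 features
V = normalize_atoms(rng.randn(3, 4))  # 3 atoms, one per row
U = rng.randn(6, 3)                   # a dense code, for illustration only

value = dictionary_objective(Y, U, V, alpha=1.0)
# With an all-zero code the penalty vanishes and only the data term remains.
zero_code_value = dictionary_objective(Y, np.zeros((6, 3)), V, alpha=1.0)
```

Increasing alpha raises the cost of every nonzero entry of U, which is why larger values favour sparser codes.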
Parameters:
- n_components : int,
- number of dictionary elements to extract
- alpha : float,
- sparsity controlling parameter
- max_iter : int,
- maximum number of iterations to perform
- tol : float,
- tolerance for numerical error
- fit_algorithm : {‘lars’, ‘cd’}
- lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path).
- cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso).
lars will be faster if the estimated components are sparse.
New in version 0.17: cd coordinate descent method to improve speed.
- transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
Algorithm used to transform the data:
- lars: uses the least angle regression method (linear_model.lars_path).
- lasso_lars: uses Lars to compute the Lasso solution.
- lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
- omp: uses orthogonal matching pursuit to estimate the sparse solution.
- threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'.
New in version 0.17: lasso_cd coordinate descent method to improve speed.
- transform_n_nonzero_coefs : int, 0.1 * n_features by default
- Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
- transform_alpha : float, 1. by default
- If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
- n_jobs : int,
- number of parallel jobs to run
- code_init : array of shape (n_samples, n_components),
- initial value for the code, for warm restart
- dict_init : array of shape (n_components, n_features),
- initial values for the dictionary, for warm restart
- verbose : bool, optional (default: False)
- To control the verbosity of the procedure.
- split_sign : bool, False by default
- Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
- random_state : int, RandomState instance or None, optional (default=None)
- If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes:
- components_ : array, [n_components, n_features]
- dictionary atoms extracted from the data
- error_ : array
- vector of errors at each iteration
- n_iter_ : int
- Number of iterations run.
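The 'threshold' option of transform_algorithm, as described in the parameter list above, can be illustrated with a short NumPy sketch. This is a literal reading of the description (zero every projected coefficient whose absolute value is below alpha), with a hypothetical helper name threshold_code; the wrapped library's actual implementation may differ, for instance by also shrinking the surviving coefficients.

```python
import numpy as np


def threshold_code(X, dictionary, alpha):
    """Project X onto the atoms, then zero coefficients with |value| < alpha.

    Hypothetical helper following the documented description literally.
    """
    # X @ dictionary.T is the transpose of the documented "dictionary * X'",
    # so rows correspond to samples and columns to atoms.
    projection = X @ dictionary.T
    return np.where(np.abs(projection) < alpha, 0.0, projection)


dictionary = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])  # two unit-norm atoms, three features
X = np.array([[2.0, 0.3, 5.0],
              [-0.1, -4.0, 1.0]])

code = threshold_code(X, dictionary, alpha=0.5)
# Entries 0.3 and -0.1 fall below the threshold and become zero.
```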
References:
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also: SparseCoder, MiniBatchDictionaryLearning, SparsePCA, MiniBatchSparsePCA
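The split_sign parameter described above splits a sparse code into its positive and negative parts. A minimal NumPy sketch of that idea follows; the helper name split_sign_code is hypothetical, and the ordering (positive part first, then negative part) is an assumption rather than something this page states.

```python
import numpy as np


def split_sign_code(code):
    """Concatenate the positive and negative parts of a code (both non-negative).

    Hypothetical helper; the column order is an assumption.
    """
    positive = np.maximum(code, 0.0)   # keeps entries > 0, zeros elsewhere
    negative = np.maximum(-code, 0.0)  # magnitudes of entries < 0
    return np.hstack([positive, negative])


code = np.array([[1.5, -2.0, 0.0]])
split = split_sign_code(code)

# The original code is recovered as positive part minus negative part.
n_components = code.shape[1]
recovered = split[:, :n_components] - split[:, n_components:]
```

The result has twice as many columns as the input and no negative entries, which can suit downstream classifiers that treat positive and negative activations differently.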
fit(X, y=None)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Fit the model from data in X.
Parameters:
- X : array-like, shape (n_samples, n_features)
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : Ignored.
Returns:
- self : object
- Returns the object itself
fit_transform(X, y=None, **fit_params)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
- Training set.
- y : numpy array of shape [n_samples]
- Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
- Transformed array.
transform(X)
Note
The documentation following is of the class wrapped by this class. There are some changes, in particular:
- A parameter X denotes a pandas.DataFrame.
- A parameter y denotes a pandas.Series.
Encode the data as a sparse combination of the dictionary atoms.
Coding method is determined by the object parameter transform_algorithm.
Parameters:
- X : array of shape (n_samples, n_features)
- Test data to be transformed, must have the same number of features as the data used to train the model.
Returns:
- X_new : array, shape (n_samples, n_components)
- Transformed data
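As a rough illustration of fit and transform together, the sketch below uses the wrapped scikit-learn class directly on a NumPy array. With the ibex wrapper, X would instead be a pandas.DataFrame, as the notes above say; the random data and the small max_iter are assumptions chosen only to keep the example quick.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(20, 8)  # 20 samples, 8 features (synthetic data)

learner = DictionaryLearning(
    n_components=5,
    transform_algorithm='omp',
    transform_n_nonzero_coefs=2,  # target at most 2 atoms per sample
    max_iter=10,                  # kept small so the sketch runs quickly
    random_state=0,
)
learner.fit(X)

# Encode each sample as a sparse combination of the learned atoms.
code = learner.transform(X)
nonzeros_per_sample = np.count_nonzero(code, axis=1)
```

Here learner.components_ holds one atom per row, and each row of code uses only a few of them because of transform_n_nonzero_coefs.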