DictionaryLearning

class ibex.sklearn.decomposition.DictionaryLearning(n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=1, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None)

Bases: sklearn.decomposition.dict_learning.DictionaryLearning, ibex._base.FrameMixin

Note

The documentation following is of the original class wrapped by this class. There are some changes; in particular, this class wraps the attribute components_.

Example:

>>> import pandas as pd
>>> import numpy as np
>>> from ibex.sklearn import datasets
>>> from ibex.sklearn.decomposition import PCA as PdPCA
>>> iris = datasets.load_iris()
>>> features = iris['feature_names']
>>> iris = pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=features+['class'])
>>> iris[features]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
...
>>> PdPCA(n_components=2).fit(iris[features], iris['class']).transform(iris[features])
    comp_0    comp_1
0   -2.684207 ...0.326607
1   -2.715391 ...0.169557
2   -2.889820 ...0.137346
3   -2.746437 ...0.311124
4   -2.728593 ...0.333925
...
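
The same DataFrame-in, DataFrame-out pattern applies to this class. A minimal sketch (aliased here as PdDictionaryLearning, following the PdPCA naming above, and assuming the same comp_0, comp_1, ... output-column convention; output omitted):

>>> from ibex.sklearn.decomposition import DictionaryLearning as PdDictionaryLearning
>>> est = PdDictionaryLearning(n_components=2, random_state=0)
>>> est.fit(iris[features]).transform(iris[features])  # doctest: +SKIP

The result is a pandas.DataFrame indexed like iris[features], with one column per dictionary component.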

Dictionary learning

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

$$(U^*, V^*) = \underset{U, V}{\operatorname{argmin}} \; \tfrac{1}{2} \lVert Y - U V \rVert_2^2 + \alpha \lVert U \rVert_1
\quad \text{subject to } \lVert V_k \rVert_2 = 1 \text{ for all } 0 \le k < n_{\mathrm{components}}$$

Read more in the User Guide.
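
To make the objective concrete, the following sketch uses the wrapped sklearn class directly on hypothetical random data and checks the unit-norm constraint on the atoms V_k and the sparsity of the code U (a sketch; outputs indicative):

>>> import numpy as np
>>> from sklearn.decomposition import DictionaryLearning
>>> X = np.random.RandomState(0).randn(20, 8)  # hypothetical toy data
>>> dl = DictionaryLearning(n_components=4, alpha=1, random_state=0)
>>> U = dl.fit_transform(X)  # U is the sparse code; dl.components_ is V
>>> bool(np.allclose(np.linalg.norm(dl.components_, axis=1), 1.0))  # ||V_k||_2 = 1
True
>>> bool((U == 0).any())  # the L1 penalty drives code entries to exactly zero
True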

Parameters:

n_components : int,
number of dictionary elements to extract
alpha : float,
sparsity controlling parameter
max_iter : int,
maximum number of iterations to perform
tol : float,
tolerance for numerical error
fit_algorithm : {'lars', 'cd'}

lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path).
cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

New in version 0.17: cd coordinate descent method to improve speed.

transform_algorithm : {'lasso_lars', 'lasso_cd', 'lars', 'omp', 'threshold'}

Algorithm used to transform the data:
lars: uses the least angle regression method (linear_model.lars_path).
lasso_lars: uses Lars to compute the Lasso solution.
lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
omp: uses orthogonal matching pursuit to estimate the sparse solution.
threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'.

New in version 0.17: lasso_cd coordinate descent method to improve speed.

transform_n_nonzero_coefs : int, 0.1 * n_features by default
Number of nonzero coefficients to target in each column of the solution. This is only used by transform_algorithm='lars' and transform_algorithm='omp' and is overridden by transform_alpha in the omp case.
transform_alpha : float, 1. by default
If transform_algorithm='lasso_lars' or transform_algorithm='lasso_cd', alpha is the penalty applied to the L1 norm. If transform_algorithm='threshold', alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If transform_algorithm='omp', alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides transform_n_nonzero_coefs.
n_jobs : int,
number of parallel jobs to run
code_init : array of shape (n_samples, n_components),
initial value for the code, for warm restart
dict_init : array of shape (n_components, n_features),
initial values for the dictionary, for warm restart
verbose : bool, optional (default: False)
Controls the verbosity of the procedure.
split_sign : bool, False by default
Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes:

components_ : array, [n_components, n_features]
dictionary atoms extracted from the data
error_ : array
vector of errors at each iteration
n_iter_ : int
Number of iterations run.
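
A short sketch (again via the wrapped sklearn class, on hypothetical toy data) showing how the fitting and transform parameters above map onto the fitted attributes:

>>> import numpy as np
>>> from sklearn.decomposition import DictionaryLearning
>>> X = np.random.RandomState(42).randn(30, 10)  # hypothetical toy data
>>> dl = DictionaryLearning(
...     n_components=5,
...     alpha=1.0,                    # sparsity of the code during fitting
...     fit_algorithm='cd',           # coordinate descent instead of 'lars'
...     transform_algorithm='omp',
...     transform_n_nonzero_coefs=2,  # at most 2 nonzero coefficients per sample
...     random_state=42).fit(X)
>>> dl.components_.shape              # (n_components, n_features)
(5, 10)
>>> code = dl.transform(X)
>>> bool(((code != 0).sum(axis=1) <= 2).all())  # respects transform_n_nonzero_coefs
True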

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also:

SparseCoder, MiniBatchDictionaryLearning, SparsePCA, MiniBatchSparsePCA

fit(X, y=None)

Note

The documentation following is of the method wrapped by this class.

Fit the model from data in X.

X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.

y : Ignored.

Returns:

self : object
Returns the object itself.
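
Continuing the iris example at the top (and reusing the PdDictionaryLearning alias introduced there), a sketch of fit; since the fitted estimator is returned, calls can be chained:

>>> est = PdDictionaryLearning(n_components=2, random_state=0)
>>> est.fit(iris[features])  # doctest: +SKIP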
fit_transform(X, y=None, **fit_params)

Note

The documentation following is of the method wrapped by this class.

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

X : numpy array of shape [n_samples, n_features]
Training set.

y : numpy array of shape [n_samples]
Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
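
A sketch continuing the same iris example; with the ibex wrapper, the returned code is a pandas.DataFrame of shape (n_samples, n_components) rather than a numpy array:

>>> code = PdDictionaryLearning(n_components=2, random_state=0).fit_transform(iris[features])  # doctest: +SKIP
>>> code.shape  # doctest: +SKIP
(150, 2)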
transform(X)

Note

The documentation following is of the method wrapped by this class.

Encode the data as a sparse combination of the dictionary atoms.

Coding method is determined by the object parameter transform_algorithm.

X : array of shape (n_samples, n_features)
Test data to be transformed, must have the same number of features as the data used to train the model.

Returns:

X_new : array, shape (n_samples, n_components)
Transformed data.
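
A sketch of how transform_algorithm drives the encoding, continuing the iris example (output omitted). Because the coding step is independent of fitting, transform_algorithm can also be changed on an already-fitted estimator (e.g. via set_params) without refitting:

>>> est = PdDictionaryLearning(
...     n_components=2,
...     transform_algorithm='threshold',
...     transform_alpha=0.1,
...     random_state=0).fit(iris[features])
>>> est.transform(iris[features])  # doctest: +SKIP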