RandomizedPCA

class ibex.sklearn.decomposition.RandomizedPCA(*args, **kwargs)

Bases: sklearn.decomposition.pca.RandomizedPCA, ibex._base.FrameMixin

Note

The documentation following is of the original class wrapped by this class. There are some changes; in particular, this class wraps the attribute components_.

Example:

>>> import pandas as pd
>>> import numpy as np
>>> from ibex.sklearn import datasets
>>> from ibex.sklearn.decomposition import PCA as PdPCA
>>> iris = datasets.load_iris()
>>> features = iris['feature_names']
>>> iris = pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=features+['class'])
>>> iris[features]
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
...
>>> PdPCA(n_components=2).fit(iris[features], iris['class']).transform(iris[features])
    comp_0    comp_1
0  -2.684207  0.326607
1  -2.715391  0.169557
2  -2.889820  0.137346
3  -2.746437  0.311124
4  -2.728593  0.333925
...

Principal component analysis (PCA) using randomized SVD

Deprecated since version 0.18: This class will be removed in 0.20. Use PCA with parameter svd_solver ‘randomized’ instead. The new implementation DOES NOT store whiten components_. Apply transform to get them.
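Since RandomizedPCA was removed in scikit-learn 0.20, the recommended migration is PCA with svd_solver='randomized'. A minimal sketch of the replacement, reusing the toy data from the example below:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Modern replacement for RandomizedPCA: same randomized SVD under the hood
pca = PCA(n_components=2, svd_solver='randomized', random_state=0)
pca.fit(X)
print(pca.explained_variance_ratio_)
```

The first component should explain essentially all of the variance of this strongly correlated data, mirroring the RandomizedPCA example later in this page.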

Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular vectors to project the data to a lower dimensional space.

Read more in the User Guide.

n_components : int, optional
Maximum number of components to keep. When not given or None, this is set to n_features (the second dimension of the training data).
copy : bool
If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
iterated_power : int, default=2

Number of iterations for the power method.

Changed in version 0.18.

whiten : bool, optional

When True (False by default), the components_ vectors are multiplied by the square root of n_samples and divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometime improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
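The effect of whitening can be checked directly: the transformed columns become uncorrelated with unit variance. A short sketch using the modern PCA class (since RandomizedPCA itself is deprecated), with synthetic data chosen here purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic data with very different variances along each axis
X = rng.randn(200, 3) @ np.diag([2.0, 1.0, 0.1])

Xw = PCA(n_components=3, whiten=True).fit_transform(X)
# With whitening, the sample covariance of the scores is the identity matrix
print(np.cov(Xw.T))
```

Without whiten=True, the covariance would instead be diagonal with the (unequal) explained variances on the diagonal.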

random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
components_ : array, shape (n_components, n_features)
Components with maximum variance.
explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0.
singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
mean_ : array, shape (n_features,)
Per-feature empirical mean, estimated from the training set.
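The claim that the singular values equal the 2-norms of the projected variables can be verified numerically. A sketch using PCA with the randomized solver (standing in for the deprecated class):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.], [1., 1.], [2., 1.], [3., 2.]])
pca = PCA(n_components=2, svd_solver='randomized', random_state=0).fit(X)
Z = pca.transform(X)

# Each singular value equals the Euclidean norm of the corresponding score column
print(np.allclose(pca.singular_values_, np.linalg.norm(Z, axis=0)))
```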
>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)                 
RandomizedPCA(copy=True, iterated_power=2, n_components=2,
       random_state=None, whiten=False)
>>> print(pca.explained_variance_ratio_)  
[ 0.99244...  0.00755...]
>>> print(pca.singular_values_)  
[ 6.30061...  0.54980...]

See also: PCA, TruncatedSVD

[Halko2009] "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions", Halko et al., 2009 (arXiv:0909.4061)
[MRT] "A randomized algorithm for the decomposition of matrices", Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert
fit(X, y=None)[source]


Fit the model with X by extracting the first principal components.

X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored.

self : object
Returns the instance itself.
fit_transform(X, y=None)[source]


Fit the model with X and apply the dimensionality reduction on X.

X : array-like, shape (n_samples, n_features)
New data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored.

X_new : array-like, shape (n_samples, n_components)
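fit_transform is equivalent (up to floating-point error) to calling fit followed by transform, provided the random state is fixed so both fits compute the same components. A sketch using the modern PCA replacement:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.], [1., 1.], [2., 1.], [3., 2.]])

one_step = PCA(n_components=2, svd_solver='randomized',
               random_state=0).fit_transform(X)
two_step = PCA(n_components=2, svd_solver='randomized',
               random_state=0).fit(X).transform(X)
print(np.allclose(one_step, two_step))
```

fit_transform can be cheaper than the two-step form because it reuses intermediate results of the SVD.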

inverse_transform(X)[source]


Transform data back to its original space.

Returns an array X_original whose transform would be X.

X : array-like, shape (n_samples, n_components)
New data, where n_samples is the number of samples and n_components is the number of components.

X_original : array-like, shape (n_samples, n_features)

If whitening is enabled, inverse_transform does not compute the exact inverse operation of transform.
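When whitening is off and all components are kept, the round trip transform followed by inverse_transform reconstructs the input exactly (up to floating-point error). A sketch using the modern PCA replacement:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.], [1., 1.], [2., 1.], [3., 2.]])
pca = PCA(n_components=2, svd_solver='randomized', random_state=0)

Z = pca.fit_transform(X)
X_back = pca.inverse_transform(Z)
# All 2 components of 2-feature data are kept, so reconstruction is lossless
print(np.allclose(X, X_back))
```

With fewer components than features, inverse_transform instead returns the closest reconstruction within the retained subspace.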

transform(X)[source]


Apply dimensionality reduction on X.

X is projected on the first principal components previously extracted from a training set.

X : array-like, shape (n_samples, n_features)
New data, where n_samples is the number of samples and n_features is the number of features.

X_new : array-like, shape (n_samples, n_components)
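Concretely, transform centers new data with the mean learned during fit and projects it onto the stored components. A sketch verifying this against a manual computation, using the modern PCA replacement:

```python
import numpy as np
from sklearn.decomposition import PCA

X_train = np.array([[-1., -1.], [-2., -1.], [-3., -2.], [1., 1.], [2., 1.], [3., 2.]])
pca = PCA(n_components=1, svd_solver='randomized', random_state=0).fit(X_train)

X_new = np.array([[0.5, 0.5]])
# transform centers with the *training* mean, then projects onto components_
manual = (X_new - pca.mean_) @ pca.components_.T
print(np.allclose(pca.transform(X_new), manual))
```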