If you know the basics of PCA, you probably know that it is computationally intensive. This article gives an intuition for the heavy mathematical computation behind PCA. Let’s dive in.
A simple way of computing the PCA of a matrix X is to compute the eigenvalue decomposition of its covariance matrix. Given an N×D matrix X and a target dimensionality d, the algorithm computes a D×d matrix V such that the columns of V are the principal components of X.
The algorithm for computing the eigenvalue decomposition of the covariance matrix can be summarized in the following steps:
- Step 1: Compute the mean-centred matrix Xc: The mean-centred matrix Xc is computed by subtracting the vector of column means of X from each row of X.
- Step 2: Compute the covariance matrix Cv = X′c ∗ Xc: The covariance matrix is computed by multiplying the transpose of the mean-centred matrix Xc with Xc itself. This is a computationally intensive step because it requires multiplying two large matrices, which typically cannot both fit in memory.
- Step 3: Compute the eigenvalue decomposition of Cv: Eigenvalues λi and eigenvectors vi satisfy the relation Cv ∗ vi = λi ∗ vi, where λi is the i-th eigenvalue and vi is its corresponding eigenvector. Putting the above formula in matrix form: Cv ∗ V = V ∗ Λ, where Λ is a diagonal matrix whose elements are the eigenvalues of the covariance matrix Cv, and V is the matrix whose columns are the corresponding eigenvectors. The eigenvalue decomposition of the matrix Cv is then given by Cv = V ∗ Λ ∗ V⁻¹.
- Step 4: Get the principal components: The principal components of the input matrix X are the eigenvectors associated with the largest eigenvalues. Since the eigenvalues in the diagonal matrix Λ are sorted in decreasing order, the first d columns of the matrix V are the principal components of X: V = (v1, v2,…, vd).
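The four steps above can be sketched in NumPy as follows (`pca_eig` is a hypothetical helper name chosen for illustration; `np.linalg.eigh` is used because the covariance matrix is symmetric):

```python
import numpy as np

def pca_eig(X, d):
    """PCA of X (N x D) via eigendecomposition of the covariance matrix.
    Returns the top-d principal components as columns of a D x d matrix."""
    # Step 1: mean-centre by subtracting the column means from each row
    Xc = X - X.mean(axis=0)
    # Step 2: covariance matrix (D x D); the 1/(N-1) factor rescales the
    # eigenvalues but leaves the eigenvectors unchanged
    Cv = Xc.T @ Xc / (X.shape[0] - 1)
    # Step 3: eigendecomposition; eigh handles symmetric matrices and
    # returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(Cv)
    # Step 4: sort eigenvalues in decreasing order, keep first d eigenvectors
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
V = pca_eig(X, 2)
print(V.shape)  # (5, 2)
```

Projecting the data onto the components is then just `X_reduced = (X - X.mean(axis=0)) @ V`.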
The algorithm has two computationally intensive steps:
- Computing the covariance matrix
- Computing the eigenvalue decomposition of the covariance matrix
The computational complexity of the covariance matrix computation is O(ND × min(N, D)), the cost of multiplying two matrices of size D×N and N×D, respectively. The other computationally intensive step is the eigenvalue decomposition, whose worst-case complexity is O(D³) for a matrix of size D×D. Therefore the overall complexity is O(ND × min(N, D) + D³).
Libraries like scikit-learn use Singular Value Decomposition (SVD) instead of an explicit eigenvalue decomposition. SVD is better optimized in practice, more numerically stable, and exists for every matrix, unlike eigendecomposition, which only applies to square matrices.
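As a quick illustration, scikit-learn's `PCA` can be used as follows; internally, `fit` centres the data and runs an SVD rather than forming the covariance matrix explicitly:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Fit PCA with 2 components; scikit-learn centres X and applies an SVD
pca = PCA(n_components=2)
Z = pca.fit_transform(X)      # projected data, shape (100, 2)

print(Z.shape)                # (100, 2)
print(pca.components_.shape)  # (2, 5): each row is a principal component
```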
Singular Value Decomposition can be described as Xc = U ∗ Σ ∗ V′, where U and V are orthogonal matrices and Σ is a diagonal matrix whose entries are the singular values of Xc.
How to connect SVD with eigenvalue decomposition?
The columns of V (the right singular vectors of Xc) are the eigenvectors of the covariance matrix, and the eigenvalues of the covariance matrix are the squares of the singular values. Finding the top k singular vectors of Xc is therefore equivalent to finding the top k eigenvectors of the covariance matrix.
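This equivalence is easy to check numerically: the right singular vectors of the centred matrix Xc match the leading eigenvectors of Cv up to sign, and the eigenvalues of Cv equal the squared singular values (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

# SVD of the centred data: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition of the (unscaled) covariance matrix Cv = Xc' * Xc
Cv = Xc.T @ Xc
eigvals, eigvecs = np.linalg.eigh(Cv)
order = np.argsort(eigvals)[::-1]  # decreasing order

# Eigenvalues of Cv equal the squared singular values of Xc
print(np.allclose(eigvals[order], s**2))  # True
# Each right singular vector matches an eigenvector up to sign
for k in range(2):
    v_svd, v_eig = Vt[k], eigvecs[:, order[k]]
    print(np.allclose(v_svd, v_eig) or np.allclose(v_svd, -v_eig))  # True
```

The sign ambiguity is expected: if v is an eigenvector, so is −v, and the two factorizations are free to pick either.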