Principal Components Analysis


 

Ed 710 Educational Statistics
Spring 2003
Copyright - Antonia D'Onofrio - 2001/2002/2003

Introduction
 

Principal Components Analysis is a multivariate technique grounded in several fundamental regression concepts that you have previously considered.  The more important of these regression concepts include:


What would be a typical application of this technique?
 

The method of principal components analysis is primarily one of data reduction.  A large number of variables can be reduced to a smaller number of variables, ideally one variable that contains all of the reliable information in the larger set of parent variables.  For example, 20 scores taken from a variety of reading measures can be reduced to a single score or measure of general reading ability.  Another example might be the reduction of 10 performance tasks associated with intelligence into a single score that represents performance IQ.  Yet another example would be the reduction of 40 items on a test of leadership into a single score that would then be termed a scaled score of leadership.
 

How does principal components analysis achieve this end?
 

The technique of principal components analysis begins, as do all multivariate techniques, by computing the intercorrelations of all the variables in a study.  A number of assumptions regarding the method are tested during this process.  Among these assumptions are:
 
 
 


All combinations of standardized regression coefficients are computed.  Recall that these are analogous to b weights; however, they have been standardized so that all variables are expressed on a common scale, in the event that the variables do not all come from similar measurement distributions.
 

These standardized coefficients are referred to as loadings in principal components analysis. (Why should life be simple?) They essentially describe the extent to which each variable predicts every other variable in a set.
 

There will be x(x-1)/2 nonredundant coefficients at the end of these calculations, where x is the number of variables.  For ten variables, that is 45 coefficients.
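As a rough illustration of this count, the sketch below builds a correlation matrix for ten task scores and pulls out the nonredundant coefficients. The scores are randomly generated and the names (`scores`, `R`, `n_nonredundant`) are assumptions made only for this example.

```python
import numpy as np

# Hypothetical data: 200 examinees, 10 task scores (random numbers, for illustration only)
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 10))

# Intercorrelations of all variables (columns are variables)
R = np.corrcoef(scores, rowvar=False)

# The matrix is symmetric with 1.0 on the diagonal, so only the
# cells above the diagonal carry nonredundant information.
x = R.shape[0]
upper = R[np.triu_indices(x, k=1)]
n_nonredundant = upper.size

print(n_nonredundant)        # 45
print(x * (x - 1) // 2)      # 45
```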
 

What happens to these coefficients?

The coefficients are "resolved" as weighted linear combinations of one another.  This is exactly analogous to the idea of a regression equation.  Any predictor that is predicted by all the other predictors in a set is described as the best weighted linear combination of the other predictors.  In other words, when score #1 in a set of ten performance tasks is predicted, it is predicted by the other nine.  There are 9 regression coefficients describing the relationships between the score on task #1 and the other nine task scores.  These calculations are based on all observations, that is, the scores of all individuals taking the performance test.
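The idea of predicting one task score from the other nine can be sketched as an ordinary least-squares regression on standardized scores. This is only an illustration under stated assumptions: the data are simulated, and the shared "general" ability used to generate them is hypothetical.

```python
import numpy as np

# Simulated data: 300 examinees, 10 task scores that share one
# hypothetical general ability plus task-specific noise.
rng = np.random.default_rng(1)
n = 300
general = rng.normal(size=(n, 1))
scores = 0.7 * general + rng.normal(size=(n, 10))

# Standardize every task so the regression weights are standardized coefficients
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# Predict task #1 from the other nine tasks
y = z[:, 0]
X = z[:, 1:]
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

# Proportion of variance in task #1 explained by the weighted combination
predicted = X @ betas
r_squared = 1 - np.sum((y - predicted) ** 2) / np.sum(y ** 2)

print(betas.shape)   # (9,)
print(round(r_squared, 3))
```

Repeating the same regression with each task in turn as the criterion gives one equation per task, each with nine coefficients.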

The next step in the analysis is to determine which linear combination of predictors contains the most variance, i.e., pools the largest amount of variation in the entire set of predictor scores.
 

For example, the linear combination of predictors that predicts task #3 may contain more variance than any other equation describing the linear combination of predictors that predicts any of the remaining 9 task scores.
 

The process is repeated 8 more times until prediction of all remaining task scores is completed.  Theoretically there will be X-1 components.  And theoretically, each component explains the prediction of a single task score.

The resulting mathematical structure might look like this.

variables  I     II    III   IV    V     VI    VII   VIII  IX    X
1          1.0   -     -     -     -     -     -     -     -     -
2          -     1.0   -     -     -     -     -     -     -     -
3          -     -     1.0   -     -     -     -     -     -     -
4          -     -     -     1.0   -     -     -     -     -     -
5          -     -     -     -     1.0   -     -     -     -     -
6          -     -     -     -     -     1.0   -     -     -     -
7          -     -     -     -     -     -     1.0   -     -     -
8          -     -     -     -     -     -     -     1.0   -     -
9          -     -     -     -     -     -     -     -     1.0   -
10         -     -     -     -     -     -     -     -     -     1.0

The values in the cells are semipartial correlations, or squared standardized regression coefficients.  Because they are squared values, they can be summed.  The sum of all squared loadings on a component is referred to as its Eigenvalue.  The Eigenvalue for each component in this example would be 1.0.
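The relationship between loadings and Eigenvalues can be checked numerically. The sketch below assumes the diagonal structure in the table corresponds to ten variables that do not correlate with one another at all, extracts components from that correlation matrix, and sums the squared loadings in each column. The variable names are illustrative.

```python
import numpy as np

# Assumed correlation matrix for the table above: ten variables,
# each correlating 1.0 with itself and 0.0 with every other variable.
R = np.eye(10)

# Components are taken from the eigendecomposition of the correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)

# Loadings: each eigenvector scaled by the square root of its eigenvalue
loadings = eigenvectors * np.sqrt(eigenvalues)

# Summing squared loadings down each column recovers the Eigenvalue of that component
sum_of_squares = (loadings ** 2).sum(axis=0)

print(eigenvalues)      # every Eigenvalue is 1.0
print(sum_of_squares)   # matches the Eigenvalues
```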
 
 
 

When examined horizontally, each predictor explains 100 percent of the variance in its own component.  Since the other components have loadings of zero (0), each predictor also explains 100 percent of all the variance in each component that can be attributed to a single predictor.
 

In this example, each predictor is a perfect predictor of a component that is uniquely associated with itself.
This arrangement illustrates that each predictor is the best linear combination of itself.  All the other cells that might be filled with values that represent predictors of lesser importance fade away to zero.
 

The structure of coefficients in this example in fact illustrates the null hypothesis of principal components analysis, i.e., that there are as many components as there are predictors.  This would therefore not be the hoped-for solution.
 
 

The ideal solution in principal components analysis is for all predictors to load on a single component.  All loadings should also be positive.  When this occurs, a simple structure and a positive manifold have been achieved.  This would be the solution that supports the alternative hypothesis.
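To see what such a solution looks like numerically, the sketch below assumes that all ten tasks correlate 0.6 with one another (a made-up value chosen only for illustration) and extracts the components. One large component emerges, and every task loads positively on it.

```python
import numpy as np

# Hypothetical correlation matrix: every pair of tasks correlates 0.6
r = 0.6
R = np.full((10, 10), r)
np.fill_diagonal(R, 1.0)

# Extract components and sort them from largest to smallest Eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Loadings on the first component; the sign of an eigenvector is arbitrary,
# so flip it if needed to make the loadings positive.
first_loadings = eigenvectors[:, 0] * np.sqrt(eigenvalues[0])
if first_loadings.sum() < 0:
    first_loadings = -first_loadings

print(np.round(eigenvalues, 2))      # first is about 6.4, the rest about 0.4
print(np.round(first_loadings, 2))   # each about 0.8
```

Here the first component pools 6.4 of the 10 units of total variance, while each remaining component holds less variance than a single original variable.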

Revise the table below so that a simple structure and a positive manifold is evident.
 

variables  I     II    III   IV    V     VI    VII   VIII  IX    X
1          1.0   -     -     -     -     -     -     -     -     -
2          -     1.0   -     -     -     -     -     -     -     -
3          -     -     1.0   -     -     -     -     -     -     -
4          -     -     -     1.0   -     -     -     -     -     -
5          -     -     -     -     1.0   -     -     -     -     -
6          -     -     -     -     -     1.0   -     -     -     -
7          -     -     -     -     -     -     1.0   -     -     -
8          -     -     -     -     -     -     -     1.0   -     -
9          -     -     -     -     -     -     -     -     1.0   -
10         -     -     -     -     -     -     -     -     -     1.0

 

Once computed, a principal components analysis may not be interpreted as is.  When each component has an Eigenvalue of 1.0, data reduction is not possible.
 

Until the structure of coefficients is recomputed, it is not possible to see how many predictors pool variation and form a single variable made up of aggregated variance.  In reality, more than one component may emerge, demonstrating more than one new variable, but certainly fewer than the original number of predictors.

The process of reducing variation is termed rotation.
 

There are two commonly used approaches to rotation.
 


 
 

The most commonly used rotation is varimax with Kaiser normalization.  This technique concentrates variation into components, which makes the components easier to interpret as constructed pools of variance based on the predictors that predict them.  With this technique, loadings will tend to be higher in the first component.  Subsequent components will explain increasingly more error variance and less common or explainable variance, so the loadings become smaller with each successive component.  The Eigenvalue of the first component will be the largest, and so on down the line.
 

Equimax rotations are used to concentrate variance in predictors.  With this technique, more interpretable components will be resolved.  A few loadings will be interpretable on several components.  The remaining loadings on each component will be negligible.  This approach also spreads variance across components, so individual Eigenvalues are smaller but the variance is distributed over more components.
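A rotation can be sketched in a few lines. The function below is a minimal, illustrative implementation of the general orthomax family, in which the setting gamma = 1 gives varimax (the rotation that statistical packages usually pair with Kaiser normalization) and gamma = k/2, where k is the number of retained components, gives equimax. It is not the routine used by any particular package, it omits the Kaiser normalization step itself, and the data, the function name `orthomax`, and the choice to retain three components are all assumptions made for the example.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonally rotate a (variables x components) loading matrix.

    gamma=1.0 is varimax; gamma=k/2 (k = number of components) is equimax.
    Illustrative sketch only; no Kaiser normalization is applied.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)
        target = rotated ** 3 - (gamma / p) * rotated * col_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop when the criterion no longer improves meaningfully
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Simulated data: 9 variables built from 3 hypothetical underlying abilities
rng = np.random.default_rng(2)
n = 500
abilities = rng.normal(size=(n, 3))
pattern = np.zeros((9, 3))
pattern[0:3, 0] = 0.8
pattern[3:6, 1] = 0.7
pattern[6:9, 2] = 0.6
scores = abilities @ pattern.T + 0.5 * rng.normal(size=(n, 9))

# Unrotated loadings for the three largest components
R = np.corrcoef(scores, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
top = np.argsort(eigenvalues)[::-1][:3]
unrotated = eigenvectors[:, top] * np.sqrt(eigenvalues[top])

varimax_loadings, varimax_rotation = orthomax(unrotated, gamma=1.0)
equimax_loadings, equimax_rotation = orthomax(unrotated, gamma=3 / 2)

print(np.round(unrotated, 2))
print(np.round(varimax_loadings, 2))
print(np.round(equimax_loadings, 2))
```

Rotation only redistributes variance among the retained components: the sum of squared loadings for each variable (its row) is the same before and after rotation, while the sums for each component (its column) change.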