Ed 710 Educational Statistics
Spring 2003
Copyright - Antonia D'Onofrio - 2001/2002/2003
Introduction
What would be a typical application of this technique?
The method of principal components analysis is primarily that of data
reduction. A large number of variables can be reduced to a smaller
number of variables, ideally one variable that contains all of the reliable
information that is described in the larger set of parent variables.
For example, 20 scores taken from a variety of reading measures can be
reduced to become a single score or measure of general reading ability.
Another example might be the reduction of 10 performance tasks associated
with intelligence into a single score that represents performance IQ.
Yet another example would be the reduction of 40 items on a test of leadership
into a single score that would then be termed a scaled score of leadership.
How does principal components analysis achieve this end?
The technique of principal components analysis begins, as do all
multivariate techniques, by computing the intercorrelations of all the variables
in a study. A number of assumptions regarding the method are tested
during this process. Among these assumptions are included:
All combinations of standardized regression coefficients are computed.
Recall that these are analogous to b weights; however, they have been standardized
so that all variables are measured on a common scale, in the event that
the variables do not all come from similar measurement distributions.
These standardized coefficients are referred to as loadings in principal
components analysis. (Why should life be simple?) They essentially describe
the extent to which each variable predicts every other variable in a set.
For x variables, there will be x(x-1)/2 nonredundant coefficients at the end of these
calculations.
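The count x(x-1)/2 can be checked with a short sketch. The data here are random numbers standing in for 10 task scores from 50 examinees (an assumption made only for illustration), and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: 50 examinees by 10 task scores (random, for illustration only).
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 10))

# Intercorrelations of all the variables (each column is one variable).
R = np.corrcoef(scores, rowvar=False)

x = R.shape[0]                        # number of variables
upper = R[np.triu_indices(x, k=1)]    # entries above the diagonal only
n_unique = upper.size                 # nonredundant coefficients

print(n_unique, x * (x - 1) // 2)     # prints: 45 45
```

The entries below the diagonal repeat those above it, and the diagonal holds each variable's correlation with itself, so only the upper triangle is counted.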
What happens to these coefficients?
The coefficients are "resolved" as weighted linear combinations of one another. This is exactly analogous to the idea of a regression equation. Any predictor that is predicted by all the other predictors in a set is described as the best weighted linear combination of the other predictors. In other words, when score #1 in a set of ten performance tasks is predicted, it is predicted by the other nine: there are 9 regression coefficients describing the relationships between the score on task #1 and the other nine task scores. These calculations are based on all observations, that is, the scores of all individuals taking the performance test.
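The regression analogy can be made concrete with a minimal sketch. It assumes simulated scores for 200 hypothetical examinees on ten tasks that share one general ability; the sample size, the 0.7 weight, and all names are illustrative choices, not values from the course.

```python
import numpy as np

# Hypothetical data: ten task scores that share one general ability.
rng = np.random.default_rng(1)
n_people, n_tasks = 200, 10
general = rng.normal(size=(n_people, 1))
scores = 0.7 * general + rng.normal(size=(n_people, n_tasks))

# Standardize every task (z-scores) so the weights are standardized coefficients.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# Predict task #1 from the other nine tasks, using all observations.
y = z[:, 0]
X = z[:, 1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same nine weights follow from the correlation matrix alone.
R = np.corrcoef(scores, rowvar=False)
beta_from_R = np.linalg.solve(R[1:, 1:], R[1:, 0])

print(beta.shape)                       # (9,)
print(np.allclose(beta, beta_from_R))   # True
```

The second calculation shows why the analysis can start from the intercorrelations: the standardized weights depend on the raw scores only through the correlation matrix.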
The next step in the analysis is to determine which linear combination
of predictors contains the most variance, i.e., pools the largest amount
of variation in the entire set of predictor scores.
For example, the linear combination of predictors that predict task
#3 may contain more variance than any other equation describing the linear
combination of predictors that predict any of the remaining 9 task scores.
The process is repeated until prediction of all remaining task scores is completed. Theoretically there will be as many components as there are variables (X components for X variables; ten in this example). And theoretically, each component explains the prediction of a single task score.
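Statistical software usually obtains the components and their pooled variance from an eigen-decomposition of the correlation matrix rather than from a literal sequence of regressions. The sketch below assumes that approach and uses simulated scores (hypothetical data and names) to show the components ordered from most to least variance.

```python
import numpy as np

# Hypothetical data: 300 examinees, ten tasks sharing one general ability.
rng = np.random.default_rng(2)
general = rng.normal(size=(300, 1))
scores = 0.8 * general + rng.normal(size=(300, 10))
R = np.corrcoef(scores, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse so the component
# that pools the most variance comes first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance pooled by each component.
proportion = eigenvalues / eigenvalues.sum()

print(len(eigenvalues))             # 10 components for 10 variables
print(np.round(proportion, 3))
```

With standardized variables, each variable contributes one unit of variance, so the eigenvalues of the ten components sum to 10.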
The resulting mathematical structure might look like this.
| variables | I | II | III | IV | V | VI | VII | VIII | IX | X |
| 1 | 1.0 | - | - | - | - | - | - | - | - | - |
| 2 | - | 1.0 | - | - | - | - | - | - | - | - |
| 3 | - | - | 1.0 | - | - | - | - | - | - | - |
| 4 | - | - | - | 1.0 | - | - | - | - | - | - |
| 5 | - | - | - | - | 1.0 | - | - | - | - | - |
| 6 | - | - | - | - | - | 1.0 | - | - | - | - |
| 7 | - | - | - | - | - | - | 1.0 | - | - | - |
| 8 | - | - | - | - | - | - | - | 1.0 | - | - |
| 9 | - | - | - | - | - | - | - | - | 1.0 | - |
| 10 | - | - | - | - | - | - | - | - | - | 1.0 |
The values in the cells are semipartial correlations, or squared standardized
regression coefficients. Because they are squared values, they can be summed.
The sum of all the squared loadings on a component is referred to as its
Eigenvalue. The Eigenvalue for each component in this example
would be 1.0.
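The arithmetic for the table above can be sketched as follows. The loading matrix is typed in directly from the table, with the dashes read as zeros; the variable names are hypothetical.

```python
import numpy as np

# The table above: 1.0 on the diagonal, 0 everywhere else.
# Rows are variables 1-10, columns are components I-X.
loadings = np.eye(10)

# Eigenvalue of a component: sum of squared loadings down its column.
eigenvalues = (loadings ** 2).sum(axis=0)

# Variance of each variable accounted for: sum of squares across its row.
row_totals = (loadings ** 2).sum(axis=1)

# Cross-check: eigenvalues of the correlation matrix of ten
# completely uncorrelated variables (an identity matrix).
check = np.linalg.eigvalsh(np.eye(10))

print(eigenvalues)   # ten values of 1.0
print(row_totals)    # ten values of 1.0
```

Every component has an Eigenvalue of 1.0, which is exactly the variance of one standardized variable, so no component pools variance from more than one predictor.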
When examined horizontally, each predictor explains 100 percent of the
variance in its own component. Since the other components have loadings
of zero (0), each predictor also explains 100 percent of all the variance
in each component that can be attributed to a single predictor.
In this example, each predictor is a perfect predictor of a component
that is uniquely associated with itself.
This arrangement illustrates that each predictor is the best linear
combination of itself. All the other cells that might be filled with
values that represent predictors of lesser importance fade away to zero.
The structure of coefficients in this example in fact illustrates the
null hypothesis of principal components analysis: i.e., that there
are as many components as there are predictors. This would therefore
not be the hoped-for solution.
The ideal solution in principal components analysis is for all predictors to load on a single component. All loadings should also be positive. When this occurs, a simple structure and a positive manifold have been achieved. This would be the solution that supports the alternative hypothesis.
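A numerical sketch of a solution that approaches this ideal is given here. It assumes, purely for illustration, that all ten variables correlate 0.6 with one another; that value and the names are hypothetical.

```python
import numpy as np

# Hypothetical correlation matrix: every pair of variables correlates 0.6.
n_vars, r = 10, 0.6
R = np.full((n_vars, n_vars), r)
np.fill_diagonal(R, 1.0)

# Components ordered from largest to smallest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The sign of an eigenvector is arbitrary; orient the first one positively.
first = eigenvectors[:, 0]
if first.sum() < 0:
    first = -first

# Loadings on the first component: eigenvector times sqrt(eigenvalue).
first_loadings = first * np.sqrt(eigenvalues[0])

print(np.round(eigenvalues[0], 2))    # 6.4
print(np.round(first_loadings, 2))    # every loading is 0.8
```

Under this assumption the first component has an Eigenvalue of 1 + 9(0.6) = 6.4, and every variable loads 0.8 on it, all with the same sign. The remaining nine components each keep a small Eigenvalue of 0.4, so the single-component ideal is approached rather than reached exactly.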
Revise the table below so that a simple structure and a positive manifold
are evident.
| variables | I | II | III | IV | V | VI | VII | VIII | IX | X |
| 1 | 1.0 | - | - | - | - | - | - | - | - | - |
| 2 | - | 1.0 | - | - | - | - | - | - | - | - |
| 3 | - | - | 1.0 | - | - | - | - | - | - | - |
| 4 | - | - | - | 1.0 | - | - | - | - | - | - |
| 5 | - | - | - | - | 1.0 | - | - | - | - | - |
| 6 | - | - | - | - | - | 1.0 | - | - | - | - |
| 7 | - | - | - | - | - | - | 1.0 | - | - | - |
| 8 | - | - | - | - | - | - | - | 1.0 | - | - |
| 9 | - | - | - | - | - | - | - | - | 1.0 | - |
| 10 | - | - | - | - | - | - | - | - | - | 1.0 |
Once computed, a principal components analysis may not be interpreted
as is. When each component has an Eigenvalue of 1.0, data reduction
is not possible.
Until the structure of coefficients is recomputed, it is not possible to see how many predictors pool variation and form a single variable made up of aggregated variance. In reality, more than one component may emerge, demonstrating more than one new variable, but certainly fewer than the original number of predictors.
The process of reducing variation is termed rotation.
There are two commonly used approaches to rotation.
The most commonly used rotation is the Kaiser normalization technique.
This technique concentrates variation into components. This will
make components easier to interpret as constructed pools of variance based
on the predictors that predict them. With the Kaiser normalization
technique, loadings will tend to be higher in the first component.
Subsequent components will explain increasingly more error variance and
less common or explainable variance. Thus the loadings on these components
will be smaller with each computation of a component. The Eigenvalue of
the first component will be the largest, and so on down the line.
Equimax rotations are used to concentrate variance in predictors.
With this technique, more interpretable components will be resolved.
A few loadings will be interpretable on several components. The remaining
loadings on each component will be negligible. This approach will
also distribute variance across more components, so that individual
Eigenvalues are smaller.
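How a rotation redistributes variance can be illustrated with a minimal sketch of the general orthomax family of orthogonal rotations. Several things here are assumptions rather than part of the course material: the function name `orthomax`, the hypothetical 8 by 3 matrix of unrotated loadings, and the convention that `gamma = 1` gives varimax while `gamma` equal to half the number of components gives equimax. The row-rescaling step that statistical packages call Kaiser normalization is left out to keep the sketch short.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate a (variables x components) loading matrix.

    Assumed convention: gamma=1 is varimax, gamma=k/2 is equimax
    (k = number of components). Kaiser normalization is not applied.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the orthomax criterion.
        target = rotated ** 3 - (gamma / p) * rotated * (rotated ** 2).sum(axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                  # closest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                          # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 8 variables on 3 components.
unrotated = np.array([
    [0.7,  0.4,  0.2],
    [0.7,  0.5,  0.1],
    [0.6,  0.4, -0.3],
    [0.7, -0.4,  0.3],
    [0.6, -0.5,  0.2],
    [0.7, -0.3, -0.4],
    [0.6,  0.1, -0.5],
    [0.6, -0.1,  0.5],
])

varimax_loadings, varimax_rot = orthomax(unrotated, gamma=1.0)
equimax_loadings, equimax_rot = orthomax(unrotated, gamma=unrotated.shape[1] / 2)

# Sum of squared loadings per component before and after rotation.
print(np.round((unrotated ** 2).sum(axis=0), 3))
print(np.round((varimax_loadings ** 2).sum(axis=0), 3))
print(np.round((equimax_loadings ** 2).sum(axis=0), 3))
```

Because the rotation matrix is orthogonal, the total variance and the variance accounted for in each variable stay the same; only the way that variance is divided among the components changes.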