Guidelines for Conducting Varimax PCA Analysis: Insights for Minimal Variable Loadings

When conducting PCA with varimax rotation, the number of variables that should load onto a single component is largely a matter of judgment. Part of PCA's utility lies in its flexibility: there are no strict rules, only guidelines. However, certain practices can help ensure the results are interpretable and useful.

Understanding PCA and Component Analysis

Principal Component Analysis (PCA) is a statistical procedure that reduces the dimensionality of data while retaining as much of its variance as possible. Varimax rotation is an orthogonal rotation often applied afterwards to make the components easier to interpret: it maximizes the variance of the squared loadings within each component, which pushes each loading toward either a large value or a value near zero. The aim is to create components that are as interpretable as possible.
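To make the rotation concrete, here is a minimal sketch of a varimax rotation in Python with NumPy, using the common SVD-based iteration. The function names, the stopping tolerance, and the small loading matrix are assumptions made for illustration; statistical packages may differ in details such as row normalization.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a (variables x components) loading matrix with an SVD-based varimax iteration."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    R = np.eye(k)          # start from the identity rotation
    total = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient-like term of the varimax criterion
        target = L**3 - L @ np.diag((L**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(A.T @ target)
        R = u @ vt         # nearest orthogonal matrix
        new_total = s.sum()
        if total != 0.0 and new_total / total < 1 + tol:
            break          # improvement has become negligible
        total = new_total
    return A @ R, R

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(L, dtype=float)**2, axis=0)))

# Illustrative loadings (assumed): a clean two-component structure, blurred by a 30-degree rotation
simple = np.array([[0.9, 0], [0.8, 0], [0.7, 0], [0, 0.9], [0, 0.8], [0, 0.7]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix
rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged; only the way that variance is divided among the components changes.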

Choosing the Number of Components

Henry Kaiser, the originator of the "Little Jiffy" procedure (principal components, eigenvalues greater than 1.0, varimax rotation), suggested retaining components with eigenvalues greater than 1.0, a rule now known as the Kaiser criterion. The criterion is often criticized as somewhat arbitrary, but it serves as a useful starting point. It is not always the best approach, however: when the number of variables is large, it tends to retain more components than can be meaningfully interpreted.
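As a rough illustration, the Kaiser criterion can be applied to the eigenvalues of a correlation matrix as follows. This is a sketch in Python with NumPy; the helper names and the synthetic data (six variables driven by two latent factors) are assumptions, not part of any particular package.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X (rows = observations), largest first."""
    corr = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > threshold))

# Synthetic data (assumed): 200 observations, 6 variables, 2 underlying factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ weights + 0.5 * rng.normal(size=(200, 6))

eigs = correlation_eigenvalues(X)
print(np.round(eigs, 2))
print("Components retained by the Kaiser criterion:", kaiser_count(eigs))
```

The eigenvalues of a correlation matrix sum to the number of variables, which is why 1.0 is the natural reference point: it is the variance contributed by one standardized variable.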

Scree Plot-based Extraction

If you prefer factor analysis (FA) proper over PCA, or simply want to keep the dimensionality low, you can judge the number of factors with a scree plot. The scree plot graphs the eigenvalues in descending order; you then look for the "elbow," the point where the curve stops falling steeply and flattens out. Components before the elbow are usually retained, and those after it are treated as noise.
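A scree plot is normally drawn with a plotting library, but the idea can be sketched without one. The following dependency-free Python snippet prints one bar per component; the function name and the eigenvalues are illustrative assumptions.

```python
def text_scree(eigenvalues, width=40):
    """Return text lines, one bar per component, scaled to the largest eigenvalue."""
    ranked = sorted(eigenvalues, reverse=True)
    top = ranked[0]
    lines = []
    for i, ev in enumerate(ranked, start=1):
        bar = "#" * max(1, round(width * ev / top))
        lines.append(f"PC{i:>2} {ev:5.2f} {bar}")
    return lines

# Illustrative eigenvalues (assumed, not from real data)
for line in text_scree([3.0, 1.8, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]):
    print(line)
```

In the printed bars, the elbow is the point where the bar lengths stop shrinking quickly and begin to shorten by small, similar amounts.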

Minimum Eigenvalue Contribution Rule

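Another rule of thumb, the minimum eigenvalue contribution rule, applies the same threshold as the Kaiser criterion: retain components with eigenvalues of at least 1.0. The rationale is that, when the variables are standardized, each input variable contributes one unit of variance, so a component with an eigenvalue below 1.0 explains less than a single input feature. The rule is particularly handy as a quick filter when there are many candidate components.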

Evaluating Component Loadings

The key to meaningful results lies in the interpretation of the components. A component should ideally have clear, distinct loadings, meaning that it is primarily associated with a few variables. A common concern is whether each component should be associated with at least three variables. This is not a hard and fast rule, but it helps ensure that the component summarizes enough shared variance to be meaningful. That said, a component with only one variable loading on it is not inherently wrong.
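One way to check the three-variable guideline is to count how many variables load saliently on each component. The sketch below uses Python with NumPy; the 0.4 cutoff for a "salient" loading is a commonly used convention adopted here as an assumption, and the loading matrix is hypothetical.

```python
import numpy as np

def salient_counts(loadings, threshold=0.4):
    """Count, per component, the variables whose absolute loading meets the threshold."""
    L = np.asarray(loadings, dtype=float)
    return (np.abs(L) >= threshold).sum(axis=0)

# Hypothetical rotated loadings: 6 variables (rows) x 3 components (columns)
loadings = np.array([
    [0.82,  0.10,  0.05],
    [0.76,  0.15, -0.08],
    [0.69, -0.12,  0.11],
    [0.08,  0.71,  0.02],
    [0.14,  0.65,  0.09],
    [0.03,  0.07,  0.88],
])

counts = salient_counts(loadings)
for j, n in enumerate(counts, start=1):
    note = "" if n >= 3 else " (fewer than three salient variables; interpret with care)"
    print(f"Component {j}: {n} salient loading(s){note}")
```

A flagged component is not automatically discarded; the count is only a prompt to look more closely at what that component represents.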

Practical Considerations for Loadings

If a component is dominated by a single variable, this could indicate that the component is too narrow in scope to be useful, since it adds little beyond the variable itself. Conversely, a component on which several variables load can offer a more comprehensive explanation. The real question is how well the component can be interpreted in the context of the problem you are addressing.

My Preferred Method: Eigenvalue Differencing

I often use a modified version of the scree plot method: rank the eigenvalues from largest to smallest, compute the drop from each eigenvalue to the next, and then look at how those drops change. The cutoff is the last eigenvalue at which the drop is still larger than the drop that follows it, before the changes settle at zero or below. The components up to and including that point are retained. In my experience, this approach often gives robust and useful results.

Example: Suppose the eigenvalues are as follows:

3.0, 1.8, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3

The drops between consecutive eigenvalues are:

1.2, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1

The changes from each drop to the next are:

0.4, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

The last positive change is at the second eigenvalue; after it, the drops are constant. Therefore, the two components with the highest eigenvalues would be retained.
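The differencing rule can be read in more than one way. The Python sketch below implements one reading: compute the drop between consecutive ranked eigenvalues, then keep components up to the last point where a drop is larger than the drop after it. The function name, the tolerance for floating-point noise, the return value of 0 when no break is found, and the eigenvalues in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def components_by_differencing(eigenvalues, tol=1e-9):
    """Count components up to the last point where the eigenvalue drop is still shrinking."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # rank, largest first
    drops = lam[:-1] - lam[1:]         # drop from each eigenvalue to the next
    changes = drops[:-1] - drops[1:]   # how much each drop exceeds the following drop
    positive = np.flatnonzero(changes > tol)
    if positive.size == 0:
        return 0                       # assumption: 0 signals that no clear break exists
    return int(positive[-1]) + 1

# Illustrative eigenvalues (assumed): two dominant components, then a steady decline
eigs = [3.0, 1.8, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
n_keep = components_by_differencing(eigs)
print("Components retained:", n_keep)
```

A sequence that declines in perfectly even steps has no positive change anywhere, so this sketch reports no break for it; in that situation another criterion is needed to choose the number of components.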

Conclusion

While there are no hard and fast rules regarding the number of variables that should load onto a single component, this number can significantly affect the interpretability of your analysis. Flexible, context-specific methods, such as the eigenvalue differencing approach, make it more likely that your results are both useful and meaningful.