Key takeaways from this article
 EFA is an exploratory technique that tries to find the best factor model that fits the data without any prior assumptions or constraints.
 CFA is a confirmatory technique that tests whether a predefined factor model fits the data with some specified assumptions or constraints.
 EFA and CFA have different purposes and applications and can complement each other in factor analysis.
 To perform EFA and CFA in R, you need to use the psych and lavaan packages, which provide various functions for factor analysis and latent variable analysis.
 To interpret the results of EFA and CFA, you need to look at the factor loadings, factor scores, fit indices, and other statistics that indicate how well the factor model represents the data and what each factor means.
Functions and their description used in this tutorial
Function  Description 

psych::fa.parallel()  Performs parallel analysis and provides scree plots and other statistics for determining the number of factors to extract 
psych::fa()  Performs EFA with various options for rotation and extraction methods 
psych::factor.scores()  Computes factor scores for each observation and each factor, together with the scoring weights 
psych::describe()  Provides descriptive statistics for each variable or factor 
lavaan::cfa()  Performs CFA with various options for estimation and specification methods 
lavaan::summary()  Provides summary statistics and fit indices for CFA model 
What is Exploratory Factor Analysis?
EFA is a statistical method that aims to identify the underlying structure of a set of variables. It assumes that each variable is influenced by one or more factors that are not directly observable. The factors can be considered common sources of variation that affect the variables.
For example, suppose you have a dataset that contains ten variables related to the performance of cars, such as miles per gallon, horsepower, weight, etc. You might wonder if some underlying factors can explain why some cars perform better than others. EFA can help you answer this question by determining how many factors are needed to account for the variation in the data and how each variable is related to each factor.
EFA differs from principal component analysis (PCA), another dimensionality reduction technique.
Pros and cons of PCA, EFA, and CFA:
PCA
 Pros: Creates new variables with maximum variance. No distributional assumptions or constraints. Useful for data compression, visualization, etc.
 Cons: Components may not be meaningful or interpretable. Uses all variance and ignores measurement error. No statistical model or hypothesis test.
EFA
 Pros: Finds latent factors that explain data structure. Allows for meaningful and interpretable factors. Uses common variance and accounts for measurement error.
 Cons: Subjective and arbitrary decisions for the number and rotation of factors. Assumes approximately normal variables and linear relationships between variables and factors. No specific hypotheses or model comparison.
CFA
 Pros: Tests validity and reliability of factor model. Estimates parameters and fit indices with confidence intervals and significance tests. Allows for specific hypotheses or model comparison.
 Cons: Requires prior knowledge and specification of the factor model. Assumes approximately normal variables and linear relationships between variables and factors. Sensitive to sample size, outliers, missing values, etc.
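The contrast between "all variance" (PCA) and "common variance" (EFA) can be made concrete with a small base R sketch. It is only an illustration: it uses five continuous columns of the built-in mtcars data, and it assumes squared multiple correlations (SMC) as the communality estimates, which is one common choice rather than what any particular function is guaranteed to use.

```r
# Five continuous mtcars columns, used here only for illustration
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
R <- cor(x)

# PCA works on the full correlation matrix: every diagonal entry is 1,
# so the eigenvalues add up to the number of variables.
pca_eigen <- eigen(R)$values

# A common-factor view replaces the diagonal with communality estimates.
# Assumption: squared multiple correlations (SMC) serve as those estimates.
smc <- 1 - 1 / diag(solve(R))
R_reduced <- R
diag(R_reduced) <- smc
efa_eigen <- eigen(R_reduced)$values

total_variance  <- sum(pca_eigen)  # equals ncol(x)
common_variance <- sum(efa_eigen)  # equals sum(smc), smaller than ncol(x)
print(round(c(total = total_variance, common = common_variance), 2))
```

The gap between the two sums is the part of the variance that a factor model treats as unique to each variable, including measurement error.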
How to Perform EFA in R?
You must have R and RStudio installed, and then install and load the psych package, which provides various functions for psychological research and data analysis.
You can install it from CRAN using the following command:
install.packages("psych")
library(psych)
Load Your Data Set
Next, you need to prepare your data for analysis. The data should be in matrix or data frame format, where each row represents an observation (e.g., a car) and each column represents a variable (e.g., miles per gallon). The data should also be numeric and continuous, as the EFA described here is not designed for categorical or ordinal variables. You should also check for missing values and outliers, as they can affect the results.
For this tutorial, we will use the mtcars dataset, which is built into R. It contains 32 observations and 11 variables related to various aspects of car performance. You can view the first six rows of the data using:
data(mtcars)
head(mtcars)
                   mpg   cyl  disp  hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21    6    160   110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag      21    6    160   110  3.9   2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108   93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258   110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360   175  3.15  3.44   17.02  0   0   3     2
Valiant            18.1  6    225   105  2.76  3.46   20.22  1   0   3     1
As you can see, some variables, although stored as numbers, are not continuous measurements, such as cyl, vs, am, gear, and carb. These are discrete or categorical variables that indicate the number of cylinders, engine type, transmission type, number of gears, and number of carburettors, respectively.
We will exclude these variables from the EFA as they are unsuitable for this technique. We will also exclude the variable mpg, which we treat as the dependent variable, because we are interested in determining the factors that affect the miles per gallon of the cars.
To select only the numeric and continuous variables, we can use the following command:
mtcars_num <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
str(mtcars_num)
'data.frame': 32 obs. of 5 variables:
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
As you can see, the data frame has 32 observations and five variables, all of which are numeric. Next, we need to check for missing values in the data. We can use the is.na() function to identify any NA values in the data frame, and then use the any() function to see if there are any missing values at all.
any(is.na(mtcars_num))
[1] FALSE
It means that there are no missing values in our data frame. If there were missing values, then we must deal with them before performing our analysis. One way to deal with missing values is to remove them from the data using the na.omit() function.
It will create a new data frame containing only the complete cases, i.e., the observations with no missing values in any variables. The command is:
mtcars_num <- na.omit(mtcars_num)
nrow(mtcars_num)
[1] 32
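Because mtcars has no missing values, na.omit() changes nothing here. The following sketch shows what happens when a value is missing. The NA is inserted artificially into a copy of the data (a hypothetical missing value, not part of the real dataset), and the names x_missing and x_complete are illustrative.

```r
# Copy the five variables and inject one artificial missing value
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
x_missing <- x
x_missing["Valiant", "hp"] <- NA   # hypothetical missing value for illustration

# Count missing values per column to see where the problem is
print(colSums(is.na(x_missing)))

# Keep only complete cases: the row for Valiant is dropped
x_complete <- na.omit(x_missing)
print(c(before = nrow(x_missing), after = nrow(x_complete)))
```

Dropping rows is the simplest option, but with a sample this small every lost observation matters, so it is worth checking how many rows are removed before continuing.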
Next, we need to check for outliers in the data. Outliers are extreme values that deviate significantly from the rest of the data and can affect the results by inflating or deflating the variance and correlation estimates.
One way to detect outliers is to use boxplots, which show the distribution of each variable and highlight any potential outliers as dots beyond the whiskers of the box. We can use the boxplot() function to create boxplots for each variable in the data frame. The command is:
boxplot(mtcars_num)
As you can see, there are some outliers in some of the variables, such as disp, hp, and qsec. Before performing analysis, we must decide whether to keep or remove these outliers from the data. There is no definitive rule for dealing with outliers, as it depends on the context and purpose of the analysis. Some outliers might be valid and meaningful observations that reflect real variation in the data, while others might be errors or anomalies that should be excluded or corrected.
For this tutorial, we will keep all the outliers in the data, as they may represent interesting features of car performance that we want to explore further. However, you should be aware that this might affect the results of EFA and make them less reliable or generalizable.
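A boxplot of variables on very different scales can be hard to read, so it may help to list the flagged values directly. This sketch uses base R's boxplot.stats(), which applies the same whisker rule as boxplot() (points more than 1.5 times the box length beyond the hinges). The object names are illustrative.

```r
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]

# Values that a boxplot would draw as individual points, per variable
outliers <- lapply(x, function(v) boxplot.stats(v)$out)
print(outliers)

# Standardizing puts all variables on a comparable scale before plotting
z <- as.data.frame(scale(x))
# boxplot(z)  # uncomment to draw the standardized boxplots
```

Listing the values makes it easier to decide, case by case, whether a flagged point is a data error or a real but unusual car.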
How to Determine the Number of Factors to Extract?
One of the most essential decisions is how many factors to extract from the data. It determines how many latent variables or constructs we assume to underlie our observed variables. Extracting too many factors might result in overfitting or redundancy, while extracting too few factors might result in underfitting or losing information.
To choose the number of factors in R, we will use the fa.parallel() function from the psych package, which performs parallel analysis and provides scree plots and other statistics for determining the number of factors to extract. The command is:
fa.parallel(mtcars_num)
The plot shows three lines:
 Observed eigenvalues,
 Simulated eigenvalues,
 Their difference (blue).
The output also reports:
 The number of factors suggested by parallel analysis (PA),
 The number of factors suggested by minimum rank factor analysis (MRFA).
According to our results:
 PA suggests retaining only one factor, as it is the only one whose eigenvalue is larger than the corresponding eigenvalue from random data.
 MRFA suggests retaining two factors, as the two-factor solution ranks highest among the possible factor solutions.
 The scree plot also shows a clear elbow at the second factor, suggesting that adding more factors would explain only a little more variance.
Based on these results, one or two factors best represent our data. However, we should also consider the interpretability and meaningfulness of the factors, as well as the theoretical and practical relevance of our analysis.
For example, we should retain two factors if they correspond to some meaningful dimensions of car performance, such as Power and efficiency. Alternatively, we could retain only one factor if it captures the overall quality or performance of the cars.
We will retain two factors from our data for this tutorial, which might provide more insight and information than retaining only one factor. However, you should be aware that this is a subjective and arbitrary decision, and you might get different results or interpretations if you choose a different number of factors.
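The logic of parallel analysis can be sketched in a few lines of base R. This is a simplified, component-based version written for illustration only: it compares eigenvalues of the observed correlation matrix with the average eigenvalues of random normal data of the same size. It is not a reimplementation of fa.parallel(), so its suggestion may differ from the psych output. The number of simulations (200) and the seed are arbitrary choices.

```r
set.seed(123)  # arbitrary seed so the simulation is reproducible
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
n <- nrow(x)
p <- ncol(x)

# Eigenvalues of the observed correlation matrix (the scree plot values)
obs_eigen <- eigen(cor(x))$values

# Eigenvalues from random, uncorrelated data with the same dimensions
n_sim <- 200
sim_eigen <- replicate(n_sim, {
  random_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(random_data))$values
})
sim_mean <- rowMeans(sim_eigen)  # average eigenvalue at each position

# Retain the leading eigenvalues that exceed their random counterparts
n_retain <- sum(cumprod(obs_eigen > sim_mean))
print(round(rbind(observed = obs_eigen, simulated = sim_mean), 2))
print(n_retain)
```

Plotting obs_eigen against its index gives the scree plot, and the simulated line shows how large eigenvalues can get by chance alone with only 32 observations.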
How to Rotate the Factors?
Once we have decided on the number of factors to extract, we need to rotate them to make them more interpretable and meaningful. Rotation is a process that changes the orientation or direction of the factors without changing their explanatory power or fit to the data. Rotation can help us identify which variables load highly on which factors and what each factor represents or measures.
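The claim that rotation does not change the fit can be checked numerically. The sketch below uses base R's varimax() on an unrotated two-column loading matrix. As an assumption made to keep the example free of extra packages, the unrotated loadings are principal-component loadings rather than the minres loadings that psych::fa() produces. The point is only that the communality of each variable is identical before and after rotation.

```r
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
e <- eigen(cor(x))

# Unrotated loadings for two components (stand-in for factor loadings)
unrotated <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(unrotated) <- list(colnames(x), c("F1", "F2"))

# Orthogonal varimax rotation from the stats package
rotated <- unclass(varimax(unrotated)$loadings)

# Communality = sum of squared loadings per variable
h2_before <- rowSums(unrotated^2)
h2_after  <- rowSums(rotated^2)
print(round(cbind(h2_before, h2_after), 3))
```

The individual loadings change, which is what makes the rotated solution easier to read, but the total variance explained for each variable stays the same.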
To rotate the factors in R, we will use the fa() function from the psych package, which performs EFA with various options for rotation and extraction methods. The command is:
fa(mtcars_num, nfactors = 2, rotate = "varimax")
Factor Analysis using method = minres
Call: fa(r = mtcars_num, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
disp 0.89 0.40 0.96 0.038 1.4
hp 0.57 0.70 0.82 0.180 1.9
drat 0.75 0.05 0.57 0.428 1.0
wt 0.94 0.15 0.90 0.100 1.0
qsec 0.04 0.97 0.95 0.049 1.0
MR1 MR2
SS loadings 2.58 1.63
Proportion Var 0.52 0.33
Cumulative Var 0.52 0.84
Proportion Explained 0.61 0.39
Cumulative Proportion 0.61 1.00
Mean item complexity = 1.3
Test of the hypothesis that 2 factors are sufficient.
df null model = 10 with the objective function = 4.54 with ChiSquare = 129.53
df of the model are 1 and the objective function was 0.05
The root mean square of the residuals (RMSR) is 0.01
The df corrected root mean square of the residuals is 0.03
The harmonic n.obs is 32 with the empirical chisquare 0.08 with prob < 0.78
The total n.obs was 32 with Likelihood Chi Square = 1.46 with prob < 0.23
Tucker Lewis Index of factoring reliability = 0.959
RMSEA index = 0.116 and the 90 % confidence intervals are 0 0.513
BIC = 2
Fit based upon off diagonal values = 1
Measures of factor score adequacy
MR1 MR2
Correlation of (regression) scores with factors 0.98 0.98
Multiple R square of scores with factors 0.97 0.95
Minimum correlation of possible factor scores 0.93 0.91
The output shows the factor loadings. With an orthogonal rotation such as varimax, these are the correlations between each variable and each factor. The loadings are standardized, ranging from -1 to 1, where -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation. The loadings can be interpreted as the weights or coefficients of each factor in the linear combination that reproduces each variable.
The output also shows the communality (h2), the proportion of variance in each variable explained by the factors. The communality can range from 0 to 1, where 0 indicates that the factors explain none of the variance, and 1 indicates that the factors explain all the variance. The communality can be calculated as the sum of the squared loadings for each variable.
The output also shows the uniqueness (u2), the proportion of variance in each variable that is not explained by the factors. The uniqueness can range from 0 to 1, where 0 indicates that the factors explain all of the variance, and 1 indicates that the factors explain none. The uniqueness can be calculated as one minus the communality for each variable.
The output also shows the complexity (com), which measures how many factors influence each variable. The complexity can range from 1 to n, where n is the number of factors, 1 indicates that only one factor influences the variable, and n indicates that all factors influence the variable equally. The complexity (Hofmann's index) can be calculated for each variable as the square of the sum of the squared loadings divided by the sum of the fourth powers of the loadings.
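These three per-variable statistics can be recomputed from the loadings printed in the output above. The sketch below types in the rounded loadings by hand, so results can differ from the printed h2, u2, and com values in the second decimal place. Because every formula squares the loadings, their signs do not matter here.

```r
# Rounded loadings copied from the fa() output above
loadings <- matrix(
  c(0.89, 0.40,
    0.57, 0.70,
    0.75, 0.05,
    0.94, 0.15,
    0.04, 0.97),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("disp", "hp", "drat", "wt", "qsec"), c("MR1", "MR2"))
)

h2  <- rowSums(loadings^2)                        # communality
u2  <- 1 - h2                                     # uniqueness
com <- rowSums(loadings^2)^2 / rowSums(loadings^4)  # Hofmann's complexity

print(round(cbind(h2, u2, com), 2))
```

For example, hp has a complexity close to 2 because it loads substantially on both factors, while wt and qsec are close to 1 because each is dominated by a single factor.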
The output also shows some statistics for each factor, such as:
 SS loadings: This is the sum of squared loadings for each factor, which measures how much variable variance is explained by each factor.
 Proportion Var: This is the proportion of variance in the variables explained by each factor, which can be calculated as the SS loadings divided by the number of variables.
 Cumulative Var: This is the cumulative proportion of variance in the variables explained by each factor and all previous factors, which can be calculated as the sum of Proportion Var for each and all previous factors.
 Proportion Explained: This is the proportion of variance in the factors explained by each factor, which can be calculated as the SS loadings divided by the total SS loadings for all factors.
 Cumulative Proportion: This is the cumulative proportion of variance in the factors explained by each factor and all previous factors, which can be calculated as the sum of the proportion Explained for each and all previous factors.
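The per-factor rows of the output follow directly from the loading matrix. This sketch again uses the rounded loadings typed in from the output above, so the totals differ slightly from the printed 2.58 and 1.63.

```r
# Rounded loadings copied from the fa() output above
loadings <- matrix(
  c(0.89, 0.40,
    0.57, 0.70,
    0.75, 0.05,
    0.94, 0.15,
    0.04, 0.97),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("disp", "hp", "drat", "wt", "qsec"), c("MR1", "MR2"))
)

ss_loadings    <- colSums(loadings^2)            # SS loadings
proportion_var <- ss_loadings / nrow(loadings)   # Proportion Var
cumulative_var <- cumsum(proportion_var)         # Cumulative Var
proportion_exp <- ss_loadings / sum(ss_loadings) # Proportion Explained
cumulative_exp <- cumsum(proportion_exp)         # Cumulative Proportion

print(round(rbind(ss_loadings, proportion_var, cumulative_var,
                  proportion_exp, cumulative_exp), 2))
```

The last row always ends at 1 because it describes how the explained variance is split between the factors, not how much of the total variance is explained.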
The output also tests the hypothesis that two factors are sufficient to represent the data. The test statistic follows an approximate chi-square distribution and measures how well the correlation matrix implied by the two-factor model matches the observed correlation matrix. The null model, which assumes no factors, is reported as a baseline.
The p-value is the probability of obtaining a test statistic as extreme or more extreme than what we observed if the null hypothesis (two factors are sufficient) were true. A low p-value (usually less than 0.05) indicates that we can reject the null hypothesis and conclude that two factors are not sufficient to represent the data. A high p-value (usually greater than 0.05) indicates that we cannot reject the null hypothesis, so the data are consistent with a two-factor model.
In our output, the likelihood chi-square is 1.46 on 1 degree of freedom with a p-value of about 0.23, so we cannot reject the hypothesis that two factors are sufficient. However, with only 32 observations this test has little power, and the 90% confidence interval for RMSEA (0 to 0.513) is very wide. Therefore, we should also rely on other criteria and methods to evaluate the adequacy and validity of our factor model.
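The degrees of freedom and the p-value reported in the output above can be reproduced with base R. The degrees-of-freedom formula used here is the standard one for an exploratory factor model with p variables and k factors. The chi-square value is copied from the output.

```r
p <- 5  # number of variables
k <- 2  # number of factors

# Degrees of freedom of an exploratory factor model
df_model <- ((p - k)^2 - (p + k)) / 2

# Likelihood chi-square copied from the fa() output
chi_sq <- 1.46

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df_model, lower.tail = FALSE)
print(c(df = df_model, p_value = round(p_value, 3)))
```

With five variables, a two-factor model leaves only one degree of freedom, which is one reason the test has so little room to detect misfit.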
How to Interpret Factor Loadings and Factor Scores?
After rotating the factors, we need to interpret what they mean and what they measure. One way to do this is to look at their factor loadings, which indicate how strongly each variable is related to each factor. We can use some rules of thumb, applied to the absolute value of each loading, to decide which loadings are substantial:
 Loadings greater than or equal to 0.4 are considered high and indicate a strong relationship between a variable and a factor.
 Loadings between 0.3 and 0.4 are considered moderate and indicate a moderate relationship between a variable and a factor.
 Loadings between 0.2 and 0.3 are considered low and indicate a weak relationship between a variable and a factor.
 Loadings less than or equal to 0.2 are considered negligible and indicate no relationship between a variable and a factor.
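The rules of thumb above can be applied to a whole loading matrix at once. The helper classify_loading() below is a hypothetical function written for this illustration, not part of psych. It works on absolute values, so negative loadings are classified by their size. The loadings are the rounded values from the fa() output.

```r
# Hypothetical helper: label loadings by the rules of thumb listed above
classify_loading <- function(x) {
  a <- abs(x)
  ifelse(a >= 0.4, "high",
    ifelse(a >= 0.3, "moderate",
      ifelse(a > 0.2, "low", "negligible")))
}

# Rounded loadings copied from the fa() output
loadings <- matrix(
  c(0.89, 0.40,
    0.57, 0.70,
    0.75, 0.05,
    0.94, 0.15,
    0.04, 0.97),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("disp", "hp", "drat", "wt", "qsec"), c("MR1", "MR2"))
)

strength <- classify_loading(loadings)
print(strength)
```

A table like this makes cross-loadings easy to spot: any variable labelled "high" on more than one factor needs extra care when the factors are named.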
Based on these rules, we can label our factors based on their highest loading variables:
 Factor 1 (MR1): This factor has high loadings on disp, hp, and wt, which are variables related to the engine size, power, and weight of the cars. We can label this factor as the Power Factor, which measures how powerful the cars are. Note that drat also loads highly on MR1 (0.75) in the output above.
 Factor 2 (MR2): This factor has its highest loading on qsec (0.97), the quarter mile time, and a high loading on hp (0.70). In this tutorial we group drat, the rear axle ratio, with qsec and label this factor as the Efficiency Factor, which measures how efficient the cars are, although drat loads only weakly on MR2 (0.05).
We can also compute and interpret the factor scores, which are the values of each factor for each observation. Factor scores are standardized, meaning they have a mean of zero and a standard deviation of one. Factor scores can be used to compare and rank the observations based on their performance on each factor.
For example, a high factor score on the Power Factor indicates that a car is more powerful than average, while a low factor score on the Efficiency Factor indicates that a car is less efficient than average.
We can use the factor.scores() function from the psych package to compute the factor scores for our data frame. The command is:
factor.scores(mtcars_num, fa(mtcars_num, nfactors = 2, rotate = "varimax"))
The output is:
$scores
MR1 MR2
Mazda RX4 0.921154717 0.72669627
Mazda RX4 Wag 0.724071306 0.42084921
Datsun 710 0.900927019 0.40335627
Hornet 4 Drive 0.504574526 0.94200922
Hornet Sportabout 0.705788882 0.41235880
Valiant 0.576065524 1.38917535
Duster 360 0.530475076 1.15356784
Merc 240D 0.101532849 1.25188206
Merc 230 0.411191502 2.68340966
Merc 280 0.234006650 0.25346328
Merc 280C 0.115802867 0.56645912
Merc 450SE 0.515443129 0.19994421
Merc 450SL 0.439164549 0.11390241
Merc 450SLC 0.534978812 0.09745121
Cadillac Fleetwood 2.185399740 0.16994270
Lincoln Continental 2.140290855 0.06981115
Chrysler Imperial 1.903364677 0.19210167
Fiat 128 0.968197521 0.87151872
Honda Civic 1.446625723 0.30445611
Toyota Corolla 1.063672670 1.06580989
Toyota Corona 0.492117498 1.14803618
Dodge Challenger 0.486552001 0.41905302
AMC Javelin 0.428578275 0.23192377
Camaro Z28 0.436120074 1.40512260
Pontiac Firebird 1.086977291 0.36164228
Fiat X1-9 1.168916242 0.55997655
Porsche 914-2 1.316376467 0.64097649
Lotus Europa 1.575234883 0.56511564
Ford Pantera L 0.001536175 1.98792379
Ferrari Dino 1.105950772 1.31987628
Maserati Bora 0.060029559 2.00200215
Volvo 142E 0.688811997 0.37629897
$weights
MR1 MR2
disp 0.71764710 0.02455326
hp 0.01758554 0.12542433
drat 0.04483312 0.04090757
wt 0.33290530 0.05263567
qsec 0.35203908 0.93217633
$r.scores
MR1 MR2
MR1 1.000000e+00 9.020562e-17
MR2 1.110223e-16 1.000000e+00
$missing
[1] FALSE
$R2
[1] 0.9826638 0.9766418
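The output shows the factor scores for each observation and each factor ($scores), the weights used to compute them from the standardized variables ($weights), the correlations between the estimated scores ($r.scores), and the squared multiple correlation of the scores with the factors ($R2).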
To interpret the factor scores, we can look at some examples of cars that have high or low scores on each factor:
 Mazda RX4: This car has a low score on the Power Factor (0.42) and a high score on the Efficiency Factor (0.46). This means that this car is less powerful but more efficient than average.
 Duster 360: This car has a high score on the Power Factor (1.08) and a low score on the Efficiency Factor (1.09). This means that this car is more powerful but less efficient than average.
 Merc 240D: This car has a low score on both factors (1.02 and 0.03). This means that this car is less powerful and less efficient than average.
 Hornet Sportabout: This car has a high score on the Power Factor (0.66) and a low score on the Efficiency Factor (1.07). This means that this car is more powerful but less efficient than average.
The $R2 values indicate how well the factor scores are determined by the observed variables. Values close to 1, such as 0.98 here, mean that the factor scores are precise estimates of the factors, while lower values mean that the factor scores are more uncertain.
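The relationship between weights and scores can be sketched in base R: scores are the standardized data multiplied by a weight matrix. The sketch uses regression-type weights, computed as the inverse correlation matrix times the loadings. As an assumption made to avoid extra packages, the loadings are unrotated principal-component loadings rather than the minres loadings from psych::fa(), so the numbers will not match the $scores table above.

```r
x <- mtcars[, c("disp", "hp", "drat", "wt", "qsec")]
R <- cor(x)
e <- eigen(R)

# Stand-in loadings for two factors (principal-component loadings)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(L) <- list(colnames(x), c("F1", "F2"))

# Regression-type scoring weights: solve R %*% W = L for W
W <- solve(R, L)

# Scores = standardized data times weights
scores <- scale(x) %*% W

print(round(colMeans(scores), 10))     # means are zero
print(round(apply(scores, 2, sd), 10)) # standard deviations are one
print(head(round(scores, 2)))
```

The structure is the same as in the factor.scores() output: one weight per variable and factor, and one score per car and factor.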
We can also use the psych package's describe() function to get descriptive statistics for each factor, such as mean, standard deviation, minimum, maximum, etc.
describe(factor.scores(mtcars_num, fa(mtcars_num, nfactors = 2, rotate = "varimax"))$scores)
MR1  MR2  
vars  1.00  2.00 
n  32.00  32.00 
mean  0.00  0.00 
sd  1.00  1.00 
median  0.03  0.02 
trimmed  0.07  0.00 
mad  0.94  0.83 
min  1.58  2.68 
max  2.19  2.00 
range  3.76  4.69 
skew  0.42  0.16 
kurtosis  0.49  0.24 
se  0.18  0.18 
The output shows that the mean of both factors is zero and the standard deviation is one, which is expected because the scores are standardized. The medians are also close to zero. The range row shows that the scores span 3.76 units on MR1 and 4.69 units on MR2, so the scores are not bounded between fixed limits. The skewness and kurtosis of both factors are small in magnitude, which suggests that the scores are roughly symmetric without heavy tails. The se row (0.18) is the standard error of the mean of the scores, that is, the standard deviation divided by the square root of the sample size.
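The se value in the describe() output can be checked by hand, since it is the standard error of the mean for 32 standardized scores:

```r
n <- 32          # number of cars
sd_scores <- 1   # standardized scores have a standard deviation of one

# Standard error of the mean
se_mean <- sd_scores / sqrt(n)
print(round(se_mean, 2))
```

This describes the precision of the average score, not the precision of any individual car's factor score.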
How to Compare EFA with CFA?
EFA is an exploratory technique that tries to find the best factor model that fits the data without any prior assumptions or constraints. CFA is a confirmatory technique that tests whether a predefined factor model fits the data with some specified assumptions or constraints. EFA and CFA have different purposes and applications and can complement each other in factor analysis.
EFA is beneficial for:
 Exploring the underlying structure of a set of variables without any preconceptions
 Reducing the dimensionality of a large number of variables into a smaller number of factors
 Identifying the latent variables or constructs that explain the variation and correlation among the observed variables
 Generating hypotheses or suggestions for further research or analysis
CFA is useful for:
 Testing the validity and reliability of a factor model based on theory or previous research
 Estimating the parameters and fit indices of a factor model with confidence intervals and significance tests
 Comparing alternative factor models or testing specific hypotheses about the factor structure
 Confirming or rejecting the results or implications of EFA or other techniques
To perform CFA in R, you need to install and load the lavaan package, which provides various functions for latent variable analysis, including CFA. You can install it from CRAN using the following command:
install.packages("lavaan")
library(lavaan)
To specify a CFA model in lavaan, you write the model syntax as a text string, which can have three parts:
 The measurement model: This part specifies how each factor is measured by its variables using the =~ operator. For example, Power =~ disp means that disp is a variable that loads on the Power factor.
 The structural model: This part specifies how each factor is related to each other using the ~ operator. For example, Power ~ Efficiency means that Power is a factor regressed on the Efficiency factor.
 The residual model: This part specifies how much variance in each variable or factor is not explained by the model using the ~~ operator. For example, disp ~~ disp means that disp has a residual variance not explained by the Power factor.
This tutorial will use the same two-factor model obtained from EFA with varimax rotation. We will assume that the factors are uncorrelated, as we used an orthogonal rotation method. We will also assume that all variables have residual variances, as the EFA did not fit the data perfectly. The syntax for our CFA model is:
model <- '
  # measurement model
  Power =~ disp + hp + wt
  Efficiency =~ drat + qsec
  # structural model
  Power ~ 0*Efficiency
  # residual model
  disp ~~ disp
  hp ~~ hp
  wt ~~ wt
  drat ~~ drat
  qsec ~~ qsec
  Power ~~ Power
  Efficiency ~~ Efficiency
'
Next, we must fit our CFA model to our data using the cfa() function from the lavaan package. The command is:
fit <- cfa(model, data = mtcars_num)
We can then use the summary() function to get a summary of our CFA model, such as the parameter estimates and the model test. The command is:
summary(fit)
lavaan 0.6.16 ended normally after 106 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 10
Number of observations 32
Model Test User Model:
Test statistic 94.717
Degrees of freedom 5
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Power =~
disp 1.000
hp 0.065 NA
wt 1.193 NA
Efficiency =~
drat 1.000
qsec 0.012 NA
Regressions:
Estimate Std.Err z-value P(>|z|)
Power ~
Efficiency 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.disp 14802.187 NA
.hp 4553.559 NA
.wt 136.123 NA
.drat 7.032 NA
.qsec 3.092 NA
.Power 96.411 NA
Efficiency 7.309 NA
The output shows the summary of our CFA model, such as the parameter estimates and the model test. Some of the information that we can get from the output are:
 The estimation method used was maximum likelihood (ML), a standard method for estimating the parameters of a factor model by maximizing the likelihood of the data given the model.
 The optimization method used was NLMINB, a numerical algorithm for finding the values of the parameters that minimize a function, in this case, the negative log-likelihood of the data given the model.
 The number of free parameters in the model was 10, the number of parameters estimated from the data. These include three factor loadings (the first loading of each factor is fixed to 1), five residual variances, and two factor variances.
 The number of observations used was 32, the number of complete cases in our data frame.
 The log-likelihood and the information criteria, such as AIC, BIC, and SABIC, are not part of this default summary. You can request them with summary(fit, fit.measures = TRUE) or fitMeasures(fit). The higher the log-likelihood, the better the fit. The information criteria consider both the log-likelihood and the number of parameters in the model: the lower they are, the better the fit and the more economical the model.
 The test statistic for testing the hypothesis that our model fits the data was 94.717 on 5 degrees of freedom, with a p-value below 0.001. This means that we reject the hypothesis of exact fit: the model does not reproduce the observed covariance matrix well.
 The standardized root mean square residual (SRMR) measures how well our model reproduces the observed correlation matrix. The lower the SRMR, the better the fit. A value less than 0.08 is considered acceptable. It is reported when fit measures are requested.
 The root mean square error of approximation (RMSEA) measures how well our model approximates the population covariance matrix. The lower the RMSEA, the better the fit. A value less than 0.05 is considered good, while a value between 0.05 and 0.08 is considered acceptable. It is also reported, with a 90% confidence interval, when fit measures are requested.
 The factor loadings are unstandardized, so they are not directly comparable with the standardized loadings from EFA. The first loading of each factor (disp and drat) is fixed to 1 to set the scale of the factor. The standard errors, z-values, and p-values are NA, so the significance of the loadings cannot be assessed. Missing standard errors usually signal an estimation or identification problem. A likely contributor here is that the Efficiency factor has only two indicators and is not allowed to relate to the Power factor, which leaves too little information to estimate that part of the model.
 The factor variances were estimated (96.411 for Power and 7.309 for Efficiency) rather than fixed to one, because the scale of each factor was set through its first loading.
 The regression of Power on Efficiency was fixed to 0.000, indicating that we assumed the factors were unrelated.
 The residual variances are on the original scale of each variable (for example, 14802.187 for disp and 7.032 for drat), so they are not directly comparable with the standardized uniquenesses from EFA.
These results show that our CFA model, as specified, did not fit our data well and did not confirm our EFA results. We should also note some limitations and assumptions of our CFA model:
 We assumed that our factors were uncorrelated or orthogonal to each other, which might not be realistic or accurate if there is some correlation or dependence among the factors.
 We assumed that all variables had residual variances, which might not be necessary or appropriate if some variables had no measurement error or were perfectly explained by the factors.
 We fitted the model to only 32 observations, to variables measured on very different scales, and with only two indicators for the Efficiency factor, which makes the estimation unstable.
Therefore, we should try different models or methods to compare and evaluate our CFA results and improve our factor analysis.
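The parameter and degrees-of-freedom counts in the lavaan output can be reproduced by simple counting. The sketch below also counts the information available for the Efficiency factor on its own, under the assumption made in this tutorial that it is unrelated to Power. The object names are illustrative.

```r
p <- 5  # observed variables: disp, hp, wt, drat, qsec

# Unique variances and covariances among the observed variables
n_moments <- p * (p + 1) / 2

# Free parameters in the model above (first loading per factor fixed to 1)
n_free <- c(loadings = 3, residual_variances = 5, factor_variances = 2)

df_model <- n_moments - sum(n_free)
print(c(moments = n_moments, free = sum(n_free), df = df_model))

# Efficiency block alone: two indicators (drat, qsec), no link to Power
block_moments  <- 2 * (2 + 1) / 2   # var(drat), var(qsec), cov(drat, qsec)
block_unknowns <- 1 + 2 + 1         # qsec loading, 2 residual variances, factor variance
print(c(block_moments = block_moments, block_unknowns = block_unknowns))
```

The overall count matches the 10 parameters and 5 degrees of freedom in the output, but the Efficiency block has more unknowns than pieces of information. This is one reason a common guideline asks for at least three indicators per factor, or for two-indicator factors to be correlated with other factors.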
Conclusion
In this article, you learned how to perform exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) in R using the psych and lavaan packages. You also learned how to:
 Prepare your data for EFA and CFA by checking for missing values and outliers.
 Determine the number of factors to extract using eigenvalue criterion, scree plot, parallel analysis, etc.
 Rotate the factors using different methods such as varimax, quartimax, oblimin, promax, etc.
 Interpret factor loadings and factor scores
 Compare EFA with CFA and their purposes and applications
We hope you found this article helpful and informative. If you have any questions or feedback, please get in touch with us at info@rstudiodatalab.com or visit our website at https://www.rstudiodatalab.com for more tutorials and resources on data analysis.
FAQs
What is the R package for exploratory factor analysis?
The R package for exploratory factor analysis is psych, which provides various functions for psychological research and data analysis, including EFA.
How do you interpret the results of a factor analysis in R?
To interpret the results of a factor analysis in R, you need to look at the factor loadings, which indicate how strongly each variable is related to each factor, the factor scores, which indicate the values of each factor for each observation, and the fit indices, which indicate how well the factor model fits the data.
What is the FA function in the R package?
The fa() function belongs to the psych package and performs EFA with various options for rotation and extraction methods.
What is the R type of factor analysis?
The R type of factor analysis is a type of multivariate statistical analysis that aims to identify the underlying structure of a set of variables by grouping them into smaller factors.
What is the data for EFA?
The data for EFA should be in a matrix or data frame format, where each row represents an observation and each column represents a variable. The data should also be numeric and continuous, as EFA cannot handle categorical or ordinal variables. The data should also be checked for missing values and outliers, as they can affect the results of EFA.
How many items are needed for exploratory factor analysis?
There is no definitive rule for how many items are needed for exploratory factor analysis, as it depends on various factors such as the number of factors, the sample size, the reliability of the items, etc. However, some general guidelines are to have at least 3 to 5 items per factor, at least 100 to 200 observations, and at least a 5:1 ratio of observations to items.
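As a rough illustration, the guidelines above can be turned into a small check. The function name `check_efa_size` is hypothetical, and the thresholds are simply the lower bounds quoted in the answer, not firm rules.

```r
# Hypothetical helper: compares a planned design with the rules of thumb above.
check_efa_size <- function(n_obs, n_items, n_factors) {
  c(
    items_per_factor  = n_items / n_factors >= 3,  # at least 3 items per factor
    min_observations  = n_obs >= 100,              # at least 100 observations
    obs_to_item_ratio = n_obs / n_items >= 5       # at least 5 observations per item
  )
}

# Example: 150 respondents, 12 items, 3 expected factors.
result <- check_efa_size(n_obs = 150, n_items = 12, n_factors = 3)
result
```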
What is the minimum number of items per factor?
The minimum number of items per factor is usually 3, ensuring that each factor has enough information and variability to be meaningful and interpretable.
How do you explain factor analysis?
Factor analysis is a statistical technique that allows you to reduce the number of variables in a dataset by grouping them into a smaller number of factors. Factors are latent variables that explain the variation and correlation among the observed variables. Factor analysis can help you explore the underlying structure of your data, reduce its dimensionality, and identify the latent constructs or concepts that your variables measure.
What does factor score mean in factor analysis?
A factor score is the estimated value of a factor for a given observation. Factor scores are usually computed from standardized variables, so they are centred at zero. Their standard deviation is often close to one but not always exactly one, depending on the scoring method. Factor scores can be used to compare and rank the observations on each factor.
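A short sketch of computing factor scores with psych::factor.scores() follows. The data are simulated for illustration (six hypothetical items and two latent factors), and the "Thurstone" method, psych's name for regression scoring, is chosen here as an assumption.

```r
library(psych)

# Simulated example data (an assumption for illustration).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

fit <- fa(items, nfactors = 2, rotate = "varimax", fm = "minres")

# Regression-based factor scores: one row per observation, one column per factor.
fs <- factor.scores(items, fit, method = "Thurstone")
scores <- fs$scores

# Inspect the centre and spread of each factor's scores.
round(colMeans(scores), 3)
round(apply(scores, 2, sd), 3)

# Row numbers of the six observations scoring highest on the first factor.
head(order(scores[, 1], decreasing = TRUE))
```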
How do you interpret a scree plot in factor analysis?
A scree plot represents the eigenvalues in descending order against their factor number. The plot typically shows a sharp decline in the eigenvalues initially, followed by a levelling or gradual decrease. The idea is to find the point where the slope of the plot changes or where the curve bends or elbows. This point indicates the optimal number of factors to retain, as adding more factors would not explain much more variance.
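A basic scree plot can be drawn in base R from the eigenvalues of the correlation matrix, as sketched below. The data are simulated for illustration, and the dashed line at 1 is the eigenvalue criterion, added as an optional reference. The psych::fa.parallel() function used in this tutorial produces a richer version of this plot.

```r
# Simulated example data (an assumption for illustration):
# six items driven by two uncorrelated latent factors.
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

# Eigenvalues of the correlation matrix, returned in descending order.
eigenvalues <- eigen(cor(items))$values

# Scree plot: look for the elbow where the curve levels off.
plot(eigenvalues, type = "b",
     xlab = "Factor number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)  # eigenvalue-greater-than-one reference line

# Number of eigenvalues above 1.
n_kaiser <- sum(eigenvalues > 1)
n_kaiser
```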
What is a correlation, and how is it related to exploratory factor analysis in R?
Correlation is a measure of how strongly two variables are linearly related. It ranges from -1 to 1, where -1 indicates a perfect negative relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive relationship. Correlation is related to exploratory factor analysis (EFA) in R because EFA uses the correlation matrix of the variables as the input for finding the underlying factors that explain the variation and correlation among the variables.
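To make the link concrete, the sketch below computes a correlation matrix with cor() and passes it to psych::fa() instead of the raw data. The data are simulated for illustration. The n.obs argument gives the sample size, which fa() needs for its fit statistics when it only receives a correlation matrix.

```r
library(psych)

# Simulated example data (an assumption for illustration).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

# Pearson correlation matrix of the items.
R <- cor(items)
round(R, 2)

# EFA run directly on the correlation matrix.
fit <- fa(r = R, nfactors = 2, n.obs = n, rotate = "varimax")
print(fit$loadings, cutoff = 0.3)
```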
How do you perform rotation in exploratory factor analysis in R, and why is it important?
Rotation is a process that changes the orientation of the factors without changing their explanatory power or fit to the data. Rotation makes it easier to see which variables load highly on which factors and what each factor represents or measures. To perform rotation in EFA in R, we can use the psych::fa() function, whose rotate argument accepts methods such as varimax, quartimax, oblimin, and promax.
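The sketch below compares an unrotated and a varimax-rotated solution on simulated data (an assumption for illustration). It checks that each variable's communality, the sum of its squared loadings, is the same before and after this orthogonal rotation, which is one way to see that rotation does not change the fit.

```r
library(psych)

# Simulated example data (an assumption for illustration).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

# Same extraction, with and without rotation.
unrotated <- fa(items, nfactors = 2, rotate = "none",    fm = "minres")
rotated   <- fa(items, nfactors = 2, rotate = "varimax", fm = "minres")

# Communalities: row sums of squared loadings.
h2_unrotated <- rowSums(unclass(unrotated$loadings)^2)
h2_rotated   <- rowSums(unclass(rotated$loadings)^2)
round(cbind(h2_unrotated, h2_rotated), 3)

# The loadings themselves differ; the rotated pattern is easier to read.
print(rotated$loadings, cutoff = 0.3)
```

Oblique options such as rotate = "oblimin" rely on the GPArotation package and allow the factors to correlate.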
What is the chi-square statistic, and how is it used to test the hypothesis that a certain number of factors are sufficient to represent the data in EFA in R?
The chi-square statistic measures how well the observed correlation matrix matches the correlation matrix predicted by the factor model. The higher the chi-square statistic, the worse the fit. Under the hypothesis that the chosen number of factors is sufficient, the statistic follows a chi-square distribution, which allows us to calculate a p-value for that hypothesis.
A low p-value (usually less than 0.05) indicates that we can reject the hypothesis and conclude that more factors are needed. A high p-value (usually greater than 0.05) indicates that we cannot reject the hypothesis, so the chosen number of factors is treated as adequate. To perform this test in EFA in R, we can use the psych::fa() function, which reports the chi-square statistic and p-value for the hypothesis that the chosen number of factors is sufficient.
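As a sketch of this test, the code below fits one-factor and two-factor models to simulated data that were generated from two factors (an assumption for illustration) and reads the statistic, degrees of freedom, and p-value from the fitted objects. Maximum likelihood extraction (fm = "ml") is used here because the test is most naturally tied to that method.

```r
library(psych)

# Simulated example data generated from two latent factors (an assumption).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

one_factor <- fa(items, nfactors = 1, fm = "ml", rotate = "none")
two_factor <- fa(items, nfactors = 2, fm = "ml", rotate = "varimax")

# Chi-square statistic, degrees of freedom, and p-value for each model.
fit_table <- rbind(
  one_factor = c(chisq = one_factor$STATISTIC, df = one_factor$dof,
                 p = one_factor$PVAL),
  two_factor = c(chisq = two_factor$STATISTIC, df = two_factor$dof,
                 p = two_factor$PVAL)
)
round(fit_table, 4)
```

A small p-value for the one-factor model suggests that one factor is not enough. The two-factor row shows whether adding a factor removes that misfit.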
How do you interpret factor loadings and factor scores in EFA in R?
Factor loadings indicate how strongly each variable is related to each factor. With an orthogonal rotation such as varimax, they are the correlations between each variable and each factor. With an oblique rotation such as oblimin or promax, they are regression-like pattern coefficients. Factor scores are the estimated values of each factor for each observation and indicate where each observation stands on each factor. To interpret factor loadings in EFA in R, we can use rules of thumb to decide which loadings are salient. For example, loadings with an absolute value of 0.4 or more are often considered high and indicate a strong relationship between a variable and a factor. We can also use the psych::factor.scores() function to compute and compare the factor scores for each observation and each factor.
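The 0.4 rule of thumb can be applied as sketched below, using simulated data (an assumption for illustration). The `salient` matrix is a hypothetical helper that flags which variable belongs to which factor.

```r
library(psych)

# Simulated example data (an assumption for illustration).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

fit <- fa(items, nfactors = 2, rotate = "varimax", fm = "minres")

# Print loadings, blanking out values below the 0.4 threshold.
print(fit$loadings, cutoff = 0.4)

# Logical matrix: TRUE where a variable loads saliently on a factor.
loadings_matrix <- unclass(fit$loadings)
salient <- abs(loadings_matrix) >= 0.4
salient

# Variables that load on more than one factor (cross-loadings), if any.
rownames(salient)[rowSums(salient) > 1]
```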
What is the principal axis method, and how is it different from the maximum likelihood method for extracting factors in EFA in R?
The principal axis method extracts factors from the common variance of the variables, i.e., the variance that is shared by two or more variables. It replaces the diagonal of the correlation matrix with communality estimates, re-estimates them iteratively, and makes no distributional assumptions. The maximum likelihood method fits the same common factor model, so it also allows each variable to have unique variance (including measurement error) that the factors do not explain. The difference is in the estimation: maximum likelihood chooses the loadings that maximize the likelihood of the data under multivariate normality, which yields a chi-square test and fit indices. A method that analyses all the variance of the variables, common and unique together, is principal component analysis rather than EFA. To use these methods in EFA in R, we can use the psych::fa() function, whose fm argument provides options such as minres (minimum residual), ml (maximum likelihood), and pa (principal axis).
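As a sketch, the two extraction methods can be compared by fitting the same model twice and looking at the uniquenesses, i.e., the share of each variable's variance not explained by the factors. The data are simulated for illustration (an assumption), and no rotation is applied so that only the extraction method differs.

```r
library(psych)

# Simulated example data (an assumption for illustration).
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
items <- as.data.frame(f %*% t(lambda) +
                         matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(items) <- paste0("x", 1:6)

# Same model, two extraction methods.
pa_fit <- fa(items, nfactors = 2, fm = "pa", rotate = "none")
ml_fit <- fa(items, nfactors = 2, fm = "ml", rotate = "none")

# Both methods estimate a unique variance for every variable.
uniq <- cbind(pa = pa_fit$uniquenesses, ml = ml_fit$uniquenesses)
round(uniq, 2)
```

On well-behaved data like these, the two methods usually give similar estimates. Larger differences tend to appear with small samples or clearly non-normal variables.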