Brief explanation

Structural Equation Modelling (SEM) is a state of art methodology and fulfills much of broader discusion about statistical modeling, and allows to make inference and causal analysis. SEM models are regression models broadly used in Marketing, Human Resources, Biostatistics and Medicine, revealing their flexibility as analytical tool.

Despite being a state-of-the-art methodology, SEM does not create new developments of statistical theory, since it consists in a combination of multivariate analysis, factor analysis and regression analysis. SEM makes it possible to validate or reject one or more hypothesis about an existing relation between different variables, to estimate simultaneously dependency relations between variables and estimate measurement error in model’s variables. SEM may be understood as a net of unidirectional or bidirectional paths linking different variables. Such net can be validated using multivariate data.

Confirmatory Factor Analysis

A usual methodology for model evaluation is Confirmatory Factor Analysis (CFA) that is a particular case of SEM. It is a process which consists in specifying quantity and kinds of observed variables to one or more latent variables and analyze how well those variables measure the latent variable itself. Think of a latent variable as an artificial variable that is represented as a linear combination of observed variables.

Holzinger and Swineford (1939) proposed one the most famous CFA models. In their work they analysed the mental ability based test scores of children from two different schools (details here)

We can represent Holzinger and Swineford’s model in a diagram: In this model, $$\newcommand{\R}{\mathbb{R}} x_i$$ are exogenous variables that record scores and $$y_i$$ are latent (and endogenous) variables that represent visual ability, textual ability and speed ability respectively, double-headed arrows represent an association between $$y_1$$, $$y_2$$, and $$y_3$$, single-headed arrows represent direct effects, and $$\delta_i$$ represent error terms.

A factor loading is a measurable effect which reflects the incidence of an observed variable over another either observed or latent variable. Interpreting factor loadings is equivalent to interpreting direct effects in a linear econometric model. In this example factor loadings are well defined as there are no missing relations between variables, is known which observed variables have an effect over defined latent variables, and both latent variables covariate so there is a relation between latent variables.

Model evaluation

When I wrote my thesis I had to study SEM’s goodness of fit and its indicators. The interested reader may visit my Github repo to read more about some important linear algebra aspects of SEM but here I present a table that synthesizes 50% of my thesis:

Indicator Type Classification Direction Ideal values Normed (0-1)
RMSEA significance adjusted less is better $$\leq 0,05$$ no
SMSR significance unadjusted less is better $$<0,1$$ no
GFI significance unadjusted more is better $$>0,9$$ yes

This table is useful for a cheatsheet and to keep in mind what to look when comparing models.

Using Lavaan to obtain parameters estimates

I highly recommend using lavaan. I have used AMOS and Stata but I guess R’s flexibility is unbeatable at the moment. Lavaan also provides HolzingerSwineford1939 dataset that contains the complete recordings made by Holzinger and Swineford.

First I load the needed packages to perform the analysis:

# Needed packages to perform the analysis
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}

packages <- c("lavaan","semPlot","semTools","nonnest2","htmlTable")

Then I fit the model using the cfa() function:

# 1.2. Specify the model
HS.model <- ' visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '

# 1.3. Fit the model
fit <- cfa(HS.model, data = HolzingerSwineford1939)

Then I obtain a model summary, confidence intervals and goodness of fit indicators:

# 2.1. Display summary output
summary(fit, fit.measures = FALSE)
lavaan 0.6-4 ended normally after 35 iterations

Optimization method                           NLMINB
Number of free parameters                         21

Number of observations                           301

Estimator                                         ML
Model Fit Test Statistic                      85.306
Degrees of freedom                                24
P-value (Chi-square)                           0.000

Parameter Estimates:

Information                                 Expected
Information saturated (h1) model          Structured
Standard Errors                             Standard

Latent Variables:
Estimate  Std.Err  z-value  P(>|z|)
visual =~
x1                1.000
x2                0.554    0.100    5.554    0.000
x3                0.729    0.109    6.685    0.000
textual =~
x4                1.000
x5                1.113    0.065   17.014    0.000
x6                0.926    0.055   16.703    0.000
speed =~
x7                1.000
x8                1.180    0.165    7.152    0.000
x9                1.082    0.151    7.155    0.000

Covariances:
Estimate  Std.Err  z-value  P(>|z|)
visual ~~
textual           0.408    0.074    5.552    0.000
speed             0.262    0.056    4.660    0.000
textual ~~
speed             0.173    0.049    3.518    0.000

Variances:
Estimate  Std.Err  z-value  P(>|z|)
.x1                0.549    0.114    4.833    0.000
.x2                1.134    0.102   11.146    0.000
.x3                0.844    0.091    9.317    0.000
.x4                0.371    0.048    7.779    0.000
.x5                0.446    0.058    7.642    0.000
.x6                0.356    0.043    8.277    0.000
.x7                0.799    0.081    9.823    0.000
.x8                0.488    0.074    6.573    0.000
.x9                0.566    0.071    8.003    0.000
visual            0.809    0.145    5.564    0.000
textual           0.979    0.112    8.737    0.000
speed             0.384    0.086    4.451    0.000
# 2.2. Obtain confidence intervals for the estimated coefficients
parameterEstimates(fit, ci = TRUE, level = 0.95)
lhs op     rhs   est    se      z pvalue ci.lower ci.upper
1   visual =~      x1 1.000 0.000     NA     NA    1.000    1.000
2   visual =~      x2 0.554 0.100  5.554      0    0.358    0.749
3   visual =~      x3 0.729 0.109  6.685      0    0.516    0.943
4  textual =~      x4 1.000 0.000     NA     NA    1.000    1.000
5  textual =~      x5 1.113 0.065 17.014      0    0.985    1.241
6  textual =~      x6 0.926 0.055 16.703      0    0.817    1.035
7    speed =~      x7 1.000 0.000     NA     NA    1.000    1.000
8    speed =~      x8 1.180 0.165  7.152      0    0.857    1.503
9    speed =~      x9 1.082 0.151  7.155      0    0.785    1.378
10      x1 ~~      x1 0.549 0.114  4.833      0    0.326    0.772
11      x2 ~~      x2 1.134 0.102 11.146      0    0.934    1.333
12      x3 ~~      x3 0.844 0.091  9.317      0    0.667    1.022
13      x4 ~~      x4 0.371 0.048  7.779      0    0.278    0.465
14      x5 ~~      x5 0.446 0.058  7.642      0    0.332    0.561
15      x6 ~~      x6 0.356 0.043  8.277      0    0.272    0.441
16      x7 ~~      x7 0.799 0.081  9.823      0    0.640    0.959
17      x8 ~~      x8 0.488 0.074  6.573      0    0.342    0.633
18      x9 ~~      x9 0.566 0.071  8.003      0    0.427    0.705
19  visual ~~  visual 0.809 0.145  5.564      0    0.524    1.094
20 textual ~~ textual 0.979 0.112  8.737      0    0.760    1.199
21   speed ~~   speed 0.384 0.086  4.451      0    0.215    0.553
22  visual ~~ textual 0.408 0.074  5.552      0    0.264    0.552
23  visual ~~   speed 0.262 0.056  4.660      0    0.152    0.373
24 textual ~~   speed 0.173 0.049  3.518      0    0.077    0.270
# 2.3 Obtain goodness of fit indicators of the model
fitMeasures(fit, c("chisq", "rmsea", "srmr", "gfi", "ecvi"))
chisq  rmsea   srmr    gfi   ecvi
85.306  0.092  0.065  0.943  0.423
reliability(fit)
visual   textual     speed     total
alpha  0.6261171 0.8827069 0.6884550 0.7604886
omega  0.6253180 0.8851754 0.6877600 0.8453351
omega2 0.6253180 0.8851754 0.6877600 0.8453351
omega3 0.6120052 0.8850608 0.6858417 0.8596204
avevar 0.3705589 0.7210163 0.4244883 0.5145874

Exporting lavaan output to tables

Lavaan produces a lot of potential pieces of output, but I suggest that at minimum that you export the above model summary, confidence intervals and goodness of fit indicators as part of your final tables:

htmlTable(txtRound(parameterEstimates(fit, ci = TRUE, level = 0.95), digits = 3, excl.cols = c(1,3)), align = "l|r|r|r|r|r|r|r|r|r")
lhs op rhs est se z pvalue ci.lower ci.upper
1 visual =~ x1 1.000 0.000 1.000 1.000
2 visual =~ x2 0.554 0.100 5.554 2.798 0.358 0.749
3 visual =~ x3 0.729 0.109 6.685 2.313 0.516 0.943
4 textual =~ x4 1.000 0.000 1.000 1.000
5 textual =~ x5 1.113 0.065 17.014 0e+00 0.985 1.241
6 textual =~ x6 0.926 0.055 16.703 0e+00 0.817 1.035
7 speed =~ x7 1.000 0.000 1.000 1.000
8 speed =~ x8 1.180 0.165 7.152 8.564 0.857 1.503
9 speed =~ x9 1.082 0.151 7.155 8.398 0.785 1.378
10 x1 ~~ x1 0.549 0.114 4.833 1.344 0.326 0.772
11 x2 ~~ x2 1.134 0.102 11.146 0e+00 0.934 1.333
12 x3 ~~ x3 0.844 0.091 9.317 0e+00 0.667 1.022
13 x4 ~~ x4 0.371 0.048 7.779 7.327 0.278 0.465
14 x5 ~~ x5 0.446 0.058 7.642 2.132 0.332 0.561
15 x6 ~~ x6 0.356 0.043 8.277 2.220 0.272 0.441
16 x7 ~~ x7 0.799 0.081 9.823 0e+00 0.640 0.959
17 x8 ~~ x8 0.488 0.074 6.573 4.923 0.342 0.633
18 x9 ~~ x9 0.566 0.071 8.003 1.110 0.427 0.705
19 visual ~~ visual 0.809 0.145 5.564 2.640 0.524 1.094
20 textual ~~ textual 0.979 0.112 8.737 0e+00 0.760 1.199
21 speed ~~ speed 0.384 0.086 4.451 8.533 0.215 0.553
22 visual ~~ textual 0.408 0.074 5.552 2.818 0.264 0.552
23 visual ~~ speed 0.262 0.056 4.660 3.168 0.152 0.373
24 textual ~~ speed 0.173 0.049 3.518 4.346 0.077 0.270
htmlTable(txtRound(reliability(fit),3), align = "l|r|r|r|r")
visual textual speed total
alpha 0.626 0.883 0.688 0.760
omega 0.625 0.885 0.688 0.845
omega2 0.625 0.885 0.688 0.845
omega3 0.612 0.885 0.686 0.860
avevar 0.371 0.721 0.424 0.515

There are other packages to convert R output to tables such as knitr and xtables that can also export to $$\rm\LaTeX$$.

Creating a diagram for the model in R

Now let’s get on to producing the exciting part of an SEM analysis - the model diagram. This code might not be good-looking but the result is!

semPaths(fit, style = "lisrel",
whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
label.prop = 0.9, edge.label.color = "black", rotation = 4,
equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
curve = 2, unCol = "#070b8c") If you are running a script instead of a R Markdown document, you can save the plot as svg with these lines:

svg("cfa_example_2.svg")
semPaths(fit, style = "lisrel",
whatLabels = "std", edge.label.cex = .6, node.label.cex = .6,
label.prop = 0.9, edge.label.color = "black", rotation = 4,
equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5,
edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse",
shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4,
curve = 2, unCol = "#070b8c")
#dev.off() # uncomment if you run a script but there's no need to use this in R Markdown

Appendix

If you are curious about the first diagram, I made it with TikZ. This is the code

\documentclass[border=3mm]{standalone}
\usepackage[applemac]{inputenc}
\usepackage[protrusion=true,expansion=true]{microtype}
\usepackage[bb=lucida,bbscaled=1,cal=boondoxo]{mathalfa}
\usepackage[stdmathitalics=true,math-style=iso,lucidasmallscale=true,romanfamily=bright]{lucimatx}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning}
\newcommand{\at}{\makeatletter @\makeatother}
\begin{document}
\begin{tikzpicture}[auto,node distance=.5cm,
latent/.style={fill=red!20,circle,draw, thick,inner sep=0pt,minimum size=8mm,align=center},
observed/.style={fill=blue!20,rectangle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
error/.style={fill=yellow!20,circle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center},
paths/.style={->,  thick, >=stealth'},
paths2/.style={<-,  thick, >=stealth'},
twopaths/.style={<->,  thick, >=stealth'}
]

%Define observed variables
\node [latent] (y1) at (0,0) {$y_1$};
\node [latent] (y2) [below=3cm of y1]  {$y_2$};
\node [latent] (y3) [below=3cm of y2]  {$y_3$};

%%%

\node [observed] (x2) [left=1.8cm of y1]  {$x_2$};
\node [observed] (x1) [above=of x2]  {$x_1$};
\node [observed] (x3) [below=of x2]  {$x_3$};

\node [error] (ex1) [left=0.5cm of x1]  {$\delta_1$};
\node [error] (ex2) [left=0.5cm of x2]  {$\delta_2$};
\node [error] (ex3) [left=0.5cm of x3]  {$\delta_3$};

%%%

\node [observed] (x5) [left=1.8cm of y2]  {$x_5$};
\node [observed] (x4) [above=of x5]  {$x_4$};
\node [observed] (x6) [below=of x5]  {$x_6$};

\node [error] (ex4) [left=0.5cm of x4]  {$\delta_4$};
\node [error] (ex5) [left=0.5cm of x5]  {$\delta_5$};
\node [error] (ex6) [left=0.5cm of x6]  {$\delta_6$};

%%%

\node [observed] (x8) [left=1.8cm of y3]  {$x_8$};
\node [observed] (x7) [above=of x8]  {$x_7$};
\node [observed] (x9) [below=of x8]  {$x_9$};

\node [error] (ex7) [left=0.5cm of x7]  {$\delta_7$};
\node [error] (ex8) [left=0.5cm of x8]  {$\delta_8$};
\node [error] (ex9) [left=0.5cm of x9]  {$\delta_9$};

% Draw paths form latent to observed variables

\foreach \all in {x1,x2,x3}{
\draw [paths] (y1.west) to node {} (\all.east);
}

\draw [paths] (x1.west) to (ex1.east);
\draw [paths] (x2.west) to (ex2.east);
\draw [paths] (x3.west) to (ex3.east);

%%%

\foreach \all in {x4,x5,x6}{
\draw [paths] (y2.west) to node {} (\all.east);
}

\draw [paths] (x4.west) to (ex4.east);
\draw [paths] (x5.west) to (ex5.east);
\draw [paths] (x6.west) to (ex6.east);

%%%

\foreach \all in {x7,x8,x9}{
\draw [paths] (y3.west) to node {} (\all.east);
}

\draw [paths] (x7.west) to (ex7.east);
\draw [paths] (x8.west) to (ex8.east);
\draw [paths] (x9.west) to (ex9.east);

%%%

\draw [twopaths] (y1.east) to [bend left=90] (y2.east);
\draw [twopaths] (y2.east) to [bend left=90] (y3.east);

\draw [twopaths] (y1.east) to [bend left=90] (y3.east);

\end{tikzpicture}
\end{document}

and then I ran pdf2svg cfa_example.pdf cfa_example.svg.