13.1 Functions provided by the psych package and the processing steps
13.1.1 Concepts
1. Principal component analysis (PCA): a dimension-reduction technique that transforms a large set of correlated variables into a much smaller set of uncorrelated variables, called principal components.
In PCA, each principal component (e.g., PC1 and PC2) is required both to maximize the variance it explains and to be uncorrelated with the other components.
A principal component can be viewed as the "effect" of the observed variables.
2. Exploratory factor analysis (EFA): a family of methods for uncovering the latent structure underlying a set of variables.
The factors (F1, F2) are treated as the structural basis, or "cause", of the observed variables. Factors may be correlated with one another, though they need not be.
13.1.2 Functions provided by the psych package
Note:
The loading matrix:
In factor analysis and principal component analysis (PCA), the loading matrix is a key output. It shows the relationship between the original variables and the extracted factors or components, describing how strongly each variable contributes to, or is weighted on, each factor.
In factor analysis, the entries of the loading matrix are factor loadings: standardized coefficients expressing the strength of the linear relationship between a variable and a factor, usually ranging from -1 to 1. A positive loading indicates a positive relationship between the variable and the factor, a negative loading indicates a negative relationship, and a loading near 0 indicates a weak relationship.
In PCA, the loading matrix shows the correlations between the original variables and the principal components. The components are linear combinations of the original variables, ordered by the amount of variance they explain; the entries of the loading matrix are standardized coefficients giving each variable's correlation with each component.
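Because a loading of this kind is just a correlation, the idea can be illustrated outside R. A minimal Python sketch (the numbers are made up for illustration and do not come from any data set in this chapter): a variable whose values closely track a component score has a loading near +1.

```python
# A loading can be read as the Pearson correlation between an observed
# variable and a component/factor score. Toy example with made-up data:
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical observed variable
s = [1.1, 1.9, 3.2, 3.8, 5.0]   # hypothetical component score
loading = pearson(x, s)          # close to +1: a strong positive loading
```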
13.1.3 Processing steps
1. Data preprocessing: (1) the input may be a raw data matrix or a correlation matrix; if raw data are supplied, the correlation matrix is computed automatically; (2) missing values are not allowed, so confirm there are none before the analysis; (3) when missing values must be handled in computing correlations, the psych package defaults to pairwise deletion.
2. Decide whether to run PCA or factor analysis.
2.1 For factor analysis, also choose a factoring method (e.g., maximum likelihood).
3. Decide how many components/factors to retain (scree plot).
4. Extract the components/factors.
5. Rotate the components/factors.
6. Interpret the results.
7. Compute component or factor scores.
13.2 Principal component analysis
13.2.1 Deciding how many components to retain
13.2.1.1 Syntax
library(psych)
fa.parallel(r, n.obs =, fa = , n.iter = 100, alpha = 0.05)
abline(h=1)
The number of components needed is the number of points (marked X) lying above both the mean eigenvalues derived from 100 random data matrices (the dashed line) and the eigenvalue-greater-than-1 criterion (the solid horizontal line at y = 1).
r: a matrix or data frame containing the data.
n.obs (may be omitted): the sample size; if not supplied, the function infers it from the data.
fa: which analyses to run. Usually set to "pc":
1) "pc": principal component analysis only.
2) "both": results for both factors and components.
n.iter: the number of random data sets used for the parallel analysis; 100 is typical.
alpha: the significance level used to judge significance.
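The eigenvalue-greater-than-1 rule that fa.parallel overlays can be checked by hand in a tiny case. A Python sketch (the correlation value 0.8 is hypothetical): for a 2x2 correlation matrix [[1, r], [r, 1]] the eigenvalues have the closed form 1 + r and 1 - r, so two strongly correlated variables justify a single component.

```python
# Kaiser criterion: keep components whose eigenvalue exceeds 1, the
# variance of a single standardized variable. For [[1, r], [r, 1]] the
# eigenvalues are 1 + r and 1 - r (a standard closed form).
r = 0.8                                # hypothetical correlation
eigenvalues = [1 + r, 1 - r]           # 1.8 and 0.2
kept = [e for e in eigenvalues if e > 1]
# only one component passes the criterion
```

Note that the eigenvalues sum to 2, the number of variables: the total standardized variance is conserved, only divided up differently.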
13.2.1.2 Example 1
We illustrate with the USJudgeRatings data set, which contains lawyers' ratings of U.S. Superior Court judges: 43 observations on 12 variables.
Following the steps described in 13.1.3:
1. Data preprocessing: check for missing values:
anyNA(USJudgeRatings)#TRUE would indicate at least one missing value
## [1] FALSE
The result shows there are no missing values.
2. Here we choose PCA.
3. Decide how many components are needed:
library(psych)
fa.parallel(USJudgeRatings[,-1],fa = "pc",n.iter = 100)#the first variable, CONT (number of contacts between lawyer and judge), is excluded
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Parallel analysis suggests that the number of factors = NA and the number of components = 1
abline(h=1)
The result: only one point (marked X) lies above both the mean eigenvalues derived from the 100 random data matrices (the dashed line) and the eigenvalue > 1 criterion (the solid horizontal line at y = 1), so only one principal component is needed.
13.2.2 Extracting principal components
-
13.2.2.1 Syntax
Use principal() from the psych package:
principal(r, nfactors = , rotate = , scores = )
r: a correlation matrix or raw data matrix
nfactors: the number of principal components to extract (default 1)
rotate: the rotation method (the default is varimax rotation); set it to "none" for no rotation
scores: whether to compute principal component scores (default FALSE)
-
13.2.2.2 Example (continuing example 1)
4. Extract the principal components of the USJudgeRatings data set:
principal(USJudgeRatings[,-1], nfactors =1)
## Principal Components Analysis
## Call: principal(r = USJudgeRatings[, -1], nfactors = 1)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## INTG 0.92 0.84 0.1565 1
## DMNR 0.91 0.83 0.1663 1
## DILG 0.97 0.94 0.0613 1
## CFMG 0.96 0.93 0.0720 1
## DECI 0.96 0.92 0.0763 1
## PREP 0.98 0.97 0.0299 1
## FAMI 0.98 0.95 0.0469 1
## ORAL 1.00 0.99 0.0091 1
## WRIT 0.99 0.98 0.0196 1
## PHYS 0.89 0.80 0.2013 1
## RTEN 0.99 0.97 0.0275 1
##
## PC1
## SS loadings 10.13
## Proportion Var 0.92
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.04
## with the empirical chi square 6.21 with prob < 1
##
## Fit based upon off diagonal values = 1
PC1 column: the component loadings, i.e., the correlations between each observed variable and the principal component. Here PC1 can serve as a single general evaluation dimension.
h2: the communalities, i.e., the proportion of each variable's variance explained by the components. When there is more than one component, the variance explained generally decreases from component to component, i.e., PC1 always explains more variance than PC2.
u2: the uniquenesses, equal to 1 - h2, i.e., the proportion of a variable's variance that the components cannot explain.
SS loadings: the eigenvalues associated with the components, i.e., the standardized variance attributed to each component.
Proportion Var: how much of the whole data set each component explains. Here it is 0.92, meaning PC1 accounts for 92% of the variance of the 11 variables.
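Two identities in this table are easy to verify by hand: h2 + u2 = 1 for each variable, and Proportion Var equals SS loadings divided by the number of variables. A quick Python check using the numbers printed above:

```python
# h2 (communality) and u2 (uniqueness) columns from the principal()
# output above, in row order INTG .. RTEN.
h2 = [0.84, 0.83, 0.94, 0.93, 0.92, 0.97, 0.95, 0.99, 0.98, 0.80, 0.97]
u2 = [0.1565, 0.1663, 0.0613, 0.0720, 0.0763, 0.0299, 0.0469, 0.0091,
      0.0196, 0.2013, 0.0275]
for h, u in zip(h2, u2):
    assert abs((h + u) - 1.0) < 0.01     # equal up to rounding

prop_var = 10.13 / 11                     # SS loadings / number of variables
# about 0.92, matching the reported Proportion Var
```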
13.2.2.3 Example 2
The Harman23.cor data set, a body-measurement data set, contains eight physical measurements for 305 girls. It consists of the variables' correlation coefficients rather than raw data.
1. Data preprocessing
is.na(Harman23.cor)
## cov center n.obs
## FALSE FALSE FALSE
Harman23.cor
## $cov
## height arm.span forearm lower.leg weight bitro.diameter
## height 1.000 0.846 0.805 0.859 0.473 0.398
## arm.span 0.846 1.000 0.881 0.826 0.376 0.326
## forearm 0.805 0.881 1.000 0.801 0.380 0.319
## lower.leg 0.859 0.826 0.801 1.000 0.436 0.329
## weight 0.473 0.376 0.380 0.436 1.000 0.762
## bitro.diameter 0.398 0.326 0.319 0.329 0.762 1.000
## chest.girth 0.301 0.277 0.237 0.327 0.730 0.583
## chest.width 0.382 0.415 0.345 0.365 0.629 0.577
## chest.girth chest.width
## height 0.301 0.382
## arm.span 0.277 0.415
## forearm 0.237 0.345
## lower.leg 0.327 0.365
## weight 0.730 0.629
## bitro.diameter 0.583 0.577
## chest.girth 1.000 0.539
## chest.width 0.539 1.000
##
## $center
## [1] 0 0 0 0 0 0 0 0
##
## $n.obs
## [1] 305
2. Decide between PCA and factor analysis; here we use PCA.
3. Decide how many components/factors to retain (scree plot):
library(psych)
fa.parallel(Harman23.cor$cov, n.obs =305, fa = "pc", n.iter = 100)#fm applies only to factor extraction, so it is not needed here
The plot indicates that two principal components are needed.
4. Extract the components/factors:
principal(Harman23.cor$cov, nfactors =2,rotate = "none")
## Principal Components Analysis
## Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 h2 u2 com
## height 0.86 -0.37 0.88 0.123 1.4
## arm.span 0.84 -0.44 0.90 0.097 1.5
## forearm 0.81 -0.46 0.87 0.128 1.6
## lower.leg 0.84 -0.40 0.86 0.139 1.4
## weight 0.76 0.52 0.85 0.150 1.8
## bitro.diameter 0.67 0.53 0.74 0.261 1.9
## chest.girth 0.62 0.58 0.72 0.283 2.0
## chest.width 0.67 0.42 0.62 0.375 1.7
##
## PC1 PC2
## SS loadings 4.67 1.77
## Proportion Var 0.58 0.22
## Cumulative Var 0.58 0.81
## Proportion Explained 0.73 0.27
## Cumulative Proportion 0.73 1.00
##
## Mean item complexity = 1.7
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
##
## Fit based upon off diagonal values = 0.99
Results: PC1 explains 58% of the variance and PC2 explains 22%, together accounting for 81% of the total variance. PC1 is positively correlated with all of the body measurements and serves as a general size dimension; PC2 is negatively correlated with the first four variables and positively correlated with the last four, so it looks like a length-versus-volume contrast.
13.2.3 Principal component rotation
Rotation: a family of mathematical transformations that make the component loading matrix easier to interpret; in effect, they denoise it as far as possible.
Rotation methods divide into orthogonal and oblique rotations, which respectively keep the rotated components uncorrelated (orthogonal) or allow them to become correlated (oblique).
The most popular orthogonal rotation is varimax rotation, which tries to clean up the columns of the loading matrix so that each component is explained by a limited set of variables (each column has a few large loadings, with the rest small).
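For the two-component case, varimax can be sketched directly: search over rotation angles for the one that maximizes the variance of the squared loadings within each column. The Python sketch below uses the raw varimax criterion (without the Kaiser normalization psych applies, so the optimum may differ slightly) on the unrotated PC1/PC2 loadings from the Harman23 example above.

```python
import math

# Unrotated (PC1, PC2) loadings from the Harman23 example above.
loadings = [(0.86, -0.37), (0.84, -0.44), (0.81, -0.46), (0.84, -0.40),
            (0.76, 0.52), (0.67, 0.53), (0.62, 0.58), (0.67, 0.42)]

def varimax_criterion(L):
    # Sum over columns of the variance of the squared loadings: large
    # when each column has a few big loadings and many near-zero ones.
    p = len(L)
    total = 0.0
    for j in (0, 1):
        sq = [row[j] ** 2 for row in L]
        m = sum(sq) / p
        total += sum((s - m) ** 2 for s in sq) / p
    return total

def rotate(L, theta):
    # Orthogonal rotation of a two-column loading matrix by angle theta.
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c - b * s, a * s + b * c) for a, b in L]

# Grid-search the angle (0-89 degrees) with the best criterion value.
best_value, best_deg = max(
    (varimax_criterion(rotate(loadings, math.radians(d))), d)
    for d in range(90))
# The rotated solution scores higher (is "simpler") than the unrotated one.
```

Rotating by the best angle reproduces the pattern of the RC1/RC2 solution shown in the next example: lengths on one column, girths on the other.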
-
13.2.3.1 Syntax
Simply set rotate = "varimax" when extracting the components:
principal(r, nfactors = , rotate = "varimax")
-
13.2.3.2 Example (continuing example 2)
5. Rotate the components/factors:
principal(Harman23.cor$cov, nfactors =2,rotate = "varimax")
## Principal Components Analysis
## Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 h2 u2 com
## height 0.90 0.25 0.88 0.123 1.2
## arm.span 0.93 0.19 0.90 0.097 1.1
## forearm 0.92 0.16 0.87 0.128 1.1
## lower.leg 0.90 0.22 0.86 0.139 1.1
## weight 0.26 0.88 0.85 0.150 1.2
## bitro.diameter 0.19 0.84 0.74 0.261 1.1
## chest.girth 0.11 0.84 0.72 0.283 1.0
## chest.width 0.26 0.75 0.62 0.375 1.2
##
## RC1 RC2
## SS loadings 3.52 2.92
## Proportion Var 0.44 0.37
## Cumulative Var 0.44 0.81
## Proportion Explained 0.55 0.45
## Cumulative Proportion 0.55 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
##
## Fit based upon off diagonal values = 0.99
Results: after rotation, the components are labeled RC instead of PC. Steps 4 and 5 can also be done in a single call.
The RC1 column shows that RC1 is explained mainly by the first four variables (the length variables), while RC2 is explained by variables 5-8 (the volume variables).
Note that the two components are still uncorrelated and their joint ability to explain the variables is unchanged, because the groupings of the variables have not changed.
The cumulative variance explained after rotation is also unchanged (81%); only how that variance is divided between the two components has changed.
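The last point can be checked numerically: varimax only redistributes explained variance between the components, leaving the total unchanged. A quick Python check with the SS loadings printed above:

```python
# SS loadings before rotation (PC1, PC2) and after rotation (RC1, RC2).
unrotated = [4.67, 1.77]
rotated = [3.52, 2.92]

total_before = sum(unrotated)          # 6.44
total_after = sum(rotated)             # 6.44: rotation preserves the total
cumulative = total_after / 8           # 8 variables -> about 0.81
```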
13.2.4 Obtaining principal component scores
This is an important step in PCA: it describes where each individual or sample sits in the component space.
Component scores are the new values obtained by projecting the original data onto the component space; each score measures an individual's or sample's standing on the corresponding component. Computing them converts the original data into a smaller, more interpretable set of numbers for comparison, analysis, and interpretation.
The benefits of obtaining component scores include:
1. Dimension reduction: the scores can stand in for the original data, reducing a high-dimensional data set to a low-dimensional component space. This simplifies analysis and visualization while retaining most of the variability in the data.
2. Interpretability: the scores describe how individuals or samples perform and can be used to compare them, helping to identify groups with similar or distinct profiles in the component space.
3. Feature extraction: by inspecting the relationship between the scores and the original variables, you can identify the variables with the largest weights and contributions to each component.
4. Data compression: the scores capture an individual's or sample's main information in fewer numbers, improving storage efficiency.
When the PCA is based on a correlation matrix and the raw data are unavailable, per-observation component scores cannot be computed, but the coefficients used to compute such scores can still be obtained.
13.2.4.1 Syntax
Simply set scores = TRUE when extracting the components; the component scores are stored in the scores element of the result:
pc <- principal(r, nfactors =,scores = TRUE)
head(pc$scores)
13.2.4.2 Example 1 (using example 1)
pc <- principal(USJudgeRatings[,-1], nfactors =1,scores = TRUE)
head(pc$scores)
## PC1
## AARONSON,L.H. -0.1857981
## ALEXANDER,J.M. 0.7469865
## ARMENTANO,A.J. 0.0704772
## BERDON,R.I. 1.1358765
## BRACKEN,J.J. -2.1586211
## BURNS,E.B. 0.7669406
13.2.4.3 Example 2: obtaining the component scoring coefficients (using example 2)
Harman23.cor$cov
## height arm.span forearm lower.leg weight bitro.diameter
## height 1.000 0.846 0.805 0.859 0.473 0.398
## arm.span 0.846 1.000 0.881 0.826 0.376 0.326
## forearm 0.805 0.881 1.000 0.801 0.380 0.319
## lower.leg 0.859 0.826 0.801 1.000 0.436 0.329
## weight 0.473 0.376 0.380 0.436 1.000 0.762
## bitro.diameter 0.398 0.326 0.319 0.329 0.762 1.000
## chest.girth 0.301 0.277 0.237 0.327 0.730 0.583
## chest.width 0.382 0.415 0.345 0.365 0.629 0.577
## chest.girth chest.width
## height 0.301 0.382
## arm.span 0.277 0.415
## forearm 0.237 0.345
## lower.leg 0.327 0.365
## weight 0.730 0.629
## bitro.diameter 0.583 0.577
## chest.girth 1.000 0.539
## chest.width 0.539 1.000
rc <- principal(Harman23.cor$cov, nfactors =2,rotate = "varimax")
round(unclass(rc$weights),2)
## RC1 RC2
## [1,] 0.28 -0.05
## [2,] 0.30 -0.08
## [3,] 0.30 -0.09
## [4,] 0.28 -0.06
## [5,] -0.06 0.33
## [6,] -0.08 0.32
## [7,] -0.10 0.34
## [8,] -0.04 0.27
Explanation of round(unclass(rc$weights), 2):
rc$weights contains the component scoring coefficients, i.e., the weights applied to the (standardized) variables when computing component scores (not the same thing as the loadings in rc$loadings).
unclass() converts the object into an ordinary matrix or data frame for further processing.
round() rounds the values; its first argument is the object to round and its second is the number of decimal places to keep (2 here).
In other words, the coefficient matrix can be printed without unclass() and round(), but rounding to 2 decimal places makes it easier to work with:
rc <- principal(Harman23.cor$cov, nfactors =2,rotate = "varimax")
rc$weights
## RC1 RC2
## [1,] 0.27524417 -0.04748169
## [2,] 0.29673051 -0.08005490
## [3,] 0.29823990 -0.09158460
## [4,] 0.28014088 -0.06027214
## [5,] -0.06053059 0.33228637
## [6,] -0.07752349 0.32477593
## [7,] -0.10366026 0.33763942
## [8,] -0.03730720 0.27392667
Component score = sum of coefficient x (standardized) variable value:
PC1 = 0.28*height + 0.30*arm.span + 0.30*forearm + 0.28*lower.leg - 0.06*weight - 0.08*bitro.diameter - 0.10*chest.girth - 0.04*chest.width
PC2 = -0.05*height - 0.08*arm.span - 0.09*forearm - 0.06*lower.leg + 0.33*weight + 0.32*bitro.diameter + 0.34*chest.girth + 0.27*chest.width
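Given those coefficients, a score for a new individual is just a weighted sum of standardized measurements. A Python sketch (the z-scores below are hypothetical example values, not data from the 305 subjects):

```python
# RC1 scoring coefficients (rounded), in the variable order
# height, arm.span, forearm, lower.leg, weight, bitro.diameter,
# chest.girth, chest.width.
rc1_weights = [0.28, 0.30, 0.30, 0.28, -0.06, -0.08, -0.10, -0.04]
z = [1.2, 1.0, 0.8, 1.1, -0.3, 0.0, -0.5, 0.2]   # hypothetical z-scores

rc1_score = sum(w * v for w, v in zip(rc1_weights, z))
# Above-average lengths with unremarkable girths give a clearly
# positive score on the "length" component RC1.
```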
13.3 Exploratory factor analysis
The goal of factor analysis is to discover a smaller set of latent, unobservable variables underlying the data that explain the correlations among a set of observed variables.
It differs from PCA, but the basic steps are the same.
13.3.1 Deciding how many common factors to extract (scree plot)
13.3.1.1 Syntax
library(psych)
fa.parallel(r, n.obs = , fa = , fm = , n.iter = 100, alpha = 0.05)
abline(h=1)
fa: which analyses to run; set it to "both" to get scree information for both components and factors.
13.3.1.2 Example
The ability.cov data set: 112 people took six tests, including a nonverbal measure of general intelligence (general), picture completion (picture), block design (blocks), mazes (maze), reading comprehension (reading), and vocabulary (vocab). How can we explain the participants' test scores with a smaller set of latent psychological factors?
library(psych)
ability.cov
## $cov
## general picture blocks maze reading vocab
## general 24.641 5.991 33.520 6.023 20.755 29.701
## picture 5.991 6.700 18.137 1.782 4.936 7.204
## blocks 33.520 18.137 149.831 19.424 31.430 50.753
## maze 6.023 1.782 19.424 12.711 4.757 9.075
## reading 20.755 4.936 31.430 4.757 52.604 66.762
## vocab 29.701 7.204 50.753 9.075 66.762 135.292
##
## $center
## [1] 0 0 0 0 0 0
##
## $n.obs
## [1] 112
test <- ability.cov$cov
test#this is not yet a correlation matrix
## general picture blocks maze reading vocab
## general 24.641 5.991 33.520 6.023 20.755 29.701
## picture 5.991 6.700 18.137 1.782 4.936 7.204
## blocks 33.520 18.137 149.831 19.424 31.430 50.753
## maze 6.023 1.782 19.424 12.711 4.757 9.075
## reading 20.755 4.936 31.430 4.757 52.604 66.762
## vocab 29.701 7.204 50.753 9.075 66.762 135.292
Convert test into a correlation matrix:
test <- cov2cor(test)#cov2cor() converts a covariance matrix into a correlation matrix
test
## general picture blocks maze reading vocab
## general 1.0000000 0.4662649 0.5516632 0.3403250 0.5764799 0.5144058
## picture 0.4662649 1.0000000 0.5724364 0.1930992 0.2629229 0.2392766
## blocks 0.5516632 0.5724364 1.0000000 0.4450901 0.3540252 0.3564715
## maze 0.3403250 0.1930992 0.4450901 1.0000000 0.1839645 0.2188370
## reading 0.5764799 0.2629229 0.3540252 0.1839645 1.0000000 0.7913779
## vocab 0.5144058 0.2392766 0.3564715 0.2188370 0.7913779 1.0000000
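cov2cor() implements the identity r_ij = c_ij / sqrt(c_ii * c_jj). A quick Python check reproducing the general/picture entry of the converted matrix from the covariances shown above:

```python
import math

# From ability.cov: var(general), var(picture), and their covariance.
c_gg, c_pp, c_gp = 24.641, 6.700, 5.991

r_gp = c_gp / math.sqrt(c_gg * c_pp)
# about 0.46626, matching the 0.4662649 printed for general/picture
```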
library(psych)
fa.parallel(test, n.obs = 112, fa ="both", n.iter = 100)
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## (the same two warnings are repeated for each simulated data set; the repetitions are omitted here)
## Parallel analysis suggests that the number of factors = 2 and the number of components = 1
abline(h=c(0,1))
Important note:
For factor analysis, the Kaiser-Harris criterion is an eigenvalue greater than 0 rather than 1, and the factors are drawn with the triangle lines. The number of triangles above both the dashed line and the solid line at 0 is the number of factors needed.
In this example, two factors are needed.
13.3.2 Extracting common factors
-
13.3.2.1 Syntax
fa(r,nfactors = ,n.obs = ,rotate = ,scores = ,fm=)
r: a correlation matrix or raw data matrix
nfactors: the number of factors to extract (default 1)
n.obs: the number of observations (required when a correlation matrix is supplied)
rotate: the rotation method (the default is an oblique rotation, oblimin); "none" for no rotation, "varimax" for orthogonal rotation, "promax" for oblique rotation
scores: whether to compute factor scores (default FALSE)
fm: the factoring method (the default is minimum residual). Unlike PCA, factor extraction offers many methods, including maximum likelihood "ml", iterated principal axis "pa", weighted least squares "wls", generalized weighted least squares "gls", and minimum residual "minres". Maximum likelihood is often preferred, but it sometimes fails to converge; in that case iterated principal axis is the better choice.
-
13.3.2.2 Example
Here we use iterated principal axis extraction ("pa"):
fa(test,nfactors =2 ,n.obs =112 ,rotate ="none",fm="pa")
## Factor Analysis using method = pa
## Call: fa(r = test, nfactors = 2, n.obs = 112, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## general 0.75 0.07 0.57 0.432 1.0
## picture 0.52 0.32 0.38 0.623 1.7
## blocks 0.75 0.52 0.83 0.166 1.8
## maze 0.39 0.22 0.20 0.798 1.6
## reading 0.81 -0.51 0.91 0.089 1.7
## vocab 0.73 -0.39 0.69 0.313 1.5
##
## PA1 PA2
## SS loadings 2.75 0.83
## Proportion Var 0.46 0.14
## Cumulative Var 0.46 0.60
## Proportion Explained 0.77 0.23
## Cumulative Proportion 0.77 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 15 with the objective function = 2.48 with Chi Square = 268.35
## df of the model are 4 and the objective function was 0.07
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic n.obs is 112 with the empirical chi square 3.49 with prob < 0.48
## The total n.obs was 112 with Likelihood Chi Square = 7.15 with prob < 0.13
##
## Tucker Lewis Index of factoring reliability = 0.953
## RMSEA index = 0.083 and the 90 % confidence intervals are 0 0.182
## BIC = -11.72
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1 PA2
## Correlation of (regression) scores with factors 0.96 0.92
## Multiple R square of scores with factors 0.93 0.84
## Minimum correlation of possible factor scores 0.86 0.68
Results: the two factors together explain 60% of the variance in the six psychological tests. However, the loading matrix is hard to interpret as it stands (the numbers show no clean pattern), so rotation is still needed.
13.3.3 Rotating the extracted factors
-
13.3.3.1 Orthogonal rotation (uncorrelated factors)
library(psych)
fa.varimax <- fa(test,nfactors =2 ,n.obs =112 ,rotate ="varimax",fm="pa")
fa.varimax
## Factor Analysis using method = pa
## Call: fa(r = test, nfactors = 2, n.obs = 112, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## general 0.49 0.57 0.57 0.432 2.0
## picture 0.16 0.59 0.38 0.623 1.1
## blocks 0.18 0.89 0.83 0.166 1.1
## maze 0.13 0.43 0.20 0.798 1.2
## reading 0.93 0.20 0.91 0.089 1.1
## vocab 0.80 0.23 0.69 0.313 1.2
##
## PA1 PA2
## SS loadings 1.83 1.75
## Proportion Var 0.30 0.29
## Cumulative Var 0.30 0.60
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 15 with the objective function = 2.48 with Chi Square = 268.35
## df of the model are 4 and the objective function was 0.07
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic n.obs is 112 with the empirical chi square 3.49 with prob < 0.48
## The total n.obs was 112 with Likelihood Chi Square = 7.15 with prob < 0.13
##
## Tucker Lewis Index of factoring reliability = 0.953
## RMSEA index = 0.083 and the 90 % confidence intervals are 0 0.182
## BIC = -11.72
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1 PA2
## Correlation of (regression) scores with factors 0.96 0.92
## Multiple R square of scores with factors 0.91 0.85
## Minimum correlation of possible factor scores 0.82 0.71
Results: the orthogonally rotated loading matrix is much easier to interpret. Reading and vocabulary load heavily on the first factor (0.93 and 0.80), while picture completion, block design, and mazes load heavily on the second factor (0.59, 0.89, 0.43); nonverbal general intelligence loads fairly evenly on both. This suggests that two latent variables (factors), a verbal intelligence factor and a nonverbal intelligence factor, explain the interrelations among the six psychological tests (the observed variables).
-
13.3.3.2 Oblique rotation (correlated factors)
library(GPArotation)
## Warning: package 'GPArotation' was built under R version 4.3.2
##
## Attaching package: 'GPArotation'
## The following objects are masked from 'package:psych':
##
## equamax, varimin
library(psych)
fa.promax <- fa(test,nfactors =2,n.obs =112,rotate ="promax",fm="pa")
fa.promax
## Factor Analysis using method = pa
## Call: fa(r = test, nfactors = 2, n.obs = 112, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## general 0.37 0.48 0.57 0.432 1.9
## picture -0.03 0.63 0.38 0.623 1.0
## blocks -0.10 0.97 0.83 0.166 1.0
## maze 0.00 0.45 0.20 0.798 1.0
## reading 1.00 -0.09 0.91 0.089 1.0
## vocab 0.84 -0.01 0.69 0.313 1.0
##
## PA1 PA2
## SS loadings 1.83 1.75
## Proportion Var 0.30 0.29
## Cumulative Var 0.30 0.60
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## With factor correlations of
## PA1 PA2
## PA1 1.00 0.55
## PA2 0.55 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 15 with the objective function = 2.48 with Chi Square = 268.35
## df of the model are 4 and the objective function was 0.07
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic n.obs is 112 with the empirical chi square 3.49 with prob < 0.48
## The total n.obs was 112 with Likelihood Chi Square = 7.15 with prob < 0.13
##
## Tucker Lewis Index of factoring reliability = 0.953
## RMSEA index = 0.083 and the 90 % confidence intervals are 0 0.182
## BIC = -11.72
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1 PA2
## Correlation of (regression) scores with factors 0.97 0.94
## Multiple R square of scores with factors 0.93 0.88
## Minimum correlation of possible factor scores 0.86 0.77
Results: compared with orthogonal rotation, the oblique solution adds one block: the factor correlation matrix. Here the correlation is 0.55, which is substantial; if the correlation were low, an orthogonal rotation would be more appropriate.
Also note how an oblique solution differs from an orthogonal one:
1) An orthogonal solution focuses on a single matrix, the factor structure matrix (the correlations between variables and factors, PA1 and PA2).
2) An oblique solution involves three matrices: the factor pattern matrix, the factor correlation matrix, and the factor structure matrix.
1. Factor pattern matrix (P): the matrix of standardized regression coefficients (the PA1 and PA2 columns here). These entries are no longer variable-factor correlations but standardized regression weights. This example yields the same two latent variables (factors) as the orthogonal rotation: a verbal intelligence factor and a nonverbal intelligence factor.
2. Factor correlation matrix (Phi): the matrix of correlations between the factors.
3. Factor structure matrix (factor loading matrix, F): the variable-factor correlations, analogous to PA1/PA2 in the orthogonal solution. It is not printed here, but it can be computed as F = P * Phi.
Computing F here:
fsm <- function(oblique){
  if(class(oblique)[2]=="fa" & is.null(oblique$Phi)){
    warning("object doesn't look like oblique EFA")
  } else {
    P <- unclass(oblique$loading)
    F <- P %*% oblique$Phi#%*% is the matrix multiplication operator
    colnames(F) <- c("PA1","PA2")
    return(F)
  }
}
fsm(fa.promax)
## PA1 PA2
## general 0.6362443 0.6879280
## picture 0.3203301 0.6137902
## blocks 0.4293252 0.9089946
## maze 0.2491789 0.4496446
## reading 0.9510795 0.4587468
## vocab 0.8287066 0.4476297
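The same matrix product can be verified by hand for a single row. A Python check using the rounded pattern loadings for reading (1.00, -0.09) and the factor correlation 0.55 from the promax output above:

```python
# F = P %*% Phi, computed for the "reading" row only.
P_reading = [1.00, -0.09]
Phi = [[1.00, 0.55],
       [0.55, 1.00]]

F_reading = [sum(P_reading[k] * Phi[k][j] for k in range(2))
             for j in range(2)]
# about (0.95, 0.46), matching the fsm() output for reading above
```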
Results: compared with the orthogonal solution, the columns of this loading matrix contain more noise, because we allowed the factors to correlate. Although oblique rotation is more complex, the model will often match the real data better.
13.3.4 Factor scores
Factor analysis places much less emphasis on scores than PCA does. Factor scores are easily obtained with scores = TRUE, and the scoring coefficients are returned in weights:
fa.pa <- fa(test,nfactors =2 ,n.obs =112 ,rotate ="none",fm="pa",scores = TRUE)#named fa.pa to avoid masking the fa() function
fa.pa$weights
## PA1 PA2
## general 0.16174225 0.142817443
## picture 0.06135420 0.075396562
## blocks 0.40699336 0.709931420
## maze 0.03478801 0.008562505
## reading 0.45184777 -0.744726561
## vocab 0.12331024 -0.146753411
13.4 Plotting
13.4.1 The factor.plot() function
13.4.1.1 Factor plot
factor.plot(fa.promax,lab=rownames(fa.promax$loadings))
The plot shows that vocabulary and reading load heavily on the first factor (PA1), while block design, picture completion, and mazes load heavily on the second factor (PA2); the general intelligence measure is split fairly evenly between the two factors.
13.4.1.2 PCA plot
pc <- principal(Harman23.cor$cov, nfactors =2,rotate = "varimax")
factor.plot(pc,labels = rownames(pc$loadings))
13.4.2 The fa.diagram() function
13.4.2.1 Factor diagram
The simple argument controls whether the diagram is drawn in simplified or full mode:
With simple = TRUE, the diagram is simplified: each observed variable is connected only to the factor on which it loads most strongly, without additional arrows, labels, or other detail. This compact form suits simpler models or small numbers of factors.
With simple = FALSE, the diagram is the full version: it shows more detail, drawing all sufficiently large variable-factor connections together with arrow directions, factor names, and variable names. This suits more complex models or larger numbers of factors.
fa.diagram(fa.promax,simple = TRUE)
fa.diagram(fa.promax,simple = FALSE)
13.4.2.2 PCA diagram
pc <- principal(Harman23.cor$cov, nfactors =2,rotate = "varimax")
fa.diagram(pc,simple = FALSE)
For the complete tutorial, see the R Language Basics Study Manual (R语言基础学习手册).