Title: | Reduced-Rank Multinomial Logistic Regression for Markov Chains |
---|---|
Description: | Fits the reduced-rank multinomial logistic regression model for Markov chains developed by Wang, Abner, Fardo, Schmitt, Jicha, Eldik and Kryscio (2021)<doi:10.1002/sim.8923> in R. The model combines multinomial logistic regression for Markov chains with reduced-rank regression. It is useful in studies that assume a multi-state model in which each transition between states depends on a set of covariates. Its key advantage is that it reduces the number of parameters to be estimated. The final coefficients for all covariates and the p-values for the covariates of interest are reported. P-values for the whole coefficient matrix can be calculated by two bootstrap methods. |
Authors: | Pei Wang [aut, cre], Richard Kryscio [aut] |
Maintainer: | Pei Wang <[email protected]> |
License: | GPL-2 |
Version: | 0.4.0 |
Built: | 2024-10-25 02:59:42 UTC |
Source: | https://github.com/cran/RRMLRfMC |
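The parameter saving mentioned in the description can be illustrated with a small sketch. It uses the standard count of free parameters in a rank-R matrix, R(K + p - R), against K * p for an unrestricted K by p matrix. The helper name rr_param_count and the example dimensions are assumptions made for illustration; the degrees of freedom reported by the package may be computed differently.

```r
# Hypothetical helper (not part of RRMLRfMC): compare the number of free
# parameters in an unrestricted K x p coefficient matrix with the number
# in a K x p matrix constrained to rank R.
rr_param_count <- function(K, p, R) {
  stopifnot(R >= 1, R <= min(K, p))
  c(full = K * p, reduced = R * (K + p - R))
}

# Illustrative values: 21 non-reference transitions, 3 covariates, rank 1
rr_param_count(K = 21, p = 3, R = 1)
```

With these illustrative values the unrestricted matrix has 63 entries, while the rank-1 version needs 23 free parameters.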
This function updates the A matrix.
Aupdate(Dfix, Gamma, Adata, R, p, q, I, iniA, eps, refA)
Dfix |
the coefficient matrix for study covariates |
Gamma |
the G matrix value |
Adata |
the dataset |
R |
the rank of reduced rank model |
p |
the number of covariates in the dimension reduction |
q |
the number of study covariates |
I |
a U by U incidence matrix with elements; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
iniA |
initial value for the iteration |
eps |
the tolerance for convergence, default is 10^-5 |
refA |
a vector of reference categories |
a list of outputs:
NewA: the updated A matrix
loglikeA: the loglikelihood when updating A
A dataset containing the states and covariates of 649 participants enrolled in the BRAiNS cohort at the University of Kentucky's Alzheimer's Disease Research Center.
cogdat
A data frame with 6240 rows and 14 columns:
used to denote the participants; from 1 to 649
used to denote the visit number for each participant
denote the previous state
denote the current state
baseline age (centered at age 72)
family history of dementia
self reported high blood pressure
at least one Apolipoprotein-E (APOE) gene 4 allele
cigarette smoking level (none versus < 10)
cigarette smoking level (11-19)
cigarette smoking level (>= 20 pack years)
low education
self reported head injury
This function is used to calculate the loglikelihood with a given matrix B=AG
derivativeB(B, I, zy, refd)
B |
a numeric coefficient matrix |
I |
U by U incidence matrix with elements; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
zy |
the variable values for a given observation |
refd |
a vector of reference categories |
loglikelihood
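As a rough illustration of the quantity involved, the sketch below computes the log-likelihood contribution of one observed transition under a baseline-category multinomial logit. It is not the package's derivativeB(): the function loglik_one, the state names s1 to s3, and the coefficient values are all assumptions, and the reference state is assumed to have a linear predictor of zero.

```r
# Hypothetical sketch, not the package's derivativeB():
# log-likelihood contribution of one observed transition.
# eta:  named numeric vector of linear predictors for the states that are
#       accessible from the prior state; the reference state has eta = 0.
# curr: name of the observed current state.
loglik_one <- function(eta, curr) {
  m <- max(eta)                        # log-sum-exp for numerical stability
  eta[[curr]] - (m + log(sum(exp(eta - m))))
}

z <- c(1, 0.5)                         # intercept and one covariate (made up)
B <- rbind(s2 = c(0.2, -1.0),          # coefficients for non-reference states
           s3 = c(-0.4, 0.3))
eta <- c(s1 = 0, drop(B %*% z))        # s1 plays the role of the reference
loglik_one(eta, "s2")
```

Summing such contributions over all observed transitions would give the log-likelihood for a given coefficient matrix.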
This function calculates the first and second derivatives (for the Newton-Raphson method) and the loglikelihood when updating A
derivatives(A, Gamma, Dmat, I, zy, refA)
A |
matrix with value from previous iteration |
Gamma |
G matrix values |
Dmat |
the coefficient matrix for the fixed variables |
I |
a U by U incidence matrix with elements; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
zy |
the variable values for a given observation |
refA |
a vector of reference categories |
a list of outputs:
fird: the first derivative value
secd: the second derivative value
loglike: the loglikelihood
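To show how first and second derivatives of this kind feed a Newton-Raphson update, here is a minimal, generic sketch. It is not the package's Aupdate(): the functions newton_raphson and binom_derivs are hypothetical, the stopping rule (change in loglikelihood below eps) is an assumption, and the toy objective is a binomial loglikelihood in the logit parameter. Only the list element names fird, secd and loglike are borrowed from the output described above.

```r
# Hypothetical generic Newton-Raphson loop (not the package's code).
# derivs(theta) must return list(fird, secd, loglike).
newton_raphson <- function(theta, derivs, eps = 1e-5, maxit = 100) {
  old <- derivs(theta)
  for (iter in seq_len(maxit)) {
    theta <- theta - solve(old$secd, old$fird)   # Newton step
    new <- derivs(theta)
    if (abs(new$loglike - old$loglike) < eps) {  # assumed stopping rule
      return(list(theta = theta, loglike = new$loglike,
                  niter = iter, converged = TRUE))
    }
    old <- new
  }
  list(theta = theta, loglike = old$loglike, niter = maxit, converged = FALSE)
}

# Toy objective: y successes in n trials, theta = logit of the probability
binom_derivs <- function(theta, y = 30, n = 100) {
  p <- plogis(theta)
  list(fird = y - n * p,
       secd = -n * p * (1 - p),
       loglike = y * theta - n * log1p(exp(theta)))
}

fit <- newton_raphson(0, binom_derivs)
fit$theta   # close to qlogis(0.3)
```

Because solve() is used for the step, the same loop also works when fird is a gradient vector and secd a Hessian matrix.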
This function expands the category Y into an indicator vector
expand(pri, curr, I, refE)
pri |
the prior state |
curr |
the current state |
I |
a U by U incidence matrix with elements; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
refE |
a vector with the reference categories |
ry: an indicator vector
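One way to picture such an expansion is sketched below. This is an assumption about the layout, not the package's expand(): the hypothetical expand_indicator returns one entry per non-reference state that is accessible from the prior state, and the 3-state incidence matrix is made up for the example.

```r
# Hypothetical sketch (the layout of the package's indicator may differ):
# one 0/1 entry per non-reference state accessible from the prior state.
expand_indicator <- function(pri, curr, I, refE) {
  accessible <- which(I[pri, ] == 1)
  stopifnot(curr %in% accessible)
  non_ref <- setdiff(accessible, refE[pri])
  as.integer(non_ref == curr)
}

I <- rbind(c(1, 1, 1),   # from state 1: states 1, 2, 3 are accessible
           c(0, 1, 1),   # from state 2: states 2, 3 are accessible
           c(0, 0, 0))   # state 3 is absorbing
refE <- c(1, 2)          # reference state for rows 1 and 2

expand_indicator(1, 3, I, refE)   # 0 1
expand_indicator(1, 1, I, refE)   # 0 0 (the reference state was observed)
```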
This function updates the G matrix.
Gupdate(A, Gdata, p, q, I, refG)
A |
numeric matrix |
Gdata |
the dataset used to update G |
p |
the number of covariates in the dimension reduction |
q |
the number of study covariates |
I |
a U by U incidence matrix with elements; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
refG |
a vector of reference categories |
a list of outputs:
NewG: the updated G matrix
loglikeK: the loglikelihood when updating G
sderr: standard errors for the coefficient matrix
This function is used to normalize a vector to have unit length
norm(x)
x |
a numeric vector |
a normalized vector with length 1
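A minimal sketch of this kind of normalization, assuming "length" means the Euclidean norm. The name unit_norm is hypothetical and is used here to avoid masking base::norm.

```r
# Hypothetical helper: scale a numeric vector to Euclidean length 1.
unit_norm <- function(x) {
  len <- sqrt(sum(x^2))
  stopifnot(len > 0)      # a zero vector cannot be normalized
  x / len
}

unit_norm(c(3, 4))        # 0.6 0.8
```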
This function fits the reduced-rank multinomial logistic regression model for Markov chains
rrmultinom(I, z1 = NULL, z2 = NULL, T, R, eps = 1e-05, ref = NULL)
I |
a U by U incidence matrix with elements; U is number of states; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
z1 |
an n by p matrix of covariates involved in the dimension reduction (DR); n is the number of subjects and p is the number of covariates involved in DR |
z2 |
an n by q matrix of study covariates (not in the dimension reduction); q is the number of study covariates |
T |
a M by 3 state matrix,
|
R |
the rank |
eps |
the tolerance for convergence; the default is 10^-5 |
ref |
a vector of reference categories; the default is NULL and if NULL is used, the function will use the first category as the reference category for each row |
a list of outputs:
Alpha: the final A matrix
Gamma: the final G matrix
Beta: the coefficient matrix for variables involved in reduced rank
Dcoe: the coefficient matrix for the fixed variables
Dsderr: the standard error matrix for the fixed variables
Dpval: the p-value matrix for the fixed variables
coemat: the overall coefficient matrix
niter: the number of iterations needed to converge
df: the degrees of freedom
loglik: the final loglikelihood
converge: the convergence status; 0 means the algorithm failed to converge, 1 means it converged, and 2 means the maximum number of iterations was reached
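The default for ref (the first category of each row) can be sketched as follows, under the assumption that "first category" means the lowest-numbered accessible state in each row of I that has at least one transition. The helper default_ref is hypothetical and is not part of the package.

```r
# Hypothetical helper: for every state with outgoing transitions, pick the
# lowest-numbered accessible state as the reference category.
default_ref <- function(I) {
  rows <- which(rowSums(I) > 0)
  vapply(rows, function(i) which(I[i, ] == 1)[1], integer(1))
}

# Same 7-state incidence structure as in the package example
I <- rbind(rep(1, 7), rep(1, 7), rep(1, 7),
           c(0, 0, 0, 1, 1, 1, 1),
           rep(0, 7), rep(0, 7), rep(0, 7))
default_ref(I)   # 1 1 1 4
```

Under this reading the result agrees with the ref = c(1, 1, 1, 4) passed explicitly in the example.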
# generate the Markov chain
U=7
I1=I2=I3=rep(1,7)
I4=c(0,0,0,1,1,1,1)
I5=I6=I7=rep(0,7)
I=rbind(I1,I2,I3,I4,I5,I6,I7)
# prepare the data
data=cogdat
n=length(unique(data[,1]))
M=nrow(data)+n
Mc=0
z=matrix(0,n,9)
colnames(z)=colnames(data)[5:13]
T=matrix(0,M,3)
for(i in 1:n){
  subdat=data[which(data[,1]==i),,drop=FALSE]
  z[i,]=subdat[1,5:13]
  mc=nrow(subdat)
  T[(Mc+1):(Mc+mc+1),1]=i
  T[(Mc+1):(Mc+mc+1),2]=0:mc
  T[(Mc+1):(Mc+mc+1),3]=c(subdat[1,3],subdat[,4])
  Mc=Mc+mc+1
}
#z1=z[,c(1:3),drop=FALSE]
z2=z[,4,drop=FALSE]
# fit the model with rank 1
rrmultinom(I,z1=NULL,z2,T,1,eps=9,ref=c(1,1,1,4))
This function obtains the standard error matrix by a bootstrap method. It returns the standard error and p-value matrices for the coefficient matrix.
sdfun(I, z1 = NULL, z2 = NULL, T, R, eps = 1e-05, B, tpoint = NULL, ref)
I |
a U by U incidence matrix with elements; U is the number of states; I(i,j)=1 if state j can be accessed from state i in one step and 0 otherwise |
z1 |
an n by p matrix of covariates involved in the dimension reduction (DR); n is the number of subjects and p is the number of covariates involved in DR |
z2 |
an n by q matrix of study covariates (not in the dimension reduction); q is the number of study covariates |
T |
a M by 3 state matrix,
|
R |
the rank |
eps |
the tolerance for convergence; the default is 10^-5 |
B |
the number of bootstrap replicates |
tpoint |
a two-column matrix with the participants' visit timeline information |
ref |
a vector of reference categories |
a list of outputs:
coe: the coefficient matrix of the original data
sd: the standard error matrix
pvalue: the p-value matrix
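The general idea of bootstrap standard errors can be sketched with a toy estimator. This is not the package's sdfun(): the helper boot_se, the simulated data, the subject-level resampling scheme, and the Wald-type p-value are all assumptions made for illustration. Only the output names coe, sd and pvalue mirror the list described above.

```r
# Hypothetical sketch: resample subjects with replacement, refit, and use
# the spread of the refitted coefficients as the standard error.
boot_se <- function(ids, fit_fun, B) {
  est <- fit_fun(ids)
  reps <- replicate(B, fit_fun(sample(ids, length(ids), replace = TRUE)))
  reps <- matrix(reps, nrow = length(est))   # one column per replicate
  se <- apply(reps, 1, sd)
  z <- est / se                              # assumed Wald-type statistic
  list(coe = est, sd = se, pvalue = 2 * pnorm(-abs(z)))
}

# Toy data and estimator (a linear model stands in for the real fit)
set.seed(1)
dat <- data.frame(id = 1:50, x = rnorm(50))
dat$y <- 1 + 2 * dat$x + rnorm(50)
fit <- function(ids) coef(lm(y ~ x, data = dat[ids, ]))

out <- boot_se(dat$id, fit, B = 200)
out$sd
```

With longitudinal data, resampling a subject would mean carrying along all of that subject's visits, so that each participant's chain stays intact.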
# generate the Markov chain
U=7
I1=I2=I3=rep(1,7)
I4=c(0,0,0,1,1,1,1)
I5=I6=I7=rep(0,7)
I=rbind(I1,I2,I3,I4,I5,I6,I7)
# prepare the data
data=cogdat
n=length(unique(data[,1]))
M=nrow(data)+n
Mc=0
z=matrix(0,n,9)
colnames(z)=colnames(data)[5:13]
T=matrix(0,M,3)
for(i in 1:n){
  subdat=data[which(data[,1]==i),,drop=FALSE]
  z[i,]=subdat[1,5:13]
  mc=nrow(subdat)
  T[(Mc+1):(Mc+mc+1),1]=i
  T[(Mc+1):(Mc+mc+1),2]=0:mc
  T[(Mc+1):(Mc+mc+1),3]=c(subdat[1,3],subdat[,4])
  Mc=Mc+mc+1
}
#z1=z[,c(1:3),drop=FALSE]
z2=z[,4,drop=FALSE]
# find the standard deviation matrix for the model with rank 1
sdfun(I,z1=NULL,z2,T,1,eps = 9,2,ref=c(1,1,1,4))