Package 'ggmix' reference manual

Title:	Variable Selection in Linear Mixed Models for SNP Data
Description:	Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to simultaneously fit many genetic variants at the same time, in order to select markers that are independently associated with the response. Can also handle prior annotation information, for example, rare variants, in the form of variable weights. For more information, see the website below and the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <DOI:10.1371/journal.pgen.1008766>.
Authors:	Sahir Bhatnagar [aut, cre] (https://sahirbhatnagar.com/), Karim Oualkacha [aut] (http://karimoualkacha.uqam.ca/), Yi Yang [aut] (http://www.math.mcgill.ca/yyang/), Celia Greenwood [aut] (http://www.mcgill.ca/statisticalgenetics/)
Maintainer:	Sahir Bhatnagar <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.2
Built:	2025-03-05 03:01:51 UTC
Source:	https://github.com/sahirbhatnagar/ggmix

Simulated Dataset with 1D Geography

Description

A simulated dataset to show the utility of this package

Usage

admixed
admixed

Format

An object of class list of length 21.

Details

The code used to simulate the data is available at https://github.com/sahirbhatnagar/ggmix/blob/master/data-raw/bnpsd-data.R. See gen_structured_model for more details on the output and how the function used to simulate the data.

Value

A list with the following elements

ytrain: simulated response vector for training set
ytune: simulated response vector for tuning parameter selection set
ytest: simulated response vector for test set
xtrain: simulated design matrix for training set
xtune: simulated design matrix for tuning parameter selection set
xtest: simulated design matrix for testing set
xtrain_lasso: simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components
xtune_lasso: simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components
xtest: simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components
causal: character vector of the names of the causal SNPs
beta: the vector of true regression coefficients
kin_train: 2 times the estimated kinship for the training set individuals
kin_tune_train: The covariance matrix between the tuning set and the training set individuals
kin_test_train: The covariance matrix between the test set and training set individuals
Xkinship: the matrix of SNPs used to estimate the kinship matrix
not_causal: character vector of the non-causal SNPs
PC: the principal components for population structure adjustment

References

Ochoa, Alejandro, and John D. Storey. 2016a. "FST And Kinship for Arbitrary Population Structures I: Generalized Definitions." bioRxiv doi:10.1101/083915.

Ochoa, Alejandro, and John D. Storey. 2016b. "FST And Kinship for Arbitrary Population Structures II: Method of Moments Estimators." bioRxiv doi:10.1101/083923.

Examples

data(admixed)
str(admixed)
data(admixed)
str(admixed)

Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper

Description

Function that generates data of the different simulation studies presented in the accompanying paper. This function requires the popkin and bnpsd package to be installed.

Usage

gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)
gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)

Arguments

`n`	number of observations to simulate
`p_design`	number of variables in X_test, i.e., the design matrix
`p_kinship`	number of variable in X_kinship, i.e., matrix used to calculate kinship
`k`	number of intermediate subpopulations.
`s`	the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing
`Fst`	The desired final FST of the admixed individuals. Required if sigma is missing
`b0`	the true intercept parameter
`nPC`	number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects.
`eta`	the true eta parameter, which has to be `0 < eta < 1`
`sigma2`	the true sigma2 parameter
`geography`	the type of geography for simulation the kinship matrix. "ind" is independent populations where every individuals is actually unadmixed, "1d" is a 1D geography and "circ" is circular geography. Default: "ind". See the functions in the `bnpsd` for details on how this data is actually generated.
`percent_causal`	percentage of `p_design` that is causal. must be $0 \leq percent_causal \leq 1$ . The true regression coefficients are generated from a standard normal distribution.
`percent_overlap`	this represents the percentage of causal SNPs that will also be included in the calculation of the kinship matrix
`train_tune_test`	the proportion of sample size used for training tuning parameter selection and testing. default is 60/20/20 split

Details

The kinship is estimated using the popkin function from the popkin package. This function will multiple that kinship matrix by 2 to give the expected covariance matrix which is subsequently used in the linear mixed models

Value

A list with the following elements

ytrain: simulated response vector for training set
ytune: simulated response vector for tuning parameter selection set
ytest: simulated response vector for test set
xtrain: simulated design matrix for training set
xtune: simulated design matrix for tuning parameter selection set
xtest: simulated design matrix for testing set
xtrain_lasso: simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components
xtune_lasso: simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components
xtest: simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components
causal: character vector of the names of the causal SNPs
beta: the vector of true regression coefficients
kin_train: 2 times the estimated kinship for the training set individuals
kin_tune_train: The covariance matrix between the tuning set and the training set individuals
kin_test_train: The covariance matrix between the test set and training set individuals
Xkinship: the matrix of SNPs used to estimate the kinship matrix
not_causal: character vector of the non-causal SNPs
PC: the principal components for population structure adjustment

Examples

admixed <- gen_structured_model(n = 100,
                                p_design = 50,
                                p_kinship = 5e2,
                                geography = "1d",
                                percent_causal = 0.10,
                                percent_overlap = "100",
                                k = 5, s = 0.5, Fst = 0.1,
                                b0 = 0, nPC = 10,
                                eta = 0.1, sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)
admixed <- gen_structured_model(n = 100,
                                p_design = 50,
                                p_kinship = 5e2,
                                geography = "1d",
                                percent_causal = 0.10,
                                percent_overlap = "100",
                                k = 5, s = 0.5, Fst = 0.1,
                                b0 = 0, nPC = 10,
                                eta = 0.1, sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)

Fit Linear Mixed Model with Lasso or Group Lasso Regularization

Description

Main function to fit the linear mixed model with lasso or group lasso penalty for a sequence of tuning parameters. This is a penalized regression method that accounts for population structure using either the kinship matrix or the factored realized relationship matrix

Usage

ggmix(
  x,
  y,
  U,
  D,
  kinship,
  K,
  n_nonzero_eigenvalues,
  n_zero_eigenvalues,
  estimation = c("full"),
  penalty = c("lasso"),
  group,
  penalty.factor = rep(1, p_design),
  lambda = NULL,
  lambda_min_ratio = ifelse(n_design < p_design, 0.01, 1e-04),
  nlambda = 100,
  eta_init = 0.5,
  maxit = 100,
  fdev = 1e-20,
  standardize = FALSE,
  alpha = 1,
  thresh_glmnet = 1e-08,
  epsilon = 1e-04,
  dfmax = p_design + 2,
  verbose = 0
)
ggmix(
  x,
  y,
  U,
  D,
  kinship,
  K,
  n_nonzero_eigenvalues,
  n_zero_eigenvalues,
  estimation = c("full"),
  penalty = c("lasso"),
  group,
  penalty.factor = rep(1, p_design),
  lambda = NULL,
  lambda_min_ratio = ifelse(n_design < p_design, 0.01, 1e-04),
  nlambda = 100,
  eta_init = 0.5,
  maxit = 100,
  fdev = 1e-20,
  standardize = FALSE,
  alpha = 1,
  thresh_glmnet = 1e-08,
  epsilon = 1e-04,
  dfmax = p_design + 2,
  verbose = 0
)

Arguments

`x`	input matrix, of dimension n x p; where n is the number of observations and p are the number of predictors.
`y`	response variable. must be a quantitative variable
`U`	left singular vectors corresponding to the non-zero eigenvalues provided in the `D` argument.
`D`	non-zero eigenvalues. This option is provided to the user should they decide or need to calculate the eigen decomposition of the kinship matrix or the singular value decomposition of the matrix of SNPs used to calculate the kinship outside of this function. This may occur, if for example, it is easier (e.g. because of memory issues, it's easier to calculate in plink). This should correspond to the non-zero eigenvalues only. Note that if you are doing an `svd` on the matrix of SNPs used to calculate the kinship matrix, then you must provide the square of the singular values so that they correspond to the eigenvalues of the kinship matrix. If you want to use the low rank estimation algorithm, you must provide the truncated eigenvalues and eigenvectors to the `D` and `U` arguments, respectively. If you want `ggmix` to truncate the eigenvectors and eigenvalues for low rank estimation, then provide either `K` or `kinship` instead and specify `n_nonzero_eigenvalues`.
`kinship`	positive definite kinship matrix
`K`	the matrix of SNPs used to determine the kinship matrix
`n_nonzero_eigenvalues`	the number of nonzero eigenvalues. This argument is only used when `estimation="low"` and either `kinship` or `K` is provided. This argument will limit the function to finding the `n_nonzero_eigenvalues` largest eigenvalues. If `U` and `D` have been provided, then `n_nonzero_eigenvalues` defaults to the length of `D`.
`n_zero_eigenvalues`	Currently not being used. Represents the number of zero eigenvalues. This argument must be specified when `U` and `D` are specified and `estimation="low"`. This is required for low rank estimation because the number of zero eigenvalues and their corresponding eigenvalues appears in the likelihood. In general this would be the rank of the matrix used to calculate the eigen or singular value decomposition. When `kinship` is provided and `estimation="low"` the default value will be `nrow(kinship) - n_nonzero_eigenvalues`. When `K` is provided and `estimation="low"`, the default value is `rank(K) - n_nonzero_eigenvalues`
`estimation`	type of estimation. Currently only `type="full"` has been implemented.
`penalty`	type of regularization penalty. Currently, only penalty="lasso" has been implemented.
`group`	a vector of consecutive integers describing the grouping of the coefficients. Currently not implemented, but will be used when penalty="gglasso" is implemented.
`penalty.factor`	Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables
`lambda`	A user supplied lambda sequence (this is the tuning parameter). Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case.
`nlambda`	the number of lambda values - default is 100.
`eta_init`	initial value for the eta parameter, with $0 < \eta < 1$ used in determining lambda.max and starting value for fitting algorithm.
`maxit`	Maximum number of passes over the data for all lambda values; default is 10^2.
`fdev`	Fractional deviance change threshold. If change in deviance between adjacent lambdas is less than fdev, the solution path stops early. factory default = 1.0e-5
`standardize`	Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=FALSE. If variables are in the same units already, you might not wish to standardize.
`alpha`	The elasticnet mixing parameter, with $0 \leq \alpha \leq 1$ . alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
`thresh_glmnet`	Convergence threshold for coordinate descent for updating beta parameters. Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance. Defaults value is 1E-7
`epsilon`	Convergence threshold for block relaxation of the entire parameter vector $\Theta = ( \beta, \eta, \sigma^2 )$ . The algorithm converges when $crossprod(\Theta_{j+1} - \Theta_{j}) < \epsilon$ . Defaults value is 1E-4
`dfmax`	limit the maximum number of variables in the model. Useful for very large `p` (the total number of predictors in the design matrix), if a partial path is desired. Default is the number of columns in the design matrix + 2 (for the variance components)
`verbose`	display progress. Can be either 0,1 or 2. 0 will not display any progress, 2 will display very detailed progress and 1 is somewhere in between. Default: 0.

Value

list which includes the fitted object, lambda sequence, solution path, variance-covariance parameters, degrees of freedom, and singular values, vectors of kinship matrix

References

Bhatnagar, Sahir R, Yang, Yi, Lu, Tianyuan, Schurr, Erwin, Loredo-Osti, JC, Forest, Marie, Oualkacha, Karim, and Greenwood, Celia MT. (2020) Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models doi:10.1101/408484

Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010 https://www.jstatsoft.org/v33/i01/

Yang, Y., & Zou, H. (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing, 25(6), 1129-1141. https://www.math.mcgill.ca/yyang/resources/papers/STCO_gglasso.pdf

Examples

data(admixed)
fitlmm <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
                kinship = admixed$kin_train,
                estimation = "full")
gicfit <- gic(fitlmm)
coef(gicfit, type = "nonzero")
predict(gicfit, newx = admixed$xtest)[1:5,,drop=FALSE]
plot(gicfit)
plot(fitlmm)

data(admixed)
fitlmm <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
                kinship = admixed$kin_train,
                estimation = "full")
gicfit <- gic(fitlmm)
coef(gicfit, type = "nonzero")
predict(gicfit, newx = admixed$xtest)[1:5,,drop=FALSE]
plot(gicfit)
plot(fitlmm)

Constructor functions for the different ggmix objects

Description

new_fullrank_kinship, new_fullrank_K, new_fullrank_UD, new_lowrank_kinship, new_lowrank_K and new_lowrank_UD create the ggmix objects from the provided data that are necessary to fit the penalized linear mixed model according to the user's parameters.

Usage

new_fullrank_kinship(x, y, kinship)

new_fullrank_K(x, y, K)

new_fullrank_UD(x, y, U, D)

new_lowrank_kinship(x, y, kinship, n_nonzero_eigenvalues, n_zero_eigenvalues)

new_lowrank_K(x, y, K, n_nonzero_eigenvalues, n_zero_eigenvalues)

new_lowrank_UD(x, y, U, D, n_nonzero_eigenvalues, n_zero_eigenvalues)
new_fullrank_kinship(x, y, kinship)

new_fullrank_K(x, y, K)

new_fullrank_UD(x, y, U, D)

new_lowrank_kinship(x, y, kinship, n_nonzero_eigenvalues, n_zero_eigenvalues)

new_lowrank_K(x, y, K, n_nonzero_eigenvalues, n_zero_eigenvalues)

new_lowrank_UD(x, y, U, D, n_nonzero_eigenvalues, n_zero_eigenvalues)

Arguments

`x`	input matrix, of dimension n x p; where n is the number of observations and p are the number of predictors.
`y`	response variable. must be a quantitative variable
`kinship`	positive definite kinship matrix
`K`	the matrix of SNPs used to determine the kinship matrix
`U`	left singular vectors corresponding to the non-zero eigenvalues provided in the `D` argument.
`D`	non-zero eigenvalues. This option is provided to the user should they decide or need to calculate the eigen decomposition of the kinship matrix or the singular value decomposition of the matrix of SNPs used to calculate the kinship outside of this function. This may occur, if for example, it is easier (e.g. because of memory issues, it's easier to calculate in plink). This should correspond to the non-zero eigenvalues only. Note that if you are doing an `svd` on the matrix of SNPs used to calculate the kinship matrix, then you must provide the square of the singular values so that they correspond to the eigenvalues of the kinship matrix. If you want to use the low rank estimation algorithm, you must provide the truncated eigenvalues and eigenvectors to the `D` and `U` arguments, respectively. If you want `ggmix` to truncate the eigenvectors and eigenvalues for low rank estimation, then provide either `K` or `kinship` instead and specify `n_nonzero_eigenvalues`.
`n_nonzero_eigenvalues`	the number of nonzero eigenvalues. This argument is only used when `estimation="low"` and either `kinship` or `K` is provided. This argument will limit the function to finding the `n_nonzero_eigenvalues` largest eigenvalues. If `U` and `D` have been provided, then `n_nonzero_eigenvalues` defaults to the length of `D`.
`n_zero_eigenvalues`	the number of desired or specified zero eigenvalues. This is only needed when `estimation="lowrank"`, and is calculated internally by the `ggmix` function. It is equal to the number of observations minus `n_nonzero_eigenvalues`

Value

A ggmix object, of the class that corresponds to the estimation method. These objects are lists that contain the data necessary for computation. These functions are not meant to be called directly by the user

Generalised Information Criterion

Description

Calculates the generalised information criterion for each value of the tuning parameter lambda

Usage

gic(ggmix_fit, ...)

## Default S3 method:
gic(ggmix_fit, ...)

## S3 method for class 'ggmix_fit'
gic(ggmix_fit, ..., an = log(log(n)) * log(p))
gic(ggmix_fit, ...)

## Default S3 method:
gic(ggmix_fit, ...)

## S3 method for class 'ggmix_fit'
gic(ggmix_fit, ..., an = log(log(n)) * log(p))

Arguments

`ggmix_fit`	An object of class `ggmix_fit` which is outputted by the `ggmix` function
`...`	other parameters. currently ignored.
`an`	numeric, the penalty per parameter to be used; the default is an = log(log(n))*log(p) where n is the number of subjects and p is the number of parameters

Details

the generalised information criterion used for gaussian response is given by

$-2 * loglikelihood(\hat{\Theta}) + an * df$

where df is the number of non-zero estimated parameters, including variance components

Value

an object with S3 class "ggmix_gic", "ggmix_fit", "*" and "**" where "*" is "lasso" or "gglasso" and "**" is fullrank or lowrank. Results are provided for converged values of lambda only.

ggmix_fit: the ggmix_fit object
lambda: the sequence of converged tuning parameters
nzero: the number of non-zero estimated coefficients including the 2 variance parameters which are not penalized and therefore always included
gic: gic value. a numeric vector with length equal to length(lambda)
lambda.min.name: a character corresponding to the name of the tuning parameter lambda which minimizes the gic
lambda.min: the value of lambda which minimizes the gic

References

Fan Y, Tang CY. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2013 Jun 1;75(3):531-52.

Nishii R. Asymptotic properties of criteria for selection of variables in multiple regression. The Annals of Statistics. 1984;12(2):758-65.

Functions related to eta parameter used in optim and kkt checks

Description

Used for gradient of eta. Currently being passed to optim in lmmlasso and used in kkt_check

Usage

gr_eta_lasso_fullrank(eta, sigma2, beta, eigenvalues, x, y, nt)

fn_eta_lasso_fullrank(eta, sigma2, beta, eigenvalues, x, y, nt)
gr_eta_lasso_fullrank(eta, sigma2, beta, eigenvalues, x, y, nt)

fn_eta_lasso_fullrank(eta, sigma2, beta, eigenvalues, x, y, nt)

Arguments

`eta`	current estimate of the eta parameter
`sigma2`	current estimate of the sigma2 parameter
`beta`	current estimate of the beta parameter including the intercept. this should be of length p+1, where p is the number of variables.
`eigenvalues`	non-zero eigenvalues of the kinship matrix, or the square of the singular values of the matrix used to construct the kinship matrix
`x`	input matrix, of dimension n x p; where n is the number of observations and p are the number of predictors.
`y`	response variable. must be a quantitative variable
`nt`	total number of observations

Value

returns the value of the function and gradient for the eta variance parameter

Karim's Simulated Data

Description

A simulated dataset with a kinship matrix

Usage

karim
karim

Format

A list with 6 elements:

b: vector of length 1000 representing the true regression coefficients. 10 non-zero coefficients, the rest are 0.
kin1: the true kinship matrix
s.g: polygenic variance, set to be 1.26
s.e: error variance, set to be 1
h.tot: the total trait heritability. Set to be 60 of genotypes of dimension 600 x 1000 SNPs, with approximately 800 common and 200 rare SNPs

Details

If you simulate data using the scenario provided in the example, then the QTL heritability of y will be 8 of the trait’s total heritability), and the trait total heritability is set to be 60

Examples

data(karim)
# Simulate a response using the genotype matrix and the kinship matrix
Phi <- 2 * karim$kin1
intercept <- 1
P <- MASS::mvrnorm(1, rep(0,600), karim$s.g * Phi)
y <- intercept + karim$G %*% karim$b + P + rnorm(600,0,karim$s.e)
data(karim)
# Simulate a response using the genotype matrix and the kinship matrix
Phi <- 2 * karim$kin1
intercept <- 1
P <- MASS::mvrnorm(1, rep(0,600), karim$s.g * Phi)
y <- intercept + karim$G %*% karim$b + P + rnorm(600,0,karim$s.e)

Check of KKT Conditions for Linear Mixed Model

Description

This function checks the KKT conditions

Usage

kkt_check(eta, sigma2, beta, eigenvalues, x, y, nt, lambda, tol.kkt = 0.001)

grr_sigma2(eta, sigma2, beta, eigenvalues, x, y, nt)

grr_beta0(eta, sigma2, beta, eigenvalues, x, y, nt)
kkt_check(eta, sigma2, beta, eigenvalues, x, y, nt, lambda, tol.kkt = 0.001)

grr_sigma2(eta, sigma2, beta, eigenvalues, x, y, nt)

grr_beta0(eta, sigma2, beta, eigenvalues, x, y, nt)

Arguments

`eta`	current estimate of the eta parameter
`sigma2`	current estimate of the sigma2 parameter
`beta`	current estimate of the beta parameter including the intercept. this should be of length p+1, where p is the number of variables.
`eigenvalues`	non-zero eigenvalues of the kinship matrix, or the square of the singular values of the matrix used to construct the kinship matrix
`x`	rotated x. Should be U^T X, where U is the matrix of eigenvectors and X contains the first column of ones for the intercept. x should be a mtrix of dimension n x (p+1). These are outputted by the constructor functions. See `ggmix_data_object` for details
`y`	rotated y. Should be U^T Y, where U is the matrix of eigenvectors and Y is the response.
`nt`	total number of observations
`lambda`	A user supplied lambda sequence (this is the tuning parameter). Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.
`tol.kkt`	Tolerance for determining if an entry of the subgradient is zero

Value

returns the values of the gradient for each of the parameters

Note

grr_sigma2 and grr_beta0 are functions for the gradient of sigma2 and beta0, respectively

Estimation of Lambda Sequence for Linear Mixed Model with Lasso Penalty

Description

lambdalasso estimates a decreasing sequence of tuning parameters

Usage

lambdalasso(ggmix_object, ...)

## Default S3 method:
lambdalasso(ggmix_object, ...)

## S3 method for class 'fullrank'
lambdalasso(
  ggmix_object,
  ...,
  penalty.factor,
  lambda_min_ratio,
  epsilon = 1e-14,
  tol.kkt = 1e-09,
  eta_init = 0.5,
  nlambda = 100,
  scale_x = FALSE,
  center_y = FALSE
)
lambdalasso(ggmix_object, ...)

## Default S3 method:
lambdalasso(ggmix_object, ...)

## S3 method for class 'fullrank'
lambdalasso(
  ggmix_object,
  ...,
  penalty.factor,
  lambda_min_ratio,
  epsilon = 1e-14,
  tol.kkt = 1e-09,
  eta_init = 0.5,
  nlambda = 100,
  scale_x = FALSE,
  center_y = FALSE
)

Arguments

`ggmix_object`	A ggmix_object object of class `lowrank` or `fullrank`
`...`	Extra parameters. Currently ignored.
`penalty.factor`	Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case.
`epsilon`	Convergence threshold for block relaxation of the entire parameter vector $\Theta = ( \beta, \eta, \sigma^2 )$ . The algorithm converges when $crossprod(\Theta_{j+1} - \Theta_{j}) < \epsilon$ . Defaults value is 1E-4
`tol.kkt`	KKT tolerance. Currently ignored
`eta_init`	initial value for the eta parameter, with $0 < \eta < 1$ used in determining lambda.max and starting value for fitting algorithm.
`nlambda`	the number of lambda values - default is 100.
`scale_x`	should the columns of x be scaled - default is FALSE
`center_y`	should y be mean centered - default is FALSE.

Value

A decreasing sequence of tuning parameters

Note

This function isn't meant to be called directly by the user.

Estimation of Linear Mixed Model with Lasso Penalty

Description

lmmlasso estimates the linear mixed model with lasso penalty

Usage

lmmlasso(ggmix_object, ...)

## Default S3 method:
lmmlasso(ggmix_object, ...)

## S3 method for class 'fullrank'
lmmlasso(
  ggmix_object,
  ...,
  penalty.factor,
  lambda,
  lambda_min_ratio,
  nlambda,
  n_design,
  p_design,
  eta_init,
  maxit,
  fdev,
  standardize,
  alpha,
  thresh_glmnet,
  epsilon,
  dfmax,
  verbose
)
lmmlasso(ggmix_object, ...)

## Default S3 method:
lmmlasso(ggmix_object, ...)

## S3 method for class 'fullrank'
lmmlasso(
  ggmix_object,
  ...,
  penalty.factor,
  lambda,
  lambda_min_ratio,
  nlambda,
  n_design,
  p_design,
  eta_init,
  maxit,
  fdev,
  standardize,
  alpha,
  thresh_glmnet,
  epsilon,
  dfmax,
  verbose
)

Arguments

`ggmix_object`	A ggmix_object object of class `lowrank` or `fullrank`
`...`	Extra parameters. Currently ignored.
`penalty.factor`	Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables
`lambda`	A user supplied lambda sequence (this is the tuning parameter). Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.
`lambda_min_ratio`	Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case.
`nlambda`	the number of lambda values - default is 100.
`n_design`	total number of observations
`p_design`	number of variables in the design matrix, excluding the intercept column
`eta_init`	initial value for the eta parameter, with $0 < \eta < 1$ used in determining lambda.max and starting value for fitting algorithm.
`maxit`	Maximum number of passes over the data for all lambda values; default is 10^2.
`fdev`	Fractional deviance change threshold. If change in deviance between adjacent lambdas is less than fdev, the solution path stops early. factory default = 1.0e-5
`standardize`	Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=FALSE. If variables are in the same units already, you might not wish to standardize.
`alpha`	The elasticnet mixing parameter, with $0 \leq \alpha \leq 1$ . alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
`thresh_glmnet`	Convergence threshold for coordinate descent for updating beta parameters. Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance. Defaults value is 1E-7
`epsilon`	Convergence threshold for block relaxation of the entire parameter vector $\Theta = ( \beta, \eta, \sigma^2 )$ . The algorithm converges when $crossprod(\Theta_{j+1} - \Theta_{j}) < \epsilon$ . Defaults value is 1E-4
`dfmax`	limit the maximum number of variables in the model. Useful for very large `p` (the total number of predictors in the design matrix), if a partial path is desired. Default is the number of columns in the design matrix + 2 (for the variance components)
`verbose`	display progress. Can be either 0,1 or 2. 0 will not display any progress, 2 will display very detailed progress and 1 is somewhere in between. Default: 0.

Value

A object of class ggmix

Estimation of Log-likelihood for Linear Mixed Model with Lasso Penalty

Description

sigma2lasso estimates the value of the sigma2 for the linear mixed model with lasso penalty

Usage

logliklasso(ggmix_object, ...)

## Default S3 method:
logliklasso(ggmix_object, ...)

## S3 method for class 'fullrank'
logliklasso(ggmix_object, ..., eta, sigma2, beta, nt, x, y)
logliklasso(ggmix_object, ...)

## Default S3 method:
logliklasso(ggmix_object, ...)

## S3 method for class 'fullrank'
logliklasso(ggmix_object, ..., eta, sigma2, beta, nt, x, y)

Arguments

`ggmix_object`	A ggmix_object object of class `lowrank` or `fullrank`
`...`	Extra parameters. Currently ignored.
`eta`	current estimate of the eta parameter
`sigma2`	current estimate of the sigma2 parameter
`beta`	current estimate of the beta parameter including the intercept. this should be of length p+1, where p is the number of variables.
`nt`	total number of observations
`x`	input matrix, of dimension n x p; where n is the number of observations and p are the number of predictors.
`y`	response variable. must be a quantitative variable

Value

A decreasing sequence of tuning parameters

Note

This function isn't meant to be called directly by the user.

Plot Method for `ggmix_fit` object

Description

Produces a coefficient profile plot of the coefficient paths for a fitted ggmix_fit object.

Usage

## S3 method for class 'ggmix_fit'
plot(x, ..., xvar = c("norm", "lambda", "dev"), label = FALSE, sign.lambda = 1)

plotCoef(
  beta,
  norm,
  lambda,
  df,
  dev,
  label = FALSE,
  xvar = c("norm", "lambda", "dev"),
  xlab = iname,
  ylab = "Coefficients",
  ...
)
## S3 method for class 'ggmix_fit'
plot(x, ..., xvar = c("norm", "lambda", "dev"), label = FALSE, sign.lambda = 1)

plotCoef(
  beta,
  norm,
  lambda,
  df,
  dev,
  label = FALSE,
  xvar = c("norm", "lambda", "dev"),
  xlab = iname,
  ylab = "Coefficients",
  ...
)

Arguments

`x`	a `ggmix_fit` object
`...`	other graphical parameters passed to `plot`
`xvar`	What is on the X-axis. "norm" plots against the L1-norm of the coefficients, "lambda" against the log-lambda sequence, and "dev" against the percent deviance explained.
`label`	If TRUE, label the curves with variable sequence numbers.
`sign.lambda`	Either plot against log(lambda) (default) or its negative if sign.lambda=-1
`beta`	fixed effects estimates
`norm`	l1 norm of fixed effect estimates. if missing, (default) this function will calculate it
`lambda`	sequence of tuning parameters
`df`	number of non-zero fixed + random effects
`dev`	percent deviance
`xlab`	x-axis label
`ylab`	y-axis label

Details

A coefficient profile plot is produced

Value

A plot is produced and nothing is returned

Plot the Generalised Information Criteria curve produced by `gic`

Description

Plots the Generalised Information Criteria curve, as a function of the lambda values used

Usage

## S3 method for class 'ggmix_gic'
plot(
  x,
  ...,
  sign.lambda = 1,
  type = c("gic", "QQranef", "QQresid", "predicted", "Tukey-Anscombe"),
  s = "lambda.min",
  newy,
  newx
)

plotGIC(x, sign.lambda, lambda.min, ...)
## S3 method for class 'ggmix_gic'
plot(
  x,
  ...,
  sign.lambda = 1,
  type = c("gic", "QQranef", "QQresid", "predicted", "Tukey-Anscombe"),
  s = "lambda.min",
  newy,
  newx
)

plotGIC(x, sign.lambda, lambda.min, ...)

Arguments

`x`	fitted linear mixed model object of class `ggmix_gic` from the `gic` function
`...`	Other graphical parameters to plot
`sign.lambda`	Either plot against log(lambda) (default) or its negative if sign.lambda=-1
`type`	`gic` returns a plot of the GIC vs. log(lambda). `QQranef` return a qqplot of the random effects. `QQresid` returns a qqplot of the residuals which is $y - X\beta - b_i$ where b_i is the subject specific random effect. `predicted` returns a plot of the predicted response ( $X \beta$ + b_i) vs. the observed response, where b_i is the subject specific random effect. `Tukey-Anscombe` returns a plot of the residuals vs. fitted values ( $X \beta$ )
`s`	Value of the penalty parameter `lambda` at which predictions are required. Default is the value `s="lambda.min"`. If `s` is numeric, it is taken as the value of `lambda` to be used. Must be a single value of the penalty parameter `lambda` at which coefficients will be extracted via the `coef` method for objects of class `ggmix_gic`. If more than one is supplied, only the first one will be used.
`newy`	the response variable that was provided to `ggmix`. this is only required for `type="QQresis"`, `type="Tukey-Anscombe"` and `type="predicted"`
`newx`	matrix of values for `x` at which predictions are to be made. Do not include the intercept. this is only required for `type="QQresis"`, `type="Tukey-Anscombe"` and `type="predicted"`
`lambda.min`	the value of lambda which minimizes the gic

Details

A plot is produced, and nothing is returned.

Value

plot depends on the type selected

Examples

data("admixed")
fit <- ggmix(x = admixed$xtrain,
             y = admixed$ytrain,
             kinship = admixed$kin_train)
hdbic <- gic(fit)

# plot solution path
plot(fit)

# plot HDBIC curve as a function of lambda
plot(hdbic)
data("admixed")
fit <- ggmix(x = admixed$xtrain,
             y = admixed$ytrain,
             kinship = admixed$kin_train)
hdbic <- gic(fit)

# plot solution path
plot(fit)

# plot HDBIC curve as a function of lambda
plot(hdbic)

Make predictions from a `ggmix_fit` object

Description

Similar to other predict methods, this functions predicts fitted values, coefficients and more from a fitted ggmix_fit object.

Usage

## S3 method for class 'ggmix_fit'
predict(
  object,
  newx,
  s = NULL,
  type = c("link", "response", "coefficients", "all", "nonzero", "individual"),
  covariance,
  ...
)

## S3 method for class 'ggmix_fit'
coef(object, s = NULL, type, ...)
## S3 method for class 'ggmix_fit'
predict(
  object,
  newx,
  s = NULL,
  type = c("link", "response", "coefficients", "all", "nonzero", "individual"),
  covariance,
  ...
)

## S3 method for class 'ggmix_fit'
coef(object, s = NULL, type, ...)

Arguments

`object`	Fitted `ggmix_fit` model object from the `ggmix` function
`newx`	matrix of values for `x` at which predictions are to be made. Do not include the intercept. Must be a matrix. This argument is not used for `type = c("coefficients","nonzero","all")`. This matrix must have the same number of columns originally supplied to the `ggmix` fitting function.
`s`	Value(s) of the penalty parameter `lambda` at which predictions are required. Default is the entire sequence used to create the model.
`type`	Type of prediction required. Type `"link"` gives the fitted values $X \beta$ . Type `"response"` is equivalent to type `"link"`. Type `"coefficients"` computes the coefficients at the requested values for `s` and returns the regression coefficients only, including the intercept. Type `"all"` returns both the regression coefficients and variance components at the requested value of `s`. Type `"nonzero"` returns a 1 column matrix of the the nonzero fixed effects, as well as variance components for each value of `s`. If more than one `s` is provided, then `"nonzero"` will return a list of 1 column matrices. Default: "link"
`covariance`	covariance between test and training individuals. if there are q testing individuals and N-q training individuals, then this covariance matrix is q x (N-q)
`...`	additional arguments to pass to predict function

Details

s is the new vector at which predictions are requested. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda indices. coef(...) is equivalent to predict(ggmix_fit, type="coefficients",...). To get individual level predictions at each value of lambda, you must provide the lambda sequence to the s argument. You can pass either a ggmix_fit or ggmix_gic object. See examples for more details.

Value

The object returned depends on type.

Examples

data("admixed")
fitlmm <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
                kinship = admixed$kin_train,
                estimation = "full")
bicGGMIX <- gic(fitlmm,
                an = log(length(admixed$ytrain)))
plot(bicGGMIX)
coef(bicGGMIX, s = "lambda.min")
yhat_test <- predict(bicGGMIX, s="lambda.min",
                     newx = admixed$xtest, type = "individual",
                     covariance = admixed$kin_test_train)
cor(yhat_test, admixed$ytest)
yhat_test_population <- predict(bicGGMIX, s="lambda.min",
                                newx = admixed$xtest,
                                type = "response")
data("admixed")
fitlmm <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
                kinship = admixed$kin_train,
                estimation = "full")
bicGGMIX <- gic(fitlmm,
                an = log(length(admixed$ytrain)))
plot(bicGGMIX)
coef(bicGGMIX, s = "lambda.min")
yhat_test <- predict(bicGGMIX, s="lambda.min",
                     newx = admixed$xtest, type = "individual",
                     covariance = admixed$kin_test_train)
cor(yhat_test, admixed$ytest)
yhat_test_population <- predict(bicGGMIX, s="lambda.min",
                                newx = admixed$xtest,
                                type = "response")

Make predictions from a `ggmix_gic` object

Description

This function makes predictions from a ggmix_gic object, using the stored "ggmix_fit" object, and the optimal value chosen for lambda using the gic.

Usage

## S3 method for class 'ggmix_gic'
predict(object, newx, s = c("lambda.min"), ...)

## S3 method for class 'ggmix_gic'
coef(object, s = c("lambda.min"), type, ...)
## S3 method for class 'ggmix_gic'
predict(object, newx, s = c("lambda.min"), ...)

## S3 method for class 'ggmix_gic'
coef(object, s = c("lambda.min"), type, ...)

Arguments

`object`	fitted `ggmix_gic` object
`newx`	matrix of values for `x` at which predictions are to be made. Do not include the intercept. Must be a matrix. This argument is not used for `type = c("coefficients","nonzero","all")`. This matrix must have the same number of columns originally supplied to the `ggmix` fitting function.
`s`	Value(s) of the penalty parameter `lambda` at which predictions are required. Default is the value `s="lambda.min"` can be used. If `s` is numeric, it is taken as the value(s) of `lambda` to be used.
`...`	other arguments passed to `predict.ggmix_fit`
`type`	Type of prediction required. Type `"link"` gives the fitted values $X \beta$ . Type `"response"` is equivalent to type `"link"`. Type `"coefficients"` computes the coefficients at the requested values for `s` and returns the regression coefficients only, including the intercept. Type `"all"` returns both the regression coefficients and variance components at the requested value of `s`. Type `"nonzero"` returns a 1 column matrix of the the nonzero fixed effects, as well as variance components for each value of `s`. If more than one `s` is provided, then `"nonzero"` will return a list of 1 column matrices. Default: "link"

Details

This function makes it easier to use the results of gic chosen model to make a prediction.

Value

The object returned depends the ... argument which is passed on to the predict method for ggmix_fit objects.

Print Method for Objects of Class `ggmix_fit`

Description

print method for objects of class ggmix_fit

Usage

## S3 method for class 'ggmix_fit'
print(x, ..., digits = max(3, getOption("digits") - 3))

## S3 method for class 'ggmix_gic'
print(x, ..., digits = max(3, getOption("digits") - 3))
## S3 method for class 'ggmix_fit'
print(x, ..., digits = max(3, getOption("digits") - 3))

## S3 method for class 'ggmix_gic'
print(x, ..., digits = max(3, getOption("digits") - 3))

Arguments

`x`	an object of class objects of class `ggmix_fit`
`...`	other arguments passed to `print`
`digits`	significant digits in printout. Default: `max(3, getOption("digits") - 3)`

Value

The call that produced the object x is printed, followed by a three-column matrix with columns Df, %Dev, and and Lambda. The Df columns correspond to the number of nonzero coefficients including variance components. %dev is the percent deviance explained (relative to the null deviance). Lambda is the sequence of converged tuning parameters.

Extract Random Effects

Description

Generic function for extracting the random effects. This is the same generic (and same name) defined in the nlme and lme4 package.

Usage

ranef(object, ...)

random.effects(object, ...)

## Default S3 method:
random.effects(object, ...)

## Default S3 method:
ranef(object, ...)

## S3 method for class 'ggmix_gic'
ranef(object, s = "lambda.min", ...)
ranef(object, ...)

random.effects(object, ...)

## Default S3 method:
random.effects(object, ...)

## Default S3 method:
ranef(object, ...)

## S3 method for class 'ggmix_gic'
ranef(object, s = "lambda.min", ...)

Arguments

`object`	any fitted model object from which random effects estimates can be extracted. Currently supports "ggmix_gic" objects outputted by the `gic` function
`...`	other parameters. currently ignored
`s`	Value(s) of the penalty parameter `lambda` at which predictions are required. Default is the value `s="lambda.min"` can be used. If `s` is numeric, it is taken as the value(s) of `lambda` to be used.

Details

For objects of class "ggmix_gic", this function returns the subject-specific random effect value for the model which minimizes the GIC using the maximum a posteriori principle

Value

a numeric vector of length equal to the number of observations of subject-specific random effects

Examples

data("admixed")
fit <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
            kinship = admixed$kin_train)
gicfit <- gic(fit)
# random effect at selected value of lambda
plot(ggmix::ranef(gicfit))
# random effects at specific values of lambda
head(ggmix::ranef(gicfit, s = c(0.1,0.2)))
data("admixed")
fit <- ggmix(x = admixed$xtrain, y = admixed$ytrain,
            kinship = admixed$kin_train)
gicfit <- gic(fit)
# random effect at selected value of lambda
plot(ggmix::ranef(gicfit))
# random effects at specific values of lambda
head(ggmix::ranef(gicfit, s = c(0.1,0.2)))

Estimation of Sigma2 for Linear Mixed Model with Lasso Penalty

Description

sigma2lasso estimates the value of the sigma2 for the linear mixed model with lasso penalty

Usage

sigma2lasso(ggmix_object, ...)

## Default S3 method:
sigma2lasso(ggmix_object, ...)

## S3 method for class 'fullrank'
sigma2lasso(ggmix_object, ..., n, beta, eta)
sigma2lasso(ggmix_object, ...)

## Default S3 method:
sigma2lasso(ggmix_object, ...)

## S3 method for class 'fullrank'
sigma2lasso(ggmix_object, ..., n, beta, eta)

Arguments

`ggmix_object`	A ggmix_object object of class `lowrank` or `fullrank`
`...`	Extra parameters. Currently ignored.
`n`	number of observations
`beta`	current estimate of the beta parameter including the intercept. this should be of length p+1, where p is the number of variables.
`eta`	current estimate of the eta parameter

Value

A decreasing sequence of tuning parameters

Note

There is a closed form solution for sigma^2, given beta and eta. This function isn't meant to be called directly by the user.

Package 'ggmix'

Help Index

Simulated Dataset with 1D Geography

Description

Usage

Format

Details

Value

References

Examples

Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper

Description

Usage

Arguments

Details

Value

See Also

Examples

Fit Linear Mixed Model with Lasso or Group Lasso Regularization

Description

Usage

Arguments

Value

References

Examples

Constructor functions for the different ggmix objects

Description

Usage

Arguments

Value

See Also

Generalised Information Criterion

Description

Usage

Arguments

Details

Value

References

See Also

Functions related to eta parameter used in optim and kkt checks

Description

Usage

Arguments

Value

See Also

Karim's Simulated Data

Description

Usage

Format

Details

Examples

Check of KKT Conditions for Linear Mixed Model

Description

Usage

Arguments

Value

Note

Estimation of Lambda Sequence for Linear Mixed Model with Lasso Penalty

Description

Usage

Arguments

Value

Note

See Also

Estimation of Linear Mixed Model with Lasso Penalty

Description

Usage

Arguments

Value

See Also

Estimation of Log-likelihood for Linear Mixed Model with Lasso Penalty

Description

Usage

Arguments

Value

Note

See Also

Plot Method for ggmix_fit object

Description

Usage

Plot Method for `ggmix_fit` object

Plot the Generalised Information Criteria curve produced by `gic`

Make predictions from a `ggmix_fit` object

Make predictions from a `ggmix_gic` object

Print Method for Objects of Class `ggmix_fit`