--- title: "Alternative Inputs for Population Structure" author: "Sahir Rai Bhatnagar" date: "`r Sys.Date()`" output: html_document: toc: true toc_float: true #code_folding: hide fig_retina: null vignette: > %\VignetteIndexEntry{Alternative Inputs for Population Structure} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` There are three different ways that the `ggmix` function can take the population structure information: 1) `kinship`: A $n \times n$ kinship matrix 2) `K`: A $n \times k$ matrix of SNPs used to determine the relatedness between subjects 3) `U`: the eigenvectors of the kinship matrix or the left singular vectors of `K`, and `D`: the eigenvalues of the kinship matrix or the square of the singular values of `K` We provide this flexibility to allow for a low rank estimation procedure (not yet implemented), and in case the user has already calculated this information elsewhere. ## The kinship argument We have already seen how to use the `kinship` argument: ```{r} library(ggmix) data("admixed") # supply p.fac to the penalty.factor argument res <- ggmix(x = admixed$xtrain, y = admixed$ytrain, kinship = admixed$kin_train) plot(res) coef(res, s = 0.059, type = "nonzero") ``` ## The U and D arguments Alternatively we can directly supply the eigenvectors and eigenvalues of the kinship matrix: ```{r} eigKinship <- eigen(admixed$kin_train) resUD <- ggmix(x = admixed$xtrain, y = admixed$ytrain, U = eigKinship$vectors[,which(eigKinship$values>0)], D = eigKinship$values[which(eigKinship$values>0)]) plot(resUD) coef(resUD, s = 0.059, type = "nonzero") ``` We can also run a singular value decomposition (SVD) on the matrix of SNPs used to construct the kinship matrix. This approach is computationally less expensive since it bypasses explicit computation of the kinship matrix. ```{r} svdXkinship <- svd(admixed$Xkinship) resUDsvd <- ggmix(x = admixed$xtrain, y = admixed$ytrain, U = svdXkinship$u, D = svdXkinship$d ^ 2) plot(resUDsvd) coef(resUDsvd, s = 0.059, type = "nonzero") ``` The results differ from the one using the kinship matrix above because this simulated data comes from an admixed population, and therefore, using a matrix of SNPs in this case may not be appropriate. The kinship matrix in the `admixed` data was calculated using the `popkin` function from the `popkin` package. ## The K argument We can directly supply the matrix of SNPs used to calculated the kinship matrix also to the `K` argument: ```{r} resK <- ggmix(x = admixed$xtrain, y = admixed$ytrain, K = admixed$kin_train) plot(resK) coef(resK, s = 0.059, type = "nonzero") ```