--- title: "Evaluate continuous biomarkers with caROC" author: "Ziyi Li (zli16@mdanderson.org)" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Evaluate continuous biomarkers with caROC} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `caROC` is an R package devoted to the assessment of continuous biomarkers. The metrics considered include **specificity at contolled sensitivity level**, **sensitivity at controlled specificity level**, and **receiver operating characteristic** (ROC) curve. If evaluation in specific sub-population is interested, all these statistics can also be computed in the version sub-population specific analysis. We allow both categorical and continuous covariates to be adjusted in computing these metrics. ## Installation and quick start ### Install caROC Install `caROC` through ```{r install, eval=FALSE, message=FALSE, warning=FALSE} library(devtools) install_github("ziyili20/caROC") ``` ### How to get help for caROC Any caROC questions should be posted to the GitHub Issue section of caROC homepage at https://github.com/ziyili20/caROC/issues. ### Quick start on evaluating continuous biomarker with covariates adjusted ```{r quick_start, eval = FALSE, message=FALSE} library(caROC) ## get specificity at controlled sensitivity levels 0.2, 0.8, 0.9 caROC(diseaseData,controlData,formula, control_sensitivity = c(0.2,0.8, 0.9), control_specificity = NULL) ## get covariate-adjusted ROC curve with curve-based monotonizing method curveROC <- caROC(diseaseData,controlData,formula, mono_resp_method = "curve", verbose = FALSE) ``` ### Illustrating the usage of caROC in details The tutorial is based on a simulation dataset: ```{r getdata, eval = TRUE, warning=FALSE, message=FALSE} library(caROC) ### n1: number of cases ### n0: number of controls n1 = n0 = 1000 ## Z_D and Z_C are the covariates in the disease and control groups Z_D1 <- rbinom(n1, size = 1, prob = 0.3) Z_D2 <- rnorm(n1, 0.8, 1) Z_C1 <- rbinom(n0, size = 1, prob = 0.7) Z_C2 <- rnorm(n0, 0.8, 1) Y_C_Z0 <- rnorm(n0, 0.1, 1) Y_D_Z0 <- rnorm(n1, 1.1, 1) Y_C_Z1 <- rnorm(n0, 0.2, 1) Y_D_Z1 <- rnorm(n1, 0.9, 1) ## M0 and M1 are the outcome of interest (biomarker to be evaluated) in the control and disease groups M0 <- Y_C_Z0 * (Z_C1 == 0) + Y_C_Z1 * (Z_C1 == 1) + Z_C2 M1 <- Y_D_Z0 * (Z_D1 == 0) + Y_D_Z1 * (Z_D1 == 1) + 1.5 * Z_D2 diseaseData <- data.frame(M = M1, Z1 = Z_D1, Z2 = Z_D2) controlData <- data.frame(M = M0, Z1 = Z_C1, Z2 = Z_C2) ## we are interested in evaluating biomarker M while adjusting for covariate Z userFormula = "M~Z1+Z2" ``` ## 1. Covariate-adjusted sensitivity at controlled specificity level (or the reverse) ### 1.1 Compute pooled sensitivity at controlled specificed level One can easily compute covariate-adjusted specificity at controlled sensitivity levels by specifying `control_sensitivity` and leaving `control_specificity` NULL. `mono_resp_method` is to choose which monotonicity restoration method to use, "none" or "ROC". `whichSE` is to choose how to compute standard error. It could be "boostrap" or "numerical", i.e. boostrap-based or sample-based SE. Try ?caROC to see more details of these arguments. ```{r controlspec, warning=FALSE, message=FALSE} caROC(diseaseData,controlData,userFormula, control_sensitivity = c(0.2,0.8, 0.9), control_specificity = NULL, mono_resp_method = "ROC", whichSE = "bootstrap",nbootstrap = 100, CI_alpha = 0.95, logit_CI = TRUE) ``` To compute covariate-adjusted sensitivity at controlled specificity levels by specifying `control_specificity` and leaving `control_sensitivity` NULL. ```{r controlsens, warning=FALSE, message=FALSE} caROC(diseaseData,controlData,userFormula, control_sensitivity = NULL, control_specificity = c(0.7,0.8, 0.9), mono_resp_method = "none", whichSE = "sample",nbootstrap = 100, CI_alpha = 0.95, logit_CI = TRUE) ``` ### 1.2 Sub-population specific sensitivity at controlled specificity level Give the covariates of a subpopulation, we can also computed sensitivity at controlled specificity level. ```{r controlspecss, warning=FALSE, message=FALSE} target_covariates = c(1, 0.7, 0.9) sscaROC(diseaseData,controlData, userFormula = userFormula, control_sensitivity = c(0.2,0.8, 0.9), target_covariates = target_covariates, control_specificity = NULL, mono_resp_method = "none", whichSE = "sample",nbootstrap = 100, CI_alpha = 0.95, logit_CI = TRUE) ``` You can also specific covariates for multiple subpopualtions: ```{r multicontrolspecss, message=FALSE, warning=FALSE} target_covariates = matrix(c(1, 0.7, 0.9, 1, 0.8, 0.8), 2, 3, byrow = TRUE) sscaROC(diseaseData,controlData, userFormula = userFormula, control_sensitivity = c(0.2,0.8, 0.9), target_covariates = target_covariates, control_specificity = NULL, mono_resp_method = "none", whichSE = "sample",nbootstrap = 100, CI_alpha = 0.95, logit_CI = TRUE) ``` ## 2. Covariate-adjusted ROC curve ### 2.1 Pooled ROC Obtaining the covariate-adjusted ROC curve with sensitivity controlled through the whole spectrum is very easy. You can choose restoring monotonicity or no restoration when constructing ROC through argument `mono_resp_method`. It could be "none" (no monotonicity restoration) or "ROC" (curve-based monotonicity restoration). ```{r ROC, warning=FALSE, message=FALSE} ### ROC with curve-based monotonicity restoration curveROC <- caROC(diseaseData,controlData,userFormula, mono_resp_method = "ROC", verbose = FALSE) ``` Plot the ROC curves: ```{r plotROC, warning=FALSE, message=FALSE} oldpar <- par() par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0)) plot_caROC(curveROC) par(oldpar) ``` Construct confidence-band for the ROC curve: ```{r ROC2, warning=FALSE, message=FALSE} curveROC_CB <- caROC_CB(diseaseData,controlData, userFormula, mono_resp_method = "ROC", CB_alpha = 0.95, nbin = 100,verbose = FALSE) ``` Plot the confidence band: ```{r plotROCband, warning=FALSE, message=FALSE} oldpar <- par() par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0)) plot_caROC_CB(curveROC_CB, add = FALSE, lty = 2, col = "blue") par(oldpar) ``` or plot the ROC and confidence band on the same plot: ```{r plotROCband2, warning=FALSE, message=FALSE} oldpar <- par() par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0)) plot_caROC(curveROC) plot_caROC_CB(curveROC_CB, add = TRUE, lty = 2, col = "blue") par(oldpar) ``` ### 2.2 Sub-population specific ROC The ROC curve for given subpopulation can be easily calculated: ```{r ssROC, warning=FALSE, message=FALSE} target_covariates = c(1, 0.7, 0.9) myROC <- sscaROC(diseaseData, controlData, userFormula, target_covariates, global_ROC_controlled_by = "sensitivity", mono_resp_method = "none") oldpar <- par() par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0)) plot_sscaROC(myROC, lwd = 1.6) par(oldpar) ``` Confidence band can also be computed, but may take ~10-20min for a dataset with 2000 samples. ```{r ssROCband, eval=FALSE} myROCband <- sscaROC_CB(diseaseData, controlData, userFormula, mono_resp_method = "none", target_covariates, global_ROC_controlled_by = "sensitivity", CB_alpha = 0.95, logit_CB = FALSE, nbootstrap = 100, nbin = 100, verbose = FALSE) oldpar <- par() par(mar = c(3, 3, 2, 0.3), mgp = c(1.2, 0.3, 0)) plot_sscaROC_CB(myROCband, col = "purple", lty = 2) par(oldpar) ``` ## 3. Threshold at controlled sensitivity/specificity for given covariate values In clinical setting, it is useful to know the specific thresholds of biomarkers at controlled sensitivity or specificity level for given covariate values. ```{r treshold, warning=FALSE, message=FALSE} ### this is the given covariates of interest new_covariates <- data.frame(M = 1, Z1 = 0.7, Z2 = 0.9) ### controlling sensitivity levels caThreshold(userFormula, new_covariates, diseaseData = diseaseData, controlData = NULL, control_sensitivity = c(0.7,0.8,0.9), control_specificity = NULL) ### controlling specificity levels caThreshold(userFormula,new_covariates, diseaseData = NULL, controlData = controlData, control_sensitivity = NULL, control_specificity = c(0.7,0.8,0.9)) ```