`R/admix_test.R`

`admix_test.Rd`

Perform hypothesis test between unknown components of a list of admixture models, where we remind that the i-th admixture model has probability density function (pdf) l_i such that: l_i = p_i * f_i + (1-p_i) * g_i, with g_i the known component density. The unknown quantities p_i and f_i are thus estimated, leading to the test given by the following null and alternative hypothesis: H0: f_i = f_j for all i != j against H1 : there exists at least i != j such that f_i differs from f_j. The test can be performed using two methods, either the comparison of coefficients obtained through polynomial basis expansions of the component densities, or by the inner-convergence property obtained using the IBM approach. See 'Details' below for further information.

admix_test( samples = NULL, sym.f = FALSE, test.method = c("Poly", "ICV"), sim_U = NULL, min_size = NULL, comp.dist = NULL, comp.param = NULL, support = c("Real", "Integer", "Positive", "Bounded.continuous"), parallel = FALSE, n_cpu = 4 )

samples | A list of the K samples to be studied, all following admixture distributions. |
---|---|

sym.f | A boolean indicating whether the unknown component densities are assumed to be symmetric or not. |

test.method | The testing method to be applied. Can be either 'Poly' (polynomial basis expansion) or 'ICV' (inner convergence from IBM). The same testing method is performed between all samples. In the one-sample case, only 'Poly' is available and the test is a gaussianity test. For further details, see section 'Details' below. |

sim_U | (Used only with 'ICV' testing method, otherwise useless) Random draws of the inner convergence part of the contrast as defined in the IBM approach (see 'Details' below). |

min_size | (Potentially used with 'ICV' testing method, otherwise useless) Minimal size among all samples (needed to take into account the correction factor for the variance-covariance assessment). |

comp.dist | A list with 2*K elements corresponding to the component distributions (specified with R native names for these distributions) involved in the K admixture models. Elements, grouped by 2, refer to the unknown and known components of each admixture model, If there are unknown elements, they must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows with K = 3: list(f1 = NULL, g1 = 'norm', f2 = NULL, g2 = 'norm', f3 = NULL, g3 = 'rnorm'). |

comp.param | A list with 2*K elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R argument names for these distributions. Elements, grouped by 2, refer to the parameters of unknown and known components of each admixture model. If there are unknown elements, they must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows (with K = 3): list(f1 = NULL, g1 = list(mean=0,sd=1), f2 = NULL, g2 = list(mean=3,sd=1.1), f3 = NULL, g3 = list(mean=-2,sd=0.6)). |

support | (Potentially used with 'Poly' testing method, otherwise useless) The support of the observations; one of "Real", "Integer", "Positive", or "Bounded.continuous". |

parallel | (default to FALSE) Boolean indicating whether parallel computations are performed (speed-up the tabulation). |

n_cpu | (default to 2) Number of cores used when parallelizing. |

A list containing...

For further details on hypothesis techniques, see i) Inner convergence through IBM approach at https://hal.archives-ouvertes.fr/hal-03201760 ; ii) Polynomial expansions at 'False Discovery Rate model Gaussianity test' (EJS, Pommeret & Vanderkerkhove, 2017), or 'Semiparametric two-sample admixture components comparison test: the symmetric case' (JSPI, Milhaud & al., 2021).

Xavier Milhaud xavier.milhaud.research@gmail.com

##### On a simulated example to see whether the true parameters are well estimated. With 1 sample: list.comp <- list(f1 = "norm", g1 = "norm") list.param <- list(f1 = list(mean = 0, sd = 1), g1 = list(mean = 2, sd = 0.7)) ## Simulate data: sim1 <- rsimmix(n = 600, unknownComp_weight = 0.8, comp.dist = list(list.comp$f1,list.comp$g1), comp.param = list(list.param$f1, list.param$g1))$mixt.data ## Perform the test hypothesis: list.comp <- list(f1 = NULL, g1 = "norm") list.param <- list(f1 = NULL, g1 = list(mean = 2, sd = 0.7)) admix_test(samples = list(sim1), sym.f = TRUE, test.method = 'Poly', sim_U = NULL, min_size = NULL, comp.dist = list.comp, comp.param = list.param, support = "Real", parallel = TRUE, n_cpu = 2)#>#> $decision #> [1] 0 #> #> $p_value #> [1] 0.8371209 #> #> $test.stat #> [1] 0.04226214 #> #> $var.stat #> [,1] [,2] [,3] #> [1,] 0.6061963 NA NA #> [2,] NA 1.152148 NA #> [3,] NA NA 1.150159 #> #> $rank #> [1] 1 #> #> $estimates #> $estimates$p #> [1] 0.8428412 #> #> $estimates$mu #> [1] 0.2506976 #> #> $estimates$s #> [1] 1.162749 #> #>##### With two samples : list.comp <- list(f1 = "norm", g1 = "norm", f2 = "norm", g2 = "norm") list.param <- list(f1 = list(mean = 0, sd = 1), g1 = list(mean = 2, sd = 0.7), f2 = list(mean = 0, sd = 1), g2 = list(mean = -3, sd = 1.1)) sim1 <- rsimmix(n = 700, unknownComp_weight = 0.8, comp.dist = list(list.comp$f1,list.comp$g1), comp.param = list(list.param$f1, list.param$g1))$mixt.data sim2 <- rsimmix(n= 800, unknownComp_weight = 0.65, comp.dist = list(list.comp$f2,list.comp$g2), comp.param = list(list.param$f2, list.param$g2))$mixt.data list.comp <- list(f1 = NULL, g1 = "norm", f2 = NULL, g2 = "norm") list.param <- list(f1 = NULL, g1 = list(mean = 2, sd = 0.7), f2 = NULL, g2 = list(mean = -3, sd = 1.1)) #admix_test(samples = list(sim1,sim2), sym.f = FALSE, test.method = 'ICV', sim_U = NULL, # min_size = NULL, comp.dist = list.comp, comp.param = list.param, support = "Real", # parallel = TRUE, n_cpu = 2) admix_test(samples = list(sim1,sim2), sym.f = TRUE, test.method = 'Poly', sim_U = NULL, min_size = NULL, comp.dist = list.comp, comp.param = list.param, support = "Real", parallel = FALSE, n_cpu = NULL)#> [[1]] #> [[1]]$decision #> [1] 1 #> #> [[1]]$p_value #> [1] 0.0003103294 #> #> [[1]]$test_statistic #> [1] 13.007 #> #> [[1]]$varCov.matrix #> [,1] [,2] [,3] #> [1,] 2.248907 NA NA #> [2,] NA 10.37172 NA #> [3,] NA NA 50.81926 #> #> [[1]]$rank #> [1] 1 #> #> [[1]]$p1 #> [1] 0.7825719 #> #> [[1]]$p2 #> [1] 0.5608453 #> #>