Hypothesis test on the unknown components of K (K > 1) admixture models using the Inversion - Best Matching method

K-sample test of the unknown component distributions in admixture models using the Inversion - Best Matching (IBM) method. Recall that we have K populations following admixture models, each with probability density function (pdf) l_k = p_k*f_k + (1-p_k)*g_k, where g_k is the known component pdf, f_k is the unknown component pdf, p_k is the unknown proportion of f_k, and l_k is the pdf of the observed sample. The function performs the following hypothesis test: H0: f_1 = ... = f_K against H1: f_i ≠ f_j for at least one pair (i, j) with i ≠ j and i, j in {1, ..., K}.
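To make the model concrete, here is a minimal base-R sketch of one admixture population. It assumes Gaussian components, f = N(0, 1) (unknown in practice) and g = N(2, 0.7^2) (known), with p = 0.8, mirroring the first population of the Examples. It draws from l = p*f + (1-p)*g through latent component labels; it is an illustration only, not the implementation of the package's rsimmix().

```r
# Illustrative sketch (base R only, not the package's rsimmix()):
# draw n observations from l = p*f + (1-p)*g with f = N(0, 1), g = N(2, 0.7^2).
set.seed(123)
n <- 5000
p <- 0.8
# Latent labels: TRUE means "drawn from the unknown component f"
from_f <- rbinom(n, size = 1, prob = p) == 1
x <- ifelse(from_f, rnorm(n, mean = 0, sd = 1), rnorm(n, mean = 2, sd = 0.7))

# Admixture density l(t) = p * f(t) + (1 - p) * g(t)
l_pdf <- function(t) p * dnorm(t, 0, 1) + (1 - p) * dnorm(t, 2, 0.7)

# l is a proper pdf, and the sample mean is close to p*0 + (1-p)*2 = 0.4
integrate(l_pdf, -Inf, Inf)$value
mean(x)
```

Only the mixed sample x is observed in practice; the labels from_f, the weight p and the density f are not.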

IBM_k_samples_test(
  samples = NULL,
  sim_U = NULL,
  n_sim_tab = 100,
  min_size = NULL,
  comp.dist = NULL,
  comp.param = NULL,
  parallel = FALSE,
  n_cpu = 2
)

Arguments

samples

A list of the samples to be studied, all following admixture distributions.

sim_U

Random draws of the inner convergence part of the contrast as defined in the IBM approach (see 'Details' below).

n_sim_tab

Number of simulated Gaussian processes used when tabulating the inner convergence distribution in the IBM approach.

min_size

Optional minimal size among all samples. It is needed only to account for the correction factor in the variance-covariance assessment; otherwise it is not used.

comp.dist

A list with 2*K elements corresponding to the component distributions (specified with the R native names of these distributions, e.g. 'norm' for the normal distribution) involved in the K admixture models. Elements, grouped by 2, refer to the unknown and known components of each admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows with K = 3: list(f1 = NULL, g1 = 'norm', f2 = NULL, g2 = 'norm', f3 = NULL, g3 = 'norm').

comp.param

A list with 2*K elements corresponding to the parameters of the component distributions, each element being itself a list. The names used in this list must correspond to the native R argument names for these distributions. Elements, grouped by 2, refer to the parameters of the unknown and known components of each admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows (with K = 3): list(f1 = NULL, g1 = list(mean=0,sd=1), f2 = NULL, g2 = list(mean=3,sd=1.1), f3 = NULL, g3 = list(mean=-2,sd=0.6)).

parallel

(default FALSE) Logical indicating whether computations are run in parallel, which speeds up the tabulation.

n_cpu

(default 2) Number of cores used for parallel computations.
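For larger K, writing 'comp.dist' and 'comp.param' by hand is error-prone. The following is a minimal sketch, assuming Gaussian known components as in the Examples; the helper build_comp_lists() is hypothetical and not part of the package. One R subtlety motivates its design: assigning NULL with x[["f1"]] <- NULL or x$f1 <- NULL deletes the element instead of storing NULL, so the NULL placeholders for the unknown components are created up front with vector("list", ...).

```r
# Hypothetical helper (not part of the package): builds 'comp.dist' and
# 'comp.param' for K admixture models with unknown f_k (NULL placeholders)
# and Gaussian known components g_k = N(known_means[k], known_sds[k]^2).
build_comp_lists <- function(known_means, known_sds) {
  K <- length(known_means)
  stopifnot(K > 1, length(known_sds) == K)
  # Names in the expected order: f1, g1, f2, g2, ..., fK, gK
  slot_names <- as.vector(rbind(paste0("f", seq_len(K)), paste0("g", seq_len(K))))
  # vector("list", n) creates n NULL elements that act as placeholders
  comp.dist <- vector("list", 2 * K)
  comp.param <- vector("list", 2 * K)
  names(comp.dist) <- slot_names
  names(comp.param) <- slot_names
  for (k in seq_len(K)) {
    # Only the g_k slots (even positions) are filled; f_k slots stay NULL
    comp.dist[2 * k] <- list("norm")
    comp.param[2 * k] <- list(list(mean = known_means[k], sd = known_sds[k]))
  }
  list(comp.dist = comp.dist, comp.param = comp.param)
}

res <- build_comp_lists(known_means = c(2, 4, 3), known_sds = c(0.7, 1.1, 0.8))
str(res$comp.dist)
```

With these inputs, the two lists match the list.comp and list.param objects used in the "real-life setting" part of the Examples.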

Value

A list of ten elements, containing: 1) the rejection decision; 2) the p-value of the test; 3) the terms involved in the test statistic; 4) the test statistic value; 5) the selected rank (number of terms involved in the test statistic); 6) the value of the penalized test statistic; 7) the sorted contrast values; 8) the 95th-quantile of the contrast distribution; 9) the final terms of the statistic; and 10) the contrast matrix.
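The returned list pairs a rejection decision and a p-value with the 95th-quantile of a tabulated contrast distribution. As a generic illustration only (an assumption about the mechanics, not the package's code), a Monte Carlo test compares the observed statistic with the empirical quantile of values simulated under H0. The function mc_test() and its inputs are hypothetical placeholders. Note that base R's quantile() returns a vector named "95%", and comparisons keep that name, which is consistent with the label printed for obj$rejection_rule in the Examples.

```r
# Generic Monte Carlo decision rule (illustration only, not the package's code).
# 'sim_stats' stands for values tabulated under H0 and 'stat' for the observed
# statistic; both are hypothetical placeholders here.
mc_test <- function(stat, sim_stats, level = 0.95) {
  q <- quantile(sim_stats, probs = level)   # named numeric, e.g. "95%"
  list(rejection = stat > q,                # logical that keeps the "95%" name
       p_value = mean(sim_stats >= stat),   # share of simulated values >= stat
       quantile = q)
}

res <- mc_test(stat = 96, sim_stats = 1:100)
res$rejection
res$p_value
```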

Details

See the paper presenting the IBM approach at the following HAL weblink: https://hal.archives-ouvertes.fr/hal-03201760

Author

Xavier Milhaud xavier.milhaud.research@gmail.com

Examples

####### Under the alternative hypothesis H1 (with K=3 populations):
## f1 = f3 = N(0, 1) but f2 = N(2, 1), so H0 is false.
list.comp <- list(f1 = "norm", g1 = "norm",
                  f2 = "norm", g2 = "norm",
                  f3 = "norm", g3 = "norm")
list.param <- list(f1 = list(mean = 0, sd = 1), g1 = list(mean = 2, sd = 0.7),
                   f2 = list(mean = 2, sd = 1), g2 = list(mean = 4, sd = 1.1),
                   f3 = list(mean = 0, sd = 1), g3 = list(mean = 3, sd = 0.8))
sim1 <- rsimmix(n = 5000, unknownComp_weight = 0.8,
                comp.dist = list(list.comp$f1, list.comp$g1),
                comp.param = list(list.param$f1, list.param$g1))$mixt.data
sim2 <- rsimmix(n = 5300, unknownComp_weight = 0.6,
                comp.dist = list(list.comp$f2, list.comp$g2),
                comp.param = list(list.param$f2, list.param$g2))$mixt.data
sim3 <- rsimmix(n = 5100, unknownComp_weight = 0.7,
                comp.dist = list(list.comp$f3, list.comp$g3),
                comp.param = list(list.param$f3, list.param$g3))$mixt.data
## Perform the 3-samples test in a real-life setting (unknown components set to NULL):
list.comp <- list(f1 = NULL, g1 = "norm",
                  f2 = NULL, g2 = "norm",
                  f3 = NULL, g3 = "norm")
list.param <- list(f1 = NULL, g1 = list(mean = 2, sd = 0.7),
                   f2 = NULL, g2 = list(mean = 4, sd = 1.1),
                   f3 = NULL, g3 = list(mean = 3, sd = 0.8))
obj <- IBM_k_samples_test(samples = list(sim1, sim2, sim3), sim_U = NULL,
                          n_sim_tab = 8, min_size = NULL,
                          comp.dist = list.comp, comp.param = list.param,
                          parallel = TRUE, n_cpu = 2)
#> Warning: In 'IBM_estimProp': optimization algorithm was changed (in 'optim') from 'Nelder-Mead' to 'BFGS' to avoid the solution to explose.
obj$rejection_rule
#>  95%
#> TRUE
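The call above hard-codes min_size = NULL and n_cpu = 2. Below is a minimal sketch of deriving both values instead, assuming the samples are numeric vectors stored in a list (the samples here are placeholders with the sizes used in the example). Leaving one core free is a common convention, not a requirement of the function. The resulting values could then be passed as min_size = min_size, parallel = use_parallel and n_cpu = n_cpu.

```r
# Sketch: derive 'min_size' and 'n_cpu' instead of hard-coding them.
# Placeholder samples with the sizes used in the example above.
set.seed(42)
samples <- list(rnorm(5000), rnorm(5300), rnorm(5100))

# Smallest sample size across the K samples
min_size <- min(lengths(samples))

# Cores available on this machine; detectCores() may return NA,
# in which case fall back to sequential computation.
n_avail <- parallel::detectCores()
n_cpu <- if (is.na(n_avail)) 1L else max(1L, n_avail - 1L)
use_parallel <- n_cpu > 1L

c(min_size = min_size, n_cpu = n_cpu)
```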