Hypothesis test between unknown components of the admixture models under study

Perform hypothesis test between unknown components of a list of admixture models, where we remind that the i-th admixture model has probability density function (pdf) l_i such that: l_i = p_i * f_i + (1-p_i) * g_i, with g_i the known component density. The unknown quantities p_i and f_i are thus estimated, leading to the test given by the following null and alternative hypothesis: H0: f_i = f_j for all i != j against H1 : there exists at least i != j such that f_i differs from f_j. The test can be performed using two methods, either the comparison of coefficients obtained through polynomial basis expansions of the component densities, or by the inner-convergence property obtained using the IBM approach. See 'Details' below for further information.

Usage

admix_test(
  samples = NULL,
  sym.f = FALSE,
  test.method = c("Poly", "ICV"),
  sim_U = NULL,
  n_sim_tab = 50,
  min_size = NULL,
  comp.dist = NULL,
  comp.param = NULL,
  support = c("Real", "Integer", "Positive", "Bounded.continuous"),
  conf.level = 0.95,
  parallel = FALSE,
  n_cpu = 2
)

Arguments

samples: A list of the K samples to be studied, all following admixture distributions.
sym.f: A boolean indicating whether the unknown component densities are assumed to be symmetric or not.
test.method: The testing method to be applied. Can be either 'Poly' (polynomial basis expansion) or 'ICV' (inner convergence from IBM). The same testing method is performed between all samples. In the one-sample case, only 'Poly' is available and the test is a gaussianity test. For further details, see section 'Details' below.
sim_U: (Used only with 'ICV' testing method, otherwise useless) Random draws of the inner convergence part of the contrast as defined in the IBM approach (see 'Details' below).
n_sim_tab: (Used only with 'ICV' testing method, otherwise useless) Number of simulated gaussian processes used in the tabulation of the inner convergence distribution in the IBM approach.
min_size: (Potentially used with 'ICV' testing method, otherwise useless) Minimal size among all samples (needed to take into account the correction factor for the variance-covariance assessment).
comp.dist: A list with 2*K elements corresponding to the component distributions (specified with R native names for these distributions) involved in the K admixture models. Elements, grouped by 2, refer to the unknown and known components of each admixture model, If there are unknown elements, they must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows with K = 3: list(f1 = NULL, g1 = 'norm', f2 = NULL, g2 = 'norm', f3 = NULL, g3 = 'rnorm').
comp.param: A list with 2*K elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R argument names for these distributions. Elements, grouped by 2, refer to the parameters of unknown and known components of each admixture model. If there are unknown elements, they must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows (with K = 3): list(f1 = NULL, g1 = list(mean=0,sd=1), f2 = NULL, g2 = list(mean=3,sd=1.1), f3 = NULL, g3 = list(mean=-2,sd=0.6)).
support: (Potentially used with 'Poly' testing method, otherwise useless) The support of the observations; one of "Real", "Integer", "Positive", or "Bounded.continuous".
conf.level: The confidence level of the K-sample test.
parallel: (default to FALSE) Boolean indicating whether parallel computations are performed (speed-up the tabulation).
n_cpu: (default to 2) Number of cores used when parallelizing.

Value

A list containing the decision of the test (reject or not), the confidence level at which the test is performed, the p-value of the test, and the value of the test statistic (following a chi2 distribution with one degree of freedom under the null).

Details

For further details on hypothesis techniques, see i) Inner convergence through IBM approach at https://hal.archives-ouvertes.fr/hal-03201760 ; ii) Polynomial expansions at 'False Discovery Rate model Gaussianity test' (EJS, Pommeret & Vanderkerkhove, 2017), or 'Semiparametric two-sample admixture components comparison test: the symmetric case' (JSPI, Milhaud & al., 2021).

Author

Xavier Milhaud xavier.milhaud.research@gmail.com

Examples

##### On a simulated example, with 1 sample (gaussianity test):
list.comp <- list(f1 = "norm", g1 = "norm")
list.param <- list(f1 = list(mean = 0, sd = 1), g1 = list(mean = 2, sd = 0.7))
## Simulate data:
sim1 <- rsimmix(n = 300, unknownComp_weight = 0.85, comp.dist = list(list.comp$f1,list.comp$g1),
                comp.param = list(list.param$f1, list.param$g1))$mixt.data
## Perform the test hypothesis:
list.comp <- list(f1 = NULL, g1 = "norm")
list.param <- list(f1 = NULL, g1 = list(mean = 2, sd = 0.7))
gaussTest <- admix_test(samples = list(sim1), sym.f = TRUE, test.method = 'Poly', sim_U = NULL,
                        n_sim_tab = 50, min_size = NULL, comp.dist = list.comp,
                        comp.param = list.param, support = "Real", conf.level = 0.95,
                        parallel = FALSE, n_cpu = 2)
#> Warning: Still needs to be implemented for cases where the unknown density mean is lower then the known density one!
#> Warning: Still needs to be implemented for cases where the unknown density mean is lower then the known density one!