Performs a hypothesis test on the unknown components of a list of admixture models. Recall that the i-th admixture model has probability density function (pdf) l_i such that l_i = p_i * f_i + (1-p_i) * g_i, where g_i is the known component density. The unknown quantities p_i and f_i are estimated, and the test compares the following null and alternative hypotheses: H0: f_i = f_j for all i != j, against H1: there exists at least one pair i != j such that f_i differs from f_j. The test can be performed using two methods: either the comparison of coefficients obtained through polynomial basis expansions of the component densities, or the inner-convergence property obtained using the IBM approach. See 'Details' below for further information.
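To make the model concrete, the following base-R sketch builds one admixture density l = p * f + (1-p) * g and draws a sample from it. It does not use the package; the normal components, the weight p = 0.7 and the sample size are illustrative assumptions borrowed from the first sample in 'Examples', and in practice f and p are unknown and must be estimated.

```r
# Base-R sketch of a single admixture model: l = p * f + (1 - p) * g.
# Assumptions for illustration only: f is N(-2, 0.5), g is N(0, 1), p = 0.7.
p <- 0.7
f <- function(x) dnorm(x, mean = -2, sd = 0.5)  # "unknown" component (fixed here to simulate)
g <- function(x) dnorm(x, mean = 0, sd = 1)     # known component
l <- function(x) p * f(x) + (1 - p) * g(x)      # admixture density

# Sanity check: a valid density integrates to one over the real line.
total_mass <- integrate(l, lower = -Inf, upper = Inf)$value

# Simulate n observations: each one comes from f with probability p, otherwise from g.
set.seed(1)
n <- 380
from_f <- rbinom(n, size = 1, prob = p) == 1
x <- ifelse(from_f, rnorm(n, mean = -2, sd = 0.5), rnorm(n, mean = 0, sd = 1))
```

The test then asks whether the f components of several such samples coincide, each sample being allowed its own weight p_i and known component g_i.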
Usage
admix_test(
samples,
admixMod,
test_method = c("poly", "icv"),
conf_level = 0.95,
...
)
Arguments
- samples
A list of the K (K > 0) samples to be studied, each one assumed to follow a mixture distribution.
- admixMod
A list of objects of class admix_model, containing useful information about distributions and parameters of the contamination / admixture models under study.
- test_method
The testing method to be applied. Can be either 'poly' (polynomial basis expansion) or 'icv' (inner convergence from IBM). The same testing method is applied to all samples. In the one-sample case, only 'poly' is available and the test is a gaussianity test. For further details, see section 'Details' below.
- conf_level
The confidence level of the K-sample test.
- ...
Optional arguments that depend on the chosen test method: they are passed to gaussianity_test or orthobasis_test in case of 'poly', and to IBM_k_samples_test in case of 'icv'.
Value
An object of class admix_test, containing 8 attributes: 1) the test decision (reject the null hypothesis or not); 2) the p-value of the test; 3) the confidence level of the test (1-alpha, where alpha denotes the level of the test or equivalently the type-I error); 4) the value of the test statistic; 5) the number of samples under study; 6) the respective size of each sample; 7) the information about mixture components (distributions and parameters); 8) the chosen testing method (either based on polynomial basis expansions, or on the inner convergence property; see given references).
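The relation between the confidence level, the test level alpha and the decision can be sketched in base R as follows. This is the standard p-value decision rule, not code taken from the package; the variable names are hypothetical, and the p-value is the one printed in the 'Examples' output.

```r
# Standard decision rule (assumed here, not extracted from the package source).
conf_level <- 0.95
alpha <- 1 - conf_level      # level of the test, i.e. type-I error
p_value <- 0.499             # value printed in the 'Examples' output
reject_H0 <- p_value < alpha # reject the null hypothesis when p-value < alpha
```

With these values reject_H0 is FALSE, which agrees with the decision printed in the example.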
References
Milhaud X, Pommeret D, Salhi Y, Vandekerkhove P (2024). “Contamination-source based K-sample clustering.” Journal of Machine Learning Research, 25(287), 1--32. https://jmlr.org/papers/v25/23-0914.html.
Milhaud X, Pommeret D, Salhi Y, Vandekerkhove P (2022). “Semiparametric two-sample admixture components comparison test: The symmetric case.” Journal of Statistical Planning and Inference, 216, 135--150. ISSN 0378-3758, doi:10.1016/j.jspi.2021.05.010.
Pommeret D, Vandekerkhove P (2019). “Semiparametric density testing in the contamination model.” Electronic Journal of Statistics, 4743--4793. doi:10.1214/19-EJS1650.
Author
Xavier Milhaud <xavier.milhaud.research@gmail.com>
Examples
####### Example with 2 samples
mixt1 <- twoComp_mixt(n = 380, weight = 0.7,
                      comp.dist = list("norm", "norm"),
                      comp.param = list(list("mean" = -2, "sd" = 0.5),
                                        list("mean" = 0, "sd" = 1)))
mixt2 <- twoComp_mixt(n = 350, weight = 0.85,
                      comp.dist = list("norm", "norm"),
                      comp.param = list(list("mean" = -2, "sd" = 0.5),
                                        list("mean" = -1, "sd" = 1)))
data1 <- getmixtData(mixt1)
data2 <- getmixtData(mixt2)
admixMod1 <- admix_model(knownComp_dist = mixt1$comp.dist[[2]],
                         knownComp_param = mixt1$comp.param[[2]])
admixMod2 <- admix_model(knownComp_dist = mixt2$comp.dist[[2]],
                         knownComp_param = mixt2$comp.param[[2]])
admix_test(samples = list(data1, data2), admixMod = list(admixMod1, admixMod2),
           conf_level = 0.95, test_method = "poly", ask_poly_param = FALSE, support = "Real")
#> Testing using polynomial basis expansions requires in theory a square-root n consistent estimation
#> of the proportions of the unknown component distributions (thus using 'BVdk' estimation by default, associated to unknown
#> component distributions with symmetric densities). However, it is allowed to use 'PS' estimation in practice, but argument
#> 'est_method' has therefore to be set to 'PS'. In this case, the variance of estimators is obtained by boostrapping.
#> Call:admix_test(samples = list(data1, data2), admixMod = list(admixMod1,
#> admixMod2), test_method = "poly", conf_level = 0.95, ask_poly_param = FALSE,
#> support = "Real")
#>
#> Do we reject the null hypothesis? No
#> Here is the associated p-value of the test: 0.499