Estimates weights of unknown components from 2 admixtures using IBM
Source:R/estim_IBM.R
estim_IBM.Rd
Estimation of the component weights from the Inversion - Best Matching (IBM) method, related to two admixture models with respective probability density function (pdf) l1 and l2, such that: l1 = p1*f1 + (1-p1)g1 and l2 = p2f2 + (1-p2)*g2, where g1 and g2 are the known component densities. For further details about IBM approach, see 'Details' below.
Arguments
- samples
A list of the two considered samples.
- admixMod
A list of two objects of class admix_model, one for each sample.
- n.integ
Number of data points generated for the distribution on which to integrate.
- compute_var
(default to FALSE) A boolean that indicates whether one computes the variance of the estimators of unknown mixing proportions.
Value
An object of class estim_IBM, containing 7 attributes: 1) the number of samples under study; 2) the sizes of samples; 3) the information about mixture components (distributions and parameters) for each sample; 4) the estimation method (Inversion Best Matching here, see the given reference); 5) the estimated mixing proportions (weights of the unknown component distributions in each sample); 6) the arbitrary value of the mixing weight in the first admixture sample (in case of equal known components, see the given reference); 7) the support of integration that was used in the computations.
References
Milhaud X, Pommeret D, Salhi Y, Vandekerkhove P (2024). “Two-sample contamination model test.” Bernoulli, 30(1), 170--197. doi:10.3150/23-BEJ1593 .
Author
Xavier Milhaud xavier.milhaud.research@gmail.com
Examples
## Continuous support: simulate mixture data.
mixt1 <- twoComp_mixt(n = 1500, weight = 0.5,
comp.dist = list("norm", "norm"),
comp.param = list(list("mean" = 3, "sd" = 0.5),
list("mean" = 0, "sd" = 1)))
data1 <- getmixtData(mixt1)
mixt2 <- twoComp_mixt(n = 2000, weight = 0.7,
comp.dist = list("norm", "norm"),
comp.param = list(list("mean" = 3, "sd" = 0.5),
list("mean" = 5, "sd" = 2)))
data2 <- getmixtData(mixt2)
## Define the admixture models:
admixMod1 <- admix_model(knownComp_dist = mixt1$comp.dist[[2]],
knownComp_param = mixt1$comp.param[[2]])
admixMod2 <- admix_model(knownComp_dist = mixt2$comp.dist[[2]],
knownComp_param = mixt2$comp.param[[2]])
## Estimate the mixture weights of the two admixture models (provide only hat(theta)_n):
estim_IBM(samples = list(data1,data2),
admixMod = list(admixMod1,admixMod2), n.integ = 1000)
#> Warning: IBM estimators of two unknown proportions are reliable only if the two
#> corresponding unknown component distributions have been tested equal (see 'admix_test()').
#> Furthermore, when both the known and unknown component distributions of the mixture
#> models are identical, the IBM approach provides an estimation of the ratio of the
#> actual mixing weights rather than an estimation of the unknown weights themselves.
#> Call:estim_IBM(samples = list(data1, data2), admixMod = list(admixMod1,
#> admixMod2), n.integ = 1000)
#>
#> Estimated mixing proportion (of the unknown component) in the 1st sample: 0.511
#> Estimated mixing proportion (of the unknown component) in the 2nd sample: 0.711
#> Variance of the estimator of the mixing proportion in the 1st sample: NA
#> Variance of the estimator of the mixing proportion in the 2nd sample: NA
#>
## Example 2: multinomial distribution (discrete case).
mixt1 <- twoComp_mixt(n = 1500, weight = 0.8,
comp.dist = list("multinom", "multinom"),
comp.param = list(list("size" = 1, "prob" = c(0.2,0.3,0.5)),
list("size" = 1, "prob" = c(0.1,0.6,0.3))))
data1 <- getmixtData(mixt1)
mixt2 <- twoComp_mixt(n = 2000, weight = 0.3,
comp.dist = list("multinom", "multinom"),
comp.param = list(list("size" = 1, "prob" = c(0.2,0.3,0.5)),
list("size" = 1, "prob" = c(0.7,0.1,0.2))))
data2 <- getmixtData(mixt2)
## Define the admixture models:
admixMod1 <- admix_model(knownComp_dist = mixt1$comp.dist[[2]],
knownComp_param = mixt1$comp.param[[2]])
admixMod2 <- admix_model(knownComp_dist = mixt2$comp.dist[[2]],
knownComp_param = mixt2$comp.param[[2]])
## Estimate the mixture weights of the two admixture models (provide only hat(theta)_n):
estim_IBM(samples = list(data1,data2), admixMod = list(admixMod1,admixMod2),
compute_var = TRUE)
#> Warning: IBM estimators of two unknown proportions are reliable only if the two
#> corresponding unknown component distributions have been tested equal (see 'admix_test()').
#> Furthermore, when both the known and unknown component distributions of the mixture
#> models are identical, the IBM approach provides an estimation of the ratio of the
#> actual mixing weights rather than an estimation of the unknown weights themselves.
#> Call:estim_IBM(samples = list(data1, data2), admixMod = list(admixMod1,
#> admixMod2), compute_var = TRUE)
#>
#> Estimated mixing proportion (of the unknown component) in the 1st sample: 0.719
#> Estimated mixing proportion (of the unknown component) in the 2nd sample: 0.306
#> Variance of the estimator of the mixing proportion in the 1st sample: 0.005797
#> Variance of the estimator of the mixing proportion in the 2nd sample: 0.000551
#>