Estimate the weights of the unknown components in two admixture models

Estimate the component weights with the Inversion - Best Matching (IBM) method for two admixture models with respective probability density functions (pdf) l1 and l2, such that: l1 = p1*f1 + (1-p1)*g1 and l2 = p2*f2 + (1-p2)*g2, where g1 and g2 are the known component densities. For further details about the IBM approach, see 'Details' below.
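As a minimal sketch of the model form above (not the package's own code), the admixture density l = p*f + (1-p)*g can be evaluated in base R. The helper name dadmix_sketch and the choice of normal components with weight 0.5 are assumptions made for illustration only; they mirror the first simulated model in the Examples.

```r
# Admixture density l(x) = p * f(x) + (1 - p) * g(x).
# `dadmix_sketch` is a hypothetical helper, not part of the package.
dadmix_sketch <- function(x, p, f, g) {
  stopifnot(p >= 0, p <= 1)
  p * f(x) + (1 - p) * g(x)
}

# Illustrative components (assumed here, taken from the simulated example):
f1 <- function(x) dnorm(x, mean = 3, sd = 0.5)  # plays the role of the unknown component
g1 <- function(x) dnorm(x, mean = 0, sd = 1)    # known component

l1 <- function(x) dadmix_sketch(x, p = 0.5, f = f1, g = g1)

# Sanity check: a density should integrate to about 1.
# The interval [-10, 10] carries essentially all the mass for these components.
total_mass <- integrate(l1, lower = -10, upper = 10)$value
print(total_mass)
```

In practice f1 is unknown, which is why the estimation routine only needs g1 and g2 to be specified.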

Usage

IBM_estimProp(
  sample1,
  sample2,
  known.prop = NULL,
  comp.dist = NULL,
  comp.param = NULL,
  with.correction = TRUE,
  n.integ = 1000
)

Arguments

sample1: Observations of the first sample under study.
sample2: Observations of the second sample under study.
known.prop: (optional) Numeric vector with two elements: the weights of the unknown component in the first and in the second samples, respectively.
comp.dist: A list with four elements corresponding to the component distributions (specified with R native names for these distributions) involved in the two admixture models. The first two elements refer to the unknown and known components of the first admixture model, and the last two to those of the second admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows: list(f1=NULL, g1='norm', f2=NULL, g2='norm').
comp.param: A list with four elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R argument names for these distributions. The first two elements refer to the parameters of the unknown and known components of the first admixture model, and the last two to those of the second admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows: list(f1=NULL, g1=list(mean=0,sd=1), f2=NULL, g2=list(mean=3,sd=1.1)).
with.correction: Boolean indicating whether the solution (the estimated proportions) should be adjusted, using a constant determined from the exact proportions, which are usually unknown except in simulations.
n.integ: Number of data points generated for the distribution on which to integrate.
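The naming conventions for 'comp.dist' and 'comp.param' can be checked before calling the estimation routine. The following is a rough sketch in base R, assuming that a "native name" such as 'norm' maps to the density function dnorm; the helper check_component is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): checks one component's
# distribution name and parameter names against base R's density function.
check_component <- function(dist, param) {
  # Unknown component: both the distribution and its parameters must be NULL.
  if (is.null(dist)) return(is.null(param))
  dfun <- paste0("d", dist)                      # e.g. "norm" -> "dnorm"
  if (!exists(dfun, mode = "function")) return(FALSE)
  # Every supplied parameter name must be an argument of the density function.
  all(names(param) %in% names(formals(get(dfun, mode = "function"))))
}

comp.dist  <- list(f1 = NULL, g1 = "norm", f2 = NULL, g2 = "norm")
comp.param <- list(f1 = NULL, g1 = list(mean = 0, sd = 1),
                   f2 = NULL, g2 = list(mean = 3, sd = 1.1))

# One logical per component, named f1, g1, f2, g2.
ok <- mapply(check_component, comp.dist, comp.param)
print(ok)
```

A misspelled parameter name, for instance list(mu = 0) for 'norm', makes the check return FALSE for that component.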

Value

A list with the estimated component weights of the two admixture models (element 'prop.estim' in the Examples), plus those of the theoretical model if specified (element 'theo.prop.estim', NULL otherwise).

Details

See the paper presenting the IBM approach at the following HAL weblink: https://hal.archives-ouvertes.fr/hal-03201760

Author

Xavier Milhaud xavier.milhaud.research@gmail.com

Examples

##### On a simulated example to see whether the true parameters are well estimated.
## Simulate data:
list.comp <- list(f1 = 'norm', g1 = 'norm',
                  f2 = 'norm', g2 = 'norm')
list.param <- list(f1 = list(mean = 3, sd = 0.5), g1 = list(mean = 0, sd = 1),
                   f2 = list(mean = 3, sd = 0.5), g2 = list(mean = 5, sd = 2))
sample1 <- rsimmix(n = 1500, unknownComp_weight = 0.5,
                   comp.dist = list(list.comp$f1, list.comp$g1),
                   comp.param = list(list.param$f1, list.param$g1))
sample2 <- rsimmix(n = 2000, unknownComp_weight = 0.7,
                   comp.dist = list(list.comp$f2, list.comp$g2),
                   comp.param = list(list.param$f2, list.param$g2))
## Estimate the mixture weights of the two admixture models (provide hat(theta)_n and theta^c):
estim <- IBM_estimProp(sample1 = sample1[['mixt.data']], sample2 = sample2[['mixt.data']],
                       known.prop = c(0.5,0.7), comp.dist = list.comp, comp.param = list.param,
                       with.correction = FALSE, n.integ = 1000)
estim[['prop.estim']]
#> [1] 0.5037036 0.7085979
estim[['theo.prop.estim']]
#> [1] 0.4999903 0.7000001
##### On a real-life example (unknown component densities, unknown mixture weights).
list.comp <- list(f1 = NULL, g1 = 'norm',
                  f2 = NULL, g2 = 'norm')
list.param <- list(f1 = NULL, g1 = list(mean = 0, sd = 1),
                   f2 = NULL, g2 = list(mean = 5, sd = 2))
## Estimate the mixture weights of the two admixture models (provide only hat(theta)_n):
estim <- IBM_estimProp(sample1 = sample1[['mixt.data']], sample2 = sample2[['mixt.data']],
                       known.prop = NULL, comp.dist = list.comp, comp.param = list.param,
                       with.correction = FALSE, n.integ = 1000)
estim[['prop.estim']]
#> [1] 0.5024422 0.7101707
estim[['theo.prop.estim']]
#> NULL
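
For readers without the package at hand, data resembling the first simulated sample can be generated in base R. This is only a sketch under the assumption that each observation comes from the unknown component with probability equal to the mixture weight; it is not the implementation of rsimmix, and the variable names are illustrative.

```r
set.seed(1)   # arbitrary seed, for reproducibility of the sketch only
n <- 1500
p <- 0.5      # weight of the unknown component, as in the first sample

# Latent labels: TRUE when the observation comes from the unknown component.
from_unknown <- rbinom(n, size = 1, prob = p) == 1

# Draw from N(3, 0.5) for the unknown component and N(0, 1) for the known one.
x <- ifelse(from_unknown,
            rnorm(n, mean = 3, sd = 0.5),
            rnorm(n, mean = 0, sd = 1))

# The empirical share of the unknown component should be close to p.
print(mean(from_unknown))
```

With real data the labels are not observed, which is precisely the situation the estimator addresses.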