Difference between the unknown empirical cumulative distribution functions in two admixture models

Compute the 'gap', at a given point, between the cumulative distribution functions (cdf) of the unknown components of two admixture models, each estimated from the empirical cdf (ecdf) of the corresponding sample. Each admixture model has probability density function (pdf) l = p*f + (1-p)*g, where g is the known component and p is the unknown proportion of the unknown component f. The estimation relies on the inversion method, i.e. f = (1/p) * (l - (1-p)*g). The function therefore computes: D(z,L1,L2,p1,p2) = F1(z,L1,p1) - F2(z,L2,p2). This measure should be integrated over some domain to obtain the global discrepancy; see 'Details' below for further information.

Usage

IBM_gap(z, par, fixed.p1 = NULL, sample1, sample2, comp.dist, comp.param)

Arguments

z: The point at which the difference between the two unknown (estimated) component distributions is computed.
par: Numeric vector with two elements, corresponding to the weights of the unknown component for the two admixture models.
fixed.p1: (optional, NULL by default) Arbitrary value chosen by the user for the component weight related to the unknown component of the first admixture model. Only useful for optimization when the known components of the two models are identical (G1=G2, leading to unidimensional optimization).
sample1: Observations of the first sample under study.
sample2: Observations of the second sample under study.
comp.dist: A list with four elements corresponding to the component distributions (specified with R native names for these distributions) involved in the two admixture models. The first two elements refer to the unknown and known components of the first admixture model, and the last two to those of the second admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows: list(f1=NULL, g1='norm', f2=NULL, g2='norm').
comp.param: A list with four elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R argument names for these distributions. The first two elements refer to the parameters of the unknown and known components of the first admixture model, and the last two to those of the second admixture model. Unknown elements must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows: list(f1=NULL, g1=list(mean=0,sd=1), f2=NULL, g2=list(mean=3,sd=1.1)).

Value

The gap between the estimated unknown components of the two observed samples, evaluated at the specified point.

Details

See the paper presenting the IBM approach at the following HAL weblink: https://hal.archives-ouvertes.fr/hal-03201760
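
The inversion mentioned in the description can also be written at the level of cumulative distribution functions, which is the form the gap D is stated in. The display below is only a sketch derived from the mixture definition l = p*f + (1-p)*g; the index i = 1, 2 for the two models is notation introduced here, and in practice the mixture cdf L_i is replaced by the empirical cdf of the i-th sample.

```latex
\begin{aligned}
L_i(z) &= p_i\,F_i(z) + (1 - p_i)\,G_i(z), \\
F_i(z, L_i, p_i) &= \frac{1}{p_i}\Bigl(L_i(z) - (1 - p_i)\,G_i(z)\Bigr), \qquad i = 1, 2, \\
D(z, L_1, L_2, p_1, p_2) &= F_1(z, L_1, p_1) - F_2(z, L_2, p_2).
\end{aligned}
```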

Author

Xavier Milhaud <xavier.milhaud.research@gmail.com>

Examples

## Component distributions and their parameters. The f1/f2 entries are used
## below to simulate the data.
list.comp <- list(f1 = 'norm', g1 = 'norm',
                  f2 = 'norm', g2 = 'norm')
list.param <- list(f1 = list(mean = 3, sd = 0.5), g1 = list(mean = 0, sd = 1),
                   f2 = list(mean = 1, sd = 0.1), g2 = list(mean = 5, sd = 2))
## Simulate the two admixture samples (unknown-component weights 0.5 and 0.7):
sample1 <- rsimmix(n = 1500, unknownComp_weight = 0.5,
                   comp.dist = list(list.comp$f1, list.comp$g1),
                   comp.param = list(list.param$f1, list.param$g1))
sample2 <- rsimmix(n = 2000, unknownComp_weight = 0.7,
                   comp.dist = list(list.comp$f2, list.comp$g2),
                   comp.param = list(list.param$f2, list.param$g2))
## Gap at z = 2.8 for the candidate weights par = c(0.3, 0.6).
## No seed is set, so the value below varies between runs.
IBM_gap(z = 2.8, par = c(0.3, 0.6), fixed.p1 = NULL, sample1 = sample1[['mixt.data']],
        sample2 = sample2[['mixt.data']], comp.dist = list.comp, comp.param = list.param)
#> [1] -1.263038
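
The inversion behind the gap can also be reproduced by hand with base R. What follows is a minimal sketch, not the package's implementation: invert_cdf and gap_sketch are hypothetical helpers introduced here for illustration, the known components are hard-coded to the normal distributions used above, and the data are simulated directly with rbinom() and rnorm() rather than with rsimmix().

```r
# Hypothetical helper (not part of the package): plug-in estimate of the
# unknown cdf F at z, obtained by inverting L = p*F + (1-p)*G.
invert_cdf <- function(z, x, p, known_cdf, ...) {
  L_hat <- ecdf(x)(z)                        # empirical cdf of the mixture sample
  (L_hat - (1 - p) * known_cdf(z, ...)) / p  # inversion formula
}

# Hypothetical helper: pointwise gap between the two estimated unknown cdfs.
# Known components assumed here: N(0, 1) for model 1 and N(5, 2) for model 2.
gap_sketch <- function(z, p1, p2, x1, x2) {
  invert_cdf(z, x1, p1, pnorm, mean = 0, sd = 1) -
    invert_cdf(z, x2, p2, pnorm, mean = 5, sd = 2)
}

set.seed(1)
n1 <- 1500
n2 <- 2000
u1 <- rbinom(n1, 1, 0.5)  # 1 = observation drawn from the unknown component
u2 <- rbinom(n2, 1, 0.7)
x1 <- ifelse(u1 == 1, rnorm(n1, mean = 3, sd = 0.5), rnorm(n1, mean = 0, sd = 1))
x2 <- ifelse(u2 == 1, rnorm(n2, mean = 1, sd = 0.1), rnorm(n2, mean = 5, sd = 2))

# Gap at the true weights, and at the misspecified weights used above.
gap_true  <- gap_sketch(2.8, p1 = 0.5, p2 = 0.7, x1 = x1, x2 = x2)
gap_wrong <- gap_sketch(2.8, p1 = 0.3, p2 = 0.6, x1 = x1, x2 = x2)
```

At the true weights the sketch should land close to pnorm(2.8, 3, 0.5) - pnorm(2.8, 1, 0.1). With misspecified weights the inverted quantities are no longer constrained to lie in [0, 1], which is why a gap below -1, as in the output above, is possible.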