Cross-validation estimate (by Patra and Sen) of the unknown component weight as well as the unknown distribution in an admixture model

Estimation of unknown elements (by Patra and Sen method) under the admixture model with probability density function l: l = p*f + (1-p)*g, where g is the known component of the two-component admixture, p is the unknown proportion of the unknown component distribution f. The estimated unknown component weight p is selected using a cross-validation technique that helps to choose the right penalization, see 'Details' below for further information.

Usage

PatraSen_cv_mixmodel(
  data,
  folds = 10,
  reps = 1,
  cn.s = NULL,
  cn.length = NULL,
  gridsize = 200
)

Arguments

data: Sample where the known component density of the admixture model has been transformed into a Uniform(0,1) distribution.
folds: (default to 10) Number of folds used for cross-validation.
reps: (default to 1) Number of replications for cross-validation.
cn.s: (default to NULL) A sequence of 'c.n' to be used for cross-validation (vector of values).
cn.length: (default to NULL) Number of equally spaced tuning parameter (between .001 x log(log(n)) and 0.2 x log(log(n))). Values to search from.
gridsize: (default to 200) Number of equally spaced points (between 0 and 1) to evaluate the distance function. Larger values are more computationally intensive but also lead to more accurate estimates.

Value

A list containing 'alp.hat' (estimate of the unknown component weight), 'Fs.hat' (list with elements 'x' and 'y' values for the function estimate of the unknown cumultaive distribution function), 'dist.out' which is an object of the class 'dist.fun' using the complete data.gen, 'c.n' the value of the tuning parameter used to compute the final estimate, and finally 'cv.out' which is an object of class 'cv.mixmodel'. The object is NULL if method is "fixed".

Details

See Patra, R.K. and Sen, B. (2016); Estimation of a Two-component Mixture Model with Applications to Multiple Testing; JRSS Series B, 78, pp. 869--893.

Author

Xavier Milhaud xavier.milhaud.research@gmail.com

Examples

## Simulate data:
comp.dist <- list(f = 'norm', g = 'norm')
comp.param <- list(f = list(mean = 3, sd = 0.5),
                   g = list(mean = 0, sd = 1))
data1 <- rsimmix(n = 2000, unknownComp_weight = 0.3, comp.dist, comp.param)[['mixt.data']]
## Transform the known component of the admixture model into a Uniform(0,1) distribution:
comp.dist <- list(f = NULL, g = 'norm')
comp.param <- list(f = NULL, g = list(mean = 0, sd = 1))
data1_transfo <- knownComp_to_uniform(data = data1, comp.dist = list(comp.dist$f, comp.dist$g),
                                      comp.param = list(comp.param$f, comp.param$g))
plot(density(data1_transfo))

## Estimate the proportion of the unknown component of the admixture model:
PatraSen_cv_mixmodel(data = data1_transfo, folds = 3, reps = 1, cn.s = NULL,
                               cn.length = 3, gridsize = 100)$alp.hat
#> Warning: Make sure that data is transformed such that the known component is Uniformly(0,1) distributed.
#> [1] 0.72