`R/gaussianity_test.R`

`gaussianity_test.Rd`

Performs the hypothesis test of whether the unknown mixture component is Gaussian, assuming that the known component has support on the real line (R). A non-Gaussian known component can be handled through the basic transformation by its cdf. Recall that an admixture model has probability density function (pdf) l = p*f + (1-p)*g, where g is the known pdf and l is observed (p and f are unknown). The unknown parameters are estimated by optimization, as defined by Bordes & Vandekerkhove (2010), which requires the unknown mixture component to have a symmetric density.
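
To make the admixture model l = p*f + (1-p)*g concrete, here is a minimal base-R sketch that draws a sample from such a model and checks that the mixture density integrates to one. The helper names `r_admixture` and `d_admixture` and the parameter values are illustrative assumptions; they are not part of the package.

```r
# Hypothetical helpers for illustration only (not package functions).
# Admixture model: l = p * f + (1 - p) * g, with g ~ N(0, 1) known
# and f ~ N(f_mean, f_sd^2) playing the role of the unknown component.
r_admixture <- function(n, p, f_mean, f_sd) {
  from_f <- rbinom(n, size = 1, prob = p) == 1   # TRUE when a draw comes from f
  ifelse(from_f,
         rnorm(n, mean = f_mean, sd = f_sd),     # unknown component f
         rnorm(n, mean = 0, sd = 1))             # known component g
}

d_admixture <- function(x, p, f_mean, f_sd) {
  p * dnorm(x, mean = f_mean, sd = f_sd) + (1 - p) * dnorm(x, mean = 0, sd = 1)
}

set.seed(1)
x <- r_admixture(n = 400, p = 0.8, f_mean = 2, f_sd = 0.5)

# The mixture density should have total mass 1 (up to numerical error).
total_mass <- integrate(d_admixture, lower = -Inf, upper = Inf,
                        p = 0.8, f_mean = 2, f_sd = 0.5)$value
```

In practice only `x` and g would be available; the test asks whether the f that generated part of `x` is Gaussian.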

gaussianity_test( sample1, comp.dist, comp.param, K = 3, lambda = 0.2, support = c("Real", "Integer", "Positive", "Bounded.continuous") )

Argument | Description |
---|---|
sample1 | Observed sample with mixture distribution given by l = p*f + (1-p)*g, where f and p are unknown and g is known. |
comp.dist | List with two elements corresponding to the component distributions involved in the admixture model. Unknown elements must be specified as 'NULL' objects. For instance, if 'f' is unknown: list(f = NULL, g = 'norm'). |
comp.param | List with two elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R names for distributions. Unknown elements must be specified as 'NULL' objects (e.g. if 'f' is unknown: list(f = NULL, g = list(mean = 0, sd = 1))). |
K | Number of coefficients considered for the polynomial basis expansion. |
lambda | Rate at which the normalization factor is set in the penalization rule for model selection (in the open interval (0, 1/2)). See 'Details' below. |
support | Support of the densities under consideration, used to choose the polynomial orthonormal basis. One of 'Real', 'Integer', 'Positive', or 'Bounded.continuous'. |
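
The naming convention for `comp.dist` and `comp.param` can be illustrated with base R alone. The sketch below assumes that a distribution name such as 'norm' is resolved by prefixing it with 'd', 'p', 'q' or 'r' (as in `dnorm`, `rnorm`) and that parameter names must match the arguments of those functions. How the package performs this lookup internally is not stated on this page, so treat the mechanism shown here as an assumption.

```r
# Specification with an unknown component f and a known N(0, 1) component g.
comp.dist  <- list(f = NULL, g = "norm")
comp.param <- list(f = NULL, g = list(mean = 0, sd = 1))

# Resolve the native R density function for the known component (here dnorm).
g_density <- get(paste0("d", comp.dist$g), mode = "function")

# Parameter names must be valid arguments of that function.
names_ok <- all(names(comp.param$g) %in% names(formals(g_density)))

# Evaluate the known density at 0 with the supplied parameters.
g_at_zero <- do.call(g_density, c(list(x = 0), comp.param$g))
```

A misspelled parameter name (for instance `std` instead of `sd`) would make `names_ok` FALSE, which is a cheap check to run before calling the test.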

A list of 6 elements, containing: 1) the rejection decision ('decision'); 2) the p-value of the test ('p_value'); 3) the test statistic ('test.stat'); 4) the variance-covariance matrix of the test statistic ('var.stat'); 5) the selected rank for testing ('rank'); and 6) a list of the estimates ('estimates': unknown component weight 'p', shift location parameter 'mu' and standard deviation 's' of the symmetric unknown distribution).

See the paper 'False Discovery Rate model Gaussianity test' (Pommeret & Vandekerkhove, 2017).

Xavier Milhaud xavier.milhaud.research@gmail.com

    ####### Under the null hypothesis H0.
    ## Parameters of the gaussian distribution to be tested:
    list.comp <- list(f = "norm", g = "norm")
    list.param <- list(f = c(mean = 2, sd = 0.5), g = c(mean = 0, sd = 1))
    ## Simulate and plot the data at hand:
    obs.data <- rsimmix(n = 400, unknownComp_weight = 0.8, comp.dist = list.comp,
                        comp.param = list.param)[['mixt.data']]
    plot(density(obs.data))
    ## Performs the test:
    list.comp <- list(f = NULL, g = "norm")
    list.param <- list(f = NULL, g = c(mean = 0, sd = 1))
    gaussianity_test(sample1 = obs.data, comp.dist = list.comp, comp.param = list.param,
                     K = 3, lambda = 0.1, support = 'Real')
    #> $decision
    #> [1] 0
    #>
    #> $p_value
    #> [1] 0.6615656
    #>
    #> $test.stat
    #> [1] 0.1916274
    #>
    #> $var.stat
    #>           [,1]      [,2]      [,3]
    #> [1,] 0.2132671        NA        NA
    #> [2,]        NA 0.8014134        NA
    #> [3,]        NA        NA 0.8002897
    #>
    #> $rank
    #> [1] 1
    #>
    #> $estimates
    #> $estimates$p
    #> [1] 0.7933074
    #>
    #> $estimates$mu
    #> [1] 1.929348
    #>
    #> $estimates$s
    #> [1] 0.5644895
    #>
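
As a sanity check on the output above, the reported p-value agrees closely with the upper tail of a chi-square distribution with one degree of freedom evaluated at the reported test statistic. That the package computes its p-value this way is an inference from these two numbers, not something stated on this page, and the 5% level used below to reproduce the decision is likewise an assumption.

```r
# Values copied from the example output above.
test_stat  <- 0.1916274
reported_p <- 0.6615656

# Upper-tail probability of a chi-square(1) at the observed statistic
# (assumed limiting distribution, see the lead-in).
p_check <- pchisq(test_stat, df = 1, lower.tail = FALSE)

# Gap between the recomputed and the reported p-value.
gap <- abs(p_check - reported_p)

# Decision at an assumed 5% level: 1 = reject H0, 0 = do not reject.
decision_check <- as.integer(p_check < 0.05)
```

With these numbers the null hypothesis of a Gaussian unknown component is not rejected, which is expected since the data were simulated under H0.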