A data-driven calibration for a non-asymptotic kernel two-sample test
We observe two populations of multivariate data described by p variables, where p is much larger than the sample sizes. We want to test the null hypothesis that the two populations share the same distribution against the alternative that their distributions differ. To account for the complex structure of the variables and to overcome the curse of dimensionality, the data are embedded in a well-chosen Reproducing Kernel Hilbert Space (RKHS). In this work, we study a test statistic inspired by Harchaoui et al.
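The abstract does not spell out the statistic or its calibration. As a rough illustration only, the sketch below computes a regularized kernel Fisher discriminant statistic in the spirit of Harchaoui et al., namely (n1 n2 / n) ⟨δ, (Σ_W + γI)^{-1} δ⟩ in the RKHS. Here δ is the difference of the empirical mean embeddings and Σ_W is the pooled within-sample covariance operator. The Woodbury identity reduces the operator inverse to an n × n solve. The Gaussian kernel, the median-heuristic bandwidth, the value of γ, the permutation calibration, and the function names (`gaussian_gram`, `median_bandwidth`, `kfda_statistic`, `permutation_test`) are assumptions made for illustration. They are not the data-driven calibration proposed in this work.

```python
import numpy as np


def _sq_dists(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)


def gaussian_gram(Z, bandwidth):
    """Gaussian (RBF) Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 * bandwidth^2))."""
    return np.exp(-_sq_dists(Z) / (2.0 * bandwidth ** 2))


def median_bandwidth(Z):
    """Median pairwise Euclidean distance (the 'median heuristic', an assumption here)."""
    iu = np.triu_indices(Z.shape[0], k=1)
    return float(np.median(np.sqrt(_sq_dists(Z)[iu])))


def kfda_statistic(K, n1, gamma):
    """(n1*n2/n) * <delta, (Sigma_W + gamma I)^{-1} delta>, computed from the pooled Gram matrix K.

    The first n1 rows/columns of K belong to sample 1 and the remaining ones to sample 2.
    Sigma_W = (1/n) * sum over all points of (phi - own-sample mean) outer itself.
    """
    n = K.shape[0]
    n2 = n - n1
    # Weights m such that sum_i m[i] * phi(z_i) = mean embedding of sample 2 minus sample 1.
    m = np.concatenate([np.full(n1, -1.0 / n1), np.full(n2, 1.0 / n2)])
    # Block-diagonal projection P that centres each sample around its own mean.
    P = np.zeros((n, n))
    P[:n1, :n1] = np.eye(n1) - 1.0 / n1
    P[n1:, n1:] = np.eye(n2) - 1.0 / n2
    Km = K @ m
    PKm = P @ Km
    # Woodbury: <delta, (Sigma_W + gamma I)^{-1} delta>
    #   = (m'Km - m'KP (n*gamma I + PKP)^{-1} PKm) / gamma
    quad = m @ Km - PKm @ np.linalg.solve(n * gamma * np.eye(n) + P @ K @ P, PKm)
    return (n1 * n2 / n) * quad / gamma


def permutation_test(X, Y, gamma=1e-2, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for samples X (n1 x p) and Y (n2 x p)."""
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    K = gaussian_gram(Z, median_bandwidth(Z))
    t_obs = kfda_statistic(K, n1, gamma)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(Z.shape[0])
        # Relabelling the pooled sample amounts to permuting rows and columns of K.
        if kfda_statistic(K[np.ix_(perm, perm)], n1, gamma) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_perm)


# Usage: p = 200 variables, 30 observations per population, second population mean-shifted.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 200))
Y = rng.normal(loc=1.0, size=(30, 200))
t_stat, p_value = permutation_test(X, Y)
```

Permutation calibration is used here only because it is simple and valid at any sample size. It stands in for, and should not be confused with, the calibration studied in the paper.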