kBET: High rejection rates for seemingly well integrated data
Hello there. I have been trying to get this package working for quite some time now, but no matter what I do, the rejection rates are either 1 or close to 1, regardless of correction. I would greatly appreciate feedback on this issue.
The data before conversion to a matrix is a [cell by gene] data.frame with the last two columns giving the batch and cell type as categorical data. The data is for a single cell type collected over three separate batches, and each batch contains a similar number of observations.
Example:
PCA before correction:

kBET before correction:

PCA after correction:

kBET after correction:

Code:
# load FNN for get.knn() and kBET for the test itself
library(FNN)
library(kBET)
# kBET for uncorrected data
# coerce to matrix and drop the two trailing categorical columns
data <- as.matrix(original[, 1:(ncol(original) - 2)])
# coerce batch labels (sample) to factor
batch <- as.factor(original$sample)
# compute and plot
k0 <- floor(mean(table(batch)))  # neighbourhood size: mean batch size
# note: as written, k0 and knn are computed but not passed to kBET() below
knn <- get.knn(data, k = k0, algorithm = 'cover_tree')
batch.estimate <- kBET(data, batch, plot = TRUE, do.pca = TRUE, dim.pca = 2)
# kBET for corrected data
# coerce to matrix and drop the two trailing categorical columns
data <- as.matrix(corrected[, 1:(ncol(corrected) - 2)])
# coerce batch labels (sample) to factor
batch <- as.factor(corrected$sample)
# compute and plot
k0 <- floor(mean(table(batch)))  # neighbourhood size: mean batch size
# note: as written, k0 and knn are computed but not passed to kBET() below
knn <- get.knn(data, k = k0, algorithm = 'cover_tree')
batch.estimate <- kBET(data, batch, plot = TRUE, do.pca = TRUE, dim.pca = 2)
I also tried iterating over many different values of k (recomputing knn each time, as suggested), yet the rejection rate saturates and remains high throughout.
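A sweep of this kind can be sketched roughly as follows. This is only a minimal sketch on simulated stand-in data, not the code behind the plot in this post: the data, the choice of `k_values`, and the variable names are hypothetical, and it assumes that `kBET()` accepts `k0` and `knn` arguments and returns the mean observed rejection rate in `summary$kBET.observed[1]`, as described in the package README. The kBET call is skipped if kBET or FNN is not installed.

```r
# Simulated stand-in data (hypothetical): 300 cells x 20 genes, three batches.
set.seed(1)
data  <- matrix(rnorm(300 * 20), nrow = 300)
batch <- factor(rep(c("b1", "b2", "b3"), each = 100))

# Illustrative neighbourhood sizes; each must be smaller than the number of cells.
k_values <- c(10, 25, 50, 100)
k_values <- k_values[k_values < nrow(data)]

# One mean rejection rate per k; entries stay NA if kBET/FNN are not installed.
rejection <- setNames(rep(NA_real_, length(k_values)), k_values)

if (requireNamespace("kBET", quietly = TRUE) &&
    requireNamespace("FNN", quietly = TRUE)) {
  for (i in seq_along(k_values)) {
    k <- k_values[i]
    # Recompute the knn graph for every k, then hand both k and the graph to kBET.
    # Assumption: the installed kBET version accepts the list returned by get.knn().
    knn <- FNN::get.knn(data, k = k, algorithm = "cover_tree")
    res <- kBET::kBET(data, batch, k0 = k, knn = knn, plot = FALSE)
    rejection[i] <- res$summary$kBET.observed[1]
  }
}
print(rejection)
```

Unlike the calls above, this sketch passes both `k0` and `knn` to `kBET()` explicitly, so the neighbourhood size that is varied is the one kBET actually tests.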
Assessment over varying values of k (corrected data):

best regards, Dean
About this issue
- State: closed
- Created 5 years ago
- Comments: 16
Dear Ray,
Thank you for pointing this out. In the publication, we used acceptance rate = 1 - rejection rate. This is a rescaling such that low values (close to 0) correspond to “bad” results and high values (close to 1) correspond to “good” results. However, as the foundation of kBET is a statistical test (and therefore, there is no such thing as acceptance in hypothesis testing), the kBET package itself only reports rejection rates.
Thank you for your question. In general, we implemented kBET such that it accepts several types of inputs and is not dependent on a feature matrix per se. The standard workflow is as follows: kBET requires a dense matrix as input and then computes a knn-graph from the dense matrix. The FNN package, which does the knn-computation, requires a dense matrix as input. This is indeed less than optimal. If the assay from SingleCellExperiment that you want to use is a sparse matrix, please consider converting it into a dense matrix.
If, for some reason, you have a large data matrix and computing nearest neighbours with kBET (and FNN, respectively) takes too long, you can use a different method to compute the knn-graph instead. If you give a knn-graph as additional input, then kBET skips the computation of the knn-graph and computes the rejection rates directly on the knn-graph (see section “Variations” in the Readme).
Please note that a data matrix with the same size as your original data is required, but not used (apart from estimating the number of rows and columns).
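As a rough illustration of the points above, the following sketch converts a sparse matrix to a dense one, passes a precomputed knn-graph to kBET, and rescales the rejection rate to an acceptance rate. It is a sketch under assumptions rather than the exact code from the Readme: the simulated `sparse_assay` is a hypothetical stand-in for a SingleCellExperiment assay (genes in rows, cells in columns, hence the transpose), `acceptance_rate()` is a helper defined here for illustration, and the `k0`, `knn` and `summary$kBET.observed` names are assumed to match the installed kBET version.

```r
library(Matrix)  # recommended package shipped with standard R installations

# Hypothetical sparse stand-in for an assay: 30 genes (rows) x 200 cells (columns).
set.seed(1)
sparse_assay <- Matrix::rsparsematrix(nrow = 30, ncol = 200, density = 0.2)
batch <- factor(rep(c("b1", "b2"), each = 100))

# kBET and FNN expect a dense matrix with cells in rows and genes in columns.
dense <- t(as.matrix(sparse_assay))

# Rescaling used in the publication: acceptance rate = 1 - rejection rate.
acceptance_rate <- function(rejection_rate) 1 - rejection_rate

if (requireNamespace("kBET", quietly = TRUE) &&
    requireNamespace("FNN", quietly = TRUE)) {
  k0  <- floor(mean(table(batch)))  # neighbourhood size: mean batch size
  # The knn-graph could come from any method; FNN is used here for simplicity.
  knn <- FNN::get.knn(dense, k = k0, algorithm = "cover_tree")
  # With knn supplied, kBET skips its own neighbour search.
  est <- kBET::kBET(dense, batch, k0 = k0, knn = knn, plot = FALSE)
  rej <- est$summary$kBET.observed[1]
  cat("rejection rate:", rej, " acceptance rate:", acceptance_rate(rej), "\n")
}
```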
Yes, you can use a reduced matrix as input. In fact, we observed that using highly variable genes tends to improve the batch effect correction in most methods.
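For example, a reduced input could be built along these lines. This is only a sketch: ranking genes by plain variance is a simple stand-in for a proper highly-variable-gene selection, and the simulated data, `n_top`, and the variable names are hypothetical. The kBET call is skipped if the package is not installed.

```r
# Simulated stand-in data (hypothetical): 120 cells x 50 genes, three batches.
set.seed(1)
data <- matrix(rnorm(120 * 50), nrow = 120,
               dimnames = list(NULL, paste0("gene", 1:50)))
batch <- factor(rep(c("b1", "b2", "b3"), each = 40))

# Keep the n_top genes with the largest variance across cells.
n_top     <- 10
gene_var  <- apply(data, 2, var)
top_genes <- order(gene_var, decreasing = TRUE)[seq_len(n_top)]
reduced   <- data[, top_genes, drop = FALSE]

if (requireNamespace("kBET", quietly = TRUE)) {
  # The reduced matrix is passed exactly like the full one (cells in rows).
  est <- kBET::kBET(reduced, batch, do.pca = FALSE, plot = FALSE)
  print(est$summary)
}
```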
Please let me know if you have further questions.
Best, Maren