GSEApy: Enrichr: Local mode of GO/enrichment analysis does not provide Odds Ratios in the results

Hi, thanks for this package! This is more of a feature request than a bug report. I think the odds ratio is quite an important value to interpret enrichment analysis results.

Setup

I am reporting a problem with GSEApy version, Python version, and operating system as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)

3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]
CPython
Linux-3.10.0-1062.18.1.el7.x86_64-x86_64-with-glibc2.17
0.10.5

Expected behaviour

When gseapy.enrichr() is run in local mode, which is needed to supply a custom background gene set, the resulting data frame should contain the same columns (including the odds ratio) as in the vanilla mode (direct query of Enrichr).

Actual behaviour

When gseapy.enrichr() is run in local mode, which is needed to supply a custom background gene set, the resulting data frame does not contain odds ratios, although the vanilla mode returns the odds ratios computed by Enrichr.

Steps to reproduce

The behaviour is already visible in the corresponding example in the docs.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Thanks @sreichl, @136s. It’s not easy to detect this issue in so much detail.

Hi @sreichl,

Thank you for agreeing with my opinion, and thanks also for reconsidering the correct formula. I agree with it, since it is the same formula as in pull request #238, which I submitted the other day.

There are two alternatives: returning an ‘inf’ value or using the suggested correction formula. I’d vote for the formula, because:

  1. The exception is prevented (main problem solved).
  2. As x grows toward m, the oddr moves in the right direction instead of being stuck at the less informative inf value.

…And if the caller insists on rejecting the oddr, she still knows from the overlap value that x==k.
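
To make point 2 concrete, here is a short worked substitution using the corrected formula quoted elsewhere in this thread. It assumes x is the overlap size, k the query size, m the category size and bg the background size; that is my reading of the symbols, which are not defined explicitly here. Setting k = x:

```latex
\mathrm{oddr}\Big|_{k=x}
  = \frac{(x+0.5)\,(bg-m+0.5)}{(m+0.5)\,(k-x+0.5)}
  = \frac{(x+0.5)\,(bg-m+0.5)}{0.5\,(m+0.5)}
  = \frac{(2x+1)\,(bg-m+0.5)}{m+0.5}
```

For fixed m and bg this grows linearly with x, whereas inf would be identical for every fully overlapping query.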

Hi @yossi-liron, great catch, that is true!

I think this only occurs if the query gene list completely overlaps with the category gene list (e.g. a GO term). Still, this exception has to be addressed, and apparently we are not the first ones to encounter it.

The pragmatic solution is to add 0.5 to every cell of the contingency table to avoid division by zero; this is known as the Haldane-Anscombe correction.

For the case k==x, this would mean:

oddr= ((x+0.5)*(bg-m+0.5))/((m+0.5)*(k-x+0.5))
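
As a rough sketch of the difference, the snippet below compares the formula with and without the 0.5 terms. The function names are made up for illustration and are not part of GSEApy. The uncorrected version is just the formula above with the 0.5 terms dropped, which is an assumption about what the code was doing before. The symbol meanings (x = overlap, k = query size, m = category size, bg = background size) are also my reading rather than something stated explicitly.

```python
def odds_ratio_raw(x, k, m, bg):
    """Hypothetical uncorrected form: the formula above without the 0.5 terms.

    Raises ZeroDivisionError when the query fully overlaps the category (k == x).
    """
    return (x * (bg - m)) / (m * (k - x))


def odds_ratio_corrected(x, k, m, bg):
    """The formula above, with 0.5 added to each term so the denominator is never zero."""
    return ((x + 0.5) * (bg - m + 0.5)) / ((m + 0.5) * (k - x + 0.5))


if __name__ == "__main__":
    # Made-up numbers: a 5-gene query that lies entirely inside a 50-gene category.
    x, k, m, bg = 5, 5, 50, 20000
    try:
        odds_ratio_raw(x, k, m, bg)
    except ZeroDivisionError:
        print("uncorrected: division by zero")
    print("corrected:", odds_ratio_corrected(x, k, m, bg))
```

With these inputs the uncorrected version fails, while the corrected one returns a finite value that still increases if the overlap gets larger.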

What do you think?

Cheers, S