pyranges: Range join (also overlap, intersect) fails until I recreate the PyRanges from its .df

I encountered this today, and I’m completely baffled by it.

I have an object test_gr that looks like this:

+--------------+-----------+-----------+----------------------+-------+
|   Chromosome |     Start |       End | Value                | +4    |
|   (category) |   (int32) |   (int32) | (object)             | ...   |
|--------------+-----------+-----------+----------------------+-------|
|            1 | 161332203 | 161332344 | POOR_MAPPING_QUALITY | ...   |
+--------------+-----------+-----------+----------------------+-------+
Unstranded PyRanges object has 1 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
4 hidden columns: sample_name, vendor, experiment, promised_exon

And I have an object test_exons that I want to join, as below

+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| 1            | 15764898  | 15764905  |
| 1            | 15764946  | 15765013  |
| 1            | 15766782  | 15766901  |
| 1            | 15766966  | 15766973  |
| ...          | ...       | ...       |
| X            | 154002863 | 154002997 |
| X            | 154003456 | 154003561 |
| X            | 154004448 | 154004612 |
| X            | 154005060 | 154005156 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,177 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

If I run test_gr.join(test_exons), I get Empty PyRanges, which is not correct because the interval in test_gr does overlap an exon in test_exons.

If I instead run pr.PyRanges(test_gr.df).join(test_exons), which presumably uses the same data, the join works:

+--------------+-----------+-----------+----------------------+-------+
|   Chromosome |     Start |       End | Value                | +6    |
|   (category) |   (int32) |   (int32) | (object)             | ...   |
|--------------+-----------+-----------+----------------------+-------|
|            1 | 161332203 | 161332344 | POOR_MAPPING_QUALITY | ...   |
+--------------+-----------+-----------+----------------------+-------+
Unstranded PyRanges object has 1 rows and 10 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
6 hidden columns: sample_name, vendor, experiment, promised_exon, Start_b, ... (+ 1 more.)

How can this be? I’ve tried versions all the way back to 0.0.79, and the problem remains.

test_gr was created by (filters removed):

test_df=failed_loci_per_assay_df[
    (failed_loci_per_assay_df.vendor == "XXX")
    & (failed_loci_per_assay_df.experiment == "YYY")
    & (failed_loci_per_assay_df.promised_exon=="ZZZ")
].copy()
test_gr = pr.PyRanges(test_df.head(1))

Thanks very much!

About this issue

Original URL
State: open
Created 4 years ago
Comments: 20 (10 by maintainers)

Commits related to this issue

#159 Chromosome and Strand should be str - ensure these fields are str before converting to category - minor black formatting - use 'not in' — committed to cfriedline/pyranges by cfriedline-sema4 4 years ago
#159 Chromosome and Strand should be str (#162) - ensure these fields are str before converting to category - minor black formatting - use 'not in' Co-authored-by: Chris Friedline <chris.friedli... — committed to pyranges/pyranges by cfriedline 4 years ago
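
The commit messages above point at the type of the Chromosome labels. As a rough illustration of why that could matter, here is a toy model in plain Python. It is not pyranges code: the helper names (group_by_chromosome, toy_join, with_str_chromosome) and the coordinates are hypothetical, and it only assumes that a join compares intervals chromosome by chromosome, using the label as a lookup key. Under that assumption an integer label 1 never meets a string label "1", so the join comes back empty until both sides use str.

```python
# Toy model, not pyranges itself: intervals are grouped per chromosome label,
# and a join only compares groups whose labels are equal.

def group_by_chromosome(rows):
    """Group (chromosome, start, end) tuples into a dict keyed by chromosome label."""
    groups = {}
    for chrom, start, end in rows:
        groups.setdefault(chrom, []).append((start, end))
    return groups


def toy_join(left_rows, right_rows):
    """Return overlapping interval pairs, matched chromosome by chromosome."""
    left = group_by_chromosome(left_rows)
    right = group_by_chromosome(right_rows)
    hits = []
    for chrom, intervals in left.items():
        for start, end in intervals:
            # A label missing on the right side simply yields no candidates.
            for other_start, other_end in right.get(chrom, []):
                if start < other_end and other_start < end:  # half-open overlap
                    hits.append((chrom, start, end, other_start, other_end))
    return hits


def with_str_chromosome(rows):
    """Cast every chromosome label to str, mirroring the idea in the commits."""
    return [(str(chrom), start, end) for chrom, start, end in rows]


# Integer label on one side, string labels on the other (made-up coordinates).
query = [(1, 100, 200)]
exons = [("1", 150, 250), ("X", 150, 250)]

no_hits = toy_join(query, exons)                          # 1 != "1"
fixed_hits = toy_join(with_str_chromosome(query), exons)  # labels now agree
```

If this is what is happening with test_gr, casting the column before building the object, for example test_df["Chromosome"] = test_df["Chromosome"].astype(str), might be a workaround on versions without the fix. I have not verified that against the data in this issue.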

Most upvoted comments

Thank you! Will try to look at tomorrow!

On Wednesday, November 4, 2020, Chris Friedline notifications@github.com wrote:

Here is some code to reproduce:

Data: https://drive.google.com/file/d/1kmaJGb7UP1UO8BtsVgBeCFDDyIXLs9SF/view?usp=sharing

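import pandas as pd
import pyranges as pr

# Load the shared table, then keep only the rows for one vendor/experiment/exon.
test_df = pd.read_csv("test_df.tsv", sep="\t")
test_df = test_df[
    (test_df.vendor == "44f95017")
    & (test_df.experiment == "ff57c05b")
    & (test_df.promised_exon == "2d243aa3")
].copy()
# Build the PyRanges from the first matching row only.
test_gr = pr.PyRanges(test_df.head(1))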

Will fail:

test_gr.join(test_exons)

Will succeed:

pr.PyRanges(test_gr.df).join(test_exons)


endrebak on Nov 4, 2020