tskit: "invalid start byte" error accessing variant.alleles in pyodide/jupyterlite
When I try to run the following in a jupyterlite notebook (using Chrome 112.0.5615.49 under OS X):
import msprime
import tskit
print("msprime v:", msprime.__version__, ", tskit v:", tskit.__version__)
ts = msprime.sim_mutations(msprime.sim_ancestry(10, sequence_length=10, random_seed=1), rate=1, random_seed=1)
next(ts.variants()).alleles
I get
msprime v: 1.2.0 , tskit v: 0.5.4
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
Cell In[4], line 7
4 print("msprime v:", msprime.__version__, ", tskit v:", tskit.__version__)
6 ts = msprime.sim_mutations(msprime.sim_ancestry(10, sequence_length=10, random_seed=1), rate=1, random_seed=1)
----> 7 next(ts.variants()).alleles
File /lib/python3.11/site-packages/tskit/genotypes.py:151, in Variant.alleles(self)
143 @property
144 def alleles(self) -> tuple[str | None, ...]:
145 """
146 A tuple of the allelic values which samples can possess at the current
147 site. Unless an encoding of alleles is specified when creating this
148 variant instance, the first element of this tuple is always the site's
149 ancestral state.
150 """
--> 151 return self._ll_variant.alleles
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 4: invalid start byte
About this issue
- State: closed
- Created a year ago
- Comments: 21 (20 by maintainers)
Turns out the problem is that we don’t cast the length of the string to Py_ssize_t when passing it to the Python C API, which is an easy fix.
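To illustrate why a missing cast can matter on a 32-bit target such as pyodide's, here is a toy simulation of a variadic call in which the caller writes a 64-bit length but the callee reads a 32-bit one. This is only a sketch: the argument layout (4-byte pointer, 4 bytes of padding, 8-byte length), the POINTER value and the STALE padding bytes are all assumptions made for illustration, not something taken from the tskit or CPython sources.

```python
import struct

POINTER = 0x1000             # hypothetical 32-bit address of the allele bytes
LENGTH = 1                   # the true allele length
STALE = b"\xef\xbe\xad\xde"  # stand-in for whatever happens to sit in the padding

# Caller side (assumed layout): 4-byte pointer, 4 bytes of alignment padding,
# then the length written as an unsigned 64-bit integer.
caller_args = struct.pack("<I", POINTER) + STALE + struct.pack("<Q", LENGTH)

# Callee side: expects a 32-bit signed length immediately after the pointer,
# so it reads the padding bytes instead of the real length.
ptr, wrong_length = struct.unpack_from("<Ii", caller_args)

# With the length cast to a 32-bit signed integer before the call,
# both sides agree on the layout.
fixed_args = struct.pack("<Ii", POINTER, LENGTH)
_, right_length = struct.unpack_from("<Ii", fixed_args)

print("without cast:", wrong_length)
print("with cast:   ", right_length)
```

Under these assumptions the value stored in memory is correct, but the callee looks for it in the wrong place, so whether the call appears to work depends on what was left in the padding.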
Yeash, if I do
all is good, but put those identical lines in a for loop body and it fails:
And I get the error.
Yeah, the length was correct.
@hyanwong I can’t replicate this using the pyodide version at https://pyodide.org/en/stable/console.html, which is 3.11.2 (main, May 3 2023, 04:00:05) (Pyodide 0.23.2), a more recent version than the 3.11.2 (main, Mar 30 2023, 21:37:59) (Pyodide 0.23.0) on the notebook demo site. I can’t identify the upstream commit that fixed this, but it looks like it was fixed. I was already certain that it wasn’t our code, but this confirms it.
A quick fix might be to use y# and then do the utf-8 decode on the Python side.
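A minimal sketch of what the Python-side decode could look like, assuming the low-level variant returned a tuple of bytes objects (as the y# format produces), with None for missing data. The helper name decode_alleles is hypothetical and is not part of tskit.

```python
def decode_alleles(raw_alleles, encoding="utf-8"):
    """Decode a tuple of raw allele byte strings, passing None through."""
    return tuple(
        None if allele is None else allele.decode(encoding)
        for allele in raw_alleles
    )


# Example input shaped like the alleles tuple, but as undecoded bytes.
raw = (b"A", b"T", None)
print(decode_alleles(raw))  # ('A', 'T', None)
```

Decoding in Python would also mean that any bad bytes raise a UnicodeDecodeError from ordinary Python code, where the raw value can be inspected, rather than from inside the C extension.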