methylartist: ValueError: Could not interpret value `coord` for parameter `x`
Hi
I’m trying to use methylartist composite for plotting methylation over several SVAs.
I think the software finishes profiling the coordinates, but then it runs into this error:
Traceback (most recent call last):
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4855, in <module>
    main(args)
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4570, in main
    args.func(args)
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4358, in composite
    ax0 = sns.lineplot(x='coord', y='meth', data=meanplot_table, ci='sd', lw=2, hue='sample', palette=sample_color)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/relational.py", line 692, in lineplot
    p = _LinePlotter(
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/relational.py", line 367, in __init__
    super().__init__(data=data, variables=variables)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 605, in __init__
    self.assign_variables(data, variables)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 668, in assign_variables
    plot_data, variables = self._assign_variables_longform(
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 903, in _assign_variables_longform
    raise ValueError(err)
ValueError: Could not interpret value `coord` for parameter `x`
I thought maybe it was because of low methylation calls in the coordinates, so I tried with a subset of coordinates that I’ve used with methylartist locus (which finished successfully), but it fails with the same error.
My command is:
methylartist composite -d methylation_calls_files.txt -s SVA_F.bed -r hg38.fa -p 20 --svg -t SVA_F.cons
I’m using version 1.2.4. I tried using --excl_ambig and changing -l to a looser threshold, but I got the same error.
The contents of methylation_calls_files.txt are:
op_043_001.sorted.bam op_043_001_methylation_calls.db
op_043_002.sorted.bam op_043_002_methylation_calls.db
Any advice would be greatly appreciated!
Best,
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (9 by maintainers)
Ignore the above comment. I’ve modified the arguments to composite in commit 13522b7 so that this makes more sense now, I think. There are now --minelts and --maxelts options, which default to 1 and 200, respectively. To plot everything, you can just give --maxelts a very large value, e.g. 5000.

Also, this now shows a message, e.g.:
2022-09-29 23:36:24,950 sample 054 has 1563 useable elements, will sample 1563
which can be compared against the number of unique elements in the output table as a sanity check.

I’ve added an --output_table option to composite in commit 737a5c0. In the output table, coord refers to a position in the reference element (the .fasta passed to the -t option). The positions included in the table are determined the same way as the positions included in the mean methylation plot (by default the median coverage, or the value given to --meanplot_cutoff). Let me know if this works for you.

Good to know it works for you. It should be straightforward to dump the elements x sites data into a table; I’ll look at it. It will be a rather sparse table if all CpGs for all elements are included, but that’s probably OK. Alternatively, it could use the same --meanplot_cutoff as above.

Also, it’s probably worth noting that composite samples from the list of input sites up to --maxsegs elements, so if you want to see everything plotted, this must be set to a number at least as big as your input.

I see similar results for SVA. It was due to their heterogeneity and an automatic cutoff that determines which sites are included in the mean methylation plot. Because each individual element (in this case, an SVA) might include different sites (in this case, CG dinucleotides), each position in the reference SVA aligned to the individual SVA is assessed to see how many SVAs have a CG at that position; call this the “coverage” at that CG. The median coverage is taken across all CGs and used as a cutoff for determining which to include in the plot. Since SVAs are very heterogeneous, this doesn’t work well, so I’ve added a parameter
--meanplot_cutoff
in f6f7fee. Let me know if it works for you. For me, the comparison looks like this:
[Figure: composite plot with the default cutoff vs. --meanplot_cutoff 5]
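To make the cutoff logic described above concrete, here is a minimal sketch of the idea. It is not methylartist's actual code. The function name select_sites, the dict-of-sets input layout, and the choice of ">=" for the comparison are all assumptions made for illustration. It counts how many elements have a CG at each reference position, then keeps positions whose coverage reaches either the median coverage or a user-supplied cutoff.

```python
from collections import Counter
from statistics import median


def select_sites(element_sites, cutoff=None):
    """Pick reference positions to include in a mean methylation plot.

    element_sites: dict mapping element name -> set of reference
        coordinates where that element has a CG (hypothetical layout).
    cutoff: minimum coverage; if None, the median coverage is used.
    Returns (sorted list of kept positions, cutoff actually applied).
    """
    # Coverage = number of elements with a CG at each reference position.
    coverage = Counter()
    for sites in element_sites.values():
        coverage.update(set(sites))

    if not coverage:
        return [], cutoff

    if cutoff is None:
        cutoff = median(coverage.values())

    # Assumption: a position is kept when coverage >= cutoff.
    kept = sorted(pos for pos, n in coverage.items() if n >= cutoff)
    return kept, cutoff


# Toy data: four heterogeneous elements sharing only some CG positions.
elements = {
    "SVA_1": {10, 25, 40},
    "SVA_2": {10, 25, 40},
    "SVA_3": {10, 25, 55},
    "SVA_4": {10, 70},
}

default_sites, default_cutoff = select_sites(elements)
loose_sites, _ = select_sites(elements, cutoff=1)
print(default_cutoff, default_sites)
print(loose_sites)
```

In this toy data the median coverage is 2, so the two positions found in only one element are dropped by default, while an explicit cutoff of 1 keeps all five. With very heterogeneous elements such as SVAs the effect can be much stronger, which is why an explicit cutoff helps.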