methylartist: ValueError: Could not interpret value `coord` for parameter `x`

Hi

I’m trying to use methylartist composite for plotting methylation over several SVAs.

I think the software finishes profiling the coordinates, but then it runs into this error:

Traceback (most recent call last):
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4855, in <module>
    main(args)
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4570, in main
    args.func(args)
  File "/sw/pkg/methylartist/1.2.4/bin/methylartist", line 4358, in composite
    ax0 = sns.lineplot(x='coord', y='meth', data=meanplot_table, ci='sd', lw=2, hue='sample', palette=sample_color)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/relational.py", line 692, in lineplot
    p = _LinePlotter(
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/relational.py", line 367, in __init__
    super().__init__(data=data, variables=variables)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 605, in __init__
    self.assign_variables(data, variables)
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 668, in assign_variables
    plot_data, variables = self._assign_variables_longform(
  File "/sw/easybuild/software/Seaborn/0.11.2-foss-2021b/lib/python3.9/site-packages/seaborn/_core.py", line 903, in _assign_variables_longform
    raise ValueError(err)
ValueError: Could not interpret value `coord` for parameter `x`

I thought maybe it was because of low methylation calls in the coordinates, so I tried with a subset of coordinates that I’ve used with methylartist locus (which finished successfully), but it fails with the same error.

My command is: methylartist composite -d methylation_calls_files.txt -s SVA_F.bed -r hg38.fa -p 20 --svg -t SVA_F.cons

I’m using version 1.2.4

I tried using --excl_ambig and changing -l to a looser threshold, but I got the same error.

The contents of methylation_calls_files.txt are:

op_043_001.sorted.bam op_043_001_methylation_calls.db
op_043_002.sorted.bam op_043_002_methylation_calls.db

Any advice would be greatly appreciated!

Best,

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Ignore the above comment. I’ve modified the arguments to composite in commit 13522b7 so that this makes more sense now, I think. There are now --minelts and --maxelts options, which default to 1 and 200, respectively. To plot everything, just give --maxelts a very large value, e.g. 5000:

methylartist composite -d 054.data.txt -s te/SVA.1kbp.bed -r Homo_sapiens_assembly38.fasta -t SVA_F.fa -p 64 --output_table --lenfrac 0.5 --maxelts 5000 -a 0.1

[Figure: 054 data SVA 1kbp composite]

Also, this now logs a message such as "2022-09-29 23:36:24,950 sample 054 has 1563 useable elements, will sample 1563". The element count can be compared against the number of unique elements in the output table as a sanity check, e.g.:

$ cut -f1,2,3 054.data.SVA.1kbp.composite.table.tsv | grep -v ^chrom | sort | uniq | wc -l
1563
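
For anyone who prefers to do this check in Python, here is a minimal sketch of the same count. It assumes the table is tab-separated with the chrom, start and end columns in its header; the function name and the rows in the example are made up for illustration and are not part of methylartist.

```python
import csv
import io


def count_unique_elements(handle):
    """Count distinct (chrom, start, end) triples in a composite output table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return len({(row["chrom"], row["start"], row["end"]) for row in reader})


# Illustrative in-memory table (invented rows, same columns as the real output).
example = (
    "chrom\tstart\tend\tstrand\tsample\tcoord\tmeth\n"
    "chr16\t24108350\t24110141\t-\t054\t33\t0.62\n"
    "chr16\t24108350\t24110141\t-\t054\t48\t0.71\n"
    "chr2\t202534282\t202536111\t-\t054\t33\t0.91\n"
)

n_unique = count_unique_elements(io.StringIO(example))
print(n_unique)  # two distinct elements in the example rows
```

For a real file, pass an open file handle instead of the StringIO object, e.g. open("054.data.SVA.1kbp.composite.table.tsv", newline="").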

I’ve added a --output_table option to composite in commit 737a5c0. The output currently looks something like:

chrom   start   end     strand  sample  coord   meth
chr16   24108350        24110141        -       054     33      0.6178728620349763
chr2    202534282       202536111       -       054     33      0.90873449102059
chr20   32723118        32724962        +       054     33      0.9506836249230268
... etc ...

where coord refers to a position in the reference element (the .fasta passed to the -t option). The positions included in the table are determined the same way as the positions included in the mean methylation plot (by default the median coverage, or --meanplot_cutoff if given). Let me know if this works for you.
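
As a rough illustration of how this table could be used downstream, the sketch below averages meth per sample and reference coord, which is roughly what a mean methylation line would summarise. It assumes a tab-separated table with the header shown above; the function name and the example values are hypothetical, and this is not methylartist's own code.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_meth_by_coord(handle):
    """Return {(sample, coord): mean methylation} from a composite output table."""
    values = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        values[(row["sample"], int(row["coord"]))].append(float(row["meth"]))
    return {key: mean(vals) for key, vals in sorted(values.items())}


# Illustrative rows only; the methylation values are invented.
example = (
    "chrom\tstart\tend\tstrand\tsample\tcoord\tmeth\n"
    "chr16\t24108350\t24110141\t-\t054\t33\t0.5\n"
    "chr2\t202534282\t202536111\t-\t054\t33\t1.0\n"
    "chr2\t202534282\t202536111\t-\t054\t48\t0.25\n"
)

means = mean_meth_by_coord(io.StringIO(example))
for (sample, coord), value in means.items():
    print(sample, coord, value)
```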

Good to know it works for you. It should be straightforward to dump the elements x sites data into a table, I’ll look at it. It will be a rather sparse table if all CpGs for all elements are included, but that’s probably OK. Alternatively, it could use the same --meanplot_cutoff as above.

Also, it’s probably worth noting that composite samples up to --maxsegs elements from the list of input sites, so if you want to see everything plotted, this must be set to a number at least as large as your input.

I see similar results for SVA. It was due to their heterogeneity and an automatic cutoff that determines which sites are included in the mean methylation plot. Because each individual element (in this case, an SVA) might include different sites (in this case, CG dinucleotides), each position in the reference SVA is assessed, via its alignment to each individual SVA, to see how many SVAs have a CG at that position; call this the “coverage” at that CG. The median coverage is taken across all CGs and used as a cutoff for determining which to include in the plot. Since SVAs are very heterogeneous, this doesn’t work well, so I’ve added a parameter --meanplot_cutoff in f6f7fee. Let me know if it works for you. It looks like this for me (default vs --meanplot_cutoff 5):

default: [Figure: 054 data SVA_F 1kbp composite]

--meanplot_cutoff 5: [Figure: 054 data SVA_F 1kbp meanplot_cutoff_5 composite]
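
To make the coverage cutoff described above more concrete, here is a small sketch of the idea. It is only an illustration under stated assumptions and not the code in methylartist: each element is represented as a set of reference positions where it has a CG, and a site is kept when its coverage is at least the cutoff (whether the real comparison is inclusive is an assumption). The function name and the toy data are hypothetical.

```python
from collections import Counter
from statistics import median


def sites_passing_cutoff(elements, cutoff=None):
    """Pick reference positions to include in a mean methylation plot.

    elements: list of sets; each set holds the reference positions at which
              one individual element has a CG.
    cutoff:   minimum number of elements that must have a CG at a position;
              if None, the median coverage across all CG positions is used.
    """
    # Coverage = number of elements with a CG at each reference position.
    coverage = Counter(pos for sites in elements for pos in sites)
    if cutoff is None:
        cutoff = median(coverage.values())
    # Assumption: positions with coverage >= cutoff are kept.
    return sorted(pos for pos, n in coverage.items() if n >= cutoff)


# Toy example: four elements that share only some CG positions.
elements = [{10, 20, 30}, {10, 20}, {10, 40}, {10, 50}]

default_sites = sites_passing_cutoff(elements)           # median coverage as cutoff
strict_sites = sites_passing_cutoff(elements, cutoff=2)  # explicit cutoff
print(default_sites)
print(strict_sites)
```

With heterogeneous elements, many positions are covered by only a few elements, so the median coverage can be a poor threshold; an explicit cutoff gives direct control over which sites are plotted.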