robot: filter --trim true not working as advertised

TL;DR:

  1. According to the doc: filter --term-file terms.txt --trim true should return all axioms that only reference terms in terms.txt. This is not the case. I think it should be.
  2. Better doc is needed on the interaction between filters specified with a list of terms and other filter options. It is not clear from the doc what the result of --trim true --select foo will be for various flavours of foo (apart from “annotations, ontology”).

Details:

“Each axiom refers to one or more entities: classes, object properties, annotation properties, etc. By default, filter will copy axioms from the input ontology for which all the entities are in the specified set. To be more relaxed and copy axioms that contain one or more entities from the specified set, use --trim false. This will usually leave some “dangling references” to entities that were not in the specified set.” (Robot Filter doc)

Attachments: cl-int.owl.zip, terms.txt

Example

cl-int.owl (attached); terms.txt (attached), which contains all CL terms plus two object properties, one of which is RO_0002202

cl-int.owl has:

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000017">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0000015"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002202"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/CL_0000020"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002215"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0048137"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A male germ cell that develops from spermatogonia. The euploid primary spermatocytes undergo meiosis and give rise to the haploid secondary spermatocytes which in turn give rise to spermatids.</obo:IAO_0000115>

 robot filter -i cl-int.owl --term-file terms.txt \
  --trim false --select annotations --preserve-structure false \
  --output cl_filter_trim_false.owl

=>

    <!-- http://purl.obolibrary.org/obo/CL_0000017 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000017">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0000015"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002202"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/CL_0000020"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002215"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0048137"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A male germ cell that develops from spermatogonia. The euploid primary spermatocytes undergo meiosis and give rise to the haploid secondary spermatocytes which in turn give rise to spermatids.</obo:IAO_0000115>
 

As expected, GO_0048137 is retained as a dangling reference. However:

 robot filter -i cl-int.owl --term-file terms.txt \
    --trim true --preserve-structure false --output cl_filter_trim_true_na.owl

=>

      <!-- http://purl.obolibrary.org/obo/CL_0000017 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000017">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0000015"/>
    </owl:Class>

This is rather more surprising. CL:0000017, RO:0002202 and CL:0000020 are all listed in terms.txt, but the subClassOf restriction referencing all three is not retained. In contrast, CL:0000017 subClassOf CL:0000015, for which both terms are listed, is retained. Surely this is a bug. Having this work as advertised would be both useful in itself and an improvement in usability (users wouldn’t need to spend time experimenting with combinations of --select options to try to get what they want).
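
To make the expectation concrete, here is a minimal sketch of the rule as the doc states it. It is plain Python over sets, not ROBOT's implementation: the `filter_axioms` helper, the axiom labels and the four-entry `terms` set are illustrative stand-ins written by hand from the CL_0000017 snippet, not read from the attached files.

```python
def filter_axioms(axioms, terms, trim=True):
    """Apply the documented rule to (label, signature) pairs.

    trim=True  -> keep an axiom only if ALL of its entities are in `terms`
    trim=False -> keep an axiom if ANY of its entities is in `terms`
    """
    keep = all if trim else any
    return [label for label, signature in axioms
            if keep(entity in terms for entity in signature)]


# Hypothetical subset of terms.txt, just enough for this example.
terms = {"CL_0000017", "CL_0000015", "CL_0000020", "RO_0002202"}

# The three subClassOf axioms on CL_0000017, with the entities each one references.
axioms = [
    ("subClassOf CL_0000015",
     {"CL_0000017", "CL_0000015"}),
    ("subClassOf RO_0002202 some CL_0000020",
     {"CL_0000017", "RO_0002202", "CL_0000020"}),
    ("subClassOf RO_0002215 some GO_0048137",
     {"CL_0000017", "RO_0002215", "GO_0048137"}),
]

expected_trim_true = filter_axioms(axioms, terms, trim=True)
expected_trim_false = filter_axioms(axioms, terms, trim=False)

print(expected_trim_true)   # the first two axioms
print(expected_trim_false)  # all three axioms
```

Under this reading, --trim true should keep the RO_0002202 restriction as well as the plain subClassOf axiom, and drop only the restriction that mentions GO_0048137.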

The only way I’ve found to get ~what I want is:

robot filter -i cl-int.owl --term-file terms.txt --trim true \
   --select "annotations anonymous parent" --preserve-structure false \
   --output cl_filter_trim_true_select_ann_anon_parent.owl

The problem with this is that I pull back undesired terms that are not in the filter but that are parents of the specified filter terms.
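
A rough way to spot those extra terms is to compare the named classes in the output against the term list. The sketch below uses only the Python standard library and assumes the output is RDF/XML and that the wanted terms are available as full IRIs; the `extra_classes` helper and the two-class `sample` document are hypothetical stand-ins, not part of ROBOT or of the attached files.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OWL = "{http://www.w3.org/2002/07/owl#}"


def extra_classes(rdfxml_text, wanted_iris):
    """Return named classes declared in the RDF/XML text that are not in wanted_iris."""
    root = ET.fromstring(rdfxml_text)
    declared = {
        element.get(RDF + "about")
        for element in root.iter(OWL + "Class")
        if element.get(RDF + "about")  # skip anonymous class expressions
    }
    return sorted(declared - set(wanted_iris))


# Hand-written stand-in for a filtered output file.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000017"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0048137"/>
</rdf:RDF>"""

wanted = ["http://purl.obolibrary.org/obo/CL_0000017"]
extras = extra_classes(sample, wanted)
print(extras)  # ['http://purl.obolibrary.org/obo/GO_0048137']
```

This only catches classes declared with rdf:about; entities that appear solely as rdf:resource targets (dangling references) would need a separate pass.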

The behavior of --select when combined with --trim true should be better documented. Is it accurate to say that --select selectively disables trimming of axioms? So --select annotations disables trimming of annotation axioms on selected terms (and axioms?), --select parents disables trimming of asserted parent axioms, and so on? Or do some of these --select options add additional terms to the filter set?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (14 by maintainers)

Most upvoted comments

I just released 1.4.1, which I hope addresses this.

Ok. The release artifacts are an important use case, no question. I’ll assign this to @rctauber and we’ll figure out how to get the desired behaviour, with or without filter. I’d also be fine with a specialized ROBOT command for generating these artefacts.

Help with detailed use cases, examples, and code is always appreciated.