xmlunit: ElementSelectors.byNameAndText doesn't ignore the order of nodes
Hi,
I am quite impressed with the library and its documentation. Thanks for providing all the support. I have seen your issue https://github.com/xmlunit/xmlunit/issues/77, which suggests using ElementSelectors.byNameAndText to ignore the order of elements.
However, if I have XML documents like the ones below, the XMLUnit comparison fails to recognize that they have the same XML structure.
XML Control:
<root>
<parent><line>
<segment>L</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>P</segment>
</line>
</parent>
</root>
XML Test:
<root>
<parent><line>
<segment>P</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>L</segment>
</line>
</parent>
</root>
Here is my code:
Diff myDiff = DiffBuilder.compare(control)
    .withTest(test)
    .checkForSimilar()
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
    .build();
if (myDiff.hasDifferences()) {
    // Print every difference XMLUnit reports
    Iterator<Difference> iterator = myDiff.getDifferences().iterator();
    while (iterator.hasNext()) {
        Difference diff = iterator.next();
        System.out.println(diff.getComparison().toString());
    }
}
Please let me know what I am missing here. Essentially, these two XML structures are the same, with the line elements in a different order.
Also, I want to ignore whitespace between elements. I have considered the normalizeWhitespace and ignoreWhitespace APIs, but they essentially remove the spaces from the text content as well. The two XML documents below are the same, except that the first ‘line’ element starts on the next line in one of them.
<root>
<parent><line>
<segment>L</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>P</segment>
</line>
</parent>
</root>
<root>
<parent>
<line>
<segment>P</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>L</segment>
</line>
</parent>
</root>
I really appreciate your help, I spent quite a bit of time figuring this out.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 22 (11 by maintainers)
Commits related to this issue
- provide a `Source` that strips element content whitespace ping #115 — committed to xmlunit/xmlunit by bodewig 6 years ago
- add `ignoreElementContentWhitespace` methods to builder/matcher closes #119 ping #115 — committed to xmlunit/xmlunit by bodewig 6 years ago
With regard to element content whitespace, you may need to do something yourself; you can take a look at `Nodes.stripWhitespace` for inspiration. You are correct, `ignoreWhitespace` would be close but is mixing two concerns. It might be a good enhancement to have an alternative `stripEmptyTextNodes` that only dealt with element content whitespace.

Next, to the `ID` elements: You need to understand that XMLUnit really only looks at a single set of sibling elements at a time, and once it has decided which branches of the documents it descends into, it is never going to look into the other branches. The `byXPath`/`byNameAndText` combo has been used to select the matching `line` elements. It does not affect the selection of `segment` or `ID` elements at all. Once XMLUnit has settled on the `line` elements to compare, there are two child elements in each of them, one called `segment` and the other `ID`, and the `ElementSelector` to use is `byName` (this is what we told XMLUnit to use by default). There is no ambiguity at all. There is no magic that selects the correct `ID` elements; they just happen to be the only choices once XMLUnit has selected the `line` elements.

XMLUnit always compares nested text. The `ElementSelector` is only there to decide which elements to compare with each other, but XMLUnit will always perform all comparisons you find in the `ComparisonType` enum.

For the final question, you may want to implement a custom `ComparisonFormatter`. `DefaultComparisonFormatter` may be a good starting point when you only need to override the display of nested text by replacing `appendText`.

`byNameAndText` is not a silver bullet, and neither is any of the other selectors. It always depends on your concrete scenario.

The elements that are out of order in your example are the `line` elements. The thing that identifies matching lines to you is the nested text of the `segment` element nested into the `line` element. `byNameAndText` looks at the name of the element (`line`), which is the same for all of them, and at the text nested directly into the element (none at all), which is also the same for all of them. This means `byNameAndText` is not the `ElementSelector` you need.

What you need is "when looking at a `line` element, also look at the nested text of the child element named `segment`". One way to state that in XMLUnit is to wrap `byNameAndText` in `byXPath` with an XPath that selects the `segment` child. There would be different ways, and the `byName` in `byNameAndText` is redundant as the XPath already ensures only `segment` elements are selected, but this is the most convenient way to state it.

You do not want to use this `ElementSelector` for everything. There is no `segment` child in `parent`, for example, so you need different `ElementSelector`s for different parts of your document. This is where conditional selectors come in. For simplicity I will assume that matching on element name is good enough for all elements that are not `line` elements. At least this is true for your example. If you need more complex decisions, then the conditional builder should provide you with everything needed as well. For the simplistic case the solution would be a conditional selector that applies the `byXPath`/`byNameAndText` combination to `line` elements and falls back to `byName` for everything else.
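
A minimal sketch of the conditional selector described above, not necessarily the exact code from the thread. It assumes XMLUnit 2.x on the classpath; the class name `LineOrderSketch` and the helpers `lineAwareSelector` and `compare` are made up for illustration. The XPath `./segment` is an assumption based on the description "the child element named `segment`". The call to `ignoreElementContentWhitespace()` further assumes a release that already contains the builder method mentioned in the commits above (2.6.0 or later, as far as I know); drop that line on older versions.

```java
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.Difference;
import org.xmlunit.diff.ElementSelector;
import org.xmlunit.diff.ElementSelectors;

public class LineOrderSketch {

    // Hypothetical helper: for "line" elements, match on the name and text of
    // the nested "segment" child; for every other element, match on name only.
    static ElementSelector lineAwareSelector() {
        return ElementSelectors.conditionalBuilder()
            .whenElementIsNamed("line")
            .thenUse(ElementSelectors.byXPath("./segment", ElementSelectors.byNameAndText))
            .elseUse(ElementSelectors.byName)
            .build();
    }

    // Hypothetical helper: compares two XML strings, treating reordered
    // siblings as "similar" and ignoring whitespace-only text between elements.
    static Diff compare(String control, String test) {
        return DiffBuilder.compare(control)
            .withTest(test)
            .checkForSimilar()
            .ignoreElementContentWhitespace() // assumed available, see note above
            .withNodeMatcher(new DefaultNodeMatcher(lineAwareSelector()))
            .build();
    }

    public static void main(String[] args) {
        // Well-formed variants of the documents from the question.
        String control = "<root><parent><line>\n<segment>L</segment>\n</line>\n"
            + "<line>\n<segment>K</segment>\n</line>\n"
            + "<line>\n<segment>P</segment>\n</line>\n</parent></root>";
        String test = "<root><parent>\n<line>\n<segment>P</segment>\n</line>\n"
            + "<line>\n<segment>K</segment>\n</line>\n"
            + "<line>\n<segment>L</segment>\n</line>\n</parent></root>";

        Diff diff = compare(control, test);
        System.out.println("has differences: " + diff.hasDifferences());
        for (Difference d : diff.getDifferences()) {
            System.out.println(d);
        }
    }
}
```

The selector only decides which `line` elements are paired up. Whether the pairs then count as equal is still determined by the regular comparisons, so a `line` whose `segment` text has no counterpart in the other document should still be reported as a difference.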