xmlunit: ElementSelectors.byNameAndText doesn't ignore the order of nodes

Hi,

I am quite impressed with the library and its documentation. Thanks for providing all the support. I have seen issue https://github.com/xmlunit/xmlunit/issues/77, which suggests using ElementSelectors.byNameAndText to ignore the order of elements.

However, if I have XML documents like the ones below, the XMLUnit comparison fails to recognize that they have the same structure.

XML Control:

<root>
<parent><line>
<segment>L</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>P</segment>
</line>
</parent>
</root>

XML Test:

<root>
<parent><line>
<segment>P</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>L</segment>
</line>
</parent>
</root>

Here is my code:

Diff myDiff = DiffBuilder.compare(control)
    .withTest(test)
    .checkForSimilar()
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
    .build();
if (myDiff.hasDifferences()) {
    // print every difference XMLUnit reports
    for (Difference diff : myDiff.getDifferences()) {
        System.out.println(diff.getComparison().toString());
    }
}

Please let me know what I am missing here. Essentially, these two XML structures are the same; only the order of the line elements differs.

Also, I want to ignore whitespace between elements. I have considered the normalizeWhitespace and ignoreWhitespace APIs, but they also remove spaces from the text content. The two XML documents below are the same except that the first ‘line’ element starts on the next line in one of them.

<root>
<parent><line>
<segment>L</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>P</segment>
</line>
</parent>
</root>

<root>
<parent>
<line>
<segment>P</segment>
</line>
<line>
<segment>K</segment>
</line>
<line>
<segment>L</segment>
</line>
</parent>
</root>

I really appreciate your help, I spent quite a bit of time figuring this out.

About this issue

State: closed
Created 6 years ago
Reactions: 2
Comments: 22 (11 by maintainers)

Commits related to this issue

provide a `Source` that strips element content whitespace ping #115 — committed to xmlunit/xmlunit by bodewig 6 years ago
add `ignoreElementContentWhitespace` methods to builder/matcher closes #119 ping #115 — committed to xmlunit/xmlunit by bodewig 6 years ago

Most upvoted comments

With regard to element content whitespace, you may need to do something yourself; you can take a look at Nodes.stripWhitespace for inspiration. You are correct, ignoreWhitespace would be close but it mixes two concerns. It might be a good enhancement to have an alternative stripEmptyTextNodes that only deals with element content whitespace.
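
A minimal sketch of what such a helper could look like, assuming plain JDK DOM classes rather than any XMLUnit API. The class name ElementContentWhitespace and the parse helper are hypothetical, and stripEmptyTextNodes simply borrows the name suggested above; it is not an existing XMLUnit method. It removes text nodes that contain only whitespace and leaves every other text node, including its inner spaces, untouched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of XMLUnit.
class ElementContentWhitespace {

    /** Parses an XML string into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /**
     * Recursively removes text nodes that consist only of whitespace.
     * Text nodes with real content are kept exactly as they are.
     */
    static void stripEmptyTextNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripEmptyTextNodes(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<root>\n<parent>\n<line>\n<segment> L K </segment>\n</line>\n</parent>\n</root>");
        stripEmptyTextNodes(doc);
        // Only the parent element remains as a child of root;
        // the spaces inside the segment text are preserved.
        System.out.println("children of root: "
            + doc.getDocumentElement().getChildNodes().getLength());
        System.out.println("text content: '"
            + doc.getDocumentElement().getTextContent() + "'");
    }
}
```

The stripped documents could then be handed to the comparison instead of the raw ones. Note that this sketch would also drop whitespace-only text between inline elements in mixed content, so it only suits data-style documents like the ones in this issue.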

Next, the ID elements: You need to understand that XMLUnit really only looks at a single set of sibling elements at a time, and once it has decided which branches of the documents it descends into, it is never going to look into the other branches. The byXPath/byNameAndText combo has been used to select the matching line elements. It does not affect the selection of segment or ID elements at all. Once XMLUnit has settled on the line elements to compare, there are two child elements in each of them, one called segment and the other ID, and the ElementSelector to use is byName (this is what we told XMLUnit to use by default). There is no ambiguity at all. There is no magic that selects the correct ID elements; they just happen to be the only choices once XMLUnit has selected the line elements.

XMLUnit always compares nested text, the ElementSelector is only there to decide which elements to compare with each other, but XMLUnit will always perform all comparisons you find in the ComparisonType enum.

For the final question you may want to implement a custom ComparisonFormatter. DefaultComparisonFormatter may be a good starting point when you only need to override the display of nested text by replacing appendText.

bodewig on Apr 2, 2018

byNameAndText is not a silver bullet, neither is any of the other selectors. It always depends on your concrete scenario.

The elements that are out of order in your example are the line elements. The thing that identifies matching lines to you is the nested text of the segment element nested into the line element. byNameAndText looks at the name of the element - line - which is the same for all of them, and the text nested directly into the element - none at all - which is also the same for all of them. This means byNameAndText is not the ElementSelector you need.

What you need is “when looking at a line element, also look at the nested text of the child element named segment”. One way to state that in XMLUnit is

ElementSelectors.byXPath("./segment", ElementSelectors.byNameAndText)

There are other ways to state this, and the byName part of byNameAndText is redundant because the XPath already ensures only segment elements are selected, but this is the most convenient way to state it.

You do not want to use this ElementSelector for everything. There is no segment child in parent for example, so you need different ElementSelectors for different parts of your document. This is where conditional selectors come in. For simplicity I will assume that matching on element name is good enough for all elements that are not line elements. At least this is true for your example. If you need more complex decisions, then the conditional builder should provide you with everything needed as well. For the simplistic case the solution would be

ElementSelectors.conditionalBuilder()
    .whenElementIsNamed("line")
    .thenUse(ElementSelectors.byXPath("./segment", ElementSelectors.byNameAndText))
    .elseUse(ElementSelectors.byName)
    .build();

bodewig on Apr 1, 2018
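
For reference, a minimal sketch of how the conditional selector above might be plugged into the comparison code from the question. It assumes xmlunit-core 2.x on the classpath; the class name LineOrderComparison and the method compareIgnoringLineOrder are hypothetical, and the sample documents are written without whitespace between elements so that the element content whitespace question stays out of the picture.

```java
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.Difference;
import org.xmlunit.diff.ElementSelector;
import org.xmlunit.diff.ElementSelectors;

// Hypothetical wrapper class for illustration.
class LineOrderComparison {

    /** Compares two documents, matching line elements by their segment text. */
    static Diff compareIgnoringLineOrder(String control, String test) {
        // line elements are paired by the name and text of their segment child,
        // every other element is paired by name only
        ElementSelector selector = ElementSelectors.conditionalBuilder()
            .whenElementIsNamed("line")
            .thenUse(ElementSelectors.byXPath("./segment", ElementSelectors.byNameAndText))
            .elseUse(ElementSelectors.byName)
            .build();
        return DiffBuilder.compare(control)
            .withTest(test)
            .checkForSimilar() // a changed order of siblings counts as similar
            .withNodeMatcher(new DefaultNodeMatcher(selector))
            .build();
    }

    public static void main(String[] args) {
        String control = "<root><parent>"
            + "<line><segment>L</segment></line>"
            + "<line><segment>K</segment></line>"
            + "<line><segment>P</segment></line>"
            + "</parent></root>";
        String test = "<root><parent>"
            + "<line><segment>P</segment></line>"
            + "<line><segment>K</segment></line>"
            + "<line><segment>L</segment></line>"
            + "</parent></root>";

        Diff diff = compareIgnoringLineOrder(control, test);
        System.out.println("has differences: " + diff.hasDifferences());
        for (Difference d : diff.getDifferences()) {
            System.out.println(d.getComparison());
        }
    }
}
```

If the documents contain line breaks between elements, as in the question, the whitespace-only text nodes would still need to be dealt with separately.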