jodd: java.lang.IndexOutOfBoundsException
Current behavior
I got java.lang.IndexOutOfBoundsException while parsing http://koaci.com and http://cofc.edu pages.
I use:
<dependency>
    <groupId>org.jodd</groupId>
    <artifactId>jodd-lagarto</artifactId>
    <version>5.1.4</version>
</dependency>
The exception I get:
java.lang.IndexOutOfBoundsException
at jodd.util.CharArraySequence.<init>(CharArraySequence.java:69)
at jodd.util.CharArraySequence.of(CharArraySequence.java:47)
at jodd.lagarto.Scanner.charSequence(Scanner.java:142)
at jodd.lagarto.LagartoParser.emitComment(LagartoParser.java:3002)
at jodd.lagarto.LagartoParser$28.parse(LagartoParser.java:1411)
at jodd.lagarto.LagartoParser.parse(LagartoParser.java:135)
Expected behavior
No exception.
Steps to Reproduce the Problem
My code:
LagartoParser lagartoParser = new LagartoParser(html.toString());
// Disabling conditional comments was the recommendation from a previous issue.
lagartoParser.setConfig(new LagartoParserConfig().setEnableConditionalComments(false));
TagVisitorImpl tagVisitor = new TagVisitorImpl();
lagartoParser.parse(tagVisitor);

class TagVisitorImpl implements TagVisitor {
    @Override
    public void tag(Tag tag) {
        href = tag.getAttributeValue("href");
        if (href != null) {
            // ...
        }
    }
}
Pages that I parsed:
https://www.dropbox.com/s/ou17nxllxdnc54x/22314_koaci.com.html?dl=0
https://www.dropbox.com/s/jqioanxydlhwid2/42488_cofc.edu.html?dl=0
How did I find it? I downloaded the top 100_000 pages and ran some performance tests comparing libraries. I parse real pages, so I run into real problems. If you want, I can send you a list of URLs for the top 1 million pages. I also have the 100_000 downloaded pages as HTML (6 GB as a ZIP), which I can share as well.
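For anyone who wants to run a similar batch check, here is a minimal sketch of the idea, not the exact code I used. The class `FailureCollector`, the method `findFailures`, and the toy comment extractor in `main` are all hypothetical names made up for illustration; the toy extractor only stands in for a real parser call such as the `LagartoParser` snippet above and does not reflect how Jodd works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper: runs a parser over many pages and records which ones throw.
final class FailureCollector {

    // Returns page name -> exception for every page the parser failed on.
    static Map<String, RuntimeException> findFailures(
            Map<String, String> pages, Consumer<String> parser) {
        Map<String, RuntimeException> failures = new LinkedHashMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            try {
                parser.accept(page.getValue());
            } catch (RuntimeException e) {
                // Keep going so one bad page does not stop the whole batch.
                failures.put(page.getKey(), e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("good.html", "<p>hi</p><!-- closed -->");
        pages.put("bad.html", "<p>hi</p><!-- never closed");

        // Toy stand-in for a real parser (an assumption, NOT Jodd's logic):
        // it fails on an unterminated comment because indexOf returns -1.
        Consumer<String> toyParser = html -> {
            int start = html.indexOf("<!--");
            if (start >= 0) {
                int end = html.indexOf("-->", start);
                html.substring(start + 4, end);
            }
        };

        findFailures(pages, toyParser).forEach((name, error) ->
                System.out.println(name + " failed with " + error.getClass().getName()));
    }
}
```

In a real run the map would be filled from the downloaded files, and the consumer would wrap the actual parse call, so the list of failing pages can be attached to an issue like this one.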
About this issue
- State: closed
- Created 4 years ago
- Comments: 17 (13 by maintainers)
I second that too. Conditional comments have not been supported since IE 10, and even IE 10 has been completely out of support since January of this year. I'm not saying Jodd should drop support altogether, but changing the default might be a good idea.
https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.koaci.com%2F 😃
+1: last version supporting conditional comments was IE9
That should make it reproducible