go-diff: Index out of range panic in DiffCharsToLines on large JSON diff

I’ve encountered this issue while using https://github.com/src-d/go-git, but the bug is easily reproducible with the code snippet below and the JSON file in attachment.

package main

import (
	"fmt"
	"io/ioutil"
	"os"

	"github.com/sergi/go-diff/diffmatchpatch"
)

func main() {
	f, err := os.Open("data.txt")
	defer f.Close()
	checkErr(err)
	data, err := ioutil.ReadAll(f)
	checkErr(err)

	// from https://github.com/src-d/go-git/blob/v4.0.0/utils/diff/diff.go#L17
	dmp := diffmatchpatch.New()
	wSrc, wDst, warray := dmp.DiffLinesToChars(string(data), "")
	diffs := dmp.DiffMain(wSrc, wDst, false)
	diffs = dmp.DiffCharsToLines(diffs, warray)
	fmt.Println(diffs)
}

func checkErr(err error) {
	if err != nil {
		panic(err)
	}
}

Output:

$ go run main.go
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).DiffCharsToLines(0xc420044ee8, 0xc420078390, 0x1, 0x2, 0xc4202ce000, 0xd802, 0xec00, 0x1, 0x2, 0xc4202ce000)
	/Users/krylovsk/src/github.com/sergi/go-diff/diffmatchpatch/diff.go:414 +0x394
main.main()
	/tmp/go-diff-debug/main.go:24 +0x29e
exit status 2

data.txt

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 16 (5 by maintainers)

Commits related to this issue

Most upvoted comments

Hi All,

I have done a fix for this issue and with my testing its working.

Issue: As I explained above issue is storing the index value in the rune. Fix: Moved the index stored from rune to string array. Using delimeter “,” the string array is combined to a string . This string is converted to rune array and returned for DiffLinesToRunes. DiffCharsToLines method, split the index string, which is the diff using the delimeter and then use the index to fetch data from linearray.

The fix is in the last two commits of my go-diff fork. I have also changed the test cases and added test to check the a huge line diff testcase .

Branch: https://github.com/r-pai/go-diff/

Any comments before I create a PR.

Thanks Pai

I too faced the issue and had a look on why the issue is occurring. I was able to know what the issue but for the fix its not that easy as you need to know most of the logic. Its better people who wrote it can fix it better than me.

Please have a look on this example. Here is a sample code of what happens when a rune array is converted to string and back to rune array. Sample: https://play.golang.org/p/XFVKayUhV27

func main() {
	r := []rune{55296}
	s := string(r)
	r1 := []rune(s)
	fmt.Println(r, s, r1)
}

Output :

[55296] � [65533] In go-diff , similar thing is done and issue occurs because of this usage.

Issue: If my file has lines is 55296 or more this issue arrises, and max 65530 should be there. How the issue occurs?

In go-diff the lineArray index is stored in the rune array ( In diff.go , diffLinesToRunesMunge() function ). This is converted to string and stored in ‘Diff’ structure ‘Text’ variable. (In diff.go , diffMainRunes function) There is come processing with this Diff.Text and logic is a bit difficult to understand 😃 Time Constraint too Now here in diff.go , DiffCharsToLines method we convert the Diff.Text to rune array and use it (as index) to get the line from line array. and this is when it get the ‘index out of bound’ error

I hope , I had put some light on this issue. 😃

Thank You… P@i

Awesome, thank you for the quick update!

Hi @stamblerre, I have a potential fix for this that I plan to apply later today if all tests pass.

Hi folks,

Sorry about the silence. I’m back and I’ll be looking into this issue soon, but I’d definitely appreciate it if one of you wants to contribute a PR to it.

We are not working on it right now, so if you want to fix it go ahead, this will be great.

On Thu, May 23, 2019, 4:00 PM Corefracture notifications@github.com wrote:

Anyone know if @sergi https://github.com/sergi is still active? I was hoping to submit a PR for this fix (unless someone else is or has a fix). I’m running into this now consistently 😦

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sergi/go-diff/issues/89?email_source=notifications&email_token=AAMAB6WPPNVXUMIWMRRQZXLPW2PPXA5CNFSM4EL5UEO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWCKBNY#issuecomment-495231159, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMAB6WSH6DOU75MZSVQ7QLPW2PPXANCNFSM4EL5UEOQ .