gitea: 500 when adding review comments on specific lines for PR:s

  • Gitea version (or commit ref): 1.8.2
  • Git version: 1.8.3.1
  • Operating system: CentOS 7
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant
  • Log gist:

[Macaron] 2019-06-20 16:08:16: Started POST /MyTeam/MyRepo/pulls/19/files/reviews/comments for 172.16.2.102

2019/06/20 16:08:16 http: superfluous response.WriteHeader call from code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*responseWriter).WriteHeader (response_writer.go:59)

[Macaron] 2019-06-20 16:08:16: Completed POST /MyTeam/MyRepo/pulls/19/files/reviews/comments 302 Found in 57.718947ms

Description

I get a 500 internal server error sometimes when adding review comments to specific lines for a PR. I haven’t figured the pattern yet because it only happens on specific lines of the files. But once I find a line that causes the 500 internal server error then it happens every time.

It could be related to #6236 as the files we work in are not in UTF-8 but instead in iso-8859-15 and we use the swedish characters Å, Ä and Ö but I’m not getting the same error message in the log so therefore I’m creating a new issue.

If it’s relevant I started running the server on version on 1.7.0 and has since upgraded to 1.8.1 and then 1.8.2

The repo and the files are related to my work so I can’t share those publicly.

Screenshots

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 26 (14 by maintainers)

Most upvoted comments

Changing the database will not solve this.

The problem is we assume that code lines are in utf8 so just store the line of code as a string in the database.

Git makes no such guarantees and code lines can be in any encoding.

We either:

  • Stop storing code lines in the db - and reference the line (we still need to reencode at some point.)
  • Have to reencode the codeline to ensure that it is in utf8 before putting it in the db.
  • Store the codeline as bytes and reencode it when it comes back out.

There are plenty of other places where we make the same wrong assumption: at the plumbing level git just stores bytes not strings.

  • If your commit message was converted from SVN or CVS it can be non utf8. Git can report if it was told that this message was utf8 or not and I think there is a way to convert.
  • If you filesystem is weird you can have non utf8 characters in the path. Hell if you’re perverse you can do this on Unix extremely easily.
  • Git makes almost zero assumptions about file encodings - write your code in koi8 for all it cares, write the comments in a totally different encoding to the rest of the file…
  • If you have weird settings git will put messages into weird encodings.

All of these are potential failures with encoding - we don’t see them very often because most people just use the default but it’s a significant issue for anyone with old code. Unfortunately our code is full of assumptions regarding the string nature of all of these things.

So changing your MySQL table format ain’t gonna fix this. Your codeline ain’t value utf8 and although go hasn’t complained the db certainly will.

Yes, @guillep2k it is.

We could convert the code lines to utf8 before we save it to database.

If we do this, we’d need to use the whole file for encoding detection, not just one line, as it’s not enough context. The detection library requires at least 1024 bytes; the more, the better.

We could convert the code lines to utf8 before we save it to database.

@tanrui8765 which database are you using? If it’s mysql, which character set did you use?

In my case it’s postgresql 9.2.24.

@tanrui8765 which database are you using? If it’s mysql, which character set did you use?