gitea: Linking issue dependencies fails (API returns 500 error)

Description

Okay, we’re really lost in debugging this for hours now on Codeberg. A live meeting with 5 people cannot figure out what’s exactly going on. Let’s start from the beginning:

On Codeberg.org, you can no longer link issue dependencies. The API call returns an error 500 with the error message in the gitea.log:

2022/04/10 18:38:42 ...api/v1/repo/issue.go:276:SearchIssues() [E] CountIssues: unable to get one row result when CountIssues, row count=0

Now of course we thought of a recent regression and reverted to pre-1.16.5: still happening. We tried to reproduce on try.gitea.io and on codeberg-test: not happening.

We tried moving a whole database snapshot into our air gapped staging instance: Not happening, not reproducible, everything fine it seems (debugging there is a bit complicated, but we tried

"https://codeberg.org/api/v1/repos/issues/search?q=smtp&priority_repo_id=453&type=all&_=164961674595"

and got many results, whereas on Codeberg we still get 500.

Accepting that we can probably only approach the issue on our real production server, @Gusted worked out some patches for us to get more information.

Printing the generated SQL query for the function by XORM, and executing this ourselves in MySQL, worked fine and returned the correct information (about 9000 in our case).

We then checked the length of the return value of models.Issues here to see if it’s an issue with countIssues or if it’s an underlying issue of the options query. And the return value here was zero (0), so it indicates an issue with the query options that was being passed down.

This indicates that the query is correct, but when XORM is executing it, it will not return anything. So this seems to be a scaling issue to us, maybe because this passes the id’s of all the public repos + the further issues a user has access to, which is, 17000+ on our instance. (Maybe a point for further optimization, e.g. extend the query to allow all repos with is_private = false + explicitly private repos a user has access to?)

Also very relevant according to blame: #19244, and especially this comment, as we can’t explain why this would ever return something else than 1 row from the `SELECT COUNT`` call. Please note that our tested version includes that pull request, because @6543 pushed a cherry-pick to our branch. Of course we also tried reverting it, but - still broken.

That’s where we currently are. The SQL log contains many question marks, about 17k, for each repository. That’s about how many question marks we see when thinking about how to further approach this. We appreciate any help. Thank you!

Gitea Version

https://codeberg.org/Codeberg/gitea/commit/99e133b593fef813188e1e6469c65c9a11ec258a

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

git version 2.30.2

Operating System

Debian GNU/Linux 11

How are you running Gitea?

custom fork / deployment, see commit ref above

Database

MySQL

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 34 (33 by maintainers)

Most upvoted comments

Continuing with debugging, we found out (in chronological order)

  • shorter statements work, longer don’t (so indeed, optimizing / shortening it would do improvement short-term)
  • we did a minimal reproduction without Gitea nor XORM, but with PHP which works on one server and not the other (so this issue can be closed if you don’t intend to optimize anyway), the issue is with prepared statements (that’s why the manually interpolated query did work)
  • and that even with a complete identical MariaDB config and db
  • it seems to be a regression / changed behaviour in MariaDB between (1:10.5.15-0+deb11u1) & (1:10.5.12-0+deb11u1), the update wasn’t yet applied in the airgapped staging instance

Edit 2: The error we see from MySQL is Commands out of sync; you can't run this command now

Edit 3:

  • it breaks with 1000 parameters (999 work)

What if we modify the code a bit to avoid passing all the repoIDs:

diff --git a/models/issue.go b/models/issue.go
index b1fa2d02a..1957e570c 100644
--- a/models/issue.go
+++ b/models/issue.go
@@ -1195,6 +1195,7 @@ func GetIssuesByIDs(issueIDs []int64) ([]*Issue, error) {
 type IssuesOptions struct {
        db.ListOptions
        RepoIDs            []int64 // include all repos if empty
+       RepoCond           builder.Cond
        AssigneeID         int64
        PosterID           int64
        MentionedID        int64
@@ -1289,6 +1290,10 @@ func (opts *IssuesOptions) setupSessionNoLimit(sess *xorm.Session) {
                applyReposCondition(sess, opts.RepoIDs)
        }
 
+       if opts.RepoCond != nil {
+               sess.And(opts.RepoCond)
+       }
+
        switch opts.IsClosed {
        case util.OptionalBoolTrue:
                sess.And("issue.is_closed=?", true)
diff --git a/routers/web/repo/issue.go b/routers/web/repo/issue.go
index 3ca193a15..7ab1dae01 100644
--- a/routers/web/repo/issue.go
+++ b/routers/web/repo/issue.go
@@ -2227,7 +2227,7 @@ func SearchIssues(ctx *context.Context) {
                                Page:     ctx.FormInt("page"),
                                PageSize: limit,
                        },
-                       RepoIDs:            repoIDs,
+                       RepoCond:           models.SearchRepositoryCondition(opts),
                        IsClosed:           isClosed,
                        IssueIDs:           issueIDs,
                        IncludedLabelNames: includedLabelNames,

It’s probably a very edge case, though. And it’s not breaking any data at least, only the read query. 😃

I have an idea about how to debug the problem.

Maybe we can build Codeberg-Gitea with a special flag (eg: ./gitea-codeberg my-test-issue), and use the special flag to run SearchIssues / CountIssues directly with the specialized options to trigger the bug and see output. Then we just put this test binary to production server, then test with the production database without any downtime.

If the special flag can trigger the bug, then it gives us the chance to find the root case.

If the special flag can not trigger the bug either … it must be impossible, no reason to explain it …