gitea: Slow repository browsing in 1.14.x

Description

I saw a similar thread, but it has “windows” in the title, so I am creating a new issue. Gitea 1.14.x is much slower at repository browsing than Gitea 1.13.

Sample repo running with 1.14.1: https://gitea.ttmath.org/FreeBSD/ports Try to open any directory, for example: https://gitea.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes between 50 and 150 seconds to open a page.

The same repo running with 1.13.7: https://giteaold.ttmath.org/FreeBSD/ports Try to open a similar directory, for example: https://giteaold.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes about 5 seconds.

You can see the same problem on try.gitea.io: https://try.gitea.io/tsowa/FreeBSD_ports But there is a cache, so you have to find a directory which has not been opened before. Opening such a page takes 100-300 seconds.

Let me know if more info is needed.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 87 (78 by maintainers)

Most upvoted comments

oh my - I think I know how to seriously improve this. I think I’ve been way too distracted by the way it was done in the go-git implementation and there’s genuinely a much quicker way to do this.

I’m going to close this as I believe these problems have been considerably improved on 1.15 and main. If specific problems remain, please ask for a reopen and provide some logs - or consider opening another issue with more details.

@zeripath

Big improvement. Browsing nixpkgs from our slow test instance:

  • pure git: 17.6s
  • go-git: 7.7s
  • your backport: 6.7s

I have a backport of the latest get-lastcommit-cache performance improvements onto 1.14 if people would like them.

OK, I’ve pushed up another version of #16059 and its backport onto 1.14 to backport-use-git-log-raw.

These are radically quicker for me on most of these repositories and examples.

| reponame | #16059 (5a90343) | #16042 (4c851b1) | GoGit |
| --- | ---: | ---: | ---: |
| ports (47fc04fbc3) | 2536ms | 8717ms | 8746ms |
| ports/devel | 2633ms | 50097ms | 20482ms |
| ports/audio | 240ms | 2680ms | 2887ms |
| ports/polish | 185ms | 8029ms | 820ms |
| git (faefdd61e) | 5108ms | 3120ms | 3373ms |
| git/gitk-git | 313ms | 544ms | 2322ms |
| git/Documentation | 13983ms | 1040ms | 5989ms |
| nixpkgs (c43e0f4873) | 2694ms | 19714ms | 5863ms |
| nixpkgs/pkgs | 2733ms | 26187ms | 714ms |

I guess the next step is examining why git/Documentation and nixpkgs/pkgs are pathological for #16059 and how that can be ameliorated.

OK, thinking on this, I think the only answer is to use repeated calls to git log -n1 once the number of commits reaches some high threshold.
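A rough sketch of that fallback idea (the throwaway repo, file names, and loop here are purely illustrative - this is not Gitea's actual code): instead of streaming the entire history through one big pipeline, ask git directly for the last commit touching each tree entry.

```shell
# Illustrative only: build a tiny throwaway repo so the loop has history to walk.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo a > a.txt && git add a.txt && git commit -qm "add a.txt"
echo b > b.txt && git add b.txt && git commit -qm "add b.txt"

# One bounded rev-walk per entry instead of one unbounded walk for all of them;
# cheap when an entry changed recently, even in a repo with a huge history.
for path in a.txt b.txt; do
  printf '%s %s\n' "$path" "$(git log -n1 --format=%H -- "$path")"
done
```

The trade-off is the one discussed above: per-entry calls are bounded in memory but can multiply process spawns, so a threshold on commit count decides which strategy to use.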

If I couple that with the (in progress) deferred commit info generation pr (https://github.com/zeripath/gitea/tree/defer-last-commit-info) then we’ll have a workable low memory option.

Yes, this will mean that the two backends can give slightly different results - but it’s ultimately better than the current status.

One question - have you disabled the commit cache? If so please re-enable it.

@tsowa unfortunately yes, but it should be relatively fast - the issue is that the structure of some repos will actually require those million rows to be checked more than a few times. Determining which commit a file is related to is not a simple task in git - and although there’s a commit graph, we don’t have a good way of querying it.

(It shouldn’t take 15s to pipe those two commands together - you’re slowing things down by allocating file space - you should pipe the output to /dev/null, btw.)
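To illustrate the suggestion: timing the same pipeline into /dev/null removes the disk writes from the measurement. (The tiny demo repo below stands in for the real ports.git, and HEAD stands in for the concrete commit hash used earlier.)

```shell
# Demo repo so the pipeline has something to chew on (stand-in for ports.git).
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo hello > file.txt && git add file.txt && git commit -qm "init"

# Same pipeline as in the comment above, but the output is discarded rather
# than written to swinka.txt, so filesystem writes don't inflate the timing.
time { git rev-list --format=%T HEAD | git cat-file --batch > /dev/null; }
```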

There are a few more improvements to that function that can be made - for a start, the function is not optimised for our collapsing of directories containing a single document - and writing a commit-graph reader would be part of that.

The gogit backend does have a commit-graph reader, but it is not frugal with memory at all. I need to spend some time making a reader that is much more frugal and stream-like, but I haven’t had the time. (See the technical docs: https://github.com/git/git/blob/master/Documentation/technical/commit-graph.txt)
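For reference, the commit-graph file mentioned here is easy to inspect by hand (the demo repo below is illustrative); per the linked technical doc, the file starts with a 4-byte "CGPH" signature:

```shell
# Throwaway repo with a single commit, then write its commit-graph.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo x > x.txt && git add x.txt && git commit -qm "init"
git commit-graph write --reachable

# The graph lives under objects/info; the first four bytes are the signature.
head -c 4 .git/objects/info/commit-graph
```

A frugal reader would parse that header and chunk table and seek only the rows it needs, rather than loading the whole file as the gogit backend does.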

In the end, though, we need to move rendering of last-commit info out of repo browsing and into an AJAX call. Again, something I haven’t had time to do.

Thanks for the hint with TAGS. I don’t have time to make more tests now but I found something interesting.

When browsing my repository with Gitea I see the following git processes in htop:

22304 root       20   0 12876  2100 S  0.0  0.0  0:00.00 daemon: /usr/bin/env[22305]
22305 git2       31   0  926M  254M S 136.  0.8  1:29.80 └─ /usr/local/sbin/gitea web
22839 git2       21   0  952M  158M S  3.3  0.5  0:01.11    ├─ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad
22840 git2       27   0 1103M  275M S 13.5  0.8  0:04.59    └─ /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch

These processes were running for about one minute so I have run the first git process by hand:

$ cd /var/db/gitea2/gitea-repositories/freebsd/ports.git
$ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad | wc -l

and it gave me 1087346 rows. I suppose these million-plus rows are then passed to the second git process.

I have piped the output from the first git process to the second:

$ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad | /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch > swinka.txt

It takes about 15 seconds, and the resulting file swinka.txt is larger than 1 GB:

$ ll -h swinka.txt 
-rw-r--r--  1 git2  git2   1,4G 10 maj 22:47 swinka.txt

so there is a lot of data passing between Gitea and git. So the question is: does the first git process really need to return a million rows?

This is because the algorithm was changed in 1.14 due to a problem with go-git causing significant memory issues. Thank you for the test cases though because they will provide tests to improve the current algorithm.

If you are suffering significant slowdowns here, you can switch back to the gogit build by adding gogit to your TAGS when building.
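For example, when building Gitea from source (a sketch of the standard make-based build; your usual tag set may differ from the bindata shown here):

```shell
# Include the gogit tag so the go-git backend is compiled in
# instead of shelling out to pure git for repository browsing.
TAGS="bindata gogit" make build
```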

We would otherwise appreciate help in improving the performance of the algorithm for the pure git version.