gitea: Slow repository browsing in 1.14.x

Description

I saw a similar thread, but it has “windows” in the title, so I am creating a new issue. Gitea 1.14.x is much slower at repository browsing than Gitea 1.13.

Sample repo running with 1.14.1: https://gitea.ttmath.org/FreeBSD/ports Try to open any directory, for example: https://gitea.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes between 50 and 150 seconds to open a page.

The same repo running with 1.13.7: https://giteaold.ttmath.org/FreeBSD/ports Try to open a similar directory, for example: https://giteaold.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes about 5 seconds.

You can see the same problem on try.gitea.io: https://try.gitea.io/tsowa/FreeBSD_ports But there is a cache, so you have to find a directory which has not been opened before. Opening such a page takes 100-300 seconds.

Let me know if more info is needed.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 87 (78 by maintainers)

Most upvoted comments

oh my - I think I know how to seriously improve this. I think I’ve been way too distracted by the way it was done in the go-git implementation and there’s genuinely a much quicker way to do this.

I’m going to close this as I believe these problems have been considerably improved on 1.15 and main. If specific problems remain, please ask for a reopen and provide some logs - or consider opening another issue with more details.

@zeripath

Big improvement. Browsing nixpkgs from our slow test instance:

  • pure git: 17.6s
  • go-git: 7.7s
  • your backport: 6.7s

I have a backport of the latest get-lastcommit-cache performance improvements onto 1.14 if people would like them.

OK, I’ve pushed up another version of #16059 and its backport onto 1.14 to backport-use-git-log-raw.

These are radically quicker for me on most of these repositories and examples.

| reponame | #16059 (5a90343) | #16042 (4c851b1) | GoGit |
| --- | ---: | ---: | ---: |
| ports (47fc04fbc3) | 2536ms | 8717ms | 8746ms |
| ports/devel | 2633ms | 50097ms | 20482ms |
| ports/audio | 240ms | 2680ms | 2887ms |
| ports/polish | 185ms | 8029ms | 820ms |
| git (faefdd61e) | 5108ms | 3120ms | 3373ms |
| git/gitk-git | 313ms | 544ms | 2322ms |
| git/Documentation | 13983ms | 1040ms | 5989ms |
| nixpkgs (c43e0f4873) | 2694ms | 19714ms | 5863ms |
| nixpkgs/pkgs | 2733ms | 26187ms | 714ms |

I guess the next step is examining why git/Documentation and nixpkgs/pkgs are pathological for #16059 and how that can be ameliorated.

OK, thinking on this, I think the only answer is to use repeated calls to git log -n1 once the number of commits reaches some high threshold.
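A rough sketch of that fallback idea (the throwaway repo, file names, and loop here are purely illustrative - this is not Gitea's actual code): instead of streaming the entire history through one big pipeline, ask git directly for the last commit touching each tree entry.

```shell
# Illustrative only: build a tiny throwaway repo so the loop has history to walk.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo a > a.txt && git add a.txt && git commit -qm "add a.txt"
echo b > b.txt && git add b.txt && git commit -qm "add b.txt"

# One bounded rev-walk per entry instead of one unbounded walk for all of them;
# cheap when an entry changed recently, even in a repo with a huge history.
for path in a.txt b.txt; do
  printf '%s %s\n' "$path" "$(git log -n1 --format=%H -- "$path")"
done
```

The trade-off is the one discussed above: per-entry calls are bounded in memory but can multiply process spawns, so a threshold on commit count decides which strategy to use.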

If I couple that with the (in progress) deferred commit info generation pr (https://github.com/zeripath/gitea/tree/defer-last-commit-info) then we’ll have a workable low memory option.

Yes, this will mean that the two backends can give slightly different results - but it’s ultimately better than the current status.

One question - have you disabled the commit cache? If so please re-enable it.

@tsowa unfortunately yes, but it should be relatively fast - the issue is that the structure of some repos will actually require those million rows to be checked more than a few times. Determining which commit a file is related to is not a simple task in git - and although there’s a commit graph, we don’t have a good way of querying it.

(It shouldn’t take 15s to pipe those two commands together - you’re slowing things down by allocating file space - you should pipe the output to /dev/null, btw.)
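To illustrate the suggestion: timing the same pipeline into /dev/null removes the disk writes from the measurement. (The tiny demo repo below stands in for the real ports.git, and HEAD stands in for the concrete commit hash used earlier.)

```shell
# Demo repo so the pipeline has something to chew on (stand-in for ports.git).
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo hello > file.txt && git add file.txt && git commit -qm "init"

# Same pipeline as in the comment above, but the output is discarded rather
# than written to swinka.txt, so filesystem writes don't inflate the timing.
time { git rev-list --format=%T HEAD | git cat-file --batch > /dev/null; }
```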

There are a few more improvements to that function that can be made - for a start, the function is not optimised for our collapsing of directories containing a single document - and writing a commit-graph reader would be part of that.

The gogit backend does have a commit-graph reader, but it is not frugal with memory at all. I need to spend some time making a reader that is much more frugal and stream-like, but I haven’t had the time. (See the technical docs: https://github.com/git/git/blob/master/Documentation/technical/commit-graph.txt)
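For reference, the commit-graph file mentioned here is easy to inspect by hand (the demo repo below is illustrative); per the linked technical doc, the file starts with a 4-byte "CGPH" signature:

```shell
# Throwaway repo with a single commit, then write its commit-graph.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo x > x.txt && git add x.txt && git commit -qm "init"
git commit-graph write --reachable

# The graph lives under objects/info; the first four bytes are the signature.
head -c 4 .git/objects/info/commit-graph
```

A frugal reader would parse that header and chunk table and seek only the rows it needs, rather than loading the whole file as the gogit backend does.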

In the end, though, we need to move rendering of last-commit info out of repo browsing and into an AJAX call. Again, something I haven’t had time to do.

Thanks for the hint with TAGS. I don’t have time to make more tests now but I found something interesting.

When browsing my repository with Gitea I see the following git processes in htop:

22304 root       20   0 12876  2100 S  0.0  0.0  0:00.00 daemon: /usr/bin/env[22305]
22305 git2       31   0  926M  254M S 136.  0.8  1:29.80 └─ /usr/local/sbin/gitea web
22839 git2       21   0  952M  158M S  3.3  0.5  0:01.11    ├─ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad
22840 git2       27   0 1103M  275M S 13.5  0.8  0:04.59    └─ /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch

These processes were running for about one minute so I have run the first git process by hand:

$ cd /var/db/gitea2/gitea-repositories/freebsd/ports.git
$ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad | wc -l

and it gave me 1087346 rows. I suppose these million-plus rows are then passed to the second git process.

I have piped the output from the first git process to the second:

$ /usr/local/bin/git -c credential.helper= -c protocol.version=2 rev-list --format=%T 9ea557779ce520c206f223f6f7b48fcc52f92dad | /usr/local/bin/git -c credential.helper= -c protocol.version=2 cat-file --batch > swinka.txt

It takes about 15 seconds, and the resulting file swinka.txt is larger than 1 GB:

$ ll -h swinka.txt 
-rw-r--r--  1 git2  git2   1,4G 10 maj 22:47 swinka.txt

so there is a lot of data passing between Gitea and git. So the question is: does the first git process really need to return a million rows?

This is because the algorithm was changed in 1.14 due to a problem with go-git causing significant memory issues. Thank you for the test cases though because they will provide tests to improve the current algorithm.

If you are suffering significant slowdowns here, you can switch back to the gogit build by adding gogit to your TAGS when building.
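For example, when building Gitea from source (a sketch of the standard make-based build; your usual tag set may differ from the bindata shown here):

```shell
# Include the gogit tag so the go-git backend is compiled in
# instead of shelling out to pure git for repository browsing.
TAGS="bindata gogit" make build
```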

We would otherwise appreciate help in improving the performance of the algorithm for the pure git version.