gitea: Slow repository browsing in 1.14.x
- Gitea version (or commit ref): 1.14.x
- Git version: 2.31.1
- Operating system: FreeBSD 13
- Gitea built using ports collection (www/gitea)
- Gitea started by startup script provided by www/gitea port
- Database (use `[x]`):
  - [x] PostgreSQL 12.6
  - [ ] MySQL
  - [ ] MSSQL
  - [ ] SQLite
- Can you reproduce the bug at https://try.gitea.io: Yes (see https://try.gitea.io/tsowa/FreeBSD_ports)
- Log gist: https://www.ttmath.org/gitea.log
Description
I saw a similar thread, but it has “windows” in the title, so I am creating a new issue. Gitea 1.14.x is much slower at repository browsing than Gitea 1.13.
Sample repo running with 1.14.1: https://gitea.ttmath.org/FreeBSD/ports Try to open any directory, for example https://gitea.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes between 50 and 150 seconds to open a page.
The same repo running with 1.13.7: https://giteaold.ttmath.org/FreeBSD/ports Try to open a similar directory, for example https://giteaold.ttmath.org/FreeBSD/ports/src/branch/main/audio It takes about 5 seconds.
You can see the same problem on try.gitea.io: https://try.gitea.io/tsowa/FreeBSD_ports But there is a cache, so you have to find a directory which was not opened before. Opening such a page takes 100-300 seconds.
Let me know if more info is needed.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 87 (78 by maintainers)
oh my - I think I know how to seriously improve this. I think I’ve been way too distracted by the way it was done in the go-git implementation and there’s genuinely a much quicker way to do this.
I’m going to close this as I believe these problems have been considerably improved in 1.15 and main. If specific problems remain, please ask for a reopen, but please provide some logs, or consider opening another issue with more details.
@zeripath Big improvement, browsing nixpkgs from our slow test instance:
- pure git: 17.6s
- go-git: 7.7s
- your backport: 6.7s
I have a backport of the latest last-commit-cache performance improvements onto 1.14 if people would like them.
OK, I’ve pushed up another version of #16059 and its backport onto 1.14, to backport-use-git-log-raw.
These are radically quicker for me on most of these repositories and examples.
I guess the next step is examining why git/Documentation and nixpkgs/pkgs are pathological for #16059 and how that can be ameliorated.
OK, thinking on it, I think the only answer is to use repeated calls to `git log -n1` once the number of commits reaches some high threshold.
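As a rough illustration of what that per-path fallback looks like at the git level (the directory and format string below are placeholders, not the actual Gitea invocation):

```sh
# For each entry in a tree, ask git for just the last commit touching it:
# one cheap `git log -n1` per path instead of one expensive walk of the
# whole history. The "audio/" path and the format are illustrative only.
git ls-tree --name-only HEAD audio/ | while read -r path; do
  git log -n1 --format='%H %ct %s' HEAD -- "$path"
done
```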
If I couple that with the (in-progress) deferred commit info generation PR (https://github.com/zeripath/gitea/tree/defer-last-commit-info), then we’ll have a workable low-memory option.
Yes, this will mean that the two backends can give slightly different results, but it’s ultimately better than the current status.
One question - have you disabled the commit cache? If so, please re-enable it.
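For reference, the last-commit cache lives in the `[cache.last_commit]` section of `app.ini`. A sketch with what I believe are the defaults (double-check against the config cheat sheet):

```ini
; Last-commit cache settings (believed defaults, not verified here)
[cache.last_commit]
; how long cached last-commit info is kept; a value <= 0 disables the cache
ITEM_TTL = 8760h
; only repositories with at least this many commits are cached
COMMITS_COUNT = 1000
```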
@tsowa unfortunately yes, but it should be relatively fast - the issue is that the structure of some repos will actually require those million rows to be checked more than a few times. Determining which commit a file is related to is not a simple task in git - and although there’s a commit-graph, we don’t have a good way of querying it.
(It shouldn’t take 15s to pipe those two commands together - you’re slowing things down by allocating file space - you should pipe the output to /dev/null, btw.)
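The exact commands from the thread aren’t preserved here, so a comparable, well-known pipeline stands in below; the point is the redirect at the end:

```sh
# Illustrative only - not the precise commands Gitea runs. Discarding the
# output instead of writing a >1GB file means the timing measures the git
# processes themselves rather than disk writes.
time git rev-list --objects HEAD \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  >/dev/null
```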
There are a few more improvements to that function that can be made - for a start, the function is not optimised for our collapsing of directories containing a single document - and writing a commit-graph reader would be part of that.
The gogit backend does have a commit-graph reader, but it is not at all frugal with memory. I need to spend some time making a reader that is much more frugal and stream-like, but I haven’t had the time. (See the technical docs: https://github.com/git/git/blob/master/Documentation/technical/commit-graph.txt)
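For anyone wanting to experiment, the file such a reader would parse can be generated and checked with stock git:

```sh
# Write (or update) the commit-graph for all reachable commits, then verify
# it. The result lives under .git/objects/info/ and is what a frugal,
# streaming reader would need to parse.
git commit-graph write --reachable
git commit-graph verify
```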
In the end, though, we need to move rendering of last-commit info out of repo browsing and into an AJAX call. Again, something I haven’t had time to do.
Thanks for the hint with TAGS. I don’t have time to make more tests now, but I found something interesting.
When browsing my repository with Gitea, htop shows two long-running git processes.
These processes were running for about one minute, so I ran the first git process by hand, and it gave me 1087346 rows. I suppose those million-plus rows are then passed to the second git process.
Then I piped the output from the first git process into the second, redirecting the result to a file: it took about 15 seconds, and the output file (swinka.txt) grew to over 1 GB,
so there is a lot of data to pass between Gitea and git. The question is: does the first git process really need to return a million rows?
This is because the algorithm was changed in 1.14 due to a problem with go-git causing significant memory issues. Thank you for the test cases, though, because they will provide tests for improving the current algorithm.
If you are suffering significant slowdowns here, you can switch back to the go-git build by adding `gogit` to your `TAGS` during building. We would otherwise appreciate help in improving the performance of the algorithm in the pure-git version.
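For example (a sketch; `bindata` here is just a commonly used companion tag, not required for this):

```sh
# Build Gitea with the go-git backend by including the gogit build tag.
TAGS="bindata gogit" make build
```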