engine-deprecated: Initializing Public Git Repo Fails/Unclear When Gitbase is Ready

Summary: Downloading a subset of the PGA and calling srcd init fails to provide a usable environment after several hours.

srcd Engine version: v0.11.0

Container image versions:

elithrar@matt-workstation:~$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
srcd/cli-daemon     v0.11.0             e09406877f03        2 days ago          44MB
srcd/gitbase        v0.19.0             5df5a9b119a9        3 days ago          37.6MB
srcd/gitbase-web    v0.6.2              1c745ac11485        4 days ago          109MB
bblfsh/bblfshd      v2.11.8-drivers     ac9a79330aa9        2 weeks ago         1.42GB

Machine spec: 16 vCPU, 200GB RAM, 200GB SSD (custom GCP VM)

Steps to reproduce:

  1. Initialize repo
~$ srcd init $HOME/repos/pga/siva # I believe this is the correct root as per below?
~$ tree -L 2 repos/pga/
repos/pga/
`-- siva
     `-- latest
  2. Call srcd sql 'SHOW tables;' until it no longer reports “waiting for Gitbase to be ready”. This takes multiple hours; htop shows random cores pegged at 100% at intervals, but low memory usage.

  3. Successfully call srcd sql 'SHOW tables;' and get a table listing:

~$ srcd sql "SHOW tables;"
+--------------+
|    TABLE     |
+--------------+
| blobs        |
| commit_blobs |
| commit_files |
| commit_trees |
| commits      |
| files        |
| ref_commits  |
| refs         |
| remotes      |
| repositories |
| tree_entries |
+--------------+
  4. Attempt to run a test query against any table; it fails with an rpc error:
~$ srcd sql 'SELECT 1 from refs;'
2019/03/11 05:30:13 rpc error: code = Unknown desc = closing row iterator: invalid connection
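The manual re-checking in the steps above could be scripted as a bounded polling loop. This is only a sketch: `wait_for_gitbase`, `probe_gitbase`, `SRCD_BIN`, and the default timeout and interval are illustrative names and values, not part of srcd. It assumes that a row-reading query is a better readiness probe than SHOW tables (which succeeded here while SELECT still failed), and that either a non-zero exit code or the "waiting for Gitbase" / "rpc error" text in the output means "not ready".

```shell
#!/bin/sh
# Sketch only: names and defaults below are assumptions, not srcd features.
SRCD_BIN="${SRCD_BIN:-srcd}"

# Run a query that actually reads rows; capture stdout and stderr together.
probe_gitbase() {
  "$SRCD_BIN" sql 'SELECT 1 FROM refs LIMIT 1;' 2>&1
}

# wait_for_gitbase [timeout_seconds] [interval_seconds]
# Returns 0 once the probe succeeds cleanly, 1 if the deadline passes first.
wait_for_gitbase() {
  wfg_timeout="${1:-3600}"
  wfg_interval="${2:-30}"
  wfg_deadline=$(( $(date +%s) + wfg_timeout ))
  while :; do
    # Treat both a failing exit code and known error text as "not ready".
    if wfg_out="$(probe_gitbase)" &&
       ! printf '%s\n' "$wfg_out" | grep -qiE 'waiting for gitbase|rpc error'; then
      echo "gitbase ready"
      return 0
    fi
    if [ "$(date +%s)" -ge "$wfg_deadline" ]; then
      echo "gave up after ${wfg_timeout}s; last output: $wfg_out" >&2
      return 1
    fi
    sleep "$wfg_interval"
  done
}
```

For example, `wait_for_gitbase 14400 60` would poll once a minute for up to four hours and print the last probe output if it gives up, which at least records what the engine was reporting at the deadline.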

Open to suggestions!

My initial feedback (noting the empathy sessions/issues on this repo): determining the status of gitbase is near-impossible as a user without (likely) entering the container; how to initialize the engine on siva files is undocumented (my approach is an educated guess based on previous issues and may still be wrong); and errors are opaque and lack context (a filename and line number would be useful).
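One way to look at gitbase's state without entering the container is to read its logs from the host. The sketch below assumes the gitbase container was started from the `srcd/gitbase:v0.19.0` image listed above and finds it by image rather than by name, since the container name is not shown in this report. `gitbase_container_id`, `gitbase_logs`, and `DOCKER_BIN` are illustrative names.

```shell
#!/bin/sh
# Sketch only: helper names are assumptions; the image tag comes from `docker images` above.
DOCKER_BIN="${DOCKER_BIN:-docker}"

# Print the ID of the first running container created from the given image.
gitbase_container_id() {
  "$DOCKER_BIN" ps -q --filter "ancestor=${1:-srcd/gitbase:v0.19.0}" | head -n 1
}

# gitbase_logs [image] [line_count]
# Print the last lines of that container's logs (stdout and stderr merged).
gitbase_logs() {
  gl_image="${1:-srcd/gitbase:v0.19.0}"
  gl_cid="$(gitbase_container_id "$gl_image")"
  if [ -z "$gl_cid" ]; then
    echo "no running container found for image $gl_image" >&2
    return 1
  fi
  "$DOCKER_BIN" logs --tail "${2:-50}" "$gl_cid" 2>&1
}
```

Running `gitbase_logs` while `srcd sql` still reports that it is waiting should show whether gitbase is indexing, idle, or logging errors about specific repositories.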

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Thanks David - I’m going to give the custom dataset approach a shot. Really appreciate the help.

My ~15 attempts (not an exaggeration) at downloading a working siva dataset, varying my jq filters to shrink it a little more each time, all seemingly returned a corrupted DB.

On Sun, Mar 31, 2019 at 8:21 AM David Pordomingo notifications@github.com wrote:

By the way: initializing Engine for those 7.5K repos was fast (less than a minute), and the query to count the commits finished in a similar time, so I’d think the problem you hit was related to a corrupted database (as suggested by the "repository does not exist" error).
