verdaccio: verdaccio/sinopia is not yet cluster ready

Currently verdaccio/sinopia keeps an internal cache of which packages exist. It does not re-parse the .sinopia-db, so if you run sinopia in a cluster, some nodes will be out of sync and, even more problematic, you may overwrite the .sinopia-db with different states.
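To make the failure mode concrete, here is a minimal sketch of the lost update described above. It is not verdaccio's actual code: `RegistryNode`, the `list` field and the in-memory `sharedDbFile` string (standing in for the .sinopia-db file on shared storage) are all assumptions made for illustration.

```typescript
// Stand-in for the shared .sinopia-db file: one JSON string (assumed layout).
let sharedDbFile: string = JSON.stringify({ list: [] as string[] });

// Hypothetical cluster node: reads the file once at startup, never re-parses it.
class RegistryNode {
  private cache: string[];

  constructor() {
    this.cache = JSON.parse(sharedDbFile).list;
  }

  publish(name: string): void {
    this.cache.push(name);
    // Writes its own view back, overwriting whatever other nodes wrote.
    sharedDbFile = JSON.stringify({ list: this.cache });
  }

  knows(name: string): boolean {
    return this.cache.indexOf(name) !== -1;
  }
}

// Both nodes start from the same (empty) file.
const nodeA = new RegistryNode();
const nodeB = new RegistryNode();

nodeA.publish("pkg-a");
nodeB.publish("pkg-b"); // overwrites the file without ever seeing "pkg-a"

const onDisk: string[] = JSON.parse(sharedDbFile).list;
console.log(onDisk);               // "pkg-a" is gone from the shared file
console.log(nodeB.knows("pkg-a")); // nodeB never learned about it
```

Each node is internally consistent, but the shared file only reflects the last writer, which is the overwrite problem this issue is about.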

Therefore we need to add the capability to either create a communication channel between nodes (EventBus, MQ, whatever) or use a central configuration, like a database.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 6
  • Comments: 31 (9 by maintainers)

Most upvoted comments

Just FYI

npm install -g verdaccio@beta

In the alpha version the door is open to creating new storage plugins. There is no documentation yet, but I’m working on a dummy plugin as an experiment to see whether the API is in good shape.

https://github.com/verdaccio/verdaccio-memory

Redis or not really doesn’t matter 😃 But good to see you guys make progress here. CouchDB is not that wrong at all, I’m personally not a big fan of it, but the official npmjs also uses CouchDB, so you’re not that far from right.

Yeah, I think it is awesome, it is already perfect! I can wait 5-10 seconds anyway! It is good that you keep it working!!! Ciao dude!

Using a local CouchDB and the changes I got from #44, I managed to make it work. It caches package.json content, and the commands publish and unpublish seem to work. This is just an early stage with no plugin support, but that should not be that hard. The branch is origin/storage_pluggable.

CouchDB Admin Tool Screenshots

Here we can see a previously published private package. [screenshot, 2017-05-21 11:59 AM]

A public package published. [screenshot, 2017-05-21 12:35 PM]

Overview of the database size after some popular packages were cached. [screenshot, 2017-05-21 12:35 PM]

Setup:

  • Install CouchDB
  • User password: root/root
  • Domain: 127.0.0.1:5984

#474

I’ve been working on a small demo of how to scale this properly, which I’ll hopefully push in the following days. My idea is the following:

The Problem

Verdaccio persists its state only in file storage, in a file called .sinopia-db.json that is handled internally by local-data.js. Currently there is no way to change this unless we create a way, via configuration, to choose between different sorts of (I named them) adapters, but I’m not good at naming stuff, so, help 👍 .

Possible solution

Adapters

Adapters would be a way to switch between different persistence sources, like databases, file storage, NFS (#54) and so on. This approach would at least eliminate the first barrier and handle state in one single place. eg: local-mongo.js, local-xxx.js.

Configuration example for MongoDB

adapter:
  mongo:
    url: 'mongodb://127.0.0.1:27017/verdaccio'
    user: foo
    password: barr
    ## additional parameters

If an adapter is defined, it would be used; if no adapter is defined, the file storage would be used by default as a fallback.
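The fallback rule could look roughly like the sketch below. This is only an illustration of the proposal: `Config`, `MongoAdapterConfig`, `StorageAdapter` and `selectAdapter` are hypothetical names, and the config shape simply mirrors the example above rather than any existing verdaccio API.

```typescript
// Hypothetical shape of the parsed configuration; only `adapter` matters here.
interface MongoAdapterConfig {
  url: string;
  user?: string;
  password?: string;
}

interface Config {
  adapter?: { mongo?: MongoAdapterConfig };
}

// Hypothetical adapter descriptor; a real adapter would expose read/write methods.
interface StorageAdapter {
  kind: "mongo" | "file";
  url?: string;
}

// Pick the configured adapter, or fall back to file storage when none is defined.
function selectAdapter(config: Config): StorageAdapter {
  const mongo = config.adapter && config.adapter.mongo;
  if (mongo) {
    return { kind: "mongo", url: mongo.url };
  }
  return { kind: "file" };
}

const withMongo = selectAdapter({
  adapter: { mongo: { url: "mongodb://127.0.0.1:27017/verdaccio" } },
});
const withoutAdapter = selectAdapter({});

console.log(withMongo.kind);      // the configured adapter is used
console.log(withoutAdapter.kind); // no adapter defined, file storage is used
```

Keeping the fallback in one function means existing installations with no `adapter` key would keep their current behaviour.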

Caveats

  • Adapters would be asynchronous, while the current implementation is synchronous. That forces me to move to Promises or other alternatives (async/await, but sadly we do not use ES6 transpilers) to defer requests. In other words, huge changes in the code base.
  • The packages are still stored in the file system, so there might be a sync problem if a package exists in the file system but not in the persistent source. eg: (corrupt backups)
  • How will they be integrated? Bundled in? As plugins?
  • Maybe more, I’m just starting to see future problems. (<-- your opinion would be great for me)
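The second caveat (a package on disk that the persistent source does not know about, or the reverse) could be detected with a simple comparison. The sketch below is an assumption-laden illustration: `findDrift` and its result shape are hypothetical, and both inputs are plain lists of package names, however they are obtained.

```typescript
// Hypothetical result of comparing the file system with the persistent source.
interface Drift {
  missingFromSource: string[]; // on disk, but unknown to the adapter
  missingFromDisk: string[];   // known to the adapter, but no files on disk
}

// Return every name in `a` that does not appear in `b`.
function difference(a: string[], b: string[]): string[] {
  return a.filter(function (name) {
    return b.indexOf(name) === -1;
  });
}

// Compare package names found on disk with those recorded in the source.
function findDrift(onDiskNames: string[], inSourceNames: string[]): Drift {
  return {
    missingFromSource: difference(onDiskNames, inSourceNames),
    missingFromDisk: difference(inSourceNames, onDiskNames),
  };
}

const drift = findDrift(
  ["pkg-a", "pkg-b", "pkg-c"], // e.g. directories found in storage
  ["pkg-a", "pkg-c", "pkg-d"]  // e.g. names restored from a backup
);

console.log(drift.missingFromSource); // packages to re-register
console.log(drift.missingFromDisk);   // records pointing at missing files
```

A check like this could run at startup so that a corrupt backup is reported instead of silently hiding packages.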

As I mentioned before, I’m working on a small implementation with MongoDB and already have a persistent connection; I’m trying to solve the small issues I’ve been finding.

I’d love your suggestions, comments to see if I’m addressing this correctly.

#1 thanks for your work on this going to have a look here soon.

Storage Integrations

CouchDB #475 (help wanted)

Firebase #474 (help wanted)

Amazon S3

Amazon S3 #472 (PoC ready for testing)

https://github.com/Remitly/verdaccio-s3-storage/tree/s3
https://www.npmjs.com/package/verdaccio-s3-storage

Google Cloud

Google Storage #473 (PoC ready for testing)

https://www.npmjs.com/package/verdaccio-google-cloud
https://github.com/verdaccio/verdaccio-google-cloud

@p3x-robot … I will just stop here with that answer and leave it alone, as there seems to be a huge language barrier and you seem to completely misunderstand me. Just letting you know: I do know node.js, and because I have used it for many years, I know about many pitfalls, and the cluster module is just not the way to go to scale in complicated environments. Using the cluster module is just the easiest way to go, with a couple of problems that you will notice very, very late, maybe already too late. As already noted above, do whatever you want, I obviously don’t care how you do it, I just gave advice.

And to answer the question about the json file “lock”: well yes, kind of, but for a standalone server (which is the only setup the json file is suitable for), writing it synchronously was probably the easiest way to avoid any concurrency there.

And as a last thing on the C++ statement: I always use the C++ bits just for the interface to node; the rest I write in C when I’m looking for speed.

Ohhh!! Now I see your point, and I also love Redis, it’s damn fast. I like it and I agree with your approach.

As you said, the main goals 🥇 here for the short future are:

My current small roadmap with this:

  1. Clean up the code base (not ready yet); it is still a mess and really hard to read and maintain
  2. Plugins first #192 #169 (in order to allow others to hook in their own stuff, as you mentioned 👍)
     2.1 Storage Plugins (eg: this ticket)
     2.2 Auth Plugins
     2.3 etc etc …
  3. Improve WebUI

Anyway, what you have already done is very good, you don’t have to be speedy 😃 I love what you did!!!

Is Redis not fast enough? You can save JSON by key and load it, super fast. Package storage can stay file based as it is, it can’t be faster. Redis is scalable.
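A rough sketch of the "save JSON by key" idea follows. It deliberately does not use a real Redis client: `KeyValueStore`, `InMemoryStore`, `saveMeta`, `loadMeta` and the `package:` key prefix are hypothetical, and the in-memory store only stands in for a shared key-value server that every node would talk to.

```typescript
// Hypothetical minimal key-value contract; a Redis-backed class could implement it.
interface KeyValueStore {
  set(key: string, value: string): void;
  get(key: string): string | null;
}

// In-memory stand-in for the shared server, for illustration only.
class InMemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};

  set(key: string, value: string): void {
    this.data[key] = value;
  }

  get(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : null;
  }
}

// Assumed metadata shape; the real package metadata is richer.
interface PackageMeta {
  name: string;
  versions: string[];
}

// Serialize metadata to JSON and store it under a per-package key.
function saveMeta(store: KeyValueStore, meta: PackageMeta): void {
  store.set("package:" + meta.name, JSON.stringify(meta));
}

// Load and parse metadata, or return null when the package is unknown.
function loadMeta(store: KeyValueStore, name: string): PackageMeta | null {
  const raw = store.get("package:" + name);
  return raw === null ? null : (JSON.parse(raw) as PackageMeta);
}

// Two nodes sharing one store see the same state without a local cache.
const shared = new InMemoryStore();
saveMeta(shared, { name: "my-private-pkg", versions: ["1.0.0"] }); // "node A"
const seenByOtherNode = loadMeta(shared, "my-private-pkg");        // "node B"
console.log(seenByOtherNode);
```

Because every read goes to the shared store, no node holds a private copy of the package list that can go stale; the tarballs themselves could still live on the file system as suggested.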

@p3x-robot how would you handle the persistence of packages and database of private modules?