knowledge-repo: Posts don't show and can't be accessed when using MySQL DB

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

I’m trying to deploy knowledge repo with a database backend. The database is running on a different place than the knowledge_repo server and is accessed over an SSH tunnel.

It seems that Knowledge Repo can read from and write to this database just fine.

However, for whatever reason, no posts are showing up. And it’s not just in the feed, but they also can’t be accessed directly via their path. I looked in the database itself and the posts are there in the repository table, however, the posts table that knowledge repo makes itself is empty. If I try to re-add the posts I see in the repository table, I expectedly get the error that their paths already exist.

I tried also setting SQLALCHEMY_DATABASE_URI to use sqlite, but that didn’t seem to help.

Another thing I tried was running a MySQL DB on a local docker container, and the same issue persists.

Version 0.8.1 of Knowledge Repo, on a Mac, with python 3 if that helps.

I’m sort of at a loss as to why this could be happening.

In the logs, I see stuff like this:

    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT index_metadata.id AS index_metadata_id, index_metadata.type AS index_metadata_type, index_metadata.name AS index_metadata_name, index_metadata.value AS index_metadata_value, index_metadata.updated_at AS index_metadata_updated_at \nFROM index_metadata \nWHERE index_metadata.type = %s AND index_metadata.name = %s \n LIMIT %s'] [parameters: ('lock', 'master_check', 1)] (Background on this error at: http://sqlalche.me/e/e3q8)

During startup of knowledge repo I also see this:

WARNING:knowledge_repo.app.index:Master indexing thread has died. Restarting...

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Comments: 29 (14 by maintainers)

Most upvoted comments

I explored this some more, and it’s actually a combination of minor issues and lack of documentation.

  1. When using deploy with MySQL 5.6, indexing fails with the Lost Connection messages. However, using runserver works. The catch is to make it work you have to make sure to delete any data in the index_metadata table that may have been left behind by deploy.

  2. Indexing also works with deploy when setting --workers 0 , and indexing also works when triggered manually via knowledge_repo’s reindex command. This suggests that there is some issue specifically with the multiple workers. So a good workaround is to deploy normally and turn off automatic indexing and do it manually via a cron job that calls reindex.

  3. You also have to change the post status to 3 in the database (not sure how to do this in the UI or via the command line. submit only gets status to 1). I see no options in knowledge_repo script, or direction in the webapp about how to ‘review’ posts submitted for review and change their status. The accept() and publish() etc methods seem to not be called by the current version of the web app or knowledge_repo script except specifically for webposts. One option is to manually navigate to https://host:port/edit/project_name/path_to_post.kp but this requires that project_name posts are in the list of allowed posts to edit online (this can be specified in config, and only webposts is allowed by default).

  4. There is incompatibility with MySQL 5.7 changes to Strict mode. This causes indexing when using runserver (and also deploy) to fail (unlike MySQL 5.6, where it works with runserver)

Basically this is not currently ready to run reliably with a DB backend, but it also seems like it’s pretty close.

Finally some progress!

It appears db.session.close() doesn’t actually close connections when using a pool, just returns them to the pool. Then they somehow collide with connections from other queries like the one in get_posts when using multiprocessing.

I added db.engine.dispose() at the start of all models.py IndexMetadata() methods, which moved the error to dbrepository.py revision() method. I then added self.engine.dispose() at the start of that method too, and now indexing seems to work, including with the latest version of sqlalchemy.

As far as I can tell those are the only places where db connections are made during indexing.

While indexing works, I haven’t yet tested to see if this broke anything else.

Hey @matthewwardrop can you check your sqlalchemy version? I did some tests, and indexing partly works with sqlalchemy 1.1.x versions (and previous versions also), whereas all 1.2.X versions fail. Looks like something may have changed in 1.2.x that breaks KR.

The latest version that works is 1.1.18 which is the latest in the 1.1.x line

So maybe upgrading the sqlalchemy version will allow you to reproduce the issues and fix them, or at least add sqlalchemy 1.1.x as a requirement.

When I say ‘partly’ works though, it works as long as no other queries are run. But for example when a user visits the KR page, triggering a query to list the posts, or other similar actions, indexing will then again start to fail. Basically, any queries that pull content will then break indexing. Then all subsequent indexing attempts will fail.

That’s very interesting behavior. Querying the DB for posts seems to continue to work fine, but specifically querying for indexing starts failing after the first time posts are queried, or some other query happens. I wonder if the connection of the get_posts() query and other queries are somehow colliding with the indexing thread connection because they are in separate threads and not aware of each other.

I suspect all of the issues with indexing are just improperly handled connections and different packages and databases handling different situations gracefully and others not so gracefully. Probably going through the code and closing and reinitializing connections every time will solve the issues, though not sure what sort of performance overhead it might introduce. Would love the authors to at least weigh in on approaches since I don’t have a good grasp of the codebase of knowledge repo yet.

The fix I posted above though goes a long way - it fixes the runserver indexing error I get with mysqlconnect package also, meaning that at least for me, the combination of:

  1. mysqlconnect (instead of MySQLdb)
  2. MySQL 5.6
  3. changes to the revisions() function I posted

solve all of the errors except the issue with bad leftover data in index_metadata table.

Exciting times 😃