sqlalchemy: Exclude Parameters from Exceptions
Is it possible to categorically suppress parameters from being included in log messages?
As a HIPAA compliant organization, we need to limit the potential exposure of patient data. Obviously, this is the kind of data that is passed across database drivers like SQL Alchemy. Unfortunately, while testing, we ran into some errors (e.g. sqlalchemy.exc.OperationalError). When pushing the error string to our logger, we discovered that the SQL statement and parameters were included. Since the parameters are the patient data, this is far from ideal.
Unfortunately, my (usually strong) google-fu could not find any information about when/how these values are injected into exceptions and (critically) if they can be categorically suppressed at the source. Trying to catch and suppress them in each and every use is not practical since a single mistake could leak patient data into systems where we’d rather not store it.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18 (10 by maintainers)
Hi there -
you can intercept all the SQLAlchemy error objects which include parameters using the handle_error event. Here is a handler that will replace the raw parameter structure, which may be a dictionary, tuple, or list of dictionaries or tuples depending on the type of database and execution, with a single dictionary that includes no information:
this will cover that particular case, however would not cover the case if you’ve turned on SQLAlchemy INFO logging, which will log SQL statements and their parameters as well.
The only way to truly guarantee a particular string can be controlled everywhere is by ensuring when this string first exists in your Python process, it is immediately tainted such that it will never display anywhere. Here’s a super quick one:
if you use that object for your sensitive data, it will never
__repr__(). While you can use the SensitiveString object in your application directly, any results you get back will not use that type, which would be inconvenient since usually an application is feeding data from the database itself back into new queries, so you can use a custom datatype to ensure this string is used on all columns for which you are storing sensitive data. Here is a demo:above, you can see we deal with the sensitive data naturally, but the type handler makes sure in the majority of cases that it is automatically coerced to a SensitiveString object that will not indicate any data in its
__repr__. you can of course use SensitiveString objects explicitly also (likely you’d want to give them a short name like s(“”)) and if you really wanted to enforce explicit use of these safe strings throughout your application, you could change the “process_bind_param()” method to enforce that the incoming object is a SensitiveString object.This should be added as a recipe to the wiki at https://github.com/sqlalchemy/sqlalchemy/wiki/UsageRecipes. The technique above is one I used when I used to work in finance and we had an application that required PII to be encrypted in the database. I implemented a similar approach using tainted strings like above so that they were automatically encrypted using a symmetric key going into and coming out of the DB.
Indeed. I wasn’t suggesting that you fix this, just adding some notes for others who come to this thread who might think that
hide_parameters=Truewill totally solve the problem.For future people who find their way here via google, the relevant feature seems to be https://github.com/sqlalchemy/sqlalchemy/commit/05e1c2c2dbf2e1df0c7d35f32f33c56646ea3f1e (add
hide_parameters=Truetocreate_engine).From the perspective of HIPAA compliance and reducing leakage of PHI/PII into logs, this feature doesn’t seem to be sufficient, since when the database drive raises an error, that error is included in the
StatementErrormessage. This means that a bug on inserting a record where a non-nullable column was set to null can still leak all the other values of columns in that row into the logs. For example, anIntegrityErrorcan look like the following, where the stringPHIrepresents some value which shouldn’t be put into logs:The solution here is fairly non-obvious. Even a specialized wrapper like
SensitiveStringprobably wouldn’t keep the PHI out of the underlying database driver’s exception repr. One option is to modify theerase_parametersmethod above to scrub substrings afterFailing row contains (...and similar strings emitted by the database driver. That’s a bit brittle.Yes please.
I guess I didn’t follow your point about the advantage of tainted string. I don’t want to waste your time to rehash it since I prefer the parameter.