sqlalchemy: Exclude Parameters from Exceptions

Is it possible to categorically suppress parameters from being included in log messages?

As a HIPAA compliant organization, we need to limit the potential exposure of patient data. Obviously, this is the kind of data that is passed across database drivers like SQL Alchemy. Unfortunately, while testing, we ran into some errors (e.g. sqlalchemy.exc.OperationalError). When pushing the error string to our logger, we discovered that the SQL statement and parameters were included. Since the parameters are the patient data, this is far from ideal.

Unfortunately, my (usually strong) google-fu could not find any information about when/how these values are injected into exceptions and (critically) if they can be categorically suppressed at the source. Trying to catch and suppress them in each and every use is not practical since a single mistake could leak patient data into systems where we’d rather not store it.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

Hi there -

you can intercept all the SQLAlchemy error objects which include parameters using the handle_error event. Here is a handler that will replace the raw parameter structure, which may be a dictionary, tuple, or list of dictionaries or tuples depending on the type of database and execution, with a single dictionary that includes no information:

from sqlalchemy import event
from sqlalchemy import engine


@event.listens_for(engine.Engine, "handle_error")
def erase_parameters(exception_context):
    for exc in [
        exception_context.chained_exception,
        exception_context.original_exception,
        exception_context.sqlalchemy_exception,
    ]:
        if exc is not None:
            exc.params = {"no_parameters": "parameters hidden"}

this will cover that particular case, however would not cover the case if you’ve turned on SQLAlchemy INFO logging, which will log SQL statements and their parameters as well.

The only way to truly guarantee a particular string can be controlled everywhere is by ensuring when this string first exists in your Python process, it is immediately tainted such that it will never display anywhere. Here’s a super quick one:

class SensitiveString(str):
    def __repr__(self):
        return "<redacted>"

if you use that object for your sensitive data, it will never __repr__(). While you can use the SensitiveString object in your application directly, any results you get back will not use that type, which would be inconvenient since usually an application is feeding data from the database itself back into new queries, so you can use a custom datatype to ensure this string is used on all columns for which you are storing sensitive data. Here is a demo:

from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import TypeDecorator
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session


class SensStringType(TypeDecorator):
    """Store a string that should not be displayed."""
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None and not isinstance(value, SensitiveString):
            value = SensitiveString(value)
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = SensitiveString(value)
        return value

class SensitiveString(str):
    """A string that doesn't repr."""

    def __repr__(self):
        return "<redacted>"

Base = declarative_base()


class A(Base):
    __tablename__ = "a"

    id = Column(Integer, primary_key=True)
    data = Column(SensStringType(50))


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)

# in SQL logging, we won't see the parameter value, because the bind processor
# runs on the parameter before it is logged.
# Note that DEBUG logging, which we aren't using here and is not commonly
# used, will show the raw values in logged result rows.
s.add(A(id=1, data="some sensitive data"))
s.flush()

# when we get it back...
data, = s.query(A.data).first()

# the string is there...
assert data == str(data) == "some sensitive data"

# but again, no repr
assert repr(data) == "<redacted>"


# error messages are also logged with the redacted string.  Here
# we intentionally create an integrity error to demontrate
s.add(A(id=1, data="some sensitive data"))
s.commit()

above, you can see we deal with the sensitive data naturally, but the type handler makes sure in the majority of cases that it is automatically coerced to a SensitiveString object that will not indicate any data in its __repr__. you can of course use SensitiveString objects explicitly also (likely you’d want to give them a short name like s(“”)) and if you really wanted to enforce explicit use of these safe strings throughout your application, you could change the “process_bind_param()” method to enforce that the incoming object is a SensitiveString object.

This should be added as a recipe to the wiki at https://github.com/sqlalchemy/sqlalchemy/wiki/UsageRecipes. The technique above is one I used when I used to work in finance and we had an application that required PII to be encrypted in the database. I implemented a similar approach using tainted strings like above so that they were automatically encrypted using a symmetric key going into and coming out of the DB.

Indeed. I wasn’t suggesting that you fix this, just adding some notes for others who come to this thread who might think that hide_parameters=True will totally solve the problem.

For future people who find their way here via google, the relevant feature seems to be https://github.com/sqlalchemy/sqlalchemy/commit/05e1c2c2dbf2e1df0c7d35f32f33c56646ea3f1e (add hide_parameters=True to create_engine).

From the perspective of HIPAA compliance and reducing leakage of PHI/PII into logs, this feature doesn’t seem to be sufficient, since when the database drive raises an error, that error is included in the StatementError message. This means that a bug on inserting a record where a non-nullable column was set to null can still leak all the other values of columns in that row into the logs. For example, an IntegrityError can look like the following, where the string PHI represents some value which shouldn’t be put into logs:

IntegrityError('(psycopg2.errors.NotNullViolation) null value in column "patient_control_number" violates not-null constraint\nDETAIL: Failing row contains (1, null, null, null, null, null, null, null, null, null, null, PHI, PHI, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 2020-09-25 21:56:47.339412+00).\n')

The solution here is fairly non-obvious. Even a specialized wrapper like SensitiveString probably wouldn’t keep the PHI out of the underlying database driver’s exception repr. One option is to modify the erase_parameters method above to scrub substrings after Failing row contains (... and similar strings emitted by the database driver. That’s a bit brittle.

would you find a create_engine() parameter that prevents .params from being included in log messages and execption objects useful

Yes please.

I’m not familiar with a canonical pattern that calls str() on Exception objects.

I guess I didn’t follow your point about the advantage of tainted string. I don’t want to waste your time to rehash it since I prefer the parameter.