sqlalchemy: fetching cx_Oracle CLOBs unnecessarily slow / could be optimized with a switch

According to the cx_Oracle bug tracker, it is possible to force cx_Oracle to fetch CLOBs as if they were long strings, simply by installing the following output type handler:

    import cx_Oracle

    def output_type_handler(cursor, name, defaultType, size, precision, scale):
        # Fetch CLOB columns as long strings instead of LOB locators,
        # avoiding a separate round trip per value.
        if defaultType == cx_Oracle.CLOB:
            return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

SQLAlchemy currently offers two behaviors: either it does not fetch CLOBs eagerly at all, or it fetches each CLOB with a very inefficient per-value .read() call. It may be worthwhile to expose this handler as an option. This matters a lot when the database holds many small CLOBs and rows need to be fetched in bulk.
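For context, cx_Oracle calls the output type handler once per fetched column, and assigning a handler to `connection.outputtypehandler` applies it to every cursor created from that connection. Below is a self-contained sketch of that dispatch; `FakeCursor` and the `CLOB`/`LONG_STRING` sentinels are stand-ins (so the example runs without an Oracle database), not cx_Oracle objects:

```python
# Sketch of how cx_Oracle dispatches an output type handler.
# CLOB, LONG_STRING, and FakeCursor are stand-ins for the real
# cx_Oracle.CLOB, cx_Oracle.LONG_STRING, and cursor objects.
CLOB = object()
LONG_STRING = object()

class FakeCursor:
    """Mimics the pieces of a cx_Oracle cursor the handler touches."""
    arraysize = 100

    def var(self, var_type, arraysize=None):
        # cx_Oracle returns a bind/define variable here; a tuple is
        # enough to show what the handler asked for.
        return (var_type, arraysize)

def output_type_handler(cursor, name, default_type, size, precision, scale):
    # Force CLOB columns to be fetched as long strings in one round trip.
    if default_type is CLOB:
        return cursor.var(LONG_STRING, arraysize=cursor.arraysize)
    # Returning None keeps cx_Oracle's default handling for other types.
    return None

cursor = FakeCursor()
assert output_type_handler(cursor, "DOC", CLOB, 0, 0, 0) == (LONG_STRING, 100)
assert output_type_handler(cursor, "ID", object(), 0, 0, 0) is None
```

With a real connection you would instead set `connection.outputtypehandler = output_type_handler`, after which every fetch on that connection returns CLOB values directly as strings.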

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (17 by maintainers)

Most upvoted comments

Mike Bayer has proposed a fix for this issue in the rel_1_3 branch:

Use cx_oracle.LONG_STRING / LONG_BINARY for CLOB/BLOB https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1948

As noted, the performance difference when fetching CLOBs (and BLOBs) as strings (bytes) is significant. I would indeed make that the default. Those who need to process large CLOBs/BLOBs can fetch the LOB directly and do whatever they need with it. It sounds like that option is already available, so changing the default to fetch CLOB/BLOB as strings/bytes without a .read() makes sense to me.