sqlalchemy: fetching cx_Oracle CLOBs unnecessarily slow / could be optimized with a switch
According to the cx_Oracle bug tracker, it is possible to force cx_Oracle to fetch CLOBs as if they were long strings by installing the following output type handler:
```python
import cx_Oracle

def output_type_handler(cursor, name, defaultType, size, precision, scale):
    # Fetch CLOB columns as long strings instead of LOB locators.
    if defaultType == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)
```
SQLAlchemy currently offers two options: either do not fetch CLOBs eagerly at all, or fetch each CLOB through a very inefficient per-value `.read()` call. Maybe it would be worthwhile to add this handler as an option. It matters a lot when the database holds many small CLOBs and the rows need to be fetched in bulk.
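To make the dispatch mechanism concrete without requiring an Oracle client, here is a minimal runnable sketch of how the driver consults an output type handler per column. The `CLOB`/`LONG_STRING` constants and the cursor class below are stand-ins I introduce for illustration, not cx_Oracle objects; with the real driver you would instead assign the handler to `connection.outputtypehandler`.

```python
# Stand-ins for cx_Oracle.CLOB and cx_Oracle.LONG_STRING (assumptions,
# not the real driver constants).
CLOB = object()
LONG_STRING = object()

class FakeVar:
    """Minimal stand-in for the variable object cursor.var() returns."""
    def __init__(self, typ, arraysize):
        self.type = typ
        self.arraysize = arraysize

class FakeCursor:
    """Minimal stand-in for a cx_Oracle cursor."""
    arraysize = 100

    def var(self, typ, arraysize=None):
        return FakeVar(typ, arraysize)

def output_type_handler(cursor, name, defaultType, size, precision, scale):
    # Called once per column in the result set: returning a variable
    # overrides how that column is fetched; returning None keeps the
    # driver's default handling.
    if defaultType is CLOB:
        return cursor.var(LONG_STRING, arraysize=cursor.arraysize)
    return None

cur = FakeCursor()
clob_var = output_type_handler(cur, "doc", CLOB, 0, 0, 0)
print(clob_var.type is LONG_STRING, clob_var.arraysize)  # True 100
```

The key design point is that the handler is consulted once per column at execute time, so redirecting CLOBs to `LONG_STRING` buffers avoids a round trip per LOB value during row fetches.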
About this issue
- State: closed
- Created 4 years ago
- Comments: 25 (17 by maintainers)
Mike Bayer has proposed a fix for this issue in the rel_1_3 branch:
Use cx_oracle.LONG_STRING / LONG_BINARY for CLOB/BLOB: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1948
As noted, the performance difference when fetching CLOBs (and BLOBs) as strings (bytes) is significant, so I would indeed make that the default. Anyone who needs to process truly large CLOB/BLOB values should fetch the LOB directly and work with it as needed. It sounds like that option is already available, so changing the default to fetch CLOB/BLOB as strings/bytes without a `.read()` makes sense to me.