pgsync: cannot use pgsync on table with many (83) columns (pgsync.sync: Exception (psycopg2.errors.TooManyArguments) cannot pass more than 100 arguments to a function)
PGSync version: 2.1.1
Postgres version: 12.5
Elasticsearch version: 7.13.2
Redis version: ?
Python version: 3.8
Problem Description:
I cannot use pgsync on table with very many colums.
Error Message (if any):
[SQL: SELECT anon_1."JSON_BUILD_ARRAY_1", anon_1."JSON_BUILD_OBJECT_1"
FROM (SELECT JSON_BUILD_ARRAY() AS "JSON_BUILD_ARRAY_1", JSON_BUILD_OBJECT(%(JSON_BUILD_OBJECT_2)s, actions_dbc_1.id, %(JSON_BUILD_OBJECT_3)s, actions_dbc_1.tcn, %(JSON_BUILD_OBJECT_4)s, actions_dbc_1.patientnr, %(JSON_BUILD_OBJECT_5)s, actions_dbc_1.patient_naam, %(JSON_BUILD_OBJECT_6)s, actions_dbc_1.dbcnr, %(JSON_BUILD_OBJECT_7)s, actions_dbc_1.dbc_datum, %(JSON_BUILD_OBJECT_8)s, actions_dbc_1.prestatie_code, %(JSON_BUILD_OBJECT_9)s, actions_dbc_1.specialisme, %(JSON_BUILD_OBJECT_10)s, actions_dbc_1.toelichting, %(JSON_BUILD_OBJECT_11)s, actions_dbc_1.rule_code, %(JSON_BUILD_OBJECT_12)s, actions_dbc_1.task_status, %(JSON_BUILD_OBJECT_13)s, actions_dbc_1.regel, %(JSON_BUILD_OBJECT_14)s, actions_dbc_1.actie, %(JSON_BUILD_OBJECT_15)s, actions_dbc_1.oplosser, %(JSON_BUILD_OBJECT_16)s, actions_dbc_1.event_status, %(JSON_BUILD_OBJECT_17)s, actions_dbc_1.event_datum, %(JSON_BUILD_OBJECT_18)s, actions_dbc_1.tasklist_criteria, %(JSON_BUILD_OBJECT_19)s, actions_dbc_1.created_by_user, %(JSON_BUILD_OBJECT_20)s, actions_dbc_1.create_date, %(JSON_BUILD_OBJECT_21)s, actions_dbc_1.modified_by_user, %(JSON_BUILD_OBJECT_22)s, actions_dbc_1.modify_date, %(JSON_BUILD_OBJECT_23)s, actions_dbc_1.valuation, %(JSON_BUILD_OBJECT_24)s, actions_dbc_1.specialist_code, %(JSON_BUILD_OBJECT_25)s, actions_dbc_1.kostenplaats_code, %(JSON_BUILD_OBJECT_26)s, actions_dbc_1.specialisme_code, %(JSON_BUILD_OBJECT_27)s, actions_dbc_1.oplosser_code, %(JSON_BUILD_OBJECT_28)s, actions_dbc_1.kosten_deel, %(JSON_BUILD_OBJECT_29)s, actions_dbc_1.honorarium_deel, %(JSON_BUILD_OBJECT_30)s, actions_dbc_1.patient_geboortedatum, %(JSON_BUILD_OBJECT_31)s, actions_dbc_1.username, %(JSON_BUILD_OBJECT_32)s, actions_dbc_1.dbc_typering, %(JSON_BUILD_OBJECT_33)s, actions_dbc_1.ste_dbc_id, %(JSON_BUILD_OBJECT_34)s, actions_dbc_1.task_count, %(JSON_BUILD_OBJECT_35)s, actions_dbc_1.dbc_einddatum, %(JSON_BUILD_OBJECT_36)s, actions_dbc_1.zis_ziektegevalnr, %(JSON_BUILD_OBJECT_37)s, actions_dbc_1.dbc_segment, %(JSON_BUILD_OBJECT_38)s, actions_dbc_1.ind_medisch, %(JSON_BUILD_OBJECT_39)s, actions_dbc_1.vc_lokatie_code, %(JSON_BUILD_OBJECT_40)s, actions_dbc_1.kleur_code, %(JSON_BUILD_OBJECT_41)s, actions_dbc_1.vc_laatste_uitvoerder_code, %(JSON_BUILD_OBJECT_42)s, actions_dbc_1.task_role, %(JSON_BUILD_OBJECT_43)s, actions_dbc_1.condition_nr, %(JSON_BUILD_OBJECT_44)s, actions_dbc_1.task_allocation, %(JSON_BUILD_OBJECT_45)s, actions_dbc_1.vc_bedrijf, %(JSON_BUILD_OBJECT_46)s, actions_dbc_1.reference, %(JSON_BUILD_OBJECT_47)s, actions_dbc_1.commentary, %(JSON_BUILD_OBJECT_48)s, actions_dbc_1.operateur_naam, %(JSON_BUILD_OBJECT_49)s, actions_dbc_1.event_details, %(JSON_BUILD_OBJECT_50)s, actions_dbc_1.task_revenue_code, %(JSON_BUILD_OBJECT_51)s, actions_dbc_1.verbandscombinatie_nr, %(JSON_BUILD_OBJECT_52)s, actions_dbc_1.declaratie_code, %(JSON_BUILD_OBJECT_53)s, actions_dbc_1.vc_debiteur_code, %(JSON_BUILD_OBJECT_54)s, actions_dbc_1.characteristics, %(JSON_BUILD_OBJECT_55)s, actions_dbc_1.agenda_code, %(JSON_BUILD_OBJECT_56)s, actions_dbc_1.vgn_zorgproduct, %(JSON_BUILD_OBJECT_57)s, actions_dbc_1.afgeleid_zorgproduct, %(JSON_BUILD_OBJECT_58)s, actions_dbc_1.contact_email, %(JSON_BUILD_OBJECT_59)s, actions_dbc_1.zorgverzekeraar, %(JSON_BUILD_OBJECT_60)s, actions_dbc_1.rule_priority, %(JSON_BUILD_OBJECT_61)s, actions_dbc_1.event_priority, %(JSON_BUILD_OBJECT_62)s, actions_dbc_1.event_reference_date, %(JSON_BUILD_OBJECT_63)s, actions_dbc_1.vc_afwijkend_zp_reden, %(JSON_BUILD_OBJECT_64)s, actions_dbc_1.specialist_naam, %(JSON_BUILD_OBJECT_65)s, actions_dbc_1.zis_validatie_fout_omschr, %(JSON_BUILD_OBJECT_66)s, actions_dbc_1.rule_status, %(JSON_BUILD_OBJECT_67)s, actions_dbc_1.user_priority, %(JSON_BUILD_OBJECT_68)s, actions_dbc_1.hoofdbehandelaar_code, %(JSON_BUILD_OBJECT_69)s, actions_dbc_1.loid, %(JSON_BUILD_OBJECT_70)s, actions_dbc_1.uitvoerende_oe, %(JSON_BUILD_OBJECT_71)s, actions_dbc_1.status_code, %(JSON_BUILD_OBJECT_72)s, actions_dbc_1.checksum, %(JSON_BUILD_OBJECT_73)s, actions_dbc_1.zis_validatie_fout_datum, %(JSON_BUILD_OBJECT_74)s, actions_dbc_1.oplosser_d2a, %(JSON_BUILD_OBJECT_75)s, actions_dbc_1.task_allocation_d2a, %(JSON_BUILD_OBJECT_76)s, actions_dbc_1.username_d2a, %(JSON_BUILD_OBJECT_77)s, actions_dbc_1.sub_specialisme_code, %(JSON_BUILD_OBJECT_78)s, actions_dbc_1.vc_commentary, %(JSON_BUILD_OBJECT_79)s, actions_dbc_1.hix_fout_opmerking, %(JSON_BUILD_OBJECT_80)s, actions_dbc_1.type_verwijzer, %(JSON_BUILD_OBJECT_81)s, actions_dbc_1.zis_validatie_fout_status, %(JSON_BUILD_OBJECT_82)s, actions_dbc_1.zis_validatie_fout_code, %(JSON_BUILD_OBJECT_83)s, actions_dbc_1.zis_validatie_datum_gevalideerd, %(JSON_BUILD_OBJECT_84)s, actions_dbc_1.task_event_allocation) AS "JSON_BUILD_OBJECT_1"
FROM sd.actions_dbc AS actions_dbc_1
WHERE CAST(CAST(actions_dbc_1.xmin AS TEXT) AS BIGINT) < %(param_1)s) AS anon_1]
[parameters: {'JSON_BUILD_OBJECT_2': 'id', 'JSON_BUILD_OBJECT_3': 'tcn', 'JSON_BUILD_OBJECT_4': 'patientnr', 'JSON_BUILD_OBJECT_5': 'patient_naam', 'JSON_BUILD_OBJECT_6': 'dbcnr', 'JSON_BUILD_OBJECT_7': 'dbc_datum', 'JSON_BUILD_OBJECT_8': 'prestatie_code', 'JSON_BUILD_OBJECT_9': 'specialisme', 'JSON_BUILD_OBJECT_10': 'toelichting', 'JSON_BUILD_OBJECT_11': 'rule_code', 'JSON_BUILD_OBJECT_12': 'task_status', 'JSON_BUILD_OBJECT_13': 'regel', 'JSON_BUILD_OBJECT_14': 'actie', 'JSON_BUILD_OBJECT_15': 'oplosser', 'JSON_BUILD_OBJECT_16': 'event_status', 'JSON_BUILD_OBJECT_17': 'event_datum', 'JSON_BUILD_OBJECT_18': 'tasklist_criteria', 'JSON_BUILD_OBJECT_19': 'created_by_user', 'JSON_BUILD_OBJECT_20': 'create_date', 'JSON_BUILD_OBJECT_21': 'modified_by_user', 'JSON_BUILD_OBJECT_22': 'modify_date', 'JSON_BUILD_OBJECT_23': 'valuation', 'JSON_BUILD_OBJECT_24': 'specialist_code', 'JSON_BUILD_OBJECT_25': 'kostenplaats_code', 'JSON_BUILD_OBJECT_26': 'specialisme_code', 'JSON_BUILD_OBJECT_27': 'oplosser_code', 'JSON_BUILD_OBJECT_28': 'kosten_deel', 'JSON_BUILD_OBJECT_29': 'honorarium_deel', 'JSON_BUILD_OBJECT_30': 'patient_geboortedatum', 'JSON_BUILD_OBJECT_31': 'username', 'JSON_BUILD_OBJECT_32': 'dbc_typering', 'JSON_BUILD_OBJECT_33': 'ste_dbc_id', 'JSON_BUILD_OBJECT_34': 'task_count', 'JSON_BUILD_OBJECT_35': 'dbc_einddatum', 'JSON_BUILD_OBJECT_36': 'zis_ziektegevalnr', 'JSON_BUILD_OBJECT_37': 'dbc_segment', 'JSON_BUILD_OBJECT_38': 'ind_medisch', 'JSON_BUILD_OBJECT_39': 'vc_lokatie_code', 'JSON_BUILD_OBJECT_40': 'kleur_code', 'JSON_BUILD_OBJECT_41': 'vc_laatste_uitvoerder_code', 'JSON_BUILD_OBJECT_42': 'task_role', 'JSON_BUILD_OBJECT_43': 'condition_nr', 'JSON_BUILD_OBJECT_44': 'task_allocation', 'JSON_BUILD_OBJECT_45': 'vc_bedrijf', 'JSON_BUILD_OBJECT_46': 'reference', 'JSON_BUILD_OBJECT_47': 'commentary', 'JSON_BUILD_OBJECT_48': 'operateur_naam', 'JSON_BUILD_OBJECT_49': 'event_details', 'JSON_BUILD_OBJECT_50': 'task_revenue_code', 'JSON_BUILD_OBJECT_51': 'verbandscombinatie_nr', 'JSON_BUILD_OBJECT_52': 'declaratie_code', 'JSON_BUILD_OBJECT_53': 'vc_debiteur_code', 'JSON_BUILD_OBJECT_54': 'characteristics', 'JSON_BUILD_OBJECT_55': 'agenda_code', 'JSON_BUILD_OBJECT_56': 'vgn_zorgproduct', 'JSON_BUILD_OBJECT_57': 'afgeleid_zorgproduct', 'JSON_BUILD_OBJECT_58': 'contact_email', 'JSON_BUILD_OBJECT_59': 'zorgverzekeraar', 'JSON_BUILD_OBJECT_60': 'rule_priority', 'JSON_BUILD_OBJECT_61': 'event_priority', 'JSON_BUILD_OBJECT_62': 'event_reference_date', 'JSON_BUILD_OBJECT_63': 'vc_afwijkend_zp_reden', 'JSON_BUILD_OBJECT_64': 'specialist_naam', 'JSON_BUILD_OBJECT_65': 'zis_validatie_fout_omschr', 'JSON_BUILD_OBJECT_66': 'rule_status', 'JSON_BUILD_OBJECT_67': 'user_priority', 'JSON_BUILD_OBJECT_68': 'hoofdbehandelaar_code', 'JSON_BUILD_OBJECT_69': 'loid', 'JSON_BUILD_OBJECT_70': 'uitvoerende_oe', 'JSON_BUILD_OBJECT_71': 'status_code', 'JSON_BUILD_OBJECT_72': 'checksum', 'JSON_BUILD_OBJECT_73': 'zis_validatie_fout_datum', 'JSON_BUILD_OBJECT_74': 'oplosser_d2a', 'JSON_BUILD_OBJECT_75': 'task_allocation_d2a', 'JSON_BUILD_OBJECT_76': 'username_d2a', 'JSON_BUILD_OBJECT_77': 'sub_specialisme_code', 'JSON_BUILD_OBJECT_78': 'vc_commentary', 'JSON_BUILD_OBJECT_79': 'hix_fout_opmerking', 'JSON_BUILD_OBJECT_80': 'type_verwijzer', 'JSON_BUILD_OBJECT_81': 'zis_validatie_fout_status', 'JSON_BUILD_OBJECT_82': 'zis_validatie_fout_code', 'JSON_BUILD_OBJECT_83': 'zis_validatie_datum_gevalideerd', 'JSON_BUILD_OBJECT_84': 'task_event_allocation', 'param_1': 62355721}]
0:00:02.510704 (2.51 sec)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 24 (8 by maintainers)
Commits related to this issue
- support more than 100 column tables #149 — committed to toluaina/pgsync by toluaina 2 years ago
I implemented a potential solution in the master branch. This hasn’t thoroughly been tested. Can you try this and let me know.
I upvote for the change if it does not have any performance penalty We had to run a patched version of 2.1.9 with the fix from @cocinfar I think it’s pretty common to have more than 100 columns table so this will be limitation for many users I can help to test all the areas affected by this if needed
@ansersolutions Unfortunately that didin’t work for us we got hits error:
pgsync_1 | -------------------------------------------------------------------------------- pgsync_1 | {‘_auto_correlate’: True, ‘_raw_columns’: [<sqlalchemy.sql.elements.BinaryExpression object at 0x7fc17213d990>, <sqlalchemy.sql.elements.BinaryExpression object at 0x7fc17213d890>, Column(‘Id’, UUID(), table=<anon_1>, primary_key=True, nullable=False)], ‘_label_style’: symbol(‘LABEL_STYLE_DISAMBIGUATE_ONLY’), ‘_bind’: None, ‘_all_selected_columns’: [<sqlalchemy.sql.elements.BinaryExpression object at 0x7fc17213d990>, <sqlalchemy.sql.elements.BinaryExpression object at 0x7fc17213d890>, Column(‘Id’, UUID(), table=<anon_1>, primary_key=True, nullable=False)], ‘_memoized_keys’: frozenset({‘_all_selected_columns’})} pgsync_1 | -------------------------------------------------------------------------------- pgsync_1 | 0:00:00.978280 (0.98 sec) pgsync_1 | Traceback (most recent call last): pgsync_1 | File “/usr/local/bin/pgsync”, line 7, in <module> pgsync_1 | sync.main() pgsync_1 | File “/usr/local/lib/python3.7/site-packages/click/core.py”, line 1128, in call pgsync_1 | return self.main(*args, **kwargs) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/click/core.py”, line 1053, in main pgsync_1 | rv = self.invoke(ctx) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/click/core.py”, line 1395, in invoke pgsync_1 | return ctx.invoke(self.callback, **ctx.params) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/click/core.py”, line 754, in invoke pgsync_1 | return __callback(*args, **kwargs) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/sync.py”, line 1139, in main pgsync_1 | sync.pull() pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/sync.py”, line 1006, in pull pgsync_1 | self.sync(self._sync(txmin=txmin, txmax=txmax)) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/sync.py”, line 875, in sync pgsync_1 | self.es.bulk(self.index, docs) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/elastichelper.py”, line 133, in bulk pgsync_1 | raise_on_error=raise_on_error, pgsync_1 | File “/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py”, line 484, in parallel_bulk pgsync_1 | actions, chunk_size, max_chunk_bytes, client.transport.serializer pgsync_1 | File “/usr/local/lib/python3.7/multiprocessing/pool.py”, line 748, in next pgsync_1 | raise value pgsync_1 | File “/usr/local/lib/python3.7/multiprocessing/pool.py”, line 121, in worker pgsync_1 | result = (True, func(*args, **kwds)) pgsync_1 | File “/usr/local/lib/python3.7/multiprocessing/pool.py”, line 140, in _helper_reraises_exception pgsync_1 | raise ex pgsync_1 | File “/usr/local/lib/python3.7/multiprocessing/pool.py”, line 292, in _guarded_task_generation pgsync_1 | for i, x in enumerate(iterable): pgsync_1 | File “/usr/local/lib/python3.7/site-packages/elasticsearch/helpers/actions.py”, line 155, in _chunk_actions pgsync_1 | for action, data in actions: pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/sync.py”, line 818, in _sync pgsync_1 | count: int = self.fetchcount(node._subquery) pgsync_1 | File “/usr/local/lib/python3.7/site-packages/pgsync/base.py”, line 896, in fetchcount pgsync_1 | statement.original.with_only_columns( pgsync_1 | AttributeError: ‘Select’ object has no attribute ‘original’
Our schema is something like this
[ { “database”: “integrationdb”, “index”: “es.somestuff”, “nodes”: { “table”: “Stuff”, “children”: [
{ “table”: “StuffProgress”, “relationship”: { “variant”: “object”, “type”: “one_to_many” } }, { “table”: “NestedStuff”, “relationship”: { “variant”: “object”, “type”: “one_to_many” } } ] } } ]
Just ran into this issue. We have created a deformalized table for pgsync to solve out of disk space errors on postgres. Now we are seeing this issue with 59 columns. Two questions: -What is the limit on columns (I am stuck with brute forcing it by slowly reducing the # of cols. ) -@cocinfar - I like your approach. Going to try it out. What version of pgsync are you using? I get a
ImportError: cannot import name 'get_primary_keys' from 'pgsync.base'From what you said it sounds like this only works with bootstrap (bulk inserts to ES). -@toluaina - any chance this could be added to the main branch?