cloudquery: MSSQL Destination PK violations & unicode characters
Describe the Bug
Observations after testing new MSSQL destination plugin:
- Primary Key Violations
- Unicode characters
I am using the Azure source plugin. In the tags column I am seeing issues with unicode characters
{"BillTo":"P\u0026T"}
instead of{"BillTo":"P&T"}
. This works properly with Postgres destination.
Expected Behavior
No primary key violations, and data values with the actual special characters instead of the escaped unicode representations.
CloudQuery Version
cloudquery version 2.2.0 azure (v3.4.0) -> mssql (v1.0.0)
Debug Output
{"level":"error","module":"cli","class":14,"line_no":24,"message":"Violation of PRIMARY KEY constraint 'azure_network_azure_firewall_fqdn_tags_cqpk'. Cannot insert duplicate key in object 'dbo.azure_network_azure_firewall_fqdn_tags'. The duplicate key value is (/subscriptions//resourceGroups//providers/Microsoft.Network/azureFirewallFqdnTags/).","module":"dest-mssql","number":2627,"proc_name":"dbo.cq_proc_azure_network_azure_firewall_fqdn_tags","server_name":"<SQL_SERVER_INSTANCE>","state":1,"time":"2023-01-20T21:24:08Z"} {"level":"error","module":"cli","duration":1765.3814,"error":"mssql: Violation of PRIMARY KEY constraint 'azure_network_azure_firewall_fqdn_tags_cqpk'. Cannot insert duplicate key in object 'dbo.azure_network_azure_firewall_fqdn_tags'. The duplicate key value is (/subscriptions//resourceGroups//providers/Microsoft.Network/azureFirewallFqdnTags/).","len":3,"message":"failed to write batch","table":"azure_network_azure_firewall_fqdn_tags","time":"2023-01-20T21:24:08Z"} {"level":"error","module":"cli","class":14,"line_no":23,"message":"Violation of PRIMARY KEY constraint 'azure_network_express_route_service_providers_cqpk'. Cannot insert duplicate key in object 'dbo.azure_network_express_route_service_providers'. The duplicate key value is (/subscriptions//resourceGroups//providers/Microsoft.Network/expressRouteServiceProviders/).","module":"dest-mssql","number":2627,"proc_name":"dbo.cq_proc_azure_network_express_route_service_providers","server_name":"<SQL_SERVER_INSTANCE>","state":1,"time":"2023-01-20T21:24:21Z"} {"level":"error","module":"cli","duration":44.6381,"error":"mssql: Violation of PRIMARY KEY constraint 'azure_network_express_route_service_providers_cqpk'. Cannot insert duplicate key in object 'dbo.azure_network_express_route_service_providers'. The duplicate key value is (/subscriptions//resourceGroups//providers/Microsoft.Network/expressRouteServiceProviders/).","len":3,"message":"failed to write batch","table":"azure_network_express_route_service_providers","time":"2023-01-20T21:24:21Z"} {"level":"error","module":"cli","class":14,"line_no":23,"message":"Violation of PRIMARY KEY constraint 'azure_authorization_provider_operations_metadata_cqpk'. Cannot insert duplicate key in object 'dbo.azure_authorization_provider_operations_metadata'. The duplicate key value is (/providers/Microsoft.Authorization/providerOperations/Qumulo.QaaS).","module":"dest-mssql","number":2627,"proc_name":"dbo.cq_proc_azure_authorization_provider_operations_metadata","server_name":"<SQL_SERVER_INSTANCE>","state":1,"time":"2023-01-20T21:29:57Z"} {"level":"error","module":"cli","duration":141.0392,"error":"mssql: Violation of PRIMARY KEY constraint 'azure_authorization_provider_operations_metadata_cqpk'. Cannot insert duplicate key in object 'dbo.azure_authorization_provider_operations_metadata'. The duplicate key value is (/providers/Microsoft.Authorization/providerOperations/Qumulo.QaaS).","len":1000,"message":"failed to write batch","table":"azure_authorization_provider_operations_metadata","time":"2023-01-20T21:29:57Z"} {"level":"error","module":"cli","class":14,"line_no":31,"message":"Violation of PRIMARY KEY constraint 'azure_compute_skus_cqpk'. Cannot insert duplicate key in object 'dbo.azure_compute_skus'. The duplicate key value is (Aligned).","module":"dest-mssql","number":2627,"proc_name":"dbo.cq_proc_azure_compute_skus","server_name":"<SQL_SERVER_INSTANCE>","state":1,"time":"2023-01-20T21:29:59Z"} {"level":"error","module":"cli","duration":97.4163,"error":"mssql: Violation of PRIMARY KEY constraint 'azure_compute_skus_cqpk'. Cannot insert duplicate key in object 'dbo.azure_compute_skus'. The duplicate key value is (Aligned).","len":1000,"message":"failed to write batch","table":"azure_compute_skus","time":"2023-01-20T21:29:59Z"} {"level":"error","module":"cli","class":14,"line_no":22,"message":"Violation of PRIMARY KEY constraint 'azure_security_topology_cqpk'. Cannot insert duplicate key in object 'dbo.azure_security_topology'. The duplicate key value is (/subscriptions/<AZ_SUBSCRIPTION>/resourceGroups/<AZ_RG>/providers/Microsoft.Security/locations/westcentralus/topologies/virtualMachines).","module":"dest-mssql","number":2627,"proc_name":"dbo.cq_proc_azure_security_topology","server_name":"<SQL_SERVER_INSTANCE>","state":1,"time":"2023-01-20T21:36:57Z"} {"level":"error","module":"cli","duration":93.7219,"error":"mssql: Violation of PRIMARY KEY constraint 'azure_security_topology_cqpk'. Cannot insert duplicate key in object 'dbo.azure_security_topology'. The duplicate key value is (/subscriptions/<AZ_SUBSCRIPTION>/resourceGroups/<AZ_RG>/providers/Microsoft.Security/locations/westcentralus/topologies/virtualMachines).","len":564,"message":"failed to write batch","table":"azure_security_topology","time":"2023-01-20T21:36:57Z"}
Steps to Reproduce
cloudquery.exe sync .\config --log-level warn --log-format json
kind: source
spec:
name: azure
path: cloudquery/azure
version: "v3.4.0"
destinations: ["mssql"]
tables: ["*"]
concurrency: 150000
scheduler: "round-robin"
skip_tables:
[
"azure_monitor_tenant_activity_logs",
"azure_portal_list_tenant_configuration_violations",
"azure_appservice_web_app_auth_settings",
"azure_policy_assignments",
"azure_security_contacts",
"azure_reservations_reservation_order",
"azure_security_regulatory_compliance_standards",
"azure_storage_tables",
"azure_storage_blob_services",
"azure_storage_containers",
]
spec:
subscriptions:
[
...
]
kind: destination
spec:
name: "mssql"
registry: "github"
path: "cloudquery/mssql"
version: "v1.0.0"
spec:
connection_string: ${MSSQL_CONNECTION_STRING}
auth_mode: ms
schema: dbo
Additional Context
N/A
References
https://github.com/cloudquery/cloudquery/issues/6145#issuecomment-1398444499
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 19 (11 by maintainers)
Thanks @candiduslynx And thank you again for all the work on this new plugin
@rafaelrodrigues3092
Ah, I see. The delete part is handled by this code.
@candiduslynx
Awesome. Thank you for that. I have a question regarding how the write_mode on the MSSQL destination plugin, but I want to test it out first before I go ahead and ask.
Is it cool if ask here? Or would you like me to open a new issue and ask it there?
@rafaelrodrigues3092 thanks for he info! The test plugin output is the required info. I’m going to add it to #7183 an close this issue as duplicate, if you don’t mind.
@candiduslynx
I ran the test as described. I am adding all the details below to set the stage properly. Few Items:
Please let me know if I missed something
Config:
Command
Test Plugin StdOut
Cloudquery log file
I will give this a try. Please allow me some time to figure out how to run it locally with the instructions above 😃
Hi @rafaelrodrigues3092! Yes, the problem is with Azure plugin. In fact, we saw this in another sources as well, so we’ll be fixing it in #7173.
The issue can be observer via Microsoft SQL Server if and only if a couple of resources with the same primary key end up being sent in the same batch (
overwrite
andoverwrite-delete-stale
modes). This is the reason why you saw different tables with issues on different runs.We’ll be prioritizing #7173 issue to fix all PK violations found.
Once the testing plugin is updated, would you mind running the sync against it? As you have several azure subscriptions to sync from it may help to find additional violations.
Adding to my comment above – I checked my Postgres database. I see the PK is the id (SkuName) as well. When searching for ‘Standard_NV18ads_A10_v5’ I found a single record - rather than 1 record per subscription.
I am wondering if the problem here is with the Azure source plugin?
Thanks
@candiduslynx
I tested v2.1.0 and the PK issue is still there. I decided to trace the database, and I think I found the problem. I focused on the [azure_compute_skus] table. And for this case, we essentially need a composite key. The current Id (PK) is the VM SkuName but across multiple subscriptions the SkuName repeats.
For example:
Thanks!
@rafaelrodrigues3092 I’ve updated the code for stored procedures. It’ll be available in the next release of the destination plugin (
v2.1.0
). Thanks for the suggestion!@candiduslynx after a fresh run w/ the v2.0.0 of the plugin I got an updated list of tables w/ PK violations
azure_cdn_managed_rule_sets
azure_support_services
azure_authorization_provider_operations_metadata azure_security_assessments_metadata
azure_policy_set_definitions
azure_compute_skus
azure_eventgrid_topic_types
azure_network_bgp_service_communities
azure_policy_data_policy_manifests
azure_cdn_edge_nodes
azure_frontdoor_managed_rule_sets
azure_security_secure_score_control_definitions azure_reservations_reservation
azure_policy_definitions
azure_advisor_recommendation_metadata
azure_security_topology
azure_authorization_role_assignments
An interesting observation is that it seems that some of the original tables are not part of this updated list. FYI, I have RSCI enabled on my database
ALTER DATABASE "<db_name>" SET READ_COMMITTED_SNAPSHOT ON
and in terms of scale, I am collecting data from 23 Azure subscriptions.For the stored procedures, may I suggest the insert be done using a LEFT JOIN instead?
Hi @rafaelrodrigues3092! Thank you for the report. I’ve extracted the Unicode issue into a separate issue (#7001). The PK violation will be fixed in this one.