gh-ost: Improper handling of special chars. Maybe string encoding/escaping issue?

I’m trying to perform a test-on-replica prior to actually migrating data, however I’m getting stuck on some encoding issue on varchar fields that might contain non-UTF chars:

This is the error I`m getting:

2016-09-05 13:36:57 ERROR Error 1366: Incorrect string value: '\xE1rio' for column 'Descri' at row 1; query=
                        replace /* gh-ost `Operador`.`_Telefones_gho` */ into
                                `Operador`.`_Telefones_gho`
                                        (`Codigo`, `DataHora`, `Cliente`, `Descri`, `DDD`, `Fone`, `Finaliza`, `DataCad`, `OperCad`, `Tipo`, `NovoFone`, `Prioridade`, `BTC`, `Enriquecimento`, `Celular`)
                                values
                                        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ; args=[178299648 <nil> 85010132 novo propriet�rio 41 99625100 0 2016-09-05 13:35:57 9666 3 1 0 0 0 1]

The source column contains the text “novo proprietário”. However ghost unexpectedly hangs and stops migration.

Same happened when this string column contained a “forward tick” - or grave accent - (') along with other strings. Example:

foo ' bar testing blah

I did search documentation about this, but found nothing interesting.

This is the create statement for my table:

CREATE TABLE `Telefones` (
   `Codigo` int(11) NOT NULL AUTO_INCREMENT,
   `DataHora` datetime DEFAULT NULL,
   `Cliente` int(11) NOT NULL DEFAULT '0',
   `Descri` varchar(50) NOT NULL DEFAULT '',
   `DDD` char(2) NOT NULL DEFAULT '',
   `Fone` varchar(9) NOT NULL DEFAULT '',
   `Finaliza` int(11) NOT NULL DEFAULT '0',
   `DataCad` datetime DEFAULT NULL,
   `OperCad` int(11) NOT NULL DEFAULT '0',
   `Tipo` int(11) NOT NULL DEFAULT '0',
   `NovoFone` int(11) NOT NULL DEFAULT '0',
   `Prioridade` tinyint(3) unsigned NOT NULL,
   `BTC` int(11) NOT NULL DEFAULT '0',
   `Enriquecimento` tinyint(4) NOT NULL DEFAULT '0',
   `Celular` tinyint(4) NOT NULL DEFAULT '0',
   PRIMARY KEY (`Codigo`),
   UNIQUE KEY `Codigo` (`Codigo`),
   KEY `Cliente` (`Cliente`),
   KEY `FoneDDD` (`Fone`,`DDD`)
 ) ENGINE=InnoDB AUTO_INCREMENT=178308803 DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED

This is how I’m calling gh-ost:

./gh-ost  --user="foo" \
  --password="bar" \
  --host=<replica.ip> \
  --test-on-replica \
  --database=Operador \
  --table="Telefones" \
  --verbose \
  --alter="engine=innodb" \
  --max-load=Threads_running=30 \
  --switch-to-rbr \
  --chunk-size=2500 \
  --cut-over=default \
  --exact-rowcount \
  --serve-socket-file=/tmp/gh-ost.test.sock \
  --panic-flag-file=/tmp/gh-ost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost-postpone.flag \
  --execute

Thanks in advance

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 28 (17 by maintainers)

Most upvoted comments

The problem originates here at the go-mysql library: https://github.com/github/gh-ost/blob/master/vendor/github.com/siddontang/go-mysql/replication/row_event.go#L476

The RBR event does not contain the metadata specifying the charset of the column, and the library assumes it should read it as utf.

I was able to fully reproduce issue and verified the following:

After data migration starts, application crashes as soon as a new item is inserted with “bad” character by binlog. Bad chars don’t trigger the issue if data was already on source table prior to migration.

Steps to reproduce:

  • Started migration on a 1M records table
  • Record id 150 contains a varchar like “Saída”
  • Migration goes fine and id 150 gets migrated. I can check ghost table and data is correctly there.
  • Migration goes on and someone inserts a new Record id 1000001 with a varchar like “José” to master
  • Insert comes to Tool via binlog
  • Tool tends to priorize data coming from binlog and tries adding id 1000001 record, no mather which position it is migrating from source table
  • Tool points id 1000001 record is invalid
  • Crashes

Does it make sense?

Sorry about that.

On top of the CREATE TABLE, i would like to add the SHOW FULL COLUMNS from that table, to the specific column. I hope it might help:

    Descri  varchar(50) latin1_swedish_ci   NO              select,insert,update,references