gh-ost: Improper handling of special chars. Maybe string encoding/escaping issue?
I’m trying to perform a test-on-replica prior to actually migrating data, however I’m getting stuck on some encoding issue on varchar fields that might contain non-UTF chars:
This is the error I`m getting:
2016-09-05 13:36:57 ERROR Error 1366: Incorrect string value: '\xE1rio' for column 'Descri' at row 1; query=
replace /* gh-ost `Operador`.`_Telefones_gho` */ into
`Operador`.`_Telefones_gho`
(`Codigo`, `DataHora`, `Cliente`, `Descri`, `DDD`, `Fone`, `Finaliza`, `DataCad`, `OperCad`, `Tipo`, `NovoFone`, `Prioridade`, `BTC`, `Enriquecimento`, `Celular`)
values
(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
; args=[178299648 <nil> 85010132 novo propriet�rio 41 99625100 0 2016-09-05 13:35:57 9666 3 1 0 0 0 1]
The source column contains the text “novo proprietário”. However ghost unexpectedly hangs and stops migration.
Same happened when this string column contained a “forward tick” - or grave accent - (') along with other strings. Example:
foo ' bar testing blah
I did search documentation about this, but found nothing interesting.
This is the create statement for my table:
CREATE TABLE `Telefones` (
`Codigo` int(11) NOT NULL AUTO_INCREMENT,
`DataHora` datetime DEFAULT NULL,
`Cliente` int(11) NOT NULL DEFAULT '0',
`Descri` varchar(50) NOT NULL DEFAULT '',
`DDD` char(2) NOT NULL DEFAULT '',
`Fone` varchar(9) NOT NULL DEFAULT '',
`Finaliza` int(11) NOT NULL DEFAULT '0',
`DataCad` datetime DEFAULT NULL,
`OperCad` int(11) NOT NULL DEFAULT '0',
`Tipo` int(11) NOT NULL DEFAULT '0',
`NovoFone` int(11) NOT NULL DEFAULT '0',
`Prioridade` tinyint(3) unsigned NOT NULL,
`BTC` int(11) NOT NULL DEFAULT '0',
`Enriquecimento` tinyint(4) NOT NULL DEFAULT '0',
`Celular` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`Codigo`),
UNIQUE KEY `Codigo` (`Codigo`),
KEY `Cliente` (`Cliente`),
KEY `FoneDDD` (`Fone`,`DDD`)
) ENGINE=InnoDB AUTO_INCREMENT=178308803 DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
This is how I’m calling gh-ost:
./gh-ost --user="foo" \
--password="bar" \
--host=<replica.ip> \
--test-on-replica \
--database=Operador \
--table="Telefones" \
--verbose \
--alter="engine=innodb" \
--max-load=Threads_running=30 \
--switch-to-rbr \
--chunk-size=2500 \
--cut-over=default \
--exact-rowcount \
--serve-socket-file=/tmp/gh-ost.test.sock \
--panic-flag-file=/tmp/gh-ost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost-postpone.flag \
--execute
Thanks in advance
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 28 (17 by maintainers)
closed by https://github.com/github/gh-ost/pull/227
The problem originates here at the
go-mysql
library: https://github.com/github/gh-ost/blob/master/vendor/github.com/siddontang/go-mysql/replication/row_event.go#L476The RBR event does not contain the metadata specifying the charset of the column, and the library assumes it should read it as
utf
.I was able to fully reproduce issue and verified the following:
After data migration starts, application crashes as soon as a new item is inserted with “bad” character by binlog. Bad chars don’t trigger the issue if data was already on source table prior to migration.
Steps to reproduce:
Does it make sense?
Sorry about that.
On top of the CREATE TABLE, i would like to add the SHOW FULL COLUMNS from that table, to the specific column. I hope it might help: