vendredi 28 novembre 2014

Strange MySQL replication error 1146 (Table doesn't exist)


I'm getting a very weird MySQL replication error No. 1146 for a REPLACE INTO query on a slave host replicating all tables in all databases from a master, and I'm having a bit of a hard time understanding why.


Here's my scenario:



  1. New data is generated solely on the master server, MySQL 5.5.40.

  2. Slave A, MySQL 5.5.38, has been replicating all tables in all databases from this master just fine for a long while, producing no errors of any kind.

  3. IO_THREAD was paused on slave A. Value of Relay_Master_Log_File was confirmed to match that of Master_Log_File and value of Exec_Master_Log_Pos was confirmed to match that of Read_Master_Log_Pos.

  4. A FLUSH TABLES WITH READ LOCK was issued on slave A and a dump of all db's was subsequently produced from it with mysqldump -v -h localhost -u root -p --all-databases --opt --single-transaction --hex-blob --no-autocommit > dump.sql. Lock was released only after the dump completed, and then the slave IO_THREAD was restarted; replication from master resumed without any problems, and continues to run smoothly to this day on slave A.

  5. The dump was transferred to slave B, MySQL 5.5.34, and loaded successfully into it through a simple mysql -h localhost -u root -p < dump.sql command, after confirming non of the target databases existed on this second host (and actually no db's other than the mysql db and information & performance schemas). I can also confirm integrity of the dump file after the transfer to slave B thanks to matching RMD160 checksums for it on both hosts.

  6. Slave B was pointed to the master server for replication, with MASTER_LOG_FILE and MASTER_LOG_POS coordinates set to the values of Relay_Master_Log_File and Exec_Master_Log_Pos recorded in step 3 above from slave A, respectively.

  7. Replication was started on slave B, and data started flowing in just fine.


However, after about a day of smooth operation, slave B produced the following error in its SQL_THREAD:



Error 'Table 'knet.course_location_tracks' doesn't exist' on query. Default database: 'knet'. Query: 'REPLACE INTO `course_location_tracks` (`userid`,`courseid`,`lesson_location`,`datestamp`) VALUES (val1,val2,val3,val4)'


(actual row values have been redacted out)


I can't make much sense of this error because I can confirm not only that the knet.course_location_tracks table exists on slave B, but also that its definition is identical to that of the corresponding table on slave A. Slave A, as I pointed out above, continues to replicate from master just fine to this day, without any problems of any kind.


If replication started fine on slave B, leading me to believe that initial replication coordinates were calculated correctly for it off of the state of slave A at the point of the dump, why am I then getting this error for a table that does exists on the host? Moreover, why am I getting the error on slave B while slave A is still replicating smoothly?


Other than having mismatching MySQL versions across my three hosts (something I only noticed recently in trying to debug the problem, and that I will see to correct ASAP), what could I be doing wrong?


And, finally, once the problem is determined and corrected, how could I get slave B to resume replication at the correct point so that it is again fully in sync with master?


Thanks in advance for any help!


PS: If it matters, replication on slave A was initially set up by transferring all db's & tables in master to it in a similar fashion, i.e. by first flushing & blocking all tables on master with the read lock, dumping the data with mysqldump (same flags), and finally loading them into the slave with the same mysql command-line client call.





Aucun commentaire:

Enregistrer un commentaire