Dev Ups

Published 2022-08-07 in ci-cd

Migrating WordPress (Lost connection to MySQL server during query)

I was attempting to restore a WordPress site using a mysqldump from a MariaDB server. The new server was using MySQL and I spent several hours investigated the implications.

A more concrete bug was unearthed as I attempted to completely automate the restoration process. Administrators restoring manually would have no issues with visiting wp-admin and checking the permalinks.

The most insidious error was:

mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query

This can occur both deterministically for reasons I will document, and also, I suspect from outdated development tools and powerful IDEs. Its pervasiveness is what makes it so insidious.

Applying a MariaDB backup to MySQL Server

When I dug into the SQL, seeing "mariadb" on the first line gave me a sense that it had owned that backup.

Not so fast. MariaDB is open source, free, and claiming full compatibility with MySQL, so reviewers point out. CI/CD providers are so generous in their support of open source that there must be extensive tests against incompatibility.

But something wasn't working, so bugs there were.

Bug hunt

The restored site's permalinks were already pretty, not plain, but the REST API wouldn't work until precisely the time when I had viewed the page with its settings at https://localhost:8542/wp-admin/options-permalink.php.

The permalink settings could also already been seen to be non default, non plain, in the backup SQL, in the wp_options table:

(28,'permalink_structure','/%year%/%monthnum%/%postname%/','yes'),

Towards automation, needing to visit a web page in my browser is a fairly poor solution. I'd grown lazy by automating previous tests with WordPress's official docker image. When it came to testing a server side script, debugging it inside a container inside of a VM seemed like too many SSH pivots.

Diagnosis Procedure

A working REST API generates JSON in responce to:

curl -ki https://localhost/wp-json/wp/v2/users

If in doubt the fudged API will (should, at least) return the expected JSON:

curl -ki https://localhost/index.php?rest_route=/wp/v2/users

Recreating from scratch

I created a fresh install. After clearing the modal dialogs the procedure is:

  1. Add application password
  2. Set permalinks (they appeared as before, non-plain, by default (rem: check cookies))
  3. Backup with mysqldump
sudo mysqldump --single-transaction --flush-logs --delete-source-logs wordpress > full_db_$(date +%y%m%d_%H%M).sql

Lost connections

As directed above I had visited options-permalink.php and the sample post could then be found where expected, https://localhost:8542/2022/08/hello-world/. Posting an image via API and application password worked.

The problem then centred around my second request (specifially an UPDATE after an initial SELECT) via mysql in python returning:

mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query

One hacky solution I found quite effective was to close the connection and start a new one.

A common solution I see is increasing max_allowed_packet. I'm not sure why this limit needed increasing as I had only transmitted tens of KB at most. What didn't work was:

  1. Changing /etc/mysql/my.cnf would not allow mysql to restart. systemctl restart mysql would report:

Job for mysql.service failed because the control process exited with error code.

  1. set global max_allowed_packet=67108864; via python (and specifically the wordpress database) reported that I needed SUPER or SYSTEM_VARIABLES_ADMIN privilege.
  2. I dropped global and instead got:

DatabaseError: 1621 (HY000): SESSION variable 'max_allowed_packet' is read-only. Use SET GLOBAL to assign the value

The one way I found to work was sudo mysql wordpress set global max_allowed_packet=67108864;.

I can check how this persists with:

SHOW global VARIABLES like 'max_allowed_packet';

More "Lost connection to MySQL server during query"

I say "work", because this was the effect the first time. The first time however I did not check it before setting it. It could have been the time spent attempting to get mysql to acknowledge the set value, including reboots, that allowed the server to stabilise.

On subsequent reprovisionings it appeared to already be this value. https://bugs.mysql.com/bug.php?id=52802 shows it would need to be recorded in the config file (/etc/mysql/my.cnf or equivalent) to persist; the mysql shell commands would not be persisted.

When I tried setting it outside of the wordpress database I got the very error I was trying to fix:

mysql> set global max_allowed_packet=67108864;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    8
Current database: *** NONE ***

I tried again and it worked. But nothing changed, I still got the same error in my python-client use-case. I even did a systemctl restart mysql and nothing changed.

I rebooted.

Miraculously, it began working. Again and again and again.

I rebooted again, still working, immediately.

I re-provisioned a new VM using this new backup. Unfortunately it was back to square one with me needing to visit the permalinks settings again to even get the REST API working. As we'll see later, this is in fact a separate bug.

I followed up by seeing whether a several minute sleep would provide the automatable solution. It could not. I tried a reboot which still did not fix it.

Comparing the SQL dumps

The newly dumped SQL showed WordPress's content to be identical to the previous MariaDB one. MySQL had changed the character sets and collations across the board. This had nothing to do with me needing to visit the permalinks settings, but was interesting.

Taking one line as an example, first from MariaDB:

`user_login` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',

After the reboot and other prods from MySQL:

`user_login` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',

A completely fresh WordPress installation to MySQL shows:

`user_login` varchar(60) COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT '',

They all work equally well. My experience wasn't perfect but there were other factors at play.

utf8mb4_unicode_520_ci is newer than utf8mb4_unicode_ci and supports some Polish letters better, for example. Not a deal breaker or maker for me, I prefer to focus on memory and disk footprint.

A certain solution

Visiting the permalink settings creates the .htaccess file inside the DocumentRoot directory (as seen in /etc/sites-enabled/mysite.conf):

# BEGIN WordPress
# The directives (lines) between "BEGIN WordPress" and "END WordPress" are
# dynamically generated, and should only be modified via WordPress filters.
# Any changes to the directives between these markers will be overwritten.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Scripting the creation of .htaccess is how to automate WordPress deployments. This is exactly what the official WordPress docker image does.

Possible (likely) other causes

I was working on a stable code base, only changing unit tests and refactoring the code they tested, that didn't touch the database. Meanwhile, I was running all the tests, and these did use the database.

When I'm not messing with code in this way, success is consistent. When I'm using my IDE to interrupt, and especially when I'm reusing the CMySQLCursor instance to query my database, lost connections become increasingly prevalent. This holds even when I'm getting no errors from the IDE and would happily commit my ad-hoc commands to script verbatim should they prove useful. It's not like I was even doing updates, just querying for examples to paste in the unit tests. This might not be instant, I prefer to start simple and leave the IDE out for the first tests.

I don't know if its a sync issue or too many hops to get to the MySQL connection. Remote debugging is still awesome. Inconsistent errors are an error and seriously impact any kind of CI/CD pipeline one might enjoy.

I need to migrate from IDE stabilisers and, no I won't say embrace battle hardened tools like VIM, but more CI/CD pipelines could certainly substitute as stabilisers.

Restarting either the VM or PyCharm IDE got rid of this error, once all the other bugs identified had been addressed.