2011-10-13
Strange MySQL (5.0) issue with authentication
Dear Lazyweb….
I ran into a very strange issue with MySQL authentication. The initial situation is as follows:
Two MySQL servers, let’s name them a0.my.do.main and a1.my.do.main. Their my.cnf configuration is identical except for the server_id (equals 1 on a0 and 2 on a1).
Since I wanted to set up a master< ->master replication, I created a user “repl@%.my.do.main” on a0, then shut down the mysql server on both hosts, copied over the mysql database on the filesystem layer (i.e. rsync of the data directory), then started mysql on both hosts again. So far, so good, both servers came up nicely, and local (unix socket) login as a root user worked fine as expected. However:
Logging in via TCP socket only works from a1 (to both a0 and a1). Logging in via TCP socket from a0 doesn’t work to either host. I also created a test user “sven”@”%.my.do.main” (I also tried with @”%”) which has all privileges.
Here is what the commands output:
sven@a0 ~ > mysql -u sven -ptestpass -h a0.my.do.main
ERROR 1045 (28000): Access denied for user 'sven'@'a0.my.do.main' (using password: YES)
sven@a0 ~ > mysql -u sven -ptestpass -h a1.my.do.main
ERROR 1045 (28000): Access denied for user 'sven'@'a0.my.do.main' (using password: YES)
sven@a1 ~ > mysql -u sven -ptestpass -h a0.my.do.main
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 32
Server version: 5.0.77-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> Bye
sven@a1 ~ > mysql -u sven -ptestpass -h a1.my.do.main
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 24
Server version: 5.0.77-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>
All packages on both machines are up to date (CentOS5 with all updates available). The server variables shown with “show variables” also seem identical (except for the hostname and the server id). Besides, it looks more like a client problem to me than a server problem.
Last datapoint: If I define the mysql user “sven” with an explicit host (a0.my.do.main and a1.my.do.main) instead of a wildcard, everything works nicely, from and to both hosts.
So, dear lazyweb: Does anyone have an idea what might be causing this issue?
Update:
Due to some comments, let me add a few points of information:
- flush hosts doesn’t help. Host resolution looks OK, according to documentation, the hostname in the error message is what the server things is right, not what the client specified, and that documentation seems to be consistent with my experiences.
- The “db” and “user” tables look as expected. And I’m quite sure they are not the issue here, as I can connect from other hosts that match the host pattern.
- There is no ~/.my.cnf on either host
- Copying the database files on the filesystem layer is one of the documented ways of initializing the database on the slave for replication. Anyway, I had the same issue when trying to set up the database on the slave by loading a dump.
- DNS looks fine, both forward and reverse mappings, both servers use the same DNS server.
- IPv6 is not involved, it is disabled on these servers
Thanks for the hint at serverfault.com, I will try that if I don’t get sufficient info here.
Update 2
Another strange data point:
When I disable the (non-privileged) anonymous access user in the mysql.user table (i.e. drop the rows with user=”"), everything works as expected from both servers. However, why does it work from one host (client) to both DBs if the anonymous access is allowed (with no privileges except for being able to see that the test and mysql databases exist), but not from the other host (client)? It really looks like an issue on the client side, but I still don’t get what exactly goes wrong.
See also Strange iptables error with kernels >= 2.6.32 – solved
See also Dual boot and full encryption
See also hotplug events for media changes?
Permalink

Poisonbit said,
October 13, 2011 at 08:36 UTC (+0000)
Maybe about direct/reverse DNS entries?
It’s hard to guess it, but maybe it’s DNS, or IPv4 Vs IP6….
There is a sql statement “FLUSH HOSTS;” or “mysqladmin flush-hosts”.
Good luck.
Chris Hiestand said,
October 13, 2011 at 09:10 UTC (+0000)
I don’t know the answer but I’d do 3 things in order:
1. Try looking at the mysql (I mean the actual database named mysql) db and user tables and make sure the values are what you’d expect
2. Check your ~/.my.cnf to see if something might be wrong there.
3. Post this question up on serverfault.com
Also copying over the database files on the filesystem layer doesn’t really seem like a good idea, although I’m not expert enough to say that definitively.
Chris Hiestand said,
October 13, 2011 at 09:12 UTC (+0000)
Also check that the forward and reverse DNS for both hosts are resolving okay from eachother.
sven said,
October 13, 2011 at 09:26 UTC (+0000)
Hi everyone, thanks for the comments, I updated the post accordingly.
macno said,
October 13, 2011 at 10:02 UTC (+0000)
When you try to connect from a0 to a1.my.do.main , error output is “my-a1.my.do.main”. Why? is it just a typo?
sven said,
October 13, 2011 at 10:19 UTC (+0000)
Yes, that was just a typo while converting from my (confidential) internal names to something I could post. (Fixed)
Alex Hudson said,
October 13, 2011 at 11:02 UTC (+0000)
Taking a guess based on your second update, you’re being bitten by specificity.
If you have user table entries with hosts using wildcards (‘%’ etc), they will be treated less preferentially. So, if your connecting client potentially matches more than one of the entries in there for some reason, it will pick the one with the most specific host – even if the user is less specific.
So, user=” host=’localhost’ (for example) will win over user=’sven’ host=’%.wherever’, if you’re connecting from localhost.
Given you said removing the anon entries “made it work”, I think this is the reason. It probably doesn’t totally answer “why” – I’m assuming that your other host is matching a number of entries in there for reasons not totally clear; could be your /etc/hosts or something.
sven said,
October 13, 2011 at 12:05 UTC (+0000)
Problem is that the same server accepted a connection from the other host fine. And user ” host=’localhost’ doesn’t match ‘sven@a0.my.do.main’, and (sven@)’%.my.do.main’ is more specific then (”)@’%’.
As said, both servers are completely set up the same (including /etc/hosts, I just verified that) except for the differences needed (IP address .11 on a1, .10 on a0).
If specificity would hit here, it should either happen from both clients or from none of them. Also, connecting from a third host (which works nicely in all tested setups) should also hit that specificity issue.
Anyhow, I can’t reproduce the problem anymore. But I’m still open for ideas.