Recently there have been reports of problems getting newly registered users logged in over SSH. The SysAdmin registers Alice and then has Alice try to log in via SSH. Alice reports an authentication error.

The syserr_log messages seem to indicate that Alice is not registered.

05:04:02 Process 1111812D, root.root (sshd), created. 
05:04:02 root (sshd) : [AUTH] [INFO] sshd [PID=0x1111812D]: Invalid user alice 
+ from 127.0.0.1 
05:04:02 root (sshd) : [AUTH] [INFO] sshd [PID=0x1111812D]: input_userauth_req 
+uest: invalid user alice 
05:04:02 root (sshd) : [AUTH] [INFO] sshd [PID=0x1111812D]: Failed none for in 
+valid user alice from 127.0.0.1 port 54869 ssh2 
05:04:06 root (sshd) : [AUTH] [ERR] sshd [PID=0x1111812D]: error: Could not ge 
+t shadow information for NOUSER 
05:04:06 root (sshd) : [AUTH] [INFO] sshd [PID=0x1111812D]: Failed password fo 
+r invalid user alice from 127.0.0.1 port 54869 ssh2

 

But if Alice uses a Telnet client instead of an SSH client, she can log in without problems.

05:04:23 Process 1111812E, PreLogin.System (pre-login), created. 
05:04:38 Process 1111812E switched to alice.CAC (login).

 

In addition, if Alice waits a few minutes after being registered instead of trying to log in immediately, she can log in without problems.

05:12:51 Process 11118130, root.root (sshd), created. 
05:12:55 root (sshd) : [AUTH] [INFO] sshd [PID=0x11118130]: Accepted password 
+for alice from 127.0.0.1 port 54909 ssh2 
05:12:55 Process 11118131, root.root (sshd), created. 
05:12:55 Process 11118131 switched to alice.CAC (login).

 

The really frustrating thing is that sometimes the first newly registered user has no trouble authenticating with SSH, but a second one does.

So what is going on?

There are certain issues with having POSIX programs like SSH reference the registration database directly. Instead, their authentication calls are serviced by the posix_regdb_server process, which maintains its own copy of the registration database and consults that copy to determine whether a user is valid. Every time posix_regdb_server looks up a user ID, it compares the current time with the time it last read the registration database; if the last read was more than 5 minutes ago, it rereads the database. So a newly registered user may have to wait up to 5 minutes before being able to log in using SSH.

In the scenario where the first newly registered user has no problems but the second one does, the SysAdmin registers Bob and has Bob try to log in. Since it has been more than 5 minutes since posix_regdb_server last read the database, it refreshes its copy, sees Bob as a valid user, and Bob can log in. Now the SysAdmin registers Alice and has Alice try to log in. However, since it has only been a minute or two since Bob logged in, posix_regdb_server does not refresh its copy of the database, so Alice is not found and is not allowed to log in.
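The timing behavior described above can be modeled with a small time-based cache. This is only an illustrative sketch, not the actual posix_regdb_server implementation: the class name RegDbCache, the method is_valid_user, and the use of a Python set as the "registration database" are all hypothetical, and times are plain seconds passed in by hand so the example is deterministic.

```python
REFRESH_INTERVAL = 5 * 60  # seconds; the 5-minute window described above


class RegDbCache:
    """Toy model (hypothetical) of a lookup server that caches a user database."""

    def __init__(self, database, now):
        self.database = database       # stands in for the real registration database
        self.snapshot = set(database)  # the server's private copy
        self.last_read = now           # time the copy was last refreshed

    def is_valid_user(self, name, now):
        # Reread the database only if the copy is more than 5 minutes old.
        if now - self.last_read > REFRESH_INTERVAL:
            self.snapshot = set(self.database)
            self.last_read = now
        return name in self.snapshot


registered = {"root"}
server = RegDbCache(registered, now=0)

registered.add("bob")                                 # t = 400 s: Bob is registered
bob_ok = server.is_valid_user("bob", now=400)         # copy is stale, so it is reread

registered.add("alice")                               # t = 460 s: Alice is registered
alice_early = server.is_valid_user("alice", now=460)  # copy is only 60 s old
alice_later = server.is_valid_user("alice", now=761)  # copy is now 361 s old

print(bob_ok, alice_early, alice_later)  # True False True
```

In this model Bob's lookup triggers the refresh, which resets the 5-minute clock just before Alice is registered. Alice then stays invisible until the copy ages out again.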

There is no way to change the 5-minute window. If you stop the posix_regdb_server process, it will refresh its copy of the registration database when it is restarted. Typically it is restarted automatically within a minute, but this can take up to 2 minutes, so while you can speed things up, you cannot speed them up by much.

A better strategy is to register all new users at the same time, before any of them tries to log in, and to do so at a time when no one is trying to log in via SSH. That way, when the first new user tries to log in, posix_regdb_server will refresh its copy of the database and pick up all the new users at once.

Of course, finding a time when no one is trying to log in via SSH may be difficult, so the best strategy is to register all the new users at the same time and then go get a cup of coffee before telling them that they can log in.