This morning our HP SIM server decided to stop working. No servers appeared in the system list, although we had at least 1,000 servers listed last night!
So my first thought was that someone had accidentally deleted all my servers last night. No problem: the host file is still intact, and running “Add Systems” should fix this... or so I thought. No such luck. After 30 minutes, no servers had appeared.
So next my attention turned to security: could there be a permissions error? We are using domain groups for permissions (something new in this version) instead of individual users, so I thought maybe the system account I am using, or the HP SIM service, could not resolve the group properly. I added my own account, gave it full permissions, logged out and logged back in. No such luck, no servers in sight. I even tried logging on with the default system account, without any success.
Of course, one of the first things you would check is the event logs. But they were very quiet, not a single complaint about anything!
Next my attention moved to the database. As the database is managed by a DBA team, I suspected something had changed that stopped the service from querying the database properly. I would have expected such errors to turn up in HP SIM, but there weren’t any. So I ran ODBC from my HP SIM server, with the db account I have, to connect to the database, and it returned an error. Finally, some complaints!
I logged on to the db server and checked its event logs. Aha! They were full of errors generated by the HP SIM account:
Error: 9002, Severity: 17, State: 6
The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space.
An email to the DBA team and they fixed it. Now my servers are back in HP SIM.
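If you end up trawling through a database server's logs like this, a small script can pick out the error headers for you. Below is a minimal Python sketch, assuming the log has been exported as plain text and that the header lines look like the one quoted above. The names `find_errors` and `has_log_full_error` are made up for illustration and are not part of HP SIM or SQL Server.

```python
import re

# Matches an error header in the format quoted above, e.g.
# "Error: 9002, Severity: 17, State: 6"
ERROR_HEADER = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")

# 9002 is the error number from the log excerpt in this post (log file full).
LOG_FULL_ERROR = 9002


def find_errors(log_text):
    """Return a list of (error, severity, state) tuples found in log_text."""
    return [tuple(int(g) for g in m.groups()) for m in ERROR_HEADER.finditer(log_text)]


def has_log_full_error(log_text):
    """True if any error header in log_text carries error number 9002."""
    return any(error == LOG_FULL_ERROR for error, _, _ in find_errors(log_text))


# Sample text copied from the errors shown in this post.
sample = (
    "Error: 9002, Severity: 17, State: 6\n"
    "The log file for database 'tempdb' is full. "
    "Back up the transaction log for the database to free up some log space.\n"
)
print(find_errors(sample))
print(has_log_full_error(sample))
```

How you get the log text out of the db server is up to you. The sketch only covers the matching part.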
It’s strange that HP SIM never complained about this in the event logs, and I had to spend about an hour troubleshooting this issue. Anyway, lesson learned: no events doesn’t mean everything is fine and good.
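One way to act on that lesson is to test each dependency actively instead of waiting for something to show up in a log. The following is only a rough Python sketch of the idea. `run_probes`, `database_probe` and `event_log_probe` are hypothetical names, and the two probes are stand-ins: a real database probe would do something like open an ODBC connection with the service account.

```python
def run_probes(probes):
    """Run each named probe and return {name: error message} for the failures.

    probes maps a name to a zero-argument callable that raises on failure.
    """
    failures = {}
    for name, probe in probes.items():
        try:
            probe()
        except Exception as exc:  # any exception counts as a failed dependency
            failures[name] = str(exc)
    return failures


# Hypothetical stand-ins for real checks.
def database_probe():
    # Simulates the failure from this post: the connection attempt errors out.
    raise ConnectionError("log file for database 'tempdb' is full")


def event_log_probe():
    # Simulates a quiet event log: nothing raised, nothing reported.
    return None


failures = run_probes({"database": database_probe, "event log": event_log_probe})
print(failures)
```

With these stand-ins, the quiet event log passes while the database probe is reported as failed, which is exactly the situation I was in this morning.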