
Tag Archives: ntdebug

Cannot launch more than 50 java.exe processes due to desktop heap

The other day, the application team came to me to diagnose an old issue with their application. It is an application that launches Java programs, each spawning a java.exe process. They found that after spawning 50+ processes, the next attempt would fail with “Not enough storage is available to complete this operation.”

If you search on Google, the first thing you will find is a description of the IRPStackSize registry key. What is odd about this problem, however, is that the event log is clean: there are no events related to IRPStackSize. Since most articles about “not enough storage” point to IRPStackSize, I decided to change the value on the server and see if it helped. Even after raising the value to 30, there was hardly any improvement in the number of additional java.exe processes that could be spawned, so this eliminates IRPStackSize as the cause.

 
Posted on November 27, 2010 in Windows

Windows 2003 servers with STOP 0xA (x,D2,x,x) after applying MS09-059

After our recent round of Microsoft patches, we are seeing an unusually high number of servers BSODing with STOP 0A errors, with no conclusive resolution.

At the same time, we are seeing servers crashing with 0A or C5 errors when Symantec AntiVirus is running its scheduled scans. I ran a SAV debug on those servers and did not identify any particular reason for the crashes. At first I thought the crashes were caused by SAV while scanning application files, but they weren’t. [MS confirmed this is a different issue]

MS confirmed that servers running Terminal Services (like every server) with MS09-059 can potentially hit these stop errors, and the only solution is to apply the hotfix below:

http://support.microsoft.com/default.aspx/kb/978243

 
Posted on January 5, 2010 in Windows

CPU issue causing BSOD 7F (0x0D, 0x00, 0x00) & 9C (0x00,xxx,xx,xx)

We had a strange server that would BSOD regularly with stop codes 7F or 9C; in its later days it was BSODing every 10 minutes or so.

So we had the motherboard changed, and it stopped BSODing for about an hour and then started again!

Diagnostics were run but all passed (how typical!), and HP was really reluctant to bring down other parts as the diagnostics showed okay. In fact, this is not the first time I have seen HP servers pass diagnostics and yet, after changing one of the parts, everything is resolved.

 
Posted on February 21, 2009 in Windows

Windows 2000 cannot start up with message “Loader Error 3”

Recently we were going through an upgrade exercise across our entire Windows estate to patch the servers to our latest security and company-specific components. So far we have encountered two servers where, after the reboot, Windows 2000 would not start up and showed this error:

Windows 2000 could not start because of an error in the software. Please report this problem as Loader Error 3.

Starting up in Recovery Console or safe mode was no help, as we encountered the same error. This points to a fundamental issue with the operating system.

 
Posted on February 21, 2009 in Windows

MSCluster: Low free PTEs caused cluster service to disconnect

Back in the days when a lot of us were not sure what the 3GB switch really does, and thought it must be set so that Windows can recognise 4GB of RAM and above, a number of our application servers were set up with the 3GB switch. This is one of those servers.

The other day one of the application clusters suddenly failed over. A quick check of the servers’ event logs showed no issues with low non-paged pool memory, no other memory issues, and no network issues. The application log was rather clean, other than a strange repetitive event from the MOM agent. We have a keep-alive event which the MOM agent runs once a day, but I was seeing the same event running twice every minute. On another server which had the same program, we could see that the idling node had its CPU running at 20% or more. Once we stopped the MOM agent, the CPU dropped to almost idle.

 
Posted on June 30, 2008 in Windows

DFSR: TSM excluding replicated folders in backup

This is an issue if you are running DFSR and using TSM as the backup client: TSM will not back up the replicated folders. Apparently DFSR sets up values under the key

HKLM\SYSTEM\CurrentControlSet\Control\BackupRestore\FilesNotToBackup

and the TSM client honours these values and excludes the replicated folders from its backup.
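To check which exclusions are registered on a given server, the key can be listed with Python's standard `winreg` module. This is only a sketch: the helper names `read_exclusions` and `values_covering` are made up for illustration, the prefix match is deliberately naive (it does not expand environment variables or wildcards), and the sample value name in the comment is hypothetical rather than what DFSR actually writes.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\BackupRestore\FilesNotToBackup"


def read_exclusions():
    """Return {value_name: [patterns]} from FilesNotToBackup (Windows only)."""
    import winreg  # standard library, but only present on Windows

    exclusions = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, mtime)
        for index in range(value_count):
            name, data, value_type = winreg.EnumValue(key, index)
            if value_type == winreg.REG_MULTI_SZ:
                exclusions[name] = list(data)
    return exclusions


def values_covering(exclusions, folder):
    """Names of values with at least one pattern that starts with `folder`."""
    wanted = folder.rstrip("\\").lower()
    return sorted(
        name
        for name, patterns in exclusions.items()
        if any(pattern.lower().startswith(wanted) for pattern in patterns)
    )


# Hypothetical data, shaped like what read_exclusions() returns:
# values_covering({"ExampleValue": ["D:\\Shares\\Projects\\* /s"]},
#                 "D:\\Shares\\Projects")
```

On the affected server, `values_covering(read_exclusions(), r"D:\Shares\Projects")` (with your own replicated folder path) would show whether any value under the key covers that folder.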

There is an interim fix from TSM that should fix this issue.

The manual workaround is to stop DFSR before the backup runs and start it again after the backup is complete.
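The manual workaround could be scripted roughly as below. This is a sketch under assumptions: it relies on `net stop` / `net start` with the service name `DFSR`, the helper `service_stopped` is a name invented here, and whatever backup command you wrap is your own choice. The `finally` clause is the important part: the service is started again even if the backup fails.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def service_stopped(name, run=subprocess.run):
    """Stop a Windows service for the duration of the `with` block.

    `run` is injectable so the logic can be exercised without touching
    a real service; by default it is subprocess.run.
    """
    run(["net", "stop", name], check=True)
    try:
        yield
    finally:
        # Always restart the service, even if the wrapped step raised.
        run(["net", "start", name], check=True)
```

Typical use would be `with service_stopped("DFSR"):` followed by the call that launches the backup, for example a `subprocess.run([...], check=True)` invoking the TSM command-line client (the exact command line is left to you and is not taken from the original post).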

 
Posted on May 28, 2008 in Windows

Windebug: Server out of domain due to TCP/UDP port maxed

Over the last 2 weeks we had 2 different business services that faced similar issues. Essentially, their servers would drop off the domain, so when you tried to log on with your domain account, it would complain that the server was not in the domain.

I came to work on this when a number of servers from the same service team had SCOM heartbeat issues. When we tried to log on to have a look, we found that those servers had dropped out of the domain.

 
Posted on May 13, 2008 in Windows

Win: Unable to access server locally via alias (cname)

2 weeks ago, we had an issue raised by the apps team, who were trying to access their server’s share via an alias (or CNAME). The server, e.g. SERVER1, had a DNS alias, e.g. PROD-SVR1. The apps guy was able to access the server’s share via the alias remotely, i.e. he could access \\PROD-SVR1\share1 from another server successfully. However, when he did the same thing on the server itself, it just kept giving the error “network name not found”.

This was puzzling to me, partly because I personally never access a server locally via an alias. I did a search on the internet and kept getting MSKB 281308. The issue looked similar, but I was not convinced that this was it; furthermore, the server is a W2K3 server and the registry value DisableStrictNameChecking was already set to 1. More importantly, I had no problem accessing the server via the alias remotely; it was only locally that there was a problem.

We spent a few hours hunting down possibilities and articles on the internet, but I kept getting pointed back to the article above. We tested lmhosts files and checked netmon, filemon and regmon, but nothing seemed to tell me what was wrong. After a few hours, we had to give up and told the apps guys that it was just impossible to do.

Some 2 weeks later, someone posted a similar query to our global distribution list and one of the London guys replied with a regkey change. I looked at it, did a quick search on the key and realized how dumb I had been 2 weeks earlier! MSKB 926642 pointed to the solution I wanted!

Sigh… I felt so disappointed in myself for not searching properly for it, but then, I learn something new every day.

 
Posted on April 18, 2008 in Windows

Windebug: Windows 2003 BSODing frequently with different errors

Recently we found a server (HP ProLiant DL580 G2) rebooting itself almost every day because of BSODs. My first instinct was to get the server patched with all the latest patches. I especially suspected the SATA or SCSI drivers, as we have seen other Windows 2003 servers with similar issues: HP ProLiant servers running storport were crashing, and that was resolved by patching the HP drivers and updating W2K3 with the hotfixes.

This was also the original diagnosis and recommendation from Microsoft support:

1. The stop code is 0x000000C2, which indicates that the current thread is making a bad pool request:

1: kd> .bugcheck
Bugcheck code 000000C2
Arguments 00000007 0000121a 00000800 e9a15d90

2. The 4th parameter is the address of the pool block that got corrupted:
*e9a15d78 size: 288 previous size: 10 (Allocated) *Toke (Protected)

1: kd> dc e9a15d90
e9a15d90 00000000 00000000 00000000 00000001 …………….
e9a15da0 00000000 00000000 bad0b0b0 82100000 …………….
e9a15db0 00000000 00000000 61766441 20206970 ……..Advapi

3. The previous pool block should reach e9a15d78+288=e9a16000:

1: kd> dc e9a16000
e9a16000 e8000000 fffffc84 c01bd8f7 00074992 ………….I..
e9a16010 0054004e 0073005c 00730079 00650074 N.T.\.s.y.s.t.e.
e9a16020 0033006d 005c0032 006c0064 0063006c m.3.2.\.d.l.l.c.

4. Searching memory shows that the address could be referenced by pool tagged with $CPH, which should be owned by CPQPHP:

1: kd> !for_each_module s-a @#Base @#End "$CPH"
f34d8869 24 43 50 48 ff 74 24 08-6a 01 ff 15 48 01 4d f3 $CPH.t$.j…H.M.

1: kd> u f34d8869
*** ERROR: Module load completed but symbols could not be loaded for CPQPHP.SYS
CPQPHP+0xc869:
f34d8869 2443 and al,43h
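The address arithmetic in step 3 and the text hidden in the `dc` dumps can be double-checked outside the debugger. The snippet below is only a reading aid: every number is copied from the output above, the helper name `dwords_to_bytes` is invented here, and it assumes the usual x86 little-endian layout of the 32-bit words that `dc` prints.

```python
# Values copied from the debugger output above (all hexadecimal).
POOL_BLOCK_START = 0xE9A15D78  # address of the block marked with '*'
POOL_BLOCK_SIZE = 0x288        # "size: 288"
BUGCHECK_PARAM_4 = 0xE9A15D90  # 4th bugcheck argument


def dwords_to_bytes(words):
    """Turn 32-bit little-endian words, as printed by `dc`, back into bytes."""
    return b"".join(int(word, 16).to_bytes(4, "little") for word in words)


# Step 3: where the block should end, i.e. where the next one begins.
next_block = POOL_BLOCK_START + POOL_BLOCK_SIZE
print(hex(next_block))

# How far into the block the reported address lies.
offset_in_block = BUGCHECK_PARAM_4 - POOL_BLOCK_START
print(hex(offset_in_block))

# The two rows at e9a16010 and e9a16020 hold UTF-16 text.
rows = ("0054004e 0073005c 00730079 00650074 "
        "0033006d 005c0032 006c0064 0063006c").split()
path_fragment = dwords_to_bytes(rows).decode("utf-16-le")
print(path_fragment)

# The last two words at e9a15db0 hold plain ASCII.
ascii_fragment = dwords_to_bytes(["61766441", "20206970"]).decode("ascii")
print(repr(ascii_fragment))
```

The sum confirms the `e9a16000` boundary used in step 3, and decoding the words makes the string fragments in the right-hand column of the dump easier to read than the dotted rendering.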

I got in contact with the server owner and scheduled an upgrade, but as I was doing the upgrade, the server just kept crashing. This was a bit strange, as it was not running much at that point in time.

 
Posted on January 18, 2008 in Windows

Windebug: BSOD 0x0000001e (Non-paged pool empty)

One of our application servers had been BSODing for the past 2 weeks with a similar error. There were a lot of events complaining “The server was unable to allocate from the system nonpaged pool because the pool was empty” before the server crashed.

The BSOD error was 0x0000001e (0xc0000006, 0xa0001d52, 0x00000000, 0x00370cb4).

The application team was quite insistent that their application couldn’t be the cause of the error, so I raised a case with Microsoft to have a look, and here was the result:

> BSOD code was as follows

ExceptionAddress: a0001d52 (win32k!XDCOBJ::bCleanDC+0x00000010)
ExceptionCode: c0000006 (In-page I/O error)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 00370cb4
Parameter[2]: c000009a
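The exception code and Parameter[2] above both look like NTSTATUS values, and splitting them into their bit fields makes them easier to read. The sketch below assumes the standard NTSTATUS layout (severity in the top two bits, a 12-bit facility, a 16-bit code); the lookup table is a tiny hand-written one covering only the two values in this dump, and the function name `decode_ntstatus` is invented here.

```python
# Only the two values seen in this crash; not a complete table.
KNOWN_STATUS = {
    0xC0000006: "STATUS_IN_PAGE_ERROR",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}

SEVERITY = ["success", "informational", "warning", "error"]


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 31-30
        "facility": (value >> 16) & 0xFFF,          # bits 27-16
        "code": value & 0xFFFF,                     # bits 15-0
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


for raw in (0xC0000006, 0xC000009A):
    print(hex(raw), decode_ntstatus(raw))
```

Read this way, the in-page error carries an underlying status that points at insufficient resources, which fits the nonpaged pool events logged before the crash.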

> A check with virtual memory shows that nonpaged pool was almost used up

0: kd> !vm

*** Virtual Memory Usage ***
Physical Memory: 524165 ( 2096660 Kb)
NonPagedPool Usage: 64660 ( 258640 Kb)
NonPagedPool Max: 68609 ( 274436 Kb)
********** Excessive NonPaged Pool Usage *****
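To put the `!vm` figures in perspective, the page counts can be converted and compared directly. This is a small sketch using only the numbers shown above; the 4 KB page size is an assumption that matches the Kb values printed next to each page count, and the helper names are invented here.

```python
PAGE_SIZE_KB = 4  # assumed x86 page size; consistent with the Kb figures above

# Page counts copied from the !vm output.
PHYSICAL_PAGES = 524165
NONPAGED_USAGE_PAGES = 64660
NONPAGED_MAX_PAGES = 68609


def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


def percent_used(used, limit):
    """Share of the limit already consumed, as a percentage."""
    return 100.0 * used / limit


used_pct = percent_used(NONPAGED_USAGE_PAGES, NONPAGED_MAX_PAGES)
free_kb = pages_to_kb(NONPAGED_MAX_PAGES - NONPAGED_USAGE_PAGES)

print(f"Nonpaged pool used: {used_pct:.1f}% of maximum")
print(f"Nonpaged pool left: {free_kb} Kb")
```

With these inputs, roughly 94% of the nonpaged pool maximum is in use and under 16 MB remains, which is why the debugger flags excessive usage.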

 
Posted on January 7, 2008 in Windows