Question
Recently, I wanted to enumerate the handles of every process and print the name of the object behind each handle. The plan was to retrieve the handle information with NtQueryInformationProcess::ProcessHandleInformation, then duplicate each handle and query the object name with NtQueryObject. Unexpectedly, the program could not print the names of all handles: it got stuck partway through.
The program queries a process for all of its handle information, duplicates each handle, and retrieves the name of the corresponding object with NtQueryObject. However, for some handles the NtQueryObject call caused the program to hang. Debugging the program showed that the hang was inside the syscall issued by the NtQueryObject API, indicating that a deadlock occurred in the kernel. To enumerate handles safely, we therefore need to determine why calling NtQueryObject on certain handles can lead to a deadlock.
I used LiveKd to find the specific location of the hang in the kernel. The thread's call stack shows why the program got stuck: it is waiting for the file object lock (IopWaitAndAcquireFileObjectLock).
0: kd> !process 0 0 test.exe
PROCESS ffffbf80573d60c0
SessionId: 1 Cid: a4b4 Peb: 763ae5d000 ParentCid: 3664
DirBase: 6fa3e6002 ObjectTable: ffff80084c676b40 HandleCount: 138.
Image: test.exe
0: kd> !process ffffbf80573d60c0
PROCESS ffffbf80573d60c0
SessionId: 1 Cid: a4b4 Peb: 763ae5d000 ParentCid: 3664
DirBase: 6fa3e6002 ObjectTable: ffff80084c676b40 HandleCount: 138.
Image: test.exe
VadRoot ffffbf806e7137e0 Vads 80 Clone 0 Private 474. Modified 0. Locked 0.
DeviceMap ffff80084892fa70
Token ffff8008e378f050
ElapsedTime 00:00:40.660
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 145368
QuotaPoolUsage[NonPagedPool] 11208
Working Set Sizes (now,min,max) (2364, 50, 345) (9456KB, 200KB, 1380KB)
PeakWorkingSetSize 2356
VirtualSize 4206 Mb
PeakVirtualSize 4210 Mb
PageFaultCount 2476
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 590
THREAD ffffbf807919f080 Cid a4b4.d254 Teb: 000000763ae5e000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
ffffbf8027abda20 SynchronizationEvent
Not impersonating
DeviceMap ffff80084892fa70
Owning Process ffffbf80573d60c0 Image: test.exe
Attached Process N/A Image: N/A
Wait Start TickCount 16653582 Ticks: 2521 (0:00:00:39.390)
Context Switch Count 54 IdealProcessor: 2
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x00007ff6e670c370
Stack Init ffffae06343e7b70 Current ffffae06343e6dc0
Base ffffae06343e8000 Limit ffffae06343e1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
ffffae06`343e6e00 fffff800`752201a5 nt!KiSwapContext+0x76
ffffae06`343e6f40 fffff800`752215c7 nt!KiSwapThread+0xaa5
ffffae06`343e7090 fffff800`752810c6 nt!KiCommitThreadWait+0x137
ffffae06`343e7140 fffff800`752bb2f5 nt!KeWaitForSingleObject+0x256
ffffae06`343e74e0 fffff800`756e90a0 nt!IopWaitForLockAlertable+0x49
ffffae06`343e7520 fffff800`758bbaaa nt!IopWaitAndAcquireFileObjectLock+0x50
ffffae06`343e7580 fffff800`756ef455 nt!IopQueryXxxInformation+0x1ce3e2
ffffae06`343e7620 fffff800`756eefb6 nt!IopQueryNameInternal+0x3e1
ffffae06`343e76d0 fffff800`756effeb nt!IopQueryName+0x26
ffffae06`343e7720 fffff800`756efc1f nt!ObQueryNameStringMode+0xe7
ffffae06`343e7830 fffff800`7542a405 nt!NtQueryObject+0x17f
ffffae06`343e7970 00007ffe`237702d4 nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffae06`343e79e0)
00000076`3b0ff4f8 00000000`00000000 0x00007ffe`237702d4
Now, the questions are:
- Why does calling the NtQueryObject API on certain handles lead to a deadlock?
- What types of handles can cause a deadlock?
- How can we query names without causing a freeze?
I’m very grateful to “Kanren”, who helped me figure this out. In ntoskrnl, when NtQueryObject retrieves the name of an object, it calls ObQueryNameStringMode with UserMode. For file objects, this function then reaches IopQueryNameInternal (through IopQueryName, as the call stack above shows), which has three parameters that determine whether the file object lock is acquired: FileObject, UseDosDeviceName, and Mode.
If the file object was opened for asynchronous I/O, the system does not acquire the lock during the query. However, if the file object was opened for synchronous I/O (SYNCHRONOUS_IO) and the mode is UserMode, the system acquires the file object lock using IopAcquireFileObjectLock. If another thread already holds that lock, typically because a synchronous I/O operation on the same file object is still pending, the query waits until the lock is released, which may never happen.
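The rule described above can be written down as a small model. This is only a sketch of the decision as this post describes it, not the actual ntoskrnl logic: the function name `name_query_takes_file_lock` is made up for illustration, the role of UseDosDeviceName is left out because it is not covered here, and the only values taken from the Windows headers are FO_SYNCHRONOUS_IO (0x2 in wdm.h) and the KernelMode/UserMode values (0 and 1).

```cpp
#include <cassert>
#include <cstdint>

// FO_SYNCHRONOUS_IO as defined in wdm.h: set in FILE_OBJECT.Flags when the
// file was opened for synchronous I/O.
constexpr std::uint32_t kFoSynchronousIo = 0x00000002;

// Mirrors the KPROCESSOR_MODE values (KernelMode = 0, UserMode = 1).
enum class ProcessorMode { KernelMode = 0, UserMode = 1 };

// Hypothetical helper: a simplified model of the rule described in the text.
// The lock is taken only for a synchronous file object queried with UserMode.
// The UseDosDeviceName parameter is deliberately ignored in this model.
constexpr bool name_query_takes_file_lock(std::uint32_t file_object_flags,
                                          ProcessorMode mode) {
    const bool synchronous = (file_object_flags & kFoSynchronousIo) != 0;
    return synchronous && mode == ProcessorMode::UserMode;
}

// Walks through the three cases discussed in this post.
inline void demo_lock_model() {
    // NtQueryObject path: UserMode + synchronous file object -> lock, may block.
    assert(name_query_takes_file_lock(kFoSynchronousIo, ProcessorMode::UserMode));
    // Asynchronous file object: no lock is taken.
    assert(!name_query_takes_file_lock(0, ProcessorMode::UserMode));
    // KernelMode caller: no lock is taken.
    assert(!name_query_takes_file_lock(kFoSynchronousIo, ProcessorMode::KernelMode));
}
```

Only the first case can block, which is why handles to synchronous file objects are the ones to worry about when querying names from user mode.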
Solution
- Filter SYNCHRONOUS_IO
- Set Timeout
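The idea behind the timeout is to run the name query on a worker thread and stop waiting if it does not finish in time. The following is a minimal, portable sketch of that pattern using the C++ standard library rather than the Win32 thread APIs. The helper name `query_with_timeout` and the `query` callable are assumptions for illustration; in a real tool the callable would wrap the NtQueryObject call for one duplicated handle.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: runs `query` on a worker thread and waits at most
// `timeout` for its result. Returns std::nullopt if the query did not finish
// in time, so the caller can skip that handle and continue enumerating.
std::optional<std::string> query_with_timeout(
    std::function<std::string()> query,
    std::chrono::milliseconds timeout) {
    assert(query && "query must be a valid callable");

    // The promise is shared with the worker so it stays alive even if the
    // caller gives up and returns before the worker finishes.
    auto promise = std::make_shared<std::promise<std::string>>();
    std::future<std::string> future = promise->get_future();

    std::thread([promise, query = std::move(query)]() {
        try {
            promise->set_value(query());
        } catch (...) {
            promise->set_exception(std::current_exception());
        }
    }).detach();  // Detached: a stuck worker must not block the caller.

    if (future.wait_for(timeout) == std::future_status::ready) {
        return future.get();
    }
    return std::nullopt;  // Timed out: treat the handle as not queryable.
}
```

Note that this sketch only stops waiting; a worker that is truly stuck keeps existing in the background. A Windows implementation would also have to decide what to do with such a thread, which is outside the scope of this example.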
- NtQuerySystemInformation SystemObjectInformation
We can use NtQuerySystemInformation::SystemObjectInformation instead of NtQueryObject. SystemObjectInformation uses ObQueryNameString to retrieve the name information, which in turn calls ObQueryNameStringMode with KernelMode. Because the file object lock is only acquired for UserMode callers, ObQueryNameString, which is normally used inside the kernel, does not run into this problem.