A GitHub repository named DoubleCallBack caught my attention. Many programs, especially game cheats, use this library to execute Ring3 functions from the kernel. I have studied the code and, frankly, it is so hard to follow that I'm surprised it works at all. In this blog post I will therefore walk through DoubleCallBack and how to execute Ring3 functions from the kernel gracefully.
Background
Generally, if we want to execute Ring3 functions from the Windows kernel without using KeUserModeCallback, we have to bypass various security mechanisms, including SMEP and SMAP. I have written a blog post on how to bypass these mechanisms and inject a dynamic library via PTE.USER.
Because that approach has many limitations, DoubleCallBack was released. It uses CmRegisterCallback to intercept the program flow as it passes from Ring3 into Ring0, and then executes Ring3 functions from the kernel safely. In fact, any other callback or exception that hijacks the program flow can be combined with DoubleCallBack, such as CreateProcessCallback or LoadImageCallback.
DoubleCallback
Double callback means that the technique registers a callback that hijacks the control flow of a Ring3 user thread on its way into Ring0. The kernel code can then call Ring3 functions safely on that thread, instead of executing Ring3 code directly in a kernel thread. The key benefit is that the hijacked thread is a user-mode thread, so it can avoid many security checks, much like APC injection.
Before explaining DoubleCallback, it is important to understand the general system call process. I introduced this process (Ring3 -> Ring0 -> Ring3) in another post: Windows syscall blog post (2): Ring3 to Ring0.
We also need to understand KeUserModeCallback, an API that lets Ring0 code call Ring3 system functions. The flow of KeUserModeCallback looks like this:
nt!KeUserModeCallback ->
nt!KiCallUserMode ->
nt!KiServiceExit ->
ntdll!KiUserCallbackDispatcher ->
ntdll!KiUserCallForwarder ->
Ring3 Callback Function ->
ntdll!ZwCallbackReturn ->
nt!KiCallbackReturn ->
nt!KeUserModeCallback
The KeUserModeCallback API arranges the stack for us and calls nt!KiServiceExit to return to Ring3, landing at ntdll!KiUserCallbackDispatcher. Let's look at ntdll!KiUserCallbackDispatcher.
ntdll!KiUserCallbackDispatcher obtains the system function pointer from PEB.KernelCallbackTable[ApiIndex] and calls it through ntdll!KiUserCallForwarder. It then calls ntdll!ZwCallbackReturn to return to Ring0 via syscall or int 2Eh. Because the target is always selected by an index into that table, KeUserModeCallback can call Ring3 system functions but not custom functions.
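The table-driven dispatch described above can be modelled in plain user-mode C++. This is only a sketch of the idea, not the real ntdll code: the type `CallbackFn`, the two sample callbacks, and `dispatch` are hypothetical names, and the real table entries have their own signatures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a callback-table entry.
using CallbackFn = std::uint32_t (*)(void* args, std::uint32_t length);

static std::uint32_t callback_a(void*, std::uint32_t length) { return length + 1; }
static std::uint32_t callback_b(void*, std::uint32_t length) { return length * 2; }

// Model of PEB.KernelCallbackTable: a fixed array of system-provided functions.
static const std::array<CallbackFn, 2> kernel_callback_table = {callback_a, callback_b};

// Model of the dispatcher: the caller supplies an index, never an address,
// so only functions already present in the table can be reached.
bool dispatch(std::size_t api_index, void* args, std::uint32_t length,
              std::uint32_t* result) {
    if (api_index >= kernel_callback_table.size())
        return false;  // no table slot, nothing to call
    *result = kernel_callback_table[api_index](args, length);
    return true;
}
```

Since the only input is an index, a function that is not in the table has no way to be selected, which is the limitation DoubleCallBack works around.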
Once the system call process and KeUserModeCallback are clear, we can look at what DoubleCallBack actually does. DoubleCallBack manually reimplements the functionality of KeUserModeCallback so that it can invoke arbitrary custom functions. In other words, it acquires a kernel stack, places the parameters and return address on the stack, and uses the KiUserCallForwarder gadget together with NtCallbackReturn to execute a custom function. The full sequence of DoubleCallBack looks like this:
DoubleCall::call ->
shellcode swapgs/sysretq ->
ntdll!KiUserCallForwarderEnd ->
Ring3 custom function ->
ntdll!NtCallbackReturn ->
DoubleCall::call
The following code shows the process of initializing DoubleCallBack.
auto [ntdll_base, ntdll_size] = Utils::find_process_module(dwm_pid, "ntdll.dll");
auto [user32_base, user32_size] = Utils::find_process_module(dwm_pid, "user32.dll");

// KiUserCallForwarder end
uintptr_t leaInstructAddress = NULL;
if (Utils::KMemorySearch::SearchPattern((PVOID)ntdll_base, ntdll_size, &leaInstructAddress, "48 8B 4C 24 ?? 48 8B 54 24 ?? 4C"))
    precall_offset = leaInstructAddress - ntdll_base;

// call NtCallbackReturn
leaInstructAddress = NULL;
if (Utils::KMemorySearch::SearchPattern((PVOID)user32_base, user32_size, &leaInstructAddress, "45 33 C0 48 89 44 24 20 48 8D 4C 24 20"))
    postcall_offset = leaInstructAddress - user32_base;

// Read IA32_LSTAR (syscall entry point) and decode the value stored 8 bytes into it
ULONG64 pSyscall = 0;
Utils::read_msr(0xC0000082, &pSyscall);
ULONG offset = *(ULONG*)(pSyscall + 8);
if ((offset & 0xFF) != 0x10)
    return false;
offset &= 0xFF00;
g_offset_stack = offset + 8;
if (offset != 0)
    g_kvashadow = 1;
else
    g_kvashadow = 0;

// Allocate Kernel Stack
for (int i = 0; i < 10; i++) {
    this->stack[i] = MmCreateKernelStack(0, 0, 0);
    this->stackUseStatus[i] = 0;
}
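The MSR check in the initialization code can be sketched in isolation. MSR 0xC0000082 is IA32_LSTAR, which holds the 64-bit syscall entry point; the code reads the 32-bit value 8 bytes into that entry and tests its low byte and high byte. The sketch below repeats that arithmetic on byte arrays. `StackOffsetInfo` and `decode_entry` are hypothetical names, and the two byte sequences are illustrative assumptions (a `swapgs` followed by `mov gs:[disp32], rsp`), not dumps from a real system.

```cpp
#include <cassert>
#include <cstdint>

struct StackOffsetInfo {
    bool valid;                  // false when the low byte is not 0x10
    bool kva_shadow;             // mirrors g_kvashadow
    std::uint32_t offset_stack;  // mirrors g_offset_stack
};

// Decodes the 32-bit little-endian value found 8 bytes into the entry stub,
// applying the same checks as the initialization code.
StackOffsetInfo decode_entry(const std::uint8_t* entry) {
    std::uint32_t disp = static_cast<std::uint32_t>(entry[8]) |
                         static_cast<std::uint32_t>(entry[9]) << 8 |
                         static_cast<std::uint32_t>(entry[10]) << 16 |
                         static_cast<std::uint32_t>(entry[11]) << 24;
    if ((disp & 0xFF) != 0x10)
        return {false, false, 0};
    std::uint32_t high = disp & 0xFF00;
    return {true, high != 0, high + 8};
}

// Assumed sample stubs: swapgs; mov gs:[10h], rsp / mov gs:[9010h], rsp
const std::uint8_t plain_entry[12]  = {0x0F, 0x01, 0xF8, 0x65, 0x48, 0x89,
                                       0x24, 0x25, 0x10, 0x00, 0x00, 0x00};
const std::uint8_t shadow_entry[12] = {0x0F, 0x01, 0xF8, 0x65, 0x48, 0x89,
                                       0x24, 0x25, 0x10, 0x90, 0x00, 0x00};
```

With these sample bytes, the first stub yields an offset of 8 with the shadow flag clear, and the second yields 0x9008 with the flag set.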
The crucial question is how the parameters should be arranged. Since the custom functions reside in Ring3 memory, we need to obtain the user stack and arrange the parameters there. Let's look at the following code:
// Read the user-mode rsp saved in the current thread's trap frame
ULONG64 GetUserStackPtr() {
    auto CurrentThread = (ULONG64)KeGetCurrentThread();
    auto trapFrame = *(KTRAP_FRAME**)(CurrentThread + KernelOffsets::TrapFrame);
    return *(ULONG64*)&trapFrame->Rsp;
}

ULONG64 Call(ULONG64 ptr, ULONG64 a1, ULONG64 a2, ULONG64 a3, ULONG64 a4, ULONG64 a5, ULONG64 a6) {
    // Reserve 0x98 bytes below the user rsp and align down to 16 bytes
    auto NewUserRsp = (char*)((GetUserStackPtr() - 0x98) & 0xFFFFFFFFFFFFFFF0);

    // Remember the interrupt flag and IRQL so they can be restored afterwards
    bool i_enable = __readeflags() & 0x200;
    ULONG64 OldIrql = __readcr8();
    if (OldIrql > 0) {
        __writecr8(0);
    }

    // Arguments 1-4 go to the slots at +0x20, arguments 5-6 to the slots at +0x70
    *(ULONG64*)(NewUserRsp + 0x20 + (0 * 8)) = a1;
    *(ULONG64*)(NewUserRsp + 0x20 + (1 * 8)) = a2;
    *(ULONG64*)(NewUserRsp + 0x20 + (2 * 8)) = a3;
    *(ULONG64*)(NewUserRsp + 0x20 + (3 * 8)) = a4;
    *(ULONG64*)(NewUserRsp + 0x70 + (0 * 8)) = a5;
    *(ULONG64*)(NewUserRsp + 0x70 + (1 * 8)) = a6;

    void* cData[9];
    memset(&cData, 0, sizeof(cData));
    ULONG64 ret = Call_impl(ptr, NewUserRsp, cData);

    if (OldIrql > 0) {
        __writecr8(OldIrql);
    }
    if (!i_enable)
        _disable(); // cli
    return ret;
}
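The stack arithmetic in Call can be tried out in user mode on an ordinary buffer. The sketch below only mirrors the offsets used above; `new_user_rsp`, `place_args`, and `read_slot` are hypothetical helper names, and the buffer merely stands in for user stack memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same computation as in Call(): reserve 0x98 bytes below the saved rsp,
// then align the result down to a 16-byte boundary.
std::uint64_t new_user_rsp(std::uint64_t saved_rsp) {
    return (saved_rsp - 0x98) & 0xFFFFFFFFFFFFFFF0ULL;
}

// Writes six arguments into `frame`, which stands in for the memory at
// NewUserRsp. Arguments 1-4 start at +0x20, arguments 5-6 at +0x70.
void place_args(std::uint8_t* frame, const std::uint64_t (&args)[6]) {
    for (int i = 0; i < 4; ++i)
        std::memcpy(frame + 0x20 + i * 8, &args[i], sizeof(std::uint64_t));
    for (int i = 0; i < 2; ++i)
        std::memcpy(frame + 0x70 + i * 8, &args[4 + i], sizeof(std::uint64_t));
}

// Reads one 8-byte slot back from the simulated frame.
std::uint64_t read_slot(const std::uint8_t* frame, std::size_t offset) {
    std::uint64_t value = 0;
    std::memcpy(&value, frame + offset, sizeof(value));
    return value;
}
```

The highest byte written is at offset 0x7F, so the 0x98-byte reservation covers every slot. The search pattern used during initialization begins with `48 8B 4C 24 ??`, which encodes `mov rcx, [rsp+disp8]`, so the gadget presumably reloads the register arguments from these stack slots.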
The code shows that DoubleCallBack retrieves the current thread's user rsp through the KTHREAD.TrapFrame field. When a callback or exception occurs, the context of the current thread is saved in the trap frame that KTHREAD.TrapFrame points to. This is why DoubleCallBack cannot be used in an APC.
After that, DoubleCallBack uses a piece of shellcode to return to Ring3:
; FnKiCallUserMode(ULONG64 OutVarPtr, ULONG64 CallCtx, ULONG64 KStackControl)
; alloc stack
sub rsp, 138h
; save non volatile float regs
movaps xmmword ptr [rsp + 70h], xmm10
movaps xmmword ptr [rsp + 40h], xmm7
lea rax, [rsp + 100h]
movaps xmmword ptr [rax - 80h], xmm11
movaps xmmword ptr [rax - 50h], xmm14
movaps xmmword ptr [rsp + 30h], xmm6
movaps xmmword ptr [rax - 60h], xmm13
movaps xmmword ptr [rsp + 50h], xmm8
movaps xmmword ptr [rax - 40h], xmm15
movaps xmmword ptr [rsp + 60h], xmm9
movaps xmmword ptr [rax - 70h], xmm12
; save non volatile int regs
mov [rax - 8], rbp
mov rbp, rsp
mov [rax], rbx
mov [rax + 8], rdi
mov [rax + 10h], rsi
mov [rax + 20h], r13
mov [rax + 30h], r15
mov [rax + 18h], r12
mov [rax + 28h], r14
; save ret val vars
mov [rbp + 0D8h], rcx
lea rax, [r8 - 30h]
mov [rbp + 0E0h], rax
mov rbx, gs:[188h]
mov [r8 + 20h], rsp
mov rsi, [rbx + 90h]
mov [rbp + 0D0h], rsi
; save new stack vars
cli
mov [rbx + 28h], r8
lea r9, [r8 + 50h] ; calc new stack ptr
mov [rbx + 38h], r9
mov rdi, gs:[8]
mov [rdi + 4], r8
; save cur trap frame
mov ecx, 6000h
sub r9, rcx
mov gs:[1A8h], r8
mov [rbx + 30h], r9
lea rsp, [r8 - 190h]
mov rdi, rsp
mov ecx, 32h ; 32h qwords = 190h bytes
rep movsq
; fix csr
lea rbp, [rsi - 110h]
ldmxcsr dword ptr [rbp - 54h]
; restore rbp, r11
mov r11, [rbp + 0F8h] ; restore eflags
mov rbp, [rbp + 0D8h]
; load rip, rsp, rax
mov rax, [rdx + 10h] ; Ring3 custom function ptr
mov rsp, [rdx + 8] ; userRsp
mov rcx, [rdx] ; PreCall : KiUserCallForwarder end
; load floats
movss xmm0, dword ptr [rdx + 18h]
movss xmm2, dword ptr [rdx + 28h]
movss xmm1, dword ptr [rdx + 20h]
movss xmm3, dword ptr [rdx + 30h]
; load userctx x32
cmp qword ptr [rdx + 38h], 0
jz Sw2UserMode
mov r13, [rdx + 38h]
; goto usermode
Sw2UserMode:
swapgs
sysretq
After the sysretq instruction executes, rip is loaded from rcx (and rflags from r11), and control returns to the ntdll!KiUserCallForwarder gadget in Ring3, which then invokes the custom function pointer in user mode.
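The loads from `[rdx + ...]` at the end of the shellcode imply a layout for the call context passed in rdx. It can be written down as a struct; the struct name and field names below are hypothetical, and only the offsets are taken from the assembly. Call allocates nine pointer-sized slots for this data, while the shellcode shown here reads the first eight.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of the context the shellcode reads through rdx.
struct CallCtx {
    std::uint64_t pre_call;   // [rdx + 00h] -> rcx, becomes rip after sysretq
    std::uint64_t user_rsp;   // [rdx + 08h] -> rsp
    std::uint64_t function;   // [rdx + 10h] -> rax, the Ring3 custom function
    float float0;             // [rdx + 18h] -> xmm0 (movss reads 4 bytes)
    std::uint32_t pad0;
    float float1;             // [rdx + 20h] -> xmm1
    std::uint32_t pad1;
    float float2;             // [rdx + 28h] -> xmm2
    std::uint32_t pad2;
    float float3;             // [rdx + 30h] -> xmm3
    std::uint32_t pad3;
    std::uint64_t user_ctx;   // [rdx + 38h] -> r13, only when non-zero
};

static_assert(offsetof(CallCtx, user_rsp) == 0x08, "userRsp offset");
static_assert(offsetof(CallCtx, function) == 0x10, "function offset");
static_assert(offsetof(CallCtx, float0) == 0x18, "xmm0 offset");
static_assert(offsetof(CallCtx, float1) == 0x20, "xmm1 offset");
static_assert(offsetof(CallCtx, float2) == 0x28, "xmm2 offset");
static_assert(offsetof(CallCtx, float3) == 0x30, "xmm3 offset");
static_assert(offsetof(CallCtx, user_ctx) == 0x38, "user context offset");

// Fills in the three fields that decide where execution continues.
CallCtx make_ctx(std::uint64_t pre_call, std::uint64_t user_rsp, std::uint64_t function) {
    CallCtx ctx{};
    ctx.pre_call = pre_call;
    ctx.user_rsp = user_rsp;
    ctx.function = function;
    return ctx;
}
```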
Qriver4.0 WriteUp
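With the theory and background covered, let's look at the NepCTF2024 challenge Qriver4.0. As many of you know, when I write challenges I don't like piling up complicated algorithms that turn solving into grunt work, and this one is no exception. I hope Qriver4.0 offers a small window into the Windows system call process and into how Ring0 code can execute Ring3 functions.
As usual, you first load the .sys file and then start the .exe. Opening the .exe in IDA shows only three lines of code: read input into a global variable, call RegOpenKeyEx, print output. If you attach a debugger, it stops responding once the input has been entered and the RegOpenKeyEx API is executed. Clearly the driver is up to something and has hijacked the execution of RegOpenKeyEx. Loading the .sys file into IDA reveals a lot of functions (because the WinKernel C++ framework is used) and no obvious strings to anchor on (the strings are processed separately); even the DriverMain function is protected with VMP. Locating the key function is the first test. There are two intended approaches:
- Hook ntoskrnl APIs: This idea is easy to understand. The driver obviously hijacks the program at some point, reads the Ring3 global variable to get the input, and validates the data. APIs such as PsLookupProcessByProcessId, PsGetProcessPeb (needed by the driver to enumerate process modules), and MmCopyVirtualMemory (reads memory, although in this challenge it does not go through the IAT) are therefore essentially unavoidable. Hooking the IAT entries of these functions, or inline hooking them and filtering by return address, leads to the key function. This approach presupposes having kernel hooking code at hand, which is difficult for CTF players.
- CmRegisterCallback: Between input and output the EXE only calls the RegOpenKeyEx API, and Windows has a registry callback mechanism that can hijack control flow. Once the driver registers a registry callback through CmRegisterCallback, registry operations issued by any program pass through the callback function. Any ARK tool can be used to verify this.
From this callback entry we can find the offset of the callback function and jump to it in IDA. The logic from there is simple: it checks whether the registry operation was issued by Qriver4.0.exe, then obtains the addresses of three functions, get_input, get_information, and enc, from the export table and calls them through DoubleCallBack to fetch the input. The algorithm is simple too: the input is first reversed in C++ and then put through a simple XOR. The XOR result is written back into the Ring3 process, enc is called through DoubleCallBack to apply XXTEA, and finally the result is compared; if it is correct, the driver modifies the Ring3 process memory, changing fail to congratulation. Debugging during reverse engineering is a basic skill, so I deliberately wrote the input reversal in C++ to keep experts from solving it purely statically. Since the challenge is mainly about going from Ring0 to Ring3, the program has no real obfuscation; the strings are merely XORed. The anti-debugging is simple as well: for Ring3, the driver clears the EPROCESS->DebugPort of the .exe, which is why the debugger stops responding after the RegOpenKeyEx call; for Ring0, it simply calls KdDisableDebugger to detach the kernel debugger. Patching these two places is enough to bypass them.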
// Registry callback function
NTSTATUS RegCallback(_In_ PVOID CallbackContext, _In_opt_ PVOID Argument1, _In_opt_ PVOID Argument2)
{
    NTSTATUS nStatus = STATUS_SUCCESS;
    // Argument1 carries the notify class as a pointer-sized integer
    auto reg_notify_class = (REG_NOTIFY_CLASS)(ULONG_PTR)Argument1;
    auto reg_key_information = (PREG_CREATE_KEY_INFORMATION)Argument2;
    if (reg_notify_class == RegNtPreOpenKeyEx)
    {
        if (reg_key_information && reg_key_information->CompleteName->Length > 0 && !wcscmp(reg_key_information->CompleteName->Buffer, L"Software\\Qriver4.0\\"))
        {
            auto pid = PsGetCurrentProcessId();
            std::string process_name = Utils::GetProcessImageNameByProcessID(pid);
            // Anti-debugging: clear DebugPort and detach the kernel debugger
            AntiProcessDebug(pid);
            Utils::call<void>(Utils::GetProcAddress("KdDisableDebugger"));
            auto usermode_call = Utils::UsermodeCall(PsGetCurrentProcessId());
            if (usermode_call.is_inited() == false)
                return nStatus;
            // Resolve the three exported Ring3 functions
            auto [base, size] = Utils::find_process_module(pid, process_name.c_str());
            auto get_input = base + Utils::FindExportOffsetByName(base, "get_input");
            auto get_information = base + Utils::FindExportOffsetByName(base, "get_information");
            auto enc = base + Utils::FindExportOffsetByName(base, "enc");
            ULONG64 input_address = usermode_call.CallUserMode(get_input);
            ULONG64 information_address = usermode_call.CallUserMode(get_information);
            auto length = my_strlen((char*)input_address);
            if (length < 16 || length % 4 != 0)
                return nStatus;
            // Part1 : reverse input
            char buff[256] = { 0 };
            Utils::safe_copy(buff, (void*)input_address, length);
            std::string input = buff;
            std::reverse(input.begin(), input.end());
            // Part2 : xor
            part2(buff, input.data(), input.size());
            Utils::safe_copy((char*)input_address, buff, length);
            // Part3 : XXTEA
            uint32_t delta = *(uint32_t*)RegCallback;
            usermode_call.CallUserMode(enc, input_address, length, delta, 0, 0, 0);
            char cipher[] = {0x10, 0x43, 0xA9, 0xDA, 0x12, 0xB1, 0xDD, 0x5B, 0x93, 0xA4, 0xFF, 0xCC, 0x23, 0x95, 0x0F, 0x81, 0x29, 0xDA, 0x29, 0x57, 0x67, 0x26, 0x23, 0x76, 0xE1, 0x79, 0x45, 0xF5, 0xB0, 0xA3, 0xC7, 0xEE};
            Utils::safe_copy(buff, (char*)input_address, length);
            if (memcmp(cipher, buff, 32) == 0) {
                std::string info = "congratulation";
                Utils::safe_copy((void*)information_address, (void*)info.c_str(), info.size());
            }
        }
    }
    return nStatus;
}

// XOR chain: the first byte is XORed with 0x51, each later byte with the previous output byte
__forceinline void part2(char* dst, const char* src, uint32_t length) {
    *dst = *src ^ 0x51;
    for (uint32_t i = 1; i < length; ++i)
        *(dst + i) = *(src + i) ^ *(dst + i - 1);
}

// Zero out DebugPort
BOOL AntiProcessDebug(HANDLE pid) {
    BOOL bRet = FALSE;
    PEPROCESS Process = NULL;
    if (pid <= (HANDLE)4)
        return bRet;
    PsLookupProcessByProcessId(pid, &Process);
    if (Process == NULL)
        return bRet;
    PVOID* pDebugPort = reinterpret_cast<PVOID*>(reinterpret_cast<uintptr_t>(Process) + KernelOffsets::DebugPort);
    if (MmIsAddressValid(pDebugPort))
    {
        *pDebugPort = NULL;
        bRet = TRUE;
    }
    ObDereferenceObject(Process);
    return bRet;
}
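Part1 and Part2 of the callback, and their inverse, can be checked in user mode. The sketch below is an illustration rather than the driver's code: `forward_transform` and `inverse_transform` are hypothetical names. It applies the reversal and the XOR chain with the initial value 0x51, then undoes both in the opposite order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Part1 + Part2: reverse the input, then XOR each byte with the previous
// output byte (the first byte is XORed with 0x51).
std::string forward_transform(std::string input) {
    std::reverse(input.begin(), input.end());
    std::string out(input.size(), '\0');
    if (input.empty())
        return out;
    out[0] = static_cast<char>(input[0] ^ 0x51);
    for (std::size_t i = 1; i < input.size(); ++i)
        out[i] = static_cast<char>(input[i] ^ out[i - 1]);
    return out;
}

// Inverse: walk from the last byte down so that data[i - 1] still holds the
// chained value when data[i] is recovered, then undo the reversal.
std::string inverse_transform(std::string data) {
    for (std::size_t i = data.size(); i-- > 1;)
        data[i] = static_cast<char>(data[i] ^ data[i - 1]);
    if (!data.empty())
        data[0] = static_cast<char>(data[0] ^ 0x51);
    std::reverse(data.begin(), data.end());
    return data;
}
```

Walking downward in the inverse matters: going upward would overwrite data[i - 1] with plaintext before it is needed to recover data[i].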
Here is the EXP:
#include <windows.h>
#include <stdint.h>
#include <stdio.h>

BYTE cipher[] = {0x10, 0x43, 0xA9, 0xDA, 0x12, 0xB1, 0xDD, 0x5B, 0x93, 0xA4, 0xFF, 0xCC, 0x23, 0x95, 0x0F, 0x81,
                 0x29, 0xDA, 0x29, 0x57, 0x67, 0x26, 0x23, 0x76, 0xE1, 0x79, 0x45, 0xF5, 0xB0, 0xA3, 0xC7, 0xEE};
BYTE xor_key = 0x51;
uint32_t TEA_DELTA = 0x2444894c;

// XXTEA decryption over n 32-bit words
void tea_dec(uint32_t *v, int n)
{
    constexpr uint32_t keys[] = {0x12345678, 0x87654321, 0x9ABCDEF0, 0x0FEDCBA9};
    int p = 0;
    uint32_t y, z, sum;
    unsigned rounds, e;
    if (n < 2)
    {
        return;
    }
    rounds = 6 + 52 / n;
    sum = rounds * TEA_DELTA;
    y = v[0];
    do
    {
        e = (sum >> 2) & 3;
        for (p = n - 1; p > 0; p--)
        {
            z = v[p - 1];
            y = v[p] -= (((z >> 5 ^ y << 2) + (y >> 3 ^ z << 4)) ^ ((sum ^ y) + (keys[(p & 3) ^ e] ^ z)));
        }
        z = v[n - 1];
        y = v[0] -= (((z >> 5 ^ y << 2) + (y >> 3 ^ z << 4)) ^ ((sum ^ y) + (keys[(p & 3) ^ e] ^ z)));
        sum -= TEA_DELTA;
    } while (--rounds);
}

FORCEINLINE void dec(char *v, int length)
{
    tea_dec((uint32_t *)v, length / 4);
}

int main() {
    // Undo XXTEA
    dec((char*)cipher, 32);
    // Undo the XOR chain, from the last byte down to the first
    for (int i=31; i>0; --i)
        cipher[i] ^= cipher[i - 1];
    cipher[0] ^= xor_key;
    // print flag (printing backwards undoes the reversal)
    for (int i=31; i>=0 ; --i)
    {
        printf("%c", cipher[i]);
    }
}
// F62FC02FD47486510568474DF20CF26F
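Two details of the EXP can be cross-checked with a small sketch. First, the driver takes delta from the first four bytes of RegCallback, and the EXP's value 0x2444894c corresponds to the little-endian bytes 4C 89 44 24, which look like the start of a `mov [rsp+...], r8` prologue instruction. Second, the enc function itself is not shown in this post; assuming it is the standard XXTEA encryption that mirrors tea_dec, a round trip looks like the following. The helper names are hypothetical, and the key and delta values are copied from the EXP.

```cpp
#include <cassert>
#include <cstdint>

static const std::uint32_t kKeys[4] = {0x12345678, 0x87654321, 0x9ABCDEF0, 0x0FEDCBA9};

// Assembles a little-endian 32-bit value, like *(uint32_t*)RegCallback on x64.
std::uint32_t delta_from_bytes(const std::uint8_t b[4]) {
    return static_cast<std::uint32_t>(b[0]) | static_cast<std::uint32_t>(b[1]) << 8 |
           static_cast<std::uint32_t>(b[2]) << 16 | static_cast<std::uint32_t>(b[3]) << 24;
}

// The XXTEA mixing function, written as in the EXP but fully parenthesized.
static std::uint32_t mx(std::uint32_t sum, std::uint32_t y, std::uint32_t z,
                        unsigned p, unsigned e) {
    return (((z >> 5) ^ (y << 2)) + ((y >> 3) ^ (z << 4))) ^
           ((sum ^ y) + (kKeys[(p & 3) ^ e] ^ z));
}

// Assumed counterpart of tea_dec: standard XXTEA encryption over n words.
void xxtea_enc(std::uint32_t* v, int n, std::uint32_t delta) {
    if (n < 2) return;
    unsigned rounds = 6 + 52 / n;
    std::uint32_t sum = 0, y, z = v[n - 1];
    do {
        sum += delta;
        unsigned e = (sum >> 2) & 3;
        int p;
        for (p = 0; p < n - 1; p++) {
            y = v[p + 1];
            z = v[p] += mx(sum, y, z, p, e);
        }
        y = v[0];
        z = v[n - 1] += mx(sum, y, z, p, e);
    } while (--rounds);
}

// Same structure as tea_dec in the EXP, with delta passed as a parameter.
void xxtea_dec(std::uint32_t* v, int n, std::uint32_t delta) {
    if (n < 2) return;
    unsigned rounds = 6 + 52 / n;
    std::uint32_t sum = rounds * delta, y = v[0], z;
    do {
        unsigned e = (sum >> 2) & 3;
        int p;
        for (p = n - 1; p > 0; p--) {
            z = v[p - 1];
            y = v[p] -= mx(sum, y, z, p, e);
        }
        z = v[n - 1];
        y = v[0] -= mx(sum, y, z, p, e);
        sum -= delta;
    } while (--rounds);
}
```

Decryption runs the same rounds in reverse order, starting from the final sum and subtracting delta each round, which is why the EXP initializes sum to rounds * TEA_DELTA.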
Reference
- https://github.com/wbaby/DoubleCallBack
- https://cloud.tencent.com/developer/article/1600862