In May this year, I participated in DEF CON CTF Qualifier 2017 as a member of a certain 武蔵野-related team. Actually, I’m not a top-tier CTF player, but I did my best and solved 4 challenges:
- Pepperidge Farm
Pepperidge Farm is categorized into Reverse Engineering. The problem statement is below:
Remember when the first CTF was run with a custom architecture? Pepperidge Farm remembers:
It seems like a keygenning challenge on the custom virtual machine–VMNDH-2k12.
VMNDH-2k12 is the VM built for Nuit du Hack CTF Quals 2012 as its name suggests. The architecture is described in shell-storm | Useless emulator for fun (VMNDH-2k12). This VM parses given serialized binary and repeats fetch, decode and execution.
Writing IDA loader/processor module is a common way to analyze VM-based obfuscated binary. The modules for VMNDH-2k12 and for modified version VMNDH-2k13 already exist:
Note that when you try to solve this challenge with above-mentioned processor modules with IDA Pro 7.0, the backward-compatibility issue will occur. For example, you have to change
self.reg_first_sreg in a module.
Also, the Binary Ninja plugin has been released after quals:
But in this post, I describe a solution without both IDA Pro and Binary Ninja. Because VMNDH-2k12 is open-sourced and easy to modify.
Yes, the VM has own debugger and disassembler.
However, there are pitfalls here.
Because it is a unique architecture, the destination of the control flow instructions are different from it shown on the disassembly dump. For example, take a look at
Acording to this, I modified the dissassembler in
This makes it possible to correctly display the address of the call destination in the disassembly dump. In addition, jump instructions need to be modified. In the case of
0x8759 it looks like a data section.
However, parts such as
0x87e8 are misinterpreted as codes.
The data section was not obfuscated.
So I gave first aid to
Awful… who cares?
This is a failure case.
As we have seen so far, VMNDH-2k12 is open-sourced. So I tried to solve the challenge with source code-based symbolic execution tool–KLEE.
src_vm/syscall_write.c for assertion:
Here is a modified
I’d left all of it to KLEE and get to bed…
… It’s not going to be easy.
I also wrote solver with angr. Which symbolizes stdin, but… let’s not talk about it.
An example of insufficient SMTLIB2 representations is:
Since there is no choice, I read all the disassembly. After some twists and turn, I realized that:
- Pepperidge Farm checks character codes against transformed
x * 100 + y.
Now Z3 time. For example, subroutine
This is satisfiable. But not enough. We need to add rest of constraints.
Even with halfway constraints, the process proceeds. So inscount with Pin or other dynamic binary instrumentation tools might be helpful.
Finally I got:
The conclusive SMTLIB2 representation is:
I enjoyed this challenge. It seems to be easy or medium-easy difficulty rating. If only I could have solved more difficult Reverse Engineering challenges during quals–liberty, godzilla, and so on.
Recently I read T. Blazytko et al. USENIX Security’17. The paper says the system named Syntia automatically deobfuscate binaries with program synthesis. Program synthesis is a method to synthesize some pieces of program from given I/O samples and possible operators–like this:
This is just a simple example. In practical, program slicing and path pruning will be needed. Both symbolic execution and program synthesis depend on SMT solver, but according to the paper, program synthesis is more suitable for deobfuscation tasks… really? I’ll investigate further.