Hopper (Part 2): Universal Patcher

The Big Idea

This article is for educational purposes only. Note that the license agreement (EULA) states that modification of the software is strictly prohibited; however, reverse engineering rights are protected by French copyright law. I am not to be held accountable for any misfortunes this article brings you.
If you haven’t already, read Part 1 first. Here, we’ll create a patcher based on the methods explained in Part 1. On the technical side, this patcher will try to find certain signatures located around the critical points of code we want to modify. The tricky part is that we must find the right jump instructions and their targets before overwriting them.

Steps to Success

Getting it to work first

Let’s use Python to prototype. I’ve set up a virtualenv for Python 2.7 and installed pwntools, which lets us rapidly prototype and test various signatures against the file. Fortunately, pwntools is well documented, unlike some other libraries we’ll encounter soon :weary:. Let’s begin by writing the Python script:

# File is patcher.py
from pwn import *
e = ELF('/opt/hopper-v4/bin/Hopper.bak')

Recall that a stack trace eventually led us to the return address 0x638031, and that several instructions above it was the important test al, al and jnz pattern:

.text:0x00000000638019 loc_638019:
.text:0x00000000638019                 call    CheckLicense             ; sub_504550
.text:0x0000000063801E                 test    al, al
.text:0x00000000638020                 jnz     short loc_63806C
.text:0x00000000638022                 lea     rbx, [rsp+88h+var_88]
.text:0x00000000638026                 mov     rdi, rbx
.text:0x00000000638029                 mov     rsi, r15
.text:0x0000000063802C                 call    sub_501110               ; Stack trace caller
.text:0x00000000638031                 mov     rax, [rsp+88h+var_88]    ; (Reported address = 1 instruction down)
.text:0x00000000638035                 mov     rax, [rax+1A8h]

Looking at other portions of the function, we see a peculiar call near the end:

.text:0x00000000638041                 call    CheckLicense
.text:0x00000000638046                 test    al, al
.text:0x00000000638048                 jnz     short loc_638063
.text:0x0000000063804A                 mov     esi, 1B7740h
.text:0x0000000063804F                 mov     edx, 1
.text:0x00000000638054                 mov     rdi, r15
.text:0x00000000638057                 call    __ZN7QObject10startTimerEiN2Qt9TimerTypeE ; QObject::startTimer(int,Qt::TimerType)
.text:0x0000000063805C                 mov     [r15+80h], eax

Note the timer object. What’s even more interesting is that the value 0x1B7740 stored into esi is 1800000 in decimal, and 1800000 milliseconds is 30 minutes… Wait… The free version of Hopper has a 30-minute session limit, and we just found where the timer is started! This means we can use the start-timer code as a signature and as a way to locate a call to CheckLicense: the call several instructions earlier is conveniently close, so we can filter all candidate matches by their proximity to the timer code. The most specific part of the signature is the mov esi, 1B7740h instruction; it is reasonable to assume (though not guaranteed) that no other part of the program contains it. The encoded instruction is as follows:

BE 40 77 1B 00
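
As a quick sanity check on both the arithmetic and the encoding, here is a minimal sketch in Python 3 using only the standard library (not the Python 2.7 + pwntools setup used in the rest of this post). It assumes the usual B8+r form of mov r32, imm32, where esi is register number 6, so the opcode byte is 0xBE followed by the little-endian immediate. The helper name encode_mov_esi_imm32 is made up for illustration.

```python
import struct

def encode_mov_esi_imm32(value):
    """Encode `mov esi, imm32`: opcode 0xB8 + 6 (esi), then a little-endian imm32."""
    return bytes([0xB8 + 6]) + struct.pack('<I', value)

timeout_ms = 0x1B7740
minutes = timeout_ms / 1000 / 60      # milliseconds -> seconds -> minutes
signature = encode_mov_esi_imm32(timeout_ms)

print(timeout_ms, minutes)                        # 1800000 30.0
print(' '.join('%02X' % b for b in signature))    # BE 40 77 1B 00
```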

We can then search the entire ELF file like so:

sig1 = list(e.search('\xBE\x40\x77\x1B\x00'))[0]
print 'Signature 1 located at:', hex(sig1)

Then we must locate a call to CheckLicense and obtain the address of the symbol. We can do this by searching for every test al, al and checking whether a call immediately precedes it. We then take the call closest to sig1 to maximize our chances of finding the correct symbol. We will use this technique over and over again to locate all the critical signatures in the file. Note that there is a chance that a signature matches the wrong piece of code; that’s the tradeoff of simplicity over accuracy. Let’s take a look at the assembly instructions next to their hex encoding:

call    CheckLicense    E8 32 C5 EC FF
test    al, al          84 C0

What we can do is find the addresses (we’ll call each one addr) of all 84 C0 patterns in .text and check whether the byte at addr-5 is E8 (the opcode for a call). If it is, the 4 bytes that follow are a rel32 (a 4-byte relative address). The opcode encoding for a call is as follows (table from coder64’s x86 Opcode and Instruction Reference):

po   st    mnemonic   op1        op2   op3   op4   description, notes
E8   D32   CALL       rel16/32                     Call Procedure

Now for the python code:

adr1 = 0
for addr in list(e.search('\x84\xC0')):
    if abs(sig1 - addr) <= 50 and e.read(addr - 5, 1) == '\xE8':
        adr1 = int(re.search(r'call +0x(\w+)', e.disasm(addr - 5, 5), re.M|re.I).group(1), 16)
        break
print 'CheckLicense() call to:', hex(adr1)

Great! We now have the location of CheckLicense stored in adr1! In this case, we used pwntools’ own built-in disassembler to extract the call target. However, when we write the final C++ patcher, it’s easier to interpret the raw bytes directly, hence the importance of the instruction reference. We can now patch the function like so:

e.asm(adr1, 'mov al, 1\nret')
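
The search-and-filter idea itself does not depend on pwntools. Below is a minimal Python 3 sketch of it over a plain bytes buffer. The buffer is synthetic (the bytes quoted above plus some nop padding), the offsets are buffer offsets rather than virtual addresses, and find_all and find_call_before_test are hypothetical helper names.

```python
def find_all(data, pattern):
    """Yield every offset at which `pattern` occurs in `data`."""
    pos = data.find(pattern)
    while pos != -1:
        yield pos
        pos = data.find(pattern, pos + 1)

def find_call_before_test(data, anchor, max_distance=50):
    """Return the offset of an E8 call that directly precedes a `test al, al`
    (84 C0) lying within `max_distance` bytes of `anchor`, or None."""
    for off in find_all(data, b'\x84\xC0'):
        near = abs(anchor - off) <= max_distance
        if near and off >= 5 and data[off - 5] == 0xE8:
            return off - 5
    return None

# Synthetic buffer: a decoy `test al, al` far away, then call + test,
# a few nops, and finally the timer signature used as the anchor.
blob = (b'\x84\xC0' + b'\x90' * 100 +
        b'\xE8\x32\xC5\xEC\xFF' + b'\x84\xC0' + b'\x90' * 4 +
        b'\xBE\x40\x77\x1B\x00')
anchor = blob.find(b'\xBE\x40\x77\x1B\x00')
call_off = find_call_before_test(blob, anchor)
print(anchor, call_off)    # 113 102
```

The decoy at offset 0 is rejected by the distance check, which is the same role the proximity filter plays in the real script.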

What’s left is to locate the code that displays the license dialog and overwrite its jump instructions. The first step is to search for the string Personal License\nRegistered to %1\n%2 and locate all XREFs to it. The only XREF to the string is a lea instruction, and it uses RIP-relative addressing: the instruction stores only a 4-byte offset, measured from the address of the next instruction, so we have to compute the absolute target ourselves (address of the lea + 7 bytes of instruction length + offset):

lea rdi, [rip + offset]     48 8D 3D OFFSET

Implemented in python:

sig2 = list(e.search('Personal License\nRegistered to %1\n%2'))[0]
adr2 = 0
for addr in list(e.search('\x48\x8D\x3D')):
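    # Match e.g. "lea rdi,[rip+0x1234]"; skip candidates that do not disassemble to that form
    res = re.search(r'lea +rdi, *\[rip\+0x(\w+)\]', e.disasm(addr, 7), re.M|re.I)
    if res is None:
        continue
    tmp = int(res.group(1), 16)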
    if tmp+addr+7 == sig2:
        adr2 = addr
        break
print 'Signature 2 XREF at: ', hex(adr2)
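
For reference, the same XREF computation can be done straight from the raw bytes, which also handles negative offsets. This is a minimal Python 3 sketch, not the code used in this post: lea_rip_target is a hypothetical helper, and the addresses 0x401000 and 0x402000 are invented purely to exercise the arithmetic.

```python
import struct

def lea_rip_target(data, off, base=0):
    """Resolve `lea rdi, [rip+disp32]` (48 8D 3D + disp32) at buffer offset `off`.
    RIP points at the next instruction, so target = address + 7 + signed disp32."""
    if data[off:off + 3] != b'\x48\x8D\x3D':
        raise ValueError('not a lea rdi, [rip+disp32]')
    (disp,) = struct.unpack_from('<i', data, off + 3)
    return base + off + 7 + disp

# Hypothetical example: an instruction at 0x401000 referring to data at 0x402000.
base = 0x401000
disp = 0x402000 - (base + 7)
code = b'\x48\x8D\x3D' + struct.pack('<i', disp)
print(hex(lea_rip_target(code, 0, base)))    # 0x402000
```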

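However, we’re not done: we still need to locate the correct cmp eax, 1 instruction (bytes 83 F8 01, which are more distinctive than a test eax, eax) followed by a je rel32 (0F 84). We keep the first match whose jump target lands within 230 bytes of the XREF we just found: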

adr3 = 0
tgt1 = 0
for addr in list(e.search('\x83\xF8\x01\x0F\x84')):
    res = re.search(r'je +0x(\w+)', e.disasm(addr+3, 6), re.M|re.I)
    if res is not None:
        tmp = int(res.group(1), 16)
        if abs(adr2 - tmp) <= 230:
            adr3 = addr
            tgt1 = tmp
            break
print 'Location 1 verified as:', hex(adr3)

Using this, we can distinctly locate the correct test eax, eax instruction:

adr4 = 0
for addr in list(e.search('\x85\xC0\x0F\x84')):
    if abs(adr3 - addr) <= 50:
        adr4 = addr+2
        break
print 'Location 2 verified as:', hex(adr4)

And completing our patch:

e.asm(adr4, 'jmp '+hex(tgt1))
e.save('/opt/hopper-v4/bin/Hopper')

Full Python PoC Script

from pwn import *
from shutil import copyfile
import re, os, sys
file_loc = '/opt/hopper-v3/Hopper'
#file_loc = '/opt/hopper-v4/bin/Hopper'
# Restore backup file if exists
print file_loc
if os.path.exists(file_loc+'.bak'):
    print 'Restoring from backup...'
    copyfile(file_loc+'.bak', file_loc)
    os.remove(file_loc+'.bak')
print 'Creating backup...'
copyfile(file_loc, file_loc+'.bak')
e = ELF(file_loc+'.bak')
sig1 = list(e.search('\xBE\x40\x77\x1B\x00'))[0]
sig2 = list(e.search('Personal License\nRegistered to %1\n%2'))[0]
adr1 = 0; adr2 = 0; adr3 = 0; adr4 = 0;
tgt1 = 0
print 'Signature 1 located at:', hex(sig1)
print 'Signature 2 located at:', hex(sig2)
for addr in list(e.search('\x84\xC0')):
    if abs(sig1 - addr) <= 50 and e.read(addr - 5, 1) == '\xE8':
        adr1 = int(re.search(r'call +0x(\w+)', e.disasm(addr - 5, 5), re.M|re.I).group(1), 16)
        break
print 'CheckLicense() call to:', hex(adr1)
for addr in list(e.search('\x48\x8D\x3D')):
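    # Match e.g. "lea rdi,[rip+0x1234]"; skip candidates that do not disassemble to that form
    res = re.search(r'lea +rdi, *\[rip\+0x(\w+)\]', e.disasm(addr, 7), re.M|re.I)
    if res is None:
        continue
    tmp = int(res.group(1), 16)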
    if tmp+addr+7 == sig2:
        adr2 = addr
        break
print 'Signature 2 XREF at:   ', hex(adr2)
for addr in list(e.search('\x83\xF8\x01\x0F\x84')):
    res = re.search(r'je +0x(\w+)', e.disasm(addr+3, 6), re.M|re.I)
    if res is not None:
        tmp = int(res.group(1), 16)
        if abs(adr2 - tmp) <= 230:
            adr3 = addr
            tgt1 = tmp
            break
print 'Location 1 verified as:', hex(adr3)
for addr in list(e.search('\x85\xC0\x0F\x84')):
    if abs(adr3 - addr) <= 50:
        adr4 = addr+2
        break
print 'Location 2 verified as:', hex(adr4)
print 'Target location as:    ', hex(tgt1)
print 'Patching...'
e.asm(adr1, 'mov al, 1\nret')
e.asm(adr4, 'jmp '+hex(tgt1))
e.save(file_loc)

Improved C++ Version

Python is great for PoC scripts, not so much for deployment. C++, on the other hand, can link all your dependencies statically and deployment should go smoothly. However…

The Pain, Self-Harm and Horribly Documented Library

I’ve decided to use the LIEF library, as that was the easiest way to modify and manipulate the ELF format. Setting it up is as easy as copying the example CMake configuration and renaming the files to suit your needs. We can start by loading the ELF file:

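#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <LIEF/ELF.hpp>
using namespace LIEF::ELF;   // Binary, Parser, Section
int main()
{
    // Path of the binary to patch (assumed here to match the Python prototype);
    // the backup copy is expected at filePath + ".bak"
    const std::string filePath = "/opt/hopper-v4/bin/Hopper";
    std::unique_ptr<Binary> binary;
    try {
        binary = std::unique_ptr<Binary>{ Parser::parse(filePath+".bak") };
    } catch(const LIEF::exception& e) {
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}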

We can then search for our signatures in a similar manner to Python:

const Section& text = binary -> get_section(".text");
const Section& rdat = binary -> get_section(".rodata");
const uint64_t sig1 = text.search(std::vector<uint8_t>{ 0xBE, 0x40, 0x77, 0x1B, 0x0 }) + text.virtual_address();
const uint64_t sig2 = rdat.search("Personal License\nRegistered to %1\n%2") + rdat.virtual_address();

However, the processing is slightly more challenging:

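uint64_t adr1 = 0;
for(size_t a : text.search_all("\x84\xC0"))
{
    const uint64_t addr = a+text.virtual_address();
    // Compare as signed values: subtracting two uint64_t wraps around instead of going negative
    const int64_t dist = static_cast<int64_t>(sig1) - static_cast<int64_t>(addr);
    if(dist >= -50 && dist <= 50 && binary -> get_content_from_virtual_address(addr-5, 1)[0] == 0xE8)
    {
        // addr is the instruction right after the call, so the rel32 is relative to it
        adr1 = addr + static_cast<int64_t>(vec2int32(binary -> get_content_from_virtual_address(addr-4, 4)));
        break;
    }
}
std::cout << "CheckLicense() call to: 0x" << std::hex << adr1 << std::endl;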

To understand what’s going on, we need to again return to the raw hex encoded instructions:

call    CheckLicense    E8 32 C5 EC FF
test    al, al          84 C0
                        ^ addr

When we search for the pattern, the resulting addr refers to the 84 byte. That means addr-5 refers to the E8 byte, which is the opcode of the call instruction. The 4 bytes after E8 are an offset relative to the address of the next instruction, which is exactly addr. Note that x86 is little-endian, so 32 C5 EC FF encodes the integer FFECC532 (a negative offset when read as a signed 32-bit value). So we read the offset starting at addr-4, decode the bytes as a signed integer, and add the result to addr to obtain the final destination of the call.
The code for vec2int32 is quite trivial:

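int32_t vec2int32(const std::vector<uint8_t>& v)
{
    // Assemble the bytes as little-endian: v[0] is the least significant byte
    uint32_t res = 0;
    for(size_t i = 0; i < v.size() && i < 4; i++)
        res |= static_cast<uint32_t>(v[i]) << (i * 8);
    return static_cast<int32_t>(res);   // reinterpret as a signed rel32
}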
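
To double-check the decoding against the listing, here is a small Python 3 sketch using the standard library only. It assumes the bytes E8 32 C5 EC FF belong to the call at 0x638019 from the first listing; call_target is a hypothetical helper name. The result lands on 0x504550, which matches the sub_504550 comment next to that call.

```python
import struct

def call_target(call_addr, insn):
    """Decode `E8 rel32`: target = address of the next instruction + signed rel32."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError('not a 5-byte call rel32')
    (rel32,) = struct.unpack('<i', insn[1:])
    return call_addr + 5 + rel32

insn = bytes.fromhex('E832C5ECFF')
(rel32,) = struct.unpack('<i', insn[1:])
print(hex(rel32 & 0xFFFFFFFF))              # 0xffecc532
print(hex(call_target(0x638019, insn)))     # 0x504550
```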

Porting the rest of our Python script is just as straightforward. Note that earlier we forgot to patch out the annoying Demo Version watermark. We’ll do that here:

const uint64_t sig3 = rdat.search("Demo Version") + rdat.virtual_address();
binary -> patch_address(sig3, std::vector<uint8_t>{ 0,0,0,0,0,0,0,0,0,0,0,0 });

This also shows how to patch the binary given the virtual address. One headache we’ll encounter is encoding the instruction for:

e.asm(adr4, 'jmp '+hex(tgt1))

Notice how we avoided using a disassembler/assembler by decoding the instructions ourselves. Here, we’ll encode the jump ourselves as well: E9 followed by the little-endian rel32, plus one nop at the end as padding so the patch covers all 6 bytes of the original jz.

std::vector<uint8_t> getbytes(uint64_t off)
{
    std::vector<uint8_t> v;
    v.push_back(0xE9);
    v.push_back(0xFF & (off));
    v.push_back(0xFF & (off >> 8));
    v.push_back(0xFF & (off >> 16));
    v.push_back(0xFF & (off >> 24));
    v.push_back(0x90);
    return v;
}

With that, we can patch like so:

binary -> patch_address(adr4, getbytes(tgt1-adr4-5));
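
The offset arithmetic (target minus patch address minus 5) is easy to get wrong by one instruction length, so here is a minimal Python 3 sketch of the same encoding for checking values by hand. encode_jmp_rel32 is a hypothetical helper and the addresses are invented; only the E9 + rel32 + nop layout comes from the C++ code above.

```python
import struct

def encode_jmp_rel32(src, dst, pad_to=6):
    """Encode `jmp rel32` from `src` to `dst`, padded with nops to `pad_to` bytes.
    rel32 is measured from the end of the 5-byte jmp, i.e. dst - (src + 5)."""
    rel32 = dst - (src + 5)
    if not -2**31 <= rel32 < 2**31:
        raise ValueError('target out of rel32 range')
    code = b'\xE9' + struct.pack('<i', rel32)
    return code + b'\x90' * (pad_to - len(code))

# Hypothetical addresses, chosen only to exercise the arithmetic.
forward = encode_jmp_rel32(0x401000, 0x401100)
backward = encode_jmp_rel32(0x401100, 0x401000)
print(forward.hex())     # e9fb00000090
print(backward.hex())    # e9fbfeffff90
```

Packing with a signed format means backward jumps are encoded correctly without any manual masking.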

Results (for real now)

To avoid spoonfeeding you too much code, try porting the rest of the Python version over to C++ yourself. It should be fairly easy by now, as I’ve already cleared up the major hurdles.

Download compiled patcher for Linux:   Linux (Base64 Encoded)  |  Linux (Binary)

If your browser only supports downloading of the Base64 encoded file, run this to convert it back to a binary:

cat Patcher.txt | base64 -d > Patcher

If you’re on a Mac, I’m sorry ¯\_(ツ)_/¯

To run the patcher, simply execute:

sudo chmod +x Patcher
sudo ./Patcher

What’s Next?

In part 3, we’ll attempt to explore more ways to patch Hopper! Until then, PEACE OUT.
