DamCTF 2020 Malware Challenge - Phase 2
This is a part of a series of writeups for a malware challenge I made for DamCTF 2020. Please see here for the overview.
Phase 2
Nice work finding that suspicious file! Using IcyRetina’s extensive collection of Yara rules, you were able to identify that the sample is definitely malicious, but very little is known about it. Can you extract the config from the malware?
There is a chunk of encrypted data that is accessed early on in the malware’s execution. That is the malware’s config.
Phase 2 required players to successfully extract the ELF from the PCAP in Phase 1. If you are following along and would like to skip the file extraction, you can download the ELF here.
To make it easier to start reversing, there are only two functions exported from the library: libinit and libmain.
Please note: In order to make it easier for readers to follow along with the analysis, I won’t be renaming any of the symbols in the analysis. However, I strongly suggest that as you work through a binary and figure out what something does (or even just a hunch or something interesting) that you rename the function to reflect it. It will make your life much easier, trust me.
void libinit(void)
{
  time_t tVar1;

  tVar1 = time((time_t *)0x0);
  srand((uint)tVar1);
  FUN_00101bbf();
  FUN_0010307c(&DAT_00106c60);
  DAT_00106c40 = 1;
  return;
}
libinit() is a simple function that seeds the random number generator, calls a couple of functions, and sets a global variable. Let’s take a look at FUN_00101bbf():
void FUN_00101bbf(void)
{
  int local_c;

  local_c = 0;
  while (local_c < 10) {
    if (((&DAT_001061a0)[(long)local_c * 0x32] & 4) != 0) {
      FUN_00101a19(PTR_DAT_00106398,&DAT_001061a0 + (long)local_c * 0x32,
                   &DAT_001061a0 + (long)local_c * 0x32);
      (&DAT_001061a0)[(long)local_c * 0x32] = 8;
    }
    local_c = local_c + 1;
  }
  return;
}
This function loops through an array of data at DAT_001061a0 in 0x32-byte chunks, ten chunks in total. If chunk[0] & 0x4 != 0, it calls FUN_00101a19(), passing a pointer to some data followed by two copies of the pointer to the chunk, and then sets chunk[0] to 8.
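The loop described above can be sketched in Python as follows. This is only an illustration of the chunk walk and the flag test, not code from the malware: the constant names and the synthetic demo block are made up, and the meaning of the two flag values (0x4 for encrypted, 0x8 for decrypted) is inferred from the decompiled code.

```python
CHUNK_LEN = 0x32        # stride used by FUN_00101bbf
NUM_CHUNKS = 10         # loop bound in FUN_00101bbf
FLAG_ENCRYPTED = 0x04   # hypothetical name: bit tested before decrypting
FLAG_DECRYPTED = 0x08   # hypothetical name: value written back afterwards


def encrypted_chunks(block: bytes):
    """Yield (index, chunk) for every chunk whose flag byte has bit 0x4 set."""
    for idx in range(NUM_CHUNKS):
        chunk = block[idx * CHUNK_LEN:(idx + 1) * CHUNK_LEN]
        if len(chunk) == CHUNK_LEN and chunk[0] & FLAG_ENCRYPTED:
            yield idx, chunk


# Synthetic block: chunks 0 and 2 are flagged encrypted, chunk 1 is flagged
# decrypted, and the remaining seven chunks are all zero bytes.
demo = b"".join(
    bytes([flag]) + bytes(CHUNK_LEN - 1)
    for flag in (FLAG_ENCRYPTED, FLAG_DECRYPTED, FLAG_ENCRYPTED)
)
demo += bytes(CHUNK_LEN * (NUM_CHUNKS - 3))

print([idx for idx, _ in encrypted_chunks(demo)])  # prints [0, 2]
```

Only the flag byte decides whether a chunk is handed to the decryption routine; the other 0x31 bytes of each chunk are the payload.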
If we take a look at FUN_00101a19(), we realize that there are only two arguments to the function (which makes sense, because args 2 and 3 were identical):
void FUN_00101a19(uchar *param_1,long param_2)
{
  uchar uVar1;
  int iVar2;
  uchar *indata;
  uchar *indata_00;
  long in_FS_OFFSET;
  int local_434;
  RC4_KEY local_418;
  long local_10;

  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  indata = (uchar *)calloc(0x31,1);
  indata_00 = (uchar *)(param_2 + 1);
  RC4_set_key(&local_418,0x20,param_1);
  RC4(&local_418,0x31,indata,indata);
  local_434 = 0x30;
  while (-1 < local_434) {
    iVar2 = (int)(local_434 + (uint)indata[local_434]) % 0x31;
    uVar1 = indata_00[local_434];
    indata_00[local_434] = indata_00[iVar2];
    indata_00[iVar2] = uVar1;
    local_434 = local_434 + -1;
  }
  RC4(&local_418,0x31,indata_00,indata_00);
  free(indata);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}
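This function uses the RC4 capabilities in OpenSSL to decrypt the data passed in param_2, using the 0x20 bytes at param_1 as the key. It first runs 0x31 zero bytes through RC4 to get raw keystream, then uses that keystream to drive a swap-based shuffle of the 0x31 bytes that follow the chunk’s flag byte. Finally, the shuffled bytes are run through RC4 with the same key state, so the keystream continues where the first call left off. Here is my Python re-implementation of the decryption function: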
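from Crypto.Cipher import ARC4

def fun_00101a19(key, chunk):
    cipher = ARC4.new(key)
    # skip the flag byte at chunk[0]
    indata_00 = list(chunk[1:])
    # RC4 over zero bytes yields the raw keystream
    indata = b"\x00" * 0x31
    indata = list(cipher.decrypt(indata))
    # keystream-driven shuffle, walking from the last byte to the first
    for i in range(48, -1, -1):
        iVar2 = (indata[i] + i) % 49
        uVar1 = indata_00[i]
        indata_00[i] = indata_00[iVar2]
        indata_00[iVar2] = uVar1
    # same cipher object, so the keystream continues from byte 49
    return cipher.decrypt(bytes(indata_00))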
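One way to gain confidence in a re-implementation like this is a round-trip test against an inverse routine. The sketch below is an assumption-laden illustration: it swaps in a small pure-Python RC4 class as a stand-in for the library cipher so that it runs without extra packages, and encrypt_chunk, the key, and the record are all made up for the test. Nothing here was recovered from the malware itself.

```python
class RC4:
    """Minimal pure-Python RC4 whose keystream position persists across calls."""

    def __init__(self, key: bytes):
        self.s = list(range(256))
        j = 0
        for i in range(256):  # key-scheduling algorithm
            j = (j + self.s[i] + key[i % len(key)]) % 256
            self.s[i], self.s[j] = self.s[j], self.s[i]
        self.i = self.j = 0

    def crypt(self, data: bytes) -> bytes:
        out = bytearray()
        for byte in data:  # XOR each byte with the next keystream byte
            self.i = (self.i + 1) % 256
            self.j = (self.j + self.s[self.i]) % 256
            self.s[self.i], self.s[self.j] = self.s[self.j], self.s[self.i]
            out.append(byte ^ self.s[(self.s[self.i] + self.s[self.j]) % 256])
        return bytes(out)


def decrypt_chunk(key: bytes, chunk: bytes) -> bytes:
    """Same steps as fun_00101a19: keystream, shuffle, then RC4."""
    cipher = RC4(key)
    body = list(chunk[1:])               # skip the flag byte
    stream = cipher.crypt(bytes(0x31))   # RC4 over zeros yields raw keystream
    for i in range(0x30, -1, -1):
        j = (stream[i] + i) % 0x31
        body[i], body[j] = body[j], body[i]
    return cipher.crypt(bytes(body))


def encrypt_chunk(key: bytes, plaintext: bytes, flag: int = 0x04) -> bytes:
    """Hypothetical inverse of decrypt_chunk, written only for testing."""
    cipher = RC4(key)
    stream = cipher.crypt(bytes(0x31))
    body = list(cipher.crypt(plaintext))  # undo the final RC4 pass first
    for i in range(0x31):                 # replay the swaps in reverse order
        j = (stream[i] + i) % 0x31
        body[i], body[j] = body[j], body[i]
    return bytes([flag]) + bytes(body)


key = bytes(range(32))                                   # made-up 32-byte key
record = b"demo" + b"example value".ljust(45, b"\x00")   # made-up 49-byte record
chunk = encrypt_chunk(key, record)
print(decrypt_chunk(key, chunk) == record)  # prints True
```

Each swap is its own inverse, so replaying the same swaps in the opposite order (index 0 up to 48) undoes the shuffle. That is why the inverse needs no extra bookkeeping.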
Now that we know how to decrypt the config, we need to know where the key and the encrypted config are in the binary. Looking back at FUN_00101bbf(), we know that PTR_DAT_00106398 is a pointer to some data. If we follow it to DAT_00104010, we see a 32-byte, null-terminated string. This looks like our key value!
Now we need to locate the encrypted data. FUN_00101bbf() uses DAT_001061a0 as the base of the array it passes as the second argument, so let’s check there to make sure it’s the config block. It looks to be a lot of encrypted data, and it starts with a 0x4 byte, which matches the check done at the beginning of FUN_00101bbf().
In order to properly automate the config extraction, we need to know where the bytes actually sit in the binary file, rather than the address Ghidra shows for them in memory. Luckily, this is easy to determine with Ghidra (at least for this binary). For the data segment, subtracting 0x101000 from the address shown in Ghidra gives the offset in the file. For example, data at 0x1061a0 in Ghidra is physically located at offset 0x51a0 in the binary file.
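The address arithmetic can be written out as a short sketch. The data segment delta comes straight from the example above. The second delta is an assumption: it is inferred from the fact that the solution script reads the key for DAT_00104010 from file offset 0x4010, and the constant and function names are made up for illustration. These deltas apply to this binary only.

```python
DATA_DELTA = 0x101000    # from the example: 0x1061a0 in Ghidra -> 0x51a0 in the file
RODATA_DELTA = 0x100000  # assumption: implied by DAT_00104010 -> file offset 0x4010

CHUNK_LEN = 0x32         # chunk stride seen in FUN_00101bbf
NUM_CHUNKS = 10          # loop bound seen in FUN_00101bbf


def ghidra_to_file_offset(addr: int, delta: int) -> int:
    """Convert an address shown in Ghidra to a file offset for this binary."""
    if addr < delta:
        raise ValueError(f"address {addr:#x} is below the delta {delta:#x}")
    return addr - delta


config_start = ghidra_to_file_offset(0x1061A0, DATA_DELTA)
config_end = config_start + CHUNK_LEN * NUM_CHUNKS   # ten 0x32-byte chunks
key_start = ghidra_to_file_offset(0x104010, RODATA_DELTA)

print(hex(config_start), hex(config_end), hex(key_start))
# prints: 0x51a0 0x5394 0x4010
```

For other binaries, the mapping should be read from the ELF program headers instead of being hard-coded.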
With all of this knowledge on the config and how it is encrypted, a script can be written to extract each chunk and decrypt it. Here is my final solution script for this phase:
from Crypto.Cipher import ARC4
import sys

# read in malware
with open(sys.argv[1], "rb") as f:
    data = f.read()

# get data
config_block = data[0x51a0:0x5394]
key_len = 32
config_key = data[0x4010:0x4010+key_len]
chunk_len = 50

# chunk[0] & 0x4, encrypted
# chunk[0] & 0x8, decrypted

def fun_00101a19(key, chunk):
    cipher = ARC4.new(key)
    indata_00 = list(chunk[1:])
    indata = b"\x00" * 0x31
    indata = list(cipher.decrypt(indata))
    for i in range(48, -1, -1):
        iVar2 = (indata[i] + i) % 49
        uVar1 = indata_00[i]
        indata_00[i] = indata_00[iVar2]
        indata_00[iVar2] = uVar1
    return cipher.decrypt(bytes(indata_00))

for i in range(len(config_block)//chunk_len):
    plaintext = fun_00101a19(config_key, config_block[i*chunk_len:(i+1)*chunk_len])
    key = plaintext[:4].decode()
    val = plaintext[4:].decode().split("\x00")[0]
    print(f"{key} = {val}")
When we run that script, we are able to view the config for the binary, along with the flag:
$ python3 phase2.py libmal.so
stky = 7a9d6fad3798a7867a9d6fad3798a786
cont = bhuwehobiwsnbqpxnws.damctf.xyz
jgie = google.com
slti = 300
xvee = facebook.com
flag = dam{1m4g1n3_m4k1ng_w1nd0ws_m4lw4re_lma0}
ehbn = amazon.com
bnwe = microsoft.com
stiv = f83646fad02d42e6
port = 3613
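Judging by the output above, each decrypted chunk appears to hold a four-character name followed by a null-terminated value. Under that assumption, the records could be collected into a dictionary for later use, as sketched below. The parse_record helper and the hand-built sample records are made up for illustration; the sample values are copied from the output above.

```python
def parse_record(plaintext: bytes):
    """Split a decrypted 49-byte record into (name, value)."""
    name = plaintext[:4].decode("ascii")
    # Cut at the first null byte before decoding so padding is never decoded.
    value = plaintext[4:].split(b"\x00", 1)[0].decode("ascii")
    return name, value


# Hand-built records that mimic the assumed layout.
records = [
    b"port" + b"3613".ljust(45, b"\x00"),
    b"slti" + b"300".ljust(45, b"\x00"),
]

config = dict(parse_record(r) for r in records)
print(config["port"], int(config["slti"]))  # prints: 3613 300
```

Splitting on the null byte before decoding is a small design choice: it keeps the parser from failing if the bytes after the terminator are not valid text.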