Thursday, September 8, 2016

Labyrenth CTF 2016 Documents track challenge 1

This post documents some of the things I learned from analysing the Word documents from the Labyrenth CTF 2016. Please leave comments or questions if you have any. :)

This post has two sections. The first covers an interesting observation about the challenge document. The second covers solving the challenge.

Anti Forensic Trick
While solving the challenge, I noticed the document could only be executed once. After a single execution, the document stopped working. I also noticed that the file size had shrunk after that execution.

I went on to investigate why. I noticed the following lines of code.

The above macro is executed when the document is opened. However, it only proceeds to execute the function "BkAIuNwQNDkohBY" if the variable "ppKzr" is NOT "toto". "BkAIuNwQNDkohBY" contains the key functions of the code. Once "BkAIuNwQNDkohBY" has executed, the variable "ppKzr" is overwritten with "toto".

I compared the hexdump of the documents before and after execution.

The long string stored in "ppKzr" was overwritten with "toto", which is why the document shrank in size.

With the long string overwritten, the document could no longer be executed. Worse, we could no longer de-obfuscate the strings in the VB code, because the long string is used as a dictionary by the de-obfuscation function "QklkhFEQNB". A malware writer could use this 'trick' to prevent analysis of the document after it has been executed.
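
The run-once behaviour can be modelled in a few lines of Python. This is only a sketch of the logic described above, not the macro itself. The `Document` class, the index-based lookup in `deobfuscate` and the sample dictionary are assumptions made for illustration; only the names "ppKzr" and "toto" come from the challenge document.

```python
class Document:
    """Toy model of the challenge document (hypothetical, for illustration)."""

    def __init__(self, dictionary):
        # "ppKzr" holds the long string used as the de-obfuscation dictionary.
        self.variables = {"ppKzr": dictionary}

    def deobfuscate(self, indices):
        # Stand-in for "QklkhFEQNB": assumed here to pick characters by index.
        dictionary = self.variables["ppKzr"]
        return "".join(dictionary[i] for i in indices)

    def payload(self):
        # Stand-in for "BkAIuNwQNDkohBY": use the dictionary, then destroy it.
        secret = self.deobfuscate([7, 4, 11, 11, 14])
        self.variables["ppKzr"] = "toto"
        return secret

    def on_open(self):
        # The guard seen in the macro: run only if the marker is not set yet.
        if self.variables["ppKzr"] != "toto":
            return self.payload()
        return None


doc = Document("abcdefghijklmnopqrstuvwxyz")
first = doc.on_open()    # payload runs and the dictionary is overwritten
second = doc.on_open()   # the guard blocks the second run
```

After the first call, the dictionary is only four characters long, so any later attempt to de-obfuscate a string fails. This is why it is worth keeping an untouched copy of the document before opening it.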

Solving the challenge
First, I chose to dump the macro using various tools. This blog post by Lenny Zeltser lists a number of such tools. The following diagram shows how I used olevba.

After reading the code, I realised many of the strings were obfuscated. I then made use of the debugging function in Visual Basic for Applications (VBA) in Word 2010. The shortcut to activate it is Alt+F11.

From the code, the function "QklkhFEQNB" was called many times to de-obfuscate strings. Without going into the details of how the function worked, I added "MsgBox" calls to dump the strings after de-obfuscation was done.
After executing the document, the MsgBox calls displayed the de-obfuscated strings.
From this URL, we can see it is going to "". The long string appears to be base64 encoded, given the clue "b64" in the URL. After decoding, the result looked like a long string of hex values. Using the next clue, "x58", I XORed the decoded values with 0x58 and the result was the flag.
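
The decoding chain can be sketched in Python as follows. The sketch assumes the base64 layer wraps ASCII hex digits and that "x58" means a single-byte XOR key of 0x58. The function names and the sample text are made up for illustration; the real encoded string from the challenge is not reproduced here.

```python
import base64


def decode_flag(blob, key=0x58):
    """base64 -> hex text -> raw bytes -> single-byte XOR."""
    hex_text = base64.b64decode(blob).decode("ascii").strip()
    raw = bytes.fromhex(hex_text)
    return bytes(b ^ key for b in raw).decode("ascii")


def encode_sample(text, key=0x58):
    """Build a test input by applying the same layers in reverse (illustration only)."""
    raw = bytes(b ^ key for b in text.encode("ascii"))
    return base64.b64encode(raw.hex().encode("ascii"))


sample = encode_sample("not the real flag")
decoded = decode_flag(sample)
```

Because XOR is its own inverse, the same key is used in both directions, and `decoded` is the original sample text.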

Monday, August 15, 2016

Labyrenth CTF 2016 Windows track challenge 1

For this CTF, I had a goal other than completing as many of the Windows challenges as possible. I wanted to explore different ways to solve the challenges other than the usual dynamic analysis with debuggers and static analysis with disassemblers.

Do feel free to leave comments and questions; I will try my best to address them. Do point out any errors you notice. Thanks!

For Labyrenth CTF 2016 Windows track challenge 1, I used a binary analysis tool called Angr. Yes, it's spelled A-n-g-r, anger without the 'e'. I will walk through how I solved the challenge and focus on how Angr was used. The key takeaway from this challenge for me was learning how to use Angr. Readers who just want to know how I used Angr can skip to step six.

The first step was to find out as much as possible about the challenge binary provided. I used the Detect It Easy (DIE) tool and noticed the binary was packed using UPX.

The second step was to unpack the binary and locate the Original Entry Point (OEP) of the unpacked code. The entry point in the PE header of the binary belonged to the packing code. Using the Ollydbg debugger, I located the 'far jmp' at the end of the packing code. After executing the far jmp instruction, the EIP would point to the OEP of the unpacked code.

Before debugging the binary, a little trick I learnt was to remove the ASLR flag from the binary. This makes the binary load at the same base address on every execution, which makes it easier to match addresses against static analysis tools such as IDA Pro. Do note that some malware may not execute if it detects modifications to the PE header.

I used CFF Explorer to modify the "DLL can move" (ASLR) flag under Optional Header -> DllCharacteristics. Remove the tick next to "DLL can move" to disable ASLR for this binary.
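
For readers who prefer a script to a GUI, the same flag can be cleared in Python. This is a minimal sketch, not what I did for the challenge. It relies on the PE layout, where `DllCharacteristics` sits at offset 70 of the optional header and the "DLL can move" bit is `IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE` (0x0040). The function name is my own, and the PE checksum is not updated.

```python
import struct

DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE ("DLL can move")


def clear_aslr_flag(image):
    """Return a copy of a PE image with the DYNAMIC_BASE bit cleared."""
    data = bytearray(image)
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Signature (4 bytes) + COFF file header (20 bytes) + offset 70
    # inside the optional header (same offset for PE32 and PE32+).
    offset = e_lfanew + 4 + 20 + 70
    (flags,) = struct.unpack_from("<H", data, offset)
    struct.pack_into("<H", data, offset, flags & ~DYNAMIC_BASE)
    return bytes(data)
```

Work on a copy of the binary: read it with `open(path, "rb").read()`, pass the bytes to `clear_aslr_flag`, and write the result to a new file.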

After removing the ASLR flag I debugged the binary in Ollydbg debugger and located the "Far Jmp" instruction.

I knew to look for this far jmp because DIE detected the use of the UPX packer. The packing code of this packer typically ends with a far jmp instruction. Do note this may not always be the case; malware writers can write code to fool tools like DIE. I placed a breakpoint at the far jmp instruction and let the packer code unpack the packed code. I then stepped one instruction past the far jmp to locate the OEP of the unpacked code.

In the third step, having located the OEP of the unpacked code, I dumped the unpacked code for static analysis. I used the Scylla tool for this step. I first set the OEP as the starting address of the unpacked code, then located the Import Address Table (IAT) of the unpacked code. Due to packing, the IAT was 'destroyed'. Using the "IAT Autosearch" function, Scylla could locate the IAT and rebuild it for ease of analysis in IDA Pro. Next was to Dump the unpacked binary and then Fix the dump with the IAT found by Scylla.

The fourth step involved static analysis using IDA Pro, which let us understand the code flow of the unpacked code. In malware analysis, it is very important to zoom in quickly on the interesting functions. We do not want to analyse the code from start to end. We could use the strings view in IDA Pro to locate strings that point to interesting functions. Error messages in binaries 'might' provide clues to these strings. We executed the binary and it displayed the following messages.

We also located the same messages in the strings of the binary. These strings were originally obfuscated by the packer.
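
Outside IDA Pro, a quick way to check whether such messages are visible in a dump is a small strings-style scan. The sketch below is my own illustration of that idea in Python. It only finds runs of printable ASCII, so UTF-16 strings would be missed, and the sample bytes are made up.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for part of an unpacked dump.
blob = b"\x00\x90\x90Well done! A+! You get a gold star!\x00\x01\x02hi\x00"
found = ascii_strings(blob)
```

Running the same scan over the packed and the unpacked binary is a simple way to confirm that the unpacking worked: the messages should only show up in the unpacked dump.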

Using the strings of the error messages, we discovered interesting functions to analyse.

All the strings pointed to a single interesting function!

From the call graph, I observed that the user input, stored in a 0x100-byte local variable, was passed as an argument into the Sub_109110B function. Sub_109110B was likely the validation function that decides whether the user input is correct. This is where we focused our effort in the next steps.

The fifth step focused on the analysis of the Sub_109110B function. The call graph showed the beginning of the function, where it loads local variables with lots of data. This is a common trick used by malware writers: storing data in local variables hides it from tools such as strings.

Next, we tried to understand what the function was doing. We noticed a check of the length of the user input, using the Strlen function, against 0x10. Next was a loop that would execute 40 times. Inside the loop, 4 functions were called. The return values of all 4 functions were used to determine the code flow: if the return value of any of them was not zero, the loop would exit before completing 40 iterations.
Diving into the 4 functions, I discovered the anti-debugging checks.
1. CheckRemoteDebuggerPresent function
2. Find "Ollydbg" window
3. IsDebuggerPresent function
4. RDTSC instruction
An easy way to defeat the debugging checks was to patch the return values of the 4 functions containing the checks. Patching the AL register to zero at run time in the debugger allowed the loop to be fully executed (40 times).

A check after the Strlen function seemed to indicate the user input string had to be at most 16 bytes long. However, the loop executed 40 times, so the return value of Strlen had to be patched to allow the loop to finish execution.

From the code flow, I observed that each character of the user input was manipulated before being validated. It seemed the user input was encrypted using a stream cipher before being compared with the encrypted data stored in the local variables.
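
The effect of the patch on the loop can be illustrated with a toy model in Python. This is a simplification under my own assumptions: the names are made up, all four checks are assumed to run at the top of each iteration, and the per-character processing is left out.

```python
def run_loop(checks, rounds=40):
    """Return how many iterations complete before a check reports a debugger."""
    for completed in range(rounds):
        # Any non-zero return value makes the loop exit early.
        if any(check() != 0 for check in checks):
            return completed
        # The per-character processing of the real function is omitted here.
    return rounds


def debugger_detected():
    return 1  # what a check returns when it spots the debugger


def patched():
    return 0  # what a check returns once AL is forced to zero


under_debugger = run_loop([patched, patched, debugger_detected, patched])
after_patching = run_loop([patched] * 4)
```

With one check still firing, the loop stops before its first iteration completes; with all four returning zero, it runs the full 40 iterations.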

The sixth step was to figure out the correct user input to display the message "Well done! A+! You get a gold star!" I considered several options:
1. Brute force the user input using Winappdbg to inject the user input string and patch the anti debug checks
2. Brute force the user input against an implementation of the validation function Sub_109110B using Python
3. Brute force the user input against a reversed implementation of the validation function using Python
4. Using Angr!

I am not an expert on Angr. I had help from my friend Ronald, who spent a lot of time reading the Angr code and even fixed some of its bugs.

A few key considerations before using Angr.
1. Angr's Windows loader could be buggy.
2. Angr made use of the VEX simulation engine which had limited abilities in simulating Windows APIs.
3. Need to determine the state of memory before the start point of analysis
3. Need to determine the start point of the analysis
4. Need to determine the constraints on the user input
5. Need to determine the patches to the anti debug checks
6. Need to determine the address of paths to avoid
6. Need to determine the target address of the analysis

1. Angr's Windows loader could be buggy.
Angr might not work for all Windows binaries. I guess it takes practice and experience to determine which binaries work. For this challenge, Angr worked, and so I am doing this write-up.

2. Angr made use of the VEX simulation engine which had limited abilities in simulating Windows APIs.
Angr works best when analysing code without Windows APIs. I inserted hooks to 'remove' the Windows API calls and patched their return values.

Patch the Strlen function to always return 16 to pass the check:

Patch the security cookie check function:

3. Need to determine the state of memory before the start point of analysis
3. Need to determine the start point of the analysis

The start point of the analysis was the address at which the user input string is passed into the Sub_109110B function. The user input was set up as a symbolic value of 40 bytes (one per loop iteration) ending with a null byte. The pointer to the user input had to be pre-loaded into register ecx because of the instruction "Push ECX", which pushes the pointer to the user input as the argument to Sub_109110B.

4. Need to determine the constraints on the user input

The user input was constrained to be printable ASCII ending with a null byte.

5. Need to determine the patches to the anti debug checks

Hooks were placed to return zero for the 4 anti-debugging functions.

6. Need to determine the address of paths to avoid
6. Need to determine the target address of the analysis

Finally, set the address of the code reached when the validation is correct and the address of the code reached when the validation is wrong.

After correcting the bugs in my Python script, I used Angr to solve for the correct user input for challenge 1.
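
To tie the considerations together, here is a rough sketch of how such a script could look with current angr and claripy. It is not my original script, and the API names differ from the 2016 release. Every address is a placeholder to be read from your own disassembly. The `Targets` class and `solve` function are hypothetical names. The sketch also assumes each hooked call is a 5-byte "call rel32" whose callee does not clean up stack arguments.

```python
from dataclasses import dataclass
from typing import List

INPUT_LEN = 40  # one symbolic byte per loop iteration


@dataclass
class Targets:
    """Addresses to read off the disassembly; every field is a placeholder."""
    start: int                   # the "push ecx" just before the call to the validator
    find: int                    # block reached when validation succeeds
    avoid: List[int]             # blocks reached when validation fails
    strlen_call: int             # call site of strlen
    cookie_call: int             # call site of the security cookie check
    antidebug_calls: List[int]   # call sites of the four anti-debug functions
    input_buf: int = 0x10000000  # arbitrary unused address for the input buffer
    call_len: int = 5            # assumes 5-byte "call rel32" instructions


def solve(binary_path, t):
    import angr      # imported here so the sketch loads without angr installed
    import claripy

    proj = angr.Project(binary_path, auto_load_libs=False)

    def force_eax(value):
        # Replace a call instruction with "eax = value".
        def hook(state):
            state.regs.eax = value
        return hook

    proj.hook(t.strlen_call, force_eax(16), length=t.call_len)
    proj.hook(t.cookie_call, force_eax(0), length=t.call_len)
    for addr in t.antidebug_calls:
        proj.hook(addr, force_eax(0), length=t.call_len)

    # Start right where the input pointer is pushed, with ecx pointing at it.
    state = proj.factory.blank_state(addr=t.start)
    chars = [claripy.BVS("c%d" % i, 8) for i in range(INPUT_LEN)]
    for c in chars:
        state.solver.add(c >= 0x20, c <= 0x7E)  # printable ASCII only
    state.memory.store(t.input_buf, claripy.Concat(*chars, claripy.BVV(0, 8)))
    state.regs.ecx = t.input_buf

    simgr = proj.factory.simulation_manager(state)
    simgr.explore(find=t.find, avoid=t.avoid)
    if not simgr.found:
        return None
    return simgr.found[0].solver.eval(claripy.Concat(*chars), cast_to=bytes)
```

Calling `solve` with the path to the unpacked dump and a filled-in `Targets` returns the 40 input bytes if a path to the success block is found, or `None` otherwise.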