CS8803 - O11
Overview
● Goal
○ Analyze real-world malware samples and reveal their hidden behaviours, which normally stay invisible unless the malware is triggered properly.
● Learning Objectives:
○ Reverse Engineering
■ Manual Binary Reversing through Disassemblers (e.g. Ghidra, Radare2, IDA Pro)
○ Static Analysis
■ Programmatic binary analysis (e.g. data-flow analysis, value-set analysis (VSA))
○ Symbolic Analysis
■ Performing symbolic execution to recover all or part of the input
○ Dynamic Trace Analysis (User-level Activity Traces)
■ API Trace (e.g. File, Process, Registry, Network etc. related API trace)
■ BasicBlock Trace (Address of the basic blocks executed while malware is running)
○ Dynamic Binary Instrumentation
■ Manipulating concrete execution using DBI tools (e.g. DynamoRIO)
Please don't run the malware on your own computer! We are not responsible if you do. Only execute it inside the Windows VM we provide. These are real-world malware samples.
Task Description
● You have three real-world malware samples to analyze. Your boss wants you to analyze them and write a report within two weeks, and your colleagues are waiting for your results so they can adjust the defense mechanisms of the IDS.
● As a malware analyst, your job is to reveal as many of the behaviours as you can in order to build a strong defense mechanism. However, most malware (especially C2-based malware) reveals its hidden behaviours only when the proper inputs (e.g. commands from the C2 server) are provided.
● Now, you are going to analyze the three real malware samples with your best weapons (the tools we provide). You are going to perform:
○ Static analysis and reverse engineering on them to find where the triggering logic (e.g. CMD dispatching logic) is located.
○ Symbolic execution to figure out the commands that will trigger hidden behaviors
○ Dynamic binary instrumentation to manipulate execution paths, trigger the hidden behaviors, and collect the traces that correspond to those behaviors.
Agenda
● Part 1: Analyze malware1.exe (mydoom1.exe), which is similar to previously analyzed malware. Trigger the malicious behaviours and observe the effects on the OS. (60 pts)
○ Skills:
■ Reverse Engineering
■ Static Analysis
■ Symbolic Analysis
■ Limited Dynamic Analysis
● Part 2: Analyze malware2.exe (win33.exe), which is new to your company. Trigger the malicious behaviours and observe the effects on the OS. (30 pts)
○ Skills:
■ Reverse Engineering
■ Static Analysis
■ Symbolic Analysis
■ Extensive Dynamic Trace Analysis
■ Dynamic Binary Instrumentation and Execution Modification
● Part 3: Analyze one newly discovered complex malware (unknown.exe) and figure out the CMD Dispatching Logic without any prior investigation.
○ Static Analysis (analyze 10 functions) (10 pts)
■ Reverse Engineering
● Bonus: Perform further sophisticated analysis and reveal the malware behaviour:
○ Detecting whole or part of the input (5 pts)
■ Reverse Engineering
■ Symbolic Analysis
○ Triggering the malicious behaviour and observing the OS interaction (5 pts)
■ Dynamic Binary Instrumentation
■ Dynamic Trace Analysis
Note: Even if you can’t figure out some part of the malware, you can still include any meaningful effort toward revealing its behavior. We will assign credit accordingly.
High-level Workflow
● Your company has a static analysis tool built on Ghidra (developed by the NSA) to detect important logic such as the CMD Dispatching logic, which is responsible for determining which malicious behaviour to perform. However, this tool only produces a list of candidate functions.
● Your first task is to perform Reverse Engineering using Ghidra to detect the real CMD Dispatching logic. To do so, you need to analyze each candidate function until you find the actual CMD Dispatching logic.
● Second, after discovering the CMD Dispatching logic, you are required to write symbolic execution scripts using angr that output whole or partial inputs for each behaviour.
● Third, you have to properly trigger the malicious behaviours and execute them during concrete execution. The Dynamic Execution Trace will show the malicious activities performed by the malware samples.
Available Artifacts (Free Lunch?)
● Last week, one of your colleagues analyzed a sample malware (sample.exe) and created a report.
● That malware looks very similar to malware1.exe, which is one of the three malware samples you will analyze over the next two weeks.
● That report is quite informative.
● Fortunately, the report is open to you, and you can take advantage of your colleague’s findings.
● Having said that, the report does not explain how to actually execute the behaviour. Your colleague failed to show the effects on the OS, such as file system or network operations, and therefore failed to prove the nature of the malware sample.
Tutorial (your coworker’s report)
Let’s scrutinize the report for sample.exe.
Before you begin
● Set up a Virtual Machine for Malware analysis
○ Please install/update to the latest version of VirtualBox
■ https://www.virtualbox.org/wiki/Downloads
● Download the VM image
○ Download the project VM from the following link
■ https://drive.google.com/open?id=1a_1U2UQKQ0268NApFnFHfV2HETZZEu4Z
■ Note: students have the freedom to adjust VM specs after import
● Open VirtualBox
○ Go to File -> Import Appliance.
○ Select the ova file and import it
● VM user credentials
○ Ubuntu VM password: 123456
○ Windows 7 VM Password: 123456
Project Structure
● VM Structure
○ A Windows 7 VM inside Ubuntu 18.04 VM
■ Windows 7 VM for dynamic analysis
● Target Malware can be executed in Windows 7 VM
■ Ubuntu VM for static analysis
● A container to block malware from going out
● In the Windows Virtual Machine(VM)
○ C:\code\concrete_executor
■ $ python run.py ## runs the tracer program
■ \tracer ## tracer folder (where you need to write some code)
○ C:\code\dynamorio
■ DynamoRIO source code and an already built client
○ \\VBOXSVR\VM_Share
■ Shared folder with host VM
● Go to Ghidra folder and Run Ghidra
■ $ ./ghidraRun
● Use “import file” to import binary into Ghidra project
● Double-click the binary file that appears under the project to open the Ghidra GUI and analyze the binary. Then close the GUI and Ghidra completely to save the result into the Ghidra database
● There will be a warning about PDBs, which won’t affect your analysis
● To get some basic knowledge of the malware, you usually start by analyzing it with a disassembler such as Ghidra, Radare2, Binary Ninja or IDA Pro; Ghidra is free and open source (NICE).
● Luckily, your coworker sent you a static analysis tool based on Ghidra that detects possible dispatching logic. All you need to do is iterate over the candidates and determine the real dispatching logic.
● To use the provided Ghidra Script
● Close Ghidra project GUI
● In the /ghidra_9.1.1_public folder, go to /support; there you will find a file called analyze.sh
● This wrapper script runs the ListFunctionAsCodeBlock.java script with a headless analyzer and pipes the output into output.txt
● Usage: $ ./analyze.sh {malware.exe}
■ (just the file name is enough, no need to prepend the path)
● To be able to run the script, you MUST first analyze the malware in Ghidra and save the result into Ghidra database
● Note: Ghidra must be closed completely, in order to run the headless script
● At the end of its output, the script lists the top 50 longest chains of basic blocks that use a certain variable.
● To view these chains, run $ tail -53 output.txt
● Reopen the Ghidra GUI
● Inside the Symbol Tree, you can find a folder called Functions.
● The first function in your coworker’s Ghidra script’s output is FUN_00804d32, so you decide to double click it
● The Decompile window on the right shows the decompiled version of the function you just double-clicked, so you can check its logic in a more readable form
● You notice that the dispatching logic sits between lines 25 and 31 (lucky!)
● You want to see a clearer form of the control flow graph, so you click on the Display function graph button and the CFG pops up.
● Now that you know where the dispatching logic is, you want to write a symbolic execution script using angr that outputs whole or partial inputs for each behaviour.
● Note: Your script should run with python3
● In essence, you want to find the keywords that will trigger the malware’s activities
● Let’s take another look at our CFG for the Dispatching Logic
● This chain of basic blocks(0x804d6a, 0x804da2, 0x804db6, 0x804dca) is responsible for the dispatching logic.
● The chain uses a function to determine the triggering logic -- strstr.
● Every strstr call takes the pointer held in the EAX register as its first parameter and a constant as its second parameter.
● Since all the second parameters are constants, we should focus on the first argument, which is pointed to by EAX.
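As a rough mental model of this chain, the sketch below mimics a strstr-based dispatcher in Python. The keyword strings and behaviour names are hypothetical placeholders, not values recovered from sample.exe; only the shape of the logic (one constant keyword per basic block, checked in order against the same buffer) follows the description above.

```python
# Toy model of a strstr-based command dispatcher.
# The keywords and behaviour names below are hypothetical placeholders.
HYPOTHETICAL_COMMANDS = [
    ("cmd_a", "behaviour_a"),
    ("cmd_b", "behaviour_b"),
    ("cmd_c", "behaviour_c"),
]


def dispatch(buffer):
    """Return the behaviour selected for `buffer`, or None if nothing matches."""
    for keyword, behaviour in HYPOTHETICAL_COMMANDS:
        # strstr(buffer, keyword) != NULL corresponds to `keyword in buffer`.
        if keyword in buffer:
            return behaviour
    return None


print(dispatch("xx cmd_b yy"))
print(dispatch("nothing here"))
```

Because the keywords are constants, the only unknown is the buffer, which is why the analysis concentrates on where the first argument comes from.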
● If you go backward from the chain of basic blocks, you can see that the EAX value is determined in basic block 0x804d6a.
● The call to lstrcpyA at 0x804d79 copies the buffer pointed to by ESI to the address held by EAX -- EAX being the first argument and ESI being the second argument of lstrcpyA.
● However, the source of the copy operation is itself just another register.
● We need to perform further static backward slicing to detect the original source of the command.
● While going backward in the CFG, you will see the last update to the ESI register is performed at 0x804d44 with mov ESI, [EBP + 8] .
● In x86 calling conventions (e.g. cdecl, stdcall), EBP+8 points to the first argument passed to the function.
● Hence, we now know the symbolic analysis needs to solve for the value of the first parameter (EBP + 8) passed to the function at 0x804d32.
● Now, we know where the variable that is passed to the malware resides, and can find keywords that reach the target function in the malware!
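The EBP + 8 rule used in the slicing step can be written down as a small helper. This is a minimal sketch that assumes 32-bit x86, the standard prologue (push ebp; mov ebp, esp), and arguments that each occupy one 4-byte stack slot; the function names are made up for illustration.

```python
WORD_SIZE = 4  # bytes per stack slot on 32-bit x86


def arg_offset_from_ebp(arg_index):
    """Offset from EBP of the arg_index-th (0-based) stack argument.

    Frame layout after the standard prologue:
      [EBP + 0]  saved EBP
      [EBP + 4]  return address
      [EBP + 8]  first argument, [EBP + 12] second argument, ...
    """
    if arg_index < 0:
        raise ValueError("arg_index must be non-negative")
    return 2 * WORD_SIZE + arg_index * WORD_SIZE


def arg_index_from_offset(offset):
    """Inverse mapping: which argument does [EBP + offset] hold?"""
    if offset < 2 * WORD_SIZE or offset % WORD_SIZE != 0:
        raise ValueError("offset does not address a stack argument")
    return (offset - 2 * WORD_SIZE) // WORD_SIZE


print(arg_offset_from_ebp(0))    # offset of the first argument
print(arg_index_from_offset(8))  # index of the argument stored at EBP + 8
```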
● Fortunately, your colleague also had a symbolic execution script for sample.exe (NOTE: sample.exe is similar to malware1.exe!).
● The report also shows the start and end addresses for the symbolic execution -- start:0x804d32 end:0x804dde
● From your terminal go to the ~/Desktop/symbolic_execution/ folder and run
○ python sample_inputs.py --start 0x804d32 --end 0x804dde
● It will print out one of the concretized inputs that the malware requires to exhibit its malicious behaviour.
Tutorial - Symbolic Execution
● However, there is no free lunch.
● Using the same start and end addresses, you need to find all the commands.
○ Let’s analyze the code snippet below. When performing symbolic execution, the symbolic execution engine can find the path condition for ddos_attack as message[0] == 101
○ Once you have explored ddos_attack, you need to explore the other branches.
○ To explore other branches, you need to add a constraint such as message[0] != 101, so that the path condition for ddos_attack becomes infeasible.
○ Once you are done with send_spam, keep adding the inverted conditions to explore the other behaviours.
○ Copy sample_inputs.py as malware1_inputs.py, and modify the code.
○ Add a constraint for each found input such as
■ parameter[0] != found_input[0]
○ Extra hint: you should use an inverse_constraints array to output all strings in one run
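The idea of excluding each found input and solving again can be illustrated without angr. The sketch below is a toy stand-in: the dispatcher is hypothetical (only the ddos_attack condition message[0] == 101 comes from the text; the send_spam byte is made up), and the "solver" is a brute-force search rather than claripy. In the real script the same loop would add each inverse constraint to the angr state, for example through state.solver.add, instead of appending to a Python list.

```python
from itertools import product


def dispatch(message):
    """Hypothetical dispatcher in the spirit of the snippet discussed above."""
    if message[0] == 101:   # 'e', the ddos_attack condition from the text
        return "ddos_attack"
    if message[0] == 115:   # 's', a made-up condition for send_spam
        return "send_spam"
    return None


def solve(constraints, length=1):
    """Brute-force stand-in for a solver: first input meeting all constraints."""
    for candidate in product(range(256), repeat=length):
        message = bytes(candidate)
        if all(check(message) for check in constraints):
            return message
    return None  # unsatisfiable


def enumerate_commands():
    """Collect one input per behaviour by excluding every input found so far."""
    def reaches_behaviour(message):
        return dispatch(message) is not None

    inverse_constraints = []
    found = []
    while True:
        solution = solve([reaches_behaviour] + inverse_constraints)
        if solution is None:
            break
        found.append((solution, dispatch(solution)))
        first = solution[0]
        # Equivalent of parameter[0] != found_input[0]
        inverse_constraints.append(lambda m, first=first: m[0] != first)
    return found


print(enumerate_commands())
```

Keeping the inverse constraints in one list is what lets a single run print every command instead of only the first one the solver happens to pick.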
Extra Help for Angr:
Angr’s Solver Engine: Claripy && How to add constraints
https://docs.angr.io/advanced-topics/claripy
https://docs.angr.io/core-concepts/solver
Tutorial - Dynamic Analysis
● Now, it is time to practice Dynamic Binary Instrumentation and Dynamic Trace Analysis.
● For this purpose, we already have a DynamoRIO client that captures the API traces and basic block addresses during concrete execution.
● Your colleague’s report also shows how to use and manipulate the values of the function arguments.
● The report shows that the execution successfully reaches 0x804dde.
● However, your colleague again failed to capture and prove what the malware is doing.
● We need a secure experiment environment to execute the malware.
● Why?
○ An insecure analysis environment could damage your system:
■ Encrypting your files during a ransomware analysis
■ Infecting machines in your corporate network during a worm analysis
■ Creating tons of infected bot clients in your network during a bot/trojan analysis
● The solution:
○ Contain malware in a virtual environment
■ Virtual Machine
■ Virtual Network
● Conservative rules (allow network traffic only if it is secure)
● Run $ sudo iptables -A OUTPUT -o enp0s3 -j DROP
● You can check that the rule exists by typing
○ $ sudo iptables -L
○ The OUTPUT chain should list: DROP all -- anywhere anywhere
● Keep the firewall rule to prevent malware traffic going out.
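If you would rather script the firewall check than read the listing by eye, something like the following could work. It is a sketch that parses text in the default `iptables -L` layout; the sample listing is illustrative, and the function name is made up. Note that without `-v` the listing does not show the interface, so this only confirms that a DROP-all rule is present in the OUTPUT chain.

```python
def output_chain_drops_all(listing):
    """Return True if the OUTPUT chain in `iptables -L` text has a DROP-all rule."""
    in_output_chain = False
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Chain":
            # A new chain section starts, e.g. "Chain OUTPUT (policy ACCEPT)".
            in_output_chain = len(fields) > 1 and fields[1] == "OUTPUT"
            continue
        if in_output_chain and fields[:5] == ["DROP", "all", "--", "anywhere", "anywhere"]:
            return True
    return False


# Illustrative listing in the default `iptables -L` layout.
SAMPLE_LISTING = """Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
"""

print(output_chain_drops_all(SAMPLE_LISTING))
```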
● Although static analysis can give a lot of information about the malware, it is important for a malware analyst to see how the malware behaves when it is executed
● Your next task is to use DynamoRIO to perform dynamic analysis.
● With the provided code in C:\code\concrete_executor, you can feed the malware different inputs to see what it does
● Run the Windows 7 VM in VirtualBox inside the Ubuntu VM
● Password: 123456
● You are given a snapshot (Base) to act as a backup for your analysis
● If something bad happens on your Windows 7 testbed, always revert to the Base snapshot.
● Basic Structure:
■ C:\code\concrete_executor
● Directory contains DynamoRIO scripts
● run.py: Python script to run the DynamoRIO server
■ \\VBOXSVR\VM_Share
● Shared folder with the host Ubuntu VM
● You can find VM_Share on ubuntu desktop
● In VM_Share
■ Malware/
● where you need to put the malware sample
■ analysis/
○ mw.analysis: DynamoRIO task assignment file
○ sample/: sample folder containing a copy of sample.exe and the trace files that DynamoRIO produced by running sample.exe in the Windows VM
● To assign a task to the Dynamorio script server, first write the following into mw.analysis file:
trace release only_config_libcalls {binary name} {starting address} {folder name}
e.g. trace release only_config_libcalls sample 0x800000 sample
(as the program starts from 0x800000)
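To avoid typos when preparing tasks for several samples, the task line can be generated. This is a small sketch based only on the template shown above; the helper name is hypothetical.

```python
def build_task_line(binary_name, start_address, folder_name):
    """Format one mw.analysis task line following the template above."""
    return "trace release only_config_libcalls {} {:#x} {}".format(
        binary_name, start_address, folder_name
    )


# Reproduces the example task for sample.exe.
print(build_task_line("sample", 0x800000, "sample"))
```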
● To run the DynamoRIO server, open the Windows 7 VM
● Open a command prompt with admin privileges
● Search for “cmd” and right-click cmd
● Select “Run as administrator”
● Go to C:\code\concrete_executor
● Run > python run.py
● Don’t worry about the “copy failed” error messages in the output; they are for other purposes. As long as you can see the trace file copied over to the shared folder, the tracer program is running correctly
• After the script finishes running, it writes “DONE” back into the mw.analysis file.
• If you want to re-run the same task, delete “DONE”, go back to the Windows 7 VM and execute python run.py in C:\code\concrete_executor
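Resetting a task can also be scripted from the Ubuntu side of the shared folder. The sketch below assumes that the marker is the literal token DONE, either on its own line or appended to the task line; the exact placement is an assumption, so check your own mw.analysis before relying on it. The helper names are made up.

```python
def strip_done(text):
    """Remove every standalone DONE token and drop lines left empty."""
    kept = []
    for line in text.splitlines():
        tokens = [token for token in line.split() if token != "DONE"]
        if tokens:
            kept.append(" ".join(tokens))
    return "\n".join(kept) + ("\n" if kept else "")


def reset_task_file(path):
    """Rewrite the task file at `path` without the DONE marker."""
    with open(path) as handle:
        cleaned = strip_done(handle.read())
    with open(path, "w") as handle:
        handle.write(cleaned)
    return cleaned


print(strip_done("trace release only_config_libcalls sample 0x800000 sample\nDONE\n"))
```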
● The trace file will be copied back to the shared folder where the malware sample sits
● Now that you have obtained the trace file, you need to analyze the malware traces to find out how to trigger the behaviors
● There will be some breakpoints set by DynamoRIO around certain behaviors. Check out the code block that contains the actual BRK address in Ghidra and find a way to lure the malware into your desired path.
● Analyze the trace files to see the default behavior of the malware sample.
● While checking the first trace file, we found that after the sample.exe thread initializes, it invokes a few kernel operations to manage the key and file and then creates three more threads
● As you can see, after the threads have been created, the malware called GetModuleFileNameA and started tons of ReadFile operations
● This indicates that the malware is trying to find or read from a certain file/directory/socket in the system!
● For details on Win32 API calls, see: https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile
● At the end of the first trace file, the malware creates 4 socket connections. After recv returns, one more thread is created and it then issues a send.
● Hmm, an interesting network connection. It may be worth digging into the breakpoints in the other trace files to see what these calls are really doing.
● The rest of the trace files contain more detailed traces, with breakpoint addresses set both before and after the kernel operations you found in the first trace file. Now it is time to go back to static analysis and check the breakpoint addresses in the disassembler to see what the malware is doing at each point.
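A quick tally of the API names mentioned above can help when a trace is long. The sketch below assumes only that each API name appears as plain text somewhere in a trace line; the sample lines are made-up placeholders, and the real trace format produced by the provided client may look different.

```python
import re
from collections import Counter

# API names discussed for the first trace file of sample.exe.
APIS_OF_INTEREST = ("GetModuleFileNameA", "ReadFile", "recv", "send")


def count_api_calls(trace_lines, api_names=APIS_OF_INTEREST):
    """Count the lines that mention each API name as a whole word."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b") for name in api_names
    }
    counts = Counter()
    for line in trace_lines:
        for name, pattern in patterns.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up placeholder lines; the real trace layout is not assumed here.
EXAMPLE_LINES = [
    "call GetModuleFileNameA",
    "call ReadFile",
    "call ReadFile",
    "call recv",
    "call sendto",
    "call send",
]

print(count_api_calls(EXAMPLE_LINES))
```

Whole-word matching keeps send from being counted when the line actually mentions a different function such as sendto.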
● You can also use Ghidra to find and analyze all the breakpoint addresses to see the actual path the malware executes
● If necessary, find the code block and the corresponding check instruction that prevents the malware from going further to the code blocks containing the C2 command (you should already know the target function that contains the dispatching logic from static analysis, and the triggering command that you need to feed into the malware to activate the behavior)
● How can you wrap the data to define what its value is?
● Add the following two functions to C:\code\concrete_executor\tracer\libcall_handler.cpp, with their declarations in the .h file:
static void wrap_pre_target(void *wrapcxt, OUT void **user_data) {
    // Overwrite the target function's first argument with the command string.
    char *buf = (char *) drwrap_get_arg(wrapcxt, 0);
    strcpy(buf, "[Symbolic Execution Generated String]");
}
static void monitor_target_function(void *drcontext) {
    // Run wrap_pre_target before every call to the function at this address.
    app_pc tgt_function = (app_pc) 0xADDRESS;
    drwrap_wrap_ex(tgt_function, wrap_pre_target, NULL, NULL, 0);
}
● Read more about how to wrap functions in Dynamorio using this link
● Check the given skeleton code for more insight on how to feed malware the correct input
● To compile your edited code, execute build.bat in the Windows VM
● Once you build the code, reset the DynamoRIO task by deleting the “DONE” in mw.analysis and run > python run.py in C:\code\concrete_executor
● DynamoRIO will give back new traces.
● Analyze the traces with Ghidra to see if any more barriers exist that prevent the malware from reaching the target function
● Record any meaningful behaviors in the Excel sheet along the way.
Explore all behaviors of malware1.exe and malware2.exe
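One way to look for remaining barriers is to compare the basic block addresses in a new trace against the blocks you expect on the way to the dispatching logic. The sketch below uses the sample.exe addresses from the report (function entry 0x804d32 and the block 0x804d6a) as the expected path; it assumes the trace has already been parsed into a list of integers, which depends on the trace format, and that both addresses lie on the path you want. For the other samples you would substitute your own addresses.

```python
# Expected blocks for sample.exe, taken from the report; replace for other samples.
EXPECTED_PATH = [0x804d32, 0x804d6a]


def first_unreached_block(executed_blocks, expected_path=EXPECTED_PATH):
    """Return the first expected block missing from the trace, or None."""
    executed = set(executed_blocks)
    for address in expected_path:
        if address not in executed:
            return address
    return None


# Illustrative trace that enters the target function but stops before 0x804d6a.
partial_trace = [0x800000, 0x804d32]
missing = first_unreached_block(partial_trace)
print(hex(missing) if missing is not None else "all expected blocks reached")
```

The block just before the first missing address is a natural place to start looking in Ghidra for the check that diverts execution.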
● For malware2:
○ In libcall_handler.cpp, modify
■ wrap_pre_target(void *wrapcxt, OUT void **user_data)
■ monitor_target_function(void *drcontext)
■ the body of wrap_pre_lib() (hook the network API call to emulate the attacker’s command)
○ In tracer.cpp
■ You are encouraged to add code in instr_should_instrument() and event_app_instruction() to manipulate malware instructions so that execution passes or bypasses certain condition checks.
■ Helper link: http://dynamorio.org/docs/API_BT.html
Summary
● You are given three malware samples:
■ malware1.exe (mydoom1.exe)
■ malware2.exe (win33.exe)
■ malware3.exe (unknown.exe)
● We suggest you analyze them in the following order:
■ malware1.exe, malware2.exe, malware3.exe
● We also provide an angr skeleton script, sample_inputs.py, in the Ubuntu VM to help analyze the behavior; you can check out the code and extend it
● None of the provided tools is guaranteed to work on malware3.exe. Just try your best.
Additional Help for Malware 3
As you already know (or will soon find out), Malware 3 is obfuscated by its author, which is usually the case in real life. One of the most important tasks for a malware analyst is to recover the real payload of the malware.
For the sake of your life and happiness, we deobfuscated the first two malware samples (free lunch here :>) and left the third sample intact, as collected from the network, so you can learn some deobfuscation.
Obfuscation:
https://www.mcafee.com/blogs/enterprise/malware-packers-use-tricks-avoid-analysis-detection/
https://www.vadesecure.com/en/malware-analysis-understanding-code-obfuscation-techniques/
Deobfuscation:
https://www.youtube.com/watch?v=4VBVMKdY-yg
Resources
● Ghidra:
■ https://vimeo.com/335158460 (1h thorough introduction video)
■ Other tutorials available online
● Dynamorio
■ http://dynamorio.org/tutorial.html
■ https://github.com/DynamoRIO/drmemory/tree/master/drltrace
■ https://github.com/mxmssh/drltrace
● Angr
■ https://docs.angr.io/core-concepts