
Lab 7: Supplementary Information
CS 6264-OCY
"...and we aim to show stronger improvements starting next fiscal quarter." This all-hands call is taking forever, you think to yourself while daydreaming about where you should go for your holiday in two weeks.
"...will be taking the lead with our next client. We
expect great things from them."
Wait. Did your boss just say your name? "The client is an IDS vendor whose product uses machine learning models to identify malware. However, they have noticed that their models are frequently evaded and hope that we can find out why."
You don't remember being told about this, but at this point, you guess you're used to it. You're just thankful your coworker was taking notes during the meeting and gave you some tutorials on MLSploit, a framework you are expected to use. Later, you also found some information about the attack.
Assignment
The purpose of this assignment is to gain experience with training machine learning (ML) and deep learning (DL) models to classify Windows portable executable (PE) malware into families.
Specifically, the models will be given two different datasets: benign PE files and malicious PE files from multiple families. After training the DL models, you will attack them using an evasion technique called the mimicry attack. Then, you will be tasked with improving the models that were attacked. Finally, you will train an ML model using different features and see whether the mimicry attack still works. You will write a report about your experiences and observations.
There are 5 tasks and a bonus task you will need to complete for this assignment. They include:
• Training DL Models (10%): Train LSTM, CNN, and RNN models on API call sequences.
• Attacking DL Models (10%): Attack the models via the mimicry attack.
• Detecting the Attack (20%): Train a model based on static features to detect the attack samples.
• Training ML Models (10%): Train classical ML models on API call existence, frequency, and arguments.
• Transferring the Attack (10%): Run the mimicry attacks in a controlled environment and evaluate the ML models.
• Attacking the ML Model using RL (10% bonus): Train an ML model using EMBER and train an RL model to evade it.
You will also need to compile a report (40%) that should contain screenshots of your findings and explanations of why each screenshot shows what it does. For example, if your screenshot compares how well different models detected the attack in Task 2, include an explanation of why the results differed. To complete the tasks, you will also need the provided files.
Supplementary Material: Lab 7_Supplementary_Material.pdf, Task 3 Template

Deliverables
Compress the deliverables for each task into a .tar.gz file called [GT Username]_cs6264_lab07.tar.gz with the following directory layout:
• task1/
  o pe.model.zip
  o prediction.zip
  o *.log.txt files
• task2/
  o attack-exe.zip: the attack samples generated from MLSploit to evade your models
  o attack-feature.zip
  o attack-prediction.zip
  o attack.cfg.zip: configuration file from MLSploit
  o *.log.txt files
• task3.a/
  o detection1.py: source code (preferably in Python) that will train a new model that will detect the attack from the previous task
  o detection2.py
  o detection3.py
  o (others here if you wish)
• task3.b/
  o model1.zip
  o model2.zip
  o model3.zip
  o (others here if you wish)
• task4/
  o pe.model.zip
  o prediction.zip
  o *.log.txt files
• task5/
  o task1 model/
    ▪ prediction.zip
    ▪ *.log.txt files
  o task4 model/
    ▪ prediction.zip
    ▪ *.log.txt files
• report.pdf: This report should contain screenshots of your findings and explanations of why each screenshot shows what it does. For example, if your screenshot compares how well different models detected the attack in Task 2, include an explanation of why the results differed.
• bonus/
  o model.zip
  o *.log.txt files
  o ember-attack.zip
  o *.log.txt files
Note: the malicious binaries are real malware samples in existence (https://github.com/ytisf/theZoo). We have not applied any static obfuscation to them, so they should be easily detectable by AV companies. You are to use these binaries responsibly by only reading their byte contents (e.g., using tools like https://github.com/erocarrera/pefile).
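For example, a minimal sketch of inspecting a binary's byte contents statically with pefile (sample.exe is a hypothetical stand-in for one of the provided binaries; the file is only parsed, never executed):

import pefile

# Parse the PE headers of a sample without executing it (hypothetical path)
pe = pefile.PE("sample.exe")

# Basic header information
print(hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint))
print(pe.FILE_HEADER.NumberOfSections)

# Enumerate sections and their raw sizes
for section in pe.sections:
    print(section.Name.rstrip(b"\x00"), section.SizeOfRawData)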


Overview
● Introduction
● Training Deep Learning Models
● Attacking Deep Learning Models
● Defending DL Models with ML
● Cat and Mouse
● Further Reading
Introduction
● Different variants of the same malware will generally all exhibit the same behaviour
● One way to detect an entire family of malware is to train a machine learning model on the behaviour of that family, so that when the model sees a new variant, it can still classify the malware as belonging to a specific strain
● This is not only faster but also allows you to scale your malware detection far beyond classifying each sample by hand
● Just like the pattern-recognition Host-Based IDS you created in lab 3, these models classify and identify syscall sequences (among other parameters)
MLSploit
● Helps you train and test different machine learning solutions against machine learning attacks
● Made with a simple GUI for usability

● See the tutorials in the lab files for more information on how you should operate MLSploit
Training a DL Model
● To explore the world of machine learning security, we will first construct a few models to test later
● We want to test LSTM, CNN, and RNN models
● You can change the window size (i.e., the length of the sequence of syscalls that the model compares at each time step) on the right side of the UI for more accurate models
● A full-length tutorial can be found in Canvas called Lab07_PE_Module_Tutorial.pdf
Attacking a DL Model
● Now that our models are created, let’s see how easily they can be attacked
(AKA tricked into thinking a malicious file is benign)
● In MLSploit, we will create a new pipeline that will first perform a “mimicry attack” against a benign application to figure out what a benign application might do to evade the machine learning model
● Next in the pipeline, we transform the benign application by injecting 10 different shellcode chunks into it, creating 10 samples that might evade the machine learning model by making it think that it was benign
● These steps are also detailed in the same tutorial
Identifying Malware Meant to Trick DL with ML
● As we have learned in this course, combining both static and dynamic analysis makes for much more robust malware identification
● To incorporate static analysis into our process, we will need to train an ML model on the static features of normal programs so that it can identify the differences in a program that was injected via the mimicry attack
● We have provided static features of benign programs for you, so first we will extract features of the malicious programs you created in the previous step
● We will be using EMBER
● To extract features the same way EMBER does, check out how PEFeatureExtractor is used in scripts/ember_init.py
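As a rough sketch of what that extraction might look like (assuming the EMBER package from the mlsploit-pe fork is installed and attack.exe is a hypothetical path to one of your attack samples; the exact import path may differ in the fork):

import numpy as np
from ember.features import PEFeatureExtractor

# Read the raw bytes of a sample (hypothetical path)
with open("attack.exe", "rb") as f:
    bytez = f.read()

# Turn the raw bytes into EMBER's fixed-length static feature vector
extractor = PEFeatureExtractor()
features = np.array(extractor.feature_vector(bytez), dtype=np.float32)
print(features.shape)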

● After you have static features extracted for both, let us move on to training a model to identify malware based on static features
● First, we must prepare the data for the ML algorithm
● Note that we are using a classification algorithm as this is a classification problem
○ We have some labels (malicious vs. benign) and we want to tie them to a set of features
○ First, we make a table with the features and a label for each set of features
● Next, we will split this table into a table of features and a table of labels
● You can do this with numpy.hsplit() or numpy.split(axis=1):
x, y = numpy.split(dataset, [-1], axis=1)
● Now, you can create a training set with scikit-learn’s train_test_split() method
● This will output a training and testing array for both features and labels for the ML algorithm to use
● Now, we can finally train a model
● Initialize a classifier like the DecisionTreeClassifier
● Then, use the class’ fit() method to train your model
● To figure out the accuracy of your model, first use the model to predict labels for the test array of features, and then get the accuracy score using scikit-learn's accuracy_score() method:
y_pred = dt.predict(x_test)
print(accuracy_score(y_test, y_pred))


Cat and Mouse

● You can do something similar with MLSploit
● Follow the tutorial included with the project and create traditional ML models
● You can also try to run the same attack from before against these models
Further Reading
Useful Example Code
Mlsploit-pe (has useful fork of EMBER)
GitHub - evandowning/mlsploit-pe: MLSploit PE module

Scikit-learn documentation
API Reference — scikit-learn 0.24.1 documentation

ML Evasion
Mimicry attacks on host-based intrusion detection systems | Proceedings of the 9th ACM Conference on Computer and Communications Security
[1804.04637] EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models (arxiv.org)

[1801.08917] Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning (arxiv.org)
