
Introduction to Malware Analysis


Why malware analysis

Malware analysis (“MA”) is a fun and exciting journey for anyone new or seasoned in the career field. Taking a specimen (malware sample) and reverse engineering it to better understand its inner workings can be a long, tedious adventure. The sheer number of malware samples circulating the internet, in addition to the various formats specimens are found in, makes malware analysis a good challenge. Outside of learning MA as a hobby, here are some other reasons why we perform malware analysis:

  • To better understand how a specimen works. This may yield certain unique attributes about how the malware was written, methods it performs or its dependencies.
  • To collect intelligence and build Indicators of Compromise (“IOCs”), usually comprised of Host Based Indicators (“HBIs”) and/or Network Based Indicators (“NBIs”).
  • For general knowledge or research purposes.

How do I get started?!

If you’re new to malware analysis, you want to ensure you’ve taken the right precautions before handling any malicious code. This series of posts will cover the following objectives:

  • Gather additional readings and resources that helped me get started.
  • Stay current on malware trends and the threat landscape.
  • Understand operational security and why it’s important.
  • Build your local malware lab.
  • Build your malware analysis sandbox (Cuckoo).
  • Review basic malware analysis techniques.
  • Get hands-on with some malware samples and tools commonly used by malware analysts.
  • Learn how to write a findings report based on your analysis.
  • Learn how to extract threat intelligence from your analysis.
  • Learn to be patient! No one learned this all in one night. Take your time and have fun! If you get frustrated, get up, go for a walk and install some java (coffee). Some of the most complex problems have been solved by walking away from the problem.

Resources to get you started

Books

Research/training websites

Operational Security (OpSec)

When handling malicious code, there are some best practices that should be abided by to avoid any adverse effects of working with malicious code. Especially if you perform any malware analysis at your day job, you don’t want to be responsible for infecting your work laptop, having the malware spread to other systems, or worse, grant an adversary access to your system.

  • Don’t visit an attacker’s infrastructure, especially from corporate or company networks: We call this “don’t poke the bear”. Why?! I’m glad you asked… For starters, you may inadvertently tip off the attackers. If you visit the attackers’ server(s) manually (e.g. web browser, wget, curl, etc.) or, depending on the specimen you’re working with, run it and allow it to communicate back to the attackers’ server(s), the attacker may notice this activity coming from your Internet Protocol (“IP”) range or from a different user agent. If the attacker deployed the malware to a group of specific target networks or regions, this would look suspicious to the attacker. As a result, the attacker could burn down their infrastructure, document your IP space or roll out new malware to the target. While these may be extreme cases, it’s generally good practice to not let the malware communicate over the internet unless you know what you’re doing.
    • In some cases, you may need the malware to have some level of network connectivity so the specimen can continue executing. Some malware samples test for internet connectivity to common sites such as Google before they continue executing. In this case, you have a few options:
      • Simulate a network connection: Two tools I’ve commonly used are INetSim and FakeNet. Both of these tools can simulate common internet services or provide dummy responses. In my experience, I’ve found that INetSim is far more robust than FakeNet, but has a learning curve.
      • Real service simulation: In some cases, the malware you’re analyzing may only be looking for a specific file over port 80 (HTTP). In this case, you could just spin up a web server and create a dummy file for the malware to acquire. If I need to go this route, I usually just stand up an Apache, MySQL and PHP stack using this simple bundle: https://www.ampps.com/download. Then, I can use a tool like ApateDNS to redirect all DNS requests to my local system.
      • Use TOR, a VPS or a VPN provider to proxy outbound traffic: In other situations, you may want the malware to have full communication to the internet to gain a full understanding of its execution chain. In this case, you need to ensure you have the proper isolation, monitoring and protections in place before continuing.
  • Search, don’t submit: While public tools such as VirusTotal (“VT”, https://www.virustotal.com) are great resources when analyzing unknown malware samples, it’s best practice to not submit anything to public sandboxes or repositories. Why? Two reasons come to mind:
    • Tracking: When you submit anything to VT, it tracks you by country and a unique submission ID. This can be used to gain insight into when a sample was submitted, the submission method, and where it was submitted from. On the bright side, this can also be used against attackers. Some attackers may submit a sample to VT prior to beginning an attack to ensure the samples won’t be detected by the customers’ antivirus. As defenders, we could use this information to track what samples the attacker has submitted to VT, among other things.
    • Information leakage: When submitting anything unknown to VT, it may contain sensitive information about a victim company, or perhaps, hardcoded credentials. Most of the time, malware analysts just query services like VT for information regarding a hash, IP address, domain or other type of metadata.
  • Never run the malware outside a guest VM: Unless you have some hardware level imaging/cloning, read/write-only tools or related utilities, you should avoid analyzing any samples on your host system.
  • Never run the malware on a guest VM with a network connection: As stated before, networking on your guest VM should be disabled unless you have the proper monitoring and protections in place.
  • Never run the malware on a guest VM with a shared drive to another system: Let’s say you’re working with ransomware. If you allow your guest VM to mount a shared drive with another system or even a shared folder with your host, all files inside the shared resource may get encrypted by the ransomware.
  • Avoid running the malware on a guest VM with connected peripherals: Avoid sharing peripherals with your guest VM. When performing malware analysis, peripherals such as USB devices may get infected or allow the malware to spread.
  • Take snapshots of your VM in a base state: Virtual machine applications such as VMware, Xen and VirtualBox come with the ability to take snapshots. Before starting malware analysis, you want to set up a “BASE” image that has all of your tools installed, networking disabled and shared resources disabled. This will enable you to revert to this snapshot after your analysis is completed. We will go into more detail on virtual machines in the next section.
  • Sharing samples: The best way to share a malware sample is to put it inside a password-protected ZIP file. Across the security industry, “infected” is the most commonly used password to protect a ZIP file containing a malware sample.
  • Referencing domains: When referencing any suspicious or malicious domains in any sort of message, such as an e-mail, text message or chat application, it’s best practice to “defang” the domains (for example, writing example[.]com instead of example.com) to avoid accidental clicking. More modern chat systems like Slack may actually attempt to visit the domain on your behalf, which could have adverse effects.
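
Two of the habits above, “search, don’t submit” and defanging, are easy to script. The sketch below is only an illustration: the function names and the example domain are my own placeholders, not part of any particular tool. It hashes a sample locally so that you can search a service for the hash instead of uploading the file, and it defangs or refangs an indicator before it goes into a message.

```python
import hashlib
import re


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def defang(indicator):
    """Make a URL, domain or IP address non-clickable."""
    # http/https at the start becomes hxxp/hxxps, every dot becomes [.]
    indicator = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    return indicator.replace(".", "[.]")


def refang(indicator):
    """Reverse defang() so the indicator can be used in tooling again."""
    indicator = re.sub(r"^hxxp", "http", indicator, flags=re.IGNORECASE)
    return indicator.replace("[.]", ".")


# Hypothetical indicator used purely for illustration.
print(defang("http://malicious.example/payload.php"))
```

The hash from sha256_of_file() is what you would paste into a search box; the sample itself never leaves your lab.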

Malware in the wild

The figure below outlines some of the various malware formats that cross my lab. We will go into these formats and a few others in later posts.

Figure 1: Common malware formats

Types of malware analysis

With modern malware analysis, initial triage is usually handled by an automated sandbox solution such as Cuckoo sandbox. However, a malware sandbox is not always effective and malware analysts may need to resort to manual analysis, especially when they’re in the field and time is everything. While under the clock, initial triage of a malware sample should take roughly 15 to 30 minutes on average to yield results. However, the speed of analysis is subject to the type of specimen and the experience of the analyst. Most of the time, samples are submitted to a sandbox in the background while the analyst extracts additional details from the sample, sometimes comparing notes between the sandbox output and manual triage.

For example, when using Cuckoo sandbox, I may see a registry key created. Using my local analysis VM, I can reproduce the same results and acquire the registry key data and dump the raw bytes as needed. It's not uncommon to spend a day or even a week performing analysis, depending on the sample or level of analysis required. With most samples, running them inside an automated sandbox does the trick, but if the sample performs any additional operations, such as runtime decryption of a configuration file, anti-debugging or anti-VM, we may need to take a deeper dive.

In general, I’ve found that malware analysis can be broken down into four separate categories.


Figure 2: Malware triage categories

Virtual machines

Virtual machines (“VMs”) are a must-have for any malware analyst. Unless you have the proper tooling in place (e.g. hard drive cloning), it’s best to set up a VM for each flavor of Operating System and bitness (i.e. x86 and x64).


Figure 3: Win7 VM after setting up the Flare-VM

When setting up a malware environment, I find that one of three virtualization applications is commonly used:

  • VMware Workstation
  • VMware Fusion
  • VirtualBox

When setting up new virtual machines, I recommend you review the following items:

Figure 4: Host only networking configuration

  • Networking:
    • NAT/Bridged with internet
    • Simulated. In this configuration, you set up another isolated network in which only VMs on the network can communicate with each other, without any internet access. For example, VM1 and VM2 could be set on an “internal” only network. You can then route all web traffic from VM1 to VM2. VM2 could have Wireshark running so that it can collect network packets.
    • Unattributed. With this configuration, you can set up TOR or a related proxy to route your traffic through. This is effective when the system needs to have internet access, but should not exit directly from your network.
  • Shared resources: It’s best to not connect any devices to your base VM. Malware may jump from the guest to the host using external storage media like USB devices.
    • Folders
    • Printers
    • USB Drives
  • Snapshots: While I usually have a “base” snapshot for each of my VMs, over time you may end up having multiple snapshots for various VM configurations. For example, one snapshot may have a specific version of Internet Explorer installed or a known vulnerable version of Adobe.
    • Comments
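
If you use VirtualBox, the snapshot routine can be scripted around its VBoxManage command line tool. The sketch below is a rough illustration that assumes VBoxManage is on your PATH. The VM name “Win7-Analysis” is a made-up placeholder and “BASE” is simply the snapshot name used earlier in this post. It only builds the commands; you decide when to run them.

```python
import subprocess


def snapshot_command(vm_name, action, snapshot_name):
    """Build a 'VBoxManage snapshot' command for taking or restoring a snapshot."""
    if action not in ("take", "restore"):
        raise ValueError("action must be 'take' or 'restore'")
    return ["VBoxManage", "snapshot", vm_name, action, snapshot_name]


def run(command):
    """Run a command and raise an error if it exits with a non-zero status."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical VM name; replace with the name of your own analysis VM.
    vm = "Win7-Analysis"
    # Print instead of running, so the sketch is safe to try anywhere.
    print(" ".join(snapshot_command(vm, "take", "BASE")))
    print(" ".join(snapshot_command(vm, "restore", "BASE")))
```

To actually revert after an analysis session, you would pass the restore command to run() once the VM is powered off. VMware has its own tooling for the same job, which is not covered here.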

Dealing with obfuscation

Throughout your malware analysis journey, you will encounter blocks of code or text with various levels of obfuscation, that is, data which is purposefully modified to make analysis harder. Some of the common obfuscation techniques include Base64, char, ord, concatenation, code comments, string replacement, XOR and raw byte streams, just to name a few. To make matters worse, some specimens use a “layered” approach by applying the same obfuscation technique multiple times or using a different obfuscation technique per layer. We cover a basic example of this in the next section.
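
To make the “layered” idea concrete, here is a small sketch of two layers, XOR followed by Base64, and how they are peeled off in reverse order. The key and the sample string are invented for illustration; real specimens choose their own layers and keys, and finding them is part of the analysis.

```python
import base64


def xor_bytes(data, key):
    """XOR every byte of data with a repeating key (applying it twice undoes it)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def add_layers(plaintext, key):
    """Layer 1: XOR with the key. Layer 2: Base64-encode the result."""
    return base64.b64encode(xor_bytes(plaintext, key))


def peel_layers(blob, key):
    """Undo the layers in reverse order: Base64-decode first, then XOR."""
    return xor_bytes(base64.b64decode(blob), key)


# Made-up key and string, purely for illustration.
key = b"\x41\x42"
blob = add_layers(b"cmd.exe /c whoami", key)
print(blob)
print(peel_layers(blob, key))
```

The order matters: decoding the layers in the wrong sequence just produces more garbage, which is why it helps to note each layer as you identify it.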

To combat most obfuscation, I recommend using a toolkit called “CyberChef”. This tool comes loaded with many common encoding and encryption routines, which can be chained together for any layered obfuscation or encryption. The best way to get CyberChef up and running is to install and run it inside a Docker container:

Figure 5: Terminal output after running the command “sudo docker pull remnux/cyberchef”

Once you pull down the docker container, run it using the command below:

## MacOS/Linux
sudo docker run -d -p 8080:80 remnux/cyberchef
## Visit http://localhost:8080 in any web browser

Figure 4: Running the cyberchef container

CyberChef is excellent at handling single layer obfuscation tactics, but it can also handle layered obfuscation with ease. For example, say we run into the hex string below:

53 58 4e 75 4a 33 51 67 64 47 68 70 63 79 42 6d 64 57 34 2f 49 51 3d 3d
 

If we paste that string of hex into CyberChef and choose “From Hex” for the first layer and “From Base64” for the second layer, we should see the plain text “Isn’t this fun?!”. CyberChef supports drag and drop too. To make things even better, you can save this as a “recipe” using the “Save recipe” button at the bottom. This is excellent when you want to share or reuse the recipe.
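
If you want to double check a recipe outside of CyberChef, the same two steps can be reproduced with a few lines of Python. This is just a sketch using the standard library; the function name is my own.

```python
import base64


def from_hex_then_base64(hex_string):
    """Mirror the CyberChef recipe: 'From Hex' followed by 'From Base64'."""
    # bytes.fromhex() skips the spaces between the hex pairs.
    first_layer = bytes.fromhex(hex_string)
    # The first layer is itself Base64 text, so decode it once more.
    return base64.b64decode(first_layer).decode("utf-8")


sample = "53 58 4e 75 4a 33 51 67 64 47 68 70 63 79 42 6d 64 57 34 2f 49 51 3d 3d"
print(from_hex_then_base64(sample))  # prints: Isn't this fun?!
```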

 

Figure 6: CyberChef inside the web browser

 

In the next blog post, we will go into our first malware analysis category, basic static analysis.

 
