
Basic Static Analysis (Part 1)

As mentioned in my prior post (http://blog.stillztech.com/2018/06/introduction-to-malware-analysis.html), I've found that malware analysis can be grouped into four categories:

  • Basic Static (what this post will cover ;) )
  • Basic Dynamic
  • Advanced Static
  • Advanced Dynamic
Basic Static
When performing basic static analysis, we don’t execute the code or dig into disassembly. The idea is to get a quick overview of the sample’s structure and identify any low-hanging fruit. These items can be IPs, domains, hash lookups, or even keywords and phrases that may hint at the sample’s intent or purpose.
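Hash lookups start with computing the sample’s hashes. Below is a minimal sketch using Python’s standard hashlib module. The function name is hypothetical, and the file is read in chunks so a large sample doesn’t have to fit in memory:

```python
import hashlib

def file_hashes(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of the file at `path`."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as sample:
        # Read fixed-size chunks until read() returns b"" (end of file).
        for chunk in iter(lambda: sample.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

The resulting MD5 can be compared against the lab specimen’s MD5 in the appendix. That value is written in uppercase, so compare case-insensitively.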
To get us started on basic static analysis, we’re going to begin analyzing a basic Windows 32-bit executable, also known as a “PE” (Portable Executable) file. Executable files commonly end in “.exe”, assuming Windows is set to show file extensions. By default, Windows hides the extensions of known file types. We can change this in Windows 7 by navigating to “Control Panel” -> “View by” -> “Small Icons”, clicking “Folder Options” -> “View” tab, and unchecking “Hide extensions for known file types”.

With this option unchecked, you can now see the raw file extensions. Why does this matter? The images below show the reason:

As you can see, with the Windows default settings, the file appears to have a “.doc” extension. After we uncheck “Hide extensions for known file types”, we see “.doc.exe”. To make things even more confusing, attackers like to reuse the default “Microsoft Word Document” icon to make the file look more legitimate. In reality, the file was always an executable: the attacker knows that Windows hides known extensions by default, so you think you’re opening a Microsoft Word document when you are actually executing a binary. Awesome, right?! The point here is that Windows relies on the extension to decide which “handler” opens a file. For example, files with a “.doc” extension are typically handled by Microsoft Word because Word is registered for that extension (as well as “.docx”).

When it comes to the Windows PE format, “.exe” is not the only file extension that uses it. The format is used by many other file types, including but not limited to:
  • .scr
  • .dll
  • .cpl
  • .ocx
  • .sys
  • .drv
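The double-extension trick described above can be flagged from the file name alone. Below is a minimal sketch. It assumes the PE extensions listed above (with .scr, the screensaver extension) and a hypothetical, incomplete set of decoy document extensions. The function name is also hypothetical.

```python
import os

# File extensions listed above as using the PE format, plus .exe itself.
PE_EXTENSIONS = {".exe", ".scr", ".dll", ".cpl", ".ocx", ".sys", ".drv"}
# Hypothetical, incomplete set of "decoy" extensions placed before the real one.
DECOY_EXTENSIONS = {".doc", ".docx", ".pdf", ".txt", ".jpg"}

def hidden_extension_trick(filename):
    """True for names like 'invoice.doc.exe' that look like documents when extensions are hidden."""
    stem, real_ext = os.path.splitext(filename.lower())   # ("invoice.doc", ".exe")
    _, decoy_ext = os.path.splitext(stem)                  # ("invoice", ".doc")
    return real_ext in PE_EXTENSIONS and decoy_ext in DECOY_EXTENSIONS
```

A name-based check is only a hint, because a file can be renamed to anything. That is why the magic bytes discussed next are more reliable.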

The best way to tell whether a file is truly a Windows PE is to open it in a hex editor and inspect its “magic bytes”. Most libraries determine a file’s format from its first few bytes. The image below shows an excerpt of a PE file’s contents in 010 Editor:


First, notice that the first two bytes are “4D 5A”, or “MZ”. These are named after the initials of Mark Zbikowski, the MS-DOS developer who designed the DOS executable format that PE files still begin with. You can also see the text “This program cannot be run in DOS mode”, which comes from the DOS stub. Many formats besides Windows executables can be identified simply by looking at their magic bytes. For additional context, let’s check out a zip file in our hex editor below:

Here, we have the bytes “50 4B”, or “PK”, which means we’re dealing with a zip file. Other file types that share the zip format include:
  • Android APK files
  • Microsoft Word .docx files
Assuming the file is not corrupt or password protected, this means you can also run a decompression tool against any of these file types. We will dig into statically analyzing Microsoft Word and APK files in later posts.
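The magic-byte check described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the whole file fits in memory. The two-entry signature table covers only the MZ and PK formats discussed here, and the function names are hypothetical. For zip-based containers, Python’s standard zipfile module can then list the members without a separate decompression tool.

```python
import io
import zipfile

# Magic bytes discussed above, checked at offset 0.
MAGIC_BYTES = {
    b"MZ": "MZ/PE executable",
    b"PK": "ZIP container (e.g. .zip, .apk, .docx)",
}

def identify(data):
    """Return a format label based on the file's first bytes, or 'unknown'."""
    for magic, label in MAGIC_BYTES.items():
        if data.startswith(magic):
            return label
    return "unknown"

def list_zip_members(data):
    """Return the member names if `data` is a readable ZIP container, else None."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return None
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()
```

Run against a .docx, the member list typically includes “word/document.xml”. An APK typically includes “AndroidManifest.xml” and “classes.dex”.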

Now, back to PE files…

The Windows loader uses the PE format to determine how the executable should be loaded into memory. To help understand the PE structure, review the image below (kudos to Corkami for creating this PE walkthrough; see more visuals from Ange Albertini in the appendix):


At a high level, a PE file is divided into two parts: the header and the sections. The Windows loader reads the header first. The header itself is nothing more than a series of bytes that represent flags, values, or offsets to other parts of the file. As seen in the image below, if we open a PE file in CFF Explorer, we can see the byte offset of each value. In the case of “Machine”, the field sits at byte offset “000000EC” (0xEC, or 236 bytes into the PE file).
If we go to that raw offset in 010 Editor, we can see the bytes that make up that value, as shown below:

So, how does the Windows loader know whether the program is 32-bit or 64-bit? It uses the two header fields below:
  • Machine: the target processor (2 bytes; here 0x8664, which maps to “AMD64”)
  • Magic: 32- or 64-bit optional header (2 bytes; here 0x20B, which maps to “PE64”, also known as PE32+)
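To make the offsets concrete, here is a minimal sketch that reads the same two fields with Python’s struct module. It assumes a well-formed file read fully into memory and does no bounds checking beyond what struct enforces. The function and table names are hypothetical. It follows e_lfanew (the 4-byte value at offset 0x3C of the DOS header) to the “PE\0\0” signature. Machine is the first field after the signature, and the optional header’s Magic follows the 20-byte file header.

```python
import struct

MACHINE_NAMES = {0x014C: "I386", 0x8664: "AMD64"}
# The tool shown above labels PE32+ as "PE64".
OPTIONAL_HEADER_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}

def read_machine_and_magic(data):
    """Return (machine, magic, machine_offset) from a PE file's headers."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)    # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("no PE signature at e_lfanew")
    machine_offset = e_lfanew + 4                           # IMAGE_FILE_HEADER.Machine
    (machine,) = struct.unpack_from("<H", data, machine_offset)
    # IMAGE_FILE_HEADER is 20 bytes long; the optional header starts with Magic.
    (magic,) = struct.unpack_from("<H", data, machine_offset + 20)
    return (MACHINE_NAMES.get(machine, hex(machine)),
            OPTIONAL_HEADER_MAGIC.get(magic, hex(magic)),
            machine_offset)
```

In the CFF Explorer example above, Machine sits at 0xEC, which would correspond to an e_lfanew of 0xE8 for that file.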

Once you have a solid understanding of the PE file structure, you will be able to infer what may happen when the file is executed. Using the example above, we know this specific sample won’t execute on a 32-bit platform. If we try to execute this 64-bit binary on a 32-bit system, we see the following:


Even if I patch the bytes myself to force the binary to look 32-bit, I still get an error:


While many other fields exist within the header, let’s continue to the next key parts of a PE file:
  • Imports
  • Exports
  • Sections
  • Resources

Imports
Executables import functions, and those functions come from libraries. This holds true for all three major platforms: Linux, Windows, and macOS. PE files import functions from DLLs (Dynamic Link Libraries). Rather than copying common functions around or writing them from scratch, developers can use libraries already on disk and keep their own code base small. Take network connections, for example. On Windows, you have the Winsock library (“ws2_32.dll”) for lower-level networking (e.g., sockets) and libraries such as “wininet.dll” for higher-level networking functions. If you identify a PE file importing either library, you can infer that it might create a network connection at some point. Malware authors know this, and they sometimes try to hide their imports by “packing” their executable or by resolving imports at runtime. A PE file can use library code in one of three ways:
  • Statically linked (library code copied into the executable at compile time)
  • Dynamically linked (resolved by the Windows loader when the program is loaded)
  • Loaded on demand at runtime (e.g., with LoadLibrary and GetProcAddress)

With static linking, the needed code from the library is copied into the executable. This makes the executable more portable and faster, but at the cost of a larger file.

When malware imports functions on demand or uses delayed importing, the import table will usually contain “Kernel32.dll” with the functions “LoadLibrary” and “GetProcAddress”. To make things more complex, functions can be imported either by name or by ordinal value. Let’s look at a simple executable below using CFF Explorer:


Above is a PE file that imports several functions from multiple DLLs. I’ve drilled down into a specific DLL, “WININET.DLL”, to show its six imported functions for this particular sample. Based on the names of these functions, you can infer their capabilities. Let’s check out the function “InternetConnectW”. We can look up this function on MSDN (https://msdn.microsoft.com/en-us/library/ms909418.aspx) and see that it has several parameters of interest, such as “lpszServerName” and “nServerPort”.
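Once the import table has been dumped (by hand from CFF Explorer, or with a PE parsing library), the inferences above can be turned into a quick triage pass. This is a minimal sketch that assumes the imports are already in a Python dict. Imports by name are strings, and imports by ordinal only are integers. The hint table is hypothetical and deliberately tiny, and the output is only a list of leads to confirm during dynamic analysis.

```python
# Hypothetical, deliberately tiny hint table; a real triage list would be much longer.
DLL_HINTS = {
    "ws2_32.dll": "low-level networking (Winsock sockets)",
    "wininet.dll": "high-level networking (e.g. HTTP/FTP)",
}

def triage_imports(imports):
    """Flag import-table features worth a closer look.

    `imports` maps a DLL name to its imported functions: a str for an
    import by name, an int for an import by ordinal only.
    """
    findings = []
    for dll, funcs in imports.items():
        dll_key = dll.lower()
        names = {f.lower() for f in funcs if isinstance(f, str)}
        ordinals = [f for f in funcs if isinstance(f, int)]
        if dll_key in DLL_HINTS:
            findings.append(f"{dll}: may indicate {DLL_HINTS[dll_key]}")
        # LoadLibrary* plus GetProcAddress suggests more imports are resolved at runtime.
        if "getprocaddress" in names and any(n.startswith("loadlibrary") for n in names):
            findings.append(f"{dll}: LoadLibrary/GetProcAddress present, "
                            "more imports may be resolved at runtime")
        if ordinals:
            findings.append(f"{dll}: {len(ordinals)} function(s) imported by ordinal only")
    return findings
```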

As we go deeper into dynamic analysis in later posts, we will learn how to view these parameters during execution and extract their values at runtime. Also, if you’re wondering whether a list of “suspicious functions used by malware” exists, check out the appendix section “Important Windows Functions” in the book “Practical Malware Analysis”. I’ve also included two useful sites below for reference:

Exports
Exports are functions that a DLL or EXE “exports”, or shares, so that other programs can use them. Let’s take a look at “WININET.dll” to better understand how exports work. In the image below, we have “WININET.dll” loaded into CFF Explorer:

While any PE file may have an export table, it is more commonly found in DLLs, since they contain code that many executables need in order to function. We see from the image above that this library exports many functions, both by name and by ordinal. Any program that wants to use the function “HttpSendRequestA” needs to import “WININET.dll” and specify the name or ordinal value of that function. Export names are chosen by the library’s author. Malware authors who compile their own libraries commonly give exports names that suggest functionality that doesn’t exist, or that make the library look benign.
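To illustrate the difference between importing by name and importing by ordinal, here is a simplified model of how an export table can be resolved. It mirrors the IMAGE_EXPORT_DIRECTORY fields (Base, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals). The class and sample values are hypothetical, not WININET’s real table. An ordinal minus Base indexes the function array directly. A name is first looked up in the name array, and the parallel name-ordinal array then gives the function index. Functions that have no entry in the name array can only be imported by ordinal.

```python
from dataclasses import dataclass

@dataclass
class ExportTable:
    """Simplified, hypothetical model of IMAGE_EXPORT_DIRECTORY."""
    base: int            # Base: the ordinal of functions[0]
    functions: list      # AddressOfFunctions: function RVAs
    names: list          # AddressOfNames: exported names, sorted
    name_indexes: list   # AddressOfNameOrdinals: index into `functions` for each name

    def resolve(self, name_or_ordinal):
        """Return the function RVA for an import by name (str) or by ordinal (int)."""
        if isinstance(name_or_ordinal, int):
            index = name_or_ordinal - self.base
        else:
            try:
                position = self.names.index(name_or_ordinal)  # linear search for simplicity
            except ValueError:
                raise KeyError(name_or_ordinal) from None
            index = self.name_indexes[position]
        if not 0 <= index < len(self.functions):
            raise KeyError(name_or_ordinal)
        return self.functions[index]
```

The real name array is kept sorted so it can be binary-searched. The sketch uses a linear search to keep the example short.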

Sections
When an executable is compiled, the compiler and linker arrange the code and program data into a series of sections. In general, the “.text” section holds the executable code, and the program’s entry point usually falls inside it. An example of these PE sections is outlined below:


PE sections are very relevant for static analysis for many reasons. I’ve outlined a few below:
  • Detection of a packer or cryptor. Because many parts of an executable are visible to malware analysts (hard-coded IP addresses, domains, naming conventions, credentials, etc.), malware authors may use packers like UPX or cryptors like YodaCrypt to compress and/or encrypt their payloads and thwart most static analysis tools and techniques. Many packer and cryptor variants exist in the wild. Tools such as “PEiD” and “Exeinfo PE” use signature databases to help detect some of them. One such database is: https://raw.githubusercontent.com/ynadji/peid/master/userdb.txt
  • Detection of any embedded resources. PE files can have embedded objects such as images, text files, and icons. While this is normal, malware authors can also embed objects (such as another PE file or a configuration file) inside the resource section for later use. Let’s see an example of this below:

From the image above, you can see this PE file has two embedded JavaScript objects inside its resource section. Using CFF Explorer, we can extract these objects in their raw format for further analysis. The image below shows how to extract a particular resource.


In the majority of malware samples I analyze, embedded objects are not stored in clear text. Below are some common ways I’ve seen malware authors protect or obfuscate embedded objects in the resource section:
  • XOR the file with a single-byte XOR key (more on XOR in the next post). This is actually an interesting topic, as some antivirus vendors also use a single-byte XOR key when quarantining a malicious file. Check out the links below if you want to read more on this:
  • Base64 encode the strings. Once you deal with Base64 on a daily basis, you will begin to recognize its character set and quickly decode the contents.
  • ZLIB compress the object. I’ve seen this method used when a malware author embedded another PE file (e.g., PsExec) inside the main executable. You can usually tell that data is ZLIB compressed by inspecting its first two bytes (the zlib header). Below are the common magic bytes:
    • 78 01: No/low compression
    • 78 9C: Default compression
    • 78 DA: Best compression
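The three encodings above can be tried in turn with Python’s standard library. This is a minimal sketch under a few assumptions: the blob has already been extracted (e.g., with CFF Explorer), a zlib stream starts with one of the headers listed above, and an XOR-encoded object is a PE file, so its first two bytes should decode to “MZ”. That last assumption turns key recovery into a known-plaintext check, because the key is simply the first byte XOR ord("M"). The function names are hypothetical.

```python
import base64
import binascii
import zlib

# Common zlib headers (first two bytes) listed above.
ZLIB_HEADERS = {b"\x78\x01", b"\x78\x9c", b"\x78\xda"}

def xor_single_byte(data, key):
    """XOR every byte with the same one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def find_xor_key(data, known=b"MZ"):
    """Known-plaintext recovery: if the decoded blob should start with `known`,
    the key is data[0] ^ known[0]. Returns None if the other known bytes disagree."""
    if len(data) < len(known):
        return None
    key = data[0] ^ known[0]
    return key if xor_single_byte(data[:len(known)], key) == known else None

def try_decode(data):
    """Return (method, decoded_bytes) for the first technique that fits, else None."""
    if not data:
        return None
    if data[:2] in ZLIB_HEADERS:
        try:
            return "zlib", zlib.decompress(data)
        except zlib.error:
            pass  # looked like a zlib header but wasn't a valid stream
    try:
        return "base64", base64.b64decode(data, validate=True)
    except binascii.Error:
        pass  # contains non-Base64 characters or bad padding
    key = find_xor_key(data)
    if key:  # key 0 would mean the data is already a plain "MZ" file
        return f"xor 0x{key:02x}", xor_single_byte(data, key)
    return None
```

A custom scheme such as a multi-byte key or a real cipher will fall through and return None. That is where the dynamic approach described next becomes useful.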

Usually, you can extract the raw object using CFF Explorer, as in the example above, and manipulate it as you see fit. In other cases, we can use dynamic analysis techniques to have the malware dump the embedded object itself. This is very handy when the malware employs a custom decoding or decryption routine that’s not easily reversed by hand.

Strings
As mentioned earlier in the post, an attacker may hardcode commands, IP addresses, domains, or any other strings that may hint at the malware’s purpose or functionality. When reviewing strings, keep in mind that a string of interest such as “cmd.exe” or “username=potato” doesn’t necessarily mean the sample will use it during execution. Attackers can (and do) put loads of unrelated strings inside PE files to make triage harder for malware analysts. Let’s check out some of the strings we find inside the binary below:


In the image above, we’ve identified two shell commands, references to some registry keys, COMSPEC, and some common browser executables such as Internet Explorer, Chrome, and Firefox. Strings can provide the analyst with some low-hanging fruit, but they should be validated during dynamic analysis. In addition to using strings for rapid analysis, I recommend checking out FLARE FLOSS. As stated in its GitHub readme, “The FireEye Labs Obfuscated String Solver (FLOSS) uses advanced static analysis techniques to automatically deobfuscate strings from malware binaries.” It’s a great tool that can serve as a replacement for, or an addition to, the standard strings utility.
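For a quick look without extra tools, a rough Python equivalent of strings can be sketched with two regular expressions: one for runs of printable ASCII and one for runs of UTF-16LE characters, which are common in Windows binaries. This is a minimal sketch with an assumed minimum length of 4 characters and a hypothetical function name. Unlike FLOSS, it does no deobfuscation.

```python
import re

def extract_strings(data, min_length=4):
    """Return printable ASCII runs, then UTF-16LE runs, of at least `min_length` characters."""
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # UTF-16LE text from the ASCII range looks like: char, 0x00, char, 0x00, ...
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found
```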

In the next post (Basic Static Analysis (Part 2) - PE), we will continue to dig into more static analysis attributes and perform some OSINT (Open Source Intelligence) searches to aid in our sample triage. In some cases, you may find an entire write-up about your sample, while in other cases you may find nothing. The objective is to avoid wasting time triaging a sample if a trusted source has already published an accurate, detailed report.

Appendix

Lab Specimen

MD5: 277845D6CD160B3B647F1457E8AA3726
