
Web shell hunting: Meet the web shell analyzer

In continuation of my prior work on web shells (Medium/Blog), I wanted to take things a step further and introduce a new tool that goes beyond my legacy webshell-scan tool. The “webshell-scan” tool was written in Go and gave threat hunters and analysts alike the ability to quickly scan a target system for web shells in a cross-platform fashion. That said, I found it was lacking in many other areas. Allow me to elaborate below…

Requirements of web shell analysis

In order to perform proper web shell analysis, we need to define some of the key requirements that a web shell analyzer would need to include. This isn’t a definitive list but more of a guide on key requirements based on my experience working on the front lines:

Static executable: Tooling must include all dependencies when being deployed. This ensures the execution is consistent and expected.

Simple and easy to use: A tool must be simple and straightforward to deploy and execute. Nothing is more frustrating than trying to get a tool to work during a live incident response engagement at 2 a.m.

Cross platform: A majority of web servers run on either Windows or Linux. A tool must be able to run natively on these operating systems and must cross-compile with ease for rapid development.

Concurrency: Tooling must be able to run across multiple CPUs and take advantage of multiple threads/channels to quickly scan a file system.
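The fan-out pattern described above can be sketched in Go with a bounded worker pool. This is a minimal illustration, not the analyzer's actual code: the `suspicious` check and `scanAll` helper are hypothetical stand-ins for the real detection pass.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// suspicious is a stand-in for the real detection logic: here it just
// looks for an eval() call in the file contents.
func suspicious(content string) bool {
	return strings.Contains(content, "eval(")
}

// scanAll fans the inputs out across one worker per logical CPU and
// collects a verdict per file.
func scanAll(files map[string]string) map[string]bool {
	type job struct{ name, content string }
	type verdict struct {
		name string
		hit  bool
	}

	jobs := make(chan job)
	verdicts := make(chan verdict)
	var wg sync.WaitGroup

	// One worker per CPU keeps the scan saturated without oversubscribing.
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				verdicts <- verdict{j.name, suspicious(j.content)}
			}
		}()
	}

	// Feed jobs, then close the results channel once all workers finish.
	go func() {
		for name, content := range files {
			jobs <- job{name, content}
		}
		close(jobs)
		wg.Wait()
		close(verdicts)
	}()

	out := make(map[string]bool)
	for v := range verdicts {
		out[v.name] = v.hit
	}
	return out
}

func main() {
	fmt.Println(scanAll(map[string]string{
		"shell.php": `<?php eval($_POST["c"]); ?>`,
		"index.php": `<?php echo "hello"; ?>`,
	}))
}
```

In a real scanner the map of contents would be replaced by a `filepath.WalkDir` feeding file paths into the same channel.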

Optimized: While this is closely tied to concurrency, the tooling must take into account what system resources are available and throttle analysis to ensure system performance is not degraded.

Self-Discovering Configuration: In Live IR mode (running the analyzer on a compromised web server), a web shell analyzer should automatically determine the type of web server that is running and automatically identify and parse the web server’s configuration file. Using this data, the tool could automatically determine where the web root is located on disk, loaded handlers/filters/modules (think Windows ISAPI/HTTP filters/handlers) and other important configuration options that could enable/disable specific analyzer features.

Context: Outside of analyzing web shells, the tooling must provide context. The number one question an analyst will ask after identifying a web shell is “how did the web shell get here?”. This is why any tooling should not only be able to identify and analyze web shells, but also provide context such as: 

  • Log file analysis: When a web shell is identified, an analyst would normally spend the next few cycles digging through logs attempting to see what IP(s) interacted with the web shell. Once identified, an analyst would then pivot on the IP address(es) interacting with the web shell to determine what other files/resources were accessed, perform GeoIP inspection and maybe some user agent analysis. Each of these fields is a pivot point a tool should explore as part of any web shell analysis.
  • File timeline analysis: In addition to reviewing logs, the tool should quickly determine two other things. First, what are the file timestamps of the web shell, such as created or modified. Timestamps may vary based on the operating system and platform. Second, what happened ~10–15 minutes before/after the web shell was created? In some cases, this can lead to the identification of other web shells, initial ingress, harvested files or even new malware uploaded to the server.

Deobfuscation: A majority of web shells have at least some layers of obfuscation, commonly base64. However, some web shells take obfuscation to the extreme and contain multiple layers of mixed obfuscation. A tool should be able to handle the most common types of obfuscation techniques.

Layered searches: Beyond single-pass decoding, web shell authors commonly stack multiple layers of obfuscation to mask the source code. A tool must be able to peel back each layer and perform detection checks against every decoded layer.

Attribute analysis: Simply telling an analyst that a web shell was identified and which code matched isn’t enough. After identification, the tool should provide an analyst with key attributes of the web shell; these attributes help the analyst determine the “capability” of the web shell, such as “can interface with MySQL” or “can start threads/processes”.

Detection by attributes: To expand further on attributes, a tool should be able to detect web shells solely based on attributes. This can be helpful where regex may miss detection logic but still provide a detection based on the web shell attributes.
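One simple way to implement attribute-only detection is to require some minimum number of distinct capability attributes before flagging a file. This is a hedged sketch of the idea, not the analyzer's real scoring logic; `attributeVerdict` and the attribute names are illustrative.

```go
package main

import "fmt"

// attributeVerdict flags a file as a likely web shell when enough
// independent capability attributes fired, even if no core detection
// regex matched. attrs maps attribute name -> hit count for the file.
func attributeVerdict(attrs map[string]int, threshold int) bool {
	distinct := 0
	for _, n := range attrs {
		if n > 0 {
			distinct++
		}
	}
	return distinct >= threshold
}

func main() {
	attrs := map[string]int{
		"process_execution": 2, // e.g. shell_exec / system calls
		"database_access":   1, // e.g. mysql_* calls
		"network_io":        0,
	}
	fmt.Println(attributeVerdict(attrs, 2))
}
```

The threshold trades false positives for coverage: a single attribute (say, starting a process) is common in benign admin scripts, while several together are far more suspicious.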

Modular and scalable: A tool should be able to be updated with ease and without frequent recompilation. In addition, new detection/attribute logic should be seamless to update. The tool should also support scaling up/down depending on the resources available or the demands of the analysis; real-time daily web shell hunting/monitoring and incident response require two different levels of operation.

Real Time / On-Demand: A tool should be able to interface with the underlying operating system to support near real time web shell scanning against specific directories along with on-demand scanning for hunting and incident response. Most FIM (File Integrity Monitoring) tools would only provide context into file changes, but not that the changed file or content is a web shell ;).

Output: The tool should provide the analysis results in a consistent documented schema, formatted in JSON.

Transport: As multiple web servers are scanned, it makes more sense to send the analysis output to a centralized server for review. This means a tool should be able to send analysis output in a chunked, compressed and lossless fashion.

Analysis Interface: Since no one wants to stare at JSON all day, a tool should include a user interface by which an analyst can review the output with a simple workflow that supports tagging, comments and other automated actions. The UI should be lightweight, robust and support multiple users. It should be backed by a documented API for further extensibility.

Webshell Analyzer

Now that we’ve reviewed some of the key requirements a web shell analyzer should include, let’s dive into my newest tool, and review some of the key features included in v1:

Regex Detection Groups

One of the first improvements that was made in the web shell analyzer was to break down the regex into groups. Not only did this allow for more granular control over the regex but it also enabled the use of names/descriptions to classify our matching regex blocks. Detection groups are checked at each layer of decoding and include a frequency counter to show how many times the detection logic was found.
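The grouping idea can be sketched as a named pattern plus a frequency counter per decoded layer. This is an assumed simplification of the analyzer's design, with made-up group names and patterns:

```go
package main

import (
	"fmt"
	"regexp"
)

// DetectionGroup pairs a human-readable name with its matching regex.
type DetectionGroup struct {
	Name    string
	Pattern *regexp.Regexp
}

// Illustrative groups only; real rule sets are much larger.
var groups = []DetectionGroup{
	{"php_eval", regexp.MustCompile(`eval\s*\(`)},
	{"php_system", regexp.MustCompile(`(system|shell_exec|passthru)\s*\(`)},
}

// countHits returns per-group frequency counters for one decoded layer.
func countHits(content string) map[string]int {
	hits := make(map[string]int)
	for _, g := range groups {
		if n := len(g.Pattern.FindAllStringIndex(content, -1)); n > 0 {
			hits[g.Name] = n
		}
	}
	return hits
}

func main() {
	fmt.Println(countHits(`<?php eval(base64_decode($_POST["x"])); system($_GET["c"]); ?>`))
}
```

Running `countHits` once per decoded layer and merging the maps gives exactly the per-layer frequency view described above.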

Layered decoding

Since many web shells have nested layers of obfuscation, the analyzer is able to iterate over most layers and feed newly deobfuscated blobs back in the pipeline for processing.

Attribute analysis

A side effect of using regex detection groups is that the tool can also include “attributes”. The logic that powers these attributes tells the analyzer to “tag” a file that contains specific matching logic. These attributes tell an analyst what a detected web shell is capable of without having to perform any manual code inspection.


As with all my projects, the output is structured in JSON to make the analysis results readable and open to future structure changes as needed.


In the example below, we have a simple web shell that’s been obfuscated in three different layers. Layer 1 is base64, layer 2 is again base64, followed by layer 3, gzinflate. While my legacy scanner wouldn’t flag this as a web shell, the newer web shell analyzer decodes and scans each layer for detections and attributes until no more decoding can be performed. While this is a very simple example, it highlights the importance of handling layers of obfuscation. Looking at the sample here: we can see it begins with “eval(gzinflate(base64_decode(”. In order to properly process this web shell, we must first remove all the layers of “gzinflate(base64_decode(”. Normally, this is a pretty simple effort using tools like CyberChef, but in this case, this web shell has 11 layers of “gzinflate(base64_decode(“. Still doable by hand, but if you have to analyze dozens of web shells, it’s best to use a tool like this web shell analyzer to deal with it. The debug output below shows the web shell analyzer working through this web shell’s layers of “gzinflate(base64_decode(”:

We see that after the 11 iterations of decoding, we finally see some PHP code. From here, the analyzer can then begin processing detections and attributes on the raw PHP code. After processing, if a detection is found, the analyzer will spit out a JSON object which can be divided up into three sections: Core, Matches and Attributes. Let’s take a look at these sections below.

Core Output

The core section consists of JSON key/value pairs that contain basic information about the file, its hashes, timestamps and decoders. The “decodes” item outlines which decoding routines were checked and how often. Just because a decoding routine was used doesn’t always mean it worked, only that it was attempted.

Matches Output

The matches section outlines the matches found after each level of decoding. We could also call these “detections”, as the analyzer associates these types of matches with a potential web shell. Each key in the JSON output below outlines the exact keyword that triggered a detection and how many times each keyword was found in the web shell.

Attributes Output

Unlike the matches from the section above, the attributes section is only included for grouping and contextual purposes. When a web shell is identified, the analyzer will include these attributes to highlight what “potential capabilities” the web shell possesses. In v1 of the analyzer, these attributes are not enough to trigger a detection on their own.

I hope this tool is helpful and stay tuned for more updates to the web shell analyzer in the coming posts. As always, Happy Hunting!

