
Smashing the stack with Carbon Black


In this blog post, we will cover how we perform stacking using Carbon Black Response and how you can use this methodology to find anomalies in your environment. Ideally, a threat hunter would have the following data at their disposal:

Type      | Code | Details
Real Time | RT   | Real-time process executions and their context
Forensic  | FZ   | Live forensic data such as prefetch, appcompat, registry keys, etc.
Network   | NT   | PCAP and extracted metadata
Logs      | LG   | Endpoint, firewalls, proxies, AV, Web logs, etc.
Binaries  | BIN  | Executables collected in real time or on-demand
Memory    | MEM  | Real-time inspection or dumping of processes/system memory

For this blog post, we will focus on Real Time (RT) process executions within Carbon Black Response. The concept of stacking is simple: we collect data of the same type and choose specific fields on which to perform frequency analysis. Basically, we’re cherry-picking specific processes we know attackers use and abuse, then storing the results in a pivot table to view the data in ways not possible through the standard interface. You can save the query results for each process to an Excel file or a database, depending on your preference. For this post, we use .csv as the default file extension with a pipe delimiter.
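As a rough illustration of that file format, a pipe-delimited results file can also be read outside of Excel. This is a minimal sketch, not part of the stacking script: the function name `read_stack_file` is hypothetical, the sample rows are made up, and it assumes the first line of the file is a header row.

```python
import csv
import io


def read_stack_file(handle):
    """Parse pipe-delimited rows (first line is the header) into dicts."""
    # QUOTE_NONE treats double quotes in command lines as ordinary text.
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    # Made-up sample standing in for a real output file.
    sample = io.StringIO(
        "Hostname|ProcessName|Cmdline|Username\n"
        'jack-pc|reg.exe|reg add "HKCU\\Environment" /f|Jack-PC\\Jack\n'
    )
    for row in read_stack_file(sample):
        print(row["Hostname"], row["Cmdline"])
```

One caveat with this approach: a command line that itself contains a pipe character will be split into extra columns, so such rows need separate handling.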

For real-time process executions, let’s start with command line arguments. We pick a window of time to stack and then specify our Carbon Black Response queries. In this example, I’ll use PowerShell, cscript, wscript and a few other queries to get us going. Let’s first inspect the config.json file:
To begin using this script, you must first add your Carbon Black Response URL and API token (found under your user profile). Next, set the year, month, and day you wish to begin stacking from. Just like the CBR: Intel Tester script, this tool runs a daily query, starting with the specified date and continuing until it reaches the current date. The results are saved to an output file named after each query you specify under the queries object. In this example, you will get six output files, each containing the results of its query with search results dating back to 2018/9/1, assuming your query syntax is correct and you have data dating back to this date. After your config.json is configured, you can run the script. Standard output will show which query is currently running and which day it is searching, as outlined below.
Once the query completes for your given date range, the worker will move on to the next item in the queue until the queue is empty.
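The day-by-day loop and the work queue described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script's actual code: the helper names `daily_windows` and `build_work_queue` and the query names are hypothetical.

```python
from collections import deque
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield one date per day from start through end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_work_queue(query_names, start, end):
    """Queue one (query_name, day) task per query per day.

    All days for one query are queued before the next query, so a worker
    finishes a query's whole date range before moving on.
    """
    queue = deque()
    for name in query_names:
        for day in daily_windows(start, end):
            queue.append((name, day))
    return queue


if __name__ == "__main__":
    # Hypothetical query names; the real ones come from config.json.
    tasks = build_work_queue(
        ["powershell", "cscript"], date(2018, 9, 1), date(2018, 9, 3)
    )
    while tasks:
        name, day = tasks.popleft()
        print(f"running query '{name}' for {day.isoformat()}")
```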
ProTip
Some queries are very expensive and take a lot of time to run. You should test each query with the intel tester before stacking it to ensure you’re not paging through 500,000 records (not that you can’t, it's just not ideal). You may be able to tune your queries to remove noise from commonly used applications or scripts within your environment, which helps yield better results for your stack.
After each query completes, you will have an output file yielding the results. Let’s take a look at our PowerShell query in Microsoft Excel below:  
Since this data is delimited by a pipe, we need to split it into its proper columns. We can do this in Excel by selecting the first column in the spreadsheet, then clicking Data > Text to Columns in the toolbar. We then select the Delimited option, click Next, and specify | (pipe) in the Other delimiter field. Once the delimiter is set, click Finish. Your output file should look like the following:
Now that the data is formatted properly, let’s get into smashing the stack. We start by creating a pivot table with all the columns. You do this by selecting all the data, selecting Insert > PivotTable, and clicking OK to accept the default data range. Once your pivot table is created, you will see a menu on the right-hand side called PivotTable Fields. Select Cmdline from these fields and drag and drop it into both the Rows and Values panes. Your output should look like the following:
We can now use the power of frequency analysis to identify anomalies in the PowerShell stack, since the pivot table shows the frequency of occurrence for the Cmdline field. Because the PowerShell items I collected are all evil, this isn’t the best example. Let’s take a look at the example query jp_cert_spread_of_infection.
After formatting and sorting inside the pivot table, some interesting things stand out in the stack. Granted, my example dataset isn’t very large, but you can quickly spot a few malicious items such as:
  • reg  add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run" /v UpdateSvc /t REG_SZ /d "C:\TMP\p.exe -s \\10.34.2.3 'net user' > C:\TMP\o2.txt" /f (count: 2)
  • REG  ADD HKCU\Environment /f /v UserInitMprLogonScript /t REG_MULTI_SZ /d "C:\TMP\mim.exe sekurlsa::LogonPasswords > C:\TMP\o.txt" (count: 2)
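The frequency count that the pivot table produces can also be reproduced in a few lines of Python. This is a minimal sketch: the function name `stack` and the sample rows are made up for illustration, and the rows are assumed to be dicts keyed by column name (Cmdline, as in the pivot table).

```python
from collections import Counter


def stack(rows, field):
    """Count how often each value of `field` occurs, rarest first."""
    counts = Counter(row.get(field) or "" for row in rows)
    # Ascending order puts the least frequent values at the top,
    # which is usually where review should start.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"Cmdline": "reg query HKLM\\SOFTWARE"},
        {"Cmdline": "reg query HKLM\\SOFTWARE"},
        {"Cmdline": "reg query HKLM\\SOFTWARE"},
        {"Cmdline": "reg add HKCU\\Environment /f"},
    ]
    for value, count in stack(rows, "Cmdline"):
        print(count, value)
```

Sorting rarest-first is a design choice for hunting: in a large environment the common values are mostly baseline noise, while one-off command lines deserve a closer look.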
As you use this script in larger production environments, you will learn about your environment. Over time, you should come to understand what’s normal: who runs specific scripts/applications, at what times the applications usually run, from what path, on what system, with what arguments, etc. Let’s pivot the data on two more levels, username and hostname:
With the additional fields added, we can see what command line arguments are run by both user and hostname. We can also see the evil reg add commands were run on the hostname jack-pc with the local account of Jack-PC\Jack. I’m only showing a fraction of how stacking the real time data in Carbon Black Response can be used for proactive threat hunting and learning/baselining your environment. Other stacking ideas may include:
  • Stacking by parent name and process name to identify the most common and uncommon parents/children for processes
  • Stacking by process path to identify unusual execution locations of known utilities or system binaries
  • Stacking server groups and reviewing processes run over the last 30 days by username, path and command line arguments
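Multi-level stacks such as username/hostname/command line or parent/child pairs all come down to counting combinations of fields. A rough sketch follows, assuming rows are dicts keyed by column name; the helper name `stack_by` and the sample data are hypothetical.

```python
from collections import Counter


def stack_by(rows, fields):
    """Count occurrences of each combination of the given fields."""
    counts = Counter(
        tuple(row.get(name) or "" for name in fields) for row in rows
    )
    # Rarest combinations first.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"ParentName": "explorer.exe", "ProcessName": "cmd.exe",
         "Username": "CORP\\alice", "Hostname": "ws-01"},
        {"ParentName": "explorer.exe", "ProcessName": "cmd.exe",
         "Username": "CORP\\alice", "Hostname": "ws-01"},
        {"ParentName": "winword.exe", "ProcessName": "cmd.exe",
         "Username": "CORP\\bob", "Hostname": "ws-02"},
    ]
    for combo, count in stack_by(rows, ["ParentName", "ProcessName"]):
        print(count, " -> ".join(combo))
    for combo, count in stack_by(rows, ["Username", "Hostname"]):
        print(count, " on ".join(combo))
```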

Enter sub stacking
If you thought stacking was cool, you’re going to enjoy sub stacking even more. While the concept of sub stacking is the same as above, we’re going to dig deeper into the process metadata and stack on the process attributes, not just the process summary information as we did above. The six attributes (at least that's what I call them) of a process in Carbon Black are:
  • RegMods
  • FileMods
  • NetConns
  • ModLoads
  • CrossProcs
  • ChildProcs

You can read up on the various attributes at the following link below:

By default, the stacking script only queries the summary API, api/v1/process. This API only returns the process summary data. It doesn't include each process's metadata/details unless you actually go to the process link itself using the id and segment_id returned in the summary results. Querying each process's details will give you additional data such as file modifications, network connections, module loads, registry modifications, etc.
ProTip
When performing substacking, it’s important to remember that for each process returned from the summary API, the script will open that process's details and extract the attribute you selected. For example, if you ask for all netconns for mstsc: (process_name:mstsc.exe AND netconn_count:[1 TO *]) and the summary API returns 50 matching results, the script will inspect each process (ALL 50) and extract ALL the netconns for EACH process. Some processes make a lot of network connections (chrome, firefox, internet explorer, etc.), file mods, reg mods and other attributes. Don’t be surprised if the output file is huge or the script takes longer than usual to complete. Ideally, any query returning more than 5k-10k results on the summary view should be tuned based on the attribute you’re filtering on.
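The fan-out described in this tip can be sketched as below. This is only an illustration of why the output grows so quickly, not the script's implementation: `substack` is a hypothetical name, and `fetch_details` stands in for whatever call retrieves a process's details by id and segment_id (here it is a fake that returns canned data).

```python
def substack(summaries, fetch_details, attribute):
    """Emit one record per attribute entry for every summary result.

    `fetch_details` is a caller-supplied function (hypothetical here) that
    takes a process id and segment id and returns a dict of attribute lists.
    """
    records = []
    for proc in summaries:
        details = fetch_details(proc["id"], proc["segment_id"])
        for entry in details.get(attribute, []):
            record = dict(proc)   # copy the base (summary) fields
            record.update(entry)  # append the attribute-specific fields
            records.append(record)
    return records


if __name__ == "__main__":
    # Made-up summary results and details for illustration only.
    summaries = [
        {"id": "proc-a", "segment_id": 1, "ProcessName": "mstsc.exe"},
        {"id": "proc-b", "segment_id": 1, "ProcessName": "mstsc.exe"},
    ]
    canned = {
        "proc-a": {"netconn": [{"RemotePort": 3389}, {"RemotePort": 443}]},
        "proc-b": {"netconn": [{"RemotePort": 3389}]},
    }

    def fake_fetch(proc_id, segment_id):
        return canned[proc_id]

    records = substack(summaries, fake_fetch, "netconn")
    # Output size is the sum of attribute entries, not the summary count.
    print(len(summaries), "processes ->", len(records), "records")
```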
In order to invoke substacking, you need to add the attribute property to your config.json for each query. Currently, the supported attribute values are:
  • netconn
  • modload
  • regmod
  • filemod
  • crossproc
  • childproc
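A check against the supported values listed above might look like the sketch below. The layout of a query block (a dict with `query` and `attribute` keys) and the function name `stacking_mode` are assumptions for illustration; the real config.json structure may differ.

```python
SUPPORTED_ATTRIBUTES = {
    "netconn", "modload", "regmod", "filemod", "crossproc", "childproc",
}


def stacking_mode(query_block):
    """Return 'substack' if a supported attribute is set, else 'stack'."""
    attribute = query_block.get("attribute")
    if attribute is None:
        # No attribute property: summary-only (normal) stacking.
        return "stack"
    if attribute not in SUPPORTED_ATTRIBUTES:
        raise ValueError(f"unsupported attribute: {attribute!r}")
    return "substack"


if __name__ == "__main__":
    # Hypothetical query blocks; the real config.json layout may differ.
    plain = {"query": "process_name:powershell.exe"}
    netconns = {
        "query": "process_name:powershell.exe AND netconn_count:[1 TO *]",
        "attribute": "netconn",
    }
    print(stacking_mode(plain))
    print(stacking_mode(netconns))
```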

If you do not specify an attribute property for a query, the script will perform normal stacking on the query and will not query the details of the process itself. An example of sub stacking for network connections made by powershell.exe is as follows:
Notice we added the “AND netconn_count:[1 TO *]” to the query. This extra term filters down the search to only return powershell processes that have network connections vs returning all powershell processes, including those without network connections. Setting the attribute property to netconn in this query block will tell the script you want to invoke sub stacking on the netconn attribute. Each record will contain the following base fields:
  • Hostname
  • ProcessStart
  • ProcessName
  • ProcessPath
  • Cmdline
  • ProcessMD5
  • Username
  • ParentName
  • ParentMD5
  • Id
  • SegmentId
  • CBR_Link
  • QueryName
  • QueryTimestamp

In addition, each netconn would have the following fields appended to the base fields:
  • Timestamp
  • LocalIP
  • RemoteIP
  • LocalPort
  • RemotePort
  • Protocol
  • Direction
  • Domain

If you wish to perform this style of threat hunting at a greater scale for all processes, I would encourage you to review the event forwarder guide found here: https://developer.carbonblack.com/reference/enterprise-response/event-forwarder/#configure-the-cb-event-forwarder and use an event processing pipeline to handle the forwarded events.

I hope this script comes in handy for those using Carbon Black Response. Happy Hunting!

Acknowledgements
Special thanks to Mike Scutt (@OMGAPT), Jeff Chan, Jason Garman and the CB team for all the help.
