Growing Your Malware Corpus

If you’re writing YARA rules or doing other kinds of detection engineering, you’ll want a test bed that you can run your rules against. This is known as a corpus. Your corpus should include both goodware (known-good files, such as operating system files) and a library of malware samples.

One good source for a large number of malware samples is VX-Underground. What I really appreciate about VX-Underground is that in addition to providing lots of malware samples, they also produce an annual archive of samples and papers. Archives are available for each year from 2010 to 2023, so you can download a whole year’s worth of samples and papers at once.

Pandora’s Box

Just to understand the structure here, I have a USB device called “Pandora.” On the root of the drive is a folder called “APT”, and within that is a “Samples” directory. Inside the samples directory is the .7z download for 2023 from VX-Underground. There’s also a python script… we’ll get to that soon enough.

The first thing we’ll need to do is unzip the download with the usual password.

7zz x 2023.7z

Once the initial extraction is complete you can delete the original 2023.7z archive.

Within the archive for each year, there is a directory for each sample, with ‘Samples’ and ‘Papers’ sub-directories. Every one of the samples is also a password-protected archive.

This makes sense from a safety perspective, but it means you can’t scan all the files at once until they’ve been extracted.

Python to the Rescue

We can utilize a Python script to recursively go through the contents of our malware folder and unzip all the password protected files, while keeping those files in their original directories.
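To make the idea concrete, here is a minimal sketch of a recursive, in-place extraction. It is not the actual script from this post, just one way the approach could look. It assumes the same 7zz binary used above is on your PATH, that the nested archives end in .7z, and that you pass in the password yourself. The function names (find_archives, build_extract_command, extract_all) are hypothetical.

```python
import subprocess
from pathlib import Path


def find_archives(root, suffix=".7z"):
    """Return every file under root whose name ends with suffix, sorted."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


def build_extract_command(archive, password, binary="7zz"):
    """Build a 7-Zip command that extracts an archive into its own directory."""
    archive = Path(archive)
    return [
        binary,                 # assumption: 7zz is available on the PATH
        "x",                    # extract, preserving paths inside the archive
        f"-p{password}",        # password is supplied by the caller
        f"-o{archive.parent}",  # write output next to the original archive
        "-y",                   # answer yes to any overwrite prompts
        str(archive),
    ]


def extract_all(root, password, runner=subprocess.run):
    """Extract each archive under root in place; return the ones that failed."""
    failed = []
    for archive in find_archives(root):
        result = runner(build_extract_command(archive, password))
        if result.returncode != 0:
            failed.append(archive)
    return failed
```

Calling extract_all("Samples", password) from the APT directory would walk the whole tree and return a list of any archives that 7-Zip could not extract. The runner parameter is only there so the logic can be exercised without actually launching 7-Zip.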

You may have noticed in the first screenshot that I have a script called ExtractSamples.py in my APT directory.

We will use this for the recursive password protected extractions.

python ExtractSamples.py

A flurry of code goes by, and you congratulate yourself on your Python prowess. Now if we look again at our contents, we’ve got the extracted sample and the original archive.

Let’s get rid of all the archives, as we don’t need them cluttering up the corpus.

We can start by running a find command to identify all the .7z files.

find . -type f -name '*.7z' -print

After you’ve checked the output and verified the command above is only grabbing the 7z files you want to delete, we can update the command to delete the found files.

find . -type f -name '*.7z' -delete
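If you’d rather keep the cleanup in Python alongside the extraction, the same check-then-delete pattern can be sketched as below. This is only an illustration of the two find commands above. The function name remove_archives and its dry_run flag are hypothetical, and it defaults to listing only so nothing is deleted by accident.

```python
from pathlib import Path


def remove_archives(root, suffix=".7z", dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) files ending in suffix."""
    matches = sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())
    for path in matches:
        if dry_run:
            # Equivalent to: find . -type f -name '*.7z' -print
            print(f"would delete: {path}")
        else:
            # Equivalent to: find . -type f -name '*.7z' -delete
            path.unlink()
    return matches
```

Run remove_archives(".") first and review the output, then run remove_archives(".", dry_run=False) once you’re satisfied it is only matching the archives you want gone.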

One more directory listing to verify:

Success. All the 7z files are removed and all the sample files are intact.

GitHub Link: ExtractSamples.py

Time to go write some new detections!

Ginsu: A tool for repackaging large collections to traverse Windows Defender Live Response

Screenshot of Ginsu.ps1

Enterprise customers running Microsoft Defender for Endpoint have a lot of capability at their fingertips. This includes the Live Response console, a limited command shell for interacting with any managed Defender assets that are online. Besides its native commands, you can also use the console to push scripts and executables to endpoints.

Note: there is a specific security setting in the Defender console if you want to allow unsigned scripts.

Microsoft has its own triage package capability, but you can also push your own tools like Magnet RESPONSE or KAPE. With a little bit of PowerShell mojo you can run your favorite collection utilities, with the Defender Live Response console as your entry point into the remote asset.

The console enables you to pull back files from the remote endpoint, even when it’s been isolated. One limitation of this function is that you can only retrieve files of 3GB or less.

Many triage collections will come in under that limit, but depending on the artifacts you’re collecting you might exceed it. So what do you do when you have an isolated endpoint but you need to pull back files over 3GB? That’s where Ginsu comes in.

Ginsu is a PowerShell script that you can upload to your Defender console along with the command line version of 7zip. You configure the script with the directory containing the contents you want to transfer. The script acts as a wrapper for 7zip and creates a multipart archive, splitting the contents into 3GB segments.
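To illustrate the splitting step, here is a rough sketch in Python of the kind of 7-Zip call such a wrapper makes. This is not Ginsu itself, which is a PowerShell script. It relies on 7-Zip’s -v switch for creating volumes, and it assumes the binary is named 7z and that volumes are 3 GiB each; the function names and example paths are hypothetical.

```python
import math

GIB = 1024 ** 3  # assumption: segment size measured in binary gigabytes


def build_split_command(source_dir, archive_path, volume_size="3g", binary="7z"):
    """Build a 7-Zip command that archives source_dir into fixed-size volumes."""
    return [
        binary,               # assumption: name of the 7-Zip command line binary
        "a",                  # add files to an archive
        f"-v{volume_size}",   # split the archive into volumes of this size
        str(archive_path),
        str(source_dir),
    ]


def volumes_needed(archive_bytes, volume_bytes=3 * GIB):
    """Return how many volumes an archive of archive_bytes would be split into."""
    if archive_bytes <= 0:
        return 0
    return math.ceil(archive_bytes / volume_bytes)


def volume_names(archive_path, count):
    """Return the names 7-Zip gives split volumes: archive.7z.001, .002, ..."""
    return [f"{archive_path}.{index:03d}" for index in range(1, count + 1)]
```

For example, a 7 GiB archive would come back as three files to retrieve, each within the transfer limit. Once all of them are in the same folder on your workstation, pointing 7-Zip at the first volume (the .001 file) extracts the full set.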

Once you pull the archives back to your workstation, you can use 7zip to reassemble the segments and extract the files with their original properties intact.

In testing, the file transfer capabilities were a bit buggy, whether it was transferring 3GB Ginsu files or other smaller files from the asset. I’m hoping this improves as the Defender console matures. If you’re able to test Ginsu in your environment, I’d love to hear how it performs.

You can download Ginsu from my GitHub repo at https://github.com/dwmetz/Ginsu