Growing Your Malware Corpus

If you’re writing YARA rules or doing other kinds of detection engineering, you’ll want to have a test bed that you can run your rules against.  This is known as a corpus. For your corpus you’ll want to have both Goodware (known good operating system files), as well as a library of malware files.

One source to get a lot of malware samples is from VX-Underground.  What I really appreciate about VX-Underground is that in addition to providing lots of malware samples, they also produce an annual archive of samples and papers. You can download a whole year’s worth of samples and papers, from 2010 to 2023.

Pandora’s Box

Just to understand the structure here, I have a USB device called “Pandora.” On the root of the drive is a folder called “APT”, and within that is a “Samples” directory. Inside the samples directory is the .7z download for 2023 from VX-Underground. There’s also a python script… we’ll get to that soon enough.

The first thing we’ll need to do is unzip the download with the usual password.

7zz x 2023.7z

Once the initial extraction is complete you can delete the original 2023.7z archive.

Within the archive for each year, there is a directory for the sample, with sub-directories of ‘Samples’ and ‘Papers.’  Every one of the samples is also password protected zip file.

This makes sense from a safety perspective, but it makes it impossible to scan against all the files at once.

Python to the Rescue

We can utilize a Python script to recursively go through the contents of our malware folder and unzip all the password protected files, while keeping those files in their original directories.

You may have noticed in the first screenshot that I have a script called ExtractSamples.py in my APT directory.

We will use this for the recursive password protected extractions.

Python ExtractSamples.py

A flurry of code goes by, and you congratulate yourself on you Python prowess. Now if we look again at our contents, we’ve got the extracted sample and the original zip file. 

Let’s get rid of all the zip files as we don’t need them cluttering up the corpus.

We can start by running a find command to identify all the 7zip files.

find . -type f -name '*.7z' -print

After you’ve checked the output and verified the command above is only grabbing the 7z files you want to delete, we can update the command to delete the found files.

find . -type f -name '*.7z' -delete

One more a directory listing to verify:

Success. All the 7z files are removed and all the sample files are intact.

GitHub Link: ExtractSamples.py

Time to go write some new detections!

Huntress CTF: Week 2 – Malware: VeeBeeEee, Snake Eater, Opendir

VeeBeeEee

First examine the file contents.

Ooof. That hurts the eyes. If we throw it into CyberChef, with the assistance of some magic (or detailed reading of the challenge), we see that it’s VB Script, which can be converted using the Microsoft Script Decoder recipe.

Copy the output to VS Code.

The syntax highlighting shows that all the ””””””””al37ysoeopm’al37ysoeopm entries are just comments, so let’s remove them.

There also seems to be an abundance of “&” obscuring the code. We’ll remove them too.

That’s a lot more readable. Looking at the code we see it’s going to use PowerShell to create a file C:\Users\Pubic\Documents\July.htm using as input the content from a pastebin URL.


Snake Eater

We’ll detonate snake_eater.exe in our lab environment.

I really enjoyed this challenge as I used my detonaRE PowerShell script to control the detonation and solve the challenge. Besides firing the malware itself, the script will initiate a pcap capture and monitor the malware process using Process Monitor. The script the converts the ProcMon output to csv for easy analysis.

Scrolling through the csv we find that the application was writing a file to:

~\AppData\Roaming\Mael Horns\flag{hashforflag}

Opendir

Let’s get Started

The link brings us to an Open Directory (duh) with lots of scripts and executables, not to mention a number of subdirectories.

The first thing to do is grab everything.

Site Sucker works well for this.

Once we’ve captured all the files and subdirectories locally we can search through them en masse. Once again in this CTF, the_silver_searcher (ag) comes into play.

Tucked away in /sir/64_bit_new/oui.txt is the flag.


Use the tag #HuntressCTF on BakerStreetForensics.com to see all related posts and solutions for the 2023 Huntress CTF.

Huntress CTF: Week 1 – WarmUps

The team at Huntress pulled off an amazing CTF that ran through the month of October with new challenges released daily. In this series, I’ll be providing my solutions to the challenges. WARNING Will Robinson, spoilers ahead! Use the tag #HuntressCTF to see all related posts.

Technical Support

There wasn’t really a solve to this one, but I’m including here for consistency. If you head to the Discord server for the event and went to the support channel, the flag was provided.


String Cheese

Taking this literally – we’ll run STRINGS on cheese:

If we scroll down through the output…


Notepad

Right click on the notepad file, open with VS Code or text editor of choice.


CaesarMirror

When you examined the text file you got

I copied the text over to CyberChef and started running some recipes on it. I found an algorithm that would work on it, well, one half at a time.

I took the original file and edited it into 2 versions, caesar_left.text and caesar_right.txt. I converted each side of the file, screenshotted the output, and then aligned them next to each other to read the complete output.


Book By Its Cover

Use the FILE command to get the properties of book.rar.

Hmm. A png file. Let’s open that with an image viewer.


BaseFFFF+1

Examining the file contents yielded…

Back to CyberChef. There’s Base64 and Base85 but neither of those work. Looking closer at the title…. BaseFFFF+1… FFFF is the Hexadecimal for 65535. Add one and you have 65536. I googled Base65536, and while it’s not in CyberChef it does exist.


Read the Rules

Head over to the Rules page. While you’re there, be sure to read up on what tools are not allowed. CTFs are usually not the situation where you bring a tank to a knife fight. Once you’ve read everything, visible, three or four times if you’re me, right, click on the webpage and choose view source.


Query Code

Once again the FILE command gives us our first clue.

It’s a png image so open with an image viewer and you have a QR code. Scan that with a QR reader and…


Dialtone

The provided wav file is a recording of different telephone buttons being pushed. The first thing to do is identify what buttons/numbers are being pushed. Using the site DialABC I uploaded the wav file and then transcribed the DTMF Tone outputs.

13040004482820197714705083053746380382743933853520408575731743622366387462228661894777288573. That is on heck of a phone number!

A hint on Discord led me to the next step. It referenced that this was a BigInteger value. After several trips with Alice down various rabbit holes I found a PowerShell syntax to convert BigInt to strings.

Hmm. Looks closer to what an encoded flag might look like, but still not there yet. Back over to CyberChef and sprinkle a little Magic dust… and we see that the next and last decoding step is to From_Hex.


Layered Security

The file command indicates that it’s a GIMP image file. I recall that GIMP is an open-source application that’s comparable to Adobe Photoshop. I’d used it previously but not in a long time. I also can’t help but think of Pulp Fiction and “Bring out the Gimp.”

After a morbid chuckle and a quick installation, I launch GIMP and open the file. In the bottom right we see there are a number of faces that are part of this picture.

As we peel down the layers we find the flag in one of the images.


Comprezz

We’ve been pretty successful starting with the file command, so let’s start there.

As the challenge suggests, no I have not heard of this file type. A quick google for compress’d data 16 bits takes me to several posts on how to uncompress theses files. After a brief trial and error (it may have taken me 2 times), I cat’d the file and then piped it to uncompress.


That’s it for the challenges in the Warm Up category. There were also challenges in Forensics, Malware and Miscellaneous.

Use the tag #HuntressCTF to see all related posts. Now that October is over, I’ll be releasing as many of these as I can.

Creating YARA files with Python

When I’m researching a piece of malware, I’ll have a notepad open (usually VS Code), where I’m capturing strings that might be useful for a detection rule. When I have a good set of indicators, the next step is to turn them into a YARA rule.

It’s easy enough to create a YARA file by hand. My objective was to streamline the boring stuff like formatting and generating a string identifier ($s1 = “stringOne”) for each string. Normally PowerShell is my goto, but this week I’m branching out and wanted to work on my Python coding.

The code relies on you having a file called strings.txt. One string per line.

When you run the script it will prompt for (metadata):

  • rule name
  • author
  • description
  • hash

It then takes the contents of strings.txt and combines those with the metadata to produce a cleanly formatted YARA rule.

Caveats:

If the strings have special characters that need to be escaped, you may need to tweak the strings in the rule after it’s created.

The script will define the condition “any of them”. If you prefer to have all strings required, you can change line 22 from

yara_rule += '\t\tany of them\n}\n'

to

yara_rule += '\t\tall of them\n}\n'

CreateYARA.py

def get_user_input():
    rule_name = input("Enter the rule name: ")
    author = input("Enter the author: ")
    description = input("Enter the description: ")
    hash_value = input("Enter the hash value: ")
    return rule_name, author, description, hash_value

def create_yara_rule(rule_name, author, description, hash_value, strings_file):
    yara_rule = f'''rule {rule_name} {{
    meta:
    \tauthor = "{author}"
    \tdescription = "{description}"
    \thash = "{hash_value}"

    strings:
    '''
    with open(strings_file, 'r') as file:
        for id, line in enumerate(file, start=1):
            yara_rule += f'\t$s{id} = "{line.strip()}"\n\t'
    yara_rule += '\n'
    yara_rule += '\tcondition:\n'
    yara_rule += '\t\tany of them\n}\n'

    return yara_rule

def main():
    rule_name, author, description, hash_value = get_user_input()
    strings_file = 'strings.txt'  

    yara_rule = create_yara_rule(rule_name, author, description, hash_value, strings_file)
    print("Generated YARA rule:")
    print(yara_rule)
    
    yar_filename = f'{rule_name}.yar'
    with open(yar_filename, 'w') as yar_file:
        yar_file.write(yara_rule)

    print(f"YARA rule saved to {yar_filename}")

if __name__ == "__main__":
    main()
Sample strings.txt file used as input for the YARA rule
Running CreateYARA.py
YARA rule created from Python script, viewed in VS Code.