Growing Your Malware Corpus

If you’re writing YARA rules or doing other kinds of detection engineering, you’ll want a test bed that you can run your rules against. This is known as a corpus. For your corpus you’ll want both goodware (known-good operating system files) and a library of malware files.

One source for a large number of malware samples is VX-Underground. What I really appreciate about VX-Underground is that in addition to providing lots of individual samples, they also produce an annual archive of samples and papers. You can download a whole year’s worth of samples and papers for any year from 2010 to 2023.

Pandora’s Box

Just to understand the structure here, I have a USB device called “Pandora.” On the root of the drive is a folder called “APT”, and within that is a “Samples” directory. Inside the Samples directory is the .7z download for 2023 from VX-Underground. There’s also a Python script… we’ll get to that soon enough.

The first thing we’ll need to do is unzip the download with the usual password.

7zz x 2023.7z
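If you’d rather not be prompted for the password interactively, 7-Zip also accepts it on the command line with the -p switch (the password itself is shown as a placeholder here):

7zz x 2023.7z -p<password>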

Once the initial extraction is complete, you can delete the original 2023.7z archive.

Within the archive for each year, there is a directory for each sample, with sub-directories of ‘Samples’ and ‘Papers.’ Every one of the samples is itself a password-protected zip file.

This makes sense from a safety perspective, but it makes it impossible to scan against all the files at once.

Python to the Rescue

We can use a Python script to recursively go through the contents of our malware folder and unzip all the password-protected files, while keeping the extracted files in their original directories.

You may have noticed in the first screenshot that I have a script called ExtractSamples.py in my APT directory.

We will use this for the recursive password-protected extractions.
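For a sense of what the script does, here is a minimal sketch of the approach (not the full ExtractSamples.py, which is linked at the end of this post; it assumes the 7zz binary is on the PATH and uses a placeholder for the archive password):

import subprocess
from pathlib import Path

# Placeholder only -- substitute the usual archive password.
ARCHIVE_PASSWORD = "<password>"

def extract_all(root: str = ".") -> None:
    """Recursively find every .7z under root and extract it in place."""
    for archive in Path(root).rglob("*.7z"):
        # -o<dir> extracts next to the archive, so the samples stay in their
        # original directory structure; -y answers any prompts automatically.
        subprocess.run(
            ["7zz", "x", f"-p{ARCHIVE_PASSWORD}", f"-o{archive.parent}", "-y", str(archive)],
            check=True,
        )

if __name__ == "__main__":
    extract_all()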

python ExtractSamples.py

A flurry of output goes by, and you congratulate yourself on your Python prowess. Now if we look again at our contents, we’ve got the extracted samples alongside the original zip files.

Let’s get rid of all the zip files, as we don’t need them cluttering up the corpus.

We can start by running a find command to identify all the .7z files.

find . -type f -name '*.7z' -print

After you’ve checked the output and verified that the command above only matches the 7z files you want to delete, we can update the command to delete the matched files.

find . -type f -name '*.7z' -delete

One more directory listing to verify:

Success. All the 7z files are removed and all the sample files are intact.

GitHub Link: ExtractSamples.py

Time to go write some new detections!

Huntress CTF: Week 1 – Forensics: Backdoored Splunk, Traffic, Dumpster Fire

Backdoored Splunk

Hit Start.

So we’ve got a URL and a specific port. Visiting it in Firefox yields…

So we need an Authorization header. 🤔

Time to look at the provided files. They look to be the export of a Splunk application.

Time to download an eval copy of Splunk and… pause. There’s probably a simpler way to attack this.

The Silver Searcher is a command-line tool I picked up during the CTF, and I love it. It’s like grep on PCP.

Once installed, the base command is ag, followed by what you’re searching for, and where. So let’s do a quick search for Authorization on all the contents of this directory.
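In this case that’s simply the pattern followed by the path (here, the current directory):

ag Authorization .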

That looks interesting. A clue? One of the PowerShell scripts has Authorization and what looks to be a Base64-encoded string.

We also see a comment about the $PORT being dynamic based on the Start button. Decoding the string in CyberChef…

At this point we have all the pieces; we just need to put them together. I started to look at different ways to pass an Authorization header to a web server. There are proxy tools galore, and then there are the basics like curl. After a bit of brushing up on my syntax I had:

curl -H "Authorization: Basic [longStringFromThePowershell]" http://site:$PORT

Yay, what looks like more Base64. Once more with our Chef’s hat on and…
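(If you prefer to stay in the terminal, the same decode works with the base64 utility; the string itself is elided here.)

echo '[longBase64StringFromTheResponse]' | base64 -d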


Traffic

RITA was a tool I hadn’t used before, but it was very easy to use. I installed it on my REMnux box and then ran it against the provided dataset.

I then generated an HTML report.
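From my notes, the two steps look roughly like this (the Zeek log path and dataset name are placeholders):

rita import /path/to/zeek/logs traffic
rita html-report traffic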

Looking through the DNS requests, there’s something sketchy indeed.

Let’s go take a look at that.


Dumpster Fire

Let’s start with the_silver_searcher again and see if we have any luck with “Password”.

There are a number of hits, including references to an encryptedUsername and encryptedPassword in the logins.json file. So we’ve got some encrypted Firefox user passwords. If only there were a utility that could decrypt those. Enter firepwd.py, an open-source tool for decrypting Mozilla-protected passwords.

Run the script in Python and point it to the directory for the user profile (where the logins.json file is).
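Something along these lines, with the profile path as a placeholder for wherever you extracted the files:

python3 firepwd.py -d /path/to/firefox/profile/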

That’s a pretty LEET password 😉


Use the tag #HuntressCTF on BakerStreetForensics.com to see all related posts and solutions for the 2023 Huntress CTF.

Creating YARA files with Python

When I’m researching a piece of malware, I’ll have a notepad open (usually VS Code), where I’m capturing strings that might be useful for a detection rule. When I have a good set of indicators, the next step is to turn them into a YARA rule.

It’s easy enough to create a YARA file by hand. My objective was to streamline the boring stuff like formatting and generating a string identifier ($s1 = “stringOne”) for each string. Normally PowerShell is my go-to, but this week I’m branching out and wanted to work on my Python coding.

The code relies on you having a file called strings.txt, with one string per line.
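For example, a strings.txt might look something like this (these strings are made up purely for illustration):

cmd.exe /c whoami
User-Agent: EvilUpdater/1.0
stratum+tcp://pool.example.com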

When you run the script, it will prompt for the metadata:

  • rule name
  • author
  • description
  • hash

It then takes the contents of strings.txt and combines those with the metadata to produce a cleanly formatted YARA rule.

Caveats:

If the strings have special characters that need to be escaped, you may need to tweak the strings in the rule after it’s created.
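If you want the script to handle the common cases for you, one option (my own tweak, not part of the script as posted) is to escape backslashes and double quotes as each line is read in:

# Hypothetical change inside the for loop: escape backslashes and quotes
# so the generated rule still compiles when the raw strings contain them.
escaped = line.strip().replace('\\', '\\\\').replace('"', '\\"')
yara_rule += f'\t$s{id} = "{escaped}"\n\t'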

The script will define the condition “any of them”. If you prefer to have all strings required, you can change line 22 from

yara_rule += '\t\tany of them\n}\n'

to

yara_rule += '\t\tall of them\n}\n'

CreateYARA.py

def get_user_input():
    rule_name = input("Enter the rule name: ")
    author = input("Enter the author: ")
    description = input("Enter the description: ")
    hash_value = input("Enter the hash value: ")
    return rule_name, author, description, hash_value

def create_yara_rule(rule_name, author, description, hash_value, strings_file):
    yara_rule = f'''rule {rule_name} {{
    meta:
    \tauthor = "{author}"
    \tdescription = "{description}"
    \thash = "{hash_value}"

    strings:
    '''
    with open(strings_file, 'r') as file:
        for id, line in enumerate(file, start=1):
            yara_rule += f'\t$s{id} = "{line.strip()}"\n\t'
    yara_rule += '\n'
    yara_rule += '\tcondition:\n'
    yara_rule += '\t\tany of them\n}\n'

    return yara_rule

def main():
    rule_name, author, description, hash_value = get_user_input()
    strings_file = 'strings.txt'  

    yara_rule = create_yara_rule(rule_name, author, description, hash_value, strings_file)
    print("Generated YARA rule:")
    print(yara_rule)
    
    yar_filename = f'{rule_name}.yar'
    with open(yar_filename, 'w') as yar_file:
        yar_file.write(yara_rule)

    print(f"YARA rule saved to {yar_filename}")

if __name__ == "__main__":
    main()

[Screenshot: Sample strings.txt file used as input for the YARA rule]
[Screenshot: Running CreateYARA.py]
[Screenshot: YARA rule created from the Python script, viewed in VS Code]

Raspberry Pi Internet Speed Monitor

I was looking wistfully at the Lack Rack from my armchair, admiring the (faux) copper conduit that covered the primary inbound internet link to the switch. I thought it would look cool to have an antique steam gauge attached to the piping. Two things quickly changed that idea: 1. the going prices for antique steam gauges right now, and 2. once I was thinking about it as a gauge, I decided an ‘internet speed gauge’ would be perfect. Alas, even if said gauge could be acquired without breaking the bank, converting Mbps to PSI and making it functional is above my level of engineering. So on to the next best thing – a Raspberry Pi hack.

Materials:

  • Raspberry Pi (3 or 4) with Raspbian 32-bit OS
  • Case with 3.5 in LCD Display
  • Copper spray paint 😉
  • Attention to detail at the command line

Speedtest CLI

Once you’ve got your Raspberry Pi up and running, start with the Installing the Speedtest CLI instructions at https://pimylifeup.com/raspberry-pi-internet-speed-monitor/ and complete steps 1-6. When the article gets to Writing our Speed Test Python Script, you can skip that section. I do recommend it from a learning perspective, but the code from that step won’t be used in the final project.

Assuming this is a new installation, you will need to install InfluxDB and Grafana. Complete the respective instructions for each.

Continue with the primary article’s instructions for Using Grafana to Display your Speedtest Data.

If you’ve made it this far, you should have a working Grafana dashboard displaying Upload Speed, Download Speed, and Ping (Latency). If you’re hitting a glitch, go back through what you’ve coded and double-check that any references to the user (default = pi) are accurate for the user on your device. You should be seeing updated data based on the frequency you specified in crontab -e.
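For reference, the crontab entry is just a schedule plus whatever collection script you set up in the earlier steps; a 30-minute interval would look something like this (the script path is only an example):

*/30 * * * * python3 /home/pi/speedtest.py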

Install Grafana Kiosk

Next, we want to set up our device as a kiosk, and have it boot and display the Network Speed dashboard automatically.

Install Grafana Kiosk from https://github.com/grafana/grafana-kiosk. For my installation I used the ARM v6 grafana-kiosk.linux.armv6 release.
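Installation roughly amounts to downloading the release binary, marking it executable, and putting it somewhere on the PATH (the /usr/bin location matches the autostart line below). Treat the exact URL as an example, since it depends on the release you grab:

wget https://github.com/grafana/grafana-kiosk/releases/latest/download/grafana-kiosk.linux.armv6
chmod +x grafana-kiosk.linux.armv6
sudo mv grafana-kiosk.linux.armv6 /usr/bin/grafana-kiosk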

Running the Dashboard on startup:

We’re going to use a YAML file to store our dashboard configuration.

Create a new file, config.yaml, and populate it as follows:

general:
  kiosk-mode: full
  autofit: true
  lxde: true
  lxde-home: /home/(user)
target:
  login-method: local
  username: admin
  password: (password)
  playlist: false
  URL: http://localhost:3000/d/bdf20d32-c4ff-4578-a3f4-7a38e1f722b9/network-speed?orgId=1
  ignore-certificate-errors: false

Be sure to substitute your own username wherever you see (user), and your Grafana password for (password). The URL for the dashboard can be copied from the Grafana web interface.

Edit /home/(user)/.config/lxsession/LXDE-pi/autostart

Add the following line (it is a single line, though it may wrap here):

@/usr/bin/grafana-kiosk -lxde-home /home/(user) -c /home/(user)/config.yaml

Save & Exit.

Now when you reboot the Pi, the dashboard should come up full screen after login.