Python – Baker Street Forensics

MalChela 2.2 “REMnux” Release

MalChela’s 2.2 update is packed with practical and platform-friendly improvements. It includes native support for REMnux, better tool settings, and deeper integrations with analysis tools like YARA-X, Tshark, Volatility3, and the newly improved fileanalyzer module.

🦀 REMnux Edition: Built-In Support, Zero Tweaks

When the GUI loads a REMnux-specific tools.yaml profile, it enters REMnux mode.

Screenshot of yaml configuration applying REMnux mode

Native binaries and Python scripts like capa, oledump.py, olevba, and FLOSS are loaded into the MalChela tools menu, allowing you to mix and match operations with the embedded MalChela utilities and the full REMnux tool stack. No manual configuration needed—just launch and go. MalChela currently supports the following REMnux programs right out of the box:

Tool Name	Description
binwalk	Firmware analysis and extraction tool
capa	Identifies capabilities in executable files
radare2	Advanced reverse engineering framework
Volatility 3	Memory forensics framework for RAM analysis
exiftool	Extracts metadata from images, documents, and more
TShark	Terminal-based network packet analyzer (Wireshark CLI)
mraptor	Detects malicious macros in Office documents
oledump	Parses OLE files and embedded streams
oleid	Identifies features in OLE files that may indicate threats
olevba	Extracts and analyzes VBA macros from Office files
rtfobj	Extracts embedded objects from RTF documents
zipdump	Inspects contents of ZIP files, including suspicious payloads
pdf-parser	Analyzes structure and contents of suspicious PDFs
FLOSS	Reveals obfuscated and decoded strings in binaries
clamscan	On-demand virus scanner using ClamAV engine
strings	Extracts printable strings from binary files
YARA-X	Next-generation high-performance YARA rule scanner

If you only need a subset of tools you can easily save and restore that a custom profile.

TShark Panel with Built-In Reference

Tshark and the integrated field reference

A new TShark integration exposes features including:

A filter builder panel
Commonly used fields reference
Tooltip hints for each example (e.g., `ip.addr == 192.168.1.1` shows “Any traffic to or from 192.168.1.1”)
One-click copy support

This helps analysts build and understand filters quickly—even if TShark isn’t something they use every day. Using the syntax builder in MalChela you can use the exact commands directly in Tshark or Wireshark.

YARA-X Support (Install Guide Included)

Support for YARA-X (via the `yr` binary) is now built in. YARA-X is not bundled with REMnux by default, but install instructions are included in the User Guide for both macOS and Linux users.

Once installed, MalChela allows for rule-based scanning from the GUI,and with YARA-X, it’s faster than ever.

fileanalyzer: Fuzzy Hashing, PE Metadata, and More

MalChela’s fileanalyzer tool has also been updated to include:

Fuzzy hashing support via `ssdeep`
BLAKE3 hashing for fast, secure fingerprints
Expanded PE analysis, including:
Import and Export Table parsing (list of imported and exported functions)
Compilation Timestamp (for detection of suspicious or forged build times)
Section Characteristics (flags like IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_CNT_CODE, etc., for detecting anomalous sections)

These improvements provide deeper insight into executable structure, helping analysts detect anomalies such as packers, suspicious timestamps, or unexpected imports/exports. Useful for everything from sample triage to correlation, fileanalyzer now digs deeper—without slowing down.

Memory Forensics Gets a Boost: Volatility 3 Now Supported

With the 2.2 release, MalChela introduces support for Volatility 3, the modern Python-based memory forensics framework. Whether you’re running MalChela in REMnux or on a customized macOS or Linux setup, you can now access the full power of Volatility directly from the MalChela GUI.

There’s an intuitive plugin selector that dynamically adjusts available arguments based on your chosen plugin,. You can search, sort, and browse available plugins, and even toggle output options like –dump-dir with ease.

Like Tshark, there is an added plugin reference panel with searchable descriptions and argument overviews — a real time-saver when navigating Volatility’s deep and often complex toolset.

Smarter Tool Configuration via YAML

The tool configuration system continues to evolve:

Tools now declare their input type (file, folder, or hash)
The GUI dynamically adjusts the interface to match
Alternate profiles (like REMnux setups) can be managed simply by swapping `tools.yaml` files via the GUI
Easily backup or restore your custom setups
Restore the default toolset to get back to basics

This structure helps keep things clean—whether you’re testing, teaching, or deploying in a lab environment.

Embedded Documentation Access

The GUI now includes a link to the full MalChela User Guide in PDF. You can also access the documentation online.

From tool usage and CLI flags to configuration tips and install steps, it’s all just a click away—especially useful in offline environments or when onboarding new analysts. I’ll be honest, this is likely the most comprehensive user guide I’ve ever written.

Whether you’re reviewing binaries, building hash sets, or exploring network captures—MalChela 2.2 is designed bring together the tools you need, and make it easier to interoperate between them.

The new REMnux mode makes it even easier to get up and running with dozens of third party integrations.

Have an idea for a feature or application you’d like to see supported — reach out to me.

GitHub: REMnux Release

MalChela User Guide: Online, PDF, Web

Shop: T-shirts, hats, stickers, and more

Creating custom hash sets with YARA and Python

DFIR, hashing, Malware, Python, yara

I don’t like to brag, he said, but you should see the size of my malware library.

For a recent project, I wanted to produce a hash set for all the malware files in my repository. Included in the library are malware samples for Windows and other platforms. Within the library there are also a lot of pdf’s with write ups corresponding to different samples. Lastly there are zip files that the malware samples haven’t been extracted from yet.

I didn’t want hashes for any of the pdf documents or zip files. I wanted one hash set for the malware specific to Windows, and a second set for all the other samples.

Using YARA and Python, and some (a lot of) AI coaching, I wound up with three scripts. Two of them are used to create the hash sets, and a third that does counting and indexing on the source directory for different file headers.

Windows Malware Hashing

The majority of Windows malware has an MZ header. The first Python script uses YARA to recursively search a directory. For any files with an MZ header, it will write the MD5 hash to a file.

MZMD5.py

import os
import yara
import hashlib
import sys

def compile_yara_rules():
    """
    Compile YARA rules for detecting MZ headers.
    
    Returns:
        yara.Rules: Compiled YARA rules.
    """
    rules = """
rule mz_header {
    meta:
        description = "Matches files with MZ header (Windows Executables)"
    strings:
        $mz = {4D 5A}  // MZ header in hex
    condition:
        $mz at 0  // Match if MZ header is at the start of the file
}
"""
    return yara.compile(source=rules)

def calculate_md5(file_path):
    """
    Calculate the MD5 hash of a file.
    
    Args:
        file_path (str): Path to the file.
    
    Returns:
        str: MD5 hash in hexadecimal format.
    """
    md5_hash = hashlib.md5()
    try:
        with open(file_path, "rb") as f:
            for byte_block in iter(lambda: f.read(4096), b""):
                md5_hash.update(byte_block)
        return md5_hash.hexdigest()
    except Exception as e:
        return None

def scan_and_hash_files(directory, rules, output_file):
    """
    Scan files in a directory using YARA rules, calculate MD5 hashes for matches,
    and write results to an output file.
    
    Args:
        directory (str): Path to the directory to scan.
        rules (yara.Rules): Compiled YARA rules.
        output_file (str): Path to the output file where results will be saved.
    
    Returns:
        int: Total number of hashes written to the output file.
    """
    hash_count = 0
    with open(output_file, "w") as out_file:
        # Walk through the directory and its subdirectories
        for root, _, files in os.walk(directory):
            for file in files:
                file_path = os.path.join(root, file)
                
                try:
                    # Match YARA rules against the file
                    matches = rules.match(file_path)
                    if any(match.rule == "mz_header" for match in matches):
                        # Calculate MD5 hash if the file matches the MZ header rule
                        md5_hash = calculate_md5(file_path)
                        if md5_hash:
                            out_file.write(f"{md5_hash}\n")
                            # Print hash value and flush output immediately
                            print(md5_hash, flush=True)
                            hash_count += 1
                except Exception as e:
                    pass  # Suppress error messages
    
    return hash_count

if __name__ == "__main__":
    # Prompt user for directory to scan
    directory_to_scan = input("Enter the directory you want to scan: ").strip()
    
    # Verify that the directory exists
    if not os.path.isdir(directory_to_scan):
        print("Error: The specified directory does not exist.")
        exit(1)
    
    # Set output file path to MZMD5.txt in the current working directory
    output_file_path = "MZMD5.txt"
    
    # Check if the output file already exists
    if os.path.exists(output_file_path):
        overwrite = input(f"The file '{output_file_path}' already exists. Overwrite? (y/n): ").strip().lower()
        if overwrite != 'y':
            print("Operation canceled.")
            exit(0)
    
    # Compile YARA rules
    yara_rules = compile_yara_rules()
    
    # Scan directory, calculate MD5 hashes, and write results to an output file
    total_hashes = scan_and_hash_files(directory_to_scan, yara_rules, output_file_path)
    
    # Report total number of hashes written and location of the output file
    print(f"\nScan completed.")
    print(f"Total number of hashes written: {total_hashes}")
    print(f"Output file location: {os.path.abspath(output_file_path)}")

Non-Windows Malware Hashing

The second script is a little more complicated. Again we will use YARA to determine the filetype, however in this case we want to exclude anything with an MZ header, as well as exclude any zip files or pdfs. Based on the contents of the library, this should produce a hash set for all the other binaries in the library that aren’t targeted to Windows.

XMZMD5.py

import os
import yara
import hashlib

def compile_yara_rules():
    """
    Compile YARA rules for MZ, PDF, and ZIP headers.
    
    Returns:
        yara.Rules: Compiled YARA rules.
    """
    rules = """
rule mz_header {
    meta:
        description = "Matches files with MZ header (Windows Executables)"
    strings:
        $mz = {4D 5A}  // MZ header in hex
    condition:
        $mz at 0  // Match if MZ header is at the start of the file
}

rule pdf_header {
    meta:
        description = "Matches files with PDF header"
    strings:
        $pdf = {25 50 44 46}  // PDF header in hex (%PDF)
    condition:
        $pdf at 0  // Match if PDF header is at the start of the file
}

rule zip_header {
    meta:
        description = "Matches files with ZIP header"
    strings:
        $zip = {50 4B 03 04}  // ZIP header in hex
    condition:
        $zip at 0  // Match if ZIP header is at the start of the file
}
"""
    return yara.compile(source=rules)

def calculate_md5(file_path):
    """
    Calculate the MD5 hash of a file.

    Args:
        file_path (str): Path to the file.

    Returns:
        str: MD5 hash of the file, or None if an error occurs.
    """
    hasher = hashlib.md5()
    try:
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hasher.update(chunk)
        return hasher.hexdigest()
    except Exception as e:
        print(f"[ERROR] Unable to calculate MD5 for {file_path}: {e}")
        return None

def scan_directory(directory, rules, output_file):
    """
    Scan a directory for files that do not match YARA rules and calculate their MD5 hashes.

    Args:
        directory (str): Path to the directory to scan.
        rules (yara.Rules): Compiled YARA rules.
        output_file (str): File to save MD5 hashes of unmatched files.
    """
    hash_count = 0  # Counter for total number of hashes written

    try:
        with open(output_file, 'w') as out:
            for root, _, files in os.walk(directory):
                for file in files:
                    file_path = os.path.join(root, file)
                    try:
                        # Check if the file matches any YARA rule
                        matches = rules.match(file_path)
                        if not matches:  # Only process files that do not match any rule
                            md5_hash = calculate_md5(file_path)
                            if md5_hash:
                                print(md5_hash)  # Print hash to console
                                out.write(md5_hash + '\n')  # Write only hash to output file
                                hash_count += 1
                    except yara.Error as ye:
                        print(f"[WARNING] YARA error scanning {file_path}: {ye}")
                    except Exception as e:
                        print(f"[ERROR] Unexpected error scanning {file_path}: {e}")
        
        # Report total number of hashes written and location of the output file
        print(f"\nScan completed.")
        print(f"Total number of hashes written: {hash_count}")
        print(f"Output file location: {os.path.abspath(output_file)}")

    except Exception as e:
        print(f"[ERROR] Failed to write to output file {output_file}: {e}")

if __name__ == "__main__":
    # Prompt user for directory to scan
    directory_to_scan = input("Enter directory to scan: ").strip()
    
    # Compile YARA rules
    try:
        yara_rules = compile_yara_rules()
    except Exception as e:
        print(f"[ERROR] Failed to compile YARA rules: {e}")
        exit(1)

    # Output filename for unmatched files' MD5 hashes
    output_filename = "XMZMD5.txt"

    # Check if the output file already exists and prompt user for action
    if os.path.exists(output_filename):
        overwrite_prompt = input(f"[WARNING] The file '{output_filename}' already exists. Do you want to overwrite it? (yes/no): ").strip().lower()
        
        if overwrite_prompt not in ['yes', 'y']:
            print("[INFO] Operation canceled by user.")
            exit(0)

    # Scan the directory
    if os.path.isdir(directory_to_scan):
        scan_directory(directory_to_scan, yara_rules, output_filename)
    else:
        print(f"[ERROR] The provided path is not a valid directory: {directory_to_scan}")

The third script is for counting and validation. I wanted to know the total number of files, and how many had the MZ header, were zip or pdf files, or none of the above. Based on the counts, the hash lists should contain a matching number of entries, the MZ’s for Windows malware samples and the “Neither Header Files” for the remaining binaries. Note: to run this script you will need to have the Python module “tabulate” installed. (pip install tabulate). There are 2 output options available, Detailed and Table View.

MZCount.py

import os
import yara
import time


def compile_yara_rules():
    """
    Compile YARA rules for MZ, PDF, and ZIP headers.
    
    Returns:
        yara.Rules: Compiled YARA rules.
    """
    rules = """
rule mz_header {
    meta:
        description = "Matches files with MZ header (Windows Executables)"
    strings:
        $mz = {4D 5A}  // MZ header in hex
    condition:
        $mz at 0  // Match if MZ header is at the start of the file
}

rule pdf_header {
    meta:
        description = "Matches files with PDF header"
    strings:
        $pdf = {25 50 44 46}  // PDF header in hex (%PDF)
    condition:
        $pdf at 0  // Match if PDF header is at the start of the file
}

rule zip_header {
    meta:
        description = "Matches files with ZIP header"
    strings:
        $zip = {50 4B 03 04}  // ZIP header in hex
    condition:
        $zip at 0  // Match if ZIP header is at the start of the file
}
"""
    try:
        return yara.compile(source=rules)
    except yara.SyntaxError as e:
        print(f"Error compiling YARA rules: {e}")
        raise


def display_table(counts):
    """
    Display the counts in a simple table format.
    
    Args:
        counts (dict): Dictionary containing counts for each file type.
    """
    # Clear console before displaying new table
    os.system('cls' if os.name == 'nt' else 'clear')  # Clears terminal for Windows ('cls') or Linux/Mac ('clear')
    
    # Print updated table
    print("\n+----------------------+---------+")
    print("| File Type           | Count   |")
    print("+----------------------+---------+")
    print(f"| Total Files         | {counts['total_files']:<7} |")
    print(f"| MZ Header Files     | {counts['mz_header']:<7} |")
    print(f"| PDF Header Files    | {counts['pdf_header']:<7} |")
    print(f"| ZIP Header Files    | {counts['zip_header']:<7} |")
    print(f"| Neither Header Files| {counts['neither_header']:<7} |")
    print("+----------------------+---------+")


def scan_and_count_files(directory, rules, use_table_display):
    """
    Scan files in a directory using YARA rules and count matches by header type.
    
    Args:
        directory (str): Path to the directory to scan.
        rules (yara.Rules): Compiled YARA rules.
        use_table_display (bool): Whether to use table display for live updates.
    
    Returns:
        dict: A dictionary with counts for total files, MZ headers, PDF headers, ZIP headers, and neither headers.
    """
    counts = {
        "total_files": 0,
        "mz_header": 0,
        "pdf_header": 0,
        "zip_header": 0,
        "neither_header": 0
    }

    # Walk through the directory and its subdirectories
    for root, _, files in os.walk(directory):
        for file in files:
            counts["total_files"] += 1
            file_path = os.path.join(root, file)
            
            try:
                # Open file in binary mode for YARA matching
                with open(file_path, "rb") as f:
                    data = f.read()
                
                # Match YARA rules against file content
                matches = rules.match(data=data)

                # Process matches
                if matches:
                    matched_rules = {match.rule for match in matches}
                    if "mz_header" in matched_rules:
                        counts["mz_header"] += 1
                    if "pdf_header" in matched_rules:
                        counts["pdf_header"] += 1
                    if "zip_header" in matched_rules:
                        counts["zip_header"] += 1
                else:
                    counts["neither_header"] += 1

            except Exception as e:
                print(f"Error scanning {file_path}: {e}")
                # Decrement total_files if an error occurs
                counts["total_files"] -= 1
            
            # Display updated output after processing each file
            if use_table_display:
                display_table(counts)
            else:
                print(f"Scanned: {file_path}")
                print(f"Current Counts: {counts}")
            
            time.sleep(0.1)  # Optional: Add a small delay for smoother updates
    
    return counts


if __name__ == "__main__":
    # Prompt user for directory to scan
    directory_to_scan = input("Enter directory to scan: ").strip()
    
    # Check if the directory exists
    if not os.path.isdir(directory_to_scan):
        print(f"Error: The directory '{directory_to_scan}' does not exist. Please enter a valid directory.")
        exit(1)
    
    # Prompt user for display format preference
    display_choice = input("Choose output format - (1) Detailed, (2) Table Display: ").strip()
    
    # Determine whether to use table display or original output format
    use_table_display = display_choice == "2"
    
    # Compile YARA rules
    yara_rules = compile_yara_rules()
    
    # Scan directory and count matches
    results = scan_and_count_files(directory_to_scan, yara_rules, use_table_display)
    
    # Final results display after completion
    print("\nFinal Results:")
    
    if use_table_display:
        display_table(results)
        
        # Handle case where no results were found
        if results["total_files"] == 0:
            print("No files were scanned. Please check your directory.")
            
    else:
        print(f"Total files scanned: {results['total_files']}")
        print(f"Files with MZ header: {results['mz_header']}")
        print(f"Files with PDF header: {results['pdf_header']}")
        print(f"Files with ZIP header: {results['zip_header']}")
        print(f"Files with neither MZ, PDF, nor ZIP header: {results['neither_header']}")

Double Checking the Hash Files

Finally we can use RegEx to count the number of MD5 hashes for each file. The RegEx looks for strings of 32 hexadecimal digits. (A-F and 0-9.)

grep -oE '\b[a-fA-F0-9]{32}\b' [filename].txt | wc -l

Regex output showing counts for hashes, 22789 and 5988 respectively.

The number of hashes in the MZMD5.txt hash list matches the number of MZ files identified by YARA. Additionally, the number of non-MZ binaries in the hash list, XMZMD5.txt, matches the number of files when we exclude the Windows binaries and the pdf and zip files.

There you have it, the fruits of my labors combining a few of my favorite things (cue John Coltrane), YARA, Malware, Python, and using AI as tool to develop my coding skills. If you’d like to download the scripts for your own usage, they can be found at https://github.com/dwmetz/Toolbox/ (Miscellaneous PowerShell and Python scripts related to YARA and Malware Analysis.)

Beyond Hashes: Simplifying Malware Identification with Python and MpCmdRun

DFIR, Forensics, Malware, PowerShell, Python

In an earlier post titled “Growing Your Malware Corpus”, I outlined methods for building a comprehensive test corpus of malware for detection engineering. It covers using sources like VX-Underground for malware samples and details how to organize and unzip these files using Python scripts.

In today’s post we’re going to cover using Python to apply a standard naming methodology to all our malware samples.

Depending on where you curate your samples from, they could be named by their hash, or as they were identified during investigation, like invoice.exe. Depending on the size of your collection, I’d surmise it’s highly unlikely that they have a consistent naming format.

I don’t know about you, but a title that indicates the malware family and platform is a lot more useful to me than a hash value when perusing the corpus for a juicy malware sample. We can rename all our malware files using Python and the command line utility for Windows Defender.

Step 1: You’ll need to install Python on a Windows box that has Windows Defender.

Install Python

If you don’t have Python installed on your Windows machine, you can do so by downloading the installer from python.org, or alternatively, installing from the Windows store.

*Windows Store installer for Python versions 3.7 to 3.12*

Directory Exclusion

Within the Windows Defender Virus & Threat protection settings, add an exclusion for the directory you’re going to be using with the malware. Make sure the exclusion is in place before connecting the drive with the malware so it doesn’t get nuked.

Note: Doing this assumes you’ve evaluated the potential risks associated with handling malware, even in controlled settings, and have taken safety precautions. This is not an exercise to be conducted on your corporate workstation.

*Screenshot of the D:\Malware Directory being excluded from Windows Defender.*

Automatic Sample submission

It’s up to you if you want to disable the Automatic Sample submission. If you do, you’ll still may get prompted to send some.

*Windows Defender requesting to send samples to Microsoft for further analysis.*

Rename_Malware.py

The star of this show is the python script that was shared on ~~twitter~~ from vx-underground.

The post walks through various options for utilizing Windows Defender command line, MpCPmdRun.exe. Using that information a Python script was developed to loop through the contents of a directory, analyze those files with Windows Defender, and then rename the files accordingly based on the malware identification.

*Python code for rename_malware.py in VS Code.*

You can grab the code from the linked post, or a copy on my Github here.

Once you’ve got Python installed, directory exclusion configured, and a pocketful of kryptonite (malware), – you’re ready to go.

python rename_malware.py D:\Malware

Windows Defender command line will run through each file and rename them based on its detection.

*The script recursively renames the analyzed files.*

I’m running this on a copy of my malware corpus of 30,000+ malware samples.

Counting the Corpus

A bit of handy PowerShell math. Before and after the process I wanted to be sure of how many files were present to ensure that the antivirus didn’t remove any. I also wanted to exclude counting pdfs as many of the samples in my corpus also have accompanying write-ups.

*Using PowerShell for selective file counting.*

Get-ChildItem -Recurse -file | Where-Object { $_.Extension ne *.pdf" } | Measure-Object | Select Count

Back at the console the script is still running.

*The script continues recursively renaming the analyzed files.*

Finally… not begrudgingly at all considering over 30,000 samples were analyzed, the script has reached the end of the samples.

*Script has reached the end of the files.*

If we do a directory listing on the contents of the malware directory, we see that the majority of the files have all been renamed based on their malware identification.

*File listing showing malware files named Trojan.Powershell… Trojan.Script… etc.*

Hooray!

That makes it much easier to search and query through the malware repository.

The last step… make a BACKUP. 😉

Growing Your Malware Corpus

DFIR, Malware, Python, yara

If you’re writing YARA rules or doing other kinds of detection engineering, you’ll want to have a test bed that you can run your rules against. This is known as a corpus. For your corpus you’ll want to have both Goodware (known good operating system files), as well as a library of malware files.

One source to get a lot of malware samples is from VX-Underground. What I really appreciate about VX-Underground is that in addition to providing lots of malware samples, they also produce an annual archive of samples and papers. You can download a whole year’s worth of samples and papers, from 2010 to 2023.

Pandora’s Box

Just to understand the structure here, I have a USB device called “Pandora.” On the root of the drive is a folder called “APT”, and within that is a “Samples” directory. Inside the samples directory is the .7z download for 2023 from VX-Underground. There’s also a python script… we’ll get to that soon enough.

The first thing we’ll need to do is unzip the download with the usual password.

7zz x 2023.7z

Once the initial extraction is complete you can delete the original 2023.7z archive.

Within the archive for each year, there is a directory for the sample, with sub-directories of ‘Samples’ and ‘Papers.’ Every one of the samples is also password protected zip file.

This makes sense from a safety perspective, but it makes it impossible to scan against all the files at once.

Python to the Rescue

We can utilize a Python script to recursively go through the contents of our malware folder and unzip all the password protected files, while keeping those files in their original directories.

You may have noticed in the first screenshot that I have a script called ExtractSamples.py in my APT directory.

We will use this for the recursive password protected extractions.

Python ExtractSamples.py

A flurry of code goes by, and you congratulate yourself on you Python prowess. Now if we look again at our contents, we’ve got the extracted sample and the original zip file.

Let’s get rid of all the zip files as we don’t need them cluttering up the corpus.

We can start by running a find command to identify all the 7zip files.

find . -type f -name '*.7z' -print

After you’ve checked the output and verified the command above is only grabbing the 7z files you want to delete, we can update the command to delete the found files.

find . -type f -name '*.7z' -delete

One more a directory listing to verify:

Success. All the 7z files are removed and all the sample files are intact.

GitHub Link: ExtractSamples.py

Time to go write some new detections!