File Analysis

Comprehensive techniques and tools for analyzing files, extracting metadata, reverse engineering file formats, and uncovering hidden data in CTFs.

Basic File Identification

Before doing anything else, determine what kind of file you are dealing with. File extensions in CTFs are often missing or intentionally misleading.

Identifying File Types

BASH
# General file type identification based on magic bytes
file target_file

# Check for specific magic bytes/headers
xxd target_file | head
hexdump -C target_file | head

# TrID - Identifies file types from their binary signatures
trid target_file

Common Magic Bytes

  • Windows PE (EXE/DLL): 4D 5A (MZ)
  • ELF (Linux Binary): 7F 45 4C 46 (.ELF)
  • PDF: 25 50 44 46 (%PDF)
  • JPEG: FF D8 FF, followed by a marker byte such as E0 (JFIF), E1 (Exif), E8 (SPIFF), or DB
  • PNG: 89 50 4E 47 0D 0A 1A 0A
  • ZIP: 50 4B 03 04 (PK..)
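
The signatures listed above can also be checked without any third-party tool. The following is a minimal sketch, assuming a POSIX shell with od and tr; the function name magic_type is hypothetical, and it only knows the six formats from this list.

```bash
# Hypothetical helper: classify a file by the signatures listed above.
magic_type() {
    # Read the first 8 bytes as one continuous lowercase hex string.
    sig=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    case $sig in
        4d5a*)            echo "Windows PE (EXE/DLL)" ;;
        7f454c46*)        echo "ELF" ;;
        25504446*)        echo "PDF" ;;
        ffd8ff*)          echo "JPEG" ;;
        89504e470d0a1a0a) echo "PNG" ;;
        504b0304*)        echo "ZIP" ;;
        *)                echo "unknown ($sig)" ;;
    esac
}
```

Calling magic_type target_file prints one label. An "unknown" result that is only one or two bytes away from a known signature is a hint that the header was tampered with.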

Hex Editors and Magic Byte Manipulation

Sometimes a file's magic bytes are intentionally corrupted and need to be repaired.

BASH
# View hex dump interactively
hexeditor target_file

# Command-line hex dumping
xxd target_file | less

# Convert file to hex dump, edit the text file, then convert back
# (xxd -r reads only the hex columns, so edit those, not the ASCII column)
xxd target_file > file.hex
nano file.hex
xxd -r file.hex > restored_file.bin

# Dump only raw hex (no offsets or ASCII representation)
xxd -p target_file > raw_hex.txt
# Revert raw hex back to binary
xxd -r -p raw_hex.txt > bin_file
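
When the damage is limited to the header, the signature can be patched in place instead of round-tripping through a hex dump. This is a sketch under two assumptions: the file is a PNG, and its first 8 bytes were overwritten rather than deleted. The function name fix_png_magic is hypothetical.

```bash
# Hypothetical helper: write the PNG signature over the first 8 bytes of a copy.
fix_png_magic() {
    # usage: fix_png_magic BROKEN_FILE REPAIRED_FILE
    cp -- "$1" "$2" || return 1
    # \211 P N G \r \n \032 \n  ==  89 50 4E 47 0D 0A 1A 0A
    # conv=notrunc keeps the rest of the file intact.
    printf '\211PNG\r\n\032\n' |
        dd of="$2" bs=1 count=8 conv=notrunc 2>/dev/null
}
```

Working on a copy keeps the original challenge file untouched. If bytes were removed rather than overwritten, this approach shifts nothing and the signature has to be prepended instead.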

Strings & Text Extraction

Searching a binary file for embedded text is one of the quickest ways to find flags or clues.

Strings Commands

BASH
# Basic strings extraction (default is >= 4 chars)
strings target_file

# Extract strings with minimum length of 10
strings -n 10 target_file

# Extract 16-bit little-endian strings (common in Windows & Registry)
strings -el target_file

# Extract strings and show their byte offset in decimal
strings -t d target_file

# Scan the whole file instead of only the loadable data sections (already the default in recent GNU binutils)
strings -a target_file

Searching for Flags

BASH
# Search for specific flag formats (e.g., flag{...}, HTB{...})
strings target_file | grep -iE "flag\{|picoCTF\{|HTB\{"

# Extract the exact flag pattern using regex (-o outputs only the match)
strings target_file | grep -oE "flag\{[^}]+\}"
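
A plain grep misses flags that were Base64-encoded before being embedded. The sketch below assumes the flag format is flag{...} and that the encoded data starts with the flag, in which case the Base64 text begins with "ZmxhZ3". The function name find_flags is hypothetical, and grep -a is used in place of strings.

```bash
# Hypothetical helper: report plain-text flags and Base64-encoded flags.
find_flags() {
    # Plain-text flags; -a makes grep treat binary input as text.
    LC_ALL=C grep -aoE 'flag\{[^}]+\}' "$1"
    # Base64 runs starting with "ZmxhZ3" (the encoding of "flag{"), decoded one per line.
    LC_ALL=C grep -aoE 'ZmxhZ3[A-Za-z0-9+/]*={0,2}' "$1" |
        while IFS= read -r blob; do
            printf '%s' "$blob" | base64 -d 2>/dev/null
            echo
        done
}
```

For other formats such as HTB{ or picoCTF{, both patterns need adjusting: encode the prefix once with base64 and keep only the characters that do not depend on what follows.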

Metadata Extraction

Metadata often contains hidden clues, passwords, or coordinates.

ExifTool (Universal Metadata)

BASH
# Basic metadata extraction
exiftool target_file

# Short output: show tag names instead of descriptions (handy for requesting a single tag by name)
exiftool -s target_file

# Show all tags, including unknown/custom tags, organized by group
exiftool -a -u -g1 target_file

# Extract embedded thumbnails or images
exiftool -b -ThumbnailImage target_file > thumbnail.jpg

Other Metadata Tools

BASH
# mat2 (metadata anonymisation toolkit): -s lists the metadata it finds without removing it
mat2 -s target_file

# PDF specific metadata
pdfinfo file.pdf

Image Analysis & Steganography

Images are among the most common carriers of hidden data in CTFs.

Basic Image Inspection

BASH
# Check PNG structure and report corruption/anomalies (very useful for corrupted chunks)
pngcheck -v target_file.png

# Read QR codes, Barcodes, etc. from images
zbarimg target_file.png

LSB & Bit-Level Steganography

BASH
# zsteg: Detects LSB steganography in PNG/BMP images
zsteg target_file.png
zsteg -a target_file.png # Try all methods (slow)

# Stegsolve (GUI Tool): Essential for viewing bit planes, color channels, and analyzing LSBs
java -jar stegsolve.jar

Password-Protected Steganography

BASH
# Steghide: Extracts hidden data from JPEG, BMP, WAV, and AU files (prompts for a passphrase; try an empty one first)
steghide extract -sf target_file.jpg

# Stegseek: Extremely fast password cracker for steghide
stegseek target_file.jpg /usr/share/wordlists/rockyou.txt

Audio Analysis

Audio files can hide data in metadata, LSBs, or visually in the spectrogram.

Tools & Techniques

  • Spectrograms: Open the file in Audacity or Sonic Visualiser and change the view from "Waveform" to "Spectrogram". Flags are often drawn into the frequency content and only become readable in this view.
  • Morse code: Listen for beeps or look at the waveform. A pattern of short and long pulses usually means Morse code.

BASH
# Strings can still apply to audio files!
strings target_audio.wav | grep -i flag

# Check for hidden files appended to the audio file
binwalk target_audio.wav

Archive Analysis & Password Cracking

Extracting and cracking archives is a staple of file analysis.

Decompression

BASH
# Standard formats
unzip target.zip
tar -xvf target.tar.gz
7z x target.7z

Cracking Archive Passwords

BASH
# ZIP files
zip2john target.zip > zip.hash
john --wordlist=/usr/share/wordlists/rockyou.txt zip.hash

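# Alternative dictionary attack with fcrackzip (-D dictionary mode, -p wordlist, -u verify hits with unzip)
# Note: fcrackzip targets legacy ZipCrypto; use the john route for AES-encrypted ZIPs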
fcrackzip -u -D -p /usr/share/wordlists/rockyou.txt target.zip

# RAR files
rar2john target.rar > rar.hash
john --wordlist=/usr/share/wordlists/rockyou.txt rar.hash

File Carving & Extraction

When files are embedded inside other files (like a firmware image or a document), you need to carve them out.

Binwalk

BASH
# Scan file for embedded signatures
binwalk target_file

# Automatically extract known file types
binwalk -e target_file

# Entropy analysis (does not extract anything): high-entropy regions suggest compressed or encrypted data
binwalk -E target_file

# Extract specific signatures only (e.g., zip)
binwalk -D 'zip archive:zip' target_file
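
When automatic extraction fails, the decimal offset that binwalk prints for a signature can be used to carve the data by hand. This is a minimal sketch using dd; the function names carve_from and carve_range are hypothetical, and bs=1 is chosen for simplicity, which makes it slow on large files.

```bash
# Hypothetical helper: copy everything from a byte offset to the end of the file.
carve_from() {
    # usage: carve_from INPUT OFFSET OUTPUT   (OFFSET is decimal and 0-based)
    dd if="$1" of="$3" bs=1 skip="$2" 2>/dev/null
}

# Variant with an explicit length, for when the next signature marks the end.
carve_range() {
    # usage: carve_range INPUT OFFSET LENGTH OUTPUT
    dd if="$1" of="$4" bs=1 skip="$2" count="$3" 2>/dev/null
}
```

For example, if binwalk reports a ZIP archive at decimal offset 4096, carve_from target_file 4096 carved.zip writes the archive and everything after it to carved.zip. Most archive tools tolerate trailing data, so the open-ended variant is usually enough.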

Foremost & Scalpel

These tools recover files based on their headers, footers, and internal data structures, bypassing filesystem analysis.

BASH
# Carve all known file types using foremost
foremost -i target_file -o output_dir/

# Carve specific file types (e.g., jpg, pdf) using foremost
foremost -t jpg,pdf -i target_file -o output_dir/

# Scalpel carves according to the rules in /etc/scalpel/scalpel.conf (uncomment the file types you want first)
scalpel -c /etc/scalpel/scalpel.conf -o output_dir/ target_file

Document Analysis

Malicious documents and PDFs often hide macros or scripts.

PDF Analysis

BASH
# High-level overview of PDF structures (looks for /JS, /JavaScript, /OpenAction)
pdfid target_file.pdf

# Show object statistics (-a), then search the PDF objects for a keyword (--search)
pdf-parser -a target_file.pdf
pdf-parser --search javascript target_file.pdf

# Interactive/Deep PDF analysis
peepdf -i target_file.pdf
# peepdf useful commands: info, tree, object <id>, extract <id>

Microsoft Office Documents (OLE)

BASH
# Analyze OLE files (old Office formats like .doc, .xls) for macros
oleid target_file.doc

# Extract and analyze VBA macros (works on both OLE and OpenXML formats)
olevba target_file.doc
olevba target_file.docx

# Modern Office formats (.docx, .xlsx) are simply ZIP archives!
unzip target_file.docx -d output_dir/
# Search the extracted internal XML files (e.g., word/document.xml) for flags
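
Word often splits visible text across several XML runs, so a flag can be broken up by tags and missed by a plain grep. The sketch below strips the tags before matching. It assumes the archive was already extracted with unzip as shown above; the function name search_docx_dir and the default flag pattern are hypothetical.

```bash
# Hypothetical helper: search extracted Office XML with the tags removed.
search_docx_dir() {
    # usage: search_docx_dir EXTRACTED_DIR [ERE_PATTERN]
    pattern='flag\{[^}]+\}'
    if [ "$#" -ge 2 ]; then pattern=$2; fi
    find "$1" -type f \( -name '*.xml' -o -name '*.rels' \) |
        while IFS= read -r f; do
            # Drop XML tags so text split across runs is rejoined before matching.
            sed 's/<[^>]*>//g' "$f" | grep -oE "$pattern" |
                while IFS= read -r hit; do
                    printf '%s: %s\n' "$f" "$hit"
                done
        done
}
```

Running search_docx_dir output_dir/ prints each match prefixed with the file it came from. Attribute values are removed along with the tags, so data hidden in attributes still needs a raw grep over the same directory.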

Executables & Binaries (PE / ELF)

While full reverse engineering is a separate field, basic static analysis is a critical first step.

BASH
# Detailed binary information (architecture, OS, dynamically linked, etc)
rabin2 -I target_binary
readelf -h target_elf

# List symbols: the symbol tables (readelf -s) and the dynamic symbol table with imported functions (objdump -T)
readelf -s target_elf
objdump -T target_binary

Data Manipulation & Decoding

CTFs often involve decoding layers of Base64, Hex, or custom encodings.

Command Line Decoding

BASH
# Base64 Decode
echo "ZmxhZ3thYmNkZX0=" | base64 -d

# URL Decode
echo "%66%6c%61%67" | python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.stdin.read()))"
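
Layered encodings can be peeled in a loop rather than by hand. This is a minimal sketch that assumes every layer is Base64 and that the innermost text contains flag{; the function name peel_base64 and the default limit of 20 layers are hypothetical choices.

```bash
# Hypothetical helper: decode nested Base64 until a flag-shaped string appears.
peel_base64() (
    # usage: peel_base64 ENCODED_TEXT [MAX_LAYERS]
    # The body runs in a subshell, so its variables do not leak into the caller.
    data=$1
    max_layers=${2:-20}
    i=0
    while [ "$i" -lt "$max_layers" ]; do
        case $data in
            *'flag{'*) printf '%s\n' "$data"; exit 0 ;;
        esac
        # Stop if the current layer is not valid Base64 or decodes to nothing.
        next=$(printf '%s' "$data" | base64 -d 2>/dev/null) || exit 1
        [ -n "$next" ] || exit 1
        data=$next
        i=$((i + 1))
    done
    # Check the result of the final decode as well.
    case $data in
        *'flag{'*) printf '%s\n' "$data"; exit 0 ;;
    esac
    exit 1
)
```

A non-zero exit status means the chain hit something that is not Base64, which is the cue to try another decoder (hex, URL encoding, XOR) on that layer. Shell variables cannot hold NUL bytes, so this only suits layers that decode to text.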

CyberChef / KEYSEC

Web-based tools (or local alternatives) are incredibly versatile.

  • Data formats: Base64, Hex, Binary, Octal.
  • Operations: XOR, AES Decryption, Bit Shifting.
  • File operations: Extract Files, Magic (attempts to auto-decode based on known patterns).

Summary Checklist for Unknown Files

  1. Run file and trid to see what the file format actually is.
  2. Check the raw bytes with xxd | head to verify magic bytes.
  3. Run strings and grep for the flag format (grep -iE "flag\{").
  4. Run exiftool to check for hidden metadata or comments.
  5. Run binwalk -e to check for and extract hidden files.
  6. If it's an image, check zsteg and stegsolve.
  7. If it's a document, treat .docx/.xlsx as ZIPs and unzip them.
  8. If it's an audio file, open it in Audacity and view the Spectrogram.
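
The first five steps of the checklist can be wrapped in a small script. This is a sketch, not a complete workflow: the function name triage is hypothetical, the flag pattern assumes flag{...}, optional tools are skipped when they are not installed, and binwalk runs in scan mode only so nothing is written to disk.

```bash
# Hypothetical triage helper following checklist steps 1 to 5.
triage() {
    target=$1
    [ -f "$target" ] || { echo "usage: triage FILE" >&2; return 2; }

    # Steps 1, 4, 5: run each identification tool that is installed.
    for tool in file trid exiftool binwalk; do
        echo "== $tool =="
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "$target" 2>&1 | head -n 40
        else
            echo "(skipped: $tool not installed)"
        fi
    done

    # Step 2: first 16 bytes in hex (od is used here because it is part of POSIX).
    echo "== magic bytes =="
    od -An -tx1 -N16 "$target"

    # Step 3: flag-shaped strings; falls back to grep -a when strings is unavailable.
    echo "== flag candidates =="
    if command -v strings >/dev/null 2>&1; then
        strings "$target" | grep -oE 'flag\{[^}]+\}'
    else
        LC_ALL=C grep -aoE 'flag\{[^}]+\}' "$target"
    fi || echo "(none found)"
}
```

Running triage target_file gives a quick overview before moving on to the format-specific steps 6 to 8, which need interactive tools and are left out on purpose.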