
Introducing My New Ruby Gem - PDFScanner

Shekhar Patil
Full stack developer.

PDF files can contain much more than text and images: they may include JavaScript, embedded files, form inputs, or even malicious payloads. If your application processes user-uploaded PDFs, it's essential to scan them for these threats.

The pdf_scanner Ruby gem offers a simple way to check PDF files against configurable security policies. It also supports encrypted PDFs and can quarantine any file that violates your policies.

Why scan PDFs?

PDFs are a commonly used format for sharing information. However, attackers often exploit advanced PDF features to inject hidden scripts or deliver malware.

Scanning PDFs before you store, process, or distribute them reduces the attack surface these features expose.
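To make those features concrete: inside a PDF they appear as name tokens in object dictionaries, such as /JavaScript and /JS for scripts, /Launch for launch actions, /EmbeddedFiles for attachments, /OpenAction for actions run when the file opens, and /AcroForm for forms. The sketch below is a deliberately crude heuristic, not how pdf_scanner works; the method name risky_features and the label-to-pattern mapping are hypothetical. A plain byte search misses names hidden in compressed object streams or spelled with #xx hex escapes.

```ruby
# Crude illustration only (NOT pdf_scanner's method): search the raw bytes
# for PDF name tokens associated with active or embedded content.
# Misses names inside compressed object streams or written with #xx
# hex escapes (e.g. /J#61vaScript).
RISKY_NAMES = {
  'JavaScript'   => %r{/(?:JavaScript|JS)\b},
  'EmbeddedFile' => %r{/EmbeddedFiles?\b},
  'LaunchAction' => %r{/Launch\b},
  'OpenAction'   => %r{/OpenAction\b},
  'AcroForm'     => %r{/AcroForm\b}
}.freeze

# Returns the labels whose pattern occurs anywhere in the bytes.
# For a real file, pass File.binread(path).
def risky_features(pdf_bytes)
  data = pdf_bytes.dup.force_encoding(Encoding::BINARY)
  RISKY_NAMES.select { |_label, pattern| data.match?(pattern) }.keys
end

sample = "%PDF-1.7\n1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n" \
         "2 0 obj << /S /JavaScript /JS (app.alert('hi')) >> endobj\n"
p risky_features(sample) # prints ["JavaScript", "OpenAction"]
```

Because attackers can trivially obfuscate these names, grep-style checks like this are no substitute for a dedicated scanner.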

Installation

Add the gem to your Gemfile and run bundle install:

gem 'pdf_scanner'

Or install it directly:

gem install pdf_scanner

How to Use pdf_scanner

Here’s a basic example of how to scan a PDF file:

require 'pdf_scanner'

scanner = PdfScanner::Scanner.new
result = scanner.scan(
  target_file: 'files/sample.pdf',  # Required
  config_file: 'config/policy.yml', # Optional
  policy: 'strict',                 # Optional
  dir: 'quarantine',                # Optional
  passwd: 'secret'                  # Optional; only needed for encrypted PDFs
)

puts result

Parameters

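| Parameter   | Description                                                        |
|-------------|--------------------------------------------------------------------|
| target_file | Path to the PDF file you want to scan (required).                  |
| config_file | YAML file containing the security policies (optional).             |
| policy      | Name of the policy to enforce (optional; defaults to standard).    |
| dir         | Directory into which flagged files are moved for quarantine (optional). |
| passwd      | Password for encrypted PDFs (optional).                            |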
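If some optional values come from configuration or environment variables, they may be nil at call time. The helper below is a hypothetical sketch (scan_options is not part of the gem): it builds the keyword arguments from the parameters in the table and drops nil entries, on the assumption that leaving a keyword out is how you get the gem's defaults, such as the standard policy.

```ruby
# Hypothetical helper (not part of pdf_scanner): builds keyword arguments
# for scanner.scan, omitting optional parameters that are nil so that the
# gem's own defaults apply (assumption based on the parameter table).
def scan_options(target_file, config_file: nil, policy: nil, dir: nil, passwd: nil)
  if target_file.nil? || target_file.to_s.strip.empty?
    raise ArgumentError, 'target_file is required'
  end

  {
    target_file: target_file,
    config_file: config_file,
    policy: policy,
    dir: dir,
    passwd: passwd
  }.compact # drop keys whose value is nil
end

opts = scan_options('files/sample.pdf', policy: 'strict', passwd: ENV['PDF_PASSWORD'])
# With PDF_PASSWORD unset, opts is { target_file: 'files/sample.pdf', policy: 'strict' }.
# result = PdfScanner::Scanner.new.scan(**opts)
```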

What does the scan return?

The scan method returns a hash with two keys:

{
  rejected_policies: ['EmbeddedFile', 'JavaScript'],
  analysis_failure: []
}
  • rejected_policies: List of policies violated by the scanned PDF.
  • analysis_failure: Any errors encountered during the scan.
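Because the result is a plain hash, deciding whether to accept a file takes only a small check. In the sketch below, pdf_acceptable? and FakeScanner are hypothetical names. It assumes the keys are symbols, as in the example hash, and it rejects a file when the analysis reports any failure (failing closed); that is a design choice, not documented gem behaviour.

```ruby
# Hypothetical helper: `scanner` is any object whose #scan returns the
# hash described above, e.g. PdfScanner::Scanner.new.
def pdf_acceptable?(scanner, path, **options)
  result = scanner.scan(target_file: path, **options)
  # fetch raises KeyError if a key is missing, so an unexpected result
  # shape is surfaced instead of being treated as "clean".
  # Fail closed: an analysis failure rejects the file just like a violation.
  result.fetch(:rejected_policies).empty? &&
    result.fetch(:analysis_failure).empty?
end

# Stand-in scanner for illustration; use PdfScanner::Scanner.new in real code.
FakeScanner = Struct.new(:response) do
  def scan(**_options)
    response
  end
end

flagged = FakeScanner.new({ rejected_policies: ['JavaScript'], analysis_failure: [] })
pdf_acceptable?(flagged, 'files/sample.pdf') # => false
```

An empty rejected_policies list only means nothing was flagged; if the analysis itself failed, the file has not really been checked, which is why the sketch also requires analysis_failure to be empty.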

Configuring Security Policies

Security policies are defined in a YAML configuration file. Here’s an example policy:

standard:
  allow:
    - Text
    - Image
  deny:
    - JavaScript
    - EmbeddedFile
    - LaunchAction
    - ExternalStream

You can customize these rules to fit the specific needs of your application.
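This post does not spell out how the gem evaluates the allow and deny lists internally. Purely as an illustration, the sketch below loads the same policy with Ruby's standard YAML library and reports which detected features appear on the deny list. The method name violations and the list of detected features are hypothetical, and features that appear on neither list are simply ignored here.

```ruby
require 'yaml'

# Same policy as above, embedded as a string so the sketch is self-contained.
POLICY_YAML = <<~YAML
  standard:
    allow:
      - Text
      - Image
    deny:
      - JavaScript
      - EmbeddedFile
      - LaunchAction
      - ExternalStream
YAML

# Returns the detected features that the named policy denies, in the order
# they were detected. Raises KeyError if the policy name is not defined.
def violations(policies, name, detected)
  policy = policies.fetch(name) { raise KeyError, "unknown policy: #{name}" }
  detected & policy.fetch('deny', [])
end

policies = YAML.safe_load(POLICY_YAML)
p violations(policies, 'standard', %w[Text JavaScript EmbeddedFile])
# prints ["JavaScript", "EmbeddedFile"]
```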

When should you use pdf_scanner?

You can use the gem in many practical scenarios, such as:

  • Validating user-uploaded PDFs on web forms
  • Automating document checks before emailing or sharing
  • Screening files in a secure document ingestion pipeline
  • Complying with data handling and security standards

Conclusion

The pdf_scanner gem is a lightweight tool for securing PDF processing workflows in Ruby. With customizable policies, support for encrypted PDFs, and built-in quarantine, it's worth considering for any Ruby application that accepts PDF uploads.

You can find the gem on RubyGems or browse the source code on GitHub.