Document Processing Tool

The Document Processing tool converts PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, and JPG files to structured Markdown using the Docling library.

Note: This tool is disabled by default. To enable it, set the ENABLE_ADDITIONAL_TOOLS environment variable to include process_document.

Overview

Convert documents to structured Markdown while preserving formatting, extracting tables, images, and metadata. The tool offers processing profiles for different use cases, from simple text extraction to advanced diagram analysis with AI models.

Note: mcp-devtools also provides a PDF extraction tool that is less capable but faster and does not require Docling. See PDF Processing for more details.

This tool is experimental and actively developed.

Features

Multi-format Support: PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, JPG
Processing Profiles: Simplified interface with preset configurations
Intelligent Conversion: Preserves document structure and formatting
OCR Support: Extract text from scanned documents
Hardware Acceleration: Supports MPS (macOS), CUDA, and CPU processing
Caching System: Avoids reprocessing identical documents
Metadata Extraction: Document metadata (title, author, page count, etc.)
Table & Image Extraction: Preserves tables and images in markdown
Diagram Analysis: Advanced diagram detection using vision models
Mermaid Generation: Convert diagrams to editable Mermaid syntax
Auto-Save: Automatically saves processed content to files

Quick Start

First, enable the tool by setting the environment variable:

ENABLE_ADDITIONAL_TOOLS="process_document"

Then ensure Docling is installed in the Python environment that the MCP server will run from:

pip install -U pip docling

Usage

You can prompt the agent to use the tool, e.g. "Use your document processing tool to convert and save /path/to/document.pdf to markdown". The equivalent direct tool call is:

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf"
  }
}

This uses the default text-and-image profile and saves to /path/to/document.md.

Processing Profiles

`basic` - Fast Text Extraction

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "basic"
  }
}

Text extraction only
Fastest processing
No image or diagram analysis
Best for: Simple text documents, quick content extraction

`text-and-image` - Balanced Processing (Default)

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "text-and-image"
  }
}

Text and image extraction
Table processing
Good balance of speed and features
Best for: Most document types, general use

`scanned` - OCR Processing

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-document.pdf",
    "profile": "scanned"
  }
}

Optimised for scanned documents
OCR enabled by default
Best for: Image-based PDFs, scanned documents

`llm-smoldocling` - Vision Enhancement

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "llm-smoldocling"
  }
}

Enhanced with SmolDocling vision model
Diagram detection and description
Chart data extraction
No external LLM required
Best for: Documents with diagrams and charts

`llm-external` - Advanced Diagram Processing

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "llm-external"
  }
}

Full diagram-to-Mermaid conversion
Most advanced processing capabilities
Best for: Complex documents with many diagrams
Requires: LLM environment variables (see LLM Configuration under Setup and Configuration)
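
The tool calls for all five profiles share one shape, so a client can build them programmatically. The following is a minimal sketch, not part of mcp-devtools: `build_request` and `PROFILES` are hypothetical names, and the profile list is copied from the headings above. How the payload is sent to the MCP server depends on your client and is not shown.

```python
import json

# Profile names as listed in this document.
PROFILES = {"basic", "text-and-image", "scanned", "llm-smoldocling", "llm-external"}


def build_request(source, profile=None, **options):
    """Build a process_document tool-call payload (hypothetical helper)."""
    if profile is not None and profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    arguments = {"source": source}
    if profile is not None:
        # Omitting the profile leaves the choice of default to the tool.
        arguments["profile"] = profile
    # Extra keyword arguments are passed through unchanged,
    # e.g. save_to, return_inline_only, debug.
    arguments.update(options)
    return {"name": "process_document", "arguments": arguments}


if __name__ == "__main__":
    print(json.dumps(build_request("/path/to/document.pdf", profile="basic"), indent=2))
```

Rejecting unknown profile names on the client side gives an immediate error instead of a failed tool call.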

Output Options

Save to File (Default)

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf"
  }
}

Saves to /path/to/document.md
Images saved in same directory
Returns success message with file path
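
To predict where the output will land for a local file, the default path in the example above can be reproduced by swapping the file extension. This is a sketch under that assumption only; `default_save_path` is a hypothetical helper and the tool's actual naming logic may differ, for example for URL sources.

```python
from pathlib import PurePosixPath


def default_save_path(source: str) -> str:
    """Guess the default output path by replacing the extension with .md."""
    return str(PurePosixPath(source).with_suffix(".md"))


if __name__ == "__main__":
    print(default_save_path("/path/to/document.pdf"))  # /path/to/document.md
```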

Custom Save Location

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "save_to": "/custom/path/output.md"
  }
}

Return Content Inline

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "return_inline_only": true
  }
}

Setup and Configuration

Prerequisites

Python 3.10+ (ideally 3.13+)
Docling (auto-installed if missing)

The tool will attempt to install Docling automatically if not found.

Environment Variables

Python Configuration

DOCLING_PYTHON_PATH="/path/to/python"  # Auto-detected if not set

The tool automatically detects Python installations with Docling in the following order:

DOCLING_PYTHON_PATH environment variable (highest priority)
.python-version file in current directory or home directory
Cached Python path from previous detection
Common Python installation paths

.python-version Support: The tool respects .python-version files (used by pyenv, asdf, and other version managers) for automatic Python version selection:

Checks current working directory first
Falls back to home directory if not found in working directory
Supports version formats like 3.11.5 or 3.11
Automatically resolves Python paths from:
- pyenv: ~/.pyenv/versions/
- asdf: ~/.asdf/installs/python/
- UV: ~/.local/share/uv/python/
- System: Homebrew and standard paths

Example .python-version file:

3.11.5
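
The .python-version lookup described above can be sketched as follows. This illustrates the documented order (working directory first, then home directory) and is not the tool's actual implementation. The function names are hypothetical, and the interpreter locations assume the usual pyenv and asdf layouts with an exact version match, so a short version such as 3.11 would need extra prefix matching that is not shown.

```python
from pathlib import Path
from typing import List, Optional


def read_python_version(cwd: Path, home: Path) -> Optional[str]:
    """Return the version from the first usable .python-version file.

    The working directory is checked before the home directory.
    """
    for directory in (cwd, home):
        candidate = directory / ".python-version"
        if candidate.is_file():
            for line in candidate.read_text().splitlines():
                line = line.strip()
                # Use the first non-empty, non-comment line.
                if line and not line.startswith("#"):
                    return line
    return None


def candidate_interpreters(version: str, home: Path) -> List[Path]:
    """List where an interpreter for `version` may live (pyenv and asdf only)."""
    return [
        home / ".pyenv" / "versions" / version / "bin" / "python",
        home / ".asdf" / "installs" / "python" / version / "bin" / "python",
    ]


if __name__ == "__main__":
    found = read_python_version(Path.cwd(), Path.home())
    if found:
        for path in candidate_interpreters(found, Path.home()):
            print(path, "exists" if path.exists() else "missing")
```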

Cache Configuration

DOCLING_CACHE_DIR="~/.mcp-devtools/docling-cache"
DOCLING_CACHE_ENABLED="true"

Hardware Acceleration

DOCLING_HARDWARE_ACCELERATION="auto"  # auto, mps, cuda, cpu
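
To see which value `auto` is likely to resolve to on your machine, you can ask PyTorch directly. The sketch below is an assumption about how such detection could work, not the tool's own logic: `detect_accelerator` is a hypothetical name, and the preference order (CUDA, then MPS, then CPU) is chosen for illustration.

```python
def detect_accelerator() -> str:
    """Return "cuda", "mps", or "cpu" based on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # No PyTorch installed, so no GPU backend to query.
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(detect_accelerator())
```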

Processing Configuration

DOCLING_TIMEOUT="300"              # Processing timeout in seconds (default: 300 = 5 minutes)
DOCLING_MAX_FILE_SIZE="100"        # Maximum file size in MB (default: 100 MB)
DOCLING_MAX_MEMORY_LIMIT="5368709120"  # Memory limit in bytes (default: 5GB)
MCP_DEVTOOLS_MEMORY_LIMIT="5368709120" # Go application memory limit in bytes (default: 5GB)

Memory Management

The tool implements memory limits to prevent runaway memory usage during document processing:

Go Application Limit: Set via MCP_DEVTOOLS_MEMORY_LIMIT (default: 5GB)
- Soft limit enforced by Go runtime's garbage collector
- Automatically triggers more aggressive GC when approaching limit
Python Process Limit: Set via DOCLING_MAX_MEMORY_LIMIT (default: 5GB)
- Hard limit enforced by OS resource limits
- Process terminated if limit exceeded

Example configuration for stricter limits:

# Limit to 2GB for both Go and Python
MCP_DEVTOOLS_MEMORY_LIMIT="2147483648"
DOCLING_MAX_MEMORY_LIMIT="2147483648"
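
Both limits are given in bytes, which is easy to mistype. A small conversion sketch (the helper name `gib_to_bytes` is hypothetical) reproduces the values used above, treating 1 GB as 1024³ bytes:

```python
GIB = 1024 ** 3  # Bytes per gibibyte.


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to the byte count the variables expect."""
    return int(gib * GIB)


if __name__ == "__main__":
    print(gib_to_bytes(5))  # 5368709120 (the documented default)
    print(gib_to_bytes(2))  # 2147483648 (the stricter example)
```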

OCR Configuration

DOCLING_OCR_LANGUAGES="en,fr,de"

LLM Configuration (for `llm-external` profile)

DOCLING_VLM_API_URL="http://localhost:11434/v1"     # OpenAI-compatible endpoint
DOCLING_VLM_MODEL="granite_docling"                 # Vision-capable model (default: granite_docling)
DOCLING_VLM_API_KEY="your-api-key-here"            # API key

Corporate Network Setup

For environments with MITM proxies:

DOCLING_EXTRA_CA_CERTS="/path/to/mitm-ca-bundle.pem"

OCR (Optical Character Recognition)

When to Use OCR

OCR Disabled (Default):

Best for: Digital documents (native PDFs, Word documents)
Advantages: Faster, no recognition errors, preserves formatting
How it works: Extracts text directly from document structure

OCR Enabled (scanned profile):

Best for: Scanned documents, image-based PDFs, photos
Advantages: Processes any document type, handles handwritten text
How it works: Uses computer vision to recognise text from images

OCR Language Support

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-document.pdf",
    "profile": "scanned",
    "ocr_languages": ["en", "fr", "de", "es"]
  }
}

Supported languages: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Dutch (nl), Russian (ru), Chinese (zh), Japanese (ja), Korean (ko), and many others.
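
The environment variable takes a comma-separated string while the tool argument takes a list. If you keep the language set in one place, a small converter helps. This is a sketch; `parse_ocr_languages` is a hypothetical helper, and the normalisation it applies (lowercasing, trimming, de-duplicating) is an assumption rather than documented tool behaviour.

```python
from typing import List


def parse_ocr_languages(value: str) -> List[str]:
    """Turn "en,fr,de" into ["en", "fr", "de"], dropping blanks and repeats."""
    languages: List[str] = []
    for code in value.split(","):
        code = code.strip().lower()
        if code and code not in languages:
            languages.append(code)
    return languages


if __name__ == "__main__":
    print(parse_ocr_languages("en,fr,de"))
```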

Diagram Analysis and Mermaid Generation

Basic Diagram Analysis

The llm-smoldocling profile uses built-in vision models:

Automatic diagram detection
Type classification with confidence scores
Element extraction
No external services required

Advanced Mermaid Generation

The llm-external profile converts diagrams to Mermaid syntax:

Supported LLM Providers

Ollama (local): http://localhost:11434/v1
LM Studio (local): http://localhost:1234/v1
OpenAI: https://api.openai.com/v1
OpenRouter: https://openrouter.ai/api/v1

LLM Configuration

DOCLING_VLM_API_URL="http://localhost:11434/v1"
DOCLING_VLM_MODEL="granite_docling"  # Default: granite_docling; any vision-capable model works, e.g. qwen2.5vl:7b-q8_0
DOCLING_VLM_API_KEY="your-api-key"
DOCLING_LLM_MAX_TOKENS="16384"
DOCLING_LLM_TEMPERATURE="0.1"
DOCLING_LLM_TIMEOUT="240"
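
Before selecting the llm-external profile, it can help to check that the endpoint settings are present. The sketch below assumes all three DOCLING_VLM_* variables must be set. This document does not say which are strictly required (the model has a default, and a local endpoint may not need a key), so adjust `REQUIRED_VLM_VARS` to suit. `missing_vlm_settings` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Assumption: treat all three endpoint variables as required.
REQUIRED_VLM_VARS = ("DOCLING_VLM_API_URL", "DOCLING_VLM_MODEL", "DOCLING_VLM_API_KEY")


def missing_vlm_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VLM_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vlm_settings()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("LLM configuration looks complete")
```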

Diagram Features

Automatic Detection: Identifies flowcharts, architecture diagrams, charts
Mermaid Conversion: Generates valid Mermaid syntax
AWS Colour Coding: Consistent colour schemes for architecture diagrams
Validation: Validates generated Mermaid syntax
Fallback Handling: Graceful degradation if LLM unavailable

Response Examples

File Save Response

{
  "success": true,
  "message": "Content successfully exported to file",
  "save_path": "/path/to/document.md",
  "source": "/path/to/document.pdf",
  "cache_hit": false,
  "metadata": {
    "file_size": 15420,
    "document_title": "Document Title",
    "document_author": "Author Name",
    "page_count": 10,
    "word_count": 1500
  },
  "processing_info": {
    "processing_mode": "advanced",
    "processing_method": "advanced+vision:standard",
    "hardware_acceleration": "mps",
    "ocr_enabled": false,
    "processing_time": 2.5,
    "timestamp": "2025-07-09T22:12:15+10:00"
  }
}

Inline Content Response

{
  "source": "/path/to/document.pdf",
  "content": "# Document Title\n\nDocument content in markdown...",
  "cache_hit": false,
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "page_count": 10
  },
  "images": [
    {
      "id": "image_1",
      "type": "picture",
      "caption": "Figure 1",
      "file_path": "/path/to/extracted/image_1.png"
    }
  ],
  "diagrams": [
    {
      "id": "diagram_1",
      "type": "flowchart",
      "description": "Process flow diagram showing...",
      "mermaid_code": "flowchart TD\n    A[Start] --> B[Process]\n    B --> C[End]",
      "confidence": 0.95
    }
  ]
}
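
A client consuming the inline response will often want just the Mermaid sources. The following sketch relies only on the field names shown in the example above (`diagrams`, `id`, `mermaid_code`, `confidence`). `mermaid_blocks` is a hypothetical helper, and the confidence threshold is an illustrative choice.

```python
from typing import List, Tuple


def mermaid_blocks(response: dict, min_confidence: float = 0.0) -> List[Tuple[str, str]]:
    """Return (diagram id, mermaid code) pairs at or above the threshold."""
    blocks = []
    for diagram in response.get("diagrams", []):
        code = diagram.get("mermaid_code")
        # Skip diagrams without generated code or below the threshold.
        if code and diagram.get("confidence", 0.0) >= min_confidence:
            blocks.append((diagram.get("id", ""), code))
    return blocks


if __name__ == "__main__":
    sample = {
        "diagrams": [
            {
                "id": "diagram_1",
                "type": "flowchart",
                "mermaid_code": "flowchart TD\n    A[Start] --> B[Process]\n    B --> C[End]",
                "confidence": 0.95,
            }
        ]
    }
    for diagram_id, code in mermaid_blocks(sample, min_confidence=0.8):
        print(f"--- {diagram_id} ---")
        print(code)
```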

Performance

Profile Performance (Typical Document)

basic: 1-3 seconds
text-and-image: 3-10 seconds
scanned: 10-30 seconds
llm-smoldocling: 5-15 seconds
llm-external: 15-60 seconds

Hardware Impact

CPU: Baseline performance
MPS (macOS): 2-5x faster on Apple Silicon
CUDA: 3-10x faster on NVIDIA GPUs

Caching

Intelligent caching based on:

Document source and modification time
Processing parameters and profile
24-hour TTL by default
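
The cache inputs listed above can be combined into a single lookup key. This is a conceptual sketch, not the tool's actual cache format: `cache_key` and `is_fresh` are hypothetical names, and SHA-256 over a sorted JSON payload is one reasonable choice among several.

```python
import hashlib
import json
import time
from typing import Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as documented.


def cache_key(source: str, mtime: float, profile: str, params: dict) -> str:
    """Derive a stable key from the source, its mtime, and the settings."""
    payload = json.dumps(
        {"source": source, "mtime": mtime, "profile": profile, "params": params},
        sort_keys=True,  # Key must not depend on dict insertion order.
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(created_at: float, now: Optional[float] = None,
             ttl: int = CACHE_TTL_SECONDS) -> bool:
    """Report whether an entry created at `created_at` is still within the TTL."""
    now = time.time() if now is None else now
    return (now - created_at) < ttl
```

Because the modification time is part of the key, editing a document yields a new key and therefore a fresh conversion, while changing the profile or parameters does the same for an unchanged file.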

Common Use Cases

Research Document Analysis

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/research-paper.pdf",
    "profile": "llm-smoldocling"
  }
}

Scanned Document Digitisation

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-invoice.pdf",
    "profile": "scanned"
  }
}

Architecture Documentation

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/architecture-doc.pdf",
    "profile": "llm-external"
  }
}

Quick Text Extraction

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/simple-doc.pdf",
    "profile": "basic"
  }
}

Troubleshooting

Common Issues

"Python path is required but not found"

Install Python 3.10+ and ensure it's in PATH
Set DOCLING_PYTHON_PATH environment variable
Or create a .python-version file in your project directory or home directory
Supported version managers: pyenv, asdf, UV

"Docling not available"

Install: pip install docling
Verify: python -c "import docling; print('OK')"

"Processing timeout"

Increase DOCLING_TIMEOUT environment variable
Use a faster profile (for example, basic instead of llm-external)

"Hardware acceleration not working"

Install appropriate PyTorch version
Check (Apple Silicon): python -c "import torch; print(torch.backends.mps.is_available())"
Check (NVIDIA): python -c "import torch; print(torch.cuda.is_available())"

"LLM external profile not available"

Set the DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, and DOCLING_VLM_API_KEY environment variables (see LLM Configuration)
Verify LLM endpoint accessibility
Ensure model supports vision input

Debug Mode

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "debug": true
  }
}

For technical implementation details, see the Document Processing source documentation.
