Document Processing Tool

The Document Processing tool provides intelligent document conversion capabilities for PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, and JPG files using the powerful Docling library.

Note: This tool is disabled by default. To enable it, set the ENABLE_ADDITIONAL_TOOLS environment variable to include process_document.

Overview

Convert documents to structured Markdown while preserving formatting, extracting tables, images, and metadata. The tool offers processing profiles for different use cases, from simple text extraction to advanced diagram analysis with AI models.

Note: mcp-devtools also provides a PDF extraction tool that is less capable but faster and does not require Docling. See PDF Processing for more details.

This tool is experimental and actively developed.

Features

  • Multi-format Support: PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, JPG
  • Processing Profiles: Simplified interface with preset configurations
  • Intelligent Conversion: Preserves document structure and formatting
  • OCR Support: Extract text from scanned documents
  • Hardware Acceleration: Supports MPS (macOS), CUDA, and CPU processing
  • Caching System: Avoids reprocessing identical documents
  • Metadata Extraction: Document metadata (title, author, page count, etc.)
  • Table & Image Extraction: Preserves tables and images in markdown
  • Diagram Analysis: Advanced diagram detection using vision models
  • Mermaid Generation: Convert diagrams to editable Mermaid syntax
  • Auto-Save: Automatically saves processed content to files

Quick Start

First, enable the tool by setting the environment variable:

ENABLE_ADDITIONAL_TOOLS="process_document"

Then ensure docling is installed in the environment you'll be running the MCP Server from:

pip install -U pip docling

Usage

You can simply prompt the agent to use the tool, e.g. "Use your document processing tool to convert and save /path/to/document.pdf to markdown". The equivalent direct tool call is:

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf"
  }
}

This uses the default text-and-image profile and saves to /path/to/document.md.

Processing Profiles

basic - Fast Text Extraction

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "basic"
  }
}
  • Text extraction only
  • Fastest processing
  • No image or diagram analysis
  • Best for: Simple text documents, quick content extraction

text-and-image - Balanced Processing (Default)

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "text-and-image"
  }
}
  • Text and image extraction
  • Table processing
  • Good balance of speed and features
  • Best for: Most document types, general use

scanned - OCR Processing

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-document.pdf",
    "profile": "scanned"
  }
}
  • Optimised for scanned documents
  • OCR enabled by default
  • Best for: Image-based PDFs, scanned documents

llm-smoldocling - Vision Enhancement

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "llm-smoldocling"
  }
}
  • Enhanced with SmolDocling vision model
  • Diagram detection and description
  • Chart data extraction
  • No external LLM required
  • Best for: Documents with diagrams and charts

llm-external - Advanced Diagram Processing

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "profile": "llm-external"
  }
}
  • Full diagram-to-Mermaid conversion
  • Requires LLM environment variables
  • Most advanced processing capabilities
  • Best for: Complex documents with many diagrams
  • Requires: LLM configuration (see setup below)
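The "Best for" notes above can be turned into a small selection helper. The following is a minimal sketch, not part of the tool: the function names `choose_profile` and `build_request` and the priority order between the hints are assumptions made for illustration. Only the profile names and the request shape come from the examples above.

```python
from typing import Any, Dict


def choose_profile(scanned: bool = False, has_diagrams: bool = False,
                   want_mermaid: bool = False, text_only: bool = False) -> str:
    """Pick a profile name from rough document traits (assumed priority order)."""
    if want_mermaid:
        return "llm-external"      # diagram-to-Mermaid conversion, needs LLM config
    if has_diagrams:
        return "llm-smoldocling"   # diagram and chart analysis without an external LLM
    if scanned:
        return "scanned"           # OCR-oriented profile
    if text_only:
        return "basic"             # fastest, text extraction only
    return "text-and-image"        # documented default


def build_request(source: str, **traits: bool) -> Dict[str, Any]:
    """Build a process_document call in the shape used throughout this page."""
    return {
        "name": "process_document",
        "arguments": {"source": source, "profile": choose_profile(**traits)},
    }


print(build_request("/path/to/scanned-document.pdf", scanned=True))
```

If several traits apply, this sketch prefers the most capable matching profile. Reorder the checks if speed matters more than diagram handling.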

Output Options

Save to File (Default)

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf"
  }
}
  • Saves to /path/to/document.md
  • Images saved in same directory
  • Returns success message with file path
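The default output location in the example above corresponds to swapping the source extension for .md. A minimal sketch of that mapping, assuming the tool replaces only the final suffix (the helper `default_save_path` is hypothetical and not part of the tool):

```python
from pathlib import PurePosixPath


def default_save_path(source: str) -> str:
    """Return the Markdown path next to the source file (assumed behaviour)."""
    # with_suffix replaces only the last extension: report.v2.pdf -> report.v2.md
    return str(PurePosixPath(source).with_suffix(".md"))


print(default_save_path("/path/to/document.pdf"))
```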

Custom Save Location

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "save_to": "/custom/path/output.md"
  }
}

Return Content Inline

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "return_inline_only": true
  }
}

Setup and Configuration

Prerequisites

  • Python 3.10+ (ideally 3.13+)
  • Docling (auto-installed if missing)

The tool will attempt to install Docling automatically if not found.

Environment Variables

Python Configuration

DOCLING_PYTHON_PATH="/path/to/python"  # Auto-detected if not set

The tool automatically detects Python installations with Docling in the following order:

  1. DOCLING_PYTHON_PATH environment variable (highest priority)
  2. .python-version file in current directory or home directory
  3. Cached Python path from previous detection
  4. Common Python installation paths

.python-version Support: The tool respects .python-version files (used by pyenv, asdf, and other version managers) for automatic Python version selection:

  • Checks current working directory first
  • Falls back to home directory if not found in working directory
  • Supports version formats like 3.11.5 or 3.11
  • Automatically resolves Python paths from:
    • pyenv: ~/.pyenv/versions/
    • asdf: ~/.asdf/installs/python/
    • UV: ~/.local/share/uv/python/
    • System: Homebrew and standard paths

Example .python-version file:

3.11.5
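The lookup described above (working directory first, then home directory) can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, only the first line of the file is read, and the directory list simply mirrors the version-manager paths listed above.

```python
from pathlib import Path
from typing import List, Optional


def read_python_version(cwd: Path, home: Path) -> Optional[str]:
    """Return the version from the first .python-version found (cwd, then home)."""
    for directory in (cwd, home):
        candidate = directory / ".python-version"
        if candidate.is_file():
            lines = candidate.read_text().splitlines()
            if lines and lines[0].strip():
                return lines[0].strip()  # e.g. "3.11.5" or "3.11"
    return None


def candidate_roots(home: Path) -> List[Path]:
    """Directories where the listed version managers keep their interpreters."""
    return [
        home / ".pyenv" / "versions",
        home / ".asdf" / "installs" / "python",
        home / ".local" / "share" / "uv" / "python",
    ]


version = read_python_version(Path.cwd(), Path.home())
print(version, [str(p) for p in candidate_roots(Path.home())])
```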

Cache Configuration

DOCLING_CACHE_DIR="~/.mcp-devtools/docling-cache"
DOCLING_CACHE_ENABLED="true"

Hardware Acceleration

DOCLING_HARDWARE_ACCELERATION="auto"  # auto, mps, cuda, cpu
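To see which device a setting would resolve to on a given machine, a check along these lines may help. It is a sketch only: the function `resolve_device` is hypothetical, and the preference order used for "auto" (CUDA, then MPS, then CPU) is an assumption rather than the tool's documented behaviour. It falls back to CPU when PyTorch is not installed.

```python
import os


def resolve_device(setting: str = "auto") -> str:
    """Map a DOCLING_HARDWARE_ACCELERATION value to a concrete device name."""
    setting = setting.strip().lower()
    if setting in ("mps", "cuda", "cpu"):
        return setting  # explicit choice, returned unchanged
    if setting != "auto":
        raise ValueError(f"unsupported setting: {setting!r}")
    try:
        import torch  # optional dependency; absent means CPU only
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # missing on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(resolve_device(os.environ.get("DOCLING_HARDWARE_ACCELERATION", "auto")))
```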

Processing Configuration

DOCLING_TIMEOUT="300"              # Processing timeout in seconds (default: 300 = 5 minutes)
DOCLING_MAX_FILE_SIZE="100"        # Maximum file size in MB (default: 100 MB)
DOCLING_MAX_MEMORY_LIMIT="5368709120"  # Memory limit in bytes (default: 5GB)
MCP_DEVTOOLS_MEMORY_LIMIT="5368709120" # Go application memory limit in bytes (default: 5GB)

Memory Management

The tool implements memory limits to prevent runaway memory usage during document processing:

  • Go Application Limit: Set via MCP_DEVTOOLS_MEMORY_LIMIT (default: 5GB)

    • Soft limit enforced by Go runtime's garbage collector
    • Automatically triggers more aggressive GC when approaching limit
  • Python Process Limit: Set via DOCLING_MAX_MEMORY_LIMIT (default: 5GB)

    • Hard limit enforced by OS resource limits
    • Process terminated if limit exceeded

Example configuration for stricter limits:

# Limit to 2GB for both Go and Python
MCP_DEVTOOLS_MEMORY_LIMIT="2147483648"
DOCLING_MAX_MEMORY_LIMIT="2147483648"
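Both variables take a byte count, so the values above are whole GiB multiplied by 1024^3. The sketch below shows that arithmetic and one way a hard limit can be applied to a Python process on POSIX systems. The use of `RLIMIT_AS` is an assumption made for illustration; the text above only states that OS resource limits are used, and the helper names are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one GiB


def gib_to_bytes(gib: float) -> int:
    """Convert GiB to the byte count expected by the *_MEMORY_LIMIT variables."""
    return int(gib * GIB)


def apply_hard_limit(limit_bytes: int) -> bool:
    """Cap this process's address space; return False where unsupported."""
    try:
        import resource  # POSIX only, not available on Windows
    except ImportError:
        return False
    if not hasattr(resource, "RLIMIT_AS"):
        return False
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)  # the soft limit cannot exceed the hard one
    try:
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    except (ValueError, OSError):
        return False  # some platforms reject changes to RLIMIT_AS
    return True


print(gib_to_bytes(2), gib_to_bytes(5))
```

Calling `apply_hard_limit` affects the current process, so it is best tried in a throwaway interpreter.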

OCR Configuration

DOCLING_OCR_LANGUAGES="en,fr,de"

LLM Configuration (for llm-external profile)

DOCLING_VLM_API_URL="http://localhost:11434/v1"     # OpenAI-compatible endpoint
DOCLING_VLM_MODEL="granite_docling"                 # Vision-capable model (default: granite_docling)
DOCLING_VLM_API_KEY="your-api-key-here"            # API key

Corporate Network Setup

For environments with MITM proxies:

DOCLING_EXTRA_CA_CERTS="/path/to/mitm-ca-bundle.pem"

OCR (Optical Character Recognition)

When to Use OCR

OCR Disabled (Default):

  • Best for: Digital documents (native PDFs, Word documents)
  • Advantages: Faster, perfect accuracy, preserves formatting
  • How it works: Extracts text directly from document structure

OCR Enabled (scanned profile):

  • Best for: Scanned documents, image-based PDFs, photos
  • Advantages: Works on documents that have no embedded text layer, can handle handwritten text
  • How it works: Uses computer vision to recognise text from images

OCR Language Support

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-document.pdf",
    "profile": "scanned",
    "ocr_languages": ["en", "fr", "de", "es"]
  }
}

Supported languages: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Dutch (nl), Russian (ru), Chinese (zh), Japanese (ja), Korean (ko), and many others.
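The DOCLING_OCR_LANGUAGES variable uses a comma-separated string, while the ocr_languages argument is a JSON array. A minimal sketch of converting one into the other, assuming whitespace and duplicate codes should be dropped (the helper `parse_ocr_languages` is hypothetical, and how the tool itself combines the two settings is not covered here):

```python
import os
from typing import List


def parse_ocr_languages(raw: str) -> List[str]:
    """Split "en, fr,de" into ["en", "fr", "de"], keeping first-seen order."""
    codes: List[str] = []
    for code in raw.split(","):
        code = code.strip().lower()
        if code and code not in codes:
            codes.append(code)
    return codes


arguments = {
    "source": "/path/to/scanned-document.pdf",
    "profile": "scanned",
    "ocr_languages": parse_ocr_languages(os.environ.get("DOCLING_OCR_LANGUAGES", "en")),
}
print(arguments)
```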

Diagram Analysis and Mermaid Generation

Basic Diagram Analysis

The llm-smoldocling profile uses built-in vision models:

  • Automatic diagram detection
  • Type classification with confidence scores
  • Element extraction
  • No external services required

Advanced Mermaid Generation

The llm-external profile converts diagrams to Mermaid syntax:

Supported LLM Providers

  • Ollama (local): http://localhost:11434/v1
  • LM Studio (local): http://localhost:1234/v1
  • OpenAI: https://api.openai.com/v1
  • OpenRouter: https://openrouter.ai/api/v1

LLM Configuration

DOCLING_VLM_API_URL="http://localhost:11434/v1"
DOCLING_VLM_MODEL="granite_docling"  # Default: granite_docling; any vision-capable model works, e.g. qwen2.5vl:7b-q8_0
DOCLING_VLM_API_KEY="your-api-key"
DOCLING_LLM_MAX_TOKENS="16384"
DOCLING_LLM_TEMPERATURE="0.1"
DOCLING_LLM_TIMEOUT="240"
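Before switching to the llm-external profile, it can be useful to confirm which of these settings are present. The check below is a sketch under assumptions: it treats the API URL and API key as required and falls back to granite_docling for the model, which matches the default noted above but may not match how the tool validates its configuration. The function names are hypothetical.

```python
import os
from typing import Dict, List, Mapping


def load_vlm_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Read the DOCLING_VLM_* settings, applying the documented model default."""
    return {
        "api_url": env.get("DOCLING_VLM_API_URL", "").strip(),
        "model": env.get("DOCLING_VLM_MODEL", "").strip() or "granite_docling",
        "api_key": env.get("DOCLING_VLM_API_KEY", "").strip(),
    }


def missing_settings(config: Mapping[str, str]) -> List[str]:
    """List settings this sketch assumes are required but found empty."""
    return [name for name in ("api_url", "api_key") if not config[name]]


config = load_vlm_config(os.environ)
print("missing:", missing_settings(config))
```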

Diagram Features

  • Automatic Detection: Identifies flowcharts, architecture diagrams, charts
  • Mermaid Conversion: Generates valid Mermaid syntax
  • AWS Colour Coding: Consistent colour schemes for architecture diagrams
  • Validation: Validates generated Mermaid syntax
  • Fallback Handling: Graceful degradation if LLM unavailable

Response Examples

File Save Response

{
  "success": true,
  "message": "Content successfully exported to file",
  "save_path": "/path/to/document.md",
  "source": "/path/to/document.pdf",
  "cache_hit": false,
  "metadata": {
    "file_size": 15420,
    "document_title": "Document Title",
    "document_author": "Author Name",
    "page_count": 10,
    "word_count": 1500
  },
  "processing_info": {
    "processing_mode": "advanced",
    "processing_method": "advanced+vision:standard",
    "hardware_acceleration": "mps",
    "ocr_enabled": false,
    "processing_time": 2.5,
    "timestamp": "2025-07-09T22:12:15+10:00"
  }
}

Inline Content Response

{
  "source": "/path/to/document.pdf",
  "content": "# Document Title\n\nDocument content in markdown...",
  "cache_hit": false,
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "page_count": 10
  },
  "images": [
    {
      "id": "image_1",
      "type": "picture",
      "caption": "Figure 1",
      "file_path": "/path/to/extracted/image_1.png"
    }
  ],
  "diagrams": [
    {
      "id": "diagram_1",
      "type": "flowchart",
      "description": "Process flow diagram showing...",
      "mermaid_code": "flowchart TD\n    A[Start] --> B[Process]\n    B --> C[End]",
      "confidence": 0.95
    }
  ]
}
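A client receiving an inline response like the one above may want to pull out just the generated Mermaid code. The following is a minimal sketch that relies only on the field names shown in the example (diagrams, id, mermaid_code, confidence). The helper `mermaid_sources` and the confidence threshold are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple


def mermaid_sources(response: Dict[str, Any],
                    min_confidence: float = 0.0) -> List[Tuple[str, str]]:
    """Return (diagram id, Mermaid code) pairs at or above a confidence threshold."""
    pairs: List[Tuple[str, str]] = []
    for diagram in response.get("diagrams", []):
        code = diagram.get("mermaid_code")
        if code and diagram.get("confidence", 0.0) >= min_confidence:
            pairs.append((diagram.get("id", ""), code))
    return pairs


example = {
    "source": "/path/to/document.pdf",
    "diagrams": [
        {
            "id": "diagram_1",
            "type": "flowchart",
            "mermaid_code": "flowchart TD\n    A[Start] --> B[Process]\n    B --> C[End]",
            "confidence": 0.95,
        }
    ],
}

for diagram_id, code in mermaid_sources(example, min_confidence=0.9):
    print(diagram_id)
    print(code)
```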

Performance

Profile Performance (Typical Document)

  • basic: 1-3 seconds
  • text-and-image: 3-10 seconds
  • scanned: 10-30 seconds
  • llm-smoldocling: 5-15 seconds
  • llm-external: 15-60 seconds

Hardware Impact

  • CPU: Baseline performance
  • MPS (macOS): 2-5x faster on Apple Silicon
  • CUDA: 3-10x faster on NVIDIA GPUs

Caching

Intelligent caching based on:

  • Document source and modification time
  • Processing parameters and profile
  • 24-hour TTL by default
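The inputs listed above suggest a cache key derived from the source, its modification time, and the processing options. The sketch below illustrates that idea with a SHA-256 hash over a canonical JSON payload. The hashing scheme, the function names, and the freshness check are assumptions for illustration and not the tool's actual cache format.

```python
import hashlib
import json
from typing import Any, Dict

TTL_SECONDS = 24 * 60 * 60  # 24-hour default TTL


def cache_key(source: str, mtime: float, profile: str,
              params: Dict[str, Any]) -> str:
    """Hash everything that should invalidate the cache when it changes."""
    payload = json.dumps(
        {"source": source, "mtime": mtime, "profile": profile, "params": params},
        sort_keys=True,  # stable ordering so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(created_at: float, now: float, ttl: int = TTL_SECONDS) -> bool:
    """True while an entry is younger than the TTL."""
    return 0 <= now - created_at < ttl


key = cache_key("/path/to/document.pdf", 1720000000.0, "basic", {"debug": False})
print(key[:16], is_fresh(created_at=0.0, now=3600.0))
```

Because the modification time is part of the key, editing the source file produces a new key and the stale entry is never reused.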

Common Use Cases

Research Document Analysis

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/research-paper.pdf",
    "profile": "llm-smoldocling"
  }
}

Scanned Document Digitisation

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/scanned-invoice.pdf",
    "profile": "scanned"
  }
}

Architecture Documentation

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/architecture-doc.pdf",
    "profile": "llm-external"
  }
}

Quick Text Extraction

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/simple-doc.pdf",
    "profile": "basic"
  }
}

Troubleshooting

Common Issues

"Python path is required but not found"

  • Install Python 3.10+ and ensure it's in PATH
  • Set DOCLING_PYTHON_PATH environment variable
  • Or create a .python-version file in your project directory or home directory
  • Supported version managers: pyenv, asdf, UV

"Docling not available"

  • Install: pip install docling
  • Verify: python -c "import docling; print('OK')"

"Processing timeout"

  • Increase DOCLING_TIMEOUT environment variable
  • Use faster profile (basic instead of llm-external)

"Hardware acceleration not working"

  • Install appropriate PyTorch version
  • Check: python -c "import torch; print(torch.backends.mps.is_available())"

"LLM external profile not available"

  • Set the DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, and DOCLING_VLM_API_KEY environment variables (see LLM Configuration)
  • Verify LLM endpoint accessibility
  • Ensure model supports vision input

Debug Mode

{
  "name": "process_document",
  "arguments": {
    "source": "/path/to/document.pdf",
    "debug": true
  }
}

For technical implementation details, see the Document Processing source documentation.