Find Long Files Tool

The Find Long Files tool efficiently identifies code files that exceed a specified line count threshold, helping AI coding agents to identify files that may need refactoring for effective reading, editing and understanding.

Overview

This tool scans a project directory and returns a formatted checklist of files that are longer than a specified threshold (default: 700 lines). It's designed to help maintain code quality by identifying files that may have grown too large and need to be split into smaller, more manageable components.

Features

Fast Line Counting: Uses optimised buffer-based counting for superior performance
Comprehensive Exclusions: Automatically excludes binary files, build artifacts, and common non-code files
Gitignore Respect: Automatically respects .gitignore patterns in the project
Permission Handling: Gracefully handles permission errors without failing
Environment Variables: Configurable via environment variables for automation
Formatted Output: Returns a clean checklist with line counts, file sizes, and execution timing
Size Limits: Automatically skips files larger than 2MB (configurable) and reports them separately

Usage Examples

While intended to be activated via a prompt to an agent, below are some example JSON tool calls.

Basic Project Scan

{
  "name": "find_long_files",
  "arguments": {
    "path": "/Users/username/git/my-project"
  }
}

Custom Line Threshold

{
  "name": "find_long_files",
  "arguments": {
    "path": "/Users/username/git/my-project",
    "line_threshold": 500
  }
}

Additional File Exclusions

{
  "name": "find_long_files",
  "arguments": {
    "path": "/Users/username/git/my-project",
    "line_threshold": 800,
    "additional_excludes": ["**/*.generated.js", "**/migrations/**"]
  }
}

Parameters

Parameter	Type	Required	Default	Description
`path`	string	Yes	-	Absolute directory path to search (e.g., '/Users/username/git/project')
`line_threshold`	number	No	700	Minimum number of lines to consider a file 'long'
`additional_excludes`	array	No	[]	Additional glob patterns to exclude beyond .gitignore

Environment Variables

Variable	Description	Example
`LONG_FILES_DEFAULT_LENGTH`	Override default line threshold	`1000`
`LONG_FILES_ADDITIONAL_EXCLUDES`	Comma-separated additional exclusion patterns	`*/.test.js,*/.spec.js`
`LONG_FILES_RETURN_PROMPT`	Custom message returned with the checklist (set to empty string `""` to disable)	`Custom instructions...`
`LONG_FILES_SORT_BY_DIRECTORY_TOTALS`	Sort by directory totals instead of individual files	`true`
`LONG_FILES_MAX_SIZE_KB`	Maximum file size in KB before skipping (default: 2048)	`4096`

Default Exclusions

The tool automatically excludes many file types and directories:

Binary and Document Files

Office documents: **/*.docx, **/*.pdf, **/*.pptx, etc.
Images: **/*.png, **/*.jpg, **/*.gif, etc.
Archives: **/*.zip, **/*.tar, **/*.gz, etc.
Executables: **/*.exe, **/*.dll, **/*.bin, etc.

Development Artifacts

Build directories: **/build/**, **/dist/**, **/target/**
Dependencies: **/node_modules/**, **/vendor/**
Caches: **/.cache/**, **/__pycache__/**
Version control: **/.git/**, **/.svn/**

Temporary and System Files

Logs: **/*.log, **/*.out.*
Temporary: **/*.tmp, **/*.bak, **/*.swp
System: **/.DS_Store, **/Thumbs.db

Output Format

The tool returns a formatted checklist with:

# Checklist of files over 700 lines

Last checked: 2025-07-29 09:29:46
Calculated in: 2.4s

- [ ] `./backend/main.py`: 1500 Lines, 108KB
- [ ] `./frontend/components/AgentNetwork.tsx`: 1492 Lines, 95KB
- [ ] `./backend/agents/orchestrator_agent.py`: 1248 Lines, 87KB

## Skipped Files (>2.0MB)

The following files were skipped due to being larger than 2.0MB:

- `./data/large_dataset.json`

Next Steps (Unless the user has instructed you otherwise):

1. Please take this checklist and save it to a temporary location.
2. Perform a quick review of the identified files and under each checklist item adds a concise (1-2 sentences) summary of what should be done to reduce the file length - for example the best solution might be to split the file up into multiple files and if so what sort of pattern or logic should be used to decide what goes where ensuring your strategy values concise, clean, efficient code and operations.
3. Then stop and ask the user to review.

## Performance

The tool uses optimised algorithms for fast execution:
- Buffer-based line counting (4.5x faster than traditional scanning)
- 32KB read buffers for optimal I/O performance
- Efficient binary file detection
- Permission-aware directory traversal

Typical performance: 1-4s for medium-sized projects.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Find Long Files Tool

Overview

Features

Usage Examples

Basic Project Scan

Custom Line Threshold

Additional File Exclusions

Parameters

Environment Variables

Default Exclusions

Binary and Document Files

Development Artifacts

Temporary and System Files

Output Format

Uh oh!

FilesExpand file tree

find_long_files.md

Latest commit

History

find_long_files.md

File metadata and controls

Find Long Files Tool

Overview

Features

Usage Examples

Basic Project Scan

Custom Line Threshold

Additional File Exclusions

Parameters

Environment Variables

Default Exclusions

Binary and Document Files

Development Artifacts

Temporary and System Files

Output Format