 
 [](https://github.com/microsoft/autogen)
 
-> [!TIP]
-> MarkItDown now offers an MCP (Model Context Protocol) server for integration with LLM applications like Claude Desktop. See [markitdown-mcp](https://github.com/microsoft/markitdown/tree/main/packages/markitdown-mcp) for more information.
-
 > [!IMPORTANT]
-> Breaking changes between 0.0.1 to 0.1.0:
-> * Dependencies are now organized into optional feature-groups (further details below). Use `pip install 'markitdown[all]'` to have backward-compatible behavior.
-> * convert\_stream() now requires a binary file-like object (e.g., a file opened in binary mode, or an io.BytesIO object). This is a breaking change from the previous version, where it previously also accepted text file-like objects, like io.StringIO.
-> * The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a plugin, or custom DocumentConverter, you likely need to update your code. Otherwise, if only using the MarkItDown class or CLI (as in these examples), you should not need to change anything.
+> MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest `convert_*` function needed for your use case (e.g., `convert_stream()`, or `convert_local()`). See the [Security Considerations](#security-considerations) section of the documentation for more information.
 
 MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to [textract](https://github.com/deanmalmgren/textract), but with a focus on preserving important document structure and content as Markdown (including: headings, lists, tables, links, etc.) While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools -- and may not be the best option for high-fidelity document conversions for human consumption.
 
@@ -267,6 +261,14 @@ You can help by looking at issues or helping review PRs. Any issue or PR is welc
 
 - Run pre-commit checks before submitting a PR: `pre-commit run --all-files`
 
+### Security Considerations
+
+MarkItDown performs I/O with the privileges of the current process. Like `open()` or `requests.get()`, it will access resources that the process itself can access.
+
+**Sanitize your inputs:** Do not pass untrusted input directly to MarkItDown. If any part of the input may be controlled by an untrusted user or system, such as in hosted or server-side applications, it must be validated and restricted before calling MarkItDown. Depending on your environment, this may include restricting file paths, limiting URI schemes and network destinations, and blocking access to private, loopback, link-local, or metadata-service addresses.
+
+**Call only the conversion method you need:** Prefer the narrowest conversion API that fits your use case. MarkItDown's `convert()` method is intentionally permissive and can handle local files, remote URIs, and byte streams. If your application only needs to read local files, call `convert_local()` instead. If you need more control over URI fetching, call `requests.get()` yourself and pass the response object to `convert_response()`. For maximum control, open a stream to the input you want converted and call `convert_stream()`.
+
 ### Contributing 3rd-party Plugins
 
 You can also contribute by creating and sharing 3rd party plugins. See `packages/markitdown-sample-plugin` for more details.
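The **Call only the conversion method you need** advice in the Security Considerations section might look like the following in application code. This sketch makes a few assumptions:

- It targets MarkItDown 0.1.x, where `convert_local()`, `convert_stream()` and `convert_response()` return a result object with a `markdown` attribute. Older code often read `text_content` instead, so check against the version you install.
- The wrapper names are hypothetical.
- Each wrapper receives an already-constructed `MarkItDown` instance as `md`, so one converter (and its plugins) can be reused.

None of the wrappers calls the permissive `convert()`. The URL wrapper expects a URL that has already been validated.

```python
import io


def convert_uploaded_bytes(md, data: bytes) -> str:
    """Convert content already held in memory; no path or URI is opened by name."""
    # convert_stream() requires a binary file-like object such as io.BytesIO.
    return md.convert_stream(io.BytesIO(data)).markdown


def convert_vetted_local_file(md, path: str) -> str:
    """Convert a local file whose path the application has already validated."""
    # convert_local() is the entry point for local paths; avoid convert(),
    # which also accepts remote URIs.
    return md.convert_local(path).markdown


def convert_vetted_url(md, url: str, timeout: float = 10.0) -> str:
    """Fetch a pre-validated URL yourself, then hand the response to MarkItDown."""
    import requests  # imported here so the other helpers work without it

    # Disable redirects so a vetted URL cannot bounce to an unvetted host.
    response = requests.get(url, timeout=timeout, allow_redirects=False)
    if response.is_redirect:
        raise ValueError(f"refusing to follow redirect from {url!r}")
    response.raise_for_status()
    return md.convert_response(response).markdown
```

An upload handler could create the converter once with `from markitdown import MarkItDown` and `md = MarkItDown()`. It could then call `convert_uploaded_bytes(md, body)`, where `body` is the hypothetical uploaded payload. The request content is still parsed, but it never decides which files or hosts MarkItDown opens.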