IBM's open-source tool Docling is a powerful Python library for converting PDFs, Word documents, PowerPoint decks, Excel files, and even scanned documents into Markdown or JSON that Large Language Models can consume. It does this by bundling custom AI models for layout analysis and table structure recognition into an all-in-one package.
In this article, I’ll share how Docling performed on several test documents, from complex Excel spreadsheets to scanned school calendars, and where it shines (or stumbles) in turning documents into AI-ready data.
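For orientation, here is a minimal sketch of what a conversion looks like in code. It assumes Docling's `DocumentConverter` class and the `export_to_markdown()` / `export_to_dict()` methods on the converted document, as shown in the project's quick-start; the wrapper function `convert_document` and its `converter` parameter are my own additions so the helper can be exercised with a stand-in object.

```python
from typing import Any, Optional


def convert_document(source: str, fmt: str = "markdown", converter: Optional[Any] = None):
    """Convert one document and return Markdown text or a JSON-ready dict."""
    if converter is None:
        # Imported lazily so the helper can also be used with a stand-in converter.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # convert() returns a result object whose .document holds the parsed content.
    document = converter.convert(source).document
    if fmt == "markdown":
        return document.export_to_markdown()
    if fmt == "json":
        return document.export_to_dict()
    raise ValueError(f"unsupported format: {fmt!r}")
```

Calling `convert_document("resume.docx")` (a hypothetical file name) would return the Markdown text, and passing `fmt="json"` would return a dictionary that can be handed to `json.dumps`.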
What Makes Docling Unique
Beyond Simple OCR
Many tools rely heavily on optical character recognition (OCR) alone, which can be slow and error-prone. Docling starts with existing text tokens whenever possible and activates OCR only for scanned or image-based content. When Docling does run OCR, it integrates with popular engines like EasyOCR, and on macOS it can also use ocrmac for an extra boost in speed and quality.
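Choosing between those OCR engines can be sketched roughly as follows. This assumes the `PdfPipelineOptions`, `EasyOcrOptions`, `OcrMacOptions`, and `PdfFormatOption` names from Docling's documented examples; import paths may differ between versions, so treat them as assumptions to check against your installed release. The helper names `pick_ocr_engine` and `build_converter` are mine.

```python
import sys


def pick_ocr_engine(platform: str = sys.platform) -> str:
    """Return 'ocrmac' on macOS and 'easyocr' everywhere else."""
    return "ocrmac" if platform == "darwin" else "easyocr"


def build_converter(engine: str):
    """Build a DocumentConverter whose PDF pipeline uses the chosen OCR engine."""
    # Lazy imports: Docling is only needed when a converter is actually built.
    # These import paths are assumed from Docling's published examples.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        OcrMacOptions,
        PdfPipelineOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = True
    options.ocr_options = OcrMacOptions() if engine == "ocrmac" else EasyOcrOptions()
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

A typical call would be `build_converter(pick_ocr_engine())`, which keeps the platform check separate from the Docling-specific configuration.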
AI for Layout & Table Recognition
While the OCR tools are powerful, Docling’s real magic lies in its included deep learning models. It ships with two, which are automatically downloaded from Hugging Face the first time Docling runs.
- Layout Analysis Model (trained on DocLayNet and other datasets) to identify sections, headers, footers, figures, and more.
- TableFormer for advanced table extraction, even if cells lack borders or use multi-level row and column headings.
Using these specialized models, Docling pulls out text and structure, preserving the document’s logical flow.
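One way to see what the layout model found is to tally the labels attached to each item in the converted document. The sketch below assumes the document object exposes `iterate_items()`, yielding `(item, level)` pairs, and that each item carries a `label` attribute, which matches my understanding of Docling's document model; the function name `summarize_labels` is hypothetical.

```python
from collections import Counter


def summarize_labels(document) -> Counter:
    """Count document items by their layout label (heading, table, picture, ...)."""
    counts = Counter()
    # iterate_items() is assumed to yield (item, nesting_level) pairs.
    for item, _level in document.iterate_items():
        counts[str(item.label)] += 1
    return counts
```

Run against a converted resume or report, the resulting counter gives a quick sanity check: a document with obvious tables or section headers that shows none in the tally is worth inspecting by hand.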
Document Conversion Tests
What follows is a breakdown of how Docling performed on various file formats and scenarios. I’ve included screenshots so you can see the results for yourself.
DOCX and PPTX: Mixed Results
DOCX (Resume Example)
Docling did a solid job converting a resume built in Microsoft Word. Top-level titles such as the candidate’s name and major section headers were recognized accurately, and headings translated neatly into Markdown.
PPTX (Presentation Example)
However, when I tried to convert a PowerPoint file, I hit a notable failure. Docling simply threw an error, “failed to convert,” with no further information. The same PPTX document converted without trouble in my testing with Markitdown.
It’s unclear whether the failure was tied to embedded graphics or certain layout features, but hopefully future Docling releases will provide clearer diagnostics.
That said, Docling had no trouble with a simpler PPTX document.
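Until diagnostics improve, a thin wrapper can at least capture the exception type and traceback for each failing file while letting the rest of a batch continue. This is a generic sketch, not a Docling feature: `convert_many` is a hypothetical helper, and it only assumes the converter has a `convert(source)` method whose result exposes `.document.export_to_markdown()`.

```python
import traceback
from typing import Any, Dict, Iterable


def convert_many(converter: Any, sources: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Convert each source, recording Markdown on success or a traceback on failure."""
    report: Dict[str, Dict[str, str]] = {}
    for source in sources:
        try:
            markdown = converter.convert(source).document.export_to_markdown()
            report[source] = {"status": "ok", "markdown": markdown}
        except Exception as exc:  # keep going so one bad file does not stop the batch
            report[source] = {
                "status": "failed",
                "error": f"{type(exc).__name__}: {exc}",
                "traceback": traceback.format_exc(),
            }
    return report
```

The saved traceback is often the only clue about which backend or slide element triggered the failure, which makes it useful material for a bug report.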
Excel Spreadsheets: Good Structure, Some Quirks
Docling can also convert Excel data into Markdown tables. Here are two examples:
Simple XLSX
A simple spreadsheet with rows and columns converted almost flawlessly:
• Each cell mapped to the correct spot.
• Markdown tables had the appropriate headers, rows, and columns.
Complex Formatted XLSX
For a human-formatted Excel file that used blank cells for spacing, Docling’s output was decent but not perfect. I like how Docling broke out each section into its own table and got rid of the empty spaces. This is a significant improvement over Markitdown. However, it output raw formulas instead of the computed values, which is an issue.
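A quick way to catch this before the output reaches an LLM is to scan the generated Markdown tables for cells that still look like formulas. The sketch below is my own post-processing check, not part of Docling; it assumes formulas appear with a leading `=` as they do in Excel, and it does not handle cells containing escaped pipe characters.

```python
from typing import List, Tuple


def find_formula_cells(markdown: str) -> List[Tuple[int, int, str]]:
    """Return (line_number, column_index, cell_text) for cells that look like raw formulas."""
    hits = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # not a Markdown table row
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        for col, cell in enumerate(cells):
            # Spreadsheet formulas start with "=", e.g. =SUM(B2:B5).
            if cell.startswith("=") and len(cell) > 1:
                hits.append((line_no, col, cell))
    return hits
```

If the check reports any hits, one option is to re-read those cells from the workbook with a library that returns cached values, such as openpyxl's `load_workbook(..., data_only=True)`, and patch them into the table.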
PDF Extraction: Docling’s Strong Suit
In my tests, Docling handled PDFs better than most open-source tools—including Markitdown. It accurately preserved reading order and recognized multi-column layouts. Here’s an overview:
Bank Statement with a Table
It can even preserve images contained in a PDF as inline base64 strings. Below you can see the original PDF, the raw markdown conversion, and the markdown rendered as HTML.
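To pull those inline images back out of the Markdown, for example to save them as files or to measure how much they inflate the output, a small regular expression is enough. This sketch is independent of Docling and assumes the images are written in the standard `![alt](data:image/<type>;base64,<payload>)` form; `extract_embedded_images` is a hypothetical helper name.

```python
import base64
import re
from typing import List, Tuple

# Matches Markdown images whose target is a base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)\)"
)


def extract_embedded_images(markdown: str) -> List[Tuple[str, bytes]]:
    """Return (image_subtype, decoded_bytes) for each inline base64 image."""
    return [
        (subtype, base64.b64decode(payload))
        for subtype, payload in DATA_URI_IMAGE.findall(markdown)
    ]
```

Because base64 adds roughly a third to the size of the binary data, stripping or externalizing these images before sending the Markdown to a model can save a meaningful number of tokens.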
OCR Examples: macOS LiveText Steals the Show
One of Docling’s standout features is its ability to selectively use OCR. On macOS, Docling can tap into LiveText to perform extremely fast, high-accuracy OCR. This was evident in the test below where I gave it an image scan of a school calendar.
• A scanned PNG file of a NYC public school calendar was converted into Markdown.
• Each cell in the calendar table was recognized, including day headings and text blocks.
• The final output was fully structured, with rows and columns mapped to the correct positions.
By combining accurate OCR with layout-aware parsing, Docling nailed the table structure in a single pass. This significantly speeds up the process of extracting textual content from scanned documents.
Observations and Limitations
- Better PDF Handling: In my tests, Docling extracted PDFs more accurately than Markitdown, preserving reading order and multi-column layouts.
- Office Documents: Although DOCX and XLSX generally convert well, PPTX support can be unpredictable.
- Table Recognition: TableFormer does a remarkable job with many table layouts, but heavily stylized or line-free tables can still pose a challenge.
- macOS OCR: If you’re on macOS, you’ll want to enable the native OCR option for a boost in accuracy and speed, especially over large sets of image-heavy PDFs.
Conclusion: A Versatile Tool for AI Document Workflows
Docling delivers a robust solution for preparing multi-format documents, scanned or otherwise, for AI workflows. By harnessing AI-driven layout and table recognition, Docling captured headings, reading order, and table structure more reliably than generic converters in my tests. In head-to-head tests against other open-source solutions like Markitdown, Docling wins hands-down on PDF extraction. However, for certain Office docs, especially PPTX, results can vary.