April 9, 2025

How to extract data from PDF tables to Excel

In this post, we’ll explore how to tackle the challenge of extracting data from tables in PDFs—a common hurdle for businesses dealing with unstructured document formats. Whether it’s invoices, reports, or forms, extracting tabular data from PDFs can often mean resorting to manual entry or struggling with unreliable tools. We’ll show you how to automate this process using Cradl AI, a solution designed to simplify table data extraction and seamlessly integrate it into Excel. With the right tools, you can transform messy PDFs into structured, actionable data with ease.

Extracting data from tables is a common business challenge

Data extraction is an essential step in almost any document process where data is exchanged in unstructured formats like PDFs and images, and where digitalisation through APIs (Application Programming Interfaces) or EDI (Electronic Data Interchange) has yet to become the standard practice.

Ironically, while tables seem like a natural fit for Excel, they usually arrive locked inside unstructured formats like PDFs or images, and their varied layouts often trip up tools like OCR. Consequently, the "last resort" for extracting data from tables is also the most common approach: manual data entry.

Extracting data from PDF tables to Excel using AI

Let's now turn to how to actually solve this problem by creating a Cradl AI model that automatically extracts the information you want from your tables, and sends it to an Excel spreadsheet.

Before we begin, make sure you’ve created a Cradl AI account!

1. Define which data points to extract from your tables

Once you're inside the Cradl AI app, your first step is to create a schema of the information you want to extract from your documents. The tables in those documents are part of that schema.

Because tables have a distinctive layout structure (usually rows and columns, or just rows), we'll use a dedicated field type to process them: the Line Items field. Cradl AI's data extraction model has been pre-trained on a variety of tables, and by using the Line Items field, you enable the AI model to draw on that experience.

Screenshot of the AI model configuration UI inside Cradl AI


When processing tables, the AI model requires only a single Line Items field to extract hundreds of data points. In the image above, the Line Items field will capture description, unit price, tax price, and total price for every row in the table, be it one or a hundred rows.
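To make the idea concrete, here is a minimal Python sketch of what a single Line Items field conceptually holds: one record per table row, each with the same columns. The dictionary layout and the names `line_items`, `unit_price`, `tax_price`, and `total_price` are assumptions chosen for illustration; they are not Cradl AI's actual output format.

```python
# Hypothetical shape of one extracted document; not Cradl AI's real output format.
COLUMNS = ["description", "unit_price", "tax_price", "total_price"]

extracted = {
    "line_items": [
        {"description": "Consulting", "unit_price": "100.00",
         "tax_price": "25.00", "total_price": "125.00"},
        {"description": "Hosting", "unit_price": "40.00",
         "tax_price": "10.00", "total_price": "50.00"},
    ]
}


def to_rows(document, columns=COLUMNS):
    """Flatten the nested line items into spreadsheet rows, one list per table row."""
    return [[item.get(col, "") for col in columns] for item in document["line_items"]]


def count_data_points(document, columns=COLUMNS):
    """Each table row contributes one data point per column."""
    return len(document["line_items"]) * len(columns)


rows = to_rows(extracted)
print(rows[0])
print(count_data_points(extracted))
```

By the same arithmetic, a table with 100 rows and these four columns would yield 400 data points from that one field.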

2. Extract data from your first table

Once you have configured and saved your field schema, you're ready to extract data from a document! Simply upload a document, and when the processing is complete, you can review the results in the validation interface.

A particularly handy feature is the visual data-location mapping. Click on an extracted data field, and the location in the document gets highlighted:

Animation of a table data extraction inside Cradl AI


This, along with the confidence scores assigned to each data field by the AI, makes it easy for you to verify the AI’s output before exporting it to Excel.
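One way to think about this review step is as a simple threshold check: fields with a low confidence score deserve a closer look. The Python sketch below illustrates the idea. The record layout, the 0-to-1 confidence scale, and the 0.9 cut-off are all assumptions made for this example, not values taken from Cradl AI.

```python
# Hypothetical field records; the keys and the 0-to-1 confidence scale are assumptions.
REVIEW_THRESHOLD = 0.9  # arbitrary cut-off chosen for this example


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose confidence falls below the threshold."""
    return [f["name"] for f in fields if f["confidence"] < threshold]


fields = [
    {"name": "description", "value": "Consulting", "confidence": 0.98},
    {"name": "unit_price", "value": "100.00", "confidence": 0.71},
    {"name": "total_price", "value": "125.00", "confidence": 0.93},
]

print(fields_needing_review(fields))
```

In this sample only `unit_price` falls below the cut-off, so it is the one field you would check against the highlighted location in the document.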

3. Create an Excel Sheet to store your data

You now have an AI model that extracts data, so the next step is to create an Excel sheet to store the extracted data.

Cradl AI connects with Excel through popular automation tools like Zapier or Power Automate, so make sure you create your new spreadsheet in a cloud file management service that integrates with the tool you choose, such as Google Drive or Microsoft OneDrive.

4. Integrating Cradl AI with your Excel sheet

You now have a model that extracts data and a spreadsheet to store it, so it's time to connect the two.

As mentioned above, Cradl AI integrates with Excel by using automation tools like Zapier (a direct integration between Cradl AI and Excel will be released in the future).

1. Create a new Zap

Head over to Zapier and create a free account. If you are unfamiliar with Zapier, our video guide walks through connecting Cradl AI with Zapier step by step.

2. Choosing connectors

Use Cradl AI's «Document Parsing Completed» trigger to automatically extract data whenever a document is uploaded and validated in Cradl AI. Because we want to write multiple rows to our Excel sheet in one batch, we'll use Excel's «Add Row(s)» action.

When you're mapping Cradl AI's extracted data fields to your spreadsheet's headers in the «Add Row(s)» action, you'll notice that you can choose from far more fields than the handful of headers you defined in your spreadsheet.

In almost every case, the values you are looking for are the ones prefixed with «Validated Predictions» and suffixed with «Value», such as «Validated Predictions Services Description Value», «Validated Predictions Services Unit Value», and so on.
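The naming rule above amounts to a prefix and suffix filter. The short Python sketch below shows it applied to a list of field labels. The sample labels are illustrative only and modeled on the names mentioned in this post; the list you see in your own Zap will differ.

```python
PREFIX = "Validated Predictions"
SUFFIX = "Value"


def is_validated_value(label):
    """True for labels that start with the prefix and end with the suffix."""
    label = label.strip()
    return label.startswith(PREFIX) and label.endswith(SUFFIX)


# Illustrative labels only; the real list in your Zap will differ.
available = [
    "Validated Predictions Services Description Value",
    "Validated Predictions Services Description Confidence",
    "Predictions Services Description Value",
    "Validated Predictions Services Unit Value",
]

wanted = [label for label in available if is_validated_value(label)]
print(wanted)
```

Only the first and last labels pass the filter, and those are the ones you would map to your spreadsheet's columns.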

3. Your automated flow is good to go

Test your flow and check your Excel sheet. The table data extracted by Cradl AI should now appear as rows in your Excel spreadsheet!

Summary

Extracting data from tables in PDFs is a common challenge for businesses, often requiring tedious manual effort or unreliable tools. By using automated solutions like Cradl AI, you can simplify this process, leveraging the latest data extraction models to extract structured data quickly and accurately. Whether dealing with invoices, reports, or forms, automated workflows can transform unstructured PDF tables into actionable data, ready for integration into tools like Excel. This approach saves time, reduces errors, and streamlines document processing for a more efficient workflow.
