How to Extract Data from PDF Tables to Excel with AI

Learn how to accurately extract and convert data from complex tables with AI-powered OCR and seamlessly export it to Excel or Google Sheets. Leverage this knowledge to transform any unstructured PDF into spreadsheet-friendly formats with minimal manual effort using cutting-edge AI technology.

Extracting data from PDF tables is a common business challenge

Data extraction is an essential step in almost any document process where data is exchanged in unstructured formats like PDFs and images, and where digitalisation through APIs (Application Programming Interfaces) or EDI (Electronic Data Interchange) has yet to become the standard practice.

Ironically, while tables seem like a natural fit for Excel, they usually arrive locked inside unstructured formats like PDFs or images, and their varied layouts often trip up conventional OCR tools. Consequently, the "last resort" for extracting data from tables is also the most common: manual data entry.

Extracting data from PDF tables to Excel using AI

AI-powered OCR tools overcome these challenges by combining text recognition with contextual understanding to precisely extract data from unstructured documents.

Let's now see how to actually use AI-powered OCR tools to extract data from PDF tables to Excel, step-by-step.

Ingredients

  • Database: Excel sheet to store the extracted table data.
  • Data extraction tool: Cradl AI to automate data extraction from documents containing tables.
  • Orchestration: Zapier to transfer data from Cradl AI into the Excel sheet.

Specify the data you need to extract

Before we begin, make sure you’ve created a Cradl AI account!

Once inside the Cradl AI app, your first task is to define a schema for the data you want to extract. Tables, with their structured rows and columns, are ideal for using Cradl AI's Line Items field.

The Line Items field is specifically designed for tabular data and leverages Cradl AI's pre-trained model to extract information efficiently. Whether your table has a few rows or hundreds, this field captures all relevant data points, such as descriptions, unit prices, tax amounts, and totals, in one go.

Screenshot of the AI model configuration UI inside Cradl AI
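
To make the idea of a Line Items field concrete, the sketch below models one extracted table as a list of row records in Python. The field names echo the examples mentioned above (description, unit price, tax amount, total); the `LineItem` class and the sample values are illustrative assumptions, not Cradl AI's actual schema format or output.

```python
from dataclasses import dataclass, asdict
from decimal import Decimal


@dataclass
class LineItem:
    """One table row. Hypothetical structure, not Cradl AI's schema format."""
    description: str
    unit_price: Decimal
    tax_amount: Decimal
    total: Decimal


# Made-up sample rows standing in for an extracted table.
line_items = [
    LineItem("Consulting", Decimal("100.00"), Decimal("25.00"), Decimal("125.00")),
    LineItem("Hosting", Decimal("40.00"), Decimal("10.00"), Decimal("50.00")),
]

# Each row becomes a plain dict, one key per spreadsheet column.
rows = [asdict(item) for item in line_items]
columns = list(rows[0].keys())
```

Whether the table has two rows or two hundred, the shape stays the same: one record per row, with one value per column.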


Extract data from your first table

Once you have configured and saved your field schema, you're ready to extract data from a document! Simply upload a document, and when the processing is complete, review the results in the validation interface.

A particularly handy feature is the visual data-location mapping. Click on an extracted data field, and the location in the document gets highlighted:

Animation of a table having its data extracted in Cradl AI


Create an Excel sheet for your data

You now have an AI model that extracts data, so the next step is to create an Excel sheet to store the extracted data.

Cradl AI connects with Excel by using popular automation tools like Zapier (see more integrations), so make sure you create a new spreadsheet in a cloud file management service that integrates with Zapier or Power Automate, such as Google Drive or Microsoft OneDrive.

Export the extracted table data to Excel

You have a model that extracts data and a spreadsheet to store the data. Now it is time to connect the two.

As mentioned above, Cradl AI integrates with Excel by using automation tools like Zapier (a direct integration between Cradl AI and Excel will be released in the future).

Create a new Zap

Head over to Zapier to create a free account and create your first «Zap».

If you are unfamiliar with Zapier, check out our article on How to Extract data from PDFs in Zapier for detailed guidance on how to connect Zapier and Cradl AI.

Choosing connectors

Use Cradl AI's «Document Parsing Completed» trigger to start the Zap automatically whenever a document has been uploaded and validated in Cradl AI. Because we want to write multiple rows to our Excel sheet in one batch, we'll use Excel's «Add Row(s)» action.
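
The effect of a batch write is that each extracted line item ends up as one spreadsheet row, with cells ordered to match the sheet's headers. Here is a minimal Python sketch of that reshaping, assuming hypothetical header names and made-up line items. Zapier handles this mapping for you in its UI, and no Zapier or Cradl AI API is called here.

```python
# Hypothetical spreadsheet headers, in the order the columns appear.
HEADERS = ["Description", "Unit Price", "Tax Amount", "Total"]


def to_rows(line_items, headers):
    """Return one list per line item, with cells ordered like `headers`.

    Values missing from a line item become empty strings.
    """
    return [[item.get(header, "") for header in headers] for item in line_items]


# Made-up line items; the second one has no tax amount.
line_items = [
    {"Description": "Consulting", "Unit Price": "100.00",
     "Tax Amount": "25.00", "Total": "125.00"},
    {"Description": "Hosting", "Unit Price": "40.00", "Total": "40.00"},
]

batch = to_rows(line_items, HEADERS)
```

Filling gaps with empty strings keeps every row the same width, so values never slide into the wrong column.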


When you're mapping Cradl AI's extracted data fields to your spreadsheet's headers in the «Add Row(s)» action, you'll notice that you can choose from way more extracted values than the handful you defined in your spreadsheet's headers and AI model.

In nearly all cases, you are looking for the values that are prefixed with «Validated Predictions» and suffixed with «Value», such as «Validated Predictions Services Description Value», «Validated Predictions Services Unit Value», and so on.
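
The prefix-and-suffix rule can be expressed as a simple filter. The Python sketch below applies it to a made-up list of field labels. Only the naming pattern comes from the description above; the other labels are invented for illustration and will differ from what Zapier actually shows.

```python
PREFIX = "Validated Predictions"
SUFFIX = "Value"


def pick_validated_values(labels):
    """Keep only labels that start with PREFIX and end with SUFFIX."""
    return [
        label for label in labels
        if label.startswith(PREFIX) and label.endswith(SUFFIX)
    ]


# Hypothetical labels; the "Confidence" and bare "Predictions" entries are invented.
labels = [
    "Validated Predictions Services Description Value",
    "Validated Predictions Services Description Confidence",
    "Predictions Services Description Value",
    "Validated Predictions Services Unit Value",
]

selected = pick_validated_values(labels)
```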

Run the entire data extraction flow

Run a test to ensure your flow works. If configured correctly, Cradl AI will extract the table data and populate your Excel spreadsheet automatically!
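
One way to check a test run is to download the sheet as CSV and compare it with the table you uploaded. Below is a small Python sketch of such a check, assuming a hypothetical export with a header row and made-up values. The CSV is inlined as a string so the example is self-contained.

```python
import csv
import io

# Stand-in for a CSV export of the spreadsheet (hypothetical content).
exported_csv = """Description,Unit Price,Total
Consulting,100.00,125.00
Hosting,40.00,50.00
"""


def check_export(csv_text, expected_row_count):
    """Return a list of problems: a wrong row count or rows with empty cells."""
    header, *data_rows = list(csv.reader(io.StringIO(csv_text)))
    problems = []
    if len(data_rows) != expected_row_count:
        problems.append(
            f"expected {expected_row_count} rows, found {len(data_rows)}"
        )
    for number, row in enumerate(data_rows, start=1):
        # Pair each cell with its header; zip stops at the shorter of the two.
        empty = [name for name, cell in zip(header, row) if not cell.strip()]
        if empty:
            problems.append(f"row {number} has empty cells: {', '.join(empty)}")
    return problems


problems = check_export(exported_csv, expected_row_count=2)
```

An empty `problems` list means the row count matches the source table and no cell was left blank.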
