Invoice with line items

Extracting metadata and line items from invoices with examples in Python and Node.

Overview

This guide covers how to extract metadata and line items from invoices issued by several different suppliers, with examples in Python and Node.

You will extract the following metadata fields:

  • Name of the supplier

  • Name of the receiver

  • Invoice number

  • Purchase order number

  • Issue date

  • Pay due date

  • Total amount

You will extract the following line item fields:

  • Product number

  • Product description

  • Quantity

  • Price

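This guide shows you how to create a document type, build its dataset, train it, extract data from new invoices, and improve the model with feedback.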

Getting your API Key

The Authorization header value is Token YOUR-API-KEY (log in if you do not see your key). You can also find your API key on the Settings page.
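Every request in this guide sends the same three headers, so it can help to build them in one place. The sketch below does that; it assumes you keep the key in an environment variable, and the name TYPLESS_API_KEY is hypothetical, so use whatever your deployment defines.

```python
import os
from typing import Optional


def typless_headers(api_key: Optional[str] = None) -> dict:
    """Build the JSON headers used by every Typless request in this guide.

    If api_key is not given, it is read from TYPLESS_API_KEY, a hypothetical
    environment variable name chosen for this example.
    """
    key = api_key if api_key is not None else os.environ["TYPLESS_API_KEY"]
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        # The API key is sent with the "Token " prefix
        "Authorization": f"Token {key}",
    }


print(typless_headers("YOUR-API-KEY")["Authorization"])  # Token YOUR-API-KEY
```

Passing the key explicitly is convenient in tests; reading it from the environment keeps it out of source control.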


1. Create a new document type

Before you start extracting data, you need to define a document type. Navigate to the Dashboard page and click the New document type button in the top right corner of the table. Next, select the Line items invoice card. The wizard pre-fills all the required extraction fields along with the document type configuration. Click Create document type.

This creates a new document type named line-item-invoice with the following fields:

  • supplier_name

  • invoice_number

  • purchase_order_number

  • receiver_name

  • issue_date

  • pay_due_date

  • total_amount

and the following line item fields:

  • product_description

  • product_number

  • price

  • quantity
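Every payload for this document type must use exactly these field names. A small pre-upload check, sketched below, can catch typos such as pay_date instead of pay_due_date; the constants simply mirror the lists above, and check_field_names is our own helper, not part of the Typless API.

```python
# Field names of the line-item-invoice document type (copied from the lists above)
METADATA_FIELDS = {
    "supplier_name", "invoice_number", "purchase_order_number",
    "receiver_name", "issue_date", "pay_due_date", "total_amount",
}
LINE_ITEM_FIELDS = {"product_description", "product_number", "price", "quantity"}


def check_field_names(payload: dict) -> list:
    """Return human-readable problems with the field names in a payload."""
    problems = []
    names = {f["name"] for f in payload.get("learning_fields", [])}
    for missing in sorted(METADATA_FIELDS - names):
        problems.append(f"missing metadata field: {missing}")
    for unknown in sorted(names - METADATA_FIELDS):
        problems.append(f"unknown metadata field: {unknown}")
    # Each line item is its own list of {"name": ..., "value": ...} objects
    for i, row in enumerate(payload.get("line_items", [])):
        row_names = {f["name"] for f in row}
        for missing in sorted(LINE_ITEM_FIELDS - row_names):
            problems.append(f"line item {i}: missing field {missing}")
        for unknown in sorted(row_names - LINE_ITEM_FIELDS):
            problems.append(f"line item {i}: unknown field {unknown}")
    return problems
```

An empty result means every field of the document type is present and no unknown names were used.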

2. Add suppliers

Once your document type is created, you need to add data to its dataset. To do that, download the following invoices:

Each invoice is from a different supplier.

You build the dataset by uploading each original file together with the correct value for every field defined in the document type.

Example code:

1 Open the file as a base64 string. Make sure that you are pointing to the correct path.

2 Specify the payload with the correct value for each field.

3 Specify the headers, including your API key.

4 Make the POST request to the add-document endpoint.

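import base64

import requests

# Path to the downloaded invoice; adjust it if the file is stored elsewhere
file_name = 'amazing_company_1.pdf'

# Read the PDF and encode it as a base64 string for the JSON payload
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

# Correct value for every field defined in the line-item-invoice document type
payload = {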
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice",
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "333"
        },
        {
            "name": "purchase_order_number",
            "value": "234778"
        },
        {
            "name": "pay_due_date",
            "value": "2021-03-31"
        },
        {
            "name": "issue_date",
            "value": "2021-02-01"
        },
        {
            "name": "total_amount",
            "value": "15.0000"
        }
    ],
    "line_items": [
        [
            {
                "name": "product_number",
                "value": None
            },
            {
                "name": "product_description",
                "value": "Amazing service"
            },
            {
                "name": "quantity",
                "value": "1"
            },
            {
                "name": "price",
                "value": "15.0000"
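            }
        ]
    ]
}

url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Replace YOUR-API-KEY with the key from the Settings page
    "Authorization": "Token YOUR-API-KEY"
}

# Add the document and its correct values to the dataset
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

print(response.json())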

Response:

{
    "details":["0cb9d3a652781bae68a2cba92a55ef308560bf4c"],
    "message":"Document added successfully."
}
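The payload above repeats a {"name": ..., "value": ...} object for every field. The minimal sketch below builds the same structure from plain dictionaries; build_add_document_payload and to_name_value are our own helper names, not part of a Typless SDK.

```python
import base64
import os


def to_name_value(fields: dict) -> list:
    """Convert {"field": value} into the [{"name": ..., "value": ...}] format."""
    return [{"name": name, "value": value} for name, value in fields.items()]


def build_add_document_payload(file_path, document_type_name, fields, line_items):
    """Build an add-document payload shaped like the example above.

    fields: {field_name: value} for the metadata fields.
    line_items: a list with one {field_name: value} dict per invoice row.
    """
    with open(file_path, "rb") as file:
        base64_data = base64.b64encode(file.read()).decode("utf-8")
    return {
        "file": base64_data,
        # Send only the file name, not the local directory path
        "file_name": os.path.basename(file_path),
        "document_type_name": document_type_name,
        "learning_fields": to_name_value(fields),
        # One inner list of name/value objects per invoice row
        "line_items": [to_name_value(row) for row in line_items],
    }
```

For example, passing {"supplier_name": "Amazing Company", "invoice_number": "333", ...} as fields and [{"product_number": None, "product_description": "Amazing service", "quantity": "1", "price": "15.0000"}] as line_items reproduces the payload from the example.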

As you can see, Typless needs only the values that appear in the document to achieve high accuracy. However, the values you provide must follow a few formatting rules that depend on the field's data type.

In the Amazing Company example, applying these rules changed three values:

  • total_amount was converted from 15,00 to 15.0000 according to the number type rules

  • issue_date was converted from Feb 1, 2021 to 2021-02-01 according to the date type rules

  • pay_due_date was converted from Mar 31, 2021 to 2021-03-31 according to the date type rules

The same rules were also applied to the Good Services example.
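The two conversions above can be sketched as small helper functions. This assumes, based only on the examples in this guide, that numbers use a dot as the decimal separator with four decimal places and that dates use YYYY-MM-DD; check the Typless value rules for the complete list. The helper names are our own.

```python
from datetime import datetime
from decimal import Decimal


def normalize_number(text: str, decimal_separator: str = ",") -> str:
    """Convert a number as printed on the invoice, e.g. '15,00', to '15.0000'.

    Assumes the other separator ('.' when decimal_separator is ',') only
    groups thousands, so it is dropped.
    """
    thousands = "." if decimal_separator == "," else ","
    cleaned = text.strip().replace(thousands, "").replace(decimal_separator, ".")
    # Decimal avoids float rounding; format with exactly four decimal places
    return f"{Decimal(cleaned):.4f}"


def normalize_date(text: str, fmt: str = "%b %d, %Y") -> str:
    """Convert a date such as 'Feb 1, 2021' to '2021-02-01'.

    %b expects English month abbreviations (the default C locale).
    """
    return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")


print(normalize_number("15,00"))       # 15.0000
print(normalize_date("Feb 1, 2021"))   # 2021-02-01
print(normalize_date("Mar 31, 2021"))  # 2021-03-31
```

Pass a different fmt for invoices that print dates in another layout, for example "%d.%m.%Y".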

After you run both code examples, your document type will contain two suppliers.

3. Execute training

To see results immediately, trigger the training process on the Dashboard page: find the line-item-invoice document type in the list and click its training button.

Need more information? Read more about training.

4. Extract data from documents

After training finishes, you can extract data from documents issued by the trained suppliers. Here are two new invoices from those suppliers:

Download them and extract the data with the following code:

1 Open the file as a base64 string. Make sure that you are pointing to the correct path.

2 Specify the payload.

3 Specify the headers, including your API key.

4 Make the POST request to the extract-data endpoint.

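import base64

import requests

# Path to the new invoice; adjust it if the file is stored elsewhere
file_name = 'amazing_company_2.pdf'

# Read the PDF and encode it as a base64 string for the JSON payload
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}

url = "https://developers.typless.com/api/extract-data"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Replace YOUR-API-KEY with the key from the Settings page
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
result = response.json()

# Print the first value of each metadata field; a field can have no values
for field in result['extracted_fields']:
    values = field["values"]
    print(f'{field["name"]}: {values[0]["value"] if values else None}')

# Print each line item row as a name -> first value mapping
for row in result.get('line_items', []):
    print({cell["name"]: cell["values"][0]["value"] if cell["values"] else None for cell in row})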

Response:

// Example extraction response; the invoices in this guide will return different values
{
    "file_name": "invoice_2.pdf",
    "object_id": "1cb25cc8-c9fa-4149-9a83-b4ed6a2173b9",
    "extracted_fields": [
        {
            "name": "supplier",
            "values": [
                {
                    "x": -1,
                    "y": -1,
                    "width": -1,
                    "height": -1,
                    "value": "ScaleGrid",
                    "confidence_score": "0.968",
                    "page_number": -1
                }
            ],
            "data_type": "AUTHOR"
        },
        {
            "name": "invoice_number",
            "values": [
                {
                    "x": 1989,
                    "y": 545,
                    "width": 323,
                    "height": 54,
                    "value": "20190500005890",
                    "confidence_score": "0.250",
                    "page_number": 0
                },
                {
                    "x": 167,
                    "y": 574,
                    "width": 391,
                    "height": 54,
                    "value": "GB123456789",
                    "confidence_score": "0.250",
                    "page_number": 0
                }
            ],
            "data_type": "STRING"
        },
        {
            "name": "issue_date",
            "values": [
                {
                    "x": 2072,
                    "y": 628,
                    "width": 240,
                    "height": 54,
                    "value": "2019-06-05",
                    "confidence_score": "0.358",
                    "page_number": 0
                }
            ],
            "data_type": "DATE"
        },
        {
            "name": "total_amount",
            "values": [
                {
                    "x": 2146,
                    "y": 1196,
                    "width": 126,
                    "height": 54,
                    "value": "47.5300",
                    "confidence_score": "0.990",
                    "page_number": 0
                }
            ],
            "data_type": "NUMBER"
        }
    ],
    "line_items": [
        [
            {
                "name": "Description",
                "values": [
                    {
                        "x": 208,
                        "y": 1196,
                        "width": 1022,
                        "height": 50,
                        "value": "5/2019-MongoBackend-MgmtStandalone-Small-744 hours",
                        "confidence_score": "0.661",
                        "page_number": 0
                    }
                ],
                "data_type": "STRING"
            },
            {
                "name": "Price",
                "values": [
                    {
                        "x": 2146,
                        "y": 1196,
                        "width": 126,
                        "height": 54,
                        "value": "47.5300",
                        "confidence_score": "0.582",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            },
            {
                "name": "Quantity",
                "values": [
                    {
                        "x": 1979,
                        "y": 1196,
                        "width": 23,
                        "height": 54,
                        "value": "1",
                        "confidence_score": "0.647",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            }
        ]
    ],
    "customer": null
}
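A field can hold several candidate values, each with a confidence_score sent as a string (invoice_number above has two). The sketch below flattens a response into plain dictionaries, keeping the highest-scoring candidate per field and per line item cell. Picking the maximum confidence is our assumption about what an application typically wants, not something the API requires; the function names are our own.

```python
def best_value(field: dict):
    """Return the candidate value with the highest confidence_score, or None."""
    values = field.get("values", [])
    if not values:
        return None
    # confidence_score is a string such as "0.968"; on ties, max keeps the first
    best = max(values, key=lambda v: float(v["confidence_score"]))
    return best["value"]


def summarize_extraction(response: dict) -> dict:
    """Flatten an extract-data response into metadata and line item rows."""
    metadata = {f["name"]: best_value(f) for f in response.get("extracted_fields", [])}
    rows = [
        {cell["name"]: best_value(cell) for cell in row}
        for row in response.get("line_items", [])
    ]
    return {"object_id": response.get("object_id"), "metadata": metadata, "line_items": rows}
```

Keep the object_id from the summary: you need it later to send feedback for the same document.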

Need a more in-depth explanation of the response? You can read about it here.

5. Continuously improve models

Typless embraces the fact that the world is changing all the time. That's why you can improve models on the fly by providing correct data after extraction. Say your company has a new partner, Best Supplier. You don't need to rebuild the dataset from scratch: extract data from Best Supplier's invoices and send back the correct values once your users have verified them. You can learn more about providing feedback on the Building dataset page.

To send feedback, call the add-document-feedback endpoint with the object_id returned in the extraction response.
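Before sending feedback you need the verified values in the same name/value format that add-document uses. The sketch below only prepares that data: it starts from the first extracted value of each metadata field and overrides it with user corrections. The exact request body of add-document-feedback is not shown in this guide, so the returned key names are hypothetical; check the endpoint's API reference before sending.

```python
def prepare_feedback(response: dict, corrections: dict) -> dict:
    """Combine an extract-data response with user-verified corrections.

    corrections: {field_name: correct_value} entered or confirmed by a user.
    Returned keys are hypothetical; map them to the real feedback payload.
    """
    values = {}
    for field in response.get("extracted_fields", []):
        candidates = field.get("values", [])
        values[field["name"]] = candidates[0]["value"] if candidates else None
    # Verified values take precedence over extracted ones
    values.update(corrections)
    return {
        # object_id identifies the extracted document the feedback refers to
        "object_id": response["object_id"],
        "learning_fields": [{"name": n, "value": v} for n, v in values.items()],
    }
```

Line items can be handled the same way, one list of name/value objects per row.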

Running Typless live

To automate your manual data entry, all you need to do is integrate these API calls into your system.

Typless usage is very easy and straightforward!
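As one illustration of such an integration, the sketch below sends every PDF in a folder through an extraction function and collects the results by file name. The extract callable is injected so the loop can be tested without network access; in production it would wrap the requests.post call from step 4. process_folder and encode_file are hypothetical names for this example.

```python
import base64
from pathlib import Path
from typing import Callable


def encode_file(path: Path) -> str:
    """Read a file and return its contents as a base64 string."""
    return base64.b64encode(path.read_bytes()).decode("utf-8")


def process_folder(folder: str, extract: Callable[[dict], dict]) -> dict:
    """Run extract() on every PDF in folder and return results keyed by file name.

    extract receives an extract-data payload and returns the parsed response.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        payload = {
            "file": encode_file(path),
            "file_name": path.name,
            "document_type_name": "line-item-invoice",
        }
        results[path.name] = extract(payload)
    return results
```

Keeping the HTTP call outside the loop also makes it easy to add retries or rate limiting later without touching the file handling.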
