VAT invoice
Extracting metadata and VAT rates from invoices with examples in Python and Node.
Overview
This guide covers how to extract metadata and VAT rates from supplier invoices with examples in Python and Node.
You will extract the following metadata fields:
Name of the supplier
Name of the receiver
Invoice number
Purchase order number
Issue date
Pay due date
Total amount
Net amount
You will extract the following VAT rate fields:
VAT rate percentage
VAT rate net
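This guide shows you how to create a document type, add suppliers to the dataset, run training, extract data, and continuously improve your models.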
Getting your API Key
Pass your API key in the Authorization header in the following form: Token YOUR-API-KEY
You can obtain the API key on the Settings page after logging in.
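As a minimal sketch of the header format described above, the request headers could be assembled once and reused for every call. The helper name build_headers and the environment variable TYPLESS_API_KEY are assumptions made for this illustration, not part of the Typless API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return JSON request headers carrying the key in the 'Token <key>' form."""
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",
    }


# TYPLESS_API_KEY is a hypothetical variable name chosen for this sketch;
# the fallback is the placeholder used in this guide.
headers = build_headers(os.environ.get("TYPLESS_API_KEY", "YOUR-API-KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.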
1. Create a new document type
Before you start extracting data, you need to define a document type. Navigate to the Dashboard page and click on the New document type button in the top right corner of the table. Next, select the VAT invoice card. The wizard pre-fills all the needed extraction fields along with the document type configuration. Click on the Create document type button.
This will create a new document type named vat-invoice with the following fields:
supplier_name
invoice_number
purchase_order_number
receiver_name
issue_date
pay_due_date
total_amount
net_amount
The document type will have a VAT rate plugin already set up for your fields.
2. Add suppliers
Typless is a tool for automation, so you first need to fill the dataset and train on it. To automate a new supplier, add its invoices to the dataset. Download an example invoice from Best Flowers Inc:
To add a document to the dataset, use the add-document endpoint or use the training room, where you can easily upload a file and fill out the necessary information.
The dataset is created by uploading an original file with the correct value for each field defined inside the document type. A key point to note regarding VAT invoices is that you must also fill out all the VAT rates listed on the document, so the engine will take these into account the next time it performs extraction.
import base64

import requests

file_name = 'vat_invoice_1.pdf'

# The API expects the file content as a base64-encoded string.
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "vat-invoice",
    # Correct values for every field defined in the document type.
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Best flowers Inc."
        },
        {
            "name": "receiver_name",
            "value": "James Bond"
        },
        {
            "name": "invoice_number",
            "value": "123/2017"
        },
        {
            "name": "purchase_order_number",
            "value": "001-001-30"
        },
        {
            "name": "pay_due_date",
            "value": "2017-06-30"
        },
        {
            "name": "issue_date",
            "value": "2017-06-16"
        },
        {
            "name": "total_amount",
            "value": "735.3300"
        },
        {
            "name": "net_amount",
            "value": "644.1400"
        }
    ],
    # One inner list per VAT rate listed on the document.
    "vat_rates": [
        [
            {
                "name": "vat_rate_percentage",
                "value": "9.5000"
            },
            {
                "name": "vat_rate_net",
                "value": "404.1400"
            },
        ],
        [
            {
                "name": "vat_rate_percentage",
                "value": "22.0000"
            },
            {
                "name": "vat_rate_net",
                "value": "240.0000"
            },
        ]
    ]
}

url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Replace YOUR-API-KEY with your own key.
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
Response:
{
    "details": ["0d0596ac5e7320eb9b75ee1b327dff4d899f1a6a"],
    "message": "Document added successfully."
}
As you can see, to achieve high accuracy, Typless only needs the values that are in the document. Nevertheless, there are some rules to keep in mind when providing values.
Applying these rules to the provided example, you will change some fields:
total_amount: value was converted with number type rules from 735,33 to 735.3300
net_amount: value was converted with number type rules from 644,14 to 644.1400
issue_date: value was converted with date type rules from 16.06.2017 to 2017-06-16
pay_due_date: value was converted with date type rules from 30.06.2017 to 2017-06-30
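The conversions listed above can be sketched with the standard library. This is only an illustration under two assumptions: the comma in the source number is a decimal separator with no thousands separators, and the source date is written as day.month.year. The function names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal


def normalize_number(text: str) -> str:
    """Convert a decimal-comma number such as '735,33' to four decimals."""
    # Assumption: the comma is the decimal separator and no thousands
    # separators are present.
    value = Decimal(text.strip().replace(",", "."))
    return f"{value:.4f}"


def normalize_date(text: str, source_format: str = "%d.%m.%Y") -> str:
    """Convert a date such as '16.06.2017' to the ISO form '2017-06-16'."""
    return datetime.strptime(text.strip(), source_format).strftime("%Y-%m-%d")


print(normalize_number("735,33"))    # 735.3300
print(normalize_date("16.06.2017"))  # 2017-06-16
```

Decimal is used instead of float so that the amounts are not affected by binary rounding.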
You also applied the same rules to the VAT rates on the document. VAT rates are structured as a list of lists, similarly to line items, so keep that in mind when building the data structure for training.
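To make the list-of-lists shape concrete, here is a small sketch that builds the vat_rates structure from (percentage, net) pairs. The helper build_vat_rates is hypothetical; only the field names and the nesting come from the example payload in this guide.

```python
from decimal import Decimal


def build_vat_rates(rates):
    """Turn (percentage, net) pairs into one inner list per VAT rate."""
    vat_rates = []
    for percentage, net in rates:
        vat_rates.append([
            {"name": "vat_rate_percentage", "value": f"{Decimal(percentage):.4f}"},
            {"name": "vat_rate_net", "value": f"{Decimal(net):.4f}"},
        ])
    return vat_rates


# The two VAT rates from the Best Flowers Inc. training invoice.
vat_rates = build_vat_rates([("9.5", "404.14"), ("22", "240")])
print(vat_rates[0])
```

Each inner list describes exactly one VAT rate, which is the same pattern the guide mentions for line items.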
You will have one supplier added to your document type after you run the code example.
3. Execute training
👍 Training is executed automatically every day at 10 PM CET
It covers all of your suppliers with new documents in the dataset, across all your document types, and it is free of charge.
To immediately see results, you can trigger the training process on the Dashboard page.
Look for the vat-invoice document type in the list and click the button that starts training.
4. Extract data from documents
After the training is finished, you can start precisely extracting data from documents from trained suppliers. Download a new example from Best Flowers Inc:
Download it and extract the data using the code:
import base64

import requests

file_name = 'vat_invoice_2.pdf'

# The API expects the file content as a base64-encoded string.
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "vat-invoice"
}

url = "https://developers.typless.com/api/extract-data"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Replace YOUR-API-KEY with your own key.
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()

# The first entry in "values" is the first candidate returned for each field.
for field in result['extracted_fields']:
    print(f'{field["name"]}: {field["values"][0]["value"]}')

print('--- VAT RATES ---')
for vat_rate in result['vat_rates']:
    for field in vat_rate:
        print(f'{field["name"]}: {field["values"][0]["value"]}')
    print('----------------------------------')
Response:
{
"customer": null,
"extracted_fields": [
{
"data_type": "AUTHOR",
"name": "supplier_name",
"values": [
{
"confidence_score": 0.987,
"height": -1,
"page_number": -1,
"value": "Best flowers Inc.",
"width": -1,
"x": -1,
"y": -1
}
]
},
{
"data_type": "DATE",
"name": "pay_due_date",
"values": [
{
"confidence_score": 0.99,
"height": 40,
"page_number": 0,
"value": "2017-06-30",
"width": 481,
"x": 1818,
"y": 775
},
{
"confidence_score": 0.125,
"height": 33,
"page_number": 0,
"value": "2017-06-16",
"width": 608,
"x": 1685,
"y": 715
}
]
},
{
"data_type": "STRING",
"name": "purchase_order_number",
"values": [
{
"confidence_score": 0.99,
"height": 51,
"page_number": 0,
"value": "001-001-35",
"width": 835,
"x": 1358,
"y": 1310
}
]
},
{
"data_type": "NUMBER",
"name": "total_amount",
"values": [
{
"confidence_score": 0.75,
"height": 32,
"page_number": 0,
"value": "398.3000",
"width": 112,
"x": 1208,
"y": 2978
},
{
"confidence_score": 0.75,
"height": 33,
"page_number": 0,
"value": "61.9500",
"width": 93,
"x": 829,
"y": 2977
},
{
"confidence_score": 0.75,
"height": 32,
"page_number": 0,
"value": "398.3000",
"width": 112,
"x": 1208,
"y": 3048
},
{
"confidence_score": 0.6875,
"height": 32,
"page_number": 0,
"value": "336.3500",
"width": 114,
"x": 541,
"y": 2977
},
{
"confidence_score": 0.625,
"height": 31,
"page_number": 0,
"value": "292.8000",
"width": 116,
"x": 1207,
"y": 2913
}
]
},
{
"data_type": "STRING",
"name": "invoice_number",
"values": [
{
"confidence_score": 0.99,
"height": 54,
"page_number": 0,
"value": "125/2021",
"width": 787,
"x": 1395,
"y": 1162
}
]
},
{
"data_type": "DATE",
"name": "issue_date",
"values": [
{
"confidence_score": 0.99,
"height": 33,
"page_number": 0,
"value": "2017-06-16",
"width": 608,
"x": 1685,
"y": 715
},
{
"confidence_score": 0.125,
"height": 40,
"page_number": 0,
"value": "2017-06-30",
"width": 481,
"x": 1818,
"y": 775
}
]
},
{
"data_type": "STRING",
"name": "receiver_name",
"values": [
{
"confidence_score": 0.99,
"height": 39,
"page_number": 0,
"value": "James Bond",
"width": 233,
"x": 170,
"y": 768
},
{
"confidence_score": 0.3125,
"height": 32,
"page_number": 0,
"value": "losed stre",
"width": 428,
"x": 173,
"y": 816
},
{
"confidence_score": 0.125,
"height": 51,
"page_number": 0,
"value": "chase orde",
"width": 835,
"x": 1358,
"y": 1310
},
{
"confidence_score": 0.125,
"height": 31,
"page_number": 0,
"value": "PIREA GOLD",
"width": 578,
"x": 224,
"y": 1551
},
{
"confidence_score": 0.125,
"height": 54,
"page_number": 0,
"value": "voice numb",
"width": 787,
"x": 1395,
"y": 1162
}
]
},
{
"data_type": "NUMBER",
"name": "net_amount",
"values": [
{
"confidence_score": 0.5625,
"height": 32,
"page_number": 0,
"value": "336.3500",
"width": 114,
"x": 541,
"y": 2977
},
{
"confidence_score": 0.5,
"height": 33,
"page_number": 0,
"value": "61.9500",
"width": 93,
"x": 829,
"y": 2977
},
{
"confidence_score": 0.5,
"height": 32,
"page_number": 0,
"value": "398.3000",
"width": 112,
"x": 1208,
"y": 2978
},
{
"confidence_score": 0.5,
"height": 36,
"page_number": 0,
"value": "336.3500",
"width": 219,
"x": 2093,
"y": 2068
},
{
"confidence_score": 0.4707,
"height": 36,
"page_number": 0,
"value": "398.3000",
"width": 242,
"x": 2063,
"y": 2217
}
]
}
],
"file_name": "vat_invoice_2.pdf",
"line_items": [],
"object_id": "0d05ad736c837edde4a5aa5434d06da713f7c2b2",
"vat_rates": [
[
{
"data_type": "NUMBER",
"name": "vat_rate_percentage",
"values": [
{
"confidence_score": 0.99,
"height": -1,
"page_number": -1,
"value": "9.5000",
"width": -1,
"x": -1,
"y": -1
}
]
},
{
"data_type": "NUMBER",
"name": "vat_rate_net",
"values": [
{
"confidence_score": 0.99,
"height": 31,
"page_number": 0,
"value": "96.3500",
"width": 94,
"x": 561,
"y": 2838
}
]
}
],
[
{
"data_type": "NUMBER",
"name": "vat_rate_percentage",
"values": [
{
"confidence_score": 0.99,
"height": -1,
"page_number": -1,
"value": "22.0000",
"width": -1,
"x": -1,
"y": -1
}
]
},
{
"data_type": "NUMBER",
"name": "vat_rate_net",
"values": [
{
"confidence_score": 0.99,
"height": 31,
"page_number": 0,
"value": "240.0000",
"width": 116,
"x": 541,
"y": 2913
}
]
}
]
]
}
You should successfully extract fields along with all the VAT rates present on the invoice.
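Because every field in the response carries several candidate values with a confidence_score, one possible way to consume it is to keep the highest-scoring candidate and flag fields that fall below a threshold for manual review. The function name best_values and the 0.9 threshold are assumptions made for this sketch; the sample data is a trimmed copy of the response above.

```python
def best_values(response: dict, min_confidence: float = 0.9):
    """Return ({field: best value}, [fields that need manual review])."""
    values, needs_review = {}, []
    for field in response.get("extracted_fields", []):
        candidates = field.get("values", [])
        if not candidates:
            # Nothing was extracted for this field.
            needs_review.append(field["name"])
            continue
        best = max(candidates, key=lambda c: c["confidence_score"])
        values[field["name"]] = best["value"]
        if best["confidence_score"] < min_confidence:
            needs_review.append(field["name"])
    return values, needs_review


# Trimmed copy of the extraction response shown above.
sample = {
    "extracted_fields": [
        {"name": "invoice_number",
         "values": [{"confidence_score": 0.99, "value": "125/2021"}]},
        {"name": "total_amount",
         "values": [{"confidence_score": 0.75, "value": "398.3000"},
                    {"confidence_score": 0.6875, "value": "336.3500"}]},
    ]
}

values, needs_review = best_values(sample)
print(values)
print(needs_review)
```

The threshold is a tuning choice: a higher value sends more fields to review, a lower one accepts more values automatically.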
5. Continuously improve models
Typless embraces the fact that the world is changing all the time. That's why you can improve models on the fly by providing correct data after extraction. Let's say your company has a new partner, Best Supplier. You don't need to start over with building the dataset. You can simply extract the data and send back the correct values after your users have verified them. You can learn more about providing feedback on the Building a dataset page.
📘 Closed workflow loop - improve models live!
Use every action from your users to adapt and improve Typless models without any extra costs.
Running Typless live
The only thing you need to do to automate your manual data entry is to integrate these simple API calls into your system.
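When integrating, one optional safeguard is to cross-check the extracted VAT rates against the invoice amounts before accepting a result. The sketch below assumes that the VAT rate nets sum to the net amount and that the total equals the net plus the VAT computed per rate, within a small rounding tolerance. Both example invoices in this guide satisfy that arithmetic, but the helper vat_rates_reconcile and its tolerance are illustrative assumptions, not a Typless feature.

```python
from decimal import Decimal


def vat_rates_reconcile(net_amount, total_amount, vat_rates,
                        tolerance=Decimal("0.01")):
    """Check that (percentage, net) pairs agree with the net and total amounts."""
    net_sum = Decimal("0")
    vat_sum = Decimal("0")
    for percentage, net in vat_rates:
        net_sum += Decimal(net)
        vat_sum += Decimal(net) * Decimal(percentage) / Decimal("100")
    # Assumption: amounts on the invoice are rounded, so allow a small tolerance.
    net_ok = abs(net_sum - Decimal(net_amount)) <= tolerance
    total_ok = abs(net_sum + vat_sum - Decimal(total_amount)) <= tolerance
    return net_ok and total_ok


# Top candidates from the extraction response in this guide.
extracted_rates = [("9.5000", "96.3500"), ("22.0000", "240.0000")]
print(vat_rates_reconcile("336.3500", "398.3000", extracted_rates))
```

An invoice that fails this check is a natural candidate for user verification, which in turn produces the corrected data used to improve the models.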