# VAT invoice

### Overview

This guide covers how to extract metadata and VAT rates from supplier invoices with examples in **Python** and **Node**.

You will extract the following metadata fields:

* Name of the supplier
* Name of the receiver
* Invoice number
* Purchase order number
* Issue date
* Pay due date
* Total amount
* Net amount

You will extract the following VAT rate fields:

* VAT rate percentage
* VAT rate net

**This guide shows you how to:**

1. [Create the vat-invoice document type](#id-1.-create-a-new-document-type)
2. [Add a supplier to your dataset](#id-2.-add-suppliers)
3. [Execute training](#id-3.-execute-training)
4. [Extract data from documents](#id-4.-extract-data-from-documents)
5. [Continuously improve models after extraction](#id-5.-continuously-improve-models)

## Getting your API Key

The *Authorization* header value for your API key is `Token YOUR-API-KEY` ([log in](https://app.typless.com/login/?redirect=https://docs.typless.com/) if you do not see one).\
You can also obtain the **API key** by visiting the [Settings page](https://app.typless.com/settings/profile).

{% embed url="<https://typless-public.s3-eu-west-1.amazonaws.com/videos/copy_api_key.mp4>" %}
*Getting your API key*
{% endembed %}
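
A minimal sketch of building the request headers once and reusing them for every call. It assumes the key is stored in an environment variable named `TYPLESS_API_KEY`; that variable name and the `build_headers` helper are illustrative choices, not part of the Typless API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers with the key in the `Token YOUR-API-KEY` form."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",
    }


# TYPLESS_API_KEY is a hypothetical variable name; the fallback keeps the sketch runnable.
headers = build_headers(os.environ.get("TYPLESS_API_KEY", "YOUR-API-KEY"))
print(headers["Authorization"])
```

Keeping the key out of the source file means the same script can run with different accounts without edits.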

## 1. Create a new document type

Before you start extracting data, you need to define a document type.\
Navigate to the [Dashboard page](https://app.typless.com) and click the **New document type** button in the top right corner of the table. Next, select the **VAT invoice** card. The wizard pre-fills all the needed [extraction fields](https://typless.gitbook.io/typlessapi/typless/extraction-fields) along with the [document type configuration](https://typless.gitbook.io/typlessapi/typless/document-type).\
Click **Create document type**.

This will create a new **document type** named **vat-invoice** with the following fields:

* **`supplier_name`**
* **`invoice_number`**
* **`purchase_order_number`**
* **`receiver_name`**
* **`issue_date`**
* **`pay_due_date`**
* **`total_amount`**
* **`net_amount`**

The document type will have a [VAT rate plugin](https://typless.gitbook.io/typlessapi/typless/plugins/vat-rate-plugin) already set up for your fields.

## 2. Add suppliers

Typless is an automation tool, so you first need to fill the **dataset** and **train** on it. To automate a new supplier, add its invoices to the dataset.\
Download an example invoice from Best Flowers Inc.:

* [Best Flowers Inc. - download](https://typless-public.s3-eu-west-1.amazonaws.com/use_cases/vat-invoice/vat_invoice_1.pdf)

{% hint style="success" %}
To add a document to the dataset, use the [add-document](https://typless.gitbook.io/typlessapi/api-docs/api-schema#api-add-document) endpoint or use the [training room](https://typless.gitbook.io/typlessapi/typless-hub/document-type#training-room), where you can easily upload a file and fill out the necessary information.
{% endhint %}

The dataset is created by uploading an original file with the correct value for each field defined inside the document type.\
A key point to note regarding VAT invoices is that you must also fill out all the VAT rates listed on the document, so the engine will take these into account the next time it performs extraction.

<details>

<summary><strong>1 Open file as base64 string</strong> <em><mark style="color:green;">(Lines 6-9)</mark></em></summary>

Open the file in binary mode and encode it as a base64 string.\
Make sure the file is in the directory you run the script from.

</details>

<details>

<summary><strong>2 Specify payload</strong> <em><mark style="color:green;">(Lines 11-72)</mark></em></summary>

Specify your payload with the required fields:

* file - base64 encoded file
* file\_name - name of the file
* document\_type\_name - the name of the document type you want to add the supplier to
* learning\_fields - fields with the correct values used for training
* vat\_rates - all the VAT rates listed on the document

</details>

<details>

<summary><strong>3 VAT rates payload property</strong> <em><mark style="color:green;">(Lines 49-71)</mark></em></summary>

The vat\_rates property lists every VAT rate present on the document. Each VAT rate is itself a list of two field structures: a vat\_rate\_percentage field and a vat\_rate\_net field.

</details>

<details>

<summary><strong>4 Specify headers</strong> <em><mark style="color:green;">(Lines 76-80)</mark></em></summary>

Add all the required request headers.

</details>

<details>

<summary><strong>5 Send the request</strong> <em><mark style="color:green;">(Lines 82-84)</mark></em></summary>

Send the request and check the response to confirm that the document was added.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import json

import requests
import base64

file_name = 'vat_invoice_1.pdf'

with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "vat-invoice",
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Best flowers Inc."
        },
        {
            "name": "receiver_name",
            "value": "James Bond"
        },
        {
            "name": "invoice_number",
            "value": "123/2017"
        },
        {
            "name": "purchase_order_number",
            "value": "001-001-30"
        },
        {
            "name": "pay_due_date",
            "value": "2017-06-30"
        },
        {
            "name": "issue_date",
            "value": "2017-06-16"
        },
        {
            "name": "total_amount",
            "value": "735.3300"
        },
        {
            "name": "net_amount",
            "value": "644.1400"
        }
    ],
    "vat_rates": [
        [
            {
                "name": "vat_rate_percentage",
                "value": "9.5000"
            },
            {
                "name": "vat_rate_net",
                "value": "404.1400"
            },
        ],
        [
            {
                "name": "vat_rate_percentage",
                "value": "22.0000"
            },
            {
                "name": "vat_rate_net",
                "value": "240.0000"
            },
        ]

    ]
}

url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())
```

{% endcode %}
{% endtab %}

{% tab title="Node" %}
{% code lineNumbers="true" %}

```javascript
const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'vat_invoice_1.pdf';
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

const url = 'https://developers.typless.com/api/add-document';

const payload = {
    file: base64File,
    file_name: fileName,
    document_type_name: "vat-invoice",
    learning_fields: [
        {
            "name": "supplier_name",
            "value": "Best flowers Inc."
        },
        {
            "name": "receiver_name",
            "value": "James Bond"
        },
        {
            "name": "invoice_number",
            "value": "123/2017"
        },
        {
            "name": "purchase_order_number",
            "value": "001-001-30"
        },
        {
            "name": "pay_due_date",
            "value": "2017-06-30"
        },
        {
            "name": "issue_date",
            "value": "2017-06-16"
        },
        {
            "name": "total_amount",
            "value": "735.3300"
        },
        {
            "name": "net_amount",
            "value": "644.1400"
        }
    ],
    vat_rates: [
        [
            {
                "name": "vat_rate_percentage",
                "value": "9.5000"
            },
            {
                "name": "vat_rate_net",
                "value": "404.1400"
            },
        ],
        [
            {
                "name": "vat_rate_percentage",
                "value": "22.0000"
            },
            {
                "name": "vat_rate_net",
                "value": "240.0000"
            },
        ]

    ]
};

const headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'Token YOUR-API-KEY'
}

let options = {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(payload)
};

fetch(url, options)
    .then(res => res.json())
    .then(json => console.log(json))
    .catch(err => console.error('error:' + err));
```

{% endcode %}
{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}

<pre class="language-json" data-line-numbers><code class="lang-json">{
  "details": ["0d0596ac5e7320eb9b75ee1b327dff4d899f1a6a"],
  "message": "Document added successfully."
}
</code></pre>

{% endtab %}
{% endtabs %}

As you can see, to achieve high accuracy, Typless only needs **the values that are in the document**. Nevertheless, there are some [rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields) to keep in mind when providing values.

Applying these rules to the provided example changed the following fields:

* **`total_amount`** value was converted with [number type rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields#number-type) from **735,33** to **735.3300**
* **`net_amount`** value was converted with [number type rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields#number-type) from **644,14** to **644.1400**
* **`issue_date`** value was converted with [date type rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields#date-type) from **16.06.2017** to **2017-06-16**
* **`pay_due_date`** value was converted with [date type rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields#date-type) from **30.06.2017** to **2017-06-30**
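
The conversions listed above can be sketched as two small helpers. This is an illustration under stated assumptions, not the Typless implementation: the function names are hypothetical, the input is assumed to use a decimal comma and the `DD.MM.YYYY` date layout as on the example invoice, and the four decimal places simply mirror the example values. The linked rules remain the authoritative description.

```python
from datetime import datetime
from decimal import Decimal


def to_number_value(raw: str) -> str:
    """Convert a decimal-comma amount such as '735,33' to '735.3300'."""
    # Assumption: '.' and spaces are thousands separators, ',' is the decimal mark.
    cleaned = raw.replace(" ", "").replace(".", "").replace(",", ".")
    return f"{Decimal(cleaned):.4f}"


def to_date_value(raw: str, source_format: str = "%d.%m.%Y") -> str:
    """Convert a date such as '16.06.2017' to '2017-06-16'."""
    return datetime.strptime(raw, source_format).strftime("%Y-%m-%d")


print(to_number_value("735,33"))    # 735.3300
print(to_date_value("16.06.2017"))  # 2017-06-16
```

Invoices that already use a decimal point would need a different `to_number_value`, since this sketch treats every `.` as a thousands separator.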

The same rules apply to the VAT rates on the document.\
VAT rates are structured as a list of lists, similar to line items, so keep that in mind when building the data structure for training.
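
A minimal sketch of building that nested structure from plain (percentage, net) pairs. The helper names are hypothetical. The `nets_match` check assumes that the per-rate net values add up to the invoice net amount, which holds for the example invoice (404.1400 + 240.0000 = 644.1400) but is a sanity check of this sketch, not a rule stated by Typless.

```python
from decimal import Decimal


def build_vat_rates(rates):
    """Turn (percentage, net) string pairs into the nested vat_rates list."""
    return [
        [
            {"name": "vat_rate_percentage", "value": percentage},
            {"name": "vat_rate_net", "value": net},
        ]
        for percentage, net in rates
    ]


def nets_match(vat_rates, net_amount: str) -> bool:
    """Check that the vat_rate_net values add up to the invoice net amount."""
    total = sum(
        Decimal(field["value"])
        for rate in vat_rates
        for field in rate
        if field["name"] == "vat_rate_net"
    )
    return total == Decimal(net_amount)


vat_rates = build_vat_rates([("9.5000", "404.1400"), ("22.0000", "240.0000")])
print(nets_match(vat_rates, "644.1400"))  # True
```

Running a check like this before uploading can catch a VAT rate that was missed while filling in the training data.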

{% hint style="info" %}
**Do you need more information on the VAT rate plugin?** Learn more about how it works and its limitations [here](https://typless.gitbook.io/typlessapi/typless/plugins/vat-rate-plugin).
{% endhint %}

After you run the code example, your document type will contain one supplier.

## 3. Execute training

{% hint style="success" %}
**👍&#x20;**<mark style="color:green;">**Training is executed automatically every day at 10 PM CET**</mark>

For **all of your suppliers** with new documents in the [dataset](https://typless.gitbook.io/typlessapi/typless/training/building-a-dataset) of all your document types.\
**Free of charge**
{% endhint %}

To immediately see results, you can trigger the training process on the [Dashboard page](https://app.typless.com).\
Look for the **vat-invoice** document type in the list and click the ![cogs icon](https://typless-public.s3-eu-west-1.amazonaws.com/cogs.png) icon.

{% hint style="info" %}
**Need more information about training?** Read more about it [here](https://typless.gitbook.io/typlessapi/typless/training).
{% endhint %}

## 4. Extract data from documents

After the training is finished, you can start extracting data from documents issued by trained suppliers. Download a new example invoice from Best Flowers Inc.:

* [Best Flowers Inc. 2 - download](https://typless-public.s3-eu-west-1.amazonaws.com/use_cases/vat-invoice/vat_invoice_2.pdf)

Extract the data from it using the following code:

<details>

<summary><strong>1 Open file as base64 string</strong> <em><mark style="color:green;">(Lines 4-6)</mark></em></summary>

Open the file in binary mode and encode it as a base64 string.\
Make sure that your file is in the same directory as the script.

</details>

<details>

<summary><strong>2 Create payload</strong> <em><mark style="color:green;">(Lines 8-12)</mark></em></summary>

Create request payload with all the required parameters:

* file
* file\_name
* document\_type\_name

</details>

<details>

<summary><strong>3 Specify headers</strong> <em><mark style="color:green;">(Lines 16-20)</mark></em></summary>

Make sure that the Content-Type is set as application/json.\
Fill the Authorization header with your API key.

</details>

<details>

<summary><strong>4 Execute the request</strong> <em><mark style="color:green;">(Line 22)</mark></em></summary>

Send the request and wait for the extraction to finish.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import requests
import base64

file_name = 'vat_invoice_2.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "vat-invoice"
}

url = "https://developers.typless.com/api/extract-data"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

for field in response.json()['extracted_fields']:
    print(f'{field["name"]}: {field["values"][0]["value"]}')

print('--- VAT RATES ---')

for vat_rate in response.json()['vat_rates']:
    for field in vat_rate:
        print(f'{field["name"]}: {field["values"][0]["value"]}')
    print('----------------------------------')

```

{% endcode %}
{% endtab %}

{% tab title="Node" %}
{% code lineNumbers="true" %}

```javascript
const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'vat_invoice_2.pdf';
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

const url = 'https://developers.typless.com/api/extract-data';

const payload = {
    file: base64File,
    file_name: fileName,
    document_type_name: "vat-invoice"
}

const headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'Token YOUR-API-KEY'
}

let options = {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(payload)
};

fetch(url, options)
    .then(res => res.json())
    .then(json => {
        json.extracted_fields.forEach(field => console.log(`${field.name}: ${field.values[0].value}`))
        console.log('--- VAT RATES ---');
        json.vat_rates.forEach(vatRate => {
            vatRate.forEach(field => {
                console.log(`${field.name}: ${field.values[0].value}`);
            })
            console.log('--------------------');
        })
    })
    .catch(err => console.error('error:' + err));
```

{% endcode %}
{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}
{% code lineNumbers="true" %}

```json
{
  "customer": null,
  "extracted_fields": [
    {
      "data_type": "AUTHOR",
      "name": "supplier_name",
      "values": [
        {
          "confidence_score": 0.987,
          "height": -1,
          "page_number": -1,
          "value": "Best flowers Inc.",
          "width": -1,
          "x": -1,
          "y": -1
        }
      ]
    },
    {
      "data_type": "DATE",
      "name": "pay_due_date",
      "values": [
        {
          "confidence_score": 0.99,
          "height": 40,
          "page_number": 0,
          "value": "2017-06-30",
          "width": 481,
          "x": 1818,
          "y": 775
        },
        {
          "confidence_score": 0.125,
          "height": 33,
          "page_number": 0,
          "value": "2017-06-16",
          "width": 608,
          "x": 1685,
          "y": 715
        }
      ]
    },
    {
      "data_type": "STRING",
      "name": "purchase_order_number",
      "values": [
        {
          "confidence_score": 0.99,
          "height": 51,
          "page_number": 0,
          "value": "001-001-35",
          "width": 835,
          "x": 1358,
          "y": 1310
        }
      ]
    },
    {
      "data_type": "NUMBER",
      "name": "total_amount",
      "values": [
        {
          "confidence_score": 0.75,
          "height": 32,
          "page_number": 0,
          "value": "398.3000",
          "width": 112,
          "x": 1208,
          "y": 2978
        },
        {
          "confidence_score": 0.75,
          "height": 33,
          "page_number": 0,
          "value": "61.9500",
          "width": 93,
          "x": 829,
          "y": 2977
        },
        {
          "confidence_score": 0.75,
          "height": 32,
          "page_number": 0,
          "value": "398.3000",
          "width": 112,
          "x": 1208,
          "y": 3048
        },
        {
          "confidence_score": 0.6875,
          "height": 32,
          "page_number": 0,
          "value": "336.3500",
          "width": 114,
          "x": 541,
          "y": 2977
        },
        {
          "confidence_score": 0.625,
          "height": 31,
          "page_number": 0,
          "value": "292.8000",
          "width": 116,
          "x": 1207,
          "y": 2913
        }
      ]
    },
    {
      "data_type": "STRING",
      "name": "invoice_number",
      "values": [
        {
          "confidence_score": 0.99,
          "height": 54,
          "page_number": 0,
          "value": "125/2021",
          "width": 787,
          "x": 1395,
          "y": 1162
        }
      ]
    },
    {
      "data_type": "DATE",
      "name": "issue_date",
      "values": [
        {
          "confidence_score": 0.99,
          "height": 33,
          "page_number": 0,
          "value": "2017-06-16",
          "width": 608,
          "x": 1685,
          "y": 715
        },
        {
          "confidence_score": 0.125,
          "height": 40,
          "page_number": 0,
          "value": "2017-06-30",
          "width": 481,
          "x": 1818,
          "y": 775
        }
      ]
    },
    {
      "data_type": "STRING",
      "name": "receiver_name",
      "values": [
        {
          "confidence_score": 0.99,
          "height": 39,
          "page_number": 0,
          "value": "James Bond",
          "width": 233,
          "x": 170,
          "y": 768
        },
        {
          "confidence_score": 0.3125,
          "height": 32,
          "page_number": 0,
          "value": "losed stre",
          "width": 428,
          "x": 173,
          "y": 816
        },
        {
          "confidence_score": 0.125,
          "height": 51,
          "page_number": 0,
          "value": "chase orde",
          "width": 835,
          "x": 1358,
          "y": 1310
        },
        {
          "confidence_score": 0.125,
          "height": 31,
          "page_number": 0,
          "value": "PIREA GOLD",
          "width": 578,
          "x": 224,
          "y": 1551
        },
        {
          "confidence_score": 0.125,
          "height": 54,
          "page_number": 0,
          "value": "voice numb",
          "width": 787,
          "x": 1395,
          "y": 1162
        }
      ]
    },
    {
      "data_type": "NUMBER",
      "name": "net_amount",
      "values": [
        {
          "confidence_score": 0.5625,
          "height": 32,
          "page_number": 0,
          "value": "336.3500",
          "width": 114,
          "x": 541,
          "y": 2977
        },
        {
          "confidence_score": 0.5,
          "height": 33,
          "page_number": 0,
          "value": "61.9500",
          "width": 93,
          "x": 829,
          "y": 2977
        },
        {
          "confidence_score": 0.5,
          "height": 32,
          "page_number": 0,
          "value": "398.3000",
          "width": 112,
          "x": 1208,
          "y": 2978
        },
        {
          "confidence_score": 0.5,
          "height": 36,
          "page_number": 0,
          "value": "336.3500",
          "width": 219,
          "x": 2093,
          "y": 2068
        },
        {
          "confidence_score": 0.4707,
          "height": 36,
          "page_number": 0,
          "value": "398.3000",
          "width": 242,
          "x": 2063,
          "y": 2217
        }
      ]
    }
  ],
  "file_name": "vat_invoice_2.pdf",
  "line_items": [],
  "object_id": "0d05ad736c837edde4a5aa5434d06da713f7c2b2",
  "vat_rates": [
    [
      {
        "data_type": "NUMBER",
        "name": "vat_rate_percentage",
        "values": [
          {
            "confidence_score": 0.99,
            "height": -1,
            "page_number": -1,
            "value": "9.5000",
            "width": -1,
            "x": -1,
            "y": -1
          }
        ]
      },
      {
        "data_type": "NUMBER",
        "name": "vat_rate_net",
        "values": [
          {
            "confidence_score": 0.99,
            "height": 31,
            "page_number": 0,
            "value": "96.3500",
            "width": 94,
            "x": 561,
            "y": 2838
          }
        ]
      }
    ],
    [
      {
        "data_type": "NUMBER",
        "name": "vat_rate_percentage",
        "values": [
          {
            "confidence_score": 0.99,
            "height": -1,
            "page_number": -1,
            "value": "22.0000",
            "width": -1,
            "x": -1,
            "y": -1
          }
        ]
      },
      {
        "data_type": "NUMBER",
        "name": "vat_rate_net",
        "values": [
          {
            "confidence_score": 0.99,
            "height": 31,
            "page_number": 0,
            "value": "240.0000",
            "width": 116,
            "x": 541,
            "y": 2913
          }
        ]
      }
    ]
  ]
}
```

{% endcode %}
{% endtab %}
{% endtabs %}

The response should contain the extracted fields along with all the VAT rates present on the invoice.

{% hint style="info" %}
**Need a more in-depth explanation of the response?**\
You can read about it [here](https://typless.gitbook.io/typlessapi/typless/data-extraction#understanding-response).
{% endhint %}
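
Each field in the response carries several candidate values with a `confidence_score`. The sketch below shows one possible way to pick the top candidate per field and flag low-confidence results for manual review. The `best_values` helper and the 0.9 threshold are assumptions chosen for illustration, not Typless recommendations.

```python
def best_values(fields, min_confidence=0.9):
    """Map each field name to its top-scoring value and a review flag."""
    result = {}
    for field in fields:
        if not field["values"]:
            # No candidate at all: always send to review.
            result[field["name"]] = {"value": None, "needs_review": True}
            continue
        top = max(field["values"], key=lambda v: v["confidence_score"])
        result[field["name"]] = {
            "value": top["value"],
            "needs_review": top["confidence_score"] < min_confidence,
        }
    return result


# Trimmed copy of two fields from the response above.
sample = [
    {"name": "invoice_number", "values": [{"confidence_score": 0.99, "value": "125/2021"}]},
    {"name": "total_amount", "values": [
        {"confidence_score": 0.75, "value": "398.3000"},
        {"confidence_score": 0.75, "value": "61.9500"},
    ]},
]

for name, info in best_values(sample).items():
    print(name, info["value"], "REVIEW" if info["needs_review"] else "ok")
```

With the real response, pass `response.json()['extracted_fields']` instead of `sample`; each inner list of `vat_rates` has the same field shape and can be passed the same way.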

## 5. Continuously improve models

Typless embraces the fact that the world is changing all the time.\
That's why you can improve models **on the fly** by providing correct data after extraction.\
Let's say your company has a new partner, *Best Supplier*. You don't need to start over with building the dataset. You can simply extract the data and send the correct values back after they **are verified by your users**.\
You can learn more about providing feedback on the [building a dataset](https://typless.gitbook.io/typlessapi/typless/training/building-a-dataset#using-live-data) page.

{% hint style="success" %}
**📘&#x20;**<mark style="color:blue;">**Closed workflow loop - improve models live!**</mark>

Use every action from your users to adapt and improve Typless models without any extra costs.
{% endhint %}

{% hint style="info" %}
To send feedback, use the [add-document-feedback](https://typless.gitbook.io/typlessapi/api-docs/api-schema#api-add-document-feedback) endpoint with the [object\_id](https://typless.gitbook.io/typlessapi/typless/data-extraction#response-base-params) returned by the extraction.
{% endhint %}
{% endhint %}
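
A hedged sketch of preparing verified values for feedback: it merges user corrections into the extracted fields and returns name/value pairs in the same shape as `learning_fields` in the add-document example. The helper name and the sample correction are hypothetical. The remaining request keys for add-document-feedback, including how the `object_id` is passed, are not shown in this guide and should be taken from the linked API schema.

```python
def corrected_learning_fields(extracted_fields, corrections):
    """Build name/value pairs from extracted fields, overridden by user corrections."""
    learning_fields = []
    for field in extracted_fields:
        name = field["name"]
        # Fall back to an empty string when the extraction returned no candidates.
        extracted = field["values"][0]["value"] if field["values"] else ""
        learning_fields.append({"name": name, "value": corrections.get(name, extracted)})
    return learning_fields


extracted_fields = [
    {"name": "invoice_number", "values": [{"value": "125/2021"}]},
    {"name": "total_amount", "values": [{"value": "398.3000"}]},
]
# Hypothetical correction typed in by a user during verification.
corrections = {"invoice_number": "126/2021"}

print(corrected_learning_fields(extracted_fields, corrections))
```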

## Running Typless live

To automate your manual data entry, you only need to integrate these API calls into your system.

{% hint style="info" %}
**Have any questions or need some help?** Contact us in chat or email us at **<support@typless.com>**
{% endhint %}

![Typless usage is simple and straightforward!](https://files.readme.io/ad56c3d-typless_1.PNG)
