# Custom document

This general guide covers how to extract data from any pseudo-structured document, with examples in **Python** and **Node**. You will learn how to train models and extract data from a variety of documents in [many languages and character sets](https://typless.gitbook.io/typlessapi/typless/document-type/language-support).

You will need:

* Two different examples of a document
* Correct values for at least one document
* 15 minutes of your time

**This guide shows you how to**

1. [Create a custom document type](#id-1.-create-a-new-document-type)
2. [Add multiple suppliers](#id-2.-add-suppliers)
3. [Execute training](#id-3.-execute-training)
4. [Extract data from documents](#id-4.-extract-data-from-documents)
5. [Continuously improve models after extraction](#id-5.-continuously-improve-models)

## Getting your API Key

Send your API key in the *Authorization* header as `Token YOUR-API-KEY` ([log in](https://app.typless.com/login/?redirect=https://docs.typless.com/) if you do not see one).\
You can also obtain the **API key** by visiting the [Settings page](https://app.typless.com/settings/profile).

{% embed url="<https://typless-public.s3-eu-west-1.amazonaws.com/videos/copy_api_key.mp4>" %}
*Getting your API key*
{% endembed %}
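
As a minimal sketch, the headers for every request in this guide can be built in one place. The `Token ` prefix follows the header format described above; the environment variable name `TYPLESS_API_KEY` and the helper `build_headers` are assumptions made for this example, not part of the Typless API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return JSON request headers with the key in the `Token <key>` format."""
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",
    }


# TYPLESS_API_KEY is a hypothetical variable name chosen for this example.
headers = build_headers(os.environ.get("TYPLESS_API_KEY", "YOUR-API-KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store works the same way.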

## 1. Create a new document type

Before you start extracting data, you need to define a document type. Navigate to the [Dashboard page](https://app.typless.com) and click on the **New document type** button in the top right corner of the table. Next, select the **Custom document** card.

{% hint style="info" %}
**Document type is used for all your suppliers**\
Click on [Document type](https://typless.gitbook.io/typlessapi/typless/document-type) to learn more.
{% endhint %}

You will have to define all metadata fields and line item fields you want to extract. The only exception is `supplier_name`, which must be present on each document type.

To ensure consistent training and data extraction, Typless uses 3 field data types:

<table><thead><tr><th width="123">Field type</th><th>What is it used for?</th></tr></thead><tbody><tr><td>STRING</td><td>General string fields like document numbers, address, company names, payment references, IBANs, ...</td></tr><tr><td>DATE</td><td>Dates like issue date, pay due, date of service, delivery date, contract date, ...</td></tr><tr><td>NUMBER</td><td>Numbers you want to perform calculations with like total amount, net amount, ...</td></tr></tbody></table>

{% hint style="info" %}
**Want to learn more about defining fields?**\
Check out the [fields](https://typless.gitbook.io/typlessapi/typless/extraction-fields) or [line items](https://typless.gitbook.io/typlessapi/typless/line-items-table-extraction) guide to learn more.
{% endhint %}
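
The sample payloads later in this guide send every value as a string: dates look like `2021-03-31` and amounts like `15.0000`. The sketch below converts Python values into that style. It only mirrors what the samples show; the helper name `format_value` is hypothetical, and the fields guide linked above remains the authority on accepted formats.

```python
from datetime import date
from decimal import Decimal


def format_value(value) -> str:
    """Convert a Python value to a string in the style of this guide's samples."""
    if isinstance(value, date):
        # DATE fields, e.g. "2021-03-31"
        return value.strftime("%Y-%m-%d")
    if isinstance(value, int):
        # Whole numbers such as a quantity, e.g. "1"
        return str(value)
    if isinstance(value, (float, Decimal)):
        # NUMBER amounts with four decimals, e.g. "15.0000" (assumed from the samples)
        return f"{Decimal(str(value)):.4f}"
    # STRING fields are passed through unchanged
    return str(value)
```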

## 2. Add suppliers

Once your document type is created, you need to add data to the dataset of your document type.

{% hint style="info" %}
To add a document to the dataset, use the [add-document](https://typless.gitbook.io/typlessapi/api-docs/api-schema#api-add-document) endpoint or the [training room](https://typless.gitbook.io/typlessapi/typless-hub/document-type#training-room), where you can upload a file and fill out the necessary information.
{% endhint %}

The dataset is created by uploading an original file with the correct value for each field defined inside the document type:

<details>

<summary><strong>1 Open file as base64 string</strong> <em><mark style="color:green;">(Lines 4-6)</mark></em></summary>

Make sure you are pointing to the right path when opening the file.

</details>

<details>

<summary><strong>2 Create payload</strong> <em><mark style="color:green;">(Lines 8-73)</mark></em></summary>

The payload consists of learning fields, line items, file name, and a base64 string-encoded file.

</details>

<details>

<summary><strong>3 Specify values for learning fields</strong> <em><mark style="color:green;">(Lines 9-60)</mark></em></summary>

For every field you have defined in your document type, provide the correct value.

</details>

<details>

<summary><strong>4 Specify values for line items</strong> <em><mark style="color:green;">(Lines 47, 51, 55)</mark></em></summary>

For every line item row, add an array of line item fields with the correct values.

</details>

<details>

<summary><strong>5 Add file info</strong> <em><mark style="color:green;">(Lines 61-62)</mark></em></summary>

Add the base64-encoded file and the file name.

</details>

<details>

<summary><strong>6 Specify document type name</strong> <em><mark style="color:green;">(Line 63)</mark></em></summary>

</details>

<details>

<summary><strong>7 Authorize with API key</strong> <em><mark style="color:green;">(Line 72)</mark></em></summary>

Authorize with your API key, prefixed with the word `Token`.

</details>

<details>

<summary><strong>8 Execute the request</strong> <em><mark style="color:green;">(Lines 75-77)</mark></em></summary>

Execute the request and make sure that everything went smoothly.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import requests
import base64

file_name = 'name_of_your_document.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "3"
        },
        {
            "name": "purchase_order_number",
            "value": "234778"
        },
        {
            "name": "pay_due_date",
            "value": "2021-03-31"
        },
        {
            "name": "issue_date",
            "value": "2021-02-01"
        },
        {
            "name": "total_amount",
            "value": "15.0000"
        }
    ],
    "line_items": [
        [
            {
                "name": "product_number",
                "value": ""
            },
            {
                "name": "product_description",
                "value": "Amazing service"
            },
            {
                "name": "quantity",
                "value": "1"
            },
            {
                "name": "price",
                "value": "15.0000"
            }

        ]

    ],
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}


url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())
```

{% endcode %}
{% endtab %}

{% tab title="Node" %}
{% code lineNumbers="true" %}

```javascript
const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'name_of_your_document.pdf';
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

let url = 'https://developers.typless.com/api/add-document';

const payload = {
  file: base64File,
  file_name: fileName,
  document_type_name: "line-item-invoice",
  learning_fields: [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "3"
        },
        {
            "name": "purchase_order_number",
            "value": "234778"
        },
        {
            "name": "pay_due_date",
            "value": "2021-03-31"
        },
        {
            "name": "issue_date",
            "value": "2021-02-01"
        },
        {
            "name": "total_amount",
            "value": "15.0000"
        }
    ],
    line_items: [
        [
            {
                "name": "product_number",
                "value": ""
            },
            {
                "name": "product_description",
                "value": "Amazing service"
            },
            {
                "name": "quantity",
                "value": "1"
            },
            {
                "name": "price",
                "value": "15.0000"
            }

        ]

    ]
};

const headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': 'Token YOUR-API-KEY'
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload) // send the payload as the JSON request body
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));
```

{% endcode %}
{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}
{% code lineNumbers="true" %}

```json
{
	"details":[
		"0cb9660762f20e13850d36cd45b48d44b63059f7"
	],
	"message":"Document added successfully."
}
```

{% endcode %}
{% endtab %}
{% endtabs %}

As you can see, to achieve high accuracy, Typless only requires the values that are present in **the document**. However, there are some [rules](https://typless.gitbook.io/typlessapi/typless/extraction-fields) to keep in mind when providing values.

{% hint style="info" %}
**Want to learn more about providing training values?**\
Check out the [fields](https://typless.gitbook.io/typlessapi/typless/extraction-fields) or [line items](https://typless.gitbook.io/typlessapi/typless/line-items-table-extraction) guide to learn more.
{% endhint %}
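
Writing the `name`/`value` pairs by hand gets repetitive once you add more than a few documents. The following sketch builds the same payload shape as the request above from plain dictionaries. The helper names `to_fields` and `build_add_document_payload` are hypothetical; only the payload keys come from the example above.

```python
def to_fields(values: dict) -> list:
    """Turn {"field": "value"} pairs into [{"name": ..., "value": ...}] entries."""
    return [{"name": name, "value": value} for name, value in values.items()]


def build_add_document_payload(file_b64: str, file_name: str,
                               document_type_name: str,
                               fields: dict, rows: list) -> dict:
    """Assemble an add-document payload; `rows` holds one dict per line item row."""
    return {
        "learning_fields": to_fields(fields),
        "line_items": [to_fields(row) for row in rows],
        "file": file_b64,
        "file_name": file_name,
        "document_type_name": document_type_name,
    }


payload = build_add_document_payload(
    file_b64="BASE64-ENCODED-FILE",  # placeholder, see the file-reading step above
    file_name="name_of_your_document.pdf",
    document_type_name="line-item-invoice",
    fields={"supplier_name": "Amazing Company", "total_amount": "15.0000"},
    rows=[{"product_description": "Amazing service", "quantity": "1", "price": "15.0000"}],
)
```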

## 3. Execute training

{% hint style="success" %}
**👍&#x20;**<mark style="color:green;">**Training is executed automatically every day at 10 PM CET**</mark>

For **all of your suppliers** with new documents in the [dataset](https://typless.gitbook.io/typlessapi/typless/training/building-a-dataset) of all your document types.\
**Free of charge**
{% endhint %}

To immediately see the results, you can trigger the training process on the [Dashboard page](https://app.typless.com).\
Look for your document type in the list, and click on ![cogs icon](https://typless-public.s3-eu-west-1.amazonaws.com/cogs.png).

{% hint style="info" %}
**Need more information about training?** You can read more about it [here](https://typless.gitbook.io/typlessapi/typless/training).
{% endhint %}

## 4. Extract data from documents

After the training is finished, you can start extracting data from documents issued by the trained suppliers.

{% hint style="info" %}
To extract data from a document, use the [extract-data](https://typless.gitbook.io/typlessapi/api-docs/api-schema#api-extract-data) endpoint.
{% endhint %}

<details>

<summary><strong>1 Open file as base64 string</strong> <em><mark style="color:green;">(Lines 4-6)</mark></em></summary>

Open the file in binary mode and encode it as a base64 string.\
Make sure that your file is in the same directory as the script, or adjust the path accordingly.

</details>

<details>

<summary><strong>2 Create payload</strong> <em><mark style="color:green;">(Lines 8-12)</mark></em></summary>

Create request payload with all the required parameters:

* file
* file\_name
* document\_type\_name

</details>

<details>

<summary><strong>3 Specify headers</strong> <em><mark style="color:green;">(Lines 16-20)</mark></em></summary>

Make sure that the Content-Type is set as application/json.

</details>

<details>

<summary><strong>4 Authorize with your API key</strong> <em><mark style="color:green;">(Line 19)</mark></em></summary>

You can get your API key at <https://app.typless.com/settings/profile>.

</details>

<details>

<summary><strong>5 Execute the request</strong> <em><mark style="color:green;">(Line 22)</mark></em></summary>

Send the request and wait for the response.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import requests
import base64

file_name = 'name_of_your_document.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}

url = "https://developers.typless.com/api/extract-data"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())
```

{% endcode %}
{% endtab %}

{% tab title="Node" %}
{% code lineNumbers="true" %}

```javascript
const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'name_of_your_document.pdf';
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

const url = 'https://developers.typless.com/api/extract-data';

const payload = {
  file: base64File,
  file_name: fileName,
  document_type_name: "line-item-invoice"
}

const headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': 'Token YOUR-API-KEY'
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(JSON.stringify(json)))
  .catch(err => console.error('error:' + err));
```

{% endcode %}
{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}
{% code lineNumbers="true" %}

```json
{
    "file_name": "name_of_your_document.pdf",
    "object_id": "1cb25cc8-c9fa-4149-9a83-b4ed6a2173b9",
    "extracted_fields": [
        {
            "name": "supplier",
            "values": [
                {
                    "x": -1,
                    "y": -1,
                    "width": -1,
                    "height": -1,
                    "value": "ScaleGrid",
                    "confidence_score": "0.968",
                    "page_number": -1
                }
            ],
            "data_type": "AUTHOR"
        },
        {
            "name": "invoice_number",
            "values": [
                {
                    "x": 1989,
                    "y": 545,
                    "width": 323,
                    "height": 54,
                    "value": "20190500005890",
                    "confidence_score": "0.250",
                    "page_number": 0
                },
                {
                    "x": 167,
                    "y": 574,
                    "width": 391,
                    "height": 54,
                    "value": "GB123456789",
                    "confidence_score": "0.250",
                    "page_number": 0
                }
            ],
            "data_type": "STRING"
        },
        {
            "name": "issue_date",
            "values": [
                {
                    "x": 2072,
                    "y": 628,
                    "width": 240,
                    "height": 54,
                    "value": "2019-06-05",
                    "confidence_score": "0.358",
                    "page_number": 0
                }
            ],
            "data_type": "DATE"
        },
        {
            "name": "total_amount",
            "values": [
                {
                    "x": 2146,
                    "y": 1196,
                    "width": 126,
                    "height": 54,
                    "value": "47.5300",
                    "confidence_score": "0.990",
                    "page_number": 0
                }
            ],
            "data_type": "NUMBER"
        }
    ],
    "line_items": [
        [
            {
                "name": "Description",
                "values": [
                    {
                        "x": 208,
                        "y": 1196,
                        "width": 1022,
                        "height": 50,
                        "value": "5/2019-MongoBackend-MgmtStandalone-Small-744 hours",
                        "confidence_score": "0.661",
                        "page_number": 0
                    }
                ],
                "data_type": "STRING"
            },
            {
                "name": "Price",
                "values": [
                    {
                        "x": 2146,
                        "y": 1196,
                        "width": 126,
                        "height": 54,
                        "value": "47.5300",
                        "confidence_score": "0.582",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            },
            {
                "name": "Quantity",
                "values": [
                    {
                        "x": 1979,
                        "y": 1196,
                        "width": 23,
                        "height": 54,
                        "value": "1",
                        "confidence_score": "0.647",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            }
        ]
    ],
    "customer": null
}
```

{% endcode %}
{% endtab %}
{% endtabs %}

{% hint style="info" %}
**Need a more in-depth explanation of the response?**\
You can read about it [here](https://typless.gitbook.io/typlessapi/typless/data-extraction#understanding-response).
{% endhint %}
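
Each field in the response can carry several candidate values, each with its own `confidence_score`. As a rough sketch, the helpers below pick the highest-scoring candidate per field and flatten the line items into plain dictionaries. The function names `best_value` and `summarize` are hypothetical, and the code assumes only the response structure shown in the sample above.

```python
def best_value(field: dict):
    """Return the candidate value with the highest confidence, or None if empty."""
    candidates = field.get("values", [])
    if not candidates:
        return None
    # confidence_score is a string such as "0.968" in the sample response.
    top = max(candidates, key=lambda c: float(c["confidence_score"]))
    return top["value"]


def summarize(response: dict) -> dict:
    """Reduce an extraction response to {field name: best value} mappings."""
    fields = {f["name"]: best_value(f) for f in response.get("extracted_fields", [])}
    rows = [
        {cell["name"]: best_value(cell) for cell in row}
        for row in response.get("line_items", [])
    ]
    return {"fields": fields, "line_items": rows}


# Trimmed-down response in the shape of the sample above.
sample = {
    "extracted_fields": [
        {"name": "total_amount", "data_type": "NUMBER", "values": [
            {"value": "47.5300", "confidence_score": "0.990"},
        ]},
    ],
    "line_items": [[
        {"name": "Quantity", "data_type": "NUMBER", "values": [
            {"value": "1", "confidence_score": "0.647"},
        ]},
    ]],
}
print(summarize(sample))
```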

## 5. Continuously improve models

Typless embraces the fact that the world is changing all the time.\
That's why you can improve models **on the fly** by providing correct data after extraction.\
Let's say your company has a new partner, *Best Supplier*. You don't need to start over with building the dataset. Simply extract data from the new supplier's documents and send back the correct values once they **are verified by your users**.\
You can learn more about providing feedback on the [building dataset](https://typless.gitbook.io/typlessapi/typless/training/building-a-dataset) page.

Add a supplier with feedback:

<details>

<summary><strong>1 Create payload</strong> <em><mark style="color:green;">(Lines 5-58)</mark></em></summary>

Create payload with the following parameters:

* learning\_fields
* line\_items
* document\_object\_id
* document\_type\_name

</details>

<details>

<summary><strong>2 Create fields feedback data</strong> <em><mark style="color:green;">(Lines 5-35)</mark></em></summary>

Set the correct values for all the defined fields that are present on the document.

</details>

<details>

<summary><strong>3 Create line items feedback data</strong> <em><mark style="color:green;">(Lines 36-55)</mark></em></summary>

</details>

<details>

<summary><strong>4 Set document object id</strong> <em><mark style="color:green;">(Line 56)</mark></em></summary>

Set document\_object\_id to the value of the object\_id key from the extraction response.\
Read more about the object id [here](https://typless.gitbook.io/typlessapi/typless/data-extraction).

</details>

<details>

<summary><strong>5 Document type name</strong> <em><mark style="color:green;">(Line 57)</mark></em></summary>

Set the name of the document type you are providing feedback for.

</details>

<details>

<summary><strong>6 Specify headers</strong> <em><mark style="color:green;">(Lines 59-62)</mark></em></summary>

Set the correct headers and make sure that the Content-Type is application/json.\
In the Authorization header, put your API key prefixed with the word Token.

</details>

<details>

<summary><strong>7 Execute the request</strong> <em><mark style="color:green;">(Lines 65-67)</mark></em></summary>

Send the POST request with the set payload, headers, and URL.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import requests

url = 'https://developers.typless.com/api/add-document-feedback'

payload = {
  "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Another Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "350"
        },
        {
            "name": "purchase_order_number",
            "value": "345677"
        },
        {
            "name": "pay_due_date",
            "value": "2021-02-28"
        },
        {
            "name": "issue_date",
            "value": "2021-01-01"
        },
        {
            "name": "total_amount",
            "value": "259.0000"
        }
  ],
  "line_items": [
    [
      {
            "name": "product_number",
            "value": ""
        },
        {
            "name": "product_description",
            "value": "Amazing service"
        },
        {
            "name": "quantity",
            "value": "1"
        },
        {
            "name": "price",
            "value": "259.0000"
        }
    ]
   ],
  "document_object_id": "ID-FROM-EXTRACTION-RESPONSE",  # object_id from the extraction response
  "document_type_name": "line-item-invoice"
}
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())
```

{% endcode %}
{% endtab %}

{% tab title="Node" %}
{% code lineNumbers="true" %}

```javascript
const fetch = require('node-fetch');

let url = 'https://developers.typless.com/api/add-document-feedback';

const payload = {
  learning_fields: [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Another Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "350"
        },
        {
            "name": "purchase_order_number",
            "value": "345677"
        },
        {
            "name": "pay_due_date",
            "value": "2021-02-28"
        },
        {
            "name": "issue_date",
            "value": "2021-01-01"
        },
        {
            "name": "total_amount",
            "value": "259.0000"
        }
  ],
  line_items: [
    [
      {
            "name": "product_number",
            "value": ""
        },
        {
            "name": "product_description",
            "value": "Amazing service"
        },
        {
            "name": "quantity",
            "value": "1"
        },
        {
            "name": "price",
            "value": "259.0000"
        }
    ]
    
   ],
  document_object_id: "ID-FROM-EXTRACTION-RESPONSE", // object_id from the extraction response
  document_type_name: "line-item-invoice"
}

const headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload),
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));
```

{% endcode %}
{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}
{% code lineNumbers="true" %}

```json
{
	"details":[
		"0cb96695b4c677c1d6c5562d523aa9541cb5dda8"
	],
	"message":"Values added successfully."
}
```

{% endcode %}
{% endtab %}
{% endtabs %}

{% hint style="success" %}
**📘&#x20;**<mark style="color:blue;">**Closed workflow loop - improve models live!**</mark>

Use every action from your users to adapt and improve Typless models without any extra costs.
{% endhint %}

{% hint style="info" %}
To send feedback, use the [add-document-feedback](https://typless.gitbook.io/typlessapi/api-docs/api-schema#api-add-document-feedback) endpoint with the [object\_id](https://typless.gitbook.io/typlessapi/typless/data-extraction#response-base-params) from the extraction response.
{% endhint %}
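
To connect extraction and feedback, the `object_id` from the extraction response becomes the `document_object_id` of the feedback payload. The sketch below shows that hand-off. The helper `build_feedback_payload` and its argument names are hypothetical; the payload keys are the ones used in the feedback request above.

```python
def build_feedback_payload(extraction: dict, fields: dict, rows: list,
                           document_type_name: str) -> dict:
    """Build a feedback payload from an extraction response and verified values."""

    def to_fields(values: dict) -> list:
        return [{"name": name, "value": value} for name, value in values.items()]

    return {
        "learning_fields": to_fields(fields),
        "line_items": [to_fields(row) for row in rows],
        # object_id of the extraction response identifies the document
        "document_object_id": extraction["object_id"],
        "document_type_name": document_type_name,
    }


feedback = build_feedback_payload(
    extraction={"object_id": "1cb25cc8-c9fa-4149-9a83-b4ed6a2173b9"},
    fields={"supplier_name": "Amazing Company", "invoice_number": "350"},
    rows=[{"product_description": "Amazing service", "price": "259.0000"}],
    document_type_name="line-item-invoice",
)
```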

## Running Typless live

To automate your manual data entry, all you need to do is integrate these simple API calls into your system.
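
One possible way to decide which extractions a user should verify before feedback is sent is to flag fields whose best candidate has a low `confidence_score`. This is only an illustrative sketch: the threshold value and the helper `fields_needing_review` are assumptions for this example, not Typless recommendations.

```python
REVIEW_THRESHOLD = 0.9  # hypothetical cut-off; tune it for your own data


def fields_needing_review(response: dict, threshold: float = REVIEW_THRESHOLD) -> list:
    """List the names of fields whose best candidate scores below the threshold."""
    flagged = []
    for field in response.get("extracted_fields", []):
        scores = [float(v["confidence_score"]) for v in field.get("values", [])]
        # A field with no candidates at all also needs a human look.
        if not scores or max(scores) < threshold:
            flagged.append(field["name"])
    return flagged


# Trimmed-down response in the shape of the extraction sample in this guide.
response = {
    "extracted_fields": [
        {"name": "supplier", "values": [{"value": "ScaleGrid", "confidence_score": "0.968"}]},
        {"name": "invoice_number", "values": [{"value": "20190500005890", "confidence_score": "0.250"}]},
    ]
}
print(fields_needing_review(response))
```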

{% hint style="info" %}
**Have any questions or need some help?** Email us at **<support@typless.com>**.
{% endhint %}
