# Building a dataset

To build Typless models for data extraction, you need to build a dataset of documents for the [document type](https://typless.gitbook.io/typlessapi/typless/document-type).

## Using your existing data

{% hint style="info" %}
**📘&#x20;**<mark style="color:blue;">**Use it for pre-training**</mark>

You can use existing data to achieve state-of-the-art accuracy for your data extraction.
{% endhint %}

1. Collect the documents that have already been manually processed and stored in your database to build a dataset for your [document type](https://typless.gitbook.io/typlessapi/typless/document-type).
2. Upload each original file together with its correct values from your database to train Typless before going to production.
3. Use the code below to get started:

<details>

<summary><strong>1 Open file as base64 string</strong> <em><mark style="color:green;">(Lines 4-6)</mark></em></summary>

Make sure you are pointing to the right path when opening the file.

</details>

<details>

<summary><strong>2 Create payload</strong> <em><mark style="color:green;">(Lines 8-64)</mark></em></summary>

The payload consists of learning fields, line items, the file name, the base64-encoded file, and the document type name.

</details>

<details>

<summary><strong>3 Specify values for learning fields</strong> <em><mark style="color:green;">(Lines 16, 20 etc.)</mark></em></summary>

For every field you have defined in your document type, write the correct value.

</details>

<details>

<summary><strong>4 Specify values for line items</strong> <em><mark style="color:green;">(Lines 39-60)</mark></em></summary>

For every line-item row, add an array of line-item fields with the correct values.

</details>

<details>

<summary><strong>5 Add file info</strong> <em><mark style="color:green;">(Lines 61 &#x26; 62)</mark></em></summary>

Add the base64-encoded file and the file name.

</details>

<details>

<summary><strong>6 Specify document type name</strong> <em><mark style="color:green;">(Line 63)</mark></em></summary>

Set the name of the document type the document belongs to.

</details>

<details>

<summary><strong>7 Authorize with API key</strong> <em><mark style="color:green;">(Line 72)</mark></em></summary>

Pass your API key in the Authorization header, prepended with the word Token.

</details>

<details>

<summary><strong>8 Execute the request</strong> <em><mark style="color:green;">(Lines 75-77)</mark></em></summary>

Execute the request and make sure that everything went smoothly.

</details>

{% tabs %}
{% tab title="Python" %}
{% code lineNumbers="true" %}

```python
import requests
import base64

file_name = 'name_of_your_document.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "3"
        },
        {
            "name": "purchase_order_number",
            "value": "234778"
        },
        {
            "name": "pay_due_date",
            "value": "2021-03-31"
        },
        {
            "name": "issue_date",
            "value": "2021-02-01"
        },
        {
            "name": "total_amount",
            "value": "15.0000"
        }
    ],
    "line_items": [
        [
            {
                "name": "product_number",
                "value": ""
            },
            {
                "name": "product_description",
                "value": "Amazing service"
            },
            {
                "name": "quantity",
                "value": "1"
            },
            {
                "name": "price",
                "value": "15.0000"
            }

        ]

    ],
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}


url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "<<apiKey>>"  # your API key, prepended with the word Token
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())

```

{% endcode %}
{% endtab %}

{% tab title="Node" %}

<pre class="language-javascript" data-line-numbers><code class="lang-javascript">const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'name_of_your_document.pdf';
<strong>const base64File = fs.readFileSync(fileName, {encoding: 'base64'});
</strong>
let url = 'https://developers.typless.com/api/add-document';

const payload = {
  file: base64File,
  file_name: fileName,
  document_type_name: "line-item-invoice",
  learning_fields: [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "3"
        },
        {
            "name": "purchase_order_number",
            "value": "234778"
        },
        {
            "name": "pay_due_date",
            "value": "2021-03-31"
        },
        {
            "name": "issue_date",
            "value": "2021-02-01"
        },
        {
            "name": "total_amount",
            "value": "15.0000"
        }
    ],
    line_items: [
        [
            {
                "name": "product_number",
                "value": ""
            },
            {
                "name": "product_description",
                "value": "Amazing service"
            },
            {
                "name": "quantity",
                "value": "1"
            },
            {
                "name": "price",
                "value": "15.0000"
            }

        ]

    ]
};

const headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': '&#x3C;&#x3C;apiKey>>' // your API key, prepended with the word Token
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));
</code></pre>

{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}
{% code lineNumbers="true" %}

```json
{
	"details":[
		"0cb9660762f20e13850d36cd45b48d44b63059f7"
	],
	"message":"Document added successfully."
}
```

{% endcode %}
{% endtab %}
{% endtabs %}
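
When the correct values already live in your database, the payload shown above can be assembled programmatically instead of being written by hand. The following is a minimal sketch, not part of the Typless API: the helper names `to_name_value_list` and `build_add_document_payload`, and the shape of the `fields` and `rows` arguments, are assumptions made for illustration. The field names you pass must match the ones defined in your document type.

```python
import base64
from pathlib import Path


def to_name_value_list(values):
    """Turn {"field_name": value} into a list of name/value objects."""
    # Values are sent as strings in the example payload; None becomes "".
    return [
        {"name": name, "value": "" if value is None else str(value)}
        for name, value in values.items()
    ]


def build_add_document_payload(file_path, document_type_name, fields, rows=()):
    """Assemble an add-document payload from values stored in your database.

    `fields` is a dict of field name -> correct value (hypothetical shape).
    `rows` is an iterable of dicts, one per line-item row (hypothetical shape).
    """
    path = Path(file_path)
    return {
        "learning_fields": to_name_value_list(fields),
        "line_items": [to_name_value_list(row) for row in rows],
        "file": base64.b64encode(path.read_bytes()).decode("utf-8"),
        "file_name": path.name,
        "document_type_name": document_type_name,
    }
```

The returned dictionary can be passed as `json=payload` to the same `requests` call used in the example, once per stored document.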

## Using live data

{% hint style="info" %}
**📘&#x20;**<mark style="color:blue;">**Use it in a live environment**</mark>

Using live data allows you to improve your data extraction continuously and automate new suppliers on the fly.
{% endhint %}

Typless improves continuously through a closed feedback loop: after each extraction, you send back the correct values for the document. Check out the example below.

<details>

<summary><strong>1 Create payload</strong> <em><mark style="color:green;">(Lines 5-58)</mark></em></summary>

Create payload with the following parameters:

* learning\_fields
* line\_items
* document\_object\_id
* document\_type\_name

</details>

<details>

<summary><strong>2 Create fields feedback data</strong> <em><mark style="color:green;">(Lines 6-35)</mark></em></summary>

Set the correct data values for all the defined fields that are on the document.

</details>

<details>

<summary><strong>3 Create line items feedback data</strong> <em><mark style="color:green;">(Lines 36-55)</mark></em></summary>

Add all the line items with correct data values that are on the document.

</details>

<details>

<summary><strong>4 Set document object id</strong> <em><mark style="color:green;">(Line 56)</mark></em></summary>

Set document\_object\_id to the value of the object\_id key returned in the extraction response.\
Read more about the object id [here](https://typless.gitbook.io/typlessapi/typless/data-extraction).

</details>

<details>

<summary><strong>5 Document type name</strong> <em><mark style="color:green;">(Line 57)</mark></em></summary>

Set the document type name you are providing feedback for.

</details>

<details>

<summary><strong>6 Specify headers</strong> <em><mark style="color:green;">(Lines 59-63)</mark></em></summary>

Set the correct headers; make sure the Content-Type is application/json.\
Under the Authorization header, put your API key prepended with the word Token.

</details>

<details>

<summary><strong>7 Execute the request</strong> <em><mark style="color:green;">(Lines 65-67)</mark></em></summary>

Send the POST request with the set payload, headers, and URL.

</details>

{% tabs %}
{% tab title="Python" %}

```python
import requests

url = 'https://developers.typless.com/api/add-document-feedback'

payload = {
  "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Another Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "350"
        },
        {
            "name": "purchase_order_number",
            "value": "345677"
        },
        {
            "name": "pay_due_date",
            "value": "2021-02-28"
        },
        {
            "name": "issue_date",
            "value": "2021-01-01"
        },
        {
            "name": "total_amount",
            "value": "259.0000"
        }
  ],
  "line_items": [
    [
      {
            "name": "product_number",
            "value": ""
        },
        {
            "name": "product_description",
            "value": "Amazing service"
        },
        {
            "name": "quantity",
            "value": "1"
        },
        {
            "name": "price",
            "value": "259.0000"
        }
    ]
   ],
  "document_object_id": "ID-FROM-EXTRACTION-RESPONSE",  # object_id from the extraction response
  "document_type_name": "line-item-invoice"
}
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "<<apiKey>>"  # your API key, prepended with the word Token
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())
```

{% endtab %}

{% tab title="Node" %}

```javascript
const fetch = require('node-fetch');

let url = 'https://developers.typless.com/api/add-document-feedback';

const payload = {
  learning_fields: [
        {
            "name": "supplier_name",
            "value": "Amazing Company"
        },
        {
            "name": "receiver_name",
            "value": "Another Amazing Client"
        },
        {
            "name": "invoice_number",
            "value": "350"
        },
        {
            "name": "purchase_order_number",
            "value": "345677"
        },
        {
            "name": "pay_due_date",
            "value": "2021-02-28"
        },
        {
            "name": "issue_date",
            "value": "2021-01-01"
        },
        {
            "name": "total_amount",
            "value": "259.0000"
        }
  ],
  line_items: [
    [
      {
            "name": "product_number",
            "value": ""
        },
        {
            "name": "product_description",
            "value": "Amazing service"
        },
        {
            "name": "quantity",
            "value": "1"
        },
        {
            "name": "price",
            "value": "259.0000"
        }
    ]
    
   ],
  document_object_id: "ID-FROM-EXTRACTION-RESPONSE", // object_id from the extraction response
  document_type_name: "line-item-invoice"
}

const headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "<<apiKey>>" // your API key, prepended with the word Token
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload),
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));
```

{% endtab %}
{% endtabs %}

Response:

{% tabs %}
{% tab title="JSON" %}

```json
{
	"details":[
		"0cb96695b4c677c1d6c5562d523aa9541cb5dda8"
	],
	"message":"Values added successfully."
}
```

{% endtab %}
{% endtabs %}
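
In a live integration, the feedback payload is usually built from two inputs: the parsed extraction response and the values your users confirmed or corrected. The sketch below shows one way to combine them. It assumes the extraction response is a dictionary with a top-level `object_id` key, as described in step 4; the helper names and the shape of `fields` and `rows` are hypothetical and not part of the Typless API.

```python
def to_name_value_list(values):
    """Turn {"field_name": value} into a list of name/value objects."""
    # Values are sent as strings in the example payload; None becomes "".
    return [
        {"name": name, "value": "" if value is None else str(value)}
        for name, value in values.items()
    ]


def build_feedback_payload(extraction_response, document_type_name, fields, rows=()):
    """Assemble an add-document-feedback payload.

    `extraction_response` is the parsed JSON of the extraction call; it is
    assumed to expose the document id under the "object_id" key.
    `fields` maps field name -> confirmed value (hypothetical shape).
    `rows` holds one dict per confirmed line-item row (hypothetical shape).
    """
    return {
        "learning_fields": to_name_value_list(fields),
        "line_items": [to_name_value_list(row) for row in rows],
        "document_object_id": extraction_response["object_id"],
        "document_type_name": document_type_name,
    }
```

The result can be sent with the same `requests` call as in the example, using the add-document-feedback URL.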

## Using a training room

For smaller volumes of documents and for testing purposes, you can use the [training room](https://typless.gitbook.io/typlessapi/typless-hub/document-type#training-room). In the training room, you can train documents for your document type and run test extractions to see results quickly. Each document type has its own training room. Data you confirm here as correct will be used to train your document type.
