Transport report

Extract metadata and line items from transport reports, with examples in Python and Node.

Overview

This guide covers how to extract metadata and line items from transport documents issued by multiple suppliers, with examples in Python and Node.

You will extract the following metadata fields:

Name of the supplier
Name of the receiver
From address
To address
Document number
Shipment number
Purchase order number
Issue date
Load weight
Total amount

You will extract the following line item fields:

Product number
Product description
Quantity
Weight
Number of boxes

This guide shows you how to

Getting your API Key

Every request is authenticated with the Authorization header, whose value is: Token YOUR-API-KEY. You can find your API key on the Settings page (log in first if you do not see one).
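As a minimal sketch of the header format described above, the helper below builds the request headers from an API key. The function name build_headers and the environment variable TYPLESS_API_KEY are assumptions made for this example, not part of the Typless API; the header names and the Token prefix are taken from the requests shown in this guide.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the HTTP headers used by the requests in this guide."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty")
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        # The key is sent with the "Token " prefix.
        "Authorization": f"Token {api_key.strip()}",
    }


# TYPLESS_API_KEY is a hypothetical variable name chosen for this sketch.
headers = build_headers(os.environ.get("TYPLESS_API_KEY", "YOUR-API-KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.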

1. Create a new document type

Before you start extracting data, you need to define a document type. Navigate to the Dashboard page and click on the New document type button in the top right corner of the table. Next, select the Transport report card. The wizard will already pre-fill all the needed extraction fields along with the document type configuration. Click on the Create document type button.

This will create a new document type with the name transport-report with the following fields:

supplier_name
receiver_name
from_address
to_address
document_number
shipment_number
purchase_order_number
issue_date
load_weight
total_amount

and the following line item fields:

product_number
product_description
quantity
weight
number_of_boxes
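The add-document request in the next step sends each field as a name/value pair. The sketch below shows one possible way to produce that shape from a plain dictionary while catching misspelled field names early. The constants and the to_fields helper are hypothetical names introduced for illustration; only the field names and the name/value layout come from this guide.

```python
METADATA_FIELDS = [
    "supplier_name", "receiver_name", "from_address", "to_address",
    "document_number", "shipment_number", "purchase_order_number",
    "issue_date", "load_weight", "total_amount",
]
LINE_ITEM_FIELDS = [
    "product_number", "product_description", "quantity",
    "weight", "number_of_boxes",
]


def to_fields(values: dict, allowed: list) -> list:
    """Convert {"field": value} into a list of {"name": ..., "value": ...} pairs."""
    unknown = set(values) - set(allowed)
    if unknown:
        raise KeyError(f"Unknown field(s): {sorted(unknown)}")
    # Fields that are not on the document are sent as empty strings,
    # mirroring the example payload in this guide.
    return [{"name": name, "value": str(values.get(name, ""))} for name in allowed]


learning_fields = to_fields(
    {"supplier_name": "Good company", "issue_date": "2021-02-22"},
    METADATA_FIELDS,
)
line_items = [to_fields({"product_number": "123-FG", "quantity": 23}, LINE_ITEM_FIELDS)]
```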

2. Add suppliers

Once your document type is created, you need to add data to the dataset of your document type. To do that, download the following bills of lading:

Each transport report is from a different supplier.

To add a document to the dataset, use the add-document endpoint or the training room, where you can easily upload a file and fill out the necessary information.

You build the dataset by uploading the original file together with the correct value for each field defined in the document type:

1 Open the file as a Base64 string (Lines 4-6)

2 Specify the payload (Lines 8-80)

3 Specify the headers (Lines 84-88)

4 Send the POST request (Line 90)

import requests
import base64

file_name = 'bill_of_landing_supplier_1_example_1.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "transport-report",
    "learning_fields": [
        {
            "name": "supplier_name",
            "value": "Good company"
        },
        {
            "name": "receiver_name",
            "value": "Michel"
        },
        {
            "name": "from_address",
            "value": "bill of rights 23, dept. 5"
        },
        {
            "name": "to_address",
            "value": "bill of rights 773, dept. 1"
        },
        {
            "name": "document_number",
            "value": ""
        },
        {
            "name": "shipment_number",
            "value": "23-45"
        },
        {
            "name": "purchase_order_number",
            "value": "123GG-JJJK"
        },
        {
            "name": "issue_date",
            "value": "2021-02-22"
        },
        {
            "name": "load_weight",
            "value": ""
        },
        {
            "name": "total_amount",
            "value": ""
        }
    ],
    "line_items": [
        [
            {
                "name": "product_number",
                "value": "123-FG"
            },
            {
                "name": "product_description",
                "value": "Good things"
            },
            {
                "name": "quantity",
                "value": "23"
            },
            {
                "name": "weight",
                "value": "745"
            },
            {
                "name": "number_of_boxes",
                "value": "2"
            }

        ]

    ]
}

url = "https://developers.typless.com/api/add-document"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())

const fetch = require('node-fetch');
const fs = require('fs');

const fileName = "bill_of_landing_supplier_1_example_1.pdf";
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

let url = 'https://developers.typless.com/api/add-document';

const payload =  {
  file: base64File,
  file_name: fileName,
  document_type_name: "transport-report",
  learning_fields: [
        {
            "name": "supplier_name",
            "value": "Good company"
        },
        {
            "name": "receiver_name",
            "value": "Michel"
        },
        {
            "name": "from_address",
            "value": "bill of rights 23, dept. 5"
        },
        {
            "name": "to_address",
            "value": "bill of rights 773, dept. 1"
        },
        {
            "name": "document_number",
            "value": ""
        },
        {
            "name": "shipment_number",
            "value": "23-45"
        },
        {
            "name": "purchase_order_number",
            "value": "123GG-JJJK"
        },
        {
            "name": "issue_date",
            "value": "2021-02-22"
        },
        {
            "name": "load_weight",
            "value": ""
        },
        {
            "name": "total_amount",
            "value": ""
        }
    ],
    "line_items": [
        [
            {
                "name": "product_number",
                "value": "123-FG"
            },
            {
                "name": "product_description",
                "value": "Good things"
            },
            {
                "name": "quantity",
                "value": "23"
            },
            {
                "name": "weight",
                "value": "745"
            },
            {
                "name": "number_of_boxes",
                "value": "2"
            }

        ]

    ]
};

const headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': 'Token YOUR-API-KEY'
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));

Response:

{
	"details":[
		"0cb9660762f20e13850d36cd45b48d44b63059f7"
	],
	"message":"Document added successfully."
}

As you can see, to achieve high accuracy, Typless only needs the values that are in the document. Nevertheless, there are some rules to keep in mind when providing values.

You will have two suppliers added to your document type after you run both code examples.
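Before moving on to training, it can help to confirm that each upload succeeded. The sketch below checks the HTTP status and returns the message from the add-document response shown above. The helper name is hypothetical, and the shape of an error response is an assumption, since this guide only shows the success case.

```python
def summarize_add_document(status_code: int, body: dict) -> str:
    """Return the confirmation message, or raise if the upload did not succeed."""
    # Assumption: any non-2xx status means the document was not added.
    if not 200 <= status_code < 300:
        raise RuntimeError(f"add-document failed with HTTP {status_code}: {body}")
    return body.get("message", "")


# Success response as shown in this guide.
example_body = {
    "details": ["0cb9660762f20e13850d36cd45b48d44b63059f7"],
    "message": "Document added successfully.",
}
message = summarize_add_document(200, example_body)
```

With the requests library, this would be called as summarize_add_document(response.status_code, response.json()).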

3. Execute training

👍 Training is executed automatically every day at 10 PM CET

It runs free of charge for every supplier that has new documents in the dataset, across all your document types.

To immediately see the results, you can trigger the training process on the Dashboard page. Look for the transport-report document type in the list, and click on .

Need more information about training? Read more about it.

4. Extract data from documents

After the training is finished, you can start precisely extracting data from the documents of the trained suppliers. Here you have two new transport reports from the trained suppliers:

Download them and extract the data using the code:

1 Open the file as a Base64 string (Lines 4-6)

2 Specify the payload (Lines 8-12)

3 Specify the headers (Lines 16-20)

4 Send the POST request (Line 22)

import requests
import base64

file_name = 'bill_of_landing_supplier_1_example_2.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "transport-report"
}

url = "https://developers.typless.com/api/extract-data"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token YOUR-API-KEY"
}

response = requests.request("POST", url, json=payload, headers=headers)

for field in response.json()['extracted_fields']:
    print(f'{field["name"]}: {field["values"][0]["value"]}')

const fetch = require('node-fetch');
const fs = require('fs');

const fileName = 'bill_of_landing_supplier_1_example_2.pdf';
const base64File = fs.readFileSync(fileName, {encoding: 'base64'});

const url = 'https://developers.typless.com/api/extract-data';

const payload = {
  file: base64File,
  file_name: fileName,
  document_type_name: "transport-report"
}

const headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': 'Token YOUR-API-KEY'
}

let options = {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
};

fetch(url, options)
  .then(res => res.json())
  .then(json => {
    json.extracted_fields.forEach(field => console.log(`${field.name}: ${field.values[0].value}`))
    json.line_items.forEach(
      line_item => {
        console.log('Line item')
        line_item.forEach(field => console.log(`${field.name}: ${field.values[0].value}`))
      }
  )
  })
  .catch(err => console.error('error:' + err));

Response:

// Example extraction response - the provided recipe will not produce equal results
{
    "file_name": "invoice_2.pdf",
    "object_id": "1cb25cc8-c9fa-4149-9a83-b4ed6a2173b9",
    "extracted_fields": [
        {
            "name": "supplier",
            "values": [
                {
                    "x": -1,
                    "y": -1,
                    "width": -1,
                    "height": -1,
                    "value": "ScaleGrid",
                    "confidence_score": "0.968",
                    "page_number": -1
                }
            ],
            "data_type": "AUTHOR"
        },
        {
            "name": "invoice_number",
            "values": [
                {
                    "x": 1989,
                    "y": 545,
                    "width": 323,
                    "height": 54,
                    "value": "20190500005890",
                    "confidence_score": "0.250",
                    "page_number": 0
                },
                {
                    "x": 167,
                    "y": 574,
                    "width": 391,
                    "height": 54,
                    "value": "GB123456789",
                    "confidence_score": "0.250",
                    "page_number": 0
                }
            ],
            "data_type": "STRING"
        },
        {
            "name": "issue_date",
            "values": [
                {
                    "x": 2072,
                    "y": 628,
                    "width": 240,
                    "height": 54,
                    "value": "2019-06-05",
                    "confidence_score": "0.358",
                    "page_number": 0
                }
            ],
            "data_type": "DATE"
        },
        {
            "name": "total_amount",
            "values": [
                {
                    "x": 2146,
                    "y": 1196,
                    "width": 126,
                    "height": 54,
                    "value": "47.5300",
                    "confidence_score": "0.990",
                    "page_number": 0
                }
            ],
            "data_type": "NUMBER"
        }
    ],
    "line_items": [
        [
            {
                "name": "Description",
                "values": [
                    {
                        "x": 208,
                        "y": 1196,
                        "width": 1022,
                        "height": 50,
                        "value": "5/2019-MongoBackend-MgmtStandalone-Small-744 hours",
                        "confidence_score": "0.661",
                        "page_number": 0
                    }
                ],
                "data_type": "STRING"
            },
            {
                "name": "Price",
                "values": [
                    {
                        "x": 2146,
                        "y": 1196,
                        "width": 126,
                        "height": 54,
                        "value": "47.5300",
                        "confidence_score": "0.582",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            },
            {
                "name": "Quantity",
                "values": [
                    {
                        "x": 1979,
                        "y": 1196,
                        "width": 23,
                        "height": 54,
                        "value": "1",
                        "confidence_score": "0.647",
                        "page_number": 0
                    }
                ],
                "data_type": "NUMBER"
            }
        ]
    ],
    "customer": null
}

Need a more in-depth explanation of the response? You can read about it here.
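Each field in the response can carry several candidate values, each with its own confidence_score (returned as a string). As a rough sketch, the helpers below keep only the highest-scoring candidate per field and flatten the response into plain dictionaries. The function names are hypothetical, and the sample data is trimmed from the example response above.

```python
def best_values(fields: list) -> dict:
    """Map each field name to its highest-confidence candidate value."""
    result = {}
    for field in fields:
        candidates = field.get("values") or []
        if not candidates:
            result[field["name"]] = None
            continue
        # confidence_score is a string such as "0.968", so convert before comparing.
        top = max(candidates, key=lambda c: float(c["confidence_score"]))
        result[field["name"]] = top["value"]
    return result


def flatten_response(response: dict) -> dict:
    """Reduce an extraction response to {"fields": {...}, "line_items": [{...}]}."""
    return {
        "fields": best_values(response.get("extracted_fields", [])),
        "line_items": [best_values(row) for row in response.get("line_items", [])],
    }


sample = {
    "extracted_fields": [
        {"name": "invoice_number", "values": [
            {"value": "20190500005890", "confidence_score": "0.250"},
            {"value": "GB123456789", "confidence_score": "0.250"},
        ]},
        {"name": "total_amount", "values": [
            {"value": "47.5300", "confidence_score": "0.990"},
        ]},
    ],
    "line_items": [[
        {"name": "Quantity", "values": [{"value": "1", "confidence_score": "0.647"}]},
    ]],
}
flat = flatten_response(sample)
```

When two candidates tie, as with invoice_number in the sample, the first one listed is kept. Low-confidence fields are good candidates for manual review before the data enters your system.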

5. Continuously improve models

Typless embraces the fact that the world is changing all the time. That's why you can improve models on the fly by providing correct data after extraction. Let's say your company has a new partner, Best Supplier. You don't need to start over with building the dataset. Simply extract the data and send back the correct values once your users have verified them. You can learn more about providing feedback on the building dataset page.

📘 Closed workflow loop - improve models live!

Use every action from your users to adapt and improve Typless models without any extra costs.

To send feedback, use the add-document-feedback endpoint with the object_id returned in the extraction response.
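The sketch below assembles a feedback payload from values your users have verified. It is illustrative only: the key name document_object_id, the overall payload layout (modeled on the add-document payload earlier in this guide), and the helper name are assumptions, so check the add-document-feedback reference for the exact request format before relying on it.

```python
def build_feedback_payload(document_type_name, object_id, corrected, corrected_line_items=()):
    """Assemble a feedback payload from user-verified values (assumed layout)."""

    def to_pairs(values: dict) -> list:
        # Same name/value shape as the add-document payload in this guide.
        return [{"name": name, "value": str(value)} for name, value in values.items()]

    return {
        "document_type_name": document_type_name,
        # Assumed key name for the object_id returned by extract-data.
        "document_object_id": object_id,
        "learning_fields": to_pairs(corrected),
        "line_items": [to_pairs(row) for row in corrected_line_items],
    }


feedback = build_feedback_payload(
    "transport-report",
    "1cb25cc8-c9fa-4149-9a83-b4ed6a2173b9",  # object_id from the example response
    {"supplier_name": "Best Supplier", "issue_date": "2021-02-22"},
    [{"product_number": "123-FG", "quantity": 23}],
)
```

The resulting dictionary would then be sent as JSON in a POST request, using the same headers as the other calls in this guide.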

Running Typless live

To automate your manual data entry, all you need to do is integrate these simple API calls into your system.

Have questions or need help? Contact us in Messenger.


Last updated 7 months ago
