Building a dataset

To build Typless models for data extraction, you need to build a dataset of documents for the document type.

Using your existing data


📘 Use it for pre-training

You can use existing data to achieve state-of-the-art accuracy for your data extraction.

  1. Use all data from documents that have already been manually processed and stored in the database to build a dataset for your document type.

  2. Upload each original file together with its correct values from your database to train Typless before going to production.

  3. Use the code to start:

1. Open file as base64 string (Lines 4-6)

Make sure you are pointing to the right path when opening the file.
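As a minimal sketch of this step, the file can be read in binary mode and encoded with Python's standard `base64` module. The helper name `read_file_as_base64` is hypothetical and not part of any Typless client; it fails early with the resolved path if the file is missing, which helps catch a wrong path.

```python
import base64
from pathlib import Path


def read_file_as_base64(path: str) -> str:
    """Return the file's raw bytes encoded as a base64 text string.

    Hypothetical helper for illustration only.
    """
    file_path = Path(path)
    if not file_path.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"No file found at {file_path.resolve()}")
    # b64encode returns bytes; decode so the result can be placed in JSON.
    return base64.b64encode(file_path.read_bytes()).decode("utf-8")
```

Reading the bytes directly (rather than opening the file in text mode) keeps PDFs and images intact.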

2. Create payload (Lines 8-64)

The payload consists of learning fields, line items, file name, and a base64 string-encoded file.

3. Specify values for learning fields (Lines 16, 20 etc.)

For every field you have defined in your document type, write the correct value.

4. Specify values for line items (Lines 39-60)

For every line-item row, add an array of line item fields with correct values.
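The two steps above can be sketched as small conversion helpers that turn values from your database into field entries. The `{"name": ..., "value": ...}` shape of each entry, sending every value as a string, and the field names `supplier_name`, `invoice_number`, `description` and `quantity` are all assumptions made for illustration; use the fields defined in your own document type and check the API reference for the exact entry format.

```python
def to_fields(values: dict) -> list:
    """Turn {field_name: correct_value} into a list of field entries.

    The name/value entry shape is an assumption, not confirmed by this page.
    """
    return [{"name": name, "value": str(value)} for name, value in values.items()]


def to_line_items(rows: list) -> list:
    """Turn a list of row dicts into one array of field entries per row."""
    return [to_fields(row) for row in rows]


# Hypothetical values as they might come from an existing database record.
learning_fields = to_fields({"supplier_name": "Acme Ltd", "invoice_number": "INV-001"})
line_items = to_line_items([
    {"description": "Bricks", "quantity": 100},
    {"description": "Trowel", "quantity": 2},
])
```

Each line-item row becomes its own inner array, so the number of inner arrays equals the number of rows on the document.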

5. Add file info (Lines 61 & 62)

Add the base64-encoded file and the file name.

6. Specify document type name (Line 63)
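Putting the payload pieces together might look like the sketch below. The keys `learning_fields`, `line_items` and `document_type_name` appear elsewhere on this page; the keys `file` and `file_name` are assumptions, as is the helper name `build_training_payload`.

```python
import json


def build_training_payload(file_b64: str, file_name: str, document_type_name: str,
                           learning_fields: list, line_items: list) -> dict:
    """Assemble a training payload from already prepared parts.

    The "file" and "file_name" keys are assumed names for illustration.
    """
    return {
        "file": file_b64,                          # base64 string of the document
        "file_name": file_name,                    # original file name
        "document_type_name": document_type_name,  # your document type
        "learning_fields": learning_fields,        # correct value per defined field
        "line_items": line_items,                  # one array of fields per row
    }


payload = build_training_payload(
    file_b64="aGVsbG8=",                 # placeholder content, not a real document
    file_name="invoice.pdf",
    document_type_name="my-document-type",  # hypothetical name
    learning_fields=[{"name": "invoice_number", "value": "INV-001"}],
    line_items=[[{"name": "quantity", "value": "100"}]],
)
body = json.dumps(payload)  # the payload must be JSON-serializable
```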

7. Authorize with API key (Line 72)

Authorize with your API key, prefixed with the word Token.

8. Execute the request (Lines 75-77)

Execute the request and make sure that everything went smoothly.
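A hedged sketch of the authorization and request steps, using only Python's standard library. The endpoint URL is deliberately left as a parameter because this page does not state it; take it from the API reference. The function names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request authorized with 'Token <api key>'."""
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",  # the API key prefixed with the word Token
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return (status code, decoded JSON body or None).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so reaching
    the return statement already means the server did not report an error.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read().decode("utf-8")
        return response.status, (json.loads(raw) if raw else None)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before anything leaves your machine.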

Response:

Using live data


📘 Use it in a live environment

Using live data allows you to improve your data extraction continuously and automate new suppliers on the fly.

Typless continuously improves with a closed feedback loop where you provide correct values for the extracted document. Check out the example below.

1. Create payload (Lines 5-58)

Create payload with the following parameters:

  • learning_fields

  • line_items

  • document_object_id

  • document_type_name

2. Create fields feedback data (Lines 6-35)

Set the correct data values for all the defined fields that are on the document.

3. Create line items feedback data (Lines 36-55)

Add all the line items with correct data values that are on the document.

4. Set document object id (Line 56)

Set document_object_id to the value returned in the object_id key of the extraction response. Read more about the object id here.

5. Document type name (Line 57)

Set the document type name you are providing feedback for.
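Steps 1 to 5 can be sketched as a single function that combines the corrected values with the identifier from the extraction response. The four payload keys and the `object_id` key come from this page; the `{"name": ..., "value": ...}` entry shape, the sample values and the helper name `build_feedback_payload` are assumptions.

```python
def build_feedback_payload(extraction_response: dict, document_type_name: str,
                           learning_fields: list, line_items: list) -> dict:
    """Build feedback for a document that was extracted earlier."""
    return {
        "learning_fields": learning_fields,  # corrected values for the defined fields
        "line_items": line_items,            # corrected rows, one array per row
        # Link the feedback to the extracted document via its object_id.
        "document_object_id": extraction_response["object_id"],
        "document_type_name": document_type_name,
    }


# Hypothetical extraction response, reduced to the one key used here.
extraction_response = {"object_id": "example-object-id"}
feedback = build_feedback_payload(
    extraction_response,
    document_type_name="my-document-type",  # hypothetical name
    learning_fields=[{"name": "invoice_number", "value": "INV-001"}],
    line_items=[[{"name": "quantity", "value": "100"}]],
)
```

A missing `object_id` raises `KeyError` here, which surfaces a failed or incomplete extraction before any feedback is sent.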

6. Specify headers (Lines 59-63)

Set the correct headers; make sure the content-type is application/json. Under the Authorization header, put your API key prefixed with the word Token.

7. Execute the request (Lines 65-67)

Send the POST request with the set payload, headers, and URL.
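One way to keep the feedback call easy to test is to pass the HTTP transport in as a function, as in the sketch below. The names `feedback_headers`, `urllib_post` and `send_feedback` are hypothetical, and the URL is a parameter because this page does not state the endpoint.

```python
import json
import urllib.request
from typing import Callable


def feedback_headers(api_key: str) -> dict:
    """Headers described in the step above: JSON content and Token authorization."""
    return {"Content-Type": "application/json", "Authorization": f"Token {api_key}"}


def urllib_post(url: str, body: bytes, headers: dict) -> int:
    """Default transport: POST with the standard library and return the status code."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def send_feedback(url: str, api_key: str, payload: dict,
                  post: Callable[[str, bytes, dict], int] = urllib_post) -> int:
    """POST the feedback payload and fail loudly on a non-2xx status."""
    status = post(url, json.dumps(payload).encode("utf-8"), feedback_headers(api_key))
    if not 200 <= status < 300:
        raise RuntimeError(f"Feedback request failed with status {status}")
    return status
```

In tests, `post` can be replaced with a stub that records its arguments, so the headers and body can be checked without a network call.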

Response:

Using a training room

For smaller volumes of documents and for testing purposes, you can use the training room. In the training room, you can train documents for your document type and perform test extractions to quickly see results. Each document type has its own training room. Data you confirm there as correct will be used to train your document type.
