Building a dataset
To build Typless models for data extraction, you need to build a dataset of documents for the document type.
Using your existing data
📘 Use it for pre-training
You can use existing data to achieve state-of-the-art accuracy for your data extraction.
Use all data from documents that have already been manually processed and stored in the database to build a dataset for your document type.
Upload each original file together with its correct values from your database to train Typless before going to production.
Use the code to start:
1 Open file as base64 string (Lines 4-6)
Make sure you are pointing to the right path when opening the file.
2 Create payload (Lines 8-64)
The payload consists of learning fields, line items, the file name, and the file encoded as a base64 string.
3 Specify values for learning fields (Lines 16, 20 etc.)
For every field you have defined in your document type, write the correct value.
4 Specify values for line items (Lines 39-60)
For every line-item row, add an array of line item fields with correct values.
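The four steps above can be sketched in Python as follows. This is a minimal illustration, not the official sample: the `name`/`value` shape of each field and the `file_name` and `file` keys are assumptions, and the helper names (`encode_file`, `field`, `build_training_payload`) are hypothetical. Any other parameters the endpoint requires are left out because the text does not list them.

```python
import base64
import json
import os
import tempfile


def encode_file(path):
    """Step 1: read the file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def field(name, value):
    # Assumed field shape: one name/value pair per field.
    return {"name": name, "value": value}


def build_training_payload(path, learning_fields, line_items):
    """Steps 2-4: combine correct values and the encoded file into one payload.

    learning_fields: dict of field name -> correct value
    line_items: list of dicts, one dict per line-item row
    """
    return {
        "learning_fields": [field(n, v) for n, v in learning_fields.items()],
        # Each row becomes its own array of line item fields.
        "line_items": [[field(n, v) for n, v in row.items()] for row in line_items],
        "file_name": os.path.basename(path),  # assumed key name
        "file": encode_file(path),            # assumed key name
    }


if __name__ == "__main__":
    # Stand-in document so the sketch runs without a real file.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.4 placeholder")
    payload = build_training_payload(
        tmp.name,
        {"supplier_name": "Example Supplier", "total_amount": "120.00"},
        [{"product_description": "Widget", "quantity": "2"}],
    )
    print(json.dumps(payload, indent=2)[:300])
    os.remove(tmp.name)
```

Keeping the payload construction in a plain function makes it easy to loop over records from your database and build one payload per stored document.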
Response:
Using live data
📘 Use it in a live environment
Using live data allows you to improve your data extraction continuously and automate new suppliers on the fly.
Typless continuously improves with a closed feedback loop where you provide correct values for the extracted document. Check out the example below.
1 Create payload (Lines 5-58)
Create payload with the following parameters:
learning_fields
line_items
document_object_id
document_type_name
2 Create fields feedback data (Lines 6-35)
Set the correct data values for all the defined fields that are on the document.
3 Create line items feedback data (Lines 36-55)
Add all the line items with correct data values that are on the document.
4 Set document object id (Line 56)
Set document_object_id to the value of the object_id key returned in the extraction response. Read more about the object id here.
5 Specify headers (Lines 59-63)
Set the correct headers; make sure the content-type is application/json. Under the Authorization header, put your API key prefixed with the word Token.
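The feedback steps above might look like this in Python. It is a hedged sketch rather than the official sample: the four payload parameters and the `Token` prefix come from the text, while the `name`/`value` field shape, the helper names, and the placeholder URL are assumptions. The request object is only constructed here, not sent.

```python
import json
import urllib.request


def field(name, value):
    # Assumed field shape: one name/value pair per field.
    return {"name": name, "value": value}


def build_feedback_payload(document_object_id, document_type_name,
                           learning_fields, line_items):
    """Assemble the four parameters listed above into one payload."""
    return {
        "learning_fields": [field(n, v) for n, v in learning_fields.items()],
        "line_items": [[field(n, v) for n, v in row.items()] for row in line_items],
        # Value of object_id taken from the extraction response.
        "document_object_id": document_object_id,
        "document_type_name": document_type_name,
    }


def build_headers(api_key):
    """JSON content type plus the API key prefixed with the word Token."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",
    }


def build_request(url, payload, headers):
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    payload = build_feedback_payload(
        "object-id-from-extraction",   # placeholder value
        "my-document-type",            # placeholder value
        {"supplier_name": "Example Supplier"},
        [{"product_description": "Widget", "quantity": "2"}],
    )
    # Placeholder URL; substitute the feedback endpoint from the API reference.
    request = build_request("https://example.invalid/feedback",
                            payload, build_headers("YOUR-API-KEY"))
    print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that call is omitted so the sketch runs without network access or a real API key.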
Response:
Using a training room
For smaller volumes of documents and for testing purposes, you can use the training room. In the training room, you can train documents for your document type and run test extractions to see results quickly. Each document type has its own training room. Data you confirm here as correct will be used to train your document type.