Asynchronous extraction
Document processing is usually not time-critical, which is why we also provide an asynchronous endpoint. Processing is currently handled with the process-poll method: you start the extraction, then check at intervals whether it has finished.
Sample code for async extraction
The request for asynchronous processing is the same as the synchronous extract data request. The only difference is that you immediately get a response containing the extraction_id of the process, which you then use to poll for the status and results of the extraction.
You can try out async extraction with the following sample code. Examples are currently available only in Python; other languages will be added soon.
import base64

import requests

file_name = 'name_of_your_document.pdf'

# Read the document and encode it as a base64 string.
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}

url = "https://developers.typless.com/api/extract-data-async"
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "<<apiKey>>"
}

# Trigger the asynchronous extraction.
response = requests.request("POST", url, json=payload, headers=headers)
print(response.json())
If the process was triggered successfully, you will get an HTTP 202 Accepted response whose body contains the extraction_id of the asynchronous process.
{
    "extraction_id": "0d14338251a6db69bfec36face27f7edcab7322"
}
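The trigger step can be wrapped in a small helper that checks for the 202 status before reading the extraction_id. This is only a minimal sketch: the function name read_extraction_id is hypothetical, and it assumes nothing beyond what is described above (a 202 response whose JSON body carries extraction_id).

```python
def read_extraction_id(status_code, body):
    """Return the extraction_id from a trigger response, or raise if it is missing.

    status_code: HTTP status of the extract-data-async response.
    body: parsed JSON body of that response (a dict), e.g. response.json().
    """
    if status_code != 202:
        # Anything other than 202 Accepted means the process was not started.
        raise RuntimeError(
            f"Async extraction was not accepted (HTTP {status_code}): {body}"
        )
    extraction_id = body.get("extraction_id")
    if not extraction_id:
        raise RuntimeError("202 response did not contain an extraction_id")
    return extraction_id
```

With the sample code above, it could be called as read_extraction_id(response.status_code, response.json()).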
To poll for the data, use the extraction_id from the response:
import requests
url = "https://developers.typless.com/api/get-extraction-data"
payload = {'extraction_id': 'your-extraction-id'}
headers = {"Authorization": "<<apiKey>>"}
response = requests.request("GET", url, headers=headers, params=payload)
print(response.json())
Barring an unexpected failure, the poll endpoint always returns a successful response. The polled data always has the same format, with the following properties:
error
result
status
Example:
{
    "error": {},
    "result": {
        "customer": "customer-id",
        "extracted_fields": [
            {
                "data_type": "AUTHOR",
                "name": "supplier_name",
                "values": [
                    {
                        "confidence_score": 0.958,
                        "height": -1,
                        "page_number": -1,
                        "value": "ScaleGrid",
                        "width": -1,
                        "x": -1,
                        "y": -1
                    }
                ]
            }, ...
        ],
        "file_name": "invoice.pdf",
        "line_items": [],
        "object_id": "0d143385c4fb3ec7b73256be40c4ce02b01bf097",
        "vat_rates": []
    },
    "status": "SUCCESS"
}
The error property contains any errors that occurred during processing. Most errors are caused by an invalid input file. Errors use the standard error format, which is shared with all other endpoints and has the following properties:
code
message
details
The result property is an empty object while processing has not finished. Once the process is completed, it contains the results of the extraction in the same format as the synchronous endpoint. You can read more about the response here.
The status property holds the current status of the process. It has four predefined states:
IN_PROGRESS
SUCCESS
ERROR
EXPIRED
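The four states suggest a simple polling loop: keep asking while the status is IN_PROGRESS, and stop on any other state. The sketch below is one possible way to write it, not an official client. The names wait_for_extraction, fetch_status, interval_seconds and max_attempts are hypothetical, and the default interval and attempt limit are arbitrary assumptions. fetch_status stands for any callable that takes an extraction_id and returns the parsed poll response, for example a thin wrapper around the get-extraction-data request shown earlier.

```python
import time


def wait_for_extraction(fetch_status, extraction_id,
                        interval_seconds=5.0, max_attempts=60):
    """Poll until the extraction leaves IN_PROGRESS and return its result.

    fetch_status: callable(extraction_id) -> dict with the keys
                  "error", "result" and "status" (hypothetical helper).
    """
    for _ in range(max_attempts):
        data = fetch_status(extraction_id)
        status = data["status"]
        if status == "SUCCESS":
            return data["result"]
        if status == "ERROR":
            error = data["error"]
            raise RuntimeError(
                f"Extraction failed: {error.get('code')} {error.get('message')}"
            )
        if status == "EXPIRED":
            raise RuntimeError(f"Extraction {extraction_id} has status EXPIRED")
        if status != "IN_PROGRESS":
            raise ValueError(f"Unexpected status: {status}")
        # Still IN_PROGRESS: wait before asking again.
        time.sleep(interval_seconds)
    raise TimeoutError(
        f"Extraction {extraction_id} still IN_PROGRESS after {max_attempts} polls"
    )
```

Passing the request as a callable keeps the loop independent of the HTTP library and makes it easy to exercise with canned responses.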
Retrieving results using the poll queue
Alternatively, you may want to periodically check for invoices that have finished extraction, and collectively get their results. This is where our /awaiting-poll endpoint comes in handy.
For this flow, you do not need to save the extraction IDs. It suffices to periodically call the endpoint like this:
import requests
url = "https://developers.typless.com/api/v1/awaiting-poll"
payload = {'customer': 'customer-id'}
headers = {"Authorization": "<<apiKey>>"}
response = requests.request("GET", url, headers=headers, params=payload)
print(response.json())
If you used the customer field during extraction to differentiate between companies sharing the same API key, make sure to use the same customer ID here. If you did not use it, omit the field entirely, because an incorrect or mismatched customer ID results in an empty response.
You will get a response like this:
{
    "extraction_ids": [
        "0d143385c4fb3ec7b73256be40c4ce02b01bf097",
        "0d143385c4fb3e48341eb123f973eabc23111322"
    ]
}
Invoices with these extraction IDs have finished processing. You can then fetch their results the same way as before:
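import requests

url = "https://developers.typless.com/api/get-extraction-data"
headers = {"Authorization": "<<apiKey>>"}

# Placeholder values: use the extraction_ids list returned by /awaiting-poll.
extraction_ids = ['your-extraction-id-1', 'your-extraction-id-2']

# Fetch the result of each finished extraction.
for extraction_id in extraction_ids:
    payload = {'extraction_id': extraction_id}
    response = requests.request("GET", url, headers=headers, params=payload)
    print(response.json())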
The response has the same format as described earlier. Note that documents obtained in this flow can have only two statuses:
✅ SUCCESS
❌ ERROR
Documents in progress and expired documents are not included in the /awaiting-poll response.
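Because this flow only ever yields SUCCESS or ERROR, the fetched responses can be split into two groups in a single pass. The following is a hedged sketch rather than part of the API: collect_finished and fetch_status are hypothetical names, and fetch_status is assumed to be any callable that takes an extraction_id and returns the parsed get-extraction-data response.

```python
def collect_finished(extraction_ids, fetch_status):
    """Fetch every finished extraction and group the outcomes by status.

    Returns a pair (succeeded, failed): two dicts keyed by extraction_id,
    holding the "result" object for SUCCESS and the "error" object otherwise.
    """
    succeeded, failed = {}, {}
    for extraction_id in extraction_ids:
        data = fetch_status(extraction_id)
        if data["status"] == "SUCCESS":
            succeeded[extraction_id] = data["result"]
        else:
            # In this flow the only other documented status is ERROR.
            failed[extraction_id] = data["error"]
    return succeeded, failed
```

The two dictionaries can then be handled separately, for example by storing the results and logging the errors.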