Asynchronous extraction

Most of the time, document processing is not time-critical. That is why we also provide an asynchronous endpoint for processing documents. Currently, the processing is handled with the process-poll method, meaning you have to check at regular intervals whether the document processing has finished.

📘 Use webhooks to receive a notification when data extraction is finished

To optimize asynchronous document processing, implement webhooks and never poll for data again! See the Webhook section to get started.

Sample code for async extraction

The request for asynchronous processing is the same as the synchronous extract data request. The only difference is that you immediately get a response containing the extraction_id of the process. You then use this extraction_id to poll for the status and results of the extraction.

You can try out async extraction with the following sample code. Examples are currently available only in Python; other languages will be added soon.

1 Open file as base64 string (Lines 4-6)

Open the file in binary mode and encode its contents as a base64 string. Make sure that your file is in the same directory as the script.

2 Create payload (Lines 8-12)

Create the request payload with all the required parameters:

file
file_name
document_type_name

3 Specify headers (Lines 16-20)

Make sure that the Content-Type is set to application/json.

4 Authorize with your API key (Line 19)

You can get your API key at https://app.typless.com/settings/profile

5 Execute the request (Lines 22-24)

Send the request and wait for the response.

import requests
import base64

file_name = 'name_of_your_document.pdf'
with open(file_name, 'rb') as file:
    base64_data = base64.b64encode(file.read()).decode('utf-8')

payload = {
    "file": base64_data,
    "file_name": file_name,
    "document_type_name": "line-item-invoice"
}

url = "https://developers.typless.com/api/extract-data-async"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "<<apiKey>>"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.json())

If the process was triggered successfully, you will get an HTTP 202 Accepted response whose body contains the extraction_id of the asynchronous process.

{
    "extraction_id": "0d14338251a6db69bfec36face27f7edcab7322"
}
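
Before polling, it can help to confirm that the trigger really was accepted. The following is a minimal sketch, not part of the official samples: read_extraction_id is a hypothetical helper name, and it assumes only what is described above (a 202 status code and a body with an extraction_id key).

```python
def read_extraction_id(status_code, body):
    """Return the extraction_id from an async trigger response, or raise.

    status_code is the HTTP status of the response and body is its parsed
    JSON (a dict). Both names are illustrative, not part of the API.
    """
    if status_code != 202:
        raise RuntimeError(
            f"Extraction was not accepted (HTTP {status_code}): {body}"
        )
    extraction_id = body.get("extraction_id")
    if not extraction_id:
        raise RuntimeError(f"Response has no extraction_id: {body}")
    return extraction_id


# Example with the response body shown above.
example_body = {"extraction_id": "0d14338251a6db69bfec36face27f7edcab7322"}
print(read_extraction_id(202, example_body))
```

With the request sample above, the call would be read_extraction_id(response.status_code, response.json()).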

To poll for the data, use the extraction_id from the response:

1 Authorize with your API key (Line 7)

You can get your API key at https://app.typless.com/settings/profile

2 Pass the extraction_id to query params (Line 5)

Take the extraction_id you got from the /extract-data-async endpoint and pass it as the extraction_id query parameter.

3 Execute the request (Line 9)

Execute the request and parse the response.

import requests

url = "https://developers.typless.com/api/get-extraction-data"

payload = {'extraction_id': 'your-extraction-id'}

headers = {"Authorization": "<<apiKey>>"}

response = requests.request("GET", url, headers=headers, params=payload)

print(response.json())

You will always get a successful response from the poll endpoint (unless a catastrophe happens!). The polled data response always has the same format, with the following properties:

error
result
status

Example:

{
  "error": {},
  "result": {
    "customer": "customer-id",
    "extracted_fields": [
      {
        "data_type": "AUTHOR",
        "name": "supplier_name",
        "values": [
          {
            "confidence_score": 0.958,
            "height": -1,
            "page_number": -1,
            "value": "ScaleGrid",
            "width": -1,
            "x": -1,
            "y": -1
          }
        ]
      }, ...
    ],
    "file_name": "invoice.pdf",
    "line_items": [],
    "object_id": "0d143385c4fb3ec7b73256be40c4ce02b01bf097",
    "vat_rates": []
  },
  "status": "SUCCESS"
}
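
Once a result like the one above is available, you will usually want one value per field. The sketch below is one possible way to do that, assuming the structure shown in the example (extracted_fields, each with a name and a list of values carrying confidence_score and value). The helper name best_values is hypothetical, and picking the highest confidence score is a design choice of this sketch, not something the API prescribes.

```python
def best_values(result):
    """Map each extracted field name to its highest-confidence value.

    Fields with an empty values list are mapped to None.
    """
    best = {}
    for field in result.get("extracted_fields", []):
        values = field.get("values", [])
        if values:
            top = max(values, key=lambda v: v["confidence_score"])
            best[field["name"]] = top["value"]
        else:
            best[field["name"]] = None
    return best


# Trimmed-down version of the example result above.
example_result = {
    "extracted_fields": [
        {
            "data_type": "AUTHOR",
            "name": "supplier_name",
            "values": [{"confidence_score": 0.958, "value": "ScaleGrid"}],
        }
    ],
    "file_name": "invoice.pdf",
    "line_items": [],
}
print(best_values(example_result))  # {'supplier_name': 'ScaleGrid'}
```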

The error property includes any errors that occur during processing. Most errors are caused by an invalid input file. Errors use the standard error format, which is shared by all the other endpoints and has the following properties:

code
message
details
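
For logging, the error object can be condensed into a single line. This is only a sketch: describe_error is a hypothetical helper, the exact types of code, message, and details are not specified here, and the error values in the example are invented for illustration.

```python
def describe_error(body):
    """Return a one-line summary of the error object, or None if it is empty."""
    error = body.get("error") or {}
    if not error:
        return None
    summary = f"{error.get('code', 'unknown')}: {error.get('message', '')}"
    details = error.get("details")
    if details:
        summary += f" (details: {details})"
    return summary


# Hypothetical error payload; real codes and messages may look different.
failed_body = {
    "error": {"code": "example_code", "message": "Example message", "details": []},
    "result": {},
    "status": "ERROR",
}
print(describe_error(failed_body))
print(describe_error({"error": {}, "result": {}, "status": "IN_PROGRESS"}))
```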

The result property is an empty object until processing has finished. After the process is completed, it includes the results of the extraction in the same format as the synchronous endpoint. You can read more about the response here.

The status property contains the current status of the process. It has four predefined states:

IN_PROGRESS
SUCCESS
ERROR
EXPIRED

📘 EXPIRED status

The process gets an EXPIRED status 48 hours after the process has finished. This means that you have 48 hours to poll the data and access the results. Afterwards, the data will be deleted.
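
The four states map naturally onto a polling loop. The following is a minimal sketch under stated assumptions: wait_for_extraction, fetch_status, the 5-second interval, and the attempt limit are all illustrative choices, not values recommended by the API. fetch_status stands for any function that calls /get-extraction-data and returns the parsed JSON body; here it is replaced by canned responses so the example runs without network access.

```python
import time


def wait_for_extraction(fetch_status, extraction_id,
                        interval_seconds=5.0, max_attempts=60,
                        sleep=time.sleep):
    """Poll until the extraction leaves IN_PROGRESS and return its result.

    fetch_status(extraction_id) must return the parsed response body (dict).
    """
    for _ in range(max_attempts):
        body = fetch_status(extraction_id)
        status = body.get("status")
        if status == "SUCCESS":
            return body["result"]
        if status == "ERROR":
            raise RuntimeError(f"Extraction failed: {body.get('error')}")
        if status == "EXPIRED":
            raise RuntimeError("Extraction has expired; its data was deleted")
        if status != "IN_PROGRESS":
            raise RuntimeError(f"Unexpected status: {status!r}")
        sleep(interval_seconds)  # still in progress, wait before the next poll
    raise TimeoutError(f"Extraction {extraction_id} did not finish in time")


# Canned responses standing in for real HTTP calls.
canned = iter([
    {"error": {}, "result": {}, "status": "IN_PROGRESS"},
    {"error": {}, "result": {"file_name": "invoice.pdf"}, "status": "SUCCESS"},
])
result = wait_for_extraction(lambda _id: next(canned), "your-extraction-id",
                             sleep=lambda seconds: None)
print(result)  # {'file_name': 'invoice.pdf'}
```

In real use, fetch_status would wrap the GET request to /get-extraction-data shown earlier and return response.json().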

Retrieving results using poll queue

Alternatively, you may want to periodically check for invoices that have finished extraction, and collectively get their results. This is where our /awaiting-poll endpoint comes in handy.

For this flow, you do not need to save the extraction IDs. It suffices to periodically call the endpoint like this:

1 Set the poll queue endpoint (Line 3)

This is the /awaiting-poll endpoint, which returns a list of extraction IDs that are ready. You don’t need to save or track IDs individually.

2 Add the customer filter (Line 5)

Use the customer filter if you already use it during extraction to differentiate between companies that share the same API key.

3 Authorize the request (Line 7)

Insert your API key into the Authorization header. You can find it in your Typless profile settings.

4 Execute the poll request (Line 9)

Send a GET request to retrieve all extraction IDs that are ready. Each ID in the response corresponds to a document that has finished processing.

import requests

url = "https://developers.typless.com/api/v1/awaiting-poll"

payload = {'customer': 'customer-id'}

headers = {"Authorization": "<<apiKey>>"}

response = requests.request("GET", url, headers=headers, params=payload)

print(response.json())

If you used the customer field during extraction to differentiate between companies sharing the same API key, make sure to use the same customer ID here. An incorrect or mismatched customer ID results in an empty response. If you are not sure which customer ID was used, it's best to omit the field entirely.

You will get a response like this:

{
  "extraction_ids": ["0d143385c4fb3ec7b73256be40c4ce02b01bf097",
                     "0d143385c4fb3e48341eb123f973eabc23111322"]
}

Invoices with these extraction IDs have finished processing. You can then retrieve their results in the same way as before:

1 Set the result retrieval endpoint (Line 3)

Use the /get-extraction-data endpoint to retrieve results for each ID returned earlier.

2 Authorize again (Line 4)

Use the same API key in the Authorization header.

3 Loop through each extraction ID (Line 6)

Iterate over the list of extraction_ids returned by /awaiting-poll.

4 Execute individual requests (Line 8)

For each ID, send a GET request to retrieve the result. You’ll receive the document's extracted content and status.

import requests

url = "https://developers.typless.com/api/get-extraction-data"
headers = {"Authorization": "<<apiKey>>"}
extraction_ids = ['your-extraction-id']  # replace with the list returned by /awaiting-poll
for extraction_id in extraction_ids:
    payload = {'extraction_id': extraction_id}
    response = requests.request("GET", url, headers=headers, params=payload)
    print(response.json())

The response has the same format as described earlier. Note that documents obtained in this flow can only have one of two statuses:

✅ SUCCESS

❌ ERROR

Documents in progress and expired documents will not be included in the /awaiting-poll response.
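
Putting the two poll-queue steps together, one pass can collect everything that has finished and split it by status. This is a sketch only: collect_finished, list_ready_ids, and fetch_extraction are hypothetical names. The two callables stand in for the GET requests to /awaiting-poll and /get-extraction-data shown above, and are replaced by canned data here so the example runs offline.

```python
def collect_finished(list_ready_ids, fetch_extraction):
    """Fetch every ready extraction and split the results by status.

    list_ready_ids() returns the extraction_ids list from /awaiting-poll.
    fetch_extraction(extraction_id) returns the parsed response body (dict)
    of /get-extraction-data.
    """
    succeeded, failed = {}, {}
    for extraction_id in list_ready_ids():
        body = fetch_extraction(extraction_id)
        if body.get("status") == "SUCCESS":
            succeeded[extraction_id] = body["result"]
        else:
            # Only ERROR is expected here, since in-progress and expired
            # documents are not listed by /awaiting-poll.
            failed[extraction_id] = body.get("error", {})
    return succeeded, failed


# Canned data standing in for the two endpoints.
canned_bodies = {
    "id-1": {"error": {}, "result": {"file_name": "a.pdf"}, "status": "SUCCESS"},
    "id-2": {"error": {"code": "example_code"}, "result": {}, "status": "ERROR"},
}
succeeded, failed = collect_finished(lambda: list(canned_bodies),
                                     lambda eid: canned_bodies[eid])
print(succeeded)  # {'id-1': {'file_name': 'a.pdf'}}
print(failed)     # {'id-2': {'code': 'example_code'}}
```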

