Asynchronous extraction

Most document processing is not time-critical, so we also provide an asynchronous endpoint. It currently follows a process-and-poll pattern: you start the extraction, then check at intervals whether the document has finished processing.

📘 Use webhooks to receive a notification when data extraction is finished

To avoid polling altogether, implement webhooks and get notified as soon as an extraction finishes. See the Webhook section to get started.

Sample code for async extraction

The request for asynchronous processing is the same as the synchronous extract data request. The only difference is that the response arrives immediately and contains the extraction_id of the process, which you then use to poll for the status and results of the extraction.

You can try out the async extraction with the following sample code. Examples are currently available only in Python; other languages will be added soon.

1. Open file as base64 string (Lines 4-6)

Open the file in binary mode, encode its contents as base64, and decode the result into a string. Make sure that your file is in the same directory as the script.

2. Create payload (Lines 8-12)

Create the request payload with all the required parameters:

  • file

  • file_name

  • document_type_name

3. Specify headers (Lines 16-20)

Make sure that the Content-Type header is set to application/json.

4. Authorize with your API key (Line 19)

5. Execute the request (Lines 22-24)

Send the request and wait for the response.

If the process was triggered successfully, you will get an HTTP 202 Accepted response whose body contains the extraction_id of the asynchronous process.
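The five steps above can be sketched in Python roughly as follows. This is a minimal standard-library sketch, not the official sample, so the line numbers in the step headings do not apply to it. The payload fields, the Content-Type header, the /extract-data-async path, and the extraction_id in the response come from this page. The base URL, the `Token` authorization scheme, and the helper names are assumptions made for illustration.

```python
import base64
import json
import os
import urllib.request

# Assumptions, not confirmed by this page: the base URL and the "Token" scheme.
BASE_URL = "https://api.example.com"  # placeholder; use the base URL from the API reference
API_KEY = "YOUR-API-KEY"


def build_async_request(file_bytes, file_name, document_type_name):
    """Build (but do not send) the POST request that starts an async extraction."""
    payload = {
        # base64-encode the raw bytes, then decode to str so it is JSON-serializable
        "file": base64.b64encode(file_bytes).decode("utf-8"),
        "file_name": file_name,
        "document_type_name": document_type_name,
    }
    return urllib.request.Request(
        f"{BASE_URL}/extract-data-async",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {API_KEY}",  # assumed scheme
        },
        method="POST",
    )


def start_extraction(path, document_type_name):
    """Send the file at `path` and return the extraction_id from the response body."""
    with open(path, "rb") as f:  # binary mode, as described in step 1
        request = build_async_request(
            f.read(), os.path.basename(path), document_type_name
        )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["extraction_id"]
```

Calling `start_extraction("invoice.pdf", "invoice")` (both values hypothetical) would return the ID to poll with. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.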

To poll for the data, use the extraction_id from the response:

1. Authorize with your API key (Line 7)

2. Pass the extraction_id to query params (Line 5)

Take the extraction_id you received from the /extract-data-async endpoint and pass it as the extraction_id query parameter.

3. Execute the request (Line 9)

Execute the request and parse the response.
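A single poll could look like the sketch below. It assumes the results are read from the /get-extraction-data endpoint, which this page names only in the poll-queue flow further down. The base URL, the `Token` scheme, and the function names are likewise assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumptions, not confirmed by this page: the base URL and the "Token" scheme.
BASE_URL = "https://api.example.com"  # placeholder; use the base URL from the API reference
API_KEY = "YOUR-API-KEY"


def build_poll_request(extraction_id):
    """Build the GET request that asks for the state of one extraction."""
    # urlencode escapes the ID so it is safe to place in the query string
    query = urllib.parse.urlencode({"extraction_id": extraction_id})
    return urllib.request.Request(
        f"{BASE_URL}/get-extraction-data?{query}",
        headers={"Authorization": f"Token {API_KEY}"},  # assumed scheme
        method="GET",
    )


def poll_once(extraction_id):
    """Send one poll request and return the parsed JSON body."""
    with urllib.request.urlopen(build_poll_request(extraction_id)) as response:
        return json.loads(response.read())
```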

Barring a catastrophic failure, the poll endpoint always returns a successful response. The polled data always has the same format, with the following properties:

  • error

  • result

  • status

Example:

The error property contains any errors that occurred during processing. Most of them are caused by an invalid input file. Errors use the standard error format shared by all the other endpoints, with the following properties:

  • code

  • message

  • details

The result property is an empty object while processing has not yet finished. Once the process completes, it contains the results of the extraction in the same format as the synchronous endpoint. You can read more about the response here.

The status property will include the current status of the process. It has 4 predefined states:

  • IN_PROGRESS

  • SUCCESS

  • ERROR

  • EXPIRED

📘 EXPIRED status

The process gets an EXPIRED status 48 hours after the process has finished. This means that you have 48 hours to poll the data and access the results. Afterwards, the data will be deleted.
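Putting the four statuses together, a polling loop might be sketched like this. The function name, the retry limits, and the choice of exceptions are illustrative assumptions. `fetch` stands for any callable that takes an extraction_id and returns the parsed poll response with its error, result, and status properties.

```python
import time


def wait_for_result(extraction_id, fetch, interval_seconds=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Poll until the extraction leaves IN_PROGRESS, then return its result."""
    for _ in range(max_attempts):
        response = fetch(extraction_id)
        status = response["status"]
        if status == "SUCCESS":
            return response["result"]
        if status == "ERROR":
            # The error payload is passed through as-is; its exact shape is not assumed.
            raise RuntimeError(f"extraction {extraction_id} failed: {response['error']}")
        if status == "EXPIRED":
            raise LookupError(f"results for {extraction_id} have expired")
        # IN_PROGRESS: wait before asking again
        sleep(interval_seconds)
    raise TimeoutError(
        f"{extraction_id} still IN_PROGRESS after {max_attempts} polls"
    )
```

Passing `fetch` and `sleep` as arguments keeps the loop independent of any HTTP client and lets it be exercised with canned responses.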

Retrieving results using poll queue

Alternatively, you may want to periodically check for invoices that have finished extraction, and collectively get their results. This is where our /awaiting-poll endpoint comes in handy.

For this flow, you do not need to save the extraction IDs. It suffices to periodically call the endpoint like this:

1. Set the poll queue endpoint (Line 3)

This is the /awaiting-poll endpoint, which returns a list of extraction IDs that are ready. You don’t need to save or track IDs individually.

2. Add the customer filter (Line 5)

Use the customer filter if you already use it for extraction to differentiate between companies that share the same API key.

3. Authorize the request (Line 7)

Insert your API key into the Authorization header. You can find it in your Typless profile settings.

4. Execute the poll request (Line 9)

Send a GET request to retrieve all extraction IDs that are ready. Each ID in the response corresponds to a document that has finished processing.

If you used the customer field during extraction to differentiate between companies sharing the same API key, use the same customer ID here. An incorrect or mismatched customer ID results in an empty response. If you did not set the field during extraction, omit it entirely.
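The poll-queue request described in these steps can be sketched as below. The query parameter name `customer`, the base URL, the `Token` scheme, and the function name are assumptions. Only the /awaiting-poll path and the advice to omit the filter when it was not used come from this page.

```python
import urllib.parse
import urllib.request

# Assumptions, not confirmed by this page: the base URL and the "Token" scheme.
BASE_URL = "https://api.example.com"  # placeholder; use the base URL from the API reference
API_KEY = "YOUR-API-KEY"


def build_awaiting_poll_request(customer=None):
    """Build the GET request that lists extraction IDs ready to be collected."""
    url = f"{BASE_URL}/awaiting-poll"
    if customer is not None:
        # Send the filter only when it was also set during extraction;
        # "customer" as the parameter name is an assumption.
        url += "?" + urllib.parse.urlencode({"customer": customer})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {API_KEY}"},  # assumed scheme
        method="GET",
    )
```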

You will get a response like this:

Invoices with these extraction IDs have finished processing. We can then poll the results similarly to what we did earlier:

1. Set the result retrieval endpoint (Line 3)

Use the /get-extraction-data endpoint to retrieve results for each ID returned earlier.

2. Authorize again (Line 4)

Use the same API key in the Authorization header.

3. Loop through each extraction ID (Line 6)

Iterate over the list of extraction_ids returned by /awaiting-poll.

4. Execute individual requests (Line 8)

For each ID, send a GET request to retrieve the result. You’ll receive the document's extracted content and status.

The response has the same format as described earlier. Note that documents obtained in this flow can only have two statuses:

✅ SUCCESS

❌ ERROR

Documents in progress and expired documents will not be included in the /awaiting-poll response.
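The collection loop for this flow can be sketched as follows. The function name and the shape of the return value are illustrative assumptions. `fetch` again stands for any callable that takes an extraction_id and returns the parsed response from /get-extraction-data.

```python
def collect_results(extraction_ids, fetch):
    """Fetch every ready extraction and split the responses by status."""
    succeeded, failed = {}, {}
    for extraction_id in extraction_ids:
        response = fetch(extraction_id)
        if response["status"] == "SUCCESS":
            succeeded[extraction_id] = response["result"]
        else:
            # In this flow the only other status is ERROR.
            failed[extraction_id] = response["error"]
    return succeeded, failed
```

Because IDs returned by /awaiting-poll are never IN_PROGRESS or EXPIRED, no waiting or retrying is needed here.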
