Document processing config

Use config.documents to decide whether the pipeline should re-run files it has already processed and whether to extract granular elements in addition to the default Markdown output.

By default, reprocess_documents is true and extract_elements is false. You only need to send the fields you want to override.

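import type { JobInput } from "@trelent/data-ingestion";

const job: JobInput = {
  // Source: a single signed URL pointing at a PDF.
  connector: {
    type: "url",
    urls: ["https://signed.example.com/contract.pdf"]
  },
  output: { type: "s3-signed-url" },
  config: {
    documents: {
      // Override the default (true).
      reprocess_documents: false,
      // Override the default (false) to emit element metadata alongside Markdown.
      extract_elements: true,
    },
  },
};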
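
To illustrate how sending only some fields combines with the defaults stated above, here is a minimal sketch. The DocumentsConfig interface, DOCUMENT_DEFAULTS constant, and resolveDocumentsConfig helper are hypothetical names defined locally for illustration; they are not part of @trelent/data-ingestion, and the service may resolve defaults differently on its side.

```typescript
// Hypothetical local type mirroring the two documented fields (not the SDK's type).
interface DocumentsConfig {
  reprocess_documents: boolean;
  extract_elements: boolean;
}

// Defaults as stated in this page.
const DOCUMENT_DEFAULTS: DocumentsConfig = {
  reprocess_documents: true,
  extract_elements: false,
};

// Hypothetical helper: fields present in `overrides` replace the defaults,
// omitted fields keep their default value.
function resolveDocumentsConfig(
  overrides: Partial<DocumentsConfig> = {}
): DocumentsConfig {
  return { ...DOCUMENT_DEFAULTS, ...overrides };
}

// Only extract_elements is overridden; reprocess_documents stays at its default.
const resolved = resolveDocumentsConfig({ extract_elements: true });
console.log(resolved);
```

Under these assumptions, omitting config.documents entirely is equivalent to sending both fields at their default values.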

Field reference

config.documents.reprocess_documents

Type: boolean. Default: true

Re-processes pages of documents that were determined to be of lower quality. Set to false for faster incremental ingestion when your connector identifiers are stable.

config.documents.extract_elements

Type: boolean. Default: false

Emits structured PDF element metadata alongside the Markdown output. Turn this on when downstream consumers need a more granular breakdown of each document.
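
The guidance for the two fields can be sketched as a small decision helper. Everything here is an assumption made for illustration: RunProfile, DocumentsOverrides, and documentsFor are hypothetical local names, not SDK exports, and the conditions simply restate the advice above.

```typescript
// Hypothetical shape with both documented fields optional, so that only
// overrides are sent (not the SDK's type).
interface DocumentsOverrides {
  reprocess_documents?: boolean;
  extract_elements?: boolean;
}

// Hypothetical description of an ingestion run.
interface RunProfile {
  incremental: boolean;        // re-ingesting a source that was ingested before
  stableIdentifiers: boolean;  // connector identifiers do not change between runs
  needsElements: boolean;      // downstream consumers want element metadata
}

function documentsFor(profile: RunProfile): DocumentsOverrides {
  const overrides: DocumentsOverrides = {};
  // Skip re-processing only for incremental runs with stable identifiers.
  if (profile.incremental && profile.stableIdentifiers) {
    overrides.reprocess_documents = false;
  }
  // Request element metadata only when something downstream consumes it.
  if (profile.needsElements) {
    overrides.extract_elements = true;
  }
  return overrides;
}

const documents = documentsFor({
  incremental: true,
  stableIdentifiers: true,
  needsElements: false,
});
console.log(documents);
```

The returned object contains only the fields that differ from the defaults, so it can be placed under config.documents as is.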
