Use config.documents to decide whether the pipeline should re-run files it has already processed and whether to extract granular elements in addition to the default Markdown output.
By default, reprocess_documents is true and extract_elements is false. You only need to send the fields you want to override.

Field reference
reprocess_documents (default: true)
Re-process pages of documents that were determined to be of lower quality. Set to false for faster incremental ingestions when your connector identifiers are stable.

extract_elements (default: false)
Emit structured PDF element metadata alongside Markdown. Turn this on when downstream consumers need a finer-grained breakdown of each document than the Markdown output provides.
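
As a rough illustration of the override behavior described above, the following Python sketch builds a payload that carries only the fields that differ from the stated defaults. The helper names (documents_overrides, effective_settings) are hypothetical. The nested payload shape is an assumption inferred from the dotted name config.documents, not a confirmed request schema.

```python
import json

# Defaults as stated in the text: reprocess_documents is true, extract_elements is false.
DOCUMENT_DEFAULTS = {"reprocess_documents": True, "extract_elements": False}


def documents_overrides(**fields):
    """Keep only known fields whose value differs from the documented default."""
    unknown = set(fields) - set(DOCUMENT_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config.documents fields: {sorted(unknown)}")
    return {k: v for k, v in fields.items() if v != DOCUMENT_DEFAULTS[k]}


def effective_settings(overrides):
    """Merge overrides onto the defaults to see which values would apply."""
    return {**DOCUMENT_DEFAULTS, **overrides}


# Hypothetical payload shape: the nesting is inferred from the name config.documents.
overrides = documents_overrides(reprocess_documents=False, extract_elements=False)
payload = {"config": {"documents": overrides}}
print(json.dumps(payload))
# prints {"config": {"documents": {"reprocess_documents": false}}}
```

Here extract_elements is dropped from the payload because false is already its default, so only reprocess_documents is sent. Calling effective_settings(overrides) shows the merged values, assuming the pipeline fills in omitted fields from the defaults.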