config.video controls how the pipeline samples frames and detects meaningful changes. Higher sensitivity captures more images; lower sensitivity captures fewer.
Default fields are:
screenshot_interval_seconds: 1
sensitivity: 0.1
openai_model: gpt-4.1
whisper_model: whisper-1
You only need to send fields you want to override.
Field reference
screenshot_interval_seconds: Number of seconds between sampled frames. Lower values capture more context for fast-moving scenes; higher values save cost for static footage.
sensitivity: Normalized threshold for detecting visual changes. Raise it to capture subtle transitions; lower it to ignore small changes.
openai_model: LLM used for visual reasoning and summarization. Pick a lighter model (for example, gpt-4o-mini) when you need lower latency.
whisper_model: Model used for audio transcription. Swap this if you require multilingual or domain-specific tuning. The model must support the verbose_json output format.
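To illustrate how partial overrides combine with the defaults listed above, here is a minimal Python sketch. The names DEFAULT_VIDEO_CONFIG and resolve_video_config are hypothetical helpers for illustration, not part of the pipeline's API, and the sketch covers only the four fields named in this section.

```python
# Defaults as listed in this section.
DEFAULT_VIDEO_CONFIG = {
    "screenshot_interval_seconds": 1,
    "sensitivity": 0.1,
    "openai_model": "gpt-4.1",
    "whisper_model": "whisper-1",
}


def resolve_video_config(overrides=None):
    """Return the effective config: defaults plus any overridden fields.

    Hypothetical helper; the real pipeline may merge fields differently.
    """
    return {**DEFAULT_VIDEO_CONFIG, **(overrides or {})}


# Mostly static footage: sample less often and use a lighter model.
effective = resolve_video_config(
    {"screenshot_interval_seconds": 5, "openai_model": "gpt-4o-mini"}
)
print(effective)
```

Only the two overridden fields change; sensitivity and whisper_model keep their defaults.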
Advanced SSIM tuning
Fine-tune how the system judges frame similarity. The sensitivity field generates default values for these fields; if you want more granular control, you can override them manually.
Tile size for block-based comparisons. Leave null to use the default value tuned for general content.
Mean absolute deviation threshold. Lower numbers capture subtle noise at the expense of more detections.
Triggers when the local SSIM (structural similarity index) drops below the specified value. Helpful for scene-change detection.
Fraction of tiles that can cross the SSIM threshold before the frame counts as “changed.” Lower fractions make detection stricter.
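The interaction between tile size, the SSIM threshold, and the tile fraction can be sketched in plain Python as follows. This is an illustration under assumptions, not the pipeline's implementation: the function names, parameter names, and default values are all hypothetical, frames are small 8-bit grayscale grids given as lists of rows, and the mean absolute deviation check is left out because this section does not say how it combines with SSIM.

```python
from statistics import fmean

# Standard SSIM stabilizing constants for 8-bit pixel values.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def tile_ssim(a, b):
    """SSIM between two equally sized flat lists of grayscale pixels."""
    mean_a, mean_b = fmean(a), fmean(b)
    var_a = fmean([(p - mean_a) ** 2 for p in a])
    var_b = fmean([(q - mean_b) ** 2 for q in b])
    cov = fmean([(p - mean_a) * (q - mean_b) for p, q in zip(a, b)])
    return ((2 * mean_a * mean_b + C1) * (2 * cov + C2)) / (
        (mean_a ** 2 + mean_b ** 2 + C1) * (var_a + var_b + C2)
    )


def tiles(frame, tile_size):
    """Yield each tile of a 2D frame (list of rows) as a flat pixel list."""
    height, width = len(frame), len(frame[0])
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            yield [
                frame[y][x]
                for y in range(top, min(top + tile_size, height))
                for x in range(left, min(left + tile_size, width))
            ]


def frame_changed(prev, curr, tile_size=4, ssim_threshold=0.9,
                  max_changed_fraction=0.25):
    """Report a change when too many tiles fall below the SSIM threshold.

    All three defaults are illustrative values, not the pipeline's.
    """
    scores = [
        tile_ssim(a, b)
        for a, b in zip(tiles(prev, tile_size), tiles(curr, tile_size))
    ]
    changed = sum(score < ssim_threshold for score in scores)
    return changed / len(scores) > max_changed_fraction


# 8x8 frames split into four 4x4 tiles.
base = [[100] * 8 for _ in range(8)]
top_half_brighter = [[200] * 8 if y < 4 else [100] * 8 for y in range(8)]

print(frame_changed(base, base))               # identical frames
print(frame_changed(base, top_half_brighter))  # two of four tiles differ
```

In this sketch, lowering max_changed_fraction means fewer tiles need to differ before the frame counts as changed, and raising ssim_threshold makes each individual tile more likely to count as different.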