Algorithm Analysis Interface Specification Document

Document Properties

| Property | Details |
| --- | --- |
| Document Version | V1.6 |
| Creation Date | 2026-04-28 |
| Scope | Video Stream Detection Interface and VLM Algorithm Interface (currently supports prompt-driven video stream analysis and dynamic configuration updates) |

1. Overview

This document defines two categories of standardized interfaces for algorithm analysis:

  • Video Stream Detection Interface: defines five integration points: detection type query, start, stop, HTTP callback, and RTMP live streaming. The callback structure is explicitly specified for three anomaly detection types (fire_detection, bag_detection, no_helmet_detection), and instanceId is required to associate results with patrol task instances.
  • VLM Algorithm Interface: defines a prompt-driven video stream analysis protocol. Its lifecycle, callback structure, and RTMP streaming conventions reuse those of the Video Stream Detection Interface. It differs in three ways: the start request carries a natural language prompt, VLM tasks use dedicated paths, and a dynamic configuration update path is listed for changing the prompt while a task is running.

Both specifications are finalized. The VLM interface shares the task control, callback, and streaming patterns of the Video Stream Detection Interface; the differences are the prompt parameter in the start request, the VLM-specific paths, and the dynamic configuration update path listed in Section 4.1.

2. General Specifications

  • All HTTP interface requests and responses use JSON format with Content-Type fixed as application/json.
  • The live video streaming link uses the RTMP protocol with FLV container format and H.264 video encoding.
  • Interface base paths follow a unified module convention; all algorithm-related HTTP interfaces use the analysis module.
  • HTTP write operation interfaces follow the "plural resource + action sub-path" style and uniformly use the POST method.
  • HTTP interfaces requiring request tracing and result matching must include a globally unique requestId; UUID generation is recommended.
  • The Video Stream Detection Interface consists of the detection type query interface, start interface, stop interface, HTTP callback interface, and RTMP live streaming. The legacy FTP/FTPS screenshot upload link has been deprecated and is not included in this version.
  • Video stream detection tasks follow a lifecycle of "Start → Detect → Alarm Callback → Stop". The start, stop, and alarm callback operations must all carry instanceId to associate the current task instance.
  • VLM algorithm interfaces follow the same "Start → Analyze → Callback → Stop" lifecycle as the Video Stream Detection Interface. The start, stop, and callback operations must all carry instanceId, and the start request must additionally include a prompt describing the analysis objective.
  • The caller must provide an accessible callback/alarm receiving address; the algorithm service pushes analysis results to this address as specified. An inaccessible address is treated as a caller configuration error.
  • Platform-side HTTP interface responses uniformly use a Result<T> structure with code, message, and data fields. On successful processing of a video stream detection callback, the platform returns HTTP 200 + Result<null>.
  • On successful processing of a VLM callback, the platform similarly returns HTTP 200 + Result<null>; the failure response semantics are consistent with those of the video stream detection callback.
  • Business status codes at the interface layer reuse existing definitions: 200 for success, 500 for failure.
  • Algorithm analysis result codes are returned via the resCode field in the business data, strictly reusing the existing AnalysisResultCodeEnum enumeration. Custom codes are prohibited to ensure no changes to the system's parsing logic.
  • All extension fields (extParams / extInfo) are optional and used to pass through personalized parameters without affecting core parsing logic.
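
The envelope and request-tracing rules above can be illustrated with a minimal Python sketch. The names Result, new_request_id, success, and failure are hypothetical helpers introduced here for illustration; only the code, message, and data fields, the 200/500 business codes, and the UUID recommendation come from this document.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Unified response envelope with code, message, and data fields."""
    code: int
    message: str
    data: Optional[Any] = None


def new_request_id() -> str:
    # UUID generation is the approach this specification recommends.
    return str(uuid.uuid4())


def success(data: Optional[Any] = None) -> dict:
    # 200 is the business status code for success.
    return asdict(Result(code=200, message="success", data=data))


def failure(message: str) -> dict:
    # 500 is the business status code for failure.
    return asdict(Result(code=500, message=message, data=None))
```

Calling success() with no argument yields the Result<null> body that the platform returns after processing a callback.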

3. Video Stream Detection Interface Definition

3.1 Interface Overview

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream Detection Interface |
| Interface Composition | Detection Type Query Interface + Start Interface + Stop Interface + HTTP Callback Interface + RTMP Live Streaming |
| Detection Type Query Path | /api/v1/analysis/video-stream-detect-types |
| Start Interface Path | /api/v1/analysis/video-stream-tasks/start |
| Stop Interface Path | /api/v1/analysis/video-stream-tasks/stop |
| HTTP Callback Path | /api/v1/analysis/video-stream-callbacks/create |
| HTTP Request Method | GET for detection type query; POST for all other HTTP interfaces |
| RTMP Push URL | Returned by the algorithm side upon successful start; format: rtmp://{host}:{port}/drtm/{instanceId} |
| Interface Description | The platform may first call the detection type query interface to retrieve the currently available video stream detection algorithm catalog, then call the start interface to submit the video stream URL and task instance ID. Upon successful acceptance, the algorithm side returns the RTMP push URL for the current task instance. Anomalies are reported via the fixed callback interface, and an annotated video stream is continuously pushed. When the task ends, the platform calls the stop interface to terminate detection. |
| Currently Available Detection Types | fire_detection, bag_detection, no_helmet_detection |

  • Video stream detection tasks use instanceId as the primary business association key. The instanceId in start requests, stop requests, and alarm callbacks must be identical to accurately associate algorithm alerts with the current patrol task instance.
  • Per API design conventions, task control interfaces use the analysis module with plural resource names and action sub-paths. The detection type query interface uses GET to retrieve the available catalog. The algorithm side should configure reportUrl as the full address of the fixed callback interface.
  • If the platform has a unified gateway domain, the recommended full callback address is: https://{domain}/api/v1/analysis/video-stream-callbacks/create.
  • The RTMP push URL is returned by the algorithm side in the successful start response. The streamKey is fixed to the current instanceId, meaning the push URL format is rtmp://{host}:{port}/drtm/{instanceId}.
  • After a task ends, the platform must explicitly call the stop interface. After a successful stop, the algorithm side must not push new alerts or continue streaming for that instanceId.
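
The lifecycle rule above (no alerts or streaming for an instanceId after a successful stop) can be sketched as a small platform-side registry. This is an illustrative sketch only: TaskRegistry and its method names are hypothetical, and a real platform would persist this state rather than hold it in memory.

```python
class TaskRegistry:
    """Tracks which instanceId values currently have a running task."""

    def __init__(self):
        self._running = set()

    def start(self, instance_id: str) -> None:
        # Record the task once the start request has been accepted.
        self._running.add(instance_id)

    def stop(self, instance_id: str) -> None:
        # discard() keeps stop idempotent for unknown or already stopped IDs.
        self._running.discard(instance_id)

    def accepts_callback(self, instance_id: str) -> bool:
        # After a successful stop, no new alerts are expected for this ID.
        return instance_id in self._running
```

A callback handler could consult accepts_callback() to flag alerts that arrive for an instanceId that was never started or has already been stopped.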

3.2 Video Stream Detection Type Query Interface Definition

3.2.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream Detection Type Query Interface |
| Interface Path | /api/v1/analysis/video-stream-detect-types |
| Request Method | GET |
| Interface Description | Retrieves the currently available video stream detection algorithm type catalog, for use by the platform before configuring or initiating tasks |

  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis and the resource name uses the plural form video-stream-detect-types.
  • The response uses a flat array structure where each element contains item and itemDesc, representing a detection type that can be directly submitted to a video stream detection task.
  • The current version returns only published detection types; unreleased, in-beta, or internal test types are not returned.

3.2.2 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": [
        {
            "item": "fire_detection",
            "itemDesc": "Fire Detection"
        },
        {
            "item": "bag_detection",
            "itemDesc": "Abandoned Bag Detection"
        },
        {
            "item": "no_helmet_detection",
            "itemDesc": "No Helmet Detection"
        }
    ]
}

3.2.3 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = query successful, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data | Array | List of currently available video stream detection types |
| data.item | String | Detection type code, used to identify a specific video stream detection capability |
| data.itemDesc | String | Human-readable description of the detection type, for frontend display or manual configuration |
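
As a minimal sketch of how a caller might consume this response, the snippet below turns the flat data array into an item-to-itemDesc mapping. The function name parse_detect_types and the inline sample body are illustrative assumptions; the HTTP transport is omitted.

```python
import json


def parse_detect_types(body: str) -> dict:
    """Return a mapping of item -> itemDesc from a query response body."""
    payload = json.loads(body)
    if payload.get("code") != 200:
        # Any non-200 business code is treated as a failed query.
        raise RuntimeError(f"query failed: {payload.get('message')}")
    return {entry["item"]: entry["itemDesc"] for entry in payload["data"]}


# Shortened sample body shaped like the response example in 3.2.2.
sample = """
{"code": 200, "message": "success", "data": [
  {"item": "fire_detection", "itemDesc": "Fire Detection"},
  {"item": "no_helmet_detection", "itemDesc": "No Helmet Detection"}
]}
"""
catalog = parse_detect_types(sample)
print(sorted(catalog))
```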

3.2.4 Currently Available Detection Types

| item | itemDesc | Description |
| --- | --- | --- |
| fire_detection | Fire Detection | Used to identify fire anomalies in video streams |
| bag_detection | Abandoned Bag Detection | Used to identify abandoned object anomalies in video streams |
| no_helmet_detection | No Helmet Detection | Used to identify human head targets not wearing safety helmets in video streams |

3.3 Video Stream Detection Start Interface Definition

3.3.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream Detection Start Interface |
| Interface Path | /api/v1/analysis/video-stream-tasks/start |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | The platform submits the video stream URL and starts the video stream detection task for the current task instance |

  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis, the resource name uses the plural form video-stream-tasks, and the action is start.
  • The instanceId in the start request is the unique identifier of the current patrol task instance; the algorithm side must return this field unchanged in subsequent alarm callbacks.
  • In the current version, one start request corresponds to one video stream detection task instance. To detect multiple video streams, a separate start request must be issued for each stream.

3.3.2 Request Example

json
{
    "requestId": "550e8400-e29b-41d4-a716-446655440000",
    "instanceId": "PATROL-INSTANCE-20260424-0001",
    "videoStreamUrl": "rtsp://192.168.1.100/live/robot-camera-main"
}

3.3.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Unique identifier for the start request, used for request tracing; UUID is recommended |
| instanceId | String | Yes | Current task instance ID; algorithm alarm callbacks and stop requests both use this field to associate the same detection task |
| videoStreamUrl | String | Yes | URL of the video stream to be detected; may be RTSP/RTMP or other mutually agreed protocol URLs, must be accessible by the algorithm service |
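
A minimal sketch of building the start request body on the platform side follows. The helper names build_start_request and build_start_requests are hypothetical; the second helper reflects the rule that each video stream needs its own start request.

```python
import uuid


def build_start_request(instance_id: str, video_stream_url: str) -> dict:
    """Build the JSON body for POST /api/v1/analysis/video-stream-tasks/start."""
    if not instance_id:
        raise ValueError("instanceId is required")
    if not video_stream_url:
        raise ValueError("videoStreamUrl is required")
    return {
        "requestId": str(uuid.uuid4()),  # fresh UUID per request for tracing
        "instanceId": instance_id,
        "videoStreamUrl": video_stream_url,
    }


def build_start_requests(instance_to_url: dict) -> list:
    # One start request per stream; a single request never carries several streams.
    return [build_start_request(i, u) for i, u in instance_to_url.items()]
```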

3.3.4 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": {
        "requestId": "550e8400-e29b-41d4-a716-446655440000",
        "instanceId": "PATROL-INSTANCE-20260424-0001",
        "rtmpPushUrl": "rtmp://10.10.10.20:1935/drtm/PATROL-INSTANCE-20260424-0001"
    }
}

3.3.5 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = start request successfully accepted, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data.requestId | String | Identical to the requestId in the request, used for request tracing |
| data.instanceId | String | Identical to the instanceId in the request, representing the accepted detection task instance |
| data.rtmpPushUrl | String | RTMP push URL returned by the algorithm side; the streamKey is fixed to the current instanceId |
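
The echo and streamKey rules in this table can be checked mechanically. The sketch below is one possible platform-side check, assuming the response has already been decoded into a dict; check_start_response is a hypothetical name, not part of the interface.

```python
from urllib.parse import urlparse


def check_start_response(request: dict, response: dict) -> str:
    """Validate a start response against its request and return rtmpPushUrl."""
    if response.get("code") != 200:
        raise RuntimeError(f"start rejected: {response.get('message')}")
    data = response.get("data") or {}
    for key in ("requestId", "instanceId"):
        if data.get(key) != request[key]:
            raise ValueError(f"{key} was not echoed unchanged")
    url = urlparse(data.get("rtmpPushUrl", ""))
    # The streamKey is fixed to instanceId, so the path must be /drtm/{instanceId}.
    expected_path = f"/drtm/{request['instanceId']}"
    if url.scheme != "rtmp" or url.path != expected_path:
        raise ValueError("rtmpPushUrl does not match rtmp://{host}:{port}/drtm/{instanceId}")
    return data["rtmpPushUrl"]
```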

3.4 Video Stream Detection Stop Interface Definition

3.4.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream Detection Stop Interface |
| Interface Path | /api/v1/analysis/video-stream-tasks/stop |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | Called by the platform when a video stream detection task ends, instructing the algorithm side to stop detection for the specified task instance |

  • The stop interface uses instanceId as the primary stop condition. If multiple control requests exist for the same task, the state of the instanceId from the most recent valid start takes precedence.
  • The platform must call the stop interface to release algorithm resources whenever a task completes, is cancelled, or is no longer needed.

3.4.2 Request Example

json
{
    "requestId": "9e0c7b2e-4f65-4c82-8f63-0bc4f9d3857f",
    "instanceId": "PATROL-INSTANCE-20260424-0001"
}

3.4.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Unique identifier for the stop request, used for request tracing; UUID is recommended |
| instanceId | String | Yes | Current task instance ID; the algorithm side uses this field to stop the corresponding detection task |

3.4.4 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": {
        "requestId": "9e0c7b2e-4f65-4c82-8f63-0bc4f9d3857f",
        "instanceId": "PATROL-INSTANCE-20260424-0001"
    }
}

3.4.5 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = stop request successfully accepted, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data.requestId | String | Identical to the requestId in the request, used for request tracing |
| data.instanceId | String | Identical to the instanceId in the request, representing the detection task instance whose stop has been accepted |

3.5 Video Stream Detection Callback Interface Definition

3.5.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream Detection Callback Interface |
| Interface Path | /api/v1/analysis/video-stream-callbacks/create |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | Upon detecting an anomaly, the algorithm side invokes the callback in batches to report video stream detection results; the callback must include the task instance ID submitted at start time |

  • The callback interface is provided by the platform side; the algorithm side configures the full callback address via reportUrl.
  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis, the resource name uses the plural form video-stream-callbacks, and write operations use POST .../create.
  • All request body / response body fields use camelCase.

3.5.2 Request Example

json
{
    "requestId": "550e8400-e29b-41d4-a716-446655440000",
    "instanceId": "PATROL-INSTANCE-20260424-0001",
    "resultsList": [
        {
            "objectId": "1212",
            "results": [
                {
                    "type": "fire_detection",
                    "value": "1",
                    "code": "2000",
                    "resImageUrl": "detect-results/result_1745318400.jpg",
                    "pos": [],
                    "conf": 1.0,
                    "desc": "fire_detection"
                },
                {
                    "type": "no_helmet_detection",
                    "value": "1",
                    "code": "2000",
                    "resImageUrl": "detect-results/result_1745318400.jpg",
                    "pos": [],
                    "conf": 1.0,
                    "desc": "no_helmet_detection"
                }
            ]
        }
    ]
}

3.5.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Start request unique identifier; recommended to match the requestId in the start interface for request tracing |
| instanceId | String | Yes | Current task instance ID; must be identical to the instanceId in the start interface to associate alarm results with the specific task instance |
| resultsList | Array | Yes | Result list; current implementation contains exactly one object |
| resultsList.objectId | String | Yes | Detection object ID; current implementation is fixed as 1212 |
| resultsList.results | Array | Yes | Detection result list; dynamically generated based on the actual detected anomaly types |
| results.type | String | Yes | Anomaly type; see "3.5.4 Detection Type Enumeration" for valid values |
| results.value | String | Yes | Detection value; current implementation is fixed as 1, indicating an anomaly was detected in this round |
| results.code | String | Yes | Result code; current implementation is fixed as 2000, strictly reusing AnalysisResultCodeEnum |
| results.resImageUrl | String | No | Alarm image path; represents the object path of the alarm image in object storage, using the S3 protocol |
| results.pos | Array | No | Detection bounding box position; current implementation returns an empty array |
| results.conf | Number | No | Confidence score; current implementation is fixed as 1.0 |
| results.desc | String | Yes | Description; current implementation matches type |
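
The required-field rules in this table can be expressed as a small validator. The following is a sketch under the assumption that the callback body has already been parsed into a dict; validate_callback and the constant names are hypothetical.

```python
REQUIRED_TOP = ("requestId", "instanceId", "resultsList")
REQUIRED_RESULT = ("type", "value", "code", "desc")
KNOWN_TYPES = {"fire_detection", "bag_detection", "no_helmet_detection"}


def validate_callback(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing {key}" for key in REQUIRED_TOP if key not in payload]
    results_list = payload.get("resultsList", [])
    if not isinstance(results_list, list):
        return errors + ["resultsList must be an array"]
    for i, obj in enumerate(results_list):
        if "objectId" not in obj:
            errors.append(f"resultsList[{i}] missing objectId")
        results = obj.get("results")
        if not isinstance(results, list):
            errors.append(f"resultsList[{i}] missing results")
            continue
        for j, item in enumerate(results):
            where = f"resultsList[{i}].results[{j}]"
            errors += [f"{where} missing {key}" for key in REQUIRED_RESULT if key not in item]
            # Optional fields (resImageUrl, pos, conf) are deliberately not checked.
            if "type" in item and item["type"] not in KNOWN_TYPES:
                errors.append(f"{where} has unsupported type {item['type']}")
    return errors
```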

3.5.4 Detection Type Enumeration

| type Value | Trigger Condition | Detection Logic |
| --- | --- | --- |
| fire_detection | Fire detected | fire_bag model cls=0 |
| bag_detection | Abandoned bag detected | fire_bag model cls=1 |
| no_helmet_detection | Human head without helmet detected | head_aqm model cls=0 with a pedestrian bounding box present |
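
The detection logic in this table amounts to a lookup from model name and class index to the reported type, with one extra condition for helmets. The sketch below is illustrative only: to_detection_type, CLASS_TO_TYPE, and the pedestrian_box_present flag are hypothetical names, and the algorithm side's real inference code is not described in this document.

```python
from typing import Optional

# (model name, class index) -> reported type, taken from the table above.
CLASS_TO_TYPE = {
    ("fire_bag", 0): "fire_detection",
    ("fire_bag", 1): "bag_detection",
    ("head_aqm", 0): "no_helmet_detection",
}


def to_detection_type(model: str, cls: int, pedestrian_box_present: bool = False) -> Optional[str]:
    """Map one model output to a callback type, or None if nothing is reported."""
    detection_type = CLASS_TO_TYPE.get((model, cls))
    # A bare head is only reported when a pedestrian bounding box is also present.
    if detection_type == "no_helmet_detection" and not pedestrian_box_present:
        return None
    return detection_type
```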

3.5.5 Response Convention

  • Upon successful processing of a callback, the platform must return HTTP 200 with a unified Result<null> response body, for example:
json
{
    "code": 200,
    "message": "success",
    "data": null
}
  • When the request body is missing required fields, contains invalid field formats, or includes unsupported enum values, it is recommended to return HTTP 400, for example:
json
{
    "code": 40001,
    "message": "Parameter validation failed",
    "data": null
}
  • When an internal exception occurs during platform processing, it is recommended to return HTTP 500, for example:
json
{
    "code": 50000,
    "message": "Internal server error",
    "data": null
}
  • The current implementation logs callback response content. Retry behavior is determined by the algorithm side according to its own retry policy.
  • The HTTP callback call timeout is fixed at 10s.
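
The three response cases above can be sketched as a framework-independent handler that returns an HTTP status together with the Result body. This is a minimal sketch: handle_callback and the process callable are hypothetical, and the 40001 and 50000 codes are copied from the examples in this section.

```python
import json
from typing import Callable, Tuple

REQUIRED_FIELDS = ("requestId", "instanceId", "resultsList")


def handle_callback(raw_body: bytes, process: Callable[[dict], None]) -> Tuple[int, dict]:
    """Return (HTTP status, Result body) for one callback request."""
    invalid = (400, {"code": 40001, "message": "Parameter validation failed", "data": None})
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON as well as undecodable bytes.
        return invalid
    if not isinstance(payload, dict) or any(f not in payload for f in REQUIRED_FIELDS):
        return invalid
    try:
        process(payload)
    except Exception:
        return 500, {"code": 50000, "message": "Internal server error", "data": None}
    return 200, {"code": 200, "message": "success", "data": None}
```

Because the callback timeout is fixed at 10s, process should return quickly; slow work is better handed to a queue, although that choice is outside this specification.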

3.5.6 Trigger Timing and Throttling Rules

  • The platform receives alarm results via callbacks and does not depend on the algorithm's internal throttling parameters.
  • The algorithm side may maintain throttling strategies as needed (e.g., alarm deduplication, time interval control), but these are internal implementation details and not part of the platform interface contract.
  • The platform integration focus is: reliably receiving callback requests triggered when an anomaly is detected, and accurately attributing alarm results to the corresponding task instance based on instanceId.

3.6 RTMP Streaming Interface Definition

3.6.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | Video Stream RTMP Streaming Interface |
| Push URL | rtmp://{host}:{port}/drtm/{instanceId} |
| Protocol | RTMP |
| Video Encoding | H.264 (libx264) |
| Container Format | FLV |
| Interface Description | Upon successful acceptance of the start request, the algorithm side returns the corresponding RTMP URL and continuously pushes an AI-annotated video stream for real-time consumption by the backend or media server |

3.6.2 Streaming Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| Resolution | 1280×720 | Hard-coded in current implementation |
| Frame Rate | 15 fps | From fps configuration |
| Encoding | H.264 (libx264) | Software encoding |
| Bitrate | 1200k | Fixed bitrate |
| Pixel Format | yuv420p | Standard compatible format |
| Preset | superfast | Encoding speed priority |
| Tune | zerolatency | Low latency |
| GOP | 10 | Keyframe interval of 10 frames |
| Container Format | FLV | Standard RTMP container format |
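
For orientation, the parameters above correspond to standard FFmpeg options. The sketch below assembles one possible command line; it assumes the pusher is FFmpeg reading raw BGR frames from standard input, which this document does not state, and build_ffmpeg_args is a hypothetical helper.

```python
def build_ffmpeg_args(push_url: str, fps: int = 15) -> list:
    """Assemble an FFmpeg argument list matching the streaming parameters."""
    return [
        "ffmpeg",
        # Input side (assumption): raw annotated frames piped on stdin.
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", "1280x720", "-r", str(fps), "-i", "-",
        # Output side: values taken from the parameter table.
        "-c:v", "libx264",
        "-b:v", "1200k",
        "-pix_fmt", "yuv420p",
        "-preset", "superfast",
        "-tune", "zerolatency",
        "-g", "10",
        "-f", "flv", push_url,
    ]
```

The resulting list could be passed to subprocess.Popen with stdin set to a pipe, with each annotated frame written to that pipe as raw bytes.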

3.6.3 Streaming Content

  • Raw video frames are output together with AI detection bounding box annotations.
  • Current detection bounding box color mapping:
| Detection Target | Annotation Color |
| --- | --- |
| Fire | (0, 255, 0) |
| Abandoned Bag | (0, 0, 255) |
| Human Head | (255, 0, 0) |
| Safety Helmet | (0, 255, 255) |

3.6.4 Receiver Requirements

  • The receiver must deploy an RTMP server (e.g., Nginx-RTMP, SRS, Red5) with the listening port configured per the actual deployment.
  • The RTMP server must accept continuous streaming to the path /drtm/{instanceId}.
  • instanceId is used directly as the streamKey for a single video stream. The algorithm side assembles the full rtmpPushUrl after a successful start and returns it to the platform.
  • For the same instanceId, the streaming lifecycle must remain consistent with the detection task lifecycle; streaming must terminate when the task stops.

3.7 Result Code Enum Reuse Specification

Note: The enumeration in this section represents business result codes for video stream alarm result items results[].code and is not equivalent to HTTP-layer status codes.

| Enum Value | Code | Applicable Description for Video Stream Detection |
| --- | --- | --- |
| SUCCESS | 2000 | All anomaly detection results reported in the current version use this code, indicating the anomaly has been successfully identified and written to results[] |

Note 1: The current version only reports alarm results when an anomaly is detected; no separate result item is defined for "no anomaly detected."

Note 2: If future versions need to report anomalous results such as frame extraction failures or analysis failures during video stream detection, existing result codes from the appendix must still be reused; adding custom codes is prohibited.

4. VLM Algorithm Interface Definition

4.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Algorithm Interface |
| Interface Composition | Start Interface + Stop Interface + HTTP Callback Interface + RTMP Live Streaming + Dynamic Configuration Update Interface |
| Start Interface Path | /api/v1/analysis/vlm-video-stream-tasks/start |
| Stop Interface Path | /api/v1/analysis/vlm-video-stream-tasks/stop |
| Configuration Update Path | /api/v1/analysis/vlm-video-stream-tasks/update |
| HTTP Callback Path | /api/v1/analysis/vlm-video-stream-callbacks/create |
| HTTP Request Method | POST |
| RTMP Push URL | Returned by the algorithm side upon successful start; format: rtmp://{host}:{port}/drtm/{instanceId} |
| Interface Description | The platform submits the video stream URL, task instance ID, and a natural language prompt. Upon successful acceptance, the algorithm side returns the RTMP push URL for the current task instance. When a target matching the prompt is identified, results are reported via the fixed callback interface, and an annotated video stream is continuously pushed. During task execution, the prompt and extension parameters can be updated online via the dynamic configuration update interface. When the task ends, the platform calls the stop interface to terminate analysis. |

  • The VLM interface's overall interaction pattern, response structure, callback response convention, and RTMP streaming specification all reuse those defined in Section 3 (Video Stream Detection Interface).
  • Compared with the Video Stream Detection Interface, the VLM start interface must additionally include a prompt field describing the analysis objective for this video stream session, e.g., "Check the video stream for a specific defect." The VLM interface also adds the dynamic configuration update path listed above.
  • To avoid mixing with rule-based video stream detection tasks, VLM interfaces use dedicated paths: vlm-video-stream-tasks and vlm-video-stream-callbacks.
  • VLM tasks also use instanceId as the primary business association key. The instanceId in start requests, stop requests, configuration update requests, and callbacks must all be identical.
  • During a running VLM task, the platform may update the prompt and other runtime parameters online via the dynamic configuration update interface without stopping and restarting the task.

4.2 VLM Analysis Start Interface Definition

4.2.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Analysis Start Interface |
| Interface Path | /api/v1/analysis/vlm-video-stream-tasks/start |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | The platform submits the video stream URL, current task instance ID, and prompt to start the VLM video stream analysis task for the current task instance |

  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis, the resource name uses the plural form vlm-video-stream-tasks, and the action is start.
  • The instanceId in the start request is the unique identifier of the current patrol task instance; the algorithm side must return this field unchanged in subsequent callbacks.
  • prompt is the core control parameter for this VLM analysis session; clear, actionable, and observable natural language descriptions are recommended to avoid ambiguity.
  • In the current version, one start request corresponds to one VLM video stream analysis task instance. To analyze multiple video streams simultaneously, a separate start request must be issued for each stream.

4.2.2 Request Example

json
{
    "requestId": "650e8400-e29b-41d4-a716-446655440000",
    "instanceId": "PATROL-INSTANCE-20260427-0001",
    "videoStreamUrl": "rtsp://192.168.1.100/live/robot-camera-main",
    "prompt": "Check the video stream for a specific defect"
}

4.2.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Unique identifier for the start request, used for request tracing; UUID is recommended |
| instanceId | String | Yes | Current task instance ID; algorithm callbacks and stop requests both use this field to associate the same analysis task |
| videoStreamUrl | String | Yes | URL of the video stream to be analyzed; may be RTSP/RTMP or other mutually agreed protocol URLs, must be accessible by the algorithm service |
| prompt | String | Yes | VLM prompt, describing the analysis objective for this video stream session, e.g., "Check the video stream for a specific defect" |
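
A minimal sketch of building the VLM start request follows; it mirrors the video stream detection start request and adds the required prompt. The helper name build_vlm_start_request is hypothetical, and trimming whitespace from the prompt is an assumption made for this sketch rather than a rule of the interface.

```python
import uuid

VLM_START_PATH = "/api/v1/analysis/vlm-video-stream-tasks/start"


def build_vlm_start_request(instance_id: str, video_stream_url: str, prompt: str) -> dict:
    """Build the JSON body for a POST to VLM_START_PATH."""
    if not instance_id or not video_stream_url:
        raise ValueError("instanceId and videoStreamUrl are required")
    if not prompt or not prompt.strip():
        # prompt is a required field for VLM tasks.
        raise ValueError("prompt is required for VLM tasks")
    return {
        "requestId": str(uuid.uuid4()),
        "instanceId": instance_id,
        "videoStreamUrl": video_stream_url,
        "prompt": prompt.strip(),  # assumption: surrounding whitespace is not meaningful
    }
```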

4.2.4 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": {
        "requestId": "650e8400-e29b-41d4-a716-446655440000",
        "instanceId": "PATROL-INSTANCE-20260427-0001",
        "rtmpPushUrl": "rtmp://10.10.10.20:1935/drtm/PATROL-INSTANCE-20260427-0001"
    }
}

4.2.5 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = start request successfully accepted, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data.requestId | String | Identical to the requestId in the request, used for request tracing |
| data.instanceId | String | Identical to the instanceId in the request, representing the accepted VLM analysis task instance |
| data.rtmpPushUrl | String | RTMP push URL returned by the algorithm side; the streamKey is fixed to the current instanceId |

4.3 VLM Analysis Stop Interface Definition

4.3.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Analysis Stop Interface |
| Interface Path | /api/v1/analysis/vlm-video-stream-tasks/stop |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | Called by the platform when a VLM video stream analysis task ends, instructing the algorithm side to stop VLM analysis for the specified task instance |

  • The stop interface uses instanceId as the primary stop condition. If multiple control requests exist for the same task, the state of the instanceId from the most recent valid start takes precedence.
  • The platform must call the stop interface to release algorithm resources whenever a task completes, is cancelled, or analysis is no longer needed.

4.3.2 Request Example

json
{
    "requestId": "7c0c7b2e-4f65-4c82-8f63-0bc4f9d3857f",
    "instanceId": "PATROL-INSTANCE-20260427-0001"
}

4.3.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Unique identifier for the stop request, used for request tracing; UUID is recommended |
| instanceId | String | Yes | Current task instance ID; the algorithm side uses this field to stop the corresponding VLM analysis task |

4.3.4 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": {
        "requestId": "7c0c7b2e-4f65-4c82-8f63-0bc4f9d3857f",
        "instanceId": "PATROL-INSTANCE-20260427-0001"
    }
}

4.3.5 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = stop request successfully accepted, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data.requestId | String | Identical to the requestId in the request, used for request tracing |
| data.instanceId | String | Identical to the instanceId in the request, representing the VLM analysis task instance whose stop has been accepted |

4.4 VLM Analysis Callback Interface Definition

4.4.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Analysis Callback Interface |
| Interface Path | /api/v1/analysis/vlm-video-stream-callbacks/create |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | Upon identifying a target matching the prompt description, the algorithm side invokes the callback in batches to report VLM video stream analysis results; the callback must include the task instance ID submitted at start time |

  • The callback interface is provided by the platform side; the algorithm side configures the full callback address via reportUrl.
  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis, the resource name uses the plural form vlm-video-stream-callbacks, and write operations use POST .../create.
  • All request body / response body fields use camelCase. Except for the path and business semantics, the overall structure is consistent with the Video Stream Detection Callback Interface.

4.4.2 Request Example

json
{
    "requestId": "650e8400-e29b-41d4-a716-446655440000",
    "instanceId": "PATROL-INSTANCE-20260427-0001",
    "resultsList": [
        {
            "objectId": "1212",
            "results": [
                {
                    "type": "vlm_detection",
                    "value": "1",
                    "code": "2000",
                    "resImageUrl": "detect-results/result_1745318400.jpg",
                    "pos": [],
                    "conf": 1.0,
                    "desc": "Check the video stream for a specific defect"
                }
            ]
        }
    ]
}

4.4.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Start request unique identifier; recommended to match the requestId in the start interface for request tracing |
| instanceId | String | Yes | Current task instance ID; must be identical to the instanceId in the start interface to associate analysis results with the specific task instance |
| resultsList | Array | Yes | Result list; current implementation contains exactly one object |
| resultsList.objectId | String | Yes | Detection object ID; current implementation is fixed as 1212 |
| resultsList.results | Array | Yes | Analysis result list; dynamically generated based on the actual prompt-matching targets identified |
| results.type | String | Yes | Result type; fixed as vlm_detection in VLM scenarios |
| results.value | String | Yes | Detection value; current implementation is fixed as 1, indicating a target matching the prompt was identified |
| results.code | String | Yes | Result code; current implementation is fixed as 2000, strictly reusing AnalysisResultCodeEnum |
| results.resImageUrl | String | No | Alarm image path; represents the object path of the alarm image in object storage, using the S3 protocol |
| results.pos | Array | No | Detection bounding box position; current implementation returns an empty array |
| results.conf | Number | No | Confidence score; current implementation is fixed as 1.0 |
| results.desc | String | Yes | Description; recommended to return the prompt from the start interface unchanged or as an equivalent normalized description |
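
As a sketch of how the platform might flatten a VLM callback into per-match records, the snippet below walks resultsList and enforces the fixed vlm_detection type. The function name extract_vlm_matches and the shape of the returned records are illustrative assumptions.

```python
def extract_vlm_matches(payload: dict) -> list:
    """Flatten a VLM callback body into one record per reported match."""
    matches = []
    for obj in payload.get("resultsList", []):
        for item in obj.get("results", []):
            # type is fixed as vlm_detection in VLM scenarios.
            if item.get("type") != "vlm_detection":
                raise ValueError(f"unexpected type: {item.get('type')}")
            matches.append({
                "instanceId": payload["instanceId"],
                "desc": item["desc"],
                "image": item.get("resImageUrl"),  # optional field, may be None
            })
    return matches
```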

4.4.4 Response Convention

  • Upon successful processing of a callback, the platform must return HTTP 200 with a unified Result<null> response body, for example:
json
{
    "code": 200,
    "message": "success",
    "data": null
}
  • When the request body is missing required fields, contains invalid field formats, or includes unsupported enum values, it is recommended to return HTTP 400, for example:
json
{
    "code": 40001,
    "message": "Parameter validation failed",
    "data": null
}
  • When an internal exception occurs during platform processing, it is recommended to return HTTP 500, for example:
json
{
    "code": 50000,
    "message": "Internal server error",
    "data": null
}
  • The current implementation logs callback response content. Retry behavior is determined by the algorithm side according to its own retry policy.
  • The HTTP callback call timeout is fixed at 10s.

4.4.5 Trigger Timing and Throttling Rules

  • The platform receives analysis results via callbacks and does not depend on the algorithm's internal throttling parameters.
  • The algorithm side may maintain throttling strategies as needed (e.g., deduplication of identical prompt results, time interval control), but these are internal implementation details and not part of the platform interface contract.
  • For platform integration, the priority is to reliably receive the callback requests triggered when a target matching the prompt description is identified, and to attribute each result to the correct task instance by its instanceId.
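
As an illustration of the algorithm-side throttling mentioned above (deduplication of identical prompt results combined with a time interval), the following is one possible sketch. It is not part of the interface contract; the class name `CallbackThrottle` and its parameters are hypothetical.

```python
from typing import Dict, Tuple


class CallbackThrottle:
    """Suppress repeated callbacks for the same (instanceId, desc) pair."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        # Maps (instanceId, desc) to the timestamp of the last reported result.
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_report(self, instance_id: str, desc: str, now: float) -> bool:
        """Return True if a callback may be sent at time `now` (in seconds)."""
        key = (instance_id, desc)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False  # identical result reported too recently
        self._last_sent[key] = now
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the logic deterministic and easy to test.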

4.5 VLM RTMP Streaming Interface Definition

4.5.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Video Stream RTMP Streaming Interface |
| Push URL | rtmp://{host}:{port}/drtm/{instanceId} |
| Protocol | RTMP |
| Video Encoding | H.264 (libx264) |
| Container Format | FLV |
| Interface Description | Upon successful acceptance of the start request, the algorithm side returns the corresponding RTMP URL and continuously pushes a VLM-annotated video stream for real-time consumption by the backend or media server |

4.5.2 Streaming Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| Resolution | 1280×720 | Hard-coded in current implementation |
| Frame Rate | 15 fps | From fps configuration |
| Encoding | H.264 (libx264) | Software encoding |
| Bitrate | 1200k | Fixed bitrate |
| Pixel Format | yuv420p | Standard compatible format |
| Preset | superfast | Encoding speed priority |
| Tune | zerolatency | Low latency |
| GOP | 10 | Keyframe interval of 10 frames |
| Container Format | FLV | Standard RTMP container format |
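
The parameters above map naturally onto an FFmpeg command line. The sketch below assumes the algorithm side pipes raw BGR frames into an `ffmpeg` process over stdin; that input arrangement and the helper name `build_ffmpeg_args` are assumptions for illustration, and only the output-side values are taken from the table.

```python
from typing import List


def build_ffmpeg_args(rtmp_push_url: str, fps: int = 15) -> List[str]:
    """Build an ffmpeg argument list matching the streaming parameters table."""
    return [
        "ffmpeg",
        # Input side (assumption): raw BGR frames read from stdin.
        "-f", "rawvideo",
        "-pix_fmt", "bgr24",
        "-s", "1280x720",
        "-r", str(fps),
        "-i", "-",
        # Output side: values from the streaming parameters table.
        "-c:v", "libx264",
        "-b:v", "1200k",
        "-pix_fmt", "yuv420p",
        "-preset", "superfast",
        "-tune", "zerolatency",
        "-g", "10",
        "-f", "flv",
        rtmp_push_url,
    ]
```

The resulting list can be passed to `subprocess.Popen(args, stdin=subprocess.PIPE)`, with each annotated frame written to the process's stdin.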

4.5.3 Streaming Content

  • Raw video frames are output together with VLM analysis annotations.
  • Annotation content is determined by the targets matching the prompt and the algorithm implementation. It is recommended to display the matching bounding box, prompt summary, and relevant confidence information in the frame.

4.5.4 Receiver Requirements

  • The receiver must deploy an RTMP server (e.g., Nginx-RTMP, SRS, Red5) with the listening port configured per the actual deployment.
  • The RTMP server must accept continuous streaming to the path /drtm/{instanceId}.
  • instanceId is used directly as the streamKey for a single video stream. The algorithm side assembles the full rtmpPushUrl after a successful start and returns it to the platform.
  • For the same instanceId, the streaming lifecycle must remain consistent with the VLM analysis task lifecycle; streaming must terminate when the task stops.
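
The URL convention above, with instanceId acting as the streamKey, can be sketched as a pair of helpers. The function names are hypothetical, and the host and port values are deployment-specific.

```python
from urllib.parse import urlparse


def build_rtmp_push_url(host: str, port: int, instance_id: str) -> str:
    """Assemble rtmp://{host}:{port}/drtm/{instanceId}."""
    return f"rtmp://{host}:{port}/drtm/{instance_id}"


def stream_key_from_url(url: str) -> str:
    """Extract the instanceId (streamKey) from a push URL, validating its shape."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "rtmp" or len(parts) != 2 or parts[0] != "drtm" or not parts[1]:
        raise ValueError(f"not a valid /drtm/{{instanceId}} push URL: {url}")
    return parts[1]
```

A receiver could use the second helper to map an incoming stream back to its task instance.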

4.6 Result Code Enum Reuse Specification

Note: The enumeration in this section represents business result codes for VLM analysis result items results[].code and is not equivalent to HTTP-layer status codes.

| Enum Value | Code | Applicable Description for VLM Scenarios |
| --- | --- | --- |
| SUCCESS | 2000 | All VLM prompt-matching results reported in the current version use this code, indicating the anomaly or item of interest has been successfully identified and written to results[] |

Note 1: The current version only reports results when a target matching the prompt description is detected; no separate result item is defined for "prompt not matched."

Note 2: If future versions need to report anomalous results such as frame extraction failures or analysis failures during VLM analysis, existing result codes from the appendix must still be reused; adding custom codes is prohibited.

4.7 VLM Dynamic Configuration Update Interface Definition

4.7.1 Interface Basic Information

| Configuration | Details |
| --- | --- |
| Interface Name | VLM Dynamic Configuration Update Interface |
| Interface Path | /api/v1/analysis/vlm-video-stream-tasks/update |
| Request Method | POST |
| Content-Type | application/json |
| Interface Description | During a running VLM task, the platform may use this interface to update the prompt or extension configuration parameters for the current task instance online. The algorithm side applies the update immediately upon receipt without stopping and restarting the task. |

  • The path follows API design conventions: /api/v1/{module}/..., where {module}=analysis, the resource name uses the plural form vlm-video-stream-tasks, and the action is update.
  • The instanceId in the update request must be the ID of a currently running VLM task instance. The algorithm side verifies that the task exists and is in an updatable state before returning an acceptance result.
  • The current version only supports calling this interface while a task is running. Calling it when the task is stopped or not yet started will return a failed acceptance response.
  • The update operation does not affect the current RTMP streaming link; the streamKey and push URL remain the same as when the task was started.

4.7.2 Request Example

json
{
    "requestId": "8d1e7b2e-4f65-4c82-8f63-0bc4f9d3857f",
    "instanceId": "PATROL-INSTANCE-20260427-0001",
    "prompt": "Updated prompt description, e.g., check if smoke appears in the video stream",
    "report_interval": 5
}
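
A platform-side client might build this request as follows. This is a sketch using only the Python standard library; the helper name `build_update_request` and the base URL used in the usage note are hypothetical, while the path and headers come from the interface definition.

```python
import json
import urllib.request
from typing import Any, Dict

UPDATE_PATH = "/api/v1/analysis/vlm-video-stream-tasks/update"


def build_update_request(base_url: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request carrying the update body as JSON."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + UPDATE_PATH,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req, timeout=...)`, for example against a hypothetical algorithm service at `http://algo.example:8080`, with the timeout chosen by the caller.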

4.7.3 Field Description

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | String | Yes | Unique identifier for the update request, used for request tracing; UUID is recommended |
| instanceId | String | Yes | Current task instance ID; must be identical to the instanceId in the start interface; the algorithm side uses this field to locate the running VLM analysis task |
| prompt | String | No | Updated VLM prompt, used to replace the analysis objective description of the currently running task; if not provided, the prompt is not updated |
| report_interval | Integer | No | Reporting interval (seconds), controlling the minimum time interval between VLM analysis result callbacks; if not provided, the current effective value is retained |

Note: At least one of prompt or report_interval must be provided; otherwise the request is treated as an invalid update request.
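
The field rules above, including the "at least one of prompt or report_interval" constraint, can be checked with a small validator. This is a sketch: the function name is hypothetical, and rejecting negative intervals is an assumption not stated in the field table.

```python
from typing import Any, Dict, List


def validate_update_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors: List[str] = []
    for field in ("requestId", "instanceId"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} is required and must be a non-empty string")

    prompt = body.get("prompt")
    interval = body.get("report_interval")
    if prompt is None and interval is None:
        errors.append("at least one of prompt or report_interval must be provided")
    if prompt is not None and not isinstance(prompt, str):
        errors.append("prompt must be a string")
    if interval is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(interval, bool) or not isinstance(interval, int):
            errors.append("report_interval must be an integer")
        elif interval < 0:  # assumption: negative intervals are rejected
            errors.append("report_interval must not be negative")
    return errors
```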

4.7.4 Response Example

json
{
    "code": 200,
    "message": "success",
    "data": {
        "requestId": "8d1e7b2e-4f65-4c82-8f63-0bc4f9d3857f",
        "instanceId": "PATROL-INSTANCE-20260427-0001",
        "report_interval": 5
    }
}

4.7.5 Response Field Description

| Field | Type | Description |
| --- | --- | --- |
| code | Integer | Unified business status code; 200 = configuration update successfully accepted, 500 on failure |
| message | String | Unified response description; fixed as success on success, error details on failure |
| data.requestId | String | Identical to the requestId in the request, used for request tracing |
| data.instanceId | String | Identical to the instanceId in the request, representing the VLM analysis task instance whose configuration update has been accepted |
| data.report_interval | Integer | The reporting interval (seconds) in effect after this update; echoes the request value if one was provided, otherwise the current effective value |

4.7.6 Effective Timing and Notes

  • The algorithm side must apply the new configuration immediately upon receiving a valid update request. After a prompt update, subsequent video frame analysis proceeds using the new prompt.
  • Callback results generated before the configuration update are not affected. For new callback results generated after the update, the desc field should reflect the updated prompt content.
  • When the update interface is called multiple times for the same task instance, the most recently successfully accepted configuration takes precedence.
  • If the task has already stopped or the instanceId does not exist at the time of the update, the algorithm side must return a failed acceptance response (code=500) with a clear error reason in the message field.
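
The rules above (apply immediately, last accepted update wins, fail with code=500 for unknown or stopped tasks) can be sketched with an in-memory registry. The class `VlmTaskRegistry`, its methods, and the `data: None` shape of the failure response are assumptions for illustration only.

```python
from typing import Any, Dict


class VlmTaskRegistry:
    """Tracks the effective configuration of running VLM tasks by instanceId."""

    def __init__(self) -> None:
        self._running: Dict[str, Dict[str, Any]] = {}

    def start(self, instance_id: str, prompt: str, report_interval: int) -> None:
        self._running[instance_id] = {"prompt": prompt, "report_interval": report_interval}

    def stop(self, instance_id: str) -> None:
        self._running.pop(instance_id, None)

    def effective_config(self, instance_id: str) -> Dict[str, Any]:
        return dict(self._running[instance_id])

    def update(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Apply an update request and return the unified response body."""
        config = self._running.get(request.get("instanceId"))
        if config is None:
            return {"code": 500, "message": "task is not running or instanceId does not exist", "data": None}
        prompt = request.get("prompt")
        interval = request.get("report_interval")
        if prompt is None and interval is None:
            return {"code": 500, "message": "prompt or report_interval must be provided", "data": None}
        # Omitted fields keep their current effective value.
        if prompt is not None:
            config["prompt"] = prompt
        if interval is not None:
            config["report_interval"] = interval
        return {
            "code": 200,
            "message": "success",
            "data": {
                "requestId": request.get("requestId"),
                "instanceId": request["instanceId"],
                "report_interval": config["report_interval"],
            },
        }
```

Because each accepted update overwrites the stored values in place, the most recently accepted configuration is always the one used for subsequent frames.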

5. Appendix: Complete Algorithm Result Code Enumeration

| Enum Value | Code | Description |
| --- | --- | --- |
| SUCCESS | 2000 | Success |
| GET_IMAGE_ERROR | 2001 | Failed to obtain image data |
| ANALYSIS_IMAGE_ERROR | 2002 | Error occurred during algorithm analysis |
| IMAGE_NO_METER | 2003 | No meter found in the image |
| IMAGE_NO_TARGET_DEVICE | 2004 | No corresponding state recognition device found in the image |
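
For integrators working in Python, the enumeration above could be mirrored as follows. This is an illustrative sketch rather than the platform's AnalysisResultCodeEnum itself; codes are stored as strings because results.code is typed as String, and the `from_code` helper is a hypothetical addition.

```python
from enum import Enum


class AnalysisResultCode(Enum):
    """Mirror of the result code table: (code, description)."""

    SUCCESS = ("2000", "Success")
    GET_IMAGE_ERROR = ("2001", "Failed to obtain image data")
    ANALYSIS_IMAGE_ERROR = ("2002", "Error occurred during algorithm analysis")
    IMAGE_NO_METER = ("2003", "No meter found in the image")
    IMAGE_NO_TARGET_DEVICE = ("2004", "No corresponding state recognition device found in the image")

    def __init__(self, code: str, description: str) -> None:
        self.code = code
        self.description = description

    @classmethod
    def from_code(cls, code) -> "AnalysisResultCode":
        """Look up a member by its code; raises ValueError for unknown codes."""
        for member in cls:
            if member.code == str(code):
                return member
        raise ValueError(f"unknown result code: {code}")
```

Raising on unknown codes reflects the rule that custom codes outside this table are not permitted.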