DeepStack

Object Detector, Face Recognition

DeepStack is a free, open-source, self-hosted AI server that provides object detection and face recognition, among other functions.
It is highly optimized and runs on a plethora of devices and platforms.
Below is quoted from DeepStack's documentation:

DeepStack is an AI server that empowers every developer in the world to easily build state-of-the-art AI systems both on premise and in the cloud. The promises of Artificial Intelligence are huge but becoming a machine learning engineer is hard. DeepStack is device and language agnostic. You can run it on Windows, Mac OS, Linux, Raspberry PI and use it with any programming language.
DeepStack's source code is available on GitHub via https://github.com/johnolafenwa/DeepStack
DeepStack is developed and maintained by DeepQuest AI

Configuration

Configuration example
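
A minimal sketch of how the options documented below fit together, assuming a DeepStack server reachable at the placeholder hostname deepstack on port 5000 and a camera already configured as camera_one. The nesting is inferred from the option reference that follows, so adjust it to your own setup.

```yaml
deepstack:
  host: deepstack      # placeholder: IP or hostname of your DeepStack server
  port: 5000           # placeholder: must be between 1024 and 49151
  object_detector:
    cameras:
      camera_one:      # must match the camera_identifier of a configured camera
        fps: 1
        labels:
          - label: person
            confidence: 0.8
            trigger_event_recording: true
```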

deepstack map required

DeepStack configuration.

host string required

IP or hostname to your DeepStack server.

port integer required

Port to your DeepStack server.

Lowest value: 1024

Highest value: 49151

api_key string (optional)

API key to your DeepStack server, if you have one set.

timeout integer (optional, default: 10)

Timeout for requests to your DeepStack server.
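
If your server is protected by an API key or responds slowly, the connection options above might be combined as sketched here. The address, key, and timeout value are made-up placeholders.

```yaml
deepstack:
  host: 192.168.1.10       # placeholder address
  port: 5000               # placeholder port within the allowed 1024-49151 range
  api_key: my_secret_key   # hypothetical key; omit if none is set on the server
  timeout: 20              # overrides the default of 10
```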

object_detector map (optional)

Object detector domain config.

cameras map required

Camera-specific configuration. All subordinate keys correspond to the camera_identifier of a configured camera.

<CAMERA_IDENTIFIER> map required

Camera identifier. Valid characters are lowercase a-z, numbers and underscores.

fps float (optional, default: 1)

The FPS at which the object detector runs.
Higher values will result in more scanning, which uses more resources.

Lowest value: 0

scan_on_motion_only boolean (optional, default: true)

When set to true and a motion_detector is configured, the object detector will only scan while motion is detected.

labels list (optional)

A list of labels (objects) to track.

label string required

The label to track.

confidence float (optional, default: 0.8)

Lowest confidence allowed for detected objects. The lower the value, the more sensitive the detector will be, and the risk of false positives will increase.

Lowest value: 0

Highest value: 1

height_min float (optional, default: 0)

Minimum height allowed for detected objects, relative to stream height.

Lowest value: 0

Highest value: 1

height_max float (optional, default: 1)

Maximum height allowed for detected objects, relative to stream height.

Lowest value: 0

Highest value: 1

width_min float (optional, default: 0)

Minimum width allowed for detected objects, relative to stream width.

Lowest value: 0

Highest value: 1

width_max float (optional, default: 1)

Maximum width allowed for detected objects, relative to stream width.

Lowest value: 0

Highest value: 1

trigger_recorder boolean deprecated

DEPRECATED. Use trigger_event_recording instead.

If set to true, objects matching this filter will start the recorder.

trigger_event_recording boolean (optional, default: true)

If set to true, objects matching this filter will trigger an event recording.

store boolean (optional, default: true)

If set to true, objects matching this filter will be stored in the database, as well as having a snapshot saved. Labels with trigger_event_recording set to true will always be stored when a recording starts, regardless of this setting.

store_interval integer (optional, default: 60)

The interval at which the label should be stored in the database, in seconds. If set to 0, the label will be stored every time it is detected.

require_motion boolean (optional, default: false)

If set to true, the recorder will stop as soon as motion is no longer detected, even if the object still is. This is useful to avoid never-ending recordings of stationary objects, such as a car on a driveway.
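
To illustrate how the recording and storage options of a label can be combined, the following sketch tracks two labels with different behavior. The label choices and the interval are illustrative only, and host and port are omitted for brevity.

```yaml
deepstack:
  object_detector:
    cameras:
      camera_one:
        labels:
          # Record cars, but stop once motion ends so that a parked car
          # does not keep the recorder running.
          - label: car
            confidence: 0.8
            trigger_event_recording: true
            require_motion: true
          # Store dog detections without triggering a recording,
          # using a 300 second store interval.
          - label: dog
            trigger_event_recording: false
            store: true
            store_interval: 300
```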

max_frame_age float (optional, default: 2)

Drop frames that are older than the given number. Specified in seconds.

Lowest value: 0

log_all_objects boolean (optional, default: false)

When set to true and loglevel is DEBUG, all found objects will be logged, including the ones not tracked by labels.

mask list (optional)

A mask is used to exclude certain areas in the image from object detection.

coordinates list required

List of X and Y coordinates to form a polygon.

Minimum items: 3

x integer required

X-coordinate (horizontal axis).

y integer required

Y-coordinate (vertical axis).

zones list (optional)

Zones are used to define areas in the camera's field of view where you want to look for certain objects (labels).

name string required

Name of the zone. Has to be unique per camera.

coordinates list required

List of X and Y coordinates to form a polygon.

Minimum items: 3

x integer required

X-coordinate (horizontal axis).

y integer required

Y-coordinate (vertical axis).

labels list (optional)

A list of labels (objects) to track.

label string required

The label to track.

confidence float (optional, default: 0.8)

Lowest confidence allowed for detected objects. The lower the value, the more sensitive the detector will be, and the risk of false positives will increase.

Lowest value: 0

Highest value: 1

height_min float (optional, default: 0)

Minimum height allowed for detected objects, relative to stream height.

Lowest value: 0

Highest value: 1

height_max float (optional, default: 1)

Maximum height allowed for detected objects, relative to stream height.

Lowest value: 0

Highest value: 1

width_min float (optional, default: 0)

Minimum width allowed for detected objects, relative to stream width.

Lowest value: 0

Highest value: 1

width_max float (optional, default: 1)

Maximum width allowed for detected objects, relative to stream width.

Lowest value: 0

Highest value: 1

trigger_recorder boolean deprecated

DEPRECATED. Use trigger_event_recording instead.

If set to true, objects matching this filter will start the recorder.

trigger_event_recording boolean (optional, default: true)

If set to true, objects matching this filter will trigger an event recording.

store boolean (optional, default: true)

If set to true, objects matching this filter will be stored in the database, as well as having a snapshot saved. Labels with trigger_event_recording set to true will always be stored when a recording starts, regardless of this setting.

store_interval integer (optional, default: 60)

The interval at which the label should be stored in the database, in seconds. If set to 0, the label will be stored every time it is detected.

require_motion boolean (optional, default: false)

If set to true, the recorder will stop as soon as motion is no longer detected, even if the object still is. This is useful to avoid never-ending recordings of stationary objects, such as a car on a driveway.

image_width integer (optional)

Frames will be resized to this width before inference to save computing power.

image_height integer (optional)

Frames will be resized to this height before inference to save computing power.

custom_model string (optional)

Name of a custom DeepStack model. See DeepStack's documentation on custom models for more information.
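
As a hedged sketch, the three options above might be set like this. Their placement as siblings of cameras is inferred from the order of this reference, the resolution is arbitrary, and my_custom_model and my_custom_label are hypothetical names.

```yaml
deepstack:
  object_detector:
    image_width: 640                # frames are resized to this width before inference
    image_height: 360               # frames are resized to this height before inference
    custom_model: my_custom_model   # hypothetical model name
    cameras:
      camera_one:
        labels:
          - label: my_custom_label  # hypothetical label provided by the custom model
```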

face_recognition map (optional)

Face recognition domain config.

cameras map required

Camera-specific configuration. All subordinate keys correspond to the camera_identifier of a configured camera.

<CAMERA_IDENTIFIER> map required

Camera identifier. Valid characters are lowercase a-z, numbers and underscores.

labels list (optional)

A list of labels that when detected will be sent to the post processor. Applies only to this specific camera.

mask list (optional)

A mask is used to exclude certain areas in the image from post processing.

coordinates list required

List of X and Y coordinates to form a polygon.

Minimum items: 3

x integer required

X-coordinate (horizontal axis).

y integer required

Y-coordinate (vertical axis).

labels list (optional)

A list of labels that when detected will be sent to the post processor. Applies to all cameras defined under cameras.

face_recognition_path string (optional, default: /config/face_recognition/faces)

Path to folder which contains subdirectories with images for each face to track.

save_unknown_faces boolean (optional, default: true)

If set to true, any unrecognized faces will be stored in the database, as well as having a snapshot saved. You can then move this image to the folder of the correct person to improve accuracy.

unknown_faces_path string deprecated

DEPRECATED. Config option 'unknown_faces_path' is deprecated and will be removed in a future version.

Path to folder where unknown faces will be stored.

expire_after float (optional, default: 5)

Time in seconds before a detected face is no longer considered detected.

Lowest value: 0

save_faces boolean (optional, default: true)

If set to true, detected faces will be stored in the database, as well as having a snapshot saved.

train boolean (optional, default: true)

Train DeepStack to recognize faces on Viseron start. Disable this when you have a good model trained.

min_confidence float (optional, default: 0.8)

Minimum confidence for a face to be considered a match.

Lowest value: 0

Highest value: 1
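
A sketch of how the face recognition options above could be tuned once a model has been trained. The chosen values are illustrative rather than recommendations, and host and port are omitted for brevity.

```yaml
deepstack:
  face_recognition:
    face_recognition_path: /config/face_recognition/faces   # the default path
    save_unknown_faces: true
    expire_after: 10       # seconds; the default is 5
    min_confidence: 0.9    # stricter than the default of 0.8
    train: false           # skip training on start once the model is good
    cameras:
      camera_one:
        labels:
          - person         # applies only to camera_one
```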

Object detector

An object detector scans an image to identify multiple objects and their position.

tip

Object detectors can be taxing on the system, so it is wise to combine them with a motion detector.
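
One way this pairing might look is sketched below. The mog2 component and its motion_detector domain are assumptions about a separate Viseron component that is not described on this page, so check that component's own documentation before relying on it.

```yaml
mog2:                        # assumed motion detector component
  motion_detector:
    cameras:
      camera_one:
        fps: 1

deepstack:
  object_detector:
    cameras:
      camera_one:
        fps: 1
        scan_on_motion_only: true   # the default; scan only while motion is detected
```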

Labels

Labels are used to tell Viseron what objects to look for and keep recordings of. The available labels depend on which detection model you are using.

The max/min width/height is used to filter out any unreasonably large/small objects to reduce false positives.
Objects can also be filtered out with the use of an optional mask.
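
As an illustration of the size filters, the sketch below discards person detections that are implausibly small or large. All values are relative to the stream dimensions; the thresholds are arbitrary examples.

```yaml
deepstack:
  object_detector:
    cameras:
      camera_one:
        labels:
          - label: person
            confidence: 0.8
            height_min: 0.1   # discard objects shorter than 10% of the stream height
            height_max: 0.9   # discard objects taller than 90% of the stream height
            width_min: 0.05   # discard objects narrower than 5% of the stream width
            width_max: 0.5    # discard objects wider than 50% of the stream width
```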

tip

These are the labels available in the default DeepStack model:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop_sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.

Zones

Zones are used to define areas in the camera's field of view where you want to look for certain objects (labels).
Say you have a camera facing the sidewalk and have labels set up to record the label person.
This would cause Viseron to start recording people who are walking past the camera on the sidewalk. Not ideal.
To remedy this, you define a zone which covers only the area that you are actually interested in, excluding the sidewalk.

deepstack:
  object_detector:
    cameras:
      camera_one:
        # ...
        zones:
          - name: sidewalk
            coordinates:
              - x: 522
                y: 11
              - x: 729
                y: 275
              - x: 333
                y: 603
              - x: 171
                y: 97
            labels:
              - label: person
                confidence: 0.8
                trigger_event_recording: true

tip

See Mask for how to get the coordinates for a zone.

Mask

Masks are used to exclude certain areas in the image from object detection. If a detected object has its lower portion inside of the mask it will be discarded.

The coordinates form a polygon around the masked area.
To easily generate coordinates you can use a tool like image-map.net.
Just upload an image from your camera, choose the Poly shape and start drawing your mask.
Then click Show me the code! and adapt it to the config format.
Coordinates coords="522,11,729,275,333,603,171,97" should be turned into this:

deepstack:
  object_detector:
    cameras:
      camera_one:
        # ...
        mask:
          - coordinates:
              - x: 522
                y: 11
              - x: 729
                y: 275
              - x: 333
                y: 603
              - x: 171
                y: 97
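
Since mask is a list, more than one polygon can presumably be masked per camera. In the sketch below, the first polygon reuses the coordinates from above (each consecutive pair in the coords string is one x/y point), while the second polygon is a made-up rectangle.

```yaml
deepstack:
  object_detector:
    cameras:
      camera_one:
        mask:
          # From coords="522,11,729,275,333,603,171,97"
          - coordinates:
              - x: 522
                y: 11
              - x: 729
                y: 275
              - x: 333
                y: 603
              - x: 171
                y: 97
          # Hypothetical second polygon covering the top-left corner
          - coordinates:
              - x: 0
                y: 0
              - x: 200
                y: 0
              - x: 200
                y: 150
              - x: 0
                y: 150
```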


Face recognition

Face recognition runs as a post processor when a specific object is detected.

Labels

Labels are used to tell Viseron when to run a post processor.

Any label configured under the object_detector for your camera can be added to the post processor's labels section.

note

Only objects that are tracked by an object_detector can be sent to a post_processor. The object also has to pass all of its filters (confidence, height, width etc).
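
The relationship between the two label sections can be sketched as follows. Here only person is forwarded to face recognition, even though dog is also tracked. The empty map for camera_one is an assumption about how a camera without specific overrides can be written, and host and port are omitted for brevity.

```yaml
deepstack:
  object_detector:
    cameras:
      camera_one:
        labels:
          - label: person   # tracked, and forwarded to face recognition below
            confidence: 0.8
          - label: dog      # tracked, but never sent to the post processor
  face_recognition:
    cameras:
      camera_one: {}        # assumed: empty map, no camera-specific settings
    labels:
      - person              # applies to all cameras defined under cameras
```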

Train

On startup, images are read from face_recognition_path and a model is trained to recognize these faces.
The folder structure of the faces folder is very strict. Here is an example of the default one:

/config
└── face_recognition
    └── faces
        ├── person1
        │   ├── image_of_person1_1.jpg
        │   ├── image_of_person1_2.png
        │   └── image_of_person1_3.jpg
        └── person2
            ├── image_of_person2_1.jpeg
            └── image_of_person2_2.jpg

warning

You need to follow this folder structure, otherwise training will not be possible.

Troubleshooting

To enable debug logging for deepstack, add the following to your config.yaml:

/config/config.yaml
logger:
  logs:
    viseron.components.deepstack: debug
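
When investigating why an object is not picked up, debug logging might be combined with log_all_objects, which logs every found object, including those not tracked by labels. This is a sketch; placing log_all_objects at the camera level is inferred from the option reference above.

```yaml
deepstack:
  object_detector:
    cameras:
      camera_one:
        log_all_objects: true   # only takes effect when loglevel is DEBUG

logger:
  logs:
    viseron.components.deepstack: debug
```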
