Vehicle model recognition performs best when vehicles move towards the camera and the front of each vehicle is visible in the video. Ideally, position your cameras above the lane(s) of traffic being monitored rather than at the roadside; cameras placed directly above the traffic capture images in which vehicles approach head-on, which improves recognition.
Usually, three to five training images are sufficient to train each model. You should supply images from several angles (for example, head-on, 15 degrees left/right of center, and 30 degrees left/right of center). It can also help to include one set of smaller images and one set of larger images, so that the model covers vehicles detected at different distances.
The training images that you use to train a single model should be different from each other, such that each image adds new information to the model. Adding many almost identical images to a single model is unlikely to improve accuracy and increases the time required for processing.
You can use Media Server to obtain training images. Run vehicle make and model recognition on a sample video, crop the video frames to the region identified by the vehicle model analysis engine, and encode the cropped images using the image encoder. Media Server includes an example configuration, configurations/examples/VehicleModel/OutputModelPatches.cfg, that performs these steps. You can then use a selection of the encoded images for training.
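To give an idea of how such a session fits together, the sketch below outlines one possible configuration, assuming the usual INI-style Media Server session layout (an ingest engine, number plate and vehicle model analysis engines, a crop transformation, and an image encoder). The engine names, track names, and parameter values shown here are illustrative assumptions only, not a copy of the shipped file; refer to configurations/examples/VehicleModel/OutputModelPatches.cfg for the exact settings.

// Illustrative sketch only: engine, track, and parameter names are assumptions.
// The shipped OutputModelPatches.cfg contains the authoritative configuration.
[Session]
Engine0 = Ingest
Engine1 = ANPR
Engine2 = VehicleModel
Engine3 = CropVehicle
Engine4 = WriteImages

// Read the sample video.
[Ingest]
Type = Video

// Number plate detection locates each vehicle in the frame.
[ANPR]
Type = NumberPlate

// Vehicle model analysis identifies the vehicle region and its make/model.
[VehicleModel]
Type = VehicleModel
Input = ANPR.DataWithSource

// Crop each frame to the region reported by the vehicle model analysis engine.
[CropVehicle]
Type = Crop
Input = VehicleModel.ResultWithSource

// Encode the cropped regions to image files to use as training candidates.
// The output path and filename pattern are placeholders.
[WriteImages]
Type = ImageEncoder
ImageInput = CropVehicle.Output
OutputPath = ./training/vehicle_%segment.sequence%.png

Running a session like this over a representative sample video produces one cropped image per detected vehicle, from which you can select a varied subset for training.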
In some cases, you might need to produce the training images manually. Media Server might not crop frames successfully for some types of vehicle, such as trucks and other large vehicles. If you are training Media Server to recognize a brand-new model and the prototype vehicle does not have a number plate, Media Server cannot produce training images, because it uses the number plate to identify the position of the vehicle in the image.