Obtain Images for Training

Vehicle model recognition performs best when vehicles are moving towards the camera and the front of the vehicle is visible in the video. Ideally, position your cameras directly above the lane(s) of traffic being monitored, and not at the roadside, so that they capture images where the vehicles approach head-on.

Usually three to five training images are sufficient to train each model. You should supply images from several angles (for example, head-on, 15 degrees left/right of center, and 30 degrees left/right of center). It can help to add a set of images at a smaller size and a set at a larger size, in case vehicles are detected at different distances.
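As a rough illustration of preparing the smaller and larger image sets, the following Python sketch computes target dimensions for each set while preserving the aspect ratio. The function name scaled_sizes and the scale factors 0.5 and 1.5 are assumptions chosen for the example, not values recommended by Media Server. The resizing itself would be done in an image editing tool or library of your choice.

```python
from typing import List, Sequence, Tuple


def scaled_sizes(width: int, height: int,
                 factors: Sequence[float] = (0.5, 1.0, 1.5)) -> List[Tuple[int, int]]:
    """Return (width, height) pairs for each scale factor.

    The factors are illustrative assumptions; choose values that match the
    range of distances at which vehicles appear in your own video.
    """
    sizes = []
    for factor in factors:
        # Scale both dimensions by the same factor to keep the aspect ratio,
        # and never let a dimension fall below one pixel.
        sizes.append((max(1, round(width * factor)),
                      max(1, round(height * factor))))
    return sizes


if __name__ == "__main__":
    for w, h in scaled_sizes(640, 480):
        print(f"{w}x{h}")
```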

The training images that you use to train a single model should differ from each other, so that each image adds new information to the model. Adding many almost identical images to a single model is unlikely to improve accuracy and increases processing time.
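One way to screen a candidate set for almost identical images before training is a simple perceptual hash. The Python sketch below is not part of Media Server. It uses an average hash over grayscale pixel grids and flags pairs whose hashes differ in only a few bits. The function names, the 8x8 grid, and the distance threshold of 5 are assumptions for illustration, and the images are assumed to be already decoded into rows of 0-255 grayscale values.

```python
from typing import Dict, List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Block-average the image to a size x size grid, then set one bit per
    cell that is brighter than the mean of all cells."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(images: Dict[str, Gray],
                    max_distance: int = 5) -> List[Tuple[str, str]]:
    """Return pairs of image names whose hashes differ by at most
    max_distance bits (an illustrative threshold, not a tuned value)."""
    hashes = {name: average_hash(p) for name, p in images.items()}
    names = sorted(hashes)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if hamming(hashes[a], hashes[b]) <= max_distance]
```

Pairs that this check reports are candidates for manual review. A low hash distance suggests, but does not prove, that the second image adds little new information.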

You can supply an image that shows an entire vehicle, but only if you specify the country of origin for the vehicle's number plate. An image that contains multiple vehicles is accepted only if one of the number plates covers more than double the image area of the others. The vehicle with the number plate that covers the largest proportion of the image is used to train Media Server and the others are ignored.
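The multi-vehicle rule above can be sketched as a pre-check that you run on your own plate measurements before submitting an image. This Python example is not Media Server code. The function name dominant_plate is hypothetical, and it assumes that "more than double the image area of the others" means more than double the area of every other plate individually.

```python
from typing import List, Optional, Tuple

# (width, height) of each number plate in the image, in pixels.
PlateBox = Tuple[int, int]


def dominant_plate(plates: List[PlateBox], ratio: float = 2.0) -> Optional[int]:
    """Return the index of the plate whose area is more than `ratio` times
    the area of every other plate, or None if no plate dominates.

    Assumption: the comparison is made against each other plate separately.
    """
    if not plates:
        return None
    areas = [w * h for w, h in plates]
    best = max(range(len(areas)), key=areas.__getitem__)
    for i, area in enumerate(areas):
        if i != best and areas[best] <= ratio * area:
            return None  # another plate is too close in size
    return best


if __name__ == "__main__":
    # One large plate and one distant plate: the first dominates.
    print(dominant_plate([(200, 50), (60, 15)]))
    # Two plates of similar size: the image would likely be rejected.
    print(dominant_plate([(100, 25), (90, 25)]))
```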

If you do not want to specify the country of origin for the number plate, you must supply training images that are cropped and show only a vehicle's number plate and the region around the grille. The Media Server package includes an example configuration file that can produce cropped training images from a video: configurations/examples/VehicleModel/OutputModelPatches.cfg. This configuration runs vehicle recognition, crops the video frames to the relevant region, and encodes the cropped images using the image encoder.

In some cases you might need to produce the training images manually. Media Server might not crop frames successfully for some types of vehicles, including larger vehicles such as trucks. If you are training Media Server to recognize a brand-new model and the prototype vehicle does not have a number plate, Media Server cannot produce training images automatically, because it uses the number plate to identify the position of the vehicle in the image.