Computer Vision Workshop - Project Report


Real-Time Traffic Camera Analysis


Group members: Leron Fliess, Yifat Chernihov, Stav Ashuri

Workshop Instructor: Dr. Lior Wolf
Workshop TA: Mr. Assaf Zaritsky


Our goal is to create an efficient traffic analysis application that, given a live feed from a static traffic camera, analyzes traffic congestion and outputs relevant statistics.
For this workshop, we plan on using a live feed from one of the “Ayalon” cameras, available online.

Our working environment is Visual Studio and Cygwin on Windows 7, plus some additional tools that we indicate throughout this report.

The main stages of our work:

1. Data collection

2. Samples creation

3. Haartraining

4. Simple frame manipulation

5. Advanced frame manipulation

6. Finding blobs

7. Tracking and recognition

8. Performance analysis






First step: Data collection
In this step we collected the following data:




Positive samples: these are images that contain only objects of interest, in our case cars.
We used a live feed from the “Ayalon” cameras; the camera we worked with is the one that films the Herzliya Interchange. To capture frames that contain cars, we ran the video stream in VLC player and captured frames using its "take snapshot" option. To crop images manually from the captured frames, we used "ImageClipper", a multi-platform tool we downloaded, which enabled us to mark a car object and save it in PNG format (OpenCV does not support GIF).

In total, we created 834 positive samples.










Negative samples: these are background images that do not contain objects of interest, and are needed in order to train the haarcascade classifier.
We used 1655 negative samples (the number of negative samples should be about twice the number of positive samples). Our negative samples consist of 50 frames of empty road (containing no cars) and 1605 photos of arbitrary backgrounds, which we collected from the database that can be found here:  

We used about twice as many negative samples as positive samples, because the program has to cope with image blocks without cars much more often than with blocks that contain cars.








Using Photoshop, we estimated the relation between a car's location in the frame and its height in pixels, for future reference.


In order to perform the next steps we used:


Second step: Samples Creation

In this step we created training samples from the cropped images we created in the first step.

To do so, we generated a description file using the UNIX "find" command. We supplied this file, along with the name of the vector file we wanted to create, to the cvCreateTestSamples function that OpenCV supplies (from the precompiled directory we downloaded), in order to get a vec file of positive samples.
The vec file starts with a 12-byte header that holds, among other fields, the number of positive samples and the length of each sample in pixels. The samples themselves then follow one by one, 2 bytes per pixel, with one zero byte before each sample.
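
The vec layout described above can be illustrated with a small round-trip sketch. It is not the OpenCV implementation: it assumes little-endian byte order and assumes that the 12 header bytes split into two 4-byte integers (sample count, sample length) followed by two unused 2-byte fields, since the report only states the total header size. The names write_vec and read_vec are hypothetical.

```python
import io
import struct

# Assumed header layout: count, sample length in pixels, two unused shorts (12 bytes).
HEADER = struct.Struct("<iihh")


def write_vec(stream, samples):
    """Write equally sized samples (lists of pixel values) in the described layout."""
    length = len(samples[0]) if samples else 0
    stream.write(HEADER.pack(len(samples), length, 0, 0))
    for sample in samples:
        stream.write(b"\x00")  # one zero byte before each sample
        stream.write(struct.pack("<%dh" % length, *sample))  # 2 bytes per pixel


def read_vec(stream):
    """Read the header and return (count, length, samples)."""
    count, length, _, _ = HEADER.unpack(stream.read(HEADER.size))
    samples = []
    for _ in range(count):
        stream.read(1)  # skip the zero byte that precedes the sample
        raw = stream.read(2 * length)
        samples.append(list(struct.unpack("<%dh" % length, raw)))
    return count, length, samples


# Round trip with two tiny 4-pixel "images"; a 19x21 sample would have 399 pixels.
buffer = io.BytesIO()
write_vec(buffer, [[0, 64, 128, 255], [10, 20, 30, 40]])
buffer.seek(0)
count, length, samples = read_vec(buffer)
print(count, length, samples)
```

Reading a file back this way is mainly useful as a sanity check that the number and size of the samples match what was intended before starting a long training run.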

The negative description file is a list of the names of all the negative samples, and is located in the same directory as the samples.
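
As a rough illustration of such a list file, the sketch below writes one image file name per line into a text file stored next to the samples. In the report this kind of file was produced with the UNIX "find" command; the Python version here is only a stand-in, and the function name, the list file name and the accepted extensions are all assumptions.

```python
import os
import tempfile


def write_negative_list(sample_dir, list_name="negatives.txt",
                        extensions=(".png", ".jpg", ".jpeg", ".bmp")):
    """Write one image file name per line into a list file inside sample_dir."""
    names = sorted(
        name for name in os.listdir(sample_dir)
        if name.lower().endswith(extensions)
    )
    list_path = os.path.join(sample_dir, list_name)
    with open(list_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names


# Demo on a throwaway directory with empty placeholder files (hypothetical names).
demo_dir = tempfile.mkdtemp()
for name in ("road_001.png", "background_17.jpg", "notes.doc"):
    open(os.path.join(demo_dir, name), "w").close()
listed = write_negative_list(demo_dir)
print(listed)
```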


Third step: Haar Training

In this step we generated an XML file as the result of the haartraining process.
We ran the haartraining command on the directory that contains the negative samples collected in the first step, the negative description file, and the positive vec file generated in the second step. The process was controlled by the following parameters:

Directory name in which the trained classifier is stored.
Vector file name (the file created in the second step).
The negative description file.
number_of_positive_samples for each classifier stage. We used 834.
number_of_negative_samples for each classifier stage. We used 1655.
number_of_stages to be trained. We used 20, as recommended online.
number_of_splits (nsplits), which sets the complexity of the simple (weak) classifiers that make up each stage. Regardless of this value, the training process keeps adding simple classifiers to a classifier stage until the quality requirements for the stage (maximal false alarm rate and minimal hit rate) are met. We used 2, which indicates that a CART classifier with number_of_splits nodes is used.
Processing memory in MB. The default is 200 MB; we used 1024.
The -nonsym flag, which indicates that the object class does not have vertical symmetry.
Minimal desired hit rate for each stage classifier. The overall hit rate may be estimated as (min_hit_rate^number_of_stages). We used 0.9999.
Maximal desired false alarm rate for each stage classifier. The overall false alarm rate is estimated as (max_false_alarm_rate^number_of_stages). We used the value of 0.5.

Whether and how much weight trimming should be used. We used the default value of 0.95.
The difference between the number of positive and negative images.
The type of Haar feature set used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. We used the ALL mode.
sample_width in pixels. We used 19.
sample_height in pixels. We used 21.
Which AdaBoost algorithm to use: Real AB, Gentle AB, etc. We used the default.
misclass (default) | gini | entropy. We used the default.
Maximal number of splits in the tree cascade.
Minimal number of positive samples per cluster.
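
To get a feel for the values chosen above, the per-stage rates can be compounded over the 20 stages. This is only a back-of-the-envelope sketch: it assumes that the false alarm rate compounds across stages in the same way as the hit rate estimate quoted in the parameter list, and the function name is hypothetical.

```python
def overall_rate(per_stage_rate, number_of_stages):
    """Estimate a cascade-wide rate as per_stage_rate ** number_of_stages."""
    return per_stage_rate ** number_of_stages


stages = 20
overall_hit_rate = overall_rate(0.9999, stages)
overall_false_alarm_rate = overall_rate(0.5, stages)

print("estimated overall hit rate: %.4f" % overall_hit_rate)
print("estimated overall false alarm rate: %.2e" % overall_false_alarm_rate)
```

With these settings the estimated overall hit rate stays close to 1, while the estimated overall false alarm rate drops to roughly one in a million, which is why a per-stage false alarm rate as high as 0.5 can still be acceptable.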

Fourth step: simple frame manipulation

Our algorithm captures the video frame by frame and transforms each frame into a data type that OpenCV understands.
The next step was to define a Region of Interest (ROI) within the frame: the area in which all the "action" occurs. The size of the ROI directly affects the processing time of the frame.


Mid-term results






We scanned each frame's ROI with a sliding window, which grows with each scan until it reaches a maximal size. We ran the haar object detect on each window, using the trained classifier, to see whether it contains a car.
Once an object is detected, it is immediately drawn on screen with a surrounding circle.
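
The scanning pattern described above can be sketched as a generator of window positions. This is only an illustration of the idea, not the detector's actual scanning code: the growth factor, the step size and the function name are assumptions, and only the 19x21 base size comes from the training parameters. In a real pipeline each window would be handed to the Haar classifier.

```python
def growing_windows(roi_w, roi_h, base_w=19, base_h=21,
                    scale=1.2, max_h=84, step_ratio=0.25):
    """Yield (x, y, w, h) windows; the window grows by `scale` after each full scan."""
    factor = 1.0
    while True:
        w, h = int(base_w * factor), int(base_h * factor)
        if h > max_h or w > roi_w or h > roi_h:
            break  # the window reached its maximal size or no longer fits the ROI
        step = max(1, int(w * step_ratio))
        for y in range(0, roi_h - h + 1, step):
            for x in range(0, roi_w - w + 1, step):
                yield x, y, w, h
        factor *= scale


# Small demo: a 40x42 ROI scanned at two window sizes.
windows = list(growing_windows(40, 42, scale=2.0, max_h=42))
print(len(windows), windows[0], windows[-1])
```

The number of windows grows quickly with the ROI size, which is one way to see why the ROI size affects the per-frame processing time so directly.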

The average processing time using these methods was 448.336 ms per frame.
This covered only car detection, with no tracking.

Fifth step: advanced frame manipulation

With detection working well, we decided to put more effort into accelerating the process.
The main idea was to quickly distinguish between static background and moving objects. The easiest method we thought of was comparing each frame with the previous one. We wrote code that takes the last two frames as matrices, converts them to grayscale, calculates the difference between them and stores the result in a third matrix, which is also a grayscale representation. We then turn it into a pure binary matrix, named "changedbits", by applying a threshold to it. This produced the following result:

Since the trees are moving, we needed a way to filter out what we called "dynamic background": background which contains small movement most of the time.
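
The basic frame differencing step (before any filtering of the dynamic background) can be sketched in a few lines. This is a plain-Python illustration on nested lists rather than the OpenCV matrices used in the project; the threshold value, the simple channel-mean grayscale conversion and the function names are assumptions.

```python
def to_gray(frame):
    """Convert rows of (r, g, b) tuples to rows of gray levels (simple channel mean)."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in frame]


def changed_bits(previous, current, threshold=30):
    """Return a binary matrix: 1 where the gray level changed by more than threshold."""
    prev_gray, cur_gray = to_gray(previous), to_gray(current)
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev_gray, cur_gray)
    ]


# Demo: a bright "object" moves one pixel to the right between two 2x3 frames.
dark, bright = (10, 10, 10), (200, 200, 200)
frame_a = [[dark, dark, dark], [dark, bright, dark]]
frame_b = [[dark, dark, dark], [dark, dark, bright]]
diff = changed_bits(frame_a, frame_b)
print(diff)
```

Both the pixel the object left and the pixel it entered are marked as changed, which is the expected behavior of a two-frame difference.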
After trying many different methods to replace the one we just explained, we found that the best way was simply to average the first few "changedbits" matrices (we found that averaging the first 30 frames produces a good result). After averaging the first 30 frames of the video, we build a "mask" matrix, which contains '0' only in places where there was change throughout most of the first 30 frames.
Now all we have to do is multiply this mask with any "changedbits" matrix we get, and the result is an improved "changedbits" matrix, clean of "dynamic background".
This produced great results that we felt we could work with, at very little processing cost.
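
A minimal sketch of the masking idea follows, again on nested lists instead of OpenCV matrices. It assumes that "most of the frames" means an average above 0.5; that cutoff and the function names are assumptions, not values taken from the project.

```python
def build_mask(changed_history, majority=0.5):
    """Return a mask with 0 where a pixel changed in most of the given matrices, else 1."""
    count = len(changed_history)
    rows, cols = len(changed_history[0]), len(changed_history[0][0])
    mask = []
    for r in range(rows):
        mask_row = []
        for c in range(cols):
            average = sum(matrix[r][c] for matrix in changed_history) / count
            mask_row.append(0 if average > majority else 1)
        mask.append(mask_row)
    return mask


def apply_mask(changed, mask):
    """Multiply a changedbits matrix by the mask, element by element."""
    return [[b * m for b, m in zip(c_row, m_row)] for c_row, m_row in zip(changed, mask)]


# Demo with 30 one-row matrices: pixel 0 always changes (a moving tree),
# pixel 1 never changes, pixel 2 changes in only 3 of the 30 frames.
history = [[[1, 0, 1 if i < 3 else 0]] for i in range(30)]
mask = build_mask(history)
cleaned = apply_mask([[1, 1, 1]], mask)
print(mask, cleaned)
```

Because the mask is computed once and applying it is a single element-wise multiplication, the per-frame cost of this filtering stays very small.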

Sixth step: finding blobs

The next goal was to use "changedbits" to reduce the area we scan with haar objectdetect, so that it only scans for cars in areas that might contain a moving object.
We used cvblobslib, which contains functions that handle amorphous blobs.
We took the SimpleBlobDetector::findBlobs function and made some changes to it to fit our needs: finding white blobs within a certain range of sizes in a frame.
This gave us good results, but it also introduced problems: in some frames we detect several small blobs inside a single car.
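
The idea of finding white blobs within a size range can be illustrated with a simple connected-component search. This is a stand-in for the modified library function, not a reproduction of it: it uses a breadth-first flood fill with 4-connectivity, and the function name and area limits are assumptions.

```python
from collections import deque


def find_blobs(binary, min_area, max_area):
    """Return (center_x, center_y, area) for 4-connected regions of 1s within the area range."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood-fill the region that contains (r0, c0).
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(pixels)
            if min_area <= area <= max_area:
                center_x = sum(c for _, c in pixels) / area
                center_y = sum(r for r, _ in pixels) / area
                blobs.append((center_x, center_y, area))
    return blobs


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
blobs = find_blobs(image, min_area=2, max_area=10)
print(blobs)
```

The single isolated pixel is dropped by the minimum-area filter. A car whose changed pixels are not all connected would still come out as several blobs, which matches the problem described above.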

A good result:

A bad result:


We get a pretty close detection of cars, less accurate than with haardetect, but up to 10 times faster.
Our next goal is to filter blobs that refer to the same car, to track the car and know when it leaves the screen, and to use haardetect to make sure the object we are tracking is indeed a car, so we can count them accurately.

Seventh step: tracking and recognition

To reach the goals mentioned in the previous step, we wrote the "blob" and "blob_manager" classes.

The class "blobManager" handles tracking and recognition of cars. This class holds the current location of cars and candidates which could be cars. It operates in this manner:

Blobdetect returns a list of the blobs found in the current frame; this list contains the center point of each blob.
For each blob we found, we run blobManager's "testAndAddCandidate", which, given a specific blob, checks whether it should be added to the list of candidates.

We defined some conditions that a blob has to fulfill in order to be added to the list of candidates:
First, if no other candidate exists, the blob becomes a valid candidate.
Otherwise, we check whether the candidate closest to the blob is far enough away for the blob to be considered a different car. If it is not, the blob might be an updated location of that candidate; we determined criteria for that as well.

Using this class, we overcame the unwanted result of detecting a few candidates inside one car.
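
The candidate test described above can be sketched as follows. This is a simplified illustration and not the project's class: it uses a single distance threshold as the only criterion, and the threshold value, the dictionary fields and the Python naming are all assumptions.

```python
import math


class BlobManager:
    """Keeps a list of candidates and merges blobs that are close to an existing one."""

    def __init__(self, min_separation=40.0):
        self.min_separation = min_separation  # hypothetical threshold, in pixels
        self.candidates = []

    def test_and_add_candidate(self, center, frame_index):
        """Update the closest candidate if the blob is near it, else add a new candidate."""
        if self.candidates:
            def distance(candidate):
                dx = candidate["center"][0] - center[0]
                dy = candidate["center"][1] - center[1]
                return math.hypot(dx, dy)

            closest = min(self.candidates, key=distance)
            if distance(closest) < self.min_separation:
                # Close enough: treat the blob as an updated location of that candidate.
                closest["center"] = center
                closest["last_seen"] = frame_index
                return closest
        candidate = {"center": center, "last_seen": frame_index, "is_car": False}
        self.candidates.append(candidate)
        return candidate


manager = BlobManager(min_separation=40.0)
manager.test_and_add_candidate((100, 100), frame_index=0)  # first blob: new candidate
manager.test_and_add_candidate((110, 105), frame_index=1)  # nearby: location update
manager.test_and_add_candidate((300, 120), frame_index=1)  # far away: new candidate
print(len(manager.candidates), manager.candidates[0]["center"])
```

Merging nearby blobs into one candidate is what prevents several small blobs inside one car from being treated as several cars.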

Once every frame, we iterate over all the candidates that have not yet been recognized as cars, and use haardetect to scan the small area around each of them for a car. If a car is recognized inside this small area, we mark the candidate as a car and continue tracking it, but we do not need to run haardetect on it again. Thus, once a candidate has been recognized as a car, we do not waste any CPU time on recognizing it again.

To count the number of cars that actually passed the camera, we check when the last location update of each candidate was made. If the candidate has not been seen for a predefined number of frames, and was recognized as a car at some point, we consider it a car that has left the frame, and count it.
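
The counting rule can be sketched as a small function over a list of candidates. The field names and the number of frames are assumptions, and so is the choice to silently drop stale candidates that were never recognized as cars, which the description above does not specify.

```python
def count_departed_cars(candidates, current_frame, max_missing_frames=10):
    """Return (remaining candidates, number of departed candidates counted as cars)."""
    remaining, counted = [], 0
    for candidate in candidates:
        if current_frame - candidate["last_seen"] > max_missing_frames:
            # Not seen for too long: count it only if it was recognized as a car.
            if candidate["is_car"]:
                counted += 1
        else:
            remaining.append(candidate)
    return remaining, counted


tracked = [
    {"last_seen": 5, "is_car": True},    # left the frame, was a car: counted
    {"last_seen": 6, "is_car": False},   # left the frame, never recognized: dropped
    {"last_seen": 28, "is_car": True},   # still visible: kept
]
remaining, counted = count_departed_cars(tracked, current_frame=30)
print(len(remaining), counted)
```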











Performance comparison between new and old versions

Average processing time – old version vs. new version

In the second part of the workshop, our main goal was to take the car detection mechanism and speed it up significantly, to the point where we could run it on a real-time video feed. To achieve this at 25 fps, our goal was to keep the processing time at no more than 1/25 of a second = 0.04 seconds = 40 milliseconds per frame.
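
To put this target in perspective, the per-frame budget can be compared with the mid-term average of 448.336 ms per frame reported earlier for detection only. The short calculation below is just that arithmetic; the function and variable names are hypothetical.

```python
def frame_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds, for real-time processing."""
    return 1000.0 / frames_per_second


budget_ms = frame_budget_ms(25)
old_average_ms = 448.336  # mid-term average from this report (detection only)
required_speedup = old_average_ms / budget_ms

print("budget per frame: %.1f ms" % budget_ms)
print("required speedup: %.1fx" % required_speedup)
```

In other words, the mid-term version needed to become roughly 11 times faster before it could keep up with a 25 fps feed.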

As can be seen in the following graph, the performance increased drastically.

This increase in performance was achieved due to the following reasons:


The relatively small price to pay for these drastic improvements is the cost of the background differencing and blob detection mechanisms. For instance, the "BackDiff" algorithm, which we wrote ourselves, performs only very lightweight manipulations on the video.

ROI impact on the algorithm performance

The following graph shows the relative time taken for each task in the car counting algorithm, in two different scenarios:













Comparison of the percentage of time in which a car is tracked: old version vs. new version

Another performance enhancement is the percentage of time in which the car is tracked. The old version had to run the recognition algorithm on each frame and naturally had some misses. The new algorithm locks onto a car and tracks it from the moment it is recognized until it leaves the screen.