Accelerating the calculation of optical flow fields

This project was created for the 2011 parallel computation workshop.

logo
Workshop Instructor: Prof. Lior Wolf
Workshop Assistant: Assaf Zaritsky
Algorithm by: Bruhn, A., Weickert, J., Feddern, C., Kohlberger, T., and Schnörr, C.
Original implementation by: Adam Harmat
Submitted by: Anat Stolarsky and Eli Levy Karin

Introduction

A frequent goal in computer vision is the detection of a relative movement between two subsequent frames in a sequence. Such differences are commonly known as the “optical flow” associated with the sequence. Algorithms which are used for optical flow calculation are useful for a variety of applications, including motion detection, items tracking and obstacle avoidance. Classic optical flow algorithms are computational heavy, and thus lead to very long run times when implemented. These long run times may be problematic, since many applications require real-time calculation in order to be useful. In this workshop we present a few methods for accelerating optical flow calculation which take advantage of different abilities to compute in parallel. The base algorithm and implementation are described in the paper above.

Project Goal

Accelerating the base implementation by using Intel's Paralel Studio tools.

Hotspot Analysis

Analysis of the algorithm described in Bruhn et al 2003 suggested several points which may be calculated in parallel:

1. Pre-calculation of the temporal and spacial derivatives fx, fy and ft to be done simultaneously: as derivatives are independent of each other, and this should cause no change in the outcome.

2. Calculating several Gauss Seidel iterations in parallel: this may result in some changes in convergence. In the original algorithm V and U values in each iteration are calculated based on the most updated values from the previous iteration. This may not be the case if several iterations are calculated in parallel. For this reason, we will present a qualitative measure to examine the change in the results.

3. Calculating several pixels in parallel: this may also result in some changes in convergence as the original algorithm makes use of the most updated values when calculating each pixel.

This analysis of the paper was compared to the Parallel Studio 2011 hotspot analysis and sections 2 and 3 of the above list have been confirmed to be actual hotspots in the code:

hotspot

Implementation

In order to accelerate the existing implementation, we took advantage of some of Intel's powerful tools. Two code packages were used for multithreading attempts – Intel TBB and Intel Cilk+. A third attempt at acceleration included utilizing Intel's SIMD (Single Instruction Multiple Data) instruction set, which allow simultaneous multiple floating point calculations. This resulted in 7 new code versions, each utilizing different parallelization tools. We'll present the original code and 2 of the new versions here:

1. the original code: This a direct and non parallel implementation of the algorithm presented in Bruhn et al 2003.

2. parallel_guass_seidel_iteration: This version attempts to parallelize Gauss Seidel iterations as suggested in item 2 of the list of hotspots. This version makes use of Intel's TBB "parallel_for" which allows the execution of a function at each iteration in parallel.

3. simd_pixels_parallel_gauss_seidel_iteration_no_frame: This code only goes over pixels which are not in the frame (frame = first row, last row, first column and 1-4 last columns) and the gauss seidel step is preformed inside the gauss_seidel_iteration loop and not by a function. Additionally, this code includes an attempt to parallelize Gauss Seidel iterations as suggested in item 2 of the list of hotspots and an attempt to parallelize the calculations of pixels as suggested in item 3 of the list of hotspots. This version makes use of Intel's TBB "parallel_for" and of SSE intrinsic commands.

Results and Analysis

In order to check the acceleration attempts, we tested all implementations on 3 types of videos. We've captured 100 frames of each video for the analysis. In order for the testing to be comprehensive, 3 video resolutions were tested:

Video 1 Video 2 Video 3
250 X 250 pixels 320 X 240 pixels 480 X 720 pixels

We measured the average and standard deviation of the time it took to calculate the optical flow field per frame and the results for each version of the algorithm will be presented below. We also present the main findings of the amplifier analysis for each version of the algorithm.

As the original code is serial, we realized it is important to examine the quality of the calculations by each version. In order to do that, the calculated optic fields were exported to data files and later analyzed using Matlab codes. Statistical measures were calculated over 30 frames for each implemented version. To give a good estimation of the quality of the optic filed calculated by each implemented version we chose 3 measuring methods and we'll present 2 of them here:

1. A visual method - We calculated the artificial subsequent frame by adding the calculated optical flow to the base frame and then interpolated to match the pixel grid. This artificially generated frame was compared to the actual subsequent frame.

2. A statistical measure – For each implemented version, we compared the artificially calculated subsequent frames with actual capture of subsequent frames from the video. This was done be calculating the difference between the current version's calculated artificial subsequent frame and the actual captured subsequent frame. This result was normalized by dividing the result of the absolute difference by the pixel values of the captured frame. For each algorithm, we present a graph of the average percent error values versus the average percent error values of the original algorithm.

Results

1. the original code:

Time measures:
Video 1 Video 2 Video 3
Average time to calculate the optical flow field for a pair of frames in seconds 0.230304 0.270447 1.185091
Std 0.005011 0.010725 0.033643

Main findings - amplifier analysis:
orig_amp_b orig_amp_a

Real image vs. interpolated image:
orig_inter

2. parallel_guass_seidel_iteration:

Time measures:
Video 1 Video 2 Video 3
Average time to calculate the optical flow field for a pair of frames in seconds 0.196407 0.228475 1.040521
Std 0.008101 0.018093 0.087035

Main findings - amplifier analysis:
para_amp_b para_amp_a

Real image vs. interpolated image:
para_inter

Statistical analysis:
para_stat
original average error: 3.76%, std: 0.3400
parallel_guass_seidel_iteration average error: 4.23%, std: 0.3403

3. simd_pixels_parallel_gauss_seidel_iteration_no_frame:

Time measures:
Video 1 Video 2 Video 3
Average time to calculate the optical flow field for a pair of frames in seconds 0.139754 0.159727 0.684208
Std 0.004751 0.005708 0.025375

Main findings - amplifier analysis:
simd_p_amp_b simd_p_amp_a

Real image vs. interpolated image:
simd_p_inter

Statistical analysis:
simd_p_stat
original average error: 3.76%, std: 0.3400
simd_pixels_parallel_gauss_seidel_iteration_no_frame error: 4.23%, std: 0.4629

Analysis

In parallel_guass_seidel_iteration we managed to reduce the run time by 14.72%, 15.52% and by 12.20% for Video 1, Video 2 and Video 3 respectively (when the run time of the original code is considered 100%). Both the visual assessment and the statistical assessments show the quality of the results is quite similar to the original algorithm

In simd_pixels_parallel_gauss_seidel_iteration_no_frame we managed to reduce the run time by 39.32%, 40.94% and 42.27% for Video 1, Video 2 and Video 3 respectively (when the run time of the original code is considered 100%). This improvement was achieved by the combination of the SIMD commands and the parallelizing iterations together with the fact that this algorithm works on a little less pixels and performs the gauss seidel step inside the loop and not in a function. Both the visual assessment and the statistical assessment show the quality of the results is quite similar to the original algorithm though the average percent different is slightly bigger than in the previous versions (1.29% vs. 1%). This is probably due to the face some pixels are not calculated

Download code versions