Workshop Instructor:
Prof. Lior Wolf

Workshop Assistant:
Assaf Zaritsky

Algorithm by:
Bruhn, A., Weickert, J., Feddern, C., Kohlberger, T., and Schnörr, C.

Original implementation by:
Adam Harmat

Submitted by:
Anat Stolarsky and Eli Levy Karin

A frequent goal in computer vision is detecting the relative movement between two subsequent frames in a sequence. These differences are commonly known as the "optical flow" of the sequence. Optical flow algorithms are useful for a variety of applications, including motion detection, object tracking and obstacle avoidance. Classic optical flow algorithms are computationally heavy, and thus lead to very long run times when implemented. These long run times may be problematic, since many applications require real-time calculation to be useful. In this workshop we present a few methods for accelerating optical flow calculation that take advantage of different parallel-computation capabilities. The base algorithm and implementation are described in the paper above.

Accelerating the base implementation using Intel's Parallel Studio tools.

Analysis of the algorithm described in Bruhn et al. 2003 suggested several points that may be calculated in parallel:

**1. Pre-calculating the temporal and spatial derivatives fx, fy and ft simultaneously:** the derivatives are independent of each other, so this should cause no change in the outcome.

**2. Calculating several Gauss-Seidel iterations in parallel:** this may change the convergence behaviour. In the original algorithm, the U and V values in each iteration are calculated from the most up-to-date values of the previous iteration; this may no longer hold when several iterations are calculated in parallel. For this reason, we present a qualitative measure to examine the change in the results.

**3. Calculating several pixels in parallel:** this may also change the convergence behaviour, as the original algorithm uses the most up-to-date values when calculating each pixel.

This analysis of the paper was compared to the Parallel Studio 2011 hotspot analysis, and items 2 and 3 of the list above were confirmed to be actual hotspots in the code.

To accelerate the existing implementation, we took advantage of some of Intel's powerful tools. Two libraries were used for the multithreading attempts: Intel TBB and Intel Cilk Plus. A third acceleration attempt utilized Intel's SIMD (Single Instruction, Multiple Data) instruction set, which allows multiple floating-point calculations to be performed simultaneously. This resulted in 7 new code versions, each utilizing different parallelization tools. We present the original code and 2 of the new versions here:

**1. the original code:** This is a direct, non-parallel implementation of the algorithm presented in Bruhn et al. 2003.

**2. parallel_guass_seidel_iteration:** This version attempts to parallelize the Gauss-Seidel iterations as suggested in item 2 of the hotspot list. It uses Intel TBB's `parallel_for`, which allows the function applied at each iteration to be executed in parallel.

**3. simd_pixels_parallel_gauss_seidel_iteration_no_frame:** This version only processes pixels that are not in the frame border (the first row, the last row, the first column and the last 1-4 columns), and the Gauss-Seidel step is performed inside the gauss_seidel_iteration loop rather than in a separate function. In addition, this version attempts to parallelize the Gauss-Seidel iterations as suggested in item 2 of the hotspot list and the per-pixel calculations as suggested in item 3. It uses Intel TBB's `parallel_for` together with SSE intrinsic commands.

To check the acceleration attempts, we tested all implementations on 3 types of videos, capturing 100 frames of each video for the analysis. To make the testing comprehensive, 3 video resolutions were tested:

| Video 1 | Video 2 | Video 3 |
|---|---|---|
| 250 × 250 pixels | 320 × 240 pixels | 480 × 720 pixels |

We measured the average and standard deviation of the time it took to calculate the optical flow field per frame; the results for each version of the algorithm are presented below, together with the main findings of the Amplifier analysis for each version.

Since the original code is serial, we realized it is important to examine the quality of the calculations produced by each version. To do so, the calculated optical flow fields were exported to data files and later analyzed using MATLAB scripts. Statistical measures were calculated over 30 frames for each implemented version. To give a good estimate of the quality of the optical flow field calculated by each implemented version, we chose 3 measuring methods; we present 2 of them here:

**1. A visual method:** We calculated an artificial subsequent frame by adding the calculated optical flow to the base frame and then interpolating to match the pixel grid. This artificially generated frame was compared to the actual subsequent frame.

**2. A statistical measure:** For each implemented version, we compared the artificially calculated subsequent frames with the actual subsequent frames captured from the video. This was done by calculating the absolute difference between the version's artificial subsequent frame and the actual captured subsequent frame, normalized by dividing by the pixel values of the captured frame.
For each algorithm, we present a graph of the average percent error values versus the average percent error values of the original algorithm.

**1. the original code:**

| | Video 1 | Video 2 | Video 3 |
|---|---|---|---|
| Average time to calculate the optical flow field for a pair of frames (seconds) | 0.230304 | 0.270447 | 1.185091 |
| Std | 0.005011 | 0.010725 | 0.033643 |

Main findings - amplifier analysis:

Real image vs. interpolated image:

**2. parallel_guass_seidel_iteration:**

| | Video 1 | Video 2 | Video 3 |
|---|---|---|---|
| Average time to calculate the optical flow field for a pair of frames (seconds) | 0.196407 | 0.228475 | 1.040521 |
| Std | 0.008101 | 0.018093 | 0.087035 |

Main findings - amplifier analysis:

Real image vs. interpolated image:

Statistical analysis:

original average error: 3.76%, std: 0.3400

parallel_guass_seidel_iteration average error: 4.23%, std: 0.3403

**3. simd_pixels_parallel_gauss_seidel_iteration_no_frame:**

| | Video 1 | Video 2 | Video 3 |
|---|---|---|---|
| Average time to calculate the optical flow field for a pair of frames (seconds) | 0.139754 | 0.159727 | 0.684208 |
| Std | 0.004751 | 0.005708 | 0.025375 |

Main findings - amplifier analysis:

Real image vs. interpolated image:

Statistical analysis:

original average error: 3.76%, std: 0.3400

simd_pixels_parallel_gauss_seidel_iteration_no_frame average error: 4.23%, std: 0.4629

In **parallel_guass_seidel_iteration** we managed to **reduce** the run time by **14.72%**, **15.52%** and **12.20%** for Video 1, Video 2 and Video 3 respectively (taking the run time of the original code as 100%).
Both the visual assessment and the statistical assessment show that the quality of the results is quite similar to that of the original algorithm.

In **simd_pixels_parallel_gauss_seidel_iteration_no_frame** we managed to **reduce** the run time by **39.32%**, **40.94%** and **42.27%** for Video 1, Video 2 and Video 3 respectively (taking the run time of the original code as 100%). This improvement was achieved by combining the SIMD commands with the parallelized iterations, together with the fact that this version processes slightly fewer pixels and performs the Gauss-Seidel step inside the loop rather than in a function.
Both the visual assessment and the statistical assessment show that the quality of the results is quite similar to that of the original algorithm, though the average percent difference is slightly bigger than in the previous versions (1.29% vs. 1%). This is probably because some pixels are not calculated.