Abstract:
End-to-end trained Convolutional Neural Network (CNN) have signi cantly advanced
the eld of computer vision in recent years, particularly high-level vision problems,
because of its strong non-linear tting ability. In context of optical
ow, obtaining
dense, ground truth per-pixel for real scenes is di cult and thus rarely available. But
CNN in recent years demonstrated that dense optical
ow estimation can be cast as a
learning problem. However, the state of the art with regard to the quality of the
ow
has still been de ned by traditional methods. In this thesis, rstly, we used a compact
but e ective CNN model, called U-Net, which contains an encoder part and a decoder
part and used benchmark datasets: MPI-Sintel, KITTI and Middlebury; for training
and evaluation, in a supervised manner. Secondly, we used some traditional energy-
based loss function for dense optical
ow estimation. Thirdly, we used backward
warping with bilinear interpolation to predict rst image and build occlusion mask
using ground truth
ow. Experimental results show that our proposed method is at
par with state-of-the-art supervised CNN methods.