Abstract:
Virtual Try-On (VTON) is an Augmented Reality (AR) application that allows a user to try a product before actually buying it. In this work, we explore VTON approaches in the clothing domain. Because 3D model-based VTON approaches are computationally expensive, researchers have focused on the image-based VTON problem. Most existing approaches are data-driven learning systems constrained to clean clothing images as the reference clothing input. To relax the requirement of separate clothing images, this work addresses a more challenging version of the problem in which the reference clothing is worn by a human model. We thus attempt to solve the model-to-person try-on problem, where the goal is to transfer the clothing from the model to the target person. An approach widely explored in existing work is to employ a deep neural network to predict the parameters of a thin plate spline (TPS) transform, a warping function, to align the source clothing with the body shape and pose of the target person. However, the human body undergoes only restricted deformation because of its specific organization of limbs and muscles. Our first work takes a step in this direction: we employ correspondences between the structural key points of humans and clothes on the model and the target person to compute the parameters of the TPS transform. Explicitly considering structural constraints in the form of landmarks helps us compute more accurate target warps, which results in a better Fréchet Inception Distance (FID) score than the state-of-the-art. However, landmark points are few in number, which makes it difficult to preserve the fine details of the target warp. We therefore explore correspondences between rich body-shape and pose representations, i.e., the dense poses of the model and the person.
We employ a geometric matching network (a popular convolutional neural network for feature matching and transformation estimation) to learn features from the dense poses of the model and the person, and use correlation-based matching to predict the parameters of the TPS transform. While TPS is a well-explored transformation function for clothes warping, it has inherent properties that limit its applicability in this problem domain. Human arms can move in a variety of ways, but the second-order difference constraint of TPS restricts the bending of the warping grid. Hence, TPS falls short in modeling cases where the source or the target person is posing with their arms folded or bent. We address this with a hand-crafted, feature-based warping technique that considers human landmarks as well as limb correspondences to compute the target warp. Extensive experiments demonstrate the advantage of our approach over the state-of-the-art.
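To make the recurring TPS warp concrete: given corresponding control points (for example, clothing landmarks) on the model and the target person, a 2-D thin plate spline is fitted by solving one linear system for radial-basis weights plus an affine part, and the fitted spline then maps any image coordinate to its warped position. The following is a minimal NumPy sketch of standard TPS fitting under that textbook formulation, not the actual pipeline described above; the function names and the choice of control points are illustrative.

```python
import numpy as np

def _U(r):
    # TPS radial basis U(r) = r^2 * log(r), with U(0) = 0 by convention.
    return np.where(r > 0, r**2 * np.log(np.where(r > 0, r, 1.0)), 0.0)

def tps_fit(src, dst):
    """Fit a 2-D TPS mapping src control points (n, 2) onto dst (n, 2).

    Returns (w, a): radial weights (n, 2) and affine coefficients (3, 2),
    obtained by solving the standard interpolation system
    [[K, P], [P^T, 0]] [w; a] = [dst; 0].
    """
    n = src.shape[0]
    K = _U(np.linalg.norm(src[:, None] - src[None, :], axis=-1))  # (n, n)
    P = np.hstack([np.ones((n, 1)), src])                         # (n, 3)
    L = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    Y = np.vstack([dst, np.zeros((3, 2))])
    params = np.linalg.solve(L, Y)
    return params[:n], params[n:]

def tps_apply(src, w, a, pts):
    """Warp arbitrary points pts (m, 2) with the fitted spline."""
    K = _U(np.linalg.norm(pts[:, None] - src[None, :], axis=-1))  # (m, n)
    P = np.hstack([np.ones((len(pts), 1)), pts])                  # (m, 3)
    return K @ w + P @ a
```

Because the spline interpolates the control points exactly, warp quality hinges entirely on how those correspondences are obtained, which is why the abstract contrasts sparse landmarks with dense-pose correspondences; the grid-regularized TPS used in learning-based warpers adds the second-order smoothness constraint noted above.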