Model prediction on videos with multiple faces and some pictures.