Human image animation aims to generate a video from a static image by following a specified pose sequence. Existing methods typically adopt a multi-stage pipeline that separately learns appearance and ...