Abstract: In complex visual scenarios, people can understand occluded objects through contextual reasoning and association. However, existing video object detection (VOD) methods could not work well.