Temporal-Enhanced Convolutional Network for Person Re-Identification
Keywords: Deep Neural Networks, Person Re-Identification
We propose a new neural network, the Temporal-Enhanced Convolutional Network (T-CN), for video-based person re-identification. For each video sequence of a person, a spatial convolutional subnet is first applied to each frame to represent appearance information, and a temporal convolutional subnet then links small ranges of consecutive frames to extract local motion information. Together, these spatial and temporal convolutions form our T-CN-based representation. Finally, a recurrent network is used to further explore global dynamics, followed by temporal pooling to generate a single feature vector for the whole sequence. In the training stage, a Siamese network architecture is adopted to jointly optimize all components with losses covering both identification and verification. In the testing stage, our network generates a discriminative feature representation for each input video sequence (whose length may vary widely) in a single feed-forward pass, and even simple matching based on Euclidean distance yields good re-identification results. Experiments on the most widely used benchmark datasets demonstrate the superiority of our proposal over the state of the art.
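The testing-stage flow described above (temporal convolution over short frame windows, a recurrent pass, temporal pooling, then Euclidean matching) can be illustrated with a minimal sketch. It is not the T-CN implementation: per-frame appearance vectors are assumed to come from the spatial subnet already, and the fixed smoothing kernel, the toy tanh recurrence, the choice of mean pooling, and all function names are hypothetical stand-ins for the learned components.

```python
import math

def temporal_conv(frames, kernel):
    """Valid 1-D convolution over time: each output mixes len(kernel)
    consecutive frame vectors. Needs at least len(kernel) frames."""
    k, dim = len(kernel), len(frames[0])
    return [
        [sum(kernel[j] * frames[t + j][d] for j in range(k)) for d in range(dim)]
        for t in range(len(frames) - k + 1)
    ]

def recurrent(frames, decay=0.5):
    """Toy recurrence h_t = tanh(x_t + decay * h_{t-1}).
    Assumption: stands in for a learned recurrent network."""
    h = [0.0] * len(frames[0])
    states = []
    for x in frames:
        h = [math.tanh(xi + decay * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states

def temporal_pool(states):
    """Mean over time (assumed pooling type): one fixed-length vector
    regardless of how many time steps the sequence has."""
    return [sum(col) / len(states) for col in zip(*states)]

def sequence_feature(frames, kernel=(0.25, 0.5, 0.25)):
    """Feed-forward pass from per-frame vectors to one sequence vector."""
    return temporal_pool(recurrent(temporal_conv(frames, kernel)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery):
    """Sort gallery identities by Euclidean distance to the query feature."""
    q = sequence_feature(query)
    return sorted(gallery, key=lambda name: euclidean(q, sequence_feature(gallery[name])))

# Sequences of different lengths map to feature vectors of the same size.
query = [[1.0, 0.0]] * 12
gallery = {"a": [[1.0, 0.0]] * 5, "b": [[0.0, 1.0]] * 8}
ranking = rank_gallery(query, gallery)
```

Here `ranking` lists gallery identities from closest to farthest. Because pooling collapses the time axis, sequences of 5, 8, and 12 frames are all compared in the same two-dimensional feature space.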