* Add copy and move constructors and assignment operators for FDTensor
* Update transpose so the output tensor can be the same as the input tensor
* Add note
* Add realloc for FDTensor
* Support output equal to input for softmax
* Remove FDTensor::Alloc
Modify code structure