The code was jumping to end when TAIL == 0, but the correct condition for that jump is LEN == 0. TAIL == 0 just means we can start the summing pipeline directly. We also don't need to decrement TAIL, since it is overwritten at the start of no_trim.
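
For illustration only, here is a rough Python model of the control flow described above. It is not the actual routine: the function name byte_sum, the buggy flag, the 4-byte block size, and the reading of TAIL as "bytes consumed before the pipeline starts" are all assumptions made for the sketch.

```python
def byte_sum(data: bytes, tail: int, buggy: bool = False) -> int:
    """Sum the bytes of `data`.

    `tail` is assumed to be the number of bytes handled before the
    summing pipeline starts; that meaning is a guess for this sketch.
    """
    length = len(data)                  # LEN
    if buggy:
        if tail == 0:                   # old guard: skips all work when TAIL == 0
            return 0
    elif length == 0:                   # fixed guard: exit only when LEN == 0
        return 0

    total = 0
    pos = 0
    if tail != 0:
        # Trim step: consume the first `tail` bytes before the pipeline.
        pos = min(tail, length)
        total += sum(data[:pos])
        # No decrement of `tail` here: it is overwritten just below.

    # no_trim: TAIL is overwritten before it is read again.
    tail = (length - pos) % 4           # hypothetical reuse: leftover after 4-byte blocks
    body_end = length - tail
    for i in range(pos, body_end, 4):   # stand-in for the summing pipeline
        total += sum(data[i:i + 4])
    total += sum(data[body_end:])       # leftover bytes after the last full block
    return total
```

In this model the old guard returns 0 for any non-empty input whose TAIL happens to be 0, while the fixed guard only exits early for empty input and lets TAIL == 0 fall through to no_trim.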