Dear experts, I found there are two pad tokens in deepseek-coder. What's the difference between them? When I need a pad token, which one should I use? Both appear in the tokenizer, but it's not clear to me which to use for padding examples in batches (or when padding to a consistent length for specific kernels). From tokenizer.json ...
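For reference, here is how I listed the pad-like entries. This is a minimal sketch over a hypothetical excerpt of a tokenizer.json `added_tokens` section; the IDs and token strings are placeholders, not the actual deepseek-coder values:

```python
import json

# Hypothetical excerpt of a tokenizer.json "added_tokens" section;
# the real deepseek-coder file has different IDs and token strings.
tokenizer_json = json.loads("""
{
  "added_tokens": [
    {"id": 32014, "content": "<|end_of_sentence|>", "special": true},
    {"id": 32015, "content": "<pad>", "special": true},
    {"id": 32016, "content": "<|PAD_TOKEN|>", "special": true}
  ]
}
""")

# List every added token whose text mentions "pad", to see which
# candidates exist before deciding which one to pass to the tokenizer.
pad_like = [t for t in tokenizer_json["added_tokens"]
            if "pad" in t["content"].lower()]
for t in pad_like:
    print(t["id"], t["content"])
```

With the real file, both entries show up this way, which is what prompted the question: the `added_tokens` list alone doesn't say which one the model was trained to treat as padding.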