The DMC unit has storage for two DPCM bytes. One is the byte currently playing, from which bits are shifted out periodically for playback. The other is a storage buffer that holds the next byte to be played. Whenever the playback byte empties, the storage byte is moved into it. Whenever the storage buffer is empty and bytes_remaining is nonzero, the DMA unit fetches a byte into the storage buffer.
When you start a sample by writing to $4015, the storage buffer may still hold the last byte of the previous sample. If so, the DMA waits until the storage buffer is empty, as usual. If a sample is started while the buffer is already empty, the first byte is fetched right away (see https://www.nesdev.org/wiki/DMA#DMC_DMA for timing details).
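The two-slot behavior described above can be sketched as a small Python model. This is an illustration only, not a cycle-accurate emulation: the class and member names (`DmcBufferModel`, `sample_buffer`, `fetch_log`, and so on) are hypothetical, DMA fetches are treated as instantaneous, and looping, IRQs, and the output level are left out. It also assumes that bits leave the playback byte starting from bit 0, and that `start_sample` is only called once the previous sample has finished fetching.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DmcBufferModel:
    """Toy model of the DMC's two byte slots (all names are hypothetical)."""

    memory: List[int]                    # sample data, indexed by fetch position
    fetch_pos: int = 0                   # index of the next byte to fetch
    bytes_remaining: int = 0             # bytes still to be fetched for this sample
    shift_register: int = 0              # the byte currently playing
    bits_remaining: int = 0              # 0 means the playback byte is empty
    sample_buffer: Optional[int] = None  # None models an empty storage buffer
    fetch_log: List[int] = field(default_factory=list)

    def start_sample(self, start: int, length: int) -> None:
        """Model the $4015 write that starts a sample."""
        self.fetch_pos = start
        self.bytes_remaining = length
        # Fetches immediately only if the storage buffer is already empty.
        self._maybe_fetch()

    def _maybe_fetch(self) -> None:
        """Fetch one byte if the buffer is empty and bytes remain."""
        if self.sample_buffer is None and self.bytes_remaining > 0:
            self.sample_buffer = self.memory[self.fetch_pos]
            self.fetch_log.append(self.sample_buffer)
            self.fetch_pos += 1
            self.bytes_remaining -= 1

    def clock_output(self) -> Optional[int]:
        """Shift out one bit, or return None when there is nothing to play."""
        if self.bits_remaining == 0:
            if self.sample_buffer is None:
                return None
            # Playback byte is empty: move the storage byte into it.
            self.shift_register = self.sample_buffer
            self.sample_buffer = None
            self.bits_remaining = 8
            # The buffer just emptied, so the next fetch can happen now.
            self._maybe_fetch()
        bit = self.shift_register & 1  # assumption: bit 0 is shifted out first
        self.shift_register >>= 1
        self.bits_remaining -= 1
        return bit
```

In this model, calling `start_sample` while `sample_buffer` still holds the last byte of the previous sample records no new fetch. The first byte of the new sample is fetched only when a later `clock_output` call moves the leftover byte into the playback slot.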