copyuntil: reduce initial over-allocation

The reduced initial allocation fits into a 32-byte allocation pool,
saving up to 64 bytes when repeatedly reading small chunks of data (e.g.
tokenizing a CSV file). In some local `@btime` measurements, the change
appears to cost less than 10% extra time across a range of output
lengths.
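
A rough way to look at the trade-off locally is sketched below. It is not
the script behind the numbers above: `count_tokens` and the test data are
hypothetical, it uses Base's `@allocated` and `@elapsed` rather than
`@btime` so that it runs without BenchmarkTools, and whether `readuntil`
goes through `copyuntil` for a given stream type and Julia version is an
assumption worth checking.

```julia
# Hypothetical helper: read comma-separated tokens one at a time, the
# "many small reads" pattern described above.
function count_tokens(data::String)
    io = IOBuffer(data)
    n = 0
    while !eof(io)
        readuntil(io, ',')   # allocates one small String per token
        n += 1
    end
    return n
end

# Made-up input: 10_000 short tokens separated by commas.
data = join(("tok$i" for i in 1:10_000), ',')

count_tokens(data)                      # warm up so compilation is not measured
bytes = @allocated count_tokens(data)   # total bytes allocated by one pass
secs  = @elapsed count_tokens(data)     # wall-clock seconds for one pass

println("bytes per token: ", bytes / 10_000, ", seconds: ", secs)
```

Running this before and after the change, and with longer tokens, gives a
coarse view of both the memory saving and the time cost; `@btime` from
BenchmarkTools would give steadier timings.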