When a single transfer is an exact multiple of the endpoint's packet size,
that is EXACTLY what the spec says. Without it, the transfer is considered
to be still in progress, hence the performance issue.

I'm unconvinced. The language in the spec refers to an "endpoint",
which to me means the device side only (section 5.8.3 of the
1.1 spec):

"A bulk transfer is complete when the *endpoint* (emphasis mine) does
one of the following:

- Has transferred exactly the amount of data expected
- Transfers a packet with a payload size less than wMaxPacketSize or
transfers a zero-length packet."

It's the same in both directions. Think about it logically. How else
would you ever signal to the device that a transfer is complete?
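
To make the rule concrete, here is a rough sketch in C of the check a
sender could make before deciding whether to append a zero-length
packet. This is only my reading of the quoted paragraph, not code from
any real driver: needs_zlp() and its parameters are made-up names, and
"expected" stands for whatever length the receiving side has been told
to wait for.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from any real USB stack).
 *
 * len        - number of bytes the sender actually has to transfer
 * max_packet - the endpoint's wMaxPacketSize
 * expected   - number of bytes the receiver is waiting for
 *
 * Returns true when the sender must add a zero-length packet so the
 * receiver can tell that the transfer is complete.
 */
bool needs_zlp(size_t len, size_t max_packet, size_t expected)
{
    if (max_packet == 0)
        return false;   /* invalid endpoint description, nothing to decide */

    if (len == expected)
        return false;   /* "exactly the amount of data expected" */

    /*
     * If the last data packet is short (len is not a multiple of
     * max_packet), that short packet already ends the transfer.
     * Otherwise every packet is full and only a zero-length packet
     * can mark the end.
     */
    return (len % max_packet) == 0;
}
```

With a 64-byte endpoint and a receiver waiting for up to 4096 bytes,
a 128-byte transfer needs the extra packet and a 100-byte transfer
does not. Nothing in the check depends on which side is sending.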
