"Sticky" allocation of worker threads (#7372)
* Sticky thread alloaction
* Test sticky thread assignment
* Test sticky thread assignment
* Test sticky thread assignment
* Expose control over additional worker assignment stats
* Sticky thread alloaction
* Test sticky thread assignment
* Test sticky thread assignment
* Test sticky thread assignment
* Expose control over additional worker assignment stats
* Merge
* Merge
* Merge
* Fix Windows build
* Fix windows build 2
* Build Python 3.8 Windows CPU only
* Add env var to override binding
* Build Python 3.8 Windows CPU only
* Fix windows build
* Remove thread affinity override
* Remove goodworker
* Remove Python build settings
* Remove unneeded changes
* Remove unneeded changes
* Remove unneeded changes
* Remove unneeded changes
* Remove unneeded changes
* Remove unneeded changes
* Tidy
* Tidy
* Avoid race on preferred_worker vector
* Improve assertions
* Improve assertions
* Enum for PushBackWithTag result
* Remove unused field
* Update comments
* Extra debugging
* Extra debugging
* Extra debugging
* Support varying thread pool sizes
* Improve comments
* Remove requirement for thread local to be trivially destructible
* Use unsigned consistently for thread counts, removing casting
* Remove debug code
* Fix webassembly build
* Merge
* Merge
* Merge
* Remove unused code
* Fix build
* Extra test case for varying loop sizes
* Clean variable names
* Clean variable names
* Clean variable names
* Remove unneeded include, fix build
* Fix profiling
* Update from review comments