Removing non-thread-safe log statement from ReinitializeTensor (#48185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48185
In a scenario where Caffe2 is wrapped into a dynamic library, we were hitting a memory corruption crash at program termination:
"corrupted size vs. prev_size in fastbins"
It turns out the crash occurs in glog's logging.cc, which is not thread-safe: when flushing a log message, it lazily initializes a static hostname string. If that initialization happens on multiple threads simultaneously, it can corrupt memory (a minimal sketch of the pattern follows the valgrind trace below).
```
==1533667== Invalid free() / delete / delete[] / realloc()
==1533667== at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667== by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667== by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667== by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667== by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667== Address 0x165cd720 is 0 bytes inside a block of size 31 free'd
==1533667== at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667== by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667== by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667== by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667== by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667== Block was alloc'd at
==1533667== at 0xA39641F: operator new(unsigned long) (vg_replace_malloc.c:344)
==1533667== by 0x37F4E18: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:317)
==1533667== by 0x37F4F2E: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:466)
==1533667== by 0x5170344: GetHostName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (logging.cc:227)
==1533667== by 0x51702D4: google::LogDestination::hostname[abi:cxx11]() (logging.cc:555)
==1533667== by 0x5173789: google::(anonymous namespace)::LogFileObject::Write(bool, long, char const*, int) (logging.cc:1072)
==1533667== by 0x51746DF: google::LogDestination::LogToAllLogfiles(int, long, char const*, unsigned long) (logging.cc:773)
==1533667== by 0x5170BDC: google::LogMessage::SendToLog() (logging.cc:1386)
==1533667== by 0x5171236: google::LogMessage::Flush() (logging.cc:1305)
==1533667== by 0x517114D: google::LogMessage::~LogMessage() (logging.cc:1264)
==1533667== by 0x108DC840: caffe2::ReinitializeTensor(caffe2::Tensor*, c10::ArrayRef<long>, c10::TensorOptions) (tensor.cc:0)
==1533667== by 0x103BBED0: caffe2::int8::Int8GivenTensorFillOp::RunOnDevice() (int8_given_tensor_fill_op.h:29)
==1533667==
```
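The trace points at glog lazily filling a static hostname string from GetHostName() during a flush. As a rough illustration of that failure mode only (not glog's actual code; the struct and helper names below are hypothetical), here is a minimal sketch of an unsynchronized check-then-write on a class-level static std::string:
```cpp
#include <string>
#include <thread>
#include <unistd.h>  // gethostname

// Sketch only (not glog's implementation): a class-level static std::string
// that is lazily filled in on first use without any synchronization.
struct LogDestinationSketch {
  static const std::string& hostname() {
    // Unsynchronized check-then-write: two threads can both observe an empty
    // string and both assign to it, racing on the string's heap buffer.
    if (hostname_.empty()) {
      char buf[256] = {};
      gethostname(buf, sizeof(buf));
      hostname_ = buf;
    }
    return hostname_;
  }
  static std::string hostname_;
};
std::string LogDestinationSketch::hostname_;

int main() {
  // Two threads "flushing a log message" at the same time can hit the race.
  std::thread t1([] { (void)LogDestinationSketch::hostname(); });
  std::thread t2([] { (void)LogDestinationSketch::hostname(); });
  t1.join();
  t2.join();
}
```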
There doesn't seem to be an obvious, easy fix here. The logging API used by c10 is fundamentally not thread-safe, at least when it is backed by glog. glog does have a thread-safe API (raw_logging; see the sketch after the list below), but c10 doesn't appear to use it right now. I suspect other callers are not running into this crash because:
- They have other libraries using glog in their module, so the static variable in glog gets initialized before any race can occur
- They don't use int8 networks in a glog context, thus avoiding this problematic log statement
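For reference, a minimal sketch of what glog's thread-safe raw logging API looks like in use; this is only an illustration, and routing c10's logging macros through it is not something this change attempts:
```cpp
#include <glog/raw_logging.h>

int main() {
  // RAW_LOG logs straight to stderr without allocating memory or taking
  // locks, so it avoids the lazy hostname initialization seen in the trace.
  // (Sketch only; c10 does not currently use this API.)
  RAW_LOG(WARNING, "tensor dtype changed during reinitialization (%d -> %d)", 0, 1);
  return 0;
}
```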
An alternative fix would be to correctly initialize the dtype of the int8 tensor, which is currently always left uninitialized, so the log statement triggers for every int8 network. Initializing the int8 tensor correctly in tensor_int8.h is proving challenging though, at least without deeper knowledge of Caffe2's codebase, and even then it wouldn't fix the issue for all use cases.
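For illustration, a hedged sketch of what that alternative might look like: construct the int8 tensor's underlying Tensor with a concrete uint8 dtype, so ReinitializeTensor never takes the dtype-change branch that contains the log statement. The member layout, include path, and use of caffe2::empty below are assumptions for illustration, not the actual tensor_int8.h contents or an actual change:
```cpp
#include "caffe2/core/tensor.h"

namespace caffe2 {
namespace int8 {

// Hypothetical variant of the int8 tensor struct (assumed layout, not the
// real tensor_int8.h): the underlying Tensor gets a uint8 dtype up front
// instead of being default-constructed with an undefined dtype.
struct Int8TensorCPUSketch {
  float scale{1.0};
  int32_t zero_point{0};
  Tensor t = caffe2::empty({0}, at::dtype<uint8_t>().device(at::kCPU));
};

} // namespace int8
} // namespace caffe2
```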
Test Plan: Ran my app under valgrind; the crash no longer occurs and valgrind no longer reports a memory corruption.
Reviewed By: thyu, qizzzh
Differential Revision: D25040725
fbshipit-source-id: 1392a97ccf9b4c9ade1ea713610ee44a1578ae7d