llvm-project
af0471c1 - [lldb][Darwin] Fetch detailed binary info in chunks (#190720)

Commit
9 days ago
[lldb][Darwin] Fetch detailed binary info in chunks (#190720)

When binaries have been loaded into a process on Darwin, lldb sends a jGetLoadedDynamicLibrariesInfos packet to get the filepath, uuid, load address, and detailed information from the mach header/load commands. For a large UI app, the number of binaries that can be loaded (through various dependencies) can exceed a thousand these days, and requesting detailed information on all of those can result in debugserver allocating too much memory when running in constrained environments, and being killed.

In 2023 I laid the groundwork to fetch detailed information in chunks, instead of one large request. The main challenge with this is that when we first attach to a process that is running, we send a "tell me about all binaries loaded" request, and that prevents lldb from chunking the reply; the packet design for jGetLoadedDynamicLibrariesInfos assumes the entire reply is sent in one packet, instead of the typical gdb remote serial protocol trick of a response with partial data starting with 'm' and a response with a complete reply starting with 'l'.

The 2023 change added a new key to this packet, `report_load_commands`, and when that is set to `false`, only the load address of the binaries is reported. lldb then uses the array of load addresses of all the binaries to fetch detailed information about them in smaller groupings.

This PR implements the lldb side of that work. Process::GetLoadedDynamicLibrariesInfos now takes a `bool include_mh_and_load_commands`, and ProcessGDBRemote sends that as an argument in the jGetLoadedDynamicLibrariesInfos packet.

DynamicLoaderMacOS::DoInitialImageFetch is changed to only get the load addresses on initial attach. If the reply includes the full binary information (not just load addresses) -- when talking to an old debugserver -- we will use that information instead of re-fetching it.
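
As a rough illustration of the two request shapes described above, here is a minimal Python sketch of the JSON payloads that would follow `jGetLoadedDynamicLibrariesInfos:`. Only `report_load_commands` is named in this message; the `fetch_all_solibs` and `solib_addresses` keys are assumptions based on my recollection of lldb's packet documentation, and the function names are hypothetical. Packet framing, escaping, and checksums are left out, and the real implementation is C++ in ProcessGDBRemote.

```python
import json
from typing import List


def initial_fetch_payload() -> str:
    """Payload for the first request after attach: every loaded binary,
    but load addresses only (no mach header / load command detail)."""
    return json.dumps({
        "fetch_all_solibs": True,        # assumed key name
        "report_load_commands": False,   # key named in the commit message
    })


def detail_fetch_payload(load_addresses: List[int]) -> str:
    """Payload for a follow-up request: full detail for one group of
    binaries, identified by their load addresses."""
    return json.dumps({
        "solib_addresses": load_addresses,  # assumed key name
        "report_load_commands": True,
    })
```

The first payload keeps the attach-time reply small; the second is what would be sent once per group of addresses.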
On a newer debugserver that only sent the load addresses, we'll send this list of addresses to the standard method we already use when dyld has told us to load binaries at a list of addresses.

DynamicLoaderMacOS::AddBinaries, which takes a list of addresses and fetches detailed information about them, is updated to request only 600 binaries at a time. A typical UI app will be in the 700-1000 binary range these days, so this will turn one large fetch into two, in most cases. There are some system UI processes that have many dependencies that could require three fetches. I picked this number so most debug sessions will be handled by two requests.

In debugserver MachProcess::FormatDynamicLibrariesIntoJSON, I removed the obsolete-for-three-years-now `mod_date` field. I was sending back the binary filepaths for this "don't send the detailed information" version of the packet - I don't need that, and it just increases the size, so I stopped sending filepaths in this mode.

I also added a new field for when we ARE sending detailed information, `sizeof_mh_and_loadcmds`. I don't use this in lldb yet, but when we are told about a binary and need to read it from memory today, we have an initial read to get the mach header, which tells us the size of the load commands. Then we have a second read of the mach header plus load commands, before we can start binary processing in earnest. This is an extra read packet and it is unnecessary, given that debugserver knows how large the mach header + load commands are. So I'm returning it here, and at some point I'll find a way to pipe that into a new memory object file creation method in lldb. It's one of those "I should really find a way to remove that extra read some day" cleanups, and while I was in this area, I added this first piece of it.

I don't have a test for this. I've been thinking about an API test that creates 700 dylibs with empty functions in each, runs it, and confirms all of the dylibs were loaded.
I'd have to grab a packet log to be completely sure we didn't read the full binary list in one go. But I worry that compiling and linking even 700 do-nothing dylibs might be too much. Maybe I should add a setting in DynamicLoaderMacOS::AddBinaries to reduce the maximum number of binaries that can be read at once, and have a small number of dylibs. When testing this by hand, I had a maximum of 5 binaries being queried in one packet.

rdar://109428337
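
The batching described for DynamicLoaderMacOS::AddBinaries can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual C++ change: the names `batches`, `fetch_count`, and `MAX_BINARIES_PER_FETCH` are hypothetical, and the per-fetch limit is a parameter here so that a small value, like the 5 used in by-hand testing, can stand in for 600.

```python
from typing import Iterator, List, Sequence

MAX_BINARIES_PER_FETCH = 600  # the figure quoted in the commit message


def batches(addresses: Sequence[int],
            max_per_fetch: int = MAX_BINARIES_PER_FETCH) -> Iterator[List[int]]:
    """Yield consecutive groups of at most max_per_fetch load addresses."""
    if max_per_fetch <= 0:
        raise ValueError("max_per_fetch must be positive")
    for start in range(0, len(addresses), max_per_fetch):
        yield list(addresses[start:start + max_per_fetch])


def fetch_count(num_binaries: int,
                max_per_fetch: int = MAX_BINARIES_PER_FETCH) -> int:
    """Number of detail requests needed (ceiling division)."""
    return -(-num_binaries // max_per_fetch)
```

With the default limit, anything from 601 to 1200 binaries needs two requests, which covers the 700-1000 range mentioned above; a third request is only needed from 1201 binaries up. With a limit of 5, twelve binaries would be fetched in groups of 5, 5, and 2.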