Consider the following two files on a Linux system:
use_message.cpp
#include <iostream>
extern const char* message;
void print_message();
int main() {
    std::cout << message << '\n';
    print_message();
}
libmessage.cpp
#include <iostream>
const char* message = "Meow!"; // 1. absolute address of string literal
                               //    needs runtime relocation in a .so
void print_message() {
    std::cout << message << '\n';
}
We can compile use_message.cpp into an object file, compile libmessage.cpp into a shared library, and link them together, like so:
$ g++ use_message.cpp -c -pie -o use_message.o
$ g++ libmessage.cpp -fPIC -shared -o libmessage.so
$ g++ use_message.o libmessage.so -o use_message
The definition for message originally lives in libmessage.so. When use_message is executed, the dynamic linker performs relocations that:
1. Update the message definition inside libmessage.so with the load address of the string data
2. Copy the definition of message from libmessage.so into use_message's .bss section
3. Update the global offset table in libmessage.so to point to the new version of message inside use_message
The relevant relocations, as dumped by readelf, are:
use_message
Offset Info Type Sym. Value Sym. Name + Addend
000000004150 000c00000005 R_X86_64_COPY 0000000000004150 message + 0
This is relocation number 2 in the list above.
libmessage.so
Offset Info Type Sym. Value Sym. Name + Addend
000000004040 000000000008 R_X86_64_RELATIVE 2000
000000003fd8 000b00000006 R_X86_64_GLOB_DAT 0000000000004040 message + 0
These are relocation numbers 1 and 3, respectively.
There's a dependency between relocation numbers 1 and 2: the update to libmessage.so's message definition must happen before this value is copied into use_message, otherwise use_message will not point to the correct location.
My question is: how is the order for applying relocations specified? Is there something encoded in the ELF files that specifies this? Or in the ABI? Or is the dynamic linker just expected to work out the dependencies between relocations itself and ensure that any relocations that write to a given memory address are run before any relocations that read from the same location? Does the static linker only output relocations such that the ones in the executable can always be processed after the shared library ones?