What is ASLR?
ASLR stands for Address Space Layout Randomization. It was introduced in Windows with Vista, released to customers in early 2007. Before that, Linux had already been enabling ASLR by default since kernel 2.6.12, released in June 2005. ASLR tries to make life more difficult for the Bad Guys™ by randomizing how an executable is laid out in memory.
When I first wanted to figure out how ASLR works internally, I came across a lot of articles, but not one that told the entire story. After I completed my research and fully understood how ASLR works in Windows, I decided to write an article about it.
What is being moved around?
ASLR moves the following memory segments:
- The stack
- The heap
- Code segments
How does this make life difficult for Bad Guys™?
When someone finds a bug in a program (for instance a buffer overflow), it can potentially be abused to take control of the program's flow of execution by overwriting key values, such as the return addresses on the stack. Before DEP (or NX) came along, the common approach was to inject shellcode onto the stack and then overwrite the return address on the stack with the address of the start of the shellcode. This is no longer possible because the stack is now a so-called ‘non-executable’ memory segment: if the CPU starts executing code on the stack (EIP points somewhere on the stack), it raises an exception and the program crashes without the harmful code being executed.
DEP can be worked around by executing code that already exists in an executable section of the program, such as the program itself or a library, instead of loading your own code onto the stack (a trick called return-to-libc). For instance, writing “shutdown /s” somewhere into memory, placing a pointer to that string on the stack as an argument and then changing the return address to the address of system() will make the target computer shut down. All this requires knowledge of the memory layout of the program, which is what ASLR was designed to obfuscate.
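To make the overwrite concrete, here is a minimal sketch in C. It does not smash a real stack: a hypothetical `fake_frame` struct stands in for a stack frame, with a small buffer followed by a saved return address. The struct layout and the function names are assumptions made for illustration, since real frame layouts depend on the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a local buffer followed by
   the saved return address. */
struct fake_frame {
    char      buffer[8];
    uintptr_t return_address;
};

/* Copies `len` bytes into the frame without checking them against the
   size of `buffer`, which is the bug. The copy is clamped to the size
   of the whole struct so that the demo itself stays well defined. */
static void copy_unchecked(struct fake_frame *frame,
                           const void *input, size_t len)
{
    if (len > sizeof *frame)
        len = sizeof *frame;
    memcpy(frame, input, len);
}

/* Feeds the frame an oversized input whose tail lands on the saved
   return address, and reports where the frame would "return" to. */
static uintptr_t overflow_demo(uintptr_t legit, uintptr_t attacker)
{
    struct fake_frame frame;
    struct fake_frame input;

    memset(&frame, 0, sizeof frame);
    frame.return_address = legit;

    memset(&input, 'A', sizeof input);   /* padding fills the buffer     */
    input.return_address = attacker;     /* tail replaces the return     */

    copy_unchecked(&frame, &input, sizeof input);
    return frame.return_address;
}
```

Calling `overflow_demo(0x00401050, 0x0019FF34)` returns the second value: the frame that was supposed to return into code now points at the stack, which is exactly the situation DEP catches.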
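The idea of reusing existing code can be sketched without attacking anything. In the simulation below, `fake_system()` is a harmless stand-in for system() that only records its argument, and the hypothetical `fake_frame` struct models the return address and argument an attacker would control. All names are made up for this example.

```c
#include <string.h>

/* Harmless stand-in for system(): it only records the command. */
static char last_command[64];

static int fake_system(const char *command)
{
    strncpy(last_command, command, sizeof last_command - 1);
    last_command[sizeof last_command - 1] = '\0';
    return 0;
}

/* The place the function was originally supposed to return to. */
static int harmless(const char *unused)
{
    (void)unused;
    return 1;
}

/* Hypothetical frame: where to "return" to, and the argument found there. */
struct fake_frame {
    int (*return_to)(const char *);
    const char *argument;
};

/* Simulates the function returning: control goes wherever the frame says. */
static int simulate_return(const struct fake_frame *frame)
{
    return frame->return_to(frame->argument);
}

static int ret2libc_demo(void)
{
    static const char injected[] = "shutdown /s";   /* data, not code */
    struct fake_frame frame = { harmless, "" };

    /* The "exploit": no new code is injected, the frame is merely
       redirected to code that already exists in the process. */
    frame.return_to = fake_system;
    frame.argument  = injected;

    return simulate_return(&frame);
}
```

Note that the redirect only works because the address of `fake_system` is known when the frame is overwritten. That knowledge is what ASLR tries to take away.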
How does that impact my code?
It shouldn’t. If ASLR breaks your code, you should definitely fix it. You probably made assumptions about where things are located in the address space of your application, which is bad practice and not very portable.
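One common way to avoid such assumptions is to remember offsets relative to a base that is looked up at run time, rather than absolute addresses. The sketch below illustrates this with plain buffers; `relative_ref`, `make_relative()` and `resolve()` are hypothetical names, not part of any Windows API.

```c
#include <stddef.h>

/* Remembers the distance from a base instead of an absolute address,
   so the reference stays valid when the base moves. */
struct relative_ref {
    size_t offset;
};

static struct relative_ref make_relative(const char *base, const char *target)
{
    struct relative_ref ref = { (size_t)(target - base) };
    return ref;
}

/* Turns the offset back into a pointer, using wherever the data lives
   right now. */
static const char *resolve(const char *current_base, struct relative_ref ref)
{
    return current_base + ref.offset;
}
```

A reference created against one copy of the data resolves correctly against another copy at a different address, which is the property code needs in order to survive a randomized layout.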
ASLR in action
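We can easily demonstrate the effect of ASLR with a simple piece of C code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <windows.h>  /* MessageBoxA; link with user32.lib */

    int main(void) {
        int stack;                  /* lives on the stack */
        void *heap = malloc(1024);  /* lives on the heap */
        printf("Stack: 0x%p - Heap: 0x%p - Code: 0x%p - "
               "strlen(): 0x%p - MessageBoxA(): 0x%p\n",
               (void *)&stack, heap, (void *)&main,
               (void *)&strlen, (void *)&MessageBoxA);
        free(heap);
        return 0;
    }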
Life before ASLR
It is important to note that the Visual C++ linker enables ASLR in the executable by default. So first, we’ll compile the program with /DYNAMICBASE:NO to disable ASLR. Then, we run the program a few times and line up the results:
Stack | Heap | Code | strlen() | MessageBoxA() |
---|---|---|---|---|
0x0019FF34 | 0x006BF6B8 | 0x00401050 | 0x752207A0 | 0x7784D740 |
0x0019FF34 | 0x005EFEE8 | 0x00401050 | 0x752207A0 | 0x7784D740 |
0x0019FF34 | 0x005608E8 | 0x00401050 | 0x752207A0 | 0x7784D740 |
After reboot | ||||
0x0019FF34 | 0x0057E0F0 | 0x00401050 | 0x75C2EFE0 | 0x75D58830 |
0x0019FF34 | 0x0068DDB0 | 0x00401050 | 0x75C2EFE0 | 0x75D58830 |
0x0019FF34 | 0x004EEBB8 | 0x00401050 | 0x75C2EFE0 | 0x75D58830 |
The heap moves around, but the stack and code addresses don’t. The heap can be moved because malloc() returns a pointer: code can never safely assume that memory allocated on the heap ends up in the same place. The library functions keep their addresses between runs but move after a reboot, because the system DLLs are randomized at boot time.
Enabling ASLR
To enable ASLR, we simply recompile the code with the /DYNAMICBASE switch, then run it a few times again:
Stack | Heap | Code | strlen() | MessageBoxA() |
---|---|---|---|---|
0x008FFAD0 | 0x00C4F720 | 0x00F21050 | 0x750F07A0 | 0x76AAD740 |
0x0053FCF4 | 0x005AF6B8 | 0x00F21050 | 0x750F07A0 | 0x76AAD740 |
0x005DF7F0 | 0x0014F720 | 0x00F21050 | 0x750F07A0 | 0x76AAD740 |
After reboot | ||||
0x003EFA28 | 0x0011EBF8 | 0x00A11050 | 0x76E007A0 | 0x774AD740 |
0x007CFDE0 | 0x00CD0918 | 0x002B1050 | 0x76E007A0 | 0x774AD740 |
0x00D3FBFC | 0x00E5F8B0 | 0x002B1050 | 0x76E007A0 | 0x774AD740 |
Note that the stack and heap move on every run, but the code address stays the same across consecutive runs and changes only occasionally (here: after the reboot, and once more between the first two runs after it). This is likely because Windows caches the layout after the executable has been mapped into memory once, until it is gone from the cache or the system is rebooted; mapping and rebasing an executable can take some time (see below).
What about 64-bit?
With /DYNAMICBASE:NO:
Stack | Heap | Code | strlen() | MessageBoxA() |
---|---|---|---|---|
0x000000000014FF20 | 0x00000000004C3C90 | 0x0000000140001070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
0x000000000014FF20 | 0x0000000000583C90 | 0x0000000140001070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
0x000000000014FF20 | 0x0000000000465570 | 0x0000000140001070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
After reboot | ||||
0x000000000014FF20 | 0x0000000000424E80 | 0x0000000140001070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
0x000000000014FF20 | 0x00000000005D0D50 | 0x0000000140001070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
0x000000000014FF20 | 0x000000000047F680 | 0x0000000140001070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
With /DYNAMICBASE:
Stack | Heap | Code | strlen() | MessageBoxA() |
---|---|---|---|---|
0x000000E58A52F770 | 0x0000021043823900 | 0x00007FF71AA21070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
0x000000449F51FA30 | 0x0000015B1D7A3C90 | 0x00007FF71AA21070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
0x000000312C74FB90 | 0x000001E25C7E3C90 | 0x00007FF71AA21070 | 0x00007FF8A8263C70 | 0x00007FF8AB5DFBE0 |
After reboot | ||||
0x000000B3508FF9C0 | 0x00000217580C0D80 | 0x00007FF656D91070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
0x0000009F4551FD40 | 0x0000023911081130 | 0x00007FF656D91070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
0x0000008EF03DFD50 | 0x0000019014F5F6D0 | 0x00007FF656D91070 | 0x00007FFFFF55F090 | 0x00007FF802798020 |
As you can see, 64-bit or 32-bit doesn’t influence which segments are randomized. It does, however, influence the entropy: since the 64-bit address space is much wider, it is harder for an attacker to guess addresses.
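A crude way to see the difference is to check which bit positions actually changed between the observed addresses. The sketch below does that for the stack addresses from the /DYNAMICBASE tables above. With only six samples per table this is a rough lower bound on the bits that vary, not a measurement of the real entropy, and the helper names are my own.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a mask of the bit positions that differ between at least two
   of the given addresses. */
static uint64_t varying_bits(const uint64_t *addresses, size_t count)
{
    uint64_t seen_one   = 0;             /* bits set in some sample  */
    uint64_t always_one = ~(uint64_t)0;  /* bits set in every sample */
    size_t i;

    for (i = 0; i < count; i++) {
        seen_one   |= addresses[i];
        always_one &= addresses[i];
    }
    return seen_one & ~always_one;
}

/* Counts the set bits in a mask. */
static int count_bits(uint64_t mask)
{
    int bits = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        bits++;
    }
    return bits;
}

/* Stack addresses copied from the 32-bit and 64-bit ASLR tables. */
static const uint64_t stack32[] = {
    0x008FFAD0, 0x0053FCF4, 0x005DF7F0,
    0x003EFA28, 0x007CFDE0, 0x00D3FBFC
};
static const uint64_t stack64[] = {
    0x000000E58A52F770, 0x000000449F51FA30, 0x000000312C74FB90,
    0x000000B3508FF9C0, 0x0000009F4551FD40, 0x0000008EF03DFD50
};
```

For these samples, `count_bits(varying_bits(stack32, 6))` gives 18 and the same call on `stack64` gives 32, so even this small set of runs shows far more bits moving in the 64-bit build.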
Summarizing
Enabling ASLR for an application basically randomizes two segments: the code segment of the application and the stack. For convenience, here’s a table showing when the different base addresses are randomized.
/DYNAMICBASE:? | Code | Stack | Heap | DLLs |
---|---|---|---|---|
YES | OS boot | App. start | App. start | OS boot |
NO | Never | Never | App. start | OS boot |
How does this work?
When Windows loads an executable, the dynamic linker (the loader) gets to work. An executable (or PE) contains a PE header which tells the loader how the executable should be mapped into virtual memory and which segments should be readable, writable and executable (which is used in DEP). If ASLR is not enabled for the executable, it is placed at a fixed address, the preferred base address, which is also specified in the PE header.
The executable also contains an import address table (IAT). This table tells the OS which functions the program will be calling from external DLLs. The loader loads the required DLLs into memory and places the addresses of these functions into the IAT in memory. When the application wants to call one of these functions, it loads the address from the IAT and calls it. Please note that the OS always maps the entire DLL into memory, but only fills in the IAT entries for the functions that the application actually imports.
Because Windows executables often contain absolute addresses (as opposed to ‘position independent code’), these addresses have to be adjusted when the image does not end up at its preferred base address (due to ASLR or because the preferred base address is not available). The loader does not have to search for them: the linker records their locations in a relocation table inside the executable, and the loader adds the difference between the actual and the preferred base address to each of them. This process is called rebasing.
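As a rough illustration of what the loader reads, the sketch below pulls the preferred base address and the ASLR opt-in flag out of a PE file held in memory. The field offsets follow the published PE format (e_lfanew at 0x3C, a 20-byte COFF header, ImageBase and DllCharacteristics in the optional header), and 0x0040 is the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE bit that /DYNAMICBASE sets. The struct and function names are my own, and the parser skips most of the validation a real loader performs.

```c
#include <stddef.h>
#include <stdint.h>

#define PE32_MAGIC        0x010B  /* 32-bit optional header */
#define PE32PLUS_MAGIC    0x020B  /* 64-bit optional header */
#define DYNAMIC_BASE_FLAG 0x0040  /* IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE */

struct pe_info {
    uint64_t image_base;    /* preferred load address          */
    int      dynamic_base;  /* non-zero if ASLR is opted in to */
};

/* Reads a little-endian integer of `bytes` bytes from a byte buffer. */
static uint64_t read_le(const uint8_t *p, int bytes)
{
    uint64_t value = 0;

    while (bytes-- > 0)
        value = (value << 8) | p[bytes];
    return value;
}

/* Returns 0 on success, -1 if the buffer does not look like a PE file. */
static int parse_pe(const uint8_t *file, size_t size, struct pe_info *out)
{
    size_t pe, opt;
    uint64_t magic;

    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;
    pe = (size_t)read_le(file + 0x3C, 4);        /* e_lfanew */
    if (pe > size || size - pe < 24 + 72)
        return -1;
    if (file[pe] != 'P' || file[pe + 1] != 'E' || file[pe + 2] || file[pe + 3])
        return -1;

    opt = pe + 4 + 20;   /* skip the signature and the COFF file header */
    magic = read_le(file + opt, 2);
    if (magic == PE32_MAGIC)
        out->image_base = read_le(file + opt + 28, 4);
    else if (magic == PE32PLUS_MAGIC)
        out->image_base = read_le(file + opt + 24, 8);
    else
        return -1;

    out->dynamic_base =
        (read_le(file + opt + 70, 2) & DYNAMIC_BASE_FLAG) != 0;
    return 0;
}
```

Fed the contents of an .exe file, `parse_pe()` reports the address the image would like to be loaded at and whether the loader is allowed to pick a different one.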
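The binding step can be modelled in a few lines of C. The sketch below is a simulation only: `fake_dll` plays the role of a DLL's export list, `iat_entry` models one slot of the import table, and `bind_imports()` does what the loader does when it fills the table in. None of these names correspond to real Windows structures.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*import_fn)(const char *);

/* What a hypothetical DLL exports: names paired with addresses. */
struct export_entry { const char *name; import_fn address; };

/* One IAT slot: the name comes from the executable, the address is
   filled in by the loader. */
struct iat_entry { const char *name; import_fn address; };

static size_t my_strlen(const char *s) { return strlen(s); }

static size_t my_count_spaces(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        n += (*s == ' ');
    return n;
}

static const struct export_entry fake_dll[] = {
    { "my_strlen",       my_strlen },
    { "my_count_spaces", my_count_spaces },
};

/* Plays the role of the loader: resolves every import by name.
   Returns the number of entries that could not be resolved. */
static size_t bind_imports(struct iat_entry *iat, size_t imports,
                           const struct export_entry *exports, size_t count)
{
    size_t i, j, missing = 0;

    for (i = 0; i < imports; i++) {
        iat[i].address = NULL;
        for (j = 0; j < count; j++) {
            if (strcmp(iat[i].name, exports[j].name) == 0) {
                iat[i].address = exports[j].address;
                break;
            }
        }
        if (iat[i].address == NULL)
            missing++;
    }
    return missing;
}
```

An application that imports only `my_strlen` gets a one-entry table, and every call goes through `iat[0].address`. Because the code never embeds the function's address directly, the DLL can be loaded anywhere without the code having to change.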
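Rebasing boils down to adding one delta to every stored absolute address. The sketch below assumes a 32-bit image and a flat list of fixup offsets, which is a simplification of how a PE file actually stores them; `rebase_image()` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adjusts a 32-bit image that was linked for `preferred_base` but is
   now mapped at `actual_base`. `fixups` lists the offsets, relative to
   the start of the image, of every 32-bit absolute address stored in
   it. Values are read and written in host byte order. */
static void rebase_image(uint8_t *image, size_t image_size,
                         const uint32_t *fixups, size_t fixup_count,
                         uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* unsigned wrap-around is fine */
    size_t i;

    for (i = 0; i < fixup_count; i++) {
        uint32_t value;

        if (image_size < sizeof value ||
            fixups[i] > image_size - sizeof value)
            continue;                    /* ignore out-of-range entries */

        memcpy(&value, image + fixups[i], sizeof value);
        value += delta;
        memcpy(image + fixups[i], &value, sizeof value);
    }
}
```

With the numbers from the 32-bit tables, and assuming the image bases were 0x00400000 and 0x00F20000, a stored pointer to main at 0x00401050 becomes 0x00F21050 after rebasing.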
Conclusion
ASLR mitigates the exploitation of software bugs: by randomizing the offsets of the program's segments, it makes it hard for an attacker to create an exploit that works reliably on every system. It is, however, not a magic remedy: if software can be exploited, there is still a chance that an attacker gains control of the system; it is just harder to do. 64-bit systems benefit more from ASLR than 32-bit systems: since the 64-bit address space is larger, there is more entropy in the randomness of the addresses.
To bypass ASLR, an attacker may try to obtain pointers to certain locations in the program, which may be found on the stack, for instance. This is why ‘pointer leakage’ bugs are a big deal and should be fixed.
In order to use the ‘return-to-libc’ method, an attacker must figure out where the libc functions they need are located. Since library locations are randomized at boot time and are the same across the system, there is a chance that pointer leakage from one application running on a system can be used to exploit another application running on that system.
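The arithmetic behind this is simple, which is what makes a leak so valuable. A library is moved as a whole, so the distance between two functions inside it never changes. The offsets in the sketch below are made up for illustration; only the leaked addresses are taken from the strlen() column of the 32-bit tables.

```c
#include <stdint.h>

/* Hypothetical offsets of two functions inside the same library,
   relative to the library's base address. The values are made up. */
#define OFFSET_LEAKED_FN 0x000207A0u
#define OFFSET_TARGET_FN 0x00045120u

/* One leaked function address is enough to recover the base... */
static uint32_t library_base(uint32_t leaked_address)
{
    return leaked_address - OFFSET_LEAKED_FN;
}

/* ...and with the base, every other function in that library. */
static uint32_t target_address(uint32_t leaked_address)
{
    return library_base(leaked_address) + OFFSET_TARGET_FN;
}
```

Under these assumed offsets, a leaked 0x750F07A0 puts the library at 0x750D0000 before the reboot, and a leaked 0x76E007A0 puts it at 0x76DE0000 afterwards. The randomization changed the base, but a single leaked pointer undoes it.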