The previous step is here. By the way, I fixed a bug in the code from that step: it appeared when the PE file had more than one callback.
Let's turn to the next important part of many PE files: relocations. They are used when an image cannot be loaded at the base address specified in its header. This is typical for DLL files, which basically cannot work properly without relocations. Imagine an EXE file loaded at address 0x400000. This EXE loads a DLL whose base address is also 0x400000. The addresses collide, and because the DLL is loaded after the EXE, the loader has to move the DLL and looks for its relocations. If the DLL has no relocations, loading fails.
The relocations themselves are just a set of tables with pointers to DWORDs that the loader must recalculate if the image is loaded at an address other than its base address. There are many relocation types, but only two are actually used in x86 PE files: IMAGE_REL_BASED_HIGHLOW = 3 and IMAGE_REL_BASED_ABSOLUTE = 0. The second one does nothing; it is needed only to align relocation tables.
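To make the table layout more concrete: each relocation in a table is a 16-bit value whose upper 4 bits hold the type and whose lower 12 bits hold an offset from the table's base RVA. Below is a minimal, portable sketch of that encoding. The constant and function names (REL_BASED_ABSOLUTE, REL_BASED_HIGHLOW, make_reloc_entry, reloc_type, reloc_offset) are made up for illustration and are not part of the packer or of any library.

```cpp
#include <cassert>
#include <cstdint>

// Relocation type values as defined by the PE format
// (hypothetical names; the Windows SDK calls them IMAGE_REL_BASED_*).
const std::uint16_t REL_BASED_ABSOLUTE = 0; // padding entry, skipped by the loader
const std::uint16_t REL_BASED_HIGHLOW  = 3; // patch a full 32-bit address

// Pack a type (4 bits) and an offset (12 bits) into one 16-bit entry.
inline std::uint16_t make_reloc_entry(std::uint16_t type, std::uint16_t offset)
{
    assert(type < 16);       // the type must fit in 4 bits
    assert(offset < 0x1000); // the offset must fit in 12 bits
    return static_cast<std::uint16_t>((type << 12) | offset);
}

// Upper 4 bits: relocation type.
inline std::uint16_t reloc_type(std::uint16_t entry)
{
    return static_cast<std::uint16_t>(entry >> 12);
}

// Lower 12 bits: offset from the base RVA of the table.
inline std::uint16_t reloc_offset(std::uint16_t entry)
{
    return static_cast<std::uint16_t>(entry & 0x0FFF);
}
```

Because only 12 bits are available for the offset, one table can describe fixups within a 4 KB range starting at its base RVA.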
In short, the loader almost always loads EXE files at their base address, without applying relocations. Our packer cannot pack DLLs yet, so to test relocation packing we need an EXE file with an unusable base address; the loader will then have to move it and apply relocations. I will not list the test project source code here; you will find it in the solution at the end of the article. I set 0x7F000000 as the base address (Linker - Advanced - Base Address).
Like everything else, we have to process relocations manually after unpacking the file. The loader must know in advance that the file has relocations, and we need to know the new address to which the loader moved the file.
We don't have to do anything to tell the loader that our file has relocations: the necessary flags in the PE headers are still there from the original file. However, we do need to find out at which address the file was loaded.
Let's start with the unpacker code (unpacker project). To learn both the address at which the file should have been loaded and the address at which it was actually loaded, we can do the following:
```cpp
//Image loading address (Original one, relocations are not applied to it)
unsigned int original_image_base_no_fixup;

//...

//These instructions are necessary only to replace
//the addresses to real ones in the unpacker builder
__asm
{
  mov original_image_base, 0x11111111;
  mov rva_of_first_section, 0x22222222;
  mov original_image_base_no_fixup, 0x33333333;
}
```
We added a variable whose meaning is the same as that of the original_image_base variable introduced in one of the previous steps. The difference is that a relocation will be applied to original_image_base, so it will hold the real address at which the image was loaded. This way we do not have to edit all the subsequent unpacker operations that use this variable. The contents of original_image_base_no_fixup will not be modified, so it keeps the address at which the image was supposed to be loaded. The packer writes this variable for the unpacker, like the other two.
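The idea of keeping one fixed-up copy and one untouched copy of the base address can be modelled in a few lines. This is only a simulation under stated assumptions: the unpacker_vars structure and the pack, loader_fixup and was_relocated functions are hypothetical stand-ins for what the packer, the Windows loader and the unpacker do, respectively.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two unpacker variables.
struct unpacker_vars
{
    std::uint32_t original_image_base;          // covered by a relocation entry
    std::uint32_t original_image_base_no_fixup; // never touched by the loader
};

// Packer side: both variables receive the preferred base address.
inline unpacker_vars pack(std::uint32_t preferred_base)
{
    return unpacker_vars{ preferred_base, preferred_base };
}

// Loader side: a HIGHLOW fixup adds (actual - preferred) to the DWORD.
// Unsigned arithmetic wraps around, so moving "down" works as well.
inline void loader_fixup(unpacker_vars& v,
                         std::uint32_t preferred_base,
                         std::uint32_t actual_base)
{
    assert(v.original_image_base_no_fixup == preferred_base);
    v.original_image_base += actual_base - preferred_base;
}

// Unpacker side: the image was moved if the two copies differ.
inline bool was_relocated(const unpacker_vars& v)
{
    return v.original_image_base != v.original_image_base_no_fixup;
}
```

If the loader keeps the image at its preferred base, it applies no fixups, both values stay equal, and the unpacker can skip relocation processing entirely.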
Modify the parameters.h file in the unpacker and update the offsets to these three variables:
```cpp
#pragma once
static const unsigned int original_image_base_offset = 0x11;
static const unsigned int rva_of_first_section_offset = 0x1B;
static const unsigned int original_image_base_no_fixup_offset = 0x22;

static const unsigned int empty_tls_callback_offset = 0x2;
```
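These offsets are byte positions inside the compiled unpacker body where the placeholder values (0x11111111 and so on) sit, and the packer overwrites them with real values. A minimal sketch of such patching follows; patch_dword and read_dword are hypothetical helpers (the packer itself uses a reinterpret_cast on the section data), and a little-endian host is assumed, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Overwrite 4 bytes at `offset` inside the unpacker body with `value`.
// memcpy avoids unaligned access and aliasing problems.
inline void patch_dword(std::string& body, std::size_t offset, std::uint32_t value)
{
    assert(offset + sizeof(value) <= body.size());
    std::memcpy(&body[offset], &value, sizeof(value));
}

// Read 4 bytes back from `offset` (used here only to check the result).
inline std::uint32_t read_dword(const std::string& body, std::size_t offset)
{
    assert(offset + sizeof(std::uint32_t) <= body.size());
    std::uint32_t value = 0;
    std::memcpy(&value, body.data() + offset, sizeof(value));
    return value;
}
```

For example, patch_dword(body, 0x22, base) would fill in the original_image_base_no_fixup placeholder, provided that 0x22 really is the offset of the immediate operand in the current unpacker build. The offsets change whenever the unpacker code before these instructions changes, which is why parameters.h has to be updated by hand.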
Now, as always, modify the packer's packed_file_info structure (simple_pe_packer project) by adding two fields to it:
```cpp
DWORD original_relocation_directory_rva; //Original relocation directory relative address
DWORD original_relocation_directory_size; //Original relocation directory size
```
Then, as we did with imports and resources:
```cpp
//Store relative address and size of
//packed file original relocation directory
basic_info.original_relocation_directory_rva = image.get_directory_rva(IMAGE_DIRECTORY_ENTRY_BASERELOC);
basic_info.original_relocation_directory_size = image.get_directory_size(IMAGE_DIRECTORY_ENTRY_BASERELOC);
```
After the line:
```cpp
//Store the image loading address to necessary offset
*reinterpret_cast<DWORD*>(&unpacker_section_data[original_image_base_offset]) = image.get_image_base_32();
```
we add the following one:
```cpp
*reinterpret_cast<DWORD*>(&unpacker_section_data[original_image_base_no_fixup_offset]) = image.get_image_base_32();
```
which stores the image base address in the variable we just added to the unpacker. At this stage, after packing any file, the original_image_base and original_image_base_no_fixup variables hold the same value. We need to make the loader fix up original_image_base if the image is relocated in memory. We will write the code after these lines:
```cpp
  //...
  //Set new entry point - now it points
  //to the very beginning of the unpacker
  image.set_ep(image.rva_from_section_offset(unpacker_added_section, 0));
}
```
So,
```cpp
//If file has relocations
if(image.has_reloc())
{
  std::cout << "Creating relocations..." << std::endl;

  //Create relocation table list and a single table
  pe_base::relocation_table_list reloc_tables;
  pe_base::relocation_table table;

  pe_base::section& unpacker_section = image.get_image_sections().at(1);

  //Set relocation table virtual address
  //It will be equal to the relative virtual address of the second added
  //section, because it stores the unpacker code with the variable to fix
  table.set_rva(unpacker_section.get_virtual_address());

  //Add relocation by original_image_base_offset offset from
  //parameters.h file of the unpacker
  table.add_relocation(pe_base::relocation_entry(original_image_base_offset, IMAGE_REL_BASED_HIGHLOW));

  //Add the table to the list
  reloc_tables.push_back(table);

  //Rebuild relocations, placing them at the end
  //of section with the unpacker code
  image.rebuild_relocations(reloc_tables, unpacker_section, unpacker_section.get_raw_data().size());
}
```
Everything is simple here: we just build a relocation table with a single element and add it to the PE file.
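For reference, here is roughly what one such table looks like once it is written to the file: an 8-byte header (VirtualAddress, SizeOfBlock) followed by 16-bit entries. This is a minimal sketch, not the implementation of image.rebuild_relocations; the build_reloc_block function is hypothetical, and the padding entry reflects the PE format requirement that each table starts on a 32-bit boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Serialize one relocation table (little-endian byte order).
// `page_rva` is the base RVA of the table; `offsets` are the positions
// of the DWORDs to fix, relative to that RVA.
inline std::vector<std::uint8_t> build_reloc_block(
    std::uint32_t page_rva, const std::vector<std::uint16_t>& offsets)
{
    std::vector<std::uint16_t> entries;
    for (std::uint16_t off : offsets)
    {
        assert(off < 0x1000); // only 12 bits are available for the offset
        entries.push_back(static_cast<std::uint16_t>((3u << 12) | off)); // 3 = HIGHLOW
    }
    if (entries.size() % 2 != 0)
        entries.push_back(0); // type 0 (ABSOLUTE) entry used as padding

    const std::uint32_t size_of_block =
        8 + 2 * static_cast<std::uint32_t>(entries.size());

    std::vector<std::uint8_t> out;
    // Append the low `bytes` bytes of `v`, least significant first.
    auto put = [&out](std::uint32_t v, int bytes) {
        for (int i = 0; i < bytes; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put(page_rva, 4);      // VirtualAddress
    put(size_of_block, 4); // SizeOfBlock (header included)
    for (std::uint16_t e : entries)
        put(e, 2);
    return out;
}
```

A table with the single entry for original_image_base_offset (0x11) therefore takes 12 bytes: the header, the entry 0x3011 and one padding entry.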
Besides that we have to replace the lines:
```cpp
//At last, strip unnecessary null bytes from the end of the section
pe_base::strip_nullbytes(unpacker_added_section.get_raw_data());
```
with:
```cpp
//At last, strip unnecessary null bytes from the end of the section
if(!image.has_reloc())
  pe_base::strip_nullbytes(unpacker_added_section.get_raw_data());
```
so that the trailing bytes used to initialize thread-local memory and the relocations, which we place right after these bytes, do not overlap. We also have to replace the line
```cpp
image.rebuild_resources(new_root_dir, added_section, added_section.get_raw_data().size());
```
with
```cpp
image.rebuild_resources(new_root_dir, added_section, added_section.get_raw_data().size(), true, !image.has_reloc());
```
so that the section is stripped only if we are not going to put relocation data into it. We also have to remove the line we added earlier:
```cpp
image.remove_directory(IMAGE_DIRECTORY_ENTRY_BASERELOC);
```
to prevent the relocation directory from being removed from the file (the image.rebuild_relocations call fills it in so that it points to the new relocation directory).
All that remains is to process the original file's relocations in the unpacker (unpacker project):
```cpp
//If a file had relocations and it
//was moved by the loader
if(info_copy.original_relocation_directory_rva
  && original_image_base_no_fixup != original_image_base)
{
  //Pointer to the first IMAGE_BASE_RELOCATION structure
  const IMAGE_BASE_RELOCATION* reloc = reinterpret_cast<const IMAGE_BASE_RELOCATION*>(info_copy.original_relocation_directory_rva + original_image_base);

  //Relocation directory size
  unsigned long reloc_size = info_copy.original_relocation_directory_size;
  //Count of processed bytes in the directory
  unsigned long read_size = 0;

  //Enumerate relocation tables
  //(the size is checked first, so that we never read past the directory)
  while(read_size < reloc_size && reloc->SizeOfBlock)
  {
    //Enumerate all elements in a table
    for(unsigned long i = sizeof(IMAGE_BASE_RELOCATION); i < reloc->SizeOfBlock; i += sizeof(WORD))
    {
      //Relocation value
      WORD elem = *reinterpret_cast<const WORD*>(reinterpret_cast<const char*>(reloc) + i);
      //If this is an IMAGE_REL_BASED_HIGHLOW relocation (the only meaningful type in x86 PE)
      if((elem >> 12) == IMAGE_REL_BASED_HIGHLOW)
      {
        //Get DWORD at relocation address
        DWORD* value = reinterpret_cast<DWORD*>(original_image_base + reloc->VirtualAddress + (elem & ((1 << 12) - 1)));
        //Fix it like the PE loader does
        *value = *value - original_image_base_no_fixup + original_image_base;
      }
    }

    //Calculate number of bytes processed
    //in relocation directory
    read_size += reloc->SizeOfBlock;
    //Go to next relocation table
    reloc = reinterpret_cast<const IMAGE_BASE_RELOCATION*>(reinterpret_cast<const char*>(reloc) + reloc->SizeOfBlock);
  }
}
```
I placed this code right before the TLS processing code in the unpacker. We act as the loader. After making sure that the file was moved and that it has a relocation directory, we enumerate all relocation tables and all relocations (fixups) in each table, and recalculate the value at each address a fixup points to. For example, if the DWORD to be recalculated contained 0x800000, the PE file's base address was 0x400000, and the file was actually loaded at 0x500000, then the new value is 0x800000 - 0x400000 + 0x500000 = 0x900000.
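The same walk can be tried outside of Windows on a plain byte buffer, which is handy for checking the arithmetic above. This is a portable sketch and not the unpacker code: apply_relocations is a hypothetical function, the buffer is indexed by RVA, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Apply all HIGHLOW fixups from a relocation directory located at
// `reloc_rva` (size `reloc_size`) to an image mapped into `image`.
inline void apply_relocations(std::vector<std::uint8_t>& image,
                              std::uint32_t reloc_rva, std::uint32_t reloc_size,
                              std::uint32_t preferred_base, std::uint32_t actual_base)
{
    auto read16 = [&image](std::uint32_t rva) -> std::uint16_t {
        assert(rva + 2 <= image.size());
        std::uint16_t v = 0;
        std::memcpy(&v, &image[rva], sizeof(v));
        return v;
    };
    auto read32 = [&image](std::uint32_t rva) -> std::uint32_t {
        assert(rva + 4 <= image.size());
        std::uint32_t v = 0;
        std::memcpy(&v, &image[rva], sizeof(v));
        return v;
    };

    std::uint32_t pos = 0; // bytes of the directory processed so far
    while (pos + 8 <= reloc_size)
    {
        const std::uint32_t block = reloc_rva + pos;
        const std::uint32_t page_rva = read32(block);          // VirtualAddress
        const std::uint32_t size_of_block = read32(block + 4); // SizeOfBlock
        if (size_of_block < 8)
            break; // zero or malformed table ends the walk

        for (std::uint32_t i = 8; i + 2 <= size_of_block; i += 2)
        {
            const std::uint16_t entry = read16(block + i);
            if ((entry >> 12) != 3)
                continue; // not HIGHLOW (for example, ABSOLUTE padding)

            const std::uint32_t target = page_rva + (entry & 0x0FFFu);
            std::uint32_t value = read32(target);
            value = value - preferred_base + actual_base;
            std::memcpy(&image[target], &value, sizeof(value));
        }
        pos += size_of_block;
    }
}
```

With one table that points at a DWORD containing 0x800000, a preferred base of 0x400000 and an actual base of 0x500000, the DWORD becomes 0x900000, as in the example above.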
A funny thing, by the way: I mentioned earlier that MSVC++ does not allow declaring and initializing a variable at the same time in a naked function body. It turned out that this is true only for the function scope. If we open a new nested scope, everything works. Thus, the code
```cpp
void __declspec(naked) func()
{
  int a = 0;
}
```
will fail to build, but
```cpp
void __declspec(naked) func()
{
  {
    int a = 0;
  }
}
```
works fine.
At this point the work with relocations is complete, and any file that has relocations, even one with an unusable base address like the example in the solution, will run. But there is a catch: if a file has TLS in addition to relocations, it will fail. The TLS directory (the IMAGE_TLS_DIRECTORY32 structure) uses absolute addresses, not relative ones, so they have to be adjusted if the loader placed the image at an address different from the base address set in the PE header. Besides that, TLS callback addresses, if any, are also absolute and have to be fixed.
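To illustrate what "adjusting the absolute addresses" means for the TLS directory, here is a small portable model. The tls_directory32 structure mirrors the field order of IMAGE_TLS_DIRECTORY32 from the Windows SDK, while rebase_tls_directory is a hypothetical function. In the packer this adjustment is not coded by hand; it is done by the loader through relocation entries, and the sketch only shows the resulting effect.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of IMAGE_TLS_DIRECTORY32.
struct tls_directory32
{
    std::uint32_t StartAddressOfRawData; // absolute address
    std::uint32_t EndAddressOfRawData;   // absolute address
    std::uint32_t AddressOfIndex;        // absolute address
    std::uint32_t AddressOfCallBacks;    // absolute address, 0 if no callbacks
    std::uint32_t SizeOfZeroFill;        // not an address
    std::uint32_t Characteristics;       // not an address
};

// Shift every non-zero absolute address by (actual - preferred).
inline void rebase_tls_directory(tls_directory32& tls,
                                 std::uint32_t preferred_base,
                                 std::uint32_t actual_base)
{
    assert(tls.EndAddressOfRawData >= tls.StartAddressOfRawData);
    const std::uint32_t delta = actual_base - preferred_base;
    std::uint32_t* const fields[] = {
        &tls.StartAddressOfRawData,
        &tls.EndAddressOfRawData,
        &tls.AddressOfIndex,
        &tls.AddressOfCallBacks
    };
    for (std::uint32_t* field : fields)
    {
        if (*field != 0) // a zero field means "absent" and must stay zero
            *field += delta;
    }
}
```

The entries of the callback array that AddressOfCallBacks points to are absolute addresses too, so each of them needs the same shift.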
Before working on TLS relocations, I wondered how to test this. I did not want to build binaries with TLS and relocations by hand, so I modified the relocation testing example (reloc_test) mentioned above and linked it with the free UniLink linker. This is probably the only linker that is able to build TLS with callbacks. The source code of the example is now as follows:
```cpp
#include <iostream>
#include <Windows.h>
//File from UniLink distributive
#include "ulnfeat.h"

//Several TLS variables
__declspec(thread) int a = 123;
__declspec(thread) int b = 456;
__declspec(thread) char c[128];

//Couple of TLS callbacks
void __stdcall tls_callback(void*, unsigned long reason, void*)
{
  if(reason == DLL_PROCESS_ATTACH)
    MessageBoxA(0, "Process Callback!", "Process Callback!", 0);
  else if(reason == DLL_THREAD_ATTACH)
    MessageBoxA(0, "Thread Callback!", "Thread Callback!", 0);
}

void __stdcall tls_callback2(void*, unsigned long reason, void*)
{
  if(reason == DLL_PROCESS_ATTACH)
    MessageBoxA(0, "Process Callback 2!", "Process Callback 2!", 0);
  else if(reason == DLL_THREAD_ATTACH)
    MessageBoxA(0, "Thread Callback 2!", "Thread Callback 2!", 0);
}

//Thread procedure (empty, just to call callbacks)
DWORD __stdcall thread(void*)
{
  ExitThread(0);
}

//Two TLS callbacks
//This declaration is for UniLink linker
TLS_CALLBACK(1, tls_callback);
TLS_CALLBACK(2, tls_callback2);

int main()
{
  //Display variables from TLS
  std::cout << "Relocation test " << a << ", " << b << std::endl;
  c[126] = 'x';
  c[127] = 0;
  std::cout << &c[126] << std::endl;

  //Sleep for 2 seconds
  Sleep(2000);

  //Start the thread and close its handle right away
  CloseHandle(CreateThread(0, 0, &thread, 0, 0, 0));

  //Sleep for 2 seconds
  Sleep(2000);
  return 0;
}
```
Let me explain what this example does. When the process starts, two callbacks are called: tls_callback and tls_callback2. Two message boxes appear with the texts "Process Callback!" and "Process Callback 2!". After that, the following is printed to the console:
Relocation test 123, 456
x
Finally, after 2 seconds a new thread is created and the TLS callbacks are called again, this time showing message boxes with the texts "Thread Callback!" and "Thread Callback 2!". After 2 more seconds the program finishes. This tests both relocation and TLS processing of our packer in full. To build the program, first compile the source (right-click the main.cpp file - Compile). We get a main.obj file and pass it to the UniLink linker with the following console command:
```
ulink.exe -B- -b:0x7F000000 main.obj, main.exe
```
This command tells the ulink.exe linker to build main.exe from main.obj with the base address set to 0x7F000000 (to make sure that the loader applies relocations). The -B- option is used to add the relocations. The result is a file with an unusable base address, relocations, and TLS with callbacks. Perfect for testing!
Let's turn to the packer project (simple_pe_packer). We move the first_callback_offset variable to a wider scope by replacing the lines:
```cpp
//It is necessary to reserve place for
//original TLS callbacks
//plus one cell for zero DWORD
DWORD first_callback_offset = data.size();
```
with
```cpp
//It is necessary to reserve place for
//original TLS callbacks
//plus one cell for zero DWORD
first_callback_offset = data.size();
```
and adding the lines
```cpp
//Offset to absolute TLS callback address
//relative to the beginning of second section
DWORD first_callback_offset = 0;
```
before
```cpp
{
  //New section
  pe_base::section unpacker_section;

  //...
```
Then, after the lines
```cpp
//Add relocation by original_image_base_offset offset from
//parameters.h file of the unpacker
table.add_relocation(pe_base::relocation_entry(original_image_base_offset, IMAGE_REL_BASED_HIGHLOW));
```
write TLS relocation code:
```cpp
//If a file has TLS
if(tls.get())
{
  //Calculate offset to TLS structure
  //relative to the beginning of second section
  DWORD tls_directory_offset = image.get_directory_rva(IMAGE_DIRECTORY_ENTRY_TLS) - image.section_from_directory(IMAGE_DIRECTORY_ENTRY_TLS).get_virtual_address();

  //Add relocations for StartAddressOfRawData,
  //EndAddressOfRawData and AddressOfIndex fields
  //These fields are always not null
  table.add_relocation(pe_base::relocation_entry(static_cast<WORD>(tls_directory_offset + offsetof(IMAGE_TLS_DIRECTORY32, StartAddressOfRawData)), IMAGE_REL_BASED_HIGHLOW));
  table.add_relocation(pe_base::relocation_entry(static_cast<WORD>(tls_directory_offset + offsetof(IMAGE_TLS_DIRECTORY32, EndAddressOfRawData)), IMAGE_REL_BASED_HIGHLOW));
  table.add_relocation(pe_base::relocation_entry(static_cast<WORD>(tls_directory_offset + offsetof(IMAGE_TLS_DIRECTORY32, AddressOfIndex)), IMAGE_REL_BASED_HIGHLOW));

  //If TLS callbacks exist
  if(first_callback_offset)
  {
    //Add relocations for AddressOfCallBacks field
    //and for our empty callback address
    table.add_relocation(pe_base::relocation_entry(static_cast<WORD>(tls_directory_offset + offsetof(IMAGE_TLS_DIRECTORY32, AddressOfCallBacks)), IMAGE_REL_BASED_HIGHLOW));
    table.add_relocation(pe_base::relocation_entry(static_cast<WORD>(first_callback_offset), IMAGE_REL_BASED_HIGHLOW));
  }
}
```
We added fixups for all non-zero fields of the IMAGE_TLS_DIRECTORY32 structure that contain absolute addresses. If there are TLS callbacks, we also add a relocation for the absolute address of our empty TLS callback. The most interesting part is that nothing has to be changed in the unpacker: it processes the original file's relocations, thereby recalculating the original TLS callback addresses, and calls them only after that. The only thing I had to do was increase the amount of stack memory reserved by the unpacker, because it was no longer enough (I replaced the sub esp, 256 instruction with sub esp, 4096, to be safe).
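The offsets passed to the relocation entries are just the TLS directory offset plus the position of each field inside the structure. The sketch below computes the same list in a portable way. The tls_dir32_layout structure mirrors IMAGE_TLS_DIRECTORY32, and tls_fixup_offsets is a hypothetical helper, not part of the packer; the assertion reflects the fact that a relocation entry stores only a 12-bit offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable mirror of the IMAGE_TLS_DIRECTORY32 field layout.
struct tls_dir32_layout
{
    std::uint32_t StartAddressOfRawData;
    std::uint32_t EndAddressOfRawData;
    std::uint32_t AddressOfIndex;
    std::uint32_t AddressOfCallBacks;
    std::uint32_t SizeOfZeroFill;
    std::uint32_t Characteristics;
};

// Collect section-relative offsets of all DWORDs that need a HIGHLOW fixup.
// `first_callback_offset` is 0 when the file has no TLS callbacks.
inline std::vector<std::uint16_t> tls_fixup_offsets(
    std::uint32_t tls_directory_offset, std::uint32_t first_callback_offset)
{
    std::vector<std::uint16_t> out;
    auto add = [&out](std::size_t offset) {
        assert(offset < 0x1000); // must fit in the 12 offset bits of an entry
        out.push_back(static_cast<std::uint16_t>(offset));
    };

    add(tls_directory_offset + offsetof(tls_dir32_layout, StartAddressOfRawData));
    add(tls_directory_offset + offsetof(tls_dir32_layout, EndAddressOfRawData));
    add(tls_directory_offset + offsetof(tls_dir32_layout, AddressOfIndex));

    if (first_callback_offset != 0)
    {
        add(tls_directory_offset + offsetof(tls_dir32_layout, AddressOfCallBacks));
        add(first_callback_offset); // our empty callback slot
    }
    return out;
}
```

Since all of these fixups share one table based at the start of the unpacker section, they can be expressed this way only while the TLS directory and the callback array lie within the first 4 KB of that section.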
We test our packer on our hardcore example, main.exe, and make sure that everything works fine.
By now I have tested the current version of the packer on the main EXE files of the following applications: IrfanView, HM NIS Edit, Firefox, Notepad++, NSIS, Opera (it should be renamed to opera.exe after packing), Winamp, WinDjView, ResEd, Quake3, CatMario, Media Player Classic, Windows Media Player. They still work after packing!
Finally, I will note that a comment in the UPX source code states that if relocations and TLS are located in the same section, the loader will not fix the addresses in TLS. Our main.exe sample is obviously laid out this way, and curiously enough everything works on Windows XP and 7 (I didn't test it on other systems).
Full solution for this step: Own PE Packer Step 7