Previous step is here.
So, from the previous steps we have a working packer and a basic unpacker that does nothing yet. At this step we will make simple packed programs run (ones that have nothing except an import table and possibly relocations). The first thing to do, besides decompressing the data, is to fix up the import table of the original file. Usually the loader does that, but for the compressed file we have to play its role ourselves.
Let's add several fields to our packed_file_info structure:

    //Structure to store packed file information
    struct packed_file_info
    {
      BYTE number_of_sections; //Number of original file sections
      DWORD size_of_packed_data; //Size of packed data
      DWORD size_of_unpacked_data; //Size of original data

      DWORD total_virtual_size_of_sections; //Total virtual size of all original file sections
      DWORD original_import_directory_rva; //Relative address of original import table
      DWORD original_import_directory_size; //Original import table size
      DWORD original_entry_point; //Original entry point

      DWORD load_library_a; //LoadLibraryA procedure address from kernel32.dll
      DWORD get_proc_address; //GetProcAddress procedure address from kernel32.dll
      DWORD end_of_import_address_table; //IAT end
    };

We added four fields that the unpacker will need. Now we have to fill them in the packer code:

    //...
    //PE file basic information structure
    packed_file_info basic_info = {0};
    //Get and save original sections count
    basic_info.number_of_sections = sections.size();

    //Store relative address and size
    //of packed file original import table
    basic_info.original_import_directory_rva = image.get_directory_rva(IMAGE_DIRECTORY_ENTRY_IMPORT);
    basic_info.original_import_directory_size = image.get_directory_size(IMAGE_DIRECTORY_ENTRY_IMPORT);
    //Save its entry point
    basic_info.original_entry_point = image.get_ep();
    //Save all packed file sections total virtual size
    basic_info.total_virtual_size_of_sections = image.get_size_of_image();

Everything is simple here. At the second step, if you remember, I calculated the total size of all original file sections manually and explained that it is equivalent to the value returned by the get_size_of_image function. Here we use that function. That's all for the packer for now.

Now we turn to the unpacker (the unpacker project). We need to compile the LZO1Z algorithm into it. I did this in a simple and stupid way: I moved all the files required to compile the lzo1z_decompress function (in particular: lzo1z_d1.c, lzo1x_d.ch, config1z.h, config1x.h, lzo_conf.h, lzo_ptr.h, lzo1_d.ch, miniacc.h) to the unpacker project. Besides that, I added an include directory to the project: ../../lzo-2.06/include.

Next I had to dig into the project settings. When memset, memcpy and similar functions are used (and we are going to use them more than once), Visual C++ can, at its own discretion, pull the whole CRT into the resulting exe file, which is absolutely unnecessary for us. I had to turn off intrinsic functions (C/C++ - Optimization - Enable Intrinsic Functions - No) and whole program optimization (C/C++ - Optimization - Whole Program Optimization - No), add libcmt.lib to the list of ignored libraries (Linker - Input - Ignore Specific Default Libraries - libcmt.lib) just in case, and turn off code generation at the linking stage (Linker - Optimization - Link Time Code Generation - Default).

As we turned off all intrinsic functions (memset and memcpy among them), we now need our own implementations. We add two files to the project: memcpy.c and memset.c. I copied the source code of the functions with the same names from the CRT into these files:

    void * __cdecl memset (
      void *dst,
      int val,
      unsigned int count
      )
    {
      void *start = dst;

      while (count--) {
        *(char *)dst = (char)val;
        dst = (char *)dst + 1;
      }

      return(start);
    }


    void * __cdecl memcpy (
      void * dst,
      const void * src,
      unsigned int count
      )
    {
      void * ret = dst;

      /*
       * copy from lower addresses to higher addresses
       */
      while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
      }

      return(ret);
    }

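Since these two routines replace the CRT versions inside the unpacker, it may be worth checking them once on the host before trusting them in the stub. Below is a minimal sketch of such a check. The names tiny_memset, tiny_memcpy and tiny_mem_selftest are hypothetical (they are renamed so that they do not collide with the standard library in an ordinary build), and the bodies follow the same byte-by-byte logic as the listings above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names: renamed copies of the byte-wise routines so that
// they can live next to the standard library in a normal host program.
void* tiny_memset(void* dst, int val, std::size_t count)
{
    assert(dst != nullptr || count == 0);
    unsigned char* p = static_cast<unsigned char*>(dst);
    while (count--)
        *p++ = static_cast<unsigned char>(val);
    return dst;
}

void* tiny_memcpy(void* dst, const void* src, std::size_t count)
{
    assert((dst != nullptr && src != nullptr) || count == 0);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    // Copy from lower addresses to higher addresses, one byte at a time
    while (count--)
        *d++ = *s++;
    return dst;
}

// Returns true if both routines produce the expected bytes
// on a small pair of buffers.
bool tiny_mem_selftest()
{
    unsigned char a[8];
    unsigned char b[8];
    tiny_memset(a, 0xAB, sizeof(a)); // fill the source with a pattern
    tiny_memset(b, 0, sizeof(b));    // clear the destination
    tiny_memcpy(b, a, 5);            // copy only the first five bytes
    for (int i = 0; i < 8; ++i)
    {
        if (a[i] != 0xAB)
            return false;
        if (b[i] != (i < 5 ? 0xAB : 0x00))
            return false;
    }
    return true;
}
```

Like the originals, these versions copy strictly forward, so they are only suitable for non-overlapping buffers (or for overlapping ones where the destination lies below the source).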
Another issue awaits us here. There are now four modules in our code (four source files, .c and .cpp), so after compilation we will have four object (obj) files. The linker has to assemble them into one exe file, and it will do so, but it places the modules in the exe file in an arbitrary order. We, however, need the unpacker_main function to be placed at the very beginning of the unpacker code. We patch it in the packer, remember? This problem is easy to solve. We create a text file containing the following:

    unpacker_main@0
    lzo1z_decompress
    memset
    memcpy

We call it link_order.txt and put it in the folder with the unpacker project sources. This file tells the linker in which order to place the functions in the resulting file. Let's specify it in the project settings: Linker - Optimization - Function Order - link_order.txt. That's all, the setup is complete, so let's develop the unpacker!
First, I increased the amount of data allocated on the stack to 256 bytes (sub esp, 256). There are a lot of local variables, so let's play it safe in case 128 bytes turn out to be insufficient.
Let's add unpacker function prototype to the beginning of unpacker.cpp file:

    //Unpacking algorithm
    #include "lzo_conf.h"
    /* decompression */
    LZO_EXTERN(int)
    lzo1z_decompress        ( const lzo_bytep src, lzo_uint  src_len,
                                    lzo_bytep dst, lzo_uintp dst_len,
                                    lzo_voidp wrkmem /* NOT USED */ );

We can use it in code now. Next, we will need the VirtualAlloc function (to allocate memory), VirtualProtect (to change memory page attributes) and VirtualFree (to release allocated memory). Let's import them from kernel32.dll:

    //kernel32.dll
    *reinterpret_cast<DWORD*>(&buf[0]) = 'nrek';
    *reinterpret_cast<DWORD*>(&buf[4]) = '23le';
    *reinterpret_cast<DWORD*>(&buf[8]) = 'lld.';
    *reinterpret_cast<DWORD*>(&buf[12]) = 0;

    //Load kernel32.dll library
    HMODULE kernel32_dll;
    kernel32_dll = load_library_a(buf);

    //VirtualAlloc function prototype typedef
    typedef LPVOID (__stdcall* virtual_alloc_func)(LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect);
    //VirtualProtect function prototype typedef
    //(the API returns BOOL, not a pointer)
    typedef BOOL (__stdcall* virtual_protect_func)(LPVOID lpAddress, SIZE_T dwSize, DWORD flNewProtect, PDWORD lpflOldProtect);
    //VirtualFree function prototype typedef
    //(the API returns BOOL, not a pointer)
    typedef BOOL (__stdcall* virtual_free_func)(LPVOID lpAddress, SIZE_T dwSize, DWORD dwFreeType);

    //VirtualAlloc
    *reinterpret_cast<DWORD*>(&buf[0]) = 'triV';
    *reinterpret_cast<DWORD*>(&buf[4]) = 'Alau';
    *reinterpret_cast<DWORD*>(&buf[8]) = 'coll';
    *reinterpret_cast<DWORD*>(&buf[12]) = 0;

    //Get VirtualAlloc function address
    virtual_alloc_func virtual_alloc;
    virtual_alloc = reinterpret_cast<virtual_alloc_func>(get_proc_address(kernel32_dll, buf));

    //VirtualProtect
    *reinterpret_cast<DWORD*>(&buf[0]) = 'triV';
    *reinterpret_cast<DWORD*>(&buf[4]) = 'Plau';
    *reinterpret_cast<DWORD*>(&buf[8]) = 'etor';
    *reinterpret_cast<DWORD*>(&buf[12]) = 'tc';

    //Get VirtualProtect function address
    virtual_protect_func virtual_protect;
    virtual_protect = reinterpret_cast<virtual_protect_func>(get_proc_address(kernel32_dll, buf));

    //VirtualFree
    *reinterpret_cast<DWORD*>(&buf[0]) = 'triV';
    *reinterpret_cast<DWORD*>(&buf[4]) = 'Flau';
    *reinterpret_cast<DWORD*>(&buf[8]) = 'eer';

    //Get VirtualFree function address
    virtual_free_func virtual_free;
    virtual_free = reinterpret_cast<virtual_free_func>(get_proc_address(kernel32_dll, buf));

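A side note on the reversed constants such as 'nrek': the value of a multi-character literal is implementation-defined, and the listing relies on the compiler storing it so that the bytes end up in memory as "kern" on little-endian x86. The sketch below reproduces the same idea with an explicit helper instead of multi-character literals, so the byte order is spelled out. The names pack4, put4 and build_kernel32_name are hypothetical, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pack up to four characters into a 32-bit value so that, when stored
// on a little-endian machine, they appear in memory in the order a, b, c, d.
constexpr std::uint32_t pack4(char a, char b = 0, char c = 0, char d = 0)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Store one 4-byte chunk at the given offset of the buffer.
void put4(char* buf, std::size_t offset, std::uint32_t chunk)
{
    assert(buf != nullptr);
    std::memcpy(buf + offset, &chunk, sizeof(chunk));
}

// Builds the string "kernel32.dll" in a 16-byte buffer, four bytes at a
// time, without referencing any string literal stored elsewhere.
void build_kernel32_name(char (&buf)[16])
{
    put4(buf, 0,  pack4('k', 'e', 'r', 'n'));
    put4(buf, 4,  pack4('e', 'l', '3', '2'));
    put4(buf, 8,  pack4('.', 'd', 'l', 'l'));
    put4(buf, 12, 0); // terminating zero bytes
}
```

A short final chunk is padded with zero bytes, which is what provides the string terminator for names like VirtualProtect and VirtualFree in the unpacker code above, where the last constant has fewer than four characters.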
This code follows the same pattern as step 3, where we loaded user32.dll and got the address of MessageBoxA, so I will not explain it again. Next, we copy the values that the packer stored for us into local variables:

    //Relative virtual address of import directory
    DWORD original_import_directory_rva;
    //Import directory size
    DWORD original_import_directory_size;
    //Original entry point
    DWORD original_entry_point;
    //Total size of all file sections
    DWORD total_virtual_size_of_sections;
    //Number of original file sections
    BYTE number_of_sections;

    //Copy these values from packed_file_info structure,
    //which was saved for us by the packer
    original_import_directory_rva = info->original_import_directory_rva;
    original_import_directory_size = info->original_import_directory_size;
    original_entry_point = info->original_entry_point;
    total_virtual_size_of_sections = info->total_virtual_size_of_sections;
    number_of_sections = info->number_of_sections;

We do this because the packed_file_info structure at the beginning of the first section of the packed file will soon be overwritten with the real unpacked data. Now we allocate memory and unpack the compressed data block into it:

    //Pointer to the memory
    //to store unpacked data
    LPVOID unpacked_mem;
    //Allocate the memory
    unpacked_mem = virtual_alloc(
      0,
      info->size_of_unpacked_data,
      MEM_COMMIT,
      PAGE_READWRITE);

    //Unpacked data size
    //(in fact, this variable is unnecessary)
    lzo_uint out_len;
    out_len = 0;

    //Unpack with LZO algorithm
    lzo1z_decompress(
      reinterpret_cast<const unsigned char*>(reinterpret_cast<DWORD>(info) + sizeof(packed_file_info)),
      info->size_of_packed_data,
      reinterpret_cast<unsigned char*>(unpacked_mem),
      &out_len,
      0);

We don't need to initialize the LZO algorithm before unpacking; calling a single function is enough, and that is what we did. Next, we have to calculate the virtual address of the first section header.

    //Pointer to DOS file header
    const IMAGE_DOS_HEADER* dos_header;
    //Pointer to file header
    IMAGE_FILE_HEADER* file_header;
    //Virtual address of sections header beginning
    DWORD offset_to_section_headers;
    //Calculate this address
    dos_header = reinterpret_cast<const IMAGE_DOS_HEADER*>(original_image_base);
    file_header = reinterpret_cast<IMAGE_FILE_HEADER*>(original_image_base + dos_header->e_lfanew + sizeof(DWORD));
    //with this formula
    offset_to_section_headers = original_image_base + dos_header->e_lfanew + file_header->SizeOfOptionalHeader
      + sizeof(IMAGE_FILE_HEADER) + sizeof(DWORD) /* Signature */;

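To make the formula easier to follow, here is a small portable sketch that performs the same calculation on a plain byte buffer instead of the loaded image. It returns an offset relative to the start of the image, whereas the unpacker additionally adds original_image_base to get a virtual address. The helper names (read_le, offset_to_section_headers_in, make_fake_headers) are hypothetical, and the constants are the usual PE layout values: e_lfanew at offset 0x3C of the DOS header, a 4-byte signature, and a 20-byte file header with SizeOfOptionalHeader at offset 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kOffsetOfLfanew          = 0x3C; // IMAGE_DOS_HEADER::e_lfanew
const std::size_t kSizeOfSignature         = 4;    // "PE\0\0"
const std::size_t kSizeOfFileHeader        = 20;   // sizeof(IMAGE_FILE_HEADER)
const std::size_t kOffsetOfSizeOfOptHeader = 16;   // IMAGE_FILE_HEADER::SizeOfOptionalHeader

// Read a little-endian unsigned value of 1..4 bytes from the buffer.
std::uint32_t read_le(const std::vector<unsigned char>& image,
                      std::size_t offset, std::size_t bytes)
{
    assert(bytes <= 4 && offset + bytes <= image.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        value |= static_cast<std::uint32_t>(image[offset + i]) << (8 * i);
    return value;
}

// Offset of the first section header from the start of the image:
// e_lfanew + signature + file header + optional header.
std::size_t offset_to_section_headers_in(const std::vector<unsigned char>& image)
{
    const std::size_t e_lfanew = read_le(image, kOffsetOfLfanew, 4);
    const std::size_t file_header = e_lfanew + kSizeOfSignature;
    const std::size_t size_of_optional_header =
        read_le(image, file_header + kOffsetOfSizeOfOptHeader, 2);
    return file_header + kSizeOfFileHeader + size_of_optional_header;
}

// Test helper: a zero-filled buffer in which only the two fields used
// by the calculation are set.
std::vector<unsigned char> make_fake_headers(std::uint32_t e_lfanew,
                                             std::uint16_t size_of_optional_header)
{
    assert(e_lfanew >= 0x40); // the DOS header must fit before the PE header
    std::vector<unsigned char> image(
        e_lfanew + kSizeOfSignature + kSizeOfFileHeader + size_of_optional_header, 0);
    for (std::size_t i = 0; i < 4; ++i)
        image[kOffsetOfLfanew + i] =
            static_cast<unsigned char>((e_lfanew >> (8 * i)) & 0xFF);
    const std::size_t pos = e_lfanew + kSizeOfSignature + kOffsetOfSizeOfOptHeader;
    image[pos]     = static_cast<unsigned char>(size_of_optional_header & 0xFF);
    image[pos + 1] = static_cast<unsigned char>((size_of_optional_header >> 8) & 0xFF);
    return image;
}
```

For example, with e_lfanew equal to 0x80 and a standard 32-bit optional header of 0xE0 bytes, the section headers start at 0x80 + 4 + 20 + 0xE0 = 0x178.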
Now we have the virtual address of the section headers. We need to overwrite them so that they match the original file's headers. Before doing this, there are a few more things to take care of:

    //Null first section memory
    //This region matches the memory region,
    //which is occupied by all sections in original file
    memset(
      reinterpret_cast<void*>(original_image_base + rva_of_first_section),
      0,
      total_virtual_size_of_sections - rva_of_first_section);

    //Let's change memory block attributes, in which
    //PE file and section headers are placed
    //We need write access
    DWORD old_protect;
    virtual_protect(reinterpret_cast<LPVOID>(offset_to_section_headers),
      number_of_sections * sizeof(IMAGE_SECTION_HEADER),
      PAGE_READWRITE, &old_protect);

    //Now we change section number
    //in PE file header to original
    file_header->NumberOfSections = number_of_sections;

We begin to restore section headers:

    //Section header virtual address
    DWORD current_section_structure_pos;
    current_section_structure_pos = offset_to_section_headers;
    //List all sections
    for(int i = 0; i != number_of_sections; ++i)
    {
      //Creates section header structure
      IMAGE_SECTION_HEADER section_header;
      //Set structure to null
      memset(&section_header, 0, sizeof(section_header));
      //Fill the important fields:
      //Characteristics
      section_header.Characteristics = (reinterpret_cast<packed_section*>(unpacked_mem) + i)->characteristics;
      //File data offset
      section_header.PointerToRawData = (reinterpret_cast<packed_section*>(unpacked_mem) + i)->pointer_to_raw_data;
      //File data size
      section_header.SizeOfRawData = (reinterpret_cast<packed_section*>(unpacked_mem) + i)->size_of_raw_data;
      //Relative section virtual address
      section_header.VirtualAddress = (reinterpret_cast<packed_section*>(unpacked_mem) + i)->virtual_address;
      //Section virtual size
      section_header.Misc.VirtualSize = (reinterpret_cast<packed_section*>(unpacked_mem) + i)->virtual_size;
      //Copy original section name
      memcpy(section_header.Name, (reinterpret_cast<packed_section*>(unpacked_mem) + i)->name, sizeof(section_header.Name));

      //Copy filled header
      //to memory, where section headers are stored
      memcpy(reinterpret_cast<void*>(current_section_structure_pos), &section_header, sizeof(section_header));

      //Move the pointer to next section header
      current_section_structure_pos += sizeof(section_header);
    }

Section headers have been restored, let's restore their data now:

    //Pointer to raw section data
    //is necessary to disassemble compressed sections data
    //and to put them to right places
    DWORD current_raw_data_ptr;
    current_raw_data_ptr = 0;
    //Restore the pointer to section headers
    current_section_structure_pos = offset_to_section_headers;
    //List all the sections again
    for(int i = 0; i != number_of_sections; ++i)
    {
      //Section header we've just written
      const IMAGE_SECTION_HEADER* section_header = reinterpret_cast<const IMAGE_SECTION_HEADER*>(current_section_structure_pos);

      //Copying sections data to the place in memory,
      //where they have to be placed
      memcpy(reinterpret_cast<void*>(original_image_base + section_header->VirtualAddress),
        reinterpret_cast<char*>(unpacked_mem) + number_of_sections * sizeof(packed_section) + current_raw_data_ptr,
        section_header->SizeOfRawData);

      //Move pointer to section data
      //in unpacked data block
      current_raw_data_ptr += section_header->SizeOfRawData;

      //Turn to next section header
      current_section_structure_pos += sizeof(IMAGE_SECTION_HEADER);
    }

    //Release memory with unpacked data,
    //we don't need it anymore
    virtual_free(unpacked_mem, 0, MEM_RELEASE);

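The two loops above rely on the layout of the unpacked block: first come number_of_sections header records, then the raw data of all sections concatenated in the same order. The following sketch models that layout on ordinary vectors. It is a simplified illustration only: mini_section is a hypothetical stand-in for packed_section that keeps just the two fields the copy loop needs, and scatter_sections and make_unpacked_block are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, reduced stand-in for the packer's packed_section record
struct mini_section
{
    std::uint32_t virtual_address;  // where the data goes inside the image
    std::uint32_t size_of_raw_data; // how many bytes of data follow
};

// Copy each section's raw bytes from the unpacked block to its
// virtual address inside the image buffer.
void scatter_sections(const std::vector<unsigned char>& unpacked,
                      std::size_t number_of_sections,
                      std::vector<unsigned char>& image)
{
    const std::size_t headers_size = number_of_sections * sizeof(mini_section);
    assert(headers_size <= unpacked.size());

    // Raw data starts right after the header records
    std::size_t current_raw_data_pos = headers_size;
    for (std::size_t i = 0; i != number_of_sections; ++i)
    {
        mini_section section;
        std::memcpy(&section, unpacked.data() + i * sizeof(mini_section), sizeof(section));

        assert(current_raw_data_pos + section.size_of_raw_data <= unpacked.size());
        assert(static_cast<std::size_t>(section.virtual_address)
               + section.size_of_raw_data <= image.size());

        std::memcpy(image.data() + section.virtual_address,
                    unpacked.data() + current_raw_data_pos,
                    section.size_of_raw_data);

        // Advance to the next section's data in the unpacked block
        current_raw_data_pos += section.size_of_raw_data;
    }
}

// Test helper: glue header records and concatenated raw data together.
std::vector<unsigned char> make_unpacked_block(const std::vector<mini_section>& headers,
                                               const std::vector<unsigned char>& data)
{
    const std::size_t headers_size = headers.size() * sizeof(mini_section);
    std::vector<unsigned char> block(headers_size + data.size());
    if (!headers.empty())
        std::memcpy(block.data(), headers.data(), headers_size);
    if (!data.empty())
        std::memcpy(block.data() + headers_size, data.data(), data.size());
    return block;
}
```

Unlike the unpacker stub, this sketch checks bounds with assert, which is cheap in a host-side experiment but is left out of the size-sensitive stub in the article.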
So, almost everything is ready. To launch the unpacked file successfully, we just have to fix its import table, playing the role of the PE loader again. First, let's fix the import table virtual address and size in the PE header:

    //Calculate virtual address
    //of directory table beginning
    DWORD offset_to_directories;
    offset_to_directories = original_image_base + dos_header->e_lfanew
      + sizeof(IMAGE_NT_HEADERS32) - sizeof(IMAGE_DATA_DIRECTORY) * IMAGE_NUMBEROF_DIRECTORY_ENTRIES;

    //Pointer to import directory
    IMAGE_DATA_DIRECTORY* import_dir = reinterpret_cast<IMAGE_DATA_DIRECTORY*>(offset_to_directories + sizeof(IMAGE_DATA_DIRECTORY) * IMAGE_DIRECTORY_ENTRY_IMPORT);
    //Write size and virtual address values to corresponding fields
    import_dir->Size = original_import_directory_size;
    import_dir->VirtualAddress = original_import_directory_rva;

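The arithmetic in this listing can be checked with plain numbers. The sketch below assumes the standard 32-bit layout: IMAGE_NT_HEADERS32 is 248 bytes (4-byte signature, 20-byte file header, 224-byte optional header), the data directory table is the last part of the optional header, and it holds 16 entries of 8 bytes each. The function name data_directory_offset is hypothetical, and the result is an offset from the image start (the unpacker additionally adds original_image_base).

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSizeOfNtHeaders32    = 248; // sizeof(IMAGE_NT_HEADERS32): 4 + 20 + 224
const std::size_t kSizeOfDataDirectory  = 8;   // sizeof(IMAGE_DATA_DIRECTORY)
const std::size_t kNumberOfDirectories  = 16;  // IMAGE_NUMBEROF_DIRECTORY_ENTRIES
const std::size_t kImportDirectoryIndex = 1;   // IMAGE_DIRECTORY_ENTRY_IMPORT

// Offset of a data directory entry from the start of the image.
// The directory table occupies the last 16 * 8 bytes of the NT headers.
std::size_t data_directory_offset(std::size_t e_lfanew, std::size_t index)
{
    assert(index < kNumberOfDirectories);
    const std::size_t table_start =
        e_lfanew + kSizeOfNtHeaders32 - kSizeOfDataDirectory * kNumberOfDirectories;
    return table_start + index * kSizeOfDataDirectory;
}
```

With e_lfanew equal to 0x80, the table starts at 0x80 + 248 - 128 = 0xF8, and the import directory entry (index 1) is at 0x100.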
Fill import table:

    //If the file has imports
    if(original_import_directory_rva)
    {
      //First descriptor virtual address
      IMAGE_IMPORT_DESCRIPTOR* descr;
      descr = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(original_import_directory_rva + original_image_base);

      //List all descriptors
      //Last one is nulled
      while(descr->Name)
      {
        //Load the required DLL
        HMODULE dll;
        dll = load_library_a(reinterpret_cast<char*>(descr->Name + original_image_base));
        //Pointers to address table and lookup table
        DWORD* lookup, *address;
        //Take into account that lookup table may be absent,
        //as I mentioned at previous step
        lookup = reinterpret_cast<DWORD*>(original_image_base + (descr->OriginalFirstThunk ? descr->OriginalFirstThunk : descr->FirstThunk));
        address = reinterpret_cast<DWORD*>(descr->FirstThunk + original_image_base);

        //List all descriptor imports
        while(true)
        {
          //Till the first null element in lookup table
          DWORD lookup_value = *lookup;
          if(!lookup_value)
            break;

          //Check if the function is imported by ordinal
          if(IMAGE_SNAP_BY_ORDINAL32(lookup_value))
            *address = static_cast<DWORD>(get_proc_address(dll, reinterpret_cast<const char*>(lookup_value & ~IMAGE_ORDINAL_FLAG32)));
          else
            *address = static_cast<DWORD>(get_proc_address(dll, reinterpret_cast<const char*>(lookup_value + original_image_base + sizeof(WORD))));

          //Move to next element
          ++lookup;
          ++address;
        }

        //Move to next descriptor
        ++descr;
      }
    }

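The inner loop distinguishes two kinds of lookup entries. If the highest bit (IMAGE_ORDINAL_FLAG32, 0x80000000) is set, the rest of the value is an ordinal. Otherwise the value is the RVA of a hint/name record, where a 2-byte hint precedes the function name, which is why the listing adds sizeof(WORD). A minimal portable sketch of that decision follows. The names thunk_target and decode_thunk are hypothetical, and the returned name RVA is still relative to the image base.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kOrdinalFlag32 = 0x80000000u; // IMAGE_ORDINAL_FLAG32
const std::uint32_t kSizeOfHint    = 2;           // WORD Hint before the name

// Result of decoding one non-zero 32-bit lookup table entry
struct thunk_target
{
    bool by_ordinal;     // true: import by ordinal, false: import by name
    std::uint32_t value; // the ordinal, or the RVA of the name string
};

thunk_target decode_thunk(std::uint32_t lookup_value)
{
    // A zero entry terminates the table and must be handled by the caller
    assert(lookup_value != 0);

    thunk_target target;
    if (lookup_value & kOrdinalFlag32)
    {
        target.by_ordinal = true;
        target.value = lookup_value & ~kOrdinalFlag32; // strip the flag bit
    }
    else
    {
        target.by_ordinal = false;
        target.value = lookup_value + kSizeOfHint;     // skip the Hint field
    }
    return target;
}
```

In the by-ordinal case the resulting small number is passed to GetProcAddress in place of a name pointer, which is how that API accepts ordinals.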
That's all: acting as the PE loader, we have filled in the import table of the PE file. There are a couple of things remaining:

    //Restore headers memory attributes
    virtual_protect(reinterpret_cast<LPVOID>(offset_to_section_headers), number_of_sections * sizeof(IMAGE_SECTION_HEADER), old_protect, &old_protect);

    //Create epilogue manually
    _asm
    {
      //Move to original entry point
      mov eax, original_entry_point;
      add eax, original_image_base;
      leave;
      //Like this
      jmp eax;
    }

Now you can see why we needed to write our own function prologue and epilogue in assembler. Instead of the ret instruction that used to be at the end of the unpacker code, we put a jmp eax instruction, which jumps to the code of the original file.
So, the packer can now process a simple PE file that has only an import table. Any file with resources, TLS or exports will not work yet; we will deal with this at the next steps. But we can already pack the packer itself and run the packed file!
As you can see, we packed ourselves, got the binary file packed_simple_pe_packer.exe, and it works!
Complete solution with all projects for this step: Own PE Packer Step 4