Previous step is here.
Our packer can already do almost everything except one thing: packing binaries with exports. That covers the vast majority of DLL files and OCX components, and some EXE files have exports too. The packer should rebuild the export table and place it in available space, so that the loader can access it.
We can relax a bit for now: we only have to add a small amount of code to the packer (and about as much to the unpacker, though there it will be in assembler).
Let's begin with the packer (the simple_pe_packer project). If a file has exports, we have to read and list them, so right after the lines
    //...
        tls.reset(new pe_base::tls_info(image.get_tls_info()));
      }
write:
    //If a file has exports, get information about them
    //and list them
    pe_base::exported_functions_list exports;
    pe_base::export_info exports_info;
    if(image.has_exports())
    {
      std::cout << "Reading exports..." << std::endl;
      exports = image.get_exported_functions(exports_info);
    }
The PE library makes our life much easier here, so we don't have to go into the details of how export structures are organized. Next, we replace the lines
    //At last, strip unnecessary null bytes from the end of the section
    if(!image.has_reloc())
      pe_base::strip_nullbytes(unpacker_added_section.get_raw_data());
with
    //At last, strip unnecessary null bytes from the end of the section
    if(!image.has_reloc() && !image.has_exports())
      pe_base::strip_nullbytes(unpacker_added_section.get_raw_data());
Since exports, relocations, or both will follow the unpacker and TLS, we must make sure that they do not overwrite the TLS data or the unpacker. Besides that, we have to move the lines
    //Change the unpacker section data size precisely
    //by the number of bytes in the unpacker body
    //(in case that null bytes from the end were stripped by
    //PE library)
    data.resize(sizeof(unpacker_data));
up, because the data size has to be set to exactly the number of bytes in the unpacker right after the unpacker is written, in case the file has TLS, relocations or exports. Place this piece after the lines
    //...
    //Add this section too
    pe_base::section& unpacker_added_section = image.add_section(unpacker_section);

    if(tls.get() || image.has_exports() || image.has_reloc())
    {
      //Change the unpacker section data size precisely
      //by the number of bytes in the unpacker body
      //(in case that null bytes from the end were stripped by
      //PE library)
      unpacker_added_section.get_raw_data().resize(sizeof(unpacker_data));
    }
By the way, this should have been done at the previous step, when we made the packer process relocations. Now we change the lines
    //Rebuild relocations, placing them at the end
    //of the unpacker code section
    image.rebuild_relocations(reloc_tables, unpacker_section, unpacker_section.get_raw_data().size());
to
    //Rebuild relocations and place them at the end
    //of section with unpacker code
    image.rebuild_relocations(reloc_tables, unpacker_section, unpacker_section.get_raw_data().size(), true, !image.has_exports());
for the same reason: to prevent exports from overwriting relocations.
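The overwriting problem is easier to see on a toy model. The sketch below is not the PE library's code: `strip_trailing_nulls` and `append_at_end` are hypothetical helpers that only imitate the idea of "strip null bytes from the end" and "place new data at the current end of the section". It shows that if the unpacker body ends with zero bytes and the data is not resized back to its full size, the next blob starts inside the unpacker body.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> bytes;

//Hypothetical helper: drops zero bytes from the end of the data
//(an imitation of null byte stripping, not the PE library function)
void strip_trailing_nulls(bytes& data)
{
  while(!data.empty() && data.back() == 0)
    data.pop_back();
}

//Hypothetical helper: appends a blob at the current end of the section
//and returns the offset at which the blob starts
std::size_t append_at_end(bytes& section, const bytes& blob)
{
  const std::size_t offset = section.size();
  section.insert(section.end(), blob.begin(), blob.end());
  return offset;
}

//Blob offset when the stripped data is NOT resized back:
//the blob lands inside the original unpacker body
std::size_t offset_without_resize(bytes body, const bytes& blob)
{
  strip_trailing_nulls(body);
  return append_at_end(body, blob);
}

//Blob offset when the data is first resized back to the full
//size of the unpacker body: the blob starts right after the body
std::size_t offset_with_resize(bytes body, const bytes& blob)
{
  const std::size_t full_size = body.size();
  strip_trailing_nulls(body);
  body.resize(full_size); //restores the stripped zero bytes
  return append_at_end(body, blob);
}
```

With a six-byte body that ends in four zero bytes, the first function places the blob at offset 2 (inside the body), the second at offset 6 (right after it). In the packer the same role is played by resizing the raw data to sizeof(unpacker_data) before TLS, relocations or exports are appended.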
It's time to process the exports: rebuild their directory and place it in the second added section ("coderpub"):
    if(image.has_exports())
    {
      std::cout << "Repacking exports..." << std::endl;

      pe_base::section& unpacker_section = image.get_image_sections().at(1);

      //Rebuild exports and place them to "coderpub" section
      image.rebuild_exports(exports_info, exports, unpacker_section, unpacker_section.get_raw_data().size());
    }
Once again the PE library makes our life easier. Now we just remove the line that was added earlier:
    image.remove_directory(IMAGE_DIRECTORY_ENTRY_EXPORT);
Now we turn to the unpacker. What can we change? We have rebuilt the export directory, so what else do we need? There is one issue. Unlike the entry point of an EXE file, the entry point of a DLL can be called by the loader more than once, for example when a new thread is created or when the process exits. And we have the unpacker body at the entry point address. By then the unpacker has already done its work, and if we run it a second time, it will simply crash.
Therefore the unpacker has to check whether the file has already been unpacked and, if so, jump straight to the original entry point of the unpacked file. We will reserve space in the unpacker code for a 4-byte variable filled with null bytes, and write the original entry point address there after unpacking. Before unpacking we check this variable: if it is not zero, the file has already been unpacked, and we just jump to the address stored in it. First, we create the variable itself and compare it with zero:
    //...
      __asm
      {
        mov original_image_base, 0x11111111;
        mov rva_of_first_section, 0x22222222;
        mov original_image_base_no_fixup, 0x33333333;
      }

      //Address of the variable,
      //which indicates if code was unpacked already
      DWORD* was_unpacked;

      __asm
      {
        //Trick to get address
        //of instruction following "call"
        call next2;
        add byte ptr [eax], al;
        add byte ptr [eax], al;
    next2:
        //There is an address of first instruction
        //add byte ptr [eax], al
        //in eax
        pop eax;

        //Store this address
        mov was_unpacked, eax;

        //Check what is stored there
        mov eax, [eax];

        //If there is zero, then move to
        //the unpacker
        test eax, eax;
        jz next3;

        //If not, then finish the unpacker
        //and go to original entry point
        leave;
        jmp eax;

    next3:
      }
Let me explain what this code does. To create a variable inside the code, we could use the dd or db directive (or a similar one) in MASM32. Such directives are not allowed in the MSVC++ inline assembler, but we still need to create a 4-byte variable containing zero somehow. I did it this way: the instruction "add byte ptr [eax], al" takes two bytes and is encoded as 00 00. So if we write two such instructions one after another, we get 4 null bytes in a row, which is exactly what we need.
Next, we have to get the address of the first of these instructions, taking into account that it can be placed at any virtual address, because our code is base independent. This is done with the "call next2" instruction, which jumps over our fake instructions and also pushes the return address onto the stack. The return address is the address of the instruction that follows the call, and our fake instructions follow the call, so "pop eax" gives us their address.
Then we check what is stored at this address (mov eax, [eax]). Originally there are zeros, so the jz next3 instruction jumps to the label and the unpacker body starts executing. If the unpacker has already run, the variable at the was_unpacked address (which points to our fake instructions) holds the original entry point address, so the jz next3 jump is not taken: the unpacker finishes right away and jumps to the original entry point of the file. What remains is to write the entry point address to the was_unpacked address:
    //...
      info = reinterpret_cast<const packed_file_info*>(original_image_base + rva_of_first_section);

      //Get original entry point address
      DWORD original_ep;
      original_ep = info->original_entry_point + original_image_base;

      __asm
      {
        //Write it to address stored in
        //was_unpacked variable
        mov edx, was_unpacked;
        mov eax, original_ep;
        mov [edx], eax;
      }
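The control flow of this check can be restated in plain C++, which may be easier to follow than the assembler. This is only a sketch under assumptions: the `unpacker_state` structure, the `unpack_runs` counter and the `packed_entry_point` function are hypothetical names made up for the illustration, and an ordinary variable stands in for the 4-byte slot that the real unpacker embeds in its own code.

```cpp
#include <cassert>
#include <cstdint>

//Hypothetical model of the unpacker state. In the real unpacker
//was_unpacked is the 4-byte slot inside the code (two fake
//"add byte ptr [eax], al" instructions, bytes 00 00 00 00)
struct unpacker_state
{
  unpacker_state() : was_unpacked(0), unpack_runs(0) {}

  std::uint32_t was_unpacked; //0 means "not unpacked yet"
  int unpack_runs;            //how many times the unpacking work ran
};

//Simulates one call of the packed entry point by the loader.
//Returns the address to which control would be transferred
std::uint32_t packed_entry_point(unpacker_state& state,
  std::uint32_t image_base, std::uint32_t original_ep_rva)
{
  //Already unpacked: skip the unpacker body and
  //go to the stored original entry point
  if(state.was_unpacked != 0)
    return state.was_unpacked;

  //First call: do the unpacking (represented by a counter here)
  ++state.unpack_runs;

  //Remember the original entry point for all later calls
  state.was_unpacked = image_base + original_ep_rva;

  //Zero is reserved as the "not unpacked" marker
  assert(state.was_unpacked != 0);
  return state.was_unpacked;
}
```

Calling `packed_entry_point` several times with the same state runs the unpacking part only once. Every later call returns the stored address immediately, which corresponds to the "leave; jmp eax" branch in the assembler version.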
That's all, we can build and test our packer. For testing I made a new solution with several projects: two DLL files and one EXE. The EXE file is linked to one library statically and loads the other one dynamically, and then calls several functions from these libraries. Statically loaded library and DLL file itself contain static TLS. (Dynamically loaded libraries should not have static TLS, because it will not be initialized.) I added this solution to the archive at the end of the article. I packed both DLLs and the EXE file (after that I gave both packed DLL files their original names) and made sure that everything works properly, just like the original files.
Full solution for this step: Own PE Packer Step 8