Developing PE file packer step-by-step. Step 9. Delay-loaded DLLs and Image Config

Previous step is here.

Today we will deal with the little things I put aside while developing my old packer. Our new packer can already do nearly everything, but a couple of small details remain, and it would be good to finish them off. The first one is delay-loaded imports. This mechanism loads the libraries a PE file needs only when they are actually used, which saves time when the image is loaded into memory. It is implemented entirely by the compiler and linker and has nothing to do with the loader. However, the PE header does contain the IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT directory, which points to the delayed import data. I don't know whether the linker-generated code in the built program actually uses this directory entry. The loader ignores it, but it is safer to keep it than to zero it. Let's remove this line from the packer:

    image.remove_directory(IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT);
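
To show why delay loading needs no help from the loader, here is a minimal portable sketch of the idea: every call goes through a pointer slot that initially points to a resolver stub, and the stub patches the slot on first use. All names (real_add, add_resolver, g_add_slot, call_add) are hypothetical and the "library load" is only simulated with a counter. This is not the code MSVC actually generates.

```cpp
// Hypothetical stand-in for a function exported by a delay-loaded DLL.
int real_add(int a, int b) { return a + b; }

// Counts simulated library loads (a real helper would load the DLL here).
int g_load_count = 0;

int add_resolver(int a, int b);

// The "import slot": until the first call it points to the resolver stub.
int (*g_add_slot)(int, int) = add_resolver;

// Resolver stub: "loads" the library once, patches the slot,
// then forwards the original call to the real function.
int add_resolver(int a, int b)
{
  ++g_load_count;
  g_add_slot = real_add;
  return g_add_slot(a, b);
}

// What a call site in the program does: always call through the slot.
int call_add(int a, int b) { return g_add_slot(a, b); }
```

The first call_add() goes through the stub and bumps g_load_count; every later call lands directly in real_add().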

That's all for delayed imports. The next thing to pay attention to is the image load configuration. The PE header has a directory for it, IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG. It may contain an IMAGE_LOAD_CONFIG_DIRECTORY32 structure for x86 PE files (IMAGE_LOAD_CONFIG_DIRECTORY64 for PE+), which tells the loader how the image should be loaded. It also holds a table of addresses of instructions with the LOCK prefix, which are supposed to be replaced with NOP on single-processor systems, and a list of SEH handlers (it protects against SEH hijacking: it is the list of all legitimate exception handlers in the PE file). Recent MSVC++ compilers sometimes generate this directory and put the application's SEH handler list and a pointer to its security cookie (the variable used to detect buffer/stack overruns) there. Judging by the Win 2000 kernel sources, the loader reads all of this, so destroying the IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG directory completely is not right, although I haven't seen any problems with PE files after zeroing it. Let's keep this directory by moving it to the second section of the packed file ("coderpub"). First, remove the following line from the packer code:

    image.remove_directory(IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG);
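
For orientation, here is a sketch of the 32-bit structure discussed above, written with fixed-width types so it compiles anywhere. The struct name load_config_dir32 is hypothetical. The field names and order follow the classic 72-byte layout from winnt.h as far as I remember it, so treat them as an assumption and check them against your SDK headers (newer SDKs append more fields).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the classic 72-byte IMAGE_LOAD_CONFIG_DIRECTORY32.
// Assumption: layout reproduced from memory, verify against winnt.h.
struct load_config_dir32
{
  std::uint32_t Size;
  std::uint32_t TimeDateStamp;
  std::uint16_t MajorVersion;
  std::uint16_t MinorVersion;
  std::uint32_t GlobalFlagsClear;
  std::uint32_t GlobalFlagsSet;
  std::uint32_t CriticalSectionDefaultTimeout;
  std::uint32_t DeCommitFreeBlockThreshold;
  std::uint32_t DeCommitTotalFreeThreshold;
  std::uint32_t LockPrefixTable;   // VA of the table of LOCK prefix addresses
  std::uint32_t MaximumAllocationSize;
  std::uint32_t VirtualMemoryThreshold;
  std::uint32_t ProcessHeapFlags;
  std::uint32_t ProcessAffinityMask;
  std::uint16_t CSDVersion;
  std::uint16_t Reserved1;
  std::uint32_t EditList;
  std::uint32_t SecurityCookie;    // VA of the security cookie variable
  std::uint32_t SEHandlerTable;    // VA of the SE handler table
  std::uint32_t SEHandlerCount;    // number of entries in that table
};

// Compile-time checks of the assumed layout.
static_assert(sizeof(load_config_dir32) == 72, "unexpected structure size");
static_assert(offsetof(load_config_dir32, LockPrefixTable) == 32, "unexpected offset");
static_assert(offsetof(load_config_dir32, SecurityCookie) == 60, "unexpected offset");
static_assert(offsetof(load_config_dir32, SEHandlerTable) == 64, "unexpected offset");
```

The three fields the packer cares about are LockPrefixTable, SecurityCookie and SEHandlerTable/SEHandlerCount. All of them hold virtual addresses, which is why the directory cannot simply be copied without rebuilding.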

My PE library will help us with the rest. We will simply move the SE handler list to our second added section. The LOCK prefix address list, however, has to be processed manually in the unpacker (we won't move it, because the loader cannot fix those addresses at load time: the file is not unpacked yet). In the packer, after the lines

      exports = image.get_exported_functions(exports_info);
    }

add

    //If file has Image Load Config, get information about it
    std::auto_ptr<pe_base::image_config_info> load_config;
    if(image.has_config())
    {
      std::cout << "Reading Image Load Config..." << std::endl;
      load_config.reset(new pe_base::image_config_info(image.get_image_config()));
    }

These lines read the image load configuration, if it exists. The code is similar to the TLS code. Next, change the line

      if(tls.get() || image.has_exports() || image.has_reloc())

to

      if(tls.get() || image.has_exports() || image.has_reloc() || load_config.get())

because we will place the load configuration directory in the "coderpub" section. Similarly, change the line

        if(!image.has_reloc() && !image.has_exports())

to

        if(!image.has_reloc() && !image.has_exports() && !load_config.get())

and

      image.rebuild_relocations(reloc_tables, unpacker_section, unpacker_section.get_raw_data().size(), true, !image.has_exports());

to

      image.rebuild_relocations(reloc_tables, unpacker_section, unpacker_section.get_raw_data().size(), true, !image.has_exports() && !load_config.get());

and, finally,

      image.rebuild_exports(exports_info, exports, unpacker_section, unpacker_section.get_raw_data().size());

to

      image.rebuild_exports(exports_info, exports, unpacker_section, unpacker_section.get_raw_data().size(), true, !load_config.get());
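
The pattern behind these edits can be sketched as follows. The assumption here, not confirmed by the text, is that the last boolean argument of each rebuild call means "this is the last directory written to the section", and that the directories are written in the order relocations, exports, load configuration. The struct and function names are hypothetical.

```cpp
// Hypothetical summary of what the packer found in the original file.
struct directory_presence
{
  bool has_reloc;
  bool has_exports;
  bool has_load_config;
};

// Assumed write order into "coderpub": relocations, exports, load configuration.
// Each function returns the value to pass as the last boolean argument
// of the corresponding rebuild call: true only if nothing is written after it.
bool reloc_is_last(const directory_presence& p)
{
  return !p.has_exports && !p.has_load_config;
}

bool exports_is_last(const directory_presence& p)
{
  return !p.has_load_config;
}

bool load_config_is_last(const directory_presence&)
{
  return true; // nothing follows the load configuration directory
}
```

Whatever combination of directories the file has, exactly one of the rebuild calls that actually run receives true.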

Then we add a couple of new fields to the packed_file_info structure (structs.h file):

  DWORD original_load_config_directory_rva; //Original load configuration directory relative address
  DWORD lock_opcode; //LOCK assembler command fake opcode

The unpacker will need these fields. For now we fill them in the packer by writing after

    //Store relative address and size of
    //original file relocation directory 
    basic_info.original_relocation_directory_rva = image.get_directory_rva(IMAGE_DIRECTORY_ENTRY_BASERELOC);
    basic_info.original_relocation_directory_size = image.get_directory_size(IMAGE_DIRECTORY_ENTRY_BASERELOC);

the following lines:

    //Store relative address of
    //original file load configuration directory 
    basic_info.original_load_config_directory_rva = image.get_directory_rva(IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG);

and after

    //Get and store original section number
    basic_info.number_of_sections = sections.size();

the following:

    //LOCK assembler instruction opcode
    basic_info.lock_opcode = 0xf0;

Let me explain why this is needed. We build the LOCK prefix table so that it has a single entry, which points to the lock_opcode field of the basic_info structure. On a single-processor system the loader should replace the LOCK opcode (0xf0) stored in this field with the NOP opcode (0x90), so the unpacker can check this field to find out whether it has to process the original LOCK prefix table. I'm not sure this functionality exists in the loaders of XP and later systems (it looks like none of them care about these tables), but let it be, you never know when it may come in handy. I have never seen a file with a LOCK prefix table in the wild, so I may well be doing useless work. Well, I did see such tables in the Win 2000 sources, but more on that later :D
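
The trick can be sketched in portable C++ like this. The loader is only simulated, and packed_info_sketch, simulate_loader and loader_patched_canary are hypothetical names. The real field is a DWORD, but a single byte is enough to show the idea.

```cpp
#include <cstdint>

const std::uint8_t lock_prefix = 0xF0; // LOCK prefix opcode
const std::uint8_t nop_opcode  = 0x90; // NOP opcode

// Hypothetical, cut-down stand-in for packed_file_info.
struct packed_info_sketch
{
  std::uint8_t lock_opcode; // fake LOCK prefix written by the packer
};

// Simulated loader pass over our one-entry LOCK prefix table:
// on a single-processor system the LOCK prefix is replaced with NOP.
void simulate_loader(std::uint8_t* single_table_entry, bool single_processor)
{
  if(single_processor && *single_table_entry == lock_prefix)
    *single_table_entry = nop_opcode;
}

// Unpacker-side check: did the loader touch our fake opcode?
bool loader_patched_canary(const packed_info_sketch& info)
{
  return info.lock_opcode == nop_opcode;
}
```

If loader_patched_canary() returns true, the unpacker knows that the loader honors LOCK prefix tables on this system and that the original table has to be processed by hand.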

OK, that's all for the small edits. Now we turn to rebuilding the configuration directory. Right after the code responsible for rebuilding exports, add the following:

    if(load_config.get())
    {
      std::cout << "Repacking load configuration..." << std::endl;

      pe_base::section& unpacker_section = image.get_image_sections().at(1);

      //Clear LOCK prefix address
      load_config->clear_lock_prefix_list();
      //Add the address of our fake LOCK prefix
      load_config->add_lock_prefix_rva(pe_base::rva_from_section_offset(image.get_image_sections().at(0), offsetof(packed_file_info, lock_opcode)));
      
      //Rebuild loading configuration directory and place it to "coderpub" section
      //Rebuild SE Handler and LOCK prefix tables automatically 
      image.rebuild_image_config(*load_config, unpacker_section, unpacker_section.get_raw_data().size(), true, true);
    }
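
The address passed to add_lock_prefix_rva() above is just the RVA of the first section plus the offset of the field inside the structure, assuming (as that code implies) that packed_file_info sits at the very start of the first section. A minimal sketch of the arithmetic, with hypothetical cut-down types:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, cut-down packed_file_info (the real one has more fields).
struct packed_file_info_sketch
{
  std::uint32_t number_of_sections;
  std::uint32_t original_load_config_directory_rva;
  std::uint32_t lock_opcode;
};

// Hypothetical section descriptor: only the RVA the section is mapped at.
struct section_sketch
{
  std::uint32_t virtual_address;
};

// RVA of a byte inside a section = section RVA + offset within the section.
std::uint32_t rva_in_section(const section_sketch& s, std::size_t offset)
{
  return s.virtual_address + static_cast<std::uint32_t>(offset);
}

// RVA of the fake LOCK opcode, assuming the structure
// starts at offset 0 of the first section.
std::uint32_t lock_opcode_rva(const section_sketch& first_section)
{
  return rva_in_section(first_section,
                        offsetof(packed_file_info_sketch, lock_opcode));
}
```

With the first section mapped at RVA 0x1000 and the field at offset 8 in this cut-down structure, the resulting RVA is 0x1008.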

We rebuild the load configuration directory and place it at the end of the second section added to the packed file. The options passed to the rebuild call request that the SE handler and LOCK prefix tables be rebuilt as well. The original LOCK prefix table will be processed in the unpacker. That is all for the packer. Now we turn to the unpacker project. It looks like the offsets specified in the parameters.h file are broken again, and we cannot even be sure they were correct at the previous step (MSVC++ builds the project as it sees fit when optimizing for size, so small changes can make the compiler emit different instructions). That's why I decided to fix them once and for all by doing this:

  //Create prologue manually
  __asm
  {
    jmp next;
    ret 0xC;
next:
    push ebp;
    mov ebp, esp;
    sub esp, 4096;
    
    mov eax, 0x11111111;
    mov ecx, 0x22222222;
    mov edx, 0x33333333;
  }

  //Image loading address
  unsigned int original_image_base;
  //Relative address of the first section,
  //where the packer places the information for the unpacker
  //and packed data themselves
  unsigned int rva_of_first_section;
  //Image loading address (original one, the relocations are not applied to it)
  unsigned int original_image_base_no_fixup;

  //These instructions are necessary only to
  //replace addresses with real ones in the unpacker builder
  __asm
  {
    mov original_image_base, eax;
    mov rva_of_first_section, ecx;
    mov original_image_base_no_fixup, edx;
  }

Now the offsets of the [mov eax, 0x11111111] etc. instructions will always be the same, because the [mov eax/ecx/edx, number] instructions always have the same encoding. Let's edit the offset values in the parameters.h file to fit the new code:

static const unsigned int original_image_base_offset = 0x0F;
static const unsigned int rva_of_first_section_offset = 0x14;
static const unsigned int original_image_base_no_fixup_offset = 0x19;
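
These three values can be double-checked by writing out the machine code of the prologue by hand and searching for the marker constants. The byte encodings below are my assumption about what the MSVC inline assembler emits (a short jmp, and the 8B EC form of mov ebp, esp). An alternative two-byte encoding of mov ebp, esp would not change the offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed machine code of the hand-written prologue.
std::vector<std::uint8_t> prologue_bytes()
{
  static const std::uint8_t code[] = {
    0xEB, 0x03,                         // jmp short next
    0xC2, 0x0C, 0x00,                   // ret 0xC
    0x55,                               // push ebp
    0x8B, 0xEC,                         // mov ebp, esp
    0x81, 0xEC, 0x00, 0x10, 0x00, 0x00, // sub esp, 4096
    0xB8, 0x11, 0x11, 0x11, 0x11,       // mov eax, 0x11111111
    0xB9, 0x22, 0x22, 0x22, 0x22,       // mov ecx, 0x22222222
    0xBA, 0x33, 0x33, 0x33, 0x33        // mov edx, 0x33333333
  };
  return std::vector<std::uint8_t>(code, code + sizeof(code));
}

// Offset of the first run of four bytes equal to marker,
// or code.size() if there is no such run.
std::size_t find_marker(const std::vector<std::uint8_t>& code, std::uint8_t marker)
{
  for(std::size_t i = 0; i + 4 <= code.size(); ++i)
  {
    if(code[i] == marker && code[i + 1] == marker
      && code[i + 2] == marker && code[i + 3] == marker)
      return i;
  }
  return code.size();
}
```

Searching for the markers 0x11, 0x22 and 0x33 gives offsets 0x0F, 0x14 and 0x19, which match the constants in parameters.h.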

Next, we write the following code before the TLS processing code:

  //If file has load configuration directory
  if(info_copy.original_load_config_directory_rva)
  {
    //Get pointer to original load configuration directory
    const IMAGE_LOAD_CONFIG_DIRECTORY32* cfg = reinterpret_cast<const IMAGE_LOAD_CONFIG_DIRECTORY32*>(info_copy.original_load_config_directory_rva + original_image_base);
    
    //If the directory has LOCK prefixes table
    //and the loader overwrites our fake LOCK opcode
    //to NOP (0x90) (i.e. the system has a single processor)
    if(cfg->LockPrefixTable && info_copy.lock_opcode == 0x90 /* NOP opcode */)
    {
      //Get pointer to the first element of the table
      //of absolute LOCK prefix addresses
      const DWORD* table_ptr = reinterpret_cast<const DWORD*>(cfg->LockPrefixTable);
      //Enumerate them
      while(true)
      {
        //Pointer to LOCK prefix
        BYTE* lock_prefix_va = reinterpret_cast<BYTE*>(*table_ptr);
        
        if(!lock_prefix_va)
          break;
          
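        //Change it to NOP
        *lock_prefix_va = 0x90;
        //Move to the next table entry; without this
        //the loop would never reach the terminating zero
        ++table_ptr;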
      }
    }
  }
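
The table walk itself can be tried outside the unpacker. Below is a portable sketch that uses real pointers instead of 32-bit virtual addresses. patch_lock_prefixes is a hypothetical helper, and the table is assumed to be terminated by a null entry, as in the code above.

```cpp
#include <cstddef>
#include <cstdint>

// Walks a null-terminated table of pointers to LOCK prefix bytes,
// replaces each byte with NOP (0x90) and returns the number of patched bytes.
std::size_t patch_lock_prefixes(std::uint8_t* const* table)
{
  std::size_t patched = 0;
  for(; *table != nullptr; ++table)
  {
    **table = 0x90; // overwrite the LOCK prefix with NOP
    ++patched;
  }
  return patched;
}
```

Only the bytes listed in the table are touched. The instruction bytes that follow each prefix stay as they are.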

So we are done with functionality that seems to be in little demand, because modern single-core processors ignore the LOCK prefix, and the loader ignores the LOCK prefix table. :)
Curiously, by the way, EXE files from Win 2000 pack fine and still work on it.

P.S. It seems that the Win 2000 loader doesn't care about LOCK prefixes either. The only thing it does at load time is check, on multiprocessor systems, that no NOP (0x90) opcodes are written at the LOCK prefix addresses. Back then Windows shipped with two kernels, single-processor and multiprocessor, and one of them was selected during installation. It looks like nobody has implemented the documented Load Configuration directory functionality since then, yet the fields and their descriptions remain. By the way, in Win 2000 the directory structure itself is completely different and some fields are missing, so my PE library can't read it. Still, I decided to leave the functionality in the packer. Now the packer conforms to the public Microsoft documentation, even though their loader does not. :) And at the very least, it is useful to rebuild the load configuration directory while preserving the SE handler addresses.

Full solution for this step: Own PE Packer Step 9
