SLAE Assignment 5: Shellcode Analysis Part 3

Assignment 5 of the SLAE exam is a little different to the previous 4. The assignment asks for a number of shellcode samples to be analysed.

  • Take up at least 3 shellcode samples created using Msfvenom for linux/x86
  • Use GDB/Ndisasm/Libemu to dissect the functionality of the shellcode
  • Present your analysis

Shellcode 3: read_file

On to the final part of Assignment 5. In the previous parts we’ve taken a look at the adduser and chmod Msfvenom payloads; for this part we will be looking at the read_file payload. First, let’s look at the payload options.

read_file Payload Options

The following command lists the options for the linux/x86/read_file payload:

msfvenom -a x86 --platform linux -p linux/x86/read_file --payload-options

The following snippet from the output lists the basic options which will be used when generating the payload:

Basic options:
 Name  Current Setting Required Description
 ----  --------------- -------- -----------
 FD    1               yes      The file descriptor to write output to
 PATH                  yes      The file path to read

Description:
 Read up to 4096 bytes from the local file system and write it back
 out to the specified file descriptor

The options will use the following settings:

  • FD: 1
  • PATH: /tmp/readfile.txt

The final command to generate the payload will be:

msfvenom -a x86 --platform linux -p linux/x86/read_file -f raw -o read_file_raw PATH=/tmp/readfile.txt

File Setup

The shellcode will read the file /tmp/readfile.txt, as set in the payload options. Before running the shellcode, the file was created with a small amount of text.

Disassembling the shellcode using ndisasm

‘ndisasm’ allows for the ‘raw’ NASM instructions to be viewed using the following command:

ndisasm ./read_file_raw -u

The instructions are:

user01@ubuntu-x86-01:~/shared/assembly/slae-exam/5/3$ ndisasm ./read_file -u
00000000 EB36         jmp short 0x38
00000002 B805000000   mov eax,0x5
00000007 5B           pop ebx
00000008 31C9         xor ecx,ecx
0000000A CD80         int 0x80
0000000C 89C3         mov ebx,eax
0000000E B803000000   mov eax,0x3
00000013 89E7         mov edi,esp
00000015 89F9         mov ecx,edi
00000017 BA00100000   mov edx,0x1000
0000001C CD80         int 0x80
0000001E 89C2         mov edx,eax
00000020 B804000000   mov eax,0x4
00000025 BB01000000   mov ebx,0x1
0000002A CD80         int 0x80
0000002C B801000000   mov eax,0x1
00000031 BB00000000   mov ebx,0x0
00000036 CD80         int 0x80
00000038 E8C5FFFFFF   call dword 0x2
0000003D 2F           das
0000003E 746D         jz 0xad
00000040 702F         jo 0x71
00000042 7265         jc 0xa9
00000044 61           popad
00000045 6466696C652E7478 imul bp,[fs:ebp+0x2e],word 0x7874
0000004D 7400 jz 0x4f

Initial Analysis of the shellcode

jmp-call-pop

A ‘jmp short’ as the first instruction of a shellcode usually points to the jmp-call-pop technique, and as can be seen in the instructions below the shellcode includes exactly that:

jmp:     00000000 EB36         jmp short 0x38
call:    00000038 E8C5FFFFFF   call dword 0x2
         00000002 B805000000   mov eax,0x5
pop:     00000007 5B           pop ebx

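The ‘call’ pushes the address of the byte that follows it onto the stack as its return address, and the ‘pop ebx’ then retrieves that address. We can assume from this that the data stored at the memory address after the ‘call’ instruction, ‘0x0000003d’, must be a variable which will be used within the shellcode. As the shellcode will read a file, it will probably be the file path string.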
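That assumption can be checked without a debugger by reading the bytes after the call back as a C string. The following is only a sketch: the array name sc and the helpers embedded_path and jmp_short_target are names I made up for illustration, and the bytes are the payload generated above for PATH=/tmp/readfile.txt.

```c
#include <stddef.h>
#include <stdint.h>

/* The read_file payload bytes generated by msfvenom for PATH=/tmp/readfile.txt */
static const unsigned char sc[] =
    "\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
    "\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80"
    "\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
    "\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
    "\xff\x2f\x74\x6d\x70\x2f\x72\x65\x61\x64\x66\x69\x6c\x65\x2e"
    "\x74\x78\x74\x00";

/* The call sits at offset 0x38 and is 5 bytes long, so its return
 * address (the value "pop ebx" receives) is offset 0x38 + 5 = 0x3d. */
#define PATH_OFFSET 0x3d

/* Hypothetical helper: the NUL-terminated data that follows the call. */
const char *embedded_path(const unsigned char *payload)
{
    return (const char *)(payload + PATH_OFFSET);
}

/* Hypothetical helper: target offset of a 2-byte "jmp short" (EB rel8)
 * located at offset `at`. The displacement is signed and is measured
 * from the end of the jmp instruction. */
size_t jmp_short_target(const unsigned char *payload, size_t at)
{
    return (size_t)((ptrdiff_t)at + 2 + (int8_t)payload[at + 1]);
}
```

Printing embedded_path(sc) with printf("%s\n", ...) shows the file path, and jmp_short_target(sc, 0) gives the offset of the call instruction that the first jmp lands on.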

System Calls

The disassembled shellcode makes a number of system calls, each triggered by an int 0x80 instruction. Looking at the EAX register values prior to each int 0x80 instruction the system calls can be identified as:

  • System Call 1: ID 5 –  open()
  • System Call 2: ID 3 – read()
  • System Call 3: ID 4 – write()
  • System Call 4: ID 1 – exit()
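For reference, the mapping from EAX value to system call can be written down as a small lookup. This is just a sketch with a function name of my own (i386_syscall_name); the numbers are the 32-bit int 0x80 ones and are hard-coded on purpose, because the constants in <sys/syscall.h> on an x86-64 host are the 64-bit numbers, which differ.

```c
#include <stddef.h>

/* Hypothetical helper: names the Linux i386 (int 0x80) system calls that
 * appear in the read_file payload. Returns NULL for any other number. */
const char *i386_syscall_name(unsigned int nr)
{
    switch (nr) {
    case 1:  return "exit";
    case 3:  return "read";
    case 4:  return "write";
    case 5:  return "open";
    default: return NULL;   /* not used by this payload */
    }
}
```

Feeding it the four EAX values in the order they appear in the disassembly (5, 3, 4, 1) gives open, read, write, exit.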

Analysis with gdb

Generate the ‘C’ shellcode instructions using msfvenom:

‘msfvenom’ can generate shellcode in a format which can be executed from within a C program.

msfvenom -a x86 --platform linux -p linux/x86/read_file -f c -o read_file.c PATH=/tmp/readfile.txt

The command above generates the following code:

 unsigned char buf[] =
 "\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
 "\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80"
 "\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
 "\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
 "\xff\x2f\x74\x6d\x70\x2f\x72\x65\x61\x64\x66\x69\x6c\x65\x2e"
 "\x74\x78\x74\x00";

The code above is then dropped into the ‘C’ shellcode template used within the SLAE course, as below:

#include <stdio.h>
#include <string.h>
unsigned char code[] = \
 "\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
 "\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80"
 "\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
 "\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
 "\xff\x2f\x74\x6d\x70\x2f\x72\x65\x61\x64\x66\x69\x6c\x65\x2e"
 "\x74\x78\x74\x00";

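int main()
{
    /* strlen() stops at the first null byte, so the printed length can be
       shorter than the payload when the shellcode contains zeros */
    printf("Shellcode Length: %zu\n", strlen((const char *)code));
    /* Cast the byte array to a function pointer and jump to it */
    int (*ret)() = (int(*)())code;
    ret();
}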

The ‘C’ program should be compiled with the following command. I added the ‘-ggdb’ option to include the debugging symbols.

gcc -ggdb -fno-stack-protector -z execstack shellcode.c -o shellcode

Running the shellcode in gdb

The system calls required to read the file don’t require root privileges and therefore the ‘sudo’ command is not used when starting ‘gdb’.

gdb ./shellcode

Once gdb is running we need to get the memory location of the shellcode so we can set a breakpoint and see what is happening. The way we do this is to disassemble the ‘code’ variable, which lists the instructions so we can get the memory address of the first shellcode instruction.

screenshot1

A breakpoint is set at the memory address ‘0x0804a040’, the program is run and execution stops at the breakpoint, as shown in the screenshot below. The output given here is a result of the configuration which I have added to the .gdbinit file; see my post here for details.

screenshot2

Shellcode Analysis

Continuing on from the previous section we can now follow the program execution of the shellcode and verify what it does.

System Call 1: open()

The 1st system call opens the file which was set in the msfvenom options when the payload was generated. The function prototype for the open() system call is:

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

Therefore the EBX, ECX and possibly the EDX registers must be set in the shellcode with the respective parameters in the prototypes above.

  • EBX: Set at memory address ‘0x0804a047’
  • ECX: Set at memory address ‘0x0804a048’
  • EDX: Not used

The first instruction of the shellcode is part of a jmp, call, pop sequence which is used to get the memory address of the file name and path that will be read by the shellcode. The instructions are as follows:

Dump of assembler code for function code:
 0x0804a040 <+0>:   jmp 0x804a078 <code+56> ; Initiate JMP-call-pop
 0x0804a042 <+2>:   mov eax,0x5             ; Set the system call ID for open()
 0x0804a047 <+7>:   pop ebx                 ; Set open() arg#1 - *pathname
 0x0804a048 <+8>:   xor ecx,ecx             ; Set open() arg#2 - flags
                                            ; Zero the ECX Register
 0x0804a04a <+10>:  int 0x80                ; Execute the system call

System Call 2: read()

The 2nd system call, read(), reads the contents of the open file into a buffer which is set in the system call. The read() system call has the following function prototype:

ssize_t read(int fd, void *buf, size_t count);

Therefore the EBX, ECX and the EDX registers must be set in the shellcode with the respective parameters in the prototype above.

  • EBX: Set at memory address ‘0x804a04c’
  • ECX: Set at memory address ‘0x804a055’
  • EDX: Set at memory address ‘0x804a057’

The 2nd system call instructions are as below:

0x804a04c <+12>: mov ebx,eax    ; Set read() arg#1 the file descriptor returned 
                                ; by the open() system call
0x804a04e <+14>: mov eax,0x3    ; Set the system call ID for read() 
0x804a053 <+19>: mov edi,esp    ; Store the current stack pointer address in EDI
0x804a055 <+21>: mov ecx,edi    ; Set the read() arg#2 *buf address of the 
                                ; current stack pointer to read into
0x804a057 <+23>: mov edx,0x1000 ; Set read() arg#3 the number of characters 
                                ; to read into the buffer
0x804a05c <+28>: int 0x80       ; Execute the system call

System Call 3: write()

The 3rd system call, write(), writes the contents of the buffer filled by the previous system call to the stdout file descriptor. The write() system call has the following prototype, which matches that of the read() system call.

ssize_t write(int fd, const void *buf, size_t count);

Therefore the EBX, ECX and the EDX registers must be set in the shellcode with the respective parameters in the prototype above.

  • EBX: Set at memory address ‘0x804a065’
  • ECX: Not set again – remains the same as for the read() system call
  • EDX: Set at memory address ‘0x804a05e’

The instructions for the write() system call are:

0x804a05e <+30>: mov edx,eax   ; Set the number of Bytes to write
0x804a060 <+32>: mov eax,0x4   ; Set the system call ID for write()
0x804a065 <+37>: mov ebx,0x1   ; Set the write() arg#1 File descriptor to write 
                               ; to ‘0x1’
0x804a06a <+42>: int 0x80      ; Execute the system call

System Call 4: exit()

The program exits gracefully using the exit() system call.
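Putting the four system calls together, the payload behaves roughly like the C below. This is my own approximation rather than anything taken from Metasploit: the function name read_file_to_fd is made up, it uses the libc wrappers instead of int 0x80, and unlike the shellcode it checks return values, closes the file and leaves the exit() to the caller.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical C equivalent of the payload's open/read/write sequence.
 * Returns the number of bytes written, or -1 on error. */
ssize_t read_file_to_fd(const char *path, int out_fd)
{
    char buf[0x1000];                  /* same 4096-byte limit as edx = 0x1000 */
    ssize_t n;
    int fd = open(path, O_RDONLY);     /* eax = 5, ebx = path, ecx = 0 (O_RDONLY) */

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof buf);     /* eax = 3, ebx = fd, ecx = buf, edx = 0x1000 */
    close(fd);                         /* the shellcode never closes the file */
    if (n < 0)
        return -1;
    return write(out_fd, buf, (size_t)n);   /* eax = 4, ebx = out_fd, edx = bytes read */
}
```

Calling read_file_to_fd("/tmp/readfile.txt", 1) followed by _exit(0) mirrors the payload generated with FD set to 1.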

read_file Shellcode Thoughts

There are a couple of things within the read_file payload shellcode that caught my attention.

Redundant instructions in the shellcode

There are two instructions within the read() system call which are used to set the buffer address that the open file will be read into.

0x804a053 <+19>: mov edi,esp    ; Store the current stack pointer address in EDI
0x804a055 <+21>: mov ecx,edi    ; Set the *buf address of the stack to read into

Let’s look at what they actually do:

  • mov edi, esp: moves the address in ESP into the EDI register.
  • mov ecx, edi: moves the value in EDI, previously copied from ESP, into the ECX register

It seems to me that these two instructions could be replaced with the following instruction:

mov ecx, esp   ; moves the memory address in the ESP register straight into the ECX register.

Note: I thought it would be fun to make the changes in the shellcode by changing the opcodes instead of the assembly instructions as it has a couple of interesting parts and would be a different way of looking at what the shellcode is doing. 

So now to test my theory. I created an assembly program with a single instruction as above and assembled it to find out what the machine code instructions are:

8048080: 89 e1     mov ecx,esp

Next, I needed to find the machine code instructions for the original shellcode, which was done using ndisasm on the original payload generated by msfvenom:

00000013: 89 E7     mov edi,esp
00000015: 89 F9     mov ecx,edi
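All three encodings start with 0x89, the opcode for a register-to-register "mov r/m32, r32", and differ only in the second (ModRM) byte, which packs the source and destination register numbers. A small sketch of that encoding follows; the enum and the function name encode_mov_r32_r32 are my own, and it only covers the register-direct form used here.

```c
#include <stdint.h>

/* 32-bit register numbers as they appear in the ModRM byte. */
enum reg32 { R_EAX = 0, R_ECX = 1, R_EDX = 2, R_EBX = 3,
             R_ESP = 4, R_EBP = 5, R_ESI = 6, R_EDI = 7 };

/* Hypothetical helper: encodes "mov dst, src" using opcode 0x89.
 * ModRM layout: bits 7-6 = 11 (register direct),
 *               bits 5-3 = source register,
 *               bits 2-0 = destination register. */
void encode_mov_r32_r32(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    out[0] = 0x89;
    out[1] = (uint8_t)(0xC0 | (src << 3) | dst);
}
```

With this layout, mov edi,esp gives 89 E7, mov ecx,edi gives 89 F9 and mov ecx,esp gives 89 E1, matching the assembler output.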

Let’s edit the shellcode instructions in shellcode.c. We know that the instructions are before the 2nd int 0x80 instruction, which has the machine code cd 80. By stepping back from it we can see the target instructions, 89 e7 89 f9, on the second line of the string.

unsigned char code[] = \
"\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
"\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80"
"\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
"\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
"\xff\x2f\x74\x6d\x70\x2f\x72\x65\x61\x64\x66\x69\x6c\x65\x2e"
"\x74\x78\x74\x00";

Looking at the opcodes for the different instructions they all have 0x89 as their first Byte. So I can just change the second Byte from \xE7 to \xE1 and delete \x89\xF9. The edited bytes, \x89\xe1, are on the second line below.

"\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
"\x03\x00\x00\x00\x89\xe1\xba\x00\x10\x00\x00\xcd\x80"
"\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
"\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
"\xff\x2f\x74\x6d\x70\x2f\x72\x65\x61\x64\x66\x69\x6c\x65\x2e"
"\x74\x78\x74\x00";

Once compiled I can check the instructions are correct using objdump. The snippet below shows the instructions before the int 0x80 and, as can be seen, the old instructions have been replaced with the new opcodes:

 804a04e: b8 03 00 00 00       mov eax,0x3
 804a053: 89 e1                mov ecx,esp
 804a055: ba 00 10 00 00       mov edx,0x1000
 804a05a: cd 80                int 0x80

As we’ve reduced the number of opcodes within the shellcode I will also need to adjust the jmp and call instructions to account for the difference in length of the shellcode.

The first jmp instruction is straightforward: the destination needs to be reduced by two Bytes. So I simply change the eb 36 instruction to eb 34. Objdump shows the difference once compiled, from this:

804a040: eb 36                 jmp    804a078 <code+0x38>

to

804a040: eb 34                 jmp    804a076 <code+0x36>

The call instruction is a little more complicated. The actual target can be calculated as follows:

  • E8 is the call instruction which requires a relative offset as its parameter.
  • The relative offset is positive for a call that jumps to a higher memory address or negative for a call that jumps to a lower memory address.
  • The offset is measured from the address of the next instruction and is stored in little endian byte order. The E8 instruction is always 5 Bytes long, but in this instance we don’t really care about that. We just want the call instruction to jump to <code+0x2>.

The current call instruction jumps back to the first instruction at <code>, i.e. with no offset from the start of the shellcode, which then jumps to the call instruction again and ends up in a loop.

804a076: e8 c5 ff ff ff call 804a040 <code>

The current relative offset is 0xFFFFFFC5. As the code is now two Bytes shorter, the call doesn’t need to jump as far back as before, so the distance of the backwards jump must be reduced by 2. The offset is stored in two’s complement format, in which -1 is the hexadecimal number 0xFFFFFFFF, so 0xFFFFFFC5 is -59. Reducing the distance by 2 gives -57, which means the stored value must be increased by 2, giving 0xFFFFFFC7. If this doesn’t make sense see the Wikipedia article about two’s complement. We now have the new opcodes for the instruction as:

E8 c7 ff ff ff

shellcode.c is compiled and objdump used to view the instructions, which shows the call now targets <code+0x2>:

804a076: e8 c7 ff ff ff call 804a042 <code+0x2>
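The offset arithmetic can be double-checked with a few lines of C. This is only a sketch with function names of my own (rel_disp and encode_call_rel32); it relies on the compiler converting unsigned to signed values with two’s complement wrap-around, which gcc does.

```c
#include <stdint.h>

/* Hypothetical helper: displacement of a relative jmp or call, i.e. the
 * target address minus the address of the NEXT instruction. */
int32_t rel_disp(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    return (int32_t)(target - (insn_addr + insn_len));
}

/* Hypothetical helper: emits E8 followed by the 32-bit displacement in
 * little endian byte order (least significant byte first). */
void encode_call_rel32(uint32_t insn_addr, uint32_t target, uint8_t out[5])
{
    uint32_t d = (uint32_t)rel_disp(insn_addr, 5, target);

    out[0] = 0xE8;
    out[1] = (uint8_t)(d & 0xFF);
    out[2] = (uint8_t)((d >> 8) & 0xFF);
    out[3] = (uint8_t)((d >> 16) & 0xFF);
    out[4] = (uint8_t)((d >> 24) & 0xFF);
}
```

For the edited shellcode, rel_disp(0x804a076, 5, 0x804a042) is -57 and encode_call_rel32 produces e8 c7 ff ff ff, while the original call at offset 0x38 targeting offset 0x2 works out as -59, or c5 ff ff ff. The same function covers the 2-Byte jmp short: rel_disp(0x804a040, 2, 0x804a076) is 0x34.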

Finally, let’s run the shellcode in gdb and see if the read system call executes successfully. In the first screenshot the shellcode has stopped before the int 0x80 instruction for the read() system call.

screenshot4

The read() system call executes correctly and returns the number of Bytes which were read.

screenshot3

The read_file payload shellcode runs correctly with the single instruction, so I wonder why the authors chose to include a redundant instruction within the payload. The first instruction looks as though it is saving the address in the EDI register for use later on. Could it be that this was put there so that it could be used to restore the buffer address for the write() system call? As the ECX register isn’t overwritten or changed between the two calls there is no need to set the memory address of the buffer again, so perhaps the authors forgot to go back to the read() instructions and remove the one which saves the memory address to EDI.

Bad characters

The read_file payload also includes a number of zeros (null bytes) in the shellcode opcodes, as can be seen below:

00000000 EB36              jmp short 0x38
00000002 B805000000        mov eax,0x5
00000007 5B                pop ebx
00000008 31C9              xor ecx,ecx
0000000A CD80              int 0x80
0000000C 89C3              mov ebx,eax
0000000E B803000000        mov eax,0x3
00000013 89E7              mov edi,esp
00000015 89F9              mov ecx,edi
00000017 BA00100000        mov edx,0x1000
0000001C CD80              int 0x80
0000001E 89C2              mov edx,eax
00000020 B804000000        mov eax,0x4
00000025 BB01000000        mov ebx,0x1
0000002A CD80              int 0x80
0000002C B801000000        mov eax,0x1
00000031 BB00000000        mov ebx,0x0
00000036 CD80              int 0x80
00000038 E8C5FFFFFF        call dword 0x2
0000003D 2F                das
0000003E 746D              jz 0xad
00000040 702F              jo 0x71
00000042 7265              jc 0xa9
00000044 61                popad
00000045 6466696C652E7478  imul bp,[fs:ebp+0x2e],word 0x7874
0000004D 7400              jz 0x4f

The payload author hasn’t used mov instructions that remove the zeros, such as using al instead of EAX. As a result the size of the shellcode is significantly increased.
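To put a number on the zeros, a byte counter can be run over the payload. The sketch below uses two helper names of my own (count_byte and first_index_of) and works on any buffer, so it can also be reused to look for other bad characters.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many times `needle` occurs in buf[0..len). */
size_t count_byte(const unsigned char *buf, size_t len, unsigned char needle)
{
    size_t i, hits = 0;

    for (i = 0; i < len; i++)
        if (buf[i] == needle)
            hits++;
    return hits;
}

/* Hypothetical helper: index of the first `needle` in buf[0..len),
 * or len if it does not occur. */
size_t first_index_of(const unsigned char *buf, size_t len, unsigned char needle)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == needle)
            return i;
    return len;
}
```

Run over the 79 payload Bytes (sizeof(code) - 1 in shellcode.c, as sizeof also counts the terminator the compiler appends) with a needle of 0x00, this reports 23 null Bytes with the first at index 4. That first null is also why the strlen() call in the template reports a length of 4 rather than 79.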

Read Buffer location

The final aspect of the shellcode I noticed was again in the read() system call, which is actually the same instructions as the previous section.

0x804a053 <+19>: mov edi,esp    ; Store the current stack pointer address in EDI
0x804a055 <+21>: mov ecx,edi    ; Set the read() arg#2 *buf address of the 
                                ; current stack pointer to read into
0x804a057 <+23>: mov edx,0x1000 ; Set read() arg#3 the number of characters

The shellcode uses the current address of the stack pointer in ESP as the buffer into which the data read from the file is written.

My question is how does the read() system call write characters into the buffer?

  • Incrementing the buffer memory address i.e. low to high?
  • Decrementing the buffer memory address i.e. high to low.
  • Is the read() system call aware that it is writing to the stack pointer address, and does it change the direction it writes in so it doesn’t overwrite other data on the stack?
  • Is the write() system call aware it will be reading from the stack, and does it reverse the direction it reads in to take account of that?

The read() system call reads up to the number of characters set in its 3rd parameter ‘count’, which is 0x1000 or 4096 in decimal. So the first step is to create a file with that many characters in it.

Let’s run the shellcode inside gdb and see what happens. The shellcode was stopped before the read() system call, the system call was executed and the memory at the ESP register examined.

screenshot8

The text from the file is written starting at the address in ESP, and where it finishes you can see it has overwritten a string containing the file path of the executable. The buffer is filled from low to high memory addresses, which results in it overwriting the data already on the stack.

screenshot7

Looking at this it made me wonder whether the payload should have had something like this instead of the two instructions above.

mov ecx,esp      ; Set read() arg#2 buffer
mov edx,0x1000   ; Set read() arg#3 the number of characters to be read
sub ecx, edx     ; move the buffer away from the top of the stack by the
                 ; number of characters to be read so the stack isn't
                 ; overwritten.
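Going back to the questions about how read() fills the buffer, the low to high behaviour seen in gdb can be cross-checked from C without involving the stack pointer at all. The sketch below is my own test, not part of the payload: it hands read() a pointer into the middle of a guarded array and checks which Bytes change. The function name read_fills_low_to_high is made up, and a pipe stands in for the file.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if read() stored the data upward from the
 * pointer it was given and left the Bytes on either side alone, else 0. */
int read_fills_low_to_high(void)
{
    unsigned char region[32];
    unsigned char *buf = region + 8;   /* the address handed to read() */
    const char msg[] = "ABCDEFGH";     /* 8 Bytes of test data */
    int p[2];
    ssize_t n;
    size_t i;

    memset(region, 0xAA, sizeof region);   /* guard pattern */
    if (pipe(p) != 0)
        return 0;
    n = write(p[1], msg, 8);
    if (n == 8)
        n = read(p[0], buf, 8);
    close(p[0]);
    close(p[1]);
    if (n != 8)
        return 0;

    /* The first Byte read sits at the lowest address, the rest follow upward. */
    if (memcmp(buf, msg, 8) != 0)
        return 0;
    /* Nothing below buf, and nothing from buf + 8 onward, was touched. */
    for (i = 0; i < 8; i++)
        if (region[i] != 0xAA)
            return 0;
    for (i = 16; i < sizeof region; i++)
        if (region[i] != 0xAA)
            return 0;
    return 1;
}
```

The function returning 1 is consistent with what gdb showed: read() simply copies into increasing addresses starting at the pointer it is given, and it has no idea whether that pointer happens to be the stack pointer. The same reasoning applies to write() reading from the buffer.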

So that’s it for Assignment 5, it’s been very interesting working through the different msfvenom payloads. It would have been interesting to play around with the different payload options to see which other system calls they added and how they were written by the authors of Metasploit, but I’ll save that for another post as the assignments are quite lengthy already.

SLAE Student Details

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/securitytube-linux-assembly-expert/

Student ID: SLAE-793
