Assembling and Compiling when writing shellcode

I’ve been working on the SecurityTube Linux Assembly Expert certification and thought I’d write about a couple of things I’ve come across which helped me to save quite a lot of time when writing the shellcode throughout the course and for the exam.

I’m sure there are plenty of tools out there, used by professional programmers who work with assembly on a day-to-day basis, which allow for a smooth development and build process. The basic process which gets repeated over and over again is as follows:

  • Write some shellcode in assembly and save it in a .nasm file.
    • Assemble the program using ‘nasm’
    • Link the program using ‘ld’
  • Extract the machine instructions
  • Check for Bad characters
  • Put the machine instructions into a shellcode.c file and compile it with gcc

After a couple of times of working through this process I realised that I was wasting quite a lot of time manually performing these steps. The following post details a script which can be used for assembling and compiling shellcode.

Checking for bad 0x0 characters

Whilst we’re assembling the program we may as well check for bad characters. The program must be assembled without the nasm ‘-ggdb’ option otherwise a number of additional sections are added to the program which contain a lot of 0x0’s.

#!/bin/bash

echo '[+] Check for Bad characters'
echo ' The following bad characters were found:'
nasm -f elf32 -o $1.o $1.nasm
ld -z execstack -o _$1 $1.o
objdump -D ./_$1 -M intel | grep 00
echo ''
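One thing to watch with ‘grep 00’: it matches any line containing the characters 00, including addresses such as 8049100 and operands, not just null bytes in the machine code. A minimal sketch of a stricter check follows. It assumes objdump separates the address, byte and mnemonic columns with tabs, and the function name list_null_byte_lines is made up for illustration.

```shell
# Hypothetical helper: read objdump disassembly on stdin and print only
# the lines whose byte column (assumed to be the second tab-separated
# field) contains a 00 byte.
list_null_byte_lines() {
    awk -F'\t' 'NF >= 2 && $2 ~ /(^| )00( |$)/'
}
```

It could be dropped into the check in place of the grep, for example: objdump -d ./_$1 -M intel | list_null_byte_lines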

Assembling with the -ggdb option

Now we assemble again with the ‘-ggdb’ option so the debug symbols are included and we can use the names of the sections within gdb when getting memory addresses, setting breakpoints etc.

echo '[+] Assembling with Nasm ... '
nasm -ggdb -f elf32 -o $1.o $1.nasm

echo '[+] Linking ...'
ld -z execstack -o _$1 $1.o
rm $1.o

Note: The output of the ‘ld’ command is _$1, this is done to stop the original .nasm file being overwritten if the script is run with the file extension included in the input parameter by mistake. 
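Another way to guard against that mistake is to strip the extension from the argument before using it. This is only a sketch: the helper name strip_nasm_ext is hypothetical and is not part of the script in this post.

```shell
# Hypothetical helper: normalise the script argument so that both
# "my_shellcode" and "my_shellcode.nasm" give the same base name.
strip_nasm_ext() {
    printf '%s\n' "${1%.nasm}"
}
```

The script could then start with something like name=$(strip_nasm_ext "$1") and use "$name" wherever $1 appears.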

Extracting the Machine Language Instructions

For anyone doing the SLAE course: Vivek mentions that, when using the CommandLineFu.com one-liner based on objdump’s -D option, you have to be aware of how many columns of machine instructions objdump outputs. If the commands used to strip out everything that isn’t a machine instruction aren’t correct, some of the machine language bytes can be removed and so go missing from the extracted shellcode.

‘objdump -D’

The command line fu version uses the objdump ‘-D’ option which gives the output in the following format.

Disassembly of section .text:
08049080 :
 8049080: 31 c0             xor eax,eax
 8049082: b0 46             mov al,0x46
 80490a2: e8 e5 ff ff ff    call 804908c 

The example above is an extract and doesn’t make sense as a program, but it shows the format of the output. The output is given in a number of columns, so to extract the machine language instructions we need to know which columns contain the bytes we want. Depending on the assembly instructions used, the number of columns containing machine instructions can change. As a result the -D approach can give inconsistent results, because the command must know the number of columns in advance.
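To make the column problem concrete, here is a small sketch comparing two ways of pulling the bytes out of a disassembly line. It assumes the address, bytes and mnemonic are separated by tabs, and that the fragile approach keeps a fixed six space-separated byte fields. Both function names are invented for this example.

```shell
# Hypothetical: keep only the first six byte fields of the byte column.
# Any instruction longer than six bytes loses its tail.
bytes_first_six_fields() {
    cut -s -f2 | cut -d' ' -f1-6 | tr -d ' \n'
}

# Hypothetical: keep the whole byte column, however many bytes it holds.
bytes_whole_column() {
    cut -s -f2 | tr -d ' \n'
}
```

Fed a line for a 7-byte instruction such as c7 04 24 2f 2f 73 68 (mov DWORD PTR [esp],0x68732f2f), the first function yields c704242f2f73 and silently drops the final 68, while the second yields all seven bytes.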

‘objdump -s’

‘objdump’ has a number of different options which can control the output format. The ‘-s’ option gives the output in the following format. Again the instructions in the example are junk, but the important point is to see the format of the output.

Contents of section .text:
 8049080 31c0b046 31db31c9 cd80eb16 5b31c088 1..F1.1.....[1..
Contents of section .stab:
 0000 01000000 00001300 1b000000 01000000 ................
Contents of section .stabstr:
 0000 006f7269 67696e61 6c2d7079 74686f6e .original-python

The machine language instructions required are listed in the .text section, in the format of a memory address followed by 4 groups of 4 bytes and then an ASCII rendering of those bytes. The machine language instructions can be isolated and extracted consistently because the format of each line doesn’t change; only the number of lines changes to reflect the number of instructions within the program.
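As an illustration of how regular this format is, the sketch below pulls the hex out of the .text section only. It is not the command used in this post. The function name text_section_hex is hypothetical, and it assumes the layout shown above: an address, then a hex area 35 characters wide (four 8-digit groups and three spaces, padded with spaces on a short last line).

```shell
# Hypothetical helper: read "objdump -s" output on stdin and print the
# hex digits of the .text section as one continuous string.
text_section_hex() {
    awk '
        /^Contents of section/ { in_text = ($4 == ".text:"); next }
        in_text {
            sub(/^ [0-9a-f]+ /, "")    # drop the leading address
            hex = substr($0, 1, 35)    # fixed-width hex area
            gsub(/ /, "", hex)         # remove spaces and padding
            printf "%s", hex
        }
        END { print "" }
    '
}
```

Usage would be along the lines of objdump -s ./_$1 | text_section_hex. Because the section name is checked explicitly, the .stab and .stabstr lines never reach the output.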

objdump Command

As a result of getting inconsistent output I created my own version, which seems to output the machine instructions consistently. The following command should extract the machine language instructions, remove any of the other sections that aren’t required and put the result in the correct format to be used with the C shellcode template.

objdump -s  ./_$1 | grep -v '^ [0-9a-f][0-9a-f][0-9a-f][0-9a-f] \b' | grep -v 'Contents' | grep -v '^./' | cut -d' ' -f 3-6| sed 's/ //g' | sed '/./!d' | tr -d '\n'| sed 's/.\{2\}/&\\x/g' | sed 's/^/\\x/'|sed 's/..$//'|sed 's/^/"/;s/$/"/g'
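The tail end of that pipeline is the part that turns a plain hex string into a C string literal: it appends \x after every two characters, prepends one more \x, trims the dangling \x from the end and wraps the result in double quotes. A shorter formulation that should give the same result is sketched below. The function name hex_to_c_literal is hypothetical.

```shell
# Hypothetical helper: read hex digits on stdin and print them as a
# quoted C string literal with a \x prefix on every byte.
hex_to_c_literal() {
    tr -d ' \n' | sed 's/../\\x&/g; s/^/"/; s/$/"/'
}
```

For example, printf '31c0b046' | hex_to_c_literal prints "\x31\xc0\xb0\x46", quotes included.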

Create the shellcode.c file

Now we can put together the different parts of the C source file using ‘echo’, the objdump command above and piping the output to a tmp.c file.

echo ''
echo '[+] Create C Program ...'
echo '#include <stdio.h>
#include <string.h>

unsigned char code[] = \' > tmp.c

objdump -s ./_$1 | grep -v '^ [0-9a-f][0-9a-f][0-9a-f][0-9a-f] \b' | grep -v 'Contents' | grep -v '^./' | cut -d' ' -f 3-6| sed 's/ //g' | sed '/./!d' | tr -d '\n'| sed 's/.\{2\}/&\\x/g' | sed 's/^/\\x/'|sed 's/..$//'|sed 's/^/"/;s/$/"/g' >> tmp.c

echo ';

int main()
{

 printf("Shellcode Length: %d\n", strlen(code));

 int (*ret)() = (int(*)())code;

 ret();
}' >> tmp.c

rm _$1
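The same file can also be written in one go with a here-document, which avoids juggling several echo commands and append redirections. This is a sketch rather than the script’s actual method. The function name write_c_wrapper is hypothetical, the two headers are assumed from the use of printf and strlen, and the %zu format with a (char *) cast is a small change made here to keep the compiler quiet.

```shell
# Hypothetical helper.
# $1: shellcode as a quoted C string literal, e.g. "\x31\xc0"
# $2: path of the C file to create (overwritten if it exists)
write_c_wrapper() {
    cat > "$2" <<EOF
#include <stdio.h>
#include <string.h>

unsigned char code[] = $1;

int main(void)
{
    printf("Shellcode Length: %zu\n", strlen((char *)code));
    int (*ret)() = (int (*)())code;
    ret();
    return 0;
}
EOF
}
```

The delimiter is left unquoted on purpose so that $1 is expanded inside the here-document. The backslashes in the expanded value are not reinterpreted.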

Compile the C source file

The final step is to compile tmp.c and then remove it. Because gcc builds the executable directly, no tmp.o object file is left behind to clean up.

echo '[+] Compile C Program ...'
gcc -ggdb -fno-stack-protector -z execstack tmp.c -o $1
rm tmp.c

echo ''
echo '[+] Done!'
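The commands above appear to assume a 32-bit host. On a 64-bit machine the elf32 object will most likely need ‘ld -m elf_i386’ and ‘gcc -m32’ (with the 32-bit development libraries installed). A minimal sketch of choosing those flags follows. Both function names are hypothetical and the machine name is passed in, for instance from uname -m.

```shell
# Hypothetical helpers: print extra flags for building 32-bit shellcode
# on a 64-bit host, or an empty line when no extra flag is needed.
ld_arch_flags() {
    case "$1" in
        x86_64|amd64) printf '%s\n' '-m elf_i386' ;;
        *)            printf '\n' ;;
    esac
}

gcc_arch_flags() {
    case "$1" in
        x86_64|amd64) printf '%s\n' '-m32' ;;
        *)            printf '\n' ;;
    esac
}

# Possible use inside the script (left commented out here):
# ld  $(ld_arch_flags "$(uname -m)") -z execstack -o "_$1" "$1.o"
# gcc $(gcc_arch_flags "$(uname -m)") -ggdb -fno-stack-protector -z execstack tmp.c -o "$1"
```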

Usage

The script takes a single parameter: the file name of the assembly program without the .nasm extension. So for my_shellcode.nasm the script would be run using:

./compile.sh my_shellcode

The compiled program would be called:

my_shellcode
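Since the script relies entirely on that one parameter, a short check at the top can catch a missing argument or a missing source file before nasm is run. This is a sketch only, and the function name check_args is hypothetical.

```shell
# Hypothetical helper.
# $1: base name of the assembly file, without the .nasm extension
check_args() {
    if [ -z "${1:-}" ]; then
        echo "Usage: ./compile.sh <name-without-.nasm>" >&2
        return 1
    fi
    if [ ! -f "$1.nasm" ]; then
        echo "Error: $1.nasm not found" >&2
        return 1
    fi
}
```

It could be called as check_args "$1" || exit 1 before the first echo.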

Final Script

The final script includes a number of additional echo commands to denote what each section is doing and to provide formatting.

#!/bin/bash

echo '[+] Check for Bad characters'
echo ' The following bad characters were found:'
nasm -f elf32 -o $1.o $1.nasm
ld -z execstack -o _$1 $1.o
objdump -D ./_$1 -M intel | grep 00
echo ''


echo '[+] Assembling with Nasm ... '
nasm -ggdb -f elf32 -o $1.o $1.nasm

echo '[+] Linking ...'
ld -z execstack -o _$1 $1.o

echo ''
echo '[+] Create C Program ...'
echo '#include <stdio.h>
#include <string.h>

unsigned char code[] = \' > tmp.c

objdump -s ./_$1 | grep -v '^ [0-9a-f][0-9a-f][0-9a-f][0-9a-f] \b' | grep -v 'Contents' | grep -v '^./' | cut -d' ' -f 3-6| sed 's/ //g' | sed '/./!d' | tr -d '\n'| sed 's/.\{2\}/&\\x/g' | sed 's/^/\\x/'|sed 's/..$//'|sed 's/^/"/;s/$/"/g' >> tmp.c

echo ';

int main()
{

 printf("Shellcode Length: %d\n", strlen(code));

 int (*ret)() = (int(*)())code;

 ret();
}' >> tmp.c

rm _$1


echo '[+] Compile C Program ...'
gcc -ggdb -fno-stack-protector -z execstack tmp.c -o $1
rm tmp.c

echo ''
echo '[+] Done!'

We now have a script that performs all the steps described. Depending on what I’ve been working on, I’ve altered the script to meet my needs. For example, sometimes you don’t need to extract the machine language instructions and compile a C program, or you want the script to start gdb. In those cases I just removed the relevant sections but kept the check for bad characters.
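Rather than keeping several edited copies of the script, the variations could be selected with command line options. The sketch below is one possible shape, not something the script above supports. The option letters -c (check for bad characters only) and -g (start gdb afterwards), the function name parse_opts and the variable names are all invented for illustration.

```shell
# Hypothetical option parsing; sets check_only, run_gdb and target.
parse_opts() {
    check_only=0
    run_gdb=0
    OPTIND=1
    while getopts "cg" opt; do
        case "$opt" in
            c) check_only=1 ;;
            g) run_gdb=1 ;;
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    target=${1:-}
}
```

After parse_opts "$@", the script could skip the C program section when check_only is 1 and launch gdb on the linked binary when run_gdb is 1.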

I’m sure there are a lot of improvements which can be made to the above, but it works well and I found it saved me a lot of time.

I hope this might help other people who are working on the SLAE course or are learning to write shellcode.
