So back in December I read a blog post from Akamai explaining the complexities of process injection on Linux. My first instinct reading it was, surely it’s not that hard! gdb can just call functions, why don’t we just do what gdb does??
So I thought, Duty Calls! I’m going to figure out how gdb does it and tell the author - Ori David - why they’re WRONG! There’s no more powerful motivating force for research than being sure somebody else is wrong!
I finally got around to experimenting with it last week, and I’m happy to say: they were actually correct about everything. But I learned a bunch along the way, and expanded on their work a bit, so I thought I’d share my perspective!
In this post, I’ll show you how I developed a tool to load an arbitrary shared library (.so) file into another process’s memory space. I should be very clear that this isn’t a security bypass of any kind - you have to have access to the system and permission to debug the process - it’s just an interesting way to tinker with your own software.
But why?
(Note that this section is about Windows, and the rest of this blog will be about Linux - this is just an origin story.)
One of the first things I ever learned in security (like EVER) was how to forcibly load a .dll file into a program’s memory space on Windows. Why, you ask? To cheat at videogames, of course!
I was in high school and people I knew were doing these awesome hacks in Starcraft where they could have a custom HUD and flip bits in memory and even customize AI. I was never the type to ask for tools, but I did ask them how to do that sorta thing, and they said they injected their .dll into the process’s memory and hooked functions so calls went through their code before the real code. From inside the process’s memory space, their code could then call the game’s own functions (to write to the screen, for example).
I thought that was the COOLEST and set out to learn how!
I picked up a book (I don’t currently remember the name, but hopefully by the time I publish this I’ll remember! - as I’m editing, I still can’t figure out the book and it’s driving me crazy - actually, see update at the bottom of this section!) and read it. They explained a bunch of different ways to do it, but the most straightforward was to use a fairly simple three-step process:
- Use VirtualAllocEx() to allocate read/write memory in a foreign process
- Use WriteProcessMemory() to write the full path of your .dll file to that memory
- Use CreateRemoteThread() to start a new thread in the foreign process, with a starting point of LoadLibrary, and the first argument pointing to the allocated memory
That would effectively call LoadLibrary(<.dll file>) in the process’s context. When the .dll loads, its DllMain() function is called and it can do anything it wants in the context of the foreign process! This is used by antivirus and other tools.
I doubt it works anymore, but I actually wrote a tool to do this. I’ve also since published my old hacks, which haven’t worked in 20 years, but you can check out this one for some idea of what I was up to in the early 2000s.
This is all Windows, though, and I want to do the same thing on Linux!
Update: On my second edit pass, this was driving me crazy. And speaking of crazy, I have DM logs from the olden days when I used to work on this stuff, and I actually dug into ICQ logs from 2002, where I found some code I’d copied from the book:
Session Start (ICQ - 96228890:Elliott): Sat Apr 13 14:37:35 2002
[14:37:56] Ron: One problem with my program is that I only have access to windows xp computers so there’s no guarantee that it’ll even work on windows 98.. :-/
[14:38:16] Elliott: :(
Session Close (Elliott): Sat Apr 13 14:40:03 2002
Session Start (ICQ - 96228890:Elliott): Sat Apr 13 15:00:39 2002
[15:00:42] Ron:
#include "..\CmnHdr.h" /* See Appendix A. */
#include <WindowsX.h>
#include <tchar.h>
#include <stdio.h>
[...]
Which, thanks to Googling, was enough to find an archive of the book: Programming Applications in Microsoft Windows by Jeffrey Richter! I knew I was keeping those logs for a reason!
Okay but WHY??
Okay, enough driving down memory lane!
The point of process injection is that you can run your own custom code in the context of another process - that means you have access to the process’s virtual memory, file handles, and even binary code. From there, you can do a lot of tinkering with the process’s internal state, including redirecting calls (a la LD_PRELOAD, but more powerful! If you want to know about LD_PRELOAD techniques, I wrote about it here, among other places).
In the context of malware, you can use it to hide code in a running process (think Meterpreter).
In the context of cheating, you can modify how a game works in subtle ways (like I talked about above).
But in the context of reverse engineering, which is what I care about, you can do cool instrumentation stuff to a process to change how it works and perhaps test things. Maybe you can capture / modify decrypted traffic before it’s processed! The sky’s the limit!
Honestly, I don’t know if this is THAT useful, because you can do any of this on Linux with gdb or LD_PRELOAD, but I wanted to do it myself and now you’re going to learn about it!
Before we start
We need two things first - a target and a library. Let’s look at each!
In case it matters, I’m testing all of this on Fedora 40 (x86-64), but it should work on any x86-64 Linux system that can run gcc / gdb / strace (i.e., all of them).
Target
I wrote the world’s simplest C program to serve as a target:
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  int i;
  for(i = 0; ; i++) {
    printf("%d\n", i);
    sleep(1);
  }
}
It just counts:
$ gcc -o target ./target.c
$ ./target
0
1
2
It’s beautiful! We’re going to be using this throughout.
mysolib.so
I need a shared library that does something visible, because we’re going to load it into a foreign process and we want to know that it worked, so I created this:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

static __attribute__((constructor)) void init_method(void) {
  pid_t parent = getpid();
  printf("Parent = %d\n", parent);

  if(fork() == 0) {
    for(;;) {
      if(getpgid(parent) < 0) {
        printf("Goodbye parent!\n");
        exit(0);
      }
      printf("Test!\n");
      system("sleep 1");
    }
  }
}
The idea is that once it starts, it forks, then loops forever printing something until the parent exits. It uses GCC’s constructor attribute, which is roughly the Linux equivalent of the DllMain() function in a Windows .dll file: the function runs automatically as soon as the library is loaded.
In a real situation, doing a fork() would probably defeat the purpose, because now you’re in another process, but I just wanted to simplify.
(Also, forgive the system() call - I’ll explain that at the end! :) )
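If you want to see the constructor behavior on its own before injecting anything, here’s a tiny sketch (the names are made up for illustration). In a regular executable, a constructor runs before main(); in a shared library, it runs while the library is being loaded, before dlopen() returns, which is exactly the hook mysolib.so relies on.

```c
/* Hypothetical demo: a flag that only a constructor sets. */
static int constructor_ran = 0;

/* GCC/Clang run this automatically when the object containing it is loaded:
 * before main() for an executable, during dlopen() for a shared library. */
static __attribute__((constructor)) void mark_loaded(void) {
  constructor_ran = 1;
}

/* Lets callers confirm the constructor already fired. */
int library_was_initialized(void) {
  return constructor_ran;
}
```

By the time any of your own code calls library_was_initialized(), it already returns 1.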
How does gdb do it?
My gut reaction to the original blog was, why do you have to overcomplicate things? Just do what gdb does! Let’s find out what that is.
Playing with gdb
With target running, you can find the process id (pid) using pgrep:
$ pgrep target
23552
Then attach a debugger:
$ gdb -p 23552
[...]
Using host libthread_db library "/lib64/libthread_db.so.1".
__GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fff962cd310, rem=rem@entry=0x7fff962cd310)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:71
71          return -r;
(gdb)
Once gdb is attached to a process, you can literally just call a function with call (or print or a few other keywords):
(gdb) call printf("hi\n")
$1 = 3
And observe the output in the target process:
49
50
hi
We kinda just injected code. Cool, isn’t it? We can also, say, allocate memory (just like using VirtualAllocEx() on Windows):
(gdb) call malloc(128)
$2 = (void *) 0x22aa36d0
Populate it using strcpy (or other techniques) (just like using WriteProcessMemory() on Windows):
(gdb) call (void)strcpy((char*)0x22aa36d0, "/home/ron/projects/process-injection-experiments/linjector/mysolib.so")
(gdb) x/s 0x22aa36d0
0x22aa36d0: "/home/ron/projects/process-injection-experiments/linjector/mysolib.so"
Then run dlopen, using that memory as a parameter (just like using LoadLibrary on Windows):
(gdb) call dlopen(0x22aa36d0, 0x102)
[Attaching after Thread 0x7f10279ab740 (LWP 23552) fork to child process 23706]
[New inferior 2 (process 23706)]
[...]
And observe the results (using the .so file above):
65
66
Parent = 23552
Test!
fish: Job 1, './target' terminated by signal SIGSEGV (Address boundary error)
I’m not 100% sure why it crashes, but it doesn’t really matter. The point is, if gdb can just call those functions in another process, why can’t I? Clearly the blog is wrong! (Spoiler: it’s not)
What gdb is doing
Many (smarter) people would probably read the gdb source (which has some hilarious comments) or documentation, but I got bored realllllly quickly. So I did things my own way - experimentation.
How does somebody like me learn what gdb is doing? Debug the debugger, of course! It’s just crazy enough to work!
Start the target again, attach a debugger again, get back to where we were before crashing it:
$ gdb -p $(pgrep target)
[...]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
__GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdcb59ec20, rem=rem@entry=0x7ffdcb59ec20)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:71
71          return -r;
(gdb)
You can even type in the command so it’s ready to go:
(gdb) call malloc(0x1337)
Now in ANOTHER window, use strace to debug gdb (isn’t this fun??):
$ strace -p $(pgrep gdb) -o strace.out
strace: Process 24189 attached
Then run the command:
(gdb) call malloc(0x1337)
$1 = (void *) 0x2c2c56b0
And observe the strace output, which is written to strace.out (and is also enormous):
restart_syscall(<... resuming interrupted poll ...>) = 1
[...]
pwrite64(11, "\314", 1, 140728015121279) = 1
ptrace(PTRACE_GETREGS, 23965, {r15=0x403e00, r14=0x7faa850fb000, r13=0, r12=0, rbp=0x7ffdcb59ec10, rbx=0xffffffffffffff88, r11=0x202, r10=0x7ffdcb59ec20, r9=0xfffffffd, r8=0x64, rax=0xfffffffffffffdfc, rcx=0x7faa84f97457, rdx=0x7ffdcb59ec20, rsi=0, rdi=0, orig_rax=0xe6, rip=0x7faa84f97457, cs=0x33, eflags=0x202, rsp=0x7ffdcb59ec08, ss=0x2b, fs_base=0x7faa84eb0740, gs_base=0, ds=0, es=0, fs=0, gs=0}) = 0
ptrace(PTRACE_SETREGS, 23965, {r15=0x403e00, r14=0x7faa850fb000, r13=0, r12=0, rbp=0x7ffdcb59ec10, rbx=0xffffffffffffff88, r11=0x202, r10=0x7ffdcb59ec20, r9=0xfffffffd, r8=0x64, rax=0xfffffffffffffdfc, rcx=0x7faa84f97457, rdx=0x7ffdcb59ec20, rsi=0, rdi=0x1337, orig_rax=0xe6, rip=0x7faa84f97457, cs=0x33, eflags=0x202, rsp=0x7ffdcb59ec08, ss=0x2b, fs_base=0x7faa84eb0740, gs_base=0, ds=0, es=0, fs=0, gs=0}) = 0
[...]
pread64(11, "\220", 1, 140370351439267) = 1
pwrite64(11, "\314", 1, 140370351439267) = 1
pread64(11, "\220", 1, 140370351439399) = 1
pwrite64(11, "\314", 1, 140370351439399) = 1
pread64(11, "\220", 1, 140370352382777) = 1
pwrite64(11, "\314", 1, 140370352382777) = 1
pread64(11, "\220", 1, 140370353357258) = 1
pwrite64(11, "\314", 1, 140370353357258) = 1
pread64(11, "\220", 1, 140370353402231) = 1
pwrite64(11, "\314", 1, 140370353402231) = 1
pread64(11, "\220", 1, 140370353480957) = 1
pwrite64(11, "\314", 1, 140370353480957) = 1
pread64(11, "\314", 1, 140728015121279) = 1
pwrite64(11, "\314", 1, 140728015121279) = 1
ptrace(PTRACE_CONT, 23965, 0x1, 0) = 0
[...]
rt_sigprocmask(SIG_BLOCK, [INT ALRM TERM CHLD WINCH], [], 8) = 0
pipe2([12, 13], O_CLOEXEC) = 0
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
fcntl(13, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
poll([{fd=12, events=POLLIN}], 1, 0) = 0 (Timeout)
read(12, 0x7ffcfa320b87, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(13, "+", 1) = 1
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=23965, si_uid=1000, si_status=SIGSEGV, si_utime=0, si_stime=2 /* 0.02 s */} ---
read(12, "+", 1) = 1
read(12, 0x7ffcfa31fde7, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(13, "+", 1) = 1
rt_sigreturn({mask=[]}) = 0
read(5, 0x7ffcfa321267, 1) = -1 EAGAIN (Resource temporarily unavailable)
read(12, "+", 1) = 1
read(12, 0x7ffcfa3208b7, 1) = -1 EAGAIN (Resource temporarily unavailable)
[...]
ptrace(PTRACE_PEEKUSER, 23965, 8*SS+8, [0x7faa84eb0740]) = 0
pread64(11, "@\7\353\204\252\177\0\0\340\20\353\204\252\177\0\0@\7\353\204\252\177\0\0\0\0\0\0\0\0\0\0"..., 2368, 140370351163200) = 2368
ptrace(PTRACE_GETREGS, 23965, {r15=0x403e00, r14=0x7faa850fb000, r13=0, r12=0, rbp=0x7ffdcb59eb68, rbx=0xffffffffffffff88, r11=0x202, r10=0x4, r9=0x1, r8=0, rax=0x2c2c69f0, rcx=0x2c2c69f0, rdx=0, rsi=0x1337, rdi=0x2c2c69f0, orig_rax=0xffffffffffffffff, rip=0x7ffdcb59eb7f, cs=0x33, eflags=0x10206, rsp=0x7ffdcb59eb70, ss=0x2b, fs_base=0x7faa84eb0740, gs_base=0, ds=0, es=0, fs=0, gs=0}) = 0
newfstatat(AT_FDCWD, "/proc/23965/fd/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x3), ...}, 0) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
[...]
For the FULL output, you can strace the whole gdb execution - it’s much, much bigger, but you’ll see extra bits like where it reads the memory file:
$ strace -o strace.out gdb -p $(pgrep target)
What I actually learned from all this was:
- gdb uses /proc/<pid>/mem to access memory, and pread() / pwrite() to read and edit it (though in the source, they note that there are other, less efficient techniques for OSes that don’t have that file)
- gdb uses ptrace(PTRACE_GETREGS, ...) and ptrace(PTRACE_SETREGS, ...) to change registers
- gdb seems to use ptrace(PTRACE_CONT, ...) to call the function, which means it’s presumably just moving rip and calling new code from somewhere, but I never figured out how it actually did that
So basically, gdb is doing exactly what the blog explained. Huh!
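Looking back at that strace output, two details line up with the x86-64 calling convention. The only difference between the PTRACE_GETREGS dump and the register write right after it is rdi going from 0 to 0x1337, which is the first argument to malloc(0x1337) under the System V AMD64 ABI. And the first pwrite64 writes "\314" (0xcc) at offset 140728015121279, which is 0x7ffdcb59eb7f: exactly where rip ends up in the last PTRACE_GETREGS, with orig_rax set to -1. So it looks like gdb plants a breakpoint byte on the stack and uses it as the called function’s return address; that would also explain why the SIGCHLD reports SIGSEGV rather than SIGTRAP, presumably because the stack isn’t executable. The sketch below is my guess at the register half of that, not gdb’s actual code: prepare_call() is a hypothetical name, and writing the return address into the tracee is left to the caller.

```c
#include <stdint.h>
#include <sys/user.h>

/* Hypothetical helper (x86-64 Linux): rewrite a saved register set so that
 * resuming the tracee calls fn(arg0, arg1). Under the System V AMD64 ABI the
 * first two integer arguments go in rdi and rsi. Returns the stack address
 * where the caller must still write an 8-byte return address (for example,
 * the address of a 0xcc) via /proc/<pid>/mem before PTRACE_SETREGS and
 * PTRACE_CONT. */
uint64_t prepare_call(struct user_regs_struct *regs, uint64_t fn,
                      uint64_t arg0, uint64_t arg1) {
  uint64_t sp = regs->rsp;
  sp -= 128;             /* don't clobber the interrupted code's red zone */
  sp &= ~(uint64_t)0xf;  /* rsp must be 16-byte aligned at the call site... */
  sp -= 8;               /* ...then a call pushes the 8-byte return address */

  regs->rsp = sp;
  regs->rip = fn;
  regs->rdi = arg0;
  regs->rsi = arg1;
  regs->rax = 0;          /* al = vector registers used, for variadic callees */
  regs->orig_rax = ~0ULL; /* -1: don't let the kernel restart an interrupted
                             syscall at the new rip */
  return sp;
}
```

Keep a copy of the original registers so you can put them back with PTRACE_SETREGS once the call traps.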
So what else can we do?
Okay, so I proved that I was wrong (or at least that the person I thought was wrong was actually right). But that’s not a satisfying ending, so let’s see if we can take this a step or two further!
I’m sure there are tons of ways to do this, and in particular I’m sure that whatever gdb is doing is better than what I’m going to do, but let’s look at one technique to load a custom .so file into a Linux process.
I published the final version on GitHub, but we’ll build a tool step by step until it gets too complex.
Debugging a remote process
Here’s the simplest possible debugger:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
  if(argc < 2) {
    fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
    exit(1);
  }

  pid_t pid = atol(argv[1]);

  if(ptrace(PTRACE_ATTACH, pid, 0, 0) < 0) {
    fprintf(stderr, "Error attaching: %d\n", errno);
    exit(1);
  }

  // Wait for the attach to complete
  int status;
  waitpid(pid, &status, WSTOPPED);

  printf("Done!\n");

  ptrace(PTRACE_DETACH, pid, 0, 0);

  return 0;
}
It requires no special compile flags, either:
$ gcc -o test test.c
$
In general, ptrace(PTRACE_ATTACH, ...) and ptrace(PTRACE_SEIZE, ...) are the two ways to attach to an already-running process; from the manpage ptrace(2):
PTRACE_ATTACH
       Attach to the process specified in pid, making it a tracee of the calling process. The tracee is sent a SIGSTOP, but will not necessarily have stopped by the completion of this call; use waitpid(2) to wait for the tracee to stop. See the "Attaching and detaching" subsection for additional information. (addr and data are ignored.)
       Permission to perform a PTRACE_ATTACH is governed by a ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check; see below.
PTRACE_SEIZE (since Linux 3.4)
       Attach to the process specified in pid, making it a tracee of the calling process. Unlike PTRACE_ATTACH, PTRACE_SEIZE does not stop the process. Group-stops are reported as PTRACE_EVENT_STOP and WSTOPSIG(status) returns the stop signal. Automatically attached children stop with PTRACE_EVENT_STOP and WSTOPSIG(status) returns SIGTRAP instead of having SIGSTOP signal delivered to them. execve(2) does not deliver an extra SIGTRAP. Only a PTRACE_SEIZEd process can accept PTRACE_INTERRUPT and PTRACE_LISTEN commands. The "seized" behavior just described is inherited by children that are automatically attached using PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, and PTRACE_O_TRACECLONE. addr must be zero. data contains a bit mask of ptrace options to activate immediately.
       Permission to perform a PTRACE_SEIZE is governed by a ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check; see below.
When you attach with PTRACE_ATTACH, the process will receive the signal SIGSTOP (and your process will receive SIGCHLD). You can use waitpid() to wait for the process to actually stop. Then you can tinker with the process as much as you want before using PTRACE_DETACH to let the program continue to do its thing. You don’t have to use PTRACE_DETACH, since it’ll auto-detach when your process ends, but it’s polite!
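The samples in this post pass WSTOPPED to waitpid() and never look at status. Here’s a sketch of a slightly more defensive wait, as a hypothetical helper named wait_for_stop(). It uses WUNTRACED, the flag waitpid(2) documents (on Linux it has the same value as WSTOPPED, and stops of a traced child are reported either way), and it tells you when the process died instead of stopping.

```c
#include <signal.h>   /* SIG* constants callers compare the result against */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for pid to stop and return the stop signal (e.g. SIGSTOP right after
 * PTRACE_ATTACH, SIGTRAP after hitting a 0xcc). Returns -1 if waitpid()
 * fails or the process exited / was killed instead of stopping. */
int wait_for_stop(pid_t pid) {
  int status;
  if(waitpid(pid, &status, WUNTRACED) < 0) {
    perror("waitpid");
    return -1;
  }
  if(WIFSTOPPED(status))
    return WSTOPSIG(status);
  if(WIFEXITED(status))
    fprintf(stderr, "process %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
  else if(WIFSIGNALED(status))
    fprintf(stderr, "process %d was killed by signal %d\n", (int)pid, WTERMSIG(status));
  return -1;
}
```

After PTRACE_ATTACH you’d check something like wait_for_stop(pid) == SIGSTOP before touching registers or memory.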
You can use strace to make sure your debugger is doing what you think it’s doing:
$ strace ./test $(pgrep target)
execve("./test", ["./test", "27290"], 0x7ffed2e67eb8 /* 62 vars */) = 0
[...]
ptrace(PTRACE_ATTACH, 27290) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=27290, si_uid=1000, si_status=SIGSTOP, si_utime=0, si_stime=1 /* 0.01 s */} ---
wait4(27290, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WSTOPPED, NULL) = 27290
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
getrandom("\xc5\x6b\xc7\xbe\x43\xde\x6e\xdc", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x217ca000
brk(0x217eb000) = 0x217eb000
write(1, "Done!\n", 6Done!
) = 6
ptrace(PTRACE_DETACH, 27290, NULL, 0) = 0
exit_group(0) = ?
+++ exited with 0 +++
You can’t debug the target process with gdb while your tool is attached (or vice versa), because a process can only have one tracer at a time.
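If you’re not sure whether something is already attached (a forgotten gdb session, say), /proc/<pid>/status has a TracerPid: line that’s 0 when nobody is tracing the process. Here’s a sketch of a hypothetical get_tracer_pid() helper you could call before PTRACE_ATTACH:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the TracerPid from /proc/<pid>/status (0 = not being traced),
 * or -1 if the file can't be read or has no TracerPid line. */
int get_tracer_pid(pid_t pid) {
  char path[64], line[256];
  snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);

  FILE *f = fopen(path, "r");
  if(!f)
    return -1;

  int tracer = -1;
  while(fgets(line, sizeof(line), f)) {
    /* The line looks like "TracerPid:\t1234" */
    if(sscanf(line, "TracerPid: %d", &tracer) == 1)
      break;
  }
  fclose(f);
  return tracer;
}
```

A non-zero result means PTRACE_ATTACH is going to fail until that tracer detaches.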
Reading registers
Let’s read registers from the process. We saw PTRACE_GETREGS earlier, so I read just enough of the documentation to figure out how to use it, then gave it a shot:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>

int main(int argc, char *argv[]) {
  if(argc < 2) {
    fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
    exit(1);
  }

  pid_t pid = atol(argv[1]);

  if(ptrace(PTRACE_ATTACH, pid, 0, 0) < 0) {
    fprintf(stderr, "Error attaching: %d\n", errno);
    exit(1);
  }

  // Wait for the attach to complete
  int status;
  waitpid(pid, &status, WSTOPPED);

  // Save registers
  struct user_regs_struct regs;
  ptrace(PTRACE_GETREGS, pid, NULL, &regs);
  printf("rip = %llx\n", regs.rip);

  printf("Done!\n");

  ptrace(PTRACE_DETACH, pid, 0, 0);

  return 0;
}
Then run it:
$ ./test $(pgrep target)
rip = 7f51bfa0b457
Done!
Run it again to make sure it doesn’t change (it WILL change if you rerun the application though):
$ ./test $(pgrep target)
rip = 7f51bfa0b457
Done!
We can use gdb to confirm that IS where it stops (assuming we stop during sleep, which is almost guaranteed):
$ gdb -p (pgrep target)
[...]
(gdb) print/x $rip
$1 = 0x7f51bfa0b457
We can use an identical function call with the same structure and PTRACE_SETREGS to write to registers. Try changing rip to an executable memory address containing 0xcc!
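Here’s roughly what that round trip could look like, as a hypothetical set_rip() helper. It assumes you’re already attached and the tracee is stopped (for example, right after the waitpid() in the program above); finding an executable address that holds a 0xcc is left to you.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <unistd.h>

/* Hypothetical helper: point rip in an attached, stopped tracee at new_rip,
 * keeping every other register as-is (GETREGS -> modify -> SETREGS).
 * Returns 0 on success, or -1 if either ptrace() call fails, e.g. because
 * we aren't actually tracing pid. */
int set_rip(pid_t pid, unsigned long long new_rip) {
  struct user_regs_struct regs;
  if(ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
    return -1;
  regs.rip = new_rip;
  if(ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0)
    return -1;
  return 0;
}
```

Save the original registers first so you can restore them with PTRACE_SETREGS afterwards, like the later examples in this post do.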
Reading memory
Next, how do we read memory?
We can basically use /proc/<pid>/mem to access the process’s memory, and then pread to read it at an arbitrary offset. Of course, we need to know where to read memory from, so we’re just going to use the program counter - rip - which points at the instruction that is about to be executed (using rsp, which points at the process’s stack, would also work great):
[...]
#include <fcntl.h>
[...]
  // Save registers
  struct user_regs_struct regs;
  ptrace(PTRACE_GETREGS, pid, NULL, &regs);
  printf("rip = %llx\n", regs.rip);

  char filename[256];
  snprintf(filename, 255, "/proc/%d/mem", pid);
  int mem = open(filename, O_RDWR|O_CLOEXEC);
  if(mem < 0) { // open() returns -1 on failure, not 0
    fprintf(stderr, "Process doesn't appear to exist, or can't access its memory!\n");
    exit(1);
  }

  unsigned char buffer[16];
  pread(mem, buffer, 16, regs.rip);
  int i;
  for(i = 0; i < 16; i++) {
    printf("%02x ", buffer[i]);
  }

  printf("Done!\n");
[...]
And run it:
$ make test && ./test $(pgrep target)
cc test.c -o test
rip = 7f51bfa0b457
f7 d8 c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec Done!
And verify that gdb sees the same bytes in the same location:
$ gdb -p (pgrep target)
[...]
(gdb) x/16xb $rip
0x7f51bfa0b457 <__GI___clock_nanosleep+39>: 0xf7 0xd8 0xc3 0x66 0x0f 0x1f 0x44 0x00
0x7f51bfa0b45f <__GI___clock_nanosleep+47>: 0x00 0x55 0x48 0x89 0xe5 0x48 0x83 0xec
Writing memory
Not only can we use the same technique to write to memory, we can even write to memory that the process considers read-only! Memory protections aren’t enforced on the /proc/<pid>/mem endpoint.
Let’s write a debug breakpoint (0xcc) to rip (the program counter), then use PTRACE_CONT to resume execution:
[...]
  unsigned char wbuffer = 0xcc;
  pwrite(mem, &wbuffer, 1, regs.rip);
  ptrace(PTRACE_CONT, pid, NULL, 0);

  printf("Done!\n");
[...]
Then run it! If it works the way you’d expect - which it does! - the target program will terminate immediately (well, after the sleep ends) with a SIGTRAP:
$ make test && ./test $(pgrep target)
cc test.c -o test
rip = 7f51bfa0b457
f7 d8 c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec Done!
And verify in target’s window:
866
867
868
fish: Job 1, './target' terminated by signal SIGTRAP (Trace or breakpoint trap)
You also get to see how long it’s taken me to write this much, but thankfully the timer resets here! :)
Taking control
Now we’re building some tools! We’re almost at the point where we have what’s essentially a debugger.
We can technically put the 0xcc anywhere in the code where we want to take control - that’s actually how your favorite debugger does software breakpoints - but putting it right at rip is a simple way to demonstrate.
This time, we’ll do the same thing, but instead of detaching and letting the program crash, we’ll use waitpid to continue debugging it. waitpid will wait until the SIGTRAP hits, and catch it. Once we get control back, we validate that it was indeed SIGTRAP, then fix memory, restore registers, and carry on.
This is a good time to show you the full code again, with extra comments, in case you got lost in my samples above. This is still the quick demo app, though, not linjector:
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>

int main(int argc, char *argv[]) {
  if(argc < 2) {
    fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
    exit(1);
  }

  pid_t pid = atol(argv[1]);

  // Attach to the process
  if(ptrace(PTRACE_ATTACH, pid, 0, 0) < 0) {
    fprintf(stderr, "Error attaching: %d\n", errno);
    exit(1);
  }

  // Wait for the attach to complete
  int status;
  waitpid(pid, &status, WSTOPPED);

  // Save registers
  struct user_regs_struct regs;
  ptrace(PTRACE_GETREGS, pid, NULL, &regs);
  printf("rip = %llx\n", regs.rip);

  // Open the memory
  char filename[256];
  snprintf(filename, 255, "/proc/%d/mem", pid);
  int mem = open(filename, O_RDWR|O_CLOEXEC);
  if(mem < 0) { // open() returns -1 on failure, not 0
    fprintf(stderr, "Process doesn't appear to exist, or can't access its memory!\n");
    exit(1);
  }

  // Read 16 bytes @ rip
  unsigned char buffer[16];
  pread(mem, buffer, 16, regs.rip);
  int i;
  for(i = 0; i < 16; i++) {
    printf("%02x ", buffer[i]);
  }

  // Overwrite the current byte with a breakpoint
  unsigned char wbuffer = 0xcc;
  pwrite(mem, &wbuffer, 1, regs.rip);

  // Continue execution
  ptrace(PTRACE_CONT, pid, NULL, 0);

  // Wait for the process to stop
  waitpid(pid, &status, WSTOPPED);

  // Make sure it's a SIGTRAP and not like SIGSEGV or something
  if(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP) {
    printf("\nSIGTRAP!\n");
  } else {
    printf("\nSome other signal!\n");
    exit(1);
  }

  // Fix the memory using the first byte of what we read
  pwrite(mem, buffer, 1, regs.rip);

  // Put the registers back to what they were (including rip)
  ptrace(PTRACE_SETREGS, pid, NULL, &regs);
  printf("rip = %llx\n", regs.rip);

  printf("Done!\n");

  // PTRACE_DETACH resumes the process itself; it only works while the
  // tracee is stopped, so don't PTRACE_CONT first
  ptrace(PTRACE_DETACH, pid, 0, 0);

  return 0;
}
If you run that (don’t forget to restart target if it’s crashed!), you’ll see no special output:
$ make test && ./test $(pgrep target)
cc test.c -o test
rip = 7ffb9afdd457
f7 d8 c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
SIGTRAP!
rip = 7ffb9afdd458
Done!
But also, the target doesn’t crash:
[...]
3
4
5
[...]
At this point, we’ve basically made a debugger with a software breakpoint! How cool is that?
Also, once again you can start calculating how quickly I write next time you see target’s output!
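One detail worth knowing if you extend this into a real debugger: int3 is a trap, so when it fires, the reported rip is one byte past the 0xcc. The demo gets away with ignoring that because it restores the whole saved register set. If you’d rather resume at the breakpoint address (after restoring the original byte there), you need to rewind rip by one, along the lines of this hypothetical helper:

```c
#include <stdint.h>
#include <sys/user.h>

/* Hypothetical helper: after a SIGTRAP from the 0xcc we planted at bp_addr,
 * x86 reports rip one byte past the int3. If that's what we see, move rip
 * back onto the breakpoint address and return 0; otherwise leave regs alone
 * and return -1, since the trap came from somewhere else. */
int rewind_after_int3(struct user_regs_struct *regs, uint64_t bp_addr) {
  if(regs->rip != bp_addr + 1)
    return -1;
  regs->rip = bp_addr;
  return 0;
}
```

You’d call it on registers fetched with PTRACE_GETREGS, then write them back with PTRACE_SETREGS before continuing.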
Run arbitrary code
At this point, we can effectively overwrite the code with any shellcode we want (defining “shellcode” as a self-contained blob of machine code which usually sets up arguments and calls syscall to do things). You can have msfvenom generate something that spawns a shell, for example, but that’s kinda pointless because you can already run a shell on the computer.
Instead, we want to load a .so file, which means we need to call dlopen(). That’s a libc function, not a syscall, so we can’t just do standard shellcode - we need to access libc functions.
I’m sure there are other ways to do this, but I opted for:
- Find the address of libc.so.6 in your process
- Find the address of libc.so.6 in the target process
- Find the address of dlsym in your process
- Do math (dlsym’s offset from the start of libc is the same in both processes, so its remote address is the remote libc base plus that offset)
- Write a blob of code to rip, with a debug breakpoint at the end
- Let the process continue executing until it hits the breakpoint
- Fix the memory, reset the registers back to their original states
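The "do math" step is just base plus offset: a symbol sits at the same offset from the start of libc in every process that maps the same libc.so.6, so only the base address changes between processes. As a hypothetical stand-alone function (the test uses the numbers from the find_dlsym() run shown later in this post):

```c
#include <stdint.h>

/* Hypothetical helper: translate the address of a symbol in our copy of a
 * library into the address of the same symbol in another process, given
 * where that library starts in each. Assumes both processes map the exact
 * same library file. */
uint64_t translate_address(uint64_t local_addr, uint64_t local_base,
                           uint64_t remote_base) {
  uint64_t offset = local_addr - local_base; /* symbol's offset in the library */
  return remote_base + offset;
}
```

This works for dlsym specifically because on glibc 2.34 and newer (Fedora 40 included), dlsym lives in libc.so.6 itself; on older systems it’s in libdl.so.2, so you’d measure the offset from that library instead.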
Once you have access to dlsym() (the equivalent of GetProcAddress()), you can find anything else you want!
In retrospect, I realize we could have just gotten the address of dlopen() directly, since we don’t need any other symbols, but being able to call other functions is handy so I’m just leaving it like this!
Finding libc
If you’re already running code on the system, you can parse /proc/<pid>/maps to figure out where things are loaded:
$ cat /proc/$(pgrep target)/maps | grep 'libc\.so'
7f281913f000-7f2819167000 r--p 00000000 00:21 3141132 /usr/lib64/libc.so.6
7f2819167000-7f28192d5000 r-xp 00028000 00:21 3141132 /usr/lib64/libc.so.6
7f28192d5000-7f2819323000 r--p 00196000 00:21 3141132 /usr/lib64/libc.so.6
7f2819323000-7f2819327000 r--p 001e4000 00:21 3141132 /usr/lib64/libc.so.6
7f2819327000-7f2819329000 rw-p 001e8000 00:21 3141132 /usr/lib64/libc.so.6
I briefly considered writing my own parser, but thankfully found that somebody already did it, and they write code kinda-sorta similarly to me so it fit rather nicely. It also has a permissive license, which makes it easy to use in my demo project. Perfect!
I wrote a little function that can find libc.so.6 in any process (use -1 for the current process) based on their example code:
#include "proc_maps_parser/pmparser.h"

unsigned long long find_libc(int pid) {
  procmaps_iterator maps_iter = {0};
  procmaps_error_t parser_err = PROCMAPS_SUCCESS;

  parser_err = pmparser_parse(pid, &maps_iter);
  if (parser_err) {
    fprintf(stderr, "Couldn't parse /proc/%d/maps (error=%d)\n", pid, (int)parser_err);
    exit(1);
  }

  // iterate over areas
  procmaps_struct *mem_region = NULL;
  while ((mem_region = pmparser_next(&maps_iter)) != NULL) {
    if(mem_region->offset == 0 && mem_region->map_type == PROCMAPS_MAP_FILE && !strcmp(mem_region->pathname, "/usr/lib64/libc.so.6")) {
      // Don't return the free'd variable! I know how to C
      void *result = mem_region->addr_start;
      pmparser_free(&maps_iter);

      return (unsigned long long) result;
    }
  }

  pmparser_free(&maps_iter);
  fprintf(stderr, "Couldn't locate the start of /usr/lib64/libc.so.6 in the remote process!\n");
  exit(1);
}
Then I used that to find the address of libc in both my process and the target process (I also got rid of all the write-0xcc-to-the-process code for now):
#include <fcntl.h>
[...]
  // Read 16 bytes @ rip
  unsigned char buffer[16];
  pread(mem, buffer, 16, regs.rip);
  int i;
  for(i = 0; i < 16; i++) {
    printf("%02x ", buffer[i]);
  }
  printf("\n");

  long long local_libc = find_libc(-1);
  long long remote_libc = find_libc(pid);
  printf("local libc = %llx, remote libc = %llx\n", local_libc, remote_libc);

  // Continue execution
  ptrace(PTRACE_CONT, pid, NULL, 0);

  printf("Done!\n");
And now when we compile, we need to link in the pmparser.c file as well:
$ gcc -Wall -o test test.c proc_maps_parser/pmparser.c && ./test $(pgrep target)
rip = 7f640d142457
f7 d8 c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
local libc = 7f3769a30000, remote libc = 7f640d05e000
Done!
You’ll note that they’re loaded in different places - that’s ASLR (address space layout randomization) at work.
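find_libc() hardcodes Fedora’s /usr/lib64/libc.so.6 path, which differs on other distros. If you’d rather skip the pmparser dependency, a minimal sscanf()-based stand-in might look like this sketch. find_module_base() is a hypothetical name: it matches any mapping whose pathname contains a given substring (such as "libc.so.6"), treats pid <= 0 as the current process, and returns 0 instead of exiting when nothing matches. Like find_libc(), it picks the mapping with file offset 0, which is where the start of the library is mapped.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start address of the first offset-0 mapping whose
 * pathname contains needle, or 0 if there's no such mapping. */
unsigned long long find_module_base(pid_t pid, const char *needle) {
  char path[64];
  if(pid <= 0)
    snprintf(path, sizeof(path), "/proc/self/maps");
  else
    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);

  FILE *f = fopen(path, "r");
  if(!f)
    return 0;

  char line[4096];
  unsigned long long result = 0;
  while(fgets(line, sizeof(line), f)) {
    unsigned long long start, end, offset;
    char perms[8];
    char pathname[4096] = "";  /* stays empty for anonymous mappings */
    /* Line format: start-end perms offset dev inode [pathname] */
    int n = sscanf(line, "%llx-%llx %7s %llx %*s %*s %4095[^\n]",
                   &start, &end, perms, &offset, pathname);
    if(n >= 4 && offset == 0 && strstr(pathname, needle)) {
      result = start;
      break;
    }
  }
  fclose(f);
  return result;
}
```

For example, find_module_base(pid, "libc.so.6") should find libc whether it lives in /usr/lib64 or somewhere else.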
Finding dlsym
Now the easy part - we use dlsym to get the address of dlsym in our memory space, subtract our libc address (to get the offset), then add their libc address (to get the absolute address in the remote process):
#define _GNU_SOURCE // needed for RTLD_DEFAULT; must come before any #include
#include <dlfcn.h>
[...]
unsigned long long find_dlsym(int remote_pid) {
  unsigned long long local_libc = find_libc(-1);
  unsigned long long remote_libc = find_libc(remote_pid);
  printf("local libc = %llx, remote libc = %llx\n", local_libc, remote_libc);

  unsigned long long local_dlsym = (unsigned long long) dlsym(RTLD_DEFAULT, "dlsym");
  unsigned long long dlsym_offset = local_dlsym - local_libc;
  unsigned long long remote_dlsym = dlsym_offset + remote_libc;
  printf("local dlsym = %llx, offset = %llx, remote dlsym = %llx\n", local_dlsym, dlsym_offset, remote_dlsym);

  return remote_dlsym;
}
And then we can compile/run it:
$ gcc -o test test.c proc_maps_parser/pmparser.c && ./test $(pgrep target)
rip = 7f2459188457
local libc = 7f1cdce13000, remote libc = 7f24590a4000
local dlsym = 7f1cdcea6d50, offset = 93d50, remote dlsym = 7f2459137d50
We can verify that address with a debugger (just make sure you don’t restart target, because it’ll change):
$ gdb -p $(pgrep target)
[...]
(gdb) x/i dlsym
0x7f2459137d50 <___dlsym>: endbr64
It does indeed match!
Writing a trampoline
I’m quite sure there’s a better way to do this, but I wrote what’s essentially shellcode with a couple templates to populate, and which ends with a debug breakpoint:
bits 64

; Make room on the stack, otherwise we will accidentally overwrite important stuff
; (rsp will be restored afterwards)
sub rsp, 1000

; Save these args to other non-volatile registers
mov rbx, 0x4141414141414141 ; dlsym address

call get_sofile
db "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",0 ; .so file
get_sofile:
pop rbp

call get_dlopen
db "dlopen",0
get_dlopen:
pop rsi
mov rdi, 0
call rbx ; dlsym(RTLD_DEFAULT, "dlopen")

mov rdi, rbp
mov rsi, 0x102 ; RTLD_NOW|RTLD_GLOBAL
call rax ; dlopen(<.so file>, RTLD_NOW|RTLD_GLOBAL)

; Indicate that we've finished
db 0xcc
Then assemble it:
$ nasm -o trampoline.bin trampoline.asm
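Before patching, it doesn’t hurt to sanity-check the assembled blob. This is a hypothetical check, not part of linjector: it confirms both templates are present and that the last byte is the 0xcc breakpoint. find_marker() is a portable stand-in for GNU memmem(), so it doesn’t depend on _GNU_SOURCE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first occurrence of marker (mlen bytes) in buf,
 * or -1 if it isn't there. */
long find_marker(const uint8_t *buf, size_t size, const char *marker, size_t mlen) {
  if(mlen == 0 || mlen > size)
    return -1;
  for(size_t i = 0; i + mlen <= size; i++)
    if(memcmp(buf + i, marker, mlen) == 0)
      return (long)i;
  return -1;
}

/* Does this look like the trampoline above: both templates present and a
 * trailing 0xcc so we get control back when it finishes? */
int trampoline_looks_valid(const uint8_t *t, size_t size) {
  return size > 0 && t[size - 1] == 0xcc &&
         find_marker(t, size, "AAAAAAAA", 8) >= 0 &&
         find_marker(t, size, "BBBBBBBBBBBBBBBB", 16) >= 0;
}
```

If you reassemble the trampoline with a different template layout, this is the spot where a mismatch would show up before any bytes reach the target.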
Putting it all together
Starting here, I’m literally using code from linjector instead of the test program - it got to that complexity level!
I fill in those templates using memmem() to find the offset and then memcpy() to overwrite the address of dlsym (and the string to load) at those offsets:
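FILE *f = fopen("trampoline.bin", "rb");
if(!f) {
  fprintf(stderr, "Couldn't read trampoline.bin!\n");
  exit(1);
}

// Get the size
fseek(f, 0L, SEEK_END);
long size = ftell(f);
rewind(f);

// Allocate memory + read the file
uint8_t *trampoline = (uint8_t*) malloc(size);
fread(trampoline, 1, size, f);
fclose(f);

// Find our templates (memmem() needs _GNU_SOURCE); bail out if either is missing
uint8_t *dlsym_slot = memmem(trampoline, size, "AAAAAAAA", 8);
uint8_t *path_slot = memmem(trampoline, size, "BBBBBBBBBBBBBBBB", 16);
size_t path_len = strlen(argv[2]) + 1; // include the NUL terminator
if(!dlsym_slot || !path_slot || path_len > 128) {
  fprintf(stderr, "Bad trampoline.bin, or the .so path is too long!\n");
  exit(1);
}

// Replace them: the remote dlsym address, then the .so path (copying only
// the string itself rather than a fixed 128 bytes from argv[2])
uint64_t remote_dlsym = find_dlsym(pid);
memcpy(dlsym_slot, &remote_dlsym, sizeof(remote_dlsym));
memcpy(path_slot, argv[2], path_len);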
I wrote a quick function to swap memory (read memory and then overwrite it):
void swap_memory(int mem, uint64_t address, uint8_t *new_value, uint8_t *old_value, size_t length) {
  if(old_value) {
    printf("Reading %zu bytes from 0x%lx...\n", length, address);
    if(pread(mem, old_value, length, address) != (ssize_t)length) {
      fprintf(stderr, "Error reading %zu bytes from 0x%lx: %d\n", length, address, errno);
      exit(1);
    }
  }

  printf("Writing %zu bytes to 0x%lx...\n", length, address);
  if(pwrite(mem, new_value, length, address) != (ssize_t)length) {
    fprintf(stderr, "Error writing %zu bytes to 0x%lx: %d\n", length, address, errno);
    exit(1);
  }
}
So using that, I swap out the memory at rip with my shellcode and then continue execution:
  // Allocate room to back up the bytes we're about to overwrite
  uint8_t *backup = (uint8_t*) malloc(size);

  // Replace the memory at RIP with the trampoline
  swap_memory(mem, regs.rip, trampoline, backup, size);

  // Continue the process
  ptrace(PTRACE_CONT, pid, NULL, 0);
When the code completes, it’ll swap everything back to the way it was:
  // Wait for the breakpoint
  int status;
  waitpid(pid, &status, WSTOPPED);
  print_signal(status);

  struct user_regs_struct regs2;
  ptrace(PTRACE_GETREGS, pid, NULL, &regs2);
  printf("rip (after) = %llx\n", regs2.rip);

  // Swap memory back
  swap_memory(mem, regs.rip, backup, NULL, size);

  // Restore the registers
  ptrace(PTRACE_SETREGS, pid, NULL, &regs);

  // Detach before exiting
  ptrace(PTRACE_DETACH, pid, 0, 0);
And now it’s loaded!
Here’s what it looks like with the .so file I created. (NOTE: if target is running from a different folder, a relative path to mysolib.so is resolved against the TARGET’s working directory, not the injector’s, because dlopen() runs in the context of target. Passing an absolute path with $PWD sidesteps that.)
$ ./linjector $(pgrep target) "$PWD/mysolib.so"
Stopped (signal) (19)
rip = 7f640d142457
Reading 183 bytes from 0x7f640d142457...
Writing 183 bytes to 0x7f640d142457...
Trace/breakpoint trap (5)
rip (after) = 7f640d14250e
Writing 183 bytes to 0x7f640d142457...
Done!
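Note that rip (after) is exactly rip + 183 (0x7f640d142457 + 0xb7 = 0x7f640d14250e): one byte past the trampoline’s final 0xcc, which is where an int3 leaves rip. That makes for a cheap check that the SIGTRAP really came from the end of our trampoline and not from somewhere else. Here’s a hypothetical helper for it:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: the trampoline ends in a single 0xcc, so when that
 * breakpoint fires, rip should sit exactly one byte past it, i.e. at
 * start + size. Returns 1 if so, 0 otherwise. */
int trap_came_from_trampoline(uint64_t start, size_t size, uint64_t rip_after) {
  return rip_after == start + size;
}
```

In linjector’s terms, that would be trap_came_from_trampoline(regs.rip, size, regs2.rip), checked before swapping memory back.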
And in the window for target, where you once again learn how fast I’m typing this:
636
637
638
639
640
Parent = 30734
Test!
641
642
Test!
643
Test!
644
Test!
645
Test!
646
Test!
^CGoodbye parent!
Now we have our own process and the target process running alongside each other. Friends forever!!
Funny note
I mentioned at the start that I’d explain why I’m using system("sleep 1") instead of sleep(1).
Well, because the target process also sleeps, when we do the injection we’re actually overwriting code inside the sleep() function. The constructor (and its fork()) runs while the trampoline is still in place, so the forked child gets its own copy of that patched code, and the memory only ever gets fixed in the original target process. That means if the child called sleep(), it’d hit the trampoline again and bad things would ensue. :)
Conclusion
So yeah, using linjector will let you load an arbitrary .so file into an arbitrary process, provided you have permissions.
I wrote this in a couple afternoons as a proof of concept, not as a production tool, so YMMV.
Have fun!