ptrace is an incredibly useful facility in the Linux kernel: it lets user-space
applications inspect and interact with another process's system calls. Tools
such as strace and even Google's gVisor build on this model to provide different
functionality. By intercepting system calls one can create a secure sandbox or
maliciously interfere with a process; like many things, it is a double-edged
sword. Because we have been building a product that uses system tracing, I've
been spending more time explaining the wonders of ptrace to folks, so I thought
I'd put together a quick intro on how it works.
How does ptrace work?
Using ptrace involves three actors: the monitor, the process being monitored,
and the Linux kernel. The monitor forks a child; the child tells the kernel that
it wants to be traced and then uses exec to replace itself with the program to
be monitored.
Example code
Below is an example of how to use ptrace to see the first system call reported
for ls.
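#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/reg.h>    /* ORIG_EAX (32-bit x86) */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child;
    long orig_eax;

    child = fork();
    if (child == 0) {
        /* Child: ask to be traced, then become ls. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        /* Parent: block until the traced child stops. */
        wait(NULL);
        /* 32-bit x86 only; on x86-64 the equivalent is 8 * ORIG_RAX. */
        orig_eax = ptrace(PTRACE_PEEKUSER,
                          child, 4 * ORIG_EAX,
                          NULL);
        printf("The child made a "
               "system call %ld\n", orig_eax);
        /* Let the stopped child resume. */
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }
    return 0;
}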
Breaking it down, we can see that the monitor program first calls fork(). The
parent process then blocks in wait until the child changes state. The child
process replaces itself via execl, but before it does that, it signals to the
kernel that it wants to be traced using the ptrace facilities.
The ptrace function takes four arguments: a ptrace request (such as
PTRACE_TRACEME), the PID of the process in question, and an address and a data
argument whose meaning depends on the request.
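When the child stops for the first time, which with PTRACE_TRACEME happens as
the exec completes, the kernel notifies the parent process, causing wait to
unblock. The parent then calls ptrace with PTRACE_PEEKUSER, which reads a word
from the stopped child's saved register area. In this case we peek at ORIG_EAX,
the saved copy of the EAX register, which contains the system call identifier
(see this table for a complete mapping of system calls and their identifiers
along with arguments). Because the stop arrives on the way out of exec, the
identifier seen here is the one for execve.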
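If you just want a readable name for the number you peeked, a tiny lookup is enough to get started. This is only a sketch: syscall_name is a hypothetical helper, it covers a handful of calls I picked arbitrarily, and it relies on the SYS_* constants from <sys/syscall.h>, which describe the architecture the monitor is compiled for (so it assumes the monitor and the traced program share an ABI).

```c
#include <sys/syscall.h>

/* Hypothetical helper: map a few syscall numbers to names.
 * The SYS_* constants come from the build architecture's headers. */
const char *syscall_name(long nr)
{
    switch (nr) {
    case SYS_read:       return "read";
    case SYS_write:      return "write";
    case SYS_openat:     return "openat";
    case SYS_close:      return "close";
    case SYS_brk:        return "brk";
    case SYS_execve:     return "execve";
    case SYS_exit_group: return "exit_group";
    default:             return "unknown";
    }
}
```

In the example program you could then print syscall_name(orig_eax) alongside the raw number.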
At this point, the child is still stopped, so the parent needs to send a
PTRACE_CONT command telling ptrace to let ls continue executing. If you wanted
to modify the return value of the system call, you would do that here (which we
will look at later).
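Note that PTRACE_CONT lets the child run on without stopping at later system calls. To stop at every one, the usual request is PTRACE_SYSCALL, which resumes the child until the next syscall entry or exit. Here is a minimal sketch of that loop. count_syscalls is a hypothetical helper of mine, and I'm assuming PTRACE_O_TRACESYSGOOD is available so that syscall stops can be told apart from ordinary signals:

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` under ptrace and count the syscalls it
 * enters after exec. Returns -1 if the child could not be traced. */
long count_syscalls(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127);                     /* only reached if execl failed */
    }

    int status;
    /* First stop: the SIGTRAP delivered once exec succeeds. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* Report syscall stops as SIGTRAP | 0x80 instead of plain SIGTRAP. */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)PTRACE_O_TRACESYSGOOD);

    long stops = 0;
    int sig = 0;                        /* signal to pass on when resuming */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == -1)
            break;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                      /* child is gone */
        if (!WIFSTOPPED(status))
            continue;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            stops++;                    /* a syscall entry or exit stop */
            sig = 0;
        } else {
            sig = WSTOPSIG(status);     /* forward ordinary signals */
        }
    }
    /* Each syscall stops on entry and on exit; the final exit has no
     * exit stop, so round up. */
    return (stops + 1) / 2;
}
```

Calling count_syscalls("/bin/ls") runs ls as usual and returns the count. The same loop is where you would inspect or change registers at each stop.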