An introduction to ptrace

ptrace is an incredibly useful facility in the Linux kernel: it lets a user-space application inspect and interact with another process’s system calls. This model underpins tools such as strace and even Google’s gVisor. By intercepting system calls, one can build a secure sandbox or maliciously interfere with a process; like many things, it is a double-edged sword. Because we have been building a product that uses system call tracing, I’ve been spending more time explaining the wonders of ptrace to folks, so I thought I’d put together a quick intro on how it works.

How does ptrace work?

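Using ptrace involves three actors: the monitor, the process being monitored, and the Linux kernel. The monitor forks a child; the child tells the kernel that it wants to be traced by its parent and then uses exec to replace itself with the program to be monitored. From then on, the kernel stops the child at points of interest and notifies the monitor, which can inspect the child before letting it continue.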

sequenceDiagram
    Monitor->>Kernel: Fork and trace child
    Child->>Kernel: System call
    Kernel->>Child: Signal STOP
    Kernel->>Monitor: Signal with info about syscall
    Monitor->>Kernel: Interact with syscall
    Monitor->>Kernel: Continue child
    Kernel->>Child: Continue

Example code

Below is an example that uses ptrace to report the first system call observed when running ls.

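#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/reg.h>    /* ORIG_EAX / ORIG_RAX register indices */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Slot of the saved syscall number: ORIG_RAX on x86-64, ORIG_EAX on 32-bit x86. */
#ifdef __x86_64__
#define SYSCALL_REG ORIG_RAX
#else
#define SYSCALL_REG ORIG_EAX
#endif

int main(void)
{
    pid_t child;
    long orig_eax;

    child = fork();
    if (child == 0) {
        /* Child: ask to be traced by the parent, then become ls. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(1);   /* reached only if execl fails */
    } else {
        /* Parent: block until the traced child stops. */
        wait(NULL);
        /* Read the saved syscall number from the child's user area. */
        orig_eax = ptrace(PTRACE_PEEKUSER, child,
                          sizeof(long) * SYSCALL_REG, NULL);
        printf("The child made a "
               "system call %ld\n", orig_eax);
        /* Let the stopped child run again, then wait for it to finish. */
        ptrace(PTRACE_CONT, child, NULL, NULL);
        wait(NULL);
    }
    return 0;
}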

Breaking it down, the monitor program first calls fork(). The parent process then blocks in wait until the child changes state. The child replaces itself via execl, but before doing so it tells the kernel that it wants to be traced by calling ptrace with PTRACE_TRACEME. The ptrace function takes four arguments: a request (such as PTRACE_TRACEME), the PID of the process in question, and an address and a data argument whose meaning depends on the request.
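To make the fork, PTRACE_TRACEME, and wait handshake a bit more concrete, here is a minimal sketch that also checks why wait returned. The helper name spawn_traced is hypothetical (it is not part of any library), and error handling is kept deliberately thin.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` under tracing and wait for its first stop.
 * Returns the signal that stopped the child, or -1 on failure.
 * On success, *out_pid holds the PID of the (still stopped) child. */
int spawn_traced(const char *path, pid_t *out_pid)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* request = PTRACE_TRACEME; pid, addr and data are ignored for it. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127);                 /* reached only if execl failed */
    }

    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status))        /* child exited instead of stopping */
        return -1;

    *out_pid = child;
    return WSTOPSIG(status);        /* SIGTRAP is expected after a traced exec */
}
```

Calling spawn_traced("/bin/ls", &pid) should return SIGTRAP, since the kernel sends that signal to a traced process once its exec succeeds. The child remains stopped afterwards, so the caller still has to resume or kill it.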

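As soon as the exec succeeds, the kernel stops the child with a SIGTRAP and notifies the parent, causing wait to unblock. The parent then calls ptrace with PTRACE_PEEKUSER, which reads a word from the child’s user area, where its registers are saved. In this case we peek at ORIG_EAX, the copy of the EAX register saved at system call entry, which contains the system call identifier (see this table for a complete mapping of system calls and their identifiers along with arguments). Because the stop happens while the child is returning from the exec, the number reported is that of execve, the call that turned the child into ls.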
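Peeking one word at a time gets tedious once you want more than the syscall number. As a rough sketch, assuming x86-64 Linux (where the register set is described by struct user_regs_struct from sys/user.h), PTRACE_GETREGS can fetch all registers in one request. The helper names read_syscall_regs and first_syscall_of are hypothetical.

```c
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>   /* struct user_regs_struct (x86-64 layout assumed) */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy the syscall number and the first three argument
 * registers out of a stopped tracee. Returns 0 on success, -1 on failure. */
int read_syscall_regs(pid_t child, long *nr, unsigned long long args[3])
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1)
        return -1;

    *nr = (long)regs.orig_rax;  /* syscall number saved at syscall entry */
    args[0] = regs.rdi;         /* first three arguments in the x86-64 ABI */
    args[1] = regs.rsi;
    args[2] = regs.rdx;
    return 0;
}

/* Hypothetical demo: trace `path` up to its first stop, print what was read,
 * then let it run to completion. Returns the syscall number, or -1. */
long first_syscall_of(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl(path, path, (char *)NULL);
        _exit(127);             /* reached only if execl failed */
    }

    int status;
    long nr = -1;
    unsigned long long args[3] = {0, 0, 0};

    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    if (read_syscall_regs(child, &nr, args) == 0)
        printf("syscall %ld rdi=%#llx rsi=%#llx rdx=%#llx\n",
               nr, args[0], args[1], args[2]);

    ptrace(PTRACE_CONT, child, NULL, NULL);  /* resume the child */
    waitpid(child, &status, 0);              /* reap it */
    return nr;
}
```

At the stop that follows exec, the number read is expected to be that of execve (59 on x86-64), and the argument registers have already been reset for the new program. They only carry meaningful arguments at a syscall-entry stop.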

At this point, the child is still stopped, so the parent needs to issue a PTRACE_CONT request telling the kernel to let ls resume. If you wanted to modify the return value of a system call, you would do that while the child is stopped like this (which we will look at later).
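PTRACE_CONT lets the child run freely until it next receives a signal. To stop again at the next system call instead, a tracer can resume the child with PTRACE_SYSCALL. The sketch below counts those stops for a whole run; count_syscall_stops is a hypothetical helper, and the use of PTRACE_O_TRACESYSGOOD to tell syscall stops apart from ordinary signals is one possible design, not the only one.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` under tracing and count its syscall stops.
 * A system call normally produces two stops: one on entry and one on exit.
 * Returns the number of stops, or -1 on failure. */
long count_syscall_stops(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl(path, path, (char *)NULL);
        _exit(127);                 /* reached only if execl failed */
    }

    int status;
    long stops = 0;
    int sig = 0;                    /* signal to forward on resume, if any */

    /* First stop: the SIGTRAP delivered once the exec has completed. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* Ask the kernel to report syscall stops as SIGTRAP | 0x80. */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume until the next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == -1)
            return -1;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                  /* child is gone */

        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            stops++;                /* a syscall entry or exit stop */
            sig = 0;
        } else {
            sig = WSTOPSIG(status); /* an ordinary signal: pass it on */
        }
    }
    return stops;
}
```

Something like count_syscall_stops("/bin/ls") then runs ls to completion and returns the number of stops seen. Each of those stops is a point where the monitor could inspect the child's registers before resuming it.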