CS3013 Project 2

Hidden Processes

Due date: Wednesday, September 21st by 11:59pm

Index

Overview
/proc
System Calls
Boot Time
Loadable Modules
Hints
Hand In

Overview

The ps and top programs give a snapshot of currently running processes (try running ps auxww to list all processes). Keen-eyed system administrators will use ps and top to monitor system performance and detect the processes of system intruders. Conversely, computer crackers often try to hide their processes from the prying eyes of system administrators. Understanding how this is done can help you understand how to beat computer attackers, as well as provide a better understanding of operating systems.

There are several steps you must accomplish in order to complete this project:

Part 1: Hide System Calls
1. You must add new system calls that allow hiding and unhiding of specific processes. Here's how a process would use your new calls to hide and then un-hide itself:
```
int main(void) {
    ...
    /* time to go undercover! */
    hide(getpid());
    ...
    /* nah, nah ... you can't see me (via top, say) */
    ...
    unhide(getpid());
    /* back in the open */
    ...
    return 0;
}
```
2. You must design and implement data structures that keep track of all processes that are hidden.
3. You must add your source code to the Linux build process.
4. You must modify the /proc file system to use your data structures so that hidden processes are not displayed by the ps command.

Part 2: (Un)Hide Module
1. You must design and develop a /proc module that allows monitoring of all processes, hidden or not.

In case you are thinking about it ... you must do this by modifying the operating system. For example, you are not allowed to modify the ps command!

`/proc`

In Linux, you can access many values internal to the kernel via the /proc file system. Originally designed to allow easy access to information about processes (hence the name), it is now used by every bit of the kernel which has something interesting to report, such as /proc/modules which has the list of modules and /proc/meminfo which has memory usage statistics. The directory structure and files under /proc are not "real" in the sense that they are persistent data on disk. Rather, they are generated by the operating system dynamically as they are accessed.

The part of the /proc file system we are interested in for this project provides information about running processes. Each process has its own directory under /proc, named by its process id. An example that shows some of the flexibility of /proc files is the entry named "self", which is a soft link to the current process (try the command ls -l /proc/self. Is the "self" process the shell or the "ls" command?).

The ps and top commands get their information on currently running processes from the /proc entries. Specifically, they look at the stat, statm and status files in each /proc/pid directory. If a currently running process does not have its pid reported as a sub-directory under /proc, it will be "hidden" and will not show up when the ps or top commands are run.
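
To make the idea concrete, here is a minimal user-space sketch of how a ps-like tool could discover processes by scanning /proc for numerically named directories. It is not the actual ps source; the function names is_pid_name() and proc_lists_pid() are made up for illustration, and the code assumes /proc is mounted in the usual place.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return 1 if name is non-empty and consists only of decimal digits. */
static int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/*
 * Scan /proc the way a ps-like tool might and report whether a
 * directory named after pid is present.
 * Returns 1 if found, 0 if not, and -1 if /proc cannot be opened.
 * (Hypothetical helper, not part of ps or the kernel.)
 */
int proc_lists_pid(pid_t pid)
{
    DIR *dir = opendir("/proc");
    struct dirent *entry;
    int found = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (is_pid_name(entry->d_name) &&
            (pid_t)strtol(entry->d_name, NULL, 10) == pid) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

On a kernel with your hiding support, a scan like this would simply never see the directory of a hidden process, which is why ps and top need no changes.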

The source code for the /proc file system is located in linux/fs/proc/. In particular, the file root.c has information on the "files" and directories that appear in the root of the proc file system (/proc). You might take a look at the function get_pid_list() in that same file.

Process control blocks in Linux are task_struct structures. The list of processes is kept in a circular linked list of task_structs. The /proc file system scans the task list to generate the /proc/pid entries on demand. You should be able to design a simple modification to this code that allows a process to go undetected. Your system must be able to keep track of which processes are hidden.
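
The scan-and-skip idea can be sketched in user space as follows. This is only an illustration under simplifying assumptions: struct task is a hypothetical stand-in for task_struct, the hidden flag is one possible way to track hidden processes (a separate table works too), and collect_visible_pids() is not a kernel function.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct task_struct. */
struct task {
    int pid;
    int hidden;          /* nonzero if the process should not be listed */
    struct task *next;   /* next task in the circular list */
};

/*
 * Walk the circular list once, starting at head, and copy the pids of
 * visible tasks into out (at most max entries).
 * Returns the number of pids copied.
 */
size_t collect_visible_pids(const struct task *head, int *out, size_t max)
{
    const struct task *t = head;
    size_t n = 0;

    if (head == NULL)
        return 0;
    do {
        if (!t->hidden && n < max)
            out[n++] = t->pid;
        t = t->next;
    } while (t != head);   /* stop after one full lap of the circle */
    return n;
}
```

The only change relative to a plain scan is the test on the hidden flag, which is the kind of small modification the /proc code needs.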

You may want to reference include/linux/sched.h to see how process id information is stored in the struct task_struct. You do not have to, but if you do add to struct task_struct, you also need to change the INIT_TASK macro (also in sched.h) to be sure the initial values are in place.

System Calls

A system call is the standard way an OS service is exported to a user program. System calls cannot be called directly like ordinary functions. Instead, they are invoked indirectly via an interrupt and looked up by number in a table of system calls. Thus, when you define a new system call, you insert a new entry in this table. You do this by editing the file linux/arch/i386/kernel/entry.S. Inside, you should see lines like:

  .data
  ENTRY(sys_call_table)
    .long SYMBOL_NAME(sys_ni_syscall)        /* 0 */
    .long SYMBOL_NAME(sys_exit)
    .long SYMBOL_NAME(sys_fork)
    ...
    .long SYMBOL_NAME(sys_vfork)             /* 190 */

After the "sys_vfork" line, you will add entries for the system calls you need (see below), each with the prefix "sys_" prepended. For example, you might add the line:

    .long SYMBOL_NAME(sys_hide)             /* 191 */
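
To see why adding a line to the table is enough to make a new call reachable, here is a small user-space sketch of table-based dispatch. It is an analogy only, not kernel code: the demo_ names, the handler bodies and the call numbers 1 and 2 are all invented for illustration.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

/* Hypothetical handlers standing in for kernel implementations. */
static long demo_hide(long pid)   { return pid > 0 ? 0 : -EINVAL; }
static long demo_unhide(long pid) { return pid > 0 ? 0 : -EINVAL; }

/* The table: a call number is simply an index into this array. */
static const syscall_fn demo_table[] = {
    NULL,          /* 0: unused slot */
    demo_hide,     /* 1 */
    demo_unhide,   /* 2 */
};

#define DEMO_NR_CALLS (sizeof(demo_table) / sizeof(demo_table[0]))

/* Look up nr in the table and call the handler it names. */
long demo_dispatch(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_CALLS || demo_table[nr] == NULL)
        return -ENOSYS;   /* no such call */
    return demo_table[nr](arg);
}
```

The position of an entry in the table is its call number, which is why the comment next to each entry in entry.S and the matching number in unistd.h must agree.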

You also need to define a system call number, so that when an ordinary user program invokes your "stub" system call, control goes to the kernel implementation. You do this by editing the file linux/include/asm/unistd.h, where you will find lines like:

  /*
   * This file contains the system call numbers.
   */

  #define __NR_exit                 1
  #define __NR_fork                 2
  ...
  #define __NR_vfork              190

You should add #defines for your new system calls at the end, each with the prefix "__NR_" in front of its name. For example, you might add the line:

  #define __NR_hide              191

It will be easiest to have the system call definitions in your own source code files, say hide.c and hide.h. You will need to modify the build process (edit a Makefile) so that your code gets compiled and linked in properly. A typical place to do this is in a subdirectory under the main Linux tree. You should create a Makefile there so the main Linux "make" process will work. See linux/ipc/Makefile for an example. You would then also modify the main linux/Makefile, adding an entry for your new code. When adding a sub-directory Makefile for your hide build, note that the TARGET is the library the make process will try to build. The TARGET name must differ from the names of the objects for the link to work properly. For example:

  O_TARGET := hide-lib.o
  O_OBJS   := hide.o

In order to be linked properly, your system calls need the word "asmlinkage" prepended to their function header and "sys_" prepended to the name. For example, you would have:

  asmlinkage int sys_hide(int pid) {
    /* do hide stuff */
    return 0;   /* 0 on success, a negative error value otherwise */
  }

as one of the definitions of your hide system calls. You will have to have #include <linux/linkage.h> at the top of your file so the compiler will recognize the word "asmlinkage".

The user program will need a "stub" that sets up the call to the sys_hide() call you write. The stub can be automatically generated so that a user program can use your system call. There are some macros defined for this in <linux/unistd.h> . The format is "_syscallN(return type, function name, arg1 type, arg1 name ...)" where "N" is the number of parameters. For example, you might have the line:

  _syscall1(int, hide, int, pid);

to generate the stub (in this case, the 1 is for 1 argument, int pid). Note that the call that generates the stub (as above) should not go in unistd.h. Instead, make a user header file, say hide-user.h, and put it in there. You need to #include <linux/unistd.h> in hide-user.h to make this work. A user program can then call hide() just as it calls other system calls. A similar discussion holds for unhide().
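
If you are curious what a stub boils down to, the following user-space sketch writes one by hand using the C library's generic syscall() function. Because hide() exists only on a kernel patched for this project, the sketch uses the existing getpid call (SYS_getpid) as a stand-in. The name my_getpid() is made up, and this is not what the _syscall1 macro literally expands to.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hand-written stub: pass the system call number to the kernel through
 * the C library's generic syscall() entry point. SYS_getpid stands in
 * for __NR_hide, which a stock kernel does not have.
 */
pid_t my_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

A stub for hide() would have the same shape, passing your new call number and the pid argument instead.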

You will need proper error checking and return values for your system calls. You can look up reasonable error values in linux/include/linux/errno.h and linux/include/asm/errno.h.
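
As one possible shape for the error checking, here is a user-space sketch of the bookkeeping behind a hide/unhide pair that returns 0 on success and a negative errno value on failure, following the usual kernel convention. The names do_hide(), do_unhide() and MAX_HIDDEN and the particular error codes are assumptions chosen for illustration. A real implementation would also need to check that the pid names an existing process and decide who is allowed to hide what.

```c
#include <errno.h>

#define MAX_HIDDEN 64   /* hypothetical capacity of the hidden table */

static int hidden_pids[MAX_HIDDEN];
static int hidden_count;

/* Return the index of pid in the table, or -1 if it is not hidden. */
static int find_hidden(int pid)
{
    int i;

    for (i = 0; i < hidden_count; i++)
        if (hidden_pids[i] == pid)
            return i;
    return -1;
}

/* Body of a hide call: 0 on success, negative errno value on failure. */
int do_hide(int pid)
{
    if (pid <= 0)
        return -EINVAL;              /* not a valid process id */
    if (find_hidden(pid) >= 0)
        return -EEXIST;              /* already hidden */
    if (hidden_count == MAX_HIDDEN)
        return -ENOSPC;              /* table is full */
    hidden_pids[hidden_count++] = pid;
    return 0;
}

/* Body of an unhide call: 0 on success, -ESRCH if pid was not hidden. */
int do_unhide(int pid)
{
    int i = find_hidden(pid);

    if (i < 0)
        return -ESRCH;
    hidden_pids[i] = hidden_pids[--hidden_count];   /* fill the gap */
    return 0;
}
```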

Boot Time

In case you need to initialize global variables for hiding, you will likely want to do this at system bootup time. In this case, you need to write your own init function (in your source file) with return type void, followed by the keyword __init. Then, you need to add two lines to linux/init/main.c to make your kernel invoke your initialization function at bootup. For example:

...
/* in a header file */
void __init hide_init(void);

...
/* in a C file */
void __init hide_init(void) {
    /* initialize global variables used to keep track of hidden procs */
}

linux/init/main.c

...

/* around line 89 */
extern void hide_init(void);

...

/* Around line 1352 within "asmlinkage void __init start_kernel(void)" */
hide_init();

Loadable Modules

In part 2, the idea is to create a tool, whose source you control, that once loaded will bypass a cracker's attempt to hide. You will create a module, accessible via /proc/scan, that displays information on all processes, whether hidden or not. Your module should display the name, process id and user id for each process in the system.

A Linux module is a set of functions and data types that can be compiled independently of the kernel, and then loaded dynamically after the kernel is booted. Typically, a kernel has several modules loaded at a time (type /sbin/lsmod to see what modules your OS has loaded). A module binds its interface either as a device (for example, a device driver) or to the /proc filesystem. We will do the latter.

The minimum module interface consists of two functions that the kernel calls: init_module() when the module is loaded and cleanup_module() when it is unloaded. The init_module() code performs the same function as the various init() routines do at boot time (such as the hide_init() call discussed above), except that a module does this initialization when loaded by /sbin/insmod instead of at boot time. Thus, a typical module program skeleton looks like:

   #include <linux/kernel.h>
   #include <linux/module.h>
   ...
   int init_module(void) {
      /* code to init the module */
      return 0;   /* 0 means the module loaded successfully */
   }
   ...
   void cleanup_module(void) {
      /* code to close the module */
   }

A fully working, slightly more complete example can be found in hello.tar (use tar xvf to extract).

A module is compiled in user space, but needs some special flags, namely __KERNEL__ and MODULE. For example, to compile a module named hello.c you would do:

	gcc -c -Wall -D__KERNEL__ -DMODULE hello.c

You would then have a compiled module, hello.o in the above example, that can be loaded after boot time. To do so, you type sudo /sbin/insmod hello.o, with hello.o being the module file. To unload the module, you type sudo /sbin/rmmod hello (note the absence of the .o this time). At any time, you can run /sbin/lsmod to see what modules are loaded. You should see hello listed after doing the insmod command above.

Your module will need to register itself with the /proc system under the name "scan". See the biteMe.tar (use tar xvf to extract) example for how /proc registration is done. Your read() call will create and return a buffer containing the ids of all processes.
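
Building the buffer is mostly careful string formatting. The sketch below shows one way to do it in user space, writing one "name, pid, uid" line per process without overrunning the buffer. struct proc_info, format_scan() and the tab-separated layout are assumptions for illustration. In the module you would fill the fields from each task_struct and use whatever formatting routine your kernel version provides.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical record holding the fields the scan file reports. */
struct proc_info {
    const char *name;
    int pid;
    int uid;
};

/*
 * Write one "name<TAB>pid<TAB>uid" line per process into buf without
 * overflowing it. Output stops at the last line that fits completely.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t format_scan(char *buf, size_t size,
                   const struct proc_info *procs, size_t count)
{
    size_t len = 0;
    size_t i;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < count; i++) {
        int n = snprintf(buf + len, size - len, "%s\t%d\t%d\n",
                         procs[i].name, procs[i].pid, procs[i].uid);
        if (n < 0 || (size_t)n >= size - len) {
            buf[len] = '\0';   /* drop the partial line */
            break;
        }
        len += (size_t)n;
    }
    return len;
}
```

Stopping at the last complete line keeps the output well formed even when the buffer is too small for every process.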

Because modules are designed and implemented independently of the kernel, they cannot access kernel data structures and functions in the normal fashion (i.e., by being linked in). Instead, they can only access those that have been exported explicitly. You can see the exported symbols in /proc/kallsyms. The file kernel/ksyms.c contains a list of the kernel symbols that are exported. If you want to access a non-exported function, you will need to modify ksyms.c and export it yourself. You will then need to recompile your kernel and reboot in order to use your newly exported function in a module.

Hints

Proceed carefully in steps: system calls, hide functionality, hide application, hide module. Finish each step before proceeding. That said, you can debug each step independently of the other steps. Doing so will make it easier to put them together.

For the system calls, create some "empty" calls for your new system calls and get them in place. Don't worry about what they do at first; just put in a printk() statement so you can see that they are working. Write a simple application that tests each stub.

You can allocate memory for your hiding data structures dynamically or statically. If dynamic, malloc() (and its variants) will not work in the OS since it is a user-level function. Instead, you should use kmalloc(), which works very similarly to malloc() but can be done inside the kernel. The complement of kmalloc() is kfree(). You need to have #include <linux/malloc.h> in order to use them. I recommend statically declared arrays, however.
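
As an example of the static approach, here is a sketch of a fixed-size bitmap with one bit per possible pid. The names set_hidden(), is_hidden() and HIDE_MAX_PID are hypothetical, and the bound of 32768 pids is an assumption you should check against your kernel source.

```c
#include <limits.h>

#define HIDE_MAX_PID 32768   /* assumed upper bound on pids; check your kernel */
#define WORD_BITS    (sizeof(unsigned long) * CHAR_BIT)

/* One bit per possible pid, statically allocated and zero-initialized. */
static unsigned long hidden_map[(HIDE_MAX_PID + WORD_BITS - 1) / WORD_BITS];

/* Return 1 if pid can be represented in the bitmap, 0 otherwise. */
static int pid_in_range(int pid)
{
    return pid >= 0 && pid < HIDE_MAX_PID;
}

/* Mark pid hidden (flag != 0) or visible (flag == 0).
 * Returns 0 on success, -1 if pid is out of range. */
int set_hidden(int pid, int flag)
{
    unsigned long bit;

    if (!pid_in_range(pid))
        return -1;
    bit = 1UL << ((unsigned)pid % WORD_BITS);
    if (flag)
        hidden_map[(unsigned)pid / WORD_BITS] |= bit;
    else
        hidden_map[(unsigned)pid / WORD_BITS] &= ~bit;
    return 0;
}

/* Return 1 if pid is currently marked hidden, 0 otherwise. */
int is_hidden(int pid)
{
    if (!pid_in_range(pid))
        return 0;
    return (int)((hidden_map[(unsigned)pid / WORD_BITS]
                  >> ((unsigned)pid % WORD_BITS)) & 1UL);
}
```

With this bound the table takes 4 KB, lookups are constant time, and there is no allocation failure to handle. Remember that a pid can be reused after its process exits, so you may want to clear the bit at that point.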

Remember that accessing global system variables (such as your new hide structures) can lead to race conditions! When you don't want the OS to be interrupted, you can disable interrupts using the function cli(), which stands for "CLear Interrupts flag." To re-enable them, use the function sti(), for "SeT the Interrupts flag." You need to use #include <asm/system.h> with both functions. Oh, and also note that these functions can only be called in the OS. They are not system calls (remember, we would not want the user program to be able to enable and disable interrupts at will!).

In case you are having troubles adding system calls, refer to the following "quick and dirty" steps. Try a "make dep" before you do "make bzImage". Create a "hide.h" that looks something like:

  #ifndef __I386_HIDE_H__
  #define __I386_HIDE_H__
    
  #include <linux/unistd.h>
  #include <linux/linkage.h>
  ...

  #endif

Your hide.c should look like:

  #include <asm/hide.h>
  /* or #include "hide.h" if you don't put your hide.h
     in the /usr/src/linux/include/asm directory */
    
  asmlinkage int sys_hide(int pid) {
  ...
  }
  ...

Your user code that makes the system calls will need to include a "hide-user.h" file that looks something like:

  #ifndef __I386_HIDE_USR_H__
  #define __I386_HIDE_USR_H__

  #include <sys/syscall.h>
  #include <linux/unistd.h>

  _syscall1(int, hide, int, pid)
  ...
  #endif

Make appropriate entries in entry.S and unistd.h as indicated.

While modules are a convenient way to write kernel code because they are written and compiled in user space, a module still executes in privileged mode. In other words, a segmentation fault (or something equally bad) can crash the OS. If you see a stack trace in /var/log/messages, you may still be able to run stuff (i.e., other shells respond properly), but your modules may work inconsistently and you may not know why. In this case, reboot and continue developing.

If you see any error messages like "unable to resolve" when you are loading your module (with the insmod command), it is quite possible that you are using a kernel function that is not exported. See Loadable Modules for details on how to export kernel data. You can find a for_each_task() macro in linux/include/linux/sched.h that uses it as an example.

Here is a list of #includes that may be useful:

  #include <linux/kernel.h>   /* for module support */
  #include <linux/module.h>   /* for module support */
  #include <linux/fs.h>       /* for struct file_operations */
  #include <linux/types.h>    /* for ssize_t */
  #include <linux/proc_fs.h>  /* for proc file system stuff */
  #include <linux/init.h>     /* for init stuff */
  #include <linux/sched.h>    /* for struct task_struct */

The information the scan module needs to print also appears in the status file for each process. Look at the code that generates that file under linux/fs/proc/ to see what you need to add.

The way you can often discover what to do with an unknown function is not by reading documentation, but by looking at sample code which uses it (combinations of "find" and "grep" often work). If something is unknown about the kernel, this is usually the way to go. A great advantage of Linux is that we have the kernel source code - so use it!

Hand In

You will be graded on:

hiding support (70%)
scan module (25%)
answers to questions (5%)

Code must be fully functional and robust.

You must hand in the following:

All modified source code files for your solution (for example, the entire entry.S and unistd.h files).
Your source code files for your hide system calls and modules
Instructions on how to compile your code into the kernel (and a Makefile)
Instructions on how to compile your module code (or provide a Makefile)
Brief answers to the following questions (in a separate file):
1. If you execute /bin/ls /proc/self, does "self" refer to the shell or the "ls" process? How can you tell and how did you find this out?
2. If a process is hidden so that top and ps don't display it, how else might you detect it is there (not counting the /proc/scan file)? Briefly describe at least 2 ways.
3. What are some of the advantages of kernel programming with modules? What are some of the disadvantages?

The turnin (/cs/bin/turnin) name for this project is "proj2". When you turn in, also include a file "group.txt" that contains the following:

        group_name
        login_name1  last_name1, first_name1
        login_name2  last_name2, first_name2
        ...

Also, before you use turnin, tar up (with gzip) your files. For example:

        mkdir proj2
        cp * proj2    # copy all your files to submit to proj2 directory
        tar -czf proj2.tgz proj2

then copy your files from your Fossil client to your CCC account:

        scp proj2.tgz login_name@ccc:~/  # will ask for your ccc passwd
        ssh login_name@ccc               # will ask for your ccc passwd
        /cs/bin/turnin submit cs3013 proj2 proj2.tgz

Send all project questions to the cs3013-staff at cs.wpi.edu mailing list.

Send all Fossil administrative questions to the fossil at cs.wpi.edu mailing list.