Operating System: Xv6 on RISC-V

7 minute read

Published:

Run Xv6(like Unix) using RISC-V(Architecture) on QEMU(Emulator).

# install required packages
sudo apt-get install flex bison git build-essential gdb-multiarch qemu-system-misc gcc-riscv64-linux-gnu binutils-riscv64-linux-gnu opensbi expect

Build and run xv6

make qemu

Debug xv6

First, can create a .gdbinit file in the home directory.

echo "set auto-load safe-path /" > ~/.gdbinit

Then, run the following command to start gdb.

# run this in one terminal
make qemu-gdb

# run this in another terminal
make run-gdb

# show the source code in a terminal
layout src

# show the backtrace
bt # backtrace

# set breakpoint in the user space
symbol-file user/ls.o
b user/ls.c:10

UNIX System Overview

Overview of a computer system

os-structure

  • The operating system (OS) is a software that manages the computer hardware and software resources.
  • The OS allows multiple users to run programs
  • The OS supports multitasking:
    • Multiple processes can run simultaneously.
    • A process is an executing instance of a program.
    • A process consists of multiple execution units called threads.

UNIX Architecture

os-structure

  • An OS consists of the kernel and other software including system utilities,applications, shells, libraries of common functions, etc.
  • The shell is a special application that provides an interface for running other applications.
  • The interface to the kernel is a layer of software called the system calls.
  • Libraries of common functions are built on top of the system call interface.

User

  • When we log in to a UNIX system, the system looks up our login name in its password file, usually the file /etc/passwd.
    • An entry in the password file, contains seven colon-separated fields: e.g.
      sar:x:205:105:Stephen Rago:/home/sar:/bin/ksh
      1. Login name
      2. Encrypted password
      3. User ID
      4. Group ID
      5. Comment field
      6. Home directory
      7. Login shell
  • Group ID: Groups collect users together into projects or departments. This allows the sharing of resources, such as files, among members of the same group.
  • Shells: A shell is a command-line interpreter that reads user input and executes commands. The system knows which shell to execute for us based on the final field in our entry in the password file.

Files and directories: file system

  • The UNIX file system is a hierarchical arrangement of directories and files. Everything starts in the directory called root, whose name is the single character /. file-system

Input and Output: file descriptors

  • File descriptors are small non-negative integers that the kernel uses to identify the files accessed by a process.
  • Whenever it opens an existing file or creates a new file, the kernel returns a file descriptor that we use when we want to read or write the file.

Programs and Processes

  • A program is an executable file residing on disk in a directory. It is read into memory and is executed by the kernel as a result of one of the seven exec functions.
  • A process is an executing instance of a program.
  • The UNIX System guarantees that every process has a unique numeric identifier called the process ID. The process ID is always a non-negative integer.
  • PID returns the unique process ID

Process in Linux

ps aux
  • ps: is the process status command.
  • a: displays information about other users’ processes as well as your own.
  • u: displays the processes belonging to the specified usernames.
  • x: includes processes that do not have a controlling terminal.

Process Control

There are three primary functions for process control:

  • fork: creates a new process by duplicating the calling process.
  • exec: replaces the calling process with a new program.
  • waitpid: causes the calling process to wait until one of its child processes exits or a signal is received.
  • Note: The exec function has seven variants, but we often refer to them collectively as simply the exec function.

Threads and Thread IDs

  • A process has only one thread of control. One set of machine instructions executing at a time.
  • Some problems are easier to solve when more than one thread of control can operate on different parts of the problem.
  • Multiple threads of control can exploit the parallelism possible on multiprocessor systems.
  • All threads within a process share the same address space, file descriptors, stacks, and process-related attributes.
  • Each thread executes on its own stack, although any thread can access the stacks of other threads in the same process.
  • Like processes, threads are identified by IDs. Thread IDs, however, are local to a process.

Error Handling

  • When an error occurs in one of the UNIX System functions, a negative value is often returned, and the integer errno is usually set to a value that tells why.
  • For example, the open function returns either a non-negative file descriptor if all is OK or −1 if an error occurs.
  • An error from open has about 15 possible errno values, such as file doesn’t exist, permission problem, and so on.
  • Some functions use a convention other than returning a negative value. For example, most functions that return a pointer to an object return a null pointer to indicate an error.

Signals

  • Signals notify a process that an event occurred.
    • e.g., if a process divides by zero, the signal SIGFPE (floating-point exception) is sent to the process.
  • The process has three choices for dealing with the signal.
    1. Ignore the signal. This option isn’t recommended for signals that denote a hardware exception, such as dividing by zero or referencing memory outside the address space of the process, as the results are undefined.
    2. Let the default action occur. For a divide-by-zero condition, the default is to terminate the process.
    3. Provide a function that is called when the signal occurs (this is called catching the signal).
  • Many conditions generate signals.
    • Two terminal keys, called the interrupt key: Often the DELETE key or Control-C, and the quit key, often Control-backslash are used to interrupt the currently running process.
    • Another way to generate a signal is by calling the kill function. We can call this function from a process to send a signal to another process.
  • Limitations: we have to be the owner of the other process (or the superuser) to send it a signal.

System Calls and Library Functions

  • All operating systems provide service points through which programs request services from the kernel.
  • UNIX Systems provide a limited number of entry points directly into the kernel called system calls.
  • Consider the memory allocation function malloc. The UNIX system call that handles memory allocation is sbrk.
    1. The sbrk system call in the kernel allocates an additional chunk of space on behalf of the process.
    2. The malloc library function manages this space from user level.

Privileges Modes(Lower)

User mode:

  • Access to memory is limited
  • Cannot execute some instructions
  • Cannot disable interrupts
  • Cannot change arbitrary processor state
  • Cannot access memory management units Transition from user mode to system mode can only happen via well defined entry points, e.g. (system calls)

Privileges Modes(Higher)

System mode:

  • Can execute any instruction
  • Can access any memory locations, e.g., accessing hardware devices
  • Can enable and disable interrupts
  • Can change privileged processor state
  • Can access memory management units
  • Can modify registers for various descriptor tables

Linux and Xv6 Structure

Linux Kernel

The kernel is the central component of a Linux operating system that manages the computer and its hardware.
The Linux kernel is the intermediary software layer situated between the hardware and user applications.

Structure of the Linux Kernel

linux-kernel

Linux Directory Structure

linux-directory

Xv6 Code Structure

  • kernel: Core OS functionality
  • user: User space programs
  • mkfs: File system creation
  • conf: Booting and build configuration

kernel/ - all the kernel source code

  • kernel/main.c is the entry point of the kernel
  • kernel/syscall.c contains all the system call implementation
  • kernel/kernel is the final kernel binary after building

user/ - all the user programs

  • user/usys.pl generates the system call interfaces
  • user/usys.s is the generated assembly code for system calls

Jump to next Part2:Syscall/Address translation