Lab: Xv6 and Unix utilities

Boot Xv6(easy)

On Ubuntu 20.04, run the following commands:

# install the toolchain
sudo apt-get install git build-essential gdb-multiarch qemu-system-misc gcc-riscv64-linux-gnu binutils-riscv64-linux-gnu
# fetch the xv6 lab sources
git clone git://g.csail.mit.edu/xv6-labs-2020
cd xv6-labs-2020
git checkout util

# xv6 can now be built and run
make qemu

If make qemu hangs after printing qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0, remove the installed QEMU package and install this specific version instead:

sudo apt-get remove qemu-system-misc
sudo apt-get install qemu-system-misc=1:4.2-3ubuntu6

# then rebuild and run again
make clean
make qemu

xv6 has no ps command, but pressing Ctrl-p prints information about each process. To quit QEMU, press Ctrl-a x (hold Ctrl and press a, release both keys, then press x).

sleep(easy)

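Goal: implement the UNIX program sleep for xv6. Your sleep should pause for a user-specified number of ticks. A tick is a notion of time defined by the xv6 kernel, namely the time between two interrupts from the timer chip.

Hints

Before you start coding, read Chapter 1 of the xv6 book.
Look at some of the other programs in user/ (e.g., user/echo.c, user/grep.c, and user/rm.c) to see how to obtain the command-line arguments passed to a program.
If the user forgets to pass an argument, sleep should print an error message.
The command-line argument is passed as a string; you can convert it to an integer using atoi (see user/ulib.c).
Use the system call sleep.
See kernel/sysproc.c for the xv6 kernel code that implements the sleep system call (look for sys_sleep), user/user.h for the C definition of sleep callable from a user program, and user/usys.S for the assembler code that jumps from user code into the kernel for sleep.
Make sure main calls exit() in order to exit your program.
Add your sleep program to UPROGS in the Makefile; once you've done that, make qemu will compile your program and you'll be able to run it from the xv6 shell.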

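Approach

Following the other programs in user/, include the required headers: kernel/types.h defines the basic types, and user/user.h declares the system calls as well as the library helpers implemented in user/ulib.c.
Write int main(int argc, char* argv[]). argc is the number of arguments and argv[] holds their contents; argv[0] is the program name, and the remaining entries are the arguments the user typed.
Validate the arguments first: if argc is not 2, the invocation is wrong, so print an error message and exit. As other programs under user/ do (user/grep.c, for example), use fprintf() to print the message, then call exit(1); an exit status of 0 means success and 1 means failure.
Otherwise the argument is usable. It arrives as a string, so convert it to an integer with atoi(), pass the result to the sleep() system call, and finally call exit(0).
Register the program in the Makefile by appending $U/_sleep\ as the last line of UPROGS.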
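Because sleep hands the user's string straight to atoi, it helps to know what atoi does with malformed input. The sketch below mirrors the digits-only loop of atoi in user/ulib.c, written as ordinary host C so it can be tried outside xv6. The name atoi_digits is made up for this illustration.

```c
/* Hypothetical host-side helper (not part of xv6): converts the leading
 * decimal digits of s, the way the loop in user/ulib.c's atoi does.
 * There is no sign handling, no whitespace skipping and no overflow check. */
int atoi_digits(const char *s)
{
    int n = 0;
    while ('0' <= *s && *s <= '9')
        n = n * 10 + (*s++ - '0');
    return n;
}
```

With this behaviour "12abc" converts to 12, while "abc" and "-5" both convert to 0, so an invocation such as sleep abc passes the argc check and ends up requesting a sleep of 0 ticks.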

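Implementation

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main(int argc, char* argv[])
{
        // Expect exactly one argument: the number of ticks to sleep.
        if (argc != 2)
        {
                fprintf(2, "usage: sleep ticks\n");
                exit(1); // without this, argv[1] below would be read even though it is missing
        }

        int ticks = atoi(argv[1]);
        sleep(ticks);
        exit(0);
}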

Testing
Run ./grade-lab-util sleep

== Test sleep, no arguments == sleep, no arguments: OK (1.0s) 
== Test sleep, returns == sleep, returns: OK (1.0s) 
== Test sleep, makes syscall == sleep, makes syscall: OK (1.1s)

pingpong(easy)

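Goal: write a user-level program called pingpong in which two processes communicate over pipes. The parent should send a byte to the child; the child should print "<pid>: received ping", where <pid> is its process ID, write a byte on the pipe to the parent, and exit; the parent should read the byte from the child, print "<pid>: received pong", and exit. The solution goes in user/pingpong.c.

Hints

Use pipe to create a pipe.
Use fork to create a child.
Use read to read from the pipe, and write to write to the pipe.
Use getpid to find the process ID of the calling process.
Add the program to UPROGS in the Makefile.
User programs on xv6 have a limited set of library functions available to them. You can see the list in user/user.h; the source (other than for system calls) is in user/ulib.c, user/printf.c, and user/umalloc.c.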

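Approach

Following the goal, first create the pipes with pipe. Each pipe is used in one direction only, so two are needed: one for the parent to send bytes to the child and one for the child to send bytes back. In each pair, p[0] is the read end and p[1] is the write end.
Create the child with fork. According to the xv6 book, fork returns the child's PID in the parent and 0 in the child. The parent sends "ping" to the child through the first pipe and then uses read to pick up the child's "pong".
When the child's read on the pipe returns the parent's "ping", the child obtains its PID with getpid, prints "<pid>: received ping", sends "pong" to the parent with write, and exits.
When the parent has read the child's "pong", it prints "<pid>: received pong".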
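The exchange described above can be tried on an ordinary Linux host before porting it to xv6. The following is a minimal sketch under that assumption: it uses the POSIX versions of pipe, fork, read and write, and the function name pingpong_roundtrip is invented for this example. It prints nothing; it only reports whether the round trip succeeded.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical host-side (POSIX) sketch, not xv6 code.
 * Returns 0 if the parent received "pong" back, -1 on any failure. */
int pingpong_roundtrip(void)
{
    int p2c[2], c2p[2];              /* [0] = read end, [1] = write end */
    char buf[5] = {0};               /* 4 bytes of payload + terminator */

    if (pipe(p2c) < 0 || pipe(c2p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                  /* child: fork returned 0 */
        close(p2c[1]);
        close(c2p[0]);
        if (read(p2c[0], buf, 4) != 4 || strcmp(buf, "ping") != 0)
            _exit(1);
        if (write(c2p[1], "pong", 4) != 4)
            _exit(1);
        _exit(0);
    }

    /* parent: fork returned the child's PID */
    close(p2c[0]);
    close(c2p[1]);
    int ok = write(p2c[1], "ping", 4) == 4
          && read(c2p[0], buf, 4) == 4
          && strcmp(buf, "pong") == 0;
    close(p2c[1]);
    close(c2p[0]);
    waitpid(pid, NULL, 0);           /* reap the child */
    return ok ? 0 : -1;
}
```

Each side closes the two pipe ends it does not use, which is the habit the primes exercise later depends on.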

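Code

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main()
{
        int p_c[2]; // parent -> child
        int c_p[2]; // child -> parent
        char buf[8];

        pipe(p_c);
        pipe(c_p);
        if (fork() == 0)
        {
                // Child: read "ping", report it, answer with "pong".
                close(p_c[1]);
                close(c_p[0]);
                int n = read(p_c[0], buf, 4);
                buf[n > 0 ? n : 0] = '\0'; // read() does not terminate the string
                printf("%d: received %s\n", getpid(), buf);
                write(c_p[1], "pong", 4);
                exit(0);
        }
        else
        {
                // Parent: send "ping", then collect the child's "pong".
                close(p_c[0]);
                close(c_p[1]);
                write(p_c[1], "ping", 4);
                wait(0);
                int n = read(c_p[0], buf, 4);
                buf[n > 0 ? n : 0] = '\0';
                printf("%d: received %s\n", getpid(), buf);
                exit(0);
        }
}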

Testing
Run ./grade-lab-util pingpong

== Test pingpong == pingpong: OK (1.9s)

primes(moderate)/(hard)

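Goal: write a concurrent version of the prime sieve using pipes. The idea is due to Doug McIlroy, the inventor of Unix pipes. The picture halfway down the referenced page and the surrounding text explain how to do it.
Use pipe and fork to set up the pipeline. The first process feeds the numbers 2 through 35 into the pipeline, and one process is created for each prime number. Since xv6 has a limited number of file descriptors and processes, the first process can stop at 35.

Hints

Be careful to close file descriptors that a process doesn't need, because otherwise your program will run xv6 out of resources before the first process reaches 35.
Once the first process reaches 35, it should wait until the entire pipeline terminates, including all children, grandchildren, and so on. Thus the main primes process should only exit after all the output has been printed and after all the other primes processes have exited.
read returns zero when the write side of a pipe is closed.
It's simplest to write 32-bit (4-byte) ints directly to the pipes, rather than using formatted ASCII I/O.
Create the processes in the pipeline only as they are needed.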

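Approach

Start with the picture on the linked page: all the numbers are fed into the first pipe. The first process prints 2 and drops every multiple of 2, then sends the numbers that remain into the next pipe. The second process prints the first number it receives, 3, drops every multiple of 3, and so on.
The pseudocode above the picture says the same thing: p reads a number from the left pipe and prints it; then, in a loop, n reads a number from the left pipe and, if p does not divide n, sends n to the right pipe.
In the code, main calls fork. The child calls a function named primes, the parent writes the numbers into a pipe, and the child passes the read end of that pipe to primes.
primes prints the first number it reads and creates a new pipe for the next parent/child pair, which corresponds to the right-hand neighbor on the linked page. It then calls fork() again: the child calls primes recursively, while the parent keeps reading the numbers arriving on its input, filters them, and sends the survivors to the child.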
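Pitfalls

Always, always, always close file descriptors that are not needed or are no longer in use.
A read on a pipe with no data in it blocks until data is written.
A read on a pipe whose write ends have all been closed returns 0 (EOF) once the buffered data has been consumed.
A write to a pipe whose buffer is full blocks until there is room.
A write to a pipe whose read ends have all been closed raises SIGPIPE on POSIX systems (by default this terminates the process); xv6 has no signals, so a failed write is reported through a -1 return value instead.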
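The EOF rule is the one the primes pipeline depends on, so it is worth seeing in isolation. This is a small host-side sketch using POSIX pipe, read and write. The function name read_after_close is made up for the example.

```c
#include <unistd.h>

/* Hypothetical host-side (POSIX) sketch, not xv6 code: writes one int into a
 * pipe, closes the only write end, then reads twice. Returns the result of
 * the second read, or -1 if an earlier step failed. */
int read_after_close(void)
{
    int fd[2];
    int value = 42, got = 0;

    if (pipe(fd) < 0)
        return -1;
    if (write(fd[1], &value, sizeof value) != (ssize_t)sizeof value)
        return -1;
    close(fd[1]);                    /* the last write end is now gone */

    /* First read: the buffered int is still delivered. */
    if (read(fd[0], &got, sizeof got) != (ssize_t)sizeof got || got != value)
        return -1;

    /* Second read: no data and no writers left, so this is EOF. */
    ssize_t r = read(fd[0], &got, sizeof got);
    close(fd[0]);
    return (int)r;
}
```

Had fd[1] been left open anywhere, the second read would block instead of returning 0. That is the hang a primes stage runs into when some process forgets to close a write end.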

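Code

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

// One stage of the pipeline: the first number read is a prime; every later
// number that is not a multiple of it is passed on to the next stage.
void primes(int input_fd)
{
  int p;
  // EOF here means no numbers are left, so the pipeline ends at this stage.
  if (read(input_fd, &p, sizeof(int)) != sizeof(int))
  {
    close(input_fd);
    exit(0);
  }
  printf("prime %d\n", p);

  int num;
  int pd[2];
  pipe(pd);
  if (!fork())
  {
    // Child: the next stage only reads from the new pipe.
    close(input_fd);
    close(pd[1]);
    primes(pd[0]);
    exit(0);
  }
  else
  {
    close(pd[0]);
    while (read(input_fd, &num, sizeof(int)) == sizeof(int))
    {
      if (num % p != 0)
      {
        write(pd[1], &num, sizeof(int));
      }
    }
    // Closing the write end is what lets the next stage see EOF.
    close(input_fd);
    close(pd[1]);
    wait(0);
    exit(0);
  }
  exit(0);
}

int main()
{

  int p[2];
  pipe(p);

  if (!fork())
  {
    // Child: first stage of the pipeline.
    close(p[1]);
    primes(p[0]);
    exit(0);
  }
  else
  {
    // Parent: feed 2 through 35 into the pipeline, then wait for it to finish.
    close(p[0]);
    for (int i = 2; i <= 35; i++)
    {
      write(p[1], &i, sizeof(int));
    }
    close(p[1]);
    wait(0);
  }
  exit(0);
}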

Testing
Run ./grade-lab-util primes

make: 'kernel/kernel' is up to date.
== Test primes == primes: OK (1.8s) 
    (Old xv6.out.primes failure log removed)

find(moderate)

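Write a simple version of the UNIX find program: find all the files in a directory tree with a specific name.

Hints

Look at user/ls.c to see how to read directories.
Use recursion to allow find to descend into sub-directories.
Don't recurse into "." and "..".
Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.
You'll need to use C strings. Have a look at K&R (The C Programming Language), for example Section 5.5.
Note that == does not compare strings like in Python. Use strcmp() instead.

Expected output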

    $ make qemu
    ...
    init: starting sh
    $ echo > b
    $ mkdir a
    $ echo > a/b
    $ find . b
    ./b
    ./a/b
    $

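Approach

As the hints suggest, start with how user/ls.c reads a directory. open() on the path returns a file descriptor, and fstat() fills in information about the path. ls then looks at the type: a file is printed directly; for a directory it first checks that the path fits in its buffer and reports an error if not, and otherwise reads the directory one entry at a time. An entry whose inum is 0 is unused and is skipped. For every other entry, ls copies the name onto the end of the path, appends a terminating null byte, and prints the formatted result.
find does the same walk: it goes through every entry of the current directory, prints the path of each file whose name equals the target, and recurses into each directory it meets.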
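The two string operations this walk relies on, appending an entry name to the current path and skipping "." and "..", can be sketched on their own. The helpers below are host-side C with invented names (join_path, is_dot_entry). They use NUL-terminated names, whereas xv6 directory entries are fixed-size DIRSIZ fields, which is why ls.c copies DIRSIZ bytes and then writes the terminator itself.

```c
#include <string.h>

/* Hypothetical helper (not from xv6): builds "dir/name" in buf.
 * Returns 0 on success, -1 if the result (with its '\0') would not fit. */
int join_path(char *buf, size_t size, const char *dir, const char *name)
{
    size_t dlen = strlen(dir), nlen = strlen(name);
    if (dlen + 1 + nlen + 1 > size)
        return -1;                           /* same check as "path too long" */
    memcpy(buf, dir, dlen);
    buf[dlen] = '/';
    memcpy(buf + dlen + 1, name, nlen + 1);  /* copies the '\0' as well */
    return 0;
}

/* Entries that a recursive walk must skip to avoid looping forever. */
int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}
```

For example, joining "." and "b" yields "./b", the form in which the expected output lists matches.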

Code

#include "kernel/types.h"
#include "user/user.h"
#include "kernel/stat.h"
#include "kernel/fs.h"

// Print the path of every file below path whose name equals target.
void find(char *path, char *target)
{
  char buf[512], *p;
  int fd;
  struct dirent de;
  struct stat st;

  if ((fd = open(path, 0)) < 0)
  {
    fprintf(2, "find: cannot open %s\n", path);
    return;
  }

  if (fstat(fd, &st) < 0)
  {
    fprintf(2, "find: cannot stat %s\n", path);
    close(fd);
    return;
  }

  // Only directories contain entries to walk.
  if (st.type != T_DIR)
  {
    fprintf(2, "find: %s is not a directory\n", path);
    close(fd);
    return;
  }

  if(strlen(path) + 1 + DIRSIZ + 1 > sizeof buf){
    fprintf(2, "find: path too long\n");
    close(fd);
    return;
  }
  strcpy(buf, path);
  p = buf+strlen(buf);
  *p++ = '/';

  while(read(fd, &de, sizeof(de)) == sizeof(de))
  {
    if(de.inum == 0)
      continue;

    // Copy the fixed-size name after the '/' and terminate it, so that p
    // can be used as an ordinary C string from here on.
    memmove(p, de.name, DIRSIZ);
    p[DIRSIZ] = 0;
    if (!strcmp(p, ".") || !strcmp(p, ".."))
      continue;

    if(stat(buf, &st) < 0){
      fprintf(2, "find: cannot stat %s\n", buf);
      continue;
    }

    if (st.type == T_DIR)
    {
      find(buf, target);
    }
    else if (st.type == T_FILE && !strcmp(p, target))
    {
      printf("%s\n", buf);
    }

  }
  // Each recursion level holds a descriptor; release it before returning.
  close(fd);
}

int main(int argc, char *argv[])
{
  if (argc != 3)
  {
    fprintf(2, "usage: find path name\n");
    exit(1);
  }

  find(argv[1], argv[2]);
  exit(0);
}

Testing
Run ./grade-lab-util find

make: 'kernel/kernel' is up to date.
== Test find, in current directory == find, in current directory: OK (1.4s) 
== Test find, recursive == find, recursive: OK (1.0s)

xargs(moderate)

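Write a simple version of the UNIX xargs program: read lines from the standard input and run a command for each line, supplying the line as arguments to the command.

Example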

  $ echo hello too | xargs echo bye
  bye hello too
  $

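Here the command is echo bye and the additional arguments are hello too, so the command that actually runs is echo bye hello too, which outputs bye hello too.

Hints

Use fork and exec to invoke the command on each line of input. Use wait in the parent to wait for the child to complete the command.
To read individual lines of input, read a character at a time until a newline ('\n') appears.
kernel/param.h declares MAXARG, which may be useful if you need to declare an argv array.
Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.

xargs, find, and grep combine well:

  $ find . b | xargs grep hello

This runs grep hello on each file named b in the directories below ".".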

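Approach

Validate the arguments: if there are more than MAXARG of them, report an error and stop.
Reading file descriptor 0 with read() yields the data arriving through the pipe. Keep calling read in a loop until it reports EOF (end of file). Each argument that is read needs newly allocated memory, and all arguments are collected in one array of pointers. main()'s argv[] cannot be reused for this, because it only has room for the arguments given on the command line.
Finally, create a child with fork() and call exec() in the child. The parent must call wait() so that it does not exit before the child has finished.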
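The tokenizing step can also be done in place once a whole line sits in a buffer, without allocating memory per argument. The sketch below is one possible variant in plain host C, not the approach the code further down takes. The name split_args and its max parameter are assumptions made for the illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from xv6): splits line in place on spaces and
 * newlines, storing pointers to the tokens in argv and terminating the list
 * with NULL. max is the capacity of argv and must be at least 1.
 * Returns the number of tokens stored. */
int split_args(char *line, char *argv[], int max)
{
    int argc = 0;
    char *p = line;

    while (argc < max - 1) {
        while (*p == ' ' || *p == '\n')      /* skip separators */
            p++;
        if (*p == '\0')
            break;
        argv[argc++] = p;                    /* start of a token */
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
        if (*p == '\0')
            break;
        *p++ = '\0';                         /* end the token in place */
    }
    argv[argc] = NULL;                       /* exec expects a NULL-terminated list */
    return argc;
}
```

Given the line "hello too\n", this stores pointers to "hello" and "too". Tokens beyond the capacity of argv are left unprocessed rather than overflowing the array.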

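Further notes
I was curious why a child process is needed at all. The basic reason is that exec replaces the program of the calling process and returns only on failure, so anything xargs wanted to do afterwards would never run. I also asked ds, which gave these reasons:

A command may change its own environment (environment variables, for example); running it in a child keeps that from breaking later steps.
Without child processes, commands cannot run in parallel.
With several commands to run, a single failing command would bring down the whole xargs program.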
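The fork, exec, wait pattern behind these reasons looks as follows on a POSIX host. This is a sketch under that assumption, not xv6 code: it uses execv and waitpid, and the function name run_and_wait and the exit status 127 for a failed exec are choices made for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical host-side (POSIX) helper: runs argv[0] with arguments argv in
 * a child and returns the child's exit status, or -1 if it could not be
 * started or did not exit normally. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(argv[0], argv);    /* replaces the child's program image */
        _exit(127);              /* reached only if execv failed */
    }

    /* The parent is untouched by the exec and can keep working. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because only the child is replaced, the caller can invoke run_and_wait again for the next input line, and a command that fails only affects its own child process.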

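Pitfalls

The order of the #include lines matters: user/user.h must not come first, or compilation fails, because it uses types such as uint that kernel/types.h defines.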

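Code

#include "kernel/types.h"
#include "kernel/stat.h"
#include "kernel/param.h"
#include "user/user.h"

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    fprintf(2, "usage: xargs command [args...]\n");
    exit(1);
  }
  if (argc - 1 >= MAXARG)
  {
    fprintf(2, "xargs: too many arguments\n");
    exit(1);
  }

  // Start from the command and its fixed arguments (argv[1..]).
  char* xargv[MAXARG];
  int xargc = 0;
  for (int i = 1; i < argc; i++)
    xargv[xargc++] = argv[i];

  // Read stdin one byte at a time and split it on spaces and newlines.
  // Note: the tokens of all input lines are appended to a single command,
  // which is run once, rather than running the command once per line.
  char temp[512];
  int temp_idx = 0;
  int eof = 0;
  while (!eof && xargc < MAXARG - 1)
  {
    char c;
    if (read(0, &c, 1) <= 0)
    {
      eof = 1;
      c = '\n'; // flush a final token that has no trailing newline
    }

    if (c == ' ' || c == '\n')
    {
      if (temp_idx > 0)
      {
        temp[temp_idx] = '\0';
        char *new_str = malloc(temp_idx + 1);
        if (new_str == 0)
        {
          fprintf(2, "xargs: out of memory\n");
          exit(1);
        }
        memcpy(new_str, temp, temp_idx + 1);
        xargv[xargc++] = new_str;
        temp_idx = 0;
      }
    }
    else if (temp_idx < sizeof(temp) - 1)
    {
      temp[temp_idx++] = c; // characters beyond the buffer size are dropped
    }
  }
  xargv[xargc] = 0;

  if (fork() == 0)
  {
    exec(xargv[0], xargv);
    // exec returns only if it failed
    fprintf(2, "xargs: exec %s failed\n", xargv[0]);
    exit(1);
  }
  else
  {
    wait(0);
    exit(0);
  }

  return 0;
}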

Testing
Run ./grade-lab-util xargs

./grade-lab-util xargs
make: 'kernel/kernel' is up to date.
== Test xargs == xargs: OK (2.2s) 
    (Old xv6.out.xargs failure log removed)

Testing all of Lab 1

Run make grade

== Test sleep, no arguments == 
$ make qemu-gdb
sleep, no arguments: OK (2.8s) 
== Test sleep, returns == 
$ make qemu-gdb
sleep, returns: OK (0.7s) 
== Test sleep, makes syscall == 
$ make qemu-gdb
sleep, makes syscall: OK (1.0s) 
== Test pingpong == 
$ make qemu-gdb
pingpong: OK (1.0s) 
== Test primes == 
$ make qemu-gdb
primes: OK (0.9s) 
== Test find, in current directory == 
$ make qemu-gdb
find, in current directory: OK (1.0s) 
== Test find, recursive == 
$ make qemu-gdb
find, recursive: OK (1.2s) 
== Test xargs == 
$ make qemu-gdb
xargs: OK (1.1s) 
== Test time == 
time: FAIL 
    Cannot read time.txt
Score: 99/100
make: *** [Makefile:237: grade] Error 1

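The last test failed. A reference I found explains the fix: create a file named time.txt in the xv6-labs-2020 directory and put a single integer in it (the number of hours spent on the lab). Presumably the MIT instructors use it to track how students are progressing.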

== Test xargs == 
$ make qemu-gdb
xargs: OK (1.2s) 
== Test time == 
time: OK 
Score: 100/100
