In this homework, you will gain familiarity with threads and processes from the perspective of a user program, which will help you understand how to support these concepts in the operating system. Along the way, you will gain experience with the list data structure used widely in Pintos, but in a user program running on Linux. We hope that completing this assignment will prepare you to begin Project 1, where you will work with the implementations of these constructs in the Pintos kernel. Our goal is to give you experience with how to use them in userspace, to understand the abstractions they provide in an environment where bugs are relatively easy to debug, before having to work with them in Pintos. It will also help you to see how you can do a lot of your development and testing of project code in a contained user setting, before you drop it into the Pintos kernel. This assignment is due at 9:00 pm on 9/18/2019.
1 Getting Started
Log in to your Vagrant Virtual Machine and run:
$ cd ~/code/personal/
$ git pull staff master $ cd hw1
Run make to build the code. Five binaries should be created: pthread, words, pwords, lwords, and fwords.
1.1 Overview of Source Files
Below is an overview of the starter code:
words.c, word_helpers.c, word_helpers.h
This files contain a simple driver for parsing words, computing their aggregate count, and then writing the count to a stream, as in the previous homework.
word_count.h, word_count.c
These files provide the interface of and the implementation of the word_count abstraction using a conventional singly-linked data structure for the representation. They are, in effect, the preferred solution to your C refresher problem in Homework 0. You should study these files carefully and compare them to your previous work from Homework 0. You must not modify these files.
list.c, list.h
These files are the list library used in Pintos, which is based on the list library in Linux. You should be able to understand how to use this library based on the API given in list.h. You must not modify these files. If you’re interested in learning about the internals of the list library, feel free to read list.c and the list_entry macro in list.h. You can find a good explanation of the list_entry macro here[1].
word_count_l.c
This file is the starter code for your implementation of the word_count interface specified in word_count.h, using the Pintos list data structure. We have already provided the type declarations for this in word_count.h. You must use those. Notice how the list element is embedded into the struct, rather than the next pointer. Also, the Makefile provides the #define PINTOS_LIST as a flag to cc. words.c is unchanged, as is its behavior. This exercise will cement your understanding of how to traverse and manipulate the kinds of lists that are used throughout Pintos. Your implementation of the word_count interface in word_count_l.c, when linked with the driver in words.c, should result in an application, lwords that behaves identically to words, but internally uses the Pintos list data structure to keep track of word counts.
pwords.c
This file is starter code to implement the pwords application. This is a version of the words application, where each file is processed in a separate thread. You will need to modify it to spawn the threads and coordinate their work.
word_count_p.c
This file is starter code to implement a version of word_count_l.c that not only uses the Pintos list data structure, but also provides proper synchronization when accessing the word_count data structure concurrently from multiple threads. This implementation of the word_count interface will be linked with your code in pwords.c to produce the pwords application. You will need to complete it.
fwords.c
This file implements a version of the lwords application where each file is processed in a separate process. It uses word_count_l.c to implement the word_count abstraction.
pthread.c
This file implements an example application that creates multiple threads and prints out certain memory addresses and values. In this assignment, you will answer some questions about this program and its output.
2 Check Yourself on Word Count
Read the basic words files and compare them with your Homework 0 solution. You will be building on this one, so make sure you understand it.
3 Warm Up
Write a simple application that reads in the output produced by words and creates the corresponding word_count data structure. Print it out to check.
4 Using Pintos Lists to Count Words
The Pintos operating system makes heavy use of a particular linked list library taken directly from Linux. Familiarity with this library will make it easier to understand Pintos, and you will need to use it in your solution for the projects. The objective of this exercise is to build familiarity with the Pintos linked list library in userspace, where issues are easier to debug than in the Pintos kernel.
First, read list.h to understand the API to library.
Then, complete word_count_l.c so that it properly implements the new word_count data structure with the Pintos list representation. You MUST use the functions in list.h to manipulate the list.After you finish making this change, lwords should work properly.
The wordcount_sort function sorts the wordcount list according to the comparator passed as an argument. Although words and lwords sort the output using the less_count comparator defined in word_helpers.c, the wordcount_sort function that you write should work with any comparator passed to it as the argument less. For example, passing the less_word function in word_helpers.c as the comparator should also work. If you’re having trouble with function pointer syntax when implementing this, here[2] is a good tutorial.
Hint #1: We provide a Makefile that will build lwords based on these source files. It compiles your program with the -DPINTOS_LIST flag, which is equivalent to putting a #define PINTOS_LIST at the top of the file. This selects a definition of the word count structure that uses Pintos lists. We recommend reading the word_count.h file to understand the new structure definition so you can see how the Pintos list structure is being used.
Hint #2: The provided Makefile uses the same words.c file to provide the main() function in both the words and lwords programs. So that the same main() function works for lwords, you should ensure that your implementation of each function in word_count_l.c is functionally equivalent to the corresponding function in word_count.c. In other words, both word_count_l.c and word_count.c must implement the same interface, namely the one in word_count.h.
5 Observing a Multi-Threaded Program
The pthread application is an example application that uses multiple threads. First, read pthread.c carefully. Then, run pthread multiple times and observe its output. Answer the following questions in a file called hw1.txt:
1. Is the program’s output the same each time it is run? Why or why not?
2. Based on the program’s output, do multiple threads share the same stack?
3. Based on the program’s output, do multiple threads have separate copies of global variables?
4. Based on the program’s output, what is the value of void *threadid? How does this relate to the variable’s type (void *)?
5. Using the first command line argument, create a large number of threads in pthread. Do all threads run before the program exits? Why or why not?
6 Using Multiple Threads to Count Words
The words program operates in a single thread, opening, reading, and processing each file one after another. In this exercise, you will write a version of this program that opens, reads, and processes each file in a separate thread.
First, read and understand pwords.c, which is a first cut at a program that intends to use multiple threads to count words.
Your task is to properly implement the pwords application. You will make changes to pwords.c and word_count_p.c to complete this task. It will need to spawn threads, open and process each file in a separate thread, and properly synchronize access to shared data structures when processing files. Your synchronization must be fine-grained. Different threads should be able to open and read their respective files concurrently, serializing only their modifications to the shared data structure. In particular, it is unacceptable to use a global lock around the call to count_words() in pwords.c, as such a lock would prevent multiple threads from reading the files concurrently. Instead, you should only synchronize access to the word count list data structure in word_count_p.c. You will need to ensure all such modifications are complete before printing the result or terminating the process.
We recommend that you start by just implementing the thread-per-file aspect, without synchronizing updates to the word count list.Can you even detect the errors? Multithreaded programs with synchronization bugs may appear to work properly much of the time. But, the bugs are latent, ready to cause problems.
To help you find subtle synchronization bugs in your program, we have provided a somewhat large input for your words program in the gutenberg/ directory. To generate these files, we selected some stories from among the most popular books made freely available by Project Gutenberg[3], making sure to choose short stories so that the word count program does not take too long to run. You should compare the result of running your pwords program on the Gutenberg dataset to the result of running words on the Gutenberg dataset and ensure they are same. This does not guarantee that your code is correct, but it might alert you to subtle concurrency bugs that may not manifest for smaller inputs.
Hint #1: The Makefile that we provide will compile your pwords.c program with the two flags
-DPINTOS_LIST -DPTHREADS, which select a definition of the word count structure that not only uses Pintos lists, but also includes a mutex that you may find useful for synchronization. Unlike the lwords exercise, in which the word_count_t structure was typedef’d to the Pintos list structure directly, the word_count_t structure now contains the Pintos list structure and a mutex. We expect your code in word_count_p.c to be similar to your code in word_count_l.c, with syntactic changes according to the new word_count_t structure and added synchronization to allow concurrent use of the word_count API as needed for pwords.
Hint #2: The man pages for pthread_mutex_lock and related functions are, unfortunately, not included in the class VM by default. To install them, run sudo apt install glibc-doc.
Hint #3: The multiple threads should aggregate their results without reading from or writing to any intermediate files. Attempting to open or read from any files other than the ones passed as input to your program may cause autograder tests to fail or not be run.
7 Using Multiple Processes to Count Words
In this exercise, you will write a version of the words program that opens, reads, and processes each file in a separate process.
First, read and understand fwords.c, which is your skeleton code for this exercise. It is designed to fork a separate child process to process each file and them merge the output of each process into the final output for the program.
Complete the fwords application by modifying fwords.c. Our provided Makefile links it with the code in word_count_l.c, NOT word_count_p.c. Each file must be opened, read, and processed from a separate process. You should not use pthreads (e.g., do not call pthread_create).
Notice the difference in the thread-based and process-based approaches. Are each of these processes able to use the word_counts variable declared in main? Do they interact with each other when they do? How is synchronization accomplished?)
Hint #1: Because each process operates in a separate address space, you do not need a mutex for synchronization.
Hint #2: The pipe system call and fdopen library call might be useful. For information about them, run man 2 pipe and man 3 fdopen.
Hint #3: The multiple processes should aggregate their results without reading from or writing to any intermediate files. Attempting to open or read from any files other than the ones passed as input to your program may cause autograder tests to fail or not be run.
8 Additional Questions
Answer the following additional questions in hw1.txt:
6. What are the pros and cons of using threads, as opposed to processes, to process each file?
7. Briefly compare the performance of lwords, pwords, and fwords when run on the Gutenberg dataset. How might you explain the performance differences?
8. Under what circumstances would pwords perform better than lwords? Under what circumstances would lwords perform better than pwords? Is it possible to use multiple threads in a way that always performs better than lwords?
9 Stretch
Now that you are familiar with threads and processes, you can further improve your solution to pwords by reducing the synchronization overhead. Can you synchronize your thread-based solution in pwords using a similar approach to fwords. What use do you make of thread-local storage?
Once you’ve done the above exercise, consider implementing pwords in Go. Using goroutines instead of threads and channels instead of pipes to synchronize in a similar way to fwords.
[1] https://stackoverflow.com/questions/15832301/understanding-container-of-macro-in-the-linux-kernel
[2] https://www.geeksforgeeks.org/function-pointer-in-c/
[3] https://www.gutenberg.org/ebooks/search/?sort order=downloads