Assignment 11: Where others fear to thread

Due Sunday, December 1 before midnight (No late submissions accepted)

The goals for this assignment are:

  • Implement multi-threaded algorithms using the pthread library

  • Implement a tree data structure

  • Work with OS system calls

Update your repository

Do a fetch upstream to obtain the basecode for this assignment.

Using the command line, make sure that you have the correct upstream repository and do a fetch upstream to get the basecode for this assignment.

  • Open terminal and change your current directory to your assignment repository.

  • git remote set-url upstream git@github.com:brynmawr-cs223-f24/cs223-assignments.git

  • Run the command git fetch upstream

  • Run the command git merge upstream/master

Your repository should now contain a new folder named A11.

The fetch and merge commands update your repository with any changes from the original.

1. Grep

This question implements a simplified version of the bash command grep. Grep searches a list of files for a given keyword or expression. For example, the following command searches for the keyword public in a list of source files. We use the find command to generate the list of source files to search.

grep
Make sure you include the backticks, e.g. ` . This adds the filenames found by find as command line arguments to grep.

In the file, grep.c, implement a program that uses N threads to search for a keyword in a set of files.

$ make grep
gcc -g -Wall -Wvla -Werror grep.c -o grep -lpthreads
$ ./grep
usage: ./grep <NumThreads> <Keyword> <Files>
thread grep

Requirements:

  • Your program should take the number of threads, keyword, and a list of files as command line arguments

  • Subdivide the files among the N threads. Note that the number of files may not divide evenly between threads.

  • Use a mutex to ensure that threads do not interrupt each others printing

  • Print the totals and sub-division of files similarly to the output above

  • Use the fgets to process the lines in each file

  • Use the function strstr defined in string.h to search lines of text for the keyword. Documentation is here

2. Code Analysis

This question has two parts. In the first part, you will implement a search tree with string keys. In the second part, you will use the tree to store the source files from a directory, along with their dependency information. Similarly to your grep implementation, you will build the tree using multiple threads.

2.1. Tree

In the file, tree.c, implement a binary search tree. Do not change the header tree.h, nor the function declarations in tree.c. But you can add functions to tree.c! We recommend using the book "Data Structures and Algorithm Analysis in C" by Mark Allen Weiss. See the course webpage for a link.

$ make ./tree_tests
gcc -g -Wall -Wvla -Werror -Wno-unused-variable -Wno-unused-but-set-variable tree_tests.c tree.c -o tree_tests -lpthread
$ ./tree_tests
Running tests...
test 1: insert for empty tree: PASSED
test 2: name set correctly: PASSED
test 3: left is null: PASSED
test 4: right is null: PASSED
test 5: name set correctly: PASSED
test 6: right is null: PASSED
test 7: name set correctly: PASSED
test 8: item not in tree: PASSED
test 9: item in tree: PASSED
test 10: found correct object: PASSED
M
 l:A
 r:X
  l:P
A
M
P
X

Requirements:

  • You must not modify the basecode for tree.h or tree.c, except to add functions to tree.c

  • Your output should match the above and pass all tests

2.2. Dependencies

A dependency is a general software term for any software required by another. In this question, we will store files in a binary search tree so we can list their file dependencies, as determined by #include statements. For example, the file "Animal.h" depends on "Locomotion.h".

In the file, dependency.c, implement a program that uses N threads to build a binary search tree of a given set of files. After building the tree, the program should give the user a prompt where they can list the processed files in alphabetical order and then query the dependencies of the file by giving the filename.

$ ./dependency 3  ` find code -name ".h" -o -name ".cpp" ` 
Processing 11 files
Thread 0 processing 4 files (arguments 2 to 6)
Thread 1 processing 4 files (arguments 6 to 10)
Thread 2 processing 3 files (arguments 10 to 13)
Elapsed time is 0.000496
$ list
code/Animal.h
code/Bird.h
code/Duck.h
code/Fish.h
code/Fly.h
code/Locomotion.h
code/Seal.h
code/Swim.h
code/Walk.h
code/Whale.h
code/Zoo.cpp
$ code/Animal.cpp
code/Animal.cpp not found
$ code/Animal.h
code/Animal.h has the following dependencies
  string
  vector
  Locomotion.h
$ code/Zoo.cpp
code/Zoo.cpp has the following dependencies
  Animal.h
  Bird.h
  Duck.h
  Fish.h
  Seal.h
  Whale.h
  Locomotion.h
  Walk.h
  Fly.h
  Swim.h
$ apple
apple not found
$ quit

Requirements:

  • Check that a file is valid before adding it to the tree

  • Support the commands: quit, list, <filename>

  • Use a mutex to avoid race conditions when setting up the tree

  • Your program should take the number of threads and a list of files as command line arguments

  • Subdivide the files among the N threads. Note that the number of files may not divide evenly between threads.

  • Use a mutex to ensure that threads do not have a race conditions when constructing the tree

  • Use strstr and fgets to parse the source file and find lines containing #include

  • Strip the filenames from the #include lines when printing dependencies.

3. Submit your work

Submit both your code and images.

1) Push your code work to github

$ git status
$ git add .
$ git status
$ git commit -m "assignment complete"
$ git status
$ git push
$ git status

4. Grading Rubric

Assignment rubrics

Grades are out of 4 points.

  • (1.5 points) Grep

    • (0.15 points) style and header comment

    • (0.1 points) Correct command line arguments: number of threads, keyword, and a list of files

    • (0.2 points) Subdivide the files among the N threads

    • (0.3 points) Correct output, uses mutex

    • (0.75 points) no memory errors

  • (1.5 points) Tree

    • (0.15 points) style and header comment

    • (0.6 points) implements all functions, passes all tests and does not modify basecode

    • (0.75 points) no memory errors

  • (1 points) Dependency

    • (0.1 points) style and header comment

    • (0.1 points) Correct command line arguments: number of threads, keyword, and a list of files

    • (0.1 points) Subdivide the files among the N threads, builds tree using multiple threads using mutex

    • (0.2 points) Correct behavior

    • (0.5 points) no memory errors

Code rubrics

For full credit, your C programs must be feature-complete, robust (e.g. run without memory errors or crashing) and have good style.

  • Some credit lost for missing features or bugs, depending on severity of error

  • -5% for style errors. See the class coding style here.

  • -50% for memory errors

  • -100% for failure to checkin work to Github

  • -100% for failure to compile on linux using make