Starting from:

$25

CS107-Assignment 3 A Heap of Trouble Solved

This assignment consists of two code-study exercises and two small programs to write. One code-study exercise concerns the gets function that reads a line from standard input, the other examines the calloc and realloc functions that allocate memory in the heap. The programs to write are simpli ed versions of the unix utilities uniq and tail .

Get started
Check out the starter project from your cs107 repo using the command

 

The starter project contains code.c with the code for the code-study exercises, C les mycat.c , read_line.c , mytail.c and myuniq.c , and the supporting Makefile , custom_tests , and readme.txt les. In the samples subdirectory, you'll nd our sample solutions and some

sample input les.

1.  Code study: gets
The standard C library function gets has an inherently awful design. It is intended to read a single line of text from stdin, stopping at the rst newline character, and writing the read characters into the client's bu er. The fatal aw of gets is that its only argument is the starting address of the bu er, with no indication of that bu er's length. Without the length, gets cannot tell when it should stop writing characters to avoid over owing the bu er. There is absolutely no way to use gets safely. Its use has long been deprecated in favor of the properly-constrained fgets function.

For reasons of backward compatibility, gets lives on in the standard library and much e ort goes into trying to dissuade programmers from using it. Let's look at how the tools try to impede use of the function and observe the consequences if we proceed past the warnings and use the function anyway. Answer the following questions in your readme.txt le:

a.  The gcc compiler squawks about gets at compile-time. Use make to build the program and you'll see a special case complaint when anyone dares call gets . You should get not one but two warnings, one from the compiler and another from the linker. What are these warnings? (Fun fact: even if you invoke gcc with the -w ag to disable all warnings, the complaint from the linker lives on. You just can't shut that thing up!)

b. The MacOS libc (shown below as apple_gets ) adds a shout-out against gets at runtime.

Describe the behavior/output of apple_gets given its use of the static variable warned . (Note: it is highly irregular for the runtime behavior of a library function to

produce unsolicited output like this. I guess someone really wanted to ensure that the use of gets didn't pass unnoticed!)

c.  The gets function is implemented in musl (shown below as musl_gets ) as a simple passthrough to the better-designed fgets function. Does this choice result in a gets function that actually operates more safely? Justify why or why not.

d. Review the use of the myth's version of gets in code.c. Run the code program and when prompted for your name, enter a response which will cause a bu er over ow. How many characters beyond the end do you have to go before you see any ill e ect? How is the problem reported when it surfaces?

See the BUGS section of man gets for further harsh words against the function and hints of the security problems therein. Peter van der Linden's book Expert C Programming: Deep C Secrets summarizes the most famous gets exploit in the early bird gets() the Internet Worm

(http://proquest.safaribooksonline.com.stanford.idm.oclc.org/book/programming/c/0131774298/2dotit-not-a-bug-it-a-language-feature/ch02lev1sec3?#ch02lev2sec14). When we delve into the stack mechanics later this quarter, we will learn how a bu er over ow leads to these security problems. (To access the book content on Safari Books, you will need to authenticate with your SUNet id and may have to wait if there are too many concurrent users. The entire Expert C book is available and chock full of fascinating information -- I highly recommend it for its illuminating and comprehensive coverage of all things C!)

 

2.  Code study: calloc and realloc
Our programs will start making use of the heap as a client this week. Although we're not yet ready to dig into the internals of how a heap allocator operates, we can take a look at the implementation of some of the ancillary routines that coordinate with malloc and free .

Introduce yourself to malloc 's cousin calloc by reading its man page ( man calloc ) and then review the implementation for calloc shown below (taken from musl):

 

The calloc interface is presented in terms of allocating an array of m elements each of size n bytes. This is somewhat misleading because there is nothing array-speci c to its operation. It simply allocates a region of total size m * n bytes. The call calloc(2, 5) has the same e ect as calloc(5, 2) or calloc(1, 10) . After allocating the space, it then zeros the memory, which is the feature that distinguishes calloc from ordinary malloc .

Answer the following questions in your readme.txt le:

a.  Your colleague objects to the use of division on Line 3 because division is more expensive

than multiplication and requires an extra check for a zero divisor. He proposes changing Line 3 to if (n * m SIZE_MAX) claiming it will operate equivalently and more e        ciently. Explain to him why his plan won't work.

b. Examine the expression on line 10. It bears a resemblance to a piece of code we studied before (lab1 (/class/cs107/lab1/)). What is being calculated by this expression? There is an assumption baked into this code about how many bytes malloc will actually reserve for a given requested size. What is that assumption? (This behavior is not a documented public feature and no client that was a true outsider should assume this, but calloc and malloc were authored by the same programmer, thus calloc is being written by

someone with rsthand knowledge of the malloc internals and acting as a privileged insider.)

c.  Your colleague is also displeased with Line 12. He claims that removing the if (*wp) and leaving just the *wp = 0 would make the loop run faster with no change in what it accomplishes. Do you agree? Explain your reasoning.

The code shown below illustrates the behavior of a standard realloc function. Read the man page for realloc and take note of the behavior for unusual calls such realloc'ing a NULL pointer or resizing to zero bytes. Look for that special-case handling in the code below:

 

Answer the following questions in your readme.txt le:

d. Whenever possible, the heap allocator prefers to keep the memory in-place and accommodate the new size at the existing location. What is the advantage of this option over the alternative of allocating a new piece of memory?

e.  Why must the caller catch the return value from realloc ? Why doesn't realloc just reassign the pointer when it's necessary to move the memory block? (There is both a simple logistical reason and an underlying rationale for the design, either answer is acceptable.)

f.   It is stated that a correct program should have a one-to-one correspondence between malloc calls and free calls. How does adding a realloc call change the needed

number of malloc and free calls? Brie y explain your reasoning.

3.  Write read_line
The C le-reading functions share a similar interface in which the client is responsible for providing the bu er to store the text read. The client must pre-allocate memory based on an expectation of what size will be "large enough". If the client's choice of size is too small, a linereading function might over ow the bu er ( gets ) or truncate the line ( fgets ). A more clientfriendly design would be to bundle the necessary allocation inside the read function and enlarge the space as necessary. Consider this function prototype for read_line function that operates in that manner:

 

The read_line function reads the next line from a le. The return value is a dynamicallyallocated and null-terminated string. If the le contains no more lines to read, read_line returns NULL. This function performs like a snazzier version of fgets (in fact, internally it will be implemented using fgets ) in that it reads the next line but this version also manages the allocation details and provides a cleaner interface for the client to use.

Speci c details of the function's operation:

 The function reads the next line from the le. A line is considered to end after the rst newline character or EOF, whichever comes rst.

If the line ended with a newline, the newline is not included in returned string, instead the newline is replaced with a null char. If the line to be read consists of just a newline character, the empty string would be returned.

To allocate memory for the returned string, the function makes an initial allocation and if needed, successively enlarges it to accommodate the entire line. To be more speci c, it rst mallocs a bu er of minimum size (32) and reads the rst part of the line into the bu er. If the line is longer than the original estimate, it reallocs the bu er to double its current size (64), and attempts to read the rest of the line. If there is still more to read, it reallocs to double the size again and so on until nally it reads to the newline or EOF that marks the end of the line.

 If unable to allocate su cient memory, read_line should raise a fatal assert. As a habit, you should assert the result from every allocation request. Allocation failures are rare but deadly.

The return value of read_line is the address of a dynamically-allocated piece of memory. The client is responsible for deallocating this memory with free when no longer needed.

You are to write the function read_line that operates in this fashion. Write your implementation in the read_line.c le. You will write and test this function in isolation now, and then will use the function later when writing the mytail and myuniq programs. You can test your read_line function using our provided mycat.c program. The mycat program is integrated with sanitycheck.

Review and comment starter code
Both myuniq.c and mytail.c are given to you with a small amount of code to handle the command-line arguments. Before starting on either program, rst read and understand the given code, work out how to change/extend it to suit your needs, and nally add comments to document your strategy.

Some questions you might consider for self-test: (do not submit answers)

 The programs are set up to either read from stdin or a named le (using code similar to that studied in lab3 (/class/cs107/lab3/)). Will any special-case handling be required to handle stdin di erently than a named le? mytail supports a command-line argument of -number to control the number of lines

printed. How does the given code process that optional argument?

What range of values is accepted as an argument for mytail -number ?

Do you see anything unexpected or erroneous? We intend for our code to be bug-free; if you nd otherwise, please let us know!

As per usual, the code we provide has been stripped of its comments and it will be your job to provide the missing documentation.

4.  Implement myuniq
What does the uniq command do?

Many standard unix commands operate as lters where they read input from a le, transform it in some way, and then print out the result. The unix utility uniq ( man uniq ) is one such lter. It removes adjacent matching lines from the input; the output will contain only one copy of each line that is duplicated in the input. When uniq is invoked with the -c ag, it also counts the number of consecutive occurrences for each line in the input. Consider the following sample use

 

It's important to note that uniq does not detect repeated lines unless they are adjacent in the input.

How does myuniq operate?

The myuniq program you are to write is similar in operation to the standard uniq with these di erences:

 myuniq pre xes each line of output with its count of consecutive occurrences in the input

(this is the behavior of standard uniq when invoked with the -c ag) myuniq supports no command-line ags (standard uniq has a number of ags)

The implementation of myuniq is rather straightforward once you have a working read_line , the tricky part is being careful with memory allocation and deallocation. This program makes a nice warmup!

5.  Implement mytail
What does the tail command do?

The unix tail command ( man tail ) is an example of another lter. This lter prints the nal N lines of the input, where N is 10 (default value when not explicitly speci ed) or a number expressed as a command-line ag, e.g. tail -4 . Consider the following sample use of tail :

 

If the input contains fewer than N total lines, all lines from the input are printed.

How does mytail operate?

The mytail program you are to write is similar in operation to the standard tail with these di erences:

 mytail supports only one command-line ag, -N where N is the number of lines to

output (standard tail has a number of other ags)

mytail only reads one le either the named le (if speci ed) or standard input (if not)

(standard tail can read from any number of le arguments)

While mytail is processing its input, it will need to keep track of N lines. For this you should use an array of char * pointers. However, N can potentially be a very large number, large enough that this array might exceed the capability of the stack. The starter code declares a stack array of a conservative maximum length. If N is <= this maximum, mytail should use this stack array. If N is larger, than mytail should instead malloc an array of pointers and use that instead.

At any given point, this array holds the most recent N lines read. The array is used to hold a "sliding window" of N lines that is moving its way through the input. When mytail hits the end of the input, the window will contain the nal N lines.

When mytail starts processing the input, it will read the rst N lines into the array. Upon reading the N+1st line, it should not enlarge the array, but instead should discard the 1st line of the input and use that array slot for the N+1st. The N+2nd overwrites the second line and so on. In this way, an array is used as a circular queue (https://en.wikipedia.org/wiki/Circular_bu er).

There is some logic to work out in the circular queue, but most of the challenge in this program concerns proper memory handling. By the time you are done with this assignment, you will be a heap master!

Valgrind
This assignment provides a lot of targeted practice in managing memory. You'll be using both the stack and heap for various purposes and need to take care to appropriately allocate and properly deallocate each. Bring your A game for this and be sure to get Valgrind on the job. It is critical that your code contain no memory errors. Plugging any pesky memory leaks is a more minor concern. You should also take care to monitor your memory e                    ciency and keep it on target. How can you do this?

Valgrind summarizes the overall heap memory usage. Look for a line at the end of a Valgrind report in this format:

 

This is the aggregate total of all heap allocations during the program's lifetime. The count of allocs should always equal the count of frees. Fewer allocs than free is a sign of an error (free'd something that was not alloc'ed in rst place), and fewer frees than allocs indicates you've got a leak. The count of allocations and total size allocated is a measure of your program's memory "footprint". Run our solution under Valgrind to see its footprint and compare yours to it as a measure of your program's memory e               ciency. We very tightly specify the use of stack/heap memory for these programs, so the memory usage of a correct program should track the solution very closely. In grading, we usually give your programs some slack (say within 2-3x the sample) but you should aim for closer to validate your usage of memory is correct.

More products