In this project, we are creating an Abstract Data Type (ADT) to improve upon the standard C way of representing strings. We will use malloc and free, and get some practice with pointers, structs, and macros.
On Strings
Strings in C are stored simply as an array (buffer) of ASCII-encoded characters with a zero (null character) on the end. This works well, but has some drawbacks. Importantly, strings are a common source of buffer overflow errors, a serious security vulnerability that happens when programmers forget how large the string’s buffer actually is when writing to it.
Our new ADT will still store the actual string in the basically the same way as before -- however, it will also store some extra information as well -- specifically, the length (number of characters) and the capacity (how many characters can it hold before buffer overflow would result). As some amount of extra safety, in the four bytes immediately following the string, we will store a not-sorandom looking byte sequence that we can check to confirm that nothing bad has happened (yet).
On UTStrings
You are to implement each of the functions declared inside String.h. You must use (without modifying) the UTString struct that is defined inside String.h. This struct consists of the following parts:
• length – The length of the string. This is the number of characters in the string, and does not include the null character, nor anything after it. Keep in mind that the length of a string may be shorter than the buffer in which the string is stored. For example, if we had the string “hello”, then the length would be 5 (one for each useful character), regardless of the size of the buffer.
• capacity – The length of the longest string that can be stored in the buffer. For example, if we had a string with a buffer of length 20, then capacity would be 15 (20, minus the null character and check), regardless of the length of the string.
• string – A pointer to the buffer where the string is stored. It must be allocated separately from the UTString itself.
• check –The signature value (~0xdeadbeef) stored after a string. It is not a true member of the struct, but should be in it regardless. We will check the value every time when working with a UTString to make sure that no buffer overflow has occurred yet. If it does not check out correctly, your program should fail an assert and crash immediately. UTString struct:
length (int, 4 bytes) = 11
capacity (int, 4 bytes) = 11 or greater
string (char*, 4 bytes) pointer to location of buffer on heap of at least capacity
The contents of the string buffer for the string “Hello World” (the single quotes and 0x are not shown for brevity):
H
e
l
l
o
‘ ‘
W
o
r
l
d
\0
~0xdeadbeef
If the buffer is bigger, the bytes after 0xef are unknown.
UTStrings can be used much as an ordinary string. Anyone choosing to use our library should declare variables of type UTString*, and should only use our functions to change the stored string. Some notes:
• UTStrings can only be stored on the heap. Declaring a local or global variable of type “UTString” will not work. We can, of course, declare local variables of type “UTString*” (pointers to UTString). We’ll point these pointers at chunks from the heap.
• Our clients (anyone using our library) will only create strings by calling the utstrdup function. Our clients will not poke around inside our struct, or to mess with the characters inside the array. Our clients will call utstrfree with their UTStrings when they’re done with them.
Goals
Write the following functions:
• UTString* utstrdup(const char* src); -- An analog of the C stdlib function strdup. src is a string. Create a UTString on the heap that holds a copy of source, setting the length, capacity, and check appropriately. Return a pointer to the UTString.
• uint32_t utstrlen(const UTString* src); -- An analog of the C stdlib function strlen. src is a UTString. Find and return the length of the string.
• UTString* utstrcat(UTString* s, const char* suffix); -- An analog of the C stdlib function strcat. Append characters to s from suffix until out of capacity or done copying. Do not copy more than can be stored. Do not allocate further space. Do use a null terminator and update the check. Return s after appending.
• UTString* utstrcpy(UTString* dst, const char* src); -- An analog of the C stdlib function strcpy. Should replace characters in dest with characters from src until out of capacity or done copying. Do not copy more than can be stored. Do not allocate further space. Do use a null terminator and update the check. Return dst after copying.
• void utstrfree(UTString* self); -- Used to free a UTString. Similar to the C stdlib function free. Must deallocate both the string buffer and the UTString itself.
• UTString* utstrrealloc(UTString* s, uint32_t new_capacity); -- Used to reallocate space for a UTString. Similar to the C stdlib function realloc. If new_capacity is larger than the current capacity, create a buffer with new_capacity capacity, copy all the old contents, and deallocate the old buffer. Otherwise, do nothing. Either way, return s afterwards.
Every UTString passed to any of these functions should have a valid check. If it does not, your program should fail an assert and crash immediately.
Submit all files created or edited for the assignment except testing or debugging files. Submit no more, and no less.
Testing
The provided tests are useful, but not very thorough (particularly with regards to utstrcpy). Please write your own tests to verify your own code.
The stage 1 and stage 2 tests are intended to convey how UTStrings work. The stage 3 test is trying to ensure you’ve done nothing silly. If all goes well, this test should run in less than a second and should not have any buffer overflows. Otherwise, you’ve probably done something silly.
The stage 4 test is trying to ensure that you are actually checking things as intended.
We will use valgrind to grade your program on memory leaks.
Hints
• You could write your own functions within Project3.cpp to avoid repeating code, for modularity, or for debugging.
• Look for off-by-1 errors in setting length, setting capacity, or copying characters.
• Think of valid but unexpected test cases – for instance, appending a string of length 0.
• Avoid writing code that does the same thing many times. If you need to, say, find the length of a regular string several times during the project, you should write a function to do so.