Commit 9c5484e7 authored by hdd29

Lacking the Python part and the statetab picture

parent 8c06926e
lab07/l7 0 → 100644
Q1:
The difference is the type of argument each function takes to reach a state in statetab.
"add" takes a pointer to a state and inserts the given suffix at the head of that state's suffix linked list.
"addSuffix" takes a prefix instead. It looks the prefix up in statetab; if a state with that prefix exists,
it adds the suffix to that state. If not, it creates a new state for the prefix and then adds the suffix to
the suffix list of the newly created state.
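The two entry points described above can be sketched as follows. This is only an illustration under assumptions: the State and Suffix layouts, the values of NPREF, NHASH and MULTIPLIER, and the function names push_suffix and add_for_prefix are hypothetical stand-ins, not the lab's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NPREF = 2, NHASH = 4093, MULTIPLIER = 31 };  /* assumed values */

typedef struct Suffix Suffix;
typedef struct State State;
struct Suffix { char *word; Suffix *next; };
struct State  { char *pref[NPREF]; Suffix *suf; State *next; };

State *statetab[NHASH];

/* hash the NPREF prefix words into a bucket index */
unsigned hash(char *s[NPREF])
{
    unsigned h = 0;
    for (int i = 0; i < NPREF; i++)
        for (unsigned char *p = (unsigned char *)s[i]; *p != '\0'; p++)
            h = MULTIPLIER * h + *p;
    return h % NHASH;
}

/* find the state for a prefix; create it when create is nonzero */
State *lookup(char *prefix[NPREF], int create)
{
    unsigned h = hash(prefix);
    for (State *sp = statetab[h]; sp != NULL; sp = sp->next) {
        int i;
        for (i = 0; i < NPREF; i++)
            if (strcmp(prefix[i], sp->pref[i]) != 0)
                break;
        if (i == NPREF)
            return sp;              /* every prefix word matched */
    }
    if (!create)
        return NULL;
    State *sp = malloc(sizeof *sp);
    assert(sp != NULL);
    for (int i = 0; i < NPREF; i++)
        sp->pref[i] = prefix[i];    /* shares the word pointers, no copy */
    sp->suf = NULL;
    sp->next = statetab[h];
    statetab[h] = sp;
    return sp;
}

/* state-pointer entry point: push suffix at the head of sp's list */
void push_suffix(State *sp, char *suffix)
{
    Suffix *suf = malloc(sizeof *suf);
    assert(suf != NULL);
    suf->word = suffix;
    suf->next = sp->suf;
    sp->suf = suf;
}

/* prefix entry point: find or create the state, then push the suffix */
State *add_for_prefix(char *prefix[NPREF], char *suffix)
{
    State *sp = lookup(prefix, 1);
    push_suffix(sp, suffix);
    return sp;
}
```

Calling add_for_prefix twice with the same prefix reaches the same state both times, and the most recently added suffix sits at the head of its list.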
Q2:
Q3:
include a picture of the table later
Number of pointers:
It's : 3
a: 3
new: 3
dawn: 2
Q4:
The advantage is that a word is never duplicated in heap memory: each distinct word is stored in exactly one chunk of memory, and no other chunk holds the same string. If the string appears in multiple places in the table, only the number of pointers to it increases.
Advantage: saves memory, because no additional memory is allocated for duplicate strings.
Q5:
Disadvantage:
1. When many pointers point to the same location, any change at that location affects every place that points to it. For example, 'new' appears in the table twice as a prefix (if npref = 2) and once as a suffix (after (It's, a)), so if we change 'new' to 'old' at its location, the suffix and both prefixes all become 'old'.
2. The program does not free the heap memory allocated for prefixes and suffixes while it runs, even after the function calls that created them have returned; the memory is released only once the program has finished executing. This takes up a lot of heap memory if the input text is long and contains many different words.
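The first disadvantage can be reproduced in a few lines. This is a minimal sketch, not the lab program: store_word and shared_edit_is_visible are hypothetical names, and the three local pointers stand in for the two prefix slots and one suffix slot that hold 'new'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* copy a word onto the heap once (hypothetical helper) */
char *store_word(const char *w)
{
    char *p = malloc(strlen(w) + 1);
    assert(p != NULL);
    strcpy(p, w);
    return p;
}

/* returns 1 if an edit made through one pointer shows up through the others */
int shared_edit_is_visible(void)
{
    char *word = store_word("new");   /* the only copy of the string */
    char *prefix_slot_a = word;       /* first use as a prefix */
    char *prefix_slot_b = word;       /* second use as a prefix */
    char *suffix_slot = word;         /* use as a suffix */

    strcpy(prefix_slot_a, "old");     /* same length, so it fits in place */

    int visible = strcmp(prefix_slot_b, "old") == 0
               && strcmp(suffix_slot, "old") == 0;

    free(word);                       /* one allocation, so exactly one free */
    return visible;
}
```

The function returns 1: all three slots see 'old', because they hold the same address rather than three copies.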
Q6:
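/* cleanTable: free every State in statetab.
   Suffix lists and word strings are not freed here. */
void cleanTable(void)
{
    State *sp;
    State *t;   /* holds sp->next, which must be read before sp is freed */
    int i;

    for (i = 0; i < NHASH; i++)
    {
        for (sp = statetab[i]; sp != NULL; sp = t)
        {
            t = sp->next;
            free(sp);
        }
        statetab[i] = NULL;   /* do not leave a dangling bucket pointer */
    }
}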
Q7: The function works properly (I think, because nothing is printed when I try to echo some variables after calling cleanTable()). Difficulty: there is no easy way to observe exactly what happens between the steps of the cleaning process.
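One possible way to make the cleaning process observable is to have the cleanup report how many states it freed and compare that with the number inserted. The sketch below assumes a stripped-down State holding only the next pointer; insert_state and clean_table_counted are hypothetical names, and NHASH is an assumed value.

```c
#include <assert.h>
#include <stdlib.h>

enum { NHASH = 4093 };  /* assumed table size */

typedef struct State State;
struct State { State *next; };   /* only the field the cleanup needs */

State *statetab[NHASH];

/* hypothetical helper: put an empty state at the head of bucket h */
void insert_state(unsigned h)
{
    State *sp = malloc(sizeof *sp);
    assert(sp != NULL);
    sp->next = statetab[h % NHASH];
    statetab[h % NHASH] = sp;
}

/* free every state in the table and return how many were freed */
int clean_table_counted(void)
{
    int freed = 0;
    for (int i = 0; i < NHASH; i++) {
        State *next;
        for (State *sp = statetab[i]; sp != NULL; sp = next) {
            next = sp->next;     /* read the link before freeing the node */
            free(sp);
            freed++;
        }
        statetab[i] = NULL;      /* bucket is now empty */
    }
    return freed;
}
```

If the count matches the number of states inserted, and a second call returns 0, the traversal visited every chain and left the table empty.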
Q8: We can store the address of each string in an array as it is freed. Then, when moving on to the next state, check whether each string in that state points to an address already stored in the "freed already" array. If it does, skip it and move on to the next string, so that no string is freed twice.
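A sketch of this idea follows, with one assumed variation: the distinct addresses are collected first and freed afterwards, so the comparison never reads a pointer whose memory has already been released. The name free_each_once is hypothetical, and the linear scan is chosen for brevity, not speed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free each distinct pointer in ptrs[0..n-1] exactly once.
   Returns the number of free() calls made on the strings. */
int free_each_once(char **ptrs, int n)
{
    /* "seen" plays the role of the "freed already" array */
    char **seen = malloc((n > 0 ? n : 1) * sizeof *seen);
    assert(seen != NULL);
    int nseen = 0;

    for (int i = 0; i < n; i++) {
        int already = 0;
        for (int j = 0; j < nseen; j++) {
            if (seen[j] == ptrs[i]) {   /* same address: a shared string */
                already = 1;
                break;
            }
        }
        if (!already)
            seen[nseen++] = ptrs[i];
    }

    for (int j = 0; j < nseen; j++)
        free(seen[j]);                  /* each distinct string freed once */
    free(seen);
    return nseen;
}
```

With five slots that refer to only two distinct strings, the function performs two frees rather than five.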
Q9:
The prefixes are stored in a deque (double-ended queue). One reason to use a deque rather than a vector is that the collection of prefixes and suffixes can grow very large; a deque allocates its storage in separate, non-contiguous chunks, so it never needs a single large contiguous block of heap memory the way a vector does. Also, as the container grows, a vector has to reallocate and copy its elements each time it expands, whereas the time for a deque to add more prefixes grows linearly (according to the graphs at https://www.codeproject.com/Articles/5425/An-In-Depth-Study-of-the-STL-Deque-Containe).
In addition, a deque inserts and removes elements at both its beginning and its end in constant time (push_back(), pop_front() and so on), whereas removing the first element of a vector requires shifting every remaining element.
On the programming side, the deque already comes with built-in member functions to handle the data, so it is more convenient to simply call them.
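For comparison, a C version can keep the prefix in a fixed-size array and slide it by hand; the deque's pop_front() followed by push_back() replaces exactly this step. The sketch assumes NPREF = 2, and shift_prefix is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

enum { NPREF = 2 };  /* assumed prefix length */

/* Drop the oldest prefix word and append word at the end:
   the array counterpart of pop_front() followed by push_back(). */
void shift_prefix(char *prefix[NPREF], char *word)
{
    assert(word != NULL);
    /* move words 1..NPREF-1 down one slot; cost grows with NPREF */
    memmove(prefix, prefix + 1, (NPREF - 1) * sizeof prefix[0]);
    prefix[NPREF - 1] = word;
}
```

Starting from the prefix (It's, a) and shifting in 'new' leaves (a, new).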
Q10:
The dictionary in this C++ code is implemented by a map with the prefix as its key and a vector of strings as its value. It plays the same role as the hash table in the C version without us having to write a hash function. Note that std::map is an ordered container, typically implemented as a balanced search tree rather than a hash table.
Q11: They are stored in the vector of strings mentioned above. Each string is a suffix, and the vector is the value associated with the key, which is the prefix.
Q12: Advantage:
The advantage of this C++ implementation is the counterpart of the disadvantage of the C implementation. Because it stores each suffix as its own string in a vector, rather than having many pointers point to ONE memory location, a suffix in a particular vector in a particular state (or rather, key-value pair) can be altered without affecting suffixes with the same text in the vectors of other key-value pairs.
Q13: Disadvantage:
In contrast to the advantage of the C implementation, this one consumes more memory, because it stores multiple copies of the same word in different suffix vectors and also as prefixes in many places.
Q14: