Update Assignment2

Commit b78c726d authored 5 months ago by bgm47
--- a/2-StudentDB/a2-questions.md
+++ b/2-StudentDB/a2-questions.md
+## Assignment 2 Questions
+#### Directions
+Please answer the following questions and submit in your repo for the second assignment.  Please keep the answers as short and concise as possible.
+1. In this assignment I asked you to provide an implementation for the `get_student(...)` function because I think it improves the overall design of the database application.   After you implemented your solution, do you agree that externalizing `get_student(...)` into its own function is a good design strategy?  Briefly describe why or why not.
+    > **Answer**:  I believe that separating `get_student()` into its own dedicated function is an excellent solution. It increases code readability and maintainability by removing redundancy. Instead of duplicating the logic for retrieving student records in numerous locations, we can just call this function as needed. This keeps the code more structured and less cluttered, making future changes or upgrades easier without affecting other areas of the program.
+2. Another interesting aspect of the `get_student(...)` function is how its function prototype requires the caller to provide the storage for the `student_t` structure:
+    ```c
+    int get_student(int fd, int id, student_t *s);
+    ```
+    Notice that the last parameter is a pointer to storage **provided by the caller** to be used by this function to populate information about the desired student that is queried from the database file. This is a common convention in the `C` programming language (often called pass-by-reference, although strictly speaking C passes the pointer itself by value).
+    In other programming languages an approach like the one shown below would be more idiomatic for creating a function like `get_student()` (specifically the storage is provided by the `get_student(...)` function itself):
+    ```c
+    //Lookup student from the database
+    // IF FOUND: return pointer to student data
+    // IF NOT FOUND: return NULL
+    student_t *get_student(int fd, int id){
+        student_t student;
+        bool student_found = false;
+        //code that looks for the student and if
+        //found populates the student structure
+        //The student_found variable will be set
+        //to true if the student is in the database
+        //or false otherwise.
+        if (student_found)
+            return &student;
+        else
+            return NULL;
+    }
+    ```
+    Can you think of any reason why the above implementation would be a **very bad idea** using the C programming language?  Specifically, address why the above code introduces a subtle bug that could be hard to identify at runtime? 
+    > **ANSWER:** The main issue with the alternative method is that it returns a pointer to a local variable that is allocated on the stack. When the function returns, that variable's lifetime ends and its stack slot can be overwritten by subsequent function calls. The returned pointer is therefore a dangling pointer, and dereferencing it is undefined behaviour. The flaw is subtle and difficult to identify because it often does not crash immediately: the stale data may still look correct until a later call reuses the stack, at which point the program shows unpredictable behaviour or data corruption.
+3. Another way the `get_student(...)` function could be implemented is as follows:
+    ```c
+    //Lookup student from the database
+    // IF FOUND: return pointer to student data
+    // IF NOT FOUND or memory allocation error: return NULL
+    student_t *get_student(int fd, int id){
+        student_t *pstudent;
+        bool student_found = false;
+        pstudent = malloc(sizeof(student_t));
+        if (pstudent == NULL)
+            return NULL;
+        //code that looks for the student and if
+        //found populates the student structure
+        //The student_found variable will be set
+        //to true if the student is in the database
+        //or false otherwise.
+        if (student_found){
+            return pstudent;
+        }
+        else {
+            free(pstudent);
+            return NULL;
+        }
+    }
+    ```
+    In this implementation the storage for the student record is allocated on the heap using `malloc()` and passed back to the caller when the function returns. What do you think about this alternative implementation of `get_student(...)`?  Address in your answer why it would work, but also think about any potential problems it could cause.
+    > **ANSWER:** The implementation works because it dynamically allocates memory for a `student_t` structure on the heap, which outlives the function call; it returns the pointer if the student is found and frees the memory otherwise. While it handles memory allocation failures correctly and avoids leaking on the not-found path, it delegates responsibility for freeing the memory to the caller, which risks leaks if the caller forgets. Furthermore, frequent allocations can hurt performance, and because `malloc()` does not initialise memory, reading any field that was not populated is undefined behaviour. A better method is to let the caller provide a pre-allocated structure, which avoids unnecessary `malloc()` calls and keeps ownership clear.
+4. Let's take a look at how storage is managed for our simple database. Recall that all student records are stored on disk using the layout of the `student_t` structure (which has a size of 64 bytes).  Let's start with a fresh database by deleting the `student.db` file using the command `rm ./student.db`.  Now that we have an empty database, let's add a few students and see what is happening under the covers.  Consider the following sequence of commands:
+    ```bash
+    > ./sdbsc -a 1 john doe 345
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 128 Jan 17 10:01 ./student.db
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 3 jane doe 390
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 256 Jan 17 10:02 ./student.db
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 63 jim doe 285 
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 64 janet doe 310
+    > du -h ./student.db
+        8.0K    ./student.db
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 4160 Jan 17 10:03 ./student.db
+    ```
+    For this question I am asking you to perform some online research to investigate why there is a difference between the size of the file reported by the `ls` command and the actual storage used on the disk reported by the `du` command.  Understanding why this happens by design is important since all good systems programmers need to understand things like how Linux creates sparse files, and how Linux physically stores data on disk using fixed block sizes.  Some good Google searches to get you started: _"lseek syscall holes and sparse files"_, and _"linux file system blocks"_.  After you do some research please answer the following:
+    - Please explain why the file size reported by the `ls` command was 128 bytes after adding student with ID=1, 256 after adding student with ID=3, and 4160 after adding the student with ID=64? 
+        > **ANSWER:** The size reported by `ls` is the logical file size, which is the offset of the last byte written plus one; it is not rounded up to a block size. The reported sizes are consistent with each 64-byte record being stored at offset `id * 64`: the record for ID=1 occupies bytes 64-127 (size 128), ID=3 occupies bytes 192-255 (size 256), and ID=64 occupies bytes 4096-4159 (size 4160). The slots for IDs that were never written are holes that count toward the logical size even though nothing was stored in them.
+    -   Why did the total storage used on the disk remain unchanged when we added the student with ID=1, ID=3, and ID=63, but increased from 4K to 8K when we added the student with ID=64? 
+        > **ANSWER:** `du` reports allocated storage, and the file system allocates it in fixed-size blocks, which the 4.0K figure suggests are 4096 bytes here. The records for ID=1, ID=3, and ID=63 all lie within the first 4096 bytes (ID=63 ends exactly at byte 4095), so a single block holds them and usage stays at 4K. The record for ID=64 starts at offset 4096, the first byte of the second block, so a second block must be allocated and usage grows to 8K.
+    - Now let's add one more student with a large student ID number and see what happens:
+        ```bash
+        > ./sdbsc -a 99999 big dude 205 
+        > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 6400000 Jan 17 10:28 ./student.db
+        > du -h ./student.db
+        12K     ./student.db
+        ```
+        We see from above that adding a student with a very large student ID (ID=99999) increased the file size to 6400000 as shown by `ls`, but the raw storage only increased to 12K as reported by `du`.  Can you provide some insight into why this happened?
+        > **ANSWER:**  The record for ID=99999 is written at offset 99999 * 64 = 6,399,936, so the logical size that `ls` reports becomes 6,399,936 + 64 = 6,400,000 bytes. Nothing was ever written between the end of the ID=64 record and that offset, so Linux stores the file as a sparse file: the unwritten range is a hole that reads back as zeros but has no disk blocks allocated to it. Only the blocks that actually contain data are allocated, and the new record needs just one more 4K block, which is why `du` grows from 8K to only 12K.
\ No newline at end of file
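The `ls` versus `du` behaviour discussed in question 4 can be reproduced with a few system calls. The following is a minimal sketch, not the assignment's implementation: `record_offset` and `write_slot` are hypothetical names, the `id * 64` layout is inferred from the sizes in the transcript, and multiplying `st_blocks` by 512 assumes the Linux convention for that field.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64 /* size of student_t according to db.h */

/* Hypothetical helper: byte offset of the record slot for a student id. */
off_t record_offset(int id) {
    assert(id >= 0);
    return (off_t)id * RECORD_SIZE;
}

/*
 * Hypothetical helper: write one 64-byte dummy record into the slot for `id`,
 * then report the logical size (what ls shows) and the allocated bytes
 * (what du is based on). Returns 0 on success, -1 on any I/O error.
 */
int write_slot(const char *path, int id, off_t *logical, off_t *allocated) {
    char record[RECORD_SIZE];
    struct stat st;
    int fd;

    memset(record, 'x', sizeof record); /* non-zero filler data */

    fd = open(path, O_RDWR | O_CREAT, 0640);
    if (fd < 0)
        return -1;

    /* Seeking past the end and writing leaves a hole before the record. */
    if (lseek(fd, record_offset(id), SEEK_SET) < 0 ||
        write(fd, record, sizeof record) != (ssize_t)sizeof record ||
        fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    *logical = st.st_size;                  /* includes holes */
    *allocated = (off_t)st.st_blocks * 512; /* assumes 512-byte units */
    return 0;
}
```

Calling `write_slot("demo.db", 99999, &logical, &allocated)` on a fresh file should give a logical size of 6400000, while the allocated figure depends on the file system and is typically a small multiple of its block size.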
--- a/2-StudentDB/db.h
+++ b/2-StudentDB/db.h
+#ifndef __DB_H__
+    #define __DB_H__
+// Basic student database record.  Note:
+//  1. id must be > 0.  A student id==0 means the record has been deleted
+//  2. gpa is an int, should be between 0<=gpa<=500, real gpa is gpa/100.0 this
+//     simplifies dealing with floating point types
+//  3. Notice that the student struct was engineered to have a size of
+//     64 bytes.  There are reasons for using such a number
+typedef struct student{
+    int id;
+    char fname[24];
+    char lname[32];
+    int gpa; 
+} student_t;
+//Define limits for student ids and allowable GPA ranges.  Note GPA values will
+//be stored as integers but printed as floats.  For example a GPA of 450 is really
+//that value divided by 100.0 or 4.50.
+#define MIN_STD_ID      1
+#define MAX_STD_ID      100000
+#define MIN_STD_GPA     0
+#define MAX_STD_GPA     500
+//some useful constants you should consider using versus hard coding
+//in your program. 
+static const student_t EMPTY_STUDENT_RECORD = {0};
+static const int STUDENT_RECORD_SIZE  = sizeof(struct student);
+static const int DELETED_STUDENT_ID = 0;
+#define DB_FILE     "student.db"            //name of database file
+#define TMP_DB_FILE ".tmp_student.db"       //for extra credit
+#endif
\ No newline at end of file
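The two conventions documented in `db.h` (a 64-byte record, and a GPA stored as an integer that is 100 times the real value) can be checked in code. This is a small sketch under stated assumptions: the struct is copied from `db.h`, the size check assumes a 4-byte `int`, and `gpa_in_range`, `gpa_as_real`, and `format_gpa` are hypothetical helpers that do not appear in the assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy of the record layout from db.h, repeated so the sketch stands alone. */
typedef struct student {
    int  id;
    char fname[24];
    char lname[32];
    int  gpa;
} student_t;

/* Fails the build if the record is not 64 bytes (assumes a 4-byte int). */
static_assert(sizeof(student_t) == 64, "student_t must be exactly 64 bytes");

#define MIN_STD_GPA 0
#define MAX_STD_GPA 500

/* Hypothetical helper: true when the stored integer GPA is within limits. */
bool gpa_in_range(int gpa) {
    return gpa >= MIN_STD_GPA && gpa <= MAX_STD_GPA;
}

/* Hypothetical helper: the real GPA is the stored integer divided by 100.0. */
double gpa_as_real(int gpa) {
    return gpa / 100.0;
}

/* Hypothetical helper: render the stored GPA with two decimal places.
 * Returns the number of characters snprintf would have written. */
int format_gpa(int gpa, char *buf, size_t len) {
    return snprintf(buf, len, "%.2f", gpa_as_real(gpa));
}
```

Keeping the GPA as an integer means range checks and comparisons never touch floating point; the division by 100.0 happens only at print time.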
--- a/2-StudentDB/dblayout.png
+++ b/2-StudentDB/dblayout.png
--- a/2-StudentDB/makefile
+++ b/2-StudentDB/makefile
+# Compiler settings
+CC = gcc
+CFLAGS = -Wall -Wextra -g
+# Target executable name
+TARGET = sdbsc
+# Find all source and header files
+SRCS = $(wildcard *.c)
+HDRS = $(wildcard *.h)
+# Default target
+all: $(TARGET)
+# Compile source to executable
+$(TARGET): $(SRCS) $(HDRS)
+	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS)
+# Clean up build files
+clean:
+	rm -f $(TARGET)
+	rm -f student.db
+test:
+	./test.sh
+# Phony targets
+.PHONY: all clean test
\ No newline at end of file
--- a/2-StudentDB/sdbsc
+++ b/2-StudentDB/sdbsc
--- a/2-StudentDB/sdbsc.c
+++ b/2-StudentDB/sdbsc.c
--- a/2-StudentDB/sdbsc1.h
+++ b/2-StudentDB/sdbsc1.h
+#ifndef __SDB_H__
+    #define __SDB_H__
+#include <stdbool.h> //bool is used by the open_db prototype
+#include "db.h" //get student record type
+//prototypes for functions go below for this assignment
+int open_db(char *dbFile, bool should_truncate);
+int add_student(int fd, int id, char *fname, char *lname, int gpa);
+int get_student(int fd, int id, student_t *s);
+int del_student(int fd, int id);
+int compress_db(int fd);
+void print_student(student_t *s);
+int validate_range(int id, int gpa);
+int count_db_records(int fd);
+int print_db(int fd);
+void usage(char *);
+//error codes to be returned from individual functions
+// NO_ERROR is returned if there are no errors
+// ERR_DB_FILE is returned if there are any issues with the database file itself
+// ERR_DB_OP is returned if an operation did not work, e.g. adding or deleting a student
+// SRCH_NOT_FOUND is returned if the student is not found (get_student, and del_student)
+#define NO_ERROR        0
+#define ERR_DB_FILE     -1
+#define ERR_DB_OP       -2
+#define SRCH_NOT_FOUND  -3
+#define NOT_IMPLEMENTED_YET 0
+//error codes to be returned to the shell
+// EXIT_OK          program executed without error
+// EXIT_FAIL_DB     a database operation failed
+// EXIT_FAIL_ARGS   one or more arguments to program were not valid
+// EXIT_NOT_IMPL    the operation has not been implemented yet
+#define EXIT_OK         0
+#define EXIT_FAIL_DB    1
+#define EXIT_FAIL_ARGS  2
+#define EXIT_NOT_IMPL   3
+//Output messages
+#define M_ERR_STD_RNG     "Cant add student, either ID or GPA out of allowable range!\n"
+#define M_ERR_DB_CREATE   "Error creating DB file, exiting!\n"
+#define M_ERR_DB_OPEN     "Error opening DB file, exiting!\n"
+#define M_ERR_DB_READ     "Error reading DB file, exiting!\n"
+#define M_ERR_DB_WRITE    "Error writing DB file, exiting!\n"
+#define M_ERR_DB_ADD_DUP  "Cant add student with ID=%d, already exists in db.\n"
+#define M_ERR_STD_PRINT   "Cant print student. Student is NULL or ID is zero\n"
+#define M_STD_ADDED       "Student %d added to database.\n"
+#define M_STD_DEL_MSG     "Student %d was deleted from database.\n"
+#define M_STD_NOT_FND_MSG "Student %d was not found in database.\n"
+#define M_DB_COMPRESSED_OK "Database successfully compressed!\n"
+#define M_DB_ZERO_OK      "All database records removed!\n"
+#define M_DB_EMPTY        "Database contains no student records.\n"
+#define M_DB_RECORD_CNT   "Database contains %d student record(s).\n"
+#define M_NOT_IMPL        "The requested operation is not implemented yet!\n"
+//useful format strings for printing students
+//For example to print the header in the required output:
+//  printf(STUDENT_PRINT_HDR_STRING, "ID","FIRST_NAME", 
+//                                   "LAST_NAME", "GPA");
+#define  STUDENT_PRINT_HDR_STRING   "%-6s %-24s %-32s %-3s\n"
+#define  STUDENT_PRINT_FMT_STRING   "%-6d %-24.24s %-32.32s %-3.2f\n"
+#endif
\ No newline at end of file
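The `get_student` prototype above takes caller-provided storage, as discussed in questions 2 and 3 of `a2-questions.md`. The following is a minimal sketch of how such a lookup could be written; it is not the code in `sdbsc.c` (that diff is empty here). The name `get_student_sketch` is hypothetical, the struct and error codes are copied from the headers, and the `id * sizeof(student_t)` layout is an assumption inferred from the file sizes in the assignment.

```c
#include <assert.h>
#include <fcntl.h> /* open() flags for callers of this sketch */
#include <sys/types.h>
#include <unistd.h>

/* Copies of the definitions in db.h and sdbsc1.h so the sketch stands alone. */
typedef struct student {
    int  id;
    char fname[24];
    char lname[32];
    int  gpa;
} student_t;

#define NO_ERROR            0
#define ERR_DB_FILE        -1
#define SRCH_NOT_FOUND     -3
#define DELETED_STUDENT_ID  0

/*
 * Hypothetical lookup: read the slot for `id` into a local record and copy it
 * to the caller's storage only when a live record is found. The caller owns
 * `*s`, so nothing is allocated here and no pointer to a local is returned.
 */
int get_student_sketch(int fd, int id, student_t *s) {
    student_t rec;
    ssize_t n;

    assert(s != NULL);

    /* Assumed layout: record for `id` starts at id * sizeof(student_t). */
    if (lseek(fd, (off_t)id * (off_t)sizeof rec, SEEK_SET) < 0)
        return ERR_DB_FILE;

    n = read(fd, &rec, sizeof rec);
    if (n < 0)
        return ERR_DB_FILE;

    /* A short read means the slot lies past end of file; an id of zero means
     * the slot is a hole or a deleted record. Either way, nothing is there. */
    if (n != (ssize_t)sizeof rec || rec.id == DELETED_STUDENT_ID)
        return SRCH_NOT_FOUND;

    *s = rec;
    return NO_ERROR;
}
```

A caller would declare `student_t s;` on its own stack and pass `&s`, so the record stays valid for as long as the caller needs it and there is nothing to `free()`.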
--- a/2-StudentDB/student.db
+++ b/2-StudentDB/student.db
--- a/2-StudentDB/test.sh
+++ b/2-StudentDB/test.sh
+#!/usr/bin/env bats
+# The setup_file function runs once, before the first test in this file
+setup_file() {
+    # Delete the student.db file if it exists
+    if [ -f "student.db" ]; then
+        rm "student.db"
+    fi
+}
+@test "Check if database is empty to start" {
+    run ./sdbsc -p
+    [ "$status" -eq 0 ]
+    [ "$output" = "Database contains no student records." ]
+}
+@test "Add a student 1 to db" {
+    run ./sdbsc -a 1      john doe 345
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 1 added to database." ]
+}
+@test "Add more students to db" {
+    run ./sdbsc -a 3      jane  doe  390
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 3 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 63     jim   doe  285
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 63 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 64     janet doe  310
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 64 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 99999  big   dude 205
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 99999 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Check student count" {
+    run ./sdbsc -c
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database contains 5 student record(s)." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Make sure adding duplicate student fails" {
+    run ./sdbsc -a 63 dup student 300
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Cant add student with ID=63, already exists in db." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Make sure the file size is correct at this time" {
+    run stat --format="%s" ./student.db
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "6400000" ] || {
+        echo "Failed Output:  $output"
+        echo "Expected: 6400000"
+        return 1
+    }
+}
+@test "Find student 3 in db" {
+    run ./sdbsc -f 3
+    # Ensure the command ran successfully
+    [ "$status" -eq 0 ]
+    # Use echo with -n to avoid adding extra newline and normalize spaces
+    normalized_output=$(echo -n "${lines[1]}" | tr -s '[:space:]' ' ')
+    # Define the expected output
+    expected_output="3 jane doe 3.90"
+    # Compare the normalized output with the expected output
+    [ "$normalized_output" = "$expected_output" ] || {
+        echo "Failed Output:  $normalized_output"
+        echo "Expected: $expected_output"
+        return 1
+    }
+}
+@test "Try looking up non-existent student" {
+    run ./sdbsc -f 4
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Student 4 was not found in database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Delete student 64 in db" {
+    run ./sdbsc -d 64
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 64 was deleted from database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Try deleting non-existent student" {
+    run ./sdbsc -d 65
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Student 65 was not found in database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Check student count again, should be 4 now" {
+    run ./sdbsc -c
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database contains 4 student record(s)." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Print student records" {
+    # Run the command
+    run ./sdbsc -p
+    # Ensure the command ran successfully
+    [ "$status" -eq 0 ]
+    # Normalize the output by replacing multiple spaces with a single space
+    normalized_output=$(echo -n "$output" | tr -s '[:space:]' ' ')
+    # Define the expected output (normalized)
+    expected_output="ID FIRST_NAME LAST_NAME GPA 1 john doe 3.45 3 jane doe 3.90 63 jim doe 2.85 99999 big dude 2.05"
+    # Compare the normalized output
+    [ "$normalized_output" = "$expected_output" ] || {
+        echo "Failed Output: $normalized_output"
+        echo "Expected Output: $expected_output"
+        return 1
+    }
+}
+@test "Compress db - try 1" {
+    skip
+    run ./sdbsc -x
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database successfully compressed!" ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Delete student 99999 in db" {
+    run ./sdbsc -d 99999
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 99999 was deleted from database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Compress db again - try 2" {
+    run ./sdbsc -x
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database successfully compressed!" ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
\ No newline at end of file
--- a/2-StudentDB/testload.sh
+++ b/2-StudentDB/testload.sh
+#! /bin/bash
+./sdbsc -a 1      john doe 345
+./sdbsc -a 3      jane  doe  390
+./sdbsc -a 63     jim   doe  285
+./sdbsc -a 64     janet doe  310
+./sdbsc -a 99999  big   dude 205
\ No newline at end of file