Assignment 2

Commit d530cd9d authored 5 months ago by vht24
--- a/Assignment2/a2-questions.md
+++ b/Assignment2/a2-questions.md
+## Assignment 2 Questions
+#### Directions
+Please answer the following questions and submit in your repo for the second assignment.  Please keep the answers as short and concise as possible.
+1. In this assignment I asked you to provide an implementation for the `get_student(...)` function because I think it improves the overall design of the database application. After you implemented your solution, do you agree that externalizing `get_student(...)` into its own function is a good design strategy? Briefly describe why or why not.
+    > **ANSWER:** Yes, externalizing `get_student(...)` is a good choice. It puts the logic for locating and reading a student record in one place, which removes duplicated lookup code and makes lookups easier to use correctly. Operations that need a record, such as deleting a student or finding one and handing it to `print_student()`, can reuse it, so every caller gets the same not-found and error handling.
+2. Another interesting aspect of the `get_student(...)` function is how its function prototype requires the caller to provide the storage for the `student_t` structure:
+    ```c
+    int get_student(int fd, int id, student_t *s);
+    ```
+    Notice that the last parameter is a pointer to storage **provided by the caller** to be used by this function to populate information about the desired student that is queried from the database file. This is a common convention (called pass-by-reference) in the `C` programming language. 
+    In other programming languages an approach like the one shown below would be more idiomatic for creating a function like `get_student()` (specifically the storage is provided by the `get_student(...)` function itself):
+    ```c
+    //Lookup student from the database
+    // IF FOUND: return pointer to student data
+    // IF NOT FOUND: return NULL
+    student_t *get_student(int fd, int id){
+        student_t student;
+        bool student_found = false;
+        //code that looks for the student and if
+        //found populates the student structure
+        //The student_found variable will be set
+        //to true if the student is in the database
+        //or false otherwise.
+        if (student_found)
+            return &student;
+        else
+            return NULL;
+    }
+    ```
+    Can you think of any reason why the above implementation would be a **very bad idea** using the C programming language?  Specifically, address why the above code introduces a subtle bug that could be hard to identify at runtime? 
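+    > **ANSWER:** This implementation is dangerous because `student` is a local (automatic) variable stored in `get_student()`'s stack frame. When the function returns, that frame is released, so `&student` becomes a dangling pointer and dereferencing it is undefined behavior. The bug is subtle because the old bytes often survive until the next function call reuses that stack space, so the code can appear to work in simple tests and then return corrupted data later.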
+3. Another way the `get_student(...)` function could be implemented is as follows:
+    ```c
+    //Lookup student from the database
+    // IF FOUND: return pointer to student data
+    // IF NOT FOUND or memory allocation error: return NULL
+    student_t *get_student(int fd, int id){
+        student_t *pstudent;
+        bool student_found = false;
+        pstudent = malloc(sizeof(student_t));
+        if (pstudent == NULL)
+            return NULL;
+        //code that looks for the student and if
+        //found populates the student structure
+        //The student_found variable will be set
+        //to true if the student is in the database
+        //or false otherwise.
+        if (student_found){
+            return pstudent;
+        }
+        else {
+            free(pstudent);
+            return NULL;
+        }
+    }
+    ```
+    In this implementation the storage for the student record is allocated on the heap using `malloc()` and passed back to the caller when the function returns. What do you think about this alternative implementation of `get_student(...)`? Address in your answer why it would work, but also think about any potential problems it could cause.
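+    > **ANSWER:** This approach works because `malloc()` allocates the record on the heap, so it remains valid after the function returns. However, it creates new problems:
+    > - Every caller must remember to `free()` the returned pointer; forgetting to do so leaks memory.
+    > - A `NULL` return is ambiguous: the caller cannot tell "student not found" apart from "allocation failed".
+    > - If the function is called frequently, the repeated heap allocations add overhead.
+    >
+    > Having the caller provide the storage, as the current `int get_student(int fd, int id, student_t *s)` prototype does, avoids these issues: the caller can use a stack variable, no cleanup is needed, and the return code can distinguish not-found from errors.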
+4. Let's take a look at how storage is managed for our simple database. Recall that all student records are stored on disk using the layout of the `student_t` structure (which has a size of 64 bytes). Let's start with a fresh database by deleting the `student.db` file using the command `rm ./student.db`. Now that we have an empty database, let's add a few students and see what is happening under the covers. Consider the following sequence of commands:
+    ```bash
+    > ./sdbsc -a 1 john doe 345
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 128 Jan 17 10:01 ./student.db
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 3 jane doe 390
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 256 Jan 17 10:02 ./student.db
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 63 jim doe 285 
+    > du -h ./student.db
+        4.0K    ./student.db
+    > ./sdbsc -a 64 janet doe 310
+    > du -h ./student.db
+        8.0K    ./student.db
+    > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 4160 Jan 17 10:03 ./student.db
+    ```
+    For this question I am asking you to perform some online research to investigate why there is a difference between the size of the file reported by the `ls` command and the actual storage used on the disk reported by the `du` command. Understanding why this happens by design is important since all good systems programmers need to understand things like how Linux creates sparse files and how Linux physically stores data on disk using fixed block sizes. Some good Google searches to get you started: _"lseek syscall holes and sparse files"_, and _"linux file system blocks"_. After you do some research, please answer the following:
+    - Please explain why the file size reported by the `ls` command was 128 bytes after adding student with ID=1, 256 after adding student with ID=3, and 4160 after adding the student with ID=64? 
+        > **ANSWER:** Each student record is 64 bytes and is written at byte offset `id * sizeof(student_t)`, so the file always ends right after the record with the largest ID: `ls` reports (largest ID + 1) * 64 bytes.
+        > - ID=1 → record at offset 1 * 64 = 64, so the file size is 64 + 64 = 128 bytes.
+        > - ID=3 → record at offset 3 * 64 = 192, so the file size is 192 + 64 = 256 bytes.
+        > - ID=64 → record at offset 64 * 64 = 4096, so the file size is 4096 + 64 = 4160 bytes.
+        >
+        > The skipped slots (for example, ID 0 and ID 2) are never written and read back as zeros. `ls` counts them in the logical size, but the filesystem only allocates blocks that contain written data, which is why `du` can report less than `ls`.
+    -   Why did the total storage used on the disk remain unchanged when we added the student with ID=1, ID=3, and ID=63, but increased from 4K to 8K when we added the student with ID=64? 
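+        > **ANSWER:** Assuming 4 KB (4096-byte) filesystem blocks, bytes 0 to 4095 of the file share the first block. The record for ID=63 occupies bytes 4032 to 4095, the last 64 bytes of that block, so IDs 1, 3, and 63 all live in the first block and `du` stays at 4.0K. The record for ID=64 starts at byte 4096, the first byte of the second block, so the filesystem allocates a new block and `du` reports 8.0K.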
+    - Now let's add one more student with a large student ID number and see what happens:
+        ```bash
+        > ./sdbsc -a 99999 big dude 205 
+        > ls -l ./student.db
+        -rw-r----- 1 bsm23 bsm23 6400000 Jan 17 10:28 ./student.db
+        > du -h ./student.db
+        12K     ./student.db
+        ```
+        We see from above that adding a student with a very large student ID (ID=99999) increased the file size to 6400000 bytes as shown by `ls`, but the raw storage only increased to 12K as reported by `du`. Can you provide some insight into why this happened?
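+        > **ANSWER:** `ls` reports the logical file size: the record for ID=99999 starts at offset 99999 * 64 = 6399936 and ends at 6400000. `du` reports only the blocks actually allocated. That record fits in a single new 4 KB block, and the roughly 6.4 MB between the block holding ID=64 and that new block is a hole that was never written, so no blocks are allocated for it and disk usage grows by just one block, from 8K to 12K.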
\ No newline at end of file
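The byte offsets in the question 4 answers all follow from one rule: record `id` starts at `id * 64`. Below is a minimal C sketch of that arithmetic. The helper names (`record_offset`, `logical_size`, `block_index`) are hypothetical, not functions from `sdbsc.c`, and the 4096-byte block size is an assumption inferred from the `du` output (it varies by filesystem):

```c
#include <assert.h>

/* Assumptions: 64-byte records (sizeof(student_t) in db.h) and a
 * filesystem that allocates space in 4096-byte blocks. */
#define RECORD_SIZE   64L
#define FS_BLOCK_SIZE 4096L

/* Byte offset where the record for `id` starts. */
long record_offset(int id) {
    assert(id >= 0);
    return (long)id * RECORD_SIZE;
}

/* Logical file size (what `ls -l` shows) when `largest_id` is the
 * highest ID written: the file ends right after that record. */
long logical_size(int largest_id) {
    return record_offset(largest_id) + RECORD_SIZE;
}

/* Index of the filesystem block that holds the record for `id`.
 * Records start at multiples of 64 and 4096 is a multiple of 64,
 * so a record never straddles two blocks. */
long block_index(int id) {
    return record_offset(id) / FS_BLOCK_SIZE;
}
```

For the IDs in question 4, `block_index()` gives 0 for IDs 1, 3, and 63, 1 for ID 64, and 1562 for ID 99999: three distinct blocks, which lines up with the 4.0K, 8.0K, and 12K reported by `du`.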
--- a/Assignment2/db.h
+++ b/Assignment2/db.h
+#ifndef __DB_H__
+    #define __DB_H__
+// Basic student database record.  Note:
+//  1. id must be > 0.  A student id==0 means the record has been deleted
+//  2. gpa is an int, should be between 0<=gpa<=500, real gpa is gpa/100.0 this
+//     simplifies dealing with floating point types
+//  3. Notice that the student struct was engineered to have a size of
+//     64 bytes. 64 is a power of two, so a 4096-byte disk block holds
+//     exactly 64 records and no record straddles a block boundary.
+typedef struct student{
+    int id;
+    char fname[24];
+    char lname[32];
+    int gpa; 
+} student_t;
+//Define limits for student ids and allowable GPA ranges.  Note GPA values will
+//be stored as integers but printed as floats.  For example a GPA of 450 is really
+//that value divided by 100.0 or 4.50.
+#define MIN_STD_ID      1
+#define MAX_STD_ID      100000
+#define MIN_STD_GPA     0
+#define MAX_STD_GPA     500
+//some useful constants you should consider using versus hard coding
+//in your program. 
+static const student_t EMPTY_STUDENT_RECORD = {0};
+static const int STUDENT_RECORD_SIZE  = sizeof(struct student);
+static const int DELETED_STUDENT_ID = 0;
+#define DB_FILE     "student.db"            //name of database file
+#define TMP_DB_FILE ".tmp_student.db"       //for extra credit
+#endif
\ No newline at end of file
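`db.h` says `student_t` was engineered to be exactly 64 bytes and that GPAs are stored as integer hundredths. A minimal sketch that turns both rules into checks, assuming a C11 compiler for `_Static_assert`. `gpa_to_double()` and `in_allowed_range()` are hypothetical helpers; `sdbsc.h` declares `validate_range()` for this job, but its return convention is not shown in this commit, so the sketch does not imitate it:

```c
#include <assert.h>
#include <stdbool.h>

/* Copy of the record layout and limits from db.h. */
typedef struct student {
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;

#define MIN_STD_ID  1
#define MAX_STD_ID  100000
#define MIN_STD_GPA 0
#define MAX_STD_GPA 500

/* Compile-time check of the 64-byte layout (C11). It fails the build if
 * a field change or padding alters the size; assumes a 4-byte int. */
_Static_assert(sizeof(student_t) == 64, "student_t must be 64 bytes");

/* Hypothetical helper: stored GPA 345 is displayed as 3.45. */
double gpa_to_double(int gpa) {
    assert(gpa >= MIN_STD_GPA && gpa <= MAX_STD_GPA);
    return gpa / 100.0;
}

/* Hypothetical range check; treats both ends of each range as inclusive,
 * which db.h states for GPA but only implies for IDs. */
bool in_allowed_range(int id, int gpa) {
    return id >= MIN_STD_ID && id <= MAX_STD_ID &&
           gpa >= MIN_STD_GPA && gpa <= MAX_STD_GPA;
}
```

With the `_Static_assert` in place, adding a field to `student_t` would fail to compile instead of silently shifting every record offset in `student.db`.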
--- a/Assignment2/dblayout.png
+++ b/Assignment2/dblayout.png
--- a/Assignment2/makefile
+++ b/Assignment2/makefile
+# Compiler settings
+CC = gcc
+CFLAGS = -Wall -Wextra -g
+# Target executable name
+TARGET = sdbsc
+# Find all source and header files
+SRCS = $(wildcard *.c)
+HDRS = $(wildcard *.h)
+# Default target
+all: $(TARGET)
+# Compile source to executable
+$(TARGET): $(SRCS) $(HDRS)
+	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS)
+# Clean up build files
+clean:
+	rm -f $(TARGET)
+	rm -f student.db
+test:
+	./test.sh
+# Phony targets
+.PHONY: all clean test
\ No newline at end of file
--- a/Assignment2/sdbsc
+++ b/Assignment2/sdbsc
--- a/Assignment2/sdbsc.c
+++ b/Assignment2/sdbsc.c
--- a/Assignment2/sdbsc.h
+++ b/Assignment2/sdbsc.h
+#ifndef __SDB_H__
+#define __SDB_H__
+#include <stdbool.h> //bool is used in the open_db() prototype
+#include "db.h" //get student record type
+//prototypes for functions go below for this assignment
+int open_db(char *dbFile, bool should_truncate);
+int add_student(int fd, int id, char *fname, char *lname, int gpa);
+int get_student(int fd, int id, student_t *s);
+int del_student(int fd, int id);
+int compress_db(int fd);
+void print_student(student_t *s);
+int validate_range(int id, int gpa);
+int count_db_records(int fd);
+int print_db(int fd);
+void usage(char *);
+//error codes to be returned from individual functions
+// NO_ERROR is returned if there are no errors
+// ERR_DB_FILE is returned if there are any issues with the database file itself
+// ERR_DB_OP is returned if an operation did not work aka add or delete a student
+// SRCH_NOT_FOUND is returned if the student is not found (get_student, and del_student)
+#define NO_ERROR        0
+#define ERR_DB_FILE     -1
+#define ERR_DB_OP       -2
+#define SRCH_NOT_FOUND  -3
+#define NOT_IMPLEMENTED_YET 0
+//error codes to be returned to the shell
+// EXIT_OK          program executed without error
+// EXIT_FAIL_DB     a database operation failed
+// EXIT_FAIL_ARGS   one or more arguments to program were not valid
+// EXIT_NOT_IMPL    the operation has not been implemented yet
+#define EXIT_OK         0
+#define EXIT_FAIL_DB    1
+#define EXIT_FAIL_ARGS  2
+#define EXIT_NOT_IMPL   3
+//Output messages
+#define M_ERR_STD_RNG     "Cant add student, either ID or GPA out of allowable range!\n"
+#define M_ERR_DB_CREATE   "Error creating DB file, exiting!\n"
+#define M_ERR_DB_OPEN     "Error opening DB file, exiting!\n"
+#define M_ERR_DB_READ     "Error reading DB file, exiting!\n"
+#define M_ERR_DB_WRITE    "Error writing DB file, exiting!\n"
+#define M_ERR_DB_ADD_DUP  "Cant add student with ID=%d, already exists in db.\n"
+#define M_ERR_STD_PRINT   "Cant print student. Student is NULL or ID is zero\n"
+#define M_STD_ADDED       "Student %d added to database.\n"
+#define M_STD_DEL_MSG     "Student %d was deleted from database.\n"
+#define M_STD_NOT_FND_MSG "Student %d was not found in database.\n"
+#define M_DB_COMPRESSED_OK "Database successfully compressed!\n"
+#define M_DB_ZERO_OK      "All database records removed!\n"
+#define M_DB_EMPTY        "Database contains no student records.\n"
+#define M_DB_RECORD_CNT   "Database contains %d student record(s).\n"
+#define M_NOT_IMPL        "The requested operation is not implemented yet!\n"
+//useful format strings for print students
+//For example to print the header in the required output:
+//  printf(STUDENT_PRINT_HDR_STRING, "ID", "FIRST_NAME",
+//                                   "LAST_NAME", "GPA");
+#define  STUDENT_PRINT_HDR_STRING   "%-6s %-24s %-32s %-3s\n"
+#define  STUDENT_PRINT_FMT_STRING   "%-6d %-24.24s %-32.32s %-3.2f\n"
+#endif
\ No newline at end of file
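The prototype `int get_student(int fd, int id, student_t *s);` above keeps storage ownership with the caller, which is the design defended in questions 1 and 2. A minimal sketch of an implementation, assuming record `id` is stored at offset `id * sizeof(student_t)` and that a slot whose `id` field is 0 (deleted, or a never-written hole) means not found. `sdbsc.c` is not shown in this commit view, so this is not the submitted code; it uses `pread()` in place of an `lseek()` + `read()` pair, and `find_and_print()` is a hypothetical caller that keeps the record in its own stack frame:

```c
#define _XOPEN_SOURCE 700   /* exposes pread() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout from db.h and return codes from sdbsc.h. */
typedef struct student {
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;

#define NO_ERROR        0
#define ERR_DB_FILE    -1
#define SRCH_NOT_FOUND -3

/* Sketch: copy the record for `id` into caller-provided storage `*s`.
 * Assumes record `id` is stored at offset id * sizeof(student_t) and that
 * a slot whose id field is 0 (deleted or never written) means not found. */
int get_student(int fd, int id, student_t *s) {
    assert(s != NULL);
    if (id < 1)
        return SRCH_NOT_FOUND;
    off_t offset = (off_t)id * (off_t)sizeof(student_t);
    ssize_t n = pread(fd, s, sizeof(student_t), offset);
    if (n < 0)
        return ERR_DB_FILE;                  /* the read itself failed */
    if (n < (ssize_t)sizeof(student_t) || s->id != id)
        return SRCH_NOT_FOUND;               /* past EOF, hole, or deleted */
    return NO_ERROR;
}

/* Hypothetical caller: the record lives in this function's stack frame,
 * which stays valid for as long as the record is used. */
void find_and_print(int fd, int id) {
    student_t s;
    int rc = get_student(fd, id, &s);
    if (rc == NO_ERROR)
        printf("%-6d %-24.24s %-32.32s %-3.2f\n",
               s.id, s.fname, s.lname, s.gpa / 100.0);
    else if (rc == SRCH_NOT_FOUND)
        printf("Student %d was not found in database.\n", id);
    else
        printf("Error reading DB file, exiting!\n");
}
```

Because `find_and_print()` owns `s`, there is no dangling pointer (question 2) and nothing to `free()` (question 3), and the three return codes let the caller tell a missing student apart from a file error.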
--- a/Assignment2/student.db
+++ b/Assignment2/student.db
--- a/Assignment2/test.sh
+++ b/Assignment2/test.sh
+#!/usr/bin/env bats
+# setup_file runs once, before the first test in this file
+setup_file() {
+    # Delete the student.db file if it exists
+    if [ -f "student.db" ]; then
+        rm "student.db"
+    fi
+}
+@test "Check if database is empty to start" {
+    run ./sdbsc -p
+    [ "$status" -eq 0 ]
+    [ "$output" = "Database contains no student records." ]
+}
+@test "Add a student 1 to db" {
+    run ./sdbsc -a 1      john doe 345
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 1 added to database." ]
+}
+@test "Add more students to db" {
+    run ./sdbsc -a 3      jane  doe  390
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 3 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 63     jim   doe  285
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 63 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 64     janet doe  310
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 64 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+    run ./sdbsc -a 99999  big   dude 205
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 99999 added to database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Check student count" {
+    run ./sdbsc -c
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database contains 5 student record(s)." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Make sure adding duplicate student fails" {
+    run ./sdbsc -a 63 dup student 300
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Cant add student with ID=63, already exists in db." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Make sure the file size is correct at this time" {
+    run stat --format="%s" ./student.db
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "6400000" ] || {
+        echo "Failed Output:  $output"
+        echo "Expected: 6400000"
+        return 1
+    }
+}
+@test "Find student 3 in db" {
+    run ./sdbsc -f 3
+    # Ensure the command ran successfully
+    [ "$status" -eq 0 ]
+    # Use echo with -n to avoid adding extra newline and normalize spaces
+    normalized_output=$(echo -n "${lines[1]}" | tr -s '[:space:]' ' ')
+    # Define the expected output
+    expected_output="3 jane doe 3.90"
+    # Compare the normalized output with the expected output
+    [ "$normalized_output" = "$expected_output" ] || {
+        echo "Failed Output:  $normalized_output"
+        echo "Expected: $expected_output"
+        return 1
+    }
+}
+@test "Try looking up non-existent student" {
+    run ./sdbsc -f 4
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Student 4 was not found in database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Delete student 64 in db" {
+    run ./sdbsc -d 64
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 64 was deleted from database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Try deleting non-existent student" {
+    run ./sdbsc -d 65
+    [ "$status" -eq 1 ]  || {
+        echo "Expecting status of 1, got:  $status"
+        return 1
+    }
+    [ "${lines[0]}" = "Student 65 was not found in database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Check student count again, should be 4 now" {
+    run ./sdbsc -c
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database contains 4 student record(s)." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Print student records" {
+    # Run the command
+    run ./sdbsc -p
+    # Ensure the command ran successfully
+    [ "$status" -eq 0 ]
+    # Normalize the output by replacing multiple spaces with a single space
+    normalized_output=$(echo -n "$output" | tr -s '[:space:]' ' ')
+    # Define the expected output (normalized)
+    expected_output="ID FIRST_NAME LAST_NAME GPA 1 john doe 3.45 3 jane doe 3.90 63 jim doe 2.85 99999 big dude 2.05"
+    # Compare the normalized output
+    [ "$normalized_output" = "$expected_output" ] || {
+        echo "Failed Output: $normalized_output"
+        echo "Expected Output: $expected_output"
+        return 1
+    }
+}
+@test "Compress db - try 1" {
+    skip
+    run ./sdbsc -x
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database successfully compressed!" ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Delete student 99999 in db" {
+    run ./sdbsc -d 99999
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Student 99999 was deleted from database." ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
+@test "Compress db again - try 2" {
+    run ./sdbsc -x
+    [ "$status" -eq 0 ]
+    [ "${lines[0]}" = "Database successfully compressed!" ] || {
+        echo "Failed Output:  $output"
+        return 1
+    }
+}
\ No newline at end of file
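The `stat --format="%s"` test above expects 6400000 bytes, the same logical size discussed in question 4. The following sketch reproduces the `ls` versus `du` difference without `sdbsc`, assuming a POSIX system: it writes a 64-byte record at the offset for an ID with `pwrite()` and reads `st_size` and `st_blocks` back with `fstat()`. `report_sparse_write()` is a hypothetical helper, and whether the skipped range becomes a hole depends on the filesystem:

```c
#define _XOPEN_SOURCE 700   /* exposes pwrite() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64   /* sizeof(student_t) in db.h */

/* Hypothetical helper: write one 64-byte record for `id` at offset
 * id * RECORD_SIZE, then print and return the logical file size.
 * Any range skipped by the write is left unwritten (a potential hole).
 * Returns -1 on error. */
off_t report_sparse_write(int fd, int id) {
    assert(id >= 0);
    unsigned char record[RECORD_SIZE] = {1};  /* first byte 1, rest 0 */
    off_t offset = (off_t)id * RECORD_SIZE;
    if (pwrite(fd, record, sizeof record, offset) != (ssize_t)sizeof record)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    /* st_size is what `ls -l` and `stat --format=%s` print; st_blocks
     * counts allocated 512-byte units on Linux, which is what `du` sums. */
    printf("id=%d: size=%lld bytes, allocated=%lld bytes\n",
           id, (long long)st.st_size, (long long)st.st_blocks * 512);
    return st.st_size;
}
```

On a filesystem that supports holes (ext4, XFS, and tmpfs do), the allocated figure should stay far below the logical size after writing ID 99999; the exact number depends on the filesystem, so only `st_size` is predictable.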
--- a/Assignment2/testload.sh
+++ b/Assignment2/testload.sh
+#! /bin/bash
+./sdbsc -a 1      john doe 345
+./sdbsc -a 3      jane  doe  390
+./sdbsc -a 63     jim   doe  285
+./sdbsc -a 64     janet doe  310
+./sdbsc -a 99999  big   dude 205
\ No newline at end of file
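`db.h` stores GPAs as integer hundredths (`345` means 3.45), so an argument such as `3.45` is not a valid GPA for `sdbsc -a`. How `sdbsc.c` converts its arguments is not shown in this commit; if it used `atoi()`, `3.45` would silently become `3` (a GPA of 0.03). A hedged sketch of a stricter conversion with `strtol()`, where `parse_int_arg()` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole decimal integer argument.
 * Returns false for empty input, trailing characters (e.g. "3.45"),
 * or values outside the int range. */
bool parse_int_arg(const char *arg, int *out) {
    assert(arg != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')     /* nothing parsed, or junk after digits */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;
    *out = (int)v;
    return true;
}
```

A caller could then reject a malformed argument, for example by printing a usage message and exiting with `EXIT_FAIL_ARGS` from `sdbsc.h`, instead of storing a wrong GPA.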