Commit 616ed4ba authored by Wendy Nguyen

Revert "update"

This reverts commit a2d0be3a.
parent a2d0be3a
## Assignment 2 Questions
#### Directions
Please answer the following questions and submit in your repo for the second assignment. Please keep the answers as short and concise as possible.
1. In this assignment I asked you to provide an implementation for the `get_student(...)` function because I think it improves the overall design of the database application. After you implemented your solution, do you agree that externalizing `get_student(...)` into its own function is a good design strategy? Briefly describe why or why not.
> **Answer**: Yes, externalizing `get_student()` improves the design because:
>
> 1. It avoids duplicating file access code across multiple operations (finding, deleting, adding students)
> 2. It separates the low-level file access details from higher-level database operations
> 3. Changes to how students are read only need to be made in one place
> 4. The isolated function is easier to test
2. Another interesting aspect of the `get_student(...)` function is how its function prototype requires the caller to provide the storage for the `student_t` structure:
```c
int get_student(int fd, int id, student_t *s);
```
Notice that the last parameter is a pointer to storage **provided by the caller**, which this function fills in with the requested student's record from the database file. This is a common convention in the `C` programming language, usually called pass-by-reference (strictly speaking, C passes the pointer itself by value and the function writes through it).
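To make the caller-provided storage convention concrete, here is a minimal sketch of a `get_student()` body. It is not the assignment's reference implementation: it assumes the record for student `id` lives at byte offset `id * sizeof(student_t)` and uses POSIX `pread()` in place of a separate `lseek()` plus `read()`. The return codes are the ones declared in `sdbsc.h`.

```c
#define _XOPEN_SOURCE 700   /* expose pread()/pwrite()/fileno() in strict C modes */
#include <assert.h>
#include <stdio.h>          /* tmpfile(), fileno(): handy for trying the function out */
#include <sys/types.h>
#include <unistd.h>

/* Same 64-byte layout as student_t in db.h. */
typedef struct student {
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;

/* Return codes as declared in sdbsc.h. */
#define NO_ERROR        0
#define ERR_DB_FILE    -1
#define SRCH_NOT_FOUND -3

/*
 * Sketch of get_student(): the caller owns *s and this function only fills it.
 * Assumption: the record for `id` is stored at byte offset id * sizeof(student_t).
 */
int get_student(int fd, int id, student_t *s)
{
    assert(s != NULL);                            /* caller must supply the storage */

    off_t offset = (off_t)id * (off_t)sizeof(student_t);
    ssize_t n = pread(fd, s, sizeof(student_t), offset);

    if (n < 0)
        return ERR_DB_FILE;                       /* the read itself failed */
    if (n != (ssize_t)sizeof(student_t) || s->id != id)
        return SRCH_NOT_FOUND;                    /* past EOF, a hole, or deleted (id == 0) */
    return NO_ERROR;
}
```

A caller typically declares the record on its own stack and checks the return code, for example `student_t s; if (get_student(fd, 3, &s) == NO_ERROR) print_student(&s);`, so no heap allocation or `free()` is involved.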
In other programming languages an approach like the one shown below would be more idiomatic for creating a function like `get_student()` (specifically the storage is provided by the `get_student(...)` function itself):
```c
//Lookup student from the database
// IF FOUND: return pointer to student data
// IF NOT FOUND: return NULL
student_t *get_student(int fd, int id){
    student_t student;
    bool student_found = false;

    //code that looks for the student and if
    //found populates the student structure
    //The student_found variable will be set
    //to true if the student is in the database
    //or false otherwise.

    if (student_found)
        return &student;
    else
        return NULL;
}
```
Can you think of any reason why the above implementation would be a **very bad idea** using the C programming language? Specifically, address why the above code introduces a subtle bug that could be hard to identify at runtime.
> **Answer**: This implementation is dangerous because it returns the address of a local variable (`student`) whose lifetime ends when the function returns. Dereferencing that pointer is undefined behavior. The bug is subtle because the old stack memory often still holds the student data right after the call, so the code can appear to work until a later function call overwrites that region and the "student" silently turns into garbage.
3. Another way the `get_student(...)` function could be implemented is as follows:
```c
//Lookup student from the database
// IF FOUND: return pointer to student data
// IF NOT FOUND or memory allocation error: return NULL
student_t *get_student(int fd, int id){
    student_t *pstudent;
    bool student_found = false;

    pstudent = malloc(sizeof(student_t));
    if (pstudent == NULL)
        return NULL;

    //code that looks for the student and if
    //found populates the student structure
    //The student_found variable will be set
    //to true if the student is in the database
    //or false otherwise.

    if (student_found){
        return pstudent;
    }
    else {
        free(pstudent);
        return NULL;
    }
}
```
In this implementation the storage for the student record is allocated on the heap using `malloc()` and passed back to the caller when the function returns. What do you think about this alternative implementation of `get_student(...)`? Address in your answer why it would work, but also think about any potential problems it could cause.
> **Answer**: While this implementation would work, it has important drawbacks:
>
> 1. It requires callers to remember to free the memory, risking memory leaks
> 2. NULL return is ambiguous - could mean "not found" or "allocation failed"
> 3. It allocates heap memory on every call (and frees it again when the student is not found), even though caller-provided storage would work
> 4. Different callers might manage the memory inconsistently
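The ownership burden from points 1 and 2 shows up at every call site. The sketch below is a hypothetical, runnable version of the heap-allocating design; to stay self-contained it looks students up in a small in-memory array (`DEMO_DB`, invented for illustration) instead of reading `student.db`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct student {
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;

/* Hypothetical in-memory stand-in for the database file, so the sketch needs no I/O. */
static const student_t DEMO_DB[] = {
    { .id = 1, .fname = "john", .lname = "doe", .gpa = 345 },
    { .id = 3, .fname = "jane", .lname = "doe", .gpa = 390 },
};

/*
 * Heap-allocating variant from question 3. On success the CALLER owns the
 * returned pointer and must free() it. NULL means "not found" OR "malloc failed".
 */
student_t *get_student_heap(int id)
{
    assert(id > 0);                               /* ids start at MIN_STD_ID (1) */

    student_t *pstudent = malloc(sizeof *pstudent);
    if (pstudent == NULL)
        return NULL;                              /* allocation failed */

    for (size_t i = 0; i < sizeof DEMO_DB / sizeof DEMO_DB[0]; i++) {
        if (DEMO_DB[i].id == id) {
            *pstudent = DEMO_DB[i];               /* copy the record into heap storage */
            return pstudent;
        }
    }
    free(pstudent);                               /* not found: release before returning */
    return NULL;
}
```

Every caller then needs the matching `free()`, e.g. `student_t *s = get_student_heap(3); if (s != NULL) { print_student(s); free(s); }`. Missing it on any path, including early error returns, leaks one record per lookup, and a `NULL` result still does not say whether the student is missing or memory ran out.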
4. Let's take a look at how storage is managed for our simple database. Recall that all student records are stored on disk using the layout of the `student_t` structure (which has a size of 64 bytes). Let's start with a fresh database by deleting the `student.db` file using the command `rm ./student.db`. Now that we have an empty database, let's add a few students and see what is happening under the covers. Consider the following sequence of commands:
```bash
> ./sdbsc -a 1 john doe 345
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 128 Jan 17 10:01 ./student.db
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 3 jane doe 390
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 256 Jan 17 10:02 ./student.db
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 63 jim doe 285
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 64 janet doe 310
> du -h ./student.db
8.0K ./student.db
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 4160 Jan 17 10:03 ./student.db
```
For this question I am asking you to perform some online research to investigate why there is a difference between the size of the file reported by the `ls` command and the actual storage used on the disk reported by the `du` command. Understanding why this happens by design is important, since all good systems programmers need to understand things like how Linux creates sparse files and how Linux physically stores data on disk using fixed block sizes. Some good Google searches to get you started: _"lseek syscall holes and sparse files"_, and _"linux file system blocks"_. After you do some research, please answer the following:
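- Please explain why the file size reported by the `ls` command was 128 bytes after adding the student with ID=1, 256 bytes after adding the student with ID=3, and 4160 bytes after adding the student with ID=64.
> **Answer**: `ls -l` reports the logical file size, which ends at the last byte written. Each record is 64 bytes and is stored at offset `id * 64`, so after writing student `id` the file is at least `(id + 1) * 64` bytes long:
>
> - ID=1 occupies bytes 64 through 127, so the size is 128
> - ID=3 occupies bytes 192 through 255, so the size is 256
> - ID=64 occupies bytes 4096 through 4159, so the size is 4160
>
> Skipped slots (for example the ones for IDs 0 and 2) read back as zeros. Gaps that span whole file-system blocks are not allocated on disk at all, which is why `du` can report less than `ls`.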
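The arithmetic in this answer can be checked with a few lines of C. The sketch assumes, as the answer does, that the record for `id` starts at `id * 64`; `record_offset()` and `logical_size()` are illustrative helper names, not functions from the assignment.

```c
#include <assert.h>

#define STUDENT_RECORD_SIZE 64   /* sizeof(student_t) from db.h */

/* Byte offset where the record for `id` starts (records are indexed by id). */
long record_offset(int id)
{
    assert(id >= 0);
    return (long)id * STUDENT_RECORD_SIZE;
}

/* Logical file size (what `ls -l` reports) when `highest_id` is the record
 * furthest into the file: the end of that record. */
long logical_size(int highest_id)
{
    return record_offset(highest_id) + STUDENT_RECORD_SIZE;
}
```

For example, `logical_size(99999)` evaluates to 6,400,000, matching the `ls -l` output for the large student ID later in this question.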
- Why did the total storage used on the disk remain unchanged when we added the student with ID=1, ID=3, and ID=63, but increased from 4K to 8K when we added the student with ID=64?
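> **Answer**: Linux file systems allocate disk space in whole blocks, typically 4 KiB (the ext4 default). The records for IDs 1, 3 and 63 all fall inside the first block (bytes 0 through 4095; ID=63 occupies bytes 4032 through 4095). ID=64 starts at offset 64 * 64 = 4096, the first byte of the second block, so a second block had to be allocated and `du` grew from 4K to 8K.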
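A similar sketch maps each record to a file-system block. It assumes 4 KiB blocks (`FS_BLOCK_SIZE` is an assumed constant; the real block size depends on the file system). Because 64 divides 4096 evenly, a record never straddles two blocks, so counting the distinct block indices approximates the number of data blocks `du` should see.

```c
#include <assert.h>

#define STUDENT_RECORD_SIZE 64   /* sizeof(student_t) from db.h */
#define FS_BLOCK_SIZE 4096       /* assumption: 4 KiB file-system blocks */

/* Index of the file-system block that holds the record for `id`. */
long block_index(int id)
{
    assert(id >= 0);
    return (long)id * STUDENT_RECORD_SIZE / FS_BLOCK_SIZE;
}

/* Number of distinct blocks touched by `ids`, which must be sorted ascending
 * (so equal block indices are always adjacent). */
int blocks_used(const int *ids, int n)
{
    int count = 0;
    long last = -1;
    for (int i = 0; i < n; i++) {
        long b = block_index(ids[i]);
        if (b != last) {
            count++;
            last = b;
        }
    }
    return count;
}
```

For IDs {1, 3, 63} the count is 1 block (4K); adding ID 64 makes it 2 (8K); adding ID 99999 makes it 3 (12K), consistent with the `du` outputs in this question.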
- Now let's add one more student with a large student ID number and see what happens:
```bash
> ./sdbsc -a 99999 big dude 205
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 6400000 Jan 17 10:28 ./student.db
> du -h ./student.db
12K ./student.db
```
We see from above that adding a student with a very large student ID (ID=99999) increased the file size to 6400000 bytes as shown by `ls`, but the raw storage only increased to 12K as reported by `du`. Can you provide some insight into why this happened?
> **Answer**: This demonstrates Linux's efficient sparse file handling:
> - The logical file size is (99999 + 1) * 64 = 6,400,000 bytes: the record for ID=99999 is written at offset 99999 * 64 = 6,399,936 and ends 64 bytes later
> - However, the actual disk usage (`du`) is only 12K because the file system only allocates blocks for regions that contain written data
> - The huge gap between student 64 and student 99999 is stored as a "hole" in the sparse file, consuming no physical disk space
> - Only three 4 KiB blocks hold data (one for IDs 1, 3 and 63, one for ID 64, and one for ID 99999), hence the 12K physical size
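The hole can be reproduced directly with the system calls named in the research hint. This hedged sketch (the function name and the `/tmp` path used in testing it are illustrative) creates an empty file, uses `lseek()` to move past end-of-file, writes one 64-byte placeholder record, and then compares `st_size` (what `ls -l` shows) with `st_blocks` (what `du` counts).

```c
#define _XOPEN_SOURCE 700        /* expose st_blocks and other POSIX/XSI names */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define STUDENT_RECORD_SIZE 64   /* sizeof(student_t) from db.h */

/*
 * Creates (or truncates) `path`, seeks to the offset used for `id`, and writes
 * one placeholder record there; everything before it is left as a hole.
 * Returns 0 on success, -1 if any system call fails.
 */
int write_sparse_record(const char *path, int id, off_t *logical, off_t *allocated)
{
    char record[STUDENT_RECORD_SIZE];
    struct stat st;
    int rc = -1;

    assert(id > 0);
    memset(record, 'x', sizeof record);                   /* placeholder student data */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;

    off_t offset = (off_t)id * STUDENT_RECORD_SIZE;
    if (lseek(fd, offset, SEEK_SET) == offset &&          /* seeking past EOF is allowed */
        write(fd, record, sizeof record) == (ssize_t)sizeof record &&
        fstat(fd, &st) == 0) {
        *logical = st.st_size;                            /* offset + 64 bytes */
        *allocated = (off_t)st.st_blocks * 512;           /* Linux counts st_blocks in 512-byte units */
        rc = 0;
    }
    close(fd);
    return rc;
}
```

On a file system that supports holes, `allocated` typically comes out as a single block for ID 99999 even though `logical` is 6,400,000 bytes, the same `ls` versus `du` gap seen above; exact numbers depend on the file system.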
#ifndef __DB_H__
#define __DB_H__
// Basic student database record. Note:
// 1. id must be > 0. A student id==0 means the record has been deleted
// 2. gpa is an int, should be between 0<=gpa<=500; the real gpa is gpa/100.0.
//    This simplifies dealing with floating point types
// 3. Notice that the student struct was engineered to have a size of
//    64 bytes. There are reasons for using such a number
typedef struct student{
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;
//Define limits for student ids and allowable GPA ranges. Note GPA values will
//be stored as integers but printed as floats. For example, a stored GPA of 450
//is really that value divided by 100.0, or 4.50.
#define MIN_STD_ID 1
#define MAX_STD_ID 100000
#define MIN_STD_GPA 0
#define MAX_STD_GPA 500
//some useful constants you should consider using versus hard coding
//in your program.
static const student_t EMPTY_STUDENT_RECORD = {0};
static const int STUDENT_RECORD_SIZE = sizeof(struct student);
static const int DELETED_STUDENT_ID = 0;
#define DB_FILE "student.db" //name of database file
#define TMP_DB_FILE ".tmp_student.db" //for extra credit
#endif
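The comment in `db.h` says the structure was engineered to be 64 bytes, which can be checked at compile time. The sketch below assumes a 4-byte `int` (true on common Linux toolchains), under which the fields pack as 4 + 24 + 32 + 4 bytes with no padding. One likely motivation, consistent with the block-size discussion in the questions file, is that 64 divides 4096 evenly, so exactly 64 records fill one 4 KiB block.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stddef.h>     /* offsetof */

typedef struct student {
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;

/* Compile-time checks of the layout described in db.h. */
static_assert(sizeof(int) == 4, "layout below assumes a 4-byte int");
static_assert(offsetof(student_t, fname) == 4, "fname follows id");
static_assert(offsetof(student_t, lname) == 28, "lname follows fname");
static_assert(offsetof(student_t, gpa) == 60, "gpa follows lname, no padding");
static_assert(sizeof(student_t) == 64, "one record is exactly 64 bytes");
```

If a field is later resized so the total is no longer 64 bytes, the build fails instead of the on-disk offsets silently changing.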
# Compiler settings
CC = gcc
CFLAGS = -Wall -Wextra -g
# Target executable name
TARGET = sdbsc
# Find all source and header files
SRCS = $(wildcard *.c)
HDRS = $(wildcard *.h)
# Default target
all: $(TARGET)
# Compile source to executable
$(TARGET): $(SRCS) $(HDRS)
	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS)
# Clean up build files
clean:
	rm -f $(TARGET)
	rm -f student.db
test:
	./test.sh
# Phony targets
.PHONY: all clean test
#ifndef __SDB_H__
#define __SDB_H__
#include <stdbool.h> //bool is used by the open_db() prototype
#include "db.h" //get student record type
//prototypes for functions go below for this assignment
int open_db(char *dbFile, bool should_truncate);
int add_student(int fd, int id, char *fname, char *lname, int gpa);
int get_student(int fd, int id, student_t *s);
int del_student(int fd, int id);
int compress_db(int fd);
void print_student(student_t *s);
int validate_range(int id, int gpa);
int count_db_records(int fd);
int print_db(int fd);
void usage(char *);
//error codes to be returned from individual functions
// NO_ERROR is returned if there are no errors
// ERR_DB_FILE is returned if there are any issues with the database file itself
// ERR_DB_OP is returned if an operation did not work, e.g. adding or deleting a student
// SRCH_NOT_FOUND is returned if the student is not found (get_student and del_student)
#define NO_ERROR 0
#define ERR_DB_FILE -1
#define ERR_DB_OP -2
#define SRCH_NOT_FOUND -3
#define NOT_IMPLEMENTED_YET 0
//error codes to be returned to the shell
// EXIT_OK program executed without error
// EXIT_FAIL_DB a database operation failed
// EXIT_FAIL_ARGS one or more arguments to program were not valid
// EXIT_NOT_IMPL the operation has not been implemented yet
#define EXIT_OK 0
#define EXIT_FAIL_DB 1
#define EXIT_FAIL_ARGS 2
#define EXIT_NOT_IMPL 3
//Output messages
#define M_ERR_STD_RNG "Cant add student, either ID or GPA out of allowable range!\n"
#define M_ERR_DB_CREATE "Error creating DB file, exiting!\n"
#define M_ERR_DB_OPEN "Error opening DB file, exiting!\n"
#define M_ERR_DB_READ "Error reading DB file, exiting!\n"
#define M_ERR_DB_WRITE "Error writing DB file, exiting!\n"
#define M_ERR_DB_ADD_DUP "Cant add student with ID=%d, already exists in db.\n"
#define M_ERR_STD_PRINT "Cant print student. Student is NULL or ID is zero\n"
#define M_STD_ADDED "Student %d added to database.\n"
#define M_STD_DEL_MSG "Student %d was deleted from database.\n"
#define M_STD_NOT_FND_MSG "Student %d was not found in database.\n"
#define M_DB_COMPRESSED_OK "Database successfully compressed!\n"
#define M_DB_ZERO_OK "All database records removed!\n"
#define M_DB_EMPTY "Database contains no student records.\n"
#define M_DB_RECORD_CNT "Database contains %d student record(s).\n"
#define M_NOT_IMPL "The requested operation is not implemented yet!\n"
//useful format strings for print students
//For example to print the header in the required output:
// printf(STUDENT_PRINT_HDR_STRING, "ID", "FIRST_NAME",
// "LAST_NAME", "GPA");
#define STUDENT_PRINT_HDR_STRING "%-6s %-24s %-32s %-3s\n"
#define STUDENT_PRINT_FMT_STRING "%-6d %-24.24s %-32.32s %-3.2f\n"
#endif
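The two sets of codes in `sdbsc.h` (function return codes versus shell exit codes) have to be translated somewhere in the program. The helper below is hypothetical; its mapping is inferred from the test script, where looking up a missing student must exit with status 1 (`EXIT_FAIL_DB`) and print the `M_STD_NOT_FND_MSG` text. Treating `ERR_DB_FILE` the same way is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Subset of the constants declared in sdbsc.h. */
#define NO_ERROR           0
#define ERR_DB_FILE       -1
#define SRCH_NOT_FOUND    -3
#define EXIT_OK            0
#define EXIT_FAIL_DB       1
#define M_ERR_DB_READ     "Error reading DB file, exiting!\n"
#define M_STD_NOT_FND_MSG "Student %d was not found in database.\n"

/* Hypothetical helper: turns a lookup result (e.g. from get_student) into a
 * shell exit code, printing the matching message on failure. */
int lookup_exit_code(int rc, int id)
{
    switch (rc) {
    case NO_ERROR:
        return EXIT_OK;
    case SRCH_NOT_FOUND:
        printf(M_STD_NOT_FND_MSG, id);
        return EXIT_FAIL_DB;
    case ERR_DB_FILE:
        printf(M_ERR_DB_READ);
        return EXIT_FAIL_DB;
    default:
        assert(!"unexpected return code");   /* flags codes that are not handled here */
        return EXIT_FAIL_DB;
    }
}
```

Keeping this translation in one place means the test expectations (message text plus exit status) are enforced by a single switch rather than repeated at every command handler.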
#!/usr/bin/env bats
# setup_file runs once, before the first test in this file
setup_file() {
    # Delete the student.db file if it exists
    if [ -f "student.db" ]; then
        rm "student.db"
    fi
}

@test "Check if database is empty to start" {
    run ./sdbsc -p
    [ "$status" -eq 0 ]
    [ "$output" = "Database contains no student records." ]
}

@test "Add a student 1 to db" {
    run ./sdbsc -a 1 john doe 345
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 1 added to database." ]
}

@test "Add more students to db" {
    run ./sdbsc -a 3 jane doe 390
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 3 added to database." ] || {
        echo "Failed Output: $output"
        return 1
    }
    run ./sdbsc -a 63 jim doe 285
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 63 added to database." ] || {
        echo "Failed Output: $output"
        return 1
    }
    run ./sdbsc -a 64 janet doe 310
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 64 added to database." ] || {
        echo "Failed Output: $output"
        return 1
    }
    run ./sdbsc -a 99999 big dude 205
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 99999 added to database." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Check student count" {
    run ./sdbsc -c
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Database contains 5 student record(s)." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Make sure adding duplicate student fails" {
    run ./sdbsc -a 63 dup student 300
    [ "$status" -eq 1 ] || {
        echo "Expecting status of 1, got: $status"
        return 1
    }
    [ "${lines[0]}" = "Cant add student with ID=63, already exists in db." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Make sure the file size is correct at this time" {
    run stat --format="%s" ./student.db
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "6400000" ] || {
        echo "Failed Output: $output"
        echo "Expected: 6400000"
        return 1
    }
}

@test "Find student 3 in db" {
    run ./sdbsc -f 3
    # Ensure the command ran successfully
    [ "$status" -eq 0 ]
    # Use echo with -n to avoid adding extra newline and normalize spaces
    normalized_output=$(echo -n "${lines[1]}" | tr -s '[:space:]' ' ')
    # Define the expected output
    expected_output="3 jane doe 3.90"
    # Compare the normalized output with the expected output
    [ "$normalized_output" = "$expected_output" ] || {
        echo "Failed Output: $normalized_output"
        echo "Expected: $expected_output"
        return 1
    }
}

@test "Try looking up non-existent student" {
    run ./sdbsc -f 4
    [ "$status" -eq 1 ] || {
        echo "Expecting status of 1, got: $status"
        return 1
    }
    [ "${lines[0]}" = "Student 4 was not found in database." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Delete student 64 in db" {
    run ./sdbsc -d 64
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 64 was deleted from database." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Try deleting non-existent student" {
    run ./sdbsc -d 65
    [ "$status" -eq 1 ] || {
        echo "Expecting status of 1, got: $status"
        return 1
    }
    [ "${lines[0]}" = "Student 65 was not found in database." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Check student count again, should be 4 now" {
    run ./sdbsc -c
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Database contains 4 student record(s)." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Print student records" {
    # Run the command
    run ./sdbsc -p
    # Ensure the command ran successfully
    [ "$status" -eq 0 ]
    # Collapse every run of whitespace (including newlines) into a single space
    normalized_output=$(echo -n "$output" | tr -s '[:space:]' ' ')
    # Define the expected output (normalized)
    expected_output="ID FIRST_NAME LAST_NAME GPA 1 john doe 3.45 3 jane doe 3.90 63 jim doe 2.85 99999 big dude 2.05"
    # Compare the normalized output
    [ "$normalized_output" = "$expected_output" ] || {
        echo "Failed Output: $normalized_output"
        echo "Expected Output: $expected_output"
        return 1
    }
}

@test "Compress db - try 1" {
    skip
    run ./sdbsc -x
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Database successfully compressed!" ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Delete student 99999 in db" {
    run ./sdbsc -d 99999
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Student 99999 was deleted from database." ] || {
        echo "Failed Output: $output"
        return 1
    }
}

@test "Compress db again - try 2" {
    run ./sdbsc -x
    [ "$status" -eq 0 ]
    [ "${lines[0]}" = "Database successfully compressed!" ] || {
        echo "Failed Output: $output"
        return 1
    }
}
File deleted