Commit 18b2e7ba authored by Andrew To

Assignment 2

parent 8e62aa53
## Assignment 2 Questions
#### Directions
Please answer the following questions and submit in your repo for the second assignment. Please keep the answers as short and concise as possible.
1. In this assignment I asked you to provide an implementation for the `get_student(...)` function because I think it improves the overall design of the database application. After you implemented your solution do you agree that externalizing `get_student(...)` into its own function is a good design strategy? Briefly describe why or why not.
> **Answer**: Yes, I believe it is a good design. Looking up a student always requires seeking to that record's position in the file and reading it, and several functions need this same step, such as del_student() or count_db_records(). Keeping the logic in get_student() avoids duplicating it. It also keeps the code for option 'f' readable: based on the return value of get_student(), we either print a static message or call print_student().
2. Another interesting aspect of the `get_student(...)` function is how its function prototype requires the caller to provide the storage for the `student_t` structure:
```c
int get_student(int fd, int id, student_t *s);
```
Notice that the last parameter is a pointer to storage **provided by the caller** to be used by this function to populate information about the desired student that is queried from the database file. This is a common convention in the `C` programming language (often called pass-by-reference, although strictly speaking the pointer itself is passed by value).
In other programming languages an approach like the one shown below would be more idiomatic for creating a function like `get_student()` (specifically the storage is provided by the `get_student(...)` function itself):
```c
//Lookup student from the database
// IF FOUND: return pointer to student data
// IF NOT FOUND: return NULL
student_t *get_student(int fd, int id){
    student_t student;
    bool student_found = false;

    //code that looks for the student and if
    //found populates the student structure
    //The student_found variable will be set
    //to true if the student is in the database
    //or false otherwise.

    if (student_found)
        return &student;
    else
        return NULL;
}
```
Can you think of any reason why the above implementation would be a **very bad idea** using the C programming language? Specifically, address why the above code introduces a subtle bug that could be hard to identify at runtime?
> **ANSWER:** `student` is a local variable, so it lives in the stack frame of get_student(), and that frame is released when the function returns. The returned pointer is therefore dangling, and dereferencing it is undefined behavior. The bug is subtle because the program may appear to work until a later function call overwrites that stack memory. After that, the caller reads garbage or, in some cases, crashes with a segmentation fault.
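For contrast, here is a minimal sketch of the caller-provided storage convention from the original prototype. It is only an illustration: `demo_table` and `lookup_student()` are hypothetical names, and an in-memory array stands in for the database file so the example stays self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

// Same layout as student_t in db.h.
typedef struct student {
    int  id;
    char fname[24];
    char lname[32];
    int  gpa;
} student_t;

// Hypothetical in-memory stand-in for the database file.
static const student_t demo_table[] = {
    {1, "john", "doe", 345},
    {3, "jane", "doe", 390},
};

// Hypothetical helper: copies the matching record into storage owned by
// the caller. Returns true if found; *out is left untouched otherwise.
bool lookup_student(int id, student_t *out) {
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);
    for (size_t i = 0; i < count; i++) {
        if (demo_table[i].id == id) {
            *out = demo_table[i];  // struct copy into the caller's storage
            return true;
        }
    }
    return false;
}
```

Because the caller declares the `student_t` (for example as a local variable) and passes its address, the record stays valid for as long as the caller's own variable does, and no pointer into a dead stack frame is ever returned.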
3. Another way the `get_student(...)` function could be implemented is as follows:
```c
//Lookup student from the database
// IF FOUND: return pointer to student data
// IF NOT FOUND or memory allocation error: return NULL
student_t *get_student(int fd, int id){
    student_t *pstudent;
    bool student_found = false;

    pstudent = malloc(sizeof(student_t));
    if (pstudent == NULL)
        return NULL;

    //code that looks for the student and if
    //found populates the student structure
    //The student_found variable will be set
    //to true if the student is in the database
    //or false otherwise.

    if (student_found){
        return pstudent;
    }
    else {
        free(pstudent);
        return NULL;
    }
}
```
In this implementation the storage for the student record is allocated on the heap using `malloc()` and passed back to the caller when the function returns. What do you think about this alternative implementation of `get_student(...)`? Address in your answer why it would work, but also think about any potential problems it could cause.
> **ANSWER:** This works because memory allocated on the heap with malloc() stays valid after the function returns, until it is explicitly freed. However, this design is not optimal because ownership of the memory becomes unclear. The function cannot free the record itself when the student is found (the caller still needs it), so every caller must remember to call free() when it is done. Forgetting to do so causes a memory leak, and freeing the record too early or twice causes other bugs.
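The ownership issue can be made concrete with a small sketch. The helper name `student_dup()` is hypothetical and is not part of the assignment; it only illustrates a function that hands heap storage to its caller.

```c
#include <stdlib.h>

// Same layout as student_t in db.h.
typedef struct student {
    int  id;
    char fname[24];
    char lname[32];
    int  gpa;
} student_t;

// Hypothetical helper: returns a heap-allocated copy of *src, or NULL if
// malloc() fails. Ownership passes to the caller, who must call free().
student_t *student_dup(const student_t *src) {
    student_t *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    *copy = *src;  // struct copy into the heap block
    return copy;
}
```

A caller would use it as `student_t *p = student_dup(&s); ... free(p);`. Nothing in the function signature enforces the `free()`, which is why the convention has to be documented and followed by every caller.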
4. Let's take a look at how storage is managed for our simple database. Recall that all student records are stored on disk using the layout of the `student_t` structure (which has a size of 64 bytes). Let's start with a fresh database by deleting the `student.db` file using the command `rm ./student.db`. Now that we have an empty database let's add a few students and see what is happening under the covers. Consider the following sequence of commands:
```bash
> ./sdbsc -a 1 john doe 345
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 128 Jan 17 10:01 ./student.db
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 3 jane doe 390
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 256 Jan 17 10:02 ./student.db
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 63 jim doe 285
> du -h ./student.db
4.0K ./student.db
> ./sdbsc -a 64 janet doe 310
> du -h ./student.db
8.0K ./student.db
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 4160 Jan 17 10:03 ./student.db
```
For this question I am asking you to perform some online research to investigate why there is a difference between the size of the file reported by the `ls` command and the actual storage used on the disk reported by the `du` command. Understanding why this happens by design is important since all good systems programmers need to understand things like how Linux creates sparse files, and how Linux physically stores data on disk using fixed block sizes. Some good Google searches to get you started: _"lseek syscall holes and sparse files"_, and _"linux file system blocks"_. After you do some research please answer the following:
- Please explain why the file size reported by the `ls` command was 128 bytes after adding student with ID=1, 256 after adding student with ID=3, and 4160 after adding the student with ID=64?
> **ANSWER:** The record for a student with ID=x starts at byte offset x * sizeof(student_t) = 64 * x and is 64 bytes long, so it ends at byte 64 * (x + 1). The `ls` command reports the logical file size, which is the end of the highest record written so far. With ID=1 that is 64 * 2 = 128, with ID=3 it is 64 * 4 = 256, and with ID=64 it is 64 * 65 = 4160.
- Why did the total storage used on the disk remain unchanged when we added the student with ID=1, ID=3, and ID=63, but increased from 4K to 8K when we added the student with ID=64?
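> **ANSWER:** The file system allocates disk space in fixed-size blocks, 4KB (4096 bytes) here. The records for ID=1, ID=3, and ID=63 all lie within the first 4096 bytes of the file (the record for ID=63 occupies bytes 4032 to 4095), so a single block holds all of them. The record for ID=64 starts at byte 64 * 64 = 4096, which is the first byte of a second block, so the storage reported by `du` grows to 8K.
- Now let's add one more student with a large student ID number and see what happens: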
```bash
> ./sdbsc -a 99999 big dude 205
> ls -l ./student.db
-rw-r----- 1 bsm23 bsm23 6400000 Jan 17 10:28 ./student.db
> du -h ./student.db
12K ./student.db
```
We see from above that adding a student with a very large student ID (ID=99999) increased the file size to 6400000 as shown by `ls` but the raw storage only increased to 12K as reported by `du`. Can you provide some insight into why this happened?
> **ANSWER:** Our code uses lseek() to move to the offset that corresponds to the ID and writes the record there. Seeking past the end of the file and then writing leaves a "hole", so our db file is a sparse file: the holes read back as zeros, but the file system does not allocate disk blocks for them. With ID=99999 the record starts at 99999 * 64 = 6399936 and ends at 6400000, which is the logical size that `ls` reports. The `du` command measures the disk space actually allocated, and only three 4K blocks contain data: the two blocks already in use plus the one block that holds the new record, which gives 12K.
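The arithmetic in these answers can be checked with a short sketch. The names `logical_size()`, `block_index()`, and `sparse_demo()` are hypothetical, the 4096-byte block size is an assumption taken from the `du` output above, and the real `sdbsc` implementation may differ. The sketch also does not check allocated blocks, since that depends on the file system.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE   64    // sizeof(student_t) per the assignment
#define FS_BLOCK_SIZE 4096  // assumed file system block size

// Logical file size (what ls reports) when the highest ID written is max_id.
long logical_size(int max_id) {
    return ((long)max_id + 1) * RECORD_SIZE;
}

// Index of the file system block that holds the record for id.
long block_index(int id) {
    return ((long)id * RECORD_SIZE) / FS_BLOCK_SIZE;
}

// Creates a fresh file at path, writes one 64-byte record at the offset
// for id, and returns the resulting st_size, or -1 on any error.
long sparse_demo(const char *path, int id) {
    char record[RECORD_SIZE] = {1};
    struct stat st;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;
    // Seeking past the end and writing leaves a hole before the record.
    if (lseek(fd, (off_t)id * RECORD_SIZE, SEEK_SET) < 0 ||
        write(fd, record, sizeof record) != (ssize_t)sizeof record ||
        fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long)st.st_size;
}
```

IDs 1, 3, and 63 all map to block 0, ID 64 maps to block 1, and ID 99999 maps to block 1562, so three distinct blocks hold data. On a file system that supports sparse files, running `du` on the file created by `sparse_demo()` should show far less than the size `ls` reports.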
Name: Andrew To
ID: dt686
I did the extra credit implementation
I added the command rm -f .tmp_student.db in Makefile to make it possible to delete the compressed database when we use make test.
\ No newline at end of file
#ifndef __DB_H__
#define __DB_H__
// Basic student database record. Note:
// 1. id must be > 0. A student id==0 means the record has been deleted
// 2. gpa is an int, should be between 0<=gpa<=500, real gpa is gpa/100.0 this
// simplifies dealing with floating point types
// 3. Notice that the student struct was engineered to have a size of
// 64 bytes. There are reasons for using such a number
typedef struct student{
    int id;
    char fname[24];
    char lname[32];
    int gpa;
} student_t;
//Define limits for student ids and allowable GPA ranges. Note GPA values will
//be stored as integers but printed as floats. For example a GPA of 450 is really
//that value divided by 100.0 or 4.50.
#define MIN_STD_ID 1
#define MAX_STD_ID 100000
#define MIN_STD_GPA 0
#define MAX_STD_GPA 500
//some useful constants you should consider using versus hard coding
//in your program.
static const student_t EMPTY_STUDENT_RECORD = {0};
static const int STUDENT_RECORD_SIZE = sizeof(struct student);
static const int DELETED_STUDENT_ID = 0;
#define DB_FILE "student.db" //name of database file
#define TMP_DB_FILE ".tmp_student.db" //for extra credit
#endif
\ No newline at end of file
assn2/starter/dblayout.png

54.9 KiB

# Compiler settings
CC = gcc
CFLAGS = -Wall -Wextra -g
# Target executable name
TARGET = sdbsc
# Find all source and header files
SRCS = $(wildcard *.c)
HDRS = $(wildcard *.h)
# Default target
all: $(TARGET)
# Compile source to executable
$(TARGET): $(SRCS) $(HDRS)
	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS)
# Clean up build files
clean:
	rm -f $(TARGET)
	rm -f student.db
	rm -f .tmp_student.db
test:
	./test.sh
# Phony targets
.PHONY: all clean test
\ No newline at end of file
This diff is collapsed.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>
<key>CFBundleIdentifier</key>
<string>com.apple.xcode.dsym.sdbsc</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundlePackageType</key>
<string>dSYM</string>
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleShortVersionString</key>
<string>1.0</string>
<key>CFBundleVersion</key>
<string>1</string>
</dict>
</plist>
File added
#ifndef __SDB_H__
#define __SDB_H__
#include <stdbool.h> //bool is used by open_db()
#include "db.h" //get student record type
//prototypes for functions go below for this assignment
int open_db(char *dbFile, bool should_truncate);
int add_student(int fd, int id, char *fname, char *lname, int gpa);
int get_student(int fd, int id, student_t *s);
int del_student(int fd, int id);
int compress_db(int fd);
void print_student(student_t *s);
int validate_range(int id, int gpa);
int count_db_records(int fd);
int print_db(int fd);
void usage(char *);
//error codes to be returned from individual functions
// NO_ERROR is returned if there are no errors
// ERR_DB_FILE is returned if there are any issues with the database file itself
// ERR_DB_OP is returned if an operation did not work aka add or delete a student
// SRCH_NOT_FOUND is returned if the student is not found (get_student, and del_student)
#define NO_ERROR 0
#define ERR_DB_FILE -1
#define ERR_DB_OP -2
#define SRCH_NOT_FOUND -3
#define NOT_IMPLEMENTED_YET 0
//error codes to be returned to the shell
// EXIT_OK program executed without error
// EXIT_FAIL_DB a database operation failed
// EXIT_FAIL_ARGS one or more arguments to program were not valid
// EXIT_NOT_IMPL the operation has not been implemented yet
#define EXIT_OK 0
#define EXIT_FAIL_DB 1
#define EXIT_FAIL_ARGS 2
#define EXIT_NOT_IMPL 3
//Output messages
#define M_ERR_STD_RNG "Cant add student, either ID or GPA out of allowable range!\n"
#define M_ERR_DB_CREATE "Error creating DB file, exiting!\n"
#define M_ERR_DB_OPEN "Error opening DB file, exiting!\n"
#define M_ERR_DB_READ "Error reading DB file, exiting!\n"
#define M_ERR_DB_WRITE "Error writing DB file, exiting!\n"
#define M_ERR_DB_ADD_DUP "Cant add student with ID=%d, already exists in db.\n"
#define M_ERR_STD_PRINT "Cant print student. Student is NULL or ID is zero\n"
#define M_STD_ADDED "Student %d added to database.\n"
#define M_STD_DEL_MSG "Student %d was deleted from database.\n"
#define M_STD_NOT_FND_MSG "Student %d was not found in database.\n"
#define M_DB_COMPRESSED_OK "Database successfully compressed!\n"
#define M_DB_ZERO_OK "All database records removed!\n"
#define M_DB_EMPTY "Database contains no student records.\n"
#define M_DB_RECORD_CNT "Database contains %d student record(s).\n"
#define M_NOT_IMPL "The requested operation is not implemented yet!\n"
//useful format strings for print students
//For example to print the header in the required output:
// printf(STUDENT_PRINT_HDR_STRING, "ID","FIRST_NAME",
// "LAST_NAME", "GPA");
#define STUDENT_PRINT_HDR_STRING "%-6s %-24s %-32s %-3s\n"
#define STUDENT_PRINT_FMT_STRING "%-6d %-24.24s %-32.32s %-3.2f\n"
#endif
\ No newline at end of file
#!/usr/bin/env bats
# The setup_file function runs once, before the first test in this file
setup_file() {
# Delete the student.db file if it exists
if [ -f "student.db" ]; then
rm "student.db"
fi
}
@test "Check if database is empty to start" {
run ./sdbsc -p
[ "$status" -eq 0 ]
[ "$output" = "Database contains no student records." ]
}
@test "Add a student 1 to db" {
run ./sdbsc -a 1 john doe 345
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 1 added to database." ]
}
@test "Add more students to db" {
run ./sdbsc -a 3 jane doe 390
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 3 added to database." ] || {
echo "Failed Output: $output"
return 1
}
run ./sdbsc -a 63 jim doe 285
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 63 added to database." ] || {
echo "Failed Output: $output"
return 1
}
run ./sdbsc -a 64 janet doe 310
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 64 added to database." ] || {
echo "Failed Output: $output"
return 1
}
run ./sdbsc -a 99999 big dude 205
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 99999 added to database." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Check student count" {
run ./sdbsc -c
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Database contains 5 student record(s)." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Make sure adding duplicate student fails" {
run ./sdbsc -a 63 dup student 300
[ "$status" -eq 1 ] || {
echo "Expecting status of 1, got: $status"
return 1
}
[ "${lines[0]}" = "Cant add student with ID=63, already exists in db." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Make sure the file size is correct at this time" {
run stat --format="%s" ./student.db
[ "$status" -eq 0 ]
[ "${lines[0]}" = "6400000" ] || {
echo "Failed Output: $output"
echo "Expected: 6400000"
return 1
}
}
@test "Find student 3 in db" {
run ./sdbsc -f 3
# Ensure the command ran successfully
[ "$status" -eq 0 ]
# Use echo with -n to avoid adding extra newline and normalize spaces
normalized_output=$(echo -n "${lines[1]}" | tr -s '[:space:]' ' ')
# Define the expected output
expected_output="3 jane doe 3.90"
# Compare the normalized output with the expected output
[ "$normalized_output" = "$expected_output" ] || {
echo "Failed Output: $normalized_output"
echo "Expected: $expected_output"
return 1
}
}
@test "Try looking up non-existent student" {
run ./sdbsc -f 4
[ "$status" -eq 1 ] || {
echo "Expecting status of 1, got: $status"
return 1
}
[ "${lines[0]}" = "Student 4 was not found in database." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Delete student 64 in db" {
run ./sdbsc -d 64
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 64 was deleted from database." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Try deleting non-existent student" {
run ./sdbsc -d 65
[ "$status" -eq 1 ] || {
echo "Expecting status of 1, got: $status"
return 1
}
[ "${lines[0]}" = "Student 65 was not found in database." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Check student count again, should be 4 now" {
run ./sdbsc -c
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Database contains 4 student record(s)." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Print student records" {
# Run the command
run ./sdbsc -p
# Ensure the command ran successfully
[ "$status" -eq 0 ]
# Normalize the output by replacing multiple spaces with a single space
normalized_output=$(echo -n "$output" | tr -s '[:space:]' ' ')
# Define the expected output (normalized)
expected_output="ID FIRST_NAME LAST_NAME GPA 1 john doe 3.45 3 jane doe 3.90 63 jim doe 2.85 99999 big dude 2.05"
# Compare the normalized output
[ "$normalized_output" = "$expected_output" ] || {
echo "Failed Output: $normalized_output"
echo "Expected Output: $expected_output"
return 1
}
}
@test "Compress db - try 1" {
run ./sdbsc -x
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Database successfully compressed!" ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Delete student 99999 in db" {
run ./sdbsc -d 99999
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Student 99999 was deleted from database." ] || {
echo "Failed Output: $output"
return 1
}
}
@test "Compress db again - try 2" {
run ./sdbsc -x
[ "$status" -eq 0 ]
[ "${lines[0]}" = "Database successfully compressed!" ] || {
echo "Failed Output: $output"
return 1
}
}
\ No newline at end of file
#!/bin/bash
# GPA is passed as an integer (real GPA * 100), as db.h requires
./sdbsc -a 1 john doe 345
./sdbsc -a 3 jane doe 390
./sdbsc -a 63 jim doe 285
./sdbsc -a 64 janet doe 310
./sdbsc -a 99999 big dude 205
\ No newline at end of file