Creating a database from scratch – tutorial, part 1

In recent years, thanks to huge technological advances, the average computer program has shifted from mainly providing functionality towards processing data. Most programs nowadays operate on megabytes or gigabytes of data, and some deal with terabytes or petabytes. Databases have evolved significantly as well and have become quite complex in order to remain efficient at processing that data.

However, every database (or data storage of any kind) reuses similar patterns and approaches. In my experience, the easiest way to learn something complex is to split it into smaller, simpler parts and address each of them separately. It also helps not only to learn the theory but to try it out in practice. Throughout this tutorial, we will try to create a fully functioning database, learning as we go and reusing approaches from existing databases. The code is written in Java, which is easy to use and understand; however, I’m planning to implement it in a couple of other programming languages as well. Let’s start and have some fun while learning!

With any complex technology – break it down, simplify, learn and try it out.

Storage types

In the first part of this tutorial, we will start small and simple. Every database is, first of all, a store of data. Storage is commonly classified as primary or secondary. Primary storage holds temporary data, usually for the duration of a single application run, in memory (RAM) or the processor cache. Secondary storage is permanent: data remains accessible after the application shuts down. Most databases today use both, and the reasons are simple. Disks can store data indefinitely (until the disk itself stops working), but I/O operations (reading and writing data) are comparatively slow. Reading from and writing to memory or cache is much faster, but capacity is limited and the data is wiped out when the application shuts down. In this part, we will start with permanent storage on disk and get to in-memory storage in later tutorials.

Primary storage is fast, small, temporary. Secondary storage is slow, huge, permanent.
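To make the difference tangible, here is a minimal sketch. The InMemoryStore and FileStore classes are hypothetical names made up for this illustration and are not part of the tutorial's database; a "restart" is only simulated by throwing the objects away and creating new ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StorageTypesDemo {

    // Primary storage: the value lives in a field and disappears with the object
    // (and, in a real application, with the process).
    static class InMemoryStore {
        private String value = "";
        void write(String data) { value = data; }
        String read() { return value; }
    }

    // Secondary storage: the value lives in a file and outlives the object.
    static class FileStore {
        private final Path path;
        FileStore(Path path) { this.path = path; }
        void write(String data) throws IOException {
            Files.write(path, data.getBytes(StandardCharsets.UTF_8));
        }
        String read() throws IOException {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        }
    }

    // Simulates a "restart" by discarding the store objects and creating new ones.
    public static String[] readAfterRestart() throws IOException {
        Path path = Files.createTempFile("storage_demo", ".db");
        try {
            new InMemoryStore().write("hello");
            new FileStore(path).write("hello");

            String fromMemory = new InMemoryStore().read(); // fresh object, nothing kept
            String fromDisk = new FileStore(path).read();   // fresh object, data still on disk
            return new String[] {fromMemory, fromDisk};
        } finally {
            Files.deleteIfExists(path); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        String[] result = readAfterRestart();
        System.out.println("memory: '" + result[0] + "', disk: '" + result[1] + "'");
    }
}
```

The in-memory value comes back empty after the simulated restart, while the file-backed value is still there.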

Creating data storage

To store data somewhere, we first need to create a file (or reuse an existing one if it was created before).

import java.io.File;
import java.io.IOException;

public class Database {

    private final File dbFile;

    public Database(File dbFile) throws IOException {
        this.dbFile = dbFile;
        this.dbFile.createNewFile(); // creates the file only if it does not exist yet
    }
}
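If you want to convince yourself that createNewFile() leaves an existing file alone, a small experiment along these lines should do. The CreateNewFileDemo class and the temporary file name are made up for this illustration.

```java
import java.io.File;
import java.io.IOException;

public class CreateNewFileDemo {

    // Calls createNewFile() twice on the same path and reports both results.
    public static boolean[] createTwice() throws IOException {
        File file = File.createTempFile("create_demo", ".db");
        file.delete(); // start from a path that does not exist yet
        try {
            boolean first = file.createNewFile();  // file is absent, so it gets created
            boolean second = file.createNewFile(); // file is present, so nothing happens
            return new boolean[] {first, second};
        } finally {
            file.delete(); // clean up
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = createTwice();
        System.out.println(result[0] + " " + result[1]); // true false
    }
}
```

The return value tells you whether the file was actually created, which is why calling it in the constructor is safe for both a new and an existing database file.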

Now we can add read and write functions to the same class. They additionally rely on BufferedReader, BufferedWriter, FileReader and FileWriter from java.io.

public String read() {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new FileReader(dbFile))) {
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line);
        }
    } catch (IOException e) {
        throw new RuntimeException("Failed to read from file " + dbFile, e); // keep the original cause
    }

    return sb.toString();
}

public void write(String data) {
    try (BufferedWriter out = new BufferedWriter(new FileWriter(dbFile, true))) {
        out.write(data);
    } catch (IOException e) {
        throw new RuntimeException("Failed to write data to file " + dbFile, e); // keep the original cause
    }
}
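One detail of read() is easy to miss: readLine() returns each line without its line terminator, so multi-line content comes back glued together. The sketch below runs the same loop over a String instead of a file to show the effect; ReadLineDemo and the sample input are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {

    // Same loop as in read(), but reading from a String instead of a file.
    public static String readAll(String input) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line); // the line terminator is not part of 'line'
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll("key1=a\nkey2=b\n")); // key1=akey2=b
    }
}
```

For the single-line strings used in this part that makes no difference, but it is worth keeping in mind once the stored data contains line breaks.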

Note that we used FileReader and FileWriter because they are meant for working with character files (exactly what we have at the moment). It is also good practice to wrap the reader and writer in buffers (BufferedReader and BufferedWriter in this example). Otherwise, every single character would be processed (converted from bytes and returned) separately, which may significantly decrease I/O performance. Another thing to look at here is new FileWriter(dbFile, true). The true flag tells FileWriter to append new data at the end of the file; with false, the file would be overwritten from the beginning.
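The effect of the append flag is easiest to see side by side. The following is a small sketch, not part of the Database class; AppendModeDemo and the temporary file name are made up for this illustration.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

public class AppendModeDemo {

    // Writes data to the file; the append flag decides whether existing content is kept.
    public static void write(File file, String data, boolean append) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file, append))) {
            out.write(data);
        }
    }

    // Returns the file content after two writes performed with the given append flag.
    public static String writeTwice(boolean append) throws IOException {
        File file = File.createTempFile("append_demo", ".db");
        try {
            write(file, "first;", append);
            write(file, "second;", append);
            return new String(Files.readAllBytes(file.toPath()));
        } finally {
            file.delete(); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeTwice(true));  // first;second;
        System.out.println(writeTwice(false)); // second;
    }
}
```

With append set to true both writes survive; with false only the last one does.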

Testing data storage

In this simple example, we append data to the end of the file and read the whole file in the read() method. Now that we have our implementation, let’s write some tests to make sure it works.

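// Assumes JUnit 4 and AssertJ are on the test classpath
import static org.assertj.core.api.Assertions.assertThat;

import java.io.File;
import java.io.IOException;

import org.junit.After;
import org.junit.Test;

public class DatabaseTest {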

    private static final File TEST_DB_FILE = new File("test_db_file.db");

    @Test
    public void testReadWriteOperations() throws IOException {
        // 0. Instantiate a Database
        Database database = new Database(TEST_DB_FILE);

        // 1. Read database and make sure that it is empty
        String content = database.read();
        assertThat(content).isEmpty();

        // 2. Write some data
        String dataToWrite = "Lorem ipsum dolor sit amet, consectetur adipiscing elit...";
        database.write(dataToWrite);

        // 3. Read data and make sure that it is exactly what was written there
        content = database.read();
        assertThat(content).isEqualTo(dataToWrite);

        // 4. Write more data
        String moreDataToWrite = "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua...";
        database.write(moreDataToWrite);

        // 5. Read data and check for equality
        content = database.read();
        assertThat(content).isEqualTo(dataToWrite + moreDataToWrite);
    }

    @After
    public void tearDown() {
        // Remove created file
        if (!TEST_DB_FILE.delete()) {
            throw new RuntimeException("Was not able to delete a file " + TEST_DB_FILE.getName());
        }
    }
}

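And it does! If you create test resources, always make sure to remove them after the test case (or test suite) finishes, as we did in the tearDown() method. Otherwise, you may end up with unpredictable behavior during the next test run: here, for example, a leftover file would make the emptiness check in step 1 fail, because write() only ever appends.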

That’s it for this tutorial. You can find the full code sample (together with tests) on GitHub. You can go to the next part, where we will take a closer look at reading, writing, and updating key-value pairs.
