Creating database from scratch tutorial – part 2

In the first part of this tutorial, we started with ‘append writes’, and a ‘full read’ of the whole database in memory. No doubt it is not very efficient (the database might be quite large), and we probably need to read some smaller parts of data at a time for our business applications. In this part, we will look at reading, writing, and updating key-value data.

There are several ways how to approach key-value data storing from a database perspective. We can either use the key provided by a user, or we can instead generate a key and return it. The first approach is widely used almost everywhere as it may be convenient for a user to control key type and size. The second approach has fewer applications but you may have seen it, for example, in SQL databases for an integer primary key with autoincrement function. In this way database controls, key generation, and the user has only influence over the type of the key. As well, database generated keys are of predefined content and don’t need to be escaped (with user-provided keys type and content may be undefined, and based on storing approach it may be escaped). Some databases may store a hash representation of a key that solves escaping issues, but this approach may have hash collisions depending on the size of the hash.

Key may be either provided by a user or generated by the database.

Creating key-value data storage

We will use code from the previous tutorial part and modify it wherever necessary. First of all, we would need to read data by key and write key and value. To support update operations we just need to read the last value written for the specified key. Since our database is append-only, we will read the file from the end to find the last occurrence of the key which would effectively be the most up-to-date entry for that key. Let’s take a closer look at the code.

private static final String DELIMITER = ":";

public String read(String key) {
    String sizePrefixedKey = sizePrefixedKey(key);
    try (ReversedLinesFileReader in = new ReversedLinesFileReader(dbFile, Charset.defaultCharset())) {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(sizePrefixedKey)) {
                return line.substring(sizePrefixedKey.length() + DELIMITER.length());
            }
        }
    } catch (IOException e) {
        throw new RuntimeException("Failed to read from file " + dbFile);
    }

    return null;
}

public void write(String key, String value) {
    if (key.contains("\n") || value.contains("\n")) {
        throw new IllegalArgumentException("New line characters in keys or values are prohibited");
    }

    try (BufferedWriter out = new BufferedWriter(new FileWriter(dbFile, true))) {
        out.write(String.join(DELIMITER, sizePrefixedKey(key), value));
        out.newLine();
    } catch (IOException e) {
        throw new RuntimeException("Failed to write data to file " + dbFile);
    }
}

private static String sizePrefixedKey(String key) {
    return String.join(DELIMITER, Integer.toString(key.length()), key);
}

Our first change is to introduce a line-based data structure – every line is a new entry (key-value pair) in our database. To make it work, we also need to prohibit newline characters written either in key or in value. Additionally, we need to separate key and value somehow to avoid entries like “k”-“eyvalue” and “key”-“value” considered to be the same. For that, we use key_size:key:value notation to define where the key ends and the value starts. Note that our : delimiter can be used in key or value without any issues because we know the size of the key and value is everything that goes after key and delimiter till the end of the line.

key_size:key:value_size:value notation may be effectively used as a storing mechanism to avoid the necessity to escape newline characters.

For reading a file line by line from the end we can use ReversedLinesFileReader from commons-io library to avoid the complexity of writing a similar solution ourselves. Since code was written as a Maven project, here is a maven dependency description for commons-io. Take a look at its source code to see how it works.

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

Testing key-value data storage

Now that we have our implementation let’s write some tests to make sure it works. Like the previous time, don’t forget to wipe out test resources after each test to make the test suite clean.

public class DatabaseTest {

    private static final File TEST_DB_FILE = new File("test_db_file.db");

    @Test(expected = IllegalArgumentException.class)
    public void whenNewLineInKey_shouldThrowIAE() throws IOException {
        Database database = new Database(TEST_DB_FILE);

        database.write("ke\ny", "value");
    }

    @Test(expected = IllegalArgumentException.class)
    public void whenNewLineInValue_shouldThrowIAE() throws IOException {
        Database database = new Database(TEST_DB_FILE);

        database.write("key", "val\nue");
    }

    @Test
    public void whenNoKeyValuePair_shouldReturnNull() throws IOException {
        Database database = new Database(TEST_DB_FILE);

        String actualValue = database.read("some key");
        assertThat(actualValue).isNull();
    }

    @Test
    public void testReadWriteOperations() throws IOException {
        Database database = new Database(TEST_DB_FILE);
        String key = "some key";
        String value = "some value";

        database.write(key, value);

        String actualValue = database.read(key);
        assertThat(actualValue).isEqualTo(value);
    }

    @Test
    public void testUpdateOperation() throws IOException {
        Database database = new Database(TEST_DB_FILE);
        String key = "key";
        String value = "value";

        // 1. Write entry
        database.write(key, value);

        // 2. Read entry and make sure it matches expected
        String content = database.read(key);
        assertThat(content).isEqualTo(value);

        // 3. Update entry (write another entry with the same key)
        String newValue = "newValue";
        database.write(key, newValue);

        // 4. Read updated entry and make sure it matches expected
        content = database.read(key);
        assertThat(content).isEqualTo(newValue);
    }

    @After
    public void tearDown() {
        // Remove created file
        if (!TEST_DB_FILE.delete()) {
            throw new RuntimeException("Was not able to delete a file " + TEST_DB_FILE.getName());
        }
    }
}

So, in this part of the tutorial, we reinvented the append-only log data structure. It is used in one way or another in lots of technologies (Cassandra database and Kafka to name a few). It is a quite primitive but very powerful data structure. As promised, we started small and ended up with a well-known data structure written ourselves, that’s awesome!

And that’s it for this part of the tutorial, you can find the full code sample (together with tests) on Github. The next part is in development and soon to be finished, stay tuned!

0