gql-db, Explained.
gql-db is an SQL database server with version control integrated into the database itself. It's written in Rust and uses Protocol Buffers/gRPC for communication. This was our Capstone Software Engineering project for Purdue University.
The motivation behind this, would be to allow developers to work on separate "branches" of the database, and then merge their changes into a main branch. Reverting changes, now termed as commits, in a database also becomes much easier.
Implementing this as a component of the database system, instead of as a "layer" on top of the database system allows us to take advantage of the same underlying infrastructure used by the database system to efficiently store and access version control information.
Learn more at our repos: GQL-Project/gql_db and GQL-Project/gql_client
Features
- SQL:
- Basic SQL operations:
SELECT
,INSERT
,UPDATE
,DELETE
,CREATE TABLE
,DROP TABLE
- Support for
JOIN
operations, as well asGROUP BY
andORDER BY
operations - BTrees for indexing
- User Permisions
- Basic SQL operations:
- Version Control:
- Creating, Reverting and Squashing Commits
- Creating, Switching and Deleting Branches
- Pulling Changes from source branch
- Merging Branches, with conflict resolution policies
- Logs, Viewing Specific Commits and Viewing Database Schema at a specific commit
Language Choices
-
When choosing a language for the database server, we looked for one that would be fast, safe, and with a great package management system. This is what brought us to Rust.
- Rust is fast, safe, provides zero-cost abstractions, and has a great type-system.
- The structure of the language forces us to write code in a manner that eliminates huge sets of concurrency and memory bugs.
- Iterators in Rust are iterators done right: performing transformations and selections on arrays becomes really straightforward and concise.
- The toolchain that Rust provides with its compiler makes a very seamless experience. The
rustc
compiler additionally provides very helpful error messages and warnings.cargo
allows us to painlessly introduce helpful packages and resources into our project. - There were some drawbacks, particularly its steep learning curve and slow compile times, but overall, using Rust allowed us to iterate and add features very quickly.
-
To efficiently send data between the client and server, we chose Protocol Buffers and gRPC. Protocol Buffers are a language-neutral, platform-neutral, extensible way of serializing structured data. It's extremely efficient, and allows us to define a "structure" for our queries that needed to be enforced.
-
We chose React for the client because it's a popular framework for building web applications, and it's also easy to learn.
- Adding Next.js made using multiple pages much cleaner, and allowed us to also clearly separate the client and server.
- Using Material UI additionally allowed us to make a consistent theme through the entire site, as well as draw from a large library of components in our application.
Design Overview
-
The database is represented as a "folder", with each "file" inside it representing a table. This table "file" is a binary format file, which stores the schema for the table, the rows for the table, and any indexes the table uses.
- These table files store information in "pages", with each page holding
4096
bytes of information. - Any accesses to the table are made as sequences of page accesses, rather than individual row accesses.
- Using this page format helps reduce the number of IO operations required when working with tables.
- Using a binary representation for this file, rather than a simple
.csv
representation, allows us to more efficiently store the rows of the table. - Additionally, it makes updating rows and removing rows, two crucial operations for any database, much faster.
- For more information, see: Page File Structure
- These table files store information in "pages", with each page holding
-
All branches, commits and files are stored in the database server itself. This differs from git, where commits are first stored in a local repository and then pushed to a remote repository.
- The branches and commits for a database are stored across four files.
deltas.gql
: Stores the actual changes made in the commits, as well as any important metadata. This uses a modified version of the page representation format, to better handle variable-sized information.commitheaders.gql
: A table to quickly lookup the location of a commit in the actualdelta.gql
files using the commit hash.branches.gql
: A table that stores the various nodes that are linked together to form branches.branchheads.gql
: A table to store the current HEAD for a particular branch. In other words, the latest commit made on a particular branch.- For more information, see: Version Control System
-
Here's some additional helpful diagrams we made while designing the server: