TheKnarf

Local-first db (lfdb)

Local-First software is software written to work locally without an internet connection, but to still have cloud syncing capabilites to seamlessly work with the cloud.

It relates to the idea of File-first software.

I also like the philosophy of Nikolai's priciples.

Idea

I've wanted to build a database-engine on top of Git forever. I've had this idea since before the term "local-first software" was first coined.

Git was my first introduction to "local-first software", where each Git repo normally is a full clone of the repo, and you can work on it entirely offline, and then sync it with the cloud when you want to.

The idea of local-first db is to take the Rust-based gitoxide-project and implement a database on top of it thats a superset of Git.

We need the following features:

  • Works offline

  • Access management is built into the DB

  • Push & pull

  • Subscription queries

  • Privacy preserving (don't leak IP-addresses, emails, names, etc)

Inspiring projects

  • irmin - Irmin is an OCaml library for building mergeable, branchable distributed data stores thats Git compatible.

  • dolt - Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a Git repository.

  • Automerge - a library of data structures for building collaborative applications

Research

  • Upwelling is doing a lot of what I'd like my system to be able to provide

  • Backchannel list out a lot of how I'd like users to interact when building apps on-top of lfdb to preserve privacy. ( Backchannel prototype )

  • SNARKBlock some ideas on federated anonymous blocklists. I'm not sure if that exactly how I'd like to implement it, but it can be used as inspiration.

Architecture

Network transport

Network transpot to communicate between Server / Client. I'd like to have built-in support for privacy preserving decentralized transports. To me the most interesting one is Veilid.

Interesting ones:

  • Veilid - Seems well designed. The DefCon talk presents it well. I like that it doesn't leak IP addresses.
  • Reticulum - Needs dedicated hardware, but could be an interesting target to also support
  • I2P - Newer than Tor, older than Veilid.

Less interesting projects (very alpha and missing features):

  • Mycoria - "Private by default (WIP)", doesn't seem ready for use
  • Yggdrasil - alpha-level research project

Query- & Permission-layer (and push / pull / subscribe)

The permission system needs to be deeply integrated with the query system so that you can only pull down data that the client have read access to (and only push data it hash write access to)

Push and pull model like Git

An additinal Subscribe model where they can subscribe to updates to a particular query

Datamode / Data-layer

  • Should everything be tables like in SQL databases?

    • Propbably easier to build a uniform query system if the data is uniformly stored as tables
  • Or should everything be JSON / Apache Avro?

  • Support multiple different types?

  • Prolly Trees like in Dolt?

    • Prolly Trees seems useful for splitting files into smaller chunks for syncing

      • Can be useful for BitTorrent style file sync

      • Rsync, Bup Backup and IPFS uses similar ideas

    • Seems like Dolt threats a DB table as a "single file" and then chunks it as a Prolly Tree? Or does it do it per row?

      • How will this work effectivly with permission systems?

      • If you can't read a table then just don't sync it

      • What if you don't have read access to a whole row?

      • What if you don't have read access to a whole column?

      • What if you don't have read access to a specific column in a specific row?

      • How would forign-keys and constraints / view-maintaince work on partial data sets?

  • Other interesting datamodels:

Block Store / Storage-layer

The underlaying storage that read/writes to disk

Could be:

  • rocksdb - C++ project with unoffical Rust wrapper

  • sled - Native Rust storage, more experimental

  • My own custom stuff

Could potentially support multiple, databases like MySQL support multiple (InnoDB, MyISAM, ...).

Tooling