
Database

Request for Chopin-Optimized Database

Overview

Chopin Framework is compatible with any database. However, it is possible to build a more optimized database that will enable light client verification and trust-minimized settlement. Ideally, there would be a solution for both fraud and validity proofs.

Requirements

  • Deterministic: Right now, determinism is the responsibility of the developer. They must avoid non-deterministic functions like random() and datetime(). A truly optimized database would ensure that all operations are deterministic and would even provide alternative verifiable implementations of non-deterministic functions. Further, for light client verification and settlement, we need to construct a deterministic state commitment. This is possible to implement outside of the database itself; however, it would be optimal if we could ensure the database writes data to disk in a deterministic manner.
  • Merklized: Merklization provides a variety of benefits for enabling light client verification and trust-minimized settlement. Primarily, it would allow light nodes to verify the validity of database queries without having to download the entire database.
  • Provable: Proving the execution of a database would allow us to prove the validity of the database state commitment. This can be done by compiling the database to a provable target, such as RISC-V.

SQLite

SQLite is a great starting point for this, as it is optimized for single-threaded, embedded use cases. Here is some of what could be done to get it working.

Compilation

In order to ensure that the database executes deterministically, we would need to compile it with the following flags:

Each entry lists what the flag removes or enforces, and its effect on determinism:

  • -DSQLITE_OMIT_RANDOMNESS: Deletes SQLite's internal PRNG. random() and randomblob() become constant; WAL "salt" fields and auto-rowids are no longer random.
  • -DSQLITE_OMIT_DATETIME_FUNCS: Excludes all five date/time SQL functions. No query can consult the wall clock, so results never depend on run time.
  • -DSQLITE_OMIT_LOCALTIME: Removes UTC→local-time conversion code. Eliminates hidden dependence on host locale / TZ settings.
  • -DSQLITE_OMIT_WAL: Compiles out Write-Ahead Logging. Removes random salt values and WAL concurrency paths.
  • -DSQLITE_TEMP_STORE=3: Forces the default temp store to MEMORY. Temp files never hit the filesystem; path/inode/timestamp noise cannot leak into the DB image.
  • -DSQLITE_THREADSAFE=0: Builds in single-thread mode (no mutexes). Removes race-driven nondeterminism.
  • -DSQLITE_DEFAULT_SYNCHRONOUS=2: Bakes synchronous = FULL into the library. Every commit is double-flushed (fsync) without an explicit pragma.
  • -DSQLITE_DEFAULT_PAGE_SIZE=4096: Sets the default page size. All new DBs start with the same page geometry.
  • -DSQLITE_MAX_PAGE_SIZE=4096: Hard ceiling on page size. Prevents any later attempt to enlarge the page size via pragma or rebuild.
  • -DSQLITE_OMIT_AUTOVACUUM: Removes auto-vacuum logic and the auto_vacuum pragma. Page numbers never reshuffle during ordinary deletes; they change only on an explicit VACUUM, which you already control.
  • -DSQLITE_DEFAULT_CACHE_SIZE=-2000: Default pager cache of roughly 2 MB (a negative value is interpreted in KiB). Guarantees identical cache behaviour across hosts.
  • -DSQLITE_DEFAULT_LOCKING_MODE=1: Default locking mode = EXCLUSIVE. Blocks stray readers/writers at the OS-lock layer.

(Optional: re-implement non-deterministic functions like datetime() with deterministic, verifiable implementations and register them as SQLITE_DETERMINISTIC UDFs.)
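As a sketch of that optional step, the snippet below registers a deterministic replacement for datetime('now') using Python's built-in sqlite3 module. The name chopin_now and the idea of injecting a per-block timestamp are illustrative assumptions, not an existing Chopin API; the deterministic=True flag maps to SQLITE_DETERMINISTIC.

```python
import sqlite3

# Assumption: in a Chopin-style setup, "now" would come from consensus
# (e.g. a block timestamp supplied by the sequencer), not the wall clock,
# so every node replaying the same block sees the same value.
BLOCK_TIMESTAMP = 1700000000  # hypothetical per-block constant

def chopin_now() -> int:
    # Deterministic replacement for datetime('now').
    return BLOCK_TIMESTAMP

conn = sqlite3.connect(":memory:")
# deterministic=True registers the UDF as SQLITE_DETERMINISTIC, allowing
# SQLite to use it in indexes and expression optimizations
# (requires SQLite 3.8.3+ and Python 3.8+).
conn.create_function("chopin_now", 0, chopin_now, deterministic=True)

row = conn.execute("SELECT chopin_now()").fetchone()
```

Calling the function twice yields identical results, which is exactly the property the stock datetime() lacks.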

To make execution provable, it most likely makes sense to compile SQLite to RISC-V and run it inside a zkVM, such as SP1 or RISC Zero. In theory, this can be done as follows; in reality, it will likely require additional work to actually run inside these zkVMs.

riscv64-linux-gnu-gcc -static -Os -fno-omit-frame-pointer \
    -DSQLITE_OMIT_RANDOMNESS \
    -DSQLITE_OMIT_DATETIME_FUNCS \
    -DSQLITE_OMIT_LOCALTIME \
    -DSQLITE_OMIT_WAL \
    -DSQLITE_TEMP_STORE=3 \
    -DSQLITE_THREADSAFE=0 \
    -DSQLITE_DEFAULT_SYNCHRONOUS=2 \
    -DSQLITE_DEFAULT_PAGE_SIZE=4096 \
    -DSQLITE_MAX_PAGE_SIZE=4096 \
    -DSQLITE_OMIT_AUTOVACUUM \
    -DSQLITE_DEFAULT_CACHE_SIZE=-2000 \
    -DSQLITE_DEFAULT_LOCKING_MODE=1 \
    -DHAVE_USLEEP \
    -o sqlite3-riscv64 \
    shell.c sqlite3.c -lm

Alternatively, SQLite could be compiled to WebAssembly and run inside a provable WASM VM. Also, instead of compiling the original SQLite, one could attempt to compile libsql or, eventually, Limbo (written in Rust; not yet stable).

Sequencing

Open one sqlite3* connection in exclusive-locking mode and push all write requests through a FIFO queue. Multiple connections, even if serialised, can reorder the page cache and break determinism.
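The single-connection FIFO pattern can be sketched with Python's built-in sqlite3, queue, and threading modules. The WriteQueue class and its method names are illustrative, not part of any Chopin API; a production sequencer would also persist and order the queue itself.

```python
import queue
import sqlite3
import threading

class WriteQueue:
    """One writer connection; all writes funneled through a FIFO queue."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path, check_same_thread=False)
        # Exclusive locking keeps any other process from touching the file.
        self.conn.execute("PRAGMA locking_mode=EXCLUSIVE")
        self.q: "queue.Queue" = queue.Queue()  # FIFO: preserves submit order
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            item = self.q.get()
            if item is None:  # shutdown sentinel
                break
            sql, params = item
            self.conn.execute(sql, params)
            self.conn.commit()  # one commit per write keeps ordering explicit

    def submit(self, sql: str, params=()):
        self.q.put((sql, params))

    def close(self):
        self.q.put(None)
        self.worker.join()

wq = WriteQueue(":memory:")
wq.submit("CREATE TABLE t (i INTEGER)")
for i in range(3):
    wq.submit("INSERT INTO t VALUES (?)", (i,))
wq.close()
rows = [r[0] for r in wq.conn.execute("SELECT i FROM t ORDER BY rowid")]
```

Because every write flows through one queue into one connection, replaying the same queue on another node walks the pager through the same sequence of states.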

State Commitment

To generate a state commitment, we need a hash that captures the full state of the database. One naive attempt would be to dump the entire database into a data structure and hash it. This would generally work with any database, but would be inefficient.
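The naive approach is easy to demonstrate with Python's sqlite3: serialize the logical content with Connection.iterdump() and hash the text. It works against any SQL database with a dump facility, but it rehashes everything on every call, which is why it does not scale.

```python
import hashlib
import sqlite3

def naive_commitment(conn: sqlite3.Connection) -> str:
    # Dump the full logical content (schema + rows as SQL text) and hash it.
    # O(total data) on every call, hence "naive".
    dump = "\n".join(conn.iterdump()).encode()
    return hashlib.sha256(dump).hexdigest()

# Two databases built from the same operations produce the same commitment.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
for conn in (a, b):
    conn.execute("CREATE TABLE t (i INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
```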

A better way would be to hash the data as it is represented internally by SQLite, since we took great care to ensure it was written deterministically. To do this, we would first run the command VACUUM, which rewrites the database file in a deterministic manner. Then we can hash the database file itself, excluding the volatile header fields: the file change counter (bytes 24-27), the schema cookie (bytes 40-43), and the version numbers at bytes 92-99. This avoids drift from maintenance statements like ANALYZE that change no application data.
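A minimal sketch of this file-level commitment, using Python's sqlite3 and hashlib: the byte offsets masked below follow the SQLite file-format specification, and treating exactly these three ranges as volatile is an assumption of this sketch.

```python
import hashlib
import os
import sqlite3
import tempfile

# Header fields that change without any logical state change
# (offsets per the SQLite file-format spec).
VOLATILE = [(24, 28),   # file change counter
            (40, 44),   # schema cookie
            (92, 100)]  # version-valid-for + SQLITE_VERSION_NUMBER

def state_commitment(path: str) -> str:
    conn = sqlite3.connect(path)
    conn.execute("VACUUM")  # rewrite the file into canonical page order
    conn.close()
    data = bytearray(open(path, "rb").read())
    for lo, hi in VOLATILE:
        data[lo:hi] = b"\x00" * (hi - lo)  # mask the volatile bytes
    return hashlib.sha256(data).hexdigest()

# Build a small database file and commit to it.
tmp = os.path.join(tempfile.mkdtemp(), "state.db")
conn = sqlite3.connect(tmp)
conn.execute("CREATE TABLE t (i INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
conn.close()
h1 = state_commitment(tmp)

# Any logical change moves the commitment.
conn = sqlite3.connect(tmp)
conn.execute("INSERT INTO t VALUES (2)")
conn.commit()
conn.close()
h2 = state_commitment(tmp)
```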

To take this one step further, we can use a virtual file system that merklizes the pages of the database.

Virtual File System

Instead of using the OS's virtual file system, we can use a custom one that merklizes the pages of the database. In order to do this, we need to implement a few VFS methods (xRead, xWrite, xSync, and so on) backed by a KV store, such as NOMT.
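The core idea can be sketched in Python: treat each 4096-byte page as a KV entry keyed by page number, and derive a Merkle root over the pages. This is a toy model, not a real SQLite VFS (that would hook xRead/xWrite/xSync through the sqlite3_vfs C interface), and it rebuilds the root from scratch where a Merkle trie like NOMT's updates it in O(log n); all names here are illustrative.

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class MerklePageStore:
    """Toy page store: xWrite maps to a KV put, root() commits to all pages."""

    PAGE_SIZE = 4096

    def __init__(self):
        self.pages: dict = {}  # page number -> page bytes

    def x_write(self, pgno: int, data: bytes):
        assert len(data) == self.PAGE_SIZE
        self.pages[pgno] = data

    def x_read(self, pgno: int) -> bytes:
        return self.pages[pgno]

    def root(self) -> bytes:
        # Binary Merkle tree over page hashes in page-number order.
        level = [H(self.pages[n]) for n in sorted(self.pages)]
        if not level:
            return H(b"")
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # duplicate last node on odd levels
            level = [H(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

store = MerklePageStore()
store.x_write(1, b"\x01" * 4096)
store.x_write(2, b"\x02" * 4096)
r1 = store.root()
store.x_write(2, b"\x03" * 4096)  # one page changes -> root changes
r2 = store.root()
```

With such a store, a light client holding only the root can verify any single page (and thus any query touching it) against a logarithmic-size Merkle proof.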

PostgreSQL

A compelling case can be made for using PostgreSQL instead of SQLite. I will write more about this at a future time.

Redis

There is precedent for drop-in replacements for KV databases like Redis that don't provide the full functionality. Perhaps the easiest path to a verifiable database would be to make an SDK that clones a subset of the Redis API. This would only require simple functions that replicate basic KV commands. On the backend, it could be any provable KV store, such as NOMT. The same approach could also yield drop-in replacements for DynamoDB or other KV stores.
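A toy sketch of such a Redis-subset SDK: a handful of the commands most apps use, backed here by a plain dict plus a deterministic commitment. A production version would back this with a provable KV store such as NOMT; the class and method names are illustrative, not an actual Chopin or Redis API.

```python
import hashlib
import json

class MiniKV:
    """Minimal Redis-like surface: SET / GET / DEL / INCR plus a commitment."""

    def __init__(self):
        self.data: dict = {}  # str key -> str value, mirroring Redis strings

    def set(self, key: str, value: str):
        self.data[key] = value

    def get(self, key: str):
        return self.data.get(key)  # None when missing, like a Redis nil reply

    def delete(self, key: str) -> int:
        # Redis DEL returns the number of keys removed.
        return 1 if self.data.pop(key, None) is not None else 0

    def incr(self, key: str) -> int:
        value = int(self.data.get(key, "0")) + 1
        self.data[key] = str(value)
        return value

    def commitment(self) -> str:
        # Deterministic: serialize entries in sorted key order before hashing.
        blob = json.dumps(sorted(self.data.items())).encode()
        return hashlib.sha256(blob).hexdigest()

kv = MiniKV()
kv.set("greeting", "hello")
kv.incr("counter")
kv.incr("counter")
```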

The precedent for this is Vercel KV.

Custom Database

I believe there is considerable demand for a SQL database that is fully optimized for onchain usage. Such a database should likely be a drop-in replacement for a more popular database that already has a rich ecosystem and validated feature set. However, engineering a new database from scratch for this purpose would be relatively straightforward for a team with the right expertise and practically a guaranteed success.

Honorable Mentions

Other interesting related projects include:

  • NOMT - An implementation of a novel binary Merkle Trie and DB, written in Rust. If they branch out into writing a SQL database, perhaps we could use them without any modification.
  • Dolt - A SQL database that you can fork, clone, branch, merge, push and pull just like a Git repository. Dolt is also merklized and has many great features that would be valuable to have in a Chopin-optimized database, such as built-in forking.
