From 88605b081f0870c21017fd5487f0f0a08c986e0a Mon Sep 17 00:00:00 2001 From: hinto-janai Date: Tue, 24 Sep 2024 12:23:22 -0400 Subject: [PATCH] books/architecture: port database design document (#267) * add chapters * add files, intro * db abstraction * backends * abstraction * syncing * serde * issues * common/types * common/ops * common/service * service diagram * service/resize * service/thread-model * service/shutdown * storage/blockchain * update md files * cleanup * fixes * update for https://github.com/Cuprate/cuprate/pull/290 * review fix --- books/architecture/src/SUMMARY.md | 36 +- books/architecture/src/storage/blockchain.md | 1 - .../src/storage/blockchain/intro.md | 3 + .../src/storage/blockchain/schema/intro.md | 2 + .../src/storage/blockchain/schema/multimap.md | 45 ++ .../src/storage/blockchain/schema/tables.md | 39 ++ .../architecture/src/storage/common/intro.md | 9 + books/architecture/src/storage/common/ops.md | 21 + .../storage/common/service/initialization.md | 9 + .../src/storage/common/service/intro.md | 65 ++ .../src/storage/common/service/requests.md | 8 + .../src/storage/common/service/resizing.md | 15 + .../src/storage/common/service/responses.md | 18 + .../src/storage/common/service/shutdown.md | 4 + .../storage/common/service/thread-model.md | 23 + .../architecture/src/storage/common/types.md | 21 + .../src/storage/database-abstraction.md | 1 - .../src/storage/db/abstraction/backend.md | 50 ++ .../storage/db/abstraction/concrete_env.md | 15 + .../src/storage/db/abstraction/intro.md | 33 + .../src/storage/db/abstraction/trait.md | 49 ++ books/architecture/src/storage/db/intro.md | 23 + .../src/storage/db/issues/endian.md | 6 + .../src/storage/db/issues/hot-swap.md | 17 + .../src/storage/db/issues/intro.md | 7 + .../src/storage/db/issues/multimap.md | 22 + .../src/storage/db/issues/traits.md | 15 + .../src/storage/db/issues/unaligned.md | 24 + books/architecture/src/storage/db/resizing.md | 8 + books/architecture/src/storage/db/serde.md | 
44 ++ books/architecture/src/storage/db/syncing.md | 17 + books/architecture/src/storage/intro.md | 35 +- .../storage/{pruning.md => pruning/intro.md} | 0 .../{transaction-pool.md => txpool/intro.md} | 0 storage/README.md | 11 +- storage/blockchain/DESIGN.md | 600 ------------------ 36 files changed, 685 insertions(+), 611 deletions(-) delete mode 100644 books/architecture/src/storage/blockchain.md create mode 100644 books/architecture/src/storage/blockchain/intro.md create mode 100644 books/architecture/src/storage/blockchain/schema/intro.md create mode 100644 books/architecture/src/storage/blockchain/schema/multimap.md create mode 100644 books/architecture/src/storage/blockchain/schema/tables.md create mode 100644 books/architecture/src/storage/common/intro.md create mode 100644 books/architecture/src/storage/common/ops.md create mode 100644 books/architecture/src/storage/common/service/initialization.md create mode 100644 books/architecture/src/storage/common/service/intro.md create mode 100644 books/architecture/src/storage/common/service/requests.md create mode 100644 books/architecture/src/storage/common/service/resizing.md create mode 100644 books/architecture/src/storage/common/service/responses.md create mode 100644 books/architecture/src/storage/common/service/shutdown.md create mode 100644 books/architecture/src/storage/common/service/thread-model.md create mode 100644 books/architecture/src/storage/common/types.md delete mode 100644 books/architecture/src/storage/database-abstraction.md create mode 100644 books/architecture/src/storage/db/abstraction/backend.md create mode 100644 books/architecture/src/storage/db/abstraction/concrete_env.md create mode 100644 books/architecture/src/storage/db/abstraction/intro.md create mode 100644 books/architecture/src/storage/db/abstraction/trait.md create mode 100644 books/architecture/src/storage/db/intro.md create mode 100644 books/architecture/src/storage/db/issues/endian.md create mode 100644 
books/architecture/src/storage/db/issues/hot-swap.md create mode 100644 books/architecture/src/storage/db/issues/intro.md create mode 100644 books/architecture/src/storage/db/issues/multimap.md create mode 100644 books/architecture/src/storage/db/issues/traits.md create mode 100644 books/architecture/src/storage/db/issues/unaligned.md create mode 100644 books/architecture/src/storage/db/resizing.md create mode 100644 books/architecture/src/storage/db/serde.md create mode 100644 books/architecture/src/storage/db/syncing.md rename books/architecture/src/storage/{pruning.md => pruning/intro.md} (100%) rename books/architecture/src/storage/{transaction-pool.md => txpool/intro.md} (100%) delete mode 100644 storage/blockchain/DESIGN.md diff --git a/books/architecture/src/SUMMARY.md b/books/architecture/src/SUMMARY.md index d97d223d..bf668609 100644 --- a/books/architecture/src/SUMMARY.md +++ b/books/architecture/src/SUMMARY.md @@ -27,11 +27,37 @@ --- -- [⚪️ Storage](storage/intro.md) - - [⚪️ Database abstraction](storage/database-abstraction.md) - - [⚪️ Blockchain](storage/blockchain.md) - - [⚪️ Transaction pool](storage/transaction-pool.md) - - [⚪️ Pruning](storage/pruning.md) +- [🟢 Storage](storage/intro.md) + - [🟢 Database abstraction](storage/db/intro.md) + - [🟢 Abstraction](storage/db/abstraction/intro.md) + - [🟢 Backend](storage/db/abstraction/backend.md) + - [🟢 ConcreteEnv](storage/db/abstraction/concrete_env.md) + - [🟢 Trait](storage/db/abstraction/trait.md) + - [🟢 Syncing](storage/db/syncing.md) + - [🟢 Resizing](storage/db/resizing.md) + - [🟢 (De)serialization](storage/db/serde.md) + - [🟢 Known issues and tradeoffs](storage/db/issues/intro.md) + - [🟢 Abstracting backends](storage/db/issues/traits.md) + - [🟢 Hot-swap](storage/db/issues/hot-swap.md) + - [🟢 Unaligned bytes](storage/db/issues/unaligned.md) + - [🟢 Endianness](storage/db/issues/endian.md) + - [🟢 Multimap](storage/db/issues/multimap.md) + - [🟢 Common behavior](storage/common/intro.md) + - [🟢 
Types](storage/common/types.md) + - [🟢 `ops`](storage/common/ops.md) + - [🟢 `tower::Service`](storage/common/service/intro.md) + - [🟢 Initialization](storage/common/service/initialization.md) + - [🟢 Requests](storage/common/service/requests.md) + - [🟢 Responses](storage/common/service/responses.md) + - [🟢 Resizing](storage/common/service/resizing.md) + - [🟢 Thread model](storage/common/service/thread-model.md) + - [🟢 Shutdown](storage/common/service/shutdown.md) + - [🟢 Blockchain](storage/blockchain/intro.md) + - [🟢 Schema](storage/blockchain/schema/intro.md) + - [🟢 Tables](storage/blockchain/schema/tables.md) + - [🟢 Multimap tables](storage/blockchain/schema/multimap.md) + - [⚪️ Transaction pool](storage/txpool/intro.md) + - [⚪️ Pruning](storage/pruning/intro.md) --- diff --git a/books/architecture/src/storage/blockchain.md b/books/architecture/src/storage/blockchain.md deleted file mode 100644 index 60466879..00000000 --- a/books/architecture/src/storage/blockchain.md +++ /dev/null @@ -1 +0,0 @@ -# ⚪️ Blockchain diff --git a/books/architecture/src/storage/blockchain/intro.md b/books/architecture/src/storage/blockchain/intro.md new file mode 100644 index 00000000..9d35fca6 --- /dev/null +++ b/books/architecture/src/storage/blockchain/intro.md @@ -0,0 +1,3 @@ +# Blockchain +This section contains storage information specific to [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain), +the database built on-top of [`cuprate_database`](https://doc.cuprate.org/cuprate_database) that stores the blockchain. diff --git a/books/architecture/src/storage/blockchain/schema/intro.md b/books/architecture/src/storage/blockchain/schema/intro.md new file mode 100644 index 00000000..3bd825fc --- /dev/null +++ b/books/architecture/src/storage/blockchain/schema/intro.md @@ -0,0 +1,2 @@ +# Schema +This section contains the schema of `cuprate_blockchain`'s database tables. 
\ No newline at end of file diff --git a/books/architecture/src/storage/blockchain/schema/multimap.md b/books/architecture/src/storage/blockchain/schema/multimap.md new file mode 100644 index 00000000..2a4c6eb5 --- /dev/null +++ b/books/architecture/src/storage/blockchain/schema/multimap.md @@ -0,0 +1,45 @@ +# Multimap tables +## Outputs +When referencing outputs, Monero will [use the amount and the amount index](https://github.com/monero-project/monero/blob/c8214782fb2a769c57382a999eaf099691c836e7/src/blockchain_db/lmdb/db_lmdb.cpp#L3447-L3449). This means 2 keys are needed to reach an output. + +With LMDB you can set the `DUP_SORT` flag on a table and then set the key/value to: +```rust +Key = KEY_PART_1 +``` +```rust +Value = { + KEY_PART_2, + VALUE // The actual value we are storing. +} +``` + +Then you can set a custom value sorting function that only takes `KEY_PART_2` into account; this is how `monerod` does it. + +This requires that the underlying database supports: +- multimap tables +- custom sort functions on values +- setting a cursor on a specific key/value + +## How `cuprate_blockchain` does it +Another way to implement this is as follows: +```rust +Key = { KEY_PART_1, KEY_PART_2 } +``` +```rust +Value = VALUE +``` + +Then the key type is simply used to look up the value; this is how `cuprate_blockchain` does it +as [`cuprate_database` does not have a multimap abstraction (yet)](../../db/issues/multimap.md). + +For example, the key/value pair for outputs is: +```rust +PreRctOutputId => Output +``` +where `PreRctOutputId` looks like this: +```rust +struct PreRctOutputId { + amount: u64, + amount_index: u64, +} +``` \ No newline at end of file diff --git a/books/architecture/src/storage/blockchain/schema/tables.md b/books/architecture/src/storage/blockchain/schema/tables.md new file mode 100644 index 00000000..15e0c633 --- /dev/null +++ b/books/architecture/src/storage/blockchain/schema/tables.md @@ -0,0 +1,39 @@ +# Tables + +> See also: & . 
+ +The `CamelCase` names of the table headers documented here (e.g. `TxIds`) are the actual type name of the table within `cuprate_blockchain`. + +Note that words written within `code blocks` mean that it is a real type defined and usable within `cuprate_blockchain`. Other standard types like u64 and type aliases (TxId) are written normally. + +Within `cuprate_blockchain::tables`, the below table is essentially defined as-is with [a macro](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/tables.rs#L369-L470). + +Many of the data types stored are the same data types, although are different semantically, as such, a map of aliases used and their real data types is also provided below. + +| Alias | Real Type | +|----------------------------------------------------|-----------| +| BlockHeight, Amount, AmountIndex, TxId, UnlockTime | u64 +| BlockHash, KeyImage, TxHash, PrunableHash | [u8; 32] + +--- + +| Table | Key | Value | Description | +|--------------------|----------------------|-------------------------|-------------| +| `BlockHeaderBlobs` | BlockHeight | `StorableVec` | Maps a block's height to a serialized byte form of its header +| `BlockTxsHashes` | BlockHeight | `StorableVec<[u8; 32]>` | Maps a block's height to the block's transaction hashes +| `BlockHeights` | BlockHash | BlockHeight | Maps a block's hash to its height +| `BlockInfos` | BlockHeight | `BlockInfo` | Contains metadata of all blocks +| `KeyImages` | KeyImage | () | This table is a set with no value, it stores transaction key images +| `NumOutputs` | Amount | u64 | Maps an output's amount to the number of outputs with that amount +| `Outputs` | `PreRctOutputId` | `Output` | This table contains legacy CryptoNote outputs which have clear amounts. This table will not contain an output with 0 amount. 
+| `PrunedTxBlobs` | TxId | `StorableVec` | Contains pruned transaction blobs (even if the database is not pruned) +| `PrunableTxBlobs` | TxId | `StorableVec` | Contains the prunable part of a transaction +| `PrunableHashes` | TxId | PrunableHash | Contains the hash of the prunable part of a transaction +| `RctOutputs` | AmountIndex | `RctOutput` | Contains RingCT outputs mapped from their global RCT index +| `TxBlobs` | TxId | `StorableVec` | Serialized transaction blobs (bytes) +| `TxIds` | TxHash | TxId | Maps a transaction's hash to its index/ID +| `TxHeights` | TxId | BlockHeight | Maps a transaction's ID to the height of the block it comes from +| `TxOutputs` | TxId | `StorableVec` | Gives the amount indices of a transaction's outputs +| `TxUnlockTime` | TxId | UnlockTime | Stores the unlock time of a transaction (only if it has a non-zero lock time) + + \ No newline at end of file diff --git a/books/architecture/src/storage/common/intro.md b/books/architecture/src/storage/common/intro.md new file mode 100644 index 00000000..a772d878 --- /dev/null +++ b/books/architecture/src/storage/common/intro.md @@ -0,0 +1,9 @@ +# Common behavior +The crates that build on-top of the database abstraction ([`cuprate_database`](https://doc.cuprate.org/cuprate_database)) +share some common behavior including but not limited to: + +- Defining their specific database tables and types +- Having an `ops` module +- Exposing a `tower::Service` API (backed by a threadpool) for public usage + +This section provides more details on these behaviors. 
\ No newline at end of file diff --git a/books/architecture/src/storage/common/ops.md b/books/architecture/src/storage/common/ops.md new file mode 100644 index 00000000..3a4e6174 --- /dev/null +++ b/books/architecture/src/storage/common/ops.md @@ -0,0 +1,21 @@ +# `ops` +Both [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain) +and [`cuprate_txpool`](https://doc.cuprate.org/cuprate_txpool) expose an +`ops` module containing abstracted Monero-related database operations. + +For example, [`cuprate_blockchain::ops::block::add_block`](https://doc.cuprate.org/cuprate_blockchain/ops/block/fn.add_block.html). + +These functions build on-top of the database traits and allow for more abstracted database operations. + +For example, instead of these signatures: +```rust +fn get(_: &Key) -> Value; +fn put(_: &Key, _: &Value); +``` +the `ops` module provides much higher-level signatures such as: +```rust +fn add_block(block: &Block) -> Result<_, _>; +``` + +Although these functions are exposed, they are not the main API; that is covered in the next section: +the [`tower::Service`](./service/intro.md) (which uses these functions). \ No newline at end of file diff --git a/books/architecture/src/storage/common/service/initialization.md b/books/architecture/src/storage/common/service/initialization.md new file mode 100644 index 00000000..83509717 --- /dev/null +++ b/books/architecture/src/storage/common/service/initialization.md @@ -0,0 +1,9 @@ +# Initialization +A database service is started simply by calling: [`init()`](https://doc.cuprate.org/cuprate_blockchain/service/fn.init.html). + +This function initializes the database, spawns threads, and returns: +- A read handle to the database +- A write handle to the database +- The database itself + +These handles implement the `tower::Service` trait, which allows sending requests and receiving responses `async`hronously.
\ No newline at end of file diff --git a/books/architecture/src/storage/common/service/intro.md b/books/architecture/src/storage/common/service/intro.md new file mode 100644 index 00000000..bba7486b --- /dev/null +++ b/books/architecture/src/storage/common/service/intro.md @@ -0,0 +1,65 @@ +# tower::Service +Both [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain) +and [`cuprate_txpool`](https://doc.cuprate.org/cuprate_txpool) provide +`async` [`tower::Service`](https://docs.rs/tower)s that define database requests/responses. + +The main API that other Cuprate crates use. + +There are 2 `tower::Service`s: +1. A read service which is backed by a [`rayon::ThreadPool`](https://docs.rs/rayon) +1. A write service which spawns a single thread to handle write requests + +As this behavior is the same across all users of [`cuprate_database`](https://doc.cuprate.org/cuprate_database), +it is extracted into its own crate: [`cuprate_database_service`](https://doc.cuprate.org/cuprate_database_service). + +## Diagram +As a recap, here is how this looks to a user of a higher-level database crate, +`cuprate_blockchain` in this example. Starting from the lowest layer: + +1. `cuprate_database` is used to abstract the database +1. `cuprate_blockchain` builds on-top of that with tables, types, operations +1. `cuprate_blockchain` exposes a `tower::Service` using `cuprate_database_service` +1. The user now interfaces with `cuprate_blockchain` with that `tower::Service` in a request/response fashion + +``` + ┌──────────────────┐ + │ cuprate_database │ + └────────┬─────────┘ +┌─────────────────────────────────┴─────────────────────────────────┐ +│ cuprate_blockchain │ +│ │ +│ ┌──────────────────────┐ ┌─────────────────────────────────────┐ │ +│ │ Tables, types │ │ ops │ │ +│ │ ┌───────────┐┌─────┐ │ │ ┌─────────────┐ ┌──────────┐┌─────┐ │ │ +│ │ │ BlockInfo ││ ... │ ├──┤ │ add_block() │ │ add_tx() ││ ... 
│ │ │ +│ │ └───────────┘└─────┘ │ │ └─────────────┘ └──────────┘└─────┘ │ │ +│ └──────────────────────┘ └─────┬───────────────────────────────┘ │ +│ │ │ +│ ┌─────────┴───────────────────────────────┐ │ +│ │ tower::Service │ │ +│ │ ┌──────────────────────────────┐┌─────┐ │ │ +│ │ │ Blockchain{Read,Write}Handle ││ ... │ │ │ +│ │ └──────────────────────────────┘└─────┘ │ │ +│ └─────────┬───────────────────────────────┘ │ +│ │ │ +└─────────────────────────────────┼─────────────────────────────────┘ + │ + ┌─────┴─────┐ + ┌────────────────────┴────┐ ┌────┴──────────────────────────────────┐ + │ Database requests │ │ Database responses │ + │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │ + │ │ FindBlock([u8; 32]) │ │ │ │ FindBlock(Option<(Chain, usize)>) │ │ + │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │ + │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │ + │ │ ChainHeight │ │ │ │ ChainHeight(usize, [u8; 32]) │ │ + │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │ + │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │ + │ │ ... │ │ │ │ ... │ │ + │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │ + └─────────────────────────┘ └───────────────────────────────────────┘ + ▲ │ + │ ▼ + ┌─────────────────────────┐ + │ cuprate_blockchain user │ + └─────────────────────────┘ +``` \ No newline at end of file diff --git a/books/architecture/src/storage/common/service/requests.md b/books/architecture/src/storage/common/service/requests.md new file mode 100644 index 00000000..9157359a --- /dev/null +++ b/books/architecture/src/storage/common/service/requests.md @@ -0,0 +1,8 @@ +# Requests +Along with the 2 handles, there are 2 types of requests: +- Read requests, e.g. [`BlockchainReadRequest`](https://doc.cuprate.org/cuprate_types/blockchain/enum.BlockchainReadRequest.html) +- Write requests, e.g. 
[`BlockchainWriteRequest`](https://doc.cuprate.org/cuprate_types/blockchain/enum.BlockchainWriteRequest.html) + +Quite obviously: +- Read requests are for retrieving various data from the database +- Write requests are for writing data to the database \ No newline at end of file diff --git a/books/architecture/src/storage/common/service/resizing.md b/books/architecture/src/storage/common/service/resizing.md new file mode 100644 index 00000000..13cd3b49 --- /dev/null +++ b/books/architecture/src/storage/common/service/resizing.md @@ -0,0 +1,15 @@ +# Resizing +As noted in the [`cuprate_database` resizing section](../../db/resizing.md), +builders on-top of `cuprate_database` are responsible for resizing the database. + +In `cuprate_{blockchain,txpool}`'s case, that means the `tower::Service` must know +how to resize. This logic is shared between both crates, defined in `cuprate_database_service`: +. + +By default, this uses a _similar_ algorithm as `monerod`'s: + +- [If there's not enough space to fit a write request's data](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L130), start a resize +- Each resize adds around [`1,073,745,920`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) bytes to the current map size +- A resize will be [attempted `3` times](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L110) before failing + +There are other [resizing algorithms](https://doc.cuprate.org/cuprate_database/resize/enum.ResizeAlgorithm.html) that define how the database's memory map grows, although currently the behavior of `monerod` is closely followed (for no particular reason). 
\ No newline at end of file diff --git a/books/architecture/src/storage/common/service/responses.md b/books/architecture/src/storage/common/service/responses.md new file mode 100644 index 00000000..c03b42fd --- /dev/null +++ b/books/architecture/src/storage/common/service/responses.md @@ -0,0 +1,18 @@ +# Responses +After sending a request using the read/write handle, the value returned is _not_ the response, but an `async`hronous channel that will eventually return the response: +```rust,ignore +// Send a request. +// tower::Service::call() +// V +let response_channel: Channel = read_handle.call(BlockchainReadRequest::ChainHeight)?; + +// Await the response. +let response: BlockchainResponse = response_channel.await?; +``` + +After `await`ing the returned channel, a `Response` will eventually be returned when +the `Service` threadpool has fetched the value from the database and sent it off. + +Both read and write request variants match `Response` variants by name, i.e. +- `BlockchainReadRequest::ChainHeight` leads to `BlockchainResponse::ChainHeight` +- `BlockchainWriteRequest::WriteBlock` leads to `BlockchainResponse::WriteBlockOk` diff --git a/books/architecture/src/storage/common/service/shutdown.md b/books/architecture/src/storage/common/service/shutdown.md new file mode 100644 index 00000000..4f9890e1 --- /dev/null +++ b/books/architecture/src/storage/common/service/shutdown.md @@ -0,0 +1,4 @@ +# Shutdown +Once the read/write handles to the `tower::Service` are `Drop`ed, the backing thread(pool) will automatically exit gracefully. + +Note that the writer thread and reader threadpool are not connected in any way: dropping the write handle makes the writer thread exit, but the read handle can still be held and read from, and vice versa for the write handle.
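
The drop-to-shutdown behavior described in `shutdown.md` above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the `cuprate_database_service` implementation: `WriteRequest`, `WriteHandle`, and `spawn_writer` are hypothetical names, a `BTreeMap` stands in for the database, and `std::sync::mpsc` stands in for whatever channels the real service uses.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical write request (illustration only, not a `cuprate` type).
pub enum WriteRequest {
    Put(u64, u64),
}

/// Hypothetical write handle: it owns the only `Sender` to the writer thread.
pub struct WriteHandle {
    sender: Sender<(WriteRequest, Sender<usize>)>,
}

impl WriteHandle {
    /// Send a request and get back a channel that will eventually yield the response.
    pub fn call(&self, request: WriteRequest) -> Receiver<usize> {
        let (response_tx, response_rx) = mpsc::channel();
        self.sender
            .send((request, response_tx))
            .expect("writer thread has exited");
        response_rx
    }
}

/// Spawn the single writer thread and return a handle to it.
/// The `JoinHandle` is returned only so the example can observe the exit.
pub fn spawn_writer() -> (WriteHandle, JoinHandle<usize>) {
    let (sender, receiver) = mpsc::channel::<(WriteRequest, Sender<usize>)>();
    let join = thread::spawn(move || {
        let mut store = BTreeMap::new();
        // `recv()` returns `Err` once every `Sender` (i.e. the handle) is dropped,
        // which ends the loop and lets the thread exit on its own.
        while let Ok((request, response_tx)) = receiver.recv() {
            match request {
                WriteRequest::Put(key, value) => {
                    store.insert(key, value);
                }
            }
            // The requester may have stopped waiting; ignore a failed send.
            let _ = response_tx.send(store.len());
        }
        store.len()
    });
    (WriteHandle { sender }, join)
}
```

In this sketch, calling `spawn_writer()`, sending one `WriteRequest::Put` through `call()`, and then dropping the handle is enough for `join()` to return: no explicit shutdown message is needed because the closed channel is the signal.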
diff --git a/books/architecture/src/storage/common/service/thread-model.md b/books/architecture/src/storage/common/service/thread-model.md new file mode 100644 index 00000000..b69d62c0 --- /dev/null +++ b/books/architecture/src/storage/common/service/thread-model.md @@ -0,0 +1,23 @@ +# Thread model +The base database abstractions themselves are not concerned with parallelism; they are mostly functions to be called from a single thread. + +However, the `cuprate_database_service` API _does_ have a thread model backing it. + +When a `Service`'s `init()` function is called, threads will be spawned and +maintained until the user drops (disconnects) the returned handles. + +The current behavior for thread count is: +- [1 writer thread](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L48-L52) +- [As many reader threads as there are system threads](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/reader_threads.rs#L44-L49) + +For example, on a system with 32 threads, `cuprate_database_service` will spawn: +- 1 writer thread +- 32 reader threads + +whose sole responsibility is to listen for database requests, access the database (potentially in parallel), and return a response. + +Note that the `1 system thread = 1 reader thread` model is only the default setting; the reader thread count can be configured by the user to be any number between `1 .. amount_of_system_threads`. + +The reader threads are managed by [`rayon`](https://docs.rs/rayon). + +As an example of where multiple reader threads are used: given a request that asks if any key-image within a set already exists, `cuprate_blockchain` will [split that work between the threads with `rayon`](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/blockchain/src/service/read.rs#L400).
\ No newline at end of file diff --git a/books/architecture/src/storage/common/types.md b/books/architecture/src/storage/common/types.md new file mode 100644 index 00000000..b6f2c6f2 --- /dev/null +++ b/books/architecture/src/storage/common/types.md @@ -0,0 +1,21 @@ +# Types +## POD types +Since [all types in the database are POD types](../db/serde.md), we must often +provide mappings between outside types and the types actually stored in the database. + +A common case is mapping infallible types to and from [`bitflags`](https://docs.rs/bitflag) and/or their raw integer representation. +For example, the [`OutputFlags`](https://doc.cuprate.org/cuprate_blockchain/types/struct.OutputFlags.html) type or `bool` types. + +As types like `enum`s, `bool`s and `char`s cannot be cast from an integer infallibly, +`bytemuck::Pod` cannot be implemented for them safely. Thus, we store an infallible version +of them inside the database with a custom type and map it back when fetching the data. + +## Lean types +Another reason why database crates define their own types is +to cut any unneeded data from the type. + +Many of the types used in normal operation (e.g. [`cuprate_types::VerifiedBlockInformation`](https://doc.cuprate.org/cuprate_types/struct.VerifiedBlockInformation.html)) contain lots of extra pre-processed data for convenience. + +This would be a waste to store in the database, so in this example, the much leaner +"raw" [`BlockInfo`](https://doc.cuprate.org/cuprate_blockchain/types/struct.BlockInfo.html) +type is stored.
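
The mapping idea in the "POD types" section of `types.md` above can be illustrated with a small sketch. It assumes nothing about the real `cuprate_blockchain` types: `StoredBool`, `to_bytes`, and `from_bytes` are hypothetical names, and plain byte arrays stand in for the `bytemuck`-based (de)serialization layer.

```rust
/// Hypothetical stored form of a `bool`: a plain `u8`, for which every bit pattern is valid.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct StoredBool(pub u8);

impl From<bool> for StoredBool {
    fn from(value: bool) -> Self {
        Self(u8::from(value))
    }
}

impl From<StoredBool> for bool {
    fn from(stored: StoredBool) -> Self {
        // Any non-zero byte is treated as `true`, so this conversion can never fail.
        stored.0 != 0
    }
}

/// Hypothetical helper standing in for serialization into the database.
pub fn to_bytes(stored: StoredBool) -> [u8; 1] {
    [stored.0]
}

/// Hypothetical helper standing in for deserialization out of the database.
pub fn from_bytes(bytes: [u8; 1]) -> StoredBool {
    StoredBool(bytes[0])
}
```

The design choice shown here is that the fallible direction (raw byte to `bool`) is made total by deciding up front how unexpected bytes are interpreted, so reading from the database never needs an error path for this field.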
diff --git a/books/architecture/src/storage/database-abstraction.md b/books/architecture/src/storage/database-abstraction.md deleted file mode 100644 index b21a192c..00000000 --- a/books/architecture/src/storage/database-abstraction.md +++ /dev/null @@ -1 +0,0 @@ -# ⚪️ Database abstraction diff --git a/books/architecture/src/storage/db/abstraction/backend.md b/books/architecture/src/storage/db/abstraction/backend.md new file mode 100644 index 00000000..02e796a8 --- /dev/null +++ b/books/architecture/src/storage/db/abstraction/backend.md @@ -0,0 +1,50 @@ +# Backend +First, we need an actual database implementation. + +`cuprate-database`'s `trait`s allow abstracting over the actual database, such that any backend in particular could be used. + +This page is an enumeration of all the backends Cuprate has, has tried, and may try in the future. + +## `heed` +The default database used is [`heed`](https://github.com/meilisearch/heed) (LMDB). The upstream versions from [`crates.io`](https://crates.io/crates/heed) are used. `LMDB` should not need to be installed as `heed` has a build script that pulls it in automatically. + +`heed`'s filenames inside Cuprate's data folder are: + +| Filename | Purpose | +|------------|---------| +| `data.mdb` | Main data file +| `lock.mdb` | Database lock file + +`heed`-specific notes: +- [There is a maximum reader limit](https://github.com/monero-project/monero/blob/059028a30a8ae9752338a7897329fe8012a310d5/src/blockchain_db/lmdb/db_lmdb.cpp#L1372). Other potential processes (e.g. `xmrblocks`) that are also reading the `data.mdb` file need to be accounted for +- [LMDB does not work on remote filesystem](https://github.com/LMDB/lmdb/blob/b8e54b4c31378932b69f1298972de54a565185b1/libraries/liblmdb/lmdb.h#L129) + +## `redb` +The 2nd database backend is the 100% Rust [`redb`](https://github.com/cberner/redb). + +The upstream versions from [`crates.io`](https://crates.io/crates/redb) are used. 
+ +`redb`'s filenames inside Cuprate's data folder are: + +| Filename | Purpose | +|-------------|---------| +| `data.redb` | Main data file + + + +## `redb-memory` +This backend is identical to `redb`, except that it uses [`redb::backends::InMemoryBackend`](https://docs.rs/redb/2.1.2/redb/backends/struct.InMemoryBackend.html), which is a database that completely resides in memory instead of a file. + +All other details about this should be the same as the normal `redb` backend. + +## `sanakirja` +[`sanakirja`](https://docs.rs/sanakirja) was a candidate as a backend; however, there were problems with maximum value sizes. + +The default maximum value size is [1012 bytes](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.Storable.html) which was too small for our requirements. Using [`sanakirja::Slice`](https://docs.rs/sanakirja/1.4.1/sanakirja/union.Slice.html) and [`sanakirja::UnsizedStorable`](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.UnsizedStorable.html) was attempted, but there were bugs found when inserting a value in-between `512..=4096` bytes. + +As such, it is not implemented. + +## `MDBX` +[`MDBX`](https://erthink.github.io/libmdbx) was a candidate as a backend; however, MDBX deprecated the custom key/value comparison functions, which makes it a bit trickier to implement multimap tables. It is also quite similar to the main backend LMDB (of which it was originally a fork). + +As such, it is not implemented (yet). diff --git a/books/architecture/src/storage/db/abstraction/concrete_env.md b/books/architecture/src/storage/db/abstraction/concrete_env.md new file mode 100644 index 00000000..059358e5 --- /dev/null +++ b/books/architecture/src/storage/db/abstraction/concrete_env.md @@ -0,0 +1,15 @@ +# `ConcreteEnv` +After a backend is selected, the main database environment struct is "abstracted" by putting it in the non-generic, concrete [`struct ConcreteEnv`](https://doc.cuprate.org/cuprate_database/struct.ConcreteEnv.html).
+ +This is the main object used when handling the database directly. + +This struct contains all the data necessary to operate the database. +The actual database backend `ConcreteEnv` will use internally [depends on which backend feature is used](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/database/src/backend/mod.rs#L3-L13). + +`ConcreteEnv` itself is not too important, what is important is that: +1. It allows callers to not directly reference any particular backend environment +1. It implements [`trait Env`](https://doc.cuprate.org/cuprate_database/trait.Env.html) which opens the door to all the other database traits + +The equivalent "database environment" objects in the backends themselves are: +- [`heed::Env`](https://docs.rs/heed/0.20.0/heed/struct.Env.html) +- [`redb::Database`](https://docs.rs/redb/2.1.0/redb/struct.Database.html) \ No newline at end of file diff --git a/books/architecture/src/storage/db/abstraction/intro.md b/books/architecture/src/storage/db/abstraction/intro.md new file mode 100644 index 00000000..34a43207 --- /dev/null +++ b/books/architecture/src/storage/db/abstraction/intro.md @@ -0,0 +1,33 @@ +# Abstraction +This next section details how `cuprate_database` abstracts multiple database backends into 1 API. + +## Diagram +A simple diagram describing the responsibilities/relationship of `cuprate_database`. + +```text +┌───────────────────────────────────────────────────────────────────────┐ +│ cuprate_database │ +│ │ +│ ┌───────────────────────────┐ ┌─────────────────────────────────┐ │ +│ │ Database traits │ │ Backends │ │ +│ │ ┌─────┐┌──────┐┌────────┐ │ │ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Env ││ TxRw ││ ... 
│ ├─────┤ │ heed (LMDB) │ │ redb │ │ │ +│ │ └─────┘└──────┘└────────┘ │ │ └─────────────┘ └─────────────┘ │ │ +│ └──────────┬─────────────┬──┘ └──┬──────────────────────────────┘ │ +│ │ └─────┬─────┘ │ +│ │ ┌─────────┴──────────────┐ │ +│ │ │ Database types │ │ +│ │ │ ┌─────────────┐┌─────┐ │ │ +│ │ │ │ ConcreteEnv ││ ... │ │ │ +│ │ │ └─────────────┘└─────┘ │ │ +│ │ └─────────┬──────────────┘ │ +│ │ │ │ +└────────────┼───────────────────┼──────────────────────────────────────┘ + │ │ + └───────────────────┤ + │ + ▼ + ┌───────────────────────┐ + │ cuprate_database user │ + └───────────────────────┘ +``` \ No newline at end of file diff --git a/books/architecture/src/storage/db/abstraction/trait.md b/books/architecture/src/storage/db/abstraction/trait.md new file mode 100644 index 00000000..e7b25d2e --- /dev/null +++ b/books/architecture/src/storage/db/abstraction/trait.md @@ -0,0 +1,49 @@ +# Trait +`cuprate_database` provides a set of `trait`s that abstract over the various database backends. + +This allows the function signatures and behavior to stay the same but allows for swapping out databases in an easier fashion. + +All common behavior of the backend's are encapsulated here and used instead of using the backend directly. + +Examples: +- [`trait Env`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/env.rs) +- [`trait {TxRo, TxRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/transaction.rs) +- [`trait {DatabaseRo, DatabaseRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/database.rs) + +For example, instead of calling `heed` or `redb`'s `get()` function directly, `DatabaseRo::get()` is called. + +## Usage +With a `ConcreteEnv` and a particular backend selected, +we can now start using it alongside these traits to start +doing database operations in a generic manner. 
+ +An example: + +```rust +use cuprate_database::{ + ConcreteEnv, + config::ConfigBuilder, + Env, EnvInner, + DatabaseRo, DatabaseRw, TxRo, TxRw, +}; + +// Initialize the database environment. +let env = ConcreteEnv::open(config)?; + +// Open up a transaction + tables for writing. +let env_inner = env.env_inner(); +let tx_rw = env_inner.tx_rw()?; +env_inner.create_db::<Table>(&tx_rw)?; + +// Write data to the table. +{ + let mut table = env_inner.open_db_rw::<Table>(&tx_rw)?; + table.put(&0, &1)?; +} + +// Commit the transaction. +TxRw::commit(tx_rw)?; +``` + +As seen above, there is no direct call to `heed` or `redb`. +Their functionality is abstracted behind `ConcreteEnv` and the `trait`s. \ No newline at end of file diff --git a/books/architecture/src/storage/db/intro.md b/books/architecture/src/storage/db/intro.md new file mode 100644 index 00000000..5973fbe7 --- /dev/null +++ b/books/architecture/src/storage/db/intro.md @@ -0,0 +1,23 @@ +# Database abstraction +[`cuprate_database`](https://doc.cuprate.org/cuprate_database) is Cuprate’s database abstraction. + +This crate abstracts various database backends with `trait`s. + +All backends have the following attributes: + +- [Embedded](https://en.wikipedia.org/wiki/Embedded_database) +- [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) +- [ACID](https://en.wikipedia.org/wiki/ACID) +- Are `(key, value)` oriented and have the expected API (`get()`, `insert()`, `delete()`) +- Are table oriented (`"table_name" -> (key, value)`) +- Allow concurrent readers + +The currently implemented backends are: +- [`heed`](https://github.com/meilisearch/heed) (LMDB) +- [`redb`](https://github.com/cberner/redb) + +Said precisely, `cuprate_database` is the embedded database other Cuprate +crates interact with instead of using any particular backend implementation. +This allows the backend to be swapped and/or future backends to be implemented. + +This section will go over `cuprate_database` details. \ No newline at end of file diff --git a/books/architecture/src/storage/db/issues/endian.md b/books/architecture/src/storage/db/issues/endian.md new file mode 100644 index 00000000..577e586f --- /dev/null +++ b/books/architecture/src/storage/db/issues/endian.md @@ -0,0 +1,6 @@ +# Endianness +`cuprate_database`'s (de)serialization and storage of bytes are native-endian, as in, byte storage order will depend on the machine it is running on.
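As a small illustration of what "native-endian" means in practice, here is a minimal sketch using only the Rust standard library. It is not `cuprate_database`'s actual `Storable` code, and `to_native_bytes` is a hypothetical helper name chosen for this example.

```rust
/// Hypothetical helper: serialize a `u64` without any byte swapping,
/// i.e. in whatever order the current machine uses.
fn to_native_bytes(value: u64) -> [u8; 8] {
    value.to_ne_bytes()
}

fn main() {
    let bytes = to_native_bytes(1);

    // The on-disk byte order depends on the target the code was compiled for.
    if cfg!(target_endian = "little") {
        assert_eq!(bytes, [1, 0, 0, 0, 0, 0, 0, 0]);
    } else {
        assert_eq!(bytes, [0, 0, 0, 0, 0, 0, 0, 1]);
    }

    // Reading the bytes back on the same kind of machine always round-trips.
    assert_eq!(u64::from_ne_bytes(bytes), 1);
}
```

The round-trip only holds when writer and reader share the same byte order.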
+ +As Cuprate's build-targets are all little-endian ([big-endian by default machines barely exist](https://en.wikipedia.org/wiki/Endianness#Hardware)), this doesn't matter much and the byte ordering can be seen as a constant. + +Practically, this means `cuprated`'s database files can be transferred across computers, as can `monerod`'s. \ No newline at end of file diff --git a/books/architecture/src/storage/db/issues/hot-swap.md b/books/architecture/src/storage/db/issues/hot-swap.md new file mode 100644 index 00000000..aebfe208 --- /dev/null +++ b/books/architecture/src/storage/db/issues/hot-swap.md @@ -0,0 +1,17 @@ +# Hot-swappable backends +> See also: . + +Using a different backend is really as simple as re-building `cuprate_database` with a different feature flag: +```bash +# Use LMDB. +cargo build --package cuprate-database --features heed + +# Use redb. +cargo build --package cuprate-database --features redb +``` + +This is "good enough" for now, however ideally, this hot-swapping of backends would be able to be done at _runtime_. + +As it is now, `cuprate_database` cannot compile both backends and swap based on user input at runtime; it must be compiled with a certain backend, which will produce a binary with only that backend. + +This also means things like [CI testing multiple backends is awkward](https://github.com/Cuprate/cuprate/blob/main/.github/workflows/ci.yml#L132-L136), as we must re-compile with different feature flags instead. 
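The compile-time selection described above can be sketched roughly as follows. `HeedEnv`, `RedbEnv`, and this simplified `Env` trait are hypothetical stand-ins, not `cuprate_database`'s real types; the point is only that a `cfg` attribute leaves exactly one backend behind the `ConcreteEnv` name in the final binary.

```rust
/// Simplified, hypothetical stand-in for a database environment trait.
trait Env {
    const BACKEND: &'static str;
    fn open() -> Self;
}

// Hypothetical backend environments (placeholders, no real database inside).
#[allow(dead_code)]
struct HeedEnv;
#[allow(dead_code)]
struct RedbEnv;

impl Env for HeedEnv {
    const BACKEND: &'static str = "heed";
    fn open() -> Self {
        HeedEnv
    }
}

impl Env for RedbEnv {
    const BACKEND: &'static str = "redb";
    fn open() -> Self {
        RedbEnv
    }
}

// Only one of these aliases exists after compilation, so the choice
// cannot be changed based on user input at runtime.
#[cfg(feature = "redb")]
type ConcreteEnv = RedbEnv;
#[cfg(not(feature = "redb"))]
type ConcreteEnv = HeedEnv;

/// Name of the backend this binary was compiled with.
fn backend_name() -> &'static str {
    let _env = ConcreteEnv::open();
    <ConcreteEnv as Env>::BACKEND
}

fn main() {
    println!("compiled with backend: {}", backend_name());
}
```

Supporting runtime selection would instead need both backends compiled in, plus some form of dynamic or enum-based dispatch.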
\ No newline at end of file diff --git a/books/architecture/src/storage/db/issues/intro.md b/books/architecture/src/storage/db/issues/intro.md new file mode 100644 index 00000000..eee49812 --- /dev/null +++ b/books/architecture/src/storage/db/issues/intro.md @@ -0,0 +1,7 @@ +# Known issues and tradeoffs +`cuprate_database` takes many tradeoffs, whether due to: +- Prioritizing certain values over others +- Not having a better solution +- Being "good enough" + +This section is a list of the larger ones, along with issues that don't have answers yet. \ No newline at end of file diff --git a/books/architecture/src/storage/db/issues/multimap.md b/books/architecture/src/storage/db/issues/multimap.md new file mode 100644 index 00000000..7e43ce1d --- /dev/null +++ b/books/architecture/src/storage/db/issues/multimap.md @@ -0,0 +1,22 @@ +# Multimap +`cuprate_database` does not currently have an abstraction for [multimap tables](https://en.wikipedia.org/wiki/Multimap). + +All tables are single maps of keys to values. + +This matters as this means some of `cuprate_blockchain`'s tables differ from `monerod`'s tables - the primary key is stored _for all_ entries, compared to `monerod` only needing to store it once: + +```rust +// `monerod` only stores `amount: 1` once, +// `cuprated` stores it each time it appears. +struct PreRctOutputId { amount: 1, amount_index: 0 } +struct PreRctOutputId { amount: 1, amount_index: 1 } +``` + +This means `cuprated`'s database will be slightly larger than `monerod`'s. + +The current method `cuprate_blockchain` uses will be "good enough" as the multimap +keys needed for now are fixed, e.g. pre-RCT outputs are no longer being produced. + +This may need to change in the future when multimap is all but required, e.g. for FCMP++. + +Until then, multimap tables are not implemented as they are tricky to implement across all backends. 
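To make the storage difference concrete, here is a minimal sketch that models both layouts with standard-library maps. The maps and string values are illustrative placeholders only, not how either `monerod` or `cuprate_blockchain` actually stores outputs.

```rust
use std::collections::BTreeMap;

/// Flat layout: one composite `(amount, amount_index)` key per entry,
/// so `amount` is stored every time it appears.
fn flat_table() -> BTreeMap<(u64, u64), &'static str> {
    let mut table = BTreeMap::new();
    table.insert((1, 0), "output a");
    table.insert((1, 1), "output b");
    table
}

/// Multimap-style layout: `amount` is stored once and maps to many entries.
fn multimap_table() -> BTreeMap<u64, BTreeMap<u64, &'static str>> {
    let mut table: BTreeMap<u64, BTreeMap<u64, &'static str>> = BTreeMap::new();
    table.entry(1).or_default().insert(0, "output a");
    table.entry(1).or_default().insert(1, "output b");
    table
}

fn main() {
    let flat = flat_table();
    let multi = multimap_table();

    // `amount: 1` is part of two separate keys in the flat table...
    assert_eq!(flat.keys().filter(|(amount, _)| *amount == 1).count(), 2);

    // ...but is a single primary key in the multimap-style table.
    assert_eq!(multi.len(), 1);
    assert_eq!(multi[&1].len(), 2);
}
```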
\ No newline at end of file diff --git a/books/architecture/src/storage/db/issues/traits.md b/books/architecture/src/storage/db/issues/traits.md new file mode 100644 index 00000000..9cf66e43 --- /dev/null +++ b/books/architecture/src/storage/db/issues/traits.md @@ -0,0 +1,15 @@ +# Traits abstracting backends +Although all database backends used are very similar, they have some crucial differences in small implementation details that must be worked around when conforming them to `cuprate_database`'s traits. + +Put simply: using `cuprate_database`'s traits is less efficient and more awkward than using the backend directly. + +For example: +- [Data types must be wrapped in compatibility layers when they otherwise wouldn't be](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/backend/heed/env.rs#L101-L116) +- [There are types that only apply to a specific backend, but are visible to all](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/error.rs#L86-L89) +- [There are extra layers of abstraction to smoothen the differences between all backends](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/env.rs#L62-L68) +- [Existing functionality of backends must be taken away, as it isn't supported in the others](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/database.rs#L27-L34) + +This is a _tradeoff_ that `cuprate_database` takes, as: +- The backend itself is usually not the source of bottlenecks in the greater system, as such, small inefficiencies are OK +- None of the lost functionality is crucial for operation +- The ability to use, test, and swap between multiple database backends is [worth it](https://github.com/Cuprate/cuprate/pull/35#issuecomment-1952804393) diff --git a/books/architecture/src/storage/db/issues/unaligned.md b/books/architecture/src/storage/db/issues/unaligned.md new file mode 
100644 index 00000000..3c45c19e --- /dev/null +++ b/books/architecture/src/storage/db/issues/unaligned.md @@ -0,0 +1,24 @@ +# Copying unaligned bytes +As mentioned in [`(De)serialization`](../serde.md), bytes are _copied_ when they are turned into a type `T` due to unaligned bytes being returned from database backends. + +Using a regular reference cast results in an improperly aligned type `T`; [such a type even existing causes undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html). In our case, `bytemuck` saves us by panicking before this occurs. + +Thus, when using `cuprate_database`'s database traits, an _owned_ `T` is returned. + +This is doubly unfortunate for `&[u8]` as this does not even need deserialization. + +For example, `StorableVec` could have been this: +```rust +enum StorableBytes<'a, T: Storable> { + Owned(T), + Ref(&'a T), +} +``` +but this would require supporting types that must be copied regardless with the occasional `&[u8]` that can be returned without casting. This was hard to do in a generic way, thus all `[u8]`s are copied and returned as owned `StorableVec`s. + +This is a _tradeoff_ `cuprate_database` takes as: +- `bytemuck::pod_read_unaligned` is cheap enough +- The main API, `service`, needs to return owned values anyway +- Having no references removes a lot of lifetime complexity + +The alternative is somehow fixing the alignment issues in the backends mentioned previously. \ No newline at end of file diff --git a/books/architecture/src/storage/db/resizing.md b/books/architecture/src/storage/db/resizing.md new file mode 100644 index 00000000..ebf989e7 --- /dev/null +++ b/books/architecture/src/storage/db/resizing.md @@ -0,0 +1,8 @@ +# Resizing +`cuprate_database` itself does not handle memory map resizes automatically +(for database backends that need resizing, i.e. heed/LMDB). + +When a user is directly using `cuprate_database`, it is up to them how to resize.
The database will return [`RuntimeError::ResizeNeeded`](https://doc.cuprate.org/cuprate_database/enum.RuntimeError.html#variant.ResizeNeeded) when it needs resizing. + +However, `cuprate_database` exposes some [resizing algorithms](https://doc.cuprate.org/cuprate_database/resize/index.html) +that define how the database's memory map grows. \ No newline at end of file diff --git a/books/architecture/src/storage/db/serde.md b/books/architecture/src/storage/db/serde.md new file mode 100644 index 00000000..de17f307 --- /dev/null +++ b/books/architecture/src/storage/db/serde.md @@ -0,0 +1,44 @@ +# (De)serialization +All types stored inside the database are either bytes already or are perfectly bitcast-able. + +As such, they do not incur heavy (de)serialization costs when storing/fetching them from the database. The main (de)serialization used is [`bytemuck`](https://docs.rs/bytemuck)'s traits and casting functions. + +## Size and layout +The size & layout of types are stable across compiler versions, as they are set and determined with [`#[repr(C)]`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprc) and `bytemuck`'s derive macros such as [`bytemuck::Pod`](https://docs.rs/bytemuck/latest/bytemuck/derive.Pod.html). + +Note that the data stored in the tables is still type-safe; we still refer to the keys and values within our tables by their type. + +## How +The main (de)serialization `trait` for database storage is [`Storable`](https://doc.cuprate.org/cuprate_database/trait.Storable.html). + +- Before storage, the type is [simply cast into bytes](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L125) +- When fetching, the bytes are [simply cast into the type](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L130) + +When a type is cast into bytes, [the reference is cast](https://docs.rs/bytemuck/latest/bytemuck/fn.bytes_of.html), i.e.
this is zero-cost serialization. + +However, it is worth noting that when bytes are cast into the type, [it is copied](https://docs.rs/bytemuck/latest/bytemuck/fn.pod_read_unaligned.html). This is due to byte alignment guarantee issues with both backends, see: +- +- + +Without this, `bytemuck` will panic with [`TargetAlignmentGreaterAndInputNotAligned`](https://docs.rs/bytemuck/latest/bytemuck/enum.PodCastError.html#variant.TargetAlignmentGreaterAndInputNotAligned) when casting. + +Copying the bytes fixes this problem, although it is more costly than necessary. However, in the main use-case for `cuprate_database` (`tower::Service` API) the bytes would need to be owned regardless as the `Request/Response` API uses owned data types (`T`, `Vec`, `HashMap`, etc). + +Practically speaking, this means lower-level database functions that normally look like this: +```rust +fn get(key: &Key) -> &Value; +``` +end up looking like this in `cuprate_database`: +```rust +fn get(key: &Key) -> Value; +``` + +Since each backend has its own (de)serialization methods, our types are wrapped in compatibility types that map our `Storable` functions into whatever is required for the backend, e.g.: +- [`StorableHeed`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/heed/storable.rs#L11-L45) +- [`StorableRedb`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/redb/storable.rs#L11-L30) + +Compatibility structs also exist for any `Storable` containers: +- [`StorableVec`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L135-L191) +- [`StorableBytes`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L208-L241) + +Again, it's unfortunate that these must be owned, although in the `tower::Service` use-case, they would have to be owned anyway.
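The copy-on-read behavior can be illustrated with a std-only sketch. This is an analogue of what `bytemuck::pod_read_unaligned` does for a plain integer, not the crate's `Storable` implementation; `read_u64_unaligned` is a hypothetical helper name.

```rust
/// Hypothetical helper: read a `u64` out of a byte slice by copying,
/// so the alignment of `bytes` does not matter.
/// Panics if `bytes` is not exactly 8 bytes long.
fn read_u64_unaligned(bytes: &[u8]) -> u64 {
    let mut array = [0u8; 8];
    array.copy_from_slice(bytes);
    u64::from_ne_bytes(array)
}

fn main() {
    // Place the value at offset 1 of a buffer; a slice starting there
    // is not guaranteed to be aligned for `u64`, much like bytes handed
    // back by a database backend.
    let mut buffer = [0u8; 9];
    buffer[1..].copy_from_slice(&42u64.to_ne_bytes());

    // Copying yields an owned value and never relies on alignment.
    let value: u64 = read_u64_unaligned(&buffer[1..]);
    assert_eq!(value, 42);
}
```

The returned value is owned, which mirrors the `fn get(key: &Key) -> Value` signature shown earlier.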
\ No newline at end of file diff --git a/books/architecture/src/storage/db/syncing.md b/books/architecture/src/storage/db/syncing.md new file mode 100644 index 00000000..3f3444ea --- /dev/null +++ b/books/architecture/src/storage/db/syncing.md @@ -0,0 +1,17 @@ +# Syncing +`cuprate_database`'s database has 5 disk syncing modes. + +1. `FastThenSafe` +1. `Safe` +1. `Async` +1. `Threshold` +1. `Fast` + +The default mode is `Safe`. + +This means that upon each transaction commit, all the data that was written will be fully synced to disk. +This is the slowest, but safest mode of operation. + +Note that upon any database `Drop`, the current implementation will sync to disk regardless of any configuration. + +For more information on the other modes, read the documentation [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/config/sync_mode.rs#L63-L144). diff --git a/books/architecture/src/storage/intro.md b/books/architecture/src/storage/intro.md index 214cf15d..a28a0177 100644 --- a/books/architecture/src/storage/intro.md +++ b/books/architecture/src/storage/intro.md @@ -1 +1,34 @@ -# ⚪️ Storage +# Storage +This section covers all things related to the on-disk storage of data within Cuprate. + +## Overview +The quick overview is that Cuprate has a [database abstraction crate](./database-abstraction.md) +that handles "low-level" database details such as key and value (de)serialization, tables, transactions, etc. + +This database abstraction crate is then used by all crates that need on-disk storage, i.e. the +- [Blockchain database](./blockchain/intro.md) +- [Transaction pool database](./txpool/intro.md) + +## Service +The interface provided by all crates building on-top of the +database abstraction is a [`tower::Service`](https://docs.rs/tower), i.e. +database requests/responses are sent/received asynchronously. 
+ +As the interface details are similar across crates (threadpool, read operations, write operations), +the interface itself is abstracted in the [`cuprate_database_service`](./common/service/intro.md) crate, +which is then used by the crates. + +## Diagram +This is roughly how database crates are set up. + +```text + ┌─────────────────┐ +┌──────────────────────────────────┐ │ │ +│ Some crate that needs a database │ ┌────────────────┐ │ │ +│ │ │ Public │ │ │ +│ ┌──────────────────────────────┐ │─►│ tower::Service │◄─►│ Rest of Cuprate │ +│ │ Database abstraction │ │ │ API │ │ │ +│ └──────────────────────────────┘ │ └────────────────┘ │ │ +└──────────────────────────────────┘ │ │ + └─────────────────┘ +``` diff --git a/books/architecture/src/storage/pruning.md b/books/architecture/src/storage/pruning/intro.md similarity index 100% rename from books/architecture/src/storage/pruning.md rename to books/architecture/src/storage/pruning/intro.md diff --git a/books/architecture/src/storage/transaction-pool.md b/books/architecture/src/storage/txpool/intro.md similarity index 100% rename from books/architecture/src/storage/transaction-pool.md rename to books/architecture/src/storage/txpool/intro.md diff --git a/storage/README.md b/storage/README.md index b04d8e78..77a8bcbe 100644 --- a/storage/README.md +++ b/storage/README.md @@ -1,5 +1,10 @@ -# storage +# Storage +This subdirectory contains all things related to the on-disk storage of data within Cuprate. -TODO: This subdirectory used to be `database/` and is in the middle of being shifted around. +See for design documentation +and the following links for user documentation: -The old `database/` design document is in `cuprate-blockchain/` which will eventually be ported Cuprate's architecture book. 
+- +- +- +- \ No newline at end of file diff --git a/storage/blockchain/DESIGN.md b/storage/blockchain/DESIGN.md deleted file mode 100644 index 22f729f0..00000000 --- a/storage/blockchain/DESIGN.md +++ /dev/null @@ -1,600 +0,0 @@ -# Database -FIXME: This documentation must be updated and moved to the architecture book. - -Cuprate's blockchain implementation. - -- [1. Documentation](#1-documentation) -- [2. File structure](#2-file-structure) - - [2.1 `src/`](#21-src) - - [2.2 `src/backend/`](#22-srcbackend) - - [2.3 `src/config/`](#23-srcconfig) - - [2.4 `src/ops/`](#24-srcops) - - [2.5 `src/service/`](#25-srcservice) -- [3. Backends](#3-backends) - - [3.1 heed](#31-heed) - - [3.2 redb](#32-redb) - - [3.3 redb-memory](#33-redb-memory) - - [3.4 sanakirja](#34-sanakirja) - - [3.5 MDBX](#35-mdbx) -- [4. Layers](#4-layers) - - [4.1 Backend](#41-backend) - - [4.2 Trait](#42-trait) - - [4.3 ConcreteEnv](#43-concreteenv) - - [4.4 ops](#44-ops) - - [4.5 service](#45-service) -- [5. The service](#5-the-service) - - [5.1 Initialization](#51-initialization) - - [5.2 Requests](#53-requests) - - [5.3 Responses](#54-responses) - - [5.4 Thread model](#52-thread-model) - - [5.5 Shutdown](#55-shutdown) -- [6. Syncing](#6-Syncing) -- [7. Resizing](#7-resizing) -- [8. (De)serialization](#8-deserialization) -- [9. Schema](#9-schema) - - [9.1 Tables](#91-tables) - - [9.2 Multimap tables](#92-multimap-tables) -- [10. Known issues and tradeoffs](#10-known-issues-and-tradeoffs) - - [10.1 Traits abstracting backends](#101-traits-abstracting-backends) - - [10.2 Hot-swappable backends](#102-hot-swappable-backends) - - [10.3 Copying unaligned bytes](#103-copying-unaligned-bytes) - - [10.4 Endianness](#104-endianness) - - [10.5 Extra table data](#105-extra-table-data) - ---- - -## 1. 
Documentation -Documentation for `database/` is split into 3 locations: - -| Documentation location | Purpose | -|---------------------------|---------| -| `database/README.md` | High level design of `cuprate-database` -| `cuprate-database` | Practical usage documentation/warnings/notes/etc -| Source file `// comments` | Implementation-specific details (e.g, how many reader threads to spawn?) - -This README serves as the implementation design document. - -For actual practical usage, `cuprate-database`'s types and general usage are documented via standard Rust tooling. - -Run: -```bash -cargo doc --package cuprate-database --open -``` -at the root of the repo to open/read the documentation. - -If this documentation is too abstract, refer to any of the source files, they are heavily commented. There are many `// Regular comments` that explain more implementation specific details that aren't present here or in the docs. Use the file reference below to find what you're looking for. - -The code within `src/` is also littered with some `grep`-able comments containing some keywords: - -| Word | Meaning | -|-------------|---------| -| `INVARIANT` | This code makes an _assumption_ that must be upheld for correctness -| `SAFETY` | This `unsafe` code is okay, for `x,y,z` reasons -| `FIXME` | This code works but isn't ideal -| `HACK` | This code is a brittle workaround -| `PERF` | This code is weird for performance reasons -| `TODO` | This must be implemented; There should be 0 of these in production code -| `SOMEDAY` | This should be implemented... someday - -## 2. File structure -A quick reference of the structure of the folders & files in `cuprate-database`. - -Note that `lib.rs/mod.rs` files are purely for re-exporting/visibility/lints, and contain no code. Each sub-directory has a corresponding `mod.rs`. - -### 2.1 `src/` -The top-level `src/` files. 
- -| File | Purpose | -|------------------------|---------| -| `constants.rs` | General constants used throughout `cuprate-database` -| `database.rs` | Abstracted database; `trait DatabaseR{o,w}` -| `env.rs` | Abstracted database environment; `trait Env` -| `error.rs` | Database error types -| `free.rs` | General free functions (related to the database) -| `key.rs` | Abstracted database keys; `trait Key` -| `resize.rs` | Database resizing algorithms -| `storable.rs` | Data (de)serialization; `trait Storable` -| `table.rs` | Database table abstraction; `trait Table` -| `tables.rs` | All the table definitions used by `cuprate-database` -| `tests.rs` | Utilities for `cuprate_database` testing -| `transaction.rs` | Database transaction abstraction; `trait TxR{o,w}` -| `types.rs` | Database-specific types -| `unsafe_unsendable.rs` | Marker type to impl `Send` for objects not `Send` - -### 2.2 `src/backend/` -This folder contains the implementation for actual databases used as the backend for `cuprate-database`. - -Each backend has its own folder. - -| Folder/File | Purpose | -|-------------|---------| -| `heed/` | Backend using using [`heed`](https://github.com/meilisearch/heed) (LMDB) -| `redb/` | Backend using [`redb`](https://github.com/cberner/redb) -| `tests.rs` | Backend-agnostic tests - -All backends follow the same file structure: - -| File | Purpose | -|------------------|---------| -| `database.rs` | Implementation of `trait DatabaseR{o,w}` -| `env.rs` | Implementation of `trait Env` -| `error.rs` | Implementation of backend's errors to `cuprate_database`'s error types -| `storable.rs` | Compatibility layer between `cuprate_database::Storable` and backend-specific (de)serialization -| `transaction.rs` | Implementation of `trait TxR{o,w}` -| `types.rs` | Type aliases for long backend-specific types - -### 2.3 `src/config/` -This folder contains the `cupate_database::config` module; configuration options for the database. 
- -| File | Purpose | -|---------------------|---------| -| `config.rs` | Main database `Config` struct -| `reader_threads.rs` | Reader thread configuration for `service` thread-pool -| `sync_mode.rs` | Disk sync configuration for backends - -### 2.4 `src/ops/` -This folder contains the `cupate_database::ops` module. - -These are higher-level functions abstracted over the database, that are Monero-related. - -| File | Purpose | -|-----------------|---------| -| `block.rs` | Block related (main functions) -| `blockchain.rs` | Blockchain related (height, cumulative values, etc) -| `key_image.rs` | Key image related -| `macros.rs` | Macros specific to `ops/` -| `output.rs` | Output related -| `property.rs` | Database properties (pruned, version, etc) -| `tx.rs` | Transaction related - -### 2.5 `src/service/` -This folder contains the `cupate_database::service` module. - -The `async`hronous request/response API other Cuprate crates use instead of managing the database directly themselves. - -| File | Purpose | -|----------------|---------| -| `free.rs` | General free functions used (related to `cuprate_database::service`) -| `read.rs` | Read thread-pool definitions and logic -| `tests.rs` | Thread-pool tests and test helper functions -| `types.rs` | `cuprate_database::service`-related type aliases -| `write.rs` | Writer thread definitions and logic - -## 3. Backends -`cuprate-database`'s `trait`s allow abstracting over the actual database, such that any backend in particular could be used. - -Each database's implementation for those `trait`'s are located in its respective folder in `src/backend/${DATABASE_NAME}/`. - -### 3.1 heed -The default database used is [`heed`](https://github.com/meilisearch/heed) (LMDB). The upstream versions from [`crates.io`](https://crates.io/crates/heed) are used. `LMDB` should not need to be installed as `heed` has a build script that pulls it in automatically. 
- -`heed`'s filenames inside Cuprate's database folder (`~/.local/share/cuprate/database/`) are: - -| Filename | Purpose | -|------------|---------| -| `data.mdb` | Main data file -| `lock.mdb` | Database lock file - -`heed`-specific notes: -- [There is a maximum reader limit](https://github.com/monero-project/monero/blob/059028a30a8ae9752338a7897329fe8012a310d5/src/blockchain_db/lmdb/db_lmdb.cpp#L1372). Other potential processes (e.g. `xmrblocks`) that are also reading the `data.mdb` file need to be accounted for -- [LMDB does not work on remote filesystem](https://github.com/LMDB/lmdb/blob/b8e54b4c31378932b69f1298972de54a565185b1/libraries/liblmdb/lmdb.h#L129) - -### 3.2 redb -The 2nd database backend is the 100% Rust [`redb`](https://github.com/cberner/redb). - -The upstream versions from [`crates.io`](https://crates.io/crates/redb) are used. - -`redb`'s filenames inside Cuprate's database folder (`~/.local/share/cuprate/database/`) are: - -| Filename | Purpose | -|-------------|---------| -| `data.redb` | Main data file - - - -### 3.3 redb-memory -This backend is 100% the same as `redb`, although, it uses `redb::backend::InMemoryBackend` which is a database that completely resides in memory instead of a file. - -All other details about this should be the same as the normal `redb` backend. - -### 3.4 sanakirja -[`sanakirja`](https://docs.rs/sanakirja) was a candidate as a backend, however there were problems with maximum value sizes. - -The default maximum value size is [1012 bytes](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.Storable.html) which was too small for our requirements. Using [`sanakirja::Slice`](https://docs.rs/sanakirja/1.4.1/sanakirja/union.Slice.html) and [sanakirja::UnsizedStorage](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.UnsizedStorable.html) was attempted, but there were bugs found when inserting a value in-between `512..=4096` bytes. - -As such, it is not implemented. 
- -### 3.5 MDBX -[`MDBX`](https://erthink.github.io/libmdbx) was a candidate as a backend, however MDBX deprecated the custom key/value comparison functions, this makes it a bit trickier to implement [`9.2 Multimap tables`](#92-multimap-tables). It is also quite similar to the main backend LMDB (of which it was originally a fork of). - -As such, it is not implemented (yet). - -## 4. Layers -`cuprate_database` is logically abstracted into 5 layers, with each layer being built upon the last. - -Starting from the lowest: -1. Backend -2. Trait -3. ConcreteEnv -4. `ops` -5. `service` - - - -### 4.1 Backend -This is the actual database backend implementation (or a Rust shim over one). - -Examples: -- `heed` (LMDB) -- `redb` - -`cuprate_database` itself just uses a backend, it does not implement one. - -All backends have the following attributes: -- [Embedded](https://en.wikipedia.org/wiki/Embedded_database) -- [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) -- [ACID](https://en.wikipedia.org/wiki/ACID) -- Are `(key, value)` oriented and have the expected API (`get()`, `insert()`, `delete()`) -- Are table oriented (`"table_name" -> (key, value)`) -- Allows concurrent readers - -### 4.2 Trait -`cuprate_database` provides a set of `trait`s that abstract over the various database backends. - -This allows the function signatures and behavior to stay the same but allows for swapping out databases in an easier fashion. - -All common behavior of the backend's are encapsulated here and used instead of using the backend directly. 
- -Examples: -- [`trait Env`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/env.rs) -- [`trait {TxRo, TxRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/transaction.rs) -- [`trait {DatabaseRo, DatabaseRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/database.rs) - -For example, instead of calling `LMDB` or `redb`'s `get()` function directly, `DatabaseRo::get()` is called. - -### 4.3 ConcreteEnv -This is the non-generic, concrete `struct` provided by `cuprate_database` that contains all the data necessary to operate the database. The actual database backend `ConcreteEnv` will use internally depends on which backend feature is used. - -`ConcreteEnv` implements `trait Env`, which opens the door to all the other traits. - -The equivalent objects in the backends themselves are: -- [`heed::Env`](https://docs.rs/heed/0.20.0/heed/struct.Env.html) -- [`redb::Database`](https://docs.rs/redb/2.1.0/redb/struct.Database.html) - -This is the main object used when handling the database directly, although that is not strictly necessary as a user if the [`4.5 service`](#45-service) layer is used. - -### 4.4 ops -These are Monero-specific functions that use the abstracted `trait` forms of the database. - -Instead of dealing with the database directly: -- `get()` -- `delete()` - -the `ops` layer provides more abstract functions that deal with commonly used Monero operations: -- `add_block()` -- `pop_block()` - -### 4.5 service -The final layer abstracts the database completely into a [Monero-specific `async` request/response API](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/types/src/service.rs#L18-L78) using [`tower::Service`](https://docs.rs/tower/latest/tower/trait.Service.html). - -For more information on this layer, see the next section: [`5. The service`](#5-the-service). - -## 5. 
The service -The main API `cuprate_database` exposes for other crates to use is the `cuprate_database::service` module. - -This module exposes an `async` request/response API with `tower::Service`, backed by a threadpool, that allows reading/writing Monero-related data from/to the database. - -`cuprate_database::service` itself manages the database using a separate writer thread & reader thread-pool, and uses the previously mentioned [`4.4 ops`](#44-ops) functions when responding to requests. - -### 5.1 Initialization -The service is started simply by calling: [`cuprate_database::service::init()`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/service/free.rs#L23). - -This function initializes the database, spawns threads, and returns a: -- Read handle to the database (cloneable) -- Write handle to the database (not cloneable) - -These "handles" implement the `tower::Service` trait, which allows sending requests and receiving responses `async`hronously. - -### 5.2 Requests -Along with the 2 handles, there are 2 types of requests: -- [`ReadRequest`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/types/src/service.rs#L23-L90) -- [`WriteRequest`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/types/src/service.rs#L93-L105) - -`ReadRequest` is for retrieving various types of information from the database. - -`WriteRequest` currently only has 1 variant: to write a block to the database. - -### 5.3 Responses -After sending one of the above requests using the read/write handle, the value returned is _not_ the response, yet an `async`hronous channel that will eventually return the response: -```rust,ignore -// Send a request. -// tower::Service::call() -// V -let response_channel: Channel = read_handle.call(ReadResponse::ChainHeight)?; - -// Await the response. 
-let response: Response = response_channel.await?; - -// Assert the response is what we expected. -assert!(matches!(response, Response::ChainHeight(_))); -``` - -After `await`ing the returned channel, a `Response` will eventually be returned when the `service` threadpool has fetched the value from the database and sent it off. - -Both read and write request variants match in name with `Response` variants, i.e. -- `ReadRequest::ChainHeight` leads to `Response::ChainHeight` -- `WriteRequest::WriteBlock` leads to `Response::WriteBlockOk` - -### 5.4 Thread model -As mentioned in the [`4. Layers`](#4-layers) section, the base database abstractions themselves are not concerned with parallelism; they are mostly functions to be called from a single thread. - -However, the `cuprate_database::service` API _does_ have a thread model backing it. - -When [`cuprate_database::service`'s initialization function](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/free.rs#L33-L44) is called, threads will be spawned and maintained until the user drops (disconnects) the returned handles. - -The current behavior for thread count is: -- [1 writer thread](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/write.rs#L52-L66) -- [As many reader threads as there are system threads](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/read.rs#L104-L126) - -For example, on a system with 32 threads, `cuprate_database` will spawn: -- 1 writer thread -- 32 reader threads - -whose sole responsibility is to listen for database requests, access the database (potentially in parallel), and return a response. - -Note that the `1 system thread = 1 reader thread` model is only the default setting; the reader thread count can be configured by the user to be any number between `1 .. amount_of_system_threads`. 
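The reader thread count rule can be sketched with the standard library alone. This is a minimal illustration, not `cuprate_database`'s actual configuration code: the function name `reader_thread_count` and the `Option<usize>` setting are hypothetical, and the upper bound is assumed to be inclusive.

```rust
use std::{num::NonZeroUsize, thread};

/// Hypothetical helper (not part of `cuprate_database`): picks the number
/// of reader threads from an optional user setting.
fn reader_thread_count(requested: Option<usize>) -> usize {
    // Number of system threads; fall back to 1 if it cannot be queried.
    let system_threads = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);

    match requested {
        // Default: 1 reader thread per system thread.
        None => system_threads,
        // User setting: kept within `1..=system_threads` (assumed inclusive).
        Some(n) => n.clamp(1, system_threads),
    }
}
```

With this sketch, a request for `0` threads yields `1`, and a request above the system thread count yields the system thread count.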
- -The reader threads are managed by [`rayon`](https://docs.rs/rayon). - -For an example of where multiple reader threads are used: given a request that asks if any key-image within a set already exists, `cuprate_database` will [split that work between the threads with `rayon`](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/read.rs#L490-L503). - -### 5.5 Shutdown -Once the read/write handles are `Drop`ed, the backing thread(pool) will gracefully exit, automatically. - -Note the writer thread and reader threadpool aren't connected whatsoever; dropping the write handle will make the writer thread exit, however, the reader handle is free to be held onto and can be continued to be read from - and vice-versa for the write handle. - -## 6. Syncing -`cuprate_database`'s database has 5 disk syncing modes. - -1. FastThenSafe -1. Safe -1. Async -1. Threshold -1. Fast - -The default mode is `Safe`. - -This means that upon each transaction commit, all the data that was written will be fully synced to disk. This is the slowest, but safest mode of operation. - -Note that upon any database `Drop`, whether via `service` or dropping the database directly, the current implementation will sync to disk regardless of any configuration. - -For more information on the other modes, read the documentation [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/config/sync_mode.rs#L63-L144). - -## 7. Resizing -Database backends that require manually resizing will, by default, use a similar algorithm as `monerod`'s. - -Note that this only relates to the `service` module, where the database is handled by `cuprate_database` itself, not the user. In the case of a user directly using `cuprate_database`, it is up to them on how to resize. 
- -Within `service`, the resizing logic defined [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/service/write.rs#L139-L201) does the following: - -- If there's not enough space to fit a write request's data, start a resize -- Each resize adds around [`1_073_745_920`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) bytes to the current map size -- A resize will be attempted `3` times before failing - -There are other [resizing algorithms](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L38-L47) that define how the database's memory map grows, although currently the behavior of [`monerod`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) is closely followed. - -## 8. (De)serialization -All types stored inside the database are either bytes already, or are perfectly bitcast-able. - -As such, they do not incur heavy (de)serialization costs when storing/fetching them from the database. The main (de)serialization used is [`bytemuck`](https://docs.rs/bytemuck)'s traits and casting functions. - -The size & layout of types is stable across compiler versions, as they are set and determined with [`#[repr(C)]`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprc) and `bytemuck`'s derive macros such as [`bytemuck::Pod`](https://docs.rs/bytemuck/latest/bytemuck/derive.Pod.html). - -Note that the data stored in the tables are still type-safe; we still refer to the key and values within our tables by the type. - -The main deserialization `trait` for database storage is: [`cuprate_database::Storable`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L16-L115). 
- -- Before storage, the type is [simply cast into bytes](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L125) -- When fetching, the bytes are [simply cast into the type](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L130) - -When a type is cast into bytes, [the reference is cast](https://docs.rs/bytemuck/latest/bytemuck/fn.bytes_of.html), i.e. this is zero-cost serialization. - -However, it is worth noting that when bytes are cast into the type, [they are copied](https://docs.rs/bytemuck/latest/bytemuck/fn.pod_read_unaligned.html). This is due to byte alignment guarantee issues with both backends, see: -- https://github.com/AltSysrq/lmdb-zero/issues/8 -- https://github.com/cberner/redb/issues/360 - -Without this copy, `bytemuck` will panic with [`TargetAlignmentGreaterAndInputNotAligned`](https://docs.rs/bytemuck/latest/bytemuck/enum.PodCastError.html#variant.TargetAlignmentGreaterAndInputNotAligned) when casting. - -Copying the bytes fixes this problem, although it is more costly than necessary. However, in the main use-case for `cuprate_database` (the `service` module) the bytes would need to be owned regardless, as the `Request/Response` API uses owned data types (`T`, `Vec`, `HashMap`, etc). 
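The asymmetry between the two directions can be illustrated without `bytemuck`. The sketch below swaps in the standard library's `slice::from_raw_parts` and `ptr::read_unaligned` in place of `bytemuck::bytes_of` and `bytemuck::pod_read_unaligned`; the `Pair` type and both function names are hypothetical and exist only for this example.

```rust
use std::{mem::size_of, ptr, slice};

/// Hypothetical plain-old-data type: `#[repr(C)]`, two `u64`s, no padding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(C)]
struct Pair {
    a: u64,
    b: u64,
}

/// Type -> bytes: only the reference is reinterpreted, nothing is copied.
fn as_bytes(t: &Pair) -> &[u8] {
    // SAFETY: `Pair` is `repr(C)` with no padding, so all
    // `size_of::<Pair>()` bytes behind the reference are initialized.
    unsafe { slice::from_raw_parts((t as *const Pair).cast::<u8>(), size_of::<Pair>()) }
}

/// Bytes -> type: the bytes are copied into an owned, properly aligned value.
fn from_bytes(bytes: &[u8]) -> Pair {
    assert_eq!(bytes.len(), size_of::<Pair>());
    // SAFETY: the length was checked above, every bit pattern is a valid
    // `Pair`, and `read_unaligned` places no alignment requirement on the
    // source pointer.
    unsafe { ptr::read_unaligned(bytes.as_ptr().cast::<Pair>()) }
}
```

Writing `as_bytes(&pair)` into a buffer at an odd offset and reading it back with `from_bytes` returns an equal `Pair`, whereas casting that slice to `&Pair` directly would not be sound.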
- -Practically speaking, this means lower-level database functions that normally look like this: -```rust -fn get(key: &Key) -> &Value; -``` -end up looking like this in `cuprate_database`: -```rust -fn get(key: &Key) -> Value; -``` - -Since each backend has its own (de)serialization methods, our types are wrapped in compatibility types that map our `Storable` functions into whatever is required for the backend, e.g.: -- [`StorableHeed`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/heed/storable.rs#L11-L45) -- [`StorableRedb`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/redb/storable.rs#L11-L30) - -Compatibility structs also exist for any `Storable` containers: -- [`StorableVec`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L135-L191) -- [`StorableBytes`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L208-L241) - -Again, it's unfortunate that these must be owned, although in `service`'s use-case, they would have to be owned anyway. - -## 9. Schema -The following section contains Cuprate's database schema. It may change throughout the development of Cuprate, so nothing here is final. - -### 9.1 Tables -The `CamelCase` names of the table headers documented here (e.g. `TxIds`) are the actual type names of the tables within `cuprate_database`. - -Note that words written within `code blocks` are real types defined and usable within `cuprate_database`. Other standard types like u64 and type aliases (TxId) are written normally. - -Within `cuprate_database::tables`, the below table is essentially defined as-is with [a macro](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/tables.rs#L369-L470). 
- -Many of the data types stored are the same data types, although are different semantically, as such, a map of aliases used and their real data types is also provided below. - -| Alias | Real Type | -|----------------------------------------------------|-----------| -| BlockHeight, Amount, AmountIndex, TxId, UnlockTime | u64 -| BlockHash, KeyImage, TxHash, PrunableHash | [u8; 32] - -| Table | Key | Value | Description | -|-------------------|----------------------|--------------------|-------------| -| `BlockBlobs` | BlockHeight | `StorableVec` | Maps a block's height to a serialized byte form of a block -| `BlockHeights` | BlockHash | BlockHeight | Maps a block's hash to its height -| `BlockInfos` | BlockHeight | `BlockInfo` | Contains metadata of all blocks -| `KeyImages` | KeyImage | () | This table is a set with no value, it stores transaction key images -| `NumOutputs` | Amount | u64 | Maps an output's amount to the number of outputs with that amount -| `Outputs` | `PreRctOutputId` | `Output` | This table contains legacy CryptoNote outputs which have clear amounts. This table will not contain an output with 0 amount. 
-| `PrunedTxBlobs` | TxId | `StorableVec` | Contains pruned transaction blobs (even if the database is not pruned) -| `PrunableTxBlobs` | TxId | `StorableVec` | Contains the prunable part of a transaction -| `PrunableHashes` | TxId | PrunableHash | Contains the hash of the prunable part of a transaction -| `RctOutputs` | AmountIndex | `RctOutput` | Contains RingCT outputs mapped from their global RCT index -| `TxBlobs` | TxId | `StorableVec` | Serialized transaction blobs (bytes) -| `TxIds` | TxHash | TxId | Maps a transaction's hash to its index/ID -| `TxHeights` | TxId | BlockHeight | Maps a transaction's ID to the height of the block it comes from -| `TxOutputs` | TxId | `StorableVec` | Gives the amount indices of a transaction's outputs -| `TxUnlockTime` | TxId | UnlockTime | Stores the unlock time of a transaction (only if it has a non-zero lock time) - -The definitions for aliases and types (e.g. `RctOutput`) are within the [`cuprate_database::types`](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/types.rs#L51) module. - - - -### 9.2 Multimap tables -When referencing outputs, Monero will [use the amount and the amount index](https://github.com/monero-project/monero/blob/c8214782fb2a769c57382a999eaf099691c836e7/src/blockchain_db/lmdb/db_lmdb.cpp#L3447-L3449). This means 2 keys are needed to reach an output. - -With LMDB you can set the `DUP_SORT` flag on a table and then set the key/value to: -```rust -Key = KEY_PART_1 -``` -```rust -Value = { - KEY_PART_2, - VALUE // The actual value we are storing. -} -``` - -Then you can set a custom value sorting function that only takes `KEY_PART_2` into account; this is how `monerod` does it. 
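The `DUP_SORT` layout above can be modeled in memory as follows. This is only a sketch using standard collections, not LMDB or `monerod` code: the `DupSortTable` type and its methods are hypothetical, and both key parts are assumed to be `u64`. Each `KEY_PART_1` owns a list of `(KEY_PART_2, VALUE)` entries that is kept sorted by `KEY_PART_2` alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of a `DUP_SORT`-style table.
struct DupSortTable<V> {
    /// KEY_PART_1 -> duplicates sorted by KEY_PART_2.
    inner: BTreeMap<u64, Vec<(u64, V)>>,
}

impl<V> DupSortTable<V> {
    fn new() -> Self {
        Self { inner: BTreeMap::new() }
    }

    /// Inserts or overwrites the value stored under both key parts.
    fn put(&mut self, key_part_1: u64, key_part_2: u64, value: V) {
        let dups = self.inner.entry(key_part_1).or_default();
        // The "custom sort function": only KEY_PART_2 is compared.
        match dups.binary_search_by_key(&key_part_2, |(k2, _)| *k2) {
            Ok(i) => dups[i].1 = value,
            Err(i) => dups.insert(i, (key_part_2, value)),
        }
    }

    /// Positions on a specific key/value pair, like a cursor lookup.
    fn get(&self, key_part_1: u64, key_part_2: u64) -> Option<&V> {
        let dups = self.inner.get(&key_part_1)?;
        let i = dups.binary_search_by_key(&key_part_2, |(k2, _)| *k2).ok()?;
        Some(&dups[i].1)
    }
}
```

Note that `KEY_PART_1` is stored once per group of duplicates in this model, however many values share it.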
- -This requires that the underlying database supports: -- multimap tables -- custom sort functions on values -- setting a cursor on a specific key/value - ---- - -Another way to implement this is as follows: -```rust -Key = { KEY_PART_1, KEY_PART_2 } -``` -```rust -Value = VALUE -``` - -Then the key type is simply used to look up the value; this is how `cuprate_database` does it. - -For example, the key/value pair for outputs is: -```rust -PreRctOutputId => Output -``` -where `PreRctOutputId` looks like this: -```rust -struct PreRctOutputId { - amount: u64, - amount_index: u64, -} -``` - -## 10. Known issues and tradeoffs -`cuprate_database` takes many tradeoffs, whether due to: -- Prioritizing certain values over others -- Not having a better solution -- Being "good enough" - -This is a list of the larger ones, along with issues that don't have answers yet. - -### 10.1 Traits abstracting backends -Although all database backends used are very similar, they have some crucial differences in small implementation details that must be worked around when conforming them to `cuprate_database`'s traits. - -Put simply: using `cuprate_database`'s traits is less efficient and more awkward than using the backend directly. 
- -For example: -- [Data types must be wrapped in compatibility layers when they otherwise wouldn't be](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/backend/heed/env.rs#L101-L116) -- [There are types that only apply to a specific backend, but are visible to all](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/error.rs#L86-L89) -- [There are extra layers of abstraction to smoothen the differences between all backends](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/env.rs#L62-L68) -- [Existing functionality of backends must be taken away, as it isn't supported in the others](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/database.rs#L27-L34) - -This is a _tradeoff_ that `cuprate_database` takes, as: -- The backend itself is usually not the source of bottlenecks in the greater system, as such, small inefficiencies are OK -- None of the lost functionality is crucial for operation -- The ability to use, test, and swap between multiple database backends is [worth it](https://github.com/Cuprate/cuprate/pull/35#issuecomment-1952804393) - -### 10.2 Hot-swappable backends -Using a different backend is really as simple as re-building `cuprate_database` with a different feature flag: -```bash -# Use LMDB. -cargo build --package cuprate-database --features heed - -# Use redb. -cargo build --package cuprate-database --features redb -``` - -This is "good enough" for now, however ideally, this hot-swapping of backends would be able to be done at _runtime_. - -As it is now, `cuprate_database` cannot compile both backends and swap based on user input at runtime; it must be compiled with a certain backend, which will produce a binary with only that backend. 
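A minimal sketch of why the choice is fixed at compile time is shown below. It is illustrative only: `HeedEnv`, `RedbEnv`, the `Env` trait with a `BACKEND` constant, and the rule that `heed` is used unless the `redb` feature is enabled are all assumptions made for this example, not the crate's real definitions.

```rust
/// Hypothetical stand-ins for the two backend environments.
#[allow(dead_code)]
struct HeedEnv;
#[allow(dead_code)]
struct RedbEnv;

/// Hypothetical trait; the real `trait Env` is far larger.
trait Env {
    const BACKEND: &'static str;
}

impl Env for HeedEnv {
    const BACKEND: &'static str = "heed";
}

impl Env for RedbEnv {
    const BACKEND: &'static str = "redb";
}

// Exactly one alias survives compilation, so the final binary
// only ever refers to a single backend.
#[cfg(feature = "redb")]
type ConcreteEnv = RedbEnv;
#[cfg(not(feature = "redb"))]
type ConcreteEnv = HeedEnv;

/// Name of the backend that was compiled in.
fn backend_name() -> &'static str {
    <ConcreteEnv as Env>::BACKEND
}
```

Because `ConcreteEnv` is resolved by `cfg` attributes, switching backends means rebuilding. Runtime selection would instead need both backends compiled in, with the choice made behind something like an `enum` or a trait object.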
- -This also means things like [CI testing multiple backends is awkward](https://github.com/Cuprate/cuprate/blob/main/.github/workflows/ci.yml#L132-L136), as we must re-compile with different feature flags instead. - -### 10.3 Copying unaligned bytes -As mentioned in [`8. (De)serialization`](#8-deserialization), bytes are _copied_ when they are turned into a type `T` due to unaligned bytes being returned from database backends. - -Using a regular reference cast results in an improperly aligned type `T`; [such a type even existing causes undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html). In our case, `bytemuck` saves us by panicking before this occurs. - -Thus, when using `cuprate_database`'s database traits, an _owned_ `T` is returned. - -This is doubly unfortunate for `&[u8]` as this does not even need deserialization. - -For example, `StorableVec` could have been this: -```rust -enum StorableBytes<'a, T: Storable> { - Owned(T), - Ref(&'a T), -} -``` -but this would require supporting types that must be copied regardless alongside the occasional `&[u8]` that can be returned without casting. This was hard to do in a generic way, thus all `[u8]`'s are copied and returned as owned `StorableVec`s. - -This is a _tradeoff_ `cuprate_database` takes as: -- `bytemuck::pod_read_unaligned` is cheap enough -- The main API, `service`, needs to return owned values anyway -- Having no references removes a lot of lifetime complexity - -The alternative is either: -- Using proper (de)serialization instead of casting (which comes with its own costs) -- Somehow fixing the alignment issues in the backends mentioned previously - -### 10.4 Endianness -`cuprate_database`'s (de)serialization and storage of bytes are native-endian, as in, byte storage order will depend on the machine it is running on. 
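What "native-endian" means for a stored integer can be shown with the standard library's byte conversion methods. The function names below are hypothetical; `to_ne_bytes` stands in for the byte cast that the database layer performs.

```rust
/// Native-endian bytes of a value, i.e. what a plain byte cast would store.
fn native_bytes(x: u64) -> [u8; 8] {
    x.to_ne_bytes()
}

/// Returns `true` when the native byte order of this machine is little-endian.
fn native_is_little_endian() -> bool {
    native_bytes(1) == 1u64.to_le_bytes()
}
```

On a little-endian target `native_bytes(1)` starts with `0x01`; on a big-endian target the same value ends with `0x01`, so the two machines would disagree about the stored bytes.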
- -As Cuprate's build targets are all little-endian ([big-endian by default machines barely exist](https://en.wikipedia.org/wiki/Endianness#Hardware)), this doesn't matter much and the byte ordering can be seen as a constant. - -Practically, this means `cuprated`'s database files can be transferred across computers, as can `monerod`'s. - -### 10.5 Extra table data -Some of `cuprate_database`'s tables differ from `monerod`'s tables. For example, the way [`9.2 Multimap tables`](#92-multimap-tables) are implemented requires that the primary key be stored _for all_ entries, whereas `monerod` only needs to store it once. - -For example: -```rust -// `monerod` only stores `amount: 1` once, -// `cuprated` stores it each time it appears. -PreRctOutputId { amount: 1, amount_index: 0 }; -PreRctOutputId { amount: 1, amount_index: 1 }; -``` - -This means `cuprated`'s database will be slightly larger than `monerod`'s. - -The current method `cuprate_database` uses will be "good enough" until usage shows that it must be optimized, as multimap tables are tricky to implement across all backends.
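
The composite-key layout and its per-entry key cost can be sketched with an ordered map. This is a model using `std::collections::BTreeMap`, not a real backend: the helper `outputs_with_amount` is hypothetical, and `PreRctOutputId` is redeclared here with derived ordering purely for the example.

```rust
use std::collections::BTreeMap;

/// Composite key: both parts are stored with every entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(C)]
struct PreRctOutputId {
    amount: u64,
    amount_index: u64,
}

/// Hypothetical helper: returns every entry sharing one `amount`.
fn outputs_with_amount<V>(
    table: &BTreeMap<PreRctOutputId, V>,
    amount: u64,
) -> Vec<(&PreRctOutputId, &V)> {
    // The derived `Ord` compares `amount` first and `amount_index` second,
    // so all entries for one amount form a contiguous key range.
    let start = PreRctOutputId { amount, amount_index: 0 };
    let end = PreRctOutputId { amount, amount_index: u64::MAX };
    table.range(start..=end).collect()
}
```

Each key in this model is 16 bytes, 8 of which repeat the `amount` for every entry. That repetition is the extra data this section describes.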