mirror of
https://github.com/hinto-janai/cuprate.git
synced 2024-12-22 11:39:30 +00:00
books/architecture: port database design document (#267)
Some checks failed
Architecture mdBook / build (push) Has been cancelled
CI / fmt (push) Has been cancelled
CI / typo (push) Has been cancelled
CI / ci (macos-latest, stable, bash) (push) Has been cancelled
CI / ci (ubuntu-latest, stable, bash) (push) Has been cancelled
CI / ci (windows-latest, stable-x86_64-pc-windows-gnu, msys2 {0}) (push) Has been cancelled
Doc / build (push) Has been cancelled
Doc / deploy (push) Has been cancelled
* add chapters
* add files, intro
* db abstraction
* backends
* abstraction
* syncing
* serde
* issues
* common/types
* common/ops
* common/service
* service diagram
* service/resize
* service/thread-model
* service/shutdown
* storage/blockchain
* update md files
* cleanup
* fixes
* update for https://github.com/Cuprate/cuprate/pull/290
* review fix
This commit is contained in:
parent
5eb712f4de
commit
88605b081f
36 changed files with 685 additions and 611 deletions
|
@ -27,11 +27,37 @@
|
|||
|
||||
---
|
||||
|
||||
- [⚪️ Storage](storage/intro.md)
|
||||
- [⚪️ Database abstraction](storage/database-abstraction.md)
|
||||
- [⚪️ Blockchain](storage/blockchain.md)
|
||||
- [⚪️ Transaction pool](storage/transaction-pool.md)
|
||||
- [⚪️ Pruning](storage/pruning.md)
|
||||
- [🟢 Storage](storage/intro.md)
|
||||
- [🟢 Database abstraction](storage/db/intro.md)
|
||||
- [🟢 Abstraction](storage/db/abstraction/intro.md)
|
||||
- [🟢 Backend](storage/db/abstraction/backend.md)
|
||||
- [🟢 ConcreteEnv](storage/db/abstraction/concrete_env.md)
|
||||
- [🟢 Trait](storage/db/abstraction/trait.md)
|
||||
- [🟢 Syncing](storage/db/syncing.md)
|
||||
- [🟢 Resizing](storage/db/resizing.md)
|
||||
- [🟢 (De)serialization](storage/db/serde.md)
|
||||
- [🟢 Known issues and tradeoffs](storage/db/issues/intro.md)
|
||||
- [🟢 Abstracting backends](storage/db/issues/traits.md)
|
||||
- [🟢 Hot-swap](storage/db/issues/hot-swap.md)
|
||||
- [🟢 Unaligned bytes](storage/db/issues/unaligned.md)
|
||||
- [🟢 Endianness](storage/db/issues/endian.md)
|
||||
- [🟢 Multimap](storage/db/issues/multimap.md)
|
||||
- [🟢 Common behavior](storage/common/intro.md)
|
||||
- [🟢 Types](storage/common/types.md)
|
||||
- [🟢 `ops`](storage/common/ops.md)
|
||||
- [🟢 `tower::Service`](storage/common/service/intro.md)
|
||||
- [🟢 Initialization](storage/common/service/initialization.md)
|
||||
- [🟢 Requests](storage/common/service/requests.md)
|
||||
- [🟢 Responses](storage/common/service/responses.md)
|
||||
- [🟢 Resizing](storage/common/service/resizing.md)
|
||||
- [🟢 Thread model](storage/common/service/thread-model.md)
|
||||
- [🟢 Shutdown](storage/common/service/shutdown.md)
|
||||
- [🟢 Blockchain](storage/blockchain/intro.md)
|
||||
- [🟢 Schema](storage/blockchain/schema/intro.md)
|
||||
- [🟢 Tables](storage/blockchain/schema/tables.md)
|
||||
- [🟢 Multimap tables](storage/blockchain/schema/multimap.md)
|
||||
- [⚪️ Transaction pool](storage/txpool/intro.md)
|
||||
- [⚪️ Pruning](storage/pruning/intro.md)
|
||||
|
||||
---
|
||||
|
||||
|
|
|
@ -1 +0,0 @@
|
|||
# ⚪️ Blockchain
|
3
books/architecture/src/storage/blockchain/intro.md
Normal file
|
@ -0,0 +1,3 @@
|
|||
# Blockchain
|
||||
This section contains storage information specific to [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain),
|
||||
the database built on-top of [`cuprate_database`](https://doc.cuprate.org/cuprate_database) that stores the blockchain.
|
|
@ -0,0 +1,2 @@
|
|||
# Schema
|
||||
This section contains the schema of `cuprate_blockchain`'s database tables.
|
45
books/architecture/src/storage/blockchain/schema/multimap.md
Normal file
|
@ -0,0 +1,45 @@
|
|||
# Multimap tables
|
||||
## Outputs
|
||||
When referencing outputs, Monero will [use the amount and the amount index](https://github.com/monero-project/monero/blob/c8214782fb2a769c57382a999eaf099691c836e7/src/blockchain_db/lmdb/db_lmdb.cpp#L3447-L3449). This means 2 keys are needed to reach an output.
|
||||
|
||||
With LMDB you can set the `DUP_SORT` flag on a table and then set the key/value to:
|
||||
```rust
Key = KEY_PART_1
```
```rust
Value = {
    KEY_PART_2,
    VALUE // The actual value we are storing.
}
```
|
||||
|
||||
Then you can set a custom value sorting function that only takes `KEY_PART_2` into account; this is how `monerod` does it.
|
||||
|
||||
This requires that the underlying database supports:
|
||||
- multimap tables
|
||||
- custom sort functions on values
|
||||
- setting a cursor on a specific key/value
|
||||
|
||||
## How `cuprate_blockchain` does it
|
||||
Another way to implement this is as follows:
|
||||
```rust
Key = { KEY_PART_1, KEY_PART_2 }
```
```rust
Value = VALUE
```
|
||||
|
||||
Then the key type is simply used to look up the value; this is how `cuprate_blockchain` does it
|
||||
as [`cuprate_database` does not have a multimap abstraction (yet)](../../db/issues/multimap.md).
|
||||
|
||||
For example, the key/value pair for outputs is:
|
||||
```rust
PreRctOutputId => Output
```
|
||||
where `PreRctOutputId` looks like this:
|
||||
```rust
struct PreRctOutputId {
    amount: u64,
    amount_index: u64,
}
```
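The composite-key layout can be sketched with an ordered map from the standard library. This is only an illustration, not `cuprate_blockchain`'s code: `Outputs`, `Output`'s single field, and the helper functions are hypothetical stand-ins, and a `BTreeMap` is assumed in place of a real database table. It shows that a lookup needs both key parts, and that entries sharing an amount stay adjacent because the derived ordering compares `amount` first.

```rust
use std::collections::BTreeMap;

/// Composite key: both key parts live in the key itself.
/// Field order matters: the derived `Ord` compares `amount`, then `amount_index`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PreRctOutputId {
    amount: u64,
    amount_index: u64,
}

/// Hypothetical stand-in for the stored `Output` value.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Output {
    height: u32,
}

/// Hypothetical stand-in for a database table.
type Outputs = BTreeMap<PreRctOutputId, Output>;

/// Count the outputs sharing `amount` by scanning the contiguous key range.
fn outputs_with_amount(table: &Outputs, amount: u64) -> usize {
    let start = PreRctOutputId { amount, amount_index: 0 };
    let end = PreRctOutputId { amount, amount_index: u64::MAX };
    table.range(start..=end).count()
}

/// Build a small table with two outputs of amount 5 and one of amount 7.
fn example_table() -> Outputs {
    let mut table = Outputs::new();
    for (amount, amount_index, height) in [(5, 0, 10), (5, 1, 11), (7, 0, 12)] {
        table.insert(PreRctOutputId { amount, amount_index }, Output { height });
    }
    table
}

fn main() {
    let table = example_table();
    let id = PreRctOutputId { amount: 5, amount_index: 1 };
    println!("{:?}", table.get(&id));
    println!("{}", outputs_with_amount(&table, 5));
}
```

The tradeoff of this layout is that the first key part is repeated in every entry, whereas a true multimap table stores it once per group.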
|
39
books/architecture/src/storage/blockchain/schema/tables.md
Normal file
|
@ -0,0 +1,39 @@
|
|||
# Tables
|
||||
|
||||
> See also: <https://doc.cuprate.org/cuprate_blockchain/tables> & <https://doc.cuprate.org/cuprate_blockchain/types>.
|
||||
|
||||
The `CamelCase` names of the table headers documented here (e.g. `TxIds`) are the actual type name of the table within `cuprate_blockchain`.
|
||||
|
||||
Note that words written within `code blocks` mean that it is a real type defined and usable within `cuprate_blockchain`. Other standard types like u64 and type aliases (TxId) are written normally.
|
||||
|
||||
Within `cuprate_blockchain::tables`, the below table is essentially defined as-is with [a macro](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/tables.rs#L369-L470).
|
||||
|
||||
Many of the stored types share the same underlying data type but differ semantically. As such, a map of the aliases used and their real data types is also provided below.
|
||||
|
||||
| Alias                                              | Real Type |
|----------------------------------------------------|-----------|
| BlockHeight, Amount, AmountIndex, TxId, UnlockTime | u64       |
| BlockHash, KeyImage, TxHash, PrunableHash          | [u8; 32]  |
|
||||
|
||||
---
|
||||
|
||||
| Table              | Key                  | Value                   | Description |
|--------------------|----------------------|-------------------------|-------------|
| `BlockHeaderBlobs` | BlockHeight          | `StorableVec<u8>`       | Maps a block's height to a serialized byte form of its header |
| `BlockTxsHashes`   | BlockHeight          | `StorableVec<[u8; 32]>` | Maps a block's height to the block's transaction hashes |
| `BlockHeights`     | BlockHash            | BlockHeight             | Maps a block's hash to its height |
| `BlockInfos`       | BlockHeight          | `BlockInfo`             | Contains metadata of all blocks |
| `KeyImages`        | KeyImage             | ()                      | This table is a set with no value, it stores transaction key images |
| `NumOutputs`       | Amount               | u64                     | Maps an output's amount to the number of outputs with that amount |
| `Outputs`          | `PreRctOutputId`     | `Output`                | This table contains legacy CryptoNote outputs which have clear amounts. This table will not contain an output with 0 amount. |
| `PrunedTxBlobs`    | TxId                 | `StorableVec<u8>`       | Contains pruned transaction blobs (even if the database is not pruned) |
| `PrunableTxBlobs`  | TxId                 | `StorableVec<u8>`       | Contains the prunable part of a transaction |
| `PrunableHashes`   | TxId                 | PrunableHash            | Contains the hash of the prunable part of a transaction |
| `RctOutputs`       | AmountIndex          | `RctOutput`             | Contains RingCT outputs mapped from their global RCT index |
| `TxBlobs`          | TxId                 | `StorableVec<u8>`       | Serialized transaction blobs (bytes) |
| `TxIds`            | TxHash               | TxId                    | Maps a transaction's hash to its index/ID |
| `TxHeights`        | TxId                 | BlockHeight             | Maps a transaction's ID to the height of the block it comes from |
| `TxOutputs`        | TxId                 | `StorableVec<u64>`      | Gives the amount indices of a transaction's outputs |
| `TxUnlockTime`     | TxId                 | UnlockTime              | Stores the unlock time of a transaction (only if it has a non-zero lock time) |
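To make the alias map and the "set with no value" idea concrete, here is a small sketch. It assumes in-memory `BTreeMap`s in place of real tables; the `Tables` struct and its methods are hypothetical and only mirror the `BlockHeights` and `KeyImages` rows above.

```rust
use std::collections::BTreeMap;

// Aliases from the map above: same underlying type, different meaning.
type BlockHeight = u64;
type BlockHash = [u8; 32];
type KeyImage = [u8; 32];

/// Hypothetical in-memory stand-ins for two of the tables.
#[derive(Default)]
struct Tables {
    /// `BlockHeights`: BlockHash -> BlockHeight
    block_heights: BTreeMap<BlockHash, BlockHeight>,
    /// `KeyImages`: KeyImage -> (), i.e. a set expressed as a key/value table.
    key_images: BTreeMap<KeyImage, ()>,
}

impl Tables {
    /// Look up the height of a block by its hash.
    fn height_of(&self, hash: &BlockHash) -> Option<BlockHeight> {
        self.block_heights.get(hash).copied()
    }

    /// Record a key image; returns `false` if it was already present.
    fn mark_spent(&mut self, key_image: KeyImage) -> bool {
        self.key_images.insert(key_image, ()).is_none()
    }
}

fn main() {
    let mut tables = Tables::default();
    tables.block_heights.insert([1; 32], 0);
    println!("{:?}", tables.height_of(&[1; 32]));
    println!("{}", tables.mark_spent([9; 32]));
}
```

Note that the compiler treats `BlockHash` and `KeyImage` as the same type; aliases document intent but do not prevent mixing them up.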
|
||||
|
||||
<!-- TODO(Boog900): We could split this table again into `RingCT (non-miner) Outputs` and `RingCT (miner) Outputs` as for miner outputs we can store the amount instead of commitment saving 24 bytes per miner output. -->
|
9
books/architecture/src/storage/common/intro.md
Normal file
|
@ -0,0 +1,9 @@
|
|||
# Common behavior
|
||||
The crates that build on-top of the database abstraction ([`cuprate_database`](https://doc.cuprate.org/cuprate_database))
|
||||
share some common behavior including but not limited to:
|
||||
|
||||
- Defining their specific database tables and types
|
||||
- Having an `ops` module
|
||||
- Exposing a `tower::Service` API (backed by a threadpool) for public usage
|
||||
|
||||
This section provides more details on these behaviors.
|
21
books/architecture/src/storage/common/ops.md
Normal file
|
@ -0,0 +1,21 @@
|
|||
# `ops`
|
||||
Both [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain)
|
||||
and [`cuprate_txpool`](https://doc.cuprate.org/cuprate_txpool) expose an
|
||||
`ops` module containing abstracted Monero-related database operations.
|
||||
|
||||
For example, [`cuprate_blockchain::ops::block::add_block`](https://doc.cuprate.org/cuprate_blockchain/ops/block/fn.add_block.html).
|
||||
|
||||
These functions build on-top of the database traits and allow for more abstracted database operations.
|
||||
|
||||
For example, instead of these signatures:
|
||||
```rust
fn get(_: &Key) -> Value;
fn put(_: &Key, _: &Value);
```
|
||||
the `ops` module provides much higher-level signatures like such:
|
||||
```rust
fn add_block(block: &Block) -> Result<_, _>;
```
|
||||
|
||||
Although these functions are exposed, they are not the main API; that is covered in the next section:
the [`tower::Service`](./service/intro.md) (which uses these functions).
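As a rough sketch of why such higher-level functions are useful, the example below shows one call keeping two tables consistent. Everything in it is hypothetical: the `Block` fields, the `Tables` struct, and this `add_block` signature are simplified stand-ins using `HashMap`s, not the real `cuprate_blockchain::ops` API.

```rust
use std::collections::HashMap;

type BlockHeight = u64;
type BlockHash = [u8; 32];

/// Hypothetical block: only the fields this sketch needs.
struct Block {
    hash: BlockHash,
    header_blob: Vec<u8>,
}

/// Hypothetical stand-in for the tables opened in a write transaction.
#[derive(Default)]
struct Tables {
    block_header_blobs: HashMap<BlockHeight, Vec<u8>>,
    block_heights: HashMap<BlockHash, BlockHeight>,
}

/// High-level operation: a single call updates every table that must stay in sync.
fn add_block(tables: &mut Tables, block: &Block) -> Result<BlockHeight, String> {
    if tables.block_heights.contains_key(&block.hash) {
        return Err("block already exists".to_string());
    }
    // The next height is the number of blocks stored so far.
    let height = tables.block_heights.len() as BlockHeight;
    tables.block_header_blobs.insert(height, block.header_blob.clone());
    tables.block_heights.insert(block.hash, height);
    Ok(height)
}

fn main() {
    let mut tables = Tables::default();
    let block = Block { hash: [0; 32], header_blob: vec![1, 2, 3] };
    println!("{:?}", add_block(&mut tables, &block));
}
```

A caller using only `get`/`put` would have to remember to update both tables; the `ops`-style function makes that invariant part of the operation itself.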
|
|
@ -0,0 +1,9 @@
|
|||
# Initialization
|
||||
A database service is started simply by calling: [`init()`](https://doc.cuprate.org/cuprate_blockchain/service/fn.init.html).
|
||||
|
||||
This function initializes the database, spawns threads, and returns:
|
||||
- Read handle to the database
|
||||
- Write handle to the database
|
||||
- The database itself
|
||||
|
||||
These handles implement the `tower::Service` trait, which allows sending requests and receiving responses `async`hronously.
|
65
books/architecture/src/storage/common/service/intro.md
Normal file
|
@ -0,0 +1,65 @@
|
|||
# tower::Service
|
||||
Both [`cuprate_blockchain`](https://doc.cuprate.org/cuprate_blockchain)
|
||||
and [`cuprate_txpool`](https://doc.cuprate.org/cuprate_txpool) provide
|
||||
`async` [`tower::Service`](https://docs.rs/tower)s that define database requests/responses.
|
||||
|
||||
This is the main API that other Cuprate crates use.
|
||||
|
||||
There are 2 `tower::Service`s:
|
||||
1. A read service which is backed by a [`rayon::ThreadPool`](https://docs.rs/rayon)
|
||||
1. A write service which spawns a single thread to handle write requests
|
||||
|
||||
As this behavior is the same across all users of [`cuprate_database`](https://doc.cuprate.org/cuprate_database),
|
||||
it is extracted into its own crate: [`cuprate_database_service`](https://doc.cuprate.org/cuprate_database_service).
|
||||
|
||||
## Diagram
|
||||
As a recap, here is how this looks to a user of a higher-level database crate,
|
||||
`cuprate_blockchain` in this example. Starting from the lowest layer:
|
||||
|
||||
1. `cuprate_database` is used to abstract the database
|
||||
1. `cuprate_blockchain` builds on-top of that with tables, types, operations
|
||||
1. `cuprate_blockchain` exposes a `tower::Service` using `cuprate_database_service`
|
||||
1. The user now interfaces with `cuprate_blockchain` with that `tower::Service` in a request/response fashion
|
||||
|
||||
```
                         ┌──────────────────┐
                         │ cuprate_database │
                         └────────┬─────────┘
┌─────────────────────────────────┴─────────────────────────────────┐
│                        cuprate_blockchain                         │
│                                                                   │
│ ┌──────────────────────┐  ┌─────────────────────────────────────┐ │
│ │    Tables, types     │  │                 ops                 │ │
│ │ ┌───────────┐┌─────┐ │  │ ┌─────────────┐ ┌──────────┐┌─────┐ │ │
│ │ │ BlockInfo ││ ... │ ├──┤ │ add_block() │ │ add_tx() ││ ... │ │ │
│ │ └───────────┘└─────┘ │  │ └─────────────┘ └──────────┘└─────┘ │ │
│ └──────────────────────┘  └─────┬───────────────────────────────┘ │
│                                 │                                 │
│                       ┌─────────┴───────────────────────────────┐ │
│                       │             tower::Service              │ │
│                       │ ┌──────────────────────────────┐┌─────┐ │ │
│                       │ │ Blockchain{Read,Write}Handle ││ ... │ │ │
│                       │ └──────────────────────────────┘└─────┘ │ │
│                       └─────────┬───────────────────────────────┘ │
│                                 │                                 │
└─────────────────────────────────┼─────────────────────────────────┘
                                  │
                            ┌─────┴─────┐
       ┌────────────────────┴────┐ ┌────┴──────────────────────────────────┐
       │    Database requests    │ │          Database responses           │
       │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │
       │ │ FindBlock([u8; 32]) │ │ │ │ FindBlock(Option<(Chain, usize)>) │ │
       │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │
       │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │
       │ │ ChainHeight         │ │ │ │ ChainHeight(usize, [u8; 32])      │ │
       │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │
       │ ┌─────────────────────┐ │ │ ┌───────────────────────────────────┐ │
       │ │ ...                 │ │ │ │ ...                               │ │
       │ └─────────────────────┘ │ │ └───────────────────────────────────┘ │
       └─────────────────────────┘ └───────────────────────────────────────┘
                            ▲           │
                            │           ▼
                        ┌─────────────────────────┐
                        │ cuprate_blockchain user │
                        └─────────────────────────┘
```
|
|
@ -0,0 +1,8 @@
|
|||
# Requests
|
||||
Along with the 2 handles, there are 2 types of requests:
|
||||
- Read requests, e.g. [`BlockchainReadRequest`](https://doc.cuprate.org/cuprate_types/blockchain/enum.BlockchainReadRequest.html)
|
||||
- Write requests, e.g. [`BlockchainWriteRequest`](https://doc.cuprate.org/cuprate_types/blockchain/enum.BlockchainWriteRequest.html)
|
||||
|
||||
Quite obviously:
|
||||
- Read requests are for retrieving various data from the database
|
||||
- Write requests are for writing data to the database
|
15
books/architecture/src/storage/common/service/resizing.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
# Resizing
|
||||
As noted in the [`cuprate_database` resizing section](../../db/resizing.md),
|
||||
builders on-top of `cuprate_database` are responsible for resizing the database.
|
||||
|
||||
In `cuprate_{blockchain,txpool}`'s case, that means the `tower::Service` must know
|
||||
how to resize. This logic is shared between both crates, defined in `cuprate_database_service`:
|
||||
<https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L107-L171>.
|
||||
|
||||
By default, this uses an algorithm _similar_ to `monerod`'s:
|
||||
|
||||
- [If there's not enough space to fit a write request's data](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L130), start a resize
|
||||
- Each resize adds around [`1,073,745,920`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) bytes to the current map size
|
||||
- A resize will be [attempted `3` times](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L110) before failing
|
||||
|
||||
There are other [resizing algorithms](https://doc.cuprate.org/cuprate_database/resize/enum.ResizeAlgorithm.html) that define how the database's memory map grows, although currently the behavior of `monerod` is closely followed (for no particular reason).
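The arithmetic behind these numbers can be sketched as follows. This is an illustration under stated assumptions, not the code in `cuprate_database` or `cuprate_database_service`: it assumes a 4096-byte page, that each resize adds 1 GiB and then advances to the next page multiple, and that at most 3 resizes are tried per request. The names `next_map_size` and `fit_request` are hypothetical.

```rust
/// Bytes added per resize before page alignment (1 GiB). Assumed value.
const ADD_SIZE: u64 = 1 << 30;
/// Maximum number of resizes tried for one write request (from the list above).
const MAX_RESIZES: u32 = 3;

/// Grow the map by `ADD_SIZE`, then advance to the next page multiple.
/// With an already aligned size this adds one extra page, which is one way
/// to arrive at 1 GiB + 4096 = 1,073,745,920 bytes per resize.
fn next_map_size(current: u64, page_size: u64) -> u64 {
    let grown = current + ADD_SIZE;
    grown + (page_size - grown % page_size)
}

/// Resize until `needed` bytes fit, giving up after `MAX_RESIZES` resizes.
fn fit_request(mut map_size: u64, needed: u64, page_size: u64) -> Option<u64> {
    for _ in 0..MAX_RESIZES {
        if needed <= map_size {
            return Some(map_size);
        }
        map_size = next_map_size(map_size, page_size);
    }
    // Final check after the last allowed resize.
    (needed <= map_size).then_some(map_size)
}

fn main() {
    println!("{}", next_map_size(0, 4096));
    println!("{:?}", fit_request(0, 1, 4096));
}
```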
|
18
books/architecture/src/storage/common/service/responses.md
Normal file
|
@ -0,0 +1,18 @@
|
|||
# Responses
|
||||
After sending a request using the read/write handle, the value returned is _not_ the response, but rather an `async`hronous channel that will eventually return the response:
|
||||
```rust,ignore
// Send a request.
//                                tower::Service::call()
//                                           V
let response_channel: Channel = read_handle.call(BlockchainReadRequest::ChainHeight)?;

// Await the response.
let response: BlockchainResponse = response_channel.await?;
```
|
||||
|
||||
After `await`ing the returned channel, a `Response` will eventually be returned when
|
||||
the `Service` threadpool has fetched the value from the database and sent it off.
|
||||
|
||||
Both read and write request variants match `Response` variants by name, i.e.
|
||||
- `BlockchainReadRequest::ChainHeight` leads to `BlockchainResponse::ChainHeight`
|
||||
- `BlockchainWriteRequest::WriteBlock` leads to `BlockchainResponse::WriteBlockOk`
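The request/response pairing can be sketched without any `async` machinery. In the sketch below, a blocking `std::sync::mpsc` channel stands in for the `async` channel, and `ReadRequest`, `Response`, `call`, and the hard-coded height are all hypothetical; the point is only that `call` hands back a channel immediately and the matching response variant arrives later.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type with a single variant.
#[derive(Debug, PartialEq)]
enum ReadRequest {
    ChainHeight,
}

/// Hypothetical response type; the variant name mirrors the request.
#[derive(Debug, PartialEq)]
enum Response {
    ChainHeight(usize),
}

/// Sends a request to a worker thread and returns a channel, not the response.
fn call(request: ReadRequest) -> mpsc::Receiver<Response> {
    let (response_sender, response_channel) = mpsc::channel();
    thread::spawn(move || {
        // Each request variant maps to the response variant of the same name.
        let response = match request {
            ReadRequest::ChainHeight => Response::ChainHeight(100),
        };
        // The receiver may already be gone; ignore that case in this sketch.
        let _ = response_sender.send(response);
    });
    response_channel
}

fn main() {
    let response_channel = call(ReadRequest::ChainHeight);
    // Blocking `recv()` plays the role of `.await` here.
    let response = response_channel.recv().unwrap();
    println!("{response:?}");
}
```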
|
|
@ -0,0 +1,4 @@
|
|||
# Shutdown
|
||||
Once the read/write handles to the `tower::Service` are `Drop`ed, the backing thread(pool) will gracefully exit, automatically.
|
||||
|
||||
Note that the writer thread and reader threadpool are not connected in any way: dropping the write handle makes the writer thread exit, but the read handle can still be held and read from, and vice-versa for the write handle.
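The drop-to-shutdown behavior can be illustrated with a plain channel. This is a sketch under assumptions, not the service's implementation: `spawn_writer` is a hypothetical function, requests are bare `u64`s, and a `std::sync::mpsc` sender stands in for the write handle. The worker loop ends on its own once every sender has been dropped.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a hypothetical writer thread.
/// Returns the "write handle" (a sender) and a join handle that yields
/// the number of requests the thread handled before exiting.
fn spawn_writer() -> (mpsc::Sender<u64>, thread::JoinHandle<u64>) {
    let (handle, requests) = mpsc::channel::<u64>();
    let join = thread::spawn(move || {
        let mut handled = 0;
        // `recv()` returns `Err` once every sender is dropped, ending the loop.
        while let Ok(_request) = requests.recv() {
            handled += 1;
        }
        handled
    });
    (handle, join)
}

fn main() {
    let (write_handle, join) = spawn_writer();
    write_handle.send(1).unwrap();
    write_handle.send(2).unwrap();

    // Dropping the handle is the shutdown signal; no explicit message is needed.
    drop(write_handle);
    println!("{}", join.join().unwrap());
}
```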
|
|
@ -0,0 +1,23 @@
|
|||
# Thread model
|
||||
The base database abstractions themselves are not concerned with parallelism; they are mostly functions to be called from a single thread.

However, the `cuprate_database_service` API _does_ have a thread model backing it.
|
||||
|
||||
When a `Service`'s init() function is called, threads will be spawned and
|
||||
maintained until the user drops (disconnects) the returned handles.
|
||||
|
||||
The current behavior for thread count is:
|
||||
- [1 writer thread](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/service/write.rs#L48-L52)
|
||||
- [As many reader threads as there are system threads](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/service/src/reader_threads.rs#L44-L49)
|
||||
|
||||
For example, on a system with 32-threads, `cuprate_database_service` will spawn:
|
||||
- 1 writer thread
|
||||
- 32 reader threads
|
||||
|
||||
whose sole responsibility is to listen for database requests, access the database (potentially in parallel), and return a response.
|
||||
|
||||
Note that the `1 system thread = 1 reader thread` model is only the default setting; the reader thread count can be configured by the user to be any number between `1 .. amount_of_system_threads`.
|
||||
|
||||
The reader threads are managed by [`rayon`](https://docs.rs/rayon).
|
||||
|
||||
For an example of where multiple reader threads are used: given a request that asks if any key-image within a set already exists, `cuprate_blockchain` will [split that work between the threads with `rayon`](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/blockchain/src/service/read.rs#L400).
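The default-plus-override rule for the reader thread count can be sketched as below. The function name `reader_threads` and the clamping of out-of-range values are assumptions made for illustration; the real configuration type in `cuprate_database_service` may behave differently.

```rust
use std::thread;

/// Pick the reader thread count.
/// `None` means "use the default" (one reader per system thread);
/// an explicit request is clamped into `1..=system_threads` (assumed behavior).
fn reader_threads(requested: Option<usize>, system_threads: usize) -> usize {
    let system_threads = system_threads.max(1);
    match requested {
        Some(n) => n.clamp(1, system_threads),
        None => system_threads,
    }
}

fn main() {
    // Query the number of threads the system reports, falling back to 1.
    let system_threads = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("readers: {}", reader_threads(None, system_threads));
    println!("writer: 1");
}
```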
|
21
books/architecture/src/storage/common/types.md
Normal file
|
@ -0,0 +1,21 @@
|
|||
# Types
|
||||
## POD types
|
||||
Since [all types in the database are POD types](../db/serde.md), we must often
|
||||
provide mappings between outside types and the types actually stored in the database.
|
||||
|
||||
A common case is mapping infallible types to and from [`bitflags`](https://docs.rs/bitflag) and/or their raw integer representation.
|
||||
For example, the [`OutputFlags`](https://doc.cuprate.org/cuprate_blockchain/types/struct.OutputFlags.html) type or `bool` types.
|
||||
|
||||
As types like `enum`s, `bool`s and `char`s cannot be cast from an integer infallibly,
`bytemuck::Pod` cannot be implemented on them safely. Thus, we store some infallible version
of them inside the database with a custom type and map them when fetching the data.
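A minimal sketch of such a mapping for `bool` is shown below. `StoredBool` is a hypothetical type, and treating every non-zero byte as `true` is a choice made for this example. The sketch avoids the `bytemuck` dependency to stay self-contained; the point is that every bit pattern of the stored `u8` is valid, which is what makes a plain-bytes representation sound.

```rust
/// Database-side representation of a `bool`.
/// Any `u8` bit pattern is a valid value, unlike `bool` itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct StoredBool(u8);

impl From<bool> for StoredBool {
    /// Map the outside type into the stored type (`false` -> 0, `true` -> 1).
    fn from(value: bool) -> Self {
        Self(u8::from(value))
    }
}

impl From<StoredBool> for bool {
    /// Map the stored type back; any non-zero byte counts as `true` (assumed rule).
    fn from(stored: StoredBool) -> Self {
        stored.0 != 0
    }
}

fn main() {
    let stored = StoredBool::from(true);
    println!("{stored:?}");
    println!("{}", bool::from(stored));
}
```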
|
||||
|
||||
## Lean types
|
||||
Another reason why database crates define their own types is
|
||||
to cut any unneeded data from the type.
|
||||
|
||||
Many of the types used in normal operation (e.g. [`cuprate_types::VerifiedBlockInformation`](https://doc.cuprate.org/cuprate_types/struct.VerifiedBlockInformation.html)) contain lots of extra pre-processed data for convenience.
|
||||
|
||||
This would be a waste to store in the database, so in this example, the much leaner
|
||||
"raw" [`BlockInfo`](https://doc.cuprate.org/cuprate_blockchain/types/struct.BlockInfo.html)
|
||||
type is stored.
|
|
@ -1 +0,0 @@
|
|||
# ⚪️ Database abstraction
|
50
books/architecture/src/storage/db/abstraction/backend.md
Normal file
|
@ -0,0 +1,50 @@
|
|||
# Backend
|
||||
First, we need an actual database implementation.
|
||||
|
||||
`cuprate-database`'s `trait`s allow abstracting over the actual database, such that any backend in particular could be used.
|
||||
|
||||
This page is an enumeration of all the backends Cuprate has, has tried, and may try in the future.
|
||||
|
||||
## `heed`
|
||||
The default database used is [`heed`](https://github.com/meilisearch/heed) (LMDB). The upstream versions from [`crates.io`](https://crates.io/crates/heed) are used. `LMDB` should not need to be installed as `heed` has a build script that pulls it in automatically.
|
||||
|
||||
`heed`'s filenames inside Cuprate's data folder are:
|
||||
|
||||
| Filename   | Purpose            |
|------------|--------------------|
| `data.mdb` | Main data file     |
| `lock.mdb` | Database lock file |
|
||||
|
||||
`heed`-specific notes:
|
||||
- [There is a maximum reader limit](https://github.com/monero-project/monero/blob/059028a30a8ae9752338a7897329fe8012a310d5/src/blockchain_db/lmdb/db_lmdb.cpp#L1372). Other potential processes (e.g. `xmrblocks`) that are also reading the `data.mdb` file need to be accounted for
|
||||
- [LMDB does not work on remote filesystem](https://github.com/LMDB/lmdb/blob/b8e54b4c31378932b69f1298972de54a565185b1/libraries/liblmdb/lmdb.h#L129)
|
||||
|
||||
## `redb`
|
||||
The 2nd database backend is the 100% Rust [`redb`](https://github.com/cberner/redb).
|
||||
|
||||
The upstream versions from [`crates.io`](https://crates.io/crates/redb) are used.
|
||||
|
||||
`redb`'s filenames inside Cuprate's data folder are:
|
||||
|
||||
| Filename    | Purpose        |
|-------------|----------------|
| `data.redb` | Main data file |
|
||||
|
||||
<!-- TODO: document DB on remote filesystem (does redb allow this?) -->
|
||||
|
||||
## `redb-memory`
|
||||
This backend is identical to `redb`, except that it uses [`redb::backends::InMemoryBackend`](https://docs.rs/redb/2.1.2/redb/backends/struct.InMemoryBackend.html), a database that resides completely in memory instead of in a file.
|
||||
|
||||
All other details about this should be the same as the normal `redb` backend.
|
||||
|
||||
## `sanakirja`
|
||||
[`sanakirja`](https://docs.rs/sanakirja) was a candidate as a backend, however there were problems with maximum value sizes.
|
||||
|
||||
The default maximum value size is [1012 bytes](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.Storable.html) which was too small for our requirements. Using [`sanakirja::Slice`](https://docs.rs/sanakirja/1.4.1/sanakirja/union.Slice.html) and [`sanakirja::UnsizedStorable`](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.UnsizedStorable.html) was attempted, but bugs were found when inserting a value between `512..=4096` bytes.
|
||||
|
||||
As such, it is not implemented.
|
||||
|
||||
## `MDBX`
|
||||
[`MDBX`](https://erthink.github.io/libmdbx) was a candidate as a backend, however MDBX deprecated the custom key/value comparison functions, which makes it a bit trickier to implement multimap tables. It is also quite similar to the main backend LMDB (of which it was originally a fork).
|
||||
|
||||
As such, it is not implemented (yet).
|
|
@ -0,0 +1,15 @@
|
|||
# `ConcreteEnv`
|
||||
After a backend is selected, the main database environment struct is "abstracted" by putting it in the non-generic, concrete [`struct ConcreteEnv`](https://doc.cuprate.org/cuprate_database/struct.ConcreteEnv.html).
|
||||
|
||||
This is the main object used when handling the database directly.
|
||||
|
||||
This struct contains all the data necessary to operate the database.
|
||||
The actual database backend `ConcreteEnv` will use internally [depends on which backend feature is used](https://github.com/Cuprate/cuprate/blob/0941f68efcd7dfe66124ad0c1934277f47da9090/storage/database/src/backend/mod.rs#L3-L13).
|
||||
|
||||
`ConcreteEnv` itself is not too important; what is important is that:
|
||||
1. It allows callers to not directly reference any particular backend environment
|
||||
1. It implements [`trait Env`](https://doc.cuprate.org/cuprate_database/trait.Env.html) which opens the door to all the other database traits
|
||||
|
||||
The equivalent "database environment" objects in the backends themselves are:
|
||||
- [`heed::Env`](https://docs.rs/heed/0.20.0/heed/struct.Env.html)
|
||||
- [`redb::Database`](https://docs.rs/redb/2.1.0/redb/struct.Database.html)
|
33
books/architecture/src/storage/db/abstraction/intro.md
Normal file
|
@ -0,0 +1,33 @@
|
|||
# Abstraction
|
||||
This next section details how `cuprate_database` abstracts multiple database backends into 1 API.
|
||||
|
||||
## Diagram
|
||||
A simple diagram describing the responsibilities/relationship of `cuprate_database`.
|
||||
|
||||
```text
┌───────────────────────────────────────────────────────────────────────┐
│                           cuprate_database                            │
│                                                                       │
│ ┌───────────────────────────┐     ┌─────────────────────────────────┐ │
│ │      Database traits      │     │            Backends             │ │
│ │ ┌─────┐┌──────┐┌────────┐ │     │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Env ││ TxRw ││ ...    │ ├─────┤ │ heed (LMDB) │ │    redb     │ │ │
│ │ └─────┘└──────┘└────────┘ │     │ └─────────────┘ └─────────────┘ │ │
│ └──────────┬─────────────┬──┘     └──┬──────────────────────────────┘ │
│            │             └─────┬─────┘                                │
│            │         ┌─────────┴──────────────┐                       │
│            │         │     Database types     │                       │
│            │         │ ┌─────────────┐┌─────┐ │                       │
│            │         │ │ ConcreteEnv ││ ... │ │                       │
│            │         │ └─────────────┘└─────┘ │                       │
│            │         └─────────┬──────────────┘                       │
│            │                   │                                      │
└────────────┼───────────────────┼──────────────────────────────────────┘
             │                   │
             └───────────────────┤
                                 │
                                 ▼
                     ┌───────────────────────┐
                     │ cuprate_database user │
                     └───────────────────────┘
```
|
49
books/architecture/src/storage/db/abstraction/trait.md
Normal file
|
@ -0,0 +1,49 @@
|
|||
# Trait
|
||||
`cuprate_database` provides a set of `trait`s that abstract over the various database backends.
|
||||
|
||||
This allows the function signatures and behavior to stay the same while making it easier to swap out databases.

All common behavior of the backends is encapsulated here and used instead of using the backend directly.
|
||||
|
||||
Examples:
|
||||
- [`trait Env`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/env.rs)
|
||||
- [`trait {TxRo, TxRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/transaction.rs)
|
||||
- [`trait {DatabaseRo, DatabaseRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/database.rs)
|
||||
|
||||
For example, instead of calling `heed` or `redb`'s `get()` function directly, `DatabaseRo::get()` is called.
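The idea can be sketched with a toy trait and two toy backends. Nothing here is `cuprate_database`'s real API: `ReadTable` is a hypothetical, heavily simplified trait with fixed `u64` keys and values, and the two "backends" are just standard-library maps. The caller, `height_of`, is written once against the trait and works with either.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified read-only table trait.
trait ReadTable {
    fn get(&self, key: &u64) -> Option<u64>;
}

/// Two toy "backends" with different internals.
struct BackendA(BTreeMap<u64, u64>);
struct BackendB(HashMap<u64, u64>);

impl ReadTable for BackendA {
    fn get(&self, key: &u64) -> Option<u64> {
        self.0.get(key).copied()
    }
}

impl ReadTable for BackendB {
    fn get(&self, key: &u64) -> Option<u64> {
        self.0.get(key).copied()
    }
}

/// Caller code depends only on the trait, never on a concrete backend.
fn height_of(table: &impl ReadTable, key: u64) -> Option<u64> {
    table.get(&key)
}

fn main() {
    let a = BackendA(BTreeMap::from([(1, 10)]));
    let b = BackendB(HashMap::from([(1, 10)]));
    println!("{:?} {:?}", height_of(&a, 1), height_of(&b, 1));
}
```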
|
||||
|
||||
## Usage
|
||||
With a `ConcreteEnv` and a particular backend selected,
|
||||
we can now start using it alongside these traits to start
|
||||
doing database operations in a generic manner.
|
||||
|
||||
An example:
|
||||
|
||||
```rust
use cuprate_database::{
    ConcreteEnv,
    config::ConfigBuilder,
    Env, EnvInner,
    DatabaseRo, DatabaseRw, TxRo, TxRw,
};

// Initialize the database environment.
// (`config` is assumed to have been built beforehand, e.g. with `ConfigBuilder`.)
let env = ConcreteEnv::open(config)?;

// Open up a transaction + tables for writing.
// (`Table` stands for a table type defined by the caller.)
let env_inner = env.env_inner();
let tx_rw = env_inner.tx_rw()?;
env_inner.create_db::<Table>(&tx_rw)?;

// Write data to the table.
{
    let mut table = env_inner.open_db_rw::<Table>(&tx_rw)?;
    table.put(&0, &1)?;
}

// Commit the transaction.
TxRw::commit(tx_rw)?;
```
|
||||
|
||||
As seen above, there is no direct call to `heed` or `redb`.
|
||||
Their functionality is abstracted behind `ConcreteEnv` and the `trait`s.
|
23
books/architecture/src/storage/db/intro.md
Normal file
|
@ -0,0 +1,23 @@
|
|||
# Database abstraction
|
||||
[`cuprate_database`](https://doc.cuprate.org/cuprate_database) is Cuprate’s database abstraction.
|
||||
|
||||
This crate abstracts various database backends with `trait`s.
|
||||
|
||||
All backends have the following attributes:
|
||||
|
||||
- [Embedded](https://en.wikipedia.org/wiki/Embedded_database)
|
||||
- [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control)
|
||||
- [ACID](https://en.wikipedia.org/wiki/ACID)
|
||||
- Are `(key, value)` oriented and have the expected API (`get()`, `insert()`, `delete()`)
|
||||
- Are table oriented (`"table_name" -> (key, value)`)
|
||||
- Allow concurrent readers
|
||||
|
||||
The currently implemented backends are:
|
||||
- [`heed`](https://github.com/meilisearch/heed) (LMDB)
|
||||
- [`redb`](https://github.com/cberner/redb)
|
||||
|
||||
Said precisely, `cuprate_database` is the embedded database other Cuprate
|
||||
crates interact with instead of using any particular backend implementation.
|
||||
This allows the backend to be swapped and/or future backends to be implemented.
|
||||
|
||||
This section will go over `cuprate_database` details.
|
6
books/architecture/src/storage/db/issues/endian.md
Normal file
|
@ -0,0 +1,6 @@
|
|||
# Endianness
|
||||
`cuprate_database`'s (de)serialization and storage of bytes are native-endian, as in, byte storage order will depend on the machine it is running on.
|
||||
|
||||
As Cuprate's build-targets are all little-endian ([big-endian by default machines barely exist](https://en.wikipedia.org/wiki/Endianness#Hardware)), this doesn't matter much and the byte ordering can be seen as a constant.
|
||||
|
||||
Practically, this means `cuprated`'s database files can be transferred across computers, as can `monerod`'s.
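
To make "native-endian" concrete, here is a small sketch using only the Rust standard library. The function names are hypothetical and not part of `cuprate_database`; the point is only that native byte order and little-endian byte order coincide on little-endian targets.

```rust
/// Serializes an integer in the byte order of the machine running the code.
fn store_native(x: u64) -> [u8; 8] {
    x.to_ne_bytes()
}

/// Deserializes an integer that was stored in native byte order.
fn load_native(bytes: [u8; 8]) -> u64 {
    u64::from_ne_bytes(bytes)
}

/// Returns `true` when the native layout of `x` equals its little-endian layout.
fn native_matches_little_endian(x: u64) -> bool {
    x.to_ne_bytes() == x.to_le_bytes()
}
```

On a little-endian build target, `native_matches_little_endian(1)` holds, which is why the byte ordering can be treated as a constant in practice.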
|
17
books/architecture/src/storage/db/issues/hot-swap.md
Normal file
|
@ -0,0 +1,17 @@
|
|||
# Hot-swappable backends
|
||||
> See also: <https://github.com/Cuprate/cuprate/issues/209>.
|
||||
|
||||
Using a different backend is really as simple as re-building `cuprate_database` with a different feature flag:
|
||||
```bash
|
||||
# Use LMDB.
|
||||
cargo build --package cuprate-database --features heed
|
||||
|
||||
# Use redb.
|
||||
cargo build --package cuprate-database --features redb
|
||||
```
|
||||
|
||||
This is "good enough" for now; ideally, however, backends could be hot-swapped at _runtime_.
|
||||
|
||||
As it is now, `cuprate_database` cannot compile both backends and swap based on user input at runtime; it must be compiled with a certain backend, which will produce a binary with only that backend.
|
||||
|
||||
This also means things like [CI testing multiple backends is awkward](https://github.com/Cuprate/cuprate/blob/main/.github/workflows/ci.yml#L132-L136), as we must re-compile with different feature flags instead.
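
Compile-time backend selection of this kind is typically done with `cfg` attributes. The following is a rough sketch under assumptions: the `backend` module and `backend_name()` are hypothetical, and treating `heed` as the fallback simply mirrors its role as the default backend; this is not the crate's actual source.

```rust
// Assumption: a Cargo feature named `redb` selects the `redb` backend.
#[cfg(feature = "redb")]
mod backend {
    pub const NAME: &str = "redb";
}

// Assumption: without that feature, the default backend (`heed`) is compiled in.
#[cfg(not(feature = "redb"))]
mod backend {
    pub const NAME: &str = "heed";
}

/// Exactly one `backend` module exists in any given binary,
/// so the choice is fixed at compile time.
pub fn backend_name() -> &'static str {
    backend::NAME
}
```

Because only one of the two modules is ever compiled, nothing remains in the binary to switch to at runtime, which is the limitation described above.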
|
7
books/architecture/src/storage/db/issues/intro.md
Normal file
|
@ -0,0 +1,7 @@
|
|||
# Known issues and tradeoffs
|
||||
`cuprate_database` makes many tradeoffs, whether due to:
|
||||
- Prioritizing certain values over others
|
||||
- Not having a better solution
|
||||
- Being "good enough"
|
||||
|
||||
This section is a list of the larger ones, along with issues that don't have answers yet.
|
22
books/architecture/src/storage/db/issues/multimap.md
Normal file
|
@ -0,0 +1,22 @@
|
|||
# Multimap
|
||||
`cuprate_database` does not currently have an abstraction for [multimap tables](https://en.wikipedia.org/wiki/Multimap).
|
||||
|
||||
All tables are single maps of keys to values.
|
||||
|
||||
This matters because some of `cuprate_blockchain`'s tables differ from `monerod`'s tables: the primary key is stored _for all_ entries, whereas `monerod` only needs to store it once:
|
||||
|
||||
```rust
|
||||
// `monerod` only stores `amount: 1` once,
|
||||
// `cuprated` stores it each time it appears.
|
||||
struct PreRctOutputId { amount: 1, amount_index: 0 }
|
||||
struct PreRctOutputId { amount: 1, amount_index: 1 }
|
||||
```
|
||||
|
||||
This means `cuprated`'s database will be slightly larger than `monerod`'s.
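
The size difference can be illustrated with standard library maps. This is only a sketch: it assumes `amount` and `amount_index` are both `u64`, counts key bytes only, and uses hypothetical function names rather than anything from `cuprate_blockchain`.

```rust
use std::collections::BTreeMap;

/// Single-map layout: every entry's key is the full `(amount, amount_index)` pair.
fn single_map_key_bytes(entries: &[(u64, u64)]) -> usize {
    let table: BTreeMap<(u64, u64), ()> = entries.iter().map(|&key| (key, ())).collect();
    table.len() * 2 * std::mem::size_of::<u64>()
}

/// Multimap-like layout: `amount` is stored once, with its indices grouped under it.
fn multimap_key_bytes(entries: &[(u64, u64)]) -> usize {
    let mut table: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    for &(amount, amount_index) in entries {
        table.entry(amount).or_default().push(amount_index);
    }
    table
        .values()
        .map(|indices| (1 + indices.len()) * std::mem::size_of::<u64>())
        .sum()
}
```

For the two entries in the example above, the single-map layout stores 32 key bytes while the grouped layout stores 24; the gap grows with the number of outputs sharing an `amount`.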
|
||||
|
||||
The current method `cuprate_blockchain` uses will be "good enough" as the multimap
|
||||
keys needed for now are fixed, e.g. pre-RCT outputs are no longer being produced.
|
||||
|
||||
This may need to change in the future when multimap is all but required, e.g. for FCMP++.
|
||||
|
||||
Until then, multimap tables are not implemented as they are tricky to implement across all backends.
|
15
books/architecture/src/storage/db/issues/traits.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
# Traits abstracting backends
|
||||
Although all database backends used are very similar, they have some crucial differences in small implementation details that must be worked around when conforming them to `cuprate_database`'s traits.
|
||||
|
||||
Put simply: using `cuprate_database`'s traits is less efficient and more awkward than using the backend directly.
|
||||
|
||||
For example:
|
||||
- [Data types must be wrapped in compatibility layers when they otherwise wouldn't be](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/backend/heed/env.rs#L101-L116)
|
||||
- [There are types that only apply to a specific backend, but are visible to all](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/error.rs#L86-L89)
|
||||
- [There are extra layers of abstraction to smoothen the differences between all backends](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/env.rs#L62-L68)
|
||||
- [Existing functionality of backends must be taken away, as it isn't supported in the others](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/database.rs#L27-L34)
|
||||
|
||||
This is a _tradeoff_ that `cuprate_database` takes, as:
|
||||
- The backend itself is usually not the source of bottlenecks in the greater system, as such, small inefficiencies are OK
|
||||
- None of the lost functionality is crucial for operation
|
||||
- The ability to use, test, and swap between multiple database backends is [worth it](https://github.com/Cuprate/cuprate/pull/35#issuecomment-1952804393)
|
24
books/architecture/src/storage/db/issues/unaligned.md
Normal file
|
@ -0,0 +1,24 @@
|
|||
# Copying unaligned bytes
|
||||
As mentioned in [`(De)serialization`](../serde.md), bytes are _copied_ when they are turned into a type `T` due to unaligned bytes being returned from database backends.
|
||||
|
||||
Using a regular reference cast results in an improperly aligned type `T`; [such a type even existing causes undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html). In our case, `bytemuck` saves us by panicking before this occurs.
|
||||
|
||||
Thus, when using `cuprate_database`'s database traits, an _owned_ `T` is returned.
|
||||
|
||||
This is doubly unfortunate for `&[u8]` as this does not even need deserialization.
|
||||
|
||||
For example, `StorableVec` could have been this:
|
||||
```rust
|
||||
enum StorableBytes<'a, T: Storable> {
|
||||
Owned(T),
|
||||
Ref(&'a T),
|
||||
}
|
||||
```
|
||||
but this would require supporting types that must be copied regardless alongside the occasional `&[u8]` that can be returned without casting. This was hard to do in a generic way, thus all `[u8]`s are copied and returned as owned `StorableVec`s.
|
||||
|
||||
This is a _tradeoff_ `cuprate_database` takes as:
|
||||
- `bytemuck::pod_read_unaligned` is cheap enough
|
||||
- The main API, `service`, needs to return owned values anyway
|
||||
- Having no references removes a lot of lifetime complexity
|
||||
|
||||
The alternative is somehow fixing the alignment issues in the backends mentioned previously.
|
8
books/architecture/src/storage/db/resizing.md
Normal file
|
@ -0,0 +1,8 @@
|
|||
# Resizing
|
||||
`cuprate_database` itself does not handle memory map resizes automatically
|
||||
(for database backends that need resizing, i.e. heed/LMDB).
|
||||
|
||||
When a user is using `cuprate_database` directly, it is up to them how to resize. The database will return [`RuntimeError::ResizeNeeded`](https://doc.cuprate.org/cuprate_database/enum.RuntimeError.html#variant.ResizeNeeded) when it needs resizing.
|
||||
|
||||
However, `cuprate_database` exposes some [resizing algorithms](https://doc.cuprate.org/cuprate_database/resize/index.html)
|
||||
that define how the database's memory map grows.
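
A caller-driven resize loop might look roughly like the sketch below. Everything here is hypothetical (`ToyDb`, `ToyError`, `grow_fixed`, the 4096-byte page size); it only illustrates the "write, get `ResizeNeeded`, grow, retry" flow and one possible growth rule, not the crate's actual resizing algorithms.

```rust
/// Hypothetical error type mirroring the idea of `RuntimeError::ResizeNeeded`.
#[derive(Debug, PartialEq)]
enum ToyError {
    ResizeNeeded,
}

/// Hypothetical database with a fixed-size memory map.
struct ToyDb {
    map_size: usize,
    used: usize,
}

impl ToyDb {
    /// Fails with `ResizeNeeded` when the write does not fit in the map.
    fn write(&mut self, len: usize) -> Result<(), ToyError> {
        if self.used + len > self.map_size {
            return Err(ToyError::ResizeNeeded);
        }
        self.used += len;
        Ok(())
    }
}

/// Grows by `add_bytes`, rounded up to a multiple of `page_size` (must be non-zero).
fn grow_fixed(current: usize, add_bytes: usize, page_size: usize) -> usize {
    let wanted = current + add_bytes;
    ((wanted + page_size - 1) / page_size) * page_size
}

/// The caller, not the database, decides how to react to `ResizeNeeded`.
fn write_with_resize(db: &mut ToyDb, len: usize) -> Result<(), ToyError> {
    loop {
        match db.write(len) {
            Err(ToyError::ResizeNeeded) => db.map_size = grow_fixed(db.map_size, len, 4096),
            other => return other,
        }
    }
}
```

Growing by at least the size of the pending write guarantees the retry succeeds, so the loop runs at most twice per write in this toy model.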
|
44
books/architecture/src/storage/db/serde.md
Normal file
|
@ -0,0 +1,44 @@
|
|||
# (De)serialization
|
||||
All types stored inside the database are either bytes already or are perfectly bitcast-able.
|
||||
|
||||
As such, they do not incur heavy (de)serialization costs when storing/fetching them from the database. The main (de)serialization used is [`bytemuck`](https://docs.rs/bytemuck)'s traits and casting functions.
|
||||
|
||||
## Size and layout
|
||||
The size & layout of types is stable across compiler versions, as they are set and determined with [`#[repr(C)]`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprc) and `bytemuck`'s derive macros such as [`bytemuck::Pod`](https://docs.rs/bytemuck/latest/bytemuck/derive.Pod.html).
|
||||
|
||||
Note that the data stored in the tables is still type-safe; we still refer to the keys and values within our tables by their types.
|
||||
|
||||
## How
|
||||
The main (de)serialization `trait` for database storage is [`Storable`](https://doc.cuprate.org/cuprate_database/trait.Storable.html).
|
||||
|
||||
- Before storage, the type is [simply cast into bytes](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L125)
|
||||
- When fetching, the bytes are [simply cast into the type](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L130)
|
||||
|
||||
When a type is cast into bytes, [the reference is cast](https://docs.rs/bytemuck/latest/bytemuck/fn.bytes_of.html), i.e. this is zero-cost serialization.
|
||||
|
||||
However, it is worth noting that when bytes are cast into the type, [they are copied](https://docs.rs/bytemuck/latest/bytemuck/fn.pod_read_unaligned.html). This is due to byte alignment guarantee issues with both backends, see:
|
||||
- <https://github.com/AltSysrq/lmdb-zero/issues/8>
|
||||
- <https://github.com/cberner/redb/issues/360>
|
||||
|
||||
Without this, `bytemuck` will panic with [`TargetAlignmentGreaterAndInputNotAligned`](https://docs.rs/bytemuck/latest/bytemuck/enum.PodCastError.html#variant.TargetAlignmentGreaterAndInputNotAligned) when casting.
|
||||
|
||||
Copying the bytes fixes this problem, although it is more costly than necessary. However, in the main use-case for `cuprate_database` (`tower::Service` API) the bytes would need to be owned regardless as the `Request/Response` API uses owned data types (`T`, `Vec<T>`, `HashMap<K, V>`, etc).
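
The alignment problem and the copy-based fix can be shown with the standard library alone. This sketch deliberately avoids `bytemuck` and uses hypothetical helper names; it is an illustration of the technique (copy the bytes, then build an owned value), not the code `cuprate_database` runs.

```rust
/// Reads a `u64` from `bytes` at `offset` by copying, so the source
/// does not need to be aligned for `u64`.
fn read_u64_copied(bytes: &[u8], offset: usize) -> Option<u64> {
    let slice = bytes.get(offset..offset.checked_add(8)?)?;
    let mut chunk = [0u8; 8];
    chunk.copy_from_slice(slice);
    Some(u64::from_ne_bytes(chunk))
}

/// Whether a `&u64` could legally point at this address.
fn is_aligned_for_u64(ptr: *const u8) -> bool {
    (ptr as usize) % std::mem::align_of::<u64>() == 0
}
```

A backend may hand back a byte slice starting at any address, so `is_aligned_for_u64` can be false for it; `read_u64_copied` works either way, at the cost of returning an owned value instead of a reference.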
|
||||
|
||||
Practically speaking, this means lower-level database functions that normally look like this:
|
||||
```rust
|
||||
fn get(key: &Key) -> &Value;
|
||||
```
|
||||
end up looking like this in `cuprate_database`:
|
||||
```rust
|
||||
fn get(key: &Key) -> Value;
|
||||
```
|
||||
|
||||
Since each backend has its own (de)serialization methods, our types are wrapped in compatibility types that map our `Storable` functions into whatever is required for the backend, e.g.:
|
||||
- [`StorableHeed<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/heed/storable.rs#L11-L45)
|
||||
- [`StorableRedb<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/redb/storable.rs#L11-L30)
|
||||
|
||||
Compatibility structs also exist for any `Storable` containers:
|
||||
- [`StorableVec<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L135-L191)
|
||||
- [`StorableBytes`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L208-L241)
|
||||
|
||||
Again, it's unfortunate that these must be owned, although in the `tower::Service` use-case, they would have to be owned anyway.
|
17
books/architecture/src/storage/db/syncing.md
Normal file
|
@ -0,0 +1,17 @@
|
|||
# Syncing
|
||||
`cuprate_database`'s database has 5 disk syncing modes.
|
||||
|
||||
1. `FastThenSafe`
|
||||
1. `Safe`
|
||||
1. `Async`
|
||||
1. `Threshold`
|
||||
1. `Fast`
|
||||
|
||||
The default mode is `Safe`.
|
||||
|
||||
This means that upon each transaction commit, all the data that was written will be fully synced to disk.
|
||||
This is the slowest, but safest mode of operation.
|
||||
|
||||
Note that upon any database `Drop`, the current implementation will sync to disk regardless of any configuration.
|
||||
|
||||
For more information on the other modes, read the documentation [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/config/sync_mode.rs#L63-L144).
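
As a rough illustration of "five modes, `Safe` by default", consider the sketch below. `SyncModeSketch` is a hypothetical enum that only mirrors the names listed above; the real type may carry data in some variants, and the behavior of the non-default modes is intentionally left unspecified here.

```rust
/// Hypothetical mirror of the five sync modes; not the crate's real type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum SyncModeSketch {
    FastThenSafe,
    /// The default: fully sync to disk on every transaction commit.
    #[default]
    Safe,
    Async,
    Threshold,
    Fast,
}

/// Returns `Some(true)` for `Safe`, the only mode whose commit behavior
/// is described in this section; `None` means "see the linked documentation".
fn syncs_every_commit(mode: SyncModeSketch) -> Option<bool> {
    match mode {
        SyncModeSketch::Safe => Some(true),
        _ => None,
    }
}
```

A configuration that never mentions a sync mode would end up with `SyncModeSketch::default()`, i.e. the slowest but safest behavior.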
|
|
@ -1 +1,34 @@
|
|||
# ⚪️ Storage
|
||||
# Storage
|
||||
This section covers all things related to the on-disk storage of data within Cuprate.
|
||||
|
||||
## Overview
|
||||
The quick overview is that Cuprate has a [database abstraction crate](./database-abstraction.md)
|
||||
that handles "low-level" database details such as key and value (de)serialization, tables, transactions, etc.
|
||||
|
||||
This database abstraction crate is then used by all crates that need on-disk storage, i.e. the
|
||||
- [Blockchain database](./blockchain/intro.md)
|
||||
- [Transaction pool database](./txpool/intro.md)
|
||||
|
||||
## Service
|
||||
The interface provided by all crates building on top of the
|
||||
database abstraction is a [`tower::Service`](https://docs.rs/tower), i.e.
|
||||
database requests/responses are sent/received asynchronously.
|
||||
|
||||
As the interface details are similar across crates (threadpool, read operations, write operations),
|
||||
the interface itself is abstracted in the [`cuprate_database_service`](./common/service/intro.md) crate,
|
||||
which is then used by the crates.
|
||||
|
||||
## Diagram
|
||||
This is roughly how database crates are set up.
|
||||
|
||||
```text
|
||||
┌─────────────────┐
|
||||
┌──────────────────────────────────┐ │ │
|
||||
│ Some crate that needs a database │ ┌────────────────┐ │ │
|
||||
│ │ │ Public │ │ │
|
||||
│ ┌──────────────────────────────┐ │─►│ tower::Service │◄─►│ Rest of Cuprate │
|
||||
│ │ Database abstraction │ │ │ API │ │ │
|
||||
│ └──────────────────────────────┘ │ └────────────────┘ │ │
|
||||
└──────────────────────────────────┘ │ │
|
||||
└─────────────────┘
|
||||
```
|
||||
|
|
|
@ -1,5 +1,10 @@
|
|||
# storage
|
||||
# Storage
|
||||
This subdirectory contains all things related to the on-disk storage of data within Cuprate.
|
||||
|
||||
TODO: This subdirectory used to be `database/` and is in the middle of being shifted around.
|
||||
See <https://architecture.cuprate.org/storage/intro.html> for design documentation
|
||||
and the following links for user documentation:
|
||||
|
||||
The old `database/` design document is in `cuprate-blockchain/` which will eventually be ported to Cuprate's architecture book.
|
||||
- <https://doc.cuprate.org/cuprate_database>
|
||||
- <https://doc.cuprate.org/cuprate_database_service>
|
||||
- <https://doc.cuprate.org/cuprate_blockchain>
|
||||
- <https://doc.cuprate.org/cuprate_txpool>
|
|
@ -1,600 +0,0 @@
|
|||
# Database
|
||||
FIXME: This documentation must be updated and moved to the architecture book.
|
||||
|
||||
Cuprate's blockchain implementation.
|
||||
|
||||
- [1. Documentation](#1-documentation)
|
||||
- [2. File structure](#2-file-structure)
|
||||
- [2.1 `src/`](#21-src)
|
||||
- [2.2 `src/backend/`](#22-srcbackend)
|
||||
- [2.3 `src/config/`](#23-srcconfig)
|
||||
- [2.4 `src/ops/`](#24-srcops)
|
||||
- [2.5 `src/service/`](#25-srcservice)
|
||||
- [3. Backends](#3-backends)
|
||||
- [3.1 heed](#31-heed)
|
||||
- [3.2 redb](#32-redb)
|
||||
- [3.3 redb-memory](#33-redb-memory)
|
||||
- [3.4 sanakirja](#34-sanakirja)
|
||||
- [3.5 MDBX](#35-mdbx)
|
||||
- [4. Layers](#4-layers)
|
||||
- [4.1 Backend](#41-backend)
|
||||
- [4.2 Trait](#42-trait)
|
||||
- [4.3 ConcreteEnv](#43-concreteenv)
|
||||
- [4.4 ops](#44-ops)
|
||||
- [4.5 service](#45-service)
|
||||
- [5. The service](#5-the-service)
|
||||
- [5.1 Initialization](#51-initialization)
|
||||
- [5.2 Requests](#52-requests)
|
||||
- [5.3 Responses](#53-responses)
|
||||
- [5.4 Thread model](#54-thread-model)
|
||||
- [5.5 Shutdown](#55-shutdown)
|
||||
- [6. Syncing](#6-syncing)
|
||||
- [7. Resizing](#7-resizing)
|
||||
- [8. (De)serialization](#8-deserialization)
|
||||
- [9. Schema](#9-schema)
|
||||
- [9.1 Tables](#91-tables)
|
||||
- [9.2 Multimap tables](#92-multimap-tables)
|
||||
- [10. Known issues and tradeoffs](#10-known-issues-and-tradeoffs)
|
||||
- [10.1 Traits abstracting backends](#101-traits-abstracting-backends)
|
||||
- [10.2 Hot-swappable backends](#102-hot-swappable-backends)
|
||||
- [10.3 Copying unaligned bytes](#103-copying-unaligned-bytes)
|
||||
- [10.4 Endianness](#104-endianness)
|
||||
- [10.5 Extra table data](#105-extra-table-data)
|
||||
|
||||
---
|
||||
|
||||
## 1. Documentation
|
||||
Documentation for `database/` is split into 3 locations:
|
||||
|
||||
| Documentation location | Purpose |
|
||||
|---------------------------|---------|
|
||||
| `database/README.md` | High level design of `cuprate-database`
|
||||
| `cuprate-database` | Practical usage documentation/warnings/notes/etc
|
||||
| Source file `// comments` | Implementation-specific details (e.g, how many reader threads to spawn?)
|
||||
|
||||
This README serves as the implementation design document.
|
||||
|
||||
For actual practical usage, `cuprate-database`'s types and general usage are documented via standard Rust tooling.
|
||||
|
||||
Run:
|
||||
```bash
|
||||
cargo doc --package cuprate-database --open
|
||||
```
|
||||
at the root of the repo to open/read the documentation.
|
||||
|
||||
If this documentation is too abstract, refer to any of the source files; they are heavily commented. There are many `// Regular comments` that explain more implementation-specific details that aren't present here or in the docs. Use the file reference below to find what you're looking for.
|
||||
|
||||
The code within `src/` is also littered with some `grep`-able comments containing some keywords:
|
||||
|
||||
| Word | Meaning |
|
||||
|-------------|---------|
|
||||
| `INVARIANT` | This code makes an _assumption_ that must be upheld for correctness
|
||||
| `SAFETY` | This `unsafe` code is okay, for `x,y,z` reasons
|
||||
| `FIXME` | This code works but isn't ideal
|
||||
| `HACK` | This code is a brittle workaround
|
||||
| `PERF` | This code is weird for performance reasons
|
||||
| `TODO` | This must be implemented; There should be 0 of these in production code
|
||||
| `SOMEDAY` | This should be implemented... someday
|
||||
|
||||
## 2. File structure
|
||||
A quick reference of the structure of the folders & files in `cuprate-database`.
|
||||
|
||||
Note that `lib.rs/mod.rs` files are purely for re-exporting/visibility/lints, and contain no code. Each sub-directory has a corresponding `mod.rs`.
|
||||
|
||||
### 2.1 `src/`
|
||||
The top-level `src/` files.
|
||||
|
||||
| File | Purpose |
|
||||
|------------------------|---------|
|
||||
| `constants.rs` | General constants used throughout `cuprate-database`
|
||||
| `database.rs` | Abstracted database; `trait DatabaseR{o,w}`
|
||||
| `env.rs` | Abstracted database environment; `trait Env`
|
||||
| `error.rs` | Database error types
|
||||
| `free.rs` | General free functions (related to the database)
|
||||
| `key.rs` | Abstracted database keys; `trait Key`
|
||||
| `resize.rs` | Database resizing algorithms
|
||||
| `storable.rs` | Data (de)serialization; `trait Storable`
|
||||
| `table.rs` | Database table abstraction; `trait Table`
|
||||
| `tables.rs` | All the table definitions used by `cuprate-database`
|
||||
| `tests.rs` | Utilities for `cuprate_database` testing
|
||||
| `transaction.rs` | Database transaction abstraction; `trait TxR{o,w}`
|
||||
| `types.rs` | Database-specific types
|
||||
| `unsafe_unsendable.rs` | Marker type to impl `Send` for objects not `Send`
|
||||
|
||||
### 2.2 `src/backend/`
|
||||
This folder contains the implementation for actual databases used as the backend for `cuprate-database`.
|
||||
|
||||
Each backend has its own folder.
|
||||
|
||||
| Folder/File | Purpose |
|
||||
|-------------|---------|
|
||||
| `heed/` | Backend using [`heed`](https://github.com/meilisearch/heed) (LMDB)
|
||||
| `redb/` | Backend using [`redb`](https://github.com/cberner/redb)
|
||||
| `tests.rs` | Backend-agnostic tests
|
||||
|
||||
All backends follow the same file structure:
|
||||
|
||||
| File | Purpose |
|
||||
|------------------|---------|
|
||||
| `database.rs` | Implementation of `trait DatabaseR{o,w}`
|
||||
| `env.rs` | Implementation of `trait Env`
|
||||
| `error.rs` | Implementation of backend's errors to `cuprate_database`'s error types
|
||||
| `storable.rs` | Compatibility layer between `cuprate_database::Storable` and backend-specific (de)serialization
|
||||
| `transaction.rs` | Implementation of `trait TxR{o,w}`
|
||||
| `types.rs` | Type aliases for long backend-specific types
|
||||
|
||||
### 2.3 `src/config/`
|
||||
This folder contains the `cuprate_database::config` module; configuration options for the database.
|
||||
|
||||
| File | Purpose |
|
||||
|---------------------|---------|
|
||||
| `config.rs` | Main database `Config` struct
|
||||
| `reader_threads.rs` | Reader thread configuration for `service` thread-pool
|
||||
| `sync_mode.rs` | Disk sync configuration for backends
|
||||
|
||||
### 2.4 `src/ops/`
|
||||
This folder contains the `cuprate_database::ops` module.
|
||||
|
||||
These are higher-level functions abstracted over the database, that are Monero-related.
|
||||
|
||||
| File | Purpose |
|
||||
|-----------------|---------|
|
||||
| `block.rs` | Block related (main functions)
|
||||
| `blockchain.rs` | Blockchain related (height, cumulative values, etc)
|
||||
| `key_image.rs` | Key image related
|
||||
| `macros.rs` | Macros specific to `ops/`
|
||||
| `output.rs` | Output related
|
||||
| `property.rs` | Database properties (pruned, version, etc)
|
||||
| `tx.rs` | Transaction related
|
||||
|
||||
### 2.5 `src/service/`
|
||||
This folder contains the `cuprate_database::service` module.
|
||||
|
||||
The `async`hronous request/response API other Cuprate crates use instead of managing the database directly themselves.
|
||||
|
||||
| File | Purpose |
|
||||
|----------------|---------|
|
||||
| `free.rs` | General free functions used (related to `cuprate_database::service`)
|
||||
| `read.rs` | Read thread-pool definitions and logic
|
||||
| `tests.rs` | Thread-pool tests and test helper functions
|
||||
| `types.rs` | `cuprate_database::service`-related type aliases
|
||||
| `write.rs` | Writer thread definitions and logic
|
||||
|
||||
## 3. Backends
|
||||
`cuprate-database`'s `trait`s allow abstracting over the actual database, such that any backend in particular could be used.
|
||||
|
||||
Each database's implementation of those `trait`s is located in its respective folder in `src/backend/${DATABASE_NAME}/`.
|
||||
|
||||
### 3.1 heed
|
||||
The default database used is [`heed`](https://github.com/meilisearch/heed) (LMDB). The upstream versions from [`crates.io`](https://crates.io/crates/heed) are used. `LMDB` should not need to be installed as `heed` has a build script that pulls it in automatically.
|
||||
|
||||
`heed`'s filenames inside Cuprate's database folder (`~/.local/share/cuprate/database/`) are:
|
||||
|
||||
| Filename | Purpose |
|
||||
|------------|---------|
|
||||
| `data.mdb` | Main data file
|
||||
| `lock.mdb` | Database lock file
|
||||
|
||||
`heed`-specific notes:
|
||||
- [There is a maximum reader limit](https://github.com/monero-project/monero/blob/059028a30a8ae9752338a7897329fe8012a310d5/src/blockchain_db/lmdb/db_lmdb.cpp#L1372). Other potential processes (e.g. `xmrblocks`) that are also reading the `data.mdb` file need to be accounted for
|
||||
- [LMDB does not work on remote filesystem](https://github.com/LMDB/lmdb/blob/b8e54b4c31378932b69f1298972de54a565185b1/libraries/liblmdb/lmdb.h#L129)
|
||||
|
||||
### 3.2 redb
|
||||
The 2nd database backend is the 100% Rust [`redb`](https://github.com/cberner/redb).
|
||||
|
||||
The upstream versions from [`crates.io`](https://crates.io/crates/redb) are used.
|
||||
|
||||
`redb`'s filenames inside Cuprate's database folder (`~/.local/share/cuprate/database/`) are:
|
||||
|
||||
| Filename | Purpose |
|
||||
|-------------|---------|
|
||||
| `data.redb` | Main data file
|
||||
|
||||
<!-- TODO: document DB on remote filesystem (does redb allow this?) -->
|
||||
|
||||
### 3.3 redb-memory
|
||||
This backend is 100% the same as `redb`, except that it uses `redb::backend::InMemoryBackend`, a database that resides completely in memory instead of in a file.
|
||||
|
||||
All other details about this should be the same as the normal `redb` backend.
|
||||
|
||||
### 3.4 sanakirja
|
||||
[`sanakirja`](https://docs.rs/sanakirja) was a candidate as a backend, however there were problems with maximum value sizes.
|
||||
|
||||
The default maximum value size is [1012 bytes](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.Storable.html) which was too small for our requirements. Using [`sanakirja::Slice`](https://docs.rs/sanakirja/1.4.1/sanakirja/union.Slice.html) and [`sanakirja::UnsizedStorable`](https://docs.rs/sanakirja/1.4.1/sanakirja/trait.UnsizedStorable.html) was attempted, but there were bugs found when inserting a value between `512..=4096` bytes.
|
||||
|
||||
As such, it is not implemented.
|
||||
|
||||
### 3.5 MDBX
|
||||
[`MDBX`](https://erthink.github.io/libmdbx) was a candidate as a backend, however MDBX deprecated the custom key/value comparison functions, which makes it a bit trickier to implement [`9.2 Multimap tables`](#92-multimap-tables). It is also quite similar to the main backend LMDB (of which it was originally a fork).
|
||||
|
||||
As such, it is not implemented (yet).
|
||||
|
||||
## 4. Layers
|
||||
`cuprate_database` is logically abstracted into 5 layers, with each layer being built upon the last.
|
||||
|
||||
Starting from the lowest:
|
||||
1. Backend
|
||||
2. Trait
|
||||
3. ConcreteEnv
|
||||
4. `ops`
|
||||
5. `service`
|
||||
|
||||
<!-- TODO: insert image here after database/ split -->
|
||||
|
||||
### 4.1 Backend
|
||||
This is the actual database backend implementation (or a Rust shim over one).
|
||||
|
||||
Examples:
|
||||
- `heed` (LMDB)
|
||||
- `redb`
|
||||
|
||||
`cuprate_database` itself just uses a backend; it does not implement one.
|
||||
|
||||
All backends have the following attributes:
|
||||
- [Embedded](https://en.wikipedia.org/wiki/Embedded_database)
|
||||
- [Multiversion concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control)
|
||||
- [ACID](https://en.wikipedia.org/wiki/ACID)
|
||||
- Are `(key, value)` oriented and have the expected API (`get()`, `insert()`, `delete()`)
|
||||
- Are table oriented (`"table_name" -> (key, value)`)
|
||||
- Allow concurrent readers
|
||||
|
||||
### 4.2 Trait
|
||||
`cuprate_database` provides a set of `trait`s that abstract over the various database backends.
|
||||
|
||||
This allows the function signatures and behavior to stay the same but allows for swapping out databases in an easier fashion.
|
||||
|
||||
All common behavior of the backends is encapsulated here and used instead of the backends directly.
|
||||
|
||||
Examples:
|
||||
- [`trait Env`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/env.rs)
|
||||
- [`trait {TxRo, TxRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/transaction.rs)
|
||||
- [`trait {DatabaseRo, DatabaseRw}`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/database.rs)
|
||||
|
||||
For example, instead of calling `LMDB` or `redb`'s `get()` function directly, `DatabaseRo::get()` is called.
|
||||
|
||||
### 4.3 ConcreteEnv
|
||||
This is the non-generic, concrete `struct` provided by `cuprate_database` that contains all the data necessary to operate the database. The actual database backend `ConcreteEnv` will use internally depends on which backend feature is used.
|
||||
|
||||
`ConcreteEnv` implements `trait Env`, which opens the door to all the other traits.
|
||||
|
||||
The equivalent objects in the backends themselves are:
|
||||
- [`heed::Env`](https://docs.rs/heed/0.20.0/heed/struct.Env.html)
|
||||
- [`redb::Database`](https://docs.rs/redb/2.1.0/redb/struct.Database.html)
|
||||
|
||||
This is the main object used when handling the database directly, although that is not strictly necessary as a user if the [`4.5 service`](#45-service) layer is used.
|
||||
|
||||
### 4.4 ops
|
||||
These are Monero-specific functions that use the abstracted `trait` forms of the database.
|
||||
|
||||
Instead of dealing with the database directly:
|
||||
- `get()`
|
||||
- `delete()`
|
||||
|
||||
the `ops` layer provides more abstract functions that deal with commonly used Monero operations:
|
||||
- `add_block()`
|
||||
- `pop_block()`
|
||||
|
||||
### 4.5 service
|
||||
The final layer abstracts the database completely into a [Monero-specific `async` request/response API](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/types/src/service.rs#L18-L78) using [`tower::Service`](https://docs.rs/tower/latest/tower/trait.Service.html).
|
||||
|
||||
For more information on this layer, see the next section: [`5. The service`](#5-the-service).
|
||||
|
||||
## 5. The service
|
||||
The main API `cuprate_database` exposes for other crates to use is the `cuprate_database::service` module.
|
||||
|
||||
This module exposes an `async` request/response API with `tower::Service`, backed by a threadpool, that allows reading/writing Monero-related data from/to the database.
|
||||
|
||||
`cuprate_database::service` itself manages the database using a separate writer thread & reader thread-pool, and uses the previously mentioned [`4.4 ops`](#44-ops) functions when responding to requests.
|
||||
|
||||
### 5.1 Initialization
|
||||
The service is started simply by calling: [`cuprate_database::service::init()`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/service/free.rs#L23).
|
||||
|
||||
This function initializes the database, spawns threads, and returns a:
|
||||
- Read handle to the database (cloneable)
|
||||
- Write handle to the database (not cloneable)
|
||||
|
||||
These "handles" implement the `tower::Service` trait, which allows sending requests and receiving responses `async`hronously.
|
||||
|
||||
### 5.2 Requests
|
||||
Along with the 2 handles, there are 2 types of requests:
|
||||
- [`ReadRequest`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/types/src/service.rs#L23-L90)
|
||||
- [`WriteRequest`](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/types/src/service.rs#L93-L105)
|
||||
|
||||
`ReadRequest` is for retrieving various types of information from the database.
|
||||
|
||||
`WriteRequest` currently only has 1 variant: to write a block to the database.
|
||||
|
||||
### 5.3 Responses
|
||||
After sending one of the above requests using the read/write handle, the value returned is _not_ the response, but an `async`hronous channel that will eventually return the response:
|
||||
```rust,ignore
|
||||
// Send a request.
|
||||
// tower::Service::call()
|
||||
// V
|
||||
let response_channel: Channel = read_handle.call(ReadRequest::ChainHeight)?;
|
||||
|
||||
// Await the response.
|
||||
let response: Response = response_channel.await?;
|
||||
|
||||
// Assert the response is what we expected.
|
||||
assert!(matches!(response, Response::ChainHeight(_)));
|
||||
```
|
||||
|
||||
After `await`ing the returned channel, a `Response` will eventually be returned when the `service` threadpool has fetched the value from the database and sent it off.
|
||||
|
||||
Both read/write request variants match in name with `Response` variants, i.e.
|
||||
- `ReadRequest::ChainHeight` leads to `Response::ChainHeight`
|
||||
- `WriteRequest::WriteBlock` leads to `Response::WriteBlockOk`
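
The request/response flow described above can be sketched with the standard library alone. This is a minimal illustration, not the `tower`-based implementation: `ReadHandle`, `init`, and the single-variant enums below are hypothetical stand-ins, and a blocking `std::sync::mpsc` channel takes the place of the `async` channel.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the real request/response enums.
#[derive(Debug)]
enum ReadRequest {
    ChainHeight,
}

#[derive(Debug, PartialEq)]
enum Response {
    ChainHeight(u64),
}

/// A cloneable read handle: each request carries its own reply channel.
#[derive(Clone)]
struct ReadHandle {
    sender: mpsc::Sender<(ReadRequest, mpsc::Sender<Response>)>,
}

impl ReadHandle {
    /// Sends the request and returns the channel the response will arrive on.
    /// Returns `None` if the worker thread is no longer running.
    fn call(&self, request: ReadRequest) -> Option<mpsc::Receiver<Response>> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.sender.send((request, reply_tx)).ok()?;
        Some(reply_rx)
    }
}

/// Spawns one worker thread that answers requests with a fixed chain height
/// (a placeholder for an actual database lookup).
fn init(chain_height: u64) -> ReadHandle {
    let (sender, receiver) = mpsc::channel::<(ReadRequest, mpsc::Sender<Response>)>();
    thread::spawn(move || {
        // The loop ends once every `ReadHandle` clone has been dropped.
        for (request, reply) in receiver {
            let response = match request {
                ReadRequest::ChainHeight => Response::ChainHeight(chain_height),
            };
            // The caller may have dropped its receiver; ignore that case.
            let _ = reply.send(response);
        }
    });
    ReadHandle { sender }
}
```

Calling `init(42)`, then `handle.call(ReadRequest::ChainHeight)`, and finally `recv()` on the returned channel yields `Response::ChainHeight(42)` in this sketch, mirroring the "call, then wait on the channel" pattern.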
|
||||
|
||||
### 5.4 Thread model
|
||||
As mentioned in the [`4. Layers`](#4-layers) section, the base database abstractions themselves are not concerned with parallelism; they are mostly functions to be called from a single thread.
|
||||
|
||||
However, the `cuprate_database::service` API _does_ have a thread model backing it.
|
||||
|
||||
When [`cuprate_database::service`'s initialization function](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/free.rs#L33-L44) is called, threads will be spawned and maintained until the user drops (disconnects) the returned handles.
|
||||
|
||||
The current behavior for thread count is:
|
||||
- [1 writer thread](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/write.rs#L52-L66)
|
||||
- [As many reader threads as there are system threads](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/read.rs#L104-L126)
|
||||
|
||||
For example, on a system with 32 threads, `cuprate_database` will spawn:
|
||||
- 1 writer thread
|
||||
- 32 reader threads
|
||||
|
||||
whose sole responsibility is to listen for database requests, access the database (potentially in parallel), and return a response.
|
||||
|
||||
Note that the `1 system thread = 1 reader thread` model is only the default setting; the reader thread count can be configured by the user to be any number between `1 .. amount_of_system_threads`.
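
As a rough sketch of how such a setting could be resolved, the function below maps an optional user request onto the allowed range. The names `reader_thread_count` and `system_threads` are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption of this example, not documented behavior.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Resolves a user-requested reader thread count.
/// `None` means "use the default" (one reader per system thread).
/// Out-of-range requests are clamped; that policy is an assumption here.
fn reader_thread_count(requested: Option<usize>, system_threads: NonZeroUsize) -> usize {
    let max = system_threads.get();
    match requested {
        None => max,
        Some(n) => n.clamp(1, max),
    }
}

/// Queries the OS for the available parallelism,
/// falling back to 1 if the count is unavailable.
fn system_threads() -> NonZeroUsize {
    thread::available_parallelism().unwrap_or(NonZeroUsize::new(1).unwrap())
}
```

On the 32-thread system from the example above, `reader_thread_count(None, ...)` gives 32, while a request for 8 gives 8.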
|
||||
|
||||
The reader threads are managed by [`rayon`](https://docs.rs/rayon).
|
||||
|
||||
For an example of where multiple reader threads are used: given a request that asks if any key-image within a set already exists, `cuprate_database` will [split that work between the threads with `rayon`](https://github.com/Cuprate/cuprate/blob/9c27ba5791377d639cb5d30d0f692c228568c122/database/src/service/read.rs#L490-L503).
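
To illustrate the idea of splitting a key-image check across threads, here is a sketch that uses `std::thread::scope` in place of `rayon`, with an in-memory `HashSet` standing in for the database table. The function name `any_key_image_exists` and its signature are hypothetical.

```rust
use std::collections::HashSet;
use std::thread;

type KeyImage = [u8; 32];

/// Returns `true` if any of `candidates` is already in `spent`.
/// The candidates are split into chunks, one scoped thread per chunk.
fn any_key_image_exists(
    spent: &HashSet<KeyImage>,
    candidates: &[KeyImage],
    threads: usize,
) -> bool {
    if candidates.is_empty() {
        return false;
    }
    // Ceiling division; at least 1 because `candidates` is non-empty.
    let threads = threads.max(1);
    let chunk_size = (candidates.len() + threads - 1) / threads;

    thread::scope(|scope| {
        // Spawn every worker first so the chunks are checked in parallel.
        let workers: Vec<_> = candidates
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().any(|ki| spent.contains(ki))))
            .collect();
        // Any worker finding a match makes the whole request `true`.
        workers.into_iter().any(|worker| worker.join().unwrap())
    })
}
```

Scoped threads allow borrowing `spent` and `candidates` without `Arc`; a thread-pool such as `rayon`'s avoids the per-request spawn cost that this sketch pays.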
|
||||
|
||||
### 5.5 Shutdown
|
||||
Once the read/write handles are `Drop`ed, the backing thread(pool) will gracefully exit, automatically.
|
||||
|
||||
Note that the writer thread and the reader threadpool are not connected in any way: dropping the write handle makes the writer thread exit, but the read handle can still be held and read from, and vice versa for the write handle.
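
The drop-to-shutdown behavior can be sketched with a plain channel: once the sending side (the "handle") is dropped, the worker's receive loop ends and the thread exits. `spawn_writer` is a hypothetical helper for this illustration, and the request payload is just raw bytes.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Spawns a hypothetical writer thread. Returns the write handle and the
/// thread's join handle, which yields how many requests were processed.
fn spawn_writer() -> (Sender<Vec<u8>>, JoinHandle<usize>) {
    let (sender, receiver) = mpsc::channel::<Vec<u8>>();
    let thread = thread::spawn(move || {
        let mut processed = 0;
        // `recv()` returns `Err` once every sender is dropped,
        // which ends the loop and lets the thread exit cleanly.
        while let Ok(_block_bytes) = receiver.recv() {
            processed += 1;
        }
        processed
    });
    (sender, thread)
}
```

Sending two requests, dropping the sender, and then joining the thread returns `2` here: requests already queued are still processed before the thread exits.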
|
||||
|
||||
## 6. Syncing
|
||||
`cuprate_database`'s database has 5 disk syncing modes.
|
||||
|
||||
1. FastThenSafe
|
||||
1. Safe
|
||||
1. Async
|
||||
1. Threshold
|
||||
1. Fast
|
||||
|
||||
The default mode is `Safe`.
|
||||
|
||||
This means that upon each transaction commit, all the data that was written will be fully synced to disk. This is the slowest, but safest mode of operation.
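
As an illustration only, the five modes and their default could be modeled as below. The variant names come from the list above; any payloads the real variants may carry are omitted, and this `SyncMode` definition is a sketch rather than the crate's actual type.

```rust
/// Illustrative mirror of the five disk syncing modes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SyncMode {
    FastThenSafe,
    Safe,
    Async,
    Threshold,
    Fast,
}

impl Default for SyncMode {
    /// `Safe` is the default: fully sync to disk on every commit.
    fn default() -> Self {
        SyncMode::Safe
    }
}
```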
|
||||
|
||||
Note that upon any database `Drop`, whether via `service` or dropping the database directly, the current implementation will sync to disk regardless of any configuration.
|
||||
|
||||
For more information on the other modes, read the documentation [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/config/sync_mode.rs#L63-L144).
|
||||
|
||||
## 7. Resizing
|
||||
Database backends that require manual resizing will, by default, use an algorithm similar to `monerod`'s.
|
||||
|
||||
Note that this only relates to the `service` module, where the database is handled by `cuprate_database` itself, not the user. A user directly using `cuprate_database` decides how to resize.
|
||||
|
||||
Within `service`, the resizing logic defined [here](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/service/write.rs#L139-L201) does the following:
|
||||
|
||||
- If there's not enough space to fit a write request's data, start a resize
|
||||
- Each resize adds around [`1_073_745_920`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) bytes to the current map size
|
||||
- A resize will be attempted `3` times before failing
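
The growth step can be sketched as "add 1 GiB, then round up to a multiple of the page size". The fixed 4096-byte `PAGE_SIZE` and the name `next_map_size` are assumptions of this example; under them, growing a 1-byte map gives exactly `1_073_745_920` bytes (1 GiB plus one page), which is one way to read the "around" figure above.

```rust
/// Assumed page size for this sketch; real code would query the OS.
const PAGE_SIZE: usize = 4096;
/// 1 GiB, the amount added on each resize in this sketch.
const ADD_SIZE: usize = 1 << 30;

/// Grows the map by 1 GiB, then rounds up to a multiple of the page size.
fn next_map_size(current_size_bytes: usize) -> usize {
    let new_size = current_size_bytes + ADD_SIZE;
    let remainder = new_size % PAGE_SIZE;
    if remainder == 0 {
        new_size
    } else {
        new_size + (PAGE_SIZE - remainder)
    }
}
```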
|
||||
|
||||
There are other [resizing algorithms](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L38-L47) that define how the database's memory map grows, although currently the behavior of [`monerod`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/resize.rs#L104-L160) is closely followed.
|
||||
|
||||
## 8. (De)serialization
|
||||
All types stored inside the database are either bytes already, or are perfectly bitcast-able.
|
||||
|
||||
As such, they do not incur heavy (de)serialization costs when storing/fetching them from the database. The main (de)serialization used is [`bytemuck`](https://docs.rs/bytemuck)'s traits and casting functions.
|
||||
|
||||
The size & layout of types is stable across compiler versions, as they are set and determined with [`#[repr(C)]`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprc) and `bytemuck`'s derive macros such as [`bytemuck::Pod`](https://docs.rs/bytemuck/latest/bytemuck/derive.Pod.html).
|
||||
|
||||
Note that the data stored in the tables is still type-safe; we still refer to the keys and values within our tables by their type.
|
||||
|
||||
The main deserialization `trait` for database storage is: [`cuprate_database::Storable`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L16-L115).
|
||||
|
||||
- Before storage, the type is [simply cast into bytes](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L125)
|
||||
- When fetching, the bytes are [simply cast into the type](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L130)
|
||||
|
||||
When a type is cast into bytes, [the reference is cast](https://docs.rs/bytemuck/latest/bytemuck/fn.bytes_of.html), i.e. this is zero-cost serialization.
|
||||
|
||||
However, it is worth noting that when bytes are cast into the type, [they are copied](https://docs.rs/bytemuck/latest/bytemuck/fn.pod_read_unaligned.html). This is due to byte alignment guarantee issues with both backends, see:
|
||||
- https://github.com/AltSysrq/lmdb-zero/issues/8
|
||||
- https://github.com/cberner/redb/issues/360
|
||||
|
||||
Without this, `bytemuck` will panic with [`TargetAlignmentGreaterAndInputNotAligned`](https://docs.rs/bytemuck/latest/bytemuck/enum.PodCastError.html#variant.TargetAlignmentGreaterAndInputNotAligned) when casting.
|
||||
|
||||
Copying the bytes fixes this problem, although it is more costly than necessary. However, in the main use-case for `cuprate_database` (the `service` module) the bytes would need to be owned regardless as the `Request/Response` API uses owned data types (`T`, `Vec<T>`, `HashMap<K, V>`, etc).
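
The difference between casting a reference and copying can be shown with the standard library alone. This sketch uses `u64::from_ne_bytes` on a copied array instead of `bytemuck`; the helper names are hypothetical.

```rust
/// Reads a native-endian `u64` from the start of `bytes` by copying,
/// which works regardless of the slice's alignment.
fn read_u64_unaligned(bytes: &[u8]) -> Option<u64> {
    let mut array = [0u8; 8];
    // `get(..8)` returns `None` when fewer than 8 bytes are available.
    array.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_ne_bytes(array))
}

/// Returns whether the slice starts at an address aligned for `u64`,
/// i.e. whether a `&[u8]` to `&u64` reference cast could be valid at all.
fn is_aligned_for_u64(bytes: &[u8]) -> bool {
    bytes.as_ptr() as usize % std::mem::align_of::<u64>() == 0
}
```

A backend may hand back bytes at any offset within a page, so `is_aligned_for_u64` cannot be relied upon to hold; the copying read succeeds either way, at the cost of moving 8 bytes.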
|
||||
|
||||
Practically speaking, this means lower-level database functions that normally look like this:
|
||||
```rust
|
||||
fn get(key: &Key) -> &Value;
|
||||
```
|
||||
end up looking like this in `cuprate_database`:
|
||||
```rust
|
||||
fn get(key: &Key) -> Value;
|
||||
```
|
||||
|
||||
Since each backend has its own (de)serialization methods, our types are wrapped in compatibility types that map our `Storable` functions into whatever is required for the backend, e.g.:
|
||||
- [`StorableHeed<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/heed/storable.rs#L11-L45)
|
||||
- [`StorableRedb<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/backend/redb/storable.rs#L11-L30)
|
||||
|
||||
Compatibility structs also exist for any `Storable` containers:
|
||||
- [`StorableVec<T>`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L135-L191)
|
||||
- [`StorableBytes`](https://github.com/Cuprate/cuprate/blob/2ac90420c658663564a71b7ecb52d74f3c2c9d0f/database/src/storable.rs#L208-L241)
|
||||
|
||||
Again, it's unfortunate that these must be owned, although in `service`'s use-case, they would have to be owned anyway.
|
||||
|
||||
## 9. Schema
|
||||
The following section contains Cuprate's database schema. It may change throughout the development of Cuprate; as such, nothing here is final.
|
||||
|
||||
### 9.1 Tables
|
||||
The `CamelCase` names of the table headers documented here (e.g. `TxIds`) are the actual type name of the table within `cuprate_database`.
|
||||
|
||||
Note that words written within `code blocks` mean that it is a real type defined and usable within `cuprate_database`. Other standard types like u64 and type aliases (TxId) are written normally.
|
||||
|
||||
Within `cuprate_database::tables`, the below table is essentially defined as-is with [a macro](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/tables.rs#L369-L470).
|
||||
|
||||
Many of the stored data types share the same underlying type but differ semantically; as such, a map of the aliases used and their real data types is also provided below.
|
||||
|
||||
| Alias                                              | Real Type |
|----------------------------------------------------|-----------|
| BlockHeight, Amount, AmountIndex, TxId, UnlockTime | u64       |
| BlockHash, KeyImage, TxHash, PrunableHash          | [u8; 32]  |
|
||||
|
||||
| Table             | Key              | Value              | Description |
|-------------------|------------------|--------------------|-------------|
| `BlockBlobs`      | BlockHeight      | `StorableVec<u8>`  | Maps a block's height to a serialized byte form of a block |
| `BlockHeights`    | BlockHash        | BlockHeight        | Maps a block's hash to its height |
| `BlockInfos`      | BlockHeight      | `BlockInfo`        | Contains metadata of all blocks |
| `KeyImages`       | KeyImage         | ()                 | This table is a set with no value, it stores transaction key images |
| `NumOutputs`      | Amount           | u64                | Maps an output's amount to the number of outputs with that amount |
| `Outputs`         | `PreRctOutputId` | `Output`           | This table contains legacy CryptoNote outputs which have clear amounts. This table will not contain an output with 0 amount. |
| `PrunedTxBlobs`   | TxId             | `StorableVec<u8>`  | Contains pruned transaction blobs (even if the database is not pruned) |
| `PrunableTxBlobs` | TxId             | `StorableVec<u8>`  | Contains the prunable part of a transaction |
| `PrunableHashes`  | TxId             | PrunableHash       | Contains the hash of the prunable part of a transaction |
| `RctOutputs`      | AmountIndex      | `RctOutput`        | Contains RingCT outputs mapped from their global RCT index |
| `TxBlobs`         | TxId             | `StorableVec<u8>`  | Serialized transaction blobs (bytes) |
| `TxIds`           | TxHash           | TxId               | Maps a transaction's hash to its index/ID |
| `TxHeights`       | TxId             | BlockHeight        | Maps a transaction's ID to the height of the block it comes from |
| `TxOutputs`       | TxId             | `StorableVec<u64>` | Gives the amount indices of a transaction's outputs |
| `TxUnlockTime`    | TxId             | UnlockTime         | Stores the unlock time of a transaction (only if it has a non-zero lock time) |
|
||||
|
||||
The definitions for aliases and types (e.g. `RctOutput`) are within the [`cuprate_database::types`](https://github.com/Cuprate/cuprate/blob/31ce89412aa174fc33754f22c9a6d9ef5ddeda28/database/src/types.rs#L51) module.
|
||||
|
||||
<!-- TODO(Boog900): We could split this table again into `RingCT (non-miner) Outputs` and `RingCT (miner) Outputs` as for miner outputs we can store the amount instead of commitment saving 24 bytes per miner output. -->
|
||||
|
||||
### 9.2 Multimap tables
|
||||
When referencing outputs, Monero will [use the amount and the amount index](https://github.com/monero-project/monero/blob/c8214782fb2a769c57382a999eaf099691c836e7/src/blockchain_db/lmdb/db_lmdb.cpp#L3447-L3449). This means 2 keys are needed to reach an output.
|
||||
|
||||
With LMDB you can set the `DUP_SORT` flag on a table and then set the key/value to:
|
||||
```rust
Key = KEY_PART_1
```
```rust
Value = {
    KEY_PART_2,
    VALUE // The actual value we are storing.
}
```
|
||||
|
||||
Then you can set a custom value sorting function that only takes `KEY_PART_2` into account; this is how `monerod` does it.
|
||||
|
||||
This requires that the underlying database supports:
|
||||
- multimap tables
|
||||
- custom sort functions on values
|
||||
- setting a cursor on a specific key/value
|
||||
|
||||
---
|
||||
|
||||
Another way to implement this is as follows:
|
||||
```rust
Key = { KEY_PART_1, KEY_PART_2 }
```
```rust
Value = VALUE
```
|
||||
|
||||
Then the key type is simply used to look up the value; this is how `cuprate_database` does it.
|
||||
|
||||
For example, the key/value pair for outputs is:
|
||||
```rust
|
||||
PreRctOutputId => Output
|
||||
```
|
||||
where `PreRctOutputId` looks like this:
|
||||
```rust
struct PreRctOutputId {
    amount: u64,
    amount_index: u64,
}
```
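
To make the composite-key approach concrete, the sketch below uses a `BTreeMap` as a stand-in for a database table. The `Output` placeholder type and the `outputs_with_amount` helper are hypothetical; the point is only that a single key made of both parts supports direct lookups as well as per-amount range scans.

```rust
use std::collections::BTreeMap;

/// Composite key; deriving `Ord` sorts by `amount`, then `amount_index`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PreRctOutputId {
    amount: u64,
    amount_index: u64,
}

/// Placeholder value type for this sketch.
type Output = &'static str;

/// Counts the outputs stored for `amount` with a range scan over the
/// composite key, which a multimap table would answer by primary key.
fn outputs_with_amount(table: &BTreeMap<PreRctOutputId, Output>, amount: u64) -> usize {
    let start = PreRctOutputId { amount, amount_index: 0 };
    let end = PreRctOutputId { amount, amount_index: u64::MAX };
    table.range(start..=end).count()
}
```

A direct lookup is then just `table.get(&PreRctOutputId { amount, amount_index })`, with no custom value sorting required from the backend.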
|
||||
|
||||
## 10. Known issues and tradeoffs
|
||||
`cuprate_database` makes many tradeoffs, whether due to:
|
||||
- Prioritizing certain values over others
|
||||
- Not having a better solution
|
||||
- Being "good enough"
|
||||
|
||||
This is a list of the larger ones, along with issues that don't have answers yet.
|
||||
|
||||
### 10.1 Traits abstracting backends
|
||||
Although all database backends used are very similar, they have some crucial differences in small implementation details that must be worked around when conforming them to `cuprate_database`'s traits.
|
||||
|
||||
Put simply: using `cuprate_database`'s traits is less efficient and more awkward than using the backend directly.
|
||||
|
||||
For example:
|
||||
- [Data types must be wrapped in compatibility layers when they otherwise wouldn't be](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/backend/heed/env.rs#L101-L116)
|
||||
- [There are types that only apply to a specific backend, but are visible to all](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/error.rs#L86-L89)
|
||||
- [There are extra layers of abstraction to smoothen the differences between all backends](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/env.rs#L62-L68)
|
||||
- [Existing functionality of backends must be taken away, as it isn't supported in the others](https://github.com/Cuprate/cuprate/blob/d0ac94a813e4cd8e0ed8da5e85a53b1d1ace2463/database/src/database.rs#L27-L34)
|
||||
|
||||
This is a _tradeoff_ that `cuprate_database` makes, as:

- The backend itself is usually not the source of bottlenecks in the greater system, so small inefficiencies are acceptable
|
||||
- None of the lost functionality is crucial for operation
|
||||
- The ability to use, test, and swap between multiple database backends is [worth it](https://github.com/Cuprate/cuprate/pull/35#issuecomment-1952804393)
|
||||
|
||||
### 10.2 Hot-swappable backends
|
||||
Using a different backend is really as simple as re-building `cuprate_database` with a different feature flag:
|
||||
```bash
# Use LMDB.
cargo build --package cuprate-database --features heed

# Use redb.
cargo build --package cuprate-database --features redb
```
|
||||
|
||||
This is "good enough" for now; ideally, however, backends could be swapped at _runtime_.
|
||||
|
||||
As it is now, `cuprate_database` cannot compile both backends and swap based on user input at runtime; it must be compiled with a certain backend, which will produce a binary with only that backend.
|
||||
|
||||
This also means things like [CI testing multiple backends is awkward](https://github.com/Cuprate/cuprate/blob/main/.github/workflows/ci.yml#L132-L136), as we must re-compile with different feature flags instead.
|
||||
|
||||
### 10.3 Copying unaligned bytes
|
||||
As mentioned in [`8. (De)serialization`](#8-deserialization), bytes are _copied_ when they are turned into a type `T` due to unaligned bytes being returned from database backends.
|
||||
|
||||
Using a regular reference cast results in an improperly aligned type `T`; [such a type even existing causes undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html). In our case, `bytemuck` saves us by panicking before this occurs.
|
||||
|
||||
Thus, when using `cuprate_database`'s database traits, an _owned_ `T` is returned.
|
||||
|
||||
This is doubly unfortunate for `&[u8]`, as it does not even need deserialization.
|
||||
|
||||
For example, `StorableVec` could have been this:
|
||||
```rust
enum StorableBytes<'a, T: Storable> {
    Owned(T),
    Ref(&'a T),
}
```
|
||||
but this would require supporting types that must be copied regardless alongside the occasional `&[u8]` that can be returned without casting. This was hard to do in a generic way, thus all `[u8]`s are copied and returned as owned `StorableVec`s.
|
||||
|
||||
This is a _tradeoff_ `cuprate_database` makes, as:

- `bytemuck::pod_read_unaligned` is cheap enough
- The main API, `service`, needs to return owned values anyway
|
||||
- Having no references removes a lot of lifetime complexity
|
||||
|
||||
The alternative is either:
|
||||
- Using proper (de)serialization instead of casting (which comes with its own costs)
|
||||
- Somehow fixing the alignment issues in the backends mentioned previously
|
||||
|
||||
### 10.4 Endianness
|
||||
`cuprate_database`'s (de)serialization and storage of bytes are native-endian, as in, byte storage order will depend on the machine it is running on.
|
||||
|
||||
As Cuprate's build-targets are all little-endian ([big-endian by default machines barely exist](https://en.wikipedia.org/wiki/Endianness#Hardware)), this doesn't matter much and the byte ordering can be seen as a constant.
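
The claim that native-endian storage behaves as a constant on little-endian targets can be checked directly. The helper name below is hypothetical; it only compares the two encodings of a sample value.

```rust
/// Returns `true` when native-endian encoding matches little-endian
/// encoding on the machine this code is compiled for.
fn native_is_little_endian() -> bool {
    let value: u64 = 0x0102_0304_0506_0708;
    value.to_ne_bytes() == value.to_le_bytes()
}
```

On any little-endian build target this returns `true`, so bytes written with native-endian casts are identical to an explicit little-endian encoding there.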
|
||||
|
||||
Practically, this means `cuprated`'s database files can be transferred across computers, as can `monerod`'s.
|
||||
|
||||
### 10.5 Extra table data
|
||||
Some of `cuprate_database`'s tables differ from `monerod`'s tables. For example, the way [`9.2 Multimap tables`](#92-multimap-tables) are implemented requires that the primary key is stored _for all_ entries, whereas `monerod` only needs to store it once.
|
||||
|
||||
For example:
|
||||
```rust
// `monerod` only stores `amount: 1` once,
// `cuprated` stores it each time it appears.
PreRctOutputId { amount: 1, amount_index: 0 };
PreRctOutputId { amount: 1, amount_index: 1 };
```
|
||||
|
||||
This means `cuprated`'s database will be slightly larger than `monerod`'s.
|
||||
|
||||
The current method `cuprate_database` uses will be "good enough" until usage shows that it must be optimized, as multimap tables are tricky to implement across all backends.