serai/substrate/tendermint/client/src/block_import.rs


use std::{marker::PhantomData, sync::Arc, collections::HashMap};
use async_trait::async_trait;
use sp_api::BlockId;
use sp_runtime::traits::{Header, Block};
use sp_blockchain::{BlockStatus, HeaderBackend, Backend as BlockchainBackend};
use sp_consensus::{Error, CacheKeyId, BlockOrigin, SelectChain};
use sc_consensus::{BlockCheckParams, BlockImportParams, ImportResult, BlockImport, Verifier};
use sc_client_api::{Backend, BlockBackend};
use crate::{TendermintValidator, tendermint::TendermintImport};
impl<T: TendermintValidator> TendermintImport<T> {
  fn check_already_in_chain(&self, hash: <T::Block as Block>::Hash) -> bool {
    let id = BlockId::Hash(hash);
    // If it's in chain, with justifications, report it as already in chain
    // If it's in chain, without justifications, continue the block import process to import its
    // justifications
    // This can be triggered if the validators add a block, without justifications, yet the p2p
    // process then broadcasts it with its justifications
    (self.client.status(id).unwrap() == BlockStatus::InChain) &&
      self.client.justifications(hash).unwrap().is_some()
  }
}
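
/*
The rule in check_already_in_chain can be illustrated without any Substrate types. The following
is a minimal sketch, not the code used above: ToyStatus and already_in_chain are hypothetical
stand-ins for BlockStatus and the client queries, and the justification lookup is reduced to a
bool.

```rust
/// Hypothetical stand-in for sp_blockchain::BlockStatus.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyStatus {
  InChain,
  Unknown,
}

/// A block only counts as already imported once it is both known and justified.
/// A known block without justifications must continue through import so its
/// justifications can be stored.
pub fn already_in_chain(status: ToyStatus, has_justifications: bool) -> bool {
  (status == ToyStatus::InChain) && has_justifications
}
```

Under these assumptions, already_in_chain(ToyStatus::InChain, false) is false, so the import
continues for a block which was added before its justifications arrived.
*/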

#[async_trait]
impl<T: TendermintValidator> BlockImport<T::Block> for TendermintImport<T>
where
  Arc<T::Client>: BlockImport<T::Block, Transaction = T::BackendTransaction>,
  <Arc<T::Client> as BlockImport<T::Block>>::Error: Into<Error>,
{
  type Error = Error;
  type Transaction = T::BackendTransaction;

  // TODO: Is there a DoS where you send a block without justifications, causing it to error,
  // yet adding it to the blacklist in the process preventing further syncing?
  async fn check_block(
    &mut self,
    mut block: BlockCheckParams<T::Block>,
  ) -> Result<ImportResult, Self::Error> {
    if self.check_already_in_chain(block.hash) {
      return Ok(ImportResult::AlreadyInChain);
    }
    self.verify_order(block.parent_hash, block.number)?;

    // Does not verify origin here as origin only applies to unfinalized blocks
    // We don't have context on if this block has justifications or not

    block.allow_missing_state = false;
    block.allow_missing_parent = false;

    self.client.check_block(block).await.map_err(Into::into)
  }

  async fn import_block(
    &mut self,
    mut block: BlockImportParams<T::Block, Self::Transaction>,
    new_cache: HashMap<CacheKeyId, Vec<u8>>,
  ) -> Result<ImportResult, Self::Error> {
    // Don't allow multiple blocks to be imported at once
    let _lock = self.sync_lock.lock().await;

    if self.check_already_in_chain(block.header.hash()) {
      return Ok(ImportResult::AlreadyInChain);
    }

    self.check(&mut block).await?;
    self.client.import_block(block, new_cache).await.map_err(Into::into)
  }
}

#[async_trait]
impl<T: TendermintValidator> Verifier<T::Block> for TendermintImport<T>
where
  Arc<T::Client>: BlockImport<T::Block, Transaction = T::BackendTransaction>,
  <Arc<T::Client> as BlockImport<T::Block>>::Error: Into<Error>,
{
  async fn verify(
    &mut self,
    mut block: BlockImportParams<T::Block, ()>,
  ) -> Result<(BlockImportParams<T::Block, ()>, Option<Vec<(CacheKeyId, Vec<u8>)>>), String> {
    block.origin = match block.origin {
      BlockOrigin::Genesis => BlockOrigin::Genesis,
      BlockOrigin::NetworkBroadcast => BlockOrigin::NetworkBroadcast,

      // Re-map NetworkInitialSync to NetworkBroadcast so it still triggers notifications
      // Tendermint will listen to the finality stream. If we sync a block we're running a machine
      // for, it'll force the machine to move ahead. We can only do that if there actually are
      // notifications
      //
      // Then Serai also runs data indexing code based on block addition, so ensuring it always
      // emits events ensures we always perform our necessary indexing (albeit with a race
      // condition since Substrate will eventually prune the block's state, potentially before
      // indexing finishes when syncing)
      //
      // The alternatives to this would be editing Substrate directly (which would be a lot less
      // fragile), manually triggering the notifications (which may be possible with code intended
      // for testing), writing our own notification system, or implementing lock_import_and_run
      // on our end, letting us directly set the notifications, so we're not beholden to when
      // Substrate decides to call notify_finalized
      //
      // lock_import_and_run unfortunately doesn't allow async code and generally isn't feasible to
      // work with though. We also couldn't use it to prevent Substrate from creating
      // notifications, so it only solves half the problem. We'd *still* have to keep this patch,
      // with all its fragility, unless we edit Substrate or move the entire block import flow here
      BlockOrigin::NetworkInitialSync => BlockOrigin::NetworkBroadcast,
      // Also re-map File so bootstraps also trigger notifications, enabling using bootstraps
      BlockOrigin::File => BlockOrigin::NetworkBroadcast,

      // We do not want this block, which hasn't been confirmed, to be broadcast over the net
      // Substrate will generate notifications unless it's Genesis, which this isn't, InitialSync,
      // which changes telemetry behavior, or File, which is... close enough
      BlockOrigin::ConsensusBroadcast => BlockOrigin::File,
      BlockOrigin::Own => BlockOrigin::File,
    };

    if self.check_already_in_chain(block.header.hash()) {
      return Ok((block, None));
    }

    self.check(&mut block).await.map_err(|e| format!("{}", e))?;
    Ok((block, None))
  }
}
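
/*
The origin re-mapping performed in verify is a pure function of the incoming origin, so it can be
read (and tested) in isolation. The following is a minimal sketch under that assumption:
ToyOrigin mirrors the variants matched above but is a hypothetical local enum, not
sp_consensus::BlockOrigin, and remap_origin is an illustrative name which does not exist in this
crate.

```rust
/// Hypothetical local copy of the origins matched in `verify`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyOrigin {
  Genesis,
  NetworkInitialSync,
  NetworkBroadcast,
  ConsensusBroadcast,
  Own,
  File,
}

/// Applies the same mapping as the match statement in `verify`.
pub fn remap_origin(origin: ToyOrigin) -> ToyOrigin {
  match origin {
    ToyOrigin::Genesis => ToyOrigin::Genesis,
    // Synced and bootstrapped blocks are treated as broadcast blocks so notifications fire
    ToyOrigin::NetworkBroadcast | ToyOrigin::NetworkInitialSync | ToyOrigin::File => {
      ToyOrigin::NetworkBroadcast
    }
    // Unconfirmed local blocks are labelled File so they are not sent over the network
    ToyOrigin::ConsensusBroadcast | ToyOrigin::Own => ToyOrigin::File,
  }
}
```

Keeping the mapping separate like this would make the two intents visible at a glance: everything
arriving from outside becomes NetworkBroadcast, and everything produced locally becomes File.
*/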
/// Tendermint's Select Chain, where the best chain is defined as the most recently finalized
/// block.
///
/// `leaves` panics when called, as it is not applicable under Tendermint. Any provided answer
/// would have conflicts best left unraised.
//
// SelectChain, while provided by Substrate and part of PartialComponents, isn't used by Substrate
// It's common between various block-production/finality crates, yet Substrate as a system doesn't
// rely on it, which is good, because its definition is explicitly incompatible with Tendermint
//
// leaves is supposed to return all leaves of the blockchain. While Tendermint maintains that view,
// an honest node will only build on the most recently finalized block, so it is a 'leaf' despite
// having descendants
//
// best_chain will always be this finalized block, yet Substrate explicitly defines it as one of
// the above leaves, which this finalized block is explicitly not included in. Accordingly, we
// can never provide a compatible decision
//
// Since PartialComponents expects it, an implementation which does its best is provided. It panics
// if leaves is called, yet returns the finalized chain tip for best_chain, as that's intended to
// be the header to build upon
pub struct TendermintSelectChain<B: Block, Be: Backend<B>>(Arc<Be>, PhantomData<B>);

impl<B: Block, Be: Backend<B>> Clone for TendermintSelectChain<B, Be> {
  fn clone(&self) -> Self {
    TendermintSelectChain(self.0.clone(), PhantomData)
  }
}

impl<B: Block, Be: Backend<B>> TendermintSelectChain<B, Be> {
  pub fn new(backend: Arc<Be>) -> TendermintSelectChain<B, Be> {
    TendermintSelectChain(backend, PhantomData)
  }
}

#[async_trait]
impl<B: Block, Be: Backend<B>> SelectChain<B> for TendermintSelectChain<B, Be> {
  async fn leaves(&self) -> Result<Vec<B::Hash>, Error> {
    panic!("Substrate definition of leaves is incompatible with Tendermint")
  }

  async fn best_chain(&self) -> Result<B::Header, Error> {
    Ok(
      self
        .0
        .blockchain()
        // There should always be a finalized block
        .header(BlockId::Hash(self.0.blockchain().last_finalized().unwrap()))
        // There should not be an error in retrieving it and since it's finalized, it should exist
        .unwrap()
        .unwrap(),
    )
  }
}
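
/*
The best_chain rule described above (always build on the most recently finalized block, even when
it has unfinalized descendants) can be shown with a toy backend. This is a minimal sketch:
ToyHeader and ToyBackend are hypothetical types for illustration only and do not model
sc_client_api::Backend.

```rust
use std::collections::HashMap;

/// Hypothetical header, identified by a numeric hash for simplicity.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyHeader {
  pub hash: u64,
  pub parent: u64,
  pub number: u64,
}

/// Hypothetical backend tracking known headers and the last finalized hash.
pub struct ToyBackend {
  headers: HashMap<u64, ToyHeader>,
  last_finalized: u64,
}

impl ToyBackend {
  /// The genesis header is considered finalized from the start.
  pub fn new(genesis: ToyHeader) -> Self {
    let last_finalized = genesis.hash;
    let mut headers = HashMap::new();
    headers.insert(genesis.hash, genesis);
    ToyBackend { headers, last_finalized }
  }

  pub fn insert(&mut self, header: ToyHeader) {
    self.headers.insert(header.hash, header);
  }

  /// Marks a known header as finalized.
  pub fn finalize(&mut self, hash: u64) {
    assert!(self.headers.contains_key(&hash), "cannot finalize an unknown header");
    self.last_finalized = hash;
  }

  /// The header to build upon is the finalized tip, regardless of any descendants.
  pub fn best_chain(&self) -> ToyHeader {
    self.headers.get(&self.last_finalized).expect("finalized header must exist").clone()
  }
}
```

In this sketch, inserting blocks 1 and 2 and finalizing only block 1 leaves best_chain at block
1, which is why it cannot be one of the leaves Substrate's definition expects.
*/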