serai/processor/TODO/tests/batch_signer.rs

155 lines
4.2 KiB
Rust
Raw Normal View History

// TODO
use std::collections::HashMap;
2023-04-10 16:48:48 +00:00
use rand_core::{RngCore, OsRng};
2023-04-10 16:48:48 +00:00
use ciphersuite::group::GroupEncoding;
2023-04-10 16:48:48 +00:00
use frost::{
curve::Ristretto,
Participant,
dkg::tests::{key_gen, clone_without},
};
use sp_application_crypto::{RuntimePublic, sr25519::Public};
use serai_db::{DbTxn, Db, MemDb};
#[rustfmt::skip]
use serai_client::{primitives::*, in_instructions::primitives::*, validator_sets::primitives::Session};
2023-04-10 16:48:48 +00:00
use messages::{
substrate,
Add a cosigning protocol to ensure finalizations are unique (#433) * Add a function to deterministically decide which Serai blocks should be co-signed Has a 5 minute latency between co-signs, also used as the maximal latency before a co-sign is started. * Get all active tributaries we're in at a specific block * Add and route CosignSubstrateBlock, a new provided TX * Split queued cosigns per network * Rename BatchSignId to SubstrateSignId * Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it * Handle the CosignSubstrateBlock provided TX * Revert substrate_signer.rs to develop (and patch to still work) Due to SubstrateSigner moving when the prior multisig closes, yet cosigning occurring with the most recent key, a single SubstrateSigner can be reused. We could manage multiple SubstrateSigners, yet considering the much lower specifications for cosigning, I'd rather treat it distinctly. * Route cosigning through the processor * Add note to rename SubstrateSigner post-PR I don't want to do so now in order to preserve the diff's clarity. * Implement cosign evaluation into the coordinator * Get tests to compile * Bug fixes, mark blocks without cosigners available as cosigned * Correct the ID Batch preprocesses are saved under, add log statements * Create a dedicated function to handle cosigns * Correct the flow around Batch verification/queueing Verifying `Batch`s could stall when a `Batch` was signed before its predecessors/before the block it's contained in was cosigned (the latter being inevitable as we can't sign a block containing a signed batch before signing the batch). Now, Batch verification happens on a distinct async task in order to not block the handling of processor messages. This task is the sole caller of verify in order to ensure last_verified_batch isn't unexpectedly mutated. When the processor message handler needs to access it, or needs to queue a Batch, it associates the DB TXN with a lock preventing the other task from doing so. This lock, as currently implemented, is a poor and inefficient design. It should be modified to the pattern used for cosign management. Additionally, a new primitive of a DB-backed channel may be immensely valuable. Fixes a standing potential deadlock and a deadlock introduced with the cosigning protocol. * Working full-stack tests After the last commit, this only required extending a timeout. * Replace "co-sign" with "cosign" to make finding text easier * Update the coordinator tests to support cosigning * Inline prior_batch calculation to prevent panic on rotation Noticed when doing a final review of the branch.
2023-11-15 21:57:21 +00:00
coordinator::{self, SubstrateSignableId, SubstrateSignId, CoordinatorMessage},
ProcessorMessage,
};
2023-11-21 03:20:58 +00:00
use crate::batch_signer::BatchSigner;
2023-04-10 16:48:48 +00:00
#[test]
fn test_batch_signer() {
2023-08-25 00:35:50 +00:00
let keys = key_gen::<_, Ristretto>(&mut OsRng);
2023-04-10 16:48:48 +00:00
let participant_one = Participant::new(1).unwrap();
let id: u32 = 5;
2023-04-10 16:48:48 +00:00
let block = BlockHash([0xaa; 32]);
let batch = Batch {
network: NetworkId::Monero,
id,
2023-04-10 16:48:48 +00:00
block,
instructions: vec![
InInstructionWithBalance {
instruction: InInstruction::Transfer(SeraiAddress([0xbb; 32])),
balance: Balance { coin: Coin::Bitcoin, amount: Amount(1000) },
2023-04-10 16:48:48 +00:00
},
InInstructionWithBalance {
instruction: InInstruction::Dex(DexCall::SwapAndAddLiquidity(SeraiAddress([0xbb; 32]))),
balance: Balance { coin: Coin::Monero, amount: Amount(9999999999999999) },
2023-04-10 16:48:48 +00:00
},
],
};
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 17:28:53 +00:00
let actual_id =
SubstrateSignId { session: Session(0), id: SubstrateSignableId::Batch(batch.id), attempt: 0 };
2023-11-07 00:50:32 +00:00
let mut signing_set = vec![];
while signing_set.len() < usize::from(keys.values().next().unwrap().params().t()) {
let candidate = Participant::new(
u16::try_from((OsRng.next_u64() % u64::try_from(keys.len()).unwrap()) + 1).unwrap(),
)
.unwrap();
if signing_set.contains(&candidate) {
continue;
}
signing_set.push(candidate);
}
let mut signers = HashMap::new();
let mut dbs = HashMap::new();
2023-04-10 16:48:48 +00:00
let mut preprocesses = HashMap::new();
for i in 1 ..= keys.len() {
let i = Participant::new(u16::try_from(i).unwrap()).unwrap();
let keys = keys.get(&i).unwrap().clone();
let mut signer = BatchSigner::<MemDb>::new(NetworkId::Monero, Session(0), vec![keys]);
let mut db = MemDb::new();
let mut txn = db.txn();
match signer.sign(&mut txn, batch.clone()).unwrap() {
// All participants should emit a preprocess
coordinator::ProcessorMessage::BatchPreprocess {
id,
block: batch_block,
preprocesses: mut these_preprocesses,
} => {
assert_eq!(id, actual_id);
assert_eq!(batch_block, block);
assert_eq!(these_preprocesses.len(), 1);
if signing_set.contains(&i) {
preprocesses.insert(i, these_preprocesses.swap_remove(0));
}
}
_ => panic!("didn't get preprocess back"),
2023-04-10 16:48:48 +00:00
}
txn.commit();
signers.insert(i, signer);
dbs.insert(i, db);
2023-04-10 16:48:48 +00:00
}
let mut shares = HashMap::new();
for i in &signing_set {
let mut txn = dbs.get_mut(i).unwrap().txn();
match signers
.get_mut(i)
.unwrap()
.handle(
&mut txn,
Add a cosigning protocol to ensure finalizations are unique (#433) * Add a function to deterministically decide which Serai blocks should be co-signed Has a 5 minute latency between co-signs, also used as the maximal latency before a co-sign is started. * Get all active tributaries we're in at a specific block * Add and route CosignSubstrateBlock, a new provided TX * Split queued cosigns per network * Rename BatchSignId to SubstrateSignId * Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it * Handle the CosignSubstrateBlock provided TX * Revert substrate_signer.rs to develop (and patch to still work) Due to SubstrateSigner moving when the prior multisig closes, yet cosigning occurring with the most recent key, a single SubstrateSigner can be reused. We could manage multiple SubstrateSigners, yet considering the much lower specifications for cosigning, I'd rather treat it distinctly. * Route cosigning through the processor * Add note to rename SubstrateSigner post-PR I don't want to do so now in order to preserve the diff's clarity. * Implement cosign evaluation into the coordinator * Get tests to compile * Bug fixes, mark blocks without cosigners available as cosigned * Correct the ID Batch preprocesses are saved under, add log statements * Create a dedicated function to handle cosigns * Correct the flow around Batch verification/queueing Verifying `Batch`s could stall when a `Batch` was signed before its predecessors/before the block it's contained in was cosigned (the latter being inevitable as we can't sign a block containing a signed batch before signing the batch). Now, Batch verification happens on a distinct async task in order to not block the handling of processor messages. This task is the sole caller of verify in order to ensure last_verified_batch isn't unexpectedly mutated. When the processor message handler needs to access it, or needs to queue a Batch, it associates the DB TXN with a lock preventing the other task from doing so. This lock, as currently implemented, is a poor and inefficient design. It should be modified to the pattern used for cosign management. Additionally, a new primitive of a DB-backed channel may be immensely valuable. Fixes a standing potential deadlock and a deadlock introduced with the cosigning protocol. * Working full-stack tests After the last commit, this only required extending a timeout. * Replace "co-sign" with "cosign" to make finding text easier * Update the coordinator tests to support cosigning * Inline prior_batch calculation to prevent panic on rotation Noticed when doing a final review of the branch.
2023-11-15 21:57:21 +00:00
CoordinatorMessage::SubstratePreprocesses {
id: actual_id.clone(),
preprocesses: clone_without(&preprocesses, i),
},
)
.unwrap()
2023-04-10 16:48:48 +00:00
{
Add a cosigning protocol to ensure finalizations are unique (#433) * Add a function to deterministically decide which Serai blocks should be co-signed Has a 5 minute latency between co-signs, also used as the maximal latency before a co-sign is started. * Get all active tributaries we're in at a specific block * Add and route CosignSubstrateBlock, a new provided TX * Split queued cosigns per network * Rename BatchSignId to SubstrateSignId * Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it * Handle the CosignSubstrateBlock provided TX * Revert substrate_signer.rs to develop (and patch to still work) Due to SubstrateSigner moving when the prior multisig closes, yet cosigning occurring with the most recent key, a single SubstrateSigner can be reused. We could manage multiple SubstrateSigners, yet considering the much lower specifications for cosigning, I'd rather treat it distinctly. * Route cosigning through the processor * Add note to rename SubstrateSigner post-PR I don't want to do so now in order to preserve the diff's clarity. * Implement cosign evaluation into the coordinator * Get tests to compile * Bug fixes, mark blocks without cosigners available as cosigned * Correct the ID Batch preprocesses are saved under, add log statements * Create a dedicated function to handle cosigns * Correct the flow around Batch verification/queueing Verifying `Batch`s could stall when a `Batch` was signed before its predecessors/before the block it's contained in was cosigned (the latter being inevitable as we can't sign a block containing a signed batch before signing the batch). Now, Batch verification happens on a distinct async task in order to not block the handling of processor messages. This task is the sole caller of verify in order to ensure last_verified_batch isn't unexpectedly mutated. When the processor message handler needs to access it, or needs to queue a Batch, it associates the DB TXN with a lock preventing the other task from doing so. This lock, as currently implemented, is a poor and inefficient design. It should be modified to the pattern used for cosign management. Additionally, a new primitive of a DB-backed channel may be immensely valuable. Fixes a standing potential deadlock and a deadlock introduced with the cosigning protocol. * Working full-stack tests After the last commit, this only required extending a timeout. * Replace "co-sign" with "cosign" to make finding text easier * Update the coordinator tests to support cosigning * Inline prior_batch calculation to prevent panic on rotation Noticed when doing a final review of the branch.
2023-11-15 21:57:21 +00:00
ProcessorMessage::Coordinator(coordinator::ProcessorMessage::SubstrateShare {
id,
shares: mut these_shares,
}) => {
assert_eq!(id, actual_id);
assert_eq!(these_shares.len(), 1);
shares.insert(*i, these_shares.swap_remove(0));
}
_ => panic!("didn't get share back"),
2023-04-10 16:48:48 +00:00
}
txn.commit();
2023-04-10 16:48:48 +00:00
}
for i in &signing_set {
let mut txn = dbs.get_mut(i).unwrap().txn();
match signers
.get_mut(i)
.unwrap()
.handle(
&mut txn,
Add a cosigning protocol to ensure finalizations are unique (#433) * Add a function to deterministically decide which Serai blocks should be co-signed Has a 5 minute latency between co-signs, also used as the maximal latency before a co-sign is started. * Get all active tributaries we're in at a specific block * Add and route CosignSubstrateBlock, a new provided TX * Split queued cosigns per network * Rename BatchSignId to SubstrateSignId * Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it * Handle the CosignSubstrateBlock provided TX * Revert substrate_signer.rs to develop (and patch to still work) Due to SubstrateSigner moving when the prior multisig closes, yet cosigning occurring with the most recent key, a single SubstrateSigner can be reused. We could manage multiple SubstrateSigners, yet considering the much lower specifications for cosigning, I'd rather treat it distinctly. * Route cosigning through the processor * Add note to rename SubstrateSigner post-PR I don't want to do so now in order to preserve the diff's clarity. * Implement cosign evaluation into the coordinator * Get tests to compile * Bug fixes, mark blocks without cosigners available as cosigned * Correct the ID Batch preprocesses are saved under, add log statements * Create a dedicated function to handle cosigns * Correct the flow around Batch verification/queueing Verifying `Batch`s could stall when a `Batch` was signed before its predecessors/before the block it's contained in was cosigned (the latter being inevitable as we can't sign a block containing a signed batch before signing the batch). Now, Batch verification happens on a distinct async task in order to not block the handling of processor messages. This task is the sole caller of verify in order to ensure last_verified_batch isn't unexpectedly mutated. When the processor message handler needs to access it, or needs to queue a Batch, it associates the DB TXN with a lock preventing the other task from doing so. This lock, as currently implemented, is a poor and inefficient design. It should be modified to the pattern used for cosign management. Additionally, a new primitive of a DB-backed channel may be immensely valuable. Fixes a standing potential deadlock and a deadlock introduced with the cosigning protocol. * Working full-stack tests After the last commit, this only required extending a timeout. * Replace "co-sign" with "cosign" to make finding text easier * Update the coordinator tests to support cosigning * Inline prior_batch calculation to prevent panic on rotation Noticed when doing a final review of the branch.
2023-11-15 21:57:21 +00:00
CoordinatorMessage::SubstrateShares {
id: actual_id.clone(),
shares: clone_without(&shares, i),
},
)
.unwrap()
2023-04-10 16:48:48 +00:00
{
ProcessorMessage::Substrate(substrate::ProcessorMessage::SignedBatch {
batch: signed_batch,
}) => {
assert_eq!(signed_batch.batch, batch);
assert!(Public::from_raw(keys[&participant_one].group_key().to_bytes())
.verify(&batch_message(&batch), &signed_batch.signature));
}
_ => panic!("didn't get signed batch back"),
2023-04-10 16:48:48 +00:00
}
txn.commit();
2023-04-10 16:48:48 +00:00
}
}