Add a cosigning protocol to ensure finalizations are unique (#433)
* Add a function to deterministically decide which Serai blocks should be co-signed
Has a 5 minute latency between co-signs, also used as the maximal latency
before a co-sign is started.
* Get all active tributaries we're in at a specific block
* Add and route CosignSubstrateBlock, a new provided TX
* Split queued cosigns per network
* Rename BatchSignId to SubstrateSignId
* Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it
* Handle the CosignSubstrateBlock provided TX
* Revert substrate_signer.rs to develop (and patch to still work)
Due to SubstrateSigner moving when the prior multisig closes, yet cosigning
occurring with the most recent key, a single SubstrateSigner can be reused.
We could manage multiple SubstrateSigners, yet considering the much lower
specifications for cosigning, I'd rather treat it distinctly.
* Route cosigning through the processor
* Add note to rename SubstrateSigner post-PR
I don't want to do so now in order to preserve the diff's clarity.
* Implement cosign evaluation into the coordinator
* Get tests to compile
* Bug fixes, mark blocks without cosigners available as cosigned
* Correct the ID Batch preprocesses are saved under, add log statements
* Create a dedicated function to handle cosigns
* Correct the flow around Batch verification/queueing
Verifying `Batch`s could stall when a `Batch` was signed before its
predecessors/before the block it's contained in was cosigned (the latter being
inevitable as we can't sign a block containing a signed batch before signing
the batch).
Now, Batch verification happens on a distinct async task in order to not block
the handling of processor messages. This task is the sole caller of verify in
order to ensure last_verified_batch isn't unexpectedly mutated.
When the processor message handler needs to access it, or needs to queue a
Batch, it associates the DB TXN with a lock preventing the other task from
doing so.
This lock, as currently implemented, is a poor and inefficient design. It
should be modified to the pattern used for cosign management. Additionally, a
new primitive of a DB-backed channel may be immensely valuable.
Fixes a standing potential deadlock and a deadlock introduced with the
cosigning protocol.
* Working full-stack tests
After the last commit, this only required extending a timeout.
* Replace "co-sign" with "cosign" to make finding text easier
* Update the coordinator tests to support cosigning
* Inline prior_batch calculation to prevent panic on rotation
Noticed when doing a final review of the branch.
2023-11-15 21:57:21 +00:00
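The first bullet of the commit message describes a deterministic rule with a 5 minute latency between cosigns. The following is only a minimal sketch of how such a rule could look, assuming the decision is made purely from on-chain block timestamps in seconds. The names `COSIGN_INTERVAL_SECS`, `should_cosign`, and `blocks_to_cosign` are hypothetical and are not taken from the Serai codebase, whose actual selection logic may differ.

```rust
/// Hypothetical interval, taken from the "5 minute latency" in the commit message.
const COSIGN_INTERVAL_SECS: u64 = 5 * 60;

/// Decide whether the block with `block_time` should be cosigned, given the time of the last
/// cosigned block. Both values are assumed to be on-chain timestamps in seconds.
fn should_cosign(last_cosigned_time: u64, block_time: u64) -> bool {
  // saturating_sub avoids an underflow if timestamps are ever out of order
  block_time.saturating_sub(last_cosigned_time) >= COSIGN_INTERVAL_SECS
}

/// Return the indices of the blocks which would be cosigned, walking the timestamps in order.
fn blocks_to_cosign(genesis_time: u64, block_times: &[u64]) -> Vec<usize> {
  let mut last = genesis_time;
  let mut res = vec![];
  for (i, time) in block_times.iter().enumerate() {
    if should_cosign(last, *time) {
      res.push(i);
      last = *time;
    }
  }
  res
}
```

Because the rule only reads data every node agrees on, every node would select the same blocks without any extra coordination.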
use core::fmt;
use std::collections::HashMap;

use rand_core::OsRng;

use ciphersuite::group::GroupEncoding;
use frost::{
  curve::Ristretto,
  ThresholdKeys, FrostError,
  algorithm::Algorithm,
  sign::{
    Writable, PreprocessMachine, SignMachine, SignatureMachine, AlgorithmMachine,
    AlgorithmSignMachine, AlgorithmSignatureMachine,
  },
};
use frost_schnorrkel::Schnorrkel;

use log::{info, warn};

use scale::Encode;

use messages::coordinator::*;
use crate::{Get, DbTxn, create_db};

create_db! {
  CosignerDb {
    Completed: (id: [u8; 32]) -> (),
    Attempt: (id: [u8; 32], attempt: u32) -> (),
  }
}

type Preprocess = <AlgorithmMachine<Ristretto, Schnorrkel> as PreprocessMachine>::Preprocess;
type SignatureShare = <AlgorithmSignMachine<Ristretto, Schnorrkel> as SignMachine<
  <Schnorrkel as Algorithm<Ristretto>>::Signature,
>>::SignatureShare;

pub struct Cosigner {
  #[allow(dead_code)] // False positive
  keys: Vec<ThresholdKeys<Ristretto>>,

  block_number: u64,
  id: [u8; 32],
  attempt: u32,
  #[allow(clippy::type_complexity)]
  preprocessing: Option<(Vec<AlgorithmSignMachine<Ristretto, Schnorrkel>>, Vec<Preprocess>)>,
  #[allow(clippy::type_complexity)]
  signing: Option<(AlgorithmSignatureMachine<Ristretto, Schnorrkel>, Vec<SignatureShare>)>,
}

impl fmt::Debug for Cosigner {
  fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
    fmt
      .debug_struct("Cosigner")
      .field("block_number", &self.block_number)
      .field("id", &self.id)
      .field("attempt", &self.attempt)
      .field("preprocessing", &self.preprocessing.is_some())
      .field("signing", &self.signing.is_some())
      .finish_non_exhaustive()
  }
}

impl Cosigner {
  pub fn new(
    txn: &mut impl DbTxn,
    keys: Vec<ThresholdKeys<Ristretto>>,
    block_number: u64,
    id: [u8; 32],
    attempt: u32,
  ) -> Option<(Cosigner, ProcessorMessage)> {
    assert!(!keys.is_empty());

    if Completed::get(txn, id).is_some() {
      return None;
    }

    if Attempt::get(txn, id, attempt).is_some() {
      warn!(
        "already attempted cosigning {}, attempt #{}. this is an error if we didn't reboot",
        hex::encode(id),
        attempt,
      );
      return None;
    }
    Attempt::set(txn, id, attempt, &());

    info!("cosigning block {} with attempt #{}", hex::encode(id), attempt);

    let mut machines = vec![];
    let mut preprocesses = vec![];
    let mut serialized_preprocesses = vec![];
    for keys in &keys {
      // b"substrate" is a literal from sp-core
      let machine = AlgorithmMachine::new(Schnorrkel::new(b"substrate"), keys.clone());

      let (machine, preprocess) = machine.preprocess(&mut OsRng);
      machines.push(machine);
      serialized_preprocesses.push(preprocess.serialize());
      preprocesses.push(preprocess);
    }
    let preprocessing = Some((machines, preprocesses));

    let substrate_sign_id = SubstrateSignId {
      key: keys[0].group_key().to_bytes(),
      id: SubstrateSignableId::CosigningSubstrateBlock(id),
      attempt,
    };

    Some((
      Cosigner { keys, block_number, id, attempt, preprocessing, signing: None },
      ProcessorMessage::CosignPreprocess {
        id: substrate_sign_id,
        preprocesses: serialized_preprocesses,
      },
    ))
  }

  #[must_use]
  pub async fn handle(
    &mut self,
    txn: &mut impl DbTxn,
    msg: CoordinatorMessage,
  ) -> Option<ProcessorMessage> {
    match msg {
      CoordinatorMessage::CosignSubstrateBlock { .. } => {
        panic!("Cosigner passed CosignSubstrateBlock")
      }

      CoordinatorMessage::SubstratePreprocesses { id, preprocesses } => {
        assert_eq!(id.key, self.keys[0].group_key().to_bytes());
        let SubstrateSignableId::CosigningSubstrateBlock(block) = id.id else {
          panic!("cosigner passed Batch")
        };
        if block != self.id {
          panic!("given preprocesses for a distinct block than cosigner is signing")
        }
        if id.attempt != self.attempt {
          panic!("given preprocesses for a distinct attempt than cosigner is signing")
        }

        let (machines, our_preprocesses) = match self.preprocessing.take() {
          // Either rebooted or RPC error, or some invariant
          None => {
            warn!(
              "not preprocessing for {}. this is an error if we didn't reboot",
              hex::encode(block),
            );
            return None;
          }
          Some(preprocess) => preprocess,
        };

        let mut parsed = HashMap::new();
        for l in {
          let mut keys = preprocesses.keys().cloned().collect::<Vec<_>>();
          keys.sort();
          keys
        } {
          let mut preprocess_ref = preprocesses.get(&l).unwrap().as_slice();
          let Ok(res) = machines[0].read_preprocess(&mut preprocess_ref) else {
            return Some(ProcessorMessage::InvalidParticipant { id, participant: l });
          };
          if !preprocess_ref.is_empty() {
            return Some(ProcessorMessage::InvalidParticipant { id, participant: l });
          }
          parsed.insert(l, res);
        }
        let preprocesses = parsed;

        // Only keep a single machine as we only need one to get the signature
        let mut signature_machine = None;
        let mut shares = vec![];
        let mut serialized_shares = vec![];
        for (m, machine) in machines.into_iter().enumerate() {
          let mut preprocesses = preprocesses.clone();
          for (i, our_preprocess) in our_preprocesses.clone().into_iter().enumerate() {
            if i != m {
              assert!(preprocesses.insert(self.keys[i].params().i(), our_preprocess).is_none());
            }
          }

          let (machine, share) =
            match machine.sign(preprocesses, &cosign_block_msg(self.block_number, self.id)) {
              Ok(res) => res,
              Err(e) => match e {
                FrostError::InternalError(_) |
                FrostError::InvalidParticipant(_, _) |
                FrostError::InvalidSigningSet(_) |
                FrostError::InvalidParticipantQuantity(_, _) |
                FrostError::DuplicatedParticipant(_) |
                FrostError::MissingParticipant(_) => unreachable!(),

                FrostError::InvalidPreprocess(l) | FrostError::InvalidShare(l) => {
                  return Some(ProcessorMessage::InvalidParticipant { id, participant: l })
                }
              },
            };
          if m == 0 {
            signature_machine = Some(machine);
          }

          let mut share_bytes = [0; 32];
          share_bytes.copy_from_slice(&share.serialize());
          serialized_shares.push(share_bytes);

          shares.push(share);
        }
        self.signing = Some((signature_machine.unwrap(), shares));

        // Broadcast our shares
        Some(ProcessorMessage::SubstrateShare { id, shares: serialized_shares })
      }

      CoordinatorMessage::SubstrateShares { id, shares } => {
        assert_eq!(id.key, self.keys[0].group_key().to_bytes());
        let SubstrateSignableId::CosigningSubstrateBlock(block) = id.id else {
          panic!("cosigner passed Batch")
        };
        if block != self.id {
          panic!("given shares for a distinct block than cosigner is signing")
        }
        if id.attempt != self.attempt {
          panic!("given shares for a distinct attempt than cosigner is signing")
        }

        let (machine, our_shares) = match self.signing.take() {
          // Rebooted, RPC error, or some invariant
          None => {
            // If preprocessing has this ID, it means we were never sent the preprocess by the
            // coordinator
            if self.preprocessing.is_some() {
              panic!("never preprocessed yet signing?");
            }

            warn!(
              "not signing for {}. this is an error if we didn't reboot",
              hex::encode(block)
            );
            return None;
          }
          Some(signing) => signing,
        };

        let mut parsed = HashMap::new();
        for l in {
          let mut keys = shares.keys().cloned().collect::<Vec<_>>();
          keys.sort();
          keys
        } {
          let mut share_ref = shares.get(&l).unwrap().as_slice();
          let Ok(res) = machine.read_share(&mut share_ref) else {
            return Some(ProcessorMessage::InvalidParticipant { id, participant: l });
          };
          if !share_ref.is_empty() {
            return Some(ProcessorMessage::InvalidParticipant { id, participant: l });
          }
          parsed.insert(l, res);
        }
        let mut shares = parsed;

        for (i, our_share) in our_shares.into_iter().enumerate().skip(1) {
          assert!(shares.insert(self.keys[i].params().i(), our_share).is_none());
        }

        let sig = match machine.complete(shares) {
          Ok(res) => res,
          Err(e) => match e {
            FrostError::InternalError(_) |
            FrostError::InvalidParticipant(_, _) |
            FrostError::InvalidSigningSet(_) |
            FrostError::InvalidParticipantQuantity(_, _) |
            FrostError::DuplicatedParticipant(_) |
            FrostError::MissingParticipant(_) => unreachable!(),

            FrostError::InvalidPreprocess(l) | FrostError::InvalidShare(l) => {
              return Some(ProcessorMessage::InvalidParticipant { id, participant: l })
            }
          },
        };

        info!("cosigned {} with attempt #{}", hex::encode(block), id.attempt);

        Completed::set(txn, block, &());

        Some(ProcessorMessage::CosignedBlock {
          block_number: self.block_number,
          block,
          signature: sig.to_bytes().to_vec(),
        })
      }
      CoordinatorMessage::BatchReattempt { .. } => panic!("BatchReattempt passed to Cosigner"),
    }
  }
}
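Both message handlers above parse each participant's bytes through a `&mut &[u8]` reader and then reject the participant if any bytes are left over. The following is a minimal, standalone sketch of that strict-parsing pattern using only the standard library. The `read_share` and `parse_share_exact` helpers and the fixed 32-byte share are assumptions made for illustration; they are not the `frost` crate's API.

```rust
use std::io::Read;

/// Hypothetical reader: consume exactly 32 bytes from `reader`.
fn read_share(reader: &mut impl Read) -> std::io::Result<[u8; 32]> {
  let mut share = [0; 32];
  reader.read_exact(&mut share)?;
  Ok(share)
}

/// Parse a share, rejecting both truncated input and trailing bytes.
fn parse_share_exact(bytes: &[u8]) -> Option<[u8; 32]> {
  // Reading from `&mut &[u8]` advances the slice past the consumed bytes
  let mut bytes_ref = bytes;
  let share = read_share(&mut bytes_ref).ok()?;
  // Anything left over means the sender appended data the parser didn't expect
  if !bytes_ref.is_empty() {
    return None;
  }
  Some(share)
}
```

Checking for leftover bytes means a message has exactly one accepted encoding, so a participant can't pad a valid message with extra data and have it treated as well-formed.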