* Upstream GBP, divisor, circuit abstraction, and EC gadgets from FCMP++
* Initial eVRF implementation
Not quite done yet. It needs to communicate the resulting points and proofs to
extract them from the Pedersen Commitments in order to return those, and then
be tested.
* Add the openings of the PCs to the eVRF as necessary
* Add implementation of secq256k1
* Make DKG Encryption a bit more flexible
No longer requires the use of an EncryptionKeyMessage, and allows pre-defined
keys for encryption.
* Make NUM_BITS an argument for the field macro
* Have the eVRF take a Zeroizing private key
* Initial eVRF-based DKG
* Add embedwards25519 curve
* Inline the eVRF into the DKG library
Due to how we're handling share encryption, we'd either need two circuits or to
dedicate this circuit to the DKG. The latter makes sense at this time.
* Add documentation to the eVRF-based DKG
* Add paragraph claiming robustness
* Update to the new eVRF proof
* Finish routing the eVRF functionality
Still needs errors and serialization, along with a few other TODOs.
* Add initial eVRF DKG test
* Improve eVRF DKG
Updates how we calculcate verification shares, improves performance when
extracting multiple sets of keys, and adds more to the test for it.
* Start using a proper error for the eVRF DKG
* Resolve various TODOs
Supports recovering multiple key shares from the eVRF DKG.
Inlines two loops to save 2**16 iterations.
Adds support for creating a constant time representation of scalars < NUM_BITS.
* Ban zero ECDH keys, document non-zero requirements
* Implement eVRF traits, all the way up to the DKG, for secp256k1/ed25519
* Add Ristretto eVRF trait impls
* Support participating multiple times in the eVRF DKG
* Only participate once per key, not once per key share
* Rewrite processor key-gen around the eVRF DKG
Still a WIP.
* Finish routing the new key gen in the processor
Doesn't touch the tests, coordinator, nor Substrate yet.
`cargo +nightly fmt && cargo +nightly-2024-07-01 clippy --all-features -p serai-processor`
does pass.
* Deduplicate and better document in processor key_gen
* Update serai-processor tests to the new key gen
* Correct amount of yx coefficients, get processor key gen test to pass
* Add embedded elliptic curve keys to Substrate
* Update processor key gen tests to the eVRF DKG
* Have set_keys take signature_participants, not removed_participants
Now no one is removed from the DKG. Only `t` people publish the key however.
Uses a BitVec for an efficient encoding of the participants.
* Update the coordinator binary for the new DKG
This does not yet update any tests.
* Add sensible Debug to key_gen::[Processor, Coordinator]Message
* Have the DKG explicitly declare how to interpolate its shares
Removes the hack for MuSig where we multiply keys by the inverse of their
lagrange interpolation factor.
* Replace Interpolation::None with Interpolation::Constant
Allows the MuSig DKG to keep the secret share as the original private key,
enabling deriving FROST nonces consistently regardless of the MuSig context.
* Get coordinator tests to pass
* Update spec to the new DKG
* Get clippy to pass across the repo
* cargo machete
* Add an extra sleep to ensure expected ordering of `Participation`s
* Update orchestration
* Remove bad panic in coordinator
It expected ConfirmationShare to be n-of-n, not t-of-n.
* Improve documentation on functions
* Update TX size limit
We now no longer have to support the ridiculous case of having 49 DKG
participations within a 101-of-150 DKG. It does remain quite high due to
needing to _sign_ so many times. It'd may be optimal for parties with multiple
key shares to independently send their preprocesses/shares (despite the
overhead that'll cause with signatures and the transaction structure).
* Correct error in the Processor spec document
* Update a few comments in the validator-sets pallet
* Send/Recv Participation one at a time
Sending all, then attempting to receive all in an expected order, wasn't working
even with notable delays between sending messages. This points to the mempool
not working as expected...
* Correct ThresholdKeys serialization in modular-frost test
* Updating existing TX size limit test for the new DKG parameters
* Increase time allowed for the DKG on the GH CI
* Correct construction of signature_participants in serai-client tests
Fault identified by akil.
* Further contextualize DkgConfirmer by ValidatorSet
Caught by a safety check we wouldn't reuse preprocesses across messages. That
raises the question of we were prior reusing preprocesses (reusing keys)?
Except that'd have caused a variety of signing failures (suggesting we had some
staggered timing avoiding it in practice but yes, this was possible in theory).
* Add necessary calls to set_embedded_elliptic_curve_key in coordinator set rotation tests
* Correct shimmed setting of a secq256k1 key
* cargo fmt
* Don't use `[0; 32]` for the embedded keys in the coordinator rotation test
The key_gen function expects the random values already decided.
* Big-endian secq256k1 scalars
Also restores the prior, safer, Encryption::register function.
* Use an extended timeout for DKGs specifically
* Add a log statement when message-queue connection fails
* Add a 60 second keep-alive to connections
* Use zalloc for processor/message-queue/coordinator
An additional layer which protects us against edge cases with Zeroizing
(objects which don't support it or don't miss it).
* Add further logs to message-queue
* Further increase re-attempt timeouts in CI
* Remove misplaced continue inmessage-queue client
Fixes observed CI failures.
* Revert "Further increase re-attempt timeouts in CI"
This reverts commit 3723530cf6.
* Route validators for any active set through sc-authority-discovery
Additionally adds an RPC route to retrieve their P2P addresses.
* Have the coordinator get peers from substrate
* Have the RPC return one address, not up to 3
Prevents the coordinator from believing it has 3 peers when it has one.
* Add missing feature to serai-client
* Correct network argument in serai-client for p2p_validators call
* Add a test in serai-client to check DHT population with a much quicker failure than the coordinator tests
* Update to latest Substrate
Removes distinguishing BABE/AuthorityDiscovery keys which causes
sc_authority_discovery to populate as desired.
* Update to a properly tagged substrate commit
* Add all dialed to peers to GossipSub
* cargo fmt
* Reduce common code in serai-coordinator-tests with amore involved new_test
* Use a recursive async function to spawn `n` DockerTests with the necessary networking configuration
* Merge UNIQUE_ID and ONE_AT_A_TIME
* Tidy up the new recursive code in tests/coordinator
* Use a Mutex in CONTEXT to let it be set multiple times
* Make complimentary edits to full-stack tests
* Augment coordinator P2p connection logs
* Drop lock acquisitions before recursing
* Better scope lock acquisitions in full-stack, preventing a deadlock
* Ensure OUTER_OPS is reset across the test boundary
* Add cargo deny allowance for dockertest fork
Slight downscope which helps combat the antipattern which is the futures glob
crate. While futures_util is still a large crate, it has better defaults and
is smaller by virtue of not pulling the executor.
* Move logic for evaluating if a cosign should occur to its own file
Cleans it up and makes it more robust.
* Have expected_next_batch return an error instead of retrying
While convenient to offer an error-free implementation, it potentially caused
very long lived lock acquisitions in handle_processor_message.
* Unify and clean DkgConfirmer and DkgRemoval
Does so via adding a new file for the common code, SigningProtocol.
Modifies from_cache to return the preprocess with the machine, as there's no
reason not to. Also removes an unused Result around the type.
Clarifies the security around deterministic nonces, removing them for
saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the
DB is not a proper secret store.
Moves arguments always present in the protocol from function arguments into the
struct itself.
Removes the horribly ugly code in DkgRemoval, fixing multiple issues present
with it which would cause it to fail on use.
* Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol
* Remove unnecessary Clone from lambdas in coordinator
* Remove the EventDb from Tributary scanner
We used per-Transaction DB TXNs so on error, we don't have to rescan the entire
block yet only the rest of it. We prevented scanning multiple transactions by
tracking which we already had.
This is over-engineered and not worth it.
* Implement borsh for HasEvents, removing the manual encoding
* Merge DkgConfirmer and DkgRemoval into signing_protocol.rs
Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes
if any validator had multiple key shares.
* Strictly type DataSpecification's Label
* Correct threshold_i_map_to_keys_and_musig_i_map
It didn't include the participant's own index and accordingly was offset.
* Create TributaryBlockHandler
This struct contains all variables prior passed to handle_block and stops them
from being passed around again and again.
This also ensures fatal_slash is only called while handling a block, as needed
as it expects to operate under perfect consensus.
* Inline accumulate, store confirmation nonces with shares
Inlining accumulate makes sense due to the amount of data accumulate needed to
be passed.
Storing confirmation nonces with shares ensures that both are available or
neither. Prior, one could be yet the other may not have been (requiring an
assert in runtime to ensure we didn't bungle it somehow).
* Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs
* Move Label into SignData
All of our transactions which use SignData end up with the same common usage
pattern for Label, justifying this.
Removes 3 transactions, explicitly de-duplicating their handlers.
* Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair
* Remove the manual read/write for TributarySpec for borsh
This struct doesn't have any optimizations booned by the manual impl. Using
borsh reduces our scope.
* Use temporary variables to further minimize LoC in tributary handler
* Remove usage of tuples for non-trivial Tributary transactions
* Remove serde from dkg
serde could be used to deserialize intenrally inconsistent objects which could
lead to panics or faults.
The BorshDeserialize derives have been replaced with a manual implementation
which won't produce inconsistent objects.
* Abstract Future generics using new trait definitions in coordinator
* Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs
* Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
* Use redb and in Dockerfiles
The motivation for redb was to remove the multiple rocksdb compile times from
CI.
* Correct feature flagging of coordinator and message-queue in Dockerfiles
* Correct message-queue DB type alias
* Use consistent table typing in redb
* Correct rebase artifacts
* Correct removal of binaries feature from message-queue
* Correct processor feature flagging
* Replace redb with parity-db
It still has much better compile times yet doesn't block when creating multiple
transactions. It also is actively maintained and doesn't grow our tree. The MPT
aspects are irrelevant.
* Correct stray Redb
* clippy warning
* Correct txn get
* Add SignalsConfig to chain_spec
* Correct multiexp feature flagging for rand_core std
* Remove bincode for borsh
Replaces a non-canonical encoding with a canonical encoding which additionally
should be faster.
Also fixes an issue where we used bincode in transcripts where it cannot be
trusted.
This ended up fixing a myriad of other bugs observed, unfortunately.
Accordingly, it either has to be merged or the bug fixes from it must be ported
to a new PR.
* Make serde optional, minimize usage
* Make borsh an optional dependency of substrate/ crates
* Remove unused dependencies
* Use [u8; 64] where possible in the processor messages
* Correct borsh feature flagging
* Add a function to deterministically decide which Serai blocks should be co-signed
Has a 5 minute latency between co-signs, also used as the maximal latency
before a co-sign is started.
* Get all active tributaries we're in at a specific block
* Add and route CosignSubstrateBlock, a new provided TX
* Split queued cosigns per network
* Rename BatchSignId to SubstrateSignId
* Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it
* Handle the CosignSubstrateBlock provided TX
* Revert substrate_signer.rs to develop (and patch to still work)
Due to SubstrateSigner moving when the prior multisig closes, yet cosigning
occurring with the most recent key, a single SubstrateSigner can be reused.
We could manage multiple SubstrateSigners, yet considering the much lower
specifications for cosigning, I'd rather treat it distinctly.
* Route cosigning through the processor
* Add note to rename SubstrateSigner post-PR
I don't want to do so now in order to preserve the diff's clarity.
* Implement cosign evaluation into the coordinator
* Get tests to compile
* Bug fixes, mark blocks without cosigners available as cosigned
* Correct the ID Batch preprocesses are saved under, add log statements
* Create a dedicated function to handle cosigns
* Correct the flow around Batch verification/queueing
Verifying `Batch`s could stall when a `Batch` was signed before its
predecessors/before the block it's contained in was cosigned (the latter being
inevitable as we can't sign a block containing a signed batch before signing
the batch).
Now, Batch verification happens on a distinct async task in order to not block
the handling of processor messages. This task is the sole caller of verify in
order to ensure last_verified_batch isn't unexpectedly mutated.
When the processor message handler needs to access it, or needs to queue a
Batch, it associates the DB TXN with a lock preventing the other task from
doing so.
This lock, as currently implemented, is a poor and inefficient design. It
should be modified to the pattern used for cosign management. Additionally, a
new primitive of a DB-backed channel may be immensely valuable.
Fixes a standing potential deadlock and a deadlock introduced with the
cosigning protocol.
* Working full-stack tests
After the last commit, this only required extending a timeout.
* Replace "co-sign" with "cosign" to make finding text easier
* Update the coordinator tests to support cosigning
* Inline prior_batch calculation to prevent panic on rotation
Noticed when doing a final review of the branch.
* chore: convert nonce_deicer to use create_db macro
* Restore pub NonceDecider
* Remove extraneous comma
I forgot to run git commit --amend on the prior commit :/
---------
Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
If a crate has std set, it should enable std for all dependencies in order to
let them properly select which algorithms to use. Some crates fallback to
slower/worse algorithms on no-std.
Also more aggressively sets default-features = false leading to a *10%*
reduction in the amount of crates coordinator builds.
It's largely unoptimized, and not yet exclusive to validators, yet has basic
sanity (using message content for ID instead of sender + index).
Fixes bugs as found. Notably, we used a time in milliseconds where the
Tributary expected seconds.
Also has Tributary::new jump to the presumed round number. This reduces slashes
when starting new chains (whose times will be before the current time) and was
the only way I was able to observe successful confirmations given current
surrounding infrastructure.
This defines the tart of a very complex series of locks I'm really unhappy
with. At the same time, there's not immediately a better solution. This also
should work without issue.
Removes last_block as an argument from Tendermint. It now loads from the DB as
needed. While slightly less performant, it's easiest and should be fine.
Impls a LocalP2p for testing.
Moves rebroadcasting into Tendermint, since it's what knows if a message is
fully valid + original.
Removes TributarySpec::validators() HashMap, as its non-determinism caused
different instances to have different round robin schedules. It was already
prior moved to a Vec for this issue, so I'm unsure why this remnant existed.
Also renames the GH no-std workflow from the prior commit.
It originally wasn't an enum so software which had yet to update before an
integration wouldn't error (as now enums are strictly typed). The strict typing
is preferable though.