serai

mirror of https://github.com/serai-dex/serai.git synced 2024-12-23 03:59:22 +00:00

Author	SHA1	Message	Date
Luke Parker	af9b1ad5f9	Initial pruning of backlogged consensus messages	2024-03-22 23:18:53 -04:00
Luke Parker	2f07d04d88	Extend timeout for rebroadcast of consensus messages in coordinator	2024-03-22 16:06:31 -04:00
Luke Parker	0889627e60	Typo fix for prior commit	2024-03-11 02:20:51 -04:00
Luke Parker	ace41c79fd	Tidy the BlockHasEvents cache	2024-03-11 01:44:00 -04:00
Luke Parker	f7d16b3fc5	Fix 0 - 1 which caused a panic	2024-03-09 05:37:41 -05:00
Luke Parker	6374d9987e	Correct how we save the block to scan from	2024-03-09 03:48:44 -05:00
Luke Parker	c93f6bf901	Replace yield_now with sleep 100 to prevent hammering a task, despite still being over-eager	2024-03-09 03:34:31 -05:00
Luke Parker	61a81e53e1	Further optimize cosign DB	2024-03-09 03:31:06 -05:00
Luke Parker	89b237af7e	Correct the return value of block_has_events	2024-03-09 02:44:04 -05:00
Luke Parker	2347bf5fd3	Bound cosign work and ensure it progresses forward even when cosigns don't occur Should resolve the DB load observed on testnet.	2024-03-09 02:20:23 -05:00
Luke Parker	454bebaa77	Have the TendermintMachine domain-separate by genesis Enables support for multiple machines over the same DB.	2024-03-08 01:22:02 -05:00
Luke Parker	e266bc2e32	Stop validators from equivocating on reboot Part of https://github.com/serai-dex/serai/issues/345. The lack of full DB persistence does mean enough nodes rebooting at the same time may cause a halt. This will prevent slashes.	2024-03-07 22:56:35 -05:00
Luke Parker	f0694172ef	Fix potential generation of invalid SignData in shim	2024-02-09 02:52:08 -05:00
akildemir	347d4cf413	Fix tendermint distinct precommit bug (#517 ) * fix tendermint distinct precommit bug * remove conflicting precommit error	2024-02-08 13:47:37 -05:00
akildemir	ad0ecc5185	complete various todos in tributary (#520 ) * complete various todos * fix pr comments * Document bounds on unique hashes in TransactionKind --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>	2024-02-05 03:50:55 -05:00
Luke Parker	4913873b10	Slash reports (#523 ) * report_slashes plumbing in Substrate Notably delays the SetRetired event until it provides a slash report or the set after it becomes the set to report its slashes. * Add dedicated AcceptedHandover event * Add SlashReport TX to Tributary * Create SlashReport TXs * Handle SlashReport TXs * Add logic to generate a SlashReport to the coordinator * Route SlashReportSigner into the processor * Finish routing the SlashReport signing/TX publication * Add serai feature to processor's serai-client	2024-01-29 03:48:53 -05:00
Luke Parker	f3429ec1ef	Inside publish (for a Serai transaction from the coordinator), use RetiredDb over latest session Not only is this more performant, the definition of retired won't be if a newer session is active. It will be if the session has posted a slash report or the stake for that session has unlocked. Initial commit towards implementing SlashReports.	2024-01-05 23:40:15 -05:00
Luke Parker	7eb388e546	PR to track down CI failures (#501 ) * Use an extended timeout for DKGs specifically * Add a log statement when message-queue connection fails * Add a 60 second keep-alive to connections * Use zalloc for processor/message-queue/coordinator An additional layer which protects us against edge cases with Zeroizing (objects which don't support it or don't miss it). * Add further logs to message-queue * Further increase re-attempt timeouts in CI * Remove misplaced continue in message-queue client Fixes observed CI failures. * Revert "Further increase re-attempt timeouts in CI" This reverts commit `3723530cf6`.	2024-01-04 01:08:13 -05:00
Luke Parker	02776c54a8	Increase reattempt delays in the GH CI, which is extremely latent	2023-12-30 22:11:04 -05:00
Luke Parker	ec8dfd4639	Correct SignData serialization test from creating 256 signers of data This overflows the u8 allowed and caused a CI failure. The actual code/assumption is fine.	2023-12-30 19:08:29 -05:00
Luke Parker	b493e3e31f	Validator DHT (#494 ) * Route validators for any active set through sc-authority-discovery Additionally adds an RPC route to retrieve their P2P addresses. * Have the coordinator get peers from substrate * Have the RPC return one address, not up to 3 Prevents the coordinator from believing it has 3 peers when it has one. * Add missing feature to serai-client * Correct network argument in serai-client for p2p_validators call * Add a test in serai-client to check DHT population with a much quicker failure than the coordinator tests * Update to latest Substrate Removes distinguishing BABE/AuthorityDiscovery keys which causes sc_authority_discovery to populate as desired. * Update to a properly tagged substrate commit * Add all dialed to peers to GossipSub * cargo fmt * Reduce common code in serai-coordinator-tests with a more involved new_test * Use a recursive async function to spawn `n` DockerTests with the necessary networking configuration * Merge UNIQUE_ID and ONE_AT_A_TIME * Tidy up the new recursive code in tests/coordinator * Use a Mutex in CONTEXT to let it be set multiple times * Make complementary edits to full-stack tests * Augment coordinator P2p connection logs * Drop lock acquisitions before recursing * Better scope lock acquisitions in full-stack, preventing a deadlock * Ensure OUTER_OPS is reset across the test boundary * Add cargo deny allowance for dockertest fork	2023-12-22 21:09:18 -05:00
Luke Parker	00774c29d7	Replace remaining direct uses of futures with futures_util Slight downscope which helps combat the antipattern which is the futures glob crate. While futures_util is still a large crate, it has better defaults and is smaller by virtue of not pulling the executor.	2023-12-18 19:45:08 -05:00
Luke Parker	a4c82632fb	Use pub(crate) for create_db items, not pub	2023-12-18 17:15:02 -05:00
Luke Parker	c8747e23c5	Remove offline participants from future DKG protocols so long as the threshold is met Makes RemoveParticipantDueToDkg a voted-on event instead of a Provided. This removes the requirement for offline parties to be able to fully validate blame, yet unfortunately lets a dishonest supermajority have an honest node label any arbitrary node as dishonest. Corrects a variety of `.i(...)` calls which panicked when they shouldn't have. Cleans up a couple no-longer-used storage values.	2023-12-18 17:14:51 -05:00
Luke Parker	c2fffb9887	Correct a couple years of accumulated typos	2023-12-17 02:06:51 -05:00
Luke Parker	065d314e2a	Further expand clippy workspace lints Achieves a notable amount of reduced async and clones.	2023-12-17 00:04:49 -05:00
Luke Parker	ea3af28139	Add workspace lints	2023-12-17 00:04:47 -05:00
Luke Parker	2532423d42	Remove the RemoveParticipant protocol for having new DKGs specify the participants which were removed Obvious code cleanup is obvious.	2023-12-14 23:51:57 -05:00
Luke Parker	b60e3c2524	Replace PSTTrait and PstTxType with PublishSeraiTransaction	2023-12-14 16:06:08 -05:00
Luke Parker	77edd00725	Handle the combination of DKG removals with re-attempts With a DKG removal comes a reduction in the amount of participants which was ignored by re-attempts. Now, we determine n/i based on the parties removed, and deterministically obtain the context of who was removed.	2023-12-13 14:03:07 -05:00
Luke Parker	6a172825aa	Reattempts (#483 ) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we always re-attempt any protocol which had participation. That doesn't mean we should re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actually spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. 
* Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error	2023-12-12 12:28:53 -05:00
Luke Parker	11fdb6da1d	Coordinator Cleanup (#481 ) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. 
This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize internally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs	2023-12-10 20:21:44 -05:00
Luke Parker	6caf45ea1d	Downscope usage of futures	2023-12-10 19:32:52 -05:00
Luke Parker	7122e0faf4	Cache the block's events within TemporalSerai Event retrieval was prior: - Retrieve all events in the block, which may be hundreds of KB - Filter to just a few Since it's frequent to want multiple sets of events, each filtered in their own way, this caused the retrieval to happen multiple times. Now, it only will happen once. Also has the scoped clients take a reference, not an owned TemporalSerai.	2023-12-08 10:46:10 -05:00
David Bell	16b22dd105	Convert coordinator/substrate/db to use create_db macro (#436 ) * chore: implement create_db for substrate (fix broken branch) * Correct rebase artifacts * chore: remove todo statement * chore: rename BlockDb to NextBlock * chore: return empty tuple instead of empty array for event storage * Finish rebasing * Minor tweaks to remove leftover variables These may be rebase artifacts. --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>	2023-12-08 05:12:16 -05:00
Luke Parker	5c047ebe74	Log the reason for yielding BlockError::Fatal to Tendermint from the Tributary	2023-12-07 09:30:25 -05:00
econsta	91a024e119	coordinator/src/db.rs db macro implementation (#431 ) * coordinator/src/db.rs db macro implementation * fixed fmt errors * converted txn functions to get/set counterparts * use take_signed_transaction function * fix for two of the tests * Misc tweaks * Minor tweaks --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>	2023-12-07 09:30:11 -05:00
Luke Parker	c511a54d18	Move serai-client off serai-runtime, MIT licensing it Uses a full-fledged serai-abi to do so. Removes use of UncheckedExtrinsic as a pointlessly (for us) length-prefixed block with a more complicated signing algorithm than advantageous. In the future, we should consider consolidating the various primitives crates. I'm not convinced we benefit from one primitives crate per pallet.	2023-12-07 02:30:09 -05:00
Luke Parker	797ed49e7b	DKG Removals (#467 ) * Update ValidatorSets with a remove_participant call * Add DkgRemoval, a sign machine for producing the relevant MuSig signatures * Don't use position-dependent u8s yet Public when removing validators from the DKG * Add DkgRemovalPreprocess, DkgRemovalShares Implementation is via a new publish_tributary_tx lambda. This code is a copy-pasted mess which will need to be cleaned up. * Only allow non-removed validators to vote for removals Otherwise, it's risked that the remaining validators fall below 67% of the original set. * Correct publish_serai_tx, which was prior publish_set_keys in practice	2023-12-04 07:04:44 -05:00
Luke Parker	6e8a5f9cb1	cargo update, remove unneeded dependencies from the processor	2023-12-03 00:05:03 -05:00
Luke Parker	4446a369b1	Remove old TODO	2023-12-03 00:04:58 -05:00
Luke Parker	ce038972df	Use as_slice instead of as_ref I don't know why this didn't trigger for me. Potentially a difference in the month of the nightly clippy?	2023-12-03 00:04:58 -05:00
Luke Parker	2f6fb93f87	Bridge the gap between the prior two commits	2023-12-03 00:04:58 -05:00
Luke Parker	1e6cb8044c	Domain Separate the coordinator's tributary transaction hashes	2023-12-03 00:04:58 -05:00
Luke Parker	1ca66b846a	Use multiple nonces in the Tributary	2023-12-03 00:04:58 -05:00
Luke Parker	c82d1283af	Move the coordinator to expect multiple nonces This mirrors how Provided TXs handle topics. Now, instead of managing a global nonce stream, we can use items such as plan IDs as topics. This massively benefits re-attempts, as else we'd need a NOP TX to clear unused nonces.	2023-12-03 00:04:58 -05:00
Luke Parker	b823413c9b	Use parity-db in current Dockerfiles (#455 ) * Use redb and in Dockerfiles The motivation for redb was to remove the multiple rocksdb compile times from CI. * Correct feature flagging of coordinator and message-queue in Dockerfiles * Correct message-queue DB type alias * Use consistent table typing in redb * Correct rebase artifacts * Correct removal of binaries feature from message-queue * Correct processor feature flagging * Replace redb with parity-db It still has much better compile times yet doesn't block when creating multiple transactions. It also is actively maintained and doesn't grow our tree. The MPT aspects are irrelevant. * Correct stray Redb * clippy warning * Correct txn get	2023-11-30 04:22:37 -05:00
Luke Parker	f0ff3a18d2	Use debug builds in our Dockerfiles to reduce CI times (#462 ) * Use debug builds in our Dockerfiles to reduce CI times Also enables only spawning the mdns service when debug in the coordinator. * Correct underflow in processor Prior undetected due to release builds not having bounds checks enabled. * Restore Serai release due to CI/RPC failures caused by compiling it in debug mode This is probably worth an issue filed upstream, if it can be tracked down. * Correct failing debug asserts in Monero These debug asserts assumed there was a change address to take the remainder. If there's no change address, the remainder is shunted to the fee, causing the fee to be distinct from the estimate. We presumably need to modify monero-serai such that change: None isn't valid, and users must use Change::Fingerprintable(None).	2023-11-29 00:24:37 -05:00
Luke Parker	695d1f0ecf	Remove subxt (#460 ) * Remove subxt Removes ~20 crates from our Cargo.lock. Removes downloading the metadata and enables removing the getMetadata RPC route (relevant to #379). Moves forward #337. Done now due to distinctions in the subxt 0.32 API surface which make it justifiable to not update. * fmt, update due to deny triggering on a yanked crate * Correct the handling of substrate_block_notifier now that it's ephemeral, not long-lived * Correct URL in tests/coordinator from ws to http	2023-11-28 02:29:50 -05:00
Luke Parker	571195bfda	Resolve #360 (#456 ) * Remove NetworkId from processor-messages Because intent binds to the sender/receiver, it's not needed for intent. The processor knows what the network is. The coordinator knows which to use because it's sending this message to the processor for that network. Also removes the unused zeroize. * ProcessorMessage::Completed use Session instead of key * Move SubstrateSignId to Session * Finish replacing key with session	2023-11-26 12:14:23 -05:00

290 commits