* Route validators for any active set through sc-authority-discovery
Additionally adds an RPC route to retrieve their P2P addresses.
* Have the coordinator get peers from substrate
* Have the RPC return one address, not up to 3
Prevents the coordinator from believing it has 3 peers when it has one.
* Add missing feature to serai-client
* Correct network argument in serai-client for p2p_validators call
* Add a test in serai-client to check DHT population with a much quicker failure than the coordinator tests
* Update to latest Substrate
Removes distinguishing BABE/AuthorityDiscovery keys which causes
sc_authority_discovery to populate as desired.
* Update to a properly tagged substrate commit
* Add all dialed to peers to GossipSub
* cargo fmt
* Reduce common code in serai-coordinator-tests with a more involved new_test
* Use a recursive async function to spawn `n` DockerTests with the necessary networking configuration
* Merge UNIQUE_ID and ONE_AT_A_TIME
* Tidy up the new recursive code in tests/coordinator
* Use a Mutex in CONTEXT to let it be set multiple times
* Make complementary edits to full-stack tests
* Augment coordinator P2p connection logs
* Drop lock acquisitions before recursing
* Better scope lock acquisitions in full-stack, preventing a deadlock
* Ensure OUTER_OPS is reset across the test boundary
* Add cargo deny allowance for dockertest fork
A slight downscope which helps combat the antipattern that is the futures glob
crate. While futures_util is still a large crate, it has better defaults and
is smaller by virtue of not pulling in the executor.
* Schedule re-attempts and add a (not filled out) match statement to actually execute them
A comment explains the methodology. To copy it here:
"""
This is because we *always* re-attempt any protocol which had participation. That doesn't
mean we *should* re-attempt this protocol.
The alternatives were:
1) Note on-chain we completed a protocol, halting re-attempts upon 34%.
2) Vote on-chain to re-attempt a protocol.
This schema doesn't have any additional messages upon the success case (whereas
alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and
then preprocesses. This only sends preprocesses).
"""
Any signing protocol which reaches sufficient participation will be
re-attempted until it no longer does.
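A minimal sketch of that policy, using hypothetical names (`Topic`, `ReattemptDb`, `REATTEMPT_DELAY_BLOCKS`) rather than the coordinator's actual types: any protocol reaching sufficient participation unconditionally gets a re-attempt scheduled, and the already-succeeded check only happens when the re-attempt comes due.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Topic(u64); // identifies one signing-protocol attempt (hypothetical)

#[derive(Default)]
struct ReattemptDb {
  // block number -> protocols to re-attempt at that block
  scheduled: HashMap<u64, Vec<Topic>>,
  completed: Vec<Topic>,
}

const REATTEMPT_DELAY_BLOCKS: u64 = 5; // placeholder delay

impl ReattemptDb {
  // Called once a protocol reaches 2/3 participation. We always schedule,
  // since without extra messages we can't know whether the attempt succeeded.
  fn schedule(&mut self, current_block: u64, topic: Topic) {
    self.scheduled.entry(current_block + REATTEMPT_DELAY_BLOCKS).or_default().push(topic);
  }

  // Called when a block is handled: fire due re-attempts, skipping protocols
  // which have since completed (the already-succeeded check).
  fn due(&mut self, block: u64) -> Vec<Topic> {
    self
      .scheduled
      .remove(&block)
      .unwrap_or_default()
      .into_iter()
      .filter(|topic| !self.completed.contains(topic))
      .collect()
  }
}
```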
* Have the Substrate scanner track DKG removals/completions for the Tributary code
* Don't keep trying to publish a participant removal if we've already set keys
* Pad out the re-attempt match a bit more
* Have CosignEvaluator reload from the DB
* Correctly schedule cosign re-attempts
* Actually spawn new DKG removal attempts
* Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing
The batch ID was an opaque [u8; 5] which also included the network, yet that's
redundant and unhelpful.
* Clarify a pair of TODOs in the coordinator
* Remove old TODO
* Final comment cleanup
* Correct usage of TARGET_BLOCK_TIME in reattempt scheduler
It's in ms and I assumed it was in s.
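To illustrate the unit fix (the constant's value and the ten-minute target here are placeholders; only the milliseconds-versus-seconds distinction matters):

```rust
const TARGET_BLOCK_TIME: u64 = 6000; // in milliseconds, not seconds

fn reattempt_delay_in_blocks() -> u64 {
  // Wrong: (10 * 60) / TARGET_BLOCK_TIME assumes seconds and truncates to 0.
  // Right: express the desired ten-minute delay in milliseconds as well.
  (10 * 60 * 1000) / TARGET_BLOCK_TIME
}
```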
* Have coordinator tests drop BatchReattempts which aren't relevant yet may exist
* Bug fix and pointless oddity removal
We scheduled a re-attempt upon receiving 2/3rds of the preprocesses and again upon
receiving 2/3rds of the shares, so any signing protocol could cause two re-attempts
(instead of just one).
The coordinator tests randomly generated the Batch ID since it was previously an
opaque byte array. While that didn't break the test, it was pointless and made
the already-succeeded check before re-attempting impossible to hit.
* Add log statements, correct dead-lock in coordinator tests
* Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts
* Further bump timeout by a minute
AFAICT, GH failed by just a few seconds.
This also is worst-case in a single instance, making it fine to be decently long.
* Further further bump timeout due to lack of distinct error
* Use debug builds in our Dockerfiles to reduce CI times
Also enables only spawning the mdns service when debug in the coordinator.
* Correct underflow in processor
Previously undetected due to release builds not having bounds checks enabled.
* Restore Serai release due to CI/RPC failures caused by compiling it in debug mode
This is *probably* worth an issue filed upstream, if it can be tracked down.
* Correct failing debug asserts in Monero
These debug asserts assumed there was a change address to take the remainder.
If there's no change address, the remainder is shunted to the fee, causing the
fee to be distinct from the estimate.
We presumably need to modify monero-serai such that change: None isn't valid,
and users must use Change::Fingerprintable(None).
The coordinator already had one of these, albeit implemented much worse than
the one now properly introduced. It had to either be sending or receiving,
whereas the new one can do both at the same time.
This replaces said instance and enables pleasant patterns when implementing the
processor/coordinator.
* Add a function to deterministically decide which Serai blocks should be co-signed
Has a 5 minute latency between co-signs, also used as the maximal latency
before a co-sign is started.
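A minimal sketch of such a deterministic rule, assuming (purely for illustration) a 6-second block time; the constants and function name are hypothetical, not the actual coordinator code. Because the decision depends only on the block number, every validator independently selects the same blocks to cosign.

```rust
const BLOCK_TIME_SECS: u64 = 6; // placeholder
const COSIGN_LATENCY_SECS: u64 = 5 * 60;
const COSIGN_DISTANCE_BLOCKS: u64 = COSIGN_LATENCY_SECS / BLOCK_TIME_SECS;

// Purely a function of the block number, so all validators agree on which
// blocks to cosign without any coordination.
fn should_cosign(block_number: u64) -> bool {
  block_number % COSIGN_DISTANCE_BLOCKS == 0
}
```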
* Get all active tributaries we're in at a specific block
* Add and route CosignSubstrateBlock, a new provided TX
* Split queued cosigns per network
* Rename BatchSignId to SubstrateSignId
* Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it
* Handle the CosignSubstrateBlock provided TX
* Revert substrate_signer.rs to develop (and patch to still work)
Due to SubstrateSigner moving when the prior multisig closes, yet cosigning
occurring with the most recent key, a single SubstrateSigner can be reused.
We could manage multiple SubstrateSigners, yet considering the much lower
specifications for cosigning, I'd rather treat it distinctly.
* Route cosigning through the processor
* Add note to rename SubstrateSigner post-PR
I don't want to do so now in order to preserve the diff's clarity.
* Implement cosign evaluation into the coordinator
* Get tests to compile
* Bug fixes, mark blocks without cosigners available as cosigned
* Correct the ID Batch preprocesses are saved under, add log statements
* Create a dedicated function to handle cosigns
* Correct the flow around Batch verification/queueing
Verifying `Batch`s could stall when a `Batch` was signed before its
predecessors/before the block it's contained in was cosigned (the latter being
inevitable as we can't sign a block containing a signed batch before signing
the batch).
Now, Batch verification happens on a distinct async task in order to not block
the handling of processor messages. This task is the sole caller of verify in
order to ensure last_verified_batch isn't unexpectedly mutated.
When the processor message handler needs to access it, or needs to queue a
Batch, it associates the DB TXN with a lock preventing the other task from
doing so.
This lock, as currently implemented, is a poor and inefficient design. It
should be modified to the pattern used for cosign management. Additionally, a
new primitive of a DB-backed channel may be immensely valuable.
Fixes a standing potential deadlock and a deadlock introduced with the
cosigning protocol.
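A rough sketch of the lock described above, assuming tokio and hypothetical names (`BatchVerifierState`, the commented-out DB TXN calls): the dedicated task is the sole place last_verified_batch advances, and the processor-message handler holds the guard for the lifetime of its transaction.

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

struct BatchVerifierState {
  last_verified_batch: Option<u32>,
}

async fn verification_task(state: Arc<Mutex<BatchVerifierState>>) {
  loop {
    {
      // Sole caller of verification, so last_verified_batch only moves here.
      let mut state = state.lock().await;
      // ... verify any newly available Batches, then bump the counter ...
      state.last_verified_batch = state.last_verified_batch.map(|id| id + 1);
    }
    tokio::time::sleep(std::time::Duration::from_secs(5)).await;
  }
}

async fn handle_processor_message(state: Arc<Mutex<BatchVerifierState>>) {
  // Take the lock before opening the DB TXN and hold it until the TXN commits,
  // so the verification task can't race this write.
  let state = state.lock().await;
  // let mut txn = db.txn();
  // ... queue the Batch, read state.last_verified_batch as needed ...
  // txn.commit();
  drop(state);
}
```

As noted, this is coarse; the cosign-management pattern or a DB-backed channel would be the cleaner replacement.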
* Working full-stack tests
After the last commit, this only required extending a timeout.
* Replace "co-sign" with "cosign" to make finding text easier
* Update the coordinator tests to support cosigning
* Inline prior_batch calculation to prevent panic on rotation
Noticed when doing a final review of the branch.
Adds Event::SetRetired to validator-sets.
Emit TributaryRetired.
Replaces is_active_set, which made multiple network requests, with
is_retired_tributary, a DB read.
Performs most of the removals necessary upon TributaryRetired.
Still needs to clean up the actual Tributary/Tendermint tasks.
The Heartbeat was meant to serve for this, yet no Heartbeats are fired when we
don't have active tributaries.
libp2p does offer an explicit KeepAlive protocol, yet it's not recommended in
prod. While this likely has the same pitfalls as libp2p's KeepAlive protocol,
it's at least tailored to our timing.
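The idea, sketched minimally (the interval and the `broadcast` callback are placeholders, not the actual P2P API): periodically send a small application-level message so connections stay warm on our own schedule.

```rust
use core::time::Duration;

const KEEP_ALIVE_INTERVAL: Duration = Duration::from_secs(60); // placeholder

async fn keep_alive_task(broadcast: impl Fn(Vec<u8>)) {
  loop {
    tokio::time::sleep(KEEP_ALIVE_INTERVAL).await;
    // An empty/sentinel message is enough to keep the connection active.
    broadcast(vec![]);
  }
}
```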
It's largely unoptimized, and not yet exclusive to validators, yet has basic
sanity (using message content for ID instead of sender + index).
Fixes bugs as found. Notably, we used a time in milliseconds where the
Tributary expected seconds.
Also has Tributary::new jump to the presumed round number. This reduces slashes
when starting new chains (whose start times will be before the current time) and
was the only way I was able to observe successful confirmations given the current
surrounding infrastructure.
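A sketch of the idea, assuming for illustration a fixed per-round duration (the real Tendermint timing may differ): derive the round from how far in the past the chain's start time is, instead of starting at round 0 and slashing every proposer along the way.

```rust
use core::time::Duration;
use std::time::{SystemTime, UNIX_EPOCH};

const ROUND_DURATION: Duration = Duration::from_secs(30); // placeholder

fn presumed_round(chain_start_unix_secs: u64) -> u32 {
  let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
  // Started in the past => skip ahead instead of replaying every missed round.
  let elapsed = now.saturating_sub(chain_start_unix_secs);
  u32::try_from(elapsed / ROUND_DURATION.as_secs()).unwrap_or(u32::MAX)
}
```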
This defines the start of a very complex series of locks I'm really unhappy
with. At the same time, there's not immediately a better solution. This also
should work without issue.
Impls a LocalP2p for testing.
Moves rebroadcasting into Tendermint, since it's what knows if a message is
fully valid + original.
Removes TributarySpec::validators() HashMap, as its non-determinism caused
different instances to have different round-robin schedules. It had already
been moved to a Vec for this issue, so I'm unsure why this remnant existed.
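Why the HashMap had to go, in a simplified sketch (types are illustrative): iteration order over a HashMap isn't specified and its hasher is randomly seeded, so two honest nodes could derive different proposer schedules from the same validator set, whereas a Vec gives every node the same canonical order.

```rust
use std::collections::HashMap;

type Validator = [u8; 32];

// Non-deterministic: the order validators come out of the map can differ per
// process, so the proposer chosen for a given round can differ across nodes.
fn schedule_from_map(validators: &HashMap<Validator, u64>, round: usize) -> Validator {
  *validators.keys().nth(round % validators.len()).unwrap()
}

// Deterministic: every node holds the same Vec in the same order, so every
// node agrees on the proposer for every round.
fn schedule_from_vec(validators: &[Validator], round: usize) -> Validator {
  validators[round % validators.len()]
}
```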
Also renames the GH no-std workflow from the prior commit.