Commit graph

317 commits

Author SHA1 Message Date
Luke Parker
93ba8d840a
Remove cbor 2024-04-23 09:31:33 -04:00
Luke Parker
485e454680
Inline broadcast_raw now that it doesn't have multiple callers 2024-04-23 09:31:17 -04:00
Luke Parker
c3b6abf020
Properly diversify ReqResMessageKind/GossipMessageKind 2024-04-23 09:31:09 -04:00
Luke Parker
f3ccf1cab0
Move keep alive, heartbeat, block to request/response 2024-04-23 09:30:58 -04:00
Luke Parker
0deee0ec6b
Line for prior commit 2024-04-21 08:55:50 -04:00
Luke Parker
6b428948d4
Comment the insanely aggressive timeout future trace log 2024-04-21 08:55:32 -04:00
Luke Parker
6986257d4f
Add missing continue to prevent dialing a node we're connected to 2024-04-21 08:37:06 -04:00
Luke Parker
a3c37cba21
Replace expect with debug log 2024-04-21 08:03:01 -04:00
Luke Parker
b5f2ff1397
Correct boolean NOT on is_fresh_dial 2024-04-21 07:30:18 -04:00
Luke Parker
c84931c6ae
Retry if initial dials fail, not just upon disconnect 2024-04-21 07:26:29 -04:00
Luke Parker
63abf2d022
Restart coordinator peer finding upon disconnections 2024-04-21 07:03:03 -04:00
Luke Parker
a62d2d05ad
Correct log which didn't work as intended 2024-04-20 19:55:17 -04:00
Luke Parker
967cc16748
Correct log targets in tendermint-machine 2024-04-20 19:55:06 -04:00
Luke Parker
ab4b8cc2d5
Better logs in tendermint-machine 2024-04-20 18:13:57 -04:00
Luke Parker
387ccbad3a
Extend time in sync test 2024-04-18 16:39:16 -04:00
Luke Parker
26cdfdd824
fmt 2024-04-18 16:39:03 -04:00
Luke Parker
68e77384ac
Don't broadcast added blocks
Online validators should inherently have them. Offline validators will receive
them from the sync protocol.

This does somewhat eliminate the class of nodes who would follow the blockchain
(without validating it), yet that's fine for the performance benefit.
2024-04-18 16:38:52 -04:00
Luke Parker
68da88c1f3
Only reply to heartbeats after a certain distance 2024-04-18 16:38:43 -04:00
Luke Parker
2b481ab71e
Ensure we don't reply to stale heartbeats 2024-04-18 16:38:21 -04:00
Luke Parker
05e6d81948
Only have some nodes respond to latent heartbeats
Also only respond if they're more than 2 blocks behind to minimize redundant
sending of blocks.
2024-04-18 16:38:16 -04:00
Luke Parker
10124ac4a8
Add Testnet 2 Config
Starts Tuesday, April 16th, with confirmed keys/boot nodes.
2024-04-11 15:49:32 -04:00
Luke Parker
bc44fbdbac
Add TODO to coordinator P2P 2024-03-23 23:32:21 -04:00
Luke Parker
4cacce5e55
Perform key share amortization on-chain to avoid discrepancies 2024-03-23 23:32:14 -04:00
Luke Parker
b7d49af1d5
Track total peer count in the coordinator 2024-03-23 18:02:48 -04:00
Luke Parker
4914420a37
Don't add as an explicit peer if already connected 2024-03-22 23:51:51 -04:00
Luke Parker
f11a08c436
Peer finding which won't get stuck on one specific network 2024-03-22 23:47:43 -04:00
Luke Parker
35b58a45bd
Split peer finding into a dedicated task 2024-03-22 23:40:15 -04:00
Luke Parker
af9b1ad5f9
Initial pruning of backlogged consensus messages 2024-03-22 23:18:53 -04:00
Luke Parker
2f07d04d88
Extend timeout for rebroadcast of consensus messages in coordinator 2024-03-22 16:06:31 -04:00
Luke Parker
0889627e60
Typo fix for prior commit 2024-03-11 02:20:51 -04:00
Luke Parker
ace41c79fd
Tidy the BlockHasEvents cache 2024-03-11 01:44:00 -04:00
Luke Parker
f7d16b3fc5
Fix 0 - 1 which caused a panic 2024-03-09 05:37:41 -05:00
Luke Parker
6374d9987e
Correct how we save the block to scan from 2024-03-09 03:48:44 -05:00
Luke Parker
c93f6bf901
Replace yield_now with sleep 100 to prevent hammering a task, despite still being over-eager 2024-03-09 03:34:31 -05:00
Luke Parker
61a81e53e1
Further optimize cosign DB 2024-03-09 03:31:06 -05:00
Luke Parker
89b237af7e
Correct the return value of block_has_events 2024-03-09 02:44:04 -05:00
Luke Parker
2347bf5fd3
Bound cosign work and ensure it progresses forward even when cosigns don't occur
Should resolve the DB load observed on testnet.
2024-03-09 02:20:23 -05:00
Luke Parker
454bebaa77
Have the TendermintMachine domain-separate by genesis
Enables support for multiple machines over the same DB.
2024-03-08 01:22:02 -05:00
Luke Parker
e266bc2e32
Stop validators from equivocating on reboot
Part of https://github.com/serai-dex/serai/issues/345.

The lack of full DB persistence does mean enough nodes rebooting at the same
time may cause a halt. This will prevent slashes.
2024-03-07 22:56:35 -05:00
Luke Parker
f0694172ef
Fix potential generation of invalid SignData in shim 2024-02-09 02:52:08 -05:00
akildemir
347d4cf413
Fix tendermint distinct precommit bug (#517)
* fix tendermint distinct precommit bug

* remove conflicting precommit error
2024-02-08 13:47:37 -05:00
akildemir
ad0ecc5185
complete various todos in tributary (#520)
* complete various todos

* fix pr comments

* Document bounds on unique hashes in TransactionKind

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2024-02-05 03:50:55 -05:00
Luke Parker
4913873b10
Slash reports (#523)
* report_slashes plumbing in Substrate

Notably delays the SetRetired event until it provides a slash report or the set
after it becomes the set to report its slashes.

* Add dedicated AcceptedHandover event

* Add SlashReport TX to Tributary

* Create SlashReport TXs

* Handle SlashReport TXs

* Add logic to generate a SlashReport to the coordinator

* Route SlashReportSigner into the processor

* Finish routing the SlashReport signing/TX publication

* Add serai feature to processor's serai-client
2024-01-29 03:48:53 -05:00
Luke Parker
f3429ec1ef
Inside publish (for a Serai transaction from the coordinator), use RetiredDb over latest session
Not only is this more performant, the definition of retired won't be if a newer
session is active. It will be if the session has posted a slash report or the
stake for that session has unlocked.

Initial commit towards implementing SlashReports.
2024-01-05 23:40:15 -05:00
Luke Parker
7eb388e546
PR to track down CI failures (#501)
* Use an extended timeout for DKGs specifically

* Add a log statement when message-queue connection fails

* Add a 60 second keep-alive to connections

* Use zalloc for processor/message-queue/coordinator

An additional layer which protects us against edge cases with Zeroizing
(objects which don't support it or don't miss it).

* Add further logs to message-queue

* Further increase re-attempt timeouts in CI

* Remove misplaced continue in message-queue client

Fixes observed CI failures.

* Revert "Further increase re-attempt timeouts in CI"

This reverts commit 3723530cf6.
2024-01-04 01:08:13 -05:00
Luke Parker
02776c54a8
Increase reattempt delays in the GH CI, which is extremely latent 2023-12-30 22:11:04 -05:00
Luke Parker
ec8dfd4639
Correct SignData serialization test from creating 256 signers of data
This overflowed the range of the u8 and caused a CI failure. The actual
code/assumption is fine.
2023-12-30 19:08:29 -05:00
Luke Parker
b493e3e31f
Validator DHT (#494)
* Route validators for any active set through sc-authority-discovery

Additionally adds an RPC route to retrieve their P2P addresses.

* Have the coordinator get peers from substrate

* Have the RPC return one address, not up to 3

Prevents the coordinator from believing it has 3 peers when it has one.

* Add missing feature to serai-client

* Correct network argument in serai-client for p2p_validators call

* Add a test in serai-client to check DHT population with a much quicker failure than the coordinator tests

* Update to latest Substrate

Removes distinguishing BABE/AuthorityDiscovery keys which causes
sc_authority_discovery to populate as desired.

* Update to a properly tagged substrate commit

* Add all dialed peers to GossipSub

* cargo fmt

* Reduce common code in serai-coordinator-tests with a more involved new_test

* Use a recursive async function to spawn `n` DockerTests with the necessary networking configuration

* Merge UNIQUE_ID and ONE_AT_A_TIME

* Tidy up the new recursive code in tests/coordinator

* Use a Mutex in CONTEXT to let it be set multiple times

* Make complementary edits to full-stack tests

* Augment coordinator P2P connection logs

* Drop lock acquisitions before recursing

* Better scope lock acquisitions in full-stack, preventing a deadlock

* Ensure OUTER_OPS is reset across the test boundary

* Add cargo deny allowance for dockertest fork
2023-12-22 21:09:18 -05:00
Luke Parker
00774c29d7
Replace remaining direct uses of futures with futures_util
Slight downscope which helps combat the antipattern which is the futures glob
crate. While futures_util is still a large crate, it has better defaults and
is smaller by virtue of not pulling the executor.
2023-12-18 19:45:08 -05:00
Luke Parker
a4c82632fb
Use pub(crate) for create_db items, not pub 2023-12-18 17:15:02 -05:00