Commit graph

60 commits

Author SHA1 Message Date
Luke Parker
9bf24480f4
Spawn an async test per P2P message to try and resolve latency issues 2023-08-31 02:35:50 -04:00
Luke Parker
3af9dc5d6f
Tweak Heartbeat configuration so LibP2P can be expected to deliver messages within latency window 2023-08-31 01:33:52 -04:00
Luke Parker
148bc380fe
Add assert on edge case requiring a validator with 34% and a broken invariant 2023-08-31 01:08:40 -04:00
Luke Parker
8dad62f300
Fix panic when post-verifying Precommits in log
Notes an edge case which enables invalid commit production.
2023-08-31 01:02:57 -04:00
Luke Parker
2f57a69cb6
Define BLOCK_PROCESSING_TIME, LATENCY_TIME in ms
Updates Tributary values to allow 999ms for block processing (from 2s) and
1667ms for latency (up from 1s).

The intent is to resolve #365. I don't know if this will, but it increases the
chances of success and these values should be fine in prod since Tributary is a
post-execution chain (making block procesisng time minimal).

Does embed the dagger of N::block_time() panicking if the block time in ms
doesn't cleanly divide by 1000.
2023-08-30 22:58:42 -04:00
Luke Parker
2db53d5434
Use &self for handle_message and sync_block in Tributary
They used &mut self to prevent execution at the same time. This uses a lock
over the channel to achieve the same security, without requiring a lock over
the entire tributary.

This fixes post-provided Provided transactions. sync_block waited for the TX to
be provided, yet it never would as sync_block held a mutable reference over the
entire Tributary, preventing any other read/write operations of any scope.

A timeout increased (bc2f23f72b) due to this bug
not being identified has been decreased back, thankfully.

Also shims in basic support for Completed, which was the WIP before this bug
was identified.
2023-08-27 05:07:11 -04:00
Luke Parker
c65bb70741
Remove SlashVote per #349 2023-08-21 23:48:33 -04:00
akildemir
e319762c69
use half-aggregation for tm messages (#346)
* dalek 4.0

* cargo update

Moves to a version of Substrate which uses curve25519-dalek 4.0 (not a rc).
Doesn't yet update the repo to curve25519-dalek 4.0 (as a branch does) due
to the official schnorrkel using a conflicting curve25519-dalek. This would
prevent installation of frost-schnorrkel without a patch.

* use half-aggregation for tm messages

* fmt

* fix pr comments

* cargo update

Achieves three notable updates.

1) Resolves RUSTSEC-2022-0093 by updating libp2p-identity.
2) Removes 3 old rand crates via updating ed25519-dalek (a dependency of
libp2p-identity).
3) Sets serde_derive to 1.0.171 via updating to time 0.3.26 which pins at up to
1.0.171.

The last one is the most important. The former two are niceties.

serde_derive, since 1.0.171, ships a non-reproducible binary blob in what's a
complete compromise of supply chain security. This is done in order to reduce
compile times, yet also for the maintainer of serde (dtolnay) to leverage
serde's position as the 8th most downloaded crate to attempt to force changes
to the Rust build pipeline.

While dtolnay's contributions to Rust are respectable, being behind syn, quote,
and proc-macro2 (the top three crates by downloads), along with thiserror,
anyhow, async-trait, and more (I believe also being part of the Rust project),
they have unfortunately decided to refuse to listen to the community on this
issue (or even engage with counter-commentary). Given their political agenda
they seem to try to be accomplishing with force, I'd go as far as to call their
actions terroristic (as they're using the threat of the binary blob as
justification for cargo to ship 'proper' support for binary blobs).

This is arguably representative of dtolnay's past work on watt. watt was a wasm
interpreter to execute a pre-compiled proc macro. This would save the compile
time of proc macros, yet sandbox it so a full binary did not have to be run.

Unfortunately, watt (while decreasing compile times) fails to be a valid
solution to supply chain security (without massive ecosystem changes). It never
implemented reproducible builds for its wasm blobs, and a malicious wasm blob
could still fundamentally compromise a project. The only solution for an end
user to achieve a secure pipeline would be to locally build the project,
verifying the blob aligns, yet doing so would negate all advantages of the
blob.

dtolnay also seems to be giving up their role as a FOSS maintainer given that
serde no longer works in several environments. While FOSS maintainers are not
required to never implement breaking changes, the version number is still 1.0.
While FOSS maintainers are not required to follow semver, releasing a very
notable breaking change *without a new version number* in an ecosystem which
*does follow semver*, then refusing to acknowledge bugs as bugs with their work
does meet my personal definition of "not actively maintaining their existing
work". Maintenance would be to fix bugs, not introduce and ignore.

For now, serde > 1.0.171 has been banned. In the future, we may host a fork
without the blobs (yet with the patches). It may be necessary to ban all of
dtolnay's maintained crates, if they continue to force their agenda as such,
yet I hope this may be resolved within the next week or so.

Sources:

https://github.com/serde-rs/serde/issues/2538 - Binary blob discussion

This includes several reports of various workflows being broken.

https://github.com/serde-rs/serde/issues/2538#issuecomment-1682519944

dtolnay commenting that security should be resolved via Rust toolchain edits,
not via their own work being secure. This is why I say they're trying to
leverage serde in a political game.

https://github.com/serde-rs/serde/issues/2526 - Usage via git broken

dtolnay explicitly asks the submitting user if they'd be willing to advocate
for changes to Rust rather than actually fix the issue they created. This is
further political arm wrestling.

https://github.com/serde-rs/serde/issues/2530 - Usage via Bazel broken

https://github.com/serde-rs/serde/issues/2575 - Unverifiable binary blob

https://github.com/dtolnay/watt - dtolnay's prior work on precompilation

* add Rs() api to  SchnorrAggregate

* Correct serai-processor-tests to dalek 4

* fmt + deny

* Slash malevolent validators  (#294)

* add slash tx

* ignore unsigned tx replays

* verify that provided evidence is valid

* fix clippy + fmt

* move application tx handling to another module

* partially handle the tendermint txs

* fix pr comments

* support unsigned app txs

* add slash target to the votes

* enforce provided, unsigned, signed tx ordering within a block

* bug fixes

* add unit test for tendermint txs

* bug fixes

* update tests for tendermint txs

* add tx ordering test

* tidy up tx ordering test

* cargo +nightly fmt

* Misc fixes from rebasing

* Finish resolving clippy

* Remove sha3 from tendermint-machine

* Resolve a DoS in SlashEvidence's read

Also moves Evidence from Vec<Message> to (Message, Option<Message>). That
should meet all requirements while being a bit safer.

* Make lazy_static a dev-depend for tributary

* Various small tweaks

One use of sort was inefficient, sorting unsigned || signed when unsigned was
already properly sorted. Given how the unsigned TXs were given a nonce of 0, an
unstable sort may swap places with an unsigned TX and a signed TX with a nonce
of 0 (leading to a faulty block).

The extra protection added here sorts signed, then concats.

* Fix Tributary tests I broke, start review on tendermint/tx.rs

* Finish reviewing everything outside tests and empty_signature

* Remove empty_signature

empty_signature led to corrupted local state histories. Unfortunately, the API
is only sane with a signature.

We now use the actual signature, which risks creating a signature over a
malicious message if we have ever have an invariant producing malicious
messages. Prior, we only signed the message after the local machine confirmed
it was okay per the local view of consensus.

This is tolerated/preferred over a corrupt state history since production of
such messages is already an invariant. TODOs are added to make handling of this
theoretical invariant further robust.

* Remove async_sequential for tokio::test

There was no competition for resources forcing them to be run sequentially.

* Modify block order test to be statistically significant without multiple runs

* Clean tests

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>

* Add DSTs to Tributary TX sig_hash functions

Prevents conflicts with other systems/other parts of the Tributary.

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 01:22:00 -04:00
akildemir
39ce819876
Slash malevolent validators (#294)
* add slash tx

* ignore unsigned tx replays

* verify that provided evidence is valid

* fix clippy + fmt

* move application tx handling to another module

* partially handle the tendermint txs

* fix pr comments

* support unsigned app txs

* add slash target to the votes

* enforce provided, unsigned, signed tx ordering within a block

* bug fixes

* add unit test for tendermint txs

* bug fixes

* update tests for tendermint txs

* add tx ordering test

* tidy up tx ordering test

* cargo +nightly fmt

* Misc fixes from rebasing

* Finish resolving clippy

* Remove sha3 from tendermint-machine

* Resolve a DoS in SlashEvidence's read

Also moves Evidence from Vec<Message> to (Message, Option<Message>). That
should meet all requirements while being a bit safer.

* Make lazy_static a dev-depend for tributary

* Various small tweaks

One use of sort was inefficient, sorting unsigned || signed when unsigned was
already properly sorted. Given how the unsigned TXs were given a nonce of 0, an
unstable sort may swap places with an unsigned TX and a signed TX with a nonce
of 0 (leading to a faulty block).

The extra protection added here sorts signed, then concats.

* Fix Tributary tests I broke, start review on tendermint/tx.rs

* Finish reviewing everything outside tests and empty_signature

* Remove empty_signature

empty_signature led to corrupted local state histories. Unfortunately, the API
is only sane with a signature.

We now use the actual signature, which risks creating a signature over a
malicious message if we have ever have an invariant producing malicious
messages. Prior, we only signed the message after the local machine confirmed
it was okay per the local view of consensus.

This is tolerated/preferred over a corrupt state history since production of
such messages is already an invariant. TODOs are added to make handling of this
theoretical invariant further robust.

* Remove async_sequential for tokio::test

There was no competition for resources forcing them to be run sequentially.

* Modify block order test to be statistically significant without multiple runs

* Clean tests

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 00:28:23 -04:00
Luke Parker
6b41c91dc2
Correct TendermintMachine sleep on construction
For logging purposes, I added code to handle negative time till start. I forgot
to only sleep on positive time till start.

Should fix the recent CI failure.
2023-08-13 05:57:14 -04:00
Luke Parker
e2901cab06
Revert round-advance on TendermintMachine::new if local clock is ahead of block start
It was improperly implemented, as it assumed rounds had a constant time
interval, which they do not. It also is against the spec and was meant to
absolve us of issues with poor performance when post-starting blockchains. The
new, and much more proper, workaround for the latter is a 120-second delay
between the Substrate time and the Tributary start time.
2023-08-13 04:35:46 -04:00
Luke Parker
7e71450dc4
Bug fixes and log statements
Also shims next nonce code with a fine-for-now piece of code which is unviable
in production, yet should survive testnet.
2023-08-13 04:03:59 -04:00
Luke Parker
ad51c123e3
Fix Tendermint bug from prior commit 2023-08-08 16:33:09 -04:00
Luke Parker
f6f945e747
Add a LibP2P instantiation to coordinator
It's largely unoptimized, and not yet exclusive to validators, yet has basic
sanity (using message content for ID instead of sender + index).

Fixes bugs as found. Notably, we used a time in milliseconds where the
Tributary expected  seconds.

Also has Tributary::new jump to the presumed round number. This reduces slashes
when starting new chains (whose times will be before the current time) and was
the only way I was able to observe successful confirmations given current
surrounding infrastructure.
2023-08-08 15:12:47 -04:00
Luke Parker
3c38a0ec11
cargo +nightly fmt 2023-08-01 00:47:36 -04:00
Luke Parker
0eb56406a4
Further dependency minimization for build times 2023-07-26 03:03:44 -04:00
Luke Parker
807ec30762
Update the flow for completed signing processes
Now, an on-chain transaction exists. This resolves some ambiguities and
provides greater coordination.
2023-07-14 14:05:12 -04:00
Luke Parker
dfa3106a38
Fix incorrect sig_hash generation
sig_hash was used as a challenge. challenges should be of the form H(R, A, m).
These sig hashes were solely H(A, m), allowing trivial forgeries.
2023-06-08 06:38:25 -04:00
Luke Parker
88f0e89350
Ensure Tributary commits are minimal 2023-05-09 23:45:05 -04:00
Luke Parker
78d5372fb7
Initial code to handle messages from processors 2023-04-25 03:14:42 -04:00
Luke Parker
e74b4ab94f
Add a TributaryReader which doesn't require a borrow to operate
Reduces lock contention.

Additionally changes block_key to include the genesis. While not technically
needed, the lack of genesis introduced a side effect where any Tributary on the
the database could return the block of any other Tributary. While that wasn't a
security issue, returning it suggested it was on-chain when it wasn't. This may
have been usable to create issues.
2023-04-24 07:02:00 -04:00
Luke Parker
cc491ee1e1
Don't return from sync_block until the Tendermint machine returns if it's valid or not
We had a race condition where'd we be informed of blocks 1 .. 3, and
immediately add 1 .. 3. Because we immediately tried to add 2 after 1, it'd
fail since the tip was still the genesis, yet 2 needs the tip to be 1.

Adding a channel, while ugly, was the simplest way to accomplish this.

Also has any added block be broadcasted. Else there's a race condition where a
node which syncs up to the most recent block does so, yet fails to add the next
block when it's committed to.
2023-04-24 02:46:13 -04:00
Luke Parker
14388e746c
Implement Tributary syncing
Also adds a forwards-lookup to the Tributary blockchain.
2023-04-24 00:53:18 -04:00
Luke Parker
215155f84b
Remove reliance on a blockchain read lock from block/commit 2023-04-23 23:51:10 -04:00
Luke Parker
c476f9b640
Break coordinator main into multiple functions
Also moves from std::sync::RwLock to tokio::sync::RwLock to prevent wasting
cycles on spinning.
2023-04-23 23:15:15 -04:00
Luke Parker
aa0ec4ac41
cargo fmt 2023-04-23 18:56:48 -04:00
Luke Parker
05b1fc5f05
Send a heartbeat message when a Tributary falls behind 2023-04-23 18:55:43 -04:00
Luke Parker
72633d6421
Clarify Arc RwLocks and sleeps in coordinator 2023-04-23 18:29:50 -04:00
Luke Parker
ad5522d854
Start handling P2P messages
This defines the tart of a very complex series of locks I'm really unhappy
with. At the same time, there's not immediately a better solution. This also
should work without issue.
2023-04-23 17:01:30 -04:00
Luke Parker
710e6e5217
Add Transaction::sign.
While I don't love the introduction of empty_signed, it's practically fine.
2023-04-23 01:25:45 -04:00
Luke Parker
af84b7f707
Add a test for Tributary
Further fleshes out the Tributary testing code.
2023-04-22 22:28:20 -04:00
Luke Parker
8c74576cf0
Add a test to the coordinator for running a Tributary
Impls a LocalP2p for testing.

Moves rebroadcasting into Tendermint, since it's what knows if a message is
fully valid + original.

Removes TributarySpec::validators() HashMap, as its non-determinism caused
different instances to have different round robin schedules. It was already
prior moved to a Vec for this issue, so I'm unsure why this remnant existed.

Also renames the GH no-std workflow from the prior commit.
2023-04-22 10:49:52 -04:00
Luke Parker
294ad08e00
Add support for multiple orderings in Provided
Necessary as our Tributary chains needed to agree when a Serai block has
occurred, and when a Monero block has occurred. Since those could happen at the
same time, some validators may put SeraiBlock before ExternalBlock and vice
versa, causing a chain halt. Now they can have distinct ordering queues.
2023-04-20 07:32:40 -04:00
Luke Parker
9c2a44f9df
Apply DKG TX handling code to all sign TXs
The existing code was almost entirely applicable. It just needed to be scoped
with an ID. While the handle function is now a bit convoluted, I don't see a
better option.
2023-04-20 06:27:05 -04:00
Luke Parker
8041a0d845
Initial Tributary handling 2023-04-20 05:05:17 -04:00
Luke Parker
f48022c6eb
Add basic getters to tributary 2023-04-15 00:41:48 -04:00
Luke Parker
2e2bc59703
Support reloading the mempool from disk 2023-04-14 15:51:56 -04:00
Luke Parker
695d923593
Reloaded provided transactions from the disk
Also resolves a race condition by asserting provided transactions must be
unique, allowing them to be safely provided multiple times.
2023-04-14 15:03:01 -04:00
Luke Parker
63318cb728
Add a DB to Tributary
Adds support for reloading most of the blockchain.
2023-04-14 14:11:40 -04:00
Luke Parker
72dd665ebf
Add DoS limits to tributary and require provided transactions be ordered 2023-04-13 20:35:55 -04:00
Luke Parker
8b1bce6abd
Add correction the last commit missed 2023-04-13 18:47:34 -04:00
Luke Parker
e73a51bfa5
Finish binding Tendermint into Tributary and define a Tributary master object 2023-04-13 18:43:27 -04:00
Luke Parker
5858b6c03e
Replace Tendermint step with sync_block
Step moved a step forward after an externally synced/added block. This created
a race condition to add the block between the sync process and the Tendermint
machine. Now that the block routes through Tendermint, there is no such race
condition.
2023-04-13 18:18:29 -04:00
Luke Parker
a509dbfad6
Embed the mempool into the Blockchain 2023-04-13 09:47:14 -04:00
Luke Parker
03a6470a5b
Finish binding Tendermint, bar the P2P layer 2023-04-12 18:04:28 -04:00
Luke Parker
997dd611d5
Don't add blocks which aren't valid
Previously, Tendermint needed to be live more than it needed to be correct.
Under the original intention for it, correctness would fail if any coin
desynced, which would cause the node to fail entirely. By accepting a
supermajority's view of state, despite its own, a single coin's failure would
only lead to inability to participate with that single coin.

Now that Tendermint is solely for Tributary, nodes should halt a coin-specific
chain if their view of the chain differs. They are unable to meaningless
participate regardless.

This also means a supermajority of validators can no longer fake messages from
other validators, allowing the Tributary chain to use uniform weights with much
less impact. There is still enough impact they can't be used (ability to cause
a fork), yet they should allow uniform block production (as that's solely a DoS
concern).

While we prior could've simply additionally checked signatures, add_block's
lack of a failure case would've meant it had to panic. This would've been a DoS
possible a minority-weight *which affected the entire coordinator* and
therefore *the entire validator for all coins*.
2023-04-12 16:18:42 -04:00
Luke Parker
86cbf6e02e
Bind the signature scheme for tendermint-machine 2023-04-12 16:06:14 -04:00
Luke Parker
8c8232516d
Only allow designated participants to send transactions 2023-04-12 12:42:23 -04:00
Luke Parker
be947ce152
Add a mempool 2023-04-12 12:15:38 -04:00
Luke Parker
7c7f17aac6
Test the blockchain 2023-04-12 11:13:48 -04:00