\documentclass[usletter,11pt,english,openany]{article}
\usepackage{float}
\usepackage{wrapfig}
%Primary packages
\usepackage{fancyvrb}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage[pdftex]{graphicx}
% Useful packages:
% Advanced mathematical formulas and symbols
% -------------------------------------
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{bm}
% Footnotes
% -------------------------------------
\usepackage[stable,splitrule]{footmisc}
% Color management package
% -------------------------------------
\usepackage[usenames,dvipsnames]{xcolor}
% Control line spacing
% -------------------------------------
% putting this between footmisc and hyperref seemed to fix broken footnote links
\usepackage{setspace}
\AtBeginDocument{\let~=\nobreakspace}
\spacing{1.4}
\usepackage{lineno}
\linenumbers
\usepackage[bookmarks=true]{hyperref}
\hypersetup{colorlinks=false}
\usepackage{orcidlink}
\usepackage{booktabs}
\usepackage{caption}
\usepackage{longtable}
\usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{verbose,tmargin=2cm,bmargin=2cm,lmargin=2cm,rmargin=2cm}
\usepackage{array}
\usepackage{url}
\usepackage{multirow}
\usepackage{stackrel}
\usepackage{rotating}
\newcolumntype{L}{>{\raggedright\arraybackslash}}
\newcolumntype{R}{>{\raggedleft\arraybackslash}}
\newcolumntype{C}{>{\centering\arraybackslash}}
% https://tex.stackexchange.com/questions/151241/remove-metadata-of-pdf-generated-by-latex
\hypersetup{
bookmarks=true, % show bookmarks bar?
unicode=false, % non-Latin characters in Acrobat's bookmarks
pdftoolbar=true, % show Acrobat's toolbar?
pdfmenubar=true, % show Acrobat's menu?
pdffitwindow=false, % window fit to page when opened
% pdfstartview={FitW}, % fits the width of the page to the window
pdftitle={Monero Black Marble Flood}, % title
pdfauthor={Rucknium}, % author
pdfsubject={}, % subject of the document
pdfcreator={Rucknium}, % creator of the document
pdfproducer={}, % producer of the document
pdfkeywords={}, % list of keywords
pdfnewwindow=true, % links in new window
colorlinks=false, % false: boxed links; true: colored links
linkcolor=red, % color of internal links
citecolor=green, % color of links to bibliography
filecolor=magenta, % color of file links
urlcolor=cyan % color of external links
}
\begin{document}
\title{March 2024 Suspected Black Marble Flooding Against Monero:
Privacy, User Experience, and Countermeasures\\\vspace{.3cm}
\large Draft v0.3\vspace{-.715cm}}
\author{Rucknium\orcidlink{0000-0001-5999-8950} }
\date{October 9, 2024}
\maketitle
\begin{abstract}
On March 4, 2024, aggregate Monero transaction volume suddenly almost
tripled. This note analyzes the effect of the large number of transactions,
assuming that the transaction volume is an attempted black marble
flooding attack by an adversary. According to my estimates, mean effective
ring size has decreased from 16 to 5.5 if the black marble flooding
hypothesis is correct. At current transaction volumes, the suspected
spam transactions probably cannot be used for large-scale ``chain
reaction'' analysis to eliminate all ring members except for the
real spend. Effects of increasing Monero's ring size above 16 are
analyzed.
\end{abstract}
\section{March 4, 2024: Sudden transaction volume }
\begin{figure}[H]
\caption{Volume of Monero transactions with spam fingerprint}
\label{fig-spam-tx-volume}
\centering{}\includegraphics[scale=0.5]{images/spam-fingerprint-tx-volume}
\end{figure}
On March 4, 2024 at approximately block height 3097764 (15:21:24 UTC),
the number of 1input/2output minimum fee (20 nanoneros/byte) transactions
sent to the Monero network rapidly increased. Figure \ref{fig-spam-tx-volume}
shows daily volume of this type of transaction increasing from about
15,000 to over 100,000.
The large volume of these transactions was enough to entirely fill
the 300 kB Monero blocks mined about every two minutes. Monero's dynamic
block size algorithm activated. The 100 block rolling median block
size slowly increased to adjust for the larger number of transactions
that miners could pack in blocks. Figure \ref{fig-empirical-block-weight}
shows the adjustment. The high transaction volume raised the 100 block
median gradually for a period of time. Then the transaction volume fell
just enough to allow the 100 block median to reset to a lower level.
Then the process would restart. Block sizes have usually remained
between 300 kB and 400 kB. Occasionally, high-fee transactions would
allow miners to get more total revenue by giving up some of the 0.6
XMR/block tail emission and including more transactions in a block.
The ``maximum peaks'' plot shows this phenomenon.
\begin{figure}[H]
\caption{Monero empirical block weight}
\label{fig-empirical-block-weight}
\centering{}\includegraphics[scale=0.5]{images/rolling-median-block-weight}\includegraphics[scale=0.5]{images/rolling-max-block-weight}
\end{figure}
The sudden transaction volume rise may originate from a single entity.
The motive may be spamming transactions to bloat the blockchain size,
increase transaction confirmation times for real users, perform a
network stress test, or execute a black marble flooding attack to
reduce the privacy of Monero users. I will focus most of my analysis
on the last possibility.
\section{Literature review}
The very first research bulletin released by the Monero Research Lab
described black marble transaction flooding. \cite{Noether2014} points
out that the ring signature privacy model requires rings to contain
transaction outputs that could be plausible real spends. If a
single entity owns a large share of outputs (spent or not), it can
use its knowledge to rule out ring members in other users' transactions
that cannot be the real spend. Since the entity knows that it did
not spend the output(s) in a particular ring, the effective ring
size that protects other users' privacy can be reduced --- even to
an effective ring size of 1 when the entity knows the real spend with
certainty. Rings with known real spends can be leveraged to determine
the real spend in other rings in a ``chain reaction'' attack.
\cite{Noether2014} gave the name ``black marble'' to the outputs
owned by an anti-privacy adversary since they modeled the problem
using a marble draw problem with a hypergeometric distribution. When
a specific number of marbles are drawn \textit{without} replacement
from an urn containing a specific number of white and black marbles,
the hypergeometric distribution describes the probability of drawing
a specific number of black marbles. In my modeling I use the binomial
distribution, which is the same as the hypergeometric except marbles
are drawn \textit{with} replacement. The binomial distribution makes
more sense now ten years after \cite{Noether2014} was written. The
total number of RingCT outputs on the blockchain that can be included
in a ring is over 90 million. The hypergeometric distribution converges
to the binomial distribution as the total number of marbles increases
to infinity. Moreover, Monero's current decoy selection algorithm
does not select all outputs with equal probability. More recent outputs
are selected with much higher probability. The hypergeometric distribution
cannot be used when individual marbles have unequal probability of
being selected.
\cite{Chervinski2021} simulates a realistic black marble flood attack.
They consider two scenarios. The adversary could create 2input/16output
transactions to maximize the number of black marble outputs per block
or the adversary could create 2input/2output transactions to make
the attack less obvious. The paper uses Monero transaction data from
2020 to set the estimated number of real outputs and kB per block
at 41 outputs and 51 kB respectively. The nominal ring size at this
time was 11. The researchers simulated filling the remaining 249 kB
of the 300 kB block with black marble transactions. A ``chain reaction''
algorithm was used to boost the effectiveness of the attack. In the
2in/2out scenario, the real spend could be deduced (effective ring
size 1) in 11\% of rings after one month of spamming black marbles.
Later I will compare the results of this simulation with the current
suspected spam incident.
\cite{Krawiec-Thayer2021} analyze a suspected spam incident in July-August
2021. Transactions' inputs, outputs, fees, and ring member ages were
plotted to evaluate evidence that a single entity created the spam.
The analysis concluded, ``All signs point towards a single entity.
While transaction homogeneity is a strong clue, a the {[}sic{]} input
consumption patterns are more conclusive. In the case of organic growth
due to independent entities, we would expect the typically semi-correlated
trends across different input counts, and no correlation between independent
users\textquoteright{} wallets. During the anomaly, we instead observed
an extremely atypical spike in 1--2 input txns with no appreciable
increase in 4+ input transactions.''
TODO: A few papers like \cite{Ronge2021,Egger2022} discuss black
marble attacks too.
\section{Black marble theory}
The binomial distribution describes the probability of drawing $x$
number of ``successful'' items when drawing a total of $n$ items
when the probability of a successful draw is $p$. It can be used
to model the number of transaction outputs selected by the decoy selection
algorithm that are not controlled by a suspected adversary.
The probability mass function of the binomial distribution with $n\in\{0,1,2,\ldots\}$
number of draws and $p\in[0,1]$ probability of success is
\begin{equation}
f(x,n,p)=\binom{n}{x}p^{x}\left(1-p\right)^{n-x}\textrm{, where }\binom{n}{x}=\frac{n!}{x!(n-x)!}
\end{equation}
The expected value (the theoretical mean) of a random variable with
a binomial distribution is $np$.
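The pmf and its mean can be computed directly. A minimal sketch in Python; the value of \texttt{p\_real} below is illustrative, not estimated from chain data:

```python
from math import comb

def binom_pmf(x, n, p):
    """f(x, n, p): probability of exactly x successes in n draws."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# With 15 decoy draws per ring and an illustrative probability p_real = 0.25
# that a single draw selects a real-user output, the expected number of
# real-user decoys in a ring is n * p.
n_decoys, p_real = 15, 0.25
expected_real_decoys = n_decoys * p_real  # np

# Sanity check: the pmf sums to one over its support.
total = sum(binom_pmf(x, n_decoys, p_real) for x in range(n_decoys + 1))
```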
Monero's standard decoy selection algorithm programmed in \texttt{wallet2}
does not select outputs with equal probability. The probability of
selecting each output depends on the age of the output. Specifics
are in \cite{Rucknium2023a}. The probability of a single draw selecting
an output that is not owned by the adversary, $p_{r}$, is equal to
the share of the probability mass function occupied by those outputs:
$p_{r}=\sum_{i\in R}g(i)$, where $R$ is the set of outputs owned
by real users and $g(x)$ is the probability mass function of the
decoy selection algorithm.
\subsection{Spam assumptions\label{subsec:spam-assumptions}}
There is some set of criteria that identifies suspected spam. The
early March 2024 suspected spam transactions: 1) have one input; 2)
have two outputs; 3) pay the minimum 20 nanoneros per byte transaction
fee. The normal volume of these transactions produced by real users
must be estimated. The volume in excess of the normal volume is assumed
to be spam. I followed this procedure:
\begin{enumerate}
\item Compute the mean number of daily transactions that fit the suspected
spam criteria for the four weeks that preceded the suspected spam
incident. A separate mean was calculated for each day of the week
(Monday, Tuesday,...) because Monero transaction volumes have weekly
cycles. These volume means are denoted $v_{r,m},v_{r,t},v_{r,w},\ldots$
for the days of the week.
\item For each day of the suspected spam interval, sum the number of transactions
that fit the suspected spam criteria. Subtract the amounts found in
step (1) from this sum, matching on the day of the week. This provides
the estimated number of spam transactions for each day: $v_{s,1},v_{s,2},v_{s,3},\ldots$
\item For each day of the suspected spam interval, randomly select $v_{s,t}$
transactions from the set of transactions that fit the suspected spam
criteria, without replacement. This randomly selected set is assumed
to be the true spam transactions.
\item During the period of time of the spam incident, compute the expected
probability $p_{r}$ that one output drawn from the \texttt{wallet2}
decoy distribution will select an output owned by a real user (instead
of the adversary) when the wallet constructs a ring at the point in
time when the blockchain tip is at height $h$. The closed-form formula
of the \texttt{wallet2} decoy distribution is in \cite{Rucknium2023a}.
\item The expected effective ring size of each ring constructed at block
height $h$ is $1+15\cdot p_{r}$. The coefficient on $p_{r}$ is
the number of decoys.
\end{enumerate}
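Steps (1), (2), and (5) of the procedure above can be sketched as follows. The daily counts and the $p_{r}$ value are illustrative placeholders, not the actual chain data:

```python
from statistics import mean

# Step (1): daily counts of txs fitting the spam fingerprint for the four
# pre-spam weeks, keyed by day of week (0 = Monday). Illustrative numbers.
pre_spam = {0: [15200, 14900, 15100, 15000], 1: [14800, 15050, 14950, 15200]}
baseline = {dow: mean(counts) for dow, counts in pre_spam.items()}

# Step (2): observed counts during the spam interval as (day_of_week, count)
# pairs; subtracting the matching day-of-week baseline estimates spam volume.
observed = [(0, 112000), (1, 108500)]
est_spam = [count - baseline[dow] for dow, count in observed]

# Step (5): expected effective ring size given the share p_r of the decoy
# distribution's probability mass that falls on real-user outputs.
def expected_effective_ring_size(p_r, n=16):
    return 1 + (n - 1) * p_r
```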
Figure \ref{fig-estimated-mean-effective-ring-size} shows the results
of this methodology. The mean effective ring size settled at about
5.5 by the fifth day of the large transaction volume. On March 12
and 13 there was a large increase in the number of 1input/2output
transactions that paid 320 nanoneros/byte (the third fee tier). This
could have been the spammer switching fee level temporarily or a service
that uses Monero increasing fees to avoid delays. I used the same
method to estimate the spam volume of these 320 nanoneros/byte suspected
spam. The 1in/2out 320 nanoneros/byte transactions displaced some
of the 1in/2out 20 nanoneros/byte transactions because miners preferred
to put transactions with higher fees into blocks. Other graphs and
analysis will consider only the 1in/2out 20 nanoneros/byte transactions
as spam unless indicated otherwise.
\begin{figure}[H]
\caption{Estimated mean effective ring size}
\label{fig-estimated-mean-effective-ring-size}
\centering{}\includegraphics[scale=0.5]{images/empirical-effective-ring-size}
\end{figure}
Figure \ref{fig-spam-share-outputs} shows the daily share of outputs
on the blockchain that are owned by the suspected spammer. The mean
share of outputs since the suspected spam started is about 75 percent.
\begin{figure}[H]
\caption{Spam share of outputs}
\label{fig-spam-share-outputs}
\centering{}\includegraphics[scale=0.5]{images/spam-share-outputs}
\end{figure}
\subsection{Long term projection scenarios at different ring sizes}
Fix the number of outputs owned by real users at $r$. The analysis
will let the number $s$ of outputs owned by the adversary vary. The
share of outputs owned by real users is
\begin{equation}
p_{r}=\dfrac{r}{r+s}\label{eq:p_r-fixed-real}
\end{equation}
Expression (\ref{eq:p_r-fixed-real}) can be written as $p_{r}=\frac{1}{r}\cdot\dfrac{r}{1+\tfrac{1}{r}s}$,
which is the formula for hyperbolic decay with an additional $\frac{1}{r}$
coefficient at the beginning of the expression \cite{Aguado2010}.
Let $n$ be the nominal ring size (16 in Monero version 0.18). The
number of decoys chosen by the decoy selection algorithm is $n-1$.
The mean effective ring size for a real user's ring is one (the real
spend) plus the ring's expected number of decoys owned by other real
users.
\begin{equation}
\mathrm{E}\left[n_{e}\right]=1+\left(n-1\right)\cdot\dfrac{r}{r+s}\label{eq:expectation-n_e}
\end{equation}
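The formula above can be evaluated for a fixed $r$ and a growing $s$ to see the hyperbolic decay. The value of $r$ below is illustrative:

```python
# Mean effective ring size when real-user outputs r are held fixed and
# adversary outputs s vary (equation for E[n_e] above).
def mean_effective_ring_size(r, s, n=16):
    return 1 + (n - 1) * r / (r + s)

r = 25_000  # illustrative fixed count of real-user outputs
# s = 0 gives the nominal ring size 16; each halving of r/(r+s) moves
# E[n_e] halfway toward its floor of 1.
sizes = [mean_effective_ring_size(r, s) for s in (0, r, 3 * r, 7 * r)]
# sizes == [16.0, 8.5, 4.75, 2.875]
```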
The empirical analysis of Section \ref{subsec:spam-assumptions} considered
the fact that the \texttt{wallet2} decoy selection algorithm draws
a small number of decoys from the pre-spam era. Now we will assume
that the spam incident has continued for a very long time and all
but a negligible number of decoys are selected from the spam era.
We will hold constant the non-spam transactions and vary the number
of spam transactions and the ring size. Figures \ref{fig-projected-effective-ring-size-non-log},
\ref{fig-projected-effective-ring-size-log-log}, and \ref{fig-projected-share-ring-size-one}
show the results of the simulations.
\begin{figure}[H]
\caption{Long-term projected mean effective ring size}
\label{fig-projected-effective-ring-size-non-log}
\centering{}\includegraphics[scale=0.5]{images/projected-effective-ring-size-non-log}
\end{figure}
\begin{figure}[H]
\caption{Long-term projected mean effective ring size (log-log scale)}
\label{fig-projected-effective-ring-size-log-log}
\centering{}\includegraphics[scale=0.5]{images/projected-effective-ring-size-log-log}
\end{figure}
\begin{figure}[H]
\caption{Long-term projected share of rings with effective ring size 1}
\label{fig-projected-share-ring-size-one}
\centering{}\includegraphics[scale=0.5]{images/projected-ring-size-one}
\end{figure}
\subsection{Guessing the real spend using a black marble flooder's simple classifier}
The adversary carrying out a black marble flooding attack could use
a simple classifier to try to guess the real spend: Let $n$ be nominal
ring size and $n_{s}$ be the number of outputs in a given ring that
are owned by the attacker. $n_{s}$ is a random variable because decoy
selection is a random process. The adversary can eliminate $n_{s}$
of the $n$ ring members as possible real spends. The attacker guesses
randomly with uniform probability that the $i$th ring member of the
$n-n_{s}$ remaining ring members is the real spend. The probability
of correctly guessing the real spend is $\frac{1}{n-n_{s}}$. If the
adversary owns all ring members except for one ring member, which
must be the real spend, the probability of correctly guessing the
real spend is 100\%. If the adversary owns all except two ring members,
the probability of correctly guessing is 50\%. And so forth.
The mean effective ring size is $\mathrm{E}\left[n_{e}\right]$ from
\ref{eq:expectation-n_e}. Does this mean that the mean probability
of correctly guessing the real spend is $\frac{1}{\mathrm{E}\left[n_{e}\right]}$?
No. The $h(x)=\frac{1}{x}$ function is strictly convex. By Jensen's
inequality, $\mathrm{E}\left[\frac{1}{n_{e}}\right]>\frac{1}{\mathrm{E}\left[n_{e}\right]}$.
The mean probability of correctly guessing the real spend is
\begin{equation}
\mathrm{E}\left[\frac{1}{n_{e}}\right]=\sum_{i=1}^{n}\dfrac{1}{i}\cdot f(i-1,n-1,\frac{\mathrm{E}\left[n_{e}\right]-1}{n-1})
\end{equation}
$\frac{1}{i}$ is the probability of correctly guessing the real spend
when the effective ring size is $i$. $f$ is the probability mass
function of the binomial distribution. It calculates the probability
of the decoy selection algorithm selecting $i-1$ decoys that are
owned by real users. The total number of decoys to select is $n-1$
(that is the argument in the second position of $f$). The probability
of selecting a decoy owned by a real user is $\frac{\mathrm{E}\left[n_{e}\right]-1}{n-1}=\frac{r}{r+s}$.
\begin{figure}[H]
\caption{Estimated probability of correctly guessing the real spend}
\label{fig-prob-guessing-real-spend}
\centering{}\includegraphics[scale=0.5]{images/empirical-guessing-probability}
\end{figure}
The probability of a given ring having all adversary-owned ring members
except for the real spend is $f\left(0,n-1,\frac{\mathrm{E}\left[n_{e}\right]-1}{n-1}\right)$
. Figure \ref{fig-share-ring-size-one} plots the estimated share
of rings with effective ring size one.
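The two quantities above, $\mathrm{E}\left[\frac{1}{n_{e}}\right]$ and the share of rings with effective ring size one, can be computed with a short sketch; the $p_{r}$ value is illustrative:

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def mean_guess_probability(p_r, n=16):
    """E[1/n_e]: weight 1/i by the probability that exactly i-1 of the
    n-1 decoys are owned by real users (effective ring size i)."""
    return sum((1 / i) * binom_pmf(i - 1, n - 1, p_r) for i in range(1, n + 1))

p_r = 0.3  # illustrative share of decoy-distribution mass on real-user outputs
mean_n_e = 1 + 15 * p_r  # E[n_e]
# Jensen's inequality: the mean guessing probability exceeds 1/E[n_e].
assert mean_guess_probability(p_r) > 1 / mean_n_e
# Probability that a ring has effective ring size one: f(0, n-1, p_r).
share_ring_size_one = binom_pmf(0, 15, p_r)
```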
\begin{figure}[H]
\caption{Estimated share of rings with effective ring size of one}
\label{fig-share-ring-size-one}
\centering{}\includegraphics[scale=0.5]{images/empirical-ring-size-one}
\end{figure}
\section{Chain reaction graph attacks}
The effective ring size can be reduced further by applying a process
of elimination to related rings. This technique is called a ``chain
reaction'' or a ``graph analysis attack''. Say that the effective
ring size in transaction $A$ is reduced to two because of a black
marble attack. One of the remaining two ring members is an output
in transaction $B$. If the output in transaction $B$ is known to
be spent in transaction $C$ because the effective ring size of transaction
$C$ was one, then that output can be ruled out as a plausible real
spend in transaction $A$. Therefore, the adversary can reduce the
effective ring size of transaction $A$ to one.
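The $A$/$B$/$C$ process of elimination above can be sketched as a fixed-point iteration over rings. This is a toy model of the chain reaction, not an implementation of the closed set attack:

```python
def chain_reaction(rings):
    """rings: dict mapping tx name -> set of candidate ring-member output ids.
    Propagates deductions: a ring reduced to one member reveals its real
    spend, which can then be removed from all other rings."""
    known_spent = set()
    changed = True
    while changed:
        changed = False
        for tx, members in rings.items():
            # Never remove members from a ring already reduced to one.
            reduced = members - known_spent if len(members) > 1 else members
            if len(reduced) < len(members):
                rings[tx] = reduced
                changed = True
            if len(rings[tx]) == 1:
                spent = next(iter(rings[tx]))
                if spent not in known_spent:
                    known_spent.add(spent)
                    changed = True
    return rings

# Ring C has effective size one, so output "b1" is known spent; that
# collapses ring A from {"b1", "a2"} down to {"a2"}.
result = chain_reaction({"A": {"b1", "a2"}, "C": {"b1"}})
```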
Theorem 1 of \cite{Yu2019a} says that a ``closed set'' attack is
as effective as exhaustively checking all subsets of outputs. The
brute force attack is infeasible since its complexity is $O\left(2^{m}\right)$,
where $m$ is the total number of RingCT outputs on the blockchain.
\cite{Yu2019a} implements a heuristic algorithm to execute the closed
set attack that is almost as effective as the brute force method.
\cite{Vijayakumaran2023} proves that the Dulmage-Mendelsohn (DM)
decomposition gives the same results as the brute force closed set
attack, but the algorithm renders a result in polynomial time. The
open source implementation of the DM decomposition in \cite{Vijayakumaran2023}
processes 37 million RingCT rings in about four hours.
In practice, how much further can chain reaction attacks reduce the
effective ring size when combined with a black marble attack? \cite{Egger2022}
suggest some closed-form formulas to compute the vulnerability of
different ring sizes to chain reaction attacks. However, \cite{Egger2022}
assume that decoys are selected by a partitioning process instead
of Monero's actual mimicking decoy selection algorithm. It is not
clear how relevant the findings of \cite{Egger2022} are for Monero's
mainnet. Monte Carlo simulations would be a better way to evaluate
the risk of chain reactions.
\cite{Chervinski2021} carries out a simulation using the old ring
size of 11. In the 2input/2output spam scenario, 82\% of outputs are
black marbles. Assuming only the binomial distribution, i.e. no chain
reaction analysis, Figure \ref{fig-effective-ring-size-binomial-pmf}
compares the theoretical long-term distribution of effective ring
sizes in the \cite{Chervinski2021} scenario and the March 2024 suspected
spam on Monero's mainnet. The share of rings with effective ring size
1 in the \cite{Chervinski2021} scenario is 11.9 percent, but the
share is only 0.8 percent with the suspected March 2024 spam. The
mean effective ring sizes of the \cite{Chervinski2021} scenario without
chain reaction and the March 2024 spam estimate are 2.9 and 5.2, respectively.
\begin{figure}[H]
\caption{Probability mass function of long-term effective ring sizes}
\label{fig-effective-ring-size-binomial-pmf}
\centering{}\includegraphics[scale=0.5]{images/effective-ring-size-binomial-pmf}\includegraphics[scale=0.5]{images/chervinski-chain-reaction}
\end{figure}
\cite{Chervinski2021} executes chain reaction analysis to increase
the effectiveness of the attack. The second plot in Figure \ref{fig-effective-ring-size-binomial-pmf}
compares the long term effective ring size achieved by \cite{Chervinski2021}
when leveraging chain reaction analysis and the effective ring size
when only the binomial distribution is assumed. \cite{Chervinski2021}
increases the share of rings with effective ring size one from 11.9
to 14.5 percent. Mean effective ring size decreases from 2.94 to 2.76.
This is a modest gain of attack effectiveness, but \cite{Chervinski2021}
appears to be using a suboptimal chain reaction algorithm instead
of the closed set attack.
I implemented a DM decomposition simulation, using the real data from
the black marble era of transactions as the starting point. The set
of transactions produced by the adversary is known only to the adversary,
so a reasonable guess was required. First, transactions that fit the
spamming criteria were randomly assigned to black marble status in
a proportion equal to the spam volume. Second, each ring was randomly
assigned a real spend so that rings in non-black marble transactions
would not entirely disappear in the next step. Third, outputs in black
marble transactions were removed from the rings of non-black-marble
transactions, except when the ``real spend'' assigned in the previous
step would be removed. Fourth, all black marble transactions were
removed from the dataset. The transaction graph left after these deletions
is not necessarily internally consistent (i.e. funds might not actually
be able to flow between transactions), but the objective is to approximate
a chain reaction attack. Fifth, I used a modified version of the DM
decomposition developed by \cite{Vijayakumaran2023} to simulate a
chain reaction attack.\footnote{\url{https://github.com/avras/cryptonote-analysis}\\
\url{https://www.respectedsir.com/cna}}
After the black marble outputs were removed but before the DM decomposition
was applied, 0.57 percent of rings in the simulated dataset had a
single ring member left. The real spend could be deduced in these
0.57 percent of rings. This simulated estimate is consistent with
the results in Figure \ref{fig-share-ring-size-one} that uses the
$f\left(0,n-1,\frac{\mathrm{E}\left[n_{e}\right]-1}{n-1}\right)$
formula instead of a simulation. After the DM decomposition was applied
to the simulated dataset, the share of rings whose real spend could
be deterministically deduced increased to 0.82 percent. Therefore,
the DM decomposition would increase the black-marble adversary's ability
to deterministically deduce the real spend by 44 percent. My simulation
results can be compared to the results of \cite{Chervinski2021} in
a different parameter environment, which found a 22 percent increase
from a chain reaction attack (the share of rings with effective ring
size one increased from 11.9 to 14.5 percent).
\section{Countermeasures}
See \url{https://github.com/monero-project/research-lab/issues/119}
TODO
\section{Estimated cost to suspected spammer}
When the 1in/2out 20 nanoneros/byte spam definition is used, the total
fees paid by the spam transactions over the 23 days of spam was 61.5
XMR. The sum total of the transaction sizes of the spam transactions
was 3.08 GB.
When the 1in/2out 20 or 320 nanoneros/byte spam definition is used,
the total fees paid by the spam transactions over the 23 days of spam
was 81.3 XMR. The sum total of the transaction sizes of the spam transactions
was 3.12 GB.
\section{Transaction confirmation delay}
Monero's transaction propagation rules are different from BTC's rules
for good reasons, but two of the rules can make transactions seem
like they are ``stuck'' when the txpool (mempool) is congested.
First, Monero does not have replace-by-fee (RBF). When a Monero node
sees that a transaction attempts to spend an output that is already
spent by another transaction in the txpool, the node does not send
the transaction to other nodes because it is an attempt to double
spend the output. (Monero nodes do not know the real spend in the
ring, but double spends can be detected by comparing the key images
of ring signatures in different transactions.) Monero users cannot
increase the fee of a transaction that they already sent to a node
because the transaction with the higher fee would be considered a
double spend. BTC has RBF that allows a transaction to replace a transaction
in the mempool that spends the same output if the replacement transaction
pays a higher fee. One of RBF's downsides is that merchants cannot
safely accept zero-confirmation transactions because a malicious customer
can replace the transaction in the mempool with a higher-fee transaction
that spends the output back to themselves. Without RBF, Monero users
must wait for their low-fee transaction to confirm on the blockchain.
They cannot choose to raise their ``bid'' for block space even if
they were willing to pay more. They have to get it right the first
time. Fee prediction is especially important for Monero users when
the txpool is congested because of the lack of RBF, but very little
Monero-specific fee prediction research has been done.
Unlike BTC, Monero also does not have child-pays-for-parent (CPFP),
which allows users to chain multiple transactions together while they
are still in the mempool. With CPFP, users can spend the output of
the unconfirmed parent transaction and attach a higher fee to the
child transaction. Miners have an incentive to include the parent
transaction in the block because the child transaction is only valid
if the parent transaction is also mined in a block. Monero transaction
outputs cannot be spent in the same block that they are confirmed
in. Actually, Monero users need to wait at least ten blocks to spend
new transaction outputs because benign or malicious blockchain reorganizations
can invalidate ring signatures.\footnote{``Eliminating the 10-block-lock'' \url{https://github.com/monero-project/research-lab/issues/95}}
Monero's transaction propagation rules can create long delays for
users who pay the same minimum fee that the suspected spammer pays.
When users pay the same fee as the spam, their transactions are put
in a ``queue'' with other transactions at the same fee per byte
level. Their transactions are confirmed in first-in/first-out order
because the \texttt{get\_block\_template} RPC call to \texttt{monerod}
arranges transactions that way.\footnote{\url{https://github.com/monero-project/monero/blob/9bf06ea75de4a71e3ad634e66a5e09d0ce021b67/src/cryptonote_core/tx_pool.cpp\#L1596}}
Most miners use \texttt{get\_block\_template} to construct blocks,
but P2Pool orders transactions randomly after they have been sorted
by fee per byte.\footnote{\url{https://github.com/SChernykh/p2pool/blob/dd17372ec0f64545311af40b976e6274f625ddd8/src/block_template.cpp\#L194}}
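The ordering described above can be sketched as follows; the field names are hypothetical illustrations, not \texttt{monerod}'s actual data structures:

```python
from collections import namedtuple

# Hypothetical txpool entry: higher fee-per-byte txs are packed first, and
# transactions at the same fee level keep first-in/first-out arrival order.
Tx = namedtuple("Tx", ["txid", "fee_per_byte", "arrival"])

def block_template_order(txpool):
    return sorted(txpool, key=lambda t: (-t.fee_per_byte, t.arrival))

pool = [
    Tx("spam1", 20, arrival=1),
    Tx("user_high_fee", 80, arrival=3),
    Tx("spam2", 20, arrival=2),
]
ordered = [t.txid for t in block_template_order(pool)]
# The 80 nanonero/byte transaction jumps the queue; the two 20 nanonero/byte
# transactions keep their arrival order.
```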
The first plot in Figure \ref{fig-delay-tx-confirmation} shows the
mean delay of transaction confirmation in each hour. The plot shows
the mean time that elapsed between when the transaction entered the
txpool and when it was confirmed in a block. Each hour's value in
the line plot is computed from transactions that were confirmed in
blocks in that hour. This data is based on txpool archive data actively
collected from a few nodes.\footnote{\url{https://github.com/Rucknium/misc-research/tree/main/Monero-Mempool-Archive}}
The mean includes transactions with and without the spam fingerprint.
Usually mean confirmation time was less than 30 minutes, but sometimes
confirmations of the average transaction were delayed by over two
hours.
\begin{figure}[H]
\caption{Delay to first transaction confirmation}
\label{fig-delay-tx-confirmation}
\centering{}\includegraphics[scale=0.5]{images/mean-delay-first-confirmation}\includegraphics[scale=0.5]{images/max-delay-first-confirmation}
\end{figure}
The second plot in Figure \ref{fig-delay-tx-confirmation} shows the
\textit{maximum} waiting time for a transaction to be confirmed. The
value of the line at each hour is the longest time that a transaction
waited to be confirmed in one of the blocks mined in the hour or the
amount of time that a transaction was still waiting to be confirmed
at the end of the hour (whichever is greater). There were a handful
of transactions that paid fees below the 20 nanoneros/byte tier that
the spam was paying. These transactions did not move forward in the
queue when the spam transactions were confirmed. Instead, they had
to wait until the txpool completely emptied. Exactly 100 transactions
waited longer than three hours. They paid between 19465 and 19998
piconeros per byte. Most of the transactions appeared to have set
fees slightly lower than 20 nanoneros per byte because they had an
unusual number of inputs. 92 of them had four or more inputs. The
remaining eight of them had just one input. Those eight may have been
constructed by a nonstandard wallet.
\section{Real user fee behavior}
During the suspected spam, users must pay more than the minimum fee
to put their transactions at the front of the confirmation queue.
If users pay more than the minimum fee, usually their transactions
would be confirmed in the next mined block. Monero's standard fee
levels are 20, 80, 320, and 4000 nanoneros per byte. Users are not
required to pay one of these fee levels, but all wallets that are
based on \texttt{wallet2} do not allow users to choose custom fees
outside of the four standard levels because of the privacy risk of
unusual transactions.\footnote{\url{https://github.com/Rucknium/misc-research/tree/main/Monero-Nonstandard-Fees}}
The ``auto'' fee level of the Monero GUI and CLI wallets is supposed
to automatically change the fee of a transaction from the lowest tier
(20 nanoneros/byte) to the second tier (80 nanoneros/byte) when the
txpool is congested. Unfortunately, a bug prevented the automatic
adjustment. On March 9, 2024 the Monero Core Team released the 0.18.3.2
version of Monero and the GUI/CLI wallet that fixed the bug.\footnote{``Monero 0.18.3.2 'Fluorine Fermi' released'' \url{https://www.getmonero.org/2024/03/09/monero-0.18.3.2-released.html}
``wallet2: adjust fee during backlog, fix set priority'' \url{https://github.com/monero-project/monero/pull/9220}} Users are not required to upgrade to the latest wallet version, so
probably many users still use a version that does not automatically
adjust fees.
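A minimal sketch of the intended auto adjustment described above, assuming a simple backlog test (the congestion heuristic and the 300,000-byte capacity figure are illustrative stand-ins, not monerod's actual logic):

```python
# Hedged sketch of the intended "auto" fee-level behavior: bump the
# lowest tier to the second tier when the txpool is congested.
# The congestion test below is a stand-in, not the wallet's real check.

FEE_TIERS = [20, 80, 320, 4000]   # nanoneros per byte

def auto_fee(txpool_backlog_bytes, block_capacity_bytes=300_000):
    """Return the fee tier an 'auto' wallet might pick (illustrative only)."""
    congested = txpool_backlog_bytes > block_capacity_bytes
    return FEE_TIERS[1] if congested else FEE_TIERS[0]

print(auto_fee(50_000))     # 20  (near-empty pool: lowest tier)
print(auto_fee(2_000_000))  # 80  (backlog: second tier)
```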
The first plot of Figure \ref{fig-share-tx-by-fee-tier} shows the
share of transactions paying each of the four fee tiers. Any transactions
that do not pay in the standard ranges $\left\{ \left[18,22\right],\left[72,82\right],\left[315,325\right],\left[3000,4100\right]\right\} $
were not included in the plot. The 320 nanoneros/byte tier is interesting.
About 10 percent of transactions paid 320 nanoneros/byte until February
17, 2024. The date could have something to do with Monero being delisted
from Binance on February 20, 2024.\footnote{\url{https://decrypt.co/218194/binance-finalizes-monero-delisting}}
Then on March 12-13, 2024 there was a burst of 320 nanoneros/byte transactions.
The 0.18.3.2 GUI/CLI wallet release could not explain the burst, since
the auto fee adjustment would only increase fees from 20 to 80 nanoneros/byte.
The burst of 320 nanoneros/byte transactions must have come either
from a central service producing transactions at that fee level or
from the suspected spammer. The second plot of Figure \ref{fig-share-tx-by-fee-tier}
shows the same data with the suspected spam transactions eliminated:
both the 80 and 320 nanoneros/byte transactions with the spam fingerprint
were removed. There is a modest increase in 80 nanoneros/byte transactions
after the spam started.
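The binning used for the plot can be sketched as follows; the tier ranges are the ones listed above, in nanoneros per byte:

```python
# Bin a transaction's fee rate (nanoneros/byte) into one of the four
# standard tier ranges; anything outside them is excluded from the plot.

TIER_RANGES = {20: (18, 22), 80: (72, 82), 320: (315, 325), 4000: (3000, 4100)}

def classify_tier(fee_rate):
    """Return the fee tier whose range contains fee_rate, else None."""
    for tier, (lo, hi) in TIER_RANGES.items():
        if lo <= fee_rate <= hi:
            return tier
    return None

print(classify_tier(19.8))   # 20
print(classify_tier(320))    # 320
print(classify_tier(150))    # None (excluded from the plot)
```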
\begin{figure}[H]
\caption{Share of transactions by fee tier}
\label{fig-share-tx-by-fee-tier}
\centering{}\includegraphics[scale=0.5]{images/share-tx-in-fee-tier-all-txs}\includegraphics[scale=0.5]{images/share-tx-in-fee-tier-spam-removed}
\end{figure}
The mempool archive data suggest that merchants using zero-confirmation
delivery were still safe during the spam incident. Once submitted
to the network, transactions did not drop out of the mempool. They
just took longer to confirm. There were only two transaction IDs in
the mempool of one of the mempool archive nodes that did not confirm
during the spam period. Both occurred on March 8 when the mempool
was very congested. The two ``disappearing transactions'' could
happen if someone submits a transaction to an overloaded public RPC
node, the transaction does not propagate well, and then the user
reconstructs the transaction with another node. The first transaction
will not confirm because it is a double spend. Seeing a transaction
in the mempool that never confirms happens sometimes during normal
transaction volumes, too. Single transactions like that appeared on
February 14, 17, and 23 and March 1 in the mempool archive data.
\section{Evidence for and against the spam hypothesis}
Is the March 4, 2024 transaction volume a result of many real users
starting to use Monero more, or is it spam created by a single entity?
\cite{Krawiec-Thayer2021} analyzed the July/August 2021 sudden rise
in transaction volume. We concluded that it was likely spam. Our evidence
was: 1) There was a sharp increase in 1in/2out and 2in/1out transactions,
but the volume of other transaction types did not increase, 2) All
the suspected spam paid minimum fees, 3) The distribution of ring
members became much younger, suggesting that the spammer was
re-spending outputs as quickly as possible.
Available time has not permitted a full run of the \cite{Krawiec-Thayer2021}
analysis on the March 2024 suspected spam data. It is easy to do a
quick check of transaction volume by input/output type. Figure \ref{fig-in-out-tx-type-volume}
plots the eight most common in/out transaction types on a log scale.
Only the volume of 1in/2out transactions increased on March 4, supporting
the spam hypothesis.
\begin{figure}[H]
\caption{Transaction volume by number of inputs and outputs (log scale)}
\label{fig-in-out-tx-type-volume}
\centering{}\includegraphics[scale=0.5]{images/in-out-tx-type-volume}
\end{figure}
More can be done to generate evidence for or against the spam hypothesis.
\cite{Krawiec-Thayer2021} analyzed the age of all ring members. Using
the OSPEAD techniques, the distribution of the age of the real spends
can be estimated.\footnote{\url{https://github.com/Rucknium/OSPEAD}}
Dandelion++ can defeat attempts to discover the origin of most transactions
because the signal of the real transaction is covered by the Dandelion++
noise. When the signal is huge like the spam, some statistical analysis
could overcome the Dandelion++ protection. Nodes can use the \texttt{net.p2p.msg:INFO}
log setting to record incoming fluff-phase transactions. From April
14, 2024 to May 23, 2024, peer-to-peer log data was collected from
about ten Monero nodes to try to establish evidence that the suspected
black marble transactions originated from a single node.\footnote{Thanks to cyrix126, Nep Nep, and anonymous node operators for contributing
log data.} Two factors have made this difficult. First, network topology information,
i.e. which nodes are connected to each other, is not easily obtained.
\cite{Cao2020} used the last\_seen timestamp in peer-to-peer communications
to estimate the node topology, but the timestamp has been removed
from Monero's node code.\footnote{\url{https://github.com/monero-project/monero/pull/5681} and \url{https://github.com/monero-project/monero/pull/5682}}
Topology information would have allowed a ``node crawler'' to move
through the network toward the likely source of the transaction spam.
Second, log data collection started after the spam wave ended, and
no new spam waves appeared. Therefore, the aim of the data analysis
had to change. The following analysis uncovers facts about Monero's
network and transaction propagation during normal operation that could
provide a foundation for future research on the network's privacy
and transaction propagation properties.
The number of unique IP addresses of peer nodes in the dataset is
about 13,600. This may be a rough estimate of the total number of
nodes on the network. Counting nodes this way can create both under-counts
and over-counts because of nodes entering and leaving the network,
nodes changing IP addresses, and multiple nodes behind the same IP
address. In any case, the 13,600 figure is similar to a May 29, 2024
count by \texttt{monero.fail} of about 12,000 nodes on the network.\footnote{\url{https://web.archive.org/web/20240529014020/https://monero.fail/map}}
The stability of the network topology is one of the factors that influences
the effectiveness of Monero's Dandelion++ network privacy protocol.
When nodes are connected to each other for a long time, it is easier
for an adversary to get information about network topology and use
it to try to discover the true node origin of a transaction (\cite{Sharma2022}).
The rate of connection creation and destruction could also affect
the vulnerability of the network to partitioning and eclipse attacks
(\cite{Franzoni2022b}).
A node can have two basic types of connections: incoming and outgoing.
A node's ``incoming'' connections are connections that the node's
peer initiated. A node's ``outgoing'' connections are connections
that the node initiated. By default, nodes that are behind a firewall
or residential router usually do not accept incoming connections.
The default maximum number of outgoing connections is 12. There is
no limit on incoming connections by default, but usually nodes accepting
incoming connections have between 50 and 100 incoming connections.
\begin{wrapfigure}{I}{0.45\columnwidth}%
\caption{Peer connection duration}
\label{fig-p2p-connection-duration}
\begin{centering}
\includegraphics[scale=0.5]{images/p2p-connection-duration}
\par\end{centering}
\end{wrapfigure}%
\begin{wraptable}{O}{0.45\columnwidth}%
\input{tables/multiple-send-p2p.tex}\end{wraptable}%
Based on the timestamps of transaction gossip messages from nodes
that accept incoming connections, the median duration of incoming
connections was 23 minutes. For outgoing connections, the median duration
was 23.5 minutes. A small number of connections last for much longer.
About 1.5 percent of incoming connections lasted longer than 6 hours.
About 0.2 percent of incoming connections lasted longer than 24 hours.
No outgoing connections lasted longer than six hours. This means that
some peer nodes chose to keep connections alive for a long period
of time. Node operators can manually set the \texttt{-{}-add-priority-node}
or \texttt{-{}-add-exclusive-node} node startup option to maintain
longer connections. Figure \ref{fig-p2p-connection-duration} is a
kernel density estimate of the duration of incoming and outgoing connections.
A small number of connections last for only a few minutes. A large
number of connections end at about 25 minutes.
Monero's fluff-phase transaction propagation is a type of gossip protocol.
In most gossip protocols, nodes send each unique message to each peer
one time at most. Monero nodes will send a transaction to the same
peer multiple times if the transaction has not been confirmed by miners
after a period of time. Arguably, this behavior makes transaction
propagation more reliable, at the cost of higher bandwidth usage.
Usually, transactions are confirmed immediately when the next block
is mined, so transactions are not sent more than once. If the transaction
pool is congested or if there is an unusually long delay until the
next block is mined, transactions may be sent more than once. In the
dataset, about 93 percent of transactions were received from the same
peer only once. About 6 percent were received from the same peer twice.
About 1 percent of transactions were received from the same peer more
than twice.
Table \ref{table-multiple-send-p2p} shows the median time interval
between receiving duplicate transactions from the same peer. Up to
the seventh relay, the $i$th relay has a delay of $f(i)=5\cdot2^{i-2}$.
After the seventh relay, the data suggests that some peers get stuck
broadcasting transactions every two to four minutes.\footnote{boog900 stated that ``re-broadcasts happen after 5 mins then 10,
then 15 increasing the wait by 5 mins each time upto {[}sic{]} 4 hours
where it is capped''. The form of this additive delay is similar
to the exponential delay that the empirical data suggests. https://libera.monerologs.net/monero-research-lab/20240828\#c418612}
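The intervals implied by $f(i)=5\cdot2^{i-2}$ can be tabulated in a few lines (units are minutes, following the table):

```python
# Median re-broadcast intervals suggested by the empirical data:
# the i-th relay follows the previous one after 5 * 2^(i-2) minutes,
# up to the seventh relay.

def rebroadcast_delay(i):
    """Minutes between the (i-1)th and ith relay of the same transaction."""
    return 5 * 2 ** (i - 2)

print([rebroadcast_delay(i) for i in range(2, 8)])  # [5, 10, 20, 40, 80, 160]
```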
A Monero node's fluff-phase gossip message can contain more than one
transaction. Usually, when a stem-phase transaction converts into
a fluff-phase transaction, it will be the only transaction in its
gossip message. As transactions propagate through the network, they
will tend to clump together into gossip messages with other transactions.
The clumping occurs because nodes maintain a single fluff-phase delay
timer for each connection. As soon as the ``first'' transaction
is received from a peer, a Poisson-distributed random timer is set
for each peer connection. If a node receives a ``second'',
``third'', etc. transaction before a connection's timer expires,
then those transactions are grouped with the first one in a single
message that eventually is sent to the peer when the timer expires.
Table \ref{table-tx-clumping-p2p} shows the distribution of clumping.
About 25 percent of gossip messages contained just one transaction.
Another 25 percent contained two transactions. The remaining messages
contained three or more transactions.
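The clumping mechanism described above can be sketched as a small simulation. The fixed 5-second timer below is a stand-in for the Poisson-distributed draw, and the arrival times are invented:

```python
# Minimal simulation of fluff-phase clumping: each connection keeps a
# single delay timer, and every transaction that arrives before the
# timer fires joins the same gossip message.

def simulate_clumps(arrival_times, timer_seconds=5.0):
    """Group sorted transaction arrival times (seconds) into gossip messages."""
    messages, current, deadline = [], [], None
    for t in arrival_times:
        if deadline is not None and t >= deadline:
            messages.append(current)      # timer fired: flush the message
            current, deadline = [], None
        if deadline is None:              # first tx of a message starts the timer
            deadline = t + timer_seconds
        current.append(t)
    if current:
        messages.append(current)
    return messages

# Bursty arrivals clump together; spread-out arrivals go alone.
arrivals = [0.0, 1.0, 2.0, 20.0, 31.0, 33.0, 34.5]
print([len(m) for m in simulate_clumps(arrivals)])  # [3, 1, 3]
```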
\begin{wraptable}{O}{0.35\columnwidth}%
\input{tables/tx-clumping-p2p.tex}\end{wraptable}%
A subset of the nodes that collected the peer-to-peer log data also
collected network latency data through ping requests to peer nodes.
The data can be used to analyze how network latency affects transaction
propagation. When it takes longer for a peer node to send a message
to the data logging node, we expect that the data logging node will receive
transactions from high-latency nodes later, on average, than from
low-latency nodes. I estimated an Ordinary Least Squares (OLS) regression
model to evaluate this hypothesis. First, I computed the time interval
between the first receipt of a transaction from any node and the time
that each node sent the transaction: \texttt{time\_since\_first\_receipt}.
Then, the round-trip ping time was divided by two to get the one-way
network latency: \texttt{one\_way\_ping}. The regression equation
was \texttt{time\_since\_first\_receipt = one\_way\_ping + error\_term}.
The estimated coefficient on \texttt{one\_way\_ping} was 7.5 (standard
error: 0.02). This is the expected direction of the association, but
the magnitude seems high. The coefficient means that a one millisecond
increase in ping time was associated with a 7.5 millisecond increase
in the time to receive the transaction from the peer. If the effect
of ping on transaction receipt delay only operated through the connection
between the peer node and the logging node, we may expect an estimated
coefficient value of one. There are at least two possible explanations
for the high estimated coefficient. First, assume that the logging
nodes were located in a geographic area with low average ping to peers.
And assume that the high-ping peers were located in an area with high
average ping to peers. Then, the high-ping nodes would have high delay
in sending \textit{and} receiving transactions from the ``low-ping''
cluster of nodes. That effect could at least double the latency, but
the effect could be even higher because of complex network interactions.
Second, only about two-thirds of peer nodes responded to ping requests.
The incomplete response rate could cause sample selection bias issues.
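As a hedged sketch of this regression, the following runs plain OLS on synthetic data in which the true slope is set to 7.5 to mirror the reported estimate; it is not the logged dataset:

```python
# OLS of time_since_first_receipt on one_way_ping, on synthetic data.
# The true slope of 7.5 is assumed here to match the reported estimate.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
one_way_ping = rng.uniform(5, 150, n)                   # ms, synthetic
time_since_first_receipt = 7.5 * one_way_ping + rng.normal(0, 100, n)

X = np.column_stack([np.ones(n), one_way_ping])         # intercept + slope
beta, *_ = np.linalg.lstsq(X, time_since_first_receipt, rcond=None)
print(round(beta[1], 1))   # slope recovered close to 7.5
```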
Occasionally, two of the logging nodes were connected to the same
peer node at the same time. Data from these simultaneous connections
can be analyzed to reveal the transaction broadcast delay patterns.
The logging nodes did not try to synchronize their system clocks.
The following analysis used the pair of logging nodes whose system
clocks seemed to be in sync.
\begin{figure}[H]
\caption{Time difference between tx receipt, one-second cycle}
\label{fig-one-second-period-tx-p2p-msg}
\centering{}\includegraphics[scale=0.5]{images/one-second-period-tx-p2p-msg}
\end{figure}
During the data logging period, Monero nodes drew a random variable
from a Poisson distribution to create transaction broadcast timers
for each of their connections. The distribution may change to exponential
in the future.\footnote{\url{https://github.com/monero-project/monero/pull/9295}}
The raw draw from the Poisson distribution used a rate parameter
$\lambda$ of 20 seconds. Then, the draw was divided by 4. The final
distribution has a mean of 5 seconds, with possible values at each
quarter second. If a node is following the protocol, we should observe
two data patterns when we compute the difference between the arrival
times of a transaction at two logging nodes. First, the differences
should usually be in quarter second intervals. Second, the difference
should follow a Skellam distribution, which is the distribution that
describes the difference between two Poisson-distributed independent
random variables. These patterns will not be exact because of differences
in network latencies.
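The timer construction described above can be reproduced directly; the seed and sample size are arbitrary:

```python
# Broadcast-timer draw: Poisson with rate 20, divided by 4, yielding
# quarter-second support and a mean of 5 seconds.

import numpy as np

rng = np.random.default_rng(42)
timers = rng.poisson(lam=20, size=100_000) / 4

print(round(timers.mean(), 1))                           # 5.0 seconds
print(bool(np.all(timers * 4 == np.round(timers * 4))))  # True: quarter-second grid
```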
Figure \ref{fig-one-second-period-tx-p2p-msg} shows a circular kernel
density plot of the time difference between two nodes receiving the
same transaction from the same peer node. The data in the plot was
created by taking the remainder (modulo) of these time differences
when divided by one second. The results are consistent with expectations.
The vast majority of time differences are at the 0, 1/4, 1/2, and
3/4 second mark.
Figure \ref{fig-skellam-histogram-tx-p2p-msg} shows a histogram of
the empirical distribution of time differences and a theoretical Skellam
distribution.\footnote{The Skellam distribution probability mass function has been re-scaled
upward by a factor of 8 to align with the histogram. Each second contains
8 histogram bins. } The histogram of the real data and the theoretical distribution are
roughly similar except that the number of empirical observations at
zero is almost double what is expected from the theoretical distribution.
A zero value means that the two logging nodes received the transaction
from the peer node at almost the same time.
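A quick simulated check of the Skellam expectation, using scipy's skellam distribution (sample size and seed are arbitrary; only the zero-difference probability is compared):

```python
# Differences between two independent Poisson(20) timer draws follow a
# Skellam distribution. Compare the empirical and theoretical probability
# of a zero difference (i.e. both logging nodes drawing the same value).

import numpy as np
from scipy.stats import skellam

rng = np.random.default_rng(7)
a = rng.poisson(20, 200_000)
b = rng.poisson(20, 200_000)

empirical_p0 = np.mean(a == b)
theoretical_p0 = skellam.pmf(0, mu1=20, mu2=20)
print(round(empirical_p0, 2), round(theoretical_p0, 2))  # both about 0.06
```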
\begin{figure}[H]
\caption{Histogram of time difference between tx receipt}
\label{fig-skellam-histogram-tx-p2p-msg}
\centering{}\includegraphics[scale=0.5]{images/skellam-histogram-tx-p2p-msg}
\end{figure}
\bibliographystyle{apalike-ejor}
\bibliography{monero-black-marble-flood}
\end{document}