misc-research/Monero-Black-Marble-Flood/pdf/monero-black-marble-optimal-fee-ring-size.tex


\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{float}
\usepackage{url}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{stackrel}
\usepackage{graphicx}

\makeatletter


\usepackage[dvipsnames,table]{xcolor}

\usepackage{lineno}
\linenumbers
\linespread{1.25}

\usepackage{longtable}
\usepackage{pdflscape}

\usepackage[bookmarks=true]{hyperref}
\usepackage{orcidlink}
\usepackage{booktabs}
\usepackage{caption}
\usepackage{longtable}
% \usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{verbose,tmargin=2cm,bmargin=2cm,lmargin=2cm,rmargin=2cm}
\usepackage{array}
\usepackage{url}
\usepackage{multirow}
\usepackage{stackrel}
\usepackage{rotating}

\usepackage{bbold}


\usepackage{array} % for ExtraRowHeight

\usepackage{graphicx}
\usepackage{siunitx}
\usepackage[normalem]{ulem}
\usepackage{colortbl}

\usepackage{hhline}
\usepackage{calc}
\usepackage{tabularx}
\usepackage{threeparttable}
\usepackage{wrapfig}
\usepackage{adjustbox}


\usepackage{hyperref}

\newcolumntype{L}{>{\raggedright\arraybackslash}}
\newcolumntype{R}{>{\raggedleft\arraybackslash}}
\newcolumntype{C}{>{\centering\arraybackslash}}

% \renewcommand{\arraystretch}{1.5}
\renewcommand{\tabcolsep}{0.2cm}

\setlength{\LTpre}{6pt}
\setlength{\LTpost}{6pt}

\makeatother

\usepackage{babel}


% https://tex.stackexchange.com/questions/151241/remove-metadata-of-pdf-generated-by-latex
\hypersetup{
    bookmarks=true,         % show bookmarks bar?
    unicode=false,          % non-Latin characters in Acrobat's bookmarks
    pdftoolbar=true,        % show Acrobat's toolbar?
    pdfmenubar=true,        % show Acrobat's menu?
    pdffitwindow=false,     % window fit to page when opened
%    pdfstartview={FitW},    % fits the width of the page to the window
    pdftitle={Defeating a Black Marble Flood Against Monero: Best Options for Ring Size and Transaction Fee},    % title
    pdfauthor={Rucknium},     % author
    pdfsubject={},   % subject of the document
    pdfcreator={Rucknium},   % creator of the document
    pdfproducer={},  % producer of the document
    pdfkeywords={}, % list of keywords
    pdfnewwindow=true,      % links in new window
    colorlinks=false,       % false: boxed links; true: colored links
    linkcolor=red,          % color of internal links
    citecolor=green,        % color of links to bibliography
    filecolor=magenta,      % color of file links
    urlcolor=cyan           % color of external links
}


\begin{document}

\title{Defeating a Black Marble Flood Against Monero: Best Options for Ring Size and Transaction Fee\\\vspace{.3cm}
\large Draft v0.1\vspace{-.715cm}}
\author{Rucknium\orcidlink{0000-0001-5999-8950} }
\date{May 28, 2024}


\maketitle

\section{Summary}

Increasing Monero's ring size and minimum transaction fee are two
options for defeating black marble flooding. This document attempts
to answer the question: Is it better to increase the ring size or
the transaction fee, or some combination of the two? Cost-Effectiveness
Analysis is used to analyze this question. It considers the additional
costs imposed on transacting users and node operators compared to
the benefit of stronger resistance to black marble flooding.

Consider an adversary with a daily budget of 12.5 XMR, five times
higher than the daily expenditure of the suspected March 2024 black
marble flooder. Given the constraints considered, the most cost-effective
combination of defense parameters are ring size 60 and minimum 70
nanonero per byte fee. Effective ring size would be 22.8 if the adversary
spent his entire budget every day. The 2in/2out reference transaction
with ring size 60 would be about 140\% larger than the transaction
with current ring size 16. The user's cost to send this transaction
would be about 4.4 USD cents. The total time to verify all transactions
in a block of normal transaction volume would increase from 0.5 seconds
to 1.8 seconds. An unpruned node would grow 59 GB in a year instead
of 25 GB. Pruned nodes would grow 14 GB instead of 8 GB.

\section{Black marble flooding as a game}

We will analyze the problem as a game with two players. One player
aims to flood the Monero blockchain with black marble outputs. This
player is limited by his budget. The other player aims to deter the
first player, or at least limit the damage, by choosing minimum fee
and ring size. This player is limited by the costs that fees and ring
size impose of transacting users and node operators.

Sam is a privacy adversary. His goal is to reduce Monero's effective
ring size by flooding the Monero blockchain with black marble outputs
that he owns. He has some budget $b$ denominated in XMR to spend
on transaction fees per block.

Alice wishes to defeat Sam. She can set Monero's ring size and minimum
transaction fee to try to accomplish her goal. Sam would have to spend
more XMR per output if the minimum fee per byte were higher. A larger
ring size would require Sam to own a larger share of outputs to achieve
a specified effective ring size. (Without changing the minimum fee
per byte, a larger ring size also requires Sam to spent more XMR to
produce each output because transaction size is larger.)

Larger ring sizes and fees help Alice accomplish her goal of defeating
Sam, but Alice cannot raise ring size and fee without limit. Users
who send Monero transactions need to pay a higher fee when the minimum
transaction fee is higher. Larger ring sizes mean that transactions
are larger. At a given transaction volume, larger transactions make
the blockchain grow faster. People who operate Monero nodes need to
store the blockchain on their storage media such as Solid State Drives
(SSDs). Alice needs to balance the benefit of greater defense against
Sam against the cost imposed on transacting users and node operators.

These are the factors on Alice's mind:
\begin{itemize}
\item I do not know Sam's budget $b$. I do not know what effective ring
size he hopes to achieve. If I set ring size and fee so that he cannot
achieve his desired effective ring size with his budget $b$, he will
choose not to flood the blockchain with black marbles. This is the
deterrence outcome.
\item If I fail to deter Sam, at least I can hold him to a specific effective
ring size when he spends his budget $b$. This is the fallback outcome.
\item I do not want to set ring size and transaction fee unnecessarily high
because transacting users and node operators pay higher costs when
these parameters increase.
\end{itemize}
We will simplify the problem:
\begin{itemize}
\item Sam may actually change the budget he is willing to spend based on
the effective ring size he is able to achieve. In other words, Sam
may have a tradeoff function between budget and effective ring size.
We will ignore this complication and assume that Sam's budget is fixed,
but unknown to Alice.
\item We will use the fallback outcome to measure the effectiveness of Alice's
options. When the fallback outcome is better for Alice, the deterrence
outcome is more likely. Therefore, it is a little redundant to compute
the probability of the deterrence outcome as an effectiveness metric.
\item Transaction volume by normal users is assumed to be constant and unaffected
by changes in the transaction fee. In other words, we will assume
that the demand for Monero transactions is completely inelastic with
respect to transaction fee.
\item We will assume that Sam's black marble transactions are 1in/2out because
the suspected black marble flood of March 2024 used this type of transaction.
Sam could produce black marble outputs more cheaply with 1in/16out
transactions, but the flood transactions would be easier for an observer
to identify.
\end{itemize}
Alice will use Cost-Effectiveness Analysis (CEA) to evaluate her ring
size and fee options. Cost effectiveness is the ratio of cost to effectiveness:

\begin{equation}
CE=\dfrac{\mathrm{Cost}}{\mathrm{Effectiveness}}
\end{equation}

A lower value of $CE$ is better. Alice must define cost and effectiveness
as functions of ring size, transaction fee, and the adversary's budget.
Let $n$ be nominal ring size, $f$ be the fee per byte in nanonero
units, and $b$ be the adversary's budget. Costs will be measured
in terms of XMR per block.

The cost has two components: cost to transacting users and cost to
node operators. The $i$th transaction has some number of inputs and
outputs. Changing the ring size $n$ changes the total size of the
$i$th transaction, which affects the total minimum fee to send the
transactions. And changing the minimum fee per byte changes the total
fee, of course. Let $w_{i}\left(n\right)$ be the weight of transaction
$i$ when ring size is $n$. When a transaction has two outputs, transaction
weight is equal to transaction size in bytes. Weight is larger than
size when the number of outputs is greater than two.\footnote{See Section 7.3.2 of koe, Alonso, K. M., \& Noether, S. (2020). \textit{Zero
to Monero: Second Edition}.} The block is assumed to contain an average set of transactions $T$.
The average is based on observed transactions confirmed on the blockchain
in the four weeks before the March 2024 suspected black marble flooding:
February 5 -- March 3. $C_{u}\left(f,n\right)$ is the aggregate
users' cost to send transactions for an average block:

\begin{equation}
C_{u}\left(f,n\right)=\underset{i\in T}{\sum}f\cdot w_{i}\left(n\right)
\end{equation}

The cost to node operators is a function of ring size. Node operators
do not pay higher costs when the minimum transaction fee is higher.
All units of computer storage in this document will be SI units, i.e.
a kilobyte, megabyte, gigabyte and terabyte are $10^{3}$, $10^{6}$,
$10^{9}$, and $10^{12}$ bytes, respectively. The retail price of
one consumer 1 TB SATA SSD is about 1 XMR.\footnote{In April 2024, the median retail price of a 1TB SATA SSD on \url{https://ssd.userbenchmark.com/}
was 114.50 USD. The exchange rate at the time was 120 USD per XMR.} A node operator's cost $C_{SSD}$ to store one byte of Monero blockchain
data is $10^{-12}$ XMR (a piconero). According to \texttt{monero.fail/map},
there were about 20,000 Monero nodes on the network in April 2024.
Currently the minimum relay fee is 20,000 piconeros (20 nanoneros)
per byte. Therefore, by coincidence Monero transactions pay for their
own storage space on the node network when users pay the minimum fee
per byte.

Let $d_{n}$, the number of nodes (daemons), be 20,000. $z_{i}\left(n\right)$
is the size of the $i$th transaction in the $T$ set when ring size
is $n$. The $m$ is an adjustment parameter that raises or lowers
total node operators' costs by a linear factor to adjust for uncertainty
about the true number of nodes and to add costs that are more difficult
to compute like CPU and RAM use. In the analysis below $m$ will be
set to 2. We will assume that each node is an unpruned node that stores
all transaction data in full. The total cost to node operators is
the sum of the size of transactions in the $T$ set multiplied by
the storage cost on a single SSD, the number of nodes on the network,
and the $m$ adjustment parameter:

\begin{equation}
C_{d}\left(n,m\right)=m\cdot d_{n}\cdot C_{SSD}\cdot\underset{i\in T}{\sum}z_{i}\left(n\right)
\end{equation}

Notice that $C_{d}\left(n,m\right)$ is the cost to node operators
under normal transaction volume, i.e. when there is no black marble
flooding. Total cost is the sum of $C_{u}\left(f,n\right)$ and $C_{d}\left(n,m\right)$:

\begin{equation}
C\left(f,n,m\right)=C_{u}\left(f,n\right)+C_{d}\left(n,m\right)
\end{equation}

With budget $b$, Sam can afford to place $\frac{b}{f}$ bytes of
transaction data in a block. Sam would create transactions with one
input and two outputs. The formula for the number of bytes of a transaction
like this in terms of the ring size $n$ is $975+35n$. The $975$
bytes is the size of the transaction except for the linear cost of
the ring size, i.e. a (invalid) 1in/2out transaction with ring size
0 would have $975$ bytes composed of the input's key image, other
input data that does not scale up with ring size, the outputs' bulletproofs+,
the outputs' public key, and \texttt{tx\_extra} data. The $35$ coefficient
on $n$ is the sum of the bytes of the ``$s$'' component of the
CLSAG ring signature of each ring member (32 bytes) and $3$ bytes
of the key offset integer that is used to create the output indices
of the ring members. The $3$ bytes is an empirical average of the
byes used by each key offset integer. The number of outputs per byte
that Sam produces is $2/\left(975+35n\right)$ because each of his
transaction has two outputs. To calculate the number of outputs per
block that Sam can afford with budget $b$ when fee is $f$ and nominal
ring size is $n$, we compute the product of $\frac{b}{f}$ and $2/\left(975+35n\right)$,
producing the formula for $s\left(b,f,n\right)$:

\begin{equation}
s\left(b,f,n\right)=\dfrac{2b}{f\cdot\left(975+35n\right)}
\end{equation}

Let $r$ be the number of real user outputs. When the number of outputs
owned by Sam is $s\left(b,f,n\right)$, the long-term mean effective
ring size\footnote{For a derivation of mean effective ring size, see Section 3 of Draft
v0.2 of Rucknium (2024) ``March 2024 Suspected Black Marble Flooding
Against Monero: Privacy, User Experience, and Countermeasures'' \url{https://github.com/Rucknium/misc-research/blob/main/Monero-Black-Marble-Flood/pdf/monero-black-marble-flood.pdf}} is

\begin{equation}
n_{e}\left(b,f,n\right)=1+\left(n-1\right)\cdot\dfrac{r}{r+s\left(b,f,n\right)}\label{eq:expectation-n_e}
\end{equation}

Alice wants to have a larger $n_{e}$ when Sam is producing black
marbles. $n_{e}$ is the desired outcome in the cost-effectiveness
analysis:

\begin{equation}
CE=\dfrac{C\left(f,n,m\right)}{n_{e}\left(b,f,n\right)}\label{eq:cost-effectiveness-full}
\end{equation}

Alice's goal is to choose minimum fee per byte $f$ and nominal ring
size $n$ to minimize $CE$ when Sam spends his budget $b$ producing
black marbles and the node cost multiplier is some specified $m$.
In game theory, a player's \textit{best response} in a two-player
game is a strategy that gives the player the best payoff when the
other player plays some specified strategy. Alice's best response
to Sam playing some $b$ as a strategy is to set $f$ and $n$ to
minimize $CE$. Alice does not know what value of $b$ Sam intends
to play, but reasonable values of $b$ can be analyzed to guide reasonable
choices of $f$ and $n$. In game theory terms, Alice's uncertainty
about Sam's $b$ means that this is a game of imperfect information.
Sam's player ``type'' is the unknown $b$. Sam has some probability
of being each type. In this document I will not explicitly declare
some probability distribution of Sam's type, but one could determine
Alice's single best response for the expected value of her cost effectiveness
when Sam's type has some probability distribution.

Define $f_{\min}$ and $f_{\max}$ as the minimum and maximum $f$
that Alice is willing to set. Let $n_{\min}$ and $n_{\max}$ be the
minimum and maximum $n$ that Alice is willing to set. Assume that
Alice wants to make sure that the effective ring size does not fall
below some specified minimum acceptable limit $\check{n}_{e}$. Alice
will try to minimize (\ref{eq:cost-effectiveness-full}) except when
the effective ring size would be below $\check{n}_{e}$ at the minimum
of (\ref{eq:cost-effectiveness-full}). In that case, Alice will exclude
the values of $n$ and $f$ that cause effective ring size to be below
$\check{n}_{e}$, then choose $n$ and $f$ to minimize (\ref{eq:cost-effectiveness-full})
from the set of $n$ and $f$ values that remain.

Alice's best response correspondence given Sam's choice of $b$ and
the node cost multiplier $m$ is the solution to

\begin{equation}
\begin{array}{l}
\underset{f,n}{\arg\min}\dfrac{C\left(f,n,m\right)}{n_{e}\left(b,f,n\right)}\\
\mathrm{subject\,to}\\
f_{\min}\leq f,\,f\leq f_{\max}\\
n_{\min}\leq n,\,n\leq n_{\max}\\
\check{n}_{e}\leq n_{e}\left(b,f,n\right)
\end{array}\label{eq:best-response-correspondence}
\end{equation}

The problem in (\ref{eq:best-response-correspondence}) is a nonlinear
minimization problem with nonlinear inequality constraints. Note that
the constraint set is convex, but the objective function is neither
globally convex nor globally concave.\footnote{The full proof of this statement is TODO. The first four constraints
form a convex set because they are affine. The $\check{n}_{e}\leq n_{e}\left(b,f,n\right)$
constraint is more complicated. The Hessian matrix of the second-order
partial derivatives of $n_{e}$ with respect to $f$ and $n$ is negative
definite as long as $n>1$. That means that its superlevel set for
some $\check{n}_{e}$ is convex. (The $\check{n}_{e}\leq n_{e}\left(b,f,n\right)$
inequality defines the superlevel set.) The intersection of two convex
sets is convex, so the constraint set of (\ref{eq:best-response-correspondence})
is convex.} The necessary conditions for the solution could be found analytically
by checking the Karush-Kuhn-Tucker conditions. I will solve it numerically
with a grid search. The grid is formed by evaluating (\ref{eq:cost-effectiveness-full})
many times at different values of $f$ and $n$. The values of $f$
are 40 equally-spaced values between $f_{\min}$ and $f_{\max}$.
The values of $n$ are each integer between $n_{\min}$ and $n_{\max}$.

We will start with a simple example. Assume that the adversary's budget
is 2.5 XMR per day. This is approximately the expenditure rate of
the suspected black marble flooder in March 2024. We will evaluate
cost-effectiveness at each combination of $f=\left\{ 10,20,40,100,200\right\} $
nanoneros per byte and $n=\left\{ 16,30,45,60\right\} $ ring size.

Table \ref{table-2_5-budget} contains the cost effectiveness (CE)
computations with other metrics like transaction size, total projected
growth of the blockchain, and estimated transaction verification time.
Note that the cost to send a 2in/2out transaction increases when ring
size increases even if the fee per byte does not increase because
users have to pay for larger total transaction size. The numerator
of CE has been scaled to millineros. The lowest value in the CE column
is 0.48 when nominal ring size is 60 and minimum fee is 40 nanoneros
per byte. Sam can achieve a 37.5 effective ring size with a 2.5 XMR/day
budget when nominal ring size is 60 and minimum fee is 40 nanoneros
per byte. Estimation of transaction and block verification time is
explained in Appendix \ref{sec:Appendix:-Transaction-verification}..

Figure \ref{fig-contour-plot-50-budget} is a shaded contour plot
of cost effectiveness when Sam has a budget of 50 XMR per day. Lighter
colors on the plot indicate lower CE values at the specified minimum
fee and ring size values. The blue triangle indicates the fee and
ring size values that minimize the CE when the minimum acceptable
effective ring size of 5 is disregarded. When we allow only fee and
ring size values that produce effective ring size above the minimum
acceptable effective ring, the green circle indicates the fee and
ring size values that minimize the CE. In this plot the triangle and
circle are at the same location because the minimum CE produces an
effective ring size of 12.8, above the minimum effective ring size
of 5.

Table \ref{table-scenarios-budget} shows the values of minimum fee
and ring size that produce optimal cost effectiveness when Sam has
different budgets. The maximum budget, 500 XMR per day, exceeds Monero's
daily security budget provided by tail emission. An adversary's budget
higher than 500 might imply that the adversary could directly 51 percent
attack the blockchain by renting CPU hashpower. It seems unnecessary
to consider a black marble flooder's budget greater than 500 XMR per
day because an adversary with a higher budget might be able to do
more damage to Monero than flooding the blockchain with black marble
outputs.

\begin{landscape}
\footnotesize{
\input{tables/permuted-cost-effectiveness-2_5-budget.tex}
}
\end{landscape}

\begin{figure}
\caption{Most cost-effective minimum fee and ring size when adversary budget
is 50 XMR per day}

\label{fig-contour-plot-50-budget}

\includegraphics[scale=0.65]{images/cost-effective-contour-plot-50-budget}
\end{figure}

\begin{landscape}
\footnotesize{
\input{tables/cost-effectiveness-budget-scenarios.tex}
}
\end{landscape}

\section{Discussion}

What have we learned? According to this analysis, raising the ring
size is a more cost-effective strategy against a black marble
attack than raising fees. A combination of a large increase in ring
size and a modest increase in fee seems to provide a good, cost-effective
defense.

Consider an adversary with a daily budget of 12.5 XMR, five times
higher than the daily expenditure of the suspected March 2024 black
marble flooder. Table \ref{table-scenarios-budget} says the most
cost-effective combination of defense parameters are ring size 60
and minimum 70 nanonero per byte fee. Effective ring size would be
22.8 if Sam spent his entire budget every day. The 2in/2out reference
transaction with ring size 60 would be about 140\% larger than the
transaction with current ring size 16. The user's cost to send this
transaction would be about 4.4 USD cents. The total time to verify
all transactions in a block of normal transaction volume would increase
from 0.5 seconds to 1.8 seconds. An unpruned node would grow 59 GB
in a year instead of 25 GB. Pruned nodes would grow 14 GB instead
of 8 GB.

Put these storage requirements into perspective. Recall that we use
base-10 (SI) units to measure bytes in this document. As of May 2024,
a unpruned Monero blockchain is 206 GB. A pruned Monero node takes
79 GB of storage space. The 2023 Ultra 4K edition of Call of Duty
requires 229 GB of storage.\footnote{{\scriptsize{}\url{https://web.archive.org/web/20231214215231/https://www.callofduty.com/blog/2023/10/call-of-duty-modern-warfare-III-specs-preloading-pc-trailer}}}
An unpruned BTC node requires 650 GB of storage and grows about 89
GB per year.\footnote{\url{https://bitcoin.stackexchange.com/a/116350} and \url{https://transactionfee.info/charts/block-size/}}
Therefore, with ring size 60 the Monero blockchain would grow slower
than the BTC blockchain, crossing the Call of Duty storage threshold
within a year.

Encouraging node operators to prune their nodes and implementing a
coinbase consolidation transaction type could reduce the impact of
increasing the minimum fee and ring size. Pruning could be encouraged
by setting pruning as the default in more Monero software interfaces,
such as the Monero GUI wallet Pull Request \#4320, and public information
campaigns.\footnote{\url{https://github.com/monero-project/monero-gui/pull/4320}}
A coinbase consolidation type would reduce the transaction size for
small coinbase outputs.\footnote{\url{https://github.com/monero-project/research-lab/issues/108}}

\section{Summary: Downsides and benefits of options}
\begin{enumerate}
\item Increase the minimum relay fee per byte
\begin{enumerate}
\item Downsides:
\begin{enumerate}
\item Users may make fewer transactions. That would reduce Monero's total
anonymity set because the rate of creation of new outputs would fall.
\item Users could move to another means of payment.
\item Monero might lose its reputation as a low-cost means of payment.
\item Large changes in Monero's fiat exchange rate could make the purchasing
power of the minimum fee much higher or lower than anticipated.
\end{enumerate}
\item Benefits:
\begin{enumerate}
\item Miners would earn more from fees. This would increase Monero's resistance
to 51 percent attack because its mining security budget would increase
a little.
\item Higher fees would increase the cost of all spam regardless of motivation.
(Increasing the ring size only negatively affects spammers that want
to reduce the effective ring size.)
\end{enumerate}
\end{enumerate}
\item Increase the ring size
\begin{enumerate}
\item Downsides:
\begin{enumerate}
\item Greater storage requirements for operating a Monero node could cause
some node operators to stop running their nodes. This would make the
Monero network less decentralized.
\item Some Monero wallet users may stop running local nodes and switch to
remote nodes. This would increase the load on public remote nodes
and potentially expose the wallet users to some privacy risks from
malicious remote nodes.\footnote{See \url{https://docs.featherwallet.org/guides/nodes}}
\item Verification time per transaction would increase. During normal operation,
the Monero node would use more CPU resources. During initial blockchain
download, total sync time would be greater. Syncing a Monero node
on a HDD, which is already very difficult, might become completely
nonviable because of the necessary random reads for ring signature
verification.
\item At extremes, long verification times can threaten network stability.
In 2023 Pirate Chain suffered a transaction spam attack that caused
chain splits because of long transaction verification times.\footnote{\url{https://web.archive.org/web/20230803171107/https://old.reddit.com/user/SignificantRoof5656/comments/15h9reh/pirate_chains_045_spam_attack_2_months_later/}}
Monero uses the Fluffy Blocks protocol to verify transactions as they
arrive in the txpool instead of bottlenecking verification at the
time new blocks are mined. It is unclear if Pirate Chain, a code fork
of Zcash, uses a compact block protocol.\footnote{See \url{https://github.com/zcash/zips/issues/360}}
As long as the time to verify each block's transactions does not become
a large fraction of mean time between blocks (120 seconds), this is
probably not a threatening issue, \textit{in theory}. In practice,
the Monero node performs many more actions than just verifying the
cryptography of transactions. There may be hidden bottlenecks. Recently,
spikes of transactions with large numbers of inputs have seemed to
cause excess RAM usage of nodes, shutting down nodes in some cases.\footnote{\url{https://github.com/monero-project/monero/issues/9317}}
\end{enumerate}
\item Benefits:
\begin{enumerate}
\item Increasing the ring size increases the anonymity set of all transaction
inputs. Other statistical attacks unrelated to black marble flooding
like EAE attacks and timing analysis would become more difficult.
\end{enumerate}
\item Neutral:
\begin{enumerate}
\item Increasing the ring size has very little effect on wallet sync times.
The bandwidth costs for syncing transactions in mined blocks are only
about three bytes per ring member for the ring offset data. No additional
computation is required. However, ring signature data is sent from
nodes to wallets when transactions are still in the txpool.\footnote{Thanks to jeffro256 for explaining this.}
\end{enumerate}
\end{enumerate}
\item Encourage blockchain pruning
\begin{enumerate}
\item Downsides:
\begin{enumerate}
\item New unpruned nodes may have to connect to more nodes to create an
unpruned copy of the blockchain.
\item All pruned nodes keep one-eighth of the transaction data that is designated
``prunable''. If all nodes on the network are pruned, there is an
extremely small chance that one of the eight pruning slices will not
exist on the whole network. That would mean that not all signature
data on the blockchain could be verified. When blockchain pruning
is enabled, a Monero node randomly chooses one of eight possible pruning
seeds independently of the pruning seeds that other nodes have chosen.
By chance, the network could be missing one of the eight slices of
the pruneable part of the blockchain because the choice of pruning
seed is not coordinated between nodes. This chance is extremely small.
If the network only has pruned nodes and the total number of nodes
on the network is 681, the probability of missing one of the eight
pruning slices is less than $2^{-128}$, which is the probability
of guessing a specific 12-word BIP39 bitcoin seed phrase with a single
guess. See Appendix \ref{sec:Probability-of-recovering-complete-blockchain-data}
for how to compute this probability. This probability does not consider
the challenge of new nodes finding their pruning slices by connecting
to multiple nodes throughout the network.
\end{enumerate}
\item Benefits:
\begin{enumerate}
\item Pruned nodes consume much less storage space.
\end{enumerate}
\item Neutral:
\begin{enumerate}
\item ``There are no privacy or security downsides when using a pruned
node.''\footnote{\url{https://web.getmonero.org/resources/moneropedia/pruning.html}}
\end{enumerate}
\end{enumerate}
\item Implement ``Coinbase Consolidation Tx Type''\footnote{\url{https://github.com/monero-project/research-lab/issues/108}}
\begin{enumerate}
\item Downsides:
\begin{enumerate}
\item koe, the original proper of this protocol modification, said, ``After
thinking more, I am not sure this proposal is the right direction.
Enote consolidation being statistically significant, and consolidating
enotes with small amounts being expensive, is a general problem not
specific to coinbase enotes. Implementing a specific solution for
coinbase enotes amounts to elevating the circumstances of miners to
first-class status in the protocol, without solving the more general
problem. If another major project on the scale of p2pool becomes active
in Monero and would benefit from specific protocol changes (not trivial
benefits - privacy and scaling benefits even), should we hard fork
to accommodate them? To support protocol longevity by reducing hardforks
(and not setting precedents that would justify a relatively higher
rate of future hardforks), it seems better to aim for general solutions
to problems. In this case, one general solution to the privacy impacts
of consolidation would be a global membership proof. The cost of consolidations
might be addressed by using aggregate membership proofs that scale
sub-linearly with the number of memberships being proven (i.e. number
of tx inputs).''\footnote{\url{https://github.com/monero-project/research-lab/issues/108\#issuecomment-1379288635}}
\item There is a small privacy impact to some miners. Most centralized mining
pools already publish the blocks that they mine, so the ownership
of mining pools' coinbases is usually publicly known already.\footnote{Wijaya, D. A., Liu, J. K., Steinfeld, R., \& Liu, D. (2021) ``Transparency
or anonymity leak: Monero mining pools data publication''. Paper
presented at Information Security and Privacy - 26th Australasian
Conference, ACISP 2021, Virtual Event, December 1-3, 2021, Proceedings.} P2Pool payout addresses are public on the P2Pool side chain, allowing
good guesses about which transactions are consolidating coinbases
to specific mining addresses.\footnote{\url{https://p2pool.observer/sweeps}}
The P2Pool README recommends miners to use a separate mining wallet.\footnote{\url{https://github.com/SChernykh/p2pool?tab=readme-ov-file\#general-considerations}}
Therefore, a coinbase consolidation transaction type would not have
a large negative impact on the privacy of most miners because the
on-chain privacy for miners is low to begin with. The privacy of solo
miners could be negatively impacted, however. With the new transaction
consolidation type, those miners could send coins to themselves once
to create outputs that would enter the non-coinbase anonymity set.
\end{enumerate}
\item Benefits:
\begin{enumerate}
\item If the coinbase consolidation transaction type is implemented at the
same time as much larger rings, coinbase consolidations would not
take up so much storage. In the 60 ring member scenario, annual blockchain
growth would be 2.7 GB less if all coinbase outputs are spent by inputs
with ring size one.
\item If the ring size and/or fee per byte increases a lot, P2Pool mining
may become uncompetitive compared to centralized pool mining, especially
for the P2Pool mini chain. Consider the 10th percentile of multi-output
coinbase outputs during February 2024: 0.000272 XMR. (10\% of the
likely P2Pool outputs are below this amount.) With the status quo
ring size and minimum fee per byte, consolidating this P2Pool payout
by adding an input to a transaction costs the miner about 5 percent
of the value of that output. With the ring size 60 and 70 nanoneros
per byte scenario considered above, about 57 percent of the value
of that output would be consumed by the cost to spent the output in
a transaction's output. But if coinbase outputs only have to have
ring size 1, then even paying 60 nanoneros per byte would cost the
miner only 4.2 percent of the output's value when you spent it in
a 1-ring-member input. (The cost quoted here do not include the bytes
contributed by outputs or other transaction data.)
\item Coinbase outputs can behave like black marbles in the rings of transactions
that do not spend coinbase outputs. See the ``Avoiding selecting
coinbase outputs as decoys'' Monero Research Lab issue.\footnote{\url{https://github.com/monero-project/research-lab/issues/109}}
Implementing a coinbase consolidation transaction type would prevent
coinbase outputs from being included in the rings of transactions
that do not spend coinbase outputs. This would improve the privacy
of those transactions.
\end{enumerate}
\end{enumerate}
\end{enumerate}

\appendix
\newpage{}

\section{Appendix: Transaction verification time estimates\label{sec:Appendix:-Transaction-verification}}

The verification time estimates are based on performance tests developed
by koe. I modified the test parameters to produce estimates of a large
set of ring sizes, input counts, and output counts in \url{https://github.com/Rucknium/monero-tx-performance}.
koe provides interpretation of the performance tests in Monero Research
Lab issue $\#91$.\footnote{\url{https://github.com/monero-project/research-lab/issues/91}}
I used the same machine as koe for the tests. The verification performance
tests do not include the time to read data from storage media. The
2in/2out reference transaction and the assumed 1in/2out black marble
transaction type could be tested directly, but there were too many
permutations of the real transaction in/out types in the February-March
2024 sample to test those directly. Estimates of the real transaction
verification type were necessary to estimate the verification time
for an average real block. All tests were in batches of one because
setting the batching parameter did not seem to affect the verification
time of inputs (it did affect verification time of outputs, but the
research question is about varying different ring sizes of inputs,
not outputs).

A linear regression model was fit by Ordinary Least Squares (OLS)
to interpolate the estimated verification times for the set of real
transactions at different ring sizes. The performance test developed
by koe were originally designed to only compute ring sizes that are
powers of two. Therefore, ring size performance was tested for ring
size 1, 2, 4, 8, 16, 32, and 64. The number of outputs tested was
every integer between 2 and 16 because these are the allowed number
of transaction outputs by blockchain consensus rules. The maximum
number of inputs in a single transaction that a standard Monero wallet
can produce is 150. The number of inputs I used for the performance
estimates was:

$\{1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150\}$

Taking all permutations of these sets gives $7\cdot15\cdot24=2520$
permutations. Tests with these permutations produce the dataset used
in the OLS regression. We can guess a good functional form for the
regression equation based on knowledge of the time complexity of the
algorithms used to cryptographically verify transaction components.
I included ring size and inputs, the base 2 log of each, and their
interaction terms. Dummy variables of the ceiling of base-2 log of
the number of outputs was included in the regression equation since
bulletproofs verification times is a function of the integer ceiling
of the power of two of the number of outputs in the transaction. The
full regression equation is below.

\begin{equation}
\begin{array}{cl}
time= & \beta_{0}+\beta_{1}ring\_size+\beta_{2}inputs+\beta_{3}\log_{2}(ring\_size)+\beta_{4}\log_{2}(inputs)+\beta_{5}\mathbb{{1}}\left\{ \lceil\log_{2}(outputs)\rceil=2\right\} \\
 & +\beta_{6}\mathbb{{1}}\left\{ \lceil\log_{2}(outputs)\rceil=3\right\} +\beta_{7}\mathbb{{1}}\left\{ \lceil\log_{2}(outputs)\rceil=4\right\} \\
 & +\beta_{8}ring\_size\times inputs+\beta_{9}\log_{2}(ring\_size)\times\log_{2}(inputs)+\epsilon
\end{array}\label{eq:verification-time-regression}
\end{equation}

$\lceil x\rceil$ means the integer ceiling of $x$. $\mathbb{{1}}\left\{ x\right\} $
is an indicator function. Its value is $1$ when the statement in
braces is true and is $0$ otherwise. The results of the regression
are in Table \ref{table-regression-results}.. The adjusted $R^{2}$
is extremely high (0.9998), indicating that the model fits the data
well. However, a model may have a high $R^{2}$ when the scale of
the different observations is vastly different, which is the case
here.

Given the estimated parameters from (\ref{eq:verification-time-regression}),
predicted values of the verification time for all types of transactions
and all ring sizes can be computed for the February-March 2024 sample
by plugging the number of inputs, outputs, and ring size into the
regression equation with the $\hat{\boldsymbol{\beta}}$ estimated
parameters.

\begin{table}[H]
\caption{CLSAG transaction verification time OLS regression. Units are milliseconds.}

\input{tables/verification-time-regression.tex}

\label{table-regression-results}
\end{table}

\newpage{}

\section{Probability of recovering complete blockchain data from a network
with only pruned nodes\label{sec:Probability-of-recovering-complete-blockchain-data}}

The problem of collecting all eight of the Monero's pruning slices
is a type of coupon collector's problem. Holst (1986) provides the
formula to find the probability that you need $n$ pruned nodes on
the network to be able to recover the intact blockchain from the eight
unique slices.\footnote{Holst, L. (1986). ``On Birthday, Collectors\textquoteright , Occupancy
and Other Classical Urn Problems.'' International Statistical Review
/ Revue Internationale de Statistique, 54(1), 15--27. https://doi.org/10.2307/1403255} Holst says, ``In this paper we will consider problems connected
with drawing with replacement from an urn with $r$ balls of different
colours.....The inverse of the occupancy problem is sometimes called
the coupon collector's problem. It reads: how many draws are necessary
for obtaining $k$ different balls?'' Holst gives the general problem
when $r$ is not necessarily equal to $k$. In the pruned node problem,
we only need one copy of each unique slice, so $r=k$ in our case.
Holst says that the probability of needing exactly $n$ draws for
obtaining $k$ different balls when the urn has $r$ balls of different
colors is

\begin{equation}
\mathrm{P}\left(W_{k:r}=n\right)=r_{(k)}S(n-1,k-1)/r^{n}\label{eq:coupon-collectors}
\end{equation}

Holst defines $r_{(k)}\equiv r(r-1)\ldots(r-k+1)$. When $r=k$, this
is the factorial $r_{(k)}=r!$. $S(n,k)$ is a Stirling number of
the second kind:

\[
S(n,k)=\stackrel[j=0]{k}{\sum}(-1)^{j}\dbinom{k}{j}(k-j)^{n}
\]

The (\ref{eq:coupon-collectors}) equation is the probability that
you need exactly $n$ nodes on the network to have all eight distinct
slices. We want to know the probability that you need more than $n$
nodes to have all the slices. This probability is $1-\sum_{i=8}^{n}\mathrm{P}\left(W_{k:r}=i\right)$.

To avoid limitations of floating point computer arithmetic, when computing
these values it is recommended to use a software library that uses
arbitrary-precision numbers such as the GNU Multiple Precision Arithmetic
Library.

Figure \ref{fig-pruned-node-collectors-problem} plots the probability
that a Monero network would not contain all 8 pruned node slices.
When there are 100 nodes on the network, the probability is about
0.001 percent. When the number of nodes is 681, the probability of
not having all 8 pruned node slices is less than $2^{-128}$, which
is the probability of guessing a specific 12-word BIP39 bitcoin seed
phrase with a single guess.\footnote{\url{https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki}}
These probabilities correspond to a network in a single point in time.
If we consider that a network will have many ``draws'' in its lifetime,
then the probability of missing one of the eight slices during any
point in its lifetime is higher. If the whole set of $n$ nodes re-draws
its random pruning seed $q$ times, the probability of never missing
one of the eight slices is $\left(1-\mathrm{P}\left(\mathrm{missing\,at\,least\,one\,slice}\right)\right)^{q}$
because the draws are independent.

\begin{figure}

\caption{Probability of not having all 8 distinct pruned slices on the Monero
network}

\label{fig-pruned-node-collectors-problem}

\includegraphics[scale=0.4]{images/pruned-node-collectors-problem-to-100}\includegraphics[scale=0.4]{images/pruned-node-collectors-problem-to-1000}

\end{figure}

\end{document}