<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Tabaré Capitán</title>
<link>https://www.tabarecapitan.com/blog/</link>
<atom:link href="https://www.tabarecapitan.com/blog/index.xml" rel="self" type="application/rss+xml"/>
<description>Projects and blog on causal inference.</description>
<generator>quarto-1.7.33</generator>
<lastBuildDate>Sat, 07 Feb 2026 23:00:00 GMT</lastBuildDate>
<item>
  <title>My first package in Python</title>
  <link>https://www.tabarecapitan.com/blog/0006-ritest-pypi/</link>
  <description><![CDATA[ 






<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">Intro</h2>
<p>A few months ago I was analysing a product adoption A/B test and wanted to use randomisation inference (RI), which I’ve done many times in Stata using <code>ritest</code>. But I did not find a functional equivalent in Python. ‘Not a problem’, I thought. I could code my own RI implementation; it is simply a function to permute the assignment and refit OLS inside a <code>for</code> loop.</p>
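<p>That first idea fits in a few lines. A toy sketch (the function name and signature are my illustration, not the package’s API; the statistic is the OLS coefficient on the treatment indicator):</p>

```python
import numpy as np

def ri_pvalue_naive(y, treat, n_perm=1000, seed=0):
    """Permute the assignment and refit OLS inside a for loop.
    Illustrative sketch -- names and signature are mine, not the package's."""
    rng = np.random.default_rng(seed)

    def ols_slope(d):
        # OLS of y on [1, d]; the slope is the treatment coefficient
        X = np.column_stack([np.ones(len(d)), d])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1]

    t_obs = ols_slope(treat)
    hits = sum(
        abs(ols_slope(rng.permutation(treat))) >= abs(t_obs)
        for _ in range(n_perm)
    )
    return t_obs, hits / n_perm
```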
<p>Long story short: it took me a few months to call this project ‘done’.</p>
<p>Indeed, the logic of RI is simple; it was not hard to get a working version. The point estimate and <img src="https://latex.codecogs.com/png.latex?p">-value matched Stata closely. I could have moved on. But I was a bit disappointed. I expected randomisation inference to be fast in Python, or at least faster than in Stata.<sup>1</sup> But that first version was much slower than Stata’s <code>ritest</code>. And I was familiar with R’s <code>ritest</code>, so I had an idea of what was possible.<sup>2</sup></p>
<p>In retrospect, it was naive of me to expect faster than Stata performance. I wrote that first version entirely with pandas and repeated calls to <code>statsmodels</code>; each iteration rebuilt the model and design matrix. Most of the runtime was overhead: rebuilding model objects and shuffling DataFrames, not the OLS itself. In terms of the data structures, pandas adds a lot of high-level machinery (indexing, alignment, copying) on top of NumPy arrays. And in terms of the estimation, <code>statsmodels</code> is great, but for this loop it was doing far more setup than needed.</p>
<p>I knew exactly what I had to do next: push the linear algebra closer to NumPy, avoid rebuilding objects unnecessarily, and use more specialised tools. As expected, my code got much faster. Since I had done so much work already, I naturally wanted to make sure my <code>ritest</code> would be ready for the next time I needed it. In fact, wouldn’t it be nice to at least match Stata’s <code>ritest</code> features?</p>
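<p>What that change looks like, as a sketch (my illustration of the idea, not the package’s actual implementation): with a demeaned outcome, the OLS slope on a single regressor reduces to a ratio of inner products, so all permuted coefficients become one matrix–vector product.</p>

```python
import numpy as np

def ri_pvalue_fast(y, treat, n_perm=1000, seed=0):
    """Same test as the naive loop, but closer to NumPy: the slope of y on a
    single demeaned regressor is (d_c @ y_c) / (d_c @ d_c), so the permuted
    coefficients can be computed in one vectorised pass."""
    rng = np.random.default_rng(seed)
    y_c = y - y.mean()

    d_c = treat - treat.mean()
    t_obs = (d_c @ y_c) / (d_c @ d_c)

    # stack the permuted assignments, then compute every coefficient at once
    D = np.stack([rng.permutation(treat) for _ in range(n_perm)]).astype(float)
    D_c = D - D.mean(axis=1, keepdims=True)
    T = (D_c @ y_c) / np.einsum("ij,ij->i", D_c, D_c)
    return t_obs, np.mean(np.abs(T) >= np.abs(t_obs))
```

For a balanced binary indicator, the slope equals the treated–control difference in means, so the two versions agree on the statistic while the vectorised one avoids rebuilding any model objects.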
<p>This post is about what I learned while following this very specific rabbit hole. Most of what I present in this post is obvious for those familiar with software development, but I am just an economist trying to use the right tools. Perhaps there are more like me out there, maybe in other fields, people who get distracted well beyond their original problem and convince themselves that it is time to write their first Python package.</p>
</section>
<section id="design" class="level2">
<h2 class="anchored" data-anchor-id="design">Design</h2>
<p>I think this is the most important step. You should know what you want.</p>
<p>My first priority was to achieve the same convenience I have had using Stata’s <code>ritest</code>, which required a simple public API. My second priority was flexibility, again a design concept taken from Stata’s <code>ritest</code>; this is what would end up being the ‘linear’ and ‘generic’ paths. Finally, my last priority was speed, which is at odds with the second priority. Conveniently, the separation between the ‘linear’ and ‘generic’ paths provided a natural solution: I would guarantee speed on the linear path.</p>
</section>
<section id="tools" class="level2">
<h2 class="anchored" data-anchor-id="tools">Tools</h2>
<p>This section is a description of the tools that I used. This is not a tutorial on how to install or use these tools; I’m not the right person to do so. The section simply sets the stage for the next sections, in which I precisely describe my development workflow. You can safely skip this section if you are familiar with <a href="https://en.wikipedia.org/wiki/DevOps">DevOps</a>.</p>
<section id="version-control" class="level3">
<h3 class="anchored" data-anchor-id="version-control">Version control</h3>
<p>I used Git for basic version control for a single-developer workflow. I mostly used <code>add</code>, <code>commit</code>, <code>checkout</code>, <code>branch</code>, and <code>push</code>.</p>
</section>
<section id="hooks" class="level3">
<h3 class="anchored" data-anchor-id="hooks">Hooks</h3>
<p><a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks">Git hooks</a> are scripts that run automatically at certain points in the git workflow (for example, before a commit is created). In practice, this meant that a basic standard was enforced <em>before</em> anything became a commit. It is very easy to set everything up using <a href="https://github.com/pre-commit"><code>pre-commit</code></a>.</p>
<p>My hooks, declared in <a href="https://github.com/tabareCapitan/ritest/blob/master/.pre-commit-config.yaml"><code>.pre-commit-config.yaml</code></a>, include:</p>
<ul>
<li><strong><a href="https://docs.astral.sh/ruff/linter/">Ruff Linter</a></strong>: a fast <a href="https://en.wikipedia.org/wiki/Lint_(software)">linter</a> that catches common mistakes and, importantly, can fix many of them automatically.</li>
<li><strong><a href="https://docs.astral.sh/ruff/formatter/">Ruff Formatter</a></strong>: formats the code into a consistent style so formatting decisions stop being a recurring discussion.</li>
<li><strong>end-of-file-fixer</strong>: ensures files end with a newline.</li>
<li><strong>trailing-whitespace</strong>: removes stray whitespace at the end of lines.</li>
</ul>
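<p>A minimal sketch of such a configuration (the repositories are the real hook sources; the pinned <code>rev</code> values here are placeholders, so check the linked file for the actual configuration):</p>

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9          # placeholder; pin your own version
    hooks:
      - id: ruff         # linter, with autofix
        args: [--fix]
      - id: ruff-format  # formatter
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0          # placeholder
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
```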
<p>Hooks are not intended to affect program logic; they mostly enforce style and catch common mistakes. Their job is to keep the codebase clean and quiet.</p>
</section>
<section id="testing" class="level3">
<h3 class="anchored" data-anchor-id="testing">Testing</h3>
<p>Economics was not my first choice for my BA: I started in Computer Science and transferred after a few semesters. One of the things I learned there was that you should always write tests. So I did. In Python, you can use <a href="https://docs.pytest.org/en/stable/"><code>pytest</code></a>.</p>
<p>When I did not have a working version of the code, I was doing <a href="https://en.wikipedia.org/wiki/Unit_testing">unit testing</a>: testing very specific units of the codebase. Once I started putting the pieces together, I moved on to <a href="https://en.wikipedia.org/wiki/Integration_testing">integration testing</a>, which is when you check that units interact as expected. Even with all of these tests in place, I was very much relieved when I was able to do <a href="https://en.wikipedia.org/wiki/System_testing">end-to-end testing</a> to verify that I was indeed getting the correct results.</p>
<p>I should say that this is one of the cases in which I found <a href="https://en.wikipedia.org/wiki/Large_language_model">LLMs</a> to be most helpful. You still need to supervise and check the code, but the task of writing unit tests for a given script is very well suited to them.</p>
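<p>For a flavour of the pattern, a toy test file (my example, not the package’s actual suite) might look like this:</p>

```python
# test_pvalue.py -- run with `pytest`; a toy example of the pattern,
# not the package's actual test suite
import numpy as np

def perm_pvalue(y, treat, n_perm=500, seed=0):
    """Toy permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)

    def diff(d):
        return y[d == 1].mean() - y[d == 0].mean()

    t_obs = diff(treat)
    hits = sum(abs(diff(rng.permutation(treat))) >= abs(t_obs)
               for _ in range(n_perm))
    return hits / n_perm

def test_pvalue_is_a_probability():
    rng = np.random.default_rng(42)
    treat = np.repeat([0, 1], 20)
    p = perm_pvalue(rng.normal(size=40), treat)
    assert 0.0 <= p <= 1.0

def test_strong_effect_gives_small_pvalue():
    treat = np.repeat([0, 1], 20)
    y = 5.0 * treat + np.random.default_rng(0).normal(0, 0.1, 40)
    assert perm_pvalue(y, treat) < 0.01
```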
</section>
<section id="continuous-integration-ci" class="level3">
<h3 class="anchored" data-anchor-id="continuous-integration-ci">Continuous integration (CI)</h3>
<p>Since I was coding by myself, a remote repository did not matter much during development. Still, I kept pushing because I wanted <a href="https://github.com/features/actions">GitHub Actions</a>.</p>
<p>A test may pass on my computer but not somewhere else. I could create a new virtual environment, but I was still on my computer. And it was not convenient. I wanted to install the package in a fresh environment and run the test suite. Well, this is precisely what continuous integration (CI) does, and you can do it for free (if the repository is public) using GitHub Actions.</p>
<p>In essence, the CI does one thing: it installs the package in a fresh environment and runs the test suite. This happens automatically on pushes and pull requests, and it runs across the Python versions that the package claims to support.</p>
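<p>A workflow file of roughly this shape does the job (a sketch; the action versions and the Python matrix are placeholders, not my actual configuration):</p>

```yaml
# .github/workflows/ci.yml -- sketch; versions and extras are placeholders
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: python -m pip install ".[test]"   # fresh install of the package
      - run: pytest                            # run the test suite
```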
</section>
<section id="build-and-release-tooling" class="level3">
<h3 class="anchored" data-anchor-id="build-and-release-tooling">Build and release tooling</h3>
<p>I used (what I think is) the standard approach. Once the code was ready to be released, I built the distribution artefacts locally: a source distribution (<code>sdist</code>) and a built distribution (<code>wheel</code>). This step forces you to confront packaging issues early, because it exercises exactly the same metadata and configuration that users rely on when installing the package. I then uploaded these artefacts using <code>twine</code>.</p>
</section>
</section>
<section id="workflow" class="level2">
<h2 class="anchored" data-anchor-id="workflow">Workflow</h2>
<p>Now that I’ve described the tools, I can tell you about my (solo) development workflow. It represents the workflow for a new feature or change; for a very minor change, or at earlier stages, you can just skip the new branch.</p>
<p>By the way, I wrote <code>ritest</code> during 2025 with support from OpenAI’s Codex. Things may have changed with the new models, but at the time, Codex needed close oversight and made lots of mistakes. That is not to say it was not a great tool; this package would be worse (or even a never-completed project) without LLMs. Like all the tools mentioned in this post, it is just another way to make the job easier.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart TD
  A[Create branch] --&gt; B[Edit code]
  B --&gt; C[Run pre-commit]
  C --&gt; D{Hooks changed files?}
  D -- yes --&gt; E[git add -A]
  E --&gt; F[Commit]
  D -- no --&gt; F[Commit]
  F --&gt; G[Run tests: pytest]
  G --&gt; H{Tests pass?}
  H -- no --&gt; B
  H -- yes --&gt; I[Merge into main/master]
  I --&gt; J[Push]
  J --&gt; K{CI green?}
  K -- no --&gt; B
  K -- yes --&gt; L[Done]

</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<p>The workflow corresponds to:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> checkout <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-b</span> feat/x</span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># edit code</span></span>
<span id="cb1-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pre-commit</span> run <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--all-files</span></span>
<span id="cb1-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pytest</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> add <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-A</span></span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> commit <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-m</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"message"</span></span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> checkout main</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> merge feat/x</span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> push</span></code></pre></div>
</section>
<section id="packaging-and-release-on-pypi" class="level2">
<h2 class="anchored" data-anchor-id="packaging-and-release-on-pypi">Packaging and release on PyPI</h2>
<p>Once you are ready to ship, you need a <code>pyproject.toml</code> file. This is where you tell Python tools what your project is and how it should be built. It declares project metadata (name, version, description, URLs), minimum Python version, and runtime dependencies.</p>
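<p>A minimal sketch of the shape of that file (every value here is a placeholder, and the build backend is just one common choice):</p>

```toml
# pyproject.toml -- minimal sketch; all values are placeholders
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "0.1.0"
description = "One-line description of the package."
requires-python = ">=3.10"
dependencies = ["numpy>=1.24"]

[project.urls]
Homepage = "https://example.com/my-package"
```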
<p>To release on PyPI, you can follow these steps:</p>
<ul>
<li>build wheel + sdist locally</li>
<li>upload to <a href="https://test.pypi.org/">TestPyPI</a> with <code>twine</code> (this is ‘rehearsal’)</li>
<li>install into a clean environment and actually use it</li>
<li>upload the same artefacts to <a href="https://pypi.org/">PyPI</a></li>
<li>bump version + tag release</li>
<li>Done!</li>
</ul>
</section>
<section id="documentation" class="level2">
<h2 class="anchored" data-anchor-id="documentation">Documentation</h2>
<p>The second lesson I took from my time as a CS student is that you must write proper documentation. So I also did that. And it took much longer than I thought.</p>
<p>Beyond proper in-code documentation, I think you need, at the very least, a <code>README</code> file for your GitHub repository. It describes what the package does, how to install it, and how to use it. Then, you need a <code>CHANGELOG</code> file to keep track of changes. For me, the real work was writing a comprehensive documentation site, built with Quarto and deployed via GitHub Pages. This is where I show basic and advanced use, examples, technical notes, and the API reference.</p>
</section>
<section id="one-last-word" class="level2">
<h2 class="anchored" data-anchor-id="one-last-word">One last word</h2>
<p>I hope that I’ve clearly conveyed that this is not a “how it is done” post. I just did it for the first time. I am sharing this post for selfish reasons. I would like to hear from people who have done this many times. Am I missing something that would make the process easier, faster, or more robust? Please <a href="https://tabarecapitan.com/">reach out</a> if you have something to say.</p>
<p>And if you are thinking about releasing your first package, you may find this post useful; just keep in mind that this is in no way an authoritative guide.</p>


</section>


<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a><div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I can’t guarantee this is correct. I don’t know much about Stata’s MATA language for linear algebra. This is just what you may call an ‘informed hunch’.↩︎</p></li>
<li id="fn2"><p>In R’s <code>ritest</code> documentation, Grant McDermott presents <a href="https://grantmcdermott.com/ritest/articles/ritest.html#example-ii-real-life-data">a case</a> in which the runtime goes down from 183 seconds in Stata, to 6.58 seconds in R.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>software</category>
  <guid>https://www.tabarecapitan.com/blog/0006-ritest-pypi/</guid>
  <pubDate>Sat, 07 Feb 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Confidence intervals in randomisation inference</title>
  <link>https://www.tabarecapitan.com/blog/0005-ritest-ci/</link>
  <description><![CDATA[ 






<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">Intro</h2>
<p>When you do randomisation inference (RI), your output typically shows the observed statistic, as well as a <img src="https://latex.codecogs.com/png.latex?p">-value, a standard error, and a confidence interval. This looks very similar to what you get from a regression where, say, for a given coefficient, your output shows the point estimate, as well as a standard error, a <img src="https://latex.codecogs.com/png.latex?t">-statistic, a <img src="https://latex.codecogs.com/png.latex?p">-value, and a confidence interval.</p>
<p>But these outputs, other than the observed statistic or point estimate, are conceptually very different. In this post I try to make sense of the output of randomisation inference.</p>
</section>
<section id="p-values" class="level2">
<h2 class="anchored" data-anchor-id="p-values"><img src="https://latex.codecogs.com/png.latex?p">-values</h2>
<p><a href="../../blog/0001-inference/index.html">Randomisation inference (RI)</a> starts from a statistic <img src="https://latex.codecogs.com/png.latex?T(%5Ccdot)">: a difference in means, a regression coefficient, a median difference, or anything else you care about. The design (the assignment mechanism) induces a randomisation distribution for that statistic <em>under a null hypothesis</em>.</p>
<p>Under a sharp null, the randomisation <img src="https://latex.codecogs.com/png.latex?p">-value is a tail probability under the assignment mechanism: <img src="https://latex.codecogs.com/png.latex?%0Ap%20%5C;=%5C;%20%5CPr%5C!%5Cleft(%7CT%7C%20%5Cge%20%7CT_%7B%5Ctext%7Bobs%7D%7D%7C%20%5C;%5Cmiddle%7C%5C;%20H_0,%5C;%20%5Ctext%7Bdesign%7D%5Cright).%0A"></p>
<p>If you can enumerate every valid assignment, you can compute <img src="https://latex.codecogs.com/png.latex?p"> exactly, in the design-based sense. In most real problems there are too many valid assignments and it is not worth it to go through all of them, so you sample <img src="https://latex.codecogs.com/png.latex?R"> valid reassignments.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?c"> be the number of assignments in which the statistic is more extreme than the observed statistic: <img src="https://latex.codecogs.com/png.latex?%0Ac%20%5C;=%5C;%20%5Csum_%7Br=1%7D%5ER%20%5Cmathbf%7B1%7D%5C%7B%7CT_r%7C%20%5Cge%20%7CT_%7B%5Ctext%7Bobs%7D%7D%7C%5C%7D.%0A"> A common Monte Carlo estimator is<sup>1</sup> <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%20p%20%5C;=%5C;%20%5Cfrac%7Bc%7D%7BR%7D.%0A"></p>
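<p>The estimator is mechanical. A sketch with made-up data, following the notation above (the statistic here is a difference in means):</p>

```python
import numpy as np

# Monte Carlo randomisation p-value, following the formulas above.
# The data are simulated purely for illustration (no true effect).
rng = np.random.default_rng(0)
y = rng.normal(size=200)                          # observed outcomes
treat = rng.permutation(np.repeat([0, 1], 100))   # the observed assignment

def T(d):  # the test statistic: a difference in means
    return y[d == 1].mean() - y[d == 0].mean()

R = 2000
T_obs = T(treat)
T_r = np.array([T(rng.permutation(treat)) for _ in range(R)])
c = int(np.sum(np.abs(T_r) >= np.abs(T_obs)))     # count of extreme draws
p_hat = c / R                                     # the Monte Carlo estimate
```

Rerunning with a different seed gives a slightly different <code>p_hat</code>; that spread is exactly the Monte Carlo error.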
<p>At this point it is no longer correct to treat the reported <img src="https://latex.codecogs.com/png.latex?p">-value as a fixed number. It has Monte Carlo error because <img src="https://latex.codecogs.com/png.latex?R"> is finite. In contrast, in an OLS regression, the <img src="https://latex.codecogs.com/png.latex?p">-value is a deterministic function of the data, given the modelling assumptions. So there is no Monte Carlo uncertainty.</p>
<p>It is worth pointing out that <img src="https://latex.codecogs.com/png.latex?p">-values, both in RI and the regression context, only make sense in the context of a hypothesis test. <em>A <img src="https://latex.codecogs.com/png.latex?p">-value is not a generic measure of ‘signal strength’; it is defined relative to a null hypothesis and a reference distribution for the statistic.</em> In RI, both are explicit: the null is sharp, and the reference distribution comes from the assignment mechanism. In the regression context, the null is typically weak and the reference distribution is introduced analytically (often via asymptotic arguments).</p>
</section>
<section id="confidence-intervals" class="level2">
<h2 class="anchored" data-anchor-id="confidence-intervals">Confidence intervals</h2>
<p>We now have our randomisation inference (RI) <img src="https://latex.codecogs.com/png.latex?p">-value, which is an estimate with Monte Carlo error. It is then natural to represent this error with a confidence interval (CI) <em>for the <img src="https://latex.codecogs.com/png.latex?p">-value itself</em>.</p>
<p>Conditional on the observed data and the null, each reassignment either lands in the tail or it does not. That makes <img src="https://latex.codecogs.com/png.latex?c"> behave like a binomial count: <img src="https://latex.codecogs.com/png.latex?%0Ac%20%5Csim%20%5Coperatorname%7BBinomial%7D(R,%20p).%0A"> Then, you can build a <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha)"> interval for <img src="https://latex.codecogs.com/png.latex?p"> from this binomial model (there are several standard choices).</p>
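<p>One such standard choice is the Wilson score interval, which needs only the tail count and the number of draws (a sketch; Clopper–Pearson and other intervals are equally common):</p>

```python
from statistics import NormalDist
import math

def p_value_ci(c, R, alpha=0.05):
    """Wilson score interval for the randomisation p-value,
    treating c ~ Binomial(R, p). One of several standard choices."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = c / R
    denom = 1 + z**2 / R
    centre = (p_hat + z**2 / (2 * R)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / R + z**2 / (4 * R**2))
    return centre - half, centre + half
```

For example, <code>c = 50</code> tail hits in <code>R = 1000</code> draws gives an interval around 0.05 that straddles the usual threshold, so a ‘significant at 5%’ call would be genuinely uncertain at that <code>R</code>.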
<p>The interpretation is narrow but clean. The RI confidence interval of the <img src="https://latex.codecogs.com/png.latex?p">-value:</p>
<ul>
<li>quantifies uncertainty from <em>simulation</em>, not from (theoretically) drawing a new dataset,</li>
<li>shrinks as <img src="https://latex.codecogs.com/png.latex?R"> grows, and</li>
<li>tells you when a ‘statistical significance’ call is robust versus when you are basically flipping a coin near a threshold.</li>
</ul>
<p>Now let’s get back to the regression setting, say, for an A/B test. How do we build a confidence interval?</p>
<p>To be concrete, define the finite-sample average treatment effect <img src="https://latex.codecogs.com/png.latex?%0A%5Ctau_%7B%5Ctext%7BATE%7D%7D%20%5C;=%5C;%20%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bi=1%7D%5EN%20%5Cbig(Y_i(1)-Y_i(0)%5Cbig).%0A"></p>
<p>If you estimate the effect as a treated–control difference in means, or as the coefficient on a treatment indicator in an OLS regression with an intercept, you are targeting <img src="https://latex.codecogs.com/png.latex?%5Ctau_%7B%5Ctext%7BATE%7D%7D"> on the outcome scale. A standard regression confidence interval takes the form <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%5Ctau%20%5Cpm%20z_%7B1-%5Calpha/2%7D%5C,%5Cwidehat%7BSE%7D(%5Chat%5Ctau),%0A"> with details depending on how you estimate the variance.</p>
<p>The key thing is what that CI is <em>trying</em> to cover. In the usual regression presentation, the motivation is based on repeated sampling (often asymptotic): if we reran ‘the relevant randomness’ many times, the interval would cover the target parameter with frequency <img src="https://latex.codecogs.com/png.latex?1-%5Calpha">. The ‘not significant if the CI includes 0’ common saying is shorthand for not rejecting a weak null like <img src="https://latex.codecogs.com/png.latex?%0AH_0%5E%7B%5Ctext%7Bweak%7D%7D:%5C;%20%5Ctau_%7B%5Ctext%7BATE%7D%7D%20=%200,%0A"> under that sampling-based uncertainty and approximation. That is an uncertainty statement about an average effect.</p>
<p>Let’s take a moment for the distinction to sink in. The confidence interval in a typical regression table reflects uncertainty around the coefficient, while the (default) confidence interval in randomisation inference reflects uncertainty around the <img src="https://latex.codecogs.com/png.latex?p">-value. They are not at all comparable.</p>
</section>
<section id="confidence-set" class="level2">
<h2 class="anchored" data-anchor-id="confidence-set">Confidence set</h2>
<p>It looks like we are missing a piece in randomisation inference. <em>Is there no confidence interval for the coefficient?</em> Well… sort of. At least in spirit. But we need to do much more work, both to build it and to interpret it.</p>
<p>A randomisation test needs a null that lets you impute missing potential outcomes. The canonical ‘no effect’ sharp null is <img src="https://latex.codecogs.com/png.latex?%0AH_0:%5C;%20Y_i(1)=Y_i(0)%5C;%5C;%5Cforall%20i.%0A"> That is stronger than ‘the average effect is 0’ (the ‘weak’ null). It says nobody is affected.</p>
<p>To get an interval for an effect size, RI typically inverts a <em>family</em> of sharp nulls indexed by a candidate constant additive effect: <img src="https://latex.codecogs.com/png.latex?%0AH_0(%5Ctau_0):%5Cquad%20Y_i(1)%20=%20Y_i(0)%20+%20%5Ctau_0%20%5C;%5C;%5Ctext%7Bfor%20all%20%7D%20i.%0A"></p>
<p>For each <img src="https://latex.codecogs.com/png.latex?%5Ctau_0"> you compute a randomisation <img src="https://latex.codecogs.com/png.latex?p">-value <img src="https://latex.codecogs.com/png.latex?p(%5Ctau_0)">. Then you invert the tests: <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BC%7D_%7B1-%5Calpha%7D%20%5C;=%5C;%20%5C%7B%5Ctau_0:%5C;%20p(%5Ctau_0)%20%3E%20%5Calpha%5C%7D.%0A"></p>
<p>In other words: a ‘confidence interval’ for the coefficient under randomisation inference is the set of constant additive effects that you do not reject under design-based uncertainty.</p>
<p>That is what it means mechanically, and it is also the safest way to interpret it.</p>
<p>Everything else—especially ‘is it an ATE interval?’—depends on whether the constant-effect assumption is a defensible approximation for your application.</p>
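<p>A sketch of the mechanics (illustrative code, not <code>ritest</code>’s internals): under each candidate null we impute the control potential outcomes, rerandomise, and keep the values we fail to reject.</p>

```python
import numpy as np

# Test inversion under constant additive effects -- an illustrative sketch.
# The data are simulated with a true effect of 1.
rng = np.random.default_rng(0)
n, R, alpha = 200, 999, 0.05
treat = rng.permutation(np.repeat([0, 1], n // 2))
y = 1.0 * treat + rng.normal(size=n)

def diff(values, d):
    return values[d == 1].mean() - values[d == 0].mean()

def p_value(tau0):
    """Randomisation p-value for H0(tau0): Y_i(1) = Y_i(0) + tau0."""
    y0 = y - tau0 * treat                 # impute Y(0) for everyone under H0
    t_obs = diff(y0, treat)               # equals tau_hat - tau0
    t_r = np.array([diff(y0, rng.permutation(treat)) for _ in range(R)])
    return np.mean(np.abs(t_r) >= np.abs(t_obs))

grid = np.linspace(0.0, 2.0, 41)
band = [(tau0, p_value(tau0)) for tau0 in grid]     # tau0 -> p(tau0)
conf_set = [tau0 for tau0, p in band if p > alpha]  # the confidence set
# its smallest and largest elements are the 'confidence bounds'
```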
<p>Note that the ‘confidence interval’ in randomisation inference is literally defined as a set, which explains why, in randomisation inference, we call the boundaries of that set ‘confidence bounds’ rather than a ‘confidence interval’. In addition, we call the whole set a ‘confidence band’. This keeps the whole inversion visible: plot the curve <img src="https://latex.codecogs.com/png.latex?%5Ctau_0%20%5Cmapsto%20p(%5Ctau_0)"> and draw the horizontal line at <img src="https://latex.codecogs.com/png.latex?%5Calpha">. The confidence set is where the curve sits above the line. I like bands because they answer questions the bounds cannot, such as:</p>
<ul>
<li>do the endpoints come from a sharp crossing or a curve that barely grazes <img src="https://latex.codecogs.com/png.latex?%5Calpha">?</li>
<li>is the acceptable region a single chunk, or does it fragment?</li>
<li>if <img src="https://latex.codecogs.com/png.latex?p(%5Ctau_0)"> is computed by Monte Carlo, is the crossing stable once you acknowledge simulation noise?</li>
</ul>
<section id="interpretation" class="level3">
<h3 class="anchored" data-anchor-id="interpretation">Interpretation</h3>
<p>I did say that there is much work to do to interpret RI confidence bounds. The object itself is unambiguous: <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D_%7B1-%5Calpha%7D"> is the set of <img src="https://latex.codecogs.com/png.latex?%5Ctau_0"> values for which the constant-additive-effect null <img src="https://latex.codecogs.com/png.latex?H_0(%5Ctau_0)"> is not rejected at level <img src="https://latex.codecogs.com/png.latex?%5Calpha">.</p>
<p>Now the interpretation splits.</p>
<p>If effects are (approximately) constant and additive, so that <img src="https://latex.codecogs.com/png.latex?%0AY_i(1)=Y_i(0)+%5Ctau%20%5Cquad%20%5Cforall%20i,%0A"> then <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is approximately equal to <img src="https://latex.codecogs.com/png.latex?%5Ctau_%7B%5Ctext%7BATE%7D%7D">, and <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BC%7D_%7B1-%5Calpha%7D"> is naturally read as a confidence set for the ATE under design-based uncertainty. In that world, the confidence bound carries essentially the meaning people expect.</p>
<p>If effects are heterogeneous, <img src="https://latex.codecogs.com/png.latex?H_0(%5Ctau_0)"> is a strong claim: it says <img src="https://latex.codecogs.com/png.latex?%5Ctau_i=%5Ctau_0"> for everyone. Then:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?0%20%5Cin%20%5Cmathcal%7BC%7D_%7B1-%5Calpha%7D"> means you did not reject “everyone’s effect is exactly 0” (given your statistic and design).</li>
<li><img src="https://latex.codecogs.com/png.latex?0%20%5Cnotin%20%5Cmathcal%7BC%7D_%7B1-%5Calpha%7D"> means you rejected that claim.</li>
</ul>
<p>What it does <em>not</em> automatically mean is ‘the ATE might be 0’ or ‘the ATE is nonzero’, because those are weak-null statements about an average, and the inverted tests are about a constant effect for every unit.</p>
<p>So under heterogeneity, in practice, you can read RI confidence bounds as a compatibility check for a simple constant-effect model, not as a replacement for an ATE interval.</p>
</section>
<section id="computation" class="level3">
<h3 class="anchored" data-anchor-id="computation">Computation</h3>
<p>I also said that we had to do much work to build the confidence bounds. And we do. Much more work.</p>
<p>A single randomisation test uses <img src="https://latex.codecogs.com/png.latex?R"> reassignments. Test inversion uses many randomisation tests—one for each candidate <img src="https://latex.codecogs.com/png.latex?%5Ctau_0"> you evaluate—so the computation is ‘RI, repeated’. A band is even more demanding because you are deliberately evaluating <img src="https://latex.codecogs.com/png.latex?p(%5Ctau_0)"> across a grid.</p>
</section>
<section id="discussion" class="level3">
<h3 class="anchored" data-anchor-id="discussion">Discussion</h3>
<p>So randomisation inference does (sort of) have confidence intervals for the coefficient. But they are much harder to build and to interpret. It almost feels pointless. In a regression setting, confidence intervals feel more immediately useful: they are designed to speak directly about an average effect on the outcome scale under a weak-null framing.</p>
<p>Still, I think that RI confidence bounds and bands can (sometimes) be useful. They are just useful in a different way: as a transparent, design-grounded way to ask ‘what constant-effect stories are compatible with what we saw?’</p>
<p>Or perhaps we are simply very used to thinking in terms of weak nulls. <em>A statement like <img src="https://latex.codecogs.com/png.latex?%5Ctau_%7B%5Ctext%7BATE%7D%7D=0"> is convenient and often relevant for decisions, but it can also be uninformative.</em> If effects differ in sign, the average can be zero even though treatment has substantial consequences for many units. In that sense, a weak null can hide structure rather than reveal it.</p>
<p>Sharp nulls force a different discipline. They ask whether any effect at all is compatible with the design-based evidence, or whether a simple, homogeneous effect could plausibly summarise what happened. That is a stronger question. But it is also a clarifying one. Seen this way, RI confidence bounds are not really ‘competing’ with regression confidence intervals. They are probing a different dimension: not ‘how large is the average effect?’, but ‘how simple a story about effects can we still defend?’.</p>


</section>
</section>


<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a><div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Some implementations use small finite-sample adjustments; the point here is the same either way.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <guid>https://www.tabarecapitan.com/blog/0005-ritest-ci/</guid>
  <pubDate>Sun, 01 Feb 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Randomisation inference in Stata, R, and Python</title>
  <link>https://www.tabarecapitan.com/blog/0004-ritest-implementations/</link>
  <description><![CDATA[ 






<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">Intro</h2>
<p>In this brief post I show how to do randomisation inference (RI) using three different implementations.</p>
<p>I start with the OG, which is <a href="https://hesss.org/">Simon Heß</a>’s <a href="https://github.com/simonheb/ritest">implementation in Stata</a>. This is the one I’ve used for many years. Then I cover <a href="https://grantmcdermott.com/">Grant McDermott</a>’s <a href="https://grantmcdermott.com/ritest/">implementation in R</a>, which is described as a port of the Stata implementation. Finally, I present my own very recent <a href="../../projects/ritest/">implementation in Python</a>, which is not necessarily a port, but explicitly aims to be functionally equivalent to the preceding implementations. All three implementations share the same name: <code>ritest</code>.</p>
<p>This is not a comprehensive list of implementations; I chose the ones that I have used. For example, in Stata, Alwyn Young has shared <a href="https://personal.lse.ac.uk/YoungA/">code</a> for randomisation inference and confidence intervals. In R, Alexander Coppock authored <code>ri2</code>, documented <a href="https://cran.r-project.org/web/packages/ri2/vignettes/ri2_vignette.html">here</a>. And you can find more general permutation frameworks in Python.</p>
</section>
<section id="hypothetical-example" class="level2">
<h2 class="anchored" data-anchor-id="hypothetical-example">Hypothetical example</h2>
<p>Think of a product A/B test at TikTok. A new onboarding flow is rolled out to a random subset of users, and you want an <a href="../../blog/0001-inference/index.html">assignment-based uncertainty</a> statement for the treatment effect.</p>
<ul>
<li>unit: user</li>
<li>treatment indicator: <code>treat</code> (1 = new onboarding, 0 = old)</li>
<li>primary outcome: <code>activated_7d</code> (1 = activated within 7 days)</li>
<li>pre-treatment covariates (optional, for precision): <code>pre_usage</code> (numeric), <code>device_ios</code> (0/1), <code>region_eu</code> (0/1)</li>
</ul>
<p>If the experiment used blocked randomisation, you also have a strata variable:</p>
<ul>
<li>strata / blocks: <code>strata_id</code> (e.g., country-by-device buckets used in the randomisation)</li>
</ul>
<p>If the experiment randomised at a higher level (say, by creator cohort or by market), you may also have:</p>
<ul>
<li>cluster: <code>cluster_id</code></li>
</ul>
<p>The statistic I will use is the treatment coefficient from</p>
<p><img src="https://latex.codecogs.com/png.latex?activated%5C_7d%20=%20%5Calpha%20+%20%5Ctau%20%5C,%20treat%20+%20%5Cbeta_1%20pre%5C_usage%20+%20%5Cbeta_2%20device%5C_ios%20+%20%5Cbeta_3%20region%5C_eu%20+%20%5Cvarepsilon."></p>
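<p>For concreteness, here is a small simulated dataset with this layout. The data-generating process is invented purely for illustration; only the column names match the setup above, and I omit <code>cluster_id</code> because assignment here is at the user level:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
n = 5_000

# pre-treatment covariates
pre_usage = rng.gamma(shape=2.0, scale=1.5, size=n)
device_ios = rng.integers(0, 2, size=n)
region_eu = rng.integers(0, 2, size=n)

# blocked randomisation: treat exactly half of each stratum
strata_id = rng.integers(0, 20, size=n)
treat = np.zeros(n, dtype=int)
for s in np.unique(strata_id):
    idx = np.flatnonzero(strata_id == s)
    treat[rng.permutation(idx)[: len(idx) // 2]] = 1

# binary outcome with an invented +5pp treatment effect
p_activate = np.clip(0.30 + 0.05 * treat + 0.02 * pre_usage, 0.0, 1.0)
activated_7d = rng.binomial(1, p_activate)

df = pd.DataFrame({
    "treat": treat,
    "activated_7d": activated_7d,
    "pre_usage": pre_usage,
    "device_ios": device_ios,
    "region_eu": region_eu,
    "strata_id": strata_id,
})</code></pre></div>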
</section>
<section id="stata" class="level2">
<h2 class="anchored" data-anchor-id="stata">Stata</h2>
<p>You can install the command <code>ritest</code> from the SSC archive:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode stata code-with-copy"><code class="sourceCode stata"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ssc</span> install ritest</span></code></pre></div>
<p>Below, <code>ritest</code> permutes the assignment variable <code>treat</code>, re-runs the estimation command each time, and collects the statistic of interest. The typical pattern is:</p>
<ul>
<li>write the model as you usually would</li>
<li>tell <code>ritest</code> which coefficient/statistic to track</li>
<li>optionally enforce strata / clusters to mirror the design</li>
</ul>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode stata code-with-copy"><code class="sourceCode stata"><span id="cb2-1">ritest treat _b[treat], reps(5000) <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">seed</span>(123) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">///</span></span>
<span id="cb2-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">strata</span>(strata_id) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">cluster</span>(cluster_id) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">nodots</span> : <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">///</span></span>
<span id="cb2-3">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">regress</span> activated_7d treat pre_usage device_ios region_eu, <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">vce</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">cluster</span> cluster_id)</span></code></pre></div>
<p>Notes:</p>
<ul>
<li>The <code>cluster()</code> option in <code>ritest</code> refers to the randomisation unit if assignment is clustered; the <code>vce(cluster ...)</code> inside <code>regress</code> is a modelling choice for conventional standard errors and has no bearing on the randomisation inference itself.</li>
<li>If you did not randomise within strata, drop <code>strata(strata_id)</code>. If you did not cluster assignment, drop <code>cluster(cluster_id)</code>.</li>
</ul>
</section>
<section id="r" class="level2">
<h2 class="anchored" data-anchor-id="r">R</h2>
<p>You can install the package <code>ritest</code> from GitHub:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">remotes<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install_github</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grantmcdermott/ritest"</span>)</span></code></pre></div>
<p>In R, the pattern is usually:</p>
<ol type="1">
<li>fit a model object (often <code>lm()</code> or <code>fixest::feols()</code>)</li>
<li>pass the fitted object to <code>ritest()</code>, specifying the resampling variable and (optionally) strata/cluster structure</li>
</ol>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ritest)</span>
<span id="cb4-2"></span>
<span id="cb4-3">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(</span>
<span id="cb4-4">  activated_7d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> treat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> pre_usage <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> device_ios <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> region_eu,</span>
<span id="cb4-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> df</span>
<span id="cb4-6">)</span>
<span id="cb4-7"></span>
<span id="cb4-8">ri <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ritest</span>(</span>
<span id="cb4-9">  fit,</span>
<span id="cb4-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resampvar =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb4-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">reps =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span>,</span>
<span id="cb4-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"strata_id"</span>,</span>
<span id="cb4-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cluster =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cluster_id"</span>,</span>
<span id="cb4-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span></span>
<span id="cb4-15">)</span>
<span id="cb4-16"></span>
<span id="cb4-17">ri</span></code></pre></div>
<p>This mirrors the Stata workflow: the statistic is defined through a fitted model object, and RI is performed by permuting the assignment in a way that respects the experimental design.</p>
<section id="python" class="level2">
<h2 class="anchored" data-anchor-id="python">Python</h2>
<p>You can install the package from PyPI<sup>1</sup></p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pip</span> install ritest-python</span></code></pre></div>
<p>The Python interface follows the same conceptual pattern:</p>
<ul>
<li>specify what to permute (<code>permute_var="treat"</code>)</li>
<li>specify the statistic through a formula and the coefficient name (<code>stat="treat"</code>)</li>
<li>optionally pass <code>strata</code> and <code>cluster</code> to respect the design</li>
</ul>
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> ritest <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ritest</span>
<span id="cb6-2"></span>
<span id="cb6-3">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ritest(</span>
<span id="cb6-4">    df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df,</span>
<span id="cb6-5">    permute_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb6-6">    formula<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"activated_7d ~ treat + pre_usage + device_ios + region_eu"</span>,</span>
<span id="cb6-7">    stat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb6-8">    strata<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"strata_id"</span>,</span>
<span id="cb6-9">    cluster<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cluster_id"</span>,</span>
<span id="cb6-10">    reps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span>,</span>
<span id="cb6-11">    seed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>,</span>
<span id="cb6-12">)</span>
<span id="cb6-13"></span>
<span id="cb6-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.summary())</span></code></pre></div>
</section>
</section>
<section id="notes" class="level2">
<h2 class="anchored" data-anchor-id="notes">Notes</h2>
<p>I think that the sequential development of these implementations has made it very convenient for a user to move across the three languages. They all support very similar features and usage patterns, including ways to define generic statistics.</p>
<p>Despite these functional similarities, there are significant implementation differences. Some are imposed by the software environment; others reflect deliberate design and implementation choices. Because randomisation inference is, at its core, repeated re-estimation of the statistic, these differences can have a large <a href="../../projects/ritest/benchmarks/intro.html">impact on performance</a>. On a shared <a href="../../projects/ritest/benchmarks/colombia.html">real dataset example</a>, my documentation reports wall-clock times of roughly 220 seconds (Stata), 16.45 seconds (R), and 7.28 seconds (Python) for 5,000 permutations with a fixed-effects style specification and clustered, stratified design constraints. That said, performance is contingent on the specific computations required for a given statistic.</p>
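<p>To see where the time goes, note that across permutations only the treatment column of the design matrix changes; everything else can be built once, outside the loop. Here is a deliberately minimal NumPy sketch of that idea (not the internals of any of the three packages):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

rng = np.random.default_rng(123)
n, reps = 1_000, 500

# toy data mirroring the example regression
treat = rng.integers(0, 2, size=n)
covars = rng.normal(size=(n, 3))                 # covariate stand-ins
X_fixed = np.column_stack([np.ones(n), covars])  # built once, outside the loop
y = 0.2 * treat + X_fixed @ np.array([0.5, 0.3, -0.1, 0.2]) + rng.normal(size=n)

def tau_hat(t):
    # only the treatment column of the design matrix changes per draw
    X = np.column_stack([t, X_fixed])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

obs = tau_hat(treat)
perm = np.array([tau_hat(rng.permutation(treat)) for _ in range(reps)])
p_value = float(np.mean(np.abs(perm) >= np.abs(obs)))</code></pre></div>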
<p>Which one should you use? Most likely, whatever you were planning to use. For most applications, it would not make sense to choose your software based on the performance of the available <code>ritest</code>. Stata is certainly not well known for its speed, but it remains my favourite language for data analysis.</p>


</section>


<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a><div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>You may have noticed that my package on PyPI is listed as <code>ritest-python</code> instead of <code>ritest</code>. It is not my fault: PyPI runs an automatic check that rejects names deemed too similar to existing packages, and there is really not much you can do about it. I think the culprit is <code>rotest</code>. In any case, this only affects installation; you can still use <code>ritest</code> in your scripts.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <category>software</category>
  <guid>https://www.tabarecapitan.com/blog/0004-ritest-implementations/</guid>
  <pubDate>Mon, 26 Jan 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>The economics of the WNBA</title>
  <link>https://www.tabarecapitan.com/blog/0003-wnba/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>It is an exciting time for the WNBA, albeit seemingly puzzling. On the one hand, the league boasts record viewership, has signed a new media deal worth over $2 billion (roughly $200 million per year), and has negotiated more than $1 billion in expected expansion fees. On the other hand, the league continues to present itself as operating closer to a charitable venture, running at a loss for nearly three decades for the benefit of the players.</p>
<p>These conflicting views are now colliding as the league negotiates the new collective bargaining agreement with the players. It is not going well: the third deadline expired on January 9th, there is <a href="https://sports.yahoo.com/articles/does-wnba-moratorium-agreement-mean-142838476.html?guccounter=1">now a moratorium</a>, and players <a href="https://www.espn.com/wnba/story/_/id/47348578/wnbpa-says-members-voted-strike-necessary">have voted to authorise calling a strike “when necessary”</a>.</p>
<p><a href="https://www.espn.com/wnba/story/_/id/47348578/wnbpa-says-members-voted-strike-necessary">The latest proposal by the WNBA</a> reportedly includes an almost four-fold increase in the minimum and average yearly salaries (to around $250,000 and $500,000, respectively), which the players did not accept.</p>
<p>At first glance, rejecting such an offer may seem hard to reconcile with the league’s long-running narrative of financial losses. But that reaction implicitly assumes that player pay in professional sports should be anchored to reported operating profits. In this post, I argue that this is a misleading frame. To understand the current negotiations, it is necessary to look at how professional leagues actually set wages, how operating losses relate to investor returns, and how value is created and distributed over time. Once those pieces are in place, the players’ position appears far less puzzling.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.tabarecapitan.com/blog/0003-wnba/Madison_Square_Garden_Liberty.jpg" class="img-fluid figure-img"></p>
<figcaption>Photo by <a href="https://www.flickr.com/photos/mesohungry/3629347913/">Jason Lam</a>, CC BY-SA 2.0, <a href="https://commons.wikimedia.org/w/index.php?curid=15821278">wikimedia</a></figcaption>
</figure>
</div>
</section>
<section id="theory" class="level2">
<h2 class="anchored" data-anchor-id="theory">Theory</h2>
<p>Professional leagues like the WNBA are best understood as a legally constrained joint venture that functions as a monopoly platform in the market for top-tier basketball while also operating as a dominant buyer of elite basketball labour. Because a league season is a joint product—teams are rivals on the court but complements in production—many core choices are centralised: rules of play, scheduling, media-rights packaging, revenue sharing, and restrictions on labour mobility. As a result, wages do not emerge from decentralised market clearing. They are set inside an administered internal labour market shaped by league governance (draft assignment, rookie scale, restricted free agency, salary cap and maximum contracts), which both manages competitive balance (a demand-side feature of the product) and compresses the set of feasible contracts relative to an open auction.</p>
<p>Player pay is therefore the equilibrium outcome of collective bargaining in a bilateral-monopoly environment: the league/teams act as a coordinated buyer through shared rules, and the union coordinates labour supply. The bargaining set is pinned down by basketball-related revenues (especially national media and sponsorship income) and by the league’s commitment to accounting definitions and cap architecture; the division of surplus is governed by threat points and outside options. Owners’ leverage comes from capital depth and the ability to withstand short-run losses, while players’ leverage comes from the star elasticity of demand, reputational and broadcast losses from a stoppage, and whatever outside earnings or alternative playing opportunities raise reservation pay. Negotiations are thus about the rent split and its incidence across player types—how the cap, maximums, minimums, and exceptions allocate revenue growth between superstars, mid-tier players, and marginal roster players—rather than ‘price discovery’ in any competitive sense.</p>
</section>
<section id="practice" class="level2">
<h2 class="anchored" data-anchor-id="practice">Practice</h2>
<p>This is much easier than the theory. What do we know in practice? I’ll cover three areas: the players’ experience, the league’s reported losses, and the investors’ bullish attitude.</p>
<section id="players-experience" class="level3">
<h3 class="anchored" data-anchor-id="players-experience">Players’ experience</h3>
<p>After a lifetime of hard work, very few players make it to the WNBA. Most of them get to the league after <a href="https://www.espn.com/wnba/story/_/id/43625067/who-eligible-enter-wnba-draft-rules-know">a (sort of) mandatory period of four years in college</a>—where they played for no salary. Congratulations, you’ve made it. Here is what you can expect.</p>
<p>Even if you are arguably the greatest prospect ever, Caitlin Clark, you will be paid under <a href="https://www.sportingnews.com/us/wnba/indiana-fever/news/caitlin-clark-salary-breakdown-reveals-how-underpaid-she-wnba/b2faa9f4c5319c4f3df07f21">$80,000.</a> If you are an average drafted player, you are likely getting under <a href="https://www.sportingnews.com/us/wnba/news/wnba-highest-paid-average-salary-rookie-deals-2024/def661966f0f9625d5427326">$70,000</a>. You can also count on a housing stipend, reportedly <a href="https://highposthoops.com/wnba-cba-negotiations-are-revealing-hidden-truths-about-league">ranging from ~$1,100 in Las Vegas to ~$2,500 in New York</a>, unless you choose to live in housing provided by your team, typically a one-bedroom apartment. So you have housing covered, one way or another. Except that it is covered only during the season and, if you make it, the postseason. And only if you are not suddenly cut. In practice, it is a bit of a nomad lifestyle.</p>
<p>Alright, you got used to your housing situation. It is what it is. Training camp went great. It is time for the regular season. If you are starting in 2025, you are in luck: you will get charter flights and single hotel rooms for away games. For those who started before last year, commercial flights were on the menu. And for those who started before 2020, shared rooms for non-veterans were the norm. There is progress, for sure. But it is far from the lifestyle you were hoping to experience in the very top league in the world.</p>
<p>‘Well, at least you only work half of the year!’ your friends may say. If only. You cannot afford to simply take off. It takes a lot of time and resources to stay in shape and ready for the next season. LeBron James, an obvious outlier, reportedly spends about <a href="https://fortune.com/well/article/lebron-james-biohacking-regimen-routine/">$1.5 million</a> per year to stay sharp, and he has a human body just like yours. Never mind… you are likely not going on vacation; it is time to travel abroad for the second season of the year.</p>
</section>
<section id="the-league-loses-money-every-year" class="level3">
<h3 class="anchored" data-anchor-id="the-league-loses-money-every-year">The league loses money every year</h3>
<p>As far as I know, there is no publicly available, standardised, audited, line-by-line WNBA income statement for outsiders to observe. We only have informal reports, two in particular. In 2018, the NBA commissioner, Adam Silver, mentioned that the WNBA has historically lost money, around <a href="https://apnews.com/wnba-crossroads-league-looks-to-cut-losses-hire-president-75e117e82df7470c94784438048171d1">$10 million per year</a>. Other sources report losses of around <a href="https://www.si.com/onsi/womens-fastbreak/news/adam-silver-addresses-report-nba-owners-are-frustrated-with-wnba-financial-losses-01jbdbg0b84y">$40-50 million in 2024</a>.</p>
<p>What exactly does it mean for the league to “lose money”? It means that what the league accounts for as operating expenses is higher than what the league accounts for as operating revenue. So, what are we talking about? The main operating expenses are travel, team and league operations (such as coaches, training staff, medical staff, facilities), game operations (arena staff and other costs), marketing and league administration, and player compensation. And the main revenue sources are media rights, sponsorships, ticketing, and merch (or licensing).</p>
<p>Because of this operating loss, many people conclude that players cannot expect to be paid well. After all, the league reports losses; there must simply be no money to pay them more.<sup>1</sup> Implicit in this view is the idea that owners are subsidising the league so that players can participate at all—and that players should therefore be grateful for the platform itself. But this framing conflates operating losses with investor returns.</p>
<p>I think this is what a lot of people misunderstand. The league may have operating losses, but that does not mean that the owners are losing.<sup>2</sup> Many high-growth firms (e.g., Amazon in its early years) generated negative operating income while delivering large capital gains to owners. The owner’s return on an investment in the WNBA is the sum of the operating profit, the capital gain, and the cash distributions, minus the capital injections:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BROI%7D%20%5C;%5Capprox%5C;%0A%5Cunderbrace%7B%5CPi_%7B%5Ctext%7Boperating%7D%7D%7D_%7B%5Ctext%7Bcash%20flow%20(can%20be%20%3C0)%7D%7D%20%5C;+%5C;%0A%5Cunderbrace%7B%5CDelta%20V_%7B%5Ctext%7Bfranchise%7D%7D%7D_%7B%5Ctext%7Bcapital%20gain%7D%7D%20%5C;+%5C;%0A%5Cunderbrace%7BD%7D_%7B%5Ctext%7Bdistributions%20/%20payouts%7D%7D%20%5C;-%5C;%0A%5Cunderbrace%7BK%7D_%7B%5Ctext%7Bcapital%20calls%20/%20injections%7D%7D%0A"></p>
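<p>With purely invented numbers (no relation to any actual team), the decomposition shows how a persistent operating loss can coexist with a handsome return:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"># purely invented numbers, for illustration only
operating_profit = -100_000_000   # e.g. ten years of $10M annual operating losses
capital_gain     =  390_000_000   # appreciation of the franchise over the holding period
distributions    =            0   # cash paid out to the owner
capital_calls    =   50_000_000   # cash the owner had to inject along the way

roi_dollars = operating_profit + capital_gain + distributions - capital_calls
# the owner comes out far ahead despite losing money 'every year'</code></pre></div>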
<p>This is why accounting gets strategic. If the league wants to raise more capital, they can emphasise the explosive growth in franchise value. If they want the players to take less money, they can emphasise the operating loss.</p>
</section>
<section id="the-investors-bullish-attitude" class="level3">
<h3 class="anchored" data-anchor-id="the-investors-bullish-attitude">The investors’ bullish attitude</h3>
<p>Investors do not appear to be worried about the league’s historical operating loss. There is no great public data, so I’m going by reports here. All valuation numbers should be treated as estimates, not facts.</p>
<p>The league may be valued at almost <a href="https://basketball.realgm.com/wiretap/280937/13-WNBA-Franchises-Collectively-Worth-$35-Billion">$3.5 billion in 2025</a>, with an average team value of <a href="https://www.forbes.com/sites/brettknight/2025/06/06/the-wnbas-most-valuable-teams-2025/">~$270 million</a>. This average <a href="https://edition.cnn.com/2025/06/24/sport/wnba-franchise-increase-value-sportico-spt">value in 2024 was $96 million</a>, for a ~2.8x multiple in one year. We also know that the New York Liberty reportedly sold for ~$10-14 million in 2019 and was valued by Forbes in 2025 at $400 million, with reports of a minority stake sale <a href="https://www.espn.com/wnba/story/_/id/45271533/reports-wnba-liberty-sell-stake-record-450m-valuation">at a $450 million valuation</a>. That’s ~30-45x in six years. No wonder investors are willing to pay <a href="https://www.espn.com/wnba/story/_/id/45618874/wnba-expansion-cleveland-detroit-philadelphia-cba-draft">$250 million</a> in expansion fees to get a team.</p>
<p>The question is: who has benefited from this growth? Consider Michael Jordan—without a doubt the most profitable player in the history of the NBA. In 13 seasons with the Chicago Bulls, <a href="https://www.forbes.com/sites/justinbirnbaum/2023/10/02/michael-jordan-joins-forbes-400-worth-3-billion/">he made less than $100 million in NBA salaries</a>. That’s 13 years of literal sweat, including six championships. In 2010, Jordan bought a majority stake to become the owner of the Charlotte Hornets for $275 million. In 2023, <a href="https://www.espn.com/nba/story/_/id/37863644/sources-michael-jordan-finalizing-charlotte-hornets-sale">he sold his majority stake for about $3 billion.</a> Roughly speaking, in 13 years as “the” player, Jordan made about 3.3% of what he made in 13 years as an owner.</p>
<p>Is it fair? That is subjective. I certainly believe that Jordan created more value as a player than as an owner (remember Michael Kidd-Gilchrist? yeah, me neither). In Jordan’s case, it feels appropriate that he was able to capture a share of the wealth he created. The point is that the operating losses are almost irrelevant when owners stand to capture the vast majority of the capital gains; the value of the league as an asset is increasing, and the owners own it.</p>
<p>The capital behind the owners operates in a logic very different from a ‘mom and pop’ shop. For example, the <a href="https://en.wikipedia.org/wiki/National_Basketball_Association">NBA’s roots</a> come from the Basketball Association of America (1946), which was founded by big hockey-arena owners to fill unused arena dates with basketball, and later merged into the NBA (1949). This strategy remains active: in many markets the NBA team is the anchor tenant that keeps an expensive arena (and its surrounding entertainment business) consistently monetised, while concerts and other events fill the remaining dates. More broadly, billionaire ownership is balance-sheet driven: owners can borrow against appreciating assets rather than “use cash” (or sell and realise taxable gains), structure holdings to optimise tax exposure, and treat ownership itself as a form of prestige and deal-flow that pays off beyond the team’s operating profit.</p>
</section>
</section>
<section id="current-negotiation" class="level2">
<h2 class="anchored" data-anchor-id="current-negotiation">Current negotiation</h2>
<p>The WNBA’s last proposal, <a href="https://www.espn.com/wnba/story/_/id/47466821/sources-wnba-projecting-big-losses-latest-proposal-union-disagrees">as reported</a>, looks generous at first glance because it roughly offers a four-fold increase in salaries across the board, with an average salary around half a million dollars. The players rejected the proposal. Instead, they demand a ‘fair’ share of the ‘cake’, as well as a clearly defined cake.</p>
<p>For context, in the NBA, there is a clearly defined cake: <a href="https://www.investopedia.com/articles/investing/070715/nbas-business-model.asp">basketball-related income</a>. This cake has been shared in a roughly 50/50 split since 1983, <a href="https://www.forbes.com/sites/kurtbadenhausen/2014/01/22/as-stern-says-goodbye-knicks-lakers-set-records-as-nbas-most-valuable-teams/">when the league revenues were around $118 million</a>. Using <a href="https://www.nytimes.com/1983/06/09/sports/kings-are-sold.html">the sale of the Kansas City Kings to a group from Sacramento in 1983 for $10.5 million</a> as an anchor price, we can multiply that price by the 23 teams in the league at the time to obtain a total of about $241 million, which corresponds to about $781 million in 2025 dollars. Even assuming that the anchor price was below average, this back-of-the-envelope calculation suggests that the NBA was valued under a billion dollars when the players negotiated a roughly 50/50 split. Again, the WNBA today is estimated to be worth about $3.5 billion.</p>
<p>The players’ demands focus on revenue sharing. They want a similar definition of the cake, as well as a share of it. In particular, they <a href="https://www.cbc.ca/sports/basketball/wnba-players-union-no-deal-collective-agreement-negotiations-9.7042228">reportedly seek a 30/70 split</a>, still below the NBA’s 50/50 split.</p>
<p>The players have also <a href="https://www.espn.com/wnba/story/_/id/47011671/wnba-cba-negotiations-wnbpa-updates-latest-news">reported discontent</a> with the ‘disrespectful tone’ of the negotiations, in which the league has repeatedly argued that the players do not understand the situation, returning again and again to the operating losses to which it wants the negotiation anchored.</p>
<p>Unfortunately, the league’s narrative of confused players seems to resonate with the public. As I have argued in this piece, the theory behind price determination in this case is rather complex, but there is credible empirical evidence of the investors’ bullish attitude towards the WNBA. It is then puzzling (or is it?) that people argue that the WNBA players don’t understand a situation that directly and greatly affects them. Never mind that <a href="https://herhoopstats.substack.com/p/what-did-wnba-players-study-in-college">most players have completed 4-year degrees</a>.</p>
</section>
<section id="a-proposal" class="level2">
<h2 class="anchored" data-anchor-id="a-proposal">A proposal</h2>
<p>There has always been value in the WNBA. This was recognised in the mid-1990s, when the NBA launched the WNBA as a “strategic extension of its basketball platform”. My interpretation is that the motivation was not “to help women’s basketball”, but to control the product, timing, and branding of women’s pro basketball in the US. The value remains today, greater than ever, as shown by the recent growth in market value.</p>
<p>So… I think that the players are asking for too little. I say they should ask for almost the same deal that NBA players get. I don’t see a strong economic reason for large differences in draft eligibility, revenue-sharing split, salary cap strictness, salary structure, and player movement or retention tools.<sup>3</sup></p>
<p>Along with the “bold” demand of equality, the players may benefit from steering away from the league’s focus on the operating loss. Instead, bring the focus to the owners and their capital gains. More specifically, bring the NBA into the spotlight. After all, <a href="https://www.espn.ph/wnba/story/_/id/47602121/wnba-cba-negotiations-collective-bargaining-agreement-wnbpa-update-latest">the NBA owns about 42% of the WNBA</a> (outside team owners own ~42% and outside investors the remaining ~16%). When talking about ‘owners’ in the abstract, there is no reputation at stake. The NBA’s reputation should be at stake.</p>
<p>Furthermore, there are credible alternatives to the WNBA, including the <a href="https://auprosports.com/basketball/">Athletes Unlimited Pro Basketball</a> league, <a href="https://www.unrivaled.basketball/">Unrivaled</a>, and <a href="https://eu.usatoday.com/story/sports/basketball/2025/11/21/project-b-guide-startup-womens-basketball-league/87395675007/">Project B</a>, as well as established leagues outside the US. These alternatives provide real outside options and therefore meaningful leverage in negotiations. The same strategic logic that led the NBA to launch the WNBA in the 1990s remains relevant today: controlling the development of a growing adjacent market can be more valuable than maximising short-term operating margins. From the NBA’s perspective, the risk is not that a challenger is imminent, but that suppressing player compensation in a rapidly growing women’s game increases the probability that future growth occurs outside its institutional umbrella. In that sense, conceding meaningful equality to WNBA players is less a concession than a hedge against long-term competitive and reputational risk.</p>
<p>Finally, there is a channel of value creation that is largely absent from the current framing: demand expansion. For most of the league’s history, playing in the WNBA has been closer to a “for the love of the game” proposition than a financial one, at least for the median player. In my view, that matters because former players—especially those who played the sport growing up—are likely to be among the most durable and loyal consumers of basketball over a lifetime, even if this effect is gradual and cohort-driven rather than immediate.</p>
<p>In a league whose domestic fan base has historically skewed male, the WNBA represents a structurally underexploited opportunity to broaden basketball consumption by bringing in new fans rather than reallocating existing ones. This is not simply about gender composition—differences there appear modest—but also about age and entry points into fandom, where WNBA audiences tend to <a href="https://yougov.com/en-us/articles/49344-interest-in-the-wnba-is-the-highest-ever-but-who-are-the-fans">skew younger,</a> which is particularly valuable for long-run demand. This perspective is especially relevant for the NBA today, as much of its recent growth appears to be driven by <a href="https://www.sportbusiness.com/news/international-fanbase-driving-nbas-social-media-success/?utm_source=chatgpt.com">international expansion</a> rather than domestic market deepening.</p>
<p>From that standpoint, the WNBA is not only an asset whose value lies in media rights and franchise appreciation, but also a long-horizon investment in <a href="https://yougov.com/en-us/articles/49344-interest-in-the-wnba-is-the-highest-ever-but-who-are-the-fans?utm_source=chatgpt.com">expanding the overall basketball audience in the US and beyond</a>. Underinvesting in player compensation risks slowing that process. Viewed this way, improving the economic terms for WNBA players is not primarily about fairness or bargaining power; it is a strategic investment in <a href="https://basis.com/blog/3-things-advertisers-need-to-know-about-womens-sports-fans?utm_source=chatgpt.com">future demand</a>.</p>


</section>


<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a><div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>It should be noted that the commissioner’s salary is private information. Industry norms suggest, however, that the salary is unlikely to be under $1 million. Possibly around $1.5 million, which is the same as the salary cap for an entire team. Does the commissioner add as much value as one whole team?↩︎</p></li>
<li id="fn2"><p>In fact, <a href="https://www.espn.com/nba/story/_/id/20747413/a-confidential-report-shows-nearly-half-nba-lost-money-last-season-now-what">many NBA teams reportedly have had operating losses</a>.↩︎</p></li>
<li id="fn3"><p>Some differences are defensible; for example, the NBA has a longer season.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>economics</category>
  <category>basketball</category>
  <guid>https://www.tabarecapitan.com/blog/0003-wnba/</guid>
  <pubDate>Thu, 15 Jan 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing ritest: randomisation inference in Python</title>
  <link>https://www.tabarecapitan.com/blog/0002-ritest-intro/</link>
  <description><![CDATA[ 






<p>A few months ago I was analysing data from a randomised experiment aimed at increasing product adoption. It was the kind of project that shows up everywhere: a new feature ships, some users see it, some do not, and the goal is to figure out whether the feature had the intended effect.</p>
<p>The obvious next step is a <img src="https://latex.codecogs.com/png.latex?t">-test. That is what most analyses of this kind start with, and often where they stop.</p>
<p>But in this setting, the only thing that was actually random was the assignment itself: who saw the feature and who did not. The outcomes were not sampled at random from a population; they were observed after a deliberate assignment.</p>
<p>Instead of asking what would happen if I repeatedly sampled new users, I wanted to know what would have happened under different random assignments of the same users. This is the logic of randomisation inference.</p>
<p>I’ve done this before in Stata, where a well-established command, <a href="https://github.com/simonheb/ritest"><code>ritest</code></a>, covers most practical uses of randomisation inference. But I was working in Python. I found tools that cover some uses, but I did not find a functional equivalent to Stata’s <code>ritest</code>.</p>
<p>So I wrote Python’s <code>ritest</code>.</p>
<p>This post is a short announcement. My new package, <a href="https://tabarecapitan.com/projects/ritest/"><code>ritest</code></a>, brings a familiar randomisation inference tool to Python. It is designed to be easy to use, flexible, and fast.</p>
<section id="randomisation-inference" class="level2">
<h2 class="anchored" data-anchor-id="randomisation-inference">Randomisation inference</h2>
<p>When an experiment is randomised, there are two different stories you can tell about uncertainty.</p>
<p>One story is the ‘sampling’ story. You imagine your dataset as one draw from a larger population, and you ask what would happen if you could repeat the data-collection process. That is the story behind most textbook standard errors and t-tests.</p>
<p>The other story is the ‘assignment’ story. You hold the outcomes fixed and ask what would have happened under different random assignments of the same treatment. That is the story behind randomisation inference.</p>
<p>Operationally, randomisation inference is simple:</p>
<ol type="1">
<li>pick a statistic that measures the effect you care about</li>
<li>compute it on the observed assignment</li>
<li>recompute it under many alternative assignments that respect the experimental design</li>
<li>compare the observed statistic to its randomisation distribution</li>
</ol>
<p>That’s it. The hard part, in practice, is doing it in a way that is fast enough to use, and strict enough about the design to be trustworthy.</p>
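<p>The four steps above can be sketched in a few lines of Python. This is illustrative only: the arrays <code>y</code> and <code>treat</code> are hypothetical, it uses unrestricted permutations (no strata or clusters), and it takes the difference in means as the statistic.</p>

```python
import numpy as np

rng = np.random.default_rng(23)

def ri_pvalue(y, treat, reps=5000):
    # Step 1: a statistic that measures the effect (difference in means).
    def stat(t):
        return y[t == 1].mean() - y[t == 0].mean()

    # Step 2: compute it on the observed assignment.
    t_obs = stat(treat)

    # Step 3: recompute it under many alternative assignments.
    draws = np.empty(reps)
    for r in range(reps):
        draws[r] = stat(rng.permutation(treat))

    # Step 4: compare the observed statistic to its randomisation
    # distribution (two-sided tail proportion).
    return (np.abs(draws) >= abs(t_obs)).mean()
```

A real implementation must also respect the design (strata, clusters), which is exactly the ‘strict enough to be trustworthy’ part.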
</section>
<section id="features" class="level2">
<h2 class="anchored" data-anchor-id="features">Features</h2>
<p><code>ritest</code> supports two ways of defining the test statistic. In the most common case, the statistic is a coefficient from a linear model, specified through a regression formula. When that is not appropriate, you can instead provide a custom Python function that maps the data to a single scalar statistic.</p>
<p>In both cases, permutations can be constrained to respect the experimental design, including stratified randomisation, clustered assignment, and optional weighting on the linear path.</p>
<p>By default, <code>ritest</code> makes the Monte Carlo uncertainty in the p-value explicit when permutations are sampled rather than enumerated (which is almost always true). In that case, the p-value itself is an estimate, and the output includes a confidence interval for that estimate. On the linear path, the package also reports coefficient bounds (or a confidence interval) by default.</p>
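<p>To see why a simulated p-value is itself an estimate, note that with <code>R</code> sampled permutations it is a proportion of ‘at least as extreme’ draws. A generic normal-approximation interval for that proportion looks like this (a textbook sketch, not necessarily the exact construction <code>ritest</code> uses internally):</p>

```python
import math

def mc_pvalue_ci(n_extreme, reps, z=1.96):
    # The simulated p-value is a proportion over `reps` Monte Carlo draws,
    # so it carries binomial sampling noise of its own.
    p_hat = n_extreme / reps
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / reps)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)
```

With 5,000 permutations and 250 extreme draws, the estimate is 0.05 with a margin of roughly ±0.006, which is why reporting the interval matters near conventional thresholds.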
<p>The package can be installed from PyPI:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1">pip install ritest<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>python</span></code></pre></div>
</section>
<section id="example" class="level2">
<h2 class="anchored" data-anchor-id="example">Example</h2>
<p>Here is a realistic pattern from product work. Imagine a rollout where users are randomised to see a new onboarding flow. The outcome is whether the user activates within 7 days. You also have pre-treatment covariates that help with precision (previous activity, device type, country). The effect you want is the coefficient on <code>treat</code>.</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> ritest <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ritest</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Example column meanings:</span></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - activated_7d: 0/1 (activated within 7 days)</span></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - treat: 0/1 (assigned to new onboarding)</span></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - pre_usage: numeric (pre-treatment engagement)</span></span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - device_ios: 0/1 (pre-built dummy; you can build dummies upstream)</span></span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - region_eu: 0/1 (pre-built dummy)</span></span>
<span id="cb2-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># - strata_id: str/int (block or bucket used in the randomisation)</span></span>
<span id="cb2-11"></span>
<span id="cb2-12">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ritest(</span>
<span id="cb2-13">    df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df,</span>
<span id="cb2-14">    permute_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb2-15">    formula<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"activated_7d ~ treat + pre_usage + device_ios + region_eu"</span>,</span>
<span id="cb2-16">    stat<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb2-17">    strata<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"strata_id"</span>,</span>
<span id="cb2-18">    reps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span>,</span>
<span id="cb2-19">    alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,</span>
<span id="cb2-20">    seed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>,</span>
<span id="cb2-21">)</span>
<span id="cb2-22"></span>
<span id="cb2-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.summary())</span></code></pre></div>
<p>This is the workflow I wanted: I can express the estimand as a familiar regression coefficient, and I can get assignment-based uncertainty without pretending the only randomness in the problem is sampling noise.</p>
<p>Now imagine that the adoption question is not your bottleneck. Your bottleneck is latency: you care about the median time-to-value, which is skewed and full of long tails. You still have a randomised assignment, but you do not want to force the problem into a linear model.</p>
<p>That is what the generic path is for.</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> ritest <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ritest</span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> median_diff(d):</span>
<span id="cb3-4">    treated <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> d.loc[d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"time_to_value_hours"</span>].median()</span>
<span id="cb3-5">    control <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> d.loc[d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"time_to_value_hours"</span>].median()</span>
<span id="cb3-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> treated <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> control</span>
<span id="cb3-7"></span>
<span id="cb3-8">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ritest(</span>
<span id="cb3-9">    df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df,</span>
<span id="cb3-10">    permute_var<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"treat"</span>,</span>
<span id="cb3-11">    stat_fn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>median_diff,</span>
<span id="cb3-12">    reps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span>,</span>
<span id="cb3-13">    alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,</span>
<span id="cb3-14">    seed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>,</span>
<span id="cb3-15">)</span>
<span id="cb3-16"></span>
<span id="cb3-17"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res.pvalue)</span></code></pre></div>
<p>The point is not that medians are “better” than conditional means. The point is that a real workflow often has both kinds of questions, and the underlying source of uncertainty (the assignment) is the same.</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>I built this package because I needed it. The project grew well beyond my original plan as I tried to emulate, in Python, the same sense of convenience I had relied on when doing randomisation inference in Stata. I’m happy with the result, and I hope others find it useful. Since this is my first time releasing a package on PyPI, I genuinely want to hear what people think.</p>
<p>Finally, I want to encourage data scientists, data analysts, and researchers who are not familiar with randomisation inference to take a closer look. Randomisation inference can be appropriate whenever assignment is controlled and known. This is a common setting in many contexts: A/B testing in product and platform experiments, randomised controlled trials in economics and political science, greenhouse and field experiments in agricultural science, and laboratory or clinical studies in life sciences. If the main source of uncertainty in your problem comes from the design itself, randomisation inference may be right for you.</p>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <category>statistics</category>
  <category>software</category>
  <guid>https://www.tabarecapitan.com/blog/0002-ritest-intro/</guid>
  <pubDate>Mon, 05 Jan 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>On inference</title>
  <link>https://www.tabarecapitan.com/blog/0001-inference/</link>
  <description><![CDATA[ 






<p>Consider the following hypothetical example. Spotify is investing in audiobooks, and wants to learn how much more discovery it can drive without harming core music listening. An obvious first step is an A/B test: add an ‘Audiobooks’ shelf to the Home feed for <em>some</em> eligible users. After a couple of weeks, estimate the treated minus control difference in time spent listening to books, with a guardrail like time spent listening to music.</p>
<p>If the assignment was random, that difference has a clean causal interpretation <em>for this experiment, for these users, over this window</em>. Identification is straightforward. Inference is more nuanced: how uncertain is the estimate of the difference, and uncertain <em>about what</em>? In other words, what can we infer about the world <em>beyond</em> this particular experiment?</p>
<p>This post is me trying to get the concept of inference straight. I’m going to treat ‘inference’ as a question about the <em>story of what could have happened</em>, not as a set of techniques I can apply.</p>
<section id="inference-is-a-thought-experiment" class="level2">
<h2 class="anchored" data-anchor-id="inference-is-a-thought-experiment">Inference is a thought experiment</h2>
<p>Inference is always a thought experiment. In our example, we get a point estimate for <em>that experiment, for those users, over that window</em>. What if we had another experiment? What if we had other users? What if we had another window? Unfortunately, we cannot see any of that. And so we rely on thought experiments.</p>
<p>Confidence intervals and <img src="https://latex.codecogs.com/png.latex?p">-values answer those ‘what if’ questions within a given thought experiment: What would we see if the world replayed repeatedly, in some relevant sense? That replay is not a minor detail. It is the <em>definition</em> of what your uncertainty statement means. And in most settings there are two replay modes that make immediate sense.</p>
<p><strong>Mode 1: replay the users</strong>. Imagine Spotify could re-run the same experiment many times, but each time the platform happens to see a different slice of users: different people are active, eligible, reachable, or simply online during your windows. You run the same A/B each time, and your estimate moves around <em>because the people changed</em>. This is sampling-based uncertainty.</p>
<p>That story corresponds to the <em>classical statistical inference</em> we typically encounter in textbooks and beyond: <img src="https://latex.codecogs.com/png.latex?p">-values motivated via repeated sampling. The key idea is that your effect (point estimate) could have been different had your sample of users been different. That is the uncertainty you are trying to estimate.</p>
<p><strong>Mode 2: replay the assignment</strong>. Now hold the users fixed. Imagine Spotify could take the same users, same window, and same (potential) outcomes. The only thing you replay is the randomisation: who got the ‘Audiobooks’ shelf and who didn’t, respecting whatever rules you originally used (such as equal split, stratification, blocked randomisation, and so on).</p>
<p>That story is the realm of <em>randomisation inference</em>. It is also the clean way to interpret permutation tests in an experiment: you are not permuting “because it is non-parametric”; you are generating the distribution of your statistic under the assignment mechanism you actually used.</p>
</section>
<section id="quantifying-uncertainty" class="level2">
<h2 class="anchored" data-anchor-id="quantifying-uncertainty">Quantifying uncertainty</h2>
<p>I find it useful to think about inference in three layers. The first layer is the estimator (or statistic): it produces an estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D"> of a target effect <img src="https://latex.codecogs.com/png.latex?%5Ctau">. You need <em>something</em> to make an inference about. Furthermore, random assignment lends causal credibility to the interpretation of the estimate. The second layer relates to the scope of the inference. Are we trying to make inferences about the broader population we want to generalise to? (replay mode 1) Or are we trying to make inferences about who happened to see the ‘Audiobooks’ shelf due to the particular realisation of the randomisation process? (replay mode 2) Or maybe both? The third layer refers to the quantification of the uncertainty <em>within the scope of the inference</em>. It is here that we can find the many methods that take the first two layers and turn them into <img src="https://latex.codecogs.com/png.latex?p">-values and confidence intervals.</p>
<p>For example, in our hypothetical experiment, the first layer is the estimator. We compute the treatment effect estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D">, for instance as the OLS coefficient on the treatment indicator, which (with an intercept) is algebraically equal to the treated–control difference in means, <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Ctau%7D%20%5C;=%5C;%20%5Cbar%7BY%7D_%7BT%7D%20-%20%5Cbar%7BY%7D_%7BC%7D.%0A"></p>
<p>Suppose that, in the second layer, we adopt a sampling-based replay story: the estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D"> would have been different had the experiment observed a different random sample of users.</p>
<p>In the third layer, we can quantify that uncertainty in several ways, with the interpretation of each contingent on the sampling-based story.</p>
<p>A <em>standard error</em> is an absolute measure of dispersion. It estimates the variability of the estimator across repeated samples, <img src="https://latex.codecogs.com/png.latex?%0A%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D)%20%5C;%5Capprox%5C;%20%5Csqrt%7B%5Coperatorname%7BVar%7D(%5Chat%7B%5Ctau%7D)%7D.%0A"></p>
<p>A <em>confidence interval</em> converts the same idea into an absolute uncertainty range for the estimand (the target effect). Under a Normal approximation, a <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha)"> confidence interval for <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is <img src="https://latex.codecogs.com/png.latex?%0A%5Cleft%5B%0A%5Chat%7B%5Ctau%7D%20-%20z_%7B1-%5Calpha/2%7D%5C,%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D),%0A%5C;%5C;%0A%5Chat%7B%5Ctau%7D%20+%20z_%7B1-%5Calpha/2%7D%5C,%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D)%0A%5Cright%5D,%0A"> where <img src="https://latex.codecogs.com/png.latex?z_%7B1-%5Calpha/2%7D"> denotes the corresponding quantile of the standard Normal distribution (or the appropriate <img src="https://latex.codecogs.com/png.latex?t"> quantile in finite samples).</p>
<p>A <em><img src="https://latex.codecogs.com/png.latex?p">-value</em> is different in nature: it is defined only relative to a hypothesis. If we wish to assess compatibility with a specific reference value <img src="https://latex.codecogs.com/png.latex?%5Ctau_0"> (often <img src="https://latex.codecogs.com/png.latex?%5Ctau_0%20=%200">), we form the standardised statistic <img src="https://latex.codecogs.com/png.latex?%0At%20%5C;=%5C;%20%5Cfrac%7B%5Chat%7B%5Ctau%7D%20-%20%5Ctau_0%7D%7B%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D)%7D.%0A"></p>
<p>Under the sampling-based assumptions and the chosen reference distribution, the <img src="https://latex.codecogs.com/png.latex?p">-value is <img src="https://latex.codecogs.com/png.latex?%0Ap%20%5C;=%5C;%20%5CPr%5C!%5Cleft(%20%7CT%7C%20%5Cge%20%7Ct_%7B%5Ctext%7Bobs%7D%7D%7C%20%5C;%5Cmiddle%7C%5C;%20H_0:%5Ctau=%5Ctau_0%20%5Cright),%0A"> that is, the probability—computed under the null hypothesis—that a re-sampled experiment would produce a standardised statistic at least as extreme as the one observed.<sup>1</sup></p>
<p>So far, this may look like ‘methods’: <img src="https://latex.codecogs.com/png.latex?t">-tests, confidence intervals, p-values. But the three layers are the point. None of these outputs make sense in isolation. The meaning comes from (i) the estimator, (ii) the replay story, and only then (iii) the calculator used to turn the story into numbers.</p>
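<p>As a purely illustrative translation of these formulas into code, here is a minimal sketch of all three outputs for the difference in means, under the sampling story, a Welch-style variance estimate, and a Normal approximation:</p>

```python
import math
import numpy as np

def diff_in_means_inference(y1, y0, z=1.96, tau0=0.0):
    # Layer 1: the estimator (treated minus control difference in means).
    tau_hat = y1.mean() - y0.mean()
    # Layer 3, calculator 1: a sampling-based standard error built from
    # treated and control variability and group sizes.
    se = math.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    # Calculator 2: Normal-approximation confidence interval for tau.
    ci = (tau_hat - z * se, tau_hat + z * se)
    # Calculator 3: two-sided p-value for H0: tau = tau0, via the
    # standardised statistic; erfc(|t|/sqrt(2)) = 2 * (1 - Phi(|t|)).
    t = (tau_hat - tau0) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return tau_hat, se, ci, p
```

The same estimate <code>tau_hat</code> could be paired with a different layer-3 calculator (a t reference distribution, a robust variance estimate) without changing layers one or two.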
</section>
<section id="uncertainty-calculators" class="level2">
<h2 class="anchored" data-anchor-id="uncertainty-calculators">Uncertainty calculators</h2>
<p>Now that the estimator and the replay story are fixed, the remaining question is how to <em>compute</em> uncertainty within that scope. This is where most methods people recognise live. They are mostly different ways of approximating the same object: the distribution of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D"> under the chosen replay mode.</p>
<p>There are two broad ways to get that distribution.</p>
<p><strong>Route A: estimate variability, then approximate a reference distribution.</strong> This is the standard error route. You compute <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D)">, form a standardised statistic, and then map it to a <img src="https://latex.codecogs.com/png.latex?p">-value (or CI) using a reference distribution (Normal or <img src="https://latex.codecogs.com/png.latex?t"> in simple cases). This family includes the classic <img src="https://latex.codecogs.com/png.latex?t">-test and its close relatives (Wald tests, <img src="https://latex.codecogs.com/png.latex?F"> tests, <img src="https://latex.codecogs.com/png.latex?%5Cchi%5E2"> tests), all of which share the same structure: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7Bstatistic%7D%20%5Cquad%20%5Crightarrow%20%5Cquad%20%5Ctext%7Bestimated%20variability%7D%20%5Cquad%20%5Crightarrow%20%5Cquad%20%5Ctext%7Breference%20distribution%7D.%0A"></p>
<p>Within this route, you still have choices about the variability estimate. In regression output, for instance, a <em>model-based</em> OLS standard error is tied to a particular noise model, while a <em>robust (sandwich)</em> standard error is designed to be less dependent on that model. The estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D"> can be identical, while the attached uncertainty calculation changes because the calculator changed. That is why Freedman’s (2008) warning matters even in experiments: randomisation can justify the causal meaning of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D">, while leaving room for disagreement (or mistakes) about the standard error attached to it.</p>
<p><strong>Route B: build a reference distribution directly by replaying the world.</strong> This is the resampling or re-randomisation route. Instead of estimating an <img src="https://latex.codecogs.com/png.latex?SE"> and leaning on a Normal or <img src="https://latex.codecogs.com/png.latex?t"> approximation, you generate many ‘replays’ and recompute the statistic each time. The output is an empirical distribution of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D">, from which you can read off uncertainty summaries.</p>
<p>Two big families sit here:</p>
<ul>
<li><p><strong>Bootstrap and jackknife (sampling replay):</strong> you replay <em>which users you observed</em> by resampling units (bootstrap) or systematically leaving them out (jackknife). You can then compute <img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7BSE%7D(%5Chat%7B%5Ctau%7D)"> as the standard deviation of the replicated estimates, or compute confidence intervals from quantiles of the empirical distribution. A <img src="https://latex.codecogs.com/png.latex?p">-value is also possible, but it requires an explicit hypothesis construction, just like before.</p></li>
<li><p><strong>Randomisation inference (assignment replay):</strong> you replay <em>who was treated</em> by re-running the randomisation procedure many times, respecting the original design. Under a sharp null, this directly gives a reference distribution for your statistic under the assignment mechanism.<sup>2</sup> A <img src="https://latex.codecogs.com/png.latex?p">-value can be computed with the most literal tail probability: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7Bp%7D%20=%5C;%20%5Cfrac%7B%5C#%5C%7B%7CT_r%7C%20%5Cge%20%7CT_%7B%5Ctext%7Bobs%7D%7D%7C%5C%7D%7D%7BR%7D,%0A"> where <img src="https://latex.codecogs.com/png.latex?R"> is the number of simulated reassignments and <img src="https://latex.codecogs.com/png.latex?T_r"> is the statistic under reassignment <img src="https://latex.codecogs.com/png.latex?r">. Notice what is missing: there is no required step of ‘estimate an <img src="https://latex.codecogs.com/png.latex?SE"> and assume Normality’. The design supplies the reference distribution.<sup>3</sup></p></li>
</ul>
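<p>Both replays can be sketched in a few lines of NumPy. The data below are simulated purely for illustration (the effect size, sample size, and replication counts are hypothetical, not from any real experiment):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative A/B data: y is the outcome, z marks treated users
# in a 50/50 complete randomisation.
n = 1_000
z = rng.permutation(np.repeat([0, 1], n // 2))
y = 0.3 * z + rng.normal(size=n)

def diff_in_means(y, z):
    return y[z == 1].mean() - y[z == 0].mean()

t_obs = diff_in_means(y, z)

# Sampling replay (bootstrap): resample *which users you observed*,
# then read the SE off the replicated estimates.
B = 2_000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)   # resample units with replacement
    boot[b] = diff_in_means(y[idx], z[idx])
se_boot = boot.std(ddof=1)

# Assignment replay (randomisation inference): re-run the design by
# permuting *who was treated*, and take the literal tail probability.
R = 2_000
t_perm = np.empty(R)
for r in range(R):
    t_perm[r] = diff_in_means(y, rng.permutation(z))
p_hat = np.mean(np.abs(t_perm) >= np.abs(t_obs))
```

<p>Note how the two loops differ only in <em>what</em> gets replayed: the bootstrap resamples rows (which users you observed), while the randomisation loop permutes the treatment labels (who was treated), holding the users fixed.</p>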
</section>
<section id="practical-implications" class="level2">
<h2 class="anchored" data-anchor-id="practical-implications">Practical implications</h2>
<p>You may have sensed a tension by this point. In our Spotify example, the effect estimate is justified by random assignment (design logic), while much of standard inference is presented through a sampling lens. And yet, in plain A/B tests, robust <img src="https://latex.codecogs.com/png.latex?t">-tests, bootstrap uncertainty, and randomisation-based checks often tell the same story.</p>
<p>This is not because the layers collapse into one. It is because the situation is unusually ‘friendly’:</p>
<ul>
<li><strong>The estimator is simple.</strong> A difference in means is a stable object.</li>
<li><strong>Sample sizes are large.</strong> Many distributions become well-behaved once you have enough users, and many reasonable standardisations start to look similar.</li>
<li><strong>Different variance calculators converge.</strong> In the binary-treatment case, several common standard-error formulas are built from the same ingredients (treated and control variability and group sizes), so their numerical differences can get washed out.</li>
</ul>
<p>If all you need is a quick answer to ‘did the shelf move audiobook listening?’, this is why your preferred software’s default often feels like it ‘just works’.</p>
<p>But the friendly zone is not guaranteed. The moment the design stops being ‘randomise users 50/50’, the replay world changes, and the calculator has to match it.</p>
<p>For example, if you randomised within strata (country, device, prior engagement), then ‘replay the assignment’ means reshuffling <em>within strata</em>. A calculator that ignores this is quantifying uncertainty for a world that never could have happened. Alternatively, if assignment happens at a higher level (households, classrooms, markets), the effective sample size is the number of clusters, not the number of users. Many default approximations become fragile when there are few clusters.</p>
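<p>As a concrete sketch of the stratified case, ‘replay the assignment’ becomes ‘permute treatment labels within each stratum’. The strata sizes and variable names below are made up for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def permute_within_strata(z, strata, rng):
    """Replay a stratified assignment: shuffle treatment labels
    only inside each stratum, never across strata."""
    z_new = np.empty_like(z)
    for s in np.unique(strata):
        mask = strata == s
        z_new[mask] = rng.permutation(z[mask])
    return z_new

# Illustrative design: three countries, randomised 50/50 within each.
sizes = (40, 60, 100)
strata = np.repeat([0, 1, 2], sizes)
z = np.concatenate([rng.permutation(np.repeat([0, 1], k // 2))
                    for k in sizes])

z_perm = permute_within_strata(z, strata, rng)

# Each stratum keeps its exact treated count, as the design requires;
# a naive global shuffle would not guarantee this.
for s in (0, 1, 2):
    assert z_perm[strata == s].sum() == z[strata == s].sum()
```

<p>A global <code>rng.permutation(z)</code> would mix labels across countries and quantify uncertainty for assignments the design could never have produced, which is exactly the mismatch described above.</p>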
<p>This is the practical take of the three layers: inference is not a button you press after you get <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctau%7D">. It is the combination of (i) what you estimated, (ii) what you think could have happened, and (iii) how you chose to quantify that.</p>
</section>
<section id="reading-list" class="level2">
<h2 class="anchored" data-anchor-id="reading-list">Reading list</h2>
<p>The references below have been helpful to me; they are not meant as a comprehensive survey.</p>
<p>⭐ Abadie, A., Athey, S., Imbens, G., and Wooldridge, J. 2020. “Sampling-Based versus Design-Based Uncertainty in Regression Analysis.” <em>Econometrica</em>. <a href="https://economics.mit.edu/sites/default/files/publications/ECTA12675.pdf">link</a></p>
<p>Athey, S., and Imbens, G. W. 2017. “The Econometrics of Randomized Experiments.” In <em>Handbook of Economic Field Experiments</em>. North-Holland. <a href="https://arxiv.org/abs/1607.00698">link</a></p>
<p>Freedman, David A. 2008. “On Regression Adjustments to Experimental Data.” <em>Advances in Applied Mathematics</em>. <a href="https://www.stat.berkeley.edu/~census/neyregr.pdf">link</a></p>
<p>Imbens, G. W., and Rubin, D. B. 2015. <em>Causal Inference in Statistics, Social, and Biomedical Sciences</em>. Cambridge University Press. <a href="https://books.google.se/books?id=Bf1tBwAAQBAJ">link</a></p>
<p>Spotify. 2025. “How Spotify Is Driving Growth, Discovery, and Innovation in the Audiobook Market.” <em>Spotify Newsroom</em>, March 13. <a href="https://newsroom.spotify.com/2025-03-13/how-spotify-is-driving-growth-discovery-and-innovation-in-the-audiobook-market/">link</a></p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Note that inference can exist without hypothesis testing; testing is a decision layered on top of an uncertainty statement, not its foundation.↩︎</p></li>
<li id="fn2"><p>Without additional structure, this exactness is tied to sharp nulls; a null about an average effect does not by itself pin down the missing potential outcomes. You can read more about randomisation inference under weak nulls (such as nulls about the ATE) in <a href="https://arxiv.org/abs/1809.07419">this paper</a>, published <a href="https://www.tandfonline.com/doi/abs/10.1080/01621459.2020.1750415?casa_token=hMXx_Nl6vIMAAAAA:DMVLAKErKTsFGYb898Sr5wHC2Uxtt2_JIJ-ATIsahRreYayFZ_VaYPF6Q6dJpXsNc8newgsWgp-Bgg">in JASA</a>.↩︎</p></li>
<li id="fn3"><p>If you approximate a randomisation <img src="https://latex.codecogs.com/png.latex?p">-value by simulation, then <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D"> has Monte Carlo error because <img src="https://latex.codecogs.com/png.latex?R"> is finite. If <img src="https://latex.codecogs.com/png.latex?c"> is the number of simulated statistics at least as extreme as <img src="https://latex.codecogs.com/png.latex?T_%7B%5Ctext%7Bobs%7D%7D">, then <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D=c/R">. Treating <img src="https://latex.codecogs.com/png.latex?c%20%5Csim%20%5Coperatorname%7BBinomial%7D(R,p)"> gives a simple way to compute a confidence interval for <img src="https://latex.codecogs.com/png.latex?p"> (for example via a Clopper–Pearson or Wilson interval). This is uncertainty about the Monte Carlo approximation, not uncertainty about the treatment effect.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>statistics</category>
  <guid>https://www.tabarecapitan.com/blog/0001-inference/</guid>
  <pubDate>Sun, 28 Dec 2025 23:00:00 GMT</pubDate>
</item>
</channel>
</rss>
