Notes on Stan

(and rstan)

Stan is a probabilistic programming language for Bayesian inference using Markov chain Monte Carlo sampling. It’s what I use for Bayesian inference for my PhD research.

Back in undergrad, I used JAGS, but Stan uses Hamiltonian Monte Carlo which can allow it to more efficiently sample complicated distributions.


Startup occasionally fails on some chains

TODO: Need to look up the error message on this again

We had a problem where some chains would fail to start on one of our models that used a softmax noise parameter σ.

The problem (with our model, not Stan) turned out to be that random initialization would cause some values of σ to be really low. This low noise would create a really steep logistic function, basically a step function. This step function would be really hard to sample in, since the gradient was essentially zero, meaning the chain couldn’t find which direction to move.

The solution was to start initialization at high noise values, that way there’d always be a gradient. This doesn’t prevent σ from contracting to steeper values, it just prevents it from starting with a steep value in a location with little probability density.


Generated quantities with transformed parameters

rstan has an issue (see Issue #714) where gqs() (which samples generated quantities) doesn’t work with transformed parameters. I submitted a patch back in June 2021 that can fix this issue, but I may need to ping someone again about that. This is pretty low priority, since I think most people won’t run gqs() by itself.

Riemannian sampling?

This would be complicated and I don’t think it’ll be solved soon, but some models with scale parameters would benefit from a non-Euclidean/Riemannian sampler. Scale parameters can squeeze some areas of space more than others, but the problem is that the step size doesn’t shrink to accomodate this. So in tight spaces, your step size will be too big and you’ll jump out of the typical set, and in looser/bigger spaces your step size will be inefficient.

IIRC, step size is calculated during one of the slow phases of warmup where a matrix (either called a “metric” or “mass matrix”) is measured. I also think this metric M only uses diagonal values currently. It might be possible for the metric to slightly adjust step sizes if it also used covariance, but I need to double check this.

Parallel Nix build hanging issue?

It’s been years since I’ve tried this, so I’m just writing from memory, but I remember an issue where compiling Stan models could hang when compiling in a Nix build in parallel. Not sure if this is still a problem. But what was really surprising about this is that the Nix builders were separate, and Nix builds are run in hermetic sandboxes. You would think that there’d be no way for two builds to interefere with each other. They shouldn’t really have any IPC mechanism or visibility outside the sandbox, so I’m not sure what happened there.

I don’t remember exactly how I solved the problem either. Did I set Nix’s --max-jobs=1 when compiling Stan models to prevent this from happening? Or did I find a specific critical section that I separated into its own derivation? Not sure.