# Transforming random variables

In certain cases, it is useful to transform a random variable $\Theta$ to another random variable $\Psi$ via a transformation $\Psi=f(\Theta)$. For instance, this situation typically arises when one wants to sample from the probability density function $p_{\scriptscriptstyle{\Theta}}(\theta)$ of a positively-valued random variable $\Theta$. Markov chain Monte Carlo (MCMC) algorithms are conventionally designed to draw correlated samples from a desired target density $p_{\scriptscriptstyle{\Psi}}(\psi)$ of a random variable $\Psi$ taking values over the real line. So, if the target of interest $p_{\scriptscriptstyle{\Theta}}$ has support over the positive real line while the MCMC algorithm samples from a density $p_{\scriptscriptstyle{\Psi}}$ with support over the whole real line, then a random variable transformation, such as $\Psi=\log{(\Theta)}$, resolves the mismatch. In particular, the transformation allows one to sample from $p_{\scriptscriptstyle{\Psi}}$ via the MCMC method of choice, and the inverse transformation $\Theta=\exp{(\Psi)}$ then converts the simulated Markov chain into a set of sample points from $p_{\scriptscriptstyle{\Theta}}$. It remains to find the form of the target density $p_{\scriptscriptstyle{\Psi}}$ on which MCMC will be applied.

Although such random variable transformations are common practice, one may need to look up the formula for the transformation to pass from the original density $p_{\scriptscriptstyle{\Theta}}$ to the transformed density $p_{\scriptscriptstyle{\Psi}}$. The main source of confusion is whether one needs the Jacobian associated with the transformation $f$ or with the inverse transformation $f^{-1}$.

There is a way to retrieve the formula intuitively via a geometric argument, rather than trying to recall it mnemonically. The main argument is that of area preservation in the case of univariate random variables. Assuming that $f$ is monotonic, it suffices to realize that for a small displacement the probability mass is preserved, that is, the area below the curves of the two densities over corresponding small intervals is the same, which means that $p_{\scriptscriptstyle{\Psi}}(\psi)d\psi=p_{\scriptscriptstyle{\Theta}}(\theta)d\theta$.

This realization suffices to recover the remaining steps. It follows that $p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(\theta)\frac{d\theta}{d\psi}$.

Notice that $\Theta \overset{f}{\underset{f^{-1}}\rightleftarrows} \Psi$,

which gives the transformed density $p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(f^{-1}(\psi))\left|\frac{df^{-1}(\psi)}{d\psi}\right|$.

The Jacobian in the univariate case is the derivative $\frac{df^{-1}(\psi)}{d\psi}$, associated with the inverse transformation $f^{-1}$. The absolute value of the derivative ensures that the density $p_{\scriptscriptstyle{\Psi}}$ is non-negative. Having understood the univariate case, the multivariate scenario follows straightforwardly as $p_{\scriptscriptstyle{\boldsymbol\Psi}}(\boldsymbol\psi)=p_{\scriptscriptstyle{\boldsymbol\Theta}}(f^{-1}(\boldsymbol\psi))\left|\frac{\partial f^{-1}_{i}(\boldsymbol\psi)}{\partial \psi_{j}}\right|$,

where $\left|\frac{\partial f^{-1}_{i}(\boldsymbol\psi)}{\partial \psi_{j}}\right|$ denotes the absolute value of the determinant of the Jacobian matrix of $f^{-1}$.

To follow through with the example $\Theta \overset{\log}{\underset{\exp}\rightleftarrows} \Psi$,

notice that $f=\log$, $f^{-1}=\exp$. So, the derivative $\frac{df^{-1}(\psi)}{d\psi}$ becomes $\displaystyle\frac{df^{-1}(\psi)}{d\psi}=\frac{d \exp (\psi)}{d\psi}=\exp{(\psi)}$,

whence $p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(\exp{(\psi)}) \exp{(\psi)}$.

The target log-density for MCMC is thus $\log{(p_{\scriptscriptstyle{\Psi}}(\psi))}=\log{(p_{\scriptscriptstyle{\Theta}}(\exp{(\psi)}))} + \psi$.
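As a concrete sketch, assume the original target $p_{\scriptscriptstyle{\Theta}}$ is a standard exponential density (an illustrative choice, not part of the derivation above); the function names `log_p_theta` and `log_p_psi` are hypothetical:

```r
# Log-density of the original target over the positive reals:
# here, illustratively, a standard exponential, p_theta(theta) = exp(-theta).
log_p_theta <- function(theta) dexp(theta, rate = 1, log = TRUE)

# Transformed log-density over the whole real line:
# log p_psi(psi) = log p_theta(exp(psi)) + psi.
log_p_psi <- function(psi) log_p_theta(exp(psi)) + psi

# An MCMC sampler targeting log_p_psi draws psi over the real line;
# exp(psi) then yields samples from the original positive-valued target.
log_p_psi(0)  # log p_theta(1) + 0 = -1
```

Any off-the-shelf MCMC routine for unconstrained targets can now be pointed at `log_p_psi`, and the resulting chain exponentiated to obtain draws of $\Theta$.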

# The birthday problem and tail recursion

The birthday problem, also known as the birthday paradox, concerns the probability that a group of $n$ people all have different birthdays; see, for instance, p. 15 of the book “Statistical Modelling and Computation” by Dirk Kroese and Joshua Chan. The purpose of this blog post is to exemplify how tail recursion can be used to avoid mutability in the context of a simple statistical computation, thus switching from the imperative to the functional programming paradigm.

## Short recap on the solution to the birthday problem

As a succinct recap of the solution to the birthday problem, let $A_k$ denote the event that the $k$ first people have different birthdays for $k=1,2,\dots, n$. Notice that $A_{k}\subseteq A_{k-1}$, whence $A_n=\underset{k=1}{\overset{n}{\bigcap}}A_k$. The probability that all $n$ people have different birthdays is then $\mbox{P}(A_n)=\mbox{P}(\underset{k=1}{\overset{n}{\bigcap}}A_k)$. By application of the product rule of probability, $\mbox{P}(A_n)=\mbox{P}(A_1)\displaystyle\prod_{k=2}^{n}\mbox{P}(A_k|A_{k-1})$.

Given that the first $k-1$ people have different birthdays, the first $k$ people have different birthdays if and only if the birthday of the $k$-th person is chosen from the remaining $365-(k-1)$ birthdays. So, $\mbox{P}(A_k|A_{k-1})=\displaystyle\frac{365-k+1}{365}$.

Obviously, $\mbox{P}(A_1)=1$, which leads to $\mbox{P}(A_n)=\displaystyle\frac{1}{365^{n-1}} \prod_{k=2}^{n}(365-k+1)$.
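As a quick sanity check of the closed-form expression, the product can be evaluated directly in R with a vectorized one-liner (a sketch, independent of the functions defined later in the post):

```r
# Direct evaluation of P(A_n) = prod_{k=1}^{n} (365 - k + 1) / 365 for n = 23.
n <- 23
p <- prod((365 - seq_len(n) + 1) / 365)
print(p)  # approximately 0.4927
```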

## Computing the probability of different birthdays via an iterative function

To compute the probability $\mbox{P}(A_n)$ that all $n$ people have distinct birthdays as a function of $n$, consider the following iterative function in R:

```r
UniqueBirthdayProb <- function(n) {
  p <- 1
  for (k in 1:n) {
    p <- p * (365 - k + 1) / 365
  }
  return(p)
}
```


The function call

```r
UniqueBirthdayProb(23)
```


gives a probability of $0.4927028$, which means that the probability that $n=23$ people all have distinct birthdays is just below $50\%$.

The following snippet computes and plots $\mbox{P}(A_n)$ for $n=1,2,\dots,70$:

```r
nseq <- 1:70
pseq <- sapply(nseq, UniqueBirthdayProb)

plot(
  nseq, pseq,
  type = "o", pch = "*", ylim = c(0, 1),
  xlab = "n", ylab = expression('P(A'[n]*')'),
  main = "Probability of distinct birthdays"
)
```


The resulting plot is shown below.

## Computing the probability of different birthdays via a recursive function

The following R function computes $\mbox{P}(A_n)$ recursively:

```r
RecUniqueBirthdayProb <- function(n) {
  if (n == 1) {
    return(1)
  } else {
    return(RecUniqueBirthdayProb(n - 1) * (365 - n + 1) / 365)
  }
}
```


Mutability has been avoided, in that no local variable analogous to p in the body of UniqueBirthdayProb(n) has its state altered. The function RecUniqueBirthdayProb(n) calls itself recursively, and therefore avoids the need for a local variable altogether. A downside of the recursive calls in RecUniqueBirthdayProb(n) is that each call pushes a new frame onto the call stack, so the stack grows with the input argument n.
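To make the stack growth concrete, the following sketch forces a deliberately deep recursion; the exact depth at which it fails depends on `options(expressions=)` and the C stack size, so the chosen value of 100000 is purely illustrative:

```r
RecUniqueBirthdayProb <- function(n) {
  if (n == 1) {
    return(1)
  } else {
    return(RecUniqueBirthdayProb(n - 1) * (365 - n + 1) / 365)
  }
}

# A deliberately deep call: the nested recursive evaluations exceed R's
# limits, whereas the iterative version runs in constant stack space.
res <- tryCatch(
  RecUniqueBirthdayProb(100000),
  error = function(e) "recursion too deep"
)
print(res)
```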

## Computing the probability of different birthdays via a tail-recursive function

In a tail-recursive function, the recursive call is the last operation performed, so no pending computation needs to be kept alive across calls; languages that implement tail-call elimination can thus execute such a function in constant stack space. This is how to define a tail-recursive function in R to compute $\mbox{P}(A_n)$:

```r
TailRecUniqueBirthdayProb <- function(n) {
  loop <- function(acc, n) {
    if (n == 1) {
      return(acc)
    } else {
      return(loop(acc * (365 - n + 1) / 365, n - 1))
    }
  }
  loop(1, n)
}
```


A closure loop(acc, n) is defined inside TailRecUniqueBirthdayProb(n). The accumulator acc, the first input argument of loop(acc, n), carries the partial product forward; as a result, the recursive call in loop(acc, n) is in tail position, since nothing remains to be computed after it returns.
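To see the accumulator at work, the evaluation of the helper can be hand-unrolled for a small input, say $n=3$:

```r
TailRecUniqueBirthdayProb <- function(n) {
  loop <- function(acc, n) {
    if (n == 1) {
      return(acc)
    } else {
      return(loop(acc * (365 - n + 1) / 365, n - 1))
    }
  }
  loop(1, n)
}

# Hand-unrolled evaluation for n = 3, showing how acc accumulates the product:
# loop(1, 3) -> loop(1 * 363/365, 2) -> loop(363/365 * 364/365, 1)
#            -> 363 * 364 / 365^2
TailRecUniqueBirthdayProb(3)  # 363 * 364 / 365^2, approximately 0.9918
```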

## A comparison of stack frames between the three functions

To compare the iterative, recursive and tail-recursive functions in terms of their stack demands, the trace() R function is called on each of the three functions, yielding the following output:

```
> trace(UniqueBirthdayProb)
> UniqueBirthdayProb(10)
trace: UniqueBirthdayProb(10)
 0.8830518
> untrace(UniqueBirthdayProb)
>
> trace(RecUniqueBirthdayProb)
> RecUniqueBirthdayProb(10)
trace: RecUniqueBirthdayProb(10)
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
 0.8830518
> untrace(RecUniqueBirthdayProb)
>
> trace(TailRecUniqueBirthdayProb)
> TailRecUniqueBirthdayProb(10)
trace: TailRecUniqueBirthdayProb(10)
 0.8830518
> untrace(TailRecUniqueBirthdayProb)
```


It is clear from the output that RecUniqueBirthdayProb(n) triggers more traced calls than TailRecUniqueBirthdayProb(n); note that trace() instruments only the named function, so the recursive calls of the inner closure loop(acc, n) do not appear in the output. Strictly speaking, and despite the tail-recursive formulation, R does not perform tail-call elimination, so each call to loop(acc, n) still consumes a stack frame. The focus of this post is on the concept of tail recursion and how it can be applied to a simple statistical problem, not on the specifics of tail-call elimination in R. The choice of programming language in this post has been made on pedagogical and not on performance grounds.