Mark A. Wolters
Interactive Demo: Autologistic Regression Models

2016-11-14 Original version.
2017-12-19 Updated with link to the published paper.

Introduction

Here you will find an interactive demonstration allowing you to explore the behaviour of autologistic regression (ALR) models. The demo is something of a companion to the paper “Better Autologistic Regression”. The article shows that the different variants of the ALR model are in fact different probability models with quite different qualitative behaviour. This demonstration allows you to observe these differences for yourself.

This demonstration was written in JavaScript with HTML + CSS. I was motivated to try it by the similar example of the Ising model I happened upon here. JavaScript is not exactly designed for numerical computing, but if you can work around that one way or another, it is nice to be able to make something that runs right in the browser without requiring anything special on either the server or client side.

The demo

The demonstration is based on the following scenario, which is very similar to that in section 4.2 of the paper. We are interested in binary random variables \(Z_1, Z_2, \ldots, Z_{n^2}\), arranged in an \(n\)-by-\(n\) regular grid covering the unit square. Variable \(Z_i\) is positioned at spatial coordinates \((x_i, y_i)\). Each variable can take one of two states, “high” (yellow) or “low” (green), and we can visualize any realization of the random vector \(\mathbf{Z}\) as a two-color image. Similarly, we can view the probabilities \(P(Z_i=\text{high})\) as an image with colors varying smoothly from green to yellow.
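
To make the setup concrete, here is a minimal JavaScript sketch (illustrative only, not the demo's actual code) laying out the grid of sites; placing each site at the centre of its grid cell is an assumption I make here for illustration.

```javascript
// Minimal sketch: lay out an n-by-n grid of sites over the unit square.
// Placing each site at the centre of its grid cell is an assumption made
// for illustration, not necessarily what the demo itself does.
function makeGrid(n) {
  const sites = [];
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      sites.push({
        x: (col + 0.5) / n,  // spatial coordinate x_i
        y: (row + 0.5) / n   // spatial coordinate y_i
      });
    }
  }
  return sites;              // sites[i] holds the location of variable Z_{i+1}
}
```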

Random vector \(\mathbf{Z}\) is distributed according to the ALR model

\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1s_1(x_i,y_i) + \beta_2s_2(x_i,y_i) + \lambda\sum_{j\sim i}(z_j - \mu_j).\]

In this formula, \(\pi_i\) is the conditional probability that variable \(i\) takes the high state, given the states of all of its neighbours; \(\beta_0\), \(\beta_1\), and \(\beta_2\) are regression coefficients; \(\lambda\) measures the strength of association between adjacent variables; and functions \(s_1\) and \(s_2\) determine the covariate values (depending on the choice of endogenous structure). More details are given below.
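
To make the formula concrete, here is a small JavaScript sketch (with made-up helper names, not the demo's actual code) that computes \(\pi_i\) for a single variable from its covariate values and its neighbours' current states.

```javascript
// Sketch: conditional probability that Z_i takes the high state, given its
// neighbours, computed directly from the log-odds formula above.
// beta = [beta0, beta1, beta2]; s1 and s2 are the covariate values at site i;
// neighbours is an array of {z, mu} pairs for the sites adjacent to i
// (mu is the centering term, set to zero when the standard model is selected).
function conditionalProbHigh(beta, s1, s2, lambda, neighbours) {
  const autocovariate = neighbours.reduce((sum, nb) => sum + (nb.z - nb.mu), 0);
  const logOdds = beta[0] + beta[1] * s1 + beta[2] * s2 + lambda * autocovariate;
  return 1 / (1 + Math.exp(-logOdds));  // invert the logit to get pi_i
}
```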

Here is the demonstration. If you mouse over the boldface titles you can see some descriptions and instructions. Those wanting a more thorough understanding of what’s happening are referred to the paper.

More details

The autologistic (AL) model is basically the statistician’s version of the well-known Ising model that has its origins in physics. It’s a probability model for a collection of binary random variables. The dependencies among the random variables are expressed by a graph. The conditional distribution of any variable in the collection, given all the others, depends only on the variables connected to it in the graph.

When people refer to the Ising model they usually mean the model for variables arranged in a square 2D grid, with each variable connected to its four nearest neighbours. The square grid is also used in the demonstration, but each interior variable can be connected to either its 4 or its 8 nearest neighbours (edge variables have fewer neighbours). The mathematical structure of the model allows for more general arrangements, however. It belongs to the class of Markov random field (MRF) models, in which the variables’ graph can be arbitrary. The AL model can be viewed as a flexible way of specifying the joint probability mass function of a set of correlated binary random variables. For this reason it is (potentially) useful in a variety of data analysis situations.
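
A small sketch of the two neighbourhood options on the square grid (again my own illustrative code, assuming a row-major ordering of the sites):

```javascript
// Sketch: neighbour lists for an n-by-n grid, with 4- or 8-neighbour adjacency.
// Sites are indexed row-major: site i sits at row = Math.floor(i / n), col = i % n.
function neighbourLists(n, eightNeighbours = false) {
  const offsets = eightNeighbours
    ? [[-1, -1], [-1, 0], [-1, 1], [0, -1], [0, 1], [1, -1], [1, 0], [1, 1]]
    : [[-1, 0], [1, 0], [0, -1], [0, 1]];
  const lists = [];
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      const nbrs = [];
      for (const [dr, dc] of offsets) {
        const r = row + dr, c = col + dc;
        if (r >= 0 && r < n && c >= 0 && c < n) nbrs.push(r * n + c);
      }
      lists.push(nbrs);   // edge and corner sites simply get fewer neighbours
    }
  }
  return lists;
}
```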

The AL model is simplest to express in its “conditional logit” or “conditional log-odds” form. If \(\pi_i\) is the conditional probability that variable \(Z_i\) takes its high value, given the values of all its neighbours, then the AL model says

\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha_i + \lambda\sum_{j\sim i}z_j.\]

Parameter \(\alpha_i\) is the unary parameter, and controls the \(i\)th variable’s own (or endogenous) contribution to its log-odds. Parameter \(\lambda\) is the pairwise parameter, and controls the strength of influence neighbours have on each other. The sum on the right-hand side runs over all neighbours of variable \(i\), adding up their values.
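
The conditional form lends itself directly to single-site simulation. The post does not say how the demo itself draws realizations, but a Gibbs-style update (one common way to simulate such models) might look roughly like this, assuming the variables are coded 0/1:

```javascript
// Sketch: one Gibbs-style update of site i under the standard AL model,
// assuming {0,1} coding. z is an array of current states (0 or 1), alpha an
// array of unary parameters, nbrs[i] the neighbour indices of site i
// (e.g. from the neighbourLists sketch above).
function gibbsUpdateSite(z, i, alpha, lambda, nbrs) {
  const neighbourSum = nbrs[i].reduce((sum, j) => sum + z[j], 0);
  const logOdds = alpha[i] + lambda * neighbourSum;
  const pHigh = 1 / (1 + Math.exp(-logOdds));
  z[i] = Math.random() < pHigh ? 1 : 0;  // resample Z_i from its conditional distribution
}
```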

The above AL model is the “standard” one (the one that’s equivalent to the Ising model). A “centered” version of the AL model has been proposed [1]. It has conditional log-odds form

\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha_i + \lambda\sum_{j\sim i}(z_j - \mu_j),\]

where \(\mu_j\) is the expectation of \(Z_j\) under the assumption of independence. In the calculations underlying the demo, the \(\mu_j\) terms are included in the formulae, but they are all set to zero when the standard model is selected.
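
As an illustration, the independence expectation can be computed directly from the unary parameter and the chosen coding values. A sketch (the function name and interface are mine, not the demo's):

```javascript
// Sketch: expectation of Z_j under independence (lambda = 0), for coding (low, high).
// Under independence, P(Z_j = high) is proportional to exp(alpha_j * high) and
// P(Z_j = low) is proportional to exp(alpha_j * low).
function independenceExpectation(alphaJ, low, high) {
  const wHigh = Math.exp(alphaJ * high);
  const wLow = Math.exp(alphaJ * low);
  const pHigh = wHigh / (wHigh + wLow);
  return high * pHigh + low * (1 - pHigh);
}

// With {0,1} coding this reduces to the logistic value exp(alpha)/(1 + exp(alpha));
// with {-1/2, 1/2} coding it equals 0.5 * tanh(alpha / 2).
```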

An autologistic regression model is obtained by making the unary parameters linear functions of some covariates. Letting \(s_{ri}\) represent the \(r\)th covariate value for the \(i\)th response, the ALR model with \(p\) covariates is

\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1s_{1i} + \beta_2s_{2i} + \cdots + \beta_ps_{pi} + \lambda\sum_{j\sim i}(z_j - \mu_j).\]

There is an obvious similarity between the ALR model and ordinary logistic regression: the ALR model looks just the same as logistic regression [2], but with an extra term (sometimes called the “autocovariate”) tacked onto the end. When \(\lambda=0\), all \(Z_i\) are independent and the two models are in fact the same.

Returning to the case illustrated in the demo, the regression part of the model is \(\beta_0 + \beta_1s_1(x_i,y_i) + \beta_2s_2(x_i,y_i).\) We call this part of the model the endogenous structure, and if we assume \(\lambda=0\) we can use it to compute the endogenous probability map seen in the leftmost image.
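
For example, a sketch of that computation (illustrative only; the covariate functions here are placeholders, and the sites come from the makeGrid sketch above):

```javascript
// Sketch: endogenous probability P(Z_i = high) at each site when lambda = 0,
// i.e. the inverse-logit of the regression part only. s1 and s2 are functions
// of spatial location (x, y); their exact forms depend on the chosen structure.
function endogenousProbMap(sites, beta, s1, s2) {
  return sites.map(({ x, y }) => {
    const logOdds = beta[0] + beta[1] * s1(x, y) + beta[2] * s2(x, y);
    return 1 / (1 + Math.exp(-logOdds));
  });
}
```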

The functions \(s_1(x,y)\) and \(s_2(x,y)\) give the values of the covariates, which are constructed to depend on the spatial location of the variables. These functions are different for each of the three choices of the endogenous structure control (smooth, smooth with discontinuities, or spatially random).

When computing with the model we must assign numerical values to the high and low states of the variables. This is called the coding. An interesting result from the aforementioned article is that in the ALR model, changing the coding produces a different probability model; it is more than just a parameter transformation. In physics (with the Ising model), it is customary to use \(\{-1,+1\}\) coding; in spatial statistics applications, people typically use \(\{0,1\}\). You can change between these two options using the coding control [3].
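
For concreteness, the two options correspond to the following coding values (per footnote 3, the plus/minus option actually uses \(\pm 1/2\)); the object name here is just illustrative:

```javascript
// Sketch: the two coding options behind the demo's coding control.
const CODINGS = {
  zeroOne:   { low: 0,    high: 1   },
  plusMinus: { low: -0.5, high: 0.5 }
};

// e.g. independenceExpectation(1.0, CODINGS.plusMinus.low, CODINGS.plusMinus.high)
// returns 0.5 * tanh(0.5), the centered-model mu under plus/minus coding.
```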

Using the demonstration

The controls at the top of the demo allow you to change the specification of the model. By changing the response coding and model type options, you can produce four distinct variants of the ALR model. Things are not very interesting when \(\lambda=0\): in this case all pixels are mutually independent and all four model variants are equivalent. But as you increase the magnitude [4] of \(\lambda\), you will begin to see large differences in the variants’ behaviour. The three endogenous structure options allow you to consider what the models do when the regression part is smooth, smooth with discontinuities, or spatially random.

It is my contention that the standard model with plus/minus coding is the only one that gives reasonable behaviour across all conditions you might throw at it. The main goal of the demonstration is to allow you to get a feeling for this conclusion yourself.


  1. The centered model was proposed by Caragea and Kaiser, JABES, 2009.

  2. But note that in logistic regression, the responses are assumed independent, whereas they are not independent in the ALR model. In the ALR model \(\pi_i\) is a conditional probability.

  3. The plus/minus option actually uses \(\{-1/2,1/2\}\) rather than \(\{-1,+1\}\), to ensure that the two codings are equivalent when \(\lambda=0\). That’s okay, because any two codings that are symmetric around zero are equivalent.

  4. Positive values of \(\lambda\) (which encourage neighbours to be similar) are of most interest for practical applications, especially those with a spatial interpretation. You can also make \(\lambda\) negative, which amounts to repulsive interactions between adjacent pixels. At the very least this produces interesting-looking patterns.