Glen

I debated whether to post this to Literature (since I am describing a paper), or here (since
I am describing how the mathematics in the paper relates to nomography), or to Software
(since the algorithm has been implemented in the free package [i]R[/i]). I decided to post
here and maybe have a supplementary post in Software.

Lately I've been doing a lot of research (mostly reading papers) into transformations, and
in particular for the present discussion, transformation to additivity.

This is in the situation where you have a collection of z=F(x,y) data (likely in the form of
a table, but that's not necessary - it can just be a collection of x,y,z triples). The aim is
to find a transformation h(z), such that

h(z) ~= f(x) + g(y) (where ~= is "approximately equals")

- this is a transformation to additivity in the sense that the transformation h(z) makes
the bivariate function F into a sum of univariate functions.
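
A noise-free illustration (my own toy case, not taken from the paper): for z = x*y the transformation h = log works exactly, since log z = log x + log y. The sketch below checks additivity of a table through its "double differences", which are all zero exactly when the table can be written as a row effect plus a column effect. The grid values and the helper name max_interaction are made up for the illustration.

```python
import math

# Hypothetical table of z = F(x, y) = x * y on a small grid (illustrative values).
xs = [1.0, 2.0, 5.0, 10.0]
ys = [0.5, 3.0, 4.0]
table = [[x * y for y in ys] for x in xs]

def max_interaction(t):
    """Largest |t[i][j] - t[i][0] - t[0][j] + t[0][0]| over the table.

    This is zero exactly when t[i][j] = a[i] + b[j] for some a and b,
    i.e. when the table is additive.
    """
    return max(
        abs(t[i][j] - t[i][0] - t[0][j] + t[0][0])
        for i in range(len(t))
        for j in range(len(t[0]))
    )

# The raw table is far from additive ...
raw = max_interaction(table)

# ... but after h(z) = log(z) it is additive up to rounding error.
logged = max_interaction([[math.log(v) for v in row] for row in table])
```

With real (noisy or less convenient) data no such h is known in advance, which is where the algorithms come in.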

There are actually several algorithms around that try to do this!

I will talk mostly about a particular one, ACE ("alternating conditional expectations"), which
is designed to work with noisy data. This is described in a paper by Breiman and Friedman [1].
There are some other algorithms that I may describe later.

In the notation of the paper, there is a response variable Y and predictors X_1,...,X_p.
The ACE algorithm finds a transformation \theta(Y) and smooth functions \phi_1,...,\phi_p
such that the correlation corr(\theta(Y), \phi_1(X_1) + ... + \phi_p(X_p)) is maximized.
(Excuse the LaTeX-ese, but it's hard to write mathematics in ASCII.)

In terms of our problem, ACE attempts to find a smooth(ish) transformation h(z) and a pair
of smooth functions f(x) and g(y) such that the correlation corr(h(z), f(x)+g(y)) is maximized.

ACE alternates between two steps. First, each \phi_i is re-estimated in turn by a kind of local
averaging (smoothing) of the ith partial residuals (the transformed Y minus the sum of the
[i]other[/i] \phi_j(X_j)) against X_i. Then, once all the \phi functions have been re-estimated, the
estimate of the transform \theta is updated by averaging (smoothing) the sum of the \phi functions
at each value of Y.
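
The alternation can be sketched in a few lines of Python for the two-predictor case h(z) ~= f(x) + g(y). This is only a rough sketch under my own assumptions, not the code from the paper or from any package: x and y are assumed to lie on a grid, so "averaging at each value" is a plain group mean; the smoother for the transform of z is a crude running mean standing in for a proper smoother; each sweep does a single pass over f and g; and the transform is rescaled to mean 0 and variance 1 after every sweep so it cannot collapse to a constant. All function names are made up for the illustration.

```python
import math

def center_scale(v):
    """Shift v to mean 0 and scale it to unit variance."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]

def group_mean(target, key):
    """Mean of target within each distinct value of key (for gridded x or y)."""
    sums, counts = {}, {}
    for t, k in zip(target, key):
        sums[k] = sums.get(k, 0.0) + t
        counts[k] = counts.get(k, 0) + 1
    return [sums[k] / counts[k] for k in key]

def running_mean(target, key, half_width=2):
    """Crude smoother: average target over neighbours in the ordering of key."""
    order = sorted(range(len(key)), key=lambda i: key[i])
    out = [0.0] * len(key)
    for pos, i in enumerate(order):
        lo = max(0, pos - half_width)
        hi = min(len(order), pos + half_width + 1)
        out[i] = sum(target[order[j]] for j in range(lo, hi)) / (hi - lo)
    return out

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    a, b = center_scale(u), center_scale(v)
    return sum(p * q for p, q in zip(a, b)) / len(a)

def ace(x, y, z, sweeps=50):
    """Alternate between updating f, g and the transform of z (here called theta)."""
    theta = center_scale(z)
    f = [0.0] * len(z)
    g = [0.0] * len(z)
    for _ in range(sweeps):
        # Partial residuals: transformed z minus the *other* function.
        f = group_mean([t - b for t, b in zip(theta, g)], x)
        g = group_mean([t - a for t, a in zip(theta, f)], y)
        # Update the transform by smoothing f + g against z, then rescale.
        theta = center_scale(running_mean([a + b for a, b in zip(f, g)], z))
    return theta, f, g

# Toy data: an 8 x 8 multiplication table stored as x, y, z triples.
xs = range(1, 9)
data = [(x, y, x * y) for x in xs for y in xs]
x, y, z = (list(col) for col in zip(*data))

theta, f, g = ace(x, y, z)
r = corr(theta, [a + b for a, b in zip(f, g)])
```

Here r is the quantity being pushed up, corr(h(z), f(x)+g(y)); comparing it with the correlation you get from the untransformed z gives a feel for what the transformation buys you.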

The overall approach has a number of deficiencies for our purposes, but on the whole it works quite well.

Even though the basic algorithm is not particularly tricky to implement, fortunately free code to
do this is already available. I will write about how you can get it, and perhaps afterward a little
about how to use it, in a post in the Software section.

[b]Relationship to nomography[/b]

If Z=h(z), X=f(x) and Y=g(y), then we have a standard nomogram form, Z=X+Y.

This is easily implemented as a parallel-scale nomogram (though it can be done in other
forms, including N charts and even the elliptic-function nomograms Ron posted about).
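
For the parallel-scale form, one standard layout (my choice of units, purely for illustration) puts the X scale on a vertical line at horizontal position 0, the Y scale at position 2, and the Z scale midway at position 1, drawn at half the unit length. A straight line through X on the left and Y on the right then crosses the middle scale at height (X+Y)/2, which is where Z = X+Y is marked. A small check of that geometry, with made-up function names:

```python
def parallel_scale_points(X, Y):
    """Points on three vertical scales at horizontal positions 0, 1, 2 for Z = X + Y.

    The outer scales are plotted at heights X and Y; the middle scale
    uses half the unit length, so Z is plotted at height Z / 2.
    """
    Z = X + Y
    return (0.0, X), (1.0, Z / 2.0), (2.0, Y)

def collinear(p, q, r, tol=1e-9):
    """True if the three points lie on one straight line (cross product ~ 0)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) <= tol

# Any pair (X, Y) gives three collinear points, so a straightedge reads off Z.
checks = [collinear(*parallel_scale_points(X, Y))
          for X, Y in [(0.3, 1.7), (-2.0, 5.5), (4.0, 4.0)]]
```

In the nomogram itself the scales are labelled with the original z, x and y values, placed at heights h(z)/2, f(x) and g(y).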

Once you have the set of tick marks (z(i), x(j), or y(k)), you just take the relevant scale
and transform via the estimated h, f, or g. This is easiest if the data is already at the
desired tick marks, but the program can deal with other cases. Well, there are a few other
details, but it's not particularly complex.
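
When the desired tick marks are not among the data values, one simple option (not necessarily what the package does) is to interpolate linearly in the table of estimated transform values. A sketch, where z_grid and h_hat are hypothetical numbers standing in for the data values and the estimated h at those values:

```python
from bisect import bisect_left

def interp(grid, values, t):
    """Piecewise-linear interpolation of (grid, values) at t.

    grid must be strictly increasing; t must lie within its range.
    """
    if not grid[0] <= t <= grid[-1]:
        raise ValueError("tick outside the range of the data")
    k = bisect_left(grid, t)
    if grid[k] == t:
        return values[k]
    w = (t - grid[k - 1]) / (grid[k] - grid[k - 1])
    return values[k - 1] + w * (values[k] - values[k - 1])

# Hypothetical estimated transform, tabulated at the data values of z.
z_grid = [1.0, 2.0, 4.0, 8.0]
h_hat = [0.0, 1.0, 2.0, 3.0]

# Scale positions for the tick marks we actually want to draw.
ticks = [1.0, 3.0, 8.0]
positions = [interp(z_grid, h_hat, t) for t in ticks]
```

Linear interpolation is only as good as the spacing of the data allows; where the estimated curve bends sharply between data values, the ticks in between will be slightly off.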

[b]Minor problems and issues[/b]

1) If the noise is very low (or nonexistent, as with data created from a purely functional
relationship), you need to play around with the convergence criteria. I also found it helps
to fit a functional approximation to the smooth curves, transform the data with it, and run
the algorithm again with the transformed data as the new input.

2) As the algorithm goes through, places where it doesn't fit well tend to get strongly
transformed so that they have less impact on the correlation (e.g. if it fits large values badly,
it will tend to use, say, an inverse transformation, making the large values and their large errors
simultaneously smaller). You can combat this by adjusting the weight given to each point as the
data gets transformed (I will explain what to do here at some later time). This won't stop it
using a strong transformation, but it stops it thinking that doing so fixes the problem.

[b]What next?[/b]

In a later post or posts I will put up an example or two showing what's involved, from equation
to nomogram, and describe how to get the software.

[b]References[/b]

[1] Breiman, L. and Friedman, J.H. (1985), Estimating Optimal Transformations for Multiple
Regression and Correlation, Journal of the American Statistical Association, 80(391), 580-598.