squash is an add-on package for the R statistical environment. This package provides functions for color-based visualization of multivariate data, i.e. colorgrams or heatmaps. Lower-level functions are provided to map numeric values to colors, display a matrix as an array of colors, and draw color keys. Higher-level plotting functions are provided to generate a bivariate histogram, a dendrogram aligned with a color-coded matrix, a triangular distance matrix, and more.
The current version is 1.0.1 (2011-08-15).
As with many R packages, squash can be obtained from CRAN, or it can be downloaded and installed automatically by entering the following at the R prompt:
install.packages('squash')
Previous versions are here.
Please send questions or comments about squash to Aron.
Aug 15, 2011: squash is now available from CRAN. I have fixed numerous bugs, added new functions, and enhanced functionality of existing functions. For anyone already using squash, I should point out that I have made some major interface changes, and I have renamed some functions for clarity, or to avoid conflicts. Sorry for any inconvenience; hopefully this will be the last time I do this.
library(squash)
"hist2" is a useful alternative to a scatter plot, if the number of points is large.
x <- rnorm(10000)
y <- rnorm(10000) + x
hist2(x, y)
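To show what a bivariate histogram computes, here is a minimal base R sketch of the underlying counting step. It does not use squash and is not how hist2 is implemented; the bin count of 20, the palette, and the variable names are assumptions chosen only for illustration.

```r
set.seed(1)                      # fixed seed so the sketch is reproducible
x <- rnorm(10000)
y <- rnorm(10000) + x

# Cut each axis into 20 equal-width bins (20 is an arbitrary, illustrative choice)
xbin <- cut(x, breaks = 20)
ybin <- cut(y, breaks = 20)

# Cross-tabulate the bin labels to count the points in each rectangular cell
counts <- unclass(table(xbin, ybin))

# Show the counts as colors, leaving empty cells blank
shown <- counts
shown[shown == 0] <- NA
image(shown, col = heat.colors(12), xlab = 'x', ylab = 'y')
```

With many points, the cell counts reveal density that overplotting hides in a scatter plot.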
Here, 3-dimensional (x, y, z) points are plotted with x and y in the graph plane and z indicated by color.
"makecmap" defines a mapping from numbers to colors.
"jet" is a color palette.
"cmap" does the conversion from numbers to colors, using the previously defined mapping.
"hkey" draws a horizontal color key.
map <- makecmap(iris$Petal.Length, colFn = jet)
plot(iris[,1:2], pch = 16,
col = cmap(iris$Petal.Length, map = map),
main = 'Iris data')
hkey(map, 'Petal length')
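The idea of a number-to-color mapping can be sketched in base R alone. This is only an illustration of the concept, not the internals of makecmap or cmap; the breakpoints from pretty, the blue-yellow-red ramp, and the variable names are assumptions.

```r
# Values to encode as colors
z <- iris$Petal.Length

# Choose breakpoints that cover the data range, and one color per interval
breaks <- pretty(z, n = 10)
colors <- colorRampPalette(c('blue', 'yellow', 'red'))(length(breaks) - 1)

# Find the interval containing each value, then look up that interval's color
idx <- cut(z, breaks = breaks, labels = FALSE, include.lowest = TRUE)
point.col <- colors[idx]

plot(iris[, 1:2], pch = 16, col = point.col, main = 'Iris data')
```

Keeping the breakpoints and colors together as one mapping is what lets the same object drive both the plot and its color key.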
Given a large number of 3-dimensional points (x, y, z), how does z vary as a function of x and y?
The "squashgram" is similar to a 2-dimensional histogram, except that the color indicates a summary (in this case, the mean) of the z values of all points falling into each bin.
attach(quakes)
squashgram(depth ~ long + lat, FUN = mean,
main = 'Earthquakes off Fiji')
A larger square indicates more points falling into the rectangular interval, and thus greater confidence.
squashgram(depth ~ long + lat, FUN = mean,
main = 'Earthquakes off Fiji', shrink = 5)
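The binned summary behind this kind of plot can be approximated in base R. The sketch below is not squashgram's implementation; the 10 x 10 grid, the palette, and the variable names are assumptions made for illustration.

```r
# Bin the quake locations on a 10 x 10 grid (grid size is an arbitrary choice)
lon.bin <- cut(quakes$long, breaks = 10)
lat.bin <- cut(quakes$lat, breaks = 10)

# Summarize depth within each cell; cells containing no points come out as NA
mean.depth <- tapply(quakes$depth, list(lon.bin, lat.bin), mean)

# Number of points per cell, which indicates how reliable each summary is
n.points <- unclass(table(lon.bin, lat.bin))

image(mean.depth, col = heat.colors(12), main = 'Mean depth per bin')
```

A cell summarized from only a handful of points is less trustworthy than one summarized from hundreds, which is why it helps to convey the per-cell count alongside the color.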
"colorgram" is similar to the built-in R function "image" but offers several additional features: 1. An optional color key is added. 2. A color can be specified for missing values, and for values outside the range of the color scale. 3. The size of each grid rectangle can be specified to convey additional information.
"blueorange" is a color palette.
x <- y <- seq(-10, 10, length= 29)
f <- function(x,y) { r <- sqrt(x^2+y^2); 10 * sin(r)/(r+1) }
z <- outer(x, y, f)
map <- makecmap(z, colFn = blueorange, n = 20, symm = TRUE)
colorgram(x, y, z, map = map)
"cimage" is similar to "colorgram", except that there is no number-to-color mapping. Instead, we pass the function a matrix of RGB values.
red <- green <- 0:255
rg <- outer(red, green, rgb, blue = 1, maxColorValue = 255)
cimage(red, green, zcol = rg)
The colors indicate characteristics of each item being clustered.
us.dend <- hclust(dist(scale(state.x77)))
income <- state.x77[, 'Income']
frost <- state.x77[, 'Frost']
murder <- state.x77[, 'Murder']
## generate color maps
income.cmap <- makecmap(income, n = 5, colFn = colorRampPalette(c('black', 'green')))
frost.cmap <- makecmap(frost, n = 5, colFn = colorRampPalette(c('black', 'blue')))
murder.cmap <- makecmap(murder, n = 5, colFn = colorRampPalette(c('black', 'red')))
us.mat <- data.frame(Frost = cmap(frost, frost.cmap),
Murder = cmap(murder, murder.cmap),
Income = cmap(income, income.cmap))
par(mar = c(5,4,4,3)+0.1) # make space for color keys
dendromat(us.dend, us.mat,
ylab = 'Distance', main = 'US states')
vkey(frost.cmap, 'Frost')
vkey(murder.cmap, 'Murder', y = 0.3)
vkey(income.cmap, 'Income', y = 0.7)
distogram(eurodist, title = 'Distance (km)')
We provide a few functions to generate contiguous color palettes.
squash.palettes <- c('rainbow2', 'jet', 'grayscale', 'heat', 'coolheat', 'blueorange', 'bluered', 'darkbluered')
R.palettes <- c('rainbow', 'heat.colors', 'terrain.colors', 'topo.colors', 'cm.colors')
plot(0:8, type = 'n', ann = FALSE, axes = FALSE)
for (i in 1:5) {
p <- R.palettes[i]
hkey(makecmap(c(0, 9), colFn = get(p)),
title = p, x = 2, y = i - 1)
}
for (i in 1:8) {
p <- squash.palettes[i]
hkey(makecmap(c(0, 9), colFn = get(p)),
title = p, x = 6, y = i - 1)
}
text(3, 8, 'R palettes', font = 2)
text(7, 8, 'squash palettes', font = 2)