Monday, 23 March 2009

Manifold & R for spatial statistics: an unlikely couple!

There have been numerous discussions in the Manifold User Forums about the lack of (exploratory) spatial analysis/statistics tools, such as measures and visualisations, in the Manifold GIS. Although a number of users have sent in detailed suggestions for spatial analysis functionality, so far Manifold's development focus seems to have been devoted to other, more fundamental areas such as interfacing with spatial databases and the ability to efficiently use multithreading and CUDA.

Given that in the short to medium term we most probably won't see the integration of significant spatial analysis functionality into Manifold, a pragmatic approach is to integrate external software packages with Manifold. A number of packages offer spatial analysis and statistics capabilities, for example CrimeStat, GeoDa and the R project. R, an open source project, benefits from wide support in academia as a platform for statistical computing, and thus provides a very rich environment for the analysis of spatial data through a combination of free packages. R is a command-line environment, and although the syntax is relatively accessible, it does present a significant learning curve for beginners.

Recently, my research project led me to investigate the spatial distribution of foreign investors in London. I needed a density analysis of historic investment patterns to identify likely agglomeration or dispersion processes between investors. Although Manifold doesn't offer any relevant density estimation algorithms, R, and specifically the spatstat package, allows for the creation of Kernel Density Estimation (KDE) grids.

I took the opportunity to write a script that gives users a point-and-click front-end, from inside Manifold, to both the kernel-smoothed intensity function of a point pattern (KDE) and the spatial smoothing (interpolation) of numeric values observed at a set of irregular locations (GKS). The script takes care of building and maintaining the interface between Manifold and R, running the analysis in the background and creating a result surface component. I must acknowledge here the help and inspiration of the numerous users on the forum who have been working with R.
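
For those curious what these two operations look like at the R prompt, here is a minimal sketch using spatstat directly. The point pattern, window and bandwidth below are synthetic illustrations, not the data or parameters the script uses:

```r
library(spatstat)

# KDE: kernel-smoothed intensity of a point pattern.
# A synthetic pattern of 200 points in a unit-square observation window.
pts <- ppp(x = runif(200), y = runif(200), window = owin(c(0, 1), c(0, 1)))
dens <- density(pts, sigma = 0.1)   # sigma is the smoothing bandwidth

# GKS: spatial smoothing of numeric values observed at irregular locations.
# Attach a numeric mark to each point, then kernel-smooth the marks.
marks(pts) <- runif(200)
smoothed <- Smooth(pts, sigma = 0.1)

# Both results are pixel images ("im" objects) that can be exported
# as a grid and brought into Manifold as a surface.
```

The script automates exactly this round trip: pushing the Manifold drawing to R, running the smoothing, and pulling the resulting grid back as a surface component.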

[caption id="attachment_44" align="aligncenter" width="527" caption="Screenshot example of interface and output"]Screenshot example of interface and output[/caption]



Although R is completely hidden from the user of the script once everything is installed, a successful installation relies on a basic understanding of the concepts of R and on a few prerequisite software tools and R packages. Along with R itself, you need to have R(D)COM installed; an all-in-one package for R and R(D)COM can be found at statconn. You also need a C:\temp\ directory (temporary files are stored there). Finally, you need to have the following R packages installed:

  • spatstat, the main analysis package

  • maptools, helper package for the conversion of data to a spatstat compatible format

  • rgdal, package allowing the import and export of data from R to Manifold
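
From a fresh R installation, the packages can be pulled from CRAN in one go (the mirror prompt and library checks are the usual R workflow, nothing specific to the script):

```r
# One-time setup: install the three packages the script depends on.
install.packages(c("spatstat", "maptools", "rgdal"))

# Sanity check: each package should load without errors
# before you try running the script from Manifold.
for (pkg in c("spatstat", "maptools", "rgdal")) {
  library(pkg, character.only = TRUE)
}
```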


I strongly advise anyone wanting to use this script to first read and understand the algorithms and outputs involved by consulting the relevant help pages from the spatstat package.  I also include two datasets that are suitable for experimentation with the script, one each for KDE and GKS.

Finally, there are some caveats to this script. I do not make any guarantees as to the output of the script, and I have to repeat that you need to understand what the algorithms do to fully comprehend the analysis. Also, the script at the moment doesn't take projections into account at all. I have personally only tested the script with projected point patterns (British National Grid). In the case of BNG, assigning the BNG projection to the created Image (with "Preserve Local Values" ticked) should be sufficient. Your mileage may vary with other projections.

You can find the download of the Manifold .map file with the script here!

As you may have guessed, this is only a first stab at the integration of Manifold with R, and it is still an unlikely couple that sometimes has communication difficulties. Clearly there is a lot of work left in integrating other basic functionality; a few examples are other interpolators such as IDW, Geographically Weighted Regression, LISA and Moran's I ...

But this proof of concept shows the potential for added functionality to boost Manifold's power from a pure GIS to an exploratory spatial statistics toolset.

Monday, 16 March 2009

Manifold 9: A world record and release date

Just saw this press release on the Manifold website:
Carson City, NV USA — 16 March 2009 — Manifold.net today announced a new world record for the number of processors used in a personal computer for Geographic Information Systems (GIS) processing. At the company's 2009 European User Meeting in London, Manifold demonstrated an upcoming new software product that simultaneously utilized over 1440 processor cores to perform a remote sensing image computation at supercomputer speed with over 3.5 teraflops of performance. Manifold demonstrated the new software on a desktop 64-bit Windows PC equipped with three NVIDIA GTX 295 GPU cards costing less than $500 each. (Illustration at right shows the demonstration hardware.)

Wow, I didn't know that we witnessed a world record at the User Meeting back in February here at UCL! Well, I am glad that Manifold came and did their demo of Release 9.0, even if they did so in their usual hyperbolic style.

Also, it is good to see that they are showing commitment to a release date around June. This should imply a beta starting in the next few weeks!

Friday, 13 March 2009

Heatmaps for Mashups ... too easy?

[Image: heatmapapi]

HeatMapAPI.com is a new service that allows Google Maps mashups to integrate heat map representations easily. Heat maps, or more generally point-to-raster interpolations, allow the graphical representation of point patterns through continuous colours identifying areas of higher or lower density of points. Typical applications are crime hotspot analysis and economic activity analysis.

As this is a novel concept in Web 2.0 mashups, I was interested in the methodology behind the generation of these rasters. Sadly, I couldn't find a definitive answer on the algorithm the website uses to generate its hotspot maps. The API does expose two variables that influence the generation of the heat maps, decay and boost, but without information on the algorithm behind them, setting these values remains a pure exercise in trial and error and in seeing what "looks" best. Also, because the parameters are optional, most developers will be tempted into a one-size-fits-all approach, smoothing out interesting patterns in the data or creating hotspots that are not statistically sound, producing masses of effectively meaningless maps.
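
To see why the choice of decay matters, here is one plausible reading of a "decay"-style heat map, sketched in R. To be clear, this is an assumption on my part, not HeatMapAPI's documented algorithm: each point contributes exp(-decay × distance) to every grid cell, and "boost" scales the result. The function name and formula are purely illustrative:

```r
# Hypothetical distance-decay heat map over a unit square.
# NOT HeatMapAPI's actual algorithm -- just an illustration of how
# strongly the decay parameter shapes the resulting surface.
heat_grid <- function(x, y, n = 50, decay = 5, boost = 1) {
  gx <- seq(0, 1, length.out = n)
  gy <- seq(0, 1, length.out = n)
  grid <- matrix(0, nrow = n, ncol = n)
  for (k in seq_along(x)) {
    # Squared distances from every grid cell to point k, then decay kernel.
    d <- sqrt(outer((gx - x[k])^2, (gy - y[k])^2, "+"))
    grid <- grid + boost * exp(-decay * d)
  }
  grid
}

set.seed(1)
g_smooth <- heat_grid(runif(20), runif(20), decay = 2)   # broad, washed out
g_peaky  <- heat_grid(runif(20), runif(20), decay = 50)  # tight, isolated spots
```

With a small decay the surface smears into one undifferentiated blob; with a large one it collapses into isolated spikes around individual points. The same data, two entirely different "hotspot" stories, which is exactly the trial-and-error problem described above.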

Mashup developers will thus need, more than ever, the spatial analysis literacy to understand the processes, models and algorithms that lie behind the pretty maps.

Note: This is not a new problem, but has been present all through the development of Exploratory Spatial Data Analysis over the past 20-30 years in academia and commercial settings, and a lot can be learned from this past experience.