Basic Grid Mapping

Two techniques are used for most of the mapping at this site.

The first technique is a standard zip code tract process in which zip code centroids are used to produce base data.  Centroids of these zip code areas are obtained from national data warehouse sources, and all information for that area is linked to the centroid point.
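
The centroid linking step can be sketched in a few lines of Python.  This is only an illustration under assumptions: the centroid table, the record layout and the function name are hypothetical, and a real centroid file would come from whatever national source you use.

```python
from collections import defaultdict

# Hypothetical centroid lookup: zip code -> (latitude, longitude).
ZIP_CENTROIDS = {
    "98101": (47.61, -122.33),
    "98102": (47.63, -122.32),
}

def aggregate_by_zip_centroid(records, centroids):
    """Sum case counts per zip code and attach each total to its centroid."""
    totals = defaultdict(int)
    for zip_code, cases in records:
        totals[zip_code] += cases
    # Zip codes with no known centroid are dropped rather than guessed.
    return {
        z: {"lat": centroids[z][0], "lon": centroids[z][1], "cases": n}
        for z, n in totals.items()
        if z in centroids
    }

# Illustrative records only: (zip code, number of cases).
records = [("98101", 3), ("98102", 1), ("98101", 2), ("99999", 4)]
base_data = aggregate_by_zip_centroid(records, ZIP_CENTROIDS)
```

Every record for a zip code ends up on a single point, which is exactly why the centroid approach is simple to map and also why it loses the true case locations.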

The second uses a multiformula grid algorithm.  Multiformula refers to the several steps needed to make a grid that is pleasant to project and view, and that can be precise at the small area level.

Another multiformula process I developed transforms the grid back into a real space representation.  I developed this in an attempt to get some “truth” back into the picture for those purists out there who feel uncomfortable with the meaning of grid cell centroid data.  There is this “corner effect” of centroids that I refer to on the first page about this methodology.  Some epidemiologists, myself included, like to know where the real cases are.  Approximations won’t do in real world applications involving streetside observations, surveillance, prevention and treatment tactics; they suffice only in statistical analyses.

So for truth, the grid is used to make the final “perfect map” of disease distributions.  The other grid map makes for detailed streetsmart work.  The hybrid of both serves mostly to merge grid results with the true locations in any isoline or kriging maps.

To be honest, given the uses planned for this method and the errors that tend to occur in reporting and location analyses, the purists’ reason for a “truer” approach is itself found to be unreliable.  We learn about and accept reliability and variance every day in statistical work, only we forget that we were once taught this.  The grid method does have this benefit of “correcting” for loss of validity and reliability.  It engages in the same logic as any fuzzy logic algorithm we might choose to use for work.  Fuzzy logic is by nature as much responsible for clouding up the true facts spatially, by introducing them back into the analyses, as any grid mapping technique is.  Both have the same error impacts, and both are equally acceptable “guesses” about the outcomes.

Knowing the limits of applications for this method is the point being made here.  This technique can be used, for example, to map out a particular misbehavior, a common accident, an injury related problem like a particular fracture, a recurring allergy problem in a community, a form of cancer or skin rash with a suspected local cause, a recurring social problem like heavy drug use and overdosing, a neighborhood child bullying problem, the density of a particular culturally-defined subgroup, the richness of a neighborhood with repeated claims of Tourette’s syndrome onset or diabolic possession, cases of illegal teen use of prescription drugs, a small area hantavirus outbreak, a cluster of Legionnaires’ disease, the density of deliberately set alley fires, cases of suspected fraud, recurrence of non-compliance with prescription drug use by mental health disease patients, thefts and burglaries, etc.

At the small area level, knowing your community is what this technique is all about.  At the large area level, knowing your population and where and how a particular diagnosis, health behavior, psychological problem, etc. is dispersed becomes the primary goal.  Surveillance is only as accurate as the methods you use to engage in it.  Focusing upon particular points like hamlets 1, 2, 3, … n, is interesting, but not continuous in nature spatially.  The intent of grid mapping is to be continuous across a surface and, in the case of hybridization, to be accurate in defining the diffusion process: how people and medical conditions/health behaviors traverse time and space.

The zip code technique has historically been applied most commonly in business.  This was done for several reasons:

  • Bulk mailings, a major application of this technique, require that the mailing be processed by zip code location.
  • This means that the most widely used datum in addresses nationally and internationally is the zip code, which due to electronic mail processing techniques is for the most part the global standard for defining a specific point related to this information. For this reason, the initial use of Big Data in the medical or HIT community has been focused on zip code data use, with census block analysts developing similar methods for small area, local epidemiological surveillance work.
  • It is common to find that personal address data has a 5 to 10 percent error rate, more than zip code error rates, and tends to involve generic postal locations (US post offices, mailbox facilities or shared mailbox locations at street ends), at a frequency too high to be of use for both core urban and fairly rural settings (again usually 7-10% on average, as much as 25% in extremely rural settings).  Zip code patient mapping cannot correct for this problem.  This error adds to those generated by data entry mistakes and by grid cell centroid specific spatial errors in mapping.

The next method of mapping, grid mapping, utilizes either the above zip code lat-long information or individual address information to accurately locate each case or aggregated form of datum.  In some cases, zip code centroids are mapped onto the grid and spatial characteristics are taken into account for data-sharing between nearest neighbors.  Other times this areal adjustment is not made, or is limited in its application due to “sliver effects” impacting the polygon data.  Either way, the grid method does present a realistic interpretation of the spatial distribution of these results, and helps users of this information to determine exactly where clusters exist that are of numerical importance or of normalized, fraction/decimal defined (prevalence) importance.
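
As a rough sketch of the basic gridding step (not the multiformula algorithm itself, which involves more steps than are described here), the Python below assigns lat-long points to square cells and reports both counts and prevalence per cell.  The cell size in degrees, the function names and the sample points are assumptions made purely for illustration.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_size_deg):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))

def cell_counts(points, cell_size_deg):
    """Count how many points fall in each grid cell."""
    return Counter(grid_cell(lat, lon, cell_size_deg) for lat, lon in points)

def cell_prevalence(case_points, population_points, cell_size_deg):
    """Cases divided by population, for every cell that has population."""
    cases = cell_counts(case_points, cell_size_deg)
    population = cell_counts(population_points, cell_size_deg)
    return {cell: cases.get(cell, 0) / n for cell, n in population.items()}

# Illustrative points only: (latitude, longitude).
case_points = [(47.61, -122.33), (47.63, -122.32), (47.20, -122.40)]
population_points = case_points + [(47.70, -122.10), (47.90, -122.45)]

counts = cell_counts(case_points, 0.5)
prevalence = cell_prevalence(case_points, population_points, 0.5)
```

The counts answer the “numerical importance” question and the prevalence values answer the normalized one; the same grid supports both.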

In many cases, normalized data or data with age-gender adjustments are expected to be more reliable.  But this means of limiting or changing the quality of outcomes data only holds true when certain comparisons are to be made.  Many standard comparisons are carried out at a cost level instead of a population ratio level.  Businesses are interested in where the money is spent and on what, not necessarily by whom and to what extent per family member or the like.  Businesses may be trying to link neighborhoods and place to specific consumer marketing capabilities or cultural and ethnicity-defined marketplaces yet to be taken advantage of.

The reason for these two means of assessing data (zip or polygon, and grid cell) comes down to the intent and purpose of the evaluation.

With health, we try to reduce prevalence rates, the percent of people who have a particular problem.

With business, we try to keep track of where the money is coming in or where costs are too high, and try to make adjustments for these; our concern for money spent or acquired at the per population head level is not always important to us.

If the goal is to reduce system expenses, totals are the focus before people or customers are.

Very unusual or rare cases require an understanding of where they happen and how frequently over time and space, more than of where they occur most frequently.  This may not impact anything at all at the individual level, but may impact corporate activities and financial planning activities.  For example, 10,000 people may not impact a system too much in Seattle, but this doesn’t mean you should ignore those people due to their relative unimportance in the local urban setting.  Sometimes interventions engaged in even with these small groups are very important to keep in mind, for the consequences of not engaging in them could include significant increases in cost down the line.

As an example of what I am referring to here, consider the rate at which mothers refuse to immunize their children in Seattle.

This may have limited impacts on overall immunization rates according to HEDIS standards, but still poses an important public health risk and may even make these children a public health threat in some places (small communities or communal settings where aggregates of refusing mothers and neighboring families reside).  Even with an annual review of immunization rates documenting a success in terms of percent completion of the total series, this study ignores those not engaged in the immunization programs.

A conversion of 100,000 children from 93% to 95% immunized represents an improvement of health care for 2,000 children during that time frame.  One tenth that amount or more don’t undergo immunizations, a care practice which has the potential of resulting in several deaths or severe chronic disease outcomes for diseases normally absent, like mumps, pertussis, rubella, polio and diphtheria.  Some of these lead to lifelong paralysis, others to neurological damage, others to renal failure.
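
The arithmetic behind this example is worth spelling out, since it shows why counts and rates tell different stories.  The short Python check below uses only the figures quoted above; the count of children still unimmunized at 95% is simple arithmetic on those figures, not a number taken from any study.

```python
children = 100_000
pct_before = 93  # percent immunized before
pct_after = 95   # percent immunized after

# Integer arithmetic keeps the results exact.
newly_immunized = children * (pct_after - pct_before) // 100
still_unimmunized = children * (100 - pct_after) // 100

print(newly_immunized)    # children gained by moving from 93% to 95%
print(still_unimmunized)  # children left out even at a 95% rate
```

A two point gain in the rate looks like a success, yet the count of children outside the program remains larger than the count gained, and it is the location of those children that the map is meant to show.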

So the reasons for the use of both numbers and ratios/prevalence/rates on maps include both economic and public health related needs for this form of regional analysis.

Finally, this grid mapping method allows for unique algorithms to be developed.  Examples include the average number of days spent in a hospital for rural versus urban communities defined spatially on a map, numbers of annual breast screenings missed per grid cell, costs per household for specific forms of medical service, and percent of income spent on health care expenses based on median local income census data.  One can evaluate at a per grid cell level:

  • average days late in obtaining a refill in order to “save money” or due to lack of adequate transportation, based on rural locations
  • ratio of prescription drug costs to care costs annually, averaged per household, for families with specific ICD defined medical histories
  • numbers of seizure events per year for children, based on location and family income
  • prevalence of poor nutrition and malnutrition indicator ICDs, per small area defined by zip codes or grid cell values.
  • numbers of accidents filed as Emergent Care codes relative to type of rural setting, for a specific age group of children, specific income class, in relation to the various forms of recreational vehicle accidents that are coded.
  • differences in forms of suicide mothers versus fathers engage in, and where

These are all values that can be polygon (zip code) and/or grid mapped.
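
Most of the measures listed above reduce to the same operation: group observations by cell (or zip code) and summarize them.  Here is a minimal Python sketch of that operation using the refill delay example; the cell identifiers, the sample values and the function name are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_per_cell(observations):
    """Map each cell id to the mean of the values observed in that cell."""
    by_cell = defaultdict(list)
    for cell_id, value in observations:
        by_cell[cell_id].append(value)
    return {cell: mean(values) for cell, values in by_cell.items()}

# Illustrative data only: (grid cell id, days late in obtaining a refill).
refill_delays = [((95, -245), 4), ((95, -245), 6), ((94, -245), 10)]
avg_delay = mean_per_cell(refill_delays)
```

Swapping `mean` for a sum, a count or a ratio of two sums gives the other measures, and the cell id can just as easily be a zip code.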

In fact there are no limits to the use of this National Population Grid Mapping program as a means for applying and modeling business.

The following recommendations were also made for its use:

  1. Mapping toy purchases by major stores, for regions and then defining purchase power per region using a Thiessen polygon technique
  2. Mapping tobacco product use by age-gender-income-ethnicity and place, then differentiating these statistics into tobacco product type utilization, at a small areal level and a regional US modelling level
  3. Mapping gasoline utilization and costs regionally and locally based on petroleum service industry data, relating each of these to petroleum storage site and import and processing locations.
  4. Mapping Hispanic cultural disease traits regionally (this means culturally-bound, culturally-linked, culturally-related), to determine migration routes and places with possible needs for improved or upscaled epidemiological surveillance activities and where culturally-targeted intervention and educational programs may need to be developed.
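
For the first recommendation, a Thiessen polygon around a store is simply the set of locations closer to that store than to any other, so membership can be decided by a nearest site test without drawing the polygons.  The Python below is a sketch under assumptions: coordinates are treated as planar x/y values, and the store names, purchase records and function names are hypothetical.

```python
import math

def nearest_site(point, sites):
    """Return the name of the site closest to point (planar distance)."""
    return min(sites, key=lambda name: math.dist(point, sites[name]))

def purchases_by_region(purchases, sites):
    """Total purchase amounts within each site's Thiessen region."""
    totals = {name: 0.0 for name in sites}
    for location, amount in purchases:
        totals[nearest_site(location, sites)] += amount
    return totals

# Illustrative data only: store locations and (location, amount) purchases.
stores = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
purchases = [((1.0, 1.0), 20.0), ((9.0, 2.0), 5.0), ((4.0, 0.0), 10.0)]
region_totals = purchases_by_region(purchases, stores)
```

For real lat-long data over a large region, a great-circle distance would be a better choice than the planar distance assumed here.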


There are hundreds of maps or map videos I produced using this technique.  Ironically, the more you display to potential businesses interested in this technique, the less likely they become to admire it and accept it as a part of their strategy.  I realized this one day, early on in this work, when a midwestern company had an interest in learning what population health grid mapping was like.  I first demonstrated a few results.  A few days later, I made available for review a hundred or more examples of the outcomes, the best and the most spatially revealing.  This brought the reviewers outside their comfort zone.  The results were clear: I had already perfected this technique and was now fine-tuning it and, at times, drawing relationships between the maps that had never before been visualized in this field.  Innovations are usually not immediately accepted.

3D mapping tells us more than 2D mapping.  3D mapping can be used to illustrate what we already know from 2D maps, but adds more details to the 2D map by allowing for applications that are more point directed or community health focused.  2D mapping cannot tell you which street corner is the source for all complaints and problems.  3D mapping brings you down to the exact neighborhood, street, street gang area, or even household.

As Schopenhauer once stated . . .

All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as being self-evident. 

Arthur Schopenhauer