Heat map

Summary

A heat map (or heatmap) is a data visualization technique that shows the magnitude of a phenomenon as color in two dimensions. The variation in color may be by hue or intensity, giving the reader obvious visual cues about how the phenomenon is clustered or varies over space. There are two fundamentally different categories of heat maps: the cluster heat map and the spatial heat map. In a cluster heat map, magnitudes are laid out in a matrix of fixed cell size whose rows and columns are discrete phenomena and categories; the sorting of rows and columns is intentional and somewhat arbitrary, with the goal of suggesting clusters or portraying them as discovered via statistical analysis. The cell size is arbitrary but large enough to be clearly visible. By contrast, the position of a magnitude in a spatial heat map is fixed by the location of the magnitude in that space, and there is no notion of cells; the phenomenon is considered to vary continuously.
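The row and column sorting that distinguishes a cluster heat map can be sketched in Python. The sketch below assumes a small numeric matrix and uses a greedy nearest-neighbour ordering as a simple stand-in for the hierarchical clustering usually used in practice; the function names and the example matrix are hypothetical, and a real analysis would normally rely on a dedicated clustering library.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_order(vectors):
    """Order vectors so each is followed by its nearest not-yet-placed neighbour.

    This greedy heuristic is a simple stand-in for hierarchical clustering.
    """
    if not vectors:
        return []
    order = [0]  # start arbitrarily from the first vector
    remaining = list(range(1, len(vectors)))
    while remaining:
        last = vectors[order[-1]]
        nearest = min(remaining, key=lambda i: euclidean(last, vectors[i]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


def reorder(matrix):
    """Permute rows, then columns, so that similar values sit next to each other."""
    rows = greedy_order(matrix)
    by_rows = [matrix[i] for i in rows]
    columns = [list(col) for col in zip(*by_rows)]  # transpose to order the columns
    cols = greedy_order(columns)
    return [[row[j] for j in cols] for row in by_rows], rows, cols


# Hypothetical 4 x 3 matrix of magnitudes.
ordered, rows, cols = reorder([[1, 9, 1], [8, 2, 9], [1, 8, 2], [9, 1, 8]])
print(rows, cols)  # [0, 2, 1, 3] [0, 2, 1]
```

After reordering, the two rows dominated by small values in the first columns end up adjacent, as do the two dominated by large values, which is what makes clusters visible once the cells are shaded.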

Heat map generated from DNA microarray data reflecting gene expression values in several conditions
A heat map showing the RF coverage of a drone detection system

"Heat map" is a relatively new term, but the practice of shading matrices has existed for over a century.[1]

History

Heat maps originated in 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black squares (pixels) and smaller values by lighter squares. Toussaint Loua (1873) used a shading matrix to visualize social statistics across the districts of Paris.[1] Sneath (1957) displayed the results of a cluster analysis by permuting the rows and the columns of a matrix to place similar values near each other according to the clustering. Jacques Bertin used a similar representation to display data that conformed to a Guttman scale. The idea of joining cluster trees to the rows and columns of the data matrix originated with Robert Ling in 1973. Ling used overstruck printer characters to represent different shades of gray, one character-width per pixel. In 1994, Leland Wilkinson developed the first computer program (SYSTAT) to produce cluster heat maps with high-resolution color graphics. The Eisen et al. display shown in the figure is a replication of the earlier SYSTAT design.[citation needed]

Software designer Cormac Kinney trademarked the term 'heat map' in 1991 to describe a 2D display depicting financial market information.[2] The company that acquired Kinney's invention in 2003 unintentionally allowed the trademark to lapse.[3]

 
Spatial Heat Map Example: Displays temperature across a map of the world, with red indicating the highest temperatures and blue the lowest.

Types

There are two main types of heat maps: spatial and grid.

A spatial heat map displays the magnitude of a spatial phenomenon as color, usually cast over a map. In the image labeled “Spatial Heat Map Example,” temperature is displayed by color range across a map of the world. Colors range from blue (cold) to red (hot).
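A minimal sketch of how a temperature could be turned into a blue-to-red color, assuming a simple linear blend between pure blue and pure red (so mid-range values come out purple; published maps often pass through other intermediate hues). The function name, the temperature range, and the readings are hypothetical.

```python
def temperature_to_rgb(value, t_min, t_max):
    """Map a temperature to an (r, g, b) tuple (0-255) on a blue-to-red ramp.

    Values at or below t_min are pure blue, values at or above t_max are pure
    red, and values in between are linearly interpolated.
    """
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    frac = (value - t_min) / (t_max - t_min)
    frac = min(1.0, max(0.0, frac))  # clamp to [0, 1]
    return (round(255 * frac), 0, round(255 * (1 - frac)))


# Color a small grid of hypothetical readings in degrees Celsius.
readings = [[-20.0, 0.0], [15.0, 35.0]]
colors = [[temperature_to_rgb(t, -20.0, 35.0) for t in row] for row in readings]
print(colors)
```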

A grid heat map displays magnitude as color in a two-dimensional matrix, with each dimension representing a category of trait and the color representing the magnitude of some measurement on the combined traits from the two categories. For example, one dimension might represent year, the other might represent month, and the value measured might be temperature. Such a heat map would show how the temperature in each month changed over the years. Grid heat maps are further categorized into two types of matrices: clustered and correlogram.[4]

  • Clustered heat map: The example of the monthly temperature by year is a clustered heat map.
  • Correlogram: A correlogram is a clustered heat map that uses the same set of traits on each axis in order to display how the traits relate to each other. A correlogram can be drawn as a triangle instead of a square because the combination A-B is the same as the combination B-A and so does not need to be shown twice.
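The values behind a correlogram can be sketched as a triangle of pairwise correlations. This sketch assumes Pearson correlation as the measure (other measures are possible), keeps only the pairs below the diagonal because B-A duplicates A-B, and omits the diagonal, where every trait correlates perfectly with itself. The trait names and numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlogram(traits):
    """Lower triangle of the correlation matrix for a dict of name -> observations.

    Row i holds the correlations of trait i with every trait listed before it.
    """
    names = list(traits)
    rows = [[pearson(traits[a], traits[b]) for b in names[:i]] for i, a in enumerate(names)]
    return names, rows


# Hypothetical observations of three traits.
traits = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "reaction_time": [0.9, 0.8, 0.7, 0.6],
}
names, rows = correlogram(traits)
for name, row in zip(names, rows):
    print(name, [round(r, 2) for r in row])
```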

Uses

Heat maps have a wide range of applications because they simplify data and make data analysis visually appealing and easy to read. Several applications of different types of heat maps are listed below.

Business Analysis: Heat maps are used in business analytics to give a visual representation of a company’s current functioning, performance, and need for improvement. Heat maps are a way to analyze a company’s existing data and update it to reflect growth and other specific efforts. They are also visually appealing to the team members and clients of the business.

Websites: Heat maps are used within websites in many different ways to understand the actions of visiting users. Typically, multiple heat maps are used together to identify the best- and worst-performing elements on a page. Some specific heat maps used for website analysis are listed below.

  • Mouse tracking: Mouse tracking heat maps, or hover maps, are used to visualize where users of the site hover their cursor.
  • Eye tracking: Eye tracking heat maps measure the eye position of the website's users and gather measurements such as eye fixation volume, eye fixation duration, and areas of interest.
  • Click tracking: Click tracking heat maps, or touch maps, are similar to mouse tracking heat maps, but instead of hover actions they visualize users' click actions. They show clicks not only on clickable components of a webpage, such as buttons or dropdown menus, but also on non-clickable objects anywhere on the page.
  • AI-generated attention: AI-generated attention heat maps visualize where a visiting user's attention is likely to go on a given section of a webpage. They are produced by a software algorithm that predicts the user's attention.
  • Scroll tracking: Scroll tracking heat maps represent the scrolling behavior of the website's users. This gives visual cues about which sections of the website users spend the most time on.[5]
 
Data Analysis Heat Map Example: Displays the normalized linkage disequilibrium of Genomic Windows within the Hist1 region of a mouse (Mus musculus).
 
Data Analysis Heat Map Example: Subgraph of one of five hub nodes with a large degree of centrality in a genomic region in mice (Mus musculus) called the Hist1 region, where each cell in the graph represents one edge in the genomic network.

Exploratory Data Analysis: Working with both small and large data sets, data scientists and data analysts examine and identify essential relationships and characteristics among the points in a data set, as well as the features of those points. Because they often work in teams with people from other professions, they need a visually easy way to summarize findings and main components, and heat maps provide one. Data can be represented in other ways, but heat maps can show data points and their relationships in a high-dimensional space without becoming too compact or visually unappealing. In data analysis, heat maps allow specific variables to be placed along the rows and/or columns and even on the diagonal.

  • Biology: In the biological field, heat maps are used to visually represent both large and small data sets, with a focus on patterns and similarities in DNA, RNA, gene expression, and so on. Data scientists in bioinformatics who work with these data sets focus on concepts such as community detection, association and correlation, and centrality; heat maps are a compelling way to visually summarize such results and to share them with professionals outside biology or bioinformatics. The two heat maps to the right, labeled “Data Analysis Heat Map Example,” show different ways of presenting genomic data over a specific region (the Hist1 region) to someone outside the field of biology, so that they better understand the general concept a biologist or data scientist is trying to convey.

Financial Analysis: The values of products and assets fluctuate over time, sometimes rapidly and sometimes gradually. Logging daily market changes is essential: it allows analysts to draw predictions from patterns while still being able to revisit past numerical data. Heat maps remove much of the tedium from this process and let the user visualize data points and compare the different performers.[6]
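A common input for a market heat map is each asset's day-over-day percentage change, with one row per asset and one column per day. The sketch below assumes daily closing prices listed oldest first; choosing percentage change as the colored quantity is an illustrative assumption, and the asset names and prices are hypothetical.

```python
def daily_percent_changes(closing_prices):
    """Day-over-day percentage change for each asset.

    closing_prices: dict mapping an asset name to its daily closing prices,
    oldest first. Each result row has one fewer value than its price list.
    """
    return {
        asset: [100.0 * (today - yesterday) / yesterday
                for yesterday, today in zip(prices, prices[1:])]
        for asset, prices in closing_prices.items()
    }


# Hypothetical closing prices for two assets over three days.
changes = daily_percent_changes({"ASSET_A": [100.0, 110.0, 99.0], "ASSET_B": [50.0, 50.0, 55.0]})
print(changes)
```

Each row of the result can be colored on a diverging scale centered on zero, so gains and losses stand out against unchanged values.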

Geographical Visualization: Heat maps are used to visualize and display the geographic distribution of data. They represent different densities of data points on a geographical map to help users see the intensity of certain phenomena and to show the items of most or least importance. Heat maps used in geographical visualization are often mistaken for choropleth maps, but the two differ in how the data are presented.[7][8]
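One widely used way to turn individual point locations into a continuous density surface is kernel density estimation; the source does not name a method, so this choice is an assumption. The sketch assumes planar (projected) coordinates rather than raw latitude and longitude, and a Gaussian kernel whose bandwidth is chosen by hand. The function name and the sample points are hypothetical.

```python
import math


def kernel_density_grid(points, xs, ys, bandwidth):
    """Gaussian kernel density estimate evaluated at every (x, y) grid node.

    points: list of (x, y) observations in planar coordinates.
    xs, ys: grid coordinates along each axis.
    bandwidth: standard deviation of the kernel, in the same units as the points.
    Returns a list of rows (one per y), each a list of densities (one per x).
    """
    if not points:
        raise ValueError("need at least one point")
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        grid.append(row)
    return grid


# Hypothetical event locations (in kilometres) evaluated on a coarse 5 x 5 grid.
events = [(1.0, 1.0), (1.2, 0.8), (3.5, 3.0)]
axis = [0.0, 1.0, 2.0, 3.0, 4.0]
density = kernel_density_grid(events, axis, axis, bandwidth=0.5)
```

The bandwidth controls how far each point's influence spreads: small values give a spiky surface, large values blur nearby clusters together.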

Sports: Heat maps can be used in many sports and can influence managers' and coaches' decisions based on the high and low densities of the data displayed. Users can identify patterns within a game and the strategies of opponents and of their own team, make more informed decisions that benefit the player, team, and business, and enhance performance by identifying areas where improvement is needed. Heat maps can also visualize comparisons and relationships among different teams in the same sport or between different sports altogether.[9]

Color schemes

Many different color schemes can be used to illustrate the heat map, each with perceptual advantages and disadvantages. Choosing a good color scheme is integral to displaying data accurately and effectively, whereas a poor color scheme can lead viewers to inaccurate conclusions or prevent people with color vision deficiencies from properly analyzing the data.

Rainbow color maps are a common choice, as humans can perceive more shades of color than of gray, which would purportedly increase the amount of detail perceivable in the image. However, their use is heavily discouraged in the scientific community for a number of reasons. Possibly the largest is that when many colors are involved, the visualization may give the impression of gradients in the data that are not really present. The more colors are used in a visualization, the more the values begin to bleed together, and color lacks the natural perceptual ordering found in grayscale or blackbody-spectrum colormaps. Additionally, values represented by different shades of the same color can imply that the values are related when they are not.[10][11][12]

An important consideration when choosing a color scheme is whether the data will be viewed by anyone with a color vision deficiency. If the audience includes people with any form of color blindness, it may be wise to avoid color schemes with prominent reds and greens or uneven color gradients.[12]

 
A heat map showing the average temperature in the Southern Rockies from 1950 to 2020 using the "Blues" color palette from the Color Brewer library

In addition to audience considerations, it is also important to consider the form in which the data will be viewed. For example, if the data is to be printed in black and white or projected onto a large screen, it may be wise to adjust one's choice of color scheme. Common colormaps (like the "jet" colormap, used as the default in many visualization software packages) have uncontrolled changes in luminance that prevent meaningful conversion to grayscale for display or printing. This also distracts from the actual data, arbitrarily making yellow and cyan regions appear more prominent than the regions of the data that are actually most important.[10][12]
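The luminance problem can be checked numerically: sample a colormap, estimate each sample's brightness, and test whether brightness rises steadily from one end to the other. This sketch assumes the classic 0.30/0.59/0.11 luma weights as a rough brightness estimate, and it uses a plain HSV hue sweep from blue to red as a stand-in for a rainbow map; it is not the actual definition of "jet". The helper names are hypothetical.

```python
import colorsys


def luminance(rgb):
    """Approximate brightness of an (r, g, b) tuple with components in [0, 1]."""
    r, g, b = rgb
    return 0.30 * r + 0.59 * g + 0.11 * b


def rainbow(n):
    """n colors from a plain HSV hue sweep, blue (hue 2/3) to red (hue 0)."""
    return [colorsys.hsv_to_rgb((2 / 3) * (1 - i / (n - 1)), 1.0, 1.0) for i in range(n)]


def grayscale(n):
    """n gray levels from black to white."""
    return [(i / (n - 1),) * 3 for i in range(n)]


def is_monotonic(values):
    """True if the values never decrease."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_monotonic([luminance(c) for c in rainbow(9)]))    # False
print(is_monotonic([luminance(c) for c in grayscale(9)]))  # True
```

In the hue sweep, cyan and yellow come out much brighter than the red that marks the top of the scale, which is the kind of uncontrolled luminance change that breaks grayscale conversion.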

Software Implementations

Several heat map software implementations are freely available:

 
This heat map shows the normalized linkage disequilibrium of Genomic Windows within the Hist1 region of a mouse (Mus musculus)
  • R, a free software environment for statistical computing and graphics, contains several functions to draw heat maps.[13][14]
  • Gnuplot, a universal and free command-line plotting program, can draw 2D and 3D heat maps.[15]
  • Google Fusion Tables, discontinued in 2019, could generate a heat map from a Google Sheets spreadsheet of up to 1000 points of geographic data.[16]
  • Dave Green's 'cubehelix' colour scheme provides resources for a colour scheme that prints as a monotonically increasing greyscale on black and white postscript devices.[17]
  • OpenLayers 3 can render a heat map layer of a selected property of all geographic features in a vector layer.[18]
  • D3.js,[19][20] AnyChart[21][22] and Highcharts[23][24] are JavaScript libraries for data visualization that provide the ability to create interactive heat map charts, from basic to highly customized, as part of their solutions.
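The idea behind Dave Green's cubehelix scheme can be sketched as a helix around the gray diagonal of RGB space whose brightness rises steadily from black to white. The coefficients below are transcribed from Green's published description and should be checked against that reference before use; the defaults (start 0.5, rotations -1.5, hue 1.0, gamma 1.0) are assumed, and the function name is hypothetical.

```python
import math


def cubehelix(n, start=0.5, rotations=-1.5, hue=1.0, gamma=1.0):
    """Return n (r, g, b) tuples in [0, 1] following the cubehelix construction."""
    if n < 2:
        raise ValueError("need at least two colors")
    colors = []
    for i in range(n):
        frac = i / (n - 1)                 # position along the map: 0 black, 1 white
        lum = frac ** gamma                # target brightness
        amp = hue * lum * (1 - lum) / 2    # distance of the helix from the gray diagonal
        phi = 2 * math.pi * (start / 3 + rotations * frac)
        cos_phi, sin_phi = math.cos(phi), math.sin(phi)
        r = lum + amp * (-0.14861 * cos_phi + 1.78277 * sin_phi)
        g = lum + amp * (-0.29227 * cos_phi - 0.90649 * sin_phi)
        b = lum + amp * (1.97294 * cos_phi)
        colors.append(tuple(min(1.0, max(0.0, c)) for c in (r, g, b)))  # guard the range
    return colors


# Brightness (0.30/0.59/0.11 weights) of each sample tracks its position along the map.
for (r, g, b) in cubehelix(5):
    print(round(0.30 * r + 0.59 * g + 0.11 * b, 3))
```

Because the helix's color terms cancel under those brightness weights, the printed values rise evenly from 0 to 1, which is why the scheme survives conversion to grayscale.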

Choropleth maps versus heat maps

 
A choropleth map visualizing United States population density by state.

Choropleth maps and heat maps are often incorrectly used in place of one another when referring to data visualized geographically.[citation needed] Both techniques show the proportion of a variable of interest, but they differ in how the boundaries for the variable’s data aggregations are constructed. If the data were collected and aggregated using irregular boundaries, such as administrative units, then a heat map displaying that data will be the same as a choropleth map, encouraging confusion about how the two differ.

Choropleth maps show data grouped by geographic boundaries such as countries, states, provinces, or even floodplains. Each region has a single value, visualized by color intensity, shading, or pattern. The figure on the right, a choropleth map of United States population density by state, is an example. It shows a single value (population density) denoted by a blue color intensity proportional to each state's value relative to all other states' values, bounded by each state's border.

Similarly, heat maps may also visualize data over a geographic region. However, unlike choropleth maps, heat maps show the proportion of a variable over an arbitrary, but usually small, grid size, independent of geographic boundaries.[25][26] The figure on the right, a heat map of world population, is an example. It shows a single value (population) bounded in an arbitrary grid (square kilometers), with each cell in the grid represented by a color intensity proportional to the value of the cell relative to all other cells. Some heat maps created from approximated regional data may show familiar geographic borders in the visualization where none really exist. This illusion of geographic borders is due to patterns within the dataset rather than to the visualization technique. The world population heat map on the right also shows this effect: areas in rural parts of the United States and South America may closely resemble familiar geographic borders in those regions.

 
A heat map visualizing population density per square kilometer around the world in 1994.
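The difference in aggregation can be sketched by summing the same records two ways: once per named region, as a choropleth map would, and once per square grid cell, as a heat map would. The record layout (region name, planar x and y coordinates, value), the function names, and the data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_region(records):
    """Choropleth-style: one total per named region (such as a state)."""
    totals = defaultdict(float)
    for region, _x, _y, value in records:
        totals[region] += value
    return dict(totals)


def aggregate_by_grid(records, cell_size):
    """Heat-map-style: one total per square grid cell, ignoring region names."""
    totals = defaultdict(float)
    for _region, x, y, value in records:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += value
    return dict(totals)


# Hypothetical population counts at planar (x, y) locations in two regions.
records = [("A", 0.5, 0.5, 10), ("A", 1.5, 0.5, 5), ("B", 1.6, 0.4, 7)]
print(aggregate_by_region(records))     # {'A': 15.0, 'B': 7.0}
print(aggregate_by_grid(records, 1.0))  # {(0, 0): 10.0, (1, 0): 12.0}
```

The grid cell (1, 0) combines values from both regions, since grid cells ignore administrative borders; if the cells were replaced by the regions themselves, the two results would coincide, which is the source of the confusion described above.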

Examples

See also

References

  1. Wilkinson L, Friendly M (May 2009). "The History of the Cluster Heat Map". The American Statistician. 63 (2): 179–184. CiteSeerX 10.1.1.165.7924. doi:10.1198/tas.2009.0033. S2CID 122792460.
  2. "United States Patent and Trademark Office, registration #75263259". 1993-09-01.
  3. Silhavy R, Senkerik R, Oplatkova ZK, Silhavy P, Prokopova Z (2016-04-26). Software Engineering Perspectives and Application in Intelligent Systems. ISBN 978-3-319-33622-0.
  4. "All About Heatmaps". 24 December 2020.
  5. "A Guide to Heatmaps: What is a Heatmap, the Use, and Types? | Attention Insight". 27 May 2021.
  6. "5 Real Heat Map Examples from Leading Industries [2022] | VWO". 20 January 2020.
  7. "All About Heatmaps". 24 December 2020.
  8. "Guide to Geographic Heat Maps [Types & Examples]". 20 December 2021.
  9. "5 Real Heat Map Examples from Leading Industries [2022] | VWO". 20 January 2020.
  10. Borland D, Taylor MR (2007). "Rainbow color map (still) considered harmful". IEEE Computer Graphics and Applications. 27 (2): 14–7. doi:10.1109/MCG.2007.323435. PMID 17388198.
  11. Borkin MA, Gajos KZ, Peters A, Mitsouras D, Melchionna S, Rybicki FJ, et al. (December 2011). "Evaluation of artery visualizations for heart disease diagnosis". IEEE Transactions on Visualization and Computer Graphics. 17 (12): 2479–88. CiteSeerX 10.1.1.309.590. doi:10.1109/TVCG.2011.192. PMID 22034369. S2CID 2548700.
  12. Crameri F, Shephard GE, Heron PJ (October 2020). "The misuse of colour in science communication". Nature Communications. 11 (1): 5444. Bibcode:2020NatCo..11.5444C. doi:10.1038/s41467-020-19160-7. PMC 7595127. PMID 33116149.
  13. "Using R to draw a heat map from Microarray Data". Molecular Organisation and Assembly in Cells. 26 Nov 2009.
  14. "Draw a Heat Map". R Manual.
  15. "Gnuplot demo script: Heatmaps.dem".
  16. "Fusion Tables Help - Create a heat map". support.google.com. Jan 2018.
  17. "Dave Green's 'cubehelix' colour scheme".
  18. "ol/layer/Heatmap~Heatmap". OpenLayers. Retrieved 2019-01-01.
  19. "Heatmap". D3.js Graph Gallery. Retrieved 25 July 2020.
  20. "Most basic heatmap in d3.js". D3.js Graph Gallery. Retrieved 25 July 2020.
  21. "Heat Map Chart". AnyChart Documentation. Retrieved 25 July 2020.
  22. "Heat Map Charts - Gallery". AnyChart Gallery. Retrieved 25 July 2020.
  23. "Heatmap - Highcharts docs". Highcharts. Retrieved 9 December 2019.
  24. "Heat and tree maps - Highcharts demos". Highcharts. Retrieved 9 December 2019.
  25. "Choropleth vs. Heat Map « Cartographer's Toolkit". Retrieved 2022-04-15.
  26. "Heatmaps vs Choropleths". www.standardco.de. Retrieved 2022-04-15.

Further reading

  • Bertin J (1967). Sémiologie Graphique. Les diagrammes, les réseaux, les cartes [Graphic semiotics. Diagrams, networks, maps] (in French). Gauthier-Villars. OCLC 2656278.
  • Eisen MB, Spellman PT, Brown PO, Botstein D (December 1998). "Cluster analysis and display of genome-wide expression patterns". Proceedings of the National Academy of Sciences of the United States of America. 95 (25): 14863–8. Bibcode:1998PNAS...9514863E. doi:10.1073/pnas.95.25.14863. PMC 24541. PMID 9843981.
  • Friendly M (March 1994). "Mosaic Displays for Multi-Way Contingency Tables". Journal of the American Statistical Association. 89 (425): 190–200. doi:10.1080/01621459.1994.10476460. JSTOR 2291215.
  • Ling RL (1973). "A computer generated aid for cluster analysis". Communications of the ACM. 16 (6): 355–361. doi:10.1145/362248.362263. S2CID 8033024.
  • Sneath PH (August 1957). "The application of computers to taxonomy". Journal of General Microbiology. 17 (1): 201–26. doi:10.1099/00221287-17-1-201. PMID 13475686.
  • Wilkinson L (1994). Advanced Applications: Systat for DOS Version 6. SYSTAT. ISBN 978-0-13-447285-0.
  • Barter RL, Yu B (2018). "Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data". Journal of Computational and Graphical Statistics. 27 (4): 910–922. arXiv:1512.01524. doi:10.1080/10618600.2018.1473780. PMC 6430237. PMID 30911216.

External links

  • Wilkinson L, Friendly M. "The History of the Cluster Heat Map" (PDF).
  • Albergotti R (May 7, 2014). "Strava, Popular With Cyclists and Runners, Wants to Sell Its Data to Urban Planners". The Wall Street Journal.