Interactive visual analysis

Summary

Interactive Visual Analysis (IVA) is a set of techniques for combining the computational power of computers with the perceptive and cognitive capabilities of humans in order to extract knowledge from large and complex datasets. The techniques rely heavily on user interaction and the human visual system, and sit at the intersection of visual analytics and big data. IVA is a branch of data visualization. It is suited to analyzing high-dimensional data with a large number of data points, where simple graphing and non-interactive techniques give an insufficient understanding of the information.[1]

These techniques involve looking at datasets through different, correlated views and iteratively selecting and examining features the user finds interesting. The objective of IVA is to gain knowledge which is not readily apparent from a dataset, typically in tabular form. This can involve generating, testing or verifying hypotheses, or simply exploring the dataset to look for correlations between different variables.

History

Focus+context visualization and its related techniques date back to the 1970s.[2] Early attempts at combining these techniques for Interactive Visual Analysis occurred in the WEAVE visualization system for cardiac simulation[3] in 2000. SimVis appeared in 2003,[4] and multiple PhD projects have explored the concept since then, notably those of Helmut Doleisch in 2004,[5] Johannes Kehrer in 2011[6] and Zoltan Konyha in 2013.[7] ComVis, which is used in the visualization community, appeared in 2008.[8]

Basics

The objective of Interactive Visual Analysis is to discover information in data which is not readily apparent. The goal is to move from the data itself to the information contained in the data, ultimately uncovering knowledge which was not apparent from looking at the raw numbers.

The most basic form of IVA is to use coordinated multiple views[9] displaying different columns of the dataset. At least two views are required for IVA. The views are usually among the common tools of information visualization, such as histograms, scatterplots or parallel coordinates, but volume-rendered views can also be used if this is appropriate for the data.[6] Typically, one view will display the independent variables of the dataset (e.g. time or spatial location), while the others display the dependent variables (e.g. temperature, pressure or population density) in relation to each other. If the views are linked, the user can select data points in one view and have the corresponding data points automatically highlighted in the other views. This technique, which allows intuitive exploration of higher-dimensional properties of the data, is known as linking and brushing.[10][11]
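
As a rough illustration of linking and brushing, the following Python sketch treats a brush as a range selection on one column and a linked view as the same rows shown through another column. The table contents and the helper names `brush` and `linked_view` are hypothetical and chosen only for this example; real IVA systems render interactive views rather than lists.

```python
# Hypothetical toy table: each dict is one data point, each key one column.
rows = [
    {"time": 0, "temperature": 12.0, "pressure": 1012.0},
    {"time": 1, "temperature": 18.5, "pressure": 1009.0},
    {"time": 2, "temperature": 25.0, "pressure": 1003.0},
    {"time": 3, "temperature": 27.5, "pressure": 1001.0},
    {"time": 4, "temperature": 15.0, "pressure": 1010.0},
]


def brush(rows, column, low, high):
    """Return the indices of rows whose `column` value lies in [low, high]."""
    return {i for i, row in enumerate(rows) if low <= row[column] <= high}


def linked_view(rows, column, selected):
    """Pair each value of `column` with a flag telling whether it is highlighted."""
    return [(row[column], i in selected) for i, row in enumerate(rows)]


# Brush the warm temperatures in one view ...
selected = brush(rows, "temperature", 20.0, 30.0)

# ... and the same data points are highlighted in every other linked view.
pressure_view = linked_view(rows, "pressure", selected)
time_view = linked_view(rows, "time", selected)
```

Because the selection is stored as row indices rather than as values of one column, any number of views can share it, which is the essence of the linking.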

The selection made in one of the views does not have to be binary. Software packages for IVA can allow for a gradual “degree of interest”[5][6][12] in the selection, where data points are highlighted more strongly as interest moves from low to high. This gives the search for information an inherent “focus+context”[13] aspect. For instance, when examining a tumor in a magnetic resonance imaging dataset, the tissue surrounding the tumor might also be of some interest to the operator.
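
A gradual degree of interest can be sketched as a function that maps each value to a number between 0 and 1 instead of a yes/no answer. The example below assumes a simple linear falloff around a fully selected core range; the function name `smooth_brush`, its parameters and the sample intensities are illustrative assumptions, not the definition used by any particular IVA package.

```python
def smooth_brush(value, core_low, core_high, falloff):
    """Return a degree of interest in [0, 1] for `value`.

    Values inside [core_low, core_high] get full interest (1.0). Outside the
    core, interest drops linearly and reaches 0.0 at a distance of `falloff`.
    """
    if falloff <= 0:
        raise ValueError("falloff must be positive")
    if core_low <= value <= core_high:
        return 1.0
    # Distance from the nearest edge of the core range.
    distance = core_low - value if value < core_low else value - core_high
    return max(0.0, 1.0 - distance / falloff)


# Hypothetical image intensities: the core range 50..60 stands for the tumor,
# nearby values stand for the surrounding tissue.
intensities = [10, 35, 45, 50, 60, 65, 90]
doi = [smooth_brush(v, 50, 60, 20) for v in intensities]
```

A view could then map the degree of interest to opacity or color saturation, so that the focus stands out while the context remains faintly visible.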

The IVA loop

Interactive Visual Analysis is an iterative process. Discoveries made after brushing the data and looking at the linked views can be used as a starting point for repeating the process, leading to a form of information drill-down. As an example, consider the analysis of data from a simulation of a combustion engine. The user brushes a histogram of the temperature distribution and discovers that one specific part of one cylinder has dangerously high temperatures. This information can be used to formulate the hypothesis that all cylinders have a problem with heat dissipation. The hypothesis could be verified by brushing the same region in all other cylinders and seeing in the temperature histogram that these cylinders also have higher temperatures than expected.[14]

Data model

The data source for IVA is usually tabular data, represented in columns and rows. The data variables can be divided into two categories: independent and dependent variables. The independent variables represent the domain of the observed values, such as time and space. The dependent variables represent the data being observed, for instance temperature, pressure or height.[14]

IVA can help the user uncover information and knowledge about data sources that have few dimensions as well as datasets that have a very large number of dimensions.[2]

Levels of IVA

The IVA tools can be divided into several levels of complexity. These levels provide the user with different interaction tools to analyze the data. For most uses, the first level will be sufficient, and it is also the level that gives the user the fastest response to interaction. The higher levels make it possible to uncover more subtle relationships in the data. However, they require more knowledge about the tools, and the interaction process has a longer response time.[1]

Base level

The simplest form of IVA is the base level, which consists of brushing and linking. Here the user can set up several views with different dataset variables and mark an interesting area in one of the views. The data points corresponding to the selection are marked automatically in the other views. A lot of information can be derived from this level of IVA. For datasets where the relationships between the variables are reasonably simple, this technique is usually sufficient for the user to achieve the required level of understanding.[7]

Second level

Brushing and linking with logical combination of brushes is a more advanced form of IVA. The user can mark several areas in one or several views and combine these areas with the logical operators and, or, and not. This makes it possible to explore deeper into the dataset and reveal more hidden information.[7] A simple example would be the analysis of weather data: the analyst might want to discover regions that have both warm temperatures and low precipitation.
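
The weather example can be sketched in Python by representing each brush as a set of row indices, so that the logical operators become set operations. The region names, values and thresholds below are made up for illustration, and `brush` is a hypothetical helper rather than part of any IVA tool.

```python
# Hypothetical weather table with one row per region.
regions = [
    {"name": "A", "temperature": 28.0, "precipitation": 5.0},
    {"name": "B", "temperature": 30.0, "precipitation": 120.0},
    {"name": "C", "temperature": 8.0, "precipitation": 10.0},
    {"name": "D", "temperature": 24.0, "precipitation": 15.0},
]


def brush(rows, column, low, high):
    """Return the indices of rows whose `column` value lies in [low, high]."""
    return {i for i, row in enumerate(rows) if low <= row[column] <= high}


everything = set(range(len(regions)))
warm = brush(regions, "temperature", 20.0, 40.0)
dry = brush(regions, "precipitation", 0.0, 20.0)

warm_and_dry = warm & dry      # logical AND: intersection of the two brushes
warm_or_dry = warm | dry       # logical OR: union of the two brushes
not_warm = everything - warm   # logical NOT: complement of a brush

names = sorted(regions[i]["name"] for i in warm_and_dry)
```

Here `names` holds the regions that are both warm and dry, which is the combined selection the analyst in the example is looking for.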

Third level

The logical combination of selections might not be sufficient to uncover meaningful information from the dataset. There are multiple techniques available that make hidden relationships in the data more apparent. One of these is attribute derivation. This allows the user to derive additional attributes from the data, such as derivatives, clustering information or other statistical properties. In principle, the operator can perform any set of calculations on the raw data. The derived attributes can then be linked and brushed like any other attribute.[7]
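
As a minimal sketch of attribute derivation, the example below adds a rate-of-change column computed by forward differences and then brushes the derived column like any other. The sample data, the column name `heating_rate` and the helper `derive_rate` are assumptions made for this illustration; any other calculation on the raw data could take its place.

```python
# Hypothetical time series with one row per time step.
samples = [
    {"time": 0.0, "temperature": 20.0},
    {"time": 1.0, "temperature": 21.0},
    {"time": 2.0, "temperature": 25.0},
    {"time": 3.0, "temperature": 26.0},
]


def derive_rate(rows, value, over, name):
    """Add a column `name` holding the forward difference of `value` over `over`.

    Needs at least two rows. The last row has no successor, so it reuses the
    slope of the final interval to keep the derived column complete.
    """
    for i, row in enumerate(rows):
        j = min(i, len(rows) - 2)          # start index of the interval to use
        a, b = rows[j], rows[j + 1]
        row[name] = (b[value] - a[value]) / (b[over] - a[over])
    return rows


derive_rate(samples, "temperature", "time", "heating_rate")
rates = [row["heating_rate"] for row in samples]

# The derived attribute can now be brushed like an original column.
fast_heating = {i for i, row in enumerate(samples) if row["heating_rate"] >= 3.0}
```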

The second tool in level three of IVA is advanced brushing techniques, such as angular brushing, similarity brushing or percentile brushing. These brushing tools select data points in a more advanced fashion than plain "point and click" selection. Advanced brushing generates a faster response than attribute derivation, but has a steeper learning curve and requires a deeper understanding of the dataset.[7]
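
The text does not define these brushes in detail. The sketch below shows one plausible reading of percentile brushing, under the assumption that it selects data points by their rank within a column rather than by their raw value. The function name `percentile_brush`, the rank definition and the sample values are all assumptions for this example.

```python
def percentile_brush(values, low_pct, high_pct):
    """Return indices of values whose percentile rank lies in [low_pct, high_pct].

    Assumption: the percentile rank of a value is the share of all values that
    are less than or equal to it, in percent. This quadratic-time version is
    meant for clarity, not for large datasets.
    """
    n = len(values)
    selected = set()
    for i, v in enumerate(values):
        rank = 100.0 * sum(1 for other in values if other <= v) / n
        if low_pct <= rank <= high_pct:
            selected.add(i)
    return selected


# Hypothetical column with one extreme outlier (100.0).
values = [3.0, 100.0, 7.0, 5.0, 1.0, 9.0, 8.0, 2.0, 4.0, 6.0]

# Select the top 20 percent regardless of how extreme the raw values are.
top = percentile_brush(values, 90, 100)
```

Unlike a range brush on raw values, this selection is insensitive to the scale of the outlier: the two highest values are selected whether the largest is 10 or 100.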

Fourth level

The fourth level of IVA is specific to each dataset and varies depending on the dataset and the purpose of the analysis. Any calculated attribute which is specific to the data under consideration belongs to this category. An example from the analysis of flow data would be the detection and categorization of vortices or other structures present in the flow. This means that fourth-level IVA techniques must be individually tailored to the specific application. After higher-order features have been detected, the calculated attributes are connected to the original dataset and subjected to the normal technique of linking and brushing.[1]

Patterns of IVA

The "linking and brushing" (selection) concept of IVA can be used between different types of variables in the dataset. Which pattern to use depends on which aspect of the correlations in the dataset is of interest.[1][15]

Feature localization

Brushing data points from the set of dependent variables (e.g. temperature) and seeing where among the independent variables (e.g. space or time) these data points show up is called "feature localization". With feature localization, the user can easily identify the location of features in the dataset. Examples from a meteorological dataset would be finding which regions have a warm climate or which times of the year have a lot of precipitation.[1][15]

Local investigation

If independent variables are brushed and the corresponding data points are examined in a view of dependent variables, this is termed "local investigation". This makes it possible to investigate the characteristics of, for example, a specific region or a specific time. In the case of meteorological data, we could for instance discover the temperature distribution during the winter months.[1][15]

Multivariate analysis

Brushing dependent variables and watching the connection to other dependent variables is called multivariate analysis. This could for example be used to find out if high temperatures are correlated with pressure by brushing high temperatures and watching a linked view of pressure distributions.

Since each of the linked views usually has two or more dimensions, multivariate analysis can implicitly uncover higher-dimensional features of the data which would not be readily apparent from e.g. a simple scatterplot.[1][15]
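
The three patterns differ only in which kind of variable is brushed and which kind is inspected afterwards. The following Python sketch makes this explicit on a tiny, invented meteorological table; the column split into `INDEPENDENT` and `DEPENDENT`, the helpers `brush` and `project`, and all values are assumptions for illustration.

```python
INDEPENDENT = ("region", "month")
DEPENDENT = ("temperature", "precipitation")

# Hypothetical meteorological records.
records = [
    {"region": "north", "month": 1, "temperature": -5.0, "precipitation": 40.0},
    {"region": "north", "month": 7, "temperature": 16.0, "precipitation": 80.0},
    {"region": "south", "month": 1, "temperature": 12.0, "precipitation": 90.0},
    {"region": "south", "month": 7, "temperature": 29.0, "precipitation": 5.0},
]


def brush(rows, predicate):
    """Return the rows selected by `predicate`, keeping their original order."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Show the selected rows through the given columns, as a linked view would."""
    return [tuple(row[c] for c in columns) for row in rows]


# Feature localization: brush a dependent variable, inspect independent ones.
warm = brush(records, lambda r: r["temperature"] >= 15.0)
where_warm = project(warm, INDEPENDENT)

# Local investigation: brush an independent variable, inspect dependent ones.
winter = brush(records, lambda r: r["month"] == 1)
winter_temperatures = project(winter, ("temperature",))

# Multivariate analysis: brush one dependent variable, inspect another.
wet = brush(records, lambda r: r["precipitation"] >= 60.0)
temperatures_when_wet = project(wet, ("temperature",))
```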

References

  1. ^ a b c d e f g Interactive Visual Analysis of Scientific Data. Steffen Oeltze, Helmut Doleisch, Helwig Hauser, Gunther Weber. Presentation at IEEE VisWeek 2012, Seattle (WA), USA.
  2. ^ a b Hauser, Helwig. "Generalizing focus+ context visualization." Scientific visualization: The visual extraction of knowledge from data. Springer Berlin Heidelberg, 2006. 305-327.
  3. ^ Gresh, Donna L., et al. "WEAVE: A system for visually linking 3-D and statistical visualizations, applied to cardiac simulation and measurement data." Proceedings of the conference on Visualization'00. IEEE Computer Society Press, 2000.
  4. ^ Doleisch, Helmut, Martin Gasser, and Helwig Hauser. "Interactive feature specification for focus+ context visualization of complex simulation data." Proceedings of the symposium on Data visualisation 2003. Eurographics Association, 2003.
  5. ^ a b Doleisch, Helmut. Visual analysis of complex simulation data using multiple heterogenous views. 2004.
  6. ^ a b c Kehrer, Johannes. Interactive visual analysis of multi-faceted scientific data. PhD dissertation, Department of Informatics, University of Bergen, Norway, 2011.
  7. ^ a b c d e Konyha, Zoltán, et al. "Interactive visual analysis of families of curves using data aggregation and derivation." Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies. ACM, 2012.
  8. ^ Matkovic, Krešimir, et al. "ComVis: A coordinated multiple views system for prototyping new visualization technology." Information Visualisation, 2008. IV'08. 12th International Conference. IEEE, 2008.
  9. ^ Roberts, Jonathan C. "State of the art: Coordinated & multiple views in exploratory visualization." Coordinated and Multiple Views in Exploratory Visualization, 2007. CMV'07. Fifth International Conference on. IEEE, 2007.
  10. ^ Martin, Allen R., and Matthew O. Ward. "High dimensional brushing for interactive exploration of multivariate data." Proceedings of the 6th Conference on Visualization'95. IEEE Computer Society, 1995.
  11. ^ Keim, Daniel A. "Information visualization and visual data mining." Visualization and Computer Graphics, IEEE Transactions on 8.1 (2002): 1-8.
  12. ^ Doleisch, Helmut, and Helwig Hauser. "Smooth brushing for focus+ context visualization of simulation data in 3D." Journal of WSCG 10.1 (2002): 147-154.
  13. ^ Lamping, John, Ramana Rao, and Peter Pirolli. "A focus+ context technique based on hyperbolic geometry for visualizing large hierarchies." Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co., 1995.
  14. ^ a b Konyha, Zoltan, et al. "Interactive visual analysis of families of function graphs." Visualization and Computer Graphics, IEEE Transactions on 12.6 (2006): 1373-1385.
  15. ^ a b c d Oeltze, Steffen, et al. "Interactive visual analysis of perfusion data." Visualization and Computer Graphics, IEEE Transactions on 13.6 (2007): 1392-1399.