Session (web analytics)

Summary

In web analytics, a session, or visit, is a unit of measurement of a user's actions taken within a period of time or with regard to the completion of a task. Sessions are also used in operational analytics and in providing user-specific recommendations. There are two primary methods used to define a session: time-oriented approaches, based on continuity in user activity, and navigation-oriented approaches, based on continuity in a chain of requested pages.

Definition

The definition of "session" varies, particularly when applied to search engines.[1] Generally, a session is understood to consist of "a sequence of requests made by a single end-user during a visit to a particular site".[2] In the context of search engines, "sessions" and "query sessions" have at least two definitions.[1] A session or query session may be all queries made by a user in a particular time period,[3] or a series of queries or navigations with a consistent underlying user need.[4][5]

Uses

Sessions per user can be used as a measurement of website usage.[6][7] Other metrics used in research and applied web analytics include session length[8] and user actions per session.[9] Session length is considered a more accurate alternative to counting page views.[10]
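Once sessions have been reconstructed, these metrics reduce to simple arithmetic. The following is a minimal, hypothetical sketch: it assumes a dictionary mapping each user to a list of sessions, each session being a time-sorted list of request timestamps with one request counted as one user action, and it measures session length as the time between the first and last request. The function name `session_metrics` and the input layout are illustrative, not taken from the cited studies.

```python
from datetime import datetime, timedelta
from statistics import mean


def session_metrics(sessions_by_user):
    """Summarise reconstructed sessions.

    `sessions_by_user` maps a user ID to that user's sessions; each
    session is a non-empty, time-sorted list of request timestamps,
    with one request counted as one user action (an assumption).
    """
    all_sessions = [s for sessions in sessions_by_user.values() for s in sessions]
    total_length = sum((s[-1] - s[0] for s in all_sessions), timedelta())
    return {
        "sessions_per_user": len(all_sessions) / len(sessions_by_user),
        # First-to-last request only: time spent on the final page is not
        # visible in request logs, so this underestimates the true length.
        "mean_session_length": total_length / len(all_sessions),
        "mean_actions_per_session": mean(len(s) for s in all_sessions),
    }


t0 = datetime(2024, 1, 1, 9, 0)
minute = timedelta(minutes=1)
example = {
    "alice": [[t0, t0 + 4 * minute, t0 + 10 * minute], [t0 + 120 * minute]],
    "bob": [[t0, t0 + 2 * minute]],
}
print(session_metrics(example))
```

In this example there are three sessions across two users (1.5 sessions per user), lasting 10, 0 and 2 minutes (a mean of 4 minutes), with 3, 1 and 2 actions (a mean of 2 actions per session). Note that a single-request session has a measured length of zero, which is one reason session length depends heavily on how it is defined.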

Reconstructed sessions have also been used to measure total user input, for example to estimate the number of labour hours spent building Wikipedia.[11] Sessions are also used for operational analytics, data anonymization, identifying network anomalies, and synthetic workload generation (testing servers with artificial traffic).[12][13]

Session reconstruction

Figure: An illustration of the criteria used by different session reconstruction approaches.

Essential to the use of sessions in web analytics is the ability to identify them, a task known as "session reconstruction". Approaches to session reconstruction fall into two main categories: time-oriented and navigation-oriented.[14]

Time-oriented approaches

Time-oriented approaches to session reconstruction look for a set period of user inactivity, commonly called an "inactivity threshold". Once this period elapses without any activity, the user is assumed to have left the site or stopped using the browser entirely, and the session is ended; further requests from the same user are considered part of a new session. A common value for the inactivity threshold is 30 minutes, which is sometimes described as the industry standard.[15][16] Some researchers have argued that a 30-minute threshold produces artifacts around naturally long sessions and have experimented with other thresholds.[17][18] Others simply state: "no time threshold is effective at identifying [sessions]".[19]
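A time-oriented approach with a single global threshold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: it assumes the requests have already been attributed to one user (how users are identified is out of scope here), and the function name `sessionize` and the 30-minute default are illustrative choices.

```python
from datetime import datetime, timedelta

# Assumption: a single global 30-minute inactivity threshold.
INACTIVITY_THRESHOLD = timedelta(minutes=30)


def sessionize(timestamps, threshold=INACTIVITY_THRESHOLD):
    """Split one user's request timestamps into sessions.

    A request joins the current session if it follows the previous
    request by at most `threshold`; otherwise it starts a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Gaps of 5, 15 and 70 minutes: only the 70-minute gap exceeds the threshold.
t0 = datetime(2024, 1, 1, 9, 0)
requests = [t0 + timedelta(minutes=m) for m in (0, 5, 20, 90)]
print([len(s) for s in sessionize(requests)])  # [3, 1]
```

Here a gap exactly equal to the threshold keeps the request in the current session; whether the boundary is inclusive is a design choice that varies between tools.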

One proposed alternative is to use user-specific thresholds rather than a single, global threshold for the entire dataset.[20][21] This approach assumes that the intervals between a user's actions follow a bimodal distribution, and it is not suitable for datasets that cover a long period of time.[17]
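The idea of a per-user threshold can be illustrated with a deliberately simple heuristic. The sketch below is hypothetical and is not the method of the cited works: it assumes a user's inter-request gaps split into two groups (short gaps within sessions, long gaps between them), sorts the logarithms of the gaps, and places the threshold at the widest jump. The name `user_threshold` and the 30-minute fallback are assumptions.

```python
import math
from datetime import datetime, timedelta


def user_threshold(timestamps, default=timedelta(minutes=30)):
    """Estimate an inactivity threshold from one user's own request gaps.

    Illustrative heuristic only: it assumes the gaps are bimodal (short
    gaps within sessions, long gaps between them), sorts their logarithms
    and cuts at the widest jump between neighbouring values.
    """
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    gaps = [g for g in gaps if g > 0]  # log() is undefined for zero gaps
    if len(gaps) < 2:
        return default  # too little data: fall back to a global threshold
    logs = sorted(math.log(g) for g in gaps)
    # Position of the widest jump between consecutive log-gaps.
    i = max(range(len(logs) - 1), key=lambda k: logs[k + 1] - logs[k])
    # Place the cut at the geometric midpoint of that jump.
    return timedelta(seconds=math.exp((logs[i] + logs[i + 1]) / 2))


# Gaps of 1, 2, 1.5, 120 and 1 minutes: the widest jump lies between the
# 2-minute and 120-minute gaps, so the cut lands at about 15.5 minutes.
t0 = datetime(2024, 1, 1, 9, 0)
offsets = [0, 60, 180, 270, 7470, 7530]  # seconds after t0
requests = [t0 + timedelta(seconds=s) for s in offsets]
print(user_threshold(requests))
```

If a user's gaps are not in fact bimodal, the widest jump is still chosen, so the result can be arbitrary; and because one threshold is fitted to a user's whole history, it cannot follow changes in behaviour across a long log, which mirrors the limitation described above.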

Navigation-oriented approaches

Navigation-oriented approaches exploit the structure of websites: specifically, the presence of hyperlinks and the tendency of users to move between pages on the same website by clicking them rather than typing the full URL into their browser.[14] One way to identify sessions from this data is to build a map of the website: if the user's first page can be identified, the session lasts until they land on a page that cannot be reached from any of the previously accessed pages. This takes into account backtracking, where a user retraces their steps before opening a new page.[22] A simpler approach, which does not account for backtracking, is to require that the HTTP referer of each request be a page already in the session; if it is not, a new session is started.[23] This class of heuristics "exhibits very poor performance" on websites that contain framesets.[24]
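Both navigation-oriented heuristics can be sketched in Python as follows. This is a minimal illustration, not the procedures from the cited papers: the `(url, referer)` input format, the `links` dictionary standing in for a site map, and the function names `split_by_referer` and `split_by_site_map` are hypothetical. The logs are assumed to be already grouped by user and sorted by time.

```python
def split_by_referer(requests):
    """Referer heuristic: a request stays in the current session only if
    its referer is a page already in that session.

    `requests` is one user's time-ordered list of (url, referer) pairs;
    a missing referer is represented as None.
    """
    sessions, seen = [], set()
    for url, referer in requests:
        if sessions and referer in seen:
            sessions[-1].append(url)
        else:
            sessions.append([url])
            seen = set()
        seen.add(url)
    return sessions


def split_by_site_map(pages, links):
    """Site-map heuristic: a page stays in the current session if any page
    already visited in that session links to it. Checking every earlier
    page, not just the previous one, tolerates backtracking.

    `links` maps each page to the set of pages it hyperlinks to.
    """
    sessions, seen = [], set()
    for page in pages:
        if sessions and any(page in links.get(p, ()) for p in seen):
            sessions[-1].append(page)
        else:
            sessions.append([page])
            seen = set()
        seen.add(page)
    return sessions


links = {"/": {"/a", "/b"}, "/a": {"/c"}}
requests = [("/", None), ("/a", "/"), ("/c", "/a"), ("/b", None), ("/x", None)]
print(split_by_referer(requests))  # [['/', '/a', '/c'], ['/b'], ['/x']]
print(split_by_site_map([u for u, _ in requests], links))  # [['/', '/a', '/c', '/b'], ['/x']]
```

In this example the request for `/b` arrives without a referer in the log. The referer heuristic therefore starts a new session, while the site-map heuristic keeps `/b` in the first session because a page already visited (`/`) links to it.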

References

Bibliography

  • Arlitt, Martin (2000). "Characterizing Web User Sessions" (PDF). SIGMETRICS Performance Evaluation Review. 28 (2): 50–63. doi:10.1145/362883.362920. S2CID 2946044.
  • Berendt, Bettina; Mobasher, Bamshad; Nakagawa, Miki; Spiliopoulou, Myra (2003). "The Impact of Site Structure and User Environment on Session Reconstruction in Web Usage Analysis" (PDF). WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. Lecture Notes in Computer Science. Vol. 2703. Springer. pp. 159–179. doi:10.1007/978-3-540-39663-5_10. ISBN 978-3-540-39663-5.
  • Catledge, L.; Pitkow, J. (1995). "Characterizing browsing strategies in the World-Wide web" (PDF). Computer Networks and ISDN Systems. 27 (6): 1065–1073. doi:10.1016/0169-7552(95)00043-7. S2CID 14313721.
  • Cooley, Robert; Mobasher, Bamshad; Srivastava, Jaideep (1999). "Data Preparation for Mining World Wide Web Browsing Patterns" (PDF). Knowledge and Information Systems. 1 (1): 5–32. CiteSeerX 10.1.1.33.2792. doi:10.1007/BF03325089. ISSN 0219-3116. S2CID 1165622.
  • Donato, Debora; Bonchi, Francesco; Chi, Tom (2010). "Do you want to take notes?: Identifying research missions in Yahoo! Search pad" (PDF). Proceedings of the 19th international conference on World wide web. ACM. pp. 321–330. doi:10.1145/1772690.1772724. ISBN 9781605587998. S2CID 6951065.
  • Eickhoff, Carsten; Teevan, Jaime; White, Ryen; Dumais, Susan (2014). "Lessons from the journey". Proceedings of the 7th ACM international conference on Web search and data mining (PDF). ACM. pp. 223–232. doi:10.1145/2556195.2556217. ISBN 9781450323512. S2CID 14666769.
  • Gayo-Avello, Daniel (2009). "A survey on session detection methods in query logs and a proposal for future evaluation" (PDF). Information Sciences. 179 (12): 1822–1843. doi:10.1016/j.ins.2009.01.026. hdl:10651/8686. ISSN 0020-0255. Archived from the original (PDF) on 2016-03-04. Retrieved 2015-02-18.
  • Geiger, R.S.; Halfaker, A. (2014). "Using edit sessions to measure participation in wikipedia". Proceedings of the 2013 conference on Computer supported cooperative work (PDF). ACM. pp. 861–870. doi:10.1145/2441776.2441873. ISBN 9781450313315. S2CID 7166943.
  • He, Daqing; Goker, Ayse; Harper, David J. (2002). "Combining evidence for automatic Web session identification". Information Processing and Management. 38 (5): 727–742. doi:10.1016/S0306-4573(01)00060-7. ISSN 0306-4573.
  • Heer, Jeffrey; Chi, Ed H. (2002). "Separating the swarm: Categorization methods for user sessions on the web" (PDF). Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Vol. 4. ACM. pp. 243–250. doi:10.1145/503376.503420. ISBN 1581134533. S2CID 14018957.
  • Huang, Chien‐Kang; Chien, Lee‐Feng; Oyang, Yen‐Jen (2003). "Relevant term suggestion in interactive web search based on contextual information in query session logs". Journal of the American Society for Information Science and Technology. 54 (7): 638–649. CiteSeerX 10.1.1.105.5584. doi:10.1002/asi.10256.
  • Jansen, Bernard J.; Spink, Amanda; Saracevic, Tefko (2000). "Real life, real users, and real needs: a study and analysis of user queries on the web" (PDF). Information Processing and Management. 36 (2): 207–227. CiteSeerX 10.1.1.155.1383. doi:10.1016/S0306-4573(99)00056-4. ISSN 0306-4573.
  • Jansen, Bernard J.; Spink, Amanda (2006). "How are we searching the world wide web? A comparison of nine search engine transaction logs" (PDF). Information Processing and Management. 42 (1): 248–263. doi:10.1016/j.ipm.2004.10.007. ISSN 0306-4573.
  • Jones, Rosie; Klinkner, Kristina Lisa (2008). "Beyond the session timeout: Automatic hierarchical segmentation of search topics in query logs". Proceedings of the 17th ACM conference on Information and knowledge management (PDF). ACM. pp. 699–708. doi:10.1145/1458082.1458176. ISBN 9781595939913. S2CID 6548724.
  • Khoo, Michael; Pagano, Joe; Washington, Anne L.; Recker, Mimi; Palmer, Bart; Donahue, Robert A. (2008). "Using Web Metrics to Analyze Digital Libraries" (PDF). Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM.
  • Lam, Heidi; Russell, Daniel; Tang, Diane (2007). "Session viewer: Visual exploratory analysis of web session logs". IEEE Symposium on Visual Analytics Science and Technology. IEEE.
  • Mehrzadi, David; Feitelson, Dror G. (2012). "On Extracting Session Data from Activity Logs" (PDF). Proceedings of the 5th Annual International Systems and Storage Conference. SYSTOR '12. ACM. CiteSeerX 10.1.1.381.1956. doi:10.1145/2367589.2367592. ISBN 978-1-4503-1448-0. S2CID 8820623.
  • Meiss, Mark; Duncan, John; Gonçalves, Bruno; Ramasco, José J.; Menczer, Filippo (2009). "What's in a session: Tracking individual behavior on the web" (PDF). Proceedings of the 20th ACM conference on Hypertext and hypermedia. ACM. pp. 173–182. arXiv:1003.5325. doi:10.1145/1557914.1557946. ISBN 9781605584867. S2CID 6564335.
  • Menascé, Daniel A.; Almeida, V.; Fonseca, R.; Mendes, M. (1999). "A methodology for workload characterization of E-commerce sites" (PDF). Proceedings of the 1st ACM conference on Electronic commerce. ACM. pp. 119–128. doi:10.1145/336992.337024. ISBN 1581131763. S2CID 7239612.
  • Murray, G. Craig; Lin, Jimmy; Chowdhury, Abdur (2006). "Identification of User Sessions with Hierarchical Agglomerative Clustering" (PDF). Proceedings of the American Society for Information Science and Technology. 43 (1): 1–9. doi:10.1002/meet.14504301312.
  • Ortega, J.L.; Aguillo, I. (2010). "Differences Between Web Sessions According to the Origin of their Visits" (PDF). Journal of Informetrics. 4 (3): 331–337. doi:10.1016/j.joi.2010.02.001. ISSN 1751-1577.
  • Spiliopoulou, Myra; Mobasher, Bamshad; Berendt, Bettina; Nakagawa, Miki (2003). "A framework for the evaluation of session reconstruction heuristics in web-usage analysis" (PDF). INFORMS Journal on Computing. 15 (2): 171–190. CiteSeerX 10.1.1.621.3037. doi:10.1287/ijoc.15.2.171.14445. ISSN 1526-5528.
  • Weischedel, Birgit; Huizingh, Eelko K. R. E. (2006). "Website optimization with web metrics". Proceedings of the 8th international conference on Electronic commerce the new e-commerce: Innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet - ICEC '06 (PDF). p. 463. doi:10.1145/1151454.1151525. ISBN 978-1595933928. S2CID 2965255.