Video browsing

Summary

Video browsing, also known as exploratory video search, is the interactive process of skimming through video content to satisfy an information need or to check whether the content is relevant. Although it was originally proposed to help users inspect a single video through visual thumbnails,[1] modern video browsing tools let users quickly find desired information in a video archive through iterative human–computer interaction, following an exploratory search approach.[2][3] Many of these tools assume a capable user who wants features for interactively inspecting video content as well as automatic content filtering. For that purpose, they usually provide several video interaction features,[4] such as sophisticated navigation within a video or search by content-based query.

Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, and semantic concept detection, to create a structured content overview of a video file or video archive. They usually also provide navigation aids, such as advanced timelines,[5] visual seeker bars, or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering by visual concept (e.g., only shots showing cars), by specific characteristics (e.g., color or motion filtering), by user-provided sketch, or by content-based similarity search.
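As a rough illustration of a content query, the following sketch filters shots by color using one keyframe per shot. It is a minimal example under stated assumptions, not the method of any particular tool: the `Shot` record, the function names, and the `tolerance` and `min_fraction` parameters are all hypothetical, and real systems typically use richer color descriptors.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Shot:
    """Hypothetical shot record: frame range plus one representative keyframe."""
    start_frame: int
    end_frame: int
    keyframe: np.ndarray  # H x W x 3 RGB image, dtype uint8


def color_fraction(keyframe, target_rgb, tolerance=60.0):
    """Fraction of keyframe pixels within `tolerance` (Euclidean RGB distance) of the target color."""
    diff = keyframe.astype(float) - np.asarray(target_rgb, dtype=float)
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.mean(dist <= tolerance))


def filter_shots_by_color(shots, target_rgb, min_fraction=0.5, tolerance=60.0):
    """Keep the shots whose keyframe is dominated by the target color."""
    return [
        shot for shot in shots
        if color_fraction(shot.keyframe, target_rgb, tolerance) >= min_fraction
    ]


# Usage: two synthetic shots, one mostly red and one mostly blue.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255
shots = [Shot(0, 99, red), Shot(100, 199, blue)]
red_shots = filter_shots_by_color(shots, target_rgb=(255, 0, 0))
```

Filtering on keyframes rather than on every frame keeps the query cheap, at the cost of missing colors that appear only briefly within a shot.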

History

Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu while they were working at Siemens, and it was presented at the first ACM International Conference on Multimedia in August 1993.[1][6] They described a shot detection algorithm for compressed video encoded with discrete cosine transform (DCT) coding standards such as JPEG, MPEG and H.26x.[7] The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect differences between video frames. In the algorithm, a subset of the blocks in a frame and a subset of the DCT coefficients for each block are used as a motion vector representation for the frame. By operating directly on compressed DCT representations, the algorithm avoids much of the computational cost of decompression and enables effective video browsing.[8] The algorithm represents each shot of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents.[9]
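The idea of comparing frames through a subset of DCT coefficients can be sketched as follows. This is an illustrative approximation, not the published algorithm: it computes the block DCT from pixel data so that the example is self-contained (a compressed-domain implementation would read the coefficients from the bitstream instead), and the block sampling step, the number of retained coefficients, the cosine-style comparison, and the threshold are all assumptions chosen for the example.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def frame_signature(frame, block=8, block_step=2, n_coeffs=4):
    """Vector built from a subset of blocks and a subset of DCT coefficients per block.

    `block_step` and `n_coeffs` are assumed values, not taken from the source.
    """
    d = dct_matrix(block)
    h, w = frame.shape
    parts = []
    # Visit only every `block_step`-th block in each direction.
    for y in range(0, h - block + 1, block * block_step):
        for x in range(0, w - block + 1, block * block_step):
            coeffs = d @ frame[y:y + block, x:x + block].astype(float) @ d.T
            # Keep only the low-frequency corner of the coefficient block.
            parts.append(coeffs[:n_coeffs, :n_coeffs].ravel())
    return np.concatenate(parts)


def detect_cuts(frames, threshold=0.5):
    """Return indices of frames whose signature differs strongly from the previous frame."""
    cuts = []
    prev = frame_signature(frames[0])
    for idx in range(1, len(frames)):
        cur = frame_signature(frames[idx])
        denom = np.linalg.norm(prev) * np.linalg.norm(cur)
        similarity = float(prev @ cur) / denom if denom > 0 else 1.0
        if 1.0 - similarity > threshold:
            cuts.append(idx)
        prev = cur
    return cuts


# Usage: two synthetic grayscale "shots" of three frames each.
shot_a = np.zeros((32, 32))
shot_a[:, :16] = 200.0  # bright left half
shot_b = np.zeros((32, 32))
shot_b[:, 16:] = 200.0  # bright right half
frames = [shot_a] * 3 + [shot_b] * 3
cuts = detect_cuts(frames)
```

Because only a few blocks and coefficients per frame enter the comparison, the per-frame cost stays small, which is the property that makes this style of detection attractive for browsing large archives.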

Video Notebook

Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides, and optical character recognition and speech recognition to support video search. The software can be used either on the client side through a browser extension, where slides and text are extracted while the video is watched (e.g., on a video platform such as YouTube or Udemy),[10][11] or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations such as Ernst & Young.[12]

Video Browser Showdown

The Video Browser Showdown (VBS)[13] is an annual live evaluation competition for exploratory video search tools, in which international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS has also collaborated with TRECVID.[14] The competition evaluates how efficiently video browsing tools handle known-item search (KIS) tasks on a well-defined data set, in direct comparison with other tools.[15]

References

  1. Arman, Farshid; Depommier, Remi; Hsu, Arding; Chiu, Ming-Yee (October 1994). "Content-based browsing of video sequences". Proceedings of the second ACM international conference on Multimedia - MULTIMEDIA '94. Association for Computing Machinery. pp. 97–103. doi:10.1145/192593.192630. ISBN 0897916867. S2CID 1360834.
  2. Supporting video library exploratory search: when storyboards are not enough. M. G. Christel. 2008.
  3. The Video Explorer - a tool for navigation and searching within a single video based on fast content analysis. K. Schoeffmann, M. Taschwer, and L. Boeszoermenyi. 2010.
  4. Video Interaction Tools: A Survey of Recent Work. K. Schoeffmann, M. A. Hudelist, and J. Huber. 2015.
  5. Interfaces for timeline-based mobile video browsing. W. Hürst and K. Meier. 2008.
  6. Arman, Farshid; Hsu, Arding; Chiu, Ming-Yee (August 1993). "Image processing on compressed data for large video databases". Proceedings of the first ACM international conference on Multimedia - MULTIMEDIA '93. Association for Computing Machinery. pp. 267–272. doi:10.1145/166266.166297. ISBN 0897915968. S2CID 10392157.
  7. Skodras, Athanassios (2009-01-01). "Real time data hiding by exploiting the IPCM macroblocks in H.264/AVC streams". Journal of Real-Time Image Processing.
  8. Zhang, HongJiang (1998). "Content-Based Video Browsing And Retrieval". In Furht, Borko (ed.). Handbook of Internet and Multimedia Systems and Applications. CRC Press. pp. 83–108 (89). ISBN 9780849318580.
  9. Steele, Michael; Hearst, Marti A.; Rowe, Lawrence A. (1998). "The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers" (PDF): 1–19 (14). S2CID 18212394. Archived from the original (PDF) on 2019-02-26. Retrieved 18 October 2019.
  10. "Video Notebook - Notes on all video platforms". chrome.google.com. Retrieved 2022-06-03.
  11. "Video Screenshots and Notes - YouTube & more". www.videonotebook.com. Retrieved 2022-06-03.
  12. "Videos made Browsable & Searchable - Video Notebook". www.videonotebook.com. Retrieved 2022-06-03.
  13. Video Browser Showdown
  14. TRECVID, Academic benchmark initiative by NIST
  15. Schöffmann, Klaus; Bailer, Werner (2012-07-24). "Video browser showdown". ACM SIGMultimedia Records. 4 (2): 1–2. doi:10.1145/2350204.2350205. S2CID 46224263.