
Beautiful Soup
Original author(s)	Leonard Richardson
Initial release	2004
Stable release	4.12.3 / 17 January 2024
Repository	code.launchpad.net/beautifulsoup/
Written in	Python
Platform	Python
Type	HTML parser library, Web scraping
License	Python Software Foundation License (Beautiful Soup 3); MIT License (versions 4 and up)
Website	www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup. It creates a parse tree for documents that can be used to extract data from HTML,^[3] which is useful for web scraping.^[2]^[4]
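
As a minimal sketch of the malformed-markup handling described above, the snippet below parses a fragment whose tags are never closed. The fragment itself is made up for illustration, and the choice of the built-in 'html.parser' backend is an assumption; other parsers such as lxml or html5lib may repair the same input differently.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed fragment: <div>, <p> and <b> are never closed.
broken_html = "<div><p>Hello, <b>world"

# 'html.parser' is Python's built-in backend; Beautiful Soup closes any
# tags that are still open when the input ends.
soup = BeautifulSoup(broken_html, "html.parser")

# The parse tree can be navigated even though the source was incomplete.
print(soup.b.get_text())
print(soup.p.get_text())
```

Attribute access such as soup.b returns the first matching tag in the tree, which is convenient for small documents; find_all is the more general way to search.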

Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project.^[5] The project is additionally supported through Tidelift, a paid subscription service for open-source maintenance.^[6]

Code example

Beautiful Soup represents parsed data as a tree which can be searched and iterated over with ordinary Python loops.^[7] The example below uses the Python standard library's urllib^[8] to load Wikipedia's main page, then uses Beautiful Soup to parse the document and search for all links within.

#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
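
The same search can be tried without network access. The sketch below is not part of the original example: it parses a small hand-written HTML string, keeps only anchors that carry an href attribute, and resolves relative links against a base URL. The HTML snippet, the extract_links helper, and the base URL https://example.org/ are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical input standing in for a downloaded page.
HTML = """
<ul>
  <li><a href="/wiki/Python">Python</a></li>
  <li><a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a></li>
  <li><a name="top">No href here</a></li>
</ul>
"""

# Hypothetical base URL used to resolve relative links.
BASE_URL = "https://example.org/"


def extract_links(html, base_url):
    """Return (link text, absolute URL) pairs for every anchor with an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True restricts the search to <a> tags that have an href attribute.
    for anchor in soup.find_all("a", href=True):
        text = anchor.get_text(strip=True)
        links.append((text, urljoin(base_url, anchor["href"])))
    return links


links = extract_links(HTML, BASE_URL)
for text, url in links:
    print(f"{text}: {url}")
```

Filtering with href=True skips anchors that have no link target, which is an alternative to supplying a default value when reading the attribute.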

History

Beautiful Soup is named after both a poem in Alice's Adventures in Wonderland^[9] and the term "tag soup".^[10]

Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release is Beautiful Soup 4.x. Beautiful Soup 4 can be installed with pip install beautifulsoup4.

Support for Python 2.7 was retired in 2021; release 4.9.3 was the last to support it.^[11]

References

[2] "Beautiful Soup website". Retrieved 18 April 2012. Beautiful Soup is licensed under the same terms as Python itself
[3] Hajba, Gábor László (2018), Hajba, Gábor László (ed.), "Using Beautiful Soup", Website Scraping with Python: Using BeautifulSoup and Scrapy, Apress, pp. 41–96, doi:10.1007/978-1-4842-3925-4_3, ISBN 978-1-4842-3925-4
[4] Python, Real. "Beautiful Soup: Build a Web Scraper With Python – Real Python". realpython.com. Retrieved 2023-06-01.
[5] "Code : Leonard Richardson". Launchpad. Retrieved 2020-09-19.
[6] Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". tidelift.com. Retrieved 2020-09-19.
[7] "How To Scrape Web Pages with Beautiful Soup and Python 3 | DigitalOcean". www.digitalocean.com. Retrieved 2023-06-01.
[8] Python, Real. "Python's urllib.request for HTTP Requests – Real Python". realpython.com. Retrieved 2023-06-01.
[9] makcorps (2022-12-13). "BeautifulSoup tutorial: Let's Scrape Web Pages with Python". Retrieved 2024-01-24.
[10] "Python Web Scraping". Udacity. 2021-02-11. Retrieved 2024-01-24.
[11] Richardson, Leonard (7 Sep 2021). "Beautiful Soup 4.10.0". beautifulsoup. Google Groups. Retrieved 27 September 2022.

