In this workshop, you will learn how to extract data from websites using Python, a process known as web scraping. You will use several Python packages to retrieve the HTML of webpages and to extract specific content from both static and dynamic pages.
Difficulty rating: ★★★☆ Intermediate
Who is it for?
Research staff and research students who are already familiar with the basics of Python.
Summary of the topics covered
- Recap of how websites are structured using HTML
- How to extract information from HTML data using the BeautifulSoup package
- How to retrieve the HTML of a webpage using the requests package
- The difference between static and dynamic webpages
- How to scrape dynamic content using Selenium
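To give a flavour of the first topics, here is a minimal sketch of pulling links out of a small piece of HTML. It is only an illustration and is not taken from the workshop materials: the page content, the `PAGE` variable and the `LinkCollector` class are made up for this example, and it uses Python's built-in `html.parser` module rather than BeautifulSoup so that it runs without installing anything.

```python
from html.parser import HTMLParser

# Hypothetical page, invented for this example. The list items are part of the
# HTML itself (static content), while the <div id="prices"> would only be
# filled in by JavaScript once the page is loaded in a browser (dynamic content).
PAGE = """
<html>
  <body>
    <h1>Workshops</h1>
    <ul>
      <li><a href="/python-intro">Introduction to Python</a></li>
      <li><a href="/web-scraping">Web scraping</a></li>
    </ul>
    <div id="prices"></div>
    <script>document.getElementById("prices").textContent = "Free";</script>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
for href, text in collector.links:
    print(href, text)
```

A parser like this only sees the HTML as it was delivered, so the `prices` element stays empty because the script is never executed. Content of that kind is what browser automation tools such as Selenium are used for.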
Prerequisites
This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:
- Installing and importing packages and modules
- Using lists and dictionaries
- Using conditional statements (if, elif, else)
- Using for loops
- Calling functions and understanding parameters, arguments, and return values
Frequency
3 times a year
Duration
3 hours
Next course
Wednesday 18th March (13:00 - 16:00)
Book here
Help with Python
If you need help using Python, you can contact our Research Software Group for advice, raise a ServiceDesk ticket with us, or attend one of our drop-in sessions.