Everything you need for the Web Scraping workshop
Use pip to install all packages.
pip is a package management system used to install and manage software packages written in Python. Many packages can be found in the Python Package Index (PyPI). Python 2.7.9 and later (in the Python 2 series) and Python 3.4 and later include pip (pip3 for Python 3) by default.
For more information and installation instructions:
Pip and virtualenv on Mac
Pip and virtualenv on Windows
Fetching URLs
urllib module for Python
urllib is a Python module for fetching URLs. It is part of the Python standard library, so there is nothing to install. In Python 3 the functionality lives in urllib.request; in Python 2 it is the urllib2 module. For Python 3.6, see:
https://docs.python.org/3.6/howto/urllib2.html
For Python 2.7, see:
https://docs.python.org/2.7/howto/urllib2.html
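As a minimal sketch of fetching a page with the standard library, assuming Python 3: the helper name fetch_text and the User-Agent string are made up for illustration and are not part of urllib.

```python
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body decoded to a string."""
    # Hypothetical User-Agent value; some sites reject urllib's default one.
    request = Request(url, headers={"User-Agent": "scraping-workshop/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the response, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling fetch_text("https://example.com") should return the HTML of that page as a string. On Python 2.7 the equivalent calls live in urllib2 (urllib2.Request and urllib2.urlopen).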
Requests library
Requests is an HTTP library for Python. Official documentation is here:
http://docs.python-requests.org/en/master/
Installation:
pip install requests
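Once Requests is installed, a typical fetch might look like the sketch below. The function names, the q query parameter, and the User-Agent string are illustrative assumptions; requests.get, requests.Request, raise_for_status and the timeout argument are part of the library.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of parsing error pages.
    response.raise_for_status()
    return response.text


def build_search_request(base_url, query):
    """Prepare (but do not send) a GET request with a query string."""
    request = requests.Request(
        "GET",
        base_url,
        params={"q": query},  # hypothetical parameter name
        headers={"User-Agent": "scraping-workshop/0.1"},  # hypothetical value
    )
    return request.prepare()
```

Preparing a request without sending it is a convenient way to inspect the final URL Requests would call, for example build_search_request("https://example.com/search", "web scraping").url.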
wget library
wget is a download utility for Python. Official documentation is here:
https://pypi.python.org/pypi/wget
Installation:
pip install wget
Scraping
Beautiful Soup
Beautiful Soup is a Python library for pulling data out of HTML and XML files. Official documentation is here:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#calling-a-tag-is-like-calling-find-all
Installation:
pip install beautifulsoup4
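To give a feel for how Beautiful Soup pulls data out of HTML, here is a minimal sketch that collects every link in a document. The sample HTML and the function name extract_links are made up for illustration; the parser used is Python's built-in html.parser, so no extra parser package is assumed.

```python
from bs4 import BeautifulSoup

# Made-up sample document used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Workshop links</h1>
  <ul>
    <li><a href="https://docs.python.org/3.6/howto/urllib2.html">urllib howto</a></li>
    <li><a href="https://pypi.python.org/pypi/wget"> wget </a></li>
    <li><a name="no-href">not a link</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # skip anchors without href
    ]
```

As the documentation section linked above describes, calling a tag or the soup object directly is shorthand for find_all, so soup("a") returns the same tags as soup.find_all("a").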
PDFminer3k
PDFminer3k is a PDF parser and analyzer. Official documentation is here:
https://pypi.python.org/pypi/pdfminer3k
Installation:
pip install pdfminer3k
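After running the pip install commands above, a quick way to confirm that everything is in place is to check that each package can be found by the interpreter. The sketch below assumes Python 3; the helper name missing_packages and the WORKSHOP_PACKAGES mapping are illustrative. Note that two packages are imported under a different name than the one used with pip: beautifulsoup4 is imported as bs4, and pdfminer3k as pdfminer.

```python
from importlib.util import find_spec

# pip package name -> module name used in "import" statements.
WORKSHOP_PACKAGES = {
    "requests": "requests",
    "wget": "wget",
    "beautifulsoup4": "bs4",
    "pdfminer3k": "pdfminer",
}


def missing_packages(packages=WORKSHOP_PACKAGES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        # find_spec returns None when a top-level module is not installed.
        if find_spec(import_name) is None
    ]
```

An empty list from missing_packages() means all four workshop packages are importable; otherwise the returned names can be passed straight to pip install.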