WHY Python `requests` DOESN'T Always Get Web Page Text

Nov 11, 2022

Have you ever been web scraping a lot ...

You come to a new page to scrape.

You see the data on the web page.

You run a `page = requests.get(page_url)` command, and you get ...

NOTHING!

Ouch!

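Sometimes, AND MORE AND MORE OFTEN, the data isn't in the HTML that the server sends back. JavaScript fills it in after the page loads, so unless you actually load that page in a browser, you can't get to that data.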
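Here is a small sketch of why that happens. `requests` only downloads the HTML the server sends; it never runs the page's JavaScript. The `RAW_HTML` sample, the `VisibleText` class, and the `visible_text` and `fetch_visible_text` helpers are all made up for illustration, standing in for what a script-driven page might return.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would see if no JavaScript ever ran."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical response body: the <div> is empty until the script runs,
# and requests never runs scripts.
RAW_HTML = """
<html><body>
  <h1>Prices</h1>
  <div id="table"></div>
  <script>document.getElementById("table").textContent = "Widget 9.99";</script>
</body></html>
"""


def fetch_visible_text(page_url):
    """Same check against a live page (page_url is whatever you are scraping)."""
    import requests  # third-party; imported here so the offline demo needs only the stdlib

    page = requests.get(page_url, timeout=10)
    page.raise_for_status()
    return visible_text(page.text)


print(visible_text(RAW_HTML))  # the heading is there, the "Widget 9.99" data is not
```

If the text you can see in your browser is missing from `page.text`, the page is most likely being filled in by JavaScript.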

What can we do? AUTOMATE!

Automate the web page operations with a Python program that uses Selenium, and then collect the data.
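As a rough sketch of that idea (not the starter code from my repo), the function below opens a real Chrome window with Selenium 4, waits for an element to show up, and returns its text. The name `fetch_rendered_text`, the CSS selector argument, and the 15-second default timeout are assumptions for illustration, and it assumes Chrome is installed on your machine.

```python
def fetch_rendered_text(page_url, css_selector, timeout=15):
    """Load page_url in Chrome and return the text of the first visible match.

    css_selector is whatever element holds the data on your page; the
    15-second default timeout is an arbitrary choice for this sketch.
    """
    # Imported inside the function so this file still loads
    # on a machine where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # opens a visible browser window
    try:
        driver.get(page_url)
        # Wait until the page's JavaScript has actually displayed the element.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even after an error


# Example call, with a placeholder URL and selector:
# text = fetch_rendered_text("https://example.com/prices", "#table")
```

The explicit wait is the important part: the browser needs time to run the scripts, so reading the element right after `driver.get` can still come back empty.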

Does this sound super cool?

I assure you that it is. As you build up a set of tools and approaches, it becomes more and more fun.

"Wait Thom. This sounds like making bots with Python!"

That is correct. You are essentially making a bot, and scraping pages this way is a good path to building better bots.

It's great fun to automate a web browser and watch your code navigate through a website, operate its pages, and collect data.

The document contains a link to my DagsHub repo that contains all the starter code and setup instructions.

The attached PDF is just a conversion from my ReadMe.md in that repo. This PDF and an HTML version of this are also in the repo.

I hope you will follow the repo, so that you can see updates and examples as I add them.

I should be getting back to adding more soon. I had to study some other things before coming back to advance this work.

Until next time,
Thom

Data Science with Thom
