Guide to web scraping using Python & Selenium


  • What is Selenium?
  • Why Selenium?
  • Project Pre-requisites
  • Website
  • Code Snippets
  • Data Visualization

What is Selenium? Selenium is an open-source, web-based automation tool. Originally developed at ThoughtWorks as an in-house tool in 2004, it was eventually released as open source.

While Selenium is understood as a testing tool, those working on web scraping will likely come across Selenium in the form of web drivers. So what gives? In the context of testing, Selenium is used to locate data on a website, thereby verifying that a specific element is present or absent on the page. …
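As a rough illustration of that presence/absence check, here is a minimal sketch. The helper name `element_present` and the selectors are hypothetical, not taken from the article. It relies on the WebDriver method `find_elements`, which returns an empty list when nothing matches, and on the fact that Selenium's `By.CSS_SELECTOR` constant is the string `"css selector"`.

```python
from typing import Any

# Selenium's By.CSS_SELECTOR constant is the string "css selector"; it is
# spelled out here so this helper has no import-time dependency on selenium.
CSS_SELECTOR = "css selector"


def element_present(driver: Any, css: str) -> bool:
    """Return True if at least one element matches the CSS selector.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible find_elements method). find_elements() returns an empty
    list when nothing matches, so the "absent" case needs no try/except.
    """
    return len(driver.find_elements(CSS_SELECTOR, css)) > 0


# Hypothetical usage with a real browser (requires selenium and a driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")
#   print(element_present(driver, "h1"))
#   driver.quit()
```

The same call serves scraping as well: instead of only counting the matches, a scraper would read the text or attributes of the returned elements.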


A look at customization options for enhancing visualizations, checking out some lesser-known plot functions along the way

Data visualization provides a visual context through maps or graphs. In doing so, it translates the data to a more natural form for the human mind to comprehend and pick out patterns or points of interest. A good visualization facilitates the conveyance of information or calls to action as part of the data presentation (storytelling).

Image by Author | Good for laughs, use at own risk for professional presentations.

The idea for the underlying project came partly from maintaining records on iaito orders some years back. I chose this data because I felt this could be a way to bridge data science (visualization and analytics) with traditional art (Iaido), and as a visualization…

Web Scraping, Data Visualization

Simple web scraping and visualization, using insights to address dietary concerns

As the COVID situation flared up once again in the city-state, we naturally cut down our time spent outdoors and tried to minimize time spent in crowded areas as far as possible. Nevertheless, the pantry requires periodic replenishment. Also, the Household Overlord has remarked that online grocery delivery time slots are few and far between. So, if we were to make the trip to the supermarkets ourselves, we would want to be informed about the items available and make targeted acquisition trips.

Using skills picked up from UpLevel Web scraping MasterClass, I suggested to the wife that…

Metrics and approaches for mitigating multicollinearity in linear regression models

Feature selection is a process where the predictor variables that contribute most significantly towards the prediction/classification of the target variable are selected. In feature selection for linear regression models, we are concerned with four aspects of the variables. Framed as the mnemonic “LINE”, these are:

  1. Linearity. The selected variable possesses a linear relationship with the target variable.
  2. Independence of predictor variables. The selected variables are independent of each other.
  3. Normality. Residuals generally follow a normal distribution (mean of zero).
  4. Equality of variance. The residual errors are generally consistent across the values of predictor variables (i.e. homoscedasticity).
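A minimal sketch of how the first two aspects, plus the zero-mean property of the residuals, might be checked by hand. The toy numbers and the helper names (`pearson`, `fit_line`, `residuals`) are made up for illustration and are not from the article. Only the standard library is used, and the fit is a one-predictor least-squares line, not a full multi-variable model.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up toy data: x2 is almost an exact multiple of x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 6, 8, 10, 12.5]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

print("Linearity, corr(x1, y):", round(pearson(x1, y), 4))
print("Independence, corr(x1, x2):", round(pearson(x1, x2), 4))
print("Mean of residuals:", mean(residuals(x1, y)))
```

A correlation between `x1` and `x2` close to 1 suggests the two predictors are not independent, which is the multicollinearity situation the subtitle refers to. Normality and equality of variance are usually judged from residual plots, which this sketch does not attempt.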

In cases where…

Python tools to facilitate EDA

Data cleaning and Exploratory Data Analysis go hand-in-hand: with a better understanding of the data, one is better positioned to spot errors or outliers for mitigation.

Most of us do EDA through pandas functions, coupled with visualizations using matplotlib or seaborn. Occasionally, we define functions to do 1) automated and 2) customized EDA of datasets (e.g. doing EDA on multiple, large datasets before merging them). While working on correlation matrices for a pet project, I came across Sweetviz & Pandas Profiling and incorporated them into my EDA workflow. Here are some of my observations:

Pandas Profiling

Compatibility & Installation: Weary…

An exploration of text classification using text features and topic modeling, from concept to deployment

Word features from the Motive text. (Image by author)

One and a half years ago, I chanced upon Python and Anaconda as tools for Data Science (DS) while taking part in a Data Hackathon. The myriad of Python libraries underscored Python’s versatility as a toolkit for data science. Sadly, I didn’t possess the necessary skillset to utilize Python back then. Since then, I have taken online courses in Python and SQL, and gradually developed an interest in the field of DS. Seeking to expedite the transition into DS, I enrolled in the Data Science Immersive (DSI) program by General Assembly. The DSI is a 3-month intensive boot camp…


Data Science Enthusiast, Analyst. Sharing insights from own learning journey and pet projects in this space. Linkedin Profile:
