Stop using R plot and learn to love ggplot

ggplot Diamond Comparison

When I first got started with R, I always found myself reaching for the built-in “plot” function because, well, it is easy! Unfortunately, it lacks some advanced features, and the plots it produces are really ugly (subjective, but I bet you will agree with me). Luckily, there is a better tool for the job: ggplot. With only a few tricks, you will find it just as easy to use.


Comparing diamonds with linear regressions using Python and R in Jupyter notebooks

Buying a ring is a big decision. You have the whole “are they the one” decision. I can’t help you with that. Then there is the reality that this is likely the first major financial decision that will affect both of you. Wouldn’t it be nice if you could save hundreds or even thousands of dollars?

I am not here to convince you to avoid buying a diamond (thanks, De Beers). Instead, I am going to show you a basic statistical programming technique in Python and R known as a “linear regression model.” I will use a Jupyter notebook to run the data analysis so you can see, step by step, how it works.

You might be able to use this to shop smartly by comparing the actual in-store price of a diamond to a predicted price. My wife and I built and used this code in 2013 while engagement ring shopping together. Hope it helps others!
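To make the idea concrete before diving in, here is a minimal sketch of a one-variable linear regression in plain Python. It is not the notebook code from the post. The carat weights, prices, and in-store quote are toy numbers invented for illustration (and deliberately noise-free), and `fit_line` and `predict_price` are hypothetical helper names.

```python
# Toy data, invented purely for illustration: carat weight and price in USD.
# It is NOT taken from the post's dataset and is deliberately noise-free.
carats = [0.5, 0.7, 0.9, 1.0, 1.2]
prices = [1500.0, 2300.0, 3100.0, 3500.0, 4300.0]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(carat, slope, intercept):
    """Price the fitted line predicts for a given carat weight."""
    return slope * carat + intercept


slope, intercept = fit_line(carats, prices)

# Hypothetical in-store quote for a 1.0 carat stone.
store_price = 3900.0
predicted = predict_price(1.0, slope, intercept)
print(f"predicted: {predicted:.0f}, asking: {store_price:.0f}, "
      f"difference: {store_price - predicted:+.0f}")
```

A positive difference means the store is asking more than the model predicts. Real diamond prices depend on much more than carat weight, so treat a single-variable fit like this as a starting point only.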

Let’s get started!


Learning to screen scrape data using Google Chrome and curl

Pricescope.com Results

Screen scraping can be an effective way to get free data very quickly. When I need to scrape large amounts of data, I often use Google Chrome’s “Developer Tools” to work out the steps needed to recreate a web request. Here is an example of the process I used to scrape data from pricescope.com, which hosts a database of diamonds for sale online. I will use this data in an upcoming post on how to build statistical models.
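The post itself works with curl, but the same idea of recreating a request from what Developer Tools shows can be sketched in Python with only the standard library. Everything specific here is an assumption for illustration: the URL is a placeholder on example.com, the query parameters and headers are made up, and `build_request` is a hypothetical helper. The sketch only assembles the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, headers):
    """Assemble a GET request from pieces copied out of the Network tab."""
    url = f"{base_url}?{urlencode(params)}"
    return Request(url, headers=headers, method="GET")


# Placeholder values only; this is NOT the real pricescope.com request.
request = build_request(
    "https://example.com/search",
    {"shape": "round", "page": 1},
    {"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
)
print(request.full_url)
# To actually send it, pass `request` to urllib.request.urlopen.
```

Keeping the URL, parameters, and headers as separate arguments makes it easy to loop over pages by changing one parameter at a time.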
