When I first got started, I always found myself using R’s “plot” function because, well, it is easy! Unfortunately, it lacks some advanced features, and the plots it produces are really ugly (subjective, but I bet you will agree with me). Luckily, there is a better tool for the job: ggplot. With only a few tricks you will find it just as easy to use.
Tag: data science
Comparing diamonds with linear regressions using Python and R in Jupyter notebooks
Buying a ring is a big decision. You have the whole “are they the one” decision. I can’t help you with that. Then you have the reality that this is likely the first major financial decision that will affect both of you. Wouldn’t it be nice if you could save hundreds or even thousands of dollars?
I am not here to convince you to avoid buying a diamond (thanks, De Beers). Instead, I am going to show you a basic statistical technique known as a “linear regression model,” implemented in Python and R. I will run the analysis in a Jupyter notebook so you can see step by step how it works.
You might be able to use this to shop smarter by comparing the actual price in a store to the price the model predicts. My wife and I built and used this code in 2013 while engagement ring shopping together. Hope it helps others!
Let’s get started!
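As a rough sketch of the idea (not the notebook code from the post), here is a one-variable least-squares fit in plain Python that predicts price from carat weight and compares it to a sticker price. The function names `fit_line` and `predict`, the single `carat` feature, and all of the numbers are assumptions made up for illustration; they are not real diamond prices.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Price the fitted line predicts for a given carat weight."""
    return intercept + slope * x


# Invented toy data (NOT real prices): carat weight and listed price.
carats = [0.5, 1.0, 1.5, 2.0]
prices = [3000.0, 5000.0, 7000.0, 9000.0]

slope, intercept = fit_line(carats, prices)

# Hypothetical in-store diamond: 1.2 carats with a made-up sticker price.
sticker_price = 6500.0
predicted_price = predict(slope, intercept, 1.2)
difference = sticker_price - predicted_price

print(f"predicted: {predicted_price:.2f}, sticker: {sticker_price:.2f}")
print(f"sticker is {difference:+.2f} relative to the model")
```

A positive difference suggests the stone costs more than comparable listings would predict. A real model would likely need more predictors than carat alone, such as cut, color, and clarity.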
Learning to screen scrape data using Google Chrome and curl
Screen scraping can be an effective way to get free data very quickly. When I need to scrape large amounts of data, I often use Google Chrome’s “Developer Tools” to work out the steps necessary to recreate a web request. Here is an example of the process I used to scrape data from pricescope.com, which hosts a database of diamonds for sale online. I will use this data in an upcoming post on how to build statistical models.
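To make “recreating a web request” concrete, here is a minimal Python sketch that assembles a curl command from the pieces you would read off a request in Developer Tools: the URL, the request headers, and the form fields. The helper name `build_curl_command`, the `example.com` URL, and the header and form values are all placeholders chosen for illustration; they are not pricescope.com’s actual endpoint or parameters.

```python
import shlex


def build_curl_command(url, headers=None, form_data=None):
    """Return a curl argument list that replays a request seen in the browser."""
    args = ["curl", "--silent", url]
    # One -H flag per request header.
    for name, value in (headers or {}).items():
        args += ["-H", f"{name}: {value}"]
    # --data-urlencode sends form fields in the request body (curl switches to POST).
    for key, value in (form_data or {}).items():
        args += ["--data-urlencode", f"{key}={value}"]
    return args


# Placeholder request details (hypothetical, for illustration only).
command = build_curl_command(
    "https://example.com/search",
    headers={"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
    form_data={"shape": "round", "page": "1"},
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Building the command as a list keeps each header and field as a separate argument, so it can also be passed directly to `subprocess.run` without shell quoting problems. Changing one field per call, such as a page number, is one way to walk through many result pages.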