When I first got started, I always found myself using R’s “plot” function because, well, it is easy! Unfortunately, it lacks some advanced features, and the plots it produces are really ugly (subjective, but I bet you will agree with me). Luckily, there is a better tool for the job: ggplot. With only a few tricks, you will find it just as easy to use.
My previous post, “Comparing diamonds with linear regressions using python R in jupyter notebooks”, shows exactly how I produced this R plot.
By changing only a few lines of code, we can import the same data into a pandas DataFrame (basically a spreadsheet representation of your data) and create a colorful ggplot.
I was really impressed with how easy it is to “factor” my diamond color data, that is, to treat color as a categorical attribute that sets the color of each data point. This simple feature revealed a whole new world of information about my price comparison. We can clearly see from this picture that the colors are not randomly scattered. In fact, “D” color diamonds deviate quite a bit from the linear regression model. Said simply: if you want to buy a diamond with a perfect, heirloom-quality “D” color (the best), you are likely going to pay a significant premium well outside the norm. This is common sense, but ggplot makes it very easy to show.
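To make the idea of grouping by color concrete without any plotting, here is a minimal sketch in plain Python. It fits a straight line of price against carat and then averages the residuals per color. The numbers are made up purely for illustration (they are not the diamond data from this post), and the names `rows` and `mean_residual` are my own assumptions.

```python
from collections import defaultdict

# Hypothetical toy rows (carat, price, color); NOT the real diamond data.
rows = [
    (0.5, 1500, "G"),
    (0.7, 2400, "G"),
    (1.0, 4000, "H"),
    (1.2, 5000, "H"),
    (0.6, 3000, "D"),
    (1.1, 6500, "D"),
]

carats = [r[0] for r in rows]
prices = [r[1] for r in rows]

# Ordinary least squares fit of: price = intercept + slope * carat
n = len(rows)
mean_x = sum(carats) / n
mean_y = sum(prices) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carats, prices))
sxx = sum((x - mean_x) ** 2 for x in carats)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = actual price minus the price the fitted line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(carats, prices)]

# Treat color as a categorical "factor": collect residuals per color.
by_color = defaultdict(list)
for (_, _, color), resid in zip(rows, residuals):
    by_color[color].append(resid)

mean_residual = {c: sum(v) / len(v) for c, v in by_color.items()}
for color, value in sorted(mean_residual.items()):
    print(f"{color}: {value:+.0f}")
```

A positive mean residual means that color tends to sit above the fitted line. In this toy data only “D” does, which is the kind of pattern that coloring the points by factor makes visible at a glance.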
Enough talk, here is the code.