All posts by Jack Schultz

Classifying Amazon Reviews with Scikit-Learn — More Data is Better Turns Out

Last time, I went through some basics of how the naive Bayes algorithm works and the logic behind it, and implemented the classifier myself as well as with NLTK. That’s great and all, and hopefully people reading it got a […]

Practical Naive Bayes — Classification of Amazon Reviews

If you search around the internet for how to apply Naive Bayes classification to text, you’ll find a ton of articles that talk about the intuition behind the algorithm, maybe some slides from a lecture about the math and some notation behind […]

Classifying Country Music Songs is an Art — Getting Training Data

If you’ve been following along recently, I’ve been writing about my theory of country music, and how, unlike most other genres out there, country music song topics are, let’s just say, much more centralized. And so in my continuing effort […]

Talkin’ ‘Bout Trucks, Beer, and Love in Country Songs — Analyzing Genius Lyrics

Trucks, beer, and love: all things that make country music go round. I’ve said before that country music is just pop music with a slide, and then lyrics about slightly different topics than what you’ll hear in hip hop or “normal” pop […]

Getting Song Lyrics from Genius’s API + Scraping

Genius is a great resource. At a high level, Genius has song lyrics and allows users to comment on what the artist meant. Starting as Rap Genius, where users annotated rap lyrics, the site rebranded as “Genius”, allowing all songs to […]

Predicting PGA Tour Scoring Average from Statistics Using Linear Regression

First off, I admit, that’s probably the most boring title for a blog post ever. It gets a negative value on the clickbait scale that is generally unseen in the modern, “every click equals dollars” era that we live in. […]

Python, Postgres, SQLAlchemy, and PGA Tour Stats

A little while ago, I wrote an article about scraping a bunch of PGA Tour stats. The end result of that was writing those stats out into CSV files. While this was suitable for that task of gathering the stats, let’s […]