Getting Song Lyrics from Genius’s API + Scraping

Genius is a great resource. At a high level, Genius has song lyrics and allows users to comment on what the artist meant. It started as Rap Genius, where users annotated rap lyrics, and later rebranded as “Genius”, opening annotations up to all songs. According to their website, “Genius is the world’s biggest collection of song lyrics and crowdsourced musical knowledge.” More recently, they’ve even moved to allowing annotations of pretty much anything posted online.

I’ve used it a bunch recently while trying to figure out what the hell Frank Ocean was trying to say in his new album Blond. Users of the site explained tons of Frank’s references that went whoosh right over my head when I listened the first time and all the times after.

And recently, when I had some ideas for mini projects using song lyrics, I was pretty happy to find that Genius had an API for getting the data on their site. Whenever I’m trying to get data from somewhere, I’m much happier with an API, or at least being able to get it from JSON responses rather than parsing HTML. It’s just cleaner to look at, and with an API, I can expect good documentation and a format that isn’t going to change with CSS updates.

Their API docs looked pretty good at first glance, with endpoints for artists, songs, albums, and annotations. One thing I did notice was that there’s no entry point for finding an artist by name. A lot of what I want to do is artist based, meaning I need to know the artist id for everyone. And in order for me to get that, I have to search for the artist, grab a song from the results, hit the song endpoint for that song’s information, and then grab the artist id from there. It’d be nice if I could specify what I’m searching for when I hit the search endpoint so I don’t have to go through that whole charade just to get the artist. But that’s a blog post for another time. Overall, they give out tons of information pretty easily.

But why, Genius, why don’t you have an endpoint for getting the raw lyrics of a song?! You have a songs endpoint on the API, and you give me a ton of information from there: the song title, album name, featured artists on the song, number of annotations, images associated with the song, album information, page views for that song, and a whole host of other data. But the one thing you don’t give me, and the one thing that people using the API probably want the most, is plain text lyrics!

Pre-Genius, I was stuck with jankily laid out sites with super old looking CSS that would have the lyrics, but not necessarily the correct ones, and definitely no annotations. Those sites are probably easy to scrape considering their simplicity, but searching for the right song would be more difficult, and the lyrics might not be correct. Genius has solved all of this for web users, but dammit, I want the lyrics in the API!

Now you might be able to get the entire set of lyrics by using the annotations endpoint, which has information about all the annotations for a certain song or article, but that would require a song to have annotations for every word in the song. For someone like Chance the Rapper, who like Frank Ocean (and most other hip hop artists) uses tons of references in his lyrics, having complete annotations might not be an issue. But for Jake Owen, whose new single “American Country Love Song” has probably the most self-explanatory lyrics ever (sorry for throwing you under the bus here, Jake. Still a fan), there’s no need to annotate anything, and getting the lyrics in this manner wouldn’t work.

The lyrics are there on the internet, however, and I can get at them by hitting the song endpoint and using the web URL that it returns. The rest of this article will show you how to do that using Python and its requests and BeautifulSoup libraries. But I don’t want to have to resort to HTML parsing, and I don’t think Genius wants users doing that either.

I’m left here wondering why they don’t want to give up the lyrics so easily, and I really don’t have much to go on. Genius’s goal seems to be annotating the internet. They have already moved on from their initial site, Rap Genius, into all music, and now into speech transcripts, as well as pretty much any other content on the web. Their value comes from the annotations themselves, not the information they’re annotating. They give away the annotations freely, but not the underlying information (the lyrics, in this case).

Enough speculation on why Genius doesn’t spit out the lyrics to a song when you get the other information. And as I’m writing this, I realize I easily could have overlooked something in their API, and Genius might return the full lyrics somewhere after all. In that case, half of this article will be pointless and I’ll hang my head in shame for yelling at them like I did.

For purposes here, I’m going to show you how to get the song lyrics from Genius if you have the song title, and also talk through my process of getting there.

A note of clarification: just to make sure I’m not violating their terms of service, this post is for informational purposes only. Hopefully this can help programmers out there learn. Don’t do something bad with this knowledge. Code time!

The first thing you’re going to need is an account set up with Genius. You can sign up from the upper right hand corner of the genius.com homepage. After that, navigate to the API docs, where you’ll see the Bearer token that you’ll need for all API requests.
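
Since the token shouldn’t end up pasted into code you share, here is a minimal sketch of one way to keep it out of the script. The environment variable name GENIUS_TOKEN and the helper auth_headers are my own inventions for illustration, not anything Genius defines; the only part taken from the API itself is the "Bearer" header format.

```python
import os


def auth_headers(token=None):
    """Build the Authorization header used on every Genius API request.

    GENIUS_TOKEN is a hypothetical environment variable name chosen for this
    sketch; use whatever name fits your own setup.
    """
    token = token or os.environ.get("GENIUS_TOKEN")
    if not token:
        raise ValueError("No API token found; set GENIUS_TOKEN or pass one in")
    return {"Authorization": "Bearer " + token}
```

The returned dictionary can be passed as the headers argument of a requests call in place of a hard-coded one.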

I’m using the requests library here, and once you have the bearer token, here’s what all the API requests to Genius should look like if, for example, you’re searching for a song title.

import requests

# Replace TOKEN below with the token string shown on the API docs page.
# Clearly I'm not giving mine out here on the internet. That'd be dumb.
base_url = "https://api.genius.com"
# Key line: this header is how you authorize every request to the API.
headers = {'Authorization': 'Bearer TOKEN'}
search_url = base_url + "/search"
song_title = "In the Midst of It All"
# The search term belongs in the query string, so pass it via params.
params = {'q': song_title}
response = requests.get(search_url, params=params, headers=headers)

The response, according to the Genius API docs, is a list of songs that match the search string, with the first result being the Tom Misch song that I was going for. By changing the URL that is passed into requests.get, you can access all the information that Genius supplies through the API (pretty much everything but the lyrics).
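
To get a feel for what comes back, it can help to boil the parsed JSON down to a few fields. The sketch below assumes the search response nests its results under "response" and "hits", with each hit carrying a "result" that has a "primary_artist" name. The "title" key is my assumption about the field name, and the second entry in the sample is made up purely for illustration.

```python
def summarize_hits(search_json):
    """Return (title, artist name) pairs for each hit in a parsed search response."""
    pairs = []
    for hit in search_json.get("response", {}).get("hits", []):
        result = hit.get("result", {})
        title = result.get("title")  # assumed field name
        artist = result.get("primary_artist", {}).get("name")
        pairs.append((title, artist))
    return pairs


# Illustrative sample only: shaped like the response described above,
# with a placeholder second hit that is not real data.
sample = {
    "response": {
        "hits": [
            {"result": {"title": "In the Midst of It All",
                        "primary_artist": {"name": "Tom Misch"}}},
            {"result": {"title": "Some Other Song",
                        "primary_artist": {"name": "Some Other Artist"}}},
        ]
    }
}

for title, artist in summarize_hits(sample):
    print(title, "by", artist)
```

With a live response you would call summarize_hits(response.json()) instead of using the sample.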

Looking at the search code above, you’re probably wondering how I can confirm that the song I picked off is the one I was actually looking for. For example, if I was looking for the song Capsized by Andrew Bird, and I used that title as the search term, I’m probably going to get back more than a few songs titled Capsized. So now I’m going to add a little more information to make sure we get the correct song.

import requests

base_url = "https://api.genius.com"
headers = {'Authorization': 'Bearer TOKEN'}
search_url = base_url + "/search"
song_title = "Capsized"
artist_name = "Andrew Bird"
params = {'q': song_title}
response = requests.get(search_url, params=params, headers=headers)
search_results = response.json()
song_info = None
# Take the first hit whose primary artist matches the one we want.
for hit in search_results["response"]["hits"]:
    if hit["result"]["primary_artist"]["name"] == artist_name:
        song_info = hit
        break
if song_info:
    # now we have the song info and can do what we want
    pass
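
An exact string comparison on the artist name is a little brittle: a difference in capitalization or a stray space is enough to miss the match. Here is a hedged sketch of a slightly more forgiving version. The helper name find_hit_by_artist is hypothetical, and it only assumes the same hit["result"]["primary_artist"]["name"] layout used in the loop above.

```python
def find_hit_by_artist(hits, artist_name):
    """Return the first hit whose primary artist matches artist_name,
    ignoring case and surrounding whitespace, or None if nothing matches."""
    wanted = artist_name.strip().casefold()
    for hit in hits:
        name = hit["result"]["primary_artist"]["name"]
        if name.strip().casefold() == wanted:
            return hit
    return None
```

You would call it as find_hit_by_artist(results["response"]["hits"], "Andrew Bird"), where results is the parsed search response, and check the result for None before using it.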

Cool, artist disambiguation. Final step here is getting the lyrics for the song itself. And this is going to involve BeautifulSoup. In this case, I want the lyrics from the Decemberists’ “Lake Song”.

import requests
from bs4 import BeautifulSoup

base_url = "https://api.genius.com"
headers = {'Authorization': 'Bearer TOKEN'}

song_title = "Lake Song"
artist_name = "The Decemberists"

def lyrics_from_song_api_path(song_api_path):
    song_url = base_url + song_api_path
    response = requests.get(song_url, headers=headers)
    song_json = response.json()
    path = song_json["response"]["song"]["path"]
    # gotta go regular html scraping... come on Genius
    page_url = "http://genius.com" + path
    page = requests.get(page_url)
    html = BeautifulSoup(page.text, "html.parser")
    # remove script tags that they put in the middle of the lyrics
    for script in html("script"):
        script.extract()
    # updated: the lyrics now live in a div with the class "lyrics"
    lyrics = html.find("div", class_="lyrics").get_text()
    return lyrics

if __name__ == "__main__":
    search_url = base_url + "/search"
    params = {'q': song_title}
    response = requests.get(search_url, params=params, headers=headers)
    search_results = response.json()
    song_info = None
    for hit in search_results["response"]["hits"]:
        if hit["result"]["primary_artist"]["name"] == artist_name:
            song_info = hit
            break
    if song_info:
        song_api_path = song_info["result"]["api_path"]
        print(lyrics_from_song_api_path(song_api_path))

Run the above code with the correct API token, and you should see the lyrics printed out to your console! Great song, listen to it.

If you look at the raw text within Python, you’ll see that there are some newlines in there that we can remove using a simple text replace, for example. Formatting just depends on what you want to do with the output. Note that replace returns a new string, so the result has to be assigned back.

lyrics = lyrics.replace('\n', ' ')
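
If a plain replace leaves too much stray whitespace behind, a slightly more thorough cleanup might look like the sketch below. The function name clean_lyrics and its one_line flag are hypothetical, and the sample string is a placeholder rather than real lyrics.

```python
import re


def clean_lyrics(raw, one_line=False):
    """Trim each line, collapse runs of whitespace, and drop empty lines.

    With one_line=True the remaining lines are joined with spaces instead
    of newlines.
    """
    lines = [re.sub(r"\s+", " ", line).strip() for line in raw.splitlines()]
    lines = [line for line in lines if line]
    return (" " if one_line else "\n").join(lines)


raw_text = "\n\n  First line  \n\n\nSecond   line\n"  # placeholder text
print(clean_lyrics(raw_text))
print(clean_lyrics(raw_text, one_line=True))
```

Whether you keep the line breaks depends on the project: line-level structure matters for some analyses, while a bag-of-words count doesn’t care.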

With the full script above, you should be able to get the lyrics to any song by supplying its title and the artist’s name.

Now one comment I’m sure someone will make is that Genius uses pretty simple URL paths, and by knowing the artist and song names, we can probably figure out the path for the lyrics without using the API at all. For example, the URL we generate for the song above is “http://genius.com/The-decemberists-lake-song-lyrics”. Pretty sure that’s just the artist name, song name, and the word “lyrics” all slugified together. But really I’m not just trying to get lyrics for songs here; I’m trying to set up a general way to get info for artists and their songs for future, more complicated use.
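
For the curious, that guess can be sketched in a few lines. This is only my reading of the pattern from the one URL above, not a documented Genius rule: the helper name guess_lyrics_url is hypothetical, and how Genius actually handles punctuation, accents, or featured artists in its slugs is an assumption here.

```python
import re


def guess_lyrics_url(artist_name, song_title):
    """Guess a Genius lyrics page URL from the artist and song names.

    Assumed rule: lowercase everything, turn runs of non-alphanumeric
    characters into single hyphens, append "lyrics", and capitalize the
    first letter.
    """
    text = "{} {} lyrics".format(artist_name, song_title).lower()
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return "http://genius.com/" + slug.capitalize()


print(guess_lyrics_url("The Decemberists", "Lake Song"))
```

For the example in this post, that reproduces the URL quoted above, but a guessed URL can easily miss for names with unusual characters, which is one more reason to let the API hand back the path.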

Comment or let me know if I messed anything up here, or if I read the docs wrong and Genius supplies the lyrics somewhere in the API and I just missed it. I easily could have overlooked it, and I’d like to have this info be correct!

Stay tuned for more interesting posts including, but not limited to, which country artist talks about trucks the most.

13 thoughts on “Getting Song Lyrics from Genius’s API + Scraping”

  1. Pingback: Getting Song Lyrics from Genius using their API + Scraping – Entertainment |Video | Lyrics

  2. Pingback: Talkin’ ‘Bout Trucks, Beer, and Love in Country Songs — Analyzing Genius Lyrics | Big-Ish Data

  3. Pingback: Classifying Country Music Songs is an Art — Getting Training Data | Big-Ish Data

  4. RyanB

    The reason they can’t give out lyrics freely is because they now have to pay licensing royalties on them. Most likely, they pay every time someone views a lyric. On a web page, this can be paid for with ads, but their API would need a paid tier in order to provide lyrics (which would be a nice option).

  5. Ana Araujo

    Hello,
    This has helped me so much! Thank you!

    A small note: this only worked for me when I made this small adjustment to the code.
    Just in case there was a typo:
    I changed
    lyrics = html.find(“lyrics”).get_textn lyrics
    to
    lyrics = html.find(“lyrics”).get_text()

    I might be wrong though, in which case, I’m sorry
    Thanks again so much

    1. Jack Schultz Post author

      Checking out the code here and looks like it’s correct get_text() on the post, so might have been a copy paste error for you. But thanks for having me check, sometimes me copying code to the editor here leads to issues. And thanks for reading this and trying it out! Like hearing from people who do that.

  6. Bilal Ather

    I thought I was crazy for not being able to find lyric endpoints in their API, cool write-up. I’m gonna be trying to implement something like this in javascript. I guess it will use a part-API and part scraping

    1. Jack Schultz Post author

      Thanks for the comment! Obviously I like to write these in general, but nice to hear that these posts help people.

    1. Jack Schultz Post author

      Interesting about the url, I’ll take a look at that. And throw the code on github. I think I pushed some of it at one point, but need to clean it up for sure. Thanks for the info!

  7. Jerome Jasinski

    One other thing, It appears they now have the lyrics as a div class now.

    I changed
    lyrics = html.find(“lyrics”).get_text()
    to
    lyrics = html.find(“div”, class_=”lyrics”).get_text()

    gets the lyrics just fine with this.

