My Methods for Finding Data

For this analysis, I use Python for the webscraping and consult two main sources:

Data

I used several platforms to acquire the data necessary for this project. I organized the data with excel and csv files.Additionally,I used advanced excel features such as VLOOKUP,COUNTIF,Pivot Tables and other functions to clean,organize and complile data.

Problems

There were various problems that occured as I worked on this project. First,when attempting to retrieve data from Genius,The files with lyrics would repeat and/or include special characters in place of apostrophes. I fixed these issues by using regex to find the end of the song which had the text"Embed" at the end before repeating and then deleting the repeating lines.I also used find and replace to change the special characters to apostrophes. Another issue was that Genius did not have as much data as I would have liked. I was able to get album name,track count and release years but I was unable to get additional information such as genre,instruments and other potentially relevant data. In a meeting with Dr.B,we found a Bob Dylan wiki page that was created a run by Dylan fans. This site gave me relevant data such as genre,instrumentation,format and more. This solved the problem and I was able to get more data as a result. A third issue was that various excel functions like VLOOKUP do not work on tsv or csv files so I had to make a copy of the data in an excel spreadsheet that would be compatible with excel spreadsheets.For part of my analysis,I wanted to find the total number of verses over 8 albums. 4 in Dylan's early years and 4 in his late years. Most of the song txt files that I got via webscraping Genuis had verses tagged but some did not. Since I was familiar with most of the songs,like Queen Jane Approximately,I was able to figure out how many verses there were and added them myself.

Data Collection

I created multiple .py files to output tsv and csv files to organize my data. Then,I made sure that the variables like Album names were consistent across the files so that I could use the "VLOOKUP" function. If the album title was slightly different in one file,the VLOOKUP function would not work. For example,if there was an extra space in the title "Bob Dylan" like this: " Bob Dylan" or there was a typo,the function cannot match the data to the other table. I used the COUNTIF function to determine which period the album belongs to. Then,after compiling and cleaning data,I would use pivot tables to filter data.I normally filter by period which is either early or late dylan.In the rows column I would put a variable like format and the count of format. The table would read that Early Dylan had a count of 17 vinyl which would mean that all 17 albums were released on vinyl.If you would like to download my files,you may do so by clicking these links: fulldata.xlsx fulldata.csv

Visualizations

The visualizations you see on the album analysis and media format pages are made with: