My Methods for Finding Data
For this analysis, I use Python for the webscraping and consult two main sources:
Data
I used several platforms to acquire the data necessary for this project. I
organized the data with excel and csv files.Additionally,I used advanced excel
features such as VLOOKUP,COUNTIF,Pivot Tables and other functions to
clean,organize and complile data.
Problems
There were various problems that occured as I worked on this
project. First,when attempting to retrieve data from Genius,The files with lyrics
would repeat and/or include special characters in place of apostrophes. I fixed
these issues by using regex to find the end of the song which had the text"Embed" at
the end before repeating and then deleting the repeating lines.I also used find and
replace to change the special characters to apostrophes. Another issue was that
Genius did not have as much data as I would have liked. I was able to get album
name,track count and release years but I was unable to get additional information
such as genre,instruments and other potentially relevant data. In a meeting with
Dr.B,we found a Bob Dylan wiki page that was created a run by Dylan fans. This site
gave me relevant data such as genre,instrumentation,format and more. This solved the
problem and I was able to get more data as a result. A third issue was that various
excel functions like VLOOKUP do not work on tsv or csv files so I had to make a copy
of the data in an excel spreadsheet that would be compatible with excel
spreadsheets.For part of my analysis,I wanted to find the total number of verses
over 8 albums. 4 in Dylan's early years and 4 in his late years. Most of the song
txt files that I got via webscraping Genuis had verses tagged but some did not.
Since I was familiar with most of the songs,like Queen Jane Approximately,I was able
to figure out how many verses there were and added them myself.
Data Collection
I created multiple .py files to output tsv and csv files to organize my data.
Then,I made sure that the variables like Album names were consistent across the
files so that I could use the "VLOOKUP" function. If the album title was slightly
different in one file,the VLOOKUP function would not work. For example,if there
was an extra space in the title "Bob Dylan" like this: " Bob Dylan" or there was
a typo,the function cannot match the data to the other table. I used the COUNTIF
function to determine which period the album belongs to. Then,after compiling
and cleaning data,I would use pivot tables to filter data.I normally filter by
period which is either early or late dylan.In the rows column I would put a
variable like format and the count of format. The table would read that Early
Dylan had a count of 17 vinyl which would mean that all 17 albums were released
on vinyl.If you would like to download my files,you may do so by clicking these
links: fulldata.xlsx
fulldata.csv
Visualizations
The visualizations you see on the album analysis and media format pages are made
with:
- Power BI
- Kumu.io
- AntConc
- SVG