Solving the simple problems can be the most fun
When I started my first job at Bloomberg LP in September 2015, I went through two weeks of training on the Bloomberg Terminal, finance, and the company’s internal systems. I learned about the people, the job and, most notably, the technology I would use in my blossoming career. I did not learn that technology in a classroom, or even through online coding courses. I learned it by asking a fun question, doing some research, and writing code by trial and error.
My first coding question was: who in my training class can click the ‘ring bell’ button on the Bloomberg Terminal the fastest, and what is their rate in bell rings per minute? To answer it, I separated the bell-click timestamps by person, then found each person’s first click time, last click time, and total number of clicks. Those three numbers are enough to estimate a rate: total clicks divided by the minutes between the first and last click. It was a fun problem and a surprisingly motivating project. I had little programming experience at the time, and it helped me build a technical reputation among my peers.
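The sketch below shows the bell-click calculation. It assumes the clicks have already been exported as (person, timestamp) pairs. The trainee names, the times and the helper bell_rates are made up for illustration. The rate is taken as total clicks divided by the minutes between a person’s first and last click.

```python
from collections import defaultdict
from datetime import datetime


def bell_rates(clicks):
    """Return {person: bell rings per minute} from (person, timestamp) pairs."""
    # Separate the click timestamps for each person.
    times = defaultdict(list)
    for person, stamp in clicks:
        times[person].append(stamp)

    rates = {}
    for person, stamps in times.items():
        first, last = min(stamps), max(stamps)
        minutes = (last - first).total_seconds() / 60
        # A single click spans zero minutes, so no rate can be computed for it.
        if minutes > 0:
            rates[person] = len(stamps) / minutes
    return rates


# Hypothetical clicks from two trainees.
clicks = [
    ("Ana", datetime(2015, 9, 14, 9, 0, 0)),
    ("Ana", datetime(2015, 9, 14, 9, 0, 30)),
    ("Ana", datetime(2015, 9, 14, 9, 1, 0)),
    ("Ben", datetime(2015, 9, 14, 9, 0, 0)),
    ("Ben", datetime(2015, 9, 14, 9, 2, 0)),
]
rates = bell_rates(clicks)
fastest = max(rates, key=rates.get)
print(fastest, rates[fastest])  # Ana 3.0
```

Dividing total clicks by the first-to-last span is one choice. Dividing the number of gaps between clicks (clicks minus one) by the same span would give a slightly lower rate.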
Suggesting that people start programming with a project that interests them is not a revolutionary idea. What I’m suggesting is even simpler: think of an interesting question that code can answer, and try to write the code to answer it. If you enjoy games, build a tic-tac-toe board. If you want to feel better about the cold weather in Raleigh, compare it with Seattle or London using data from the Dark Sky API. If you want help writing a cover letter, use term frequencies from the job description to identify the key topics you need to write about.
That last project is exactly what I did during the Institute for Advanced Analytics’ busy interview season. I wrote the code in Python using PyPDF2, nltk and, if you wish to stem words, nltk’s PorterStemmer. I wouldn’t suggest using this program instead of reading the job description. However, it can help you confirm which words are repeated most and therefore should be reflected in your cover letter.
Below are the steps I took, followed by the code so you can run it and try it yourself.
- Import the job description
The first task in this project was to get the job description into Python. This is where the PyPDF2 package came into play. I opened the PDF in binary mode with Python’s built-in open function and stored the file object in a variable I creatively called “pdfFileObj.” I then passed that object to PyPDF2.PdfFileReader() to create a reader. With the reader saved, I looped through every page of the document and appended each page’s text to an initially empty string, compiling the entire document’s text.
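The sketch below covers the import step. It uses PdfReader, the reader class’s current name. Older PyPDF2 releases used PdfFileReader, which PyPDF2 3.x no longer accepts. The file name job_description.pdf and the helper names are hypothetical. The PyPDF2 import sits inside read_pdf_text so the page-joining logic can be tried without the package installed.

```python
def join_page_text(pages):
    """Combine the text of every page into one string."""
    # `or ""` guards against a page that yields no text. A newline between
    # pages keeps the last word of one page from fusing with the first word
    # of the next.
    return "\n".join(page.extract_text() or "" for page in pages)


def read_pdf_text(path):
    """Read every page of a PDF and return its combined text."""
    from PyPDF2 import PdfReader  # older PyPDF2 releases call this PdfFileReader

    # PDFs are binary files, so open in "rb" mode and keep the file open while
    # the pages are read.
    with open(path, "rb") as pdfFileObj:
        reader = PdfReader(pdfFileObj)
        return join_page_text(reader.pages)
```

Usage would be along the lines of `text = read_pdf_text("job_description.pdf")`. Joining with "".join or appending to a string in a loop, as the post describes, gives the same words. The newline separator is a small safeguard.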
- Tokenize the document
The document’s text may look a bit messy, but all that’s needed are the words and how often they occur. That’s where the nltk package makes life simple. Its word_tokenize function splits the entire job description into a list of tokens, which are mostly words plus punctuation marks. I then removed tokens that were single characters, numbers, or in a stopword list. This removed most, if not all, of the punctuation (single characters), the phone numbers (numbers), and the company names (stopwords). nltk provides a stopword list covering pronouns, conjunctions, and other words that appear in writing but carry little to no substance. To that list I added “data,” “analytics,” and the company’s name. Those words obviously occur frequently in the job descriptions that students in this program look at, but they are not useful for deciding what to put in a cover letter.
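The sketch below covers the tokenizing and filtering step. The three filters (single characters, numbers, stopwords) follow the description above. Two details are assumptions:

- Tokens are lowercased first so that “The” matches the lowercase stopword list.
- A token counts as a number when it contains digits but no letters, which is meant to catch phone numbers like 919-555-0100.

The company name “Acme Analytics” and the helper names are hypothetical. word_tokenize and the stopword list each need a one-time data download: `nltk.download("stopwords")`, plus the punkt tokenizer models, named "punkt_tab" in recent nltk releases. For that reason, the nltk imports sit inside the functions that use them.

```python
def is_number(token):
    """Treat tokens with digits but no letters (e.g. 2015, 919-555-0100) as numbers."""
    return any(ch.isdigit() for ch in token) and not any(ch.isalpha() for ch in token)


def filter_tokens(tokens, stop_words):
    """Lowercase tokens and drop single characters, numbers, and stopwords."""
    kept = []
    for token in tokens:
        word = token.lower()
        if len(word) <= 1 or is_number(word) or word in stop_words:
            continue
        kept.append(word)
    return kept


def build_stop_words(company_name):
    """nltk's English stopwords plus words that are frequent but uninformative here."""
    from nltk.corpus import stopwords  # requires nltk.download("stopwords")

    stop_words = set(stopwords.words("english"))
    stop_words.update({"data", "analytics"})
    # Add each word of the company name, since the tokenizer splits on spaces.
    stop_words.update(company_name.lower().split())
    return stop_words


def tokenize(text):
    """Split raw text into word and punctuation tokens with nltk."""
    from nltk.tokenize import word_tokenize  # requires the punkt tokenizer data

    return word_tokenize(text)
```

A typical call would be `words = filter_tokens(tokenize(text), build_stop_words("Acme Analytics"))`.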
- (Optional) Stem words
After removing stopwords, you can apply Porter stemming if you want to combine related forms of a word. For example, you may want to combine words like “visualize,” “visual,” and “visualization.” In that case, running nltk’s PorterStemmer on every word reduces each one to a common stem. In this example, all three words become “visual.” A stem is not always a real English word (“analytics” becomes “analyt”), but it can still tell you how relevant analytic knowledge is for the role.
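The sketch below shows the optional stemming step. PorterStemmer lives in nltk.stem, exposes a stem method, and needs no data download. The stemmer parameter is a hypothetical hook: any object with a stem method can be passed in instead.

```python
def stem_tokens(tokens, stemmer=None):
    """Reduce every token to its stem, using nltk's PorterStemmer by default."""
    if stemmer is None:
        from nltk.stem import PorterStemmer

        stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]
```

Per the post, `stem_tokens(["visualize", "visual", "visualization"])` should collapse all three words to “visual,” so their counts combine when frequencies are taken.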
- Find frequency of all words
Lastly, I built a frequency distribution of the words using nltk’s FreqDist and printed the 25 most frequent words along with their counts. This top-25 list does two things: (1) it shows which words to include in a resume or cover letter for a job you are interested in, and (2) it highlights topics to focus your learning on if you hope to one day land a job like the one described. If that job description happens to use words like “programming,” “python,” or “technology” most frequently, I’d suggest you start with a fun, small project and build up from there.
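The sketch below covers the final counting step. It prefers nltk’s FreqDist and falls back to collections.Counter, which FreqDist extends, so most_common behaves the same either way. The sample tokens and helper names are made up for illustration.

```python
try:
    from nltk import FreqDist
except ImportError:
    # nltk's FreqDist subclasses collections.Counter, so Counter can stand in here.
    from collections import Counter as FreqDist


def top_words(tokens, n=25):
    """Return the n most frequent tokens as (word, count) pairs."""
    return FreqDist(tokens).most_common(n)


def print_top_words(tokens, n=25):
    """Print a ranked list of the n most frequent tokens with their counts."""
    for rank, (word, count) in enumerate(top_words(tokens, n), start=1):
        print(f"{rank:>2}. {word}: {count}")


sample = ["python", "sql", "python", "tableau", "python", "sql"]  # hypothetical tokens
print_top_words(sample, n=3)
```

Passing the filtered (and optionally stemmed) words from the earlier steps in place of `sample` produces the top-25 list described above.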
Columnist: Zach Wasielewski