Learning and Coding


Started April 13th

Non-technical details to know as a new data scientist

Beyond learning the skills and toolkits needed to find patterns in data, it’s also important to learn, and remember, how to think like a data scientist. These are a few of the non-technical lessons I’ve found relevant to my own development.


Training Speech Recognition



Many people have difficulty with spoken words, whether they are hard of hearing or just checking voicemail in a noisy place. Speech recognition is a powerful modern tool: a computer is trained to understand audio files and print the words, so that the information is accessible in a different format. Using technology to improve accessibility is a passion of mine, so I decided to develop my own algorithm for speech recognition. My goal was not to create a better tool than what’s available on the market today, but to build an understanding of the process involved. Being limited to my own computational resources meant I simply didn’t have the computing power to develop a highly accurate model.


Python Script on a Website

The first step is, of course, setting up a website that can run the script. PythonAnywhere was recommended to me, and it is a good place to start, since the free version works well for running basic code. I got my introduction from one of their blog posts, which gives a step-by-step guide to writing an initial webpage. This blog post is intended instead to describe a few of the basic functions, and how to modify them to implement specific tasks.


Understanding Results from Machine Learning

Machine learning is an incredibly powerful tool, but it’s important to remember that computers will never understand the data they look at. It’s up to the humans working with the data to examine the results and work out what information can be gleaned from them.


Explaining QQ Plots

To start with, QQ stands for quantile-quantile: a QQ plot is a comparison of quantiles between two populations. A quantile is a value below which a given fraction of the data points fall. To simplify the discussion, I will be using percentiles, which divide a data set into 100 equally sized groups. The nth percentile is the value below which n percent of the data points fall.
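
As a rough illustration of the percentile idea, here is a minimal Python sketch using only the standard library. The function names `percentile_rank` and `qq_pairs` and the two random samples are made up for this example; they are not from any particular library. The sketch computes what percent of a data set falls below a value, then pairs up the percentile cut points of two samples, which are the points a QQ plot would draw.

```python
import random
import statistics


def percentile_rank(data, value):
    """Return the percent of data points that fall strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100.0 * below / len(data)


def qq_pairs(sample_a, sample_b):
    """Pair the 99 percentile cut points of two samples.

    statistics.quantiles(..., n=100) returns the 99 values that split
    the data into 100 equally sized groups.
    """
    cuts_a = statistics.quantiles(sample_a, n=100)
    cuts_b = statistics.quantiles(sample_b, n=100)
    return list(zip(cuts_a, cuts_b))


# Two hypothetical samples drawn from the same normal distribution.
rng = random.Random(0)
sample_a = [rng.gauss(0, 1) for _ in range(1000)]
sample_b = [rng.gauss(0, 1) for _ in range(1000)]

# Each pair is one point on a QQ plot: (percentile of A, percentile of B).
pairs = qq_pairs(sample_a, sample_b)
```

If the two samples come from similar distributions, the paired values should sit close to the line y = x when plotted against each other.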