Non-technical details to know as a new data scientist

Posted by Alex Billinger on September 4, 2020

Beyond learning the skills and toolkits necessary to find patterns in data, it’s also important to learn, and remember, how to think like a data scientist. Here are a few of the non-technical lessons I’ve learned that have been relevant to my development.

Low accuracy results are results

It’s easy to feel like the accuracy of a model’s results is the most important part of a project, and that if no model turns out to be accurate, the project is a failure. However, it’s important to remember that even imperfect results are still results, and that something new has been learned. It can be frustrating to acknowledge that the thing we learned was “there’s not enough of a connection to build on” or “we don’t have the data or resources necessary to use this.” That still gives us important information for moving forward: whether it tells us to go back and collect more data or that our current path is a dead end, we know more than we did before and have a more concrete direction for our next steps.

Computers are very literal

One of the most common problems I run into when my code doesn’t work is that, when I go back through it, I discover the computer is, in fact, doing exactly what I told it to do. This isn’t a “new to programming” thing; even seasoned coders run into it often! It’s very easy to feel like we know exactly what we meant to tell the computer to do, only for it to do something different: the computer understands commands in the most literal way possible. Hearing other programmers ask “why isn’t this code working right?” and then follow up with “oh, because it’s doing exactly what I told it to do” is common, and it’s an important thing for people just starting out to remember. People often talk about “rubber duck” debugging, where explaining your code line by line as you would to someone unfamiliar with it (even someone who doesn’t understand anything about programming!) helps you find the bit of logic you stated slightly wrong or didn’t specify clearly enough. While this might happen less often as coders gain experience, it isn’t a sign of a “bad” coder.

Data isn’t infallible

Regardless of how it’s stored or processed, data is never infallible. It always originates from human ideas and is subject to human bias. For instance, how a test group is gathered can be influenced by the kinds of people involved in a study, or even by the wording on a recruitment poster. As another example, everyone knows the phrase “correlation does not imply causation,” but it can be hard to remember when the data shows the correlation we expected from a study. Our own internal biases guide us toward the correlations we most want to find and can make it hard to let go of a perceived causation. This ties back to the first point about low accuracy results: sometimes the correlation we expect just isn’t there, and we need to move on. How data is used is also subject to human choice and bias. For example, data gathered about consumers, though intended to connect them with advertisements that would benefit them, may make those consumers feel like they have no privacy. If we view data only as impartial numbers, it’s easy to gloss over how it is collected or used.

Looking things up does not mean you don’t know what you’re doing

Shortly before starting the course, I wrote some code for practice and felt like I had needed to look up so many things. When showing the code (and all the open tabs) to another coder, I commented that I needed all those tabs because I was new and had to look everything up. His response was along the lines that this never changes: any day he didn’t end up with Stack Overflow or similar tabs open was a day not spent programming. Coding isn’t about memorizing every command, but about learning how a computer processes information and what it’s capable of. It’s also about learning which word or phrase to search for to find what you need. Never be afraid to look things up. It doesn’t mean you don’t understand what you’re doing; it means you’re learning something new and building a better understanding of your skills.