Stumbling into Cultural Analytics

Nov 09, 2016

Data-driven methods applied to the experience of intro CS at universities.

Extreme Gradient Boosting Trees Feature Importance

I just realized that, in my mind, I have finally finished my doctoral project, even though I submitted and graduated months ago. My goal was to investigate all the issues that surround the decision of historically underrepresented groups to choose Computer Science. This was my own genuine question. No one gave it to me. It arose out of my own experience as an intelligent systems researcher. I often advise people to follow their curiosity, to lean into the itch that won’t be eased. I did just that. Scratching that itch took me to UC Berkeley, led me to collaborate with some of the best in the field, and led me to Google.

The best thing that came out of scratching that itch, though, is my discovery that I am a data scientist. At first, I thought I wanted to build a computational model for understanding conceptual metaphors in rap lyrics. I started down this road. I became a graduate student researcher at ICSI (the International Computer Science Institute) and joined the metaphor team of Srini Narayanan. Gaining an understanding of how conceptual metaphors work has been invaluable, but I still didn’t see how that would help answer my question. What did happen was the creation of a rap-inspired data-science unit within BJC: The Beauty and Joy of Computing, and the question “rap as a computational object?” This led me to read Seymour Papert’s “Mindstorms” over and over again until I got clarity around the use of computational objects.

Rap wasn’t a computational object per se, but the understanding of conceptual metaphors in rap lyrics was a computational object TO THINK WITH. Since then the idea of “comp-obj to think with” has shaped me. For Papert, gears were his first comp-obj to think with, then came the Logo turtle. The rap unit led me down a path of having to assess its impact on students’ experience of intro CS. At first, I bucked against this. I felt that if you had to produce a “p-value” to prove the efficacy of your approach, then your work was BS. I grudgingly accepted that I had to do this to get signatures and graduate. Like a spoilt teenager, I sucked my teeth and rolled my eyes. And did it, I did. Graduated I did. But something rather magical happened after graduation.

With the freedom that came from having my own intellectual runway back, I took flight. I, who had sucked my teeth at “p-values,” was now in search of more rigorous methods to understand social science data. All of a sudden I realized that my methods from intelligent systems and social science data analysis were one continuum. You start off at elementary statistics and linear algebra, move on to SVMs, and eventually land at deep learning. I had known all of these things discretely. They weren’t connected and weren’t actually applied to things I cared about.

Once I formulated my own research question, inspired by my own lived experience, the lightbulb turned on. That was what was missing. You don’t really grok a thing until it can be applied to something you care about. You need that motivation. Now I have finally answered the last question of my doctoral inquiry: “Identify Factors that Predict Intro CS Experience Based on Gender.”

Before the year is out, I will unpack this project in a series of Medium posts and share the techniques I used to answer this question.