Receipt.ID
A multi-label, multi-class, hierarchical classification system implemented as a two-layer feed-forward network. It trains individual Random Forest text-based classifiers and combines their output with other features. Receipt.ID is built to scale with an application as the taxonomy of the domain in which it is applied grows.
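
One consequence of a hierarchical, multi-label scheme is that a predicted label usually implies its ancestors in the taxonomy. The sketch below illustrates that idea only; it is not the Receipt.ID implementation, and the function name `with_ancestors`, the `parent_of` mapping, and the category names are all hypothetical.

```python
def with_ancestors(labels, parent_of):
    """Expand a set of predicted labels with every ancestor in the taxonomy.

    parent_of maps each label to its parent (None for a root). Both the
    mapping and the labels are hypothetical examples.
    """
    expanded = set()
    for label in labels:
        node = label
        # Walk up the tree until we reach a root or a node already visited.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent_of.get(node)
    return expanded


# Hypothetical taxonomy: child -> parent.
parent_of = {
    "groceries": None,
    "produce": "groceries",
    "apples": "produce",
    "dairy": "groceries",
}

result = with_ancestors({"apples", "dairy"}, parent_of)
```

Because the taxonomy is just a mapping, new categories can be added without changing the expansion logic.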


Model the Dynamics of Gender in Intro CS
A supervised learning project using eXtreme Gradient Boosting Trees. This project creates a predictive model for understanding the dynamics of gender in intro CS at Berkeley for the years 2014 through 2015. This work builds on previous research done in fulfillment of a Computer Science Education Ph.D., HipHopathy, A Socio-Curricular Study of Introductory Computer Science.


Investigating Why Underrepresented Students Choose CS
This work presents a data-driven approach to examining the socio-curricular factors that lead to historically underrepresented students’ retention and attrition in introductory Computer Science at UC Berkeley.


Hiphopathy
A Socio-Curricular Study of Introductory CS Anchored with Data Science Using Rap Lyrics. The goal of this work is to connect cultural relevance to computing by introducing elementary techniques of natural language processing with a corpus of hip-hop data. This curricular unit was implemented on edX MOOClet, BJC.3x: Data, Information and the Internet. The course launched in 2015 with over 20,000 students.
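
As an illustration of what an elementary natural language processing technique on lyrics might look like, the sketch below counts word frequencies in a short text. It is not taken from the course materials; the function name `word_frequencies` and the sample line are hypothetical.

```python
from collections import Counter
import re


def word_frequencies(lyrics: str) -> Counter:
    """Count lowercase word tokens (letters and apostrophes) in a text."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(tokens)


# Hypothetical sample line, not drawn from the course corpus.
sample = "The beat goes on and the beat goes strong"
freqs = word_frequencies(sample)

# The most frequent tokens, as (word, count) pairs.
top_words = freqs.most_common(3)
```

The same counts can feed simple follow-on exercises, such as comparing vocabulary across artists or songs.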


The Semantic Web Expert System Shell
An intelligent system platform capable of reasoning from multiple ontologies using Resource Description Framework (RDF), Rule Markup Language (RuleML), as well as other knowledge expressed as functional, structural, or causal models.


Identify Fraud from Enron Email
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting Federal investigation, a significant amount of typically confidential information entered into the public record, including tens of thousands of emails and detailed financial data for top executives. This project identifies systemic fraud at Enron by building a person of interest identifier based on financial and email data made public as a result of the Enron scandal.


A Student Intervention System
A supervised learning project for student intervention, built on a highly unbalanced dataset. The goal of this project is to identify students who might need early intervention.
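
With a highly unbalanced dataset, plain accuracy can look good even when a model never flags the rare class. The sketch below shows why a metric such as F1 is often preferred in that setting. It is an illustration under assumed data, not the project's evaluation code; the helper `precision_recall_f1` and the label lists are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = needs intervention (rare), 0 = does not.
y_true = [0] * 9 + [1]
always_zero = [0] * 10  # a model that never flags anyone

accuracy = sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true)
_, _, f1 = precision_recall_f1(y_true, always_zero)
```

Here the never-flag model is right on nine of ten students, yet its F1 for the intervention class is zero because it finds none of the students who need help.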


Predicting Boston Housing Prices
A prediction model that uses a decision tree to estimate the optimal price for a house, based on historical housing data from Boston.