Analyzing customer spending data using Unsupervised Learning techniques for discovering internal structure, patterns and knowledge.
Python, Scikit-learn, PCA, Clustering
Cross-language information retrieval (CLIR) system that, given a query in German, searches English text documents using Natural Language Processing.
Python, NLP, IR, Machine Translation, Language Models
An AdaBoost classifier to predict whether an individual earns more than $50,000 and identify likely donors for a non-profit organisation.
Python, Scikit-learn, Decision Trees, SVM, AdaBoost
Statistical and spatial data analysis, including visualizations, for the walkability of Melbourne suburbs. Capstone project for The University of Melbourne.
Python, Pandas, Visualizations
Python, Pandas, Seaborn
Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
Python, Decision Tree, Regression, Model Complexity Analysis
A model to predict the value of a given house in the Boston real estate market using various statistical analysis tools. Uses machine learning to identify the best price at which a client can sell their house.
R, Descriptive Statistics, ggplot, dplyr
Analysis of the BRFSS-2013 data set using R, focusing on investigating the relationship between education and eating habits, sleep and mental health, and smoking, drinking and general health of a person.
Python, Keras, Deep Learning, CNN, Computer Vision
Designing and implementing a Convolutional Neural Network that learns to recognize sequences of digits using synthetic data generated by concatenating images from MNIST.
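The synthetic-data step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `make_sequence` is hypothetical, and random arrays stand in for the MNIST images so the snippet runs without downloading the dataset.

```python
import numpy as np

def make_sequence(images, labels, length, rng):
    """Build one synthetic sample by placing `length` digit images side by side.

    images: array of shape (N, H, W); labels: array of shape (N,).
    Returns an (H, W * length) image and the `length` labels in order.
    """
    idx = rng.integers(0, len(images), size=length)
    # Concatenate along the width axis so the digits read left to right.
    sequence_image = np.concatenate(list(images[idx]), axis=1)
    return sequence_image, labels[idx]

# Stand-in data (assumption): 10 random 28x28 "digits" instead of real MNIST.
rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))
labels = np.arange(10)

seq_image, seq_labels = make_sequence(images, labels, length=3, rng=rng)
```

With real MNIST arrays in place of the stand-in data, each call yields one multi-digit training image together with its label sequence.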
Hypothesis testing, R, ggplot, dplyr
Analysis of the GSS (General Social Survey) dataset using R to infer whether, in 2012, men aged 18 or above in the United States were more likely than women to oppose sex education in public schools.
R, Exploratory Data Analysis, ggplot, dplyr
Exploration of baseball data for the year 2001 using R to look at replacements for key players lost by the Oakland A's in 2001.
Python 2, Scikit-learn, NLP
3-way polarity (positive, negative, neutral) classification system for tweets, without using NLTK's sentiment analysis engine.
Python, Pandas, Seaborn, Financial Analysis
Analysis of technology stocks, including change in price over time, daily returns, and stock behaviour prediction.
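As an illustration of the daily-returns part of this analysis, a minimal sketch with Pandas might look like the following. The helper name `daily_returns` and the three sample prices are made up for the example and do not come from the project.

```python
import pandas as pd

def daily_returns(close):
    """Percentage change between consecutive closing prices."""
    # The first row has no previous day to compare against, so drop it.
    return close.pct_change().dropna()

# Hypothetical closing prices for three consecutive days.
prices = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.date_range("2020-01-01", periods=3),
)
returns = daily_returns(prices)
```

Here the series rises 2% on the second day and falls 2% on the third; the resulting series can then be plotted, for example as a histogram with Seaborn.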