Institute for Insight
In the Insight Lab we focus on data science and machine learning innovations with applications to business problems brought to us by strategic partners. These unique, collaborative engagements provide organizations with the opportunity to gain insight into a big data challenge.
Organizations with exploratory big data projects bring their staff and data to the institute’s big data lab, where they join institute faculty and students in a focused 3- to 4-week effort to understand what is in their data and what can be done with it.
Students work in teams on data sprints to find solutions to these real business problems involving data management and applications. The students tackle each project with a company staff member and devise possible solutions.
The objective is to see if specific questions can be answered using the data or if the data may be helpful in other ways.
Bring Us Your Problems
We welcome participation from professionals from the business community. To bring us your big data difficulties, contact Yusen Xia at email@example.com to explore how your company can partner with the institute and find solutions.
Dr. Yusen Xia
director, Insight Lab
Watch a video of an Insight Sprint in action, plus student and employer commentary.
Dell Sprint Project
MSDA students examined a number of components reflecting the diversity of the company’s employees at all levels of the organization. They then developed a dashboard with Dash and Plotly to interactively visualize different diversity and inclusion metrics.
Starr Sprint Project (Fall 2020)
MSDA students examined several business areas of the insurance industry and collected data such as premiums, incurred losses, loss ratios, experience modifications and exposures for multiple states. A Power BI dashboard was built to measure territory risk scores.
Truist (SunTrust) Sprint Project
MSDA students investigated the customer experience in the bank’s mortgage division in order to provide a more customized service to clients. Machine learning techniques such as random forest, SVM and XGBoost were used in the project.
Florida Center for Capital Representation (FCCR) Sprint Project
Working with a set of court documents filed in Florida state courts, the students extracted key textual information to study prosecutors’ exercise of discretion in seeking the death penalty. The students used a variety of natural language processing and machine learning techniques to classify the documents into relevant categories. They also built visualization dashboards and reporting mechanisms for the FCCR to use in analyzing future sets of court documents.
Starr Sprint Project (Spring 2020)
Students analyzed structured and unstructured data from different sources to predict the severity of Commercial Auto claims, enabling earlier detection of claim severity and more accurate severity predictions, which implies more investment capability. Text mining and machine learning methods such as term frequency, sentiment analysis, word2vec, logistic regression, XGBoost and random forest were used in this project.
This project uses machine learning methods to understand customer attrition. In particular, students investigated historical banking transactions to identify households that are likely to attrite, distinguish households that leave out of dissatisfaction from those lost to normal churn (evitable vs. inevitable), and identify issues to address based on the traits that distinguish these households. Various machine learning methods were implemented.
The project develops a scoring methodology to rate and rank brokers who have done business with the company. Students used unsupervised machine learning methods such as clustering and principal component analysis to identify weights of various factors that reflect broker performance and created a broker ranking system in Power BI.
This project explores blockchain technology. Students experimented with technologies such as Ethereum, Quorum and Hyperledger to develop a blockchain architecture to connect different parties of relevance to TSYS, perform associated analytics, set up relevant accounts in the blockchain and develop smart contracts to manage transactions. A prototype was built for deployment.
Students examined historical sales and pricing data and used machine learning to dynamically predict the optimal price for semi-commodity products. Machine learning techniques such as logistic regression, decision trees and random forest, along with deep learning methods such as LSTM, were used to predict sales.
Better Business Bureau (BBB)
This project gives BBB insight into the causative factors that might influence an individual’s or customer’s decision to do or maintain business with the organization, through data analysis, machine learning and predictive modeling. In particular, students used both sentiment analysis and topic modeling approaches on customer review and complaint texts to explore the behavior of businesses in different industries.
Students examined historical bank transactions to identify potential instances of money laundering. Different machine learning methods such as decision trees, random forest, support vector machines, logistic regression, and neural networks were implemented to support anti-money laundering (AML) efforts at SunTrust.
Students analyzed various data formats at Starr and tried to automate the data input process, especially for unstructured text and images. In addition, topic models such as LDA were used to summarize and classify topics in documents, and the sentiment of these documents was analyzed as well.
Metro Atlanta Chamber
Students analyzed unstructured data from different sources including Twitter, news media, Reddit, Facebook, and Google search trends. A system was built to systematically collect, clean, process, and analyze data that is relevant to the reputation of the city of Atlanta. Analyses included relevance filtering, topic modeling, and sentiment analyses. Machine learning algorithms were applied to improve the accuracy of classification and unsupervised learning. The results (refreshed periodically) are pushed to an online and interactive dashboard.
Barrett & Farahany Sprint Project
Students analyzed unstructured text data from legal documents and court records in order to classify lawsuit outcomes and develop a predictive model for forecasting the steps through which a lawsuit would progress and its conclusion. Methodologies used included topic modeling, Word2Vec, and various machine learning classification algorithms.
Georgia-Pacific challenged Robinson students to determine whether image recognition applied to images from its operations can detect fraud and monitor activity. Students matched same-day inbound/outbound truck images and explored the use of image data in logistics.
SunTrust Banks asked Robinson students to explore what customer behavior on its website leads to a sale and whether the bank can tailor individual interactions in real time. During the project, students measured the impact of “visitor engagement” that increased the probability that a customer would acquire a new product.
Robinson students engaged with WestRock to improve its plant operations through image analytics. Students took pictures of corrugated boxes on an assembly line, read the labels captured in the images, and gauged the descriptions' accuracy by comparing them to the physical products. Students also took product inventory through intensity differentiation of the images.
Working in Robinson’s big data lab, students used text mining to predict client attrition for SunTrust Banks. Investigating “unstructured” texts such as underwriter’s notes, client acquisition or risk review, and sales manager’s notes from servicing clients, students provided approaches for supporting SunTrust’s goal.
American Red Cross
To address the ongoing need for blood, students set out to determine whether the American Red Cross could identify donors likely to give repeatedly and donors likely to be high-value. Students analyzed demographic, geographic and behavioral profiles for donors and offered insights on drivers of donor loss and retention.
Students were challenged to use Starr Companies’ data on customer attributes in an existing line of business to determine what external data is useful and for what purpose in the property-casualty business. Using Robinson’s big data lab, students conducted data analysis and mining on the existing book of business to find correlations and patterns.