Institute for Insight
Research Projects
Mutual Fund Risk Disclosures
Professors Anne Tucker (College of Law and Institute for Insight) and Yusen Xia are working to extract mutual fund disclosures and analyze statements of investment strategy and the attendant risks. The research team has developed text extraction code and is combining machine learning methodologies with legal subject-matter expertise to confirm compliance with SEC regulations, identify and aggregate mutual fund risks, analyze the tone and sentiment of strategy statements, and explore relationships between mutual fund disclosure features and fund performance.
Litigation
Professors Charlotte Alexander (Robinson Risk Management and Institute for Insight) and Anne Tucker (Associate Professor of Law) are continuing earlier Legal Analytics Lab work using docket sheets to explore case pathways, with a particular focus on dispositive motions such as motions to dismiss or for summary judgment. Working with a team of MSA and JD students, Professors Alexander and Tucker are leveraging text analytics and machine learning to gain further insight into the frequency and predictors of certain case pathways and outcomes.
Image Analytics to Improve Firm Operations
In this research, Professor Yusen Xia and colleagues have been exploring how to use image analytics to improve firm operations to achieve higher efficiency, greater productivity, and better customer service. Both traditional image analytics tools (e.g., image enhancement, segmentation, and object identification) and deep learning methods (e.g., convolutional neural networks, U-Nets, autoencoders) have been investigated in this study.
Drug Side-effect Discovery
Dr. Houping Xiao is working to discover side effects of a single drug or of a drug-drug interaction from patients’ posts across different online health forums (e.g., FAERS and Healthboards). The project first develops medical-domain word embeddings to represent side effects and then applies a truth discovery approach, aggregating the side effects reported by all patients across the different forums in an unsupervised manner while accounting for the heterogeneity of both patients and forums.
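To make the truth discovery idea concrete, here is a minimal sketch of a generic iterative truth discovery loop. It is not the project's actual method: it omits the word embeddings entirely, and the function name truth_discovery, the source names, and the toy claims are all hypothetical. The loop alternates between a reliability-weighted vote on each claim and a re-estimate of each source's reliability from how often it disagrees with the current consensus.

```python
import math
from collections import defaultdict

def truth_discovery(claims, iterations=10, eps=1e-6):
    """Generic truth discovery over (source, item, value) triples.

    Alternates between (1) a reliability-weighted vote per item and
    (2) re-weighting each source by its disagreement with the vote.
    Returns (truths, weights). Illustrative only.
    """
    sources = sorted({s for s, _, _ in claims})
    weights = {s: 1.0 for s in sources}  # start with equal trust
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote for the value of each item.
        votes = defaultdict(lambda: defaultdict(float))
        for s, item, value in claims:
            votes[item][value] += weights[s]
        truths = {item: max(vals, key=vals.get) for item, vals in votes.items()}

        # Step 2: error rate of each source against the current truths.
        errors = defaultdict(float)
        counts = defaultdict(int)
        for s, item, value in claims:
            counts[s] += 1
            if truths[item] != value:
                errors[s] += 1.0
        rates = {s: errors[s] / counts[s] for s in sources}
        total = sum(rates.values())

        # Sources with a smaller share of the total error get larger weights.
        # eps keeps the logarithm finite when a source makes no errors.
        weights = {
            s: -math.log((rates[s] + eps) / (total + eps * len(sources)))
            for s in sources
        }
    return truths, weights

# Hypothetical toy data: three posters report whether a drug causes a symptom.
claims = [
    ("poster_1", "drugA:nausea", True),
    ("poster_2", "drugA:nausea", True),
    ("poster_3", "drugA:nausea", False),
    ("poster_1", "drugA:rash", False),
    ("poster_2", "drugA:rash", False),
    ("poster_3", "drugA:rash", True),
    ("poster_1", "drugB:headache", True),
    ("poster_2", "drugB:headache", True),
    ("poster_3", "drugB:headache", True),
]
truths, weights = truth_discovery(claims)
```

In this toy run, poster_3 disagrees with the consensus on two of three items and so ends up with a lower weight than the other two posters. No labeled ground truth is used anywhere, which is what makes the procedure unsupervised.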
Understanding the Content of Online Reviews
Professors Cheng and Zhao, and colleagues, explore the content of online reviews. The growth of online shopping has made online reviews a critical source of information for consumers. However, there can be thousands of reviews for a single product. For example, Amazon’s Echo Dot had over 100,000 reviews in its first two years. The volume of reviews makes it difficult to search for useful and relevant information from the post-purchase experiences of others. The researchers develop a methodology that leads to a simple representation of the information revealed in reviews. Specifically, for each product, they extract the relevant aspects of the product that are discussed in the reviews, and develop a measure of each reviewer’s satisfaction with each of these aspects. The resulting representation has two parts: the salient aspects that are discovered, and the extent of satisfaction of different reviewers with each of these aspects. They apply this methodology to a large review dataset from Amazon and show that initial reviewers report a few salient aspects of the product and their experiences with those aspects. Subsequent reviewers continue to report their experiences with these aspects. They find that user satisfaction with these aspects is very different when comparing favorable reviews to less favorable ones. Somewhat surprisingly, aspects that generate strong positive satisfaction in positive reviews receive a neutral or muted mention in negative reviews. Their results suggest simple strategies for platforms hosting reviews to easily provide relevant and useful information to customers.
A Study of Ten Years of Employee Misclassification Decisions
Led by Charlotte Alexander and Javad Feizollahi
This project examines the text of judges’ decisions in employee misclassification cases — or lawsuits where a worker’s status as an independent contractor or employee is in dispute — to understand how courts distinguish between the two categories. The law does not provide clear rules in these cases, so judges are called upon to apply a loose set of standards, producing written opinions that are highly unstructured. Using text mining and machine learning classification models, this project seeks to find patterns in judges’ decision-making and provide more clarity on the state of the law in this area.
Plaintiffs’ Attorney Networks as Litigation Drivers
Led by Charlotte Alexander
This project maps four types of network relationships among plaintiffs’ lawyers who filed wage and hour lawsuits under the Fair Labor Standards Act (FLSA) in federal court over seventeen years: overlapping college attendance, law school attendance, shared professional association memberships, and co-counseling linkages. The first three linkage types are hypothesized to layer underneath, and predict, the fourth: shared educational experience and affinity group membership may make co-counseling more likely. Further, the project explores whether co-counseling relationships, particularly those across borders, influence case-filing numbers. To adopt a public health frame, attorneys from high-volume FLSA “hot spot” jurisdictions who join forces with lawyers who practice in other courts or states may act as vectors for the spread of FLSA litigation. This project uses an original data set of all federal FLSA cases filed between 2000 and 2016 to explore the existence of layered network relationships within the FLSA plaintiffs’ bar, and to investigate the extent to which these network relationships acted as litigation drivers.
Dimension Reduction for Feature Engineering
One research area of Prof. Yichen Cheng is dimension reduction, which helps machine learning models fit better when there are many features. When the number of dimensions/features is high, classical analytics methods may fail. It is therefore important to reduce the data dimensionality before statistical or machine learning methods are applied. In this project, we developed a supervised dimension reduction method that works especially well for high-dimensional data. Applications include data visualization and feature engineering/selection for prediction and inference.
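As a rough illustration of what "supervised" means here, the sketch below reduces dimensionality by using the response to decide which features to keep. It is a deliberately simple stand-in (marginal correlation screening), not the method developed in the project; the function names pearson and screen_features and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def screen_features(rows, y, keep):
    """Keep the `keep` features most correlated (in absolute value) with y.

    Returns (selected column indices, reduced data matrix).
    """
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    selected = sorted(ranked[:keep])
    reduced = [[r[j] for j in selected] for r in rows]
    return selected, reduced

# Hypothetical toy data: the response tracks feature 0 closely, feature 2
# loosely, feature 1 barely, and feature 3 is constant.
rows = [
    [1.0, 5.0, 2.0, 7.0],
    [2.0, 3.0, 1.0, 7.0],
    [3.0, 6.0, 4.0, 7.0],
    [4.0, 2.0, 3.0, 7.0],
    [5.0, 4.0, 6.0, 7.0],
    [6.0, 5.0, 5.0, 7.0],
]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
selected, reduced = screen_features(rows, y, keep=2)
```

Unlike an unsupervised reduction such as principal component analysis, the choice of what to keep depends on y, so features that vary a lot but say nothing about the response are discarded. Screening only selects existing columns; more general supervised methods build new low-dimensional combinations of the features.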
Simplification and Interpretability of Machine Learning Modules
Prof. Aghasi and colleagues are working on methods to simplify machine learning modules, specifically deep neural networks. This process makes the networks operate faster, reduces the generalization error, and reduces the memory required to store the model parameters. His team is also working on making deep and convolutional neural networks interpretable, to provide a clearer understanding of how they work, explore hidden and semantic features, and improve their predictability. From an application perspective, his focus is mainly on business problems that involve image processing tasks and time series analysis.
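One common way to simplify a network is to remove weights that contribute little. The sketch below shows plain magnitude pruning on a single weight matrix as an illustration of the general idea; it is not the team's method, and the function name magnitude_prune and the example weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction `sparsity` of weights.

    `weights` is a list of rows (a dense layer's weight matrix).
    Returns a new matrix; the input is left unchanged.
    Ties at the threshold are pruned too, so slightly more than the
    requested fraction may be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

# Hypothetical 2x3 weight matrix, pruned to 50% sparsity.
layer = [
    [0.5, -0.05, 0.3],
    [0.01, -0.8, 0.2],
]
pruned = magnitude_prune(layer, sparsity=0.5)
zero_count = sum(1 for row in pruned for w in row if w == 0.0)
```

A pruned matrix can be stored in a sparse format and multiplied with fewer operations, which is where the speed and memory savings come from. In practice the network is usually fine-tuned after pruning to recover any lost accuracy.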
