This project extracts mutual fund disclosures and analyzes statements of investment strategy and their attendant risks. The research team has developed text extraction code and is combining machine learning methods with legal subject-matter expertise to confirm compliance with SEC regulations, identify and aggregate mutual fund risks, analyze the tone and sentiment of strategy statements, and explore relationships between disclosure features and fund performance.
This project continues earlier Legal Analytics Lab work that used docket sheets to explore case pathways, with a particular focus on dispositive motions such as motions "to dismiss" or "for summary judgment." Working with a team of M.S. in Analytics and JD students, Alexander and Tucker are using text analytics and machine learning to gain further insight into the frequency and predictors of certain case pathways and outcomes.
This research explores how image analytics can improve firm operations, with the goal of higher efficiency, greater productivity, and better customer service. The study has investigated both traditional image analytics tools (e.g., image enhancement, segmentation, and object identification) and deep learning methods (e.g., convolutional neural networks, U-Nets, and autoencoders).
This project aims to discover the side effects of a single drug, or of a drug-drug interaction, from patient reports across different online health sources (e.g., FAERS, Healthboards). The project first develops medical-domain word embeddings to represent side effects and then applies a truth discovery approach to aggregate the side effects reported by all patients across sources.
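Truth discovery methods generally alternate between two steps: estimate the "true" answer for each claim by a weighted vote over sources, then re-weight each source by how often it agrees with those estimates. The sketch below is a generic weighted-voting version in the style of CRH (weights of the form -log of a source's normalized error); it is not the project's own method. The input format, in which each forum maps a side effect to 1 (reported as caused by the drug) or 0 (reported as not caused), and all forum and side-effect names are hypothetical.

```python
import math


def truth_discovery(reports, n_iter=10):
    """Generic iterative weighted-vote truth discovery (illustrative sketch).

    reports: {source: {side_effect: 1 if the source links it to the drug else 0}}
    Returns (truths, weights): truths maps side_effect -> 0/1,
    weights maps source -> reliability weight.
    """
    sources = list(reports)
    effects = sorted({e for claims in reports.values() for e in claims})
    weights = {s: 1.0 for s in sources}  # start by trusting every source equally
    truths = {}
    for _ in range(n_iter):
        # Step 1: estimate each side effect's label by a weighted vote
        for e in effects:
            score = sum(weights[s] * (1 if reports[s][e] else -1)
                        for s in sources if e in reports[s])
            truths[e] = 1 if score > 0 else 0
        # Step 2: count each source's disagreements with the current estimates
        # (a small constant keeps a perfect source's error away from zero)
        errors = {s: sum(reports[s][e] != truths[e] for e in reports[s]) + 1e-3
                  for s in sources}
        total = sum(errors.values())
        # Sources with a smaller share of the total error get larger weights
        weights = {s: -math.log(errors[s] / total) for s in sources}
    return truths, weights


# Hypothetical example: three forums reporting on one drug
reports = {
    "forumA": {"nausea": 1, "headache": 1, "rash": 0},
    "forumB": {"nausea": 1, "headache": 0, "rash": 0},
    "forumC": {"nausea": 1, "headache": 1, "rash": 1},
}
truths, weights = truth_discovery(reports)
print(truths)  # {'headache': 1, 'nausea': 1, 'rash': 0}
```

In this toy input, forumA agrees with the majority on every side effect, so it ends up with the largest weight. In real forum data, the absence of a mention is weaker evidence than an explicit report, so a binary 0/1 encoding like this one is a simplifying assumption.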
This study explores the content of online reviews. The growth of online shopping has made online reviews a critical source of information for consumers. However, a single product can have thousands of reviews. For example, Amazon's Echo Dot accumulated more than 100,000 reviews in its first two years. This volume makes it difficult to find useful and relevant information about other customers' post-purchase experiences. The researchers developed a methodology that produces a simple representation of the information revealed in reviews. Specifically, for each product, they extracted the relevant aspects of the product discussed in the reviews and developed a measure of each reviewer's satisfaction with those aspects. They applied this methodology to a large Amazon review dataset and showed that initial reviewers report a few salient aspects of the product and their experiences with those aspects, and that subsequent reviewers continue to report their experiences with the same aspects. They also found that satisfaction with these aspects differs sharply between favorable and less favorable reviews. Somewhat surprisingly, aspects that generate strong positive satisfaction in positive reviews receive only neutral or muted mentions in negative reviews. The results suggest simple strategies that platforms hosting reviews can use to provide relevant and useful information to customers.
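The comparison between favorable and less favorable reviews can be illustrated with a toy per-aspect satisfaction score. The sketch below assumes a hypothetical word lexicon, a hand-picked aspect list, and star-rating cut-offs (4 or more stars is positive, 2 or fewer is negative); it is not the researchers' aspect extraction or satisfaction measure, which are not specified here.

```python
from collections import defaultdict

# Hypothetical toy lexicons, for illustration only
POSITIVE = {"great", "clear", "easy", "love"}
NEGATIVE = {"poor", "hard", "broke", "hate"}


def aspect_satisfaction(reviews, aspects):
    """Mean lexicon score per aspect, split into positive and negative reviews.

    reviews: list of (text, star_rating) pairs
    aspects: aspect keywords to look for, e.g. ["sound", "setup"]
    """
    scores = {"positive": defaultdict(list), "negative": defaultdict(list)}
    for text, stars in reviews:
        if stars >= 4:
            group = "positive"
        elif stars <= 2:
            group = "negative"
        else:
            continue  # skip 3-star reviews in this sketch
        for sentence in text.lower().split("."):
            words = sentence.replace(",", " ").split()
            # Sentence score: positive-word count minus negative-word count
            score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
            for aspect in aspects:
                if aspect in words:
                    scores[group][aspect].append(score)
    # Average the sentence scores for each aspect within each group
    return {g: {a: sum(v) / len(v) for a, v in d.items()} for g, d in scores.items()}


reviews = [
    ("Sound is great. Setup was easy", 5),
    ("Setup was hard. Sound is clear", 2),
    ("Sound is poor", 1),
]
result = aspect_satisfaction(reviews, ["sound", "setup"])
print(result)
```

Here "sound" scores 1.0 in the positive review but averages 0.0 across the negative reviews, a toy analogue of an aspect that earns strong praise in favorable reviews and only a muted mention in unfavorable ones.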
This project examines the text of judges' decisions in employee misclassification cases, that is, lawsuits in which a worker's status as an independent contractor or an employee is in dispute, to understand how courts distinguish between the two categories. The law does not provide clear rules in these cases, so judges must apply a loose set of standards, producing written opinions that are highly unstructured. Using text mining and machine learning classification models, this project seeks to find patterns in judges' decision-making and bring more clarity to the state of the law in this area.
This project maps four types of network relationships among plaintiffs' lawyers who filed wage and hour lawsuits under the Fair Labor Standards Act (FLSA) in federal court over 17 years: overlapping college attendance, overlapping law school attendance, shared professional association memberships, and co-counseling linkages. The first three linkage types are hypothesized to underlie, and predict, the fourth: shared educational experience and affinity group membership may make co-counseling more likely. The project also explores whether co-counseling relationships, particularly those that cross jurisdictional lines, influence the number of cases filed. Framed in public health terms, attorneys from high-volume FLSA "hot spot" jurisdictions who join forces with lawyers practicing in other courts or states may act as vectors for the spread of FLSA litigation. The project uses an original data set of all federal FLSA cases filed between 2000 and 2016 to explore whether layered network relationships exist within the FLSA plaintiffs' bar and to investigate the extent to which these relationships acted as litigation drivers.
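The co-counseling layer and its relationship to the other layers can be sketched as a weighted network built from case records: two attorneys are linked once for every case on which they both appear. The record format, the attorney names, and the law-school and home-state attributes below are all hypothetical; the project's actual data set and network measures are not described here.

```python
from collections import Counter
from itertools import combinations


def co_counsel_edges(cases):
    """Count how many cases each pair of plaintiffs' attorneys co-filed."""
    edges = Counter()
    for case in cases:
        # sorted() gives every unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(set(case["attorneys"])), 2):
            edges[(a, b)] += 1
    return edges


def share_with_layer(edges, attribute):
    """Fraction of co-counsel pairs that also share an attribute (e.g., law school)."""
    if not edges:
        return 0.0
    shared = sum(1 for a, b in edges
                 if attribute.get(a) is not None and attribute.get(a) == attribute.get(b))
    return shared / len(edges)


def cross_border_pairs(edges, home_state):
    """Keep only co-counsel pairs whose attorneys are based in different states."""
    return {pair: n for pair, n in edges.items()
            if home_state[pair[0]] != home_state[pair[1]]}


# Hypothetical case records and attorney attributes
cases = [
    {"attorneys": ["Ann", "Bob"]},
    {"attorneys": ["Bob", "Ann", "Cal"]},
    {"attorneys": ["Cal"]},  # solo filing: no co-counsel edge
]
edges = co_counsel_edges(cases)
law_school = {"Ann": "X", "Bob": "X", "Cal": "Y"}
home_state = {"Ann": "NY", "Bob": "NY", "Cal": "FL"}
print(edges, share_with_layer(edges, law_school), cross_border_pairs(edges, home_state))
```

The share of linked pairs that also share a law school is only descriptive; testing whether the lower layers predict co-counseling would require comparing it with pairs of attorneys who never co-filed.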
One of Yichen Cheng's research areas is dimension reduction, which helps machine learning models fit better when there are many features. When the number of dimensions (features) is high, classical analytics methods may fail. It is therefore often important to reduce the dimensionality of the data before statistical or machine learning methods are applied. In this project, we developed a supervised dimension reduction method that works especially well for high-dimensional data. Applications include data visualization, feature engineering and selection for prediction, and inference.
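"Supervised" means the reduction uses the response variable, not just the features. As a simple stand-in, and explicitly not the method developed in this project, the sketch below uses marginal correlation screening: rank each feature by the absolute value of its Pearson correlation with the response and keep the top k. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_features(X, y, k):
    """Keep the k features most correlated (in absolute value) with the response y.

    X: list of rows (observations), each a list of feature values
    Returns (kept_feature_indices, reduced_X).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [abs(pearson(col, y)) for col in columns]
    keep = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
    keep.sort()  # preserve the original feature order in the reduced data
    reduced = [[row[j] for j in keep] for row in X]
    return keep, reduced


X = [[1, 5, 0], [2, 3, 1], [3, 4, 0], [4, 1, 1]]
y = [1, 2, 3, 4]
keep, reduced = screen_features(X, y, k=2)
print(keep, reduced)  # [0, 1] [[1, 5], [2, 3], [3, 4], [4, 1]]
```

Screening of this kind looks at one feature at a time, which keeps it cheap when there are far more features than observations, but it can miss features that matter only in combination. That limitation is one motivation for more sophisticated supervised dimension reduction methods.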
This project explores methods to simplify machine learning models, specifically deep neural networks. Simplification can make networks run faster, reduce their generalization error, and lower the memory required to store model parameters. The team is also working on making deep and convolutional neural networks interpretable, to provide a clearer understanding of how they work, explore hidden and semantic features, and improve their predictability. On the application side, the focus is mainly on business problems that involve image processing and time series analysis.
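One widely used way to simplify a trained network is magnitude pruning, which sets the smallest-magnitude weights to zero so that the remaining weights can be stored and computed sparsely. The sketch below applies this to a single weight matrix held as nested lists. It illustrates the general technique and is not necessarily one of the methods this project uses. The function name and the example weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value.

    weights: weight matrix as a list of rows
    sparsity: fraction of entries to prune, between 0 and 1
    Ties are broken by position, so exactly int(sparsity * n) entries are zeroed.
    """
    flat = [(abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)]
    n_prune = int(sparsity * len(flat))
    to_zero = {(i, j) for _, i, j in sorted(flat)[:n_prune]}
    return [[0.0 if (i, j) in to_zero else w for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


W = [[0.5, -0.1, 2.0],
     [-0.05, 1.5, 0.3]]
pruned = magnitude_prune(W, sparsity=0.5)
print(pruned)  # [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]]
```

In practice, pruning is usually followed by fine-tuning so the remaining weights can compensate, and the pruned matrix only saves memory once it is stored in a sparse format.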