
WARNING:
Changes to this schedule may be made during the course of the semester.

See below:
- Sign up for a showcase topic that interests you by selecting ONE (and only one) option on the
Doodle registration site.
Four students will be assigned to each showcase. However, you sign up individually (you don't need to find a group first; groups will be formed as students sign up on Doodle).
- Work together with the group of students assigned to the same topic
to identify a real-world application of the data mining topic
you are assigned to.
- Discuss your chosen data mining application with the professor
at least 2 weeks before the presentation.
You need to get the professor's approval of your selected application
before you start preparing your presentation.
- The team should investigate the application in depth and
prepare and deliver a 10-minute in-class presentation describing this application in as much detail as possible, focusing on its data mining aspects.
- Your presentation should contain the following sections:
- A cover page with the following title and subtitle,
replacing the parts in angle brackets with the information for your particular showcase:
CS548 Fall 2016 <Data Mining Technique> Showcase by
< students' names >
Showcasing work by < application authors or company > on
<"Title or name of the application you are showcasing" >
- A list of references and resources that you used for your presentation.
This should be included right after the cover page.
If you used articles and research papers, include the full reference,
not just a link to the articles.
For this, follow the formatting rules of the
IEEE citation style.
Use this style to reference books, journal articles, conference articles, online references, and other published or unpublished work.
The richer your set of references, the better.
- A detailed description of the application.
- Email the following materials to the professor
at least 48 hours before your class presentation.
- Your presentation slides.
Please name your presentation slides as follows:
CS548F16_Showcase_<Data Mining Technique>.<file extension>
If at all possible, please send us the slides in an editable format (e.g., pptx) so that we can make small edits if needed.
- A short description of your application (3-4 sentences) to be included
in this webpage under "Short Description" in your showcase entry below.
- Rehearse your oral presentation to make sure it is polished,
transitions between speakers work well, and the full presentation
stays within the time allowed (10 minutes).
-
Sept. 22: Decision Trees
- Students:
Muyeedul Hoque, Chao Xu, Yue Zhao, and Kevin Heath
- Application Topic/Title:
"Classification of Acoustic Emission Signals Using Wavelets and Random Forests: Application to localized corrosion"
- Short description:
Acoustic emission (AE) refers to the transient elastic sound waves produced by the release of localized stress energy. AE signal detection is applied to identify and classify different types of corrosion. To classify the signals, the authors used sophisticated pre-processing techniques and applied the supervised learning method Random Forests. The results are consistently better than those of the K-Nearest Neighbor (KNN) method.
- Slides:
CS548F16_Showcase_Decision_Trees
-
Sept. 29: Model and Regression Trees
- Students:
Tom Hartvigsen, Mu Niu, Qingyun Ren, and Allison Rozet
- Application Topic/Title:
"Development of a maximum likelihood regression tree-based model for predicting subway incident delay"
- Short description:
Unexpected delays due to subway incidents negatively impact commuter
opinion of public transportation and cost millions of dollars in fines
in Hong Kong. This paper describes the process of constructing a maximum
likelihood regression tree (MLRT) and analyzes a model built to predict
subway incident delays in Hong Kong's Mass Transit Railway. Performance
evaluation experiments confirm that the MLRT model outperforms traditional
accelerated failure time (AFT) models.
- Slides:
CS548F16_Showcase_Model_and_Regression_Trees
-
Oct. 6: Association Rules
- Students:
Deepan Sanghavi, Dhaval Dholakia, Peter Wang and Karan Somaiah Napanda
- Application Topic/Title:
Incorporating Both Positive and Negative Association Rules into the Analysis
of Outbound Tourism in Hong Kong
- Short description:
Association rule mining is a data mining technique that generates rules
describing relationships among data attributes.
The showcased paper proposes incorporating both negative
and positive association rules into the analysis of outbound tourism
in Hong Kong. The dataset was collected through a survey conducted throughout
Hong Kong, and association rule mining was applied to it. Along with
positive associations, the analysis uncovered negative associations,
which illustrate what kinds of travel packages would not interest a
certain group of people. The results show a larger number of negative
rules than positive rules, thus helping travel and tourism companies
make managerial marketing decisions.
- Slides:
CS548F16_Showcase_Association_Rules
-
Oct. 27: Clustering
- Students:
Theresa Inzerillo, Xi Liu, Preston Mueller
- Application Topic/Title:
Understanding Bike-Sharing Systems using Data Mining: Exploring Activity Patterns
- Short description:
Researchers studying the bike-sharing system of Vienna, Austria, found that stations could be grouped into clusters that reflected the usage of those stations based on time of day. Their clustering efforts involved three cluster detection algorithms: k-means, expectation-maximization, and sequential information bottleneck. They used the Dunn index, silhouette index, and Davies-Bouldin index to validate their results.
- Slides:
CS548F16_Showcase_Clustering
-
Nov. 10: Anomaly Detection
- Students:
Nichole Etienne, Rohitpal Singh, Suchithra Balakrishnan, Yousef Fadila
- Application Topic/Title:
"Catch me if you can"
- Short description:
Passengers in public transit systems are frequently subjected to
pickpocketing. This paper concentrates on creating a system
that distinguishes thieves (anomalies) from regular passengers
based on their traveling behavior. The authors use a one-class SVM
method within a two-step framework of Regular Passenger
Filtering and Suspect Detection.
- Slides:
CS548F16_Showcase_Anomaly_Detection
-
Nov. 15: Text Mining
- Students:
Brendan Foley, Francisco Guerrero, Dennis Silva, ML Tlachac
- Application Topic/Title:
"Coronary Artery Disease Risk Assessment from Unstructured Electronic Health Records Using Text Mining"
- Short description:
Retrieving relevant data from unstructured Electronic
Health Records is a major problem in the health industry.
The Framingham Risk Score is a predictor of Coronary Heart Disease
based on age, cholesterol levels, and other factors.
This paper focuses on
using text mining techniques to retrieve information
on these relevant factors from Electronic Health Records to
better predict Coronary Heart Disease.
- Slides:
CS548F16_Showcase_Text_Mining
-
Dec. 1: Sequence Mining
- Students:
Bian Du, Wa Gao, and Cam Jones
- Application Topic/Title:
"A dynamic understanding of customer behavior processes based on clustering and sequence mining"
- Short description:
This paper analyzes consumer behavior data
consisting of transactions, where a transaction contains information
such as the number of items purchased and how far in advance they were purchased.
The authors discuss their approach to grouping customer behaviors
related to ticket purchases in the Netherlands in order to apply sequence mining
and clustering techniques. The paper uses the Generalized Sequential
Pattern (GSP) algorithm. This results in a twofold analysis of the data,
producing human-readable trajectories as well as a statistical report
on the trends in the data.
- Slides:
CS548F16_Showcase_Sequence_Mining
-
Dec. 6: Web Mining
- Students:
Abdulaziz Alajaji; Yupeng Su; Ye Wang; Tingwen Zhou
- Application Topic/Title:
"Web mining for navigation problem detection and diagnosis in Discapnet: A website aimed at disabled people"
- Short description:
This study proposes a system based on web mining techniques that
collects in-use information while the user is accessing the web. The
proposed system models users in their real environment and discovers navigation problems
appearing in the Discapnet website. The system can also be used for problem detection
when new users are navigating the site. Thus, this system can
help web designers modify the structure of the website.
- Slides:
CS548F16_Showcase_Web_Mining