
WARNING:
Changes to this schedule may be made during the course of the semester.

See below:
- Sign up for a showcase topic of your interest by selecting ONE (and only one) option using the
doodle registration site.
There will be 5 students in each showcase. However, signing up for a showcase is individual: you don't need to find a group first. Groups will be formed as students sign up using the doodle, and groups can have a mix of students from CS548 and BCB503.
- Work together with the group of students assigned to the same topic
to identify a real-world application of the data mining topic
you are assigned to. These are the requirements for the paper:
- The paper should be about a
real-world, successful application of the data mining topic.
This successful data mining story should be about using the
corresponding data mining technique to
discover novel and useful patterns that made a difference in a certain application domain.
- That is,
the paper should be about a well-defined, real-world application domain problem,
and it should either use existing data mining algorithms or introduce a new
data mining algorithm to solve the problem.
The paper should provide explicit analysis of the meaning of the results from the
point of view of the application domain.
- The application domain is up to the student team
(e.g., medicine, finance, sports, healthcare, science, ...).
- The paper must have been published within the past 7 years.
- Discuss your chosen data mining application with the professor
at least 2 weeks in advance of the presentation.
You need to get the professor's approval of your selected application
before you start preparing your presentation.
- The team should investigate the application in depth and
prepare and deliver a 10-minute in-class presentation describing this application in as much detail as possible, focusing on its data mining aspects.
- Your presentation should contain the following sections:
- A cover page with the following title and subtitle,
replacing the parts in angle brackets with the information for your particular showcase:
CS548 Fall 2017 <Data Mining Technique> Showcase by
< students' names >
Showcasing work by < application authors or company > on
<"Title or name of the application you are showcasing" >
- A list of references and resources that you used for your presentation.
This should be included right after the cover page.
If you used articles and research papers, include the full reference,
not just a link to the articles.
For this, follow the IEEE citation style.
Use this style to reference books, journal articles, conference articles, online references, and other published or unpublished work.
The richer your set of references, the better.
- A detailed description of the application.
- Email the following materials to the professor
at least 48 hours in advance of your class presentation.
- Your presentation slides.
Please name your presentation slides as follows:
CS548F17_Showcase_<Data Mining Technique>.<file extension>
If at all possible, please send us the slides in an editable format (e.g., pptx) so that we can make small edits if needed.
- A short description of your application (3-4 sentences) to be included
in this webpage under "Short Description" in your showcase entry below.
- Rehearse your oral presentation to make sure it is polished,
transitions between speakers work well, and the full presentation
stays within the time allowed (10 minutes).
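As a small illustration of the slide naming convention above, the following Python sketch builds the expected file name. The function name `showcase_filename` and its parameters are hypothetical, and replacing spaces with underscores is an assumption inferred from the slide names listed in the schedule below (e.g., CS548F17_Showcase_Decision_Trees).

```python
def showcase_filename(technique, extension, course="CS548", term="F17"):
    """Build a slide file name following the course naming convention.

    Assumption: spaces in the technique name become underscores,
    as in the slide names listed in the showcase schedule.
    """
    topic = "_".join(technique.split())
    # Accept the extension with or without a leading dot.
    return f"{course}{term}_Showcase_{topic}.{extension.lstrip('.')}"


if __name__ == "__main__":
    print(showcase_filename("Decision Trees", "pptx"))
```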
- Sept. 21: Decision Trees
- Students:
Yimin Lin, Youqiao Ma, Ran Lin, Shaoju Wu, Bhon Bunnag.
- Application Topic/Title:
"Automatic selection of molecular descriptors using random forest:
Application to drug discovery"
- Short description:
Skillful feature selection is an essential pre-processing step for
the efficient application of data mining techniques in drug discovery.
This showcased paper examines a Random Forest-based approach to automatically
select features for classification.
Reducing the number of features lowers the computing
time relative to existing approaches, and permits the exploration of much larger
datasets. The performance of models constructed over Random Forest-based
reduced features is compared with manual feature selection when
these models are trained using Random Forest, Support Vector Machines (SVM) and
Artificial Neural Networks (ANN) approaches.
- Slides:
CS548F17_Showcase_Decision_Trees
- Link to paper:
Showcased Paper
- Sept. 28: Model and Regression Trees
- Students:
Nguyen Vo, Bolun Lin, Qing Xu, Ishaan Singhal, and Andi Dhroso
- Application Topic/Title:
"Everyone's an Influencer: Quantifying Influence on Twitter"
- Short description:
Quantifying how influential a user is on a social network plays an important role in
online promotion.
A large number of studies have addressed this research problem.
One of the first attempts to measure the influence of Twitter users relied on Regression Trees.
This showcase presents how Regression Trees can be employed to derive data insights.
An interesting result is that influential users are those with many followers and who
have had high influence in the past.
- Slides:
CS548F17_Showcase_Model_and_Regression_Trees
- Link to paper:
Showcased Paper
- Oct. 5: Association Rules
- Students:
Jidapa Thadajarassiri, Meng Wang,
Muhammed Veyis Kilincer, Sai Kiran Vadlamudi, Siqin Li
- Application Topic/Title:
"Identifying combinatorial biomarkers by association rule mining in the CAMD Alzheimer's database"
- Short description:
Association rules are simple at a conceptual level yet they can be used
for much more than the market basket example that is used to introduce
them. This paper uses association rules to provide an inexpensive and effective
method of diagnosing Alzheimer's. The authors devise an Apriori-based algorithm
called SCARF, which produced novel rules to detect this dangerous and
debilitating disease better and at lower cost.
- Slides:
CS548F17_Showcase_Association_Rules
- Link to paper:
Showcased Paper
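For readers unfamiliar with the Apriori method mentioned in the short description above, here is a minimal Python sketch of plain Apriori on toy data. It is not the SCARF algorithm from the showcased paper. The item labels, thresholds, and function names (`frequent_itemsets`, `association_rules`) are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Toy transactions with made-up item labels (not data from the paper).
toy = [
    {"m1", "m2", "m3"},
    {"m1", "m2"},
    {"m1", "m3"},
    {"m2", "m3"},
    {"m1", "m2", "m3"},
]
toy_frequent = frequent_itemsets(toy, min_support=0.5)
toy_rules = association_rules(toy_frequent, min_confidence=0.7)
```

In this toy data, every single item and every pair meets the support threshold, while the full triple does not, so no rule involving all three items is produced.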
- Oct. 26: Clustering
- Students:
Janvi Kothari, Sanket Gujar, Mihir Sawant, Umesh Nair, and Jin Huang.
- Application Topic/Title:
"Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion"
- Short description:
This paper presents an approach for clustering human motion
using Hierarchical Aligned Cluster Analysis (HACA).
HACA combines kernel k-means clustering with a
generalized dynamic time alignment kernel to cluster human motion captured
in a time series.
- Slides:
CS548F17_Showcase_Clustering
- Link to paper:
Showcased Paper
- Nov. 9: Anomaly Detection
- Students:
Jun Dao, Qiming Wang, Emily Weber, Zijun Xu, Ruosi Zhang.
- Application Topic/Title:
"Improved Principal Component Analysis for Anomaly Detection:
Application to an Emergency Department"
- Short description:
This paper is about applying an improved PCA algorithm using MCUSUM to
detect anomalies in data collected by an emergency department. The authors combine these two
techniques to better detect small anomalies in demand for patient care.
- Slides:
CS548F17_Showcase_Anomaly_Detection
- Link to paper:
Showcased Paper
- Nov. 14: Text Mining
- Students:
Zhiqi Chen, Yuchen Shen, Tianyu Wu, Di You, Xiaoyu Zheng
- Application Topic/Title:
"Using text mining and sentiment analysis for online forums hotspot detection and forecast"
- Short description:
This paper applies sentiment analysis and text mining approaches to detect and forecast hotspots on the Sina Sports forums. First, it develops an algorithm to calculate the emotional polarity of the text. Second, K-means is used to detect hotspots and SVM is used to predict them. The performance of SVM is measured against the K-means clustering result, and the SVM predictions are highly consistent with the K-means clusters.
- Slides:
CS548F17_Showcase_Text_Mining
- Link to paper:
Showcased Paper
- Nov. 30: Sequence Mining
- Students:
Emma Clavet, Nan Hu, Shivangi Pandey, Tesia Shizume, Xiaojun Wang
- Application Topic/Title:
"The Use of Sequential Pattern Mining to Predict Next Prescribed Medications"
- Short description:
The aim of this paper is to determine whether Sequential Pattern Discovery using Equivalence Classes, a sequential pattern mining algorithm, is an effective method for identifying temporal relationships between medications and accurately predicting the next medication prescribed to a patient.
- Slides:
CS548F17_Showcase_Sequence_Mining
- Link to paper:
Showcased Paper
- Dec. 5: Web Mining
- Students:
Linh Hoang, Giorgi Gachechiladze, Tian Xie, Jiaming Di, Yang Zheng
- Application Topic/Title:
"We Feel Fine and Searching the Emotional Web"
- Short description:
This paper presents We Feel Fine, an emotional search engine and web-based artwork that uses web mining to collect, detect, and display emotions expressed on the Internet in real time.
- Slides:
CS548F17_Showcase_Web_Mining
- Link to paper:
Showcased Paper