EE448 Big Data Mining 2018 

Weinan Zhang, Assistant Professor

John Hopcroft Center for Computer Science
SEIEE
Shanghai Jiao Tong University

Email: wnzhang [AT] sjtu.edu.cn

Big-data-driven techniques have been revolutionizing many aspects of our daily life. Big data means not only large volume but also high dimensionality and diversity. How to collect, represent, process, and compute on such data so as to mine valuable patterns and benefit from them is a fundamental challenge for both academia and industry.

This course provides a comprehensive introduction to the fundamental problems and methodologies of big data mining. The course is organized around applications, which helps SEIEE students get familiar with various data mining tasks and their basic solutions. Through lectures, hands-on course works, and poster presentations, students are expected to acquire the basic theory, algorithms, and some practical experience of big data mining techniques. The course should also help students find research topics that interest them, which could benefit their later graduate study and industrial practice.

Notice

The first quiz will be held on Apr. 19, 2018.

The first course work (text classification) has been launched on Kaggle.

Course Works


link
Course work 1: Text Classification
Build a classification model for recommending editor-selected articles.
Mar. 22, 2018 - May 5, 2018.

link
Course work 2: Link Prediction
Top-N prediction of target nodes given a start node and a link type.
Apr. 21, 2018 - June 8, 2018.

Slides


pdf
Lecture 1: Introduction to Big Data Mining
Basic concepts, history and some examples of data mining.
Mar. 1, 2018

pdf
Lecture 2: Know Your Data
Data representation, visualization and proximity measures.
Mar. 8, 2018

pdf
Lecture 3: Fundamental Data Mining Algorithms
Frequent patterns, association rules, Apriori, FPGrowth, KNN.
Mar. 15, 2018

pdf
Lecture 4: Supervised Learning (Part I)
Intro to machine learning, linear regression and logistic regression.
Mar. 22, 2018

pdf
Lecture 5: Supervised Learning (Part II)
Support Vector Machines, Neural Networks.
Mar. 29 and Apr. 12, 2018

pdf
Lecture 6: Supervised Learning (Part III)
Tree models, Ensemble Methods.
Apr. 19 and 26, 2018

pdf
Lecture 7: Unsupervised Learning
K-means Clustering, PCA, Mixture of Gaussians, EM Methods.
May 3, 2018

pdf
Lecture 8: Search Engines
Information retrieval, inverted index, retrieval model, relevance model.
May 10, 2018

pdf
Lecture 9: Learning to Rank
Ranking problem, pairwise/listwise ranking, LambdaRank.
May 17, 2018

pdf
Lecture 10: Recommender Systems
Information filtering, collaborative filtering, matrix factorization.
May 24, 2018

Related Readings

Teaching Assistants


link
Yuchen Yan, IEEE Honored Class 2015 student, Computer Science
Research on data mining, knowledge graph, network analysis
Email: xyxpzer [at] sjtu.edu.cn

link
Ruijie Wang, Computer Science 2015 student
Research on data mining, natural language processing
Email: wjerry5 [at] sjtu.edu.cn

link
Jialu Wang, IEEE Honored Class 2015 student, Computer Science
Research on data mining, deep learning, reinforcement learning
Email: faldict [at] sjtu.edu.cn

News


Mar. 22, 2018
The first course work has been launched. The deadline is May 5.

Mar. 1, 2018
The first lecture has been given.

Jan. 9, 2018
Web site created!