SUMMER SCHOOL

----------------------------------------

A summer school on statistical learning will be held at Microsoft Research Asia prior to the workshop. The school will consist of lectures given by two distinguished researchers in the field.

Schedule

July 9
9:00am – 10:20am Tutorial by Prof. Friedman
10:20am – 10:40am Tea Break
10:40am – 12:00pm Tutorial by Prof. Friedman
12:00pm – 2:00pm Lunch
2:00pm – 3:20pm Tutorial by Prof. McCallum
3:20pm – 3:40pm Tea Break
3:40pm – 5:00pm Tutorial by Prof. McCallum
July 10
9:00am – 10:20am Tutorial by Prof. Friedman
10:20am – 10:40am Tea Break
10:40am – 12:00pm Tutorial by Prof. Friedman
12:00pm – 2:00pm Lunch
2:00pm – 3:20pm Tutorial by Prof. McCallum
3:20pm – 3:40pm Tea Break
3:40pm – 5:00pm Tutorial by Prof. McCallum

Title: Tree Based Approaches to Statistical Machine Learning
Lecturer: Jerome Friedman (Stanford Univ.)
Abstract:
This workshop will focus on machine learning techniques based on decision trees. Decision trees and related procedures are among the most popular methods in data mining. After a general introduction to the statistical machine learning problem (regression and classification), the basics of single decision tree learning will be discussed. Following that, improved learning methods based on ensembles of trees will be described. These include bagging, random forests, boosting and rule fitting.

Title: Information Extraction, Data Mining and Topic Modeling with Probabilistic Models
Lecturer: Andrew McCallum (Univ. of Massachusetts)
Abstract:
In this talk I will describe recent research at the intersection of information extraction, data mining and social network analysis. In particular I will focus on how such a combination can be made both robust and scalable---showing that the typical brittle cascading of errors from text extraction to data mining can be avoided with unified probabilistic inference in graphical models, and showing that these models can be made efficient with recent methods of approximate inference and learning. After briefly introducing conditional random fields, I will demonstrate their use in joint models of extraction, entity resolution, and sequence alignment.

I will then describe several methods of integrating textual and other data in a "looser" type of data mining---topic modeling. These are Bayesian latent-variable models that can discover rich and interpretable co-occurrence patterns in high-dimensional data, including data from multiple modalities. I'll introduce a wide array of such models, including applications to nested correlations, expert-finding, trend analysis, career path modeling, and research literature impact measurement.

Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Wei Li, Chris Pal, Ben Wellner, Michael Hay, Xuerui Wang, Natasha Mohanty, David Mimno, Pallika Kanani, Kedare Bellare, Michael Wick, Rob Hall, Gideon Mann, and Andres Corrada.

Date
July 9-10, 2007

Place
Microsoft Research Asia
Multifunction Room, B1 Sigma Building
No. 49 Zhichun Road, Haidian District, Beijing

Copyright © 2007
School of Mathematical Sciences, Peking University