SUMMER SCHOOL
----------------------------------------
A summer school on statistical learning will be held at
Microsoft Research Asia prior to the workshop. The school will
consist of lectures given by two distinguished researchers in
the field.
Schedule
July 9
9:00am - 10:20am     Tutorial by Prof. Friedman
10:20am - 10:40am    Tea Break
10:40am - 12:00pm    Tutorial by Prof. Friedman
12:00pm - 2:00pm     Lunch
2:00pm - 3:20pm      Tutorial by Prof. McCallum
3:20pm - 3:40pm      Tea Break
3:40pm - 5:00pm      Tutorial by Prof. McCallum

July 10
9:00am - 10:20am     Tutorial by Prof. Friedman
10:20am - 10:40am    Tea Break
10:40am - 12:00pm    Tutorial by Prof. Friedman
12:00pm - 2:00pm     Lunch
2:00pm - 3:20pm      Tutorial by Prof. McCallum
3:20pm - 3:40pm      Tea Break
3:40pm - 5:00pm      Tutorial by Prof. McCallum

Title: Tree Based Approaches to Statistical Machine Learning
Lecturer: Jerome Friedman (Stanford Univ.)
Abstract:
This workshop will focus on machine learning techniques based on
decision trees. Decision trees and related procedures are among
the most popular in data mining. After a general introduction to
the statistical machine learning problem (regression and
classification) the basics of single decision tree learning will
be discussed. Following that, improved learning methods based on
ensembles of trees will be described. These include bagging,
random forests, boosting and rule fitting.
Title: Information Extraction, Data Mining and Topic Modeling
with Probabilistic Models
Lecturer: Andrew McCallum (Univ. of Massachusetts)
Abstract:
In this talk I will describe recent research at the intersection
of information extraction, data mining and social network
analysis. In particular I will focus on how such a combination
can be made both robust and scalable---showing that the typical
brittle cascading of errors from text extraction to data mining
can be avoided with unified probabilistic inference in graphical
models, and showing that these models can be made efficient with
recent methods of approximate inference and learning. After
briefly introducing conditional random fields, I will
demonstrate their use in joint models of extraction, entity
resolution, and sequence alignment.
I will then describe several methods of integrating textual and
other data in a "looser" type of data mining---topic modeling.
These are Bayesian latent-variable models that can discover rich
and interpretable co-occurrence patterns in high-dimensional
data, including data from multiple modalities. I'll introduce a
wide array of such models, including applications to nested
correlations, expert-finding, trend analysis, career path
modeling, and research literature impact measurement.
Joint work with colleagues at UMass: Charles Sutton, Aron
Culotta, Wei Li, Chris Pal, Ben Wellner, Michael Hay, Xuerui
Wang, Natasha Mohanty, David Mimno, Pallika Kanani, Kedar
Bellare, Michael Wick, Rob Hall, Gideon Mann, and Andres
Corrada.
Date
July 9-10, 2007
Place
Microsoft Research Asia
Multifunction Room, B1 Sigma Building
No.49 Zhichun Road, Haidian District, Beijing.