CSCE 689: Special Topics in Natural Language Processing (Spring 2016)

Instructor: Ruihong Huang

  • Location: 126 HRBB
  • Time: TR 9:35-10:50 am
  • Instructor Email:
  • Instructor Office: 402B HRBB
  • Credits: 3
  • Office Hours: Tue 11 am - 12 pm or by appointment

  • [02/15] Project proposal presentations will be on 02/16!
  • [01/14] The first meeting will be on 01/19!

Course Description

This is an especially exciting time to study Information Extraction (IE), a fundamental research area in Natural Language Processing (NLP) that aims to enable computers to automatically process large amounts of free text. This course teaches core IE concepts and techniques that students need to develop automatic text-processing applications. Students will consolidate and practice their NLP knowledge and skills by working on real projects.

This course will begin with lectures introducing the basics of natural language processing and machine learning, and will then move on to reading papers on several sub-disciplines of IE, including named entity recognition, relation extraction, event extraction, coreference resolution, and sentiment analysis.

Course Goal

Through this course, students will gain solid theoretical knowledge and enough practical experience to design and develop their own text processing applications in the future.

Evaluation

This course will emphasize skills in critical paper reading and practical system development. You are required to present at least one research paper, read all the papers, write short paper summaries (two paragraphs, at most one page), and actively participate in class discussions. You will work on an NLP project either by yourself or with one classmate (a team of at most two people). The project will be evaluated twice: once in the middle of the semester and once at the end. By midterm, you should have built a working system; in the second half of the semester, you will work on further improving its performance. Specifically, the following grade breakdown will be used:

Paper presentations: 25%
Paper summaries: 10%
Class participation: 10%
Mid-term Project: 25%
Final-term Project: 30%

The grading policy is as follows:
90-100: A
80-89: B
70-79: C
60-69: D
<60: F
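As a sanity check on the breakdown above, the final score is a weighted sum of the five components, mapped to a letter grade by the cutoffs listed. The sketch below illustrates the arithmetic; the individual component scores are purely hypothetical examples.

```python
# Weights from the grade breakdown above (must sum to 1.0).
weights = {
    "paper_presentations": 0.25,
    "paper_summaries": 0.10,
    "class_participation": 0.10,
    "midterm_project": 0.25,
    "final_project": 0.30,
}

# Hypothetical component scores on a 0-100 scale.
scores = {
    "paper_presentations": 90,
    "paper_summaries": 85,
    "class_participation": 95,
    "midterm_project": 80,
    "final_project": 88,
}

# Weighted final score.
final = sum(weights[k] * scores[k] for k in weights)

def letter(score: float) -> str:
    """Map a 0-100 score to a letter grade using the cutoffs above."""
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

print(f"{final:.1f} -> {letter(final)}")  # 86.9 -> B
```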

Presentations and Discussions

Two or more papers will be presented and discussed in each class, each presented by a different student (one paper per student). Each presenter will cover the gist of a paper in 20 minutes or less and will lead the discussion after all the papers have been presented.

Paper presentations should aim to address the following aspects:

Everyone is expected to read all the papers, write short paper summaries, and participate actively in the discussions.


Course Project

It's important that you work on a real NLP project so that you gain first-hand experience with basic text processing and learn to deal with the high complexity of human language in concrete applications. You are free, and encouraged, to develop your own project ideas, or you can work on a project I suggest. The project should be scoped to last the full semester. Remember that there are both a mid-term and a final evaluation for the project. The tentative timeline is as follows:


Prerequisites

There are no prerequisites, but knowledge of algorithms and basic algebra will be helpful.


Textbook

There is no required textbook; all the material will be made available online.

Academic Integrity

"An Aggie does not lie, cheat or steal, or tolerate those who do." For additional information, please visit:

Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.

Americans with Disabilities Act (ADA) Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit

Tentative schedule

Week | Date  | Topic                          | Material                                                                  | Speakers
1    | 01/19 | Introduction                   | slides                                                                    | Ruihong
     | 01/21 | Overview of IE                 | slides, IE tutorial, another tutorial (slides)                            | Ruihong
2    | 01/26 | NLP basics                     | slides                                                                    | Ruihong
     | 01/28 | ML basics                      | slides                                                                    | Ruihong

Semantic Class Learning and Named Entity Recognition (NER)
3    | 02/02 | Sequential Models              | Conditional Random Fields; Incorporating Non-local Information            | Seth (both papers)
     | 02/04 | Sequential Models (cont.)      | Reflections; NER in Tweets                                                | Jeremy, Khuong
4    | 02/09 | Pattern Learning               | Bootstrapping Extraction Patterns; Bootstrapping Ranked Rules             | Wenlin, Khuong
     | 02/11 | Pattern Learning (cont.)       | Hyponym Patterns; Reducing Semantic Drift                                 | Aastikta (both papers)

Relation Extraction
5    | 02/16 | Project Proposal Presentations | None                                                                      |
     | 02/18 | Distant Supervision            | Distant Supervision; Multi-instance Multi-label Learning                  | Benke (both papers)
6    | 02/23 | Out of town                    | no class                                                                  |
     | 02/25 | Open IE                        | An Open IE System; The Second Generation                                  | Guanlong, Girish
7    | 03/01 | Universal Schemas              | Universal Schemas; Matrix Factorization                                   | Vijaya (both papers)
     | 03/03 | Hack Day                       | work on your project                                                      |

Event Extraction
8    | 03/08 | Discourse Modeling             | Glacier; LINKER                                                           | Ajit (both papers)
     | 03/10 | Global Constraints             | Cross-document; Cross-event                                               | Jeremy, Wenlin
9    | 03/15 | Spring Break                   | no class                                                                  |

Event Detection
10   | 03/22 | Mid-term Project Presentations | None                                                                      |
     | 03/24 | Event Detection                | Multi-faceted Event Recognition; Event Calendar Generation from Twitter   | Anurag (both papers)

Coreference Resolution
11   | 03/29 | Entity-level Features          | Pairwise Classification; A Generative Model                               | Anindya (both papers)
     | 03/31 | More Flexible Models           | Multi-pass Sieve; Incremental Clustering                                  | John, Thomas
12   | 04/05 | Hack Week                      | work on your project                                                      |

Sentiment Analysis
13   | 04/12 |                                | Thumbs Up?; Up or Down?; Aspect-based Sentiment Summarization             | Thomas, John
     | 04/14 | Semantic Lexicon Induction     | Learning Sentiment Words; Learning Sentiment Phrases                      | Ke (both papers)

Semantic Role Labeling & Semantic Parsing
14   | 04/19 | Semantic Role Labeling         | The First Proposal; Via Integer Linear Programming                        | Guangshuai, Pooja
     | 04/21 | Semantic Parsing               | Unsupervised Semantic Parsing; Semantic Parsing on a Database             | Guanlong, Girish
15   | 04/26 | Final Project Presentations    |                                                                           |
     | 04/28 | Final Project Presentations    |                                                                           |
16   | 05/03 | Final Project Presentations    |                                                                           |