CSCE 470: Information Storage and Retrieval (Spring 2020)

Instructor: Ruihong Huang

  • Location: ZACH 350
  • Time: MWF 11:30 am - 12:20 pm
  • TA: Sanuj Sharma
  • Grader: Sourjya Banerjee
  • Instructor Email:
  • Instructor Office: 402B HRBB
  • TA Email:
  • TA Office: 408 HRBB
  • Grader Email:
  • Credits: 3
  • Office Hours: Tue 10:00 am - 12:00 pm, or by appointment
  • TA Office Hours: Tue 3:00 pm - 5:00 pm and Thu 3:00 pm - 5:00 pm, or by appointment

  • [01/13] The first meeting will be on 01/13!

Course Description

This course will cover the theory, design and implementation of text-based information retrieval systems, including algorithms and techniques at the core of modern search systems. Specifically, we will learn the key concepts and models relevant to information retrieval and storage, including efficient text indexing, Boolean and probabilistic retrieval models, retrieval evaluation, relevance feedback, document classification, learning to rank, document clustering and link analysis. We will implement key retrieval models on top of an open-source search engine system. Prerequisites: students should have had some exposure to basic probability, statistics, data structures and algorithms. You should be able to learn new software libraries on your own and to design and develop functions on top of them.

Course Goal

Through this course, students will gain solid theoretical knowledge and enough practical experience to develop and diagnose their own search systems in the future.

Evaluation Metrics

Two Programming Assignments: 30%
Four Written Assignments: 20%
The Final Project: 25% (abstract: 5%, presentation+report+code+data: 20%)
Final Exam (May 5th, 10:30 am - 12:30 pm): 25%

The grading policy is as follows:
90-100: A
80-89: B
70-79: C
60-69: D
<60: F
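
As an illustration of how the weights and the letter scale combine, here is a minimal sketch. The function names and the per-component percentage inputs are hypothetical, and the treatment of fractional totals (no rounding, so 89.5 maps to B) is an assumption that the policy above does not state.

```python
# Weights copied from the evaluation breakdown (percent of the final grade).
WEIGHTS = {
    "programming": 30,   # two programming assignments
    "written": 20,       # four written assignments
    "project": 25,       # abstract + presentation/report/code/data
    "final_exam": 25,
}


def weighted_total(scores):
    """Combine per-component percentages (0-100) into a course total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a course total to a letter using the scale above.

    Assumption: totals are not rounded before the cutoffs are applied.
    """
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    if total >= 60:
        return "D"
    return "F"


example = {"programming": 90, "written": 80, "project": 100, "final_exam": 70}
total = weighted_total(example)
print(total, letter_grade(total))
```

With these example scores the components contribute 27, 16, 25 and 17.5 points respectively.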

Important Dates

Project Abstract Due: on Feb. 12th, Wed, by 11:59 pm
Full Project Submission Due: on Apr. 23rd, Thu, by noon
Final exam: on May 5th, Tue, 10:30 am - 12:30 pm

Attendance and Make-up Policies

Students should attend every class unless they have an accepted excuse. Please check Student Rule 7 for details.

Homework Late Policies

For the programming/written homework assignments, you have a total of 5 late days that you can use during the semester. However, a single assignment can be submitted up to 2 days late only. For the purposes of the class, a late day is an indivisible 24-hour unit. Once you exhaust your 5 late days, we will not accept any late submissions.
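
The late-day rules can be sketched as a small bookkeeping routine. This is only one reading of the policy: the function names are hypothetical, partial days are assumed to round up to a full late day (following "indivisible 24-hour unit"), and a submission that needs more late days than remain is assumed to be rejected outright.

```python
import math

TOTAL_LATE_DAYS = 5               # budget for the whole semester
MAX_LATE_DAYS_PER_ASSIGNMENT = 2  # cap for any single assignment


def late_days_needed(hours_late):
    """Late days are indivisible 24-hour units, so partial days round up."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def accept_submission(hours_late, late_days_used):
    """Return (accepted, late_days_used_after) for one submission.

    Assumption: a submission is rejected if it exceeds the per-assignment
    cap or would overdraw the remaining semester budget.
    """
    needed = late_days_needed(hours_late)
    if needed > MAX_LATE_DAYS_PER_ASSIGNMENT:
        return (False, late_days_used)
    if late_days_used + needed > TOTAL_LATE_DAYS:
        return (False, late_days_used)
    return (True, late_days_used + needed)


# One hour late costs a full late day; 25 hours late costs two.
print(accept_submission(1, 0))
print(accept_submission(25, 0))
```

Under this reading, a submission 49 hours late is never accepted, because it would need three late days on a single assignment.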


Prerequisites

Students should have taken the course Data Structures and Algorithms (CSCE 221).

Textbook and Material

The primary textbook: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. Relevant tutorials and papers will also be handed out during the class.

Academic Integrity

"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit:

Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.

Americans with Disabilities Act (ADA) Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit

Tentative Topics

Week  Topic                                 Material
1     Overview and indexing                 Book Chapters
2     The Boolean Retrieval model           Book Chapters
3     Probabilistic IR: Vector space model  Book Chapters
4     Probabilistic IR: BM25                Book Chapters
5     Probabilistic IR: language models     Book Chapters
6     Retrieval Evaluation                  Book Chapters
7     Relevance Feedback                    Book Chapters
8     Document Classification               Book Chapters
9     Learning to Rank                      Book Chapters
10    Flat clustering: k-means              Book Chapters
11    Hierarchical clustering: HAC          Book Chapters
12    Link analysis                         Book Chapters
13    Trending topics
14    Project Presentations