This course covers the theory, design, and implementation of text-based information retrieval systems, including the algorithms and techniques at the core of modern search systems. Specifically, we will learn the key concepts and models of information retrieval and storage, including efficient text indexing, Boolean and probabilistic retrieval models, retrieval evaluation, relevance feedback, document classification, learning to rank, document clustering, and link analysis. We will implement key retrieval models on top of an open-source search engine system.
Prerequisites: students should have had some exposure to basic probability, statistics, data structures, and algorithms. You should be able to learn new software libraries on your own and to design and develop functions on top of them.
Course Goal
Through this course, students will gain solid theoretical knowledge and enough practical experience to develop and diagnose their own search systems in the future.
The grading policy is as follows:
|Two Programming Assignments:||30%|
|Four Written Assignments:||20%|
|The Final Project:||25% (abstract: 5%; presentation + report + code + data: 20%)|
|Final Exam (May 6th, 8:00-10:00 am):||25%|
Attendance and Make-up Policies
Every student should attend class unless they have an accepted excuse. See Student Rule 7 (http://student-rules.tamu.edu/rule07) for details.
Students should have taken the course Data Structures and Algorithms (CSCE 221).
Textbook and Material
The primary textbook: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. Relevant tutorials and papers will also be handed out in class.
"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit: http://aggiehonor.tamu.edu.
Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.
Americans with Disabilities Act (ADA) Statement
The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit http://disability.tamu.edu.
|1||Overview and indexing||Book Chapters|
|2||The Boolean Retrieval model||Book Chapters|
|3||Probabilistic IR: Vector space model||Book Chapters|
|4||Probabilistic IR: BM25||Book Chapters|
|5||Probabilistic IR: language models||Book Chapters|
|6||Retrieval Evaluation||Book Chapters|
|7||Relevance Feedback||Book Chapters|
|8||Document Classification||Book Chapters|
|9||Learning to Rank||Book Chapters|
|10||Flat clustering: k-means||Book Chapters|
|11||Hierarchical clustering: HAC||Book Chapters|
|12||Link analysis||Book Chapters|
|13||Trending topics||Book Chapters|
|14||Project Presentations||Book Chapters|