{"id":794,"date":"2017-09-10T02:29:04","date_gmt":"2017-09-10T01:29:04","guid":{"rendered":"http:\/\/www.dongpingzhang.com\/?p=794"},"modified":"2017-09-12T02:34:51","modified_gmt":"2017-09-12T01:34:51","slug":"introduction-to-information-retrieval","status":"publish","type":"post","link":"http:\/\/www.dongpingzhang.com\/?p=794","title":{"rendered":"Introduction to Information Retrieval"},"content":{"rendered":"<p><a href=\"http:\/\/www.dongpingzhang.com\/wordpress\/wp-content\/uploads\/2017\/09\/InformationRetrieval.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-795\" src=\"http:\/\/www.dongpingzhang.com\/wordpress\/wp-content\/uploads\/2017\/09\/InformationRetrieval.jpg\" alt=\"\" width=\"224\" height=\"330\" \/><\/a><\/p>\n<p style=\"text-align: left;\"><a href=\"https:\/\/nlp.stanford.edu\/IR-book\/\"><span style=\"font-weight: 400;\">Introduction to Information Retrieval<\/span><\/a><span style=\"font-weight: 400;\"> is my book of the week. It is co-authored by <\/span><a href=\"http:\/\/nlp.stanford.edu\/%7Emanning\/\"><span style=\"font-weight: 400;\">Christopher D. Manning<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/theory.stanford.edu\/~pragh\/\"><span style=\"font-weight: 400;\">Prabhakar Raghavan<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"http:\/\/www.cis.uni-muenchen.de\/schuetze\/\"><span style=\"font-weight: 400;\">Hinrich Sch\u00fctze<\/span><\/a><span style=\"font-weight: 400;\">. The authors generously make the electronic version of the book freely available to the public. I benefited from this generosity and read the PDF version. Following the links in the PDF is very convenient, although the lack of back-links makes it hard to navigate back to where a link was followed from. 
<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">This book is very comprehensive and probably the best textbook available if you wish to learn about information retrieval. It covers both fundamental and advanced topics. Moreover, each chapter includes a<\/span><i><span style=\"font-weight: 400;\"> References and Further Reading <\/span><\/i><span style=\"font-weight: 400;\">section, providing more resources for readers who would like to dive further into specific topics. Another notable attribute of this book is its clarity in explaining concepts without introducing unnecessarily complicated formulas. The text accompanying each algorithm expresses its logic clearly. If you are still unsure how a certain algorithm works after reading the text, thinking through one of the several exercises typically included in each chapter helps a great deal. <\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">The online version is from April 2009, so many recent advances in the information retrieval field are not covered. However, grasping the content of the book would no doubt help in understanding more recent work. There are more recent lecture notes based on this book available online that I have not explored yet, partly because a fairly recently published book, <\/span><i><span style=\"font-weight: 400;\">Information Retrieval: Implementing and Evaluating Search Engines<\/span><\/i><span style=\"font-weight: 400;\">, is on my to-read list for the near future. Should you become positively obsessed with this topic, like me, you might appreciate that the authors also very helpfully offer a comprehensive list of <\/span><a href=\"https:\/\/nlp.stanford.edu\/IR-book\/information-retrieval.html\"><span style=\"font-weight: 400;\">information retrieval resources<\/span><\/a><span style=\"font-weight: 400;\">. 
<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">In the <\/span><a href=\"https:\/\/nlp.stanford.edu\/IR-book\/pdf\/00front.pdf\"><span style=\"font-weight: 400;\">preface<\/span><\/a><span style=\"font-weight: 400;\">, the authors describe the organisation of this book in depth. Here is my feeble attempt to show you what this book covers at a very high level. If you are interested in learning how search engines work or how to build one, Chapters 1 to 8 cover the basics, such as the inverted index, index construction, compression, the vector space model, relevance score calculation, evaluation and so on. Chapter 9, on relevance feedback and query expansion, offers great guidance for real-world projects; I was handling exactly such a challenge at work while reading this book. In the authors\u2019 words, it discusses methods by which retrieval can be enhanced through the use of techniques like relevance feedback and query expansion, which aim at increasing the likelihood of retrieving relevant documents. Chapters 10 to 18 cover more advanced topics, for example probabilistic models, language models, text classification, clustering and latent semantic analysis. Chapters 19 to 21 cover web search basics and then go into more depth on crawling, indexing and, finally, link analysis. Forgive my lack of diligence here; there could be no better overview than the one the authors give in the preface. 
<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">I enjoyed reading this book no less than the Cicero trilogy (<\/span><a href=\"http:\/\/www.dongpingzhang.com\/?p=501\"><span style=\"font-weight: 400;\">Imperium<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/www.dongpingzhang.com\/?p=512\"><span style=\"font-weight: 400;\">Conspirata<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"http:\/\/www.dongpingzhang.com\/?p=519\"><span style=\"font-weight: 400;\">Dictator<\/span><\/a><span style=\"font-weight: 400;\">) and made many notes for future revisits. <\/span><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to Information Retrieval is my book of the week. It is co-authored by Christopher D. Manning, Prabhakar Raghavan and Hinrich Sch\u00fctze. The authors generously make the e-version of the book freely available to the public. I benefited from this generosity and read the pdf version. 
It is very convenient to follow the links in &hellip; <a href=\"http:\/\/www.dongpingzhang.com\/?p=794\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Introduction to Information Retrieval<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-794","post","type-post","status-publish","format-standard","hentry","category-computer-science"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/paFL7T-cO","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/posts\/794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=794"}],"version-history":[{"count":6,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/posts\/794\/revisions"}],"predecessor-version":[{"id":801,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=\/wp\/v2\/posts\/794\/revisions\/801"}],"wp:attachment":[{"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=794"}],"wp:term
":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=794"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.dongpingzhang.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}