Information retrieval, as the name implies, concerns the retrieval of relevant information from databases. Its central concern is giving users easy access to large amounts of (predominantly textual) information.
“Process of searching within a document collection for a particular information need which is called a query” (Langville & Meyer)
“Information retrieval deals with the representation, storage, organization of, and access to information items, in order to give the user the possibility to easily access the desired information” (Baeza-Yates)
Information Retrieval (IR) is the activity of obtaining information from large collections of information sources in response to an information need.
The Information Retrieval process works as follows.
The figure sketches the processing of textual data typically performed by an Information Retrieval engine: it takes a document as input and yields its index terms.
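As a rough illustration of this document-in, index-terms-out processing, here is a minimal Python sketch. It assumes a simple pipeline of tokenization, stopword removal, and stemming; the tiny stopword list and the crude suffix-stripping `stem` function are hypothetical stand-ins (a real engine would use a fuller stopword list and a proper stemmer such as Porter's), and all function names are illustrative only.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "the", "to", "with"}

# Crude suffix list; a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def tokenize(text: str) -> list[str]:
    """Lexical analysis: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(document: str) -> list[str]:
    """Document in, sorted list of unique index terms out."""
    kept = [t for t in tokenize(document) if t not in STOPWORDS]  # stopword removal
    return sorted({stem(t) for t in kept})                          # stemming + dedup


print(index_terms("The engine is indexing documents and retrieving the indexed terms"))
```

Here the call returns `['document', 'engine', 'index', 'retriev', 'term']`. The truncated stem "retriev" shows the trade-off of aggressive suffix stripping: it is not a readable word, but it lets "retrieving" and "retrieved" map to the same index term.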
COMPONENTS OF INFORMATION RETRIEVAL
The figure shows the functional approach of an information retrieval system, which consists of three major components.
APPLICATIONS OF INFORMATION RETRIEVAL
Perhaps one of the most common and best-known applications of information retrieval is retrieving text documents from the internet. With its recent growth, the internet is fast becoming the main medium for communicating business and academic information, so it is essential to be able to find the right document in this vast ocean of information. This need is, in fact, one of the main driving forces behind the development of information retrieval. To date, many relatively successful systems have been developed. Some examples include:
NetOwl is an advanced information retrieval system with automatic indexing and summarization capabilities. The product gives ordinary users an easy, cost-efficient way to benefit from text-analysis technology aimed at intelligence analysts.
NetOwl combines computational linguistics with knowledge-based pattern-matching methods to analyze natural language and determine the categories of its words. By identifying key concepts and relationships, it lets users quickly find relevant content, eliminate inappropriate material, and get the information they need. NetOwl can also build an electronic “back-of-the-book” index on a company’s own web server, which lets users spot important information or launch a request for information.
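The text does not describe how NetOwl builds its “back-of-the-book” index. As a hedged illustration of the general idea only, the sketch below builds an alphabetical index that maps each word to the pages it appears on (essentially an inverted index). The page paths and the function names `build_book_index` and `lookup` are hypothetical, and no linguistic analysis is performed, so this is not NetOwl’s method.

```python
import re
from collections import defaultdict


def build_book_index(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the sorted list of pages (e.g. URL paths) it occurs on."""
    occurrences: dict[str, set[str]] = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            occurrences[word].add(page)
    # Alphabetical order of entries, like the index at the back of a book.
    return {word: sorted(occurrences[word]) for word in sorted(occurrences)}


def lookup(index: dict[str, list[str]], word: str) -> list[str]:
    """Return the pages listed under `word`, or an empty list if it is not indexed."""
    return index.get(word.lower(), [])


# Hypothetical pages on a company web server.
site = {
    "/pricing.html": "Pricing for all products",
    "/about.html": "About the company and its products",
}
index = build_book_index(site)
print(lookup(index, "Products"))  # ['/about.html', '/pricing.html']
```

Keeping the page lists sorted and the entries in alphabetical order mirrors a printed index, which makes the result easy to render as a browsable page.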
EUROSPIDER: The EUROSPIDER system is an Information Retrieval (IR) system which searches very large and complex data collections for relevant information. It is a commercial version of the IR system SPIDER, developed by the Swiss Federal Institute of Technology. EUROSPIDER can be used in various ways:
1. as a standalone IR system
2. as an add-on to a World-Wide Web server, making the data collection accessible through a private or public network
3. as an extension to a commercial database (DB) system, to access possibly very dynamic and structured data.
The EUROSPIDER retrieval system provides advanced Information Retrieval (IR) functions such as relevance ranking, feedback searches, linguistic document analysis, and automatic indexing. Document analysis and indexing optionally include fuzzy term matching to cope with recognition errors from OCR (optical character recognition) devices.
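The text does not say which fuzzy-matching technique EUROSPIDER uses. One common way to tolerate OCR errors is to accept index terms within a small Levenshtein edit distance of a query term, and the sketch below assumes that technique. The function names, the `max_distance` threshold, and the sample misreadings are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_match(query_term: str, vocabulary: set[str], max_distance: int = 1) -> list[str]:
    """Return index terms within max_distance edits of the query term, sorted."""
    return sorted(t for t in vocabulary if edit_distance(query_term, t) <= max_distance)


# Hypothetical index vocabulary containing typical OCR misreadings ("1" for "l", "l" for "i").
vocabulary = {"retrieval", "retrieva1", "lndex", "indexing"}
print(fuzzy_match("retrieval", vocabulary))  # ['retrieva1', 'retrieval']
print(fuzzy_match("index", vocabulary))      # ['lndex']
```

A threshold of one edit catches single-character OCR slips without matching genuinely different words such as "indexing", which is three edits away from "index".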
In this era of information overload, the amount of information available to us is so large that it is virtually impossible to deal with it efficiently. One solution to this problem is to set up databases for multimedia data: hundreds of television and radio broadcasts could then be covered by a database application that keeps track of the available information, so these vast amounts of information could be captured and managed efficiently.
STRATEGIES OF RETRIEVAL PROCESS
Querying:
– documents are described explicitly with query words (keywords)
– the results are ad hoc document clusters
A search engine query is a request for information made using a search engine. Every time a user types a string of characters into a search engine and presses “Enter”, a search engine query is made. The string of characters (often one or more words) acts as the keywords that the search engine uses to algorithmically match results to the query. These results are displayed on the search engine results page (SERP), ranked in order of significance as determined by the engine’s algorithm.
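Real search engines combine many ranking signals that are not described here. As a minimal sketch of keyword matching followed by ranking, the code below scores each document with a smoothed TF-IDF weight (term frequency times `log(1 + N/df)`, where N is the number of documents and df the number containing the term) and sorts matches best first. The scoring formula, the sample documents, and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents that share keywords with the query, best match first."""
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for counts in tf.values() for term in counts)
    results = []
    for doc_id, counts in tf.items():
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:  # keyword match
                # Smoothed IDF: rarer terms weigh more, and the weight stays positive.
                score += counts[term] * math.log(1 + n_docs / df[term])
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "retrieval of information from the web",
    "d3": "cooking recipes for the weekend",
}
for doc_id, score in rank("web retrieval", docs):
    print(doc_id, round(score, 3))
```

For the query "web retrieval", `d2` ranks above `d1` because it also matches the rarer term "web", and `d3` matches neither keyword so it is left off the result list.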
Browsing:
– the user starts from some possibly interesting topic/idea/document and browses documents to find relevant ones
– if no relevant documents are found, the user moves somewhere else
– the starting point can be found by querying
– assumption: documents on the same topic are organised together
Navigation:
– the user follows hyperlinks towards a known goal (e.g. Department of Education, AMU, Aligarh)
– the route is assumed to be known, or it is easily found out while navigating
Scanning:
– the user scans the titles of the answer list, documents, hyperlinks, metadata, etc.
– auxiliary operation: e.g. when scanning, the seeker selects a hyperlink to follow
Filtering:
– the goal is to select, from a document flow (e.g. today’s news, emails), the documents that interest a person or an organisation, or to remove unwanted ones
Routing:
– a document from a document flow is routed to a person who is interested in the document or to whose field of activities it belongs (e.g. questions by customers are routed to different experts)
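Filtering and routing both compare incoming documents from a flow against stored interest profiles. The sketch below assumes the simplest possible profile, a set of keywords, and uses word overlap as the matching score. The profiles, the `min_overlap` threshold, the fallback "general" queue, and the function names are hypothetical; practical systems often use weighted profiles or trained classifiers instead.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased set of words in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_stream(stream: list[str], profile: set[str], min_overlap: int = 1) -> list[str]:
    """Filtering: keep documents sharing at least min_overlap words with the profile."""
    return [doc for doc in stream if len(terms(doc) & profile) >= min_overlap]


def route(doc: str, experts: dict[str, set[str]], default: str = "general") -> str:
    """Routing: send the document to the expert whose profile it overlaps most."""
    best, best_overlap = default, 0
    for name, profile in experts.items():
        overlap = len(terms(doc) & profile)
        if overlap > best_overlap:  # strict '>' keeps the first expert on ties
            best, best_overlap = name, overlap
    return best


# Hypothetical reader profile and news flow.
news = ["Parliament passes the budget", "Local team wins the cup", "Election results announced"]
print(filter_stream(news, {"election", "parliament", "budget"}))

# Hypothetical customer-service experts and their keyword profiles.
experts = {"billing": {"invoice", "payment", "refund"},
           "technical": {"error", "crash", "login"}}
print(route("My payment failed and I need a refund", experts))  # billing
print(route("Hello there", experts))                           # general
```

The two functions differ only in direction: filtering applies one profile to many documents and keeps or drops each, while routing applies many profiles to one document and picks a destination.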