- What is indexing?
- What is full text?
I know the answers to both questions, but I can't expose those answers in the exact way to an interviewer:
- indexing means something like index in book
- fulltext means for search string
Can please give me very simple definition for each of these questions?
An index in mysql is a mapping from each value in a column (or values in a set of columns) to the rows containing that value in that column (or those values in the set of columns).
A full text index on a column is a mapping from each word (generally separated by whitespace) to the set of rows that contains that word.
Normal index:
id:1 "bar"
id:2 "foo baz"
search for "foo" -> no results.
search for "foo baz" -> row with id:2
search for "bar" -> row with id:1
Fulltext index:
id:1 "bar"
id:2 "foo baz"
search for "foo" -> row with id:2
search for "foo baz" -> row with id:2
search for "bar" -> row with id:1
Indexing, is the process of creating Indexes. Indexes are structures which allow SQL (or more generally a DBMS or Search Engine) to locate, in a very efficient fashion, records based on the value of one (or several) of the fields they contain.
For example, a database may include at table containing student records, with their Student ID, their Name, Date of Birth, Phone Number ... By creating a index on the Phone Number, we can then search Student based on a phone number. In the absence of an index, the system would have found the same records, but this operation would have been carried by looking every single record and comparing with the desired phone number.
FullText Indexing is the process of creating a index for one (or several fields which contain text). Unlike with regular indexes which are based on comparing the complete value of the field (or possibly a simple regular expression) with the desired search value, a FullText index may locate a record based on words found within the field.
For example a bibliographic database may contain records describing books, with fields such as ISBN, Author, Title, Type, Price...). A fulltext index (sometimes called a "catalog") on the field Title, would allow to locate, efficiently, the book titled "The old man and the sea" when search by say the word "man".
Fulltext engines typically have a built in and parametrizable "understanding" of linguistic concepts pertaining to text. For example the "noise word" (also called "stopwords") are word frequently found in the text (example "the", "and", "of", "in", in English) may be ignored, for the purpose of minimizing the index size and making searches with more selective words more efficient. Also, fulltext engine may be aware of the various grammatical forms of a word, say the singular and plural form of words (as say Inch and Inches, Foot and Feet, Cat and Cats), or the conjugation of verbs (as Catch, Catching and Caught or Interpret, Interpreting, Interpreted). Thanks to this grammatical awareness, the FullText Engine can (if so instructed) locate words even if they do not match exactly the search criteria.
FullText engine also typically expose a search language/syntax which allows users to specify particular elements of the desired search. For example to search for the word "sea" within 5 words of the word "man". Or to find the word "Lake" or "Ocean" and the word "water" etc.