This chapter describes a lot of things that you need to know when working on the MySQL code.
The MySQL server creates the following threads:
process_alarm()
to force timeouts on connections
that have been idle too long.
-DUSE_ALARM_THREAD
, a dedicated thread that
handles alarms is created. This is only used on some systems where
there are some problems with sigwait()
or if one wants to use the
thr_alarm()
code in ones application without a dedicated signal
handling thread.
--flush_time=#
option, a dedicated thread is created
to flush all tables at the given interval.
INSERT DELAYED
gets its
own thread.
--master-host
, slave replication thread will be
started to read and apply updates from the master.
mysqladmin processlist
only shows the connection and INSERT
DELAYED
threads.
Since Version 3.23.23, MySQL has support for full-text indexing
and searching. Full-text index in MySQL is an
index of type FULLTEXT
. FULLTEXT
indexes can be created from
VARCHAR
and TEXT
columns at CREATE TABLE
time or added
later with ALTER TABLE
or CREATE INDEX
. Full-text search is
performed with the MATCH
function.
mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b)); Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t VALUES -> ('MySQL has now support', 'for full-text search'), -> ('Full-text indexes', 'are called collections'), -> ('Only MyISAM tables','support collections'), -> ('Function MATCH ... AGAINST()','is used to do a search'), -> ('Full-text search in MySQL', 'implements vector space model'); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL'); +---------------------------+-------------------------------+ | a | b | +---------------------------+-------------------------------+ | MySQL has now support | for full-text search | | Full-text search in MySQL | implements vector-space-model | +---------------------------+-------------------------------+ 2 rows in set (0.00 sec) mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t; +------------------------------+-------------------------------+--------+ | a | b | x | +------------------------------+-------------------------------+--------+ | MySQL has now support | for full-text search | 0.3834 | | Full-text indexes | are called collections | 0.3834 | | Only MyISAM tables | support collections | 0.7668 | | Function MATCH ... AGAINST() | is used to do a search | 0 | | Full-text search in MySQL | implements vector space model | 0 | +------------------------------+-------------------------------+--------+ 5 rows in set (0.00 sec)
The function MATCH
matches a natural language query AGAINST
a
text collection (which is simply the columns that are covered
by a FULLTEXT index). For every row in a table it returns
relevance - similarity measure between the text in that row (in the columns
that are part of the collection) and the query. When it is used in a
WHERE
clause (see example above) the rows returned are
automatically sorted with relevance decreasing. Relevance is a non-
negative floating-point number. Zero relevance means no similarity.
Relevance is computed based on number of words in the row and number of
unique words in that row, total number of words in the collection,
number of documents (rows), that contain a particular word, etc.
MySQL uses a very simple parser to split text into words. A "word" is
any sequence of letters, numbers, '
, and _
. Any "word"
that is present in the stopword list or just too short (3 characters
or less) is ignored.
Every correct word in the collection and in the query is weighted, according to its significance in the query or collection. This way, a word that is present in many documents will have lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Otherwise, if the word is rare, it will receive a higher weight. Weights of the words are then combined to compute the relevance.
Such a technique works best with big collections (in fact, it was carefully tuned up this way). For very small tables, word distribution does not reflect adequately their semantical value, and this model may sometimes produce bizarre results.
For example search for the word "search" will produce no results in the above example. Word "search" is present in more than half of rows, and as, such, is effectively treated as stopword (i.e. with semantical value zero). It is, really, the desired behaviour - natural language query should not return every second row in 1GB table.
The word that select 50% of rows has low ability to locate relevant documents (and will find plenty of unrelevant documents also - we all know this happen too often when we are trying to find something in Internet with search engine), and, as such, has low semantical value in this particular dataset.
Go to the first, previous, next, last section, table of contents.