How to Configure Full-Text Stopwords & Stoplist in MySQL

By: Sunil Kumar |  In: MySQL  |  Last Updated: 2017/09/20

MySQL stopwords are a list of very common English words, such as some, about, by, and from. A stopword never matches when it appears in a full-text search string.
MySQL ignores stopwords in full-text search because they are likely to occur in almost every string, so they give no useful signal for calculating relevancy. Skipping them reduces result pollution and improves efficiency.
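
The effect is easiest to see with a small experiment. The sketch below assumes an InnoDB table with the default stopword list; the table name articles, the index name ft_body, and the sample rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical table and data, used only to illustrate stopword filtering.
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT,
  FULLTEXT KEY ft_body (body)
) ENGINE = InnoDB;

INSERT INTO articles (body) VALUES
  ('A short note about database tuning'),
  ('Tuning tips from the field');

-- 'about' is in the default stopword list, so it is not indexed and
-- this search is expected to return no rows.
SELECT id, body FROM articles
WHERE MATCH(body) AGAINST('about' IN NATURAL LANGUAGE MODE);

-- 'tuning' is not a stopword, so rows containing it can match.
SELECT id, body FROM articles
WHERE MATCH(body) AGAINST('tuning' IN NATURAL LANGUAGE MODE);
```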

NOTE- Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case insensitive if the collation is latin1_swedish_ci, whereas lookups are case sensitive if the collation is latin1_general_cs or latin1_bin.
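
To find out which case the note above applies to, you can check the server collation. This is a minimal check; the suffix of the returned name (_ci, _cs, or _bin) indicates case-insensitive, case-sensitive, or binary comparison.

```sql
-- Show the collation the server uses by default.
SELECT @@collation_server;
```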

Stopwords for InnoDB Search Indexes

To see the default InnoDB stopword list, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

mysql > SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

You can also define your own stopword list for all InnoDB tables. To do so, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the innodb_ft_server_stopword_table option to a value of the form db_name/table_name before creating the full-text index. The stopword table must be an InnoDB table with a single VARCHAR column named value.
The process looks like this.
Create a table to store your stopwords:

mysql > CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;

Now insert stopwords into this table:

mysql > INSERT INTO my_stopwords(value) VALUES ('stopword1');

Now set the innodb_ft_server_stopword_table option to the new stopword table:

mysql > SET GLOBAL innodb_ft_server_stopword_table = 'database_name/my_stopwords';

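That’s it. Your new stopword list is ready. Full-text indexes created from this point on use it; an index that already exists must be dropped and re-created to pick up the change.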
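
To confirm the list is in effect, you can build an index after setting the option and run a search. The following is a sketch under assumptions: the table notes, the index ft_body, and the sample row are hypothetical, and database_name stands for whichever database holds my_stopwords. SET GLOBAL also requires an account with the privilege to change global system variables.

```sql
-- Point InnoDB at the custom list (same statement as in the steps above).
SET GLOBAL innodb_ft_server_stopword_table = 'database_name/my_stopwords';

-- Hypothetical table used only for this check.
CREATE TABLE notes (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT
) ENGINE = InnoDB;

INSERT INTO notes (body) VALUES ('this row mentions stopword1 and widgets');

-- Create the index AFTER the option is set so it uses the custom list.
CREATE FULLTEXT INDEX ft_body ON notes (body);

-- 'stopword1' is in my_stopwords, so this search should find nothing.
SELECT id FROM notes WHERE MATCH(body) AGAINST('stopword1');

-- 'widgets' is not in the list, so this search should find the row.
SELECT id FROM notes WHERE MATCH(body) AGAINST('widgets');
```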

NOTE- By default, words less than 3 characters in length or greater than 84 characters in length do not appear in an InnoDB full-text search index. Maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables.
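
You can inspect the current limits with the query below. Note that both variables are startup options rather than dynamic ones, so changing them means editing the server configuration, restarting the server, and rebuilding the affected full-text indexes.

```sql
-- Lists innodb_ft_min_token_size and innodb_ft_max_token_size.
SHOW VARIABLES LIKE 'innodb_ft_%_token_size';
```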

Stopwords for MyISAM Search Indexes

To override/alter the default stopword list for MyISAM tables, set the ft_stopword_file system variable.
The variable value should be the pathname of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.
The stopword file is free-form: stopwords are separated by any nonalphanumeric character such as a newline, space, or comma. The exceptions are the underscore character (_) and a single apostrophe ('), which are treated as part of a word.
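
As a rough sketch of the MyISAM workflow described above, you can check which stopword file the server is using and, once the server has been restarted with the new setting, rebuild the index. The table name my_myisam_table is hypothetical.

```sql
-- Show the stopword file currently in use for MyISAM full-text indexes.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- After restarting the server with the new ft_stopword_file value,
-- rebuild the FULLTEXT indexes of each affected MyISAM table.
REPAIR TABLE my_myisam_table QUICK;
```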
The following table shows the default list of stopwords for MyISAM search indexes.

a's able about above according
accordingly across actually after afterwards
again against ain't all allow
allows almost alone along already
also although always am among
amongst an and another any
anybody anyhow anyone anything anyway
anyways anywhere apart appear appreciate
appropriate are aren't around as
aside ask asking associated at
available away awfully be became
because become becomes becoming been
before beforehand behind being believe
below beside besides best better
between beyond both brief but
by c'mon c's came can
can't cannot cant cause causes
certain certainly changes clearly co
com come comes concerning consequently
consider considering contain containing contains
corresponding could couldn't course currently
definitely described despite did didn't
different do does doesn't doing
don't done down downwards during
each edu eg eight either
else elsewhere enough entirely especially
et etc even ever every
everybody everyone everything everywhere ex
exactly example except far few
fifth first five followed following
follows for former formerly forth
four from further furthermore get
gets getting given gives go
goes going gone got gotten
greetings had hadn't happens hardly
has hasn't have haven't having
he he's hello help hence
her here here's hereafter hereby
herein hereupon hers herself hi
him himself his hither hopefully
how howbeit however i'd i'll
i'm i've ie if ignored
immediate in inasmuch inc indeed
indicate indicated indicates inner insofar
instead into inward is isn't
it it'd it'll it's its
itself just keep keeps kept
know known knows last lately
later latter latterly least less
lest let let's like liked
likely little look looking looks
ltd mainly many may maybe
me mean meanwhile merely might
more moreover most mostly much
must my myself name namely
nd near nearly necessary need
needs neither never nevertheless new
next nine no nobody non
none noone nor normally not
nothing novel now nowhere obviously
of off often oh ok
okay old on once one
ones only onto or other
others otherwise ought our ours
ourselves out outside over overall
own particular particularly per perhaps
placed please plus possible presumably
probably provides que quite qv
rather rd re really reasonably
regarding regardless regards relatively respectively
right said same saw say
saying says second secondly see
seeing seem seemed seeming seems
seen self selves sensible sent
serious seriously seven several shall
she should shouldn't since six
so some somebody somehow someone
something sometime sometimes somewhat somewhere
soon sorry specified specify specifying
still sub such sup sure
t's take taken tell tends
th than thank thanks thanx
that that's thats the their
theirs them themselves then thence
there there's thereafter thereby therefore
therein theres thereupon these they
they'd they'll they're they've think
third this thorough thoroughly those
though three through throughout thru
thus to together too took
toward towards tried tries truly
try trying twice two un
under unfortunately unless unlikely until
unto up upon us use
used useful uses using usually
value various very via viz
vs want wants was wasn't
way we we'd we'll we're
we've welcome well went were
weren't what what's whatever when
whence whenever where where's whereafter
whereas whereby wherein whereupon wherever
whether which while whither who
who's whoever whole whom whose
why will willing wish with
within without won't wonder would
wouldn't yes yet you you'd
you'll you're you've your yours
yourself yourselves zero
