From the MySQL manual:
For any Unicode character set, operations performed using the xxx_general_ci collation are faster than those for the xxx_unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci.
They have an amusing “examples of the effect of collation” set on “sorting German umlauts,” but it unhelpfully uses latin1_* collations. Another table is more helpful and explains:
A difference between the collations is that this is true for utf8_general_ci:
ß = s
Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):
ß = ss
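To make that difference concrete, here is a toy Python sketch of the two comparison styles. It is not how MySQL implements either collation: the lookup tables and the function names (fold, equal_general, equal_unicode) are made up for illustration and cover only ß. The idea it models is that a general_ci-style comparison maps each character to exactly one character, while a unicode_ci-style comparison lets one character expand into several.

```python
# Toy model only: these tables are illustrative assumptions,
# not MySQL's real collation weight tables.
GENERAL_FOLD = {"ß": "s"}   # one character maps to exactly one character
UNICODE_FOLD = {"ß": "ss"}  # one character may expand to several


def fold(text, table):
    """Lowercase the text, then replace each character using the table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_general(a, b):
    """Compare the way a one-to-one, general_ci-style folding would."""
    return fold(a, GENERAL_FOLD) == fold(b, GENERAL_FOLD)


def equal_unicode(a, b):
    """Compare the way an expanding, unicode_ci-style folding would."""
    return fold(a, UNICODE_FOLD) == fold(b, UNICODE_FOLD)


if __name__ == "__main__":
    print(equal_general("ß", "s"))             # True
    print(equal_general("ß", "ss"))            # False
    print(equal_unicode("ß", "ss"))            # True
    print(equal_unicode("Straße", "STRASSE"))  # True
```

The one-to-one table is cheaper to apply, which fits the manual's "faster, but slightly less correct" description, though this sketch says nothing about real-world timings.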
This forum post adds more info, but nowhere do they explain how a ☃ sorts against ☁ or ⛅.
How much faster is utf8_general_ci than utf8_unicode_ci, though? An August 2010 message in the MySQL forums seems to suggest that specific operations could be 30% faster, but it then dismisses the difference as unimportant compared to good indexing and efficient queries.