Supported Unicode Character Classes

This section lists the Unicode character classes that the regular expression engine supports. Refer to a class in a pattern by its one- or two-letter name inside \p{...}. For example, to match a currency symbol:

\p{Sc}
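Matching itself is done by the regular expression engine, but you can illustrate what \p{Sc} selects outside it. The sketch below uses Python's standard `unicodedata.category`, which returns the same two-letter names, to collect the characters a two-letter class would match. `find_general_category` is a hypothetical helper, not part of the engine, and the Unicode version bundled with Python may differ from the engine's.

```python
import unicodedata


def find_general_category(text: str, category: str) -> list[str]:
    """Hypothetical helper: return the characters of `text` whose Unicode
    general category equals the two-letter name `category` (e.g. "Sc").

    This mimics what a class such as \\p{Sc} selects; it uses Python's
    unicodedata module, not the regular expression engine.
    """
    return [ch for ch in text if unicodedata.category(ch) == category]


prices = "Total: $12, €9, £3, ¥450"
print(find_general_category(prices, "Sc"))  # ['$', '€', '£', '¥']
```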

The supported classes are:

  • C - Other
  • Cc - Control
  • Cf - Format
  • Co - PrivateUse
  • Cs - Surrogate
  • L - Letter
  • Ll - LowercaseLetter
  • Lm - ModifierLetter
  • Lo - OtherLetter
  • Lt - TitlecaseLetter
  • Lu - UppercaseLetter
  • M - Mark
  • Mc - SpacingMark
  • Me - EnclosingMark
  • Mn - NonSpacingMark
  • N - Number
  • Nd - DecimalNumber
  • Nl - LetterNumber
  • No - OtherNumber
  • P - Punctuation
  • Pc - ConnectorPunctuation
  • Pd - DashPunctuation
  • Pe - ClosePunctuation
  • Pf - FinalPunctuation
  • Pi - InitialPunctuation
  • Po - OtherPunctuation
  • Ps - OpenPunctuation
  • S - Symbol
  • Sc - CurrencySymbol
  • Sk - ModifierSymbol
  • Sm - MathSymbol
  • So - OtherSymbol
  • Z - Separator
  • Zl - LineSeparator
  • Zp - ParagraphSeparator
  • Zs - SpaceSeparator
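Each one-letter class in the list groups the two-letter classes that share its first letter. For example, L covers Ll, Lm, Lo, Lt and Lu. The following minimal sketch models that relationship with Python's `unicodedata` rather than the engine. The names `SUPPORTED_CLASSES`, `TWO_LETTER_CLASSES` and `in_class` are hypothetical, and the table contains only the classes listed above. Unicode's own C category also includes Cn (unassigned), which this list does not mention, so the sketch leaves Cn out. How the engine treats unassigned code points is not stated here.

```python
import unicodedata

# The two-letter classes listed above, grouped under their one-letter parent.
SUPPORTED_CLASSES = {
    "C": {"Cc", "Cf", "Co", "Cs"},
    "L": {"Ll", "Lm", "Lo", "Lt", "Lu"},
    "M": {"Mc", "Me", "Mn"},
    "N": {"Nd", "Nl", "No"},
    "P": {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"},
    "S": {"Sc", "Sk", "Sm", "So"},
    "Z": {"Zl", "Zp", "Zs"},
}

# Every two-letter class name that appears in the list.
TWO_LETTER_CLASSES = set().union(*SUPPORTED_CLASSES.values())


def in_class(ch: str, name: str) -> bool:
    """Hypothetical check: does the single character `ch` belong to class `name`?"""
    category = unicodedata.category(ch)  # e.g. "Lu" for "A"
    if name in SUPPORTED_CLASSES:
        # One-letter class: the union of its listed two-letter subclasses.
        return category in SUPPORTED_CLASSES[name]
    if name in TWO_LETTER_CLASSES:
        return category == name
    raise ValueError(f"unsupported class name: {name!r}")


print(in_class("A", "L"), in_class("A", "Ll"))  # True False
print(in_class("-", "P"), in_class("+", "S"))   # True True
```

Raising an error for names outside the list mirrors the idea that only the listed classes are supported.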

Each character's class is its General_Category as published by unicode.org in the Unicode Character Database; see the third field of each line in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
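UnicodeData.txt stores one code point per line as semicolon-separated fields. Field 0 is the code point in hexadecimal, field 1 is the name, and field 2 is the two-letter general category. Large blocks such as the Hangul syllables appear as a pair of `<..., First>` and `<..., Last>` lines rather than one line per code point. The sketch below defines a hypothetical `parse_general_categories` function that reads lines in this format and expands such ranges. It runs on a few sample lines written in the file's format, not on the full file.

```python
from collections.abc import Iterable


def parse_general_categories(lines: Iterable[str]) -> dict[int, str]:
    """Hypothetical parser: map each code point to its General_Category
    (field 2 of a UnicodeData.txt line), expanding First/Last range pairs."""
    categories: dict[int, str] = {}
    range_start = None  # code point of a pending '<..., First>' line
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = code_point
        elif name.endswith(", Last>") and range_start is not None:
            # Every code point in the range shares the same category.
            for cp in range(range_start, code_point + 1):
                categories[cp] = category
            range_start = None
        else:
            categories[code_point] = category
    return categories


SAMPLE = """\
0024;DOLLAR SIGN;Sc;0;ET;;;;;N;;;;;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""

table = parse_general_categories(SAMPLE.splitlines())
print(table[0x24], table[0x41], table[0xB000])  # Sc Lu Lo
print(len(table))  # 11175 (3 single entries + 11172 Hangul syllables)
```

To build the full table, pass the lines of a downloaded copy of the file instead, for example `open("UnicodeData.txt", encoding="utf-8")`.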