Module wchartype

wchartype Retrieves character types of double-byte (full-width) characters.


Version: 0.1

Author: Ryan Ginstrom

License: MIT

Functions

is_asian(char)
Is the character Asian?

is_full_width(char)
Is the character full width? It will be full width if it's Asian or an ideographic space.

is_kanji(char)
Returns whether char is kanji (or Chinese)

is_hanzi(char)
Returns whether char is kanji (or Chinese)

is_hiragana(char)
Returns whether char is hiragana

is_katakana(char)
Returns whether char is katakana

is_half_katakana(char)
Returns whether char is half-width katakana

is_hangul(char)
Returns whether char is hangul

is_full_punct(char)
Returns whether char is full-width punctuation

is_full_digit(char)
Returns whether char is full-width digit

is_full_letter(char)
Returns whether char is full-width letter.

_test()
Run doc tests

Variables

  __description__ = 'Retrieves character types of double-byte ch...
  IDEOGRAPHIC_SPACE = 12288

Function Details

is_asian(char)

Is the character Asian?
>>> is_asian('a')
False
>>> is_asian(u'\u65e5')
True
>>> is_asian(unichr(0x3000))
False

is_full_width(char)

Is the character full width? It will be full width if it's Asian or an ideographic space.
>>> is_full_width('a')
False
>>> is_full_width(u'\u65e5')
True
>>> is_full_width(unichr(0x3000))
True
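
For comparison, Python's standard library exposes the Unicode East Asian Width property, which gives the same answers for the three doctest inputs above. The sketch below is not how wchartype is implemented. `is_wide` is a hypothetical name, and the code targets Python 3, where `str` is already Unicode (the doctests on this page use Python 2's `u''` literals and `unichr`).

```python
import unicodedata


def is_wide(char):
    """Hypothetical helper: True if Unicode classifies char as Wide or Fullwidth."""
    # east_asian_width returns one of 'W', 'F', 'H', 'Na', 'N', 'A'.
    return unicodedata.east_asian_width(char) in ("W", "F")


print(is_wide("a"))        # False
print(is_wide("\u65e5"))   # True  (日)
print(is_wide("\u3000"))   # True  (ideographic space)
```

Half-width katakana carry width class `H`, so this check treats them as not wide. Whether that agrees with is_full_width for every input is not stated on this page.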

is_kanji(char)

Returns whether char is kanji (or Chinese)
>>> is_kanji(u'\u4E40')
True
>>> is_kanji(u"a")
False

is_hanzi(char)

Returns whether char is hanzi (Chinese characters, known as kanji in Japanese)
>>> is_hanzi(u'\u4E40')
True
>>> is_hanzi(u"a")
False

is_hiragana(char)

Returns whether char is hiragana
>>> is_hiragana(u'a')
False
>>> is_hiragana(u'\u308F') # わ
True
>>> is_hiragana(u'\u30EA') # リ
False

is_katakana(char)

Returns whether char is katakana
>>> is_katakana(u'$')
False
>>> is_katakana(u'\u30EA') # リ
True
>>> is_katakana(u'\u308F') # わ
False

is_half_katakana(char)

Returns whether char is half-width katakana
>>> is_half_katakana(u'$')
False
>>> is_half_katakana(u'\uFF91') # ﾑ
True
>>> is_half_katakana(u'\u30EA') # リ
False

is_hangul(char)

Returns whether char is hangul
>>> is_hangul(u'1')
False

# halfwidth hangul
>>> is_hangul(u'\uFFB8') # HALFWIDTH HANGUL LETTER CIEUC
True

# fullwidth hangul
>>> is_hangul(u'\uB973') # 륳
True
>>> is_hangul(u'\u30EA') # リ
False
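
The script predicates above (is_kanji through is_hangul) can be approximated with the standard library by inspecting Unicode character names. This is only a sketch under that assumption, not the module's own logic: `script_of` and `PREFIXES` are hypothetical names, the labels are chosen for illustration, and edge cases such as prolonged sound marks may be classified differently by wchartype. The code is written for Python 3.

```python
import unicodedata

# Hypothetical mapping from Unicode name prefixes to script labels.
PREFIXES = [
    ("HALFWIDTH KATAKANA", "half_katakana"),
    ("HALFWIDTH HANGUL", "hangul"),
    ("HIRAGANA", "hiragana"),
    ("KATAKANA", "katakana"),
    ("HANGUL", "hangul"),
    ("CJK UNIFIED IDEOGRAPH", "kanji"),
]


def script_of(char):
    """Hypothetical helper: return a script label for char, or None."""
    # Unnamed code points fall back to an empty string and match nothing.
    name = unicodedata.name(char, "")
    for prefix, label in PREFIXES:
        if name.startswith(prefix):
            return label
    return None


for ch in ["\u308f", "\u30ea", "\uff91", "\uffb8", "\ub973", "\u65e5", "1"]:
    print(repr(ch), script_of(ch))
```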

is_full_punct(char)

Returns whether char is full-width punctuation
>>> is_full_punct(u'$')
False
>>> is_full_punct(u'\uFF05') # ％
True
>>> is_full_punct(u'\uFF1E') # ＞
True
>>> is_full_punct(u'\uFF3D') # ］
True
>>> is_full_punct(u'\uFF5B') # ｛
True
>>> is_full_punct(u'\u30EA') # リ
False

is_full_digit(char)

Returns whether char is full-width digit
>>> is_full_digit(u'1')
False
>>> is_full_digit(u'\uFF15') # ５
True
>>> is_full_digit(u'\uFF05') # ％
False
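
A common follow-up to detecting full-width digits is folding them to their ASCII equivalents. The sketch below assumes the check is a code-point range test over U+FF10 to U+FF19 (FULLWIDTH DIGIT ZERO to NINE). The page does not show how is_full_digit is implemented, and `is_fullwidth_digit` and `fold_digits` are hypothetical names. The code is written for Python 3.

```python
FULL_ZERO = 0xFF10  # FULLWIDTH DIGIT ZERO
FULL_NINE = 0xFF19  # FULLWIDTH DIGIT NINE


def is_fullwidth_digit(char):
    """Hypothetical range check, assumed to mirror is_full_digit."""
    return FULL_ZERO <= ord(char) <= FULL_NINE


def fold_digits(text):
    """Replace full-width digits with ASCII digits; leave other characters alone."""
    return "".join(
        chr(ord(c) - FULL_ZERO + ord("0")) if is_fullwidth_digit(c) else c
        for c in text
    )


print(fold_digits("\uff12\uff10\uff12\uff14\u5e74"))  # 2024年
```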

is_full_letter(char)

Returns whether char is a full-width letter. This differs from the built-in isalpha method for strings, which also returns True for CJK characters.
>>> is_full_letter(u'\u308F') # hiragana wa (わ)
False
>>> u'\u308F'.isalpha() # hiragana wa (わ)
True
>>> is_full_letter(u'A')
False
>>> is_full_letter(u'\uFF31') # Ｑ
True
>>> is_full_letter(u'\uFF4A') # ｊ
True
>>> is_full_letter(u'\u30EA') # リ
False
>>> is_full_letter(u'\uFF15') # ５
False
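
The contrast with isalpha can be reproduced without the module. The sketch below assumes a full-width letter means a code point in the Fullwidth Latin ranges U+FF21 to U+FF3A (capitals) or U+FF41 to U+FF5A (small letters). That definition is an assumption, not taken from the wchartype source, and `is_fullwidth_letter` is a hypothetical name. The code is written for Python 3.

```python
def is_fullwidth_letter(char):
    """Hypothetical range check for Fullwidth Latin letters."""
    cp = ord(char)
    return 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A


# isalpha accepts hiragana and full-width letters alike,
# so it cannot tell the two apart; the range check can.
for ch in ["\u308f", "A", "\uff31", "\uff4a", "\uff15"]:
    print(repr(ch), ch.isalpha(), is_fullwidth_letter(ch))
```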

Variables Details

__description__

Value:
'Retrieves character types of double-byte characters.'