Answer a question

I have some strings that contain a mix of English and non-English letters. For example:

w='_1991_اف_جي2'

How can I recognize these types of string using Regex or any other fast method in Python?

I prefer not to compare letters of the string one by one with a list of letters, but to do this in one shot and quickly.

Answers

You can check whether the string consists only of ASCII characters (the unaccented Latin alphabet plus digits, punctuation, and control characters). If it cannot be decoded as ASCII, it contains at least one non-ASCII character, such as a letter from another alphabet or an accented Latin letter.

Note the comment # -*- coding: utf-8 -*-. In Python 2 it must appear at the top of the file, otherwise the non-ASCII string literals cause an encoding error. Python 3 reads source files as UTF-8 by default, so the comment is optional there.

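# -*- coding: utf-8 -*-
def isEnglish(s):
    """Return True if s contains only ASCII characters."""
    try:
        # Encode to UTF-8 bytes, then try to read those bytes as ASCII.
        # Any non-ASCII character produces a byte >= 0x80 and fails here.
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True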

assert not isEnglish('slabiky, ale liší se podle významu')
assert isEnglish('English')
assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
assert not isEnglish('how about this one : 通 asfަ')
assert isEnglish('?fd4))45s&')
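
Since the question asks about a regex, the same ASCII-only check can also be sketched with a single compiled pattern. This is a minimal sketch rather than part of the answer above: the names has_non_ascii and is_ascii_only are hypothetical, and "English" is taken to mean "every character is in the ASCII range", the same assumption isEnglish makes.

```python
import re

# Matches any single character outside the ASCII range (code point > 0x7F).
_NON_ASCII = re.compile(r'[^\x00-\x7F]')


def has_non_ascii(s):
    """Return True if s contains at least one non-ASCII character."""
    return _NON_ASCII.search(s) is not None


def is_ascii_only(s):
    """Return True if every character in s is ASCII (hypothetical helper)."""
    return not has_non_ascii(s)


# The example string from the question contains Arabic letters.
w = '_1991_اف_جي2'
assert has_non_ascii(w)
assert is_ascii_only('English')
assert is_ascii_only('?fd4))45s&')
```

The pattern is searched once over the whole string, so there is no explicit per-letter comparison against a list. On Python 3.7 and later, the built-in str.isascii() method performs the same check without a regex.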