webentwicklung-frage-antwort-db.com.de

Entfernen Sie Akzente aus String

Gibt es in Android eine Möglichkeit, die (meines Wissens) nicht über Java.text.Normalizer verfügt, um Akzente aus einem String zu entfernen. E.g "éàù" wird zu "eau".

Ich möchte vermeiden, den String zu analysieren, um jedes Zeichen zu prüfen, wenn möglich!

32
Johann

Java.text.Normalizer gibt es in Android (auf den neuesten Versionen sowieso). Du kannst es benutzen.

EDITAls Referenz verwenden Sie Normalizer:

string = Normalizer.normalize(string, Normalizer.Form.NFD);
string = string.replaceAll("[^\\p{ASCII}]", "");

(eingefügt aus dem Link in den Kommentaren unten)

78
Guillaume

Ich habe die Lösung von Rabi an meine Bedürfnisse angepasst, ich hoffe, es hilft jemandem:

private static Map<Character, Character> MAP_NORM;
public static String removeAccents(String value)
{
    if (MAP_NORM == null || MAP_NORM.size() == 0)
    {
        MAP_NORM = new HashMap<Character, Character>();
        MAP_NORM.put('À', 'A');
        MAP_NORM.put('Á', 'A');
        MAP_NORM.put('Â', 'A');
        MAP_NORM.put('Ã', 'A');
        MAP_NORM.put('Ä', 'A');
        MAP_NORM.put('È', 'E');
        MAP_NORM.put('É', 'E');
        MAP_NORM.put('Ê', 'E');
        MAP_NORM.put('Ë', 'E');
        MAP_NORM.put('Í', 'I');
        MAP_NORM.put('Ì', 'I');
        MAP_NORM.put('Î', 'I');
        MAP_NORM.put('Ï', 'I');
        MAP_NORM.put('Ù', 'U');
        MAP_NORM.put('Ú', 'U');
        MAP_NORM.put('Û', 'U');
        MAP_NORM.put('Ü', 'U');
        MAP_NORM.put('Ò', 'O');
        MAP_NORM.put('Ó', 'O');
        MAP_NORM.put('Ô', 'O');
        MAP_NORM.put('Õ', 'O');
        MAP_NORM.put('Ö', 'O');
        MAP_NORM.put('Ñ', 'N');
        MAP_NORM.put('Ç', 'C');
        MAP_NORM.put('ª', 'A');
        MAP_NORM.put('º', 'O');
        MAP_NORM.put('§', 'S');
        MAP_NORM.put('³', '3');
        MAP_NORM.put('²', '2');
        MAP_NORM.put('¹', '1');
        MAP_NORM.put('à', 'a');
        MAP_NORM.put('á', 'a');
        MAP_NORM.put('â', 'a');
        MAP_NORM.put('ã', 'a');
        MAP_NORM.put('ä', 'a');
        MAP_NORM.put('è', 'e');
        MAP_NORM.put('é', 'e');
        MAP_NORM.put('ê', 'e');
        MAP_NORM.put('ë', 'e');
        MAP_NORM.put('í', 'i');
        MAP_NORM.put('ì', 'i');
        MAP_NORM.put('î', 'i');
        MAP_NORM.put('ï', 'i');
        MAP_NORM.put('ù', 'u');
        MAP_NORM.put('ú', 'u');
        MAP_NORM.put('û', 'u');
        MAP_NORM.put('ü', 'u');
        MAP_NORM.put('ò', 'o');
        MAP_NORM.put('ó', 'o');
        MAP_NORM.put('ô', 'o');
        MAP_NORM.put('õ', 'o');
        MAP_NORM.put('ö', 'o');
        MAP_NORM.put('ñ', 'n');
        MAP_NORM.put('ç', 'c');
    }

    if (value == null) {
        return "";
    }

    StringBuilder sb = new StringBuilder(value);

    for(int i = 0; i < value.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();

}
4
Juarez Schulz

Dies ist wahrscheinlich nicht die effizienteste Lösung, aber es wird den Trick erfüllen und funktioniert in allen Android-Versionen:

private static Map<Character, Character> MAP_NORM;
static { // Greek characters normalization
    MAP_NORM = new HashMap<Character, Character>();
    MAP_NORM.put('ά', 'α');
    MAP_NORM.put('έ', 'ε');
    MAP_NORM.put('ί', 'ι');
    MAP_NORM.put('ό', 'ο');
    MAP_NORM.put('ύ', 'υ');
    MAP_NORM.put('ή', 'η');
    MAP_NORM.put('ς', 'σ');
    MAP_NORM.put('ώ', 'ω');
    MAP_NORM.put('Ά', 'α');
    MAP_NORM.put('Έ', 'ε');
    MAP_NORM.put('Ί', 'ι');
    MAP_NORM.put('Ό', 'ο');
    MAP_NORM.put('Ύ', 'υ');
    MAP_NORM.put('Ή', 'η');
    MAP_NORM.put('Ώ', 'ω');
}

public static String removeAccents(String s) {
    if (s == null) {
        return null;
    }
    StringBuilder sb = new StringBuilder(s);

    for(int i = 0; i < s.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();
}
2
Rabi

Während Guillaumes Antwort funktioniert, werden alle Nicht-ASCII-Zeichen aus der Zeichenfolge entfernt. Wenn Sie diese beibehalten möchten, versuchen Sie diesen Code (wobei string die zu vereinfachende Zeichenfolge ist):

// Convert input string to decomposed Unicode (NFD) so that the
// diacritical marks used in many European scripts (such as the
// "C WITH CIRCUMFLEX" → ĉ) become separate characters.
// Also use compatibility decomposition (K) so that characters,
// that have the exact same meaning as one or more other
// characters (such as "㎏" → "kg" or "ヒ" → "ヒ"), match when
// comparing them.
string = Normalizer.normalize(string, Normalizer.Form.NFKD);

StringBuilder result = new StringBuilder();

int offset = 0, strLen = string.length();
while(offset < strLen) {
    int character = string.codePointAt(offset);
    offset += Character.charCount(character);

    // Only process characters that are not combining Unicode
    // characters. This way all the decomposed diacritical marks
    // (and some other not-that-important modifiers), that were
    // part of the original string or produced by the NFKD
    // normalizer above, disappear.
    switch(Character.getType(character)) {
        case Character.NON_SPACING_MARK:
        case Character.COMBINING_SPACING_MARK:
            // Some combining character found
        break;

        default:
            result.appendCodePoint(Character.toLowerCase(character));
    }
}

// Since we stripped all combining Unicode characters in the
// previous while-loop there should be no combining character
// remaining in the string and the composed and decomposed
// versions of the string should be equivalent. This also means
// we do not need to convert the string back to composed Unicode
// before returning it.
return result.toString();
2
ntninja

Alle Diagrammzeichen mit Akzent befinden sich im erweiterten Zeichensatz ASCII mit Dezimalwerten größer als 127. Sie können also alle Zeichen in einer Zeichenfolge auflisten. Wenn der Dezimalzeichencodewert größer als 127 ist, ordnen Sie ihn Ihrem zu gewünschtes Äquivalent. Es gibt keine einfache Möglichkeit, Akzentzeichen den Gegenstücken ohne Akzent zuzuordnen. Sie müssen sich eine Art Karte im Speicher halten, um die erweiterten Dezimalcodes den nicht akzentuierten Zeichen zuordnen zu können.

0
Mike Marshall