Is there a way on Android, which (as far as I know) does not have java.text.Normalizer, to remove the accents from a String? E.g. "éàù" becomes "eau".
I'd like to avoid parsing the String to check each character, if possible!
java.text.Normalizer is available on Android (on recent versions, anyway; it was added in API level 9). You can use it.
EDIT: For reference, here is how to use Normalizer:
    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    string = string.replaceAll("[^\\p{ASCII}]", "");
(pasted from the link in the comments below)
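To make the effect of those two lines easier to see, here is a minimal sketch that wraps them in a method. The class name AsciiFolder and the method name toAscii are made up for this illustration; only Normalizer.normalize and String.replaceAll come from the snippet above.

```java
import java.text.Normalizer;

// Hypothetical wrapper around the two-line snippet; the names are illustrative only.
final class AsciiFolder {

    static String toAscii(String input) {
        if (input == null) {
            return null;
        }
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop everything outside 7-bit ASCII: the combining marks, but also
        // any other non-ASCII character that has no decomposition.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }
}
```

With this, AsciiFolder.toAscii("éàù") returns "eau". Note that characters without a decomposition are removed entirely, so "Straße" becomes "Strae".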
I adapted Rabi's solution to my needs; I hope it helps someone:
    private static Map<Character, Character> MAP_NORM;

    public static String removeAccents(String value)
    {
        if (MAP_NORM == null || MAP_NORM.size() == 0)
        {
            MAP_NORM = new HashMap<Character, Character>();
            MAP_NORM.put('À', 'A');
            MAP_NORM.put('Á', 'A');
            MAP_NORM.put('Â', 'A');
            MAP_NORM.put('Ã', 'A');
            MAP_NORM.put('Ä', 'A');
            MAP_NORM.put('È', 'E');
            MAP_NORM.put('É', 'E');
            MAP_NORM.put('Ê', 'E');
            MAP_NORM.put('Ë', 'E');
            MAP_NORM.put('Í', 'I');
            MAP_NORM.put('Ì', 'I');
            MAP_NORM.put('Î', 'I');
            MAP_NORM.put('Ï', 'I');
            MAP_NORM.put('Ù', 'U');
            MAP_NORM.put('Ú', 'U');
            MAP_NORM.put('Û', 'U');
            MAP_NORM.put('Ü', 'U');
            MAP_NORM.put('Ò', 'O');
            MAP_NORM.put('Ó', 'O');
            MAP_NORM.put('Ô', 'O');
            MAP_NORM.put('Õ', 'O');
            MAP_NORM.put('Ö', 'O');
            MAP_NORM.put('Ñ', 'N');
            MAP_NORM.put('Ç', 'C');
            MAP_NORM.put('ª', 'A');
            MAP_NORM.put('º', 'O');
            MAP_NORM.put('§', 'S');
            MAP_NORM.put('³', '3');
            MAP_NORM.put('²', '2');
            MAP_NORM.put('¹', '1');
            MAP_NORM.put('à', 'a');
            MAP_NORM.put('á', 'a');
            MAP_NORM.put('â', 'a');
            MAP_NORM.put('ã', 'a');
            MAP_NORM.put('ä', 'a');
            MAP_NORM.put('è', 'e');
            MAP_NORM.put('é', 'e');
            MAP_NORM.put('ê', 'e');
            MAP_NORM.put('ë', 'e');
            MAP_NORM.put('í', 'i');
            MAP_NORM.put('ì', 'i');
            MAP_NORM.put('î', 'i');
            MAP_NORM.put('ï', 'i');
            MAP_NORM.put('ù', 'u');
            MAP_NORM.put('ú', 'u');
            MAP_NORM.put('û', 'u');
            MAP_NORM.put('ü', 'u');
            MAP_NORM.put('ò', 'o');
            MAP_NORM.put('ó', 'o');
            MAP_NORM.put('ô', 'o');
            MAP_NORM.put('õ', 'o');
            MAP_NORM.put('ö', 'o');
            MAP_NORM.put('ñ', 'n');
            MAP_NORM.put('ç', 'c');
        }

        if (value == null) {
            return "";
        }

        StringBuilder sb = new StringBuilder(value);
        for (int i = 0; i < value.length(); i++) {
            Character c = MAP_NORM.get(sb.charAt(i));
            if (c != null) {
                sb.setCharAt(i, c.charValue());
            }
        }
        return sb.toString();
    }
This is probably not the most efficient solution, but it does the trick and works on all Android versions:
    private static Map<Character, Character> MAP_NORM;

    static { // Greek characters normalization
        MAP_NORM = new HashMap<Character, Character>();
        MAP_NORM.put('ά', 'α');
        MAP_NORM.put('έ', 'ε');
        MAP_NORM.put('ί', 'ι');
        MAP_NORM.put('ό', 'ο');
        MAP_NORM.put('ύ', 'υ');
        MAP_NORM.put('ή', 'η');
        MAP_NORM.put('ς', 'σ');
        MAP_NORM.put('ώ', 'ω');
        MAP_NORM.put('Ά', 'α');
        MAP_NORM.put('Έ', 'ε');
        MAP_NORM.put('Ί', 'ι');
        MAP_NORM.put('Ό', 'ο');
        MAP_NORM.put('Ύ', 'υ');
        MAP_NORM.put('Ή', 'η');
        MAP_NORM.put('Ώ', 'ω');
    }

    public static String removeAccents(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s);
        for (int i = 0; i < s.length(); i++) {
            Character c = MAP_NORM.get(sb.charAt(i));
            if (c != null) {
                sb.setCharAt(i, c.charValue());
            }
        }
        return sb.toString();
    }
While Guillaume's answer works, it strips all non-ASCII characters from the string. If you want to keep them, try this code (where string is the string to simplify):
    // Convert input string to decomposed Unicode (NFD) so that the
    // diacritical marks used in many European scripts (such as the
    // "C WITH CIRCUMFLEX" → ĉ) become separate characters.
    // Also use compatibility decomposition (K) so that characters,
    // that have the exact same meaning as one or more other
    // characters (such as "㎏" → "kg" or "ﾋ" → "ヒ"), match when
    // comparing them.
    string = Normalizer.normalize(string, Normalizer.Form.NFKD);

    StringBuilder result = new StringBuilder();

    int offset = 0, strLen = string.length();
    while (offset < strLen) {
        int character = string.codePointAt(offset);
        offset += Character.charCount(character);

        // Only process characters that are not combining Unicode
        // characters. This way all the decomposed diacritical marks
        // (and some other not-that-important modifiers), that were
        // part of the original string or produced by the NFKD
        // normalizer above, disappear.
        switch (Character.getType(character)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
                // Some combining character found
                break;

            default:
                // Keep every other character, converted to lower case
                result.appendCodePoint(Character.toLowerCase(character));
        }
    }

    // Since we stripped all combining Unicode characters in the
    // previous while-loop there should be no combining character
    // remaining in the string and the composed and decomposed
    // versions of the string should be equivalent. This also means
    // we do not need to convert the string back to composed Unicode
    // before returning it.
    return result.toString();
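For comparison, the same idea can be written more compactly with a regular expression. This is only a sketch under a few assumptions: the names MarkStripper and stripMarks are invented here, and unlike the loop above this variant does not lowercase the result. It relies on the regex categories \p{Mn} and \p{Mc}, which correspond to Character.NON_SPACING_MARK and Character.COMBINING_SPACING_MARK.

```java
import java.text.Normalizer;

// Hypothetical helper; a regex-based variant of the loop above (without lowercasing).
final class MarkStripper {

    static String stripMarks(String input) {
        if (input == null) {
            return null;
        }
        // NFKD separates base characters from their combining marks and
        // additionally applies compatibility mappings.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // Remove only combining marks (categories Mn and Mc); every other
        // character, ASCII or not, is kept.
        return decomposed.replaceAll("[\\p{Mn}\\p{Mc}]", "");
    }
}
```

Here MarkStripper.stripMarks("Ĉu éàù") returns "Cu eau", "Ωμέγα" becomes "Ωμεγα", and "Straße" is left unchanged, whereas the ASCII-only approach would drop the Greek letters and the "ß".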
The accented characters of Western European languages all lie in the extended ASCII range, with decimal values greater than 127. So you can enumerate all the characters in a string and, whenever a character's decimal code is greater than 127, map it to your desired equivalent. There is no easy way to map accented characters to their unaccented counterparts; you have to keep some kind of map in memory that maps the extended decimal codes to the unaccented characters.
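The lookup described above could be sketched roughly as follows. The class name AccentMap, the method name replaceAccents, and the small selection of mapped characters are assumptions for illustration; a real table would need an entry for every character you care about. The characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the in-memory map approach; the table is deliberately tiny.
final class AccentMap {

    private static final Map<Character, Character> REPLACEMENTS = new HashMap<>();

    static {
        // Parallel strings: the character at index i of "accented" maps to
        // the character at index i of "plain".
        // Order: a-grave, a-acute, a-circumflex, e-grave, e-acute,
        // e-circumflex, u-grave, u-acute, u-circumflex, c-cedilla.
        String accented = "\u00E0\u00E1\u00E2\u00E8\u00E9\u00EA\u00F9\u00FA\u00FB\u00E7";
        String plain = "aaaeeeuuuc";
        for (int i = 0; i < accented.length(); i++) {
            REPLACEMENTS.put(accented.charAt(i), plain.charAt(i));
        }
    }

    static String replaceAccents(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Plain ASCII (0..127) never needs a lookup.
            Character mapped = (c > 127) ? REPLACEMENTS.get(c) : null;
            if (mapped != null) {
                sb.append(mapped.charValue());
            } else {
                // ASCII, or a character with no entry in the table: keep it.
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this table, AccentMap.replaceAccents("éàù") returns "eau", while a character that has no entry, such as "ß", passes through unchanged.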