I searched the slug tag on SO and found only two compelling solutions:
Both are only partial solutions to the problem. I could code this up myself, but I'm surprised that no solution exists yet.
Is there a slugify algorithm implementation in C# and/or .NET that properly handles Latin characters, Unicode, and various other language issues?
http://predicatet.blogspot.com/2009/04/improved-c-slug-generator-or-how-to.html
public static string GenerateSlug(this string phrase)
{
    // ToLowerInvariant avoids culture-specific casing surprises (e.g. the Turkish dotless i)
    string str = phrase.RemoveAccent().ToLowerInvariant();
    // invalid chars
    str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
    // convert multiple spaces into one space
    str = Regex.Replace(str, @"\s+", " ").Trim();
    // cut and trim
    str = str.Substring(0, str.Length <= 45 ? str.Length : 45).Trim();
    str = Regex.Replace(str, @"\s", "-"); // hyphens
    return str;
}

public static string RemoveAccent(this string txt)
{
    // Note: on .NET Core and later, code-page encodings such as this one are only
    // available after registering System.Text.CodePagesEncodingProvider.
    byte[] bytes = System.Text.Encoding.GetEncoding("Cyrillic").GetBytes(txt);
    return System.Text.Encoding.ASCII.GetString(bytes);
}
Here is a way to generate a URL slug in C#. This function removes all accents (Marcel's answer), replaces spaces, removes invalid characters, trims dashes and underscores from the ends, and collapses repeated occurrences of "-" or "_".
Code:
public static string ToUrlSlug(string value)
{
    // First to lower case
    value = value.ToLowerInvariant();
    // Remove all accents
    var bytes = Encoding.GetEncoding("Cyrillic").GetBytes(value);
    value = Encoding.ASCII.GetString(bytes);
    // Replace spaces
    value = Regex.Replace(value, @"\s", "-", RegexOptions.Compiled);
    // Remove invalid chars (the hyphen goes last in the class so it is unambiguously a literal)
    value = Regex.Replace(value, @"[^a-z0-9\s_-]", "", RegexOptions.Compiled);
    // Trim dashes and underscores from both ends
    value = value.Trim('-', '_');
    // Replace double occurrences of - or _
    value = Regex.Replace(value, @"([-_]){2,}", "$1", RegexOptions.Compiled);
    return value;
}
Here is my rendition, based on Joan's and Marcel's answers. I made the following changes:
Here is the code:
public class UrlSlugger
{
    // white space, em-dash, en-dash, underscore
    static readonly Regex WordDelimiters = new Regex(@"[\s—–_]", RegexOptions.Compiled);

    // characters that are not valid
    static readonly Regex InvalidChars = new Regex(@"[^a-z0-9\-]", RegexOptions.Compiled);

    // multiple hyphens
    static readonly Regex MultipleHyphens = new Regex(@"-{2,}", RegexOptions.Compiled);

    public static string ToUrlSlug(string value)
    {
        // convert to lower case
        value = value.ToLowerInvariant();
        // remove diacritics (accents)
        value = RemoveDiacritics(value);
        // ensure all word delimiters are hyphens
        value = WordDelimiters.Replace(value, "-");
        // strip out invalid characters
        value = InvalidChars.Replace(value, "");
        // replace multiple hyphens (-) with a single hyphen
        value = MultipleHyphens.Replace(value, "-");
        // trim hyphens (-) from ends
        return value.Trim('-');
    }

    /// See: http://www.siao2.com/2007/05/14/2629747.aspx
    private static string RemoveDiacritics(string stIn)
    {
        string stFormD = stIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        for (int ich = 0; ich < stFormD.Length; ich++)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(stFormD[ich]);
            }
        }
        return (sb.ToString().Normalize(NormalizationForm.FormC));
    }
}
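To make the limits of the normalization approach concrete, here is a minimal stand-alone sketch of the same decompose-and-filter idea. The class name DiacriticsDemo and the sample words are mine, chosen only for illustration; the point is that letters with a canonical decomposition lose their accents, while letters such as "ø" have no decomposition and pass through unchanged (and would then be dropped by an a-z filter like InvalidChars).

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Decompose each character (FormD), skip the combining marks,
    // then recompose whatever is left (FormC).
    public static string Strip(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, DiacriticsDemo.Strip("crème brûlée") should give "creme brulee", whereas DiacriticsDemo.Strip("smørrebrød") comes back as "smørrebrød", so such letters need an explicit replacement table if they are to survive in the slug.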
This still doesn't solve the problem with non-Latin characters. A completely different approach would be to use Uri.EscapeDataString to convert the string into its percent-encoded (hexadecimal) representation:
string original = "测试公司";
// %E6%B5%8B%E8%AF%95%E5%85%AC%E5%8F%B8
string converted = Uri.EscapeDataString(original);
Then use the result to generate a hyperlink:
<a href="http://www.example.com/100/%E6%B5%8B%E8%AF%95%E5%85%AC%E5%8F%B8">
测试公司
</a>
Many browsers will display the Chinese characters in the address bar, but based on my limited testing this is not universally supported.
NOTE: For Uri.EscapeDataString to work this way, iriParsing must be enabled.
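As a rough sketch of how this could be wrapped up, the helper below builds a URL from an id and a title and shows that the encoding is reversible. The names PercentEncodingDemo and BuildUrl, and the URL layout, are assumptions made for illustration; only Uri.EscapeDataString and Uri.UnescapeDataString are framework calls.

```csharp
using System;

public static class PercentEncodingDemo
{
    // Percent-encodes a single path segment (UTF-8 bytes written as %XX).
    public static string Encode(string segment)
    {
        return Uri.EscapeDataString(segment);
    }

    // Reverses Encode, e.g. to recover the title on the server side.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }

    // Hypothetical URL layout: {base}/{id}/{encoded title}
    public static string BuildUrl(string baseUrl, int id, string title)
    {
        return baseUrl.TrimEnd('/') + "/" + id + "/" + Encode(title);
    }
}
```

Unlike a slug, the encoded segment keeps every character of the title, including spaces (as %20) and punctuation, so it trades readability of the raw URL for losslessness.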
EDIT
For those looking to generate URL slugs in C#, I recommend reading this related question:
How does Stack Overflow generate its SEO-friendly URLs?
It's what I ended up using for my project.
One problem I've had with slugification (new word!) is collisions. If, for example, I have one blog post titled "Stack-Overflow" and another titled "Stack Overflow", the slugs of the two titles are identical. So my slug generator usually has to involve the database in some way. This may be why you don't find more general-purpose solutions.
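One common way to deal with such collisions is to append a numeric suffix until the slug is free. The sketch below is only an illustration under assumptions: SlugUniquifier and MakeUnique are hypothetical names, and the exists callback stands in for whatever database lookup the application actually uses.

```csharp
using System;

public static class SlugUniquifier
{
    // 'exists' is a hypothetical lookup; in a real application it would
    // query the database for a row that already uses the given slug.
    public static string MakeUnique(string slug, Func<string, bool> exists)
    {
        if (!exists(slug))
        {
            return slug;
        }

        // Try slug-2, slug-3, ... until a free one is found.
        for (int n = 2; ; n++)
        {
            string candidate = slug + "-" + n;
            if (!exists(candidate))
            {
                return candidate;
            }
        }
    }
}
```

With "stack-overflow" and "stack-overflow-2" already taken, MakeUnique("stack-overflow", ...) would return "stack-overflow-3". A check-then-insert like this can still race under concurrent writes, so a unique constraint on the slug column is probably still needed as a backstop.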
Here is my shot at it. It supports:
Code:
/// <summary>
/// Defines a set of utilities for creating slug urls.
/// </summary>
public static class Slug
{
    /// <summary>
    /// Creates a slug from the specified text.
    /// </summary>
    /// <param name="text">The text. If null is specified, null will be returned.</param>
    /// <returns>
    /// A slugged text.
    /// </returns>
    public static string Create(string text)
    {
        return Create(text, (SlugOptions)null);
    }

    /// <summary>
    /// Creates a slug from the specified text.
    /// </summary>
    /// <param name="text">The text. If null is specified, null will be returned.</param>
    /// <param name="options">The options. May be null.</param>
    /// <returns>A slugged text.</returns>
    public static string Create(string text, SlugOptions options)
    {
        if (text == null)
            return null;

        if (options == null)
        {
            options = new SlugOptions();
        }

        string normalised;
        if (options.EarlyTruncate && options.MaximumLength > 0 && text.Length > options.MaximumLength)
        {
            normalised = text.Substring(0, options.MaximumLength).Normalize(NormalizationForm.FormD);
        }
        else
        {
            normalised = text.Normalize(NormalizationForm.FormD);
        }

        int max = options.MaximumLength > 0 ? Math.Min(normalised.Length, options.MaximumLength) : normalised.Length;
        StringBuilder sb = new StringBuilder(max);
        for (int i = 0; i < normalised.Length; i++)
        {
            char c = normalised[i];
            UnicodeCategory uc = char.GetUnicodeCategory(c);
            if (options.AllowedUnicodeCategories.Contains(uc) && options.IsAllowed(c))
            {
                switch (uc)
                {
                    case UnicodeCategory.UppercaseLetter:
                        if (options.ToLower)
                        {
                            c = options.Culture != null ? char.ToLower(c, options.Culture) : char.ToLowerInvariant(c);
                        }
                        sb.Append(options.Replace(c));
                        break;

                    case UnicodeCategory.LowercaseLetter:
                        if (options.ToUpper)
                        {
                            c = options.Culture != null ? char.ToUpper(c, options.Culture) : char.ToUpperInvariant(c);
                        }
                        sb.Append(options.Replace(c));
                        break;

                    default:
                        sb.Append(options.Replace(c));
                        break;
                }
            }
            else if (uc == UnicodeCategory.NonSpacingMark)
            {
                // don't add a separator
            }
            else
            {
                if (options.Separator != null && !EndsWith(sb, options.Separator))
                {
                    sb.Append(options.Separator);
                }
            }

            if (options.MaximumLength > 0 && sb.Length >= options.MaximumLength)
                break;
        }

        string result = sb.ToString();
        if (options.MaximumLength > 0 && result.Length > options.MaximumLength)
        {
            result = result.Substring(0, options.MaximumLength);
        }

        if (!options.CanEndWithSeparator && options.Separator != null && result.EndsWith(options.Separator))
        {
            result = result.Substring(0, result.Length - options.Separator.Length);
        }

        return result.Normalize(NormalizationForm.FormC);
    }

    private static bool EndsWith(StringBuilder sb, string text)
    {
        if (sb.Length < text.Length)
            return false;

        for (int i = 0; i < text.Length; i++)
        {
            if (sb[sb.Length - 1 - i] != text[text.Length - 1 - i])
                return false;
        }
        return true;
    }
}
/// <summary>
/// Defines options for the Slug utility class.
/// </summary>
public class SlugOptions
{
    /// <summary>
    /// Defines the default maximum length. Currently equal to 80.
    /// </summary>
    public const int DefaultMaximumLength = 80;

    /// <summary>
    /// Defines the default separator. Currently equal to "-".
    /// </summary>
    public const string DefaultSeparator = "-";

    private bool _toLower;
    private bool _toUpper;

    /// <summary>
    /// Initializes a new instance of the <see cref="SlugOptions"/> class.
    /// </summary>
    public SlugOptions()
    {
        MaximumLength = DefaultMaximumLength;
        Separator = DefaultSeparator;
        AllowedUnicodeCategories = new List<UnicodeCategory>();
        AllowedUnicodeCategories.Add(UnicodeCategory.UppercaseLetter);
        AllowedUnicodeCategories.Add(UnicodeCategory.LowercaseLetter);
        AllowedUnicodeCategories.Add(UnicodeCategory.DecimalDigitNumber);
        AllowedRanges = new List<KeyValuePair<short, short>>();
        AllowedRanges.Add(new KeyValuePair<short, short>((short)'a', (short)'z'));
        AllowedRanges.Add(new KeyValuePair<short, short>((short)'A', (short)'Z'));
        AllowedRanges.Add(new KeyValuePair<short, short>((short)'0', (short)'9'));
    }

    /// <summary>
    /// Gets the allowed unicode categories list.
    /// </summary>
    /// <value>
    /// The allowed unicode categories list.
    /// </value>
    public virtual IList<UnicodeCategory> AllowedUnicodeCategories { get; private set; }

    /// <summary>
    /// Gets the allowed ranges list.
    /// </summary>
    /// <value>
    /// The allowed ranges list.
    /// </value>
    public virtual IList<KeyValuePair<short, short>> AllowedRanges { get; private set; }

    /// <summary>
    /// Gets or sets the maximum length.
    /// </summary>
    /// <value>
    /// The maximum length.
    /// </value>
    public virtual int MaximumLength { get; set; }

    /// <summary>
    /// Gets or sets the separator.
    /// </summary>
    /// <value>
    /// The separator.
    /// </value>
    public virtual string Separator { get; set; }

    /// <summary>
    /// Gets or sets the culture for case conversion.
    /// </summary>
    /// <value>
    /// The culture.
    /// </value>
    public virtual CultureInfo Culture { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether the string can end with a separator string.
    /// </summary>
    /// <value>
    /// <c>true</c> if the string can end with a separator string; otherwise, <c>false</c>.
    /// </value>
    public virtual bool CanEndWithSeparator { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether the string is truncated before normalization.
    /// </summary>
    /// <value>
    /// <c>true</c> if the string is truncated before normalization; otherwise, <c>false</c>.
    /// </value>
    public virtual bool EarlyTruncate { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether to lowercase the resulting string.
    /// </summary>
    /// <value>
    /// <c>true</c> if the resulting string must be lowercased; otherwise, <c>false</c>.
    /// </value>
    public virtual bool ToLower
    {
        get
        {
            return _toLower;
        }
        set
        {
            _toLower = value;
            if (_toLower)
            {
                _toUpper = false;
            }
        }
    }

    /// <summary>
    /// Gets or sets a value indicating whether to uppercase the resulting string.
    /// </summary>
    /// <value>
    /// <c>true</c> if the resulting string must be uppercased; otherwise, <c>false</c>.
    /// </value>
    public virtual bool ToUpper
    {
        get
        {
            return _toUpper;
        }
        set
        {
            _toUpper = value;
            if (_toUpper)
            {
                _toLower = false;
            }
        }
    }

    /// <summary>
    /// Determines whether the specified character is allowed.
    /// </summary>
    /// <param name="character">The character.</param>
    /// <returns>true if the character is allowed; false otherwise.</returns>
    public virtual bool IsAllowed(char character)
    {
        foreach (var p in AllowedRanges)
        {
            if (character >= p.Key && character <= p.Value)
                return true;
        }
        return false;
    }

    /// <summary>
    /// Replaces the specified character by a given string.
    /// </summary>
    /// <param name="character">The character to replace.</param>
    /// <returns>a string.</returns>
    public virtual string Replace(char character)
    {
        return character.ToString();
    }
}