
Grab the lines where the tags were not replaced in a text file

I have run into a big problem in my code.

TL;DR: After a few comments, I decided to post the entire code here:

https://repl.it/repls/AustereShinyBetatest

Here is my code:

def highlight_nonmodified(content: str) -> str:
    regex = re.compile(r'(?s)(\{.*?[^\}]+\})', re.I | re.S)
    replace = r'#\1'
    content = regex.sub(replace, content)
    return content


def get_line(string_t: str, original: str) -> int:
    original = original.splitlines(True)
    for (i, line) in enumerate(original, 1):
        if string_t[1:] in line:
            return i

    return -1


def highligh_merge(original: str, modified: str) -> str:
    for line in modified.splitlines(True):
        if line.startswith('#'):
            numer = get_line(line, original)
            error = r"#Tag not supported at line{0}\n".format(numer)
            error = error + line
            modified = modified.replace(line, error)

My problem is that the following happens:

textfile.txt (original):

1. Here goes some text. {tag} A wonderful day. It's soon cristmas. 
2. Happy 2019, soon. {Some useful tag!} Something else goes here. 
3. Happy ending. Yeppe! See you. 
4. 
5  Happy KKK! 
6. Happy B-Day!
7 
8. Universe is cool!
9.
10. {Tagish}. 
11.
12. {Slugish}. Here goes another line. {Slugish} since this is a new sentence. 
13.
14. endline.

Modified.txt:

Here goes some text.  A wonderful day. It's soon cristmas. 
Happy 2019, soon. #{Some useful tag!} Something else goes here. 
Happy ending. Yeppe! See you. 

Happy KKK! 
Happy B-Day!

Universe is cool!

. 

#Error: Tag not supported at line-1\n#{Slugish}. Here goes another line. #{Slugish} since this is a new sentence. 

endline. 

I can't seem to get accurate line numbering or an accurate line-number comparison. What am I doing wrong here? I keep two copies of the text, the original and the modified one. Then I pick out each flagged line and try to look up its line number in the original text by looping over it line by line. Still no success. Is this even possible? Thanks in advance!

4
John Smith

I don't think this is possible if multi-line blocks of text have been removed. However, if you control the tagging process, you can include the original line number in the tag:

{ foo:12 }

and then recovering it is trivial:

original = int(re.search(r'\d+', tag).group(0))
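The idea can be tried in isolation with a minimal sketch. The names `PLAIN_TAG`, `NUMBERED_TAG`, `annotate` and `original_line` are hypothetical helpers made up for this illustration, and the sample text is invented. It shows that a number stored inside the tag survives the removal of earlier lines:

```python
import re

# Matches a plain tag such as {foo}; the pattern is an assumption for this sketch.
PLAIN_TAG = re.compile(r'\{(?P<tag_value>[^}]+)\}')
# Matches an annotated tag such as {foo:12} and captures the trailing number.
NUMBERED_TAG = re.compile(r'\{[^}]+:(?P<line_no>\d+)\}')


def annotate(content: str) -> str:
    """Append the 1-based line number to every tag, e.g. {foo} -> {foo:4}."""
    lines = content.splitlines(True)
    return ''.join(
        PLAIN_TAG.sub(r'{\g<tag_value>:%d}' % idx, line)
        for idx, line in enumerate(lines, 1)
    )


def original_line(text: str) -> int:
    """Return the line number stored in the first annotated tag, or -1."""
    match = NUMBERED_TAG.search(text)
    return int(match.group('line_no')) if match else -1


text = "intro\nfiller one\nfiller two\nsome text {foo} here\n"
annotated = annotate(text)

# Simulate a later edit that deletes the two filler lines.
kept = annotated.splitlines(True)
shortened = ''.join(kept[:1] + kept[3:])

tag_line = shortened.splitlines()[1]
print(tag_line)                 # the tag now sits on line 2 of the shortened text
print(original_line(tag_line))  # but it still carries its original line number
```

Anchoring the number on `:digits}` rather than on the first run of digits keeps the lookup correct for tags whose own text contains numbers.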

This modified version of your code:

import re                                                                                                                        


def annotate_tags(content: str) -> str:                                                                                          
    """Annotate tags with line numbers."""                                                                                       
    tag_pattern = re.compile(r'(\{(?P<tag_value>[^}]+)\})')                                                                      
    lines = content.splitlines(True)                                                                                             
    annotated_lines = []                                                                                                         
    for idx, line in enumerate(lines, 1):                                                                                        
        annotated_lines.append(tag_pattern.sub(r'{\g<tag_value>:%s}' % idx, line))                                               
    annotated = ''.join(annotated_lines)                                                                                         
    return annotated                                                                                                             


def modify(content: str) -> str:                                                                                                 
    supported_tags = {                                                                                                           
            re.compile(r'(\{tag:\d+\})'): r'',                                                                                   
            re.compile(r'(\{Tagish:\d+\})'): r''                                                                                 
    }                                                                                                                            

    for pattern, replace in supported_tags.items():                                                                              
        matches = pattern.findall(content)                                                                                       
        if matches:                                                                                                              
            content = pattern.sub(replace, content)                                                                              

    return content                                                                                                               


def highlight_nonmodified(content: str) -> str:                                                                                  
    regex = re.compile(r'(?s)(\{.*?[^\}]+\})', re.I | re.S)                                                                      
    replace = r'#\1'                                                                                                             
    content = regex.sub(replace, content)                                                                                        
    return content                                                                                                               


def get_line(string_t: str, original: str) -> int:                                                                               
    tag_pattern = re.compile(r'(\{[^}]+:(?P<line_no>\d+)\})')                                                                    
    match = tag_pattern.search(string_t)                                                                                         
    if match:                                                                                                                    
        return int(match.group('line_no'))
    return -1                                                                                                                    


def highlight_merge(original: str, modified: str) -> str:                                                                        
    # re.S is already passed as a flag; an inline (?s) after '#' is an error on Python 3.11+.
    tag_regex = re.compile(r'#(\{.*?[^\}]+\})', re.I | re.S)
    for line in modified.splitlines(True):                                                                                       
        if tag_regex.search(line):                                                                                               
            numer = get_line(line, original)                                                                                     
            error = "#Tag not supported at line{0}\n".format(numer)                                                              
            error = error + line
            modified = modified.replace(line, error)
    return modified


if __name__ == '__main__':
    file = 'textfile.txt'
    raw = ""
    with open(file, 'rt', encoding='utf-8') as f:
        for i, s in enumerate(f, 1):
            raw += "{}. {}".format(i, s)

    original = modified = raw

    modified = annotate_tags(modified)
    modified = modify(modified)
    modified = highlight_nonmodified(modified)
    modified = highlight_merge(original, modified)

    with open("modified.txt", 'w', encoding='utf-8') as f:
        f.write(modified)

produces this output:

1. Here goes some text.  A wonderful day. It's soon cristmas. 
#Tag not supported at line2
2. Happy 2019, soon. #{Some useful tag!:2} Something else goes here. 
3. Happy ending. Yeppe! See you. 
4. 
#Tag not supported at line5
5. #{begin:5}
6. Happy KKK! 
7. Happy B-Day!
#Tag not supported at line8
8. #{end:8}
9. 
10. Universe is cool!
11. 
12. . 
13. 
#Tag not supported at line14
14. #{Slugish:14}. Here goes another line. #{Slugish:14} since this is a new sentence. 
15. 
16. endline.
4
snakecharmerb

Below is a short script that imports the files, cleans the data, builds enumerated dictionaries, and outputs the results (optionally, based on the print_results variable).

(If I have misinterpreted your question, let me know!)

import re
from os import path

# Input variables. base_path is just the directory where your files are located.
# If they are in different directories, then use a second variable.
base_path = r'C:\..\[folder containing text files]'
original_filename = 'test_text.txt'
modified_filename = 'modified_text.txt'


def import_data(file_name, root=base_path):
    """
    Read each text file into a list of lines.
    """
    full_path = path.join(root, file_name)

    # The with statement closes the file on exit, so no explicit close() is needed.
    with open(full_path, 'r') as f:
        data = f.readlines()

    return data


def remove_numbering(line):
    """
    RegEx to clean data; This will remove only the line numbers and not
    any subsequent number-period combinations in the line.
    """
    p = re.compile(r'^([0-9]+[.]?\s)')
    return p.sub('', line)


def text_dict(text_list):
    """
    Remove numbering from either file; Considers period punctuation following number.
    """
    new_text = [remove_numbering(i).lstrip() for i in text_list]
    return {idx+1:val for idx, val in enumerate(new_text)}


def compare_files(original, modified):
    """
    Return the line numbers whose text differs between the two dictionaries.
    """
    # Ensure that data types are dictionaries.
    if not (isinstance(original, dict) and isinstance(modified, dict)):
        raise TypeError('original and modified must both be dictionaries')

    # Use list comprehension to compare lines in each file.
    # A line missing from the modified file (modified.get returns None)
    # also counts as a difference.
    return [idx for idx in original if original[idx] != modified.get(idx)]


def comparison_findings(missing_list, original_dict, modified_dict):
    print('Modifications found on lines:\n- ' + '\n- '.join([str(i) for i in missing_list]))
    print('\n\n\tOriginal:\n')
    max_len = max([len(original_dict[i].replace('\n','').rstrip()) for i in original_dict.keys() if i in missing_list])
    print('\t\t{0:^7}{1:^{x}}'.format('Line','Value',x=max_len))
    for i in missing_list:
        temp_val = original_dict[i].replace('\n','').rstrip()
        print('\t\t{0:>5}{1:2}{2:<{x}}'.format(str(i), '', temp_val, x=max_len))
    print('\n\n\tModified:\n')
    max_len = max([len(modified_dict[i].replace('\n','').rstrip()) for i in modified_dict.keys() if i in missing_list])
    print('\t\t{0:^7}{1:^{x}}'.format('Line','Value',x=max_len))
    for i in missing_list:
        temp_val = modified_dict[i].replace('\n','').rstrip()
        print('\t\t{0:>5}{1:2}{2:<{x}}'.format(str(i), '', temp_val, x=max_len))



if __name__ == '__main__':
    print_results = True

    # Import text files.
    orig_data = import_data(original_filename)
    mod_data = import_data(modified_filename)

    # Create enumerated dictionaries from text files.
    _original = text_dict(orig_data)
    _modified = text_dict(mod_data)

    # Get a list of modified lines.
    mod_list = compare_files(_original, _modified)

    # Output results of file comparison.
    if print_results:
        comparison_findings(mod_list, _original, _modified)
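
To make the cleaning and comparison steps concrete, here is a condensed sketch of the same idea on in-memory lines instead of files. The sample lines are invented for illustration, and the helpers are shortened variants of the functions in the script, not a replacement for them:

```python
import re


def remove_numbering(line: str) -> str:
    """Strip a leading line number such as '12. ' or '5 ' from a line."""
    return re.sub(r'^([0-9]+[.]?\s)', '', line)


def text_dict(lines):
    """Map 1-based line numbers to the cleaned line text."""
    cleaned = [remove_numbering(line).lstrip() for line in lines]
    return {idx: val for idx, val in enumerate(cleaned, 1)}


original = text_dict([
    '1. Here goes some text. {tag} A wonderful day.\n',
    '2. Happy ending.\n',
    '3. {Tagish}.\n',
])
modified = text_dict([
    'Here goes some text.  A wonderful day.\n',
    'Happy ending.\n',
    '.\n',
])

# Lines whose cleaned text differs between the two versions.
changed = [idx for idx in original if original[idx] != modified.get(idx)]
print(changed)
```

This comparison is positional, so it assumes both files have the same number of lines in the same order. A line that starts with digits followed by whitespace would also lose those digits during cleaning.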
2
Mark Moretto

When you call the get_line function in highligh_merge, you pass it the modified line variable, so line is never found in the original text file. If you look at the value of line:

#{Slugish}. Here goes another line. #{Slugish} since this is a new sentence.

you can see that this is clearly not contained in the original textfile.txt. Therefore a line number of -1 is returned.

One solution to this problem would be to change the for loop in your highligh_merge function from:
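
The failed lookup can be reproduced with a small sketch. It reuses the substring check from the question's get_line on an invented three-line original. Note that `string_t[1:]` removes only the first `#`, so in this example it is the second `#` that prevents the match:

```python
def get_line(string_t: str, original: str) -> int:
    """Substring lookup as in the question: drop the first character, then search."""
    for i, text in enumerate(original.splitlines(True), 1):
        if string_t[1:] in text:
            return i
    return -1


# Invented sample data, shaped like the question's files.
original = (
    "Happy ending.\n"
    "{Tagish}.\n"
    "{Slugish}. Here goes another line. {Slugish} since this is a new sentence.\n"
)

single_tag = "#{Tagish}.\n"
double_tag = "#{Slugish}. Here goes another line. #{Slugish} since this is a new sentence.\n"

single_result = get_line(single_tag, original)
double_result = get_line(double_tag, original)
print(single_result)  # found, because stripping one '#' restores the original line
print(double_result)  # not found, because the second '#' is still in the search string
```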

for line in modified.splitlines(True):

to:

for numer, line in enumerate(modified.splitlines(True)):

Now numer equals the line count minus 1 in each iteration. Simply use numer + 1 to get the exact line number of the line currently being processed.

I hope this helps. :)
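
Put together, the change could look like the sketch below. The function name `highlight_merge` and the sample text are assumptions for illustration. The reported number is the line's position in the modified text, which matches the original only as long as no lines were added or removed:

```python
def highlight_merge(modified: str) -> str:
    """Insert an error line before every line that starts with '#'."""
    result = []
    for numer, line in enumerate(modified.splitlines(True)):
        if line.startswith('#'):
            # numer is zero-based, so add 1 for a human-readable line number.
            result.append("#Tag not supported at line {0}\n".format(numer + 1))
        result.append(line)
    return ''.join(result)


modified = "Happy ending.\n\n#{Slugish}. Here goes another line.\nendline.\n"
merged = highlight_merge(modified)
print(merged)
```

Collecting the output in a list, instead of calling `modified.replace(line, error)`, avoids annotating every copy of a line when the same text appears more than once.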

1