Gašper Spagnolo cb3c07b428
Fix the pasring script
2023-08-03 13:36:16 +02:00
plots Fix the pasring script 2023-08-03 13:36:16 +02:00
.gitignore Push to git 2023-07-05 00:56:05 +02:00
Kranjska-xml-perplexity.zip Push to git 2023-07-05 00:56:05 +02:00
Makefile Add results of perplexity scanning 2023-07-17 11:08:19 +02:00
README.md Push to git 2023-07-05 00:56:05 +02:00
parse.py Fix the pasring script 2023-08-03 13:36:16 +02:00
rate.py Add results of perplexity scanning 2023-07-17 11:08:19 +02:00
requirements.txt Push to git 2023-07-05 00:56:05 +02:00

README.md

XML Processor

XML Processor is a Python script for processing XML files. It computes the perplexity of the sentences and segments in each file using the XLM-RoBERTa model from Hugging Face's transformers library, then writes the scores back into the XML files as new attributes.
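As a rough illustration of the idea, the sketch below turns per-token log-probabilities into a perplexity score and writes it onto XML elements as an attribute. It is not the repository's code: the `s` tag, the `perplexity` attribute name, and the `toy_score` stand-in (used here in place of the XLM-RoBERTa model so the example runs without downloads) are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET


def perplexity_from_log_probs(log_probs):
    """Return exp of the negative mean of the token log-probabilities."""
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def annotate(root, score_fn, tag="s", attr="perplexity"):
    """Set `attr` on every non-empty `tag` element to score_fn(text)."""
    for elem in root.iter(tag):
        text = "".join(elem.itertext()).strip()
        if text:  # skip elements with no text to score
            elem.set(attr, f"{score_fn(text):.4f}")
    return root


def toy_score(text):
    # Hypothetical stand-in for a language model: pretends every
    # whitespace-separated token was predicted with probability 0.5.
    return perplexity_from_log_probs([math.log(0.5)] * len(text.split()))


doc = ET.fromstring("<text><s>Dober dan vsem.</s><s></s></text>")
annotate(doc, toy_score)
print(ET.tostring(doc, encoding="unicode"))
```

In a real run, `toy_score` would be replaced by a function that queries the model for token probabilities; the attribute-writing step stays the same.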

Installation

  1. Create a Python virtual environment.
    python3 -m venv .venv
    
  2. Activate the virtual environment.
    . .venv/bin/activate
    
  3. Install the required dependencies.
    pip install -r requirements.txt
    

Usage

python xml_processor.py --input_dir <input_dir> --output_dir <output_dir> [--cpu]
  • --input_dir: Directory containing the XML files to process.
  • --output_dir: Directory where the processed files are saved.
  • --cpu: (Optional) Forces the script to run on the CPU. Without this flag, the script uses the GPU if one is available.
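The options above could be wired up roughly as follows. This is a minimal sketch, not the script's actual argument handling: marking both directories as required and the `pick_device` helper are assumptions, and GPU availability is passed in as a plain boolean so the example has no dependencies.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Add perplexity scores to XML files.")
    # Assumption: both directories are mandatory.
    parser.add_argument("--input_dir", required=True,
                        help="Input directory containing XML files.")
    parser.add_argument("--output_dir", required=True,
                        help="Output directory to save processed files.")
    parser.add_argument("--cpu", action="store_true",
                        help="Force CPU even if a GPU is available.")
    return parser


def pick_device(force_cpu, gpu_available):
    # Hypothetical helper. gpu_available is a parameter to keep the sketch
    # dependency-free; a PyTorch script would typically obtain it from
    # torch.cuda.is_available().
    return "cuda" if gpu_available and not force_cpu else "cpu"


args = build_parser().parse_args(["--input_dir", "in", "--output_dir", "out", "--cpu"])
print(pick_device(args.cpu, gpu_available=True))
```

Using `action="store_true"` makes `--cpu` a plain switch that defaults to `False`, which matches the "GPU if available" default described above.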