896 B
896 B
XML Processor
This script is a Python application for processing XML files. It computes perplexity on sentences and segments in the XML files using the XLM-Roberta model from Hugging Face's transformers library. The perplexity scores are then added as new attributes in the XML files.
- Create a Python virtual environment.
python3 -m venv .venv
- Activate the virtual environment.
. .venv/bin/activate
- Install the required dependencies.
pip install -r requirements.txt
Usage
python xml_processor.py --input_dir <input_dir> --output_dir <output_dir> [--cpu]
input_dir
: Input directory containing XML files.output_dir
: Output directory to save processed files.cpu
: (Optional) Add this flag to force the script to use the CPU. If not provided, the script will use the GPU if available.