xml-ocr/README.md

26 lines
896 B
Markdown

# XML Processor
This script is a Python application for processing XML files. It computes perplexity on sentences and segments in the XML files using the XLM-Roberta model from Hugging Face's transformers library.
The perplexity scores are then added as new attributes in the XML files.
1. Create a Python virtual environment.
```bash
python3 -m venv .venv
```
2. Activate the virtual environment.
```bash
. .venv/bin/activate
```
3. Install the required dependencies.
```bash
pip install -r requirements.txt
```
## Usage
```bash
python xml_processor.py --input_dir <input_dir> --output_dir <output_dir> [--cpu]
```
* `input_dir`: Input directory containing XML files.
* `output_dir`: Output directory to save processed files.
* `cpu`: (Optional) Add this flag to force the script to use the CPU. If not provided, the script will use the GPU if available.