26 lines
896 B
Markdown
26 lines
896 B
Markdown
|
# XML Processor
|
||
|
This script is a Python application for processing XML files. It computes perplexity on sentences and segments in the XML files using the XLM-Roberta model from Hugging Face's transformers library.
|
||
|
The perplexity scores are then added as new attributes in the XML files.
|
||
|
|
||
|
1. Create a Python virtual environment.
|
||
|
```bash
|
||
|
python3 -m venv .venv
|
||
|
```
|
||
|
2. Activate the virtual environment.
|
||
|
```bash
|
||
|
. .venv/bin/activate
|
||
|
```
|
||
|
3. Install the required dependencies.
|
||
|
```bash
|
||
|
pip install -r requirements.txt
|
||
|
```
|
||
|
|
||
|
## Usage
|
||
|
```bash
|
||
|
python xml_processor.py --input_dir <input_dir> --output_dir <output_dir> [--cpu]
|
||
|
```
|
||
|
|
||
|
* `input_dir`: Input directory containing XML files.
|
||
|
* `output_dir`: Output directory to save processed files.
|
||
|
* `cpu`: (Optional) Add this flag to force the script to use the CPU. If not provided, the script will use the GPU if available.
|