Documentation

For Scientists

Use the application to quickly surface rare Calabi-Yau candidates while keeping a reproducible record.

Installation Prerequisites

Quick Install

git clone https://github.com/upggr/compute.upg.gr.git
cd compute.upg.gr
pip install -r requirements.txt

How to Run

Basic Demo

python run_cy_search.py --config default.yml --verify --export-artifacts

Expected Runtime

5-15 minutes on standard hardware, depending on dataset size and CPU performance.

Output Location

All artifacts are saved to the ./output/ directory.

Demo Exports

The Interactive Demo provides download buttons for JSON and CSV artifacts after each run.

Bring Your Own Data

Paste custom candidates as CSV rows in the demo. Each row must include every feature column for the selected dataset, in the dataset's column order.
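To make the row requirement concrete, here is a minimal sketch of how pasted rows could be checked before submission. The column names in FEATURE_COLUMNS and the function parse_candidates are hypothetical, and the sketch assumes every feature is numeric; the real column list depends on the selected dataset.

```python
import csv
import io

# Hypothetical feature columns; the real list depends on the selected dataset.
FEATURE_COLUMNS = ["h11", "h21", "euler"]


def parse_candidates(text, columns=FEATURE_COLUMNS):
    """Parse pasted CSV rows into dicts, rejecting rows with the wrong width."""
    candidates = []
    reader = csv.reader(io.StringIO(text.strip()))
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines between rows
        if len(row) != len(columns):
            raise ValueError(
                f"row {line_no}: expected {len(columns)} values, got {len(row)}"
            )
        # Values are matched to columns purely by position, so order matters.
        candidates.append({name: float(value) for name, value in zip(columns, row)})
    return candidates
```

Because values are matched by position only, a row with the right number of values in the wrong order would pass this check, which is why the column order has to be respected when pasting.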

Command-Line Options

--config CONFIG_FILE  Path to configuration YAML (default: default.yml)
--verify              Verify top results against ground truth
--export-artifacts    Generate CSV/JSON output files
--top-k K             Number of top results to export (default: 100)
--seed SEED           Random seed for reproducibility (default: 42)
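The options above map naturally onto Python's argparse module. The following is a sketch of an equivalent parser, not the actual contents of run_cy_search.py; the function name build_parser is an assumption, while the flags and defaults are taken from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run_cy_search.py")
    parser.add_argument("--config", metavar="CONFIG_FILE", default="default.yml",
                        help="Path to configuration YAML")
    parser.add_argument("--verify", action="store_true",
                        help="Verify top results against ground truth")
    parser.add_argument("--export-artifacts", action="store_true",
                        help="Generate CSV/JSON output files")
    parser.add_argument("--top-k", metavar="K", type=int, default=100,
                        help="Number of top results to export")
    parser.add_argument("--seed", metavar="SEED", type=int, default=42,
                        help="Random seed for reproducibility")
    return parser


# Flags that are omitted fall back to the documented defaults.
args = build_parser().parse_args(["--verify", "--top-k", "25"])
```

argparse converts dashes in option names to underscores, so the parsed values are available as args.export_artifacts and args.top_k.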

Reproducibility Guarantees

Fixed Random Seeds

All stochastic operations (model training, data shuffling) use deterministic seeds specified in the configuration file. The default seed is 42.
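As an illustration of what a fixed seed buys, the sketch below shuffles indices with a private, seeded generator from Python's standard library. The function shuffled_indices is hypothetical and is not taken from the pipeline, which may seed other libraries as well.

```python
import random


def shuffled_indices(n, seed=42):
    """Return a shuffled list of range(n) that is identical for a given seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


# Two calls with the same seed yield the same ordering.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
```

Using a dedicated random.Random instance rather than the module-level functions keeps the ordering stable even if other code draws random numbers in between.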

Pinned Dependencies

The requirements.txt file pins exact versions of all Python packages to ensure identical runtime environments.

pip freeze > requirements.lock # Generate locked dependencies
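A quick way to confirm that a requirements file really is fully pinned is to look for entries that lack an exact == version. The helper below is a sketch under that assumption; unpinned_requirements is a hypothetical name, the package lines are sample input only, and the check deliberately ignores option lines such as "-r other.txt".

```python
import re

# An exact pin looks like "name==version"; extras such as "pkg[extra]" are allowed.
PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==[^=\s]+")


def unpinned_requirements(lines):
    """Return the requirement entries that are not pinned with '=='."""
    unpinned = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if not entry or entry.startswith("-"):
            continue  # blank line, comment-only line, or pip option line
        if not PINNED.match(entry):
            unpinned.append(entry)
    return unpinned


sample = ["numpy==1.26.4", "pandas>=2.0", "# tooling", "", "scikit-learn==1.4.2  # ml"]
loose = unpinned_requirements(sample)  # only the range specifier is reported
```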

Dataset Checksum Verification

Dataset downloads are verified using SHA-256 checksums before processing begins. If the checksum fails, the pipeline halts with an error.

# Expected checksum stored in config
dataset_checksum: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
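The verification step can be sketched with Python's hashlib. This is an illustrative version rather than the pipeline's own code: the function names sha256_of_file and verify_dataset are assumptions, and the optional "sha256:" prefix is handled because the configuration example uses that form.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected):
    """Raise ValueError if the file's SHA-256 digest differs from `expected`."""
    expected = expected.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]  # accept the prefixed form too
    actual = sha256_of_file(path)
    if actual != expected:
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected}, got {actual}"
        )
```

The example value shown above is the SHA-256 digest of empty input, so it would only match a zero-byte file; a real configuration would hold the digest of the actual dataset.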

Run Metadata

Every run generates a repro.md file that records the run's metadata.

Configuration

Edit default.yml to customize pipeline behavior:

dataset:
  url: "https://example.com/cy_dataset.csv"
  checksum: "sha256:e3b0c44..."

model:
  type: "random_forest"
  n_estimators: 100
  max_depth: 10

search:
  top_k: 100
  verification: true

reproducibility:
  seed: 42
  export_metadata: true
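After editing default.yml, it can help to check that no key was dropped by accident. The sketch below works on an already-parsed configuration (a plain dict, however it was loaded) and reports missing entries. Treating every key shown above as required is an assumption, and missing_keys is a hypothetical helper.

```python
# Sections and keys as they appear in the example configuration.
# Assumption: all of them are required by the pipeline.
REQUIRED_KEYS = {
    "dataset": {"url", "checksum"},
    "model": {"type", "n_estimators", "max_depth"},
    "search": {"top_k", "verification"},
    "reproducibility": {"seed", "export_metadata"},
}


def missing_keys(config):
    """Return dotted names of required keys absent from a parsed config dict."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        present = config.get(section) or {}  # a missing section counts as empty
        for key in sorted(keys - set(present)):
            missing.append(f"{section}.{key}")
    return missing


config = {
    "dataset": {"url": "https://example.com/cy_dataset.csv", "checksum": "sha256:e3b0c44..."},
    "model": {"type": "random_forest", "n_estimators": 100},
    "search": {"top_k": 100, "verification": True},
    "reproducibility": {"seed": 42, "export_metadata": True},
}
problems = missing_keys(config)  # max_depth was left out of the model section
```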

Troubleshooting

Checksum Verification Failed

If the dataset checksum doesn't match, the file may be corrupted. Delete cached data and re-download:

rm -rf ./cache/
python run_cy_search.py --config default.yml

Out of Memory

For large datasets, reduce the batch size in the configuration file or run on a machine with more RAM.

Module Not Found

Ensure all dependencies are installed:

pip install -r requirements.txt