Documentation

For Scientists

Use the application to quickly surface rare Calabi-Yau candidates while keeping a reproducible record.

Installation

Quick Install

git clone https://github.com/upggr/compute.upg.gr.git
cd compute.upg.gr
pip install -r requirements.txt

How to Run

Basic Run

python cy_search_real.py

Web App

python app.py

Expected Runtime

5-15 minutes on standard hardware, depending on dataset size and CPU performance.

Output Location

All artifacts are saved to the ./output/ directory.

Run Exports

The Run page provides download buttons for JSON/CSV plus tool-friendly exports for CYTools, cymetric, Sage, and Mathematica.

Import / Export (API)

Use the REST API to import datasets or export results programmatically.

# Import: score custom candidates and save a run
curl -X POST https://compute.upg.gr/api/score-custom -H "Content-Type: application/json" \
-d '{"dataset_id":"kreuzer-skarke","rows":[[12,45,66,3.75,924]],"top_k":20,"seed":42,"verify":true,"save":true}'
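The same import call can be made from Python using only the standard library. This is a minimal sketch: the endpoint URL and payload fields come from the curl example above, while the helper names (build_payload, score_custom) are illustrative, not part of the API.

```python
import json
import urllib.request

API_URL = "https://compute.upg.gr/api/score-custom"

def build_payload(rows, dataset_id="kreuzer-skarke", top_k=20, seed=42):
    """Assemble the JSON body expected by /api/score-custom."""
    return {
        "dataset_id": dataset_id,
        "rows": rows,
        "top_k": top_k,
        "seed": seed,
        "verify": True,
        "save": True,
    }

def score_custom(rows):
    """POST custom candidate rows and return the parsed response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(rows)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```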

# Export: fetch results in tool-friendly formats
curl -o results.json "https://compute.upg.gr/api/export/RUN_ID?format=json"
curl -o results.csv "https://compute.upg.gr/api/export/RUN_ID?format=csv"
curl -o cytools.json "https://compute.upg.gr/api/export/RUN_ID?format=cytools"
curl -o cymetric.json "https://compute.upg.gr/api/export/RUN_ID?format=cymetric"
curl -o candidates.sage "https://compute.upg.gr/api/export/RUN_ID?format=sage"
curl -o candidates.wl "https://compute.upg.gr/api/export/RUN_ID?format=mathematica"
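For scripted exports, the per-format downloads above can be looped over in Python. A minimal sketch, assuming only the URL pattern and filenames shown in the curl commands; the helper names are illustrative.

```python
import urllib.parse
import urllib.request

BASE = "https://compute.upg.gr/api/export"

# Format name -> suggested local filename, as in the curl examples above.
EXPORTS = {
    "json": "results.json",
    "csv": "results.csv",
    "cytools": "cytools.json",
    "cymetric": "cymetric.json",
    "sage": "candidates.sage",
    "mathematica": "candidates.wl",
}

def export_url(run_id, fmt):
    """Build the export URL for a given run and format."""
    return f"{BASE}/{urllib.parse.quote(run_id)}?format={fmt}"

def download_all(run_id):
    """Fetch every export format for one run."""
    for fmt, filename in EXPORTS.items():
        urllib.request.urlretrieve(export_url(run_id, fmt), filename)
```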

Per-Tool Schemas

Exports share the same candidate objects, packaged per tool. Each candidate includes the fields returned by the dataset formatter.

# Kreuzer-Skarke candidate fields
rank, h11, h21, euler_char, score, verified_target

# CY5-Folds candidate fields
rank, h11, h21, h31, euler_char, score, verified_target

# Heterotic candidate fields
rank, h11, h21, euler_char, hodge_balance, n_generations, score, verified_target

# CYTools / cymetric JSON wrapper
{ "schema": "cytools-candidates-v1", "run_metadata": {...}, "candidates": [ ... ] }
{ "schema": "cymetric-candidates-v1", "run_metadata": {...}, "candidates": [ ... ] }

# Sage output
candidates = [ { ... }, { ... } ]

# Mathematica output
candidates = {<| "rank" -> 1, "h11" -> 12, ... |>, ...};
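Since the CYTools and cymetric JSON wrappers share one shape, downstream code can parse them generically. A sketch under the schema shown above; the sample values (including euler_char = 2(h11 - h21) = -66) and the helper name are illustrative, not taken from a real export.

```python
import json

# A minimal CYTools-style export, matching the wrapper shown above.
sample = """
{
  "schema": "cytools-candidates-v1",
  "run_metadata": {"seed": 42},
  "candidates": [
    {"rank": 1, "h11": 12, "h21": 45, "euler_char": -66,
     "score": 3.75, "verified_target": true}
  ]
}
"""

def top_candidates(raw, k=10):
    """Parse an export blob and return the k best-ranked candidates."""
    doc = json.loads(raw)
    if not doc["schema"].endswith("-candidates-v1"):
        raise ValueError(f"unexpected schema: {doc['schema']}")
    return sorted(doc["candidates"], key=lambda c: c["rank"])[:k]
```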

Local Export Scripts

Use the adapter script to convert a saved results JSON file into tool-specific formats.

python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format cytools --output cytools.json
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format cymetric --output cymetric.json
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format sage --output candidates.sage
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format mathematica --output candidates.wl
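The four adapter invocations above can be wrapped in a short batch script. A sketch assuming only the flags shown (--input, --format, --output); the helper names and the format-to-filename table are illustrative.

```python
import subprocess

# Format -> output filename, mirroring the adapter invocations above.
TARGETS = {
    "cytools": "cytools.json",
    "cymetric": "cymetric.json",
    "sage": "candidates.sage",
    "mathematica": "candidates.wl",
}

def adapter_cmd(results_json, fmt):
    """Build the export_adapters.py command line for one format."""
    return [
        "python", "scripts/export_adapters.py",
        "--input", results_json,
        "--format", fmt,
        "--output", TARGETS[fmt],
    ]

def convert_all(results_json):
    """Run the adapter once per supported format."""
    for fmt in TARGETS:
        subprocess.run(adapter_cmd(results_json, fmt), check=True)
```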

Bring Your Own Data

Paste custom candidates as CSV rows on the Run page. Each row must include every feature column for the selected dataset, in the order listed under Per-Tool Schemas.
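Before pasting, rows can be sanity-checked locally. A minimal sketch (the function name is illustrative) that verifies every row is numeric and has a consistent column count; it does not know the column order for a given dataset, so that must still match the schema.

```python
import csv
import io

def parse_rows(csv_text):
    """Parse pasted CSV rows into float lists, checking consistency.

    Every row must have the same number of columns and contain only
    numeric values; column order must match the selected dataset.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    width = len(rows[0])
    parsed = []
    for i, row in enumerate(rows, 1):
        if len(row) != width:
            raise ValueError(f"row {i}: expected {width} columns, got {len(row)}")
        parsed.append([float(v) for v in row])
    return parsed
```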

Command-Line Options

--config CONFIG_FILE Path to configuration YAML (default: default.yml)
--verify Verify top results against ground truth
--export-artifacts Generate CSV/JSON output files
--top-k K Number of top results to export (default: 100)
--seed SEED Random seed for reproducibility (default: 42)

Reproducibility Guarantees

Fixed Random Seeds

All stochastic operations (model training, data shuffling) use deterministic seeds specified in the configuration file. Default seed is 42.
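The seeding pattern can be illustrated with a self-contained example (not the pipeline's actual code): a local RNG seeded once makes the shuffle reproducible without touching global random state.

```python
import random

def seeded_shuffle(items, seed=42):
    """Shuffle deterministically: the same seed always gives the same order."""
    rng = random.Random(seed)  # local RNG, so global state is untouched
    out = list(items)
    rng.shuffle(out)
    return out
```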

Pinned Dependencies

The requirements.txt file pins exact versions of all Python packages to ensure identical runtime environments.

pip freeze > requirements.lock # Generate locked dependencies

Dataset Checksum Verification

Dataset downloads are verified using SHA-256 checksums before processing begins. If the checksum fails, the pipeline halts with an error.

# Expected checksum stored in config
dataset_checksum: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
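The verification step amounts to comparing a SHA-256 digest against the pinned value. A minimal sketch with hashlib (the function names are illustrative); note the example checksum above is the digest of empty input.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of in-memory data."""
    return hashlib.sha256(data).hexdigest()

def verify_dataset(data: bytes, expected_hex: str) -> None:
    """Raise (halting the pipeline) if the dataset does not match the pinned checksum."""
    actual = sha256_hex(data)
    if actual != expected_hex.removeprefix("sha256:"):
        raise RuntimeError(f"checksum mismatch: {actual}")
```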

Run Metadata

Every run generates a repro.md file recording the run configuration, random seed, dataset checksum, and pinned package versions.

Configuration

Edit default.yml to customize pipeline behavior:

dataset:
  url: "https://example.com/cy_dataset.csv"
  checksum: "sha256:e3b0c44..."

model:
  type: "random_forest"
  n_estimators: 100
  max_depth: 10

search:
  top_k: 100
  verification: true
  seed: 42