Documentation
For Scientists
Use the application to quickly surface rare Calabi-Yau candidates while keeping a reproducible record.
- Run the demo or CLI to rank candidates by predicted likelihood
- Verify top-k hits and track precision/recall for experimental comparison (a sketch of this computation follows this list)
- Export CSV/JSON artifacts for downstream analysis and sharing
- Record random seeds and dataset metadata for reproducibility
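As a hedged illustration of the precision/recall step, the sketch below computes precision at k from the exported results_topk.csv. The "verified" column name is an assumption made for illustration; check the actual export schema before using it.

# A minimal sketch of tracking precision at k from the exported top-k CSV.
# The "verified" column (1 = confirmed hit, 0 = miss) is a hypothetical
# name for illustration; check the actual results_topk.csv schema.
import pandas as pd

results = pd.read_csv("output/results_topk.csv")

def precision_at_k(df, k):
    """Fraction of the top-k ranked rows flagged as verified hits."""
    return df.head(k)["verified"].mean()

for k in (10, 50, 100):
    print(f"precision@{k}: {precision_at_k(results, k):.3f}")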
Installation Prerequisites
- Python 3.9 or higher
- pip package manager
- 4GB RAM minimum (8GB recommended)
- 2GB free disk space for datasets and outputs
Quick Install
git clone https://github.com/upggr/compute.upg.gr.git
cd compute.upg.gr
pip install -r requirements.txt
How to Run
Basic Demo
python run_cy_search.py --config default.yml --verify --export-artifacts
Expected Runtime
5-15 minutes on standard hardware, depending on dataset size and CPU performance.
Output Location
All artifacts are saved to the ./output/ directory:
- output/results_topk.csv - Top-k ranked results
- output/metrics.json - Performance metrics
- output/repro.md - Reproducibility report
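A minimal sketch of inspecting the exported metrics after a run; the exact keys inside metrics.json are not documented here, so the loop simply prints whatever the file contains.

# Load and print the exported metrics; the exact keys in metrics.json are
# not documented here, so this just dumps whatever the file contains.
import json

with open("output/metrics.json") as f:
    metrics = json.load(f)

for key, value in metrics.items():
    print(f"{key}: {value}")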
Demo Exports
The Interactive Demo provides download buttons for JSON and CSV artifacts after each run.
Bring Your Own Data
Paste custom candidates as CSV rows in the demo. Each row must include all feature columns for the selected dataset, in the following order (a sketch for assembling such rows follows this list):
- Kreuzer-Skarke: h11, h21, euler_abs, hodge_ratio, c2_h11
- CY5-Folds: h11, h21, h31, euler, euler_abs, hodge_sum
- Heterotic: h11, h21, euler, euler_abs, hodge_ratio, hodge_balance, n_gen
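As referenced above, here is a minimal sketch of assembling custom rows in the Kreuzer-Skarke column order; the numeric values are placeholders, not real geometries.

# Build CSV rows in the Kreuzer-Skarke column order
# (h11, h21, euler_abs, hodge_ratio, c2_h11).
# The numeric values are placeholders, not real geometries.
import csv
import io

columns = ["h11", "h21", "euler_abs", "hodge_ratio", "c2_h11"]
candidates = [
    {"h11": 3, "h21": 243, "euler_abs": 480, "hodge_ratio": 81.0, "c2_h11": 36},
    {"h11": 5, "h21": 101, "euler_abs": 192, "hodge_ratio": 20.2, "c2_h11": 50},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
for row in candidates:
    writer.writerow(row)      # one CSV row per candidate, columns in order
print(buffer.getvalue())      # paste this output into the demo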
Command-Line Options
--config CONFIG_FILE Path to configuration YAML (default: default.yml)
--verify Verify top results against ground truth
--export-artifacts Generate CSV/JSON output files
--top-k K Number of top results to export (default: 100)
--seed SEED Random seed for reproducibility (default: 42)
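For notebook or scripted use, the same documented flags can be passed programmatically via subprocess; the sketch below uses only the options listed above.

# Invoke the pipeline programmatically with the documented options.
import subprocess

subprocess.run(
    [
        "python", "run_cy_search.py",
        "--config", "default.yml",
        "--verify",
        "--export-artifacts",
        "--top-k", "50",
        "--seed", "123",
    ],
    check=True,  # raise CalledProcessError if the pipeline exits non-zero
)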
Reproducibility Guarantees
Fixed Random Seeds
All stochastic operations (model training, data shuffling) use deterministic seeds specified in the configuration file. Default seed is 42.
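As a rough sketch of what deterministic seeding typically looks like in Python; the pipeline's own seeding code may differ in detail.

# Illustrative seeding across common libraries; not the project's exact code.
import random
import numpy as np

def set_seed(seed=42):
    random.seed(seed)                    # Python's built-in RNG
    np.random.seed(seed)                 # legacy NumPy global RNG
    return np.random.default_rng(seed)   # preferred per-use generator

rng = set_seed(42)
print(rng.integers(0, 100, size=3))      # identical output on every run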
Pinned Dependencies
The requirements.txt file pins exact versions of all Python packages to ensure identical runtime environments.
pip freeze > requirements.lock # Generate locked dependencies
Dataset Checksum Verification
Dataset downloads are verified against SHA-256 checksums before processing begins. If verification fails, the pipeline halts with an error.
# Expected checksum stored in config
dataset_checksum: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
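A minimal sketch of how such a check can be implemented with hashlib; the function name and cached file path are illustrative assumptions, not the pipeline's actual API.

# Illustrative SHA-256 verification that halts on mismatch; the function
# name and file path are assumptions, not the pipeline's real API.
import hashlib
from pathlib import Path

def verify_checksum(path, expected, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise RuntimeError(f"Checksum mismatch for {path}; halting pipeline")

verify_checksum(
    "cache/cy_dataset.csv",   # assumed cache location
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
)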
Run Metadata
Every run generates a repro.md file containing the following (a sketch of how this metadata could be collected appears after this list):
- Timestamp and hostname
- Git commit hash
- Python version and environment info
- Configuration parameters used
- Dataset checksum verified
- Random seeds employed
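The sketch below shows one way the listed fields could be collected; the pipeline's actual repro.md generator may be structured differently.

# One way to collect the metadata fields listed above; illustrative only.
import datetime
import platform
import socket
import subprocess
import sys

def collect_run_metadata(seed, checksum, config):
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "git_commit": commit,
        "python_version": sys.version,
        "platform": platform.platform(),
        "config": config,
        "dataset_checksum": checksum,
        "seed": seed,
    }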
Configuration
Edit default.yml to customize pipeline behavior:
dataset:
  url: "https://example.com/cy_dataset.csv"
  checksum: "sha256:e3b0c44..."
model:
  type: "random_forest"
  n_estimators: 100
  max_depth: 10
search:
  top_k: 100
  verification: true
reproducibility:
  seed: 42
  export_metadata: true
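A minimal sketch of loading and sanity-checking this configuration from Python; it assumes PyYAML is available, which is not confirmed by the excerpt above.

# Load default.yml and check the keys shown above; assumes PyYAML is installed.
import yaml

with open("default.yml") as f:
    config = yaml.safe_load(f)

assert config["reproducibility"]["seed"] == 42
assert config["search"]["top_k"] > 0
print(config["model"]["type"], config["model"]["n_estimators"])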
Troubleshooting
Checksum Verification Failed
If the dataset checksum doesn't match, the file may be corrupted. Delete cached data and re-download:
rm -rf ./cache/
python run_cy_search.py --config default.yml
Out of Memory
For large datasets, reduce the batch size in the configuration file or run on a machine with more RAM.
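If adjusting the configuration is not enough, chunked loading is a generic fallback for ad-hoc analysis of large CSVs; the sketch below is a general pandas technique, not a documented option of run_cy_search.py.

# Generic chunked CSV processing with pandas; not a documented pipeline option.
import pandas as pd

total_rows = 0
for chunk in pd.read_csv("cache/cy_dataset.csv", chunksize=100_000):
    total_rows += len(chunk)      # replace with per-chunk processing
print(f"processed {total_rows} rows")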
Module Not Found
Ensure all dependencies are installed:
pip install -r requirements.txt