Documentation
For Scientists
Use the application to quickly surface rare Calabi-Yau candidates while keeping a reproducible record.
- Run searches in the web UI or CLI to rank candidates by predicted likelihood
- Verify top-k hits and track precision/recall for experimental comparison
- Export CSV/JSON artifacts for downstream analysis and sharing
- Record random seeds and dataset metadata for reproducibility
Installation Prerequisites
- Python 3.9 or higher
- pip package manager
- 4GB RAM minimum (8GB recommended)
- 2GB free disk space for datasets and outputs
Quick Install
git clone https://github.com/upggr/compute.upg.gr.git
cd compute.upg.gr
pip install -r requirements.txt
How to Run
Basic Run
python cy_search_real.py
Web App
python app.py
Expected Runtime
A typical run takes 5-15 minutes on standard hardware, depending on dataset size and CPU performance.
Output Location
All artifacts are saved to the ./output/ directory:
- output/results_topk.csv - Top-k ranked results
- output/metrics.json - Performance metrics
- output/repro.md - Reproducibility report
Run Exports
The Run page provides download buttons for JSON/CSV plus tool-friendly exports for CYTools, cymetric, Sage, and Mathematica.
Import / Export (API)
Use the REST API to import datasets or export results programmatically.
# Import: score custom candidates and save a run
curl -X POST https://compute.upg.gr/api/score-custom -H "Content-Type: application/json" \
-d '{"dataset_id":"kreuzer-skarke","rows":[[12,45,66,3.75,924]],"top_k":20,"seed":42,"verify":true,"save":true}'
# Export: fetch results in tool-friendly formats
curl -o results.json "https://compute.upg.gr/api/export/RUN_ID?format=json"
curl -o results.csv "https://compute.upg.gr/api/export/RUN_ID?format=csv"
curl -o cytools.json "https://compute.upg.gr/api/export/RUN_ID?format=cytools"
curl -o cymetric.json "https://compute.upg.gr/api/export/RUN_ID?format=cymetric"
curl -o candidates.sage "https://compute.upg.gr/api/export/RUN_ID?format=sage"
curl -o candidates.wl "https://compute.upg.gr/api/export/RUN_ID?format=mathematica"
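The same endpoints can be called from Python. A minimal sketch using the requests library, mirroring the curl examples above (replace RUN_ID with a real run ID):

import requests

# Score custom candidates and save the run (same payload as the curl example).
payload = {
    "dataset_id": "kreuzer-skarke",
    "rows": [[12, 45, 66, 3.75, 924]],
    "top_k": 20,
    "seed": 42,
    "verify": True,
    "save": True,
}
resp = requests.post("https://compute.upg.gr/api/score-custom", json=payload)
resp.raise_for_status()

# Export a saved run in a tool-friendly format.
export = requests.get("https://compute.upg.gr/api/export/RUN_ID", params={"format": "cytools"})
export.raise_for_status()
with open("cytools.json", "wb") as f:
    f.write(export.content)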
Per-Tool Schemas
Exports share the same candidate objects, packaged per tool. Each candidate includes the fields returned by the dataset formatter.
# Kreuzer-Skarke candidate fields
rank, h11, h21, euler_char, score, verified_target
# CY5-Folds candidate fields
rank, h11, h21, h31, euler_char, score, verified_target
# Heterotic candidate fields
rank, h11, h21, euler_char, hodge_balance, n_generations, score, verified_target
# CYTools / cymetric JSON wrapper
{ "schema": "cytools-candidates-v1", "run_metadata": {...}, "candidates": [ ... ] }
{ "schema": "cymetric-candidates-v1", "run_metadata": {...}, "candidates": [ ... ] }
# Sage output
candidates = [ { ... }, { ... } ]
# Mathematica output
candidates = {<| "rank" -> 1, "h11" -> 12, ... |>, ...};
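To consume one of these exports downstream, a short Python sketch (assuming a Kreuzer-Skarke run, whose candidate fields are listed above):

import json

# Load a CYTools-format export and print the candidates in rank order.
with open("cytools.json") as f:
    export = json.load(f)

assert export["schema"] == "cytools-candidates-v1"
print(export["run_metadata"])
for c in sorted(export["candidates"], key=lambda c: c["rank"]):
    print(c["rank"], c["h11"], c["h21"], c["euler_char"], c["score"], c["verified_target"])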
Local Export Scripts
Use the adapter script to convert a saved results JSON file into tool-specific formats.
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format cytools --output cytools.json
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format cymetric --output cymetric.json
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format sage --output candidates.sage
python scripts/export_adapters.py --input static/data/results_RUN_ID.json --format mathematica --output candidates.wl
Bring Your Own Data
Paste custom candidates as CSV rows on the Run page. Each row must include every feature column for the selected dataset, in the order listed below (a payload sketch follows the list):
- Kreuzer-Skarke: h11, h21, euler_abs, hodge_ratio, c2_h11
- CY5-Folds: h11, h21, h31, euler, euler_abs, hodge_sum
- Heterotic: h11, h21, euler, euler_abs, hodge_ratio, hodge_balance, n_gen
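The same rows can be submitted through the score-custom endpoint shown earlier. A sketch for the Kreuzer-Skarke column order; the first row is taken from the API example above, and the second is a made-up illustration, not real polytope data:

import csv
import io

# Kreuzer-Skarke column order: h11, h21, euler_abs, hodge_ratio, c2_h11.
# The second row is hypothetical, for illustration only.
csv_text = "12,45,66,3.75,924\n8,22,28,2.75,480"
rows = [[float(x) for x in line] for line in csv.reader(io.StringIO(csv_text))]

payload = {
    "dataset_id": "kreuzer-skarke",
    "rows": rows,
    "top_k": 20,
    "seed": 42,
    "verify": True,
    "save": True,
}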
Command-Line Options
--config CONFIG_FILE Path to configuration YAML (default: default.yml)
--verify Verify top results against ground truth
--export-artifacts Generate CSV/JSON output files
--top-k K Number of top results to export (default: 100)
--seed SEED Random seed for reproducibility (default: 42)
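For example, a verified run that exports artifacts for the top 50 candidates under a custom seed might look like:

python cy_search_real.py --config default.yml --verify --export-artifacts --top-k 50 --seed 123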
Reproducibility Guarantees
Fixed Random Seeds
All stochastic operations (model training, data shuffling) use deterministic seeds specified in the configuration file. The default seed is 42.
Pinned Dependencies
The requirements.txt file pins exact versions of all Python packages to ensure identical runtime environments.
pip freeze > requirements.lock # Generate locked dependencies
Dataset Checksum Verification
Dataset downloads are verified using SHA-256 checksums before processing begins. If the checksum fails, the pipeline halts with an error.
# Expected checksum stored in config
dataset_checksum: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
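The verification step amounts to hashing the downloaded file and comparing against the configured value. An illustrative sketch of that gate (not the pipeline's actual code):

import hashlib

# Hash the downloaded dataset in chunks and halt on mismatch,
# as the checksum check described above does.
expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

h = hashlib.sha256()
with open("cy_dataset.csv", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

if h.hexdigest() != expected:
    raise SystemExit("Dataset checksum mismatch; aborting run.")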
Run Metadata
Every run generates a repro.md file containing:
- Timestamp and hostname
- Git commit hash
- Python version and environment info
- Configuration parameters used
- Dataset checksum verified
- Random seeds employed
Configuration
Edit default.yml to customize pipeline behavior:
dataset:
  url: "https://example.com/cy_dataset.csv"
  checksum: "sha256:e3b0c44..."
model:
  type: "random_forest"
  n_estimators: 100
  max_depth: 10
search:
  top_k: 100
  verification: true
  seed: 42
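A sketch of how such a config might be consumed at startup, tying the seed setting to the reproducibility guarantees above (assumes PyYAML and NumPy are installed; this is illustrative, not the pipeline's actual code):

import random
import numpy as np
import yaml

# Load the pipeline configuration and seed all stochastic operations.
with open("default.yml") as f:
    cfg = yaml.safe_load(f)

seed = cfg["search"]["seed"]
random.seed(seed)
np.random.seed(seed)

print(cfg["model"]["type"], cfg["search"]["top_k"], cfg["dataset"]["checksum"])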