About This Project
Overview
We build ML-guided search tools that drastically reduce the cost of finding rare Calabi-Yau geometries in large string-theory datasets, with full verification and reproducibility.
In the benchmark reported on this page, 84 of the top 100 ranked candidates are verified hits (84% Precision@100), and a search completes in about 5 seconds.
upg-strings is a research tool for applying machine learning to the computational exploration of Calabi-Yau manifolds in the string theory landscape. The project emphasizes reproducibility, verification, and transparent methodology over performance claims.
This is applied computation and AI tooling designed to accelerate discovery in theoretical physics datasets, not a claim to solve fundamental physics problems.
What Makes upg-strings Unique
upg-strings fills a gap in the string theory research toolkit. Existing tools focus on analyzing individual manifolds or classifying known geometries; upg-strings is, to our knowledge, the first search engine for the string landscape.
The Problem We Solve
The Kreuzer-Skarke database contains 473,800,776 (roughly 474 million) four-dimensional reflexive polytopes describing Calabi-Yau threefolds. Finding geometries with specific topological properties for phenomenological model building is like searching for needles in a haystack.
How We're Different
Existing Tools
CYTools: Analyzes geometry of individual manifolds
Research Papers: Classify or generate new manifolds
Traditional Approach: Manual selection or random sampling
upg-strings
Ranks & Searches: Finds promising candidates automatically
8.7x Better Than Random Selection: 84% Precision@100 versus a 9.7% random baseline
98% Cost Reduction: Examine 100 instead of 5,000 manifolds
A Simple Scenario
You need Calabi-Yau manifolds with small Euler characteristic (|χ| < 100) for your particle physics model.
Without upg-strings:
- Option 1: Check all 474M candidates → Impossible
- Option 2: Random sampling → 9.7% success rate
- Option 3: Ask domain experts → Doesn't scale
With upg-strings:
- Run ML-guided search → 84% success rate in top 100
- Examine 100 candidates instead of thousands
- Complete in 5 seconds
- Get verified results with full reproducibility metadata
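The scenario above can be sketched in a few lines of Python. This is a minimal illustration, not the upg-strings implementation: the `Candidate` record, the `top_k` and `verify` helpers, and the toy scores are all hypothetical. The only physics it relies on is the standard relation χ = 2(h¹¹ − h²¹) for a Calabi-Yau threefold, which lets each shortlisted candidate be checked against the |χ| < 100 target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: polytope id, Hodge numbers, and a model score."""
    polytope_id: int
    h11: int
    h21: int
    score: float  # higher means the model considers a hit more likely


def euler_characteristic(h11: int, h21: int) -> int:
    """Euler characteristic of a Calabi-Yau threefold: chi = 2 * (h11 - h21)."""
    return 2 * (h11 - h21)


def top_k(candidates, k):
    """Return the k highest-scoring candidates; ties are broken by id."""
    return sorted(candidates, key=lambda c: (-c.score, c.polytope_id))[:k]


def verify(candidates, max_abs_chi=100):
    """Keep only candidates whose computed |chi| is strictly below the bound."""
    return [
        c for c in candidates
        if abs(euler_characteristic(c.h11, c.h21)) < max_abs_chi
    ]


# Toy pool, invented for illustration only (scores are not real model output).
pool = [
    Candidate(1, h11=11, h21=11, score=0.95),   # chi = 0
    Candidate(2, h11=2, h21=128, score=0.10),   # chi = -252
    Candidate(3, h11=40, h21=4, score=0.80),    # chi = 72
    Candidate(4, h11=101, h21=1, score=0.75),   # chi = 200
]

shortlist = top_k(pool, k=3)   # ranked by score: ids 1, 3, 4
hits = verify(shortlist)       # ground-truth check on the shortlist only
print([c.polytope_id for c in hits])
```

Ranking first and verifying second is the point of the design: the expensive or authoritative check runs only on the shortlist, not on the whole pool.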
Performance That Matters
84% Precision@100: 84 of the top 100 predictions are verified correct
8.7x Improvement: Nearly 9 times the 9.7% hit rate of random selection
98% Cost Reduction: Examine 100 candidates instead of 5,000
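For readers who want to check the arithmetic behind these figures, here is a small sketch. The function names are illustrative, not part of any upg-strings API, and the short ranking is toy data; only the inputs 84%, 9.7%, 100, and 5,000 come from the figures quoted on this page.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the first k ranked items that are true hits."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k


def recall_at_k(ranked_labels, k, total_hits):
    """Fraction of all true hits that appear in the first k ranked items."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return sum(ranked_labels[:k]) / total_hits


def lift(precision, base_rate):
    """How many times better than the random-selection hit rate."""
    return precision / base_rate


def cost_reduction(examined, baseline):
    """Fraction of examinations saved relative to a baseline count."""
    return 1 - examined / baseline


# Toy ranking (True = verified hit), invented for illustration.
toy_ranking = [True, True, False, True, False]
print(precision_at_k(toy_ranking, 3))             # 2 hits in the top 3
print(recall_at_k(toy_ranking, 3, total_hits=4))  # 2 of 4 hits found

# Figures quoted on this page.
print(round(lift(0.84, 0.097), 1))          # 0.84 / 0.097, about 8.7
print(round(cost_reduction(100, 5000), 2))  # 1 - 100/5000 = 0.98
```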
Our Approach
- Information Retrieval for Physics: Apply search engine principles to the string landscape
- Reproducible Pipeline: Every run is fully deterministic with checksummed data, pinned dependencies, and fixed random seeds
- Verification-First: All predictions are validated against ground truth with transparent metrics (Precision@k, Recall@k)
- Open Artifacts: Complete outputs (CSV, JSON, metadata) are exported for independent analysis
- Production-Ready: Web interface and REST API, not just research code
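As a rough idea of what the reproducibility items in this list can look like in practice, the sketch below records a dataset checksum and a fixed seed in an exportable JSON record. It uses only the Python standard library. The field names, the `build_run_metadata` and `seeded_sample` helpers, and the inline toy dataset are assumptions for illustration, not the project's actual metadata schema.

```python
import hashlib
import json
import platform
import random


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest, used to pin the exact input data of a run."""
    return hashlib.sha256(data).hexdigest()


def build_run_metadata(dataset_bytes: bytes, seed: int, k: int) -> dict:
    """Collect the facts needed to repeat a run (hypothetical schema)."""
    return {
        "dataset_sha256": sha256_of_bytes(dataset_bytes),
        "seed": seed,
        "k": k,
        "python_version": platform.python_version(),
    }


def seeded_sample(population, n, seed):
    """Draw a random baseline sample that is identical for a given seed."""
    rng = random.Random(seed)  # local generator; global state is untouched
    return rng.sample(list(population), n)


# Toy dataset contents, invented for illustration only.
dataset = b"polytope_id,h11,h21\n1,11,11\n2,2,128\n"
meta = build_run_metadata(dataset, seed=42, k=100)

# sort_keys keeps the exported JSON byte-stable across runs.
print(json.dumps(meta, indent=2, sort_keys=True))
```

Using a local `random.Random(seed)` rather than seeding the global generator keeps the sample repeatable even when other code in the same process draws random numbers.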
The Bigger Picture
Think of upg-strings as part of the Calabi-Yau research stack:
- Generation: Genetic algorithms create new manifolds
- Search: upg-strings finds promising candidates (← You are here)
- Analysis: CYTools computes detailed geometry
- Metrics: cymetric approximates Ricci-flat metrics
- Classification: ML models verify topological properties
upg-strings bridges the gap between having a massive database and doing detailed analysis. It answers: "Which manifolds should I analyze?"