About This Project
Overview
We build ML-guided search tools that drastically reduce the cost of finding rare Calabi-Yau geometries in large string-theory datasets, with full verification and reproducibility.
In our benchmark, the top 100 ML-ranked candidates for a rare target reach 84% precision, compared with 9.7% for random sampling, and the search completes in about 5 seconds.
upg-strings is a research tool for applying machine learning to the computational exploration of Calabi-Yau manifolds in the string theory landscape. The project emphasizes reproducibility, verification, and transparent methodology over performance claims.
This is applied computation and AI tooling designed to accelerate discovery in theoretical physics datasets, not a claim to solve fundamental physics problems.
What Makes upg-strings Unique
upg-strings fills a gap in the string theory research toolkit. Existing tools focus on analyzing individual manifolds or classifying known geometries; upg-strings is, to our knowledge, the first search engine for the string landscape.
The Problem We Solve
The Kreuzer-Skarke database contains roughly 474 million (473,800,776) four-dimensional reflexive polytopes, each of which describes Calabi-Yau threefolds. Finding the few geometries with specific topological properties needed for phenomenological model building is like searching for needles in a haystack.
How We're Different
Existing Tools
CYTools: Analyzes geometry of individual manifolds
Research Papers: Classify or generate new manifolds
Traditional Approach: Manual selection or random sampling
upg-strings
Ranks & Searches: Finds promising candidates automatically
8.7x Better Than Random: 84% precision in the top 100 versus 9.7% for random sampling
98% Cost Reduction: Examine 100 manifolds instead of 5,000
A Simple Scenario
Suppose you need Calabi-Yau manifolds with small Euler characteristic (|χ| < 100) for your particle physics model.
Without upg-strings:
- Option 1: Check all 474M candidates → Impossible
- Option 2: Random sampling → 9.7% success rate
- Option 3: Ask domain experts → Doesn't scale
With upg-strings:
- Run ML-guided search → 84% success rate in top 100
- Examine 100 candidates instead of thousands
- Complete in 5 seconds
- Get verified results with full reproducibility metadata
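The scenario above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual pipeline: the `Candidate` record, the `score` field, and the toy Hodge numbers are hypothetical. The only physics input is the standard relation χ = 2(h11 − h21) for a Calabi-Yau threefold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str      # hypothetical identifier, not a real database key
    h11: int       # Hodge number h^{1,1}
    h21: int       # Hodge number h^{2,1}
    score: float   # hypothetical model score; higher means "more likely a target"


def euler_characteristic(h11: int, h21: int) -> int:
    """Euler characteristic of a Calabi-Yau threefold: chi = 2 * (h11 - h21)."""
    return 2 * (h11 - h21)


def is_target(c: Candidate, bound: int = 100) -> bool:
    """Ground-truth check for the scenario's target: |chi| < bound."""
    return abs(euler_characteristic(c.h11, c.h21)) < bound


def top_k_verified(candidates, k):
    """Rank by score (descending), keep the top k, and verify each one."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)[:k]
    return [(c.name, is_target(c)) for c in ranked]


# Toy data for illustration only; the scores are made up.
toy = [
    Candidate("A", 1, 101, 0.10),   # chi = -200, not a target
    Candidate("B", 11, 11, 0.95),   # chi = 0, target
    Candidate("C", 20, 30, 0.80),   # chi = -20, target
    Candidate("D", 5, 65, 0.60),    # chi = -120, not a target
]

results = top_k_verified(toy, k=2)
hits = sum(ok for _, ok in results)
```

The point of the design is that the model only decides which candidates to look at first; whether a candidate counts as a hit is always decided by the exact topological check.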
Performance That Matters
84% Precision@100
84 of the top 100 predictions are verified correct against ground truth
8.7x Improvement
84% precision versus 9.7% for random selection (0.84 / 0.097 ≈ 8.7)
98% Cost Reduction
Examine 100 candidates instead of 5,000 (1 − 100/5,000 = 98%)
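For readers who want to check the arithmetic behind these three figures, here is a minimal sketch. The function names are illustrative and are not part of the upg-strings API; the inputs (84 hits in the top 100, a 9.7% random baseline, 100 versus 5,000 candidates) are the numbers quoted on this page.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top-k ranked items that are verified targets."""
    if k <= 0 or k > len(relevant_flags):
        raise ValueError("k must be between 1 and the number of ranked items")
    return sum(relevant_flags[:k]) / k


def lift_over_random(precision, base_rate):
    """How many times better the ranked list is than random sampling."""
    return precision / base_rate


def cost_reduction(examined, baseline):
    """Fraction of candidate inspections saved relative to a baseline."""
    return 1 - examined / baseline


# Ranked list with 84 verified hits among the top 100 (order within is irrelevant here).
flags = [True] * 84 + [False] * 16

p_at_100 = precision_at_k(flags, 100)       # 0.84
lift = lift_over_random(p_at_100, 0.097)    # roughly 8.66, reported as 8.7x
saved = cost_reduction(100, 5000)           # 0.98, i.e. 98%
```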
Our Approach
- Information Retrieval for Physics: Apply search engine principles to the string landscape
- Reproducible Pipeline: Every run is fully deterministic with checksummed data, pinned dependencies, and fixed random seeds
- Verification-First: All predictions are validated against ground truth with transparent metrics (Precision@k, Recall@k)
- Open Artifacts: Complete outputs (CSV, JSON, metadata) are exported for independent analysis
- Production-Ready: Web interface and REST API, not just research code
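As an illustration of what "checksummed data" and "fixed random seeds" can look like in practice, the sketch below uses only the Python standard library. It is an assumption about one reasonable way to do this, not a description of the project's code: the function names and the metadata field names are hypothetical.

```python
import hashlib
import json
import random


def sha256_hex(data: bytes) -> str:
    """Checksum of the raw dataset bytes, recorded so a run can be tied to its input."""
    return hashlib.sha256(data).hexdigest()


def seeded_sample(population_size: int, sample_size: int, seed: int):
    """Draw a sample with a private, explicitly seeded generator.

    Using random.Random(seed) instead of the global generator keeps the
    result independent of any other code that touches random state.
    """
    rng = random.Random(seed)
    return rng.sample(range(population_size), sample_size)


def run_metadata(dataset: bytes, seed: int, k: int) -> str:
    """Serialize the facts needed to repeat a run (hypothetical field names)."""
    record = {"dataset_sha256": sha256_hex(dataset), "seed": seed, "k": k}
    # sort_keys gives a stable byte-for-byte serialization across runs.
    return json.dumps(record, sort_keys=True)


dataset = b"toy dataset bytes, for illustration only"
first = seeded_sample(1000, 5, seed=42)
second = seeded_sample(1000, 5, seed=42)
meta = run_metadata(dataset, seed=42, k=100)
```

Two runs with the same seed and the same dataset bytes produce the same sample and the same metadata string, which is the property a reproducibility check would rely on.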
The Bigger Picture
Think of upg-strings as part of the Calabi-Yau research stack:
- Generation: Genetic algorithms create new manifolds
- Search: upg-strings finds promising candidates (← You are here)
- Analysis: CYTools computes detailed geometry
- Metrics: cymetric approximates Ricci-flat metrics
- Classification: ML models verify topological properties
upg-strings bridges the gap between having a massive database and doing detailed analysis. It answers: "Which manifolds should I analyze?"
Roadmap
- Integrate the CYTools library itself for access to the full Kreuzer-Skarke (KS) database
- User-definable search criteria (custom topological targets)
- Expand to additional Calabi-Yau datasets beyond current baseline
- Integrate advanced ML architectures (graph neural networks, transformers)
- Develop automated verification against algebraic geometry constraints
- Add mirror symmetry pair detection
Background
This work is part of an ongoing effort in applied computation and AI tooling for scientific research. The goal is to build reliable, transparent tools that researchers can trust and extend.
The project does not claim to solve string theory or make predictions about physical reality. It is a computational tool for exploring mathematical structures in large datasets.
Contact
For questions, collaboration inquiries, or bug reports:
Acknowledgments
This project builds on publicly available Calabi-Yau datasets and open-source machine learning libraries. We are grateful to the broader computational physics and ML communities for their foundational work.