Quick Start
Prepare Data
Scald expects CSV files with training data (features + target) and test data (same features). The target column should be numeric for regression or categorical for classification.
feature_1,feature_2,target
1.2,3.4,0
2.3,4.5,1
CLI Usage
Run AutoML with a single command:
scald --train data/train.csv \
--test data/test.csv \
--target price \
--task-type regression \
--max-iterations 5
Task type must be either classification or regression. Iterations control Actor-Critic refinement cycles (default: 5).
Python API
For programmatic control:
import asyncio
from scald import Scald
async def main():
scald = Scald(max_iterations=5)
predictions = await scald.run(
train_path="data/train.csv",
test_path="data/test.csv",
target="target_column",
task_type="classification"
)
print(f"Generated {len(predictions)} predictions")
asyncio.run(main())
Execution Flow
The workflow progresses through data preview, exploratory analysis, preprocessing, model training, and evaluation. The Critic reviews each iteration and provides feedback. The Actor refines the approach based on this feedback, converging on an optimal solution. Each iteration improves upon the previous attempt through targeted adjustments.
Output
Scald creates a timestamped session directory containing detailed logs, generated code artifacts, and final predictions:
sessions/session_20250113_143022/
├── session.log # Execution logs
├── artifacts/ # Generated code per iteration
└── predictions.csv # Final predictions
Console output shows iteration progress, final metrics, cost, and execution time.
Troubleshooting
API key errors indicate missing or incorrect credentials in .env. Memory errors suggest insufficient RAM for the dataset size. Poor performance can be improved by increasing max_iterations for additional refinement cycles.
Continue to Architecture to understand how Scald works internally.