> We demonstrate the versatility and performance of [our implementation in Python], focusing on programs that invoke heavy external computation through large language models (LLMs) and other APIs. Across five scripts, we compare against several state-of-the-art baselines and show that opportunistic evaluation improves total running time (up to 6.2×) and latency (up to 12.7×) over standard sequential Python, while staying close to hand-tuned asynchronous Rust (between 1.3% and 18.5% running-time overhead). For Tree-of-Thoughts, a prominent LLM reasoning approach, we achieve a 6.2× performance improvement over the authors’ own implementation.
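For intuition on where speedups like these come from: LLM API requests are network-bound, so sequential Python pays the full latency of every call one after another, while an evaluator that spots independent calls can run them concurrently. Here is a minimal sketch of that effect using plain `concurrent.futures` rather than the paper's mechanism; the `llm_call` stub and its simulated 1 s latency are hypothetical stand-ins, not the authors' code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def llm_call(prompt: str) -> str:
    # Stand-in for a network-bound LLM API request (~1 s of latency).
    time.sleep(1.0)
    return f"answer({prompt})"

prompts = ["p1", "p2", "p3", "p4"]

# Standard sequential Python: each call blocks, total time ~= 4 s.
t0 = time.perf_counter()
sequential = [llm_call(p) for p in prompts]
print(f"sequential: {time.perf_counter() - t0:.1f}s")

# Overlapping the independent calls: total time ~= 1 s.
t0 = time.perf_counter()
with ThreadPoolExecutor() as pool:
    overlapped = list(pool.map(llm_call, prompts))
print(f"overlapped: {time.perf_counter() - t0:.1f}s")
```

The appeal of the opportunistic approach, as the abstract describes it, is that the program stays in ordinary sequential style and the overlap is found automatically, rather than being threaded by hand as above.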
Is there a public repository with the code?