i’m not crazy about the name; maybe something like forrest would be better (forrest run, lol, like Forrest Gump). the main command is literally omni run <runner>, so forrest would probably be more appropriate.
the goal of this, though, is to make AI workflows reproducible. a workflow is anything someone wants to do with an AI model: run inference, finetune on a dataset, benchmark it, reproduce a paper’s results, and so on.
this is basically achievable thanks to the powerhouse combo of Modal + Hugging Face: Hugging Face for models, datasets, and its super helpful libraries like diffusers and transformers; Modal for environment stability, simple python scripting, access to a range of GPUs depending on your use case, and $30 of free usage per month on its free tier!
omnilaunch itself is basically just a python cli that wraps the modal cli with some niceties like building a runner into an archive with a hash, and exposing/documenting entrypoints and params for each entrypoint. the idea is that you build -> setup -> run.
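to make the build step concrete, here’s a minimal sketch of the archive-plus-hash idea. this is purely illustrative (the function name and layout aren’t omnilaunch’s actual code), but it shows the gist:

```python
# minimal sketch of "build a runner into an archive with a hash".
# illustrative only -- not omnilaunch's actual implementation.
import hashlib
import tarfile
from pathlib import Path

def build_runner(runner_dir: str, out_dir: str = "dist") -> str:
    """Pack a runner directory into a tar archive and return its sha256 hash."""
    src = Path(runner_dir)
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    archive = out / f"{src.name}.tar"
    # uncompressed tar with sorted members keeps the archive deterministic
    # for a given file tree (modulo file mtimes)
    with tarfile.open(archive, mode="w") as tar:
        for f in sorted(src.rglob("*")):
            tar.add(f, arcname=f.relative_to(src.parent), recursive=False)
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    # embed the hash in the filename so a runner version is self-identifying
    archive.rename(out / f"{src.name}-{digest[:12]}.tar")
    return digest
```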
build creates the archived runner and its hash for versioning and consistency (and, in the future, will potentially involve signing).
setup deploys the modal script, builds the image, downloads any files like model weights or code, then runs some simple checks (if any) to make sure everything is ready to go.
run executes the specified runner to actually run code on GPUs!
it’s that simple. it’s just an opinionated way to create reproducible, versioned modal scripts.
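and here’s roughly the shape of the modal script a runner would wrap. it uses Modal’s real primitives (App, Image, Volume), but the runner name, model, and entrypoints are made up for illustration:

```python
# hand-wavy sketch of a runner's modal script; the setup() / generate()
# functions loosely map to "omni setup" and "omni run". names are illustrative.
import modal

app = modal.App("sdxl-runner")
image = modal.Image.debian_slim().pip_install(
    "torch", "diffusers", "transformers", "accelerate", "huggingface_hub"
)
weights = modal.Volume.from_name("sdxl-weights", create_if_missing=True)

@app.function(image=image, volumes={"/weights": weights})
def setup():
    # setup step: pull model weights from Hugging Face into a persistent volume
    from huggingface_hub import snapshot_download
    snapshot_download(
        "stabilityai/stable-diffusion-xl-base-1.0", local_dir="/weights/sdxl"
    )
    weights.commit()

@app.function(image=image, gpu="A10G", volumes={"/weights": weights})
def generate(prompt: str):
    # run step: the actual GPU workload
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "/weights/sdxl", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]
```

the point is that setup only has to succeed once: every run after that reuses the same baked image and cached weights, which is where the reproducibility comes from.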
the README explains in a lot more detail why i’m building this and why i think it’ll be useful. i plan to start out creating more runners for models/repos (maybe TRM, nanochat), but also trying to reproduce code for papers that don’t yet have any.
my hope is that if i do enough of the initial lift, people can manually (or with the help of AI) extend runners and/or add their own. the end goal is a vibrant ecosystem (kind of like an app store?) of AI workflows that “just work” to do all kinds of things.
want an optimized inference pipeline for an llm for your app? great, there’s a runner for that!
want to test out a bunch of 3d models to see which one is best for your use case? great, there are runners for those! you can spin them up and test them out without dependency hell. and when you find the one you’re happy with, just use that model’s runner to serve it for your app’s backend (see the sketch after this list)!
want to reproduce results from a research paper and benchmark the model? great, there’s a runner for that!
want to create a custom lora finetune of SDXL on your own dataset? great, there’s a runner for that! and, it’ll be cheaper than using hosted, done-for-you alternatives!
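to make the backend-serving idea from above concrete: once a runner is deployed, calling it from your own code is just a Modal function lookup. a minimal sketch, reusing the hypothetical sdxl-runner from earlier:

```python
# hypothetical: calling a deployed runner's entrypoint from your app's backend.
# assumes the sdxl-runner sketch above has been deployed via "omni setup".
import modal

generate = modal.Function.from_name("sdxl-runner", "generate")
image = generate.remote("a corgi astronaut, studio lighting")  # runs on Modal's GPUs
image.save("corgi.png")
```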
for now, i’m just focusing on Modal as the backend/infra. but this could branch out to support other backends, especially ones that can be defined via infrastructure as code (IaC): anything that lets someone run an AI workflow without complaining that it doesn’t work on their machine.