
We are entering the era of small & highly efficient models!

Generated by author via Midjourney v6

Context

A few days ago, I reported on a new state-of-the-art, open-source model that outperforms all other models on SQL generation, including GPT-4.

This model is SQLCoder-70B.

In a nutshell, based on Meta’s recent CodeLlama-70B, Defog leveraged its own hand-crafted dataset and built a new, fine-tuned model.

The outcome? Well, see for yourself:

The model greatly outperforms GPT-4 on a wide range of SQL tasks!

Read more: You can read all about it, and test the model, here.
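
If you want to try it yourself, here is a minimal sketch of prompting the model with Hugging Face transformers, assuming the checkpoint is published under `defog/sqlcoder-70b-alpha`; the prompt template and schema below are illustrative, not Defog's exact format:

```python
# Minimal sketch: prompting SQLCoder via Hugging Face transformers.
# Assumes the checkpoint id "defog/sqlcoder-70b-alpha"; the prompt
# layout and example schema are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "defog/sqlcoder-70b-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "### Task\n"
    "Generate a SQL query to answer the question below.\n\n"
    "### Database Schema\n"
    "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);\n\n"
    "### Question\n"
    "What is the total revenue per customer?\n\n"
    "### SQL\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (the SQL), not the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```

Keep in mind that a 70B model in fp16 needs well over 100 GB of GPU memory, which brings us to the next point.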

From SQLCoder-70B to SQLCoder-7B

Unfortunately, 70B-parameter models are still too large for offline integrations, and far too large to run on a laptop.

Model Distillation

Model distillation is a machine learning process that teaches a smaller, simpler “student” model to act like a bigger, more complex “teacher” model. By learning from the teacher’s outputs, the student can make similar decisions without needing the teacher’s size or compute.
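
To make this concrete, here is a minimal sketch of the classic distillation objective: the student is trained to match the teacher's softened output distribution via KL divergence, blended with the usual cross-entropy on the true labels. The `teacher`/`student` models, `temperature`, and `alpha` below are placeholder assumptions, not details of how Defog built SQLCoder-7B:

```python
# Minimal sketch of knowledge distillation (Hinton-style).
# The student mimics the teacher's softened probabilities while
# still learning from the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soften both distributions with temperature > 1 so the student
    # also learns the teacher's relative probabilities for wrong
    # answers ("dark knowledge"), then compare with KL divergence.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Blend the two objectives; alpha controls the mix.
    return alpha * kd + (1 - alpha) * ce

# During training, the teacher is frozen and only provides targets:
#   with torch.no_grad():
#       teacher_logits = teacher(batch)
#   student_logits = student(batch)
#   loss = distillation_loss(student_logits, teacher_logits, labels)
```

Only the student's weights are updated; the teacher runs in inference mode, which is exactly why the resulting student can be deployed on far smaller hardware.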