Just want to chime in that I’ve seen TabbyML used a fair bit at work. Tabby in particular can run locally on M1/M2/M3 and uses the Neural Engine via CoreML. The performance hit isn’t noticeable at all and most of what we use it for (large autocompletes in serialized formats) it excels at.
Just want to chime in that I’ve seen TabbyML used a fair bit at work. Tabby in particular can run locally on M1/M2/M3 and uses the Neural Engine via CoreML. The performance hit isn’t noticeable at all and most of what we use it for (large autocompletes in serialized formats) it excels at.