Running local models on Macs gets faster with Ollama's MLX support

Photo: Ollama
Thirty-two gigabytes of RAM is the minimum needed to test the latest capabilities of the Ollama platform on a Mac. The popular tool for running large language models locally has introduced support for MLX, Apple's open-source framework that dramatically optimizes machine-learning workloads on M1-series chips and newer. Version 0.19, currently available as a preview, also adds support for Nvidia's NVFP4 format, enabling significantly more efficient model compression and better cache management. For end users and developers this is a breakthrough in working with demanding models such as Qwen2.5-Coder-32B, the first to take full advantage of the new architecture. At a time of growing frustration with API limits and the high subscription costs of tools like Claude Code or ChatGPT, running AI locally is becoming a viable alternative. Thanks to deeper integration with Visual Studio Code, developers gain powerful support directly on their own hardware while maintaining full data privacy and independence from the cloud. Shifting the computational load to local Apple Silicon chips is no longer the domain of hobbyists alone; it is becoming a professional standard in everyday creative and programming work.
A breakthrough in the performance of local language models on Mac computers has become a reality. Ollama, currently the most popular runtime for running LLMs (Large Language Models) on personal computers, has announced support for the MLX framework. This open-source solution from Apple, designed specifically for machine learning, squeezes the maximum out of the Apple Silicon architecture. Combined with new compression methods and cache optimization, the change dramatically pushes the boundaries of what users can achieve without relying on the cloud.
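To ground what "running a model locally" looks like in practice, here is a minimal sketch that queries a locally served model through Ollama's HTTP API. It assumes Ollama is running on its default port (11434) and that the model referenced below has already been pulled; the model tag is illustrative, not an official recommendation from the release.

```python
# Minimal sketch: prompting a model served by a local Ollama instance.
# Assumes the Ollama server is running on localhost:11434 and the model
# tag below has been pulled beforehand (the tag is illustrative).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:32b",   # illustrative local model tag
        "prompt": "Write a Swift function that reverses a string.",
        "stream": False,                # return a single JSON object
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Nothing leaves the machine: the request, the weights, and the generated text all stay on the local host.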
The timing of this update is no accident. Interest in local AI instances is surging, driven by the success of projects such as OpenClaw. That project gained over 300,000 stars on GitHub in record time and became the foundation for high-profile experiments like Moltbook. It caused a particular stir in China, but the wave is spreading across the globe, prompting professionals to look for alternatives to paid subscriptions and the limits imposed by the AI sector's giants.
MLX Architecture and the End of Resource Waste
The key to Ollama's new performance is deep integration with MLX. Until now, many AI tools on macOS operated in a generic way that did not always take full advantage of the unified memory in M-series chips. With MLX, Ollama can communicate with the GPU and the Neural Engine in a nearly native way. This translates not only into more tokens generated per second but, above all, into smarter resource management when multiple tasks run at once.
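For readers unfamiliar with MLX, a minimal sketch (separate from Ollama's internal implementation, which is not public in this form) illustrates the two traits the integration leans on: arrays live in Apple Silicon's unified memory, so the CPU and GPU share them without explicit copies, and computation is lazy until it is explicitly evaluated.

```python
# A minimal MLX sketch, not Ollama's actual code: unified-memory arrays
# plus lazy evaluation on Apple Silicon.
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))
c = mx.matmul(a, b)   # builds a lazy computation graph, nothing runs yet
mx.eval(c)            # materializes the result on the default (GPU) device
print(c.shape, c.dtype)
```

Because there is a single memory pool, an LLM runtime built on MLX does not pay the host-to-device copy cost that dominates on systems with discrete GPUs.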
In parallel with support for Apple's framework, Ollama has introduced support for Nvidia's NVFP4 format. This is an advanced model compression (quantization) method that significantly reduces VRAM requirements while maintaining high response precision. For Mac users, it means that models which previously required massive amounts of RAM now fit into smaller hardware configurations, and they run faster thanks to an improved data-caching system.
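A rough back-of-envelope calculation shows why 4-bit formats such as NVFP4 matter on memory-constrained machines. The sketch below counts weights only and ignores the KV cache and runtime overhead, so real footprints are somewhat higher.

```python
# Back-of-envelope memory estimate for a ~32B-parameter model at different
# weight precisions (weights only; KV cache and overhead excluded).
params = 32e9

for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit (e.g. NVFP4)", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")

# Prints roughly: FP16 ~60 GiB, 8-bit ~30 GiB, 4-bit ~15 GiB
```

The jump from 16-bit to 4-bit weights is what moves a 30B-class model from "workstation only" into the reach of a 32GB laptop.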
Qwen3.5 and the Entry Barrier for Professionals
The new functionality debuted in Ollama 0.19, which currently carries preview status. For now, the list of supported models that exploit MLX's full potential is short but impressive, headed by a 35-billion-parameter variant of Alibaba's Qwen3.5 model. The choice is no accident; the Qwen family has earned recognition for its excellent quality-to-size ratio, especially in tasks involving logic and programming.
- Hardware Requirements: Mac computer with Apple Silicon processor (M1, M2, M3, M4 or newer).
- RAM: Minimum 32GB of unified memory for the Qwen3.5 35B model.
- Software Version: Ollama 0.19 (Preview).
- Key Technologies: MLX Framework, NVFP4 compression, improved caching.
While a 32GB RAM requirement may seem high for the average home user, it is standard for professionals working in data analysis or programming. Ollama recognizes this trend, as evidenced by its recently expanded integration with Visual Studio Code. Developers are increasingly moving away from tools like Claude Code or ChatGPT Codex in favor of local solutions, avoiding high subscription costs and restrictive rate limits that can paralyze work at the least convenient moment.
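As a sketch of what such an editor integration typically does under the hood: Ollama exposes an OpenAI-compatible endpoint, so existing tooling can be pointed at the local server instead of a paid cloud API. The model tag below is illustrative, and the `openai` Python package is assumed to be installed.

```python
# Sketch: pointing OpenAI-compatible tooling at a local Ollama server
# instead of a cloud endpoint. Model tag is illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama server
    api_key="ollama",                      # required by the client, unused locally
)

completion = client.chat.completions.create(
    model="qwen2.5-coder:32b",             # any locally pulled model tag
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Explain Swift optionals in two sentences."},
    ],
)
print(completion.choices[0].message.content)
```

Swapping the base URL is often the entire migration: no rate limits, no per-token billing, and no code leaving the machine.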

Privacy and Independence Drive Change
The development of local models is not just a matter of pure performance, but above all, data sovereignty. Companies and independent creators are increasingly willing to invest in more powerful Apple hardware, knowing that their source code or confidential documents will never leave the local disk. The success of OpenClaw proved that there is a huge demand for tools that give the user full control over the model's inference process.
Thanks to the new optimizations in Ollama, the line between a model running in the cloud and one running locally is beginning to blur. The ability to run a model with a scale of 35 billion parameters on a laptop with the fluidity offered by MLX is a game-changer. The Apple Silicon architecture, which was designed from the beginning with energy efficiency and memory bandwidth in mind, has finally seen software that fully exploits its unique features in the context of generative artificial intelligence.
MLX support in Ollama is just the beginning of a broader consolidation of AI tools around dedicated hardware. As the library of supported models expands, Apple Silicon will become the default platform for AI developers who value mobility without compromising on performance. Local artificial intelligence is ceasing to be the domain of enthusiasts building powerful workstations with multiple graphics cards and is becoming a real work tool available within reach of every MacBook owner with an adequate supply of RAM.