Open-Source AI Models: Getting Started

Explore the rise of open-source AI models, their applications, and how to get started at no cost, including insights into popular platforms and legal considerations.

The development of AI has taken a turn towards decentralization and open-source solutions, moving beyond popular commercial offerings. Local LLMs (Large Language Models) enable private data handling, customizable systems tailored to specific tasks, and full control over the usage environment. However, launching these models requires a grasp of fundamental tools—from repositories and model weights to cloud environments and technical specifications.

In this new article from ForkLog, we explore how to start working with local, autonomous AI models at no cost, which resources are available to beginners, and what developers of open-source solutions have to offer.

First Steps

Developers of open AI models rely on two primary platforms: GitHub and Hugging Face. The former is traditionally used for publishing source code, documentation, and installation scripts, while the latter has become a global hub for model weights, datasets, and ready-made ML solutions. Hugging Face hosts hundreds of thousands of trained neural networks, ranging from compact language models for smartphones to open alternatives to commercial media generators and specialized algorithms for researchers and enthusiasts.

Community activity metrics help in selecting the right model. On GitHub, these are represented by the number of stars, update frequency, and issue resolution speed.
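These GitHub signals can also be read programmatically. The sketch below assumes the public GitHub REST API repository endpoint (`https://api.github.com/repos/{owner}/{repo}`), whose JSON includes `stargazers_count`, `open_issues_count` and `pushed_at`. The helper names are hypothetical, and issue resolution speed is not part of that response, so it is left out.

```python
import json
import urllib.request
from datetime import datetime, timezone


def summarize_repo(repo: dict, now: datetime) -> dict:
    """Extract simple activity signals from a GitHub repository JSON object."""
    # pushed_at is an ISO 8601 UTC timestamp such as "2024-05-01T12:00:00Z"
    pushed = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return {
        "stars": repo["stargazers_count"],
        # GitHub counts open pull requests as issues in this field
        "open_issues": repo["open_issues_count"],
        "days_since_push": (now - pushed).days,
    }


def fetch_repo(owner: str, name: str) -> dict:
    """Download repository metadata (unauthenticated requests are rate-limited)."""
    url = f"https://api.github.com/repos/{owner}/{name}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, `summarize_repo(fetch_repo("ollama", "ollama"), datetime.now(timezone.utc))` returns the star count, open issues and days since the last push for the Ollama repository.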

It’s also crucial to verify the product's origin and the repository's authenticity. Popular open-source builds often become bait for cybercriminals distributing malicious code disguised as well-known AI tools.
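One basic authenticity check is to compare a downloaded file's hash with the checksum the project publishes on its official page or release notes, when it provides one. A minimal sketch using Python's standard `hashlib`; the function names are hypothetical:

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte model weights need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the checksum published by the maintainers."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

A matching hash only proves that the file is the one published alongside that checksum; if the checksum itself comes from an impostor repository, it proves nothing, so the source still has to be verified.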

The next step in familiarizing oneself with local AI models is to test their functionality in practice. For users without powerful hardware, there are free and freemium cloud platforms available.

The most popular solution is Google Colab, a cloud environment that provides GPU access directly from the browser. The free tier offers a machine with an Nvidia Tesla T4 accelerator for an average of two to four hours, depending on load. Alternatives include Kaggle Notebooks and Hugging Face Spaces; the latter lets users interact with models through ready-made web interfaces built with Gradio or Streamlit.
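Before starting a heavy job in Colab or Kaggle, it helps to confirm which accelerator the session actually received. A minimal sketch, assuming PyTorch is the framework in use (it is typically preinstalled in Colab, but that is an assumption about the current image):

```python
import importlib.util


def describe_accelerator() -> str:
    """Report the first CUDA device and its memory, or why none is available."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check itself never crashes

    if not torch.cuda.is_available():
        return "No CUDA GPU in this session (check the runtime type)"
    props = torch.cuda.get_device_properties(0)
    # total_memory is reported in bytes
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB of VRAM"


print(describe_accelerator())
```

On a free Colab session with a GPU runtime selected, this should report a Tesla T4; without one, it explains why no device was found instead of failing later mid-install.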

When working with open-source solutions, legal aspects should also be considered. Many popular projects are released under standard licenses such as MIT or Apache 2.0, which allow commercial use with minimal restrictions.

However, some developers take their own approach. Meta distributes its flagship models under its own Llama 3.1 Community License, which requires a separate license from Meta if the licensee's products exceed 700 million monthly active users.

There are also strict copyleft licenses such as the GNU General Public License, which require derivative works to be distributed with their source code under the same license.
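These distinctions can be turned into a rough pre-check when shortlisting models. The sketch below is a simplification: the identifiers mimic Hugging Face-style license tags (an assumption), the three categories and both helpers are hypothetical, and only the 700 million threshold comes from the Llama 3.1 Community License described above. Anything classed as "custom" still has to be read in full.

```python
PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}
COPYLEFT = {"gpl-3.0", "agpl-3.0"}
LLAMA_MAU_LIMIT = 700_000_000  # monthly active users, per the Llama 3.1 Community License


def classify_license(license_id: str) -> str:
    """Map a license tag to a coarse category (hypothetical helper)."""
    key = license_id.strip().lower()
    if key in PERMISSIVE:
        return "permissive"
    if key in COPYLEFT:
        return "copyleft"
    return "custom"  # e.g. "llama3.1": read the license text itself


def needs_meta_permission(monthly_active_users: int) -> bool:
    """Above this audience size, the Llama license requires a separate grant from Meta."""
    return monthly_active_users > LLAMA_MAU_LIMIT
```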

My Personal ChatGPT Alternative

Independent rankings help in choosing among the vast array of general-purpose local LLMs (similar to ChatGPT or Gemini): Chatbot Arena is based on blind testing, while the Open LLM Leaderboard relies on performance metrics.

Open LLM dashboard. Source: llm-stats.
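Chatbot Arena collects blind head-to-head votes: users compare two anonymous answers and pick the better one. The platform originally ranked models with Elo ratings and later switched to a Bradley-Terry model, so the classic Elo update below only illustrates how pairwise votes become a ranking; it is not the leaderboard's current method, and the K-factor of 32 is an arbitrary assumption.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(
    rating_a: float, rating_b: float, a_won: bool, k: float = 32.0
) -> tuple[float, float]:
    """Return new ratings after one blind vote between models A and B."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses
    return rating_a + delta, rating_b - delta


# Two equally rated models; A wins the vote
print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

An upset win against a much higher-rated model moves both ratings more than an expected win does, which is why a few surprising votes can noticeably shift a new model's position.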

The gold standard in this segment is set by Meta's Llama model family and Alibaba's Qwen. These models handle long contexts and multi-step queries well and are suitable for vibe coding and other programming tasks. Thanks to the open framework Ollama, installing them can come down to a single command.
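Once Ollama is installed, `ollama run <model>` downloads a model and opens a chat in the terminal, and the same models are served over a local HTTP API (by default on port 11434). A minimal Python sketch against the `/api/generate` endpoint; the helper names are hypothetical, and the model tag is the one tested for this article:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generation request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text (slow CPUs need a long timeout)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running and the model pulled (`ollama pull qwen3.5:2b`), `ask("qwen3.5:2b", "hello!")` returns the reply as a string. The generous timeout is deliberate, since CPU-only laptops can take minutes per answer.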

During testing for this article, the qwen3.5:2b model was successfully run on a laptop without a discrete graphics card, with an Intel Core i7 processor, 8 GB of RAM and an SSD; heavy applications such as messengers and browsers were closed beforehand.

Source: Ollama.

“2b” indicates 2 billion parameters. The higher the number, the more complex the relationships a neural network can capture. Roughly speaking, a 2b model can handle basic grammar and simple commands, while a 122b model can recall facts from quantum physics, grasp the nuances of legal documents, and plan tasks ten steps ahead.

Each parameter takes up physical space on disk and, crucially, in RAM. The 2b model used about 4-5 GB of RAM, the practical maximum for a machine with 8 GB. Even so, it took nearly three minutes to generate a reply to the simple query “hello!”

Screenshot: ForkLog.
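The link between parameter count and memory is simple arithmetic: weights take roughly parameters × bits per parameter ÷ 8 bytes. The sketch below estimates the weights only; the context cache, runtime and operating system add overhead on top, and the precision a given download uses (16-bit or a quantized 4- to 8-bit format) is an assumption to check on the model page.

```python
def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of model weights in gigabytes (10**9 bytes)."""
    # params * bits / 8 = bytes, so billions of parameters map directly to GB
    return params_billions * bits_per_param / 8


for bits in (16, 8, 4):
    print(f"2b model at {bits}-bit: ~{weights_gb(2, bits):.1f} GB")
# 2b model at 16-bit: ~4.0 GB
# 2b model at 8-bit: ~2.0 GB
# 2b model at 4-bit: ~1.0 GB
```

At 16-bit precision a 2b model's weights alone come to about 4 GB, the same order as the 4-5 GB observed on the test laptop; quantization shrinks the footprint roughly in proportion to the bit width.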

Here’s a rough classification of models:

0.5b-2b: Fast, can run on older laptops and smartphones. Ideal for simple tasks (command routing, basic summarization, auto-completion of short code lines). Prone to hallucinations on complex queries;
3b-4b: Balance of speed and quality. Good for mobile devices, smart home tasks, and automation. For example, a chatbot can be asked to dim the lights in a room, turn on the air conditioning, or raise a barrier;
7b-9b: Require about 6-8 GB of free RAM. Powerful models with solid context understanding and deeper reasoning, suitable for programming and working with large texts.

In his recent study on vibe coding in Web3, Vladimir Sliper found that models such as qwen2.5-coder:7b, qwen3:8b, llama3.2:3b, and deepseek-r1:8b run on a MacBook Air with 16 GB of RAM. More powerful models require investing in a high-end PC with a top-tier graphics card or deploying them on rented servers.
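To see which of these models are already on a machine, one option is Ollama's local `/api/tags` endpoint, which lists pulled models. The sketch assumes its JSON shape (a `models` array whose entries carry a `name` such as `qwen3:8b`); `missing_models` and `fetch_tags` are hypothetical helpers.

```python
import json
import urllib.request

WANTED = ["qwen2.5-coder:7b", "qwen3:8b", "llama3.2:3b", "deepseek-r1:8b"]


def missing_models(tags_response: dict, wanted: list[str]) -> list[str]:
    """Return the wanted tags that do not appear in Ollama's model list."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    return [tag for tag in wanted if tag not in installed]


def fetch_tags(url: str = "http://localhost:11434/api/tags") -> dict:
    """Query a running Ollama server for locally available models."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Each tag reported by `missing_models(fetch_tags(), WANTED)` can then be downloaded with `ollama pull <tag>`.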

Private Data Processing, 3D Printing, and User Protection

Options for interacting with open AI models depend on the user’s skill level and hardware. Some projects come packaged in user-friendly installers (.EXE files) or mobile apps that work “out of the box.” Others are abandoned GitHub repositories where installation turns into a lengthy battle with conflicts between outdated libraries.

Today, applied AI models are used for much more than text generation. Even a cursory look at the ecosystem reveals dozens of specialized tools for specific tasks.

Video and 3D Work:

CogVideoX: An open model from Zhipu AI for generating videos from text descriptions. It creates realistic short clips, has open weights, and can be deployed in environments like Jupyter or Colab with sufficient video memory;
DepthCrafter: A tool for extracting depth information from videos. Useful for VFX specialists and 3D modeling, it creates high-precision depth maps for each frame of a dynamic scene;
TRELLIS (Morfx 3D): An advanced system for generating 3D assets. This project allows for the creation of high-quality three-dimensional models from images or text prompts, optimizing them for use in game engines.

Transforming a photo of a train into a 3D-printable object using the web version of Morfx 3D. Screenshot: ForkLog.

Sound and Recognition:

CosyVoice: A multilingual speech synthesis model with voice cloning support. It generates realistic audio while preserving the intonations and emotional tones of the original speaker;
Whisper-WebGPU: An implementation of OpenAI's speech recognition model, rewritten to work directly in the browser using the WebGPU API. This means audio transcription occurs locally, ensuring complete privacy without sending audio files to external servers;
BirdNET-Analyzer: A neural network from Cornell University for identifying bird species by their songs. Unlike the popular Merlin Bird ID app, which relies heavily on cloud processing for some functions, BirdNET-Analyzer provides full control over the analysis process locally and can be used for bulk processing of gigabytes of field recordings.

Source: BirdNET.

Programming and User Protection:

Screenshot-to-Code: A utility for converting screenshots of web pages or mobile apps into clean HTML, Tailwind, or React code. Although it is often used with paid APIs (Claude, GPT-4), its architecture allows open multimodal models to be integrated;
MinerU/Magic-PDF: A project for accurately extracting structured data from PDF documents. The model recognizes text, mathematical formulas, and tables, converting complex layouts into Markdown format;
Fawkes: Makes imperceptible changes to images that hinder facial recognition systems from identifying the people in them. It runs locally on a PC via an .EXE file and can be used to protect social media avatars;
Nightshade: “Poisons” the pixels of images so that AI models trained on them without permission learn distorted associations. For example, a model trained on poisoned images may generate a cat when prompted with “dog.”

Portrait of US President Donald Trump before processing with Fawkes. Source: Library of Congress.
The same portrait after processing with Fawkes. Screenshot: ForkLog.

Struggles with Libraries and Initial Success

After trying AI tools with user-friendly interfaces, the next step was to see how easily a heavy repository could be deployed in the cloud for free.

FLUX.1 from Black Forest Labs is one of the leading image generation models, competing with commercial products such as Midjourney and Google's Nano Banana. With suitable hardware, it can run fully offline, which lets users sidestep the censorship of online services.

The lightest free version, FLUX.1 Schnell, was used for testing. Just as Ollama simplifies working with language models, dedicated tools exist for image generation; popular graphical interfaces include ComfyUI and Forge.

Attempts to install the Forge implementation cagliostro-forge-colab used up an entire Google Colab GPU session. The problem turned out to be a classic rookie mistake: version mismatches between Python, the cloud environment, and the model itself. Four hours of vibe coding with the free version of Gemini 3 Flash did not solve it.
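Mismatches like these are cheaper to catch before a long install than after it. A minimal pre-flight check using the standard `importlib.metadata` module; the expected versions would come from the repository's own requirements, and `check_environment` is a hypothetical helper.

```python
import sys
from importlib import metadata
from typing import Dict, List, Optional


def check_environment(pins: Dict[str, str], python_version: Optional[str] = None) -> List[str]:
    """List mismatches between the running environment and expected versions."""
    problems = []
    if python_version is not None:
        current = f"{sys.version_info.major}.{sys.version_info.minor}"
        if current != python_version:
            problems.append(f"python: running {current}, expected {python_version}")
    for package, wanted in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: installed {installed}, expected {wanted}")
    return problems
```

A call such as `check_environment({"torch": "2.3.1"}, python_version="3.10")` (hypothetical pins) returns an empty list when everything matches, so the install can be stopped early otherwise.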

Ultimately, the framework installation was abandoned in favor of deploying FLUX.1 directly, which had to wait for the next free session on another day.

In practice, free Google Colab is more convenient on weekends, when the platform often provides longer GPU sessions.

The model itself took up about 34 GB of the cloud machine's SSD, while the installation as a whole ultimately consumed around 86 GB.

Resources used by the Google Colab cloud machine. Screenshot: ForkLog.
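Since the installation needed far more disk space than the weights alone, a quick free-space check before downloading can save a wasted session. A sketch with the standard `shutil.disk_usage`; the 90 GB figure is simply the observed 86 GB rounded up, not an official requirement.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem containing `path`, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(needed_gb: float, path: str = ".") -> bool:
    """True if at least `needed_gb` gigabytes are free at `path`."""
    return free_gb(path) >= needed_gb


print(f"Free: {free_gb():.0f} GB, enough for the FLUX.1 install: {has_room(90)}")
```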

Initially, the Nvidia Tesla T4 accelerator did not have enough video memory for FLUX.1 Schnell: the unoptimized configuration kept hitting GPU limits. A series of simple code experiments with Gemini 3 Flash then helped adjust it with staged loading and memory clearing. As a result, generation used about 3 GB of the 16 GB of available video memory.

Screenshot: ForkLog.

Creating a single image took about seven minutes. Considering this was the free version of an open model, the result was pleasantly surprising.

Generated image using FLUX.1 Schnell. Source: ForkLog.

When asked to depict rock singer Marilyn Manson in Victorian style, FLUX.1 Schnell apparently did not recognize the reference to a specific person and produced only a generic visual template.

Generated image of the performer based on the prompt “draw Marilyn Manson in Victorian style” using FLUX.1 Schnell. Source: ForkLog.

Complex and Incredible

Open neural networks have long been used not only for generating texts and images but also for more niche and unusual tasks. A striking example of an unconventional application of AI architecture is the GameNGen model, capable of recreating the gameplay of the classic shooter DOOM in real time.

Source: GameNGen/Github.

GameNGen does not simulate the game in the traditional sense but sequentially generates video: the model predicts what the next frame should look like after user actions (like movement or shooting). As a result, enemies, objects, and scene changes are not “calculated” by the engine but visually reproduced as the most likely outcome.
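This loop is autoregressive: each generated frame is appended to the history and fed back in as input. The toy sketch below only illustrates that control flow. `toy_predictor` is a hypothetical stand-in that merely shifts a one-dimensional "frame" in the direction of the action, whereas GameNGen uses a trained diffusion model conditioned on past frames and actions; the context length is likewise an assumed parameter.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = List[int]
Predictor = Callable[[Sequence[Frame], str], Frame]


def toy_predictor(history: Sequence[Frame], action: str) -> Frame:
    """Hypothetical stand-in for a learned model: shift the last frame."""
    last = history[-1]
    if action == "left":
        return last[1:] + last[:1]
    if action == "right":
        return last[-1:] + last[:-1]
    return list(last)  # "wait": nothing changes


def rollout(first: Frame, actions: Sequence[str], predict: Predictor, context: int = 4) -> List[Frame]:
    """Generate one frame per player action, feeding outputs back as input."""
    history: Deque[Frame] = deque([first], maxlen=context)  # model sees only recent frames
    frames = [first]
    for action in actions:
        nxt = predict(list(history), action)
        history.append(nxt)
        frames.append(nxt)
    return frames


for frame in rollout([1, 0, 0, 0], ["right", "right", "wait", "left"], toy_predictor):
    print(frame)
```

No game logic runs anywhere in the loop: whatever the predictor outputs simply becomes the next frame, which is why such a model can only reproduce what it has learned to be the most likely outcome.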

Among autonomous systems, the Voyager project stands out: an AI agent for Minecraft that explores the game world on its own, gathers resources, and continuously learns new skills, which it stores as reusable code.

The scientific community is also actively adapting open AI models to its needs, for example to decipher historical texts. Researchers from Tel Aviv and Munich universities trained the Akkademia model to translate ancient Akkadian cuneiform directly into English. It can process thousands of damaged clay tablets, speeding up archaeologists' work several times over.

No less interesting is the MinD-Vis project. This system analyzes functional MRI data and attempts to reconstruct images that the subject observes during scanning. In other words, it generates interpretations of what a person sees based on patterns of brain activity.

Such initiatives demonstrate that artificial intelligence has become a universal tool for understanding and modeling reality. The shift from closed corporate APIs to open-source code is forming a completely new paradigm for technological development. Today, any researcher, developer, or enthusiast can deploy infrastructure that just a few years ago required multimillion-dollar investments in server farms.

The evolution of the ecosystem is inevitably accompanied by improvements in user experience: intuitive interfaces and automated deployment environments are replacing complex scripts. Tools like Ollama and Forge show that privacy, freedom from censorship, and high performance can coexist within a single software solution. The future of the AI industry largely depends on how strong, scalable, and independent the open ecosystem remains.
