Best Ollama models for coding: a Reddit digest

This guide collects community advice on selecting Ollama models for coding and getting the most out of them.

I think this question should be discussed every month. Genuinely this, and not in a shitty "ugh, we get asked this so much" way, but in a "keeping a frequently refreshed thread on the currently recommended models at different sizes is a good idea, because there's just so much out there and it's hard to follow sometimes" way. After searching for this question, the newest post on it was five months old, so I'm looking for an updated answer. If applicable, please separate out your best models by use case: best overall / general use, best for coding, best for RAG, best conversational (chatbot applications), and best uncensored. I'm new to LLMs and finally set up my own lab using Ollama; I am now looking to do some testing with open-source LLMs and would like to know what the best pre-trained models to use are.

Also, does it make sense to run these models locally when I can just access GPT-3.5 on the web, or even get a few trial runs of GPT-4? Maybe you should evaluate models through cloud instance APIs first and see what works well in the closed big-model space, then work down through the 70B, 33B, 13B, and 7B ranges of open models and see if anything you can run locally is satisfactory in performance. Many folks don't use the best available model because it's not the best for their requirements or preferences (task(s), language(s), latency, throughput, costs, hardware, etc.). And at least as of right now, I think what models people are actually using while coding is often more informative than benchmarks.

I suggest you first understand what size of model works for you, then try different model families of similar size (i.e. Llama, Mistral, Phi). So far they all seem much the same regarding code generation. Some models are better than others at following instructions (I think what makes ChatGPT really good is that its training data is most likely good at following directions); with others you'll find yourself doing 8 or 9 shots to get something correct. You also have to keep switching up the model, because their training data does not 100% overlap, and people have different tastes for what they want the LLM to do to their code, on the scale between "do too little" and "do too much".

If you allow models to work together on the code base, criticizing each other and suggesting improvements, the result will be better. That is, if you need the best possible code, since it turns out to be expensive. So the best results come from the work of a team of models, not just one.
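To make that team-of-models loop concrete, here is a minimal sketch using the official ollama Python package (pip install ollama). It assumes a local Ollama server with both models already pulled; the model names, prompts, and the fixed two rounds of critique are illustrative choices, not a recipe from the thread.

```python
# Minimal writer/critic loop over two local models via the ollama package.
import ollama

WRITER = "codellama:13b"   # drafts and revises the code
CRITIC = "deepseek-coder"  # reviews it and suggests improvements

def ask(model: str, prompt: str) -> str:
    # Single-turn chat call; returns the model's text reply.
    resp = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]

task = "Parse a CSV of timestamps and print the busiest hour."
code = ask(WRITER, f"Write Python code for this task:\n{task}")

for _ in range(2):  # two critique/revise rounds; more rounds cost more time
    review = ask(CRITIC, f"Task: {task}\n\nReview this code; list bugs and concrete improvements:\n{code}")
    code = ask(WRITER, f"Task: {task}\n\nYour code:\n{code}\n\nA reviewer said:\n{review}\n\nReturn an improved version, code only.")

print(code)
```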
In the rapidly evolving landscape of software development, Ollama models have become game-changing tools for how developers approach their craft. Large language models have profoundly reshaped software development: evolving beyond basic code completion, these AI co-pilots now debug complex code, refactor entire codebases, generate comprehensive documentation, translate between programming languages, and even assist in high-level system design. Ollama offers a range of models tailored to diverse programming needs, from code generation to image reasoning; this digest covers the ones developers actually use.

What are Ollama models? They are large language models (LLMs) packaged for Ollama. These models learn from huge datasets of text and code, and they handle a range of natural language processing (NLP) tasks with ease, including text generation and translation. They run locally, in GGML/GGUF format, and Ollama also works with third-party graphical user interface (GUI) tools.

Browse Ollama's library of models: when you visit the Ollama Library at ollama.ai, you will be greeted with a comprehensive list of available models. To narrow down your options, you can sort this list using different parameters; the Featured sort, for example, showcases the models the Ollama team recommends as the best choices for most users. There are, for example, two coding models (which is what I plan to use my LLM for) as well as the Llama 2 model.

On quantization: models are usually trained at 32-bit (or 16-bit) precision and then quantized down so people can run them locally. A q4 quant keeps roughly 4 bits per weight, while fp16 keeps 16 bits of floating point, so you only get maybe 80-90% the same output in q4 versus f16. I mostly recommend q6 for the performance-to-speed ratio; sadly, most models on Ollama don't come in q6. Just get q8 if your GPU can fit it. That said, larger quantized models often outperform their smaller non-quantized counterparts, so I'd really recommend you play around with 7B models at q4 and try them against a few real-life test cases to see what works.

On hardware: I run Ollama on my desktop with 64GB RAM and an RTX 4080; another setup here is a 4090 and an i7 with 64GB of DDR4. You should have no issue running models up to 120B with that much RAM, but large models will be incredibly slow (like 10+ minutes per response) running on CPU only; I recommend sticking to 13B models unless you're incredibly patient. What's the current best general-use model that will work with an RTX 3060 (12GB VRAM) and 16GB of system RAM? My current and previous MacBooks have had 16GB and I've been fine with it (I run 13B models quite well with Ollama; my current choice is wizard-vicuna-uncensored:13b, but I'm always looking for a better general-purpose uncensored model), though given local models, I think my next machine will have the maximum RAM available. Hey, fellow M1 16GB user: I personally use OpenHermes Neural 7B at q4. I have also been running a Contabo Ubuntu VPS for many years; I use that server to run my automations with Node-RED (easy for me because it is visual programming), plus a Gotify server, a Plex media server, and an InfluxDB server.

Llama 3 70B Q5_K_M GGUF split across RAM and VRAM will occupy about 53GB of RAM and 8GB of VRAM with 9 offloaded layers using llama.cpp, at roughly 1.25 tokens/s. I guess you can try to offload 18 layers to the GPU and keep even more spare RAM for yourself; I'll update in some time.
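For the llama.cpp route, that partial offload is a single knob. Here is a small sketch using llama-cpp-python; the GGUF path is a placeholder, and n_gpu_layers=18 mirrors the suggestion above (tune it up or down until your VRAM is full).

```python
# Partial GPU offload with llama-cpp-python (built with GPU support).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-70b.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=18,  # layers offloaded to VRAM; the rest stay in system RAM
    n_ctx=4096,       # context window; larger values cost more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what Q5_K_M quantization trades off."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```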
I also recommend Open WebUI as a front end; I really like its prompt templating, which lets you use the clipboard directly.

I am a hobbyist with very little coding skill. Can Ollama help me in some ways, or do the heavy lifting, and what coding languages or engines would I have to use alongside Ollama? How do I combine the snippets Ollama provides into one long block of code as well? Is there something like an interface, model, or project I should be using as an Ollama coding buddy? Feel free to add onto this if you wish too.

You should check out continue.dev and Ollama; those are easy enough to deploy as a VS Code code assistant, and the developer treats local models as first-class citizens. I have been trying Ollama for a while together with continue.dev in VS Code on a MacBook M1 Pro, and I'm intending to use an LLM with code-llama on nvim as well. For tab completion I have tried codellama:7b, codegemma:2b, and llama2:8b; I got the best tab-completion results with the codellama model, while llama3 gave the best code-implementation suggestions in chat for Java. On my PC I use codellama-13b with Ollama and am downloading the 34b to see if it runs at decent speeds; I have a 3080Ti with 12GB, so chances are 34b is too big, but 13b runs incredibly quickly through Ollama. I am not a coder, but these models helped me write a small Python program for my use case. Not everything works, though: I tried starcoder2:7b on a fairly simple Python case just to get a feel for it, and it generated back a whole bunch of C/C++ code with a lot of comments in Chinese and kept printing it out as if in an infinite loop.

On workflow: a lot of people just ask the model a question in one go ("Please write me a snake game in python"), take the code it wrote, and run with it. That works, but there are various workflows that can greatly improve the answer if you do a little more work on it. One method uses self-reflection: the model reiterates on its own output and decides whether it needs to refine the answer. I have tested it with GPT-3.5 and GPT-4, and the method has a marked improvement on the code-generating abilities of an LLM.

One problem is that Ollama doesn't allow any roles other than user, system, and assistant, while Nous's Hermes 2 model was finetuned with another role called <tool>. So you have to wrangle with it a little using the user role. Its prompt template also doesn't seem to be supported by default in oobabooga, so you'll need to add it manually.
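A sketch of that wrangling, using the ollama Python package: the tool's output gets smuggled back into the conversation as an ordinary user turn, since (per the comment above) other role names are not accepted. The model name, the CALL convention, and the result format here are made up for illustration.

```python
# Feeding a <tool>-style result back through the "user" role.
import ollama

messages = [
    {"role": "system", "content": "Use tool results when they are provided."},
    {"role": "user", "content": "What is 37 * 41?"},
    {"role": "assistant", "content": 'CALL calculator("37 * 41")'},
    # A custom "tool" role would be rejected here, so wrap it as a user turn:
    {"role": "user", "content": "[tool result] calculator -> 1517"},
]

resp = ollama.chat(model="openhermes", messages=messages)
print(resp["message"]["content"])  # should answer using the injected result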
For coding the situation is easier than general chat, as there are just a few coding-tuned models. The best ones for me so far are deepseek-coder, oobabooga_CodeBooga, and phind-codellama (the biggest you can run). Imo codellama-instruct is the best for coding questions. I had the best experience with the Codeqwen models; that said, you can also check out codeqwen 7b or the wavecoder models from Microsoft (6.7B in size). But what about highly performant models like smaug-72B? Keep sizes in perspective: Opus is most likely 100B+ (probably 300B?) parameters, so it's going to be far better than a 7B model, even one specialized for coding; there is no 7B model that actually comes close in performance. I don't have too much experience with coding-specific models myself, since the bigger models I use tend to be pretty good at answering coding-related questions even if they were not exclusively trained on coding problems; but they are all generalist models. There's one generalist model I sometimes consult when I can't get results from a smaller one, and you might look into Mixtral too, as it's generally great at everything, including coding, though I'm not done evaluating it for my domains. For coding-related tasks that are not actual code, like the best strategy to solve a problem and such, I use TheBloke/tulu-2-dpo-70B-GGUF; I never go all the way up to TheBloke/goliath-120b-GGUF, but it's on standby. I currently use llama3.1 8B for summarizing text of all types.

Devstral deserves a mention: the model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model there. It is finetuned from Mistral Small 3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: before the fine-tune, the vision encoder of Mistral Small 3.1 was removed.

For general chat, Command R+ is the best local model for my own personal use since Mixtral 8x7B, and I've been using one or the other since their release; Command R+ has replaced Mixtral as my daily driver. I prefer to chat with LLMs in my native language, German, in addition to English, and few local models can do that as well as those from Mistral and Cohere. Wish it didn't require a beefy PC, though.

For roleplay and creative writing: I don't roleplay, but I liked Westlake's model for uncensored creative writing. You ask the model to narrate events about a simulated virtual world and its characters and to interact with them; with more advanced models you can have a coherent inventory, health points, etc., and they give you a virtual narrated world where you can do anything, while weak models will mess up little details or even the plot. At 8x7B, BagelMIsteryTour-v2-8x7B is probably the best RP model I've ever run, since it hits a great balance of prose and intelligence, though it is too verbose for instructions or tasks; it's really a writing model only, in the (admittedly limited) testing I did. One model I tried for fiction I really disliked: maybe it's my settings, which work great on the other models, but it had multiple logical errors, character mixups, and it kept getting my name wrong. This is the kind of behavior I expect out of a 2.7B model, not a 13B Llama model.

On front ends: initially I set up Open WebUI (via Pinokio) with Ollama as the backend (installed via winget), but alas, I encountered some RAG-related and backup issues, and Ollama's uncensored models left something to be desired, so I switched gears to LM Studio, which boasts an impressive array of uncensored models. The easier way is to install the 'Private AI' app; I use the Phi-3 model, which you can install directly in the app, and the app is free except for the larger models, which you probably don't want to run on a phone anyway. With Ollama I can run small models like Phi-3 and TinyLlama-1.1B at decent speed on my phone (Galaxy S22 Ultra).

For SQL specifically: I've been using magicoder for writing basic SQL stored procedures, and it has performed pretty strongly, especially for such a small model (am I missing something?); I think it ultimately boils down to wizardcoder-34B (a Llama finetune) versus magicoder-6.7B. I'm also trying PipableAI/pip-sql-1.3b at the moment. Edit: that is the best open-source model I've tried for SQL queries, and it can write JOIN code accurately.
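For the SQL use case, a schema-grounded prompt is usually all it takes. A minimal sketch with the ollama Python package; the model tag and schema are illustrative, and any of the SQL-capable models above could be swapped in.

```python
# Schema-grounded SQL generation with a small local model.
import ollama

schema = """
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL(10,2), created_at DATE);
CREATE TABLE customers (id INT, name TEXT, country TEXT);
"""

prompt = (
    f"Given this schema:\n{schema}\n"
    "Write a SQL query returning each country's total order revenue for 2024, "
    "highest first. Reply with SQL only."
)

resp = ollama.generate(model="magicoder", prompt=prompt)
print(resp["response"])
```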
Hello everyone, I'm currently looking for recommendations for a decent model to code with; I code mostly in Python and Pascal (Delphi). IME, the best "all-around" model for MY applications and use cases (which are fairly technical and humorless) has been dolphin-Mistral, though sometimes I need to negotiate with it to get the best output. I see specific models marketed for specific jobs, but most models respond well to pretty much anything and are good at answering Q&A. I also have a model fine-tuned on C# source code that appears to "understand" questions about C# solutions fairly well.

OLMo 2 is a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.

For RAG-style setups, following is the config I used: model = Mistral-7B-claude-chat.q5_k_m.gguf, embeddings = all-MiniLM-L6-v2.

Recently I played a bit with LLMs, specifically exploring ways of running the models locally and building prompts using LangChain; specifically Ollama, because that's the easiest way to build with LLMs right now. As a result I ended up coding a small recommendation system, powered by a Llama3-7b model, which suggests topics to read on HackerNews. And after a lot of failures and disappointments running Autogen with local models, I tried the rising star of agent frameworks, CrewAI. It is a multi-agent framework based on LangChain, and it utilizes LangChain's recently added support for Ollama's JSON mode for reliable function calling.
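That JSON mode can be used directly, outside any framework. A small sketch with the ollama Python package: format="json" asks Ollama to constrain the output to valid JSON, while the function list and reply shape are our own convention here, not anything CrewAI-specific.

```python
# Function-calling-style prompting with Ollama's JSON mode.
import json
import ollama

resp = ollama.chat(
    model="mistral",
    format="json",  # constrain the reply to valid JSON
    messages=[{
        "role": "user",
        "content": (
            'Pick one function and its arguments; reply as JSON like '
            '{"function": "...", "args": {...}}. '
            "Available functions: search(query), calculator(expression). "
            "Question: what is 2 to the 20th power?"
        ),
    }],
)

call = json.loads(resp["message"]["content"])
print(call["function"], call["args"])  # e.g. calculator {'expression': '2**20'}
```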