When helping coworkers and fellow students set up their VS Code environments, I’ve noticed that understanding how to properly install, activate, and use Conda is one of the biggest challenges. For anyone working in data science, Conda is an essential tool that simplifies package management by creating isolated environments, which prevents conflicts between libraries and dependencies. This guide will cover key concepts, best practices, and real-world applications to help you use Conda effectively in your workflow.
What is Conda, and Why Should You Use It?
Conda is an open-source package and environment management system. It installs and tracks libraries, their dependencies, and even language runtimes such as Python itself.
At its core, Conda solves a fundamental problem in software development: dependency management. When working on different projects, you often need specific versions of libraries or tools that might conflict with one another. For instance, one project might require TensorFlow 2.3, while another needs TensorFlow 2.7. Without an effective tool to manage these dependencies, conflicts could break your projects or cause erratic behavior. This is where Conda shines. It allows you to create isolated environments for each project, so you can install the exact libraries and versions you need without them interfering with each other.
Additionally, Conda simplifies package installation. It downloads, installs, and updates packages, and it resolves their dependencies automatically. So, whether you’re working with data analysis libraries like pandas or deep learning frameworks like TensorFlow, Conda’s dependency solver selects versions that are compatible with one another.
Example Use Cases for Conda
Suppose you’re working on two different data science projects that require distinct dependencies. One project relies on TensorFlow 2.3, while the other needs TensorFlow 2.7. Additionally, the first project requires Python 3.7 for compatibility with legacy libraries, whereas the second uses the newer Python 3.9. With Conda, you can create a separate environment for each project, so that each one has its own versions of both Python and TensorFlow installed without any conflicts.
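The two-project scenario could look roughly like the sketch below. It is written for a POSIX shell such as bash; the conda commands themselves are the same in the Anaconda Prompt, but the `if` guard and `#` comments are bash syntax. The environment names tf23 and tf27 are made up for illustration, and whether these exact TensorFlow builds are available depends on your platform and configured channels.

```shell
# Assumptions: conda is installed and on PATH. The environment names
# tf23 and tf27 are illustrative and do not come from the article.
LEGACY_ENV="tf23"   # project 1: Python 3.7 + TensorFlow 2.3
MODERN_ENV="tf27"   # project 2: Python 3.9 + TensorFlow 2.7

if command -v conda >/dev/null 2>&1; then
    # One isolated environment per project, each with its own Python version.
    conda create --yes --name "$LEGACY_ENV" python=3.7 tensorflow=2.3
    conda create --yes --name "$MODERN_ENV" python=3.9 tensorflow=2.7
else
    echo "conda not found on PATH; install Miniconda or Anaconda first" >&2
fi
```

If conda cannot find one of these versions, a common fallback is to create the environment with only the Python version and install TensorFlow with pip inside it.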
When working with teammates, Conda helps ensure that everyone is using the same environment, which is critical for avoiding “it works on my machine” issues. You can export your environment to an environment.yml file and share it, so that teammates can recreate the same setup with the same versions of the required packages.
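As a rough sketch of that sharing workflow, the snippet below exports whichever environment is currently active. It assumes a bash-like shell with conda on PATH; the `if` guard is only there so the snippet does nothing when conda is missing.

```shell
# Assumption: conda is on PATH and the environment you want to share is active.
ENV_FILE="environment.yml"

if command -v conda >/dev/null 2>&1; then
    # Write the active environment's packages and versions to a file.
    conda env export > "$ENV_FILE"
else
    echo "conda not found on PATH; install Miniconda or Anaconda first" >&2
fi

# A teammate who receives the file can then recreate the environment with:
#   conda env create --file environment.yml
```

A full export pins platform-specific builds, so teammates on a different operating system may have trouble recreating it. In that case, `conda env export --from-history` records only the packages you explicitly asked for, which tends to travel better.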
Miniconda vs. Anaconda: Which One Should You Use?
When setting up Conda, you’ll likely come across two main options: Miniconda and Anaconda. Both are distributions that include Conda, but they serve different purposes:
- Miniconda: A lightweight version of Anaconda that includes Conda but does not come preloaded with any packages beyond the bare essentials. You can install additional packages as needed.
- Anaconda: A full-featured distribution that includes Conda and over 150 popular scientific packages such as NumPy, pandas, and JupyterLab. It’s a comprehensive solution for beginners or users who want everything pre-installed.
The choice between Miniconda and Anaconda boils down to your preference and requirements. If you want an out-of-the-box setup with most libraries you’ll need for data science or machine learning, go for Anaconda. However, if you prefer a more customizable setup and don’t want to install unnecessary packages, Miniconda is the better option. Personally, I use Miniconda to prevent bloat.
How To Properly Set Up Conda
Setting up Conda is relatively straightforward, but there are a few important steps to ensure everything runs smoothly, especially in VS Code.
- Download Miniconda or Anaconda: Visit the official Conda website and download either Miniconda or Anaconda based on your needs.
- Install Conda: Run the installer and follow the prompts. If you have administrative rights on your computer, you can check the option to “Add Anaconda3 to my PATH environment variable” during the installation process. This lets any shell recognize Conda commands, making Conda easier to use, but note that the installer itself labels the option as not recommended because it can interfere with other software. If you choose not to add it to your PATH, you can work from the Anaconda Prompt, or manually add Python and Conda to your PATH with the setx command in the Windows Command Prompt.
- Update Conda: After installation, it’s good practice to update Conda from the Anaconda Prompt so that you’re using the latest version.
- Understanding the Base Environment: The base environment is the default that comes with Conda, containing the package manager and essential tools. It’s best not to install dependencies directly in the base environment, as doing so can cause conflicts between packages and make managing different versions harder. Cluttering the base environment can also lead to stability issues. For better isolation and easier management, create a new environment for each project and install only the required dependencies there. When the base environment is active, your prompt is prefixed with (base).
- Setting Up Environments: Conda’s main strength is its ability to create isolated environments. For example, you can create a new environment named ‘ds’ with the most recent version of Python.
- Activating Environments: Once your environment is created, activate it by name before installing or running anything in it.
- Installing Libraries in Environments: Once the environment is activated, you can install specific versions of libraries into it by pinning the version when you install a package.
- Deactivating Environments: When you’re done, deactivate the environment to return to the one you were in before.
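The steps above can be sketched as one terminal session. This assumes a bash-like shell that has been set up by `conda init`; in the Anaconda Prompt, type the conda commands on their own, without the `if` guard and `#` comments. The pandas version is only an illustrative pin, so replace it with one that is compatible with your Python version.

```shell
# Assumptions: conda is installed, and this is pasted into an interactive
# shell initialized for conda. The pandas pin is illustrative only.
ENV_NAME="ds"

if command -v conda >/dev/null 2>&1; then
    conda --version                                # confirm the install worked
    conda update --yes --name base conda           # update conda itself
    conda env list                                 # list environments; * marks the active one
    conda create --yes --name "$ENV_NAME" python   # new environment with the latest Python
    conda activate "$ENV_NAME"                     # switch into the new environment
    conda install --yes pandas=2.2                 # install a pinned library version
    conda deactivate                               # leave the environment
else
    echo "conda not found on PATH; install Miniconda or Anaconda first" >&2
fi
```

The `--yes` flag skips the confirmation prompts. Leave it off if you would rather review what conda plans to install before it proceeds.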
How to Correctly Utilize Conda Environments in VS Code
Activating Conda environments is straightforward. Below are the steps to help you select and use the appropriate Conda environment for your Jupyter notebook project in VS Code.
- Locate Kernel: A kernel in VS Code is the computational engine that executes the code in a Jupyter notebook, and it is tied to a specific environment, such as a Conda environment. Click the ‘Select Kernel’ button at the top right of the notebook editor to specify your kernel.
- Python Environment: Conda manages Python environments, so click ‘Python Environments’.
- Select Conda Environment: The ‘base’ environment is always listed because it holds Conda itself and the essential tools needed to manage other environments. While you can use it, it’s best practice to pick a project-specific environment and keep the base free of unnecessary packages. Earlier, we created the ds environment, and you can see and select it here along with your other Conda environments.
- Usable Conda Environment: Whatever you selected in the previous step is now your active kernel. Now you can get to coding!
Similarly, when you create a Python script and want to use your Conda environment, you can select the desired interpreter by running the ‘Python: Select Interpreter’ command from the Command Palette.
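If the ds environment does not work as a notebook kernel, one likely cause is that it lacks the ipykernel package, which VS Code needs in order to start a kernel from an environment. The sketch below, written for a bash-like shell with conda on PATH, installs it and prints the interpreter path of the environment. That path should match the one VS Code shows for the selected kernel or interpreter.

```shell
# Assumption: conda is on PATH and the ds environment from earlier exists.
ENV_NAME="ds"

if command -v conda >/dev/null 2>&1; then
    # Add Jupyter kernel support to the environment.
    conda install --yes --name "$ENV_NAME" ipykernel
    # Print the path of the Python interpreter inside the environment.
    conda run --name "$ENV_NAME" python -c "import sys; print(sys.executable)"
else
    echo "conda not found on PATH; install Miniconda or Anaconda first" >&2
fi
```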
Conclusion
Conda is an essential tool for anyone working with programming environments, especially in the realms of data science and machine learning. By managing dependencies and isolating environments, Conda ensures that your projects run smoothly, regardless of the complexity of the tools involved. Whether you opt for the fully loaded Anaconda or the minimal Miniconda, utilizing Conda will make your life as a developer significantly easier. So, the next time you set up a project in VS Code, remember to leverage the power of Conda to manage your packages and environments efficiently!
Columnist: Sterling Hayden