OCDevel AI Podcast

Learn AI and machine learning from the ground up - a complete, self-driving course that goes from "what is AI?" all the way to building and operating production AI systems. Every episode pairs a five-minute brief on the latest in AI with a tutorial that climbs a single ladder across ~100 episodes - interleaving the concepts, the math that actually explains them, hands-on code you build yourself, and the MLOps to ship it. It leaves no stone unturned: the probability, statistics, and Bayesian foundations most courses skip get the deep treatment they deserve, right alongside the practical work. The path runs from your first model on real data, through the mathematical bedrock, classical ML, neural networks built from scratch in PyTorch, transformers part by part, building with LLMs (RAG, fine-tuning, agents), and MLOps on AWS and GCP - to the capstone: operating a self-managing fleet of AI agents in production. The goal isn't a diploma, it's a job. Every phase leaves you a portfolio project, and the whole course is built to make you the rare "operator" who can ship real systems - the one-person AI department. For programmers who want to break into AI through self-directed learning - no grad school required. AI-generated podcast by OCDevel.

Generated with OCDevel Podcaster

Setting Up Your AI Toolkit: Python, NumPy, Notebooks, Git, and Why a Portfolio Beats a Credential (Phase 0, Ep 4)

14h ago

Build the exact dev environment this course runs on (Python, virtual environments, notebooks, git, VS Code) and learn why three to five shipped, defensible GitHub projects open more doors than any stack of certificates.

Show Notes

The last map-phase episode: we build the workshop and set the strategy. We construct the development environment the whole course uses, then make the case that shipped portfolio projects beat certificates in a tougher entry-level market.

Why Python won. Python is glue over compiled cores; NumPy brings C and Fortran power to Python by dispatching array math to BLAS/LAPACK. Proof of the hybrid design: SciPy is roughly 50% Python, 25% Fortran, 20% C (SciPy 1.0, Nature Methods; Array Programming with NumPy). Python 3.14 (Oct 2025) ships a supported free-threaded build, but pin 3.11 or 3.12 for ML since CUDA wheels lag (Astral).

Environments & packaging. venv+pip (baseline), conda/Miniforge (binary deps like CUDA), and uv (Rust, 10-100x faster, the 2026 default). Note the Anaconda licensing landmine: paid license required for 200+ employee orgs; prefer Miniforge/conda-forge. Lockfiles give reproducibility.

Notebooks. Great for exploration, terrible for git. Pimentel et al. analyzed 1.45M notebooks: only ~24% ran clean, only 4% reproduced results. Cure: Restart Kernel and Run All. Tools: nbstripout, jupytext, nbdime; marimo as a reactive alternative.

Git & GitHub. Commit code/configs/lockfiles, not data/weights/secrets (100MB cap). README in Problem-Solution-Impact format. Later: Git LFS and DVC.

Editors & hardware. VS Code dominates (73.6% in SO 2024); Cursor, Copilot, Claude Code are the AI trio. You don't need a GPU early; use free Colab (T4) and Kaggle.

Portfolio over credential. WEF Future of Jobs 2025: net +78M jobs by 2030; AI/ML specialists among the fastest-growing. ML Engineer pay ranges widely (Coursera). Entry-level is genuinely tough (IEEE Spectrum), which is exactly why a portfolio matters: 3-5 repos, one deployed demo, one write-up (Careery).

News: SpaceX/xAI's record $75B IPO (NPR); Moonshot's open-weight Kimi K2.7 Code; OpenAI retires GPT-5.2; Google's DiffusionGemma.

Transcript

Quick news brief before we build today's workshop.

The big one: SpaceX went public, and it dragged the whole AI story along with it. The shares were priced after the close on June eleventh at a fixed one hundred thirty-five dollars apiece, about five hundred fifty-six million shares, raising seventy-five billion dollars. That puts the post-money valuation somewhere around one and three-quarter trillion dollars. Trading started June twelfth on the Nasdaq under the ticker S-P-C-X. This is the largest initial public offering in history, nearly three times Saudi Aramco's old record. And here's why it matters for us: this is the first time a frontier AI lab gets priced by public markets at this scale. SpaceX bought xAI back in February, so one share now wraps up the rockets, Starlink, the X social network, and Grok, the AI models. First-day close was one hundred sixty-one dollars, up about nineteen percent, briefly pushing the market cap past two trillion. The book was reportedly three to four times oversubscribed, with demand north of two hundred fifty billion, including a reported five-billion-dollar BlackRock order. Worth noting the AI business itself reportedly lost about six point four billion dollars in 2025.

Next, something you can actually download. Moonshot AI released Kimi K2.7 Code on June twelfth, and it's open-weight. One trillion total parameters, but only thirty-two billion active at a time, because it's a mixture-of-experts design. A two hundred fifty-six thousand token context window, and a modified MIT license, so commercial use is fine with attribution. It always runs in thinking mode, but uses about thirty percent fewer reasoning tokens than the previous version. Vendor-reported benchmarks are strong, including a claimed eighty-one percent on one tool-use benchmark versus Claude Opus 4.8's reported seventy-six. Treat cross-model comparisons as vendor-reported. Hosted pricing is roughly a dollar in, four dollars out, per million tokens, but the weights are free to self-host. If you're building coding agents, this is a real frontier-tier option you can run yourself.

Two quick ones to close. OpenAI retired GPT-5.2 inside ChatGPT on June twelfth. The Instant, Thinking, and Pro variants are gone, and existing chats auto-migrate to the matching GPT-5.5 model.

And Google released DiffusionGemma, a twenty-six-billion-parameter open model that generates text by diffusion instead of left-to-right. It denoises a whole block of tokens in parallel, which can be up to four times faster, reportedly over a thousand tokens per second on a single H100 GPU. The catch, and the lesson, is that it trails standard Gemma on basically every benchmark. Speed bought at a quality cost. It's Apache-licensed and runs locally, so the smallest next action is to pull the weights and benchmark its speed against a same-size regular Gemma to feel that trade-off yourself.

This is the last episode of the map phase, and it does two jobs at once. We build the workshop, meaning the actual development environment this whole course runs on. And we set the strategy, which is the argument that shipped projects beat certificates. Episode one was what AI is. Episode two was the history. Episode three was the four learning paradigms. Those drew the map. Today we hand you the tools and a plan for using them.

Let's start with a question that trips people up: why is Python the language of AI when everyone knows Python is slow? The answer is that in numerical work, Python isn't actually doing the heavy lifting. It's the orchestration layer. The real array math runs in C and Fortran underneath. NumPy, the foundational array library, exists specifically to bring the computational power of languages like C and Fortran to Python. When you write a single matrix multiply, that one call dispatches down to highly optimized linear algebra routines, libraries like Intel's math kernel or OpenBLAS, that people have been tuning for decades.

This trick has a name. It's called vectorization. Instead of writing a loop in Python, where every iteration pays the interpreter's overhead, you push the loop down into compiled code that runs at native speed. So the mental shift is this: you stop thinking in loops and start thinking in whole-array operations. Your Python stays readable and high-level, and the math runs as fast as C, because it is C.
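
To make that shift concrete, here is a minimal sketch comparing a hand-written Python loop with the equivalent whole-array call. It assumes NumPy is installed; the function names `dot_loop` and `dot_vectorized` are made up for this illustration.

```python
import numpy as np

def dot_loop(a, b):
    """Dot product with an explicit Python loop: every iteration runs in the interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_vectorized(a, b):
    """Same computation as one whole-array call; the loop runs in compiled code."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
a = rng.random(10_000)
b = rng.random(10_000)

# Both versions compute the same number, up to floating-point rounding.
print(np.isclose(dot_loop(a, b), dot_vectorized(a, b)))
```

If you time the two with the standard `timeit` module, the vectorized call should come out far ahead, though the exact gap depends on your machine and on which BLAS library your NumPy build links against.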

If you want concrete proof that this hybrid design is real and not just marketing, look at SciPy's own codebase. It's roughly half Python, a quarter Fortran, a fifth C, and small slices of Cython and C-plus-plus. That Fortran is mostly legacy numerical libraries from the nineteen-seventies and eighties, things for linear algebra, optimization, integration, and interpolation. Nobody rewrote them, because they're already correct and already fast. Python just wraps them in a friendly interface. That's the whole philosophy in one breakdown: Python is the glue, compiled code is the engine.

So why did Python win over R, Julia, and MATLAB? It's a network effect, a virtuous cycle. As more AI frameworks chose Python, Python became a better place to do AI, which encouraged even more frameworks to be built in Python. TensorFlow chose Python. PyTorch chose Python. Scikit-learn chose Python. Each choice made the next choice easier, and the snowball just kept rolling.

Here's a mental model for the ecosystem, a set of layers stacked on top of each other. At the bottom is NumPy, with its core array object. On top of that sits SciPy, which is algorithms that run on those arrays. Then pandas, which gives you a labeled tabular structure, a DataFrame, like a spreadsheet you can program. Matplotlib handles plotting. Scikit-learn is classical machine learning with a beautifully uniform interface, where almost everything follows the same fit and predict pattern. Above that you've got the deep learning frameworks, PyTorch, TensorFlow, and JAX, which add GPU acceleration and automatic differentiation. And at the very top, Hugging Face, where you grab pretrained models. Each layer leans on the one below it.

Pay attention to that scikit-learn detail, because it's the reason the library is so easy to learn. Almost every model, whether it's a linear regression, a decision tree, or a clustering algorithm, exposes the same two methods. You call fit to train it on your data, and you call predict to get answers back. Learn that one rhythm and you can swap models in and out without relearning anything. That uniformity is a design choice, and it's a big part of why scikit-learn became the default place to start. We'll lean on it hard in Phase one.
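
Here is a rough sketch of why that uniformity pays off. The two classes below are toy stand-ins written for this example, not scikit-learn classes; they only imitate the fit-and-predict convention, and they take plain one-dimensional lists where scikit-learn expects two-dimensional feature arrays.

```python
from statistics import mean

class MeanRegressor:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return [self.mean_ for _ in X]

class NearestNeighborRegressor:
    """Toy model: predicts the target of the closest training point (1-D inputs)."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            nearest = min(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            preds.append(self.y_[nearest])
        return preds

X_train, y_train = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
X_new = [1.1, 3.9]

# The calling code never changes, whichever model is plugged in.
for model in (MeanRegressor(), NearestNeighborRegressor()):
    predictions = model.fit(X_train, y_train).predict(X_new)
    print(type(model).__name__, predictions)
```

The loop at the bottom is the point: because both models share the same two methods, swapping one for the other is a one-word change.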

One note on Python versions, and this is a moving frontier, so verify it yourself. Python 3.14 shipped in October of 2025. The headline feature is the free-threaded build, the one that removes the global interpreter lock. It was experimental in 3.13 and became an officially supported build in 3.14, and the single-threaded performance penalty is now down to roughly five to ten percent. That's genuinely exciting long-term. But here's my practical advice: for machine learning, pin a stable minor version, something like 3.11 or 3.12. The reason is that GPU and CUDA wheels, the prebuilt packages, lag behind the newest Python. Chase the bleeding edge and you'll spend your afternoon fighting install errors instead of doing ML.
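
One way to make a pin like that explicit is a small guard at the top of a project script. This is only a sketch: the set of allowed versions mirrors the 3.11 and 3.12 suggestion above, and the helper name `is_supported` is invented for the example.

```python
import sys

# Assumption for this sketch: the project has settled on these minor versions.
SUPPORTED_MINORS = {(3, 11), (3, 12)}

def is_supported(version_info, supported=SUPPORTED_MINORS):
    """True if the (major, minor) part of a version tuple is in the allowed set."""
    return tuple(version_info[:2]) in supported

# Report on the interpreter actually running this script.
print(is_supported(sys.version_info))
```

A real project would more likely declare the constraint in its project file and let the environment tool enforce it; the guard just shows the idea in a few lines.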

Which brings us to package and environment management, the part everyone hates and everyone needs. The problem is simple to state. Different projects need conflicting versions of the same library. Project A needs an older NumPy, project B needs a newer one. Install everything into one global Python and you get what people call dependency hell, where fixing one project breaks another. You also get the classic excuse, works on my machine, because your setup is a unique snowflake nobody can reproduce.

A virtual environment solves this. It's just a self-contained directory with its own Python interpreter and its own packages, isolated from everything else. One project, one environment. They can't step on each other.

Think of it like a clean kitchen for each recipe. Project A gets its own counter, its own ingredients, its own versions of everything. Project B gets a completely separate one. When you finish, you can throw the whole environment away and rebuild it from the recipe, and nothing else on your machine is affected. That disposability is the feature. The day an install goes sideways, and it will, you delete the environment and recreate it in seconds instead of debugging your entire system Python.
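
To see that disposability in action, here is a small sketch using the standard library's `venv` module. It builds an environment in a temporary folder, deletes it, and rebuilds it. The helper name `create_env` is made up, and the pip bootstrap is skipped purely to keep the demo quick.

```python
import os
import shutil
import tempfile
import venv
from pathlib import Path

def create_env(env_dir):
    """Create a bare virtual environment, skipping the pip bootstrap to keep this quick."""
    # Symlinks on POSIX mirror the default of the `python -m venv` command line.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return Path(env_dir)

workdir = Path(tempfile.mkdtemp())
env_path = create_env(workdir / ".venv")

# A virtual environment is just a directory; pyvenv.cfg marks it as one.
has_config = (env_path / "pyvenv.cfg").exists()
print(has_config)

# Disposability: delete the whole thing, then rebuild it from scratch.
shutil.rmtree(env_path)
env_path = create_env(workdir / ".venv")
rebuilt = (env_path / "pyvenv.cfg").exists()
print(rebuilt)

shutil.rmtree(workdir)  # clean up the temporary directory
```

In day-to-day work you would create the environment once from the command line and install packages into it; the script only demonstrates that nothing outside the folder is touched.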

Let me walk you through the tools for the ear, not as commands to memorize. The baseline, built right into Python, is venv paired with pip. You create a virtual environment in a hidden folder, you activate it, you install your packages, and when you're done you freeze the exact list of what you installed into a requirements file so someone else can recreate it. It's universal and always available. The downsides: it's slow at resolving dependencies, and it doesn't manage the Python version itself.

Then there's conda, which you'll meet as Anaconda or Miniconda. Conda's superpower is that it manages non-Python binary dependencies too. The CUDA toolkit, cuDNN, math libraries, even ffmpeg. For deep learning and scientific work that needs system-level C and C-plus-plus libraries, that's irreplaceable, and it's something the faster tools simply can't do.

And the 2026 darling is uv, from Astral, the same people who make the Ruff linter. It's written in Rust, it's ten to a hundred times faster than pip, and it unifies a whole pile of separate tools into one: package installing, virtual environments, Python version management, and lockfiles. It's still pre-one-point-oh, so again, verify the version, but the consensus advice for new projects is just start with uv. You create an environment, you add packages, you run things through it, and it automatically writes a lockfile recording exactly what got installed. There's also Poetry, aimed at application and library project management, and Pixi, which does conda-style binary dependencies with a modern feel.

You don't have to pick just one, by the way. A pragmatic hybrid that a lot of people land on is using conda for the heavy binary dependencies, the CUDA and the system libraries, and then using uv inside that conda environment for fast everyday Python package management. Best of both.

Now, the files that record all this form a kind of reproducibility ladder. At the bottom is the flat requirements file, which is better when you actually pin versions. Then the conda environment file, which can also capture non-Python dependencies and channels. And the modern standard, the project file written in TOML, which puts your dependencies and your project configuration together in one place. uv adds its own lockfile on top.

The whole point of this is reproducibility. A loose import with no version pinned works today and breaks in six months when something upstream changes. A lockfile records the exact versions of everything, including the dependencies of your dependencies, so you or a teammate or a server can rebuild the same environment bit for bit. Hold onto that idea, because it connects straight forward to MLOps later in the course.
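
As a small illustration of the difference between loose and pinned dependencies, here is a sketch that flags requirement entries lacking an exact version. The helper `find_unpinned` is hypothetical and deliberately simplified: real requirements files can also contain URLs, options, and environment markers, and a true lockfile additionally pins the dependencies of your dependencies, which this check does not look at.

```python
def find_unpinned(requirement_lines):
    """Return requirement entries that are not pinned to an exact version with '=='."""
    unpinned = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned

requirements = [
    "numpy==1.26.4",      # exact pin: reproducible
    "pandas>=2.0",        # lower bound only: can drift over time
    "scikit-learn",       # no version at all
    "# a comment line",
]
print(find_unpinned(requirements))  # → ['pandas>=2.0', 'scikit-learn']
```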

One landmine I have to warn you about, because it has bitten real organizations: Anaconda's licensing. Since March of 2024, Anaconda Incorporated's terms of service require a paid commercial license for any organization with two hundred or more employees or contractors. And that includes governments and non-profits. The specific trap is that the default channel in Anaconda's installers points at Anaconda's proprietary repository, and the community channel no longer quietly falls back to it. So you can be out of compliance without realizing it. The escape hatch is Miniforge, a minimal conda installer that defaults to the community channel instead. One large public university migrated its whole organization to Miniforge in late 2024 for exactly this reason. The lesson: prefer Miniforge with the community channel, or just use uv.

Let's talk about notebooks, because they're central to data work and they will absolutely betray you if you don't understand them. Why do they fit data science so well? Because data work is a loop. You load some data, peek at it, plot it, transform it, peek again. It's an inherently interactive, exploratory rhythm. Notebooks give you stateful, cell-by-cell execution with the output shown inline, right there, tables and charts and images. The notebook file itself is just JSON holding cells, outputs, and metadata. It's wonderful for weaving together narrative, code, and results. And it's terrible for version control, but we'll get there.

The platforms. Jupyter is the open standard. JupyterLab is the modern multi-panel interface, and the newer Notebook 7 was actually rebuilt on top of JupyterLab. Google Colab is hosted Jupyter with free GPUs and TPUs and zero setup, which makes it the perfect on-ramp. The free tier, and this is a moving frontier so verify, gives you an NVIDIA T4 GPU with sixteen gigabytes of memory, a ninety-minute idle timeout, and a twelve-hour maximum session. People observe a median of around twenty-two hours of GPU access a week, but during peak demand free users can get no GPU at all. Kaggle notebooks are similar, free GPU and TPU at historically around thirty hours a week, with datasets and competitions built right in. And VS Code runs notebooks natively, giving you IntelliSense, a variable explorer, a DataFrame viewer, and in-cell debugging, which the browser experience lacks.

Now the dark side, and I have hard data for this. Researchers analyzed about one point four five million notebooks across more than two hundred sixty thousand GitHub repositories. When they re-ran each one cleanly from top to bottom, only about twenty-four percent executed without an error. And only about four percent reproduced the same results they originally showed. Four percent. A follow-up study managed to push reproducibility up from roughly five percent to fifteen percent just by trying different cell orderings, which, honestly, only proves the problem rather than solving it.

So what goes wrong? Two big culprits. The first is hidden state. You define a variable, then you edit or delete the cell that made it, but the variable is still sitting in the kernel's memory. So the code on your screen no longer matches the state that actually produced your output. The notebook lies to you. The second is out-of-order execution. Nothing stops you from running cell five, then cell two, then cell eight. The little execution counter next to each cell reveals the true order, and when those numbers have gaps or run out of sequence, that's your red flag. Add in missing dependencies and data that isn't where the notebook expects it, and you've got the main failure causes.
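
That execution-counter red flag can be checked mechanically. The sketch below reads the parsed JSON of a notebook, using the `cells`, `cell_type`, and `execution_count` fields of the notebook format, and asks whether the code cells are numbered one, two, three in order. The function name is invented, and the check is intentionally strict: it also flags notebooks with cells that were never run.

```python
def execution_order_is_clean(notebook):
    """True if code cells were run once each, top to bottom, from a fresh kernel.

    `notebook` is the parsed JSON of an .ipynb file (nbformat 4 layout).
    """
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    # A clean top-to-bottom run numbers the code cells 1, 2, 3, ...
    return counts == list(range(1, len(counts) + 1))

clean_nb = {"cells": [
    {"cell_type": "markdown"},
    {"cell_type": "code", "execution_count": 1},
    {"cell_type": "code", "execution_count": 2},
]}
messy_nb = {"cells": [
    {"cell_type": "code", "execution_count": 5},
    {"cell_type": "code", "execution_count": 2},
    {"cell_type": "code", "execution_count": 8},
]}

print(execution_order_is_clean(clean_nb))  # → True
print(execution_order_is_clean(messy_nb))  # → False
```

To try it on a real file, load the notebook with `json.load` and pass the resulting dictionary in.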

The discipline that saves you is one habit. Before you trust a notebook or commit it, restart the kernel and run all cells, top to bottom. If it doesn't run clean from a fresh start, it's broken, full stop. And keep a clear line in your head: notebooks are for exploration. Production code belongs in proper Python modules, with functions, tests, and imports. Don't ship a notebook as your pipeline.

The version-control pain deserves its own mention. Because the notebook file is JSON with the outputs embedded inside it, including images encoded as long blocks of text, git diffs are unreadable and merges are brutal. Two people editing the same notebook is a nightmare. The mitigations are the three tools from the show notes: nbstripout strips the outputs before committing, jupytext pairs your notebook with a plain Python file so git tracks the readable version, and nbdime gives you notebook-aware diffing. There's also a newer reactive alternative called marimo, which stores notebooks as pure Python and builds a dependency graph so that when you change a cell, everything that depends on it re-runs automatically. That design kills hidden-state bugs at the root.
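
To show what stripping outputs actually means, here is a minimal sketch that clears the outputs and execution counters from a parsed notebook. It is not how nbstripout, the stripping tool named in the show notes, is implemented; it only illustrates the idea on the notebook's JSON structure, and the function name is made up.

```python
import copy

def strip_outputs(notebook):
    """Return a copy of a parsed notebook with code-cell outputs and counters cleared."""
    stripped = copy.deepcopy(notebook)  # leave the caller's object untouched
    for cell in stripped["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped

notebook = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {
        "cell_type": "code",
        "source": ["print(1 + 1)"],
        "execution_count": 7,
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
    },
]}

clean = strip_outputs(notebook)
print(clean["cells"][1]["outputs"])  # → []
```

With the outputs gone, what git sees is the code and the narrative, which is the part you want diffs to show.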

On to git and GitHub, which for a self-taught person is two things at once: your safety net and your resume. Git tracks changes through commits. You initialize a repository, you add and commit your changes, you push to a remote, you branch, you open pull requests. GitHub is the hosted home for that, and it's also your portfolio surface, the public face of your work.

What belongs in a repository? Your code, your configuration, your dependency files and lockfile, a README, small sample data, and scripts that fetch or regenerate the big data. What does not belong: large datasets, trained model weights, secrets and API keys, your virtual environment folder, and the various cache folders. You keep those out with an ignore file that lists patterns to skip, things like the data folder, large data files, model files, the virtual environment, and the caches.

Why so strict about big files? Because git is genuinely bad at versioning large files, something even Linus Torvalds, who created git, has acknowledged. Git stores full snapshots, binary blobs can't be meaningfully diffed, and they bloat your history permanently, even after you delete them. On top of that, GitHub enforces a hard one-hundred-megabyte cap per file, with warnings starting above fifty.
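
A check like the following, run before you commit, can catch an oversized file early. It is a sketch, not a replacement for an ignore file: the helper name is invented, and reading the cap as one hundred mebibytes is an assumption made for this example.

```python
import tempfile
from pathlib import Path

# Assumption: the per-file cap mentioned above, taken here as 100 MiB in bytes.
GITHUB_HARD_LIMIT = 100 * 1024 * 1024

def find_large_files(root, limit_bytes=GITHUB_HARD_LIMIT):
    """Return paths under `root` whose size exceeds `limit_bytes`, skipping .git."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue  # git's own object store is not something you commit
        if path.is_file() and path.stat().st_size > limit_bytes:
            large.append(path)
    return sorted(large)

# Demo in a throwaway folder, with a tiny limit so no huge file is needed.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "small.txt").write_bytes(b"x" * 10)
    (Path(tmp) / "big.bin").write_bytes(b"x" * 2000)
    demo_hits = [p.name for p in find_large_files(tmp, limit_bytes=1000)]

print(demo_hits)  # → ['big.bin']
```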

The README is where you sell the project, and this ties directly into the portfolio argument coming up. Treat it as a mini design document. State the business problem, what you tried, what failed, and what you learned. A simple, memorable framing is problem, solution, impact. For a self-taught candidate especially, a clean GitHub profile with strong READMEs basically is your resume.

And there's a forward reference here to MLOps. When you do need to version data and models properly, two tools handle it. One, Git LFS, replaces large files with small text pointers and keeps the actual bytes on a separate server. The other, DVC, works like git but for data: it swaps large files for tiny pointer files that you commit to git, while the real bytes live in remote storage like S3 or Google Cloud, and it adds pipeline definitions and experiment tracking on top. File that away as a later topic.
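
The pointer idea is simple enough to sketch in a few lines. The record below is an illustration only, not the actual on-disk format of either tool: it stores a content hash and a size, which is enough to commit a tiny stand-in and later verify that the bytes fetched from remote storage are the right ones. Both function names are hypothetical.

```python
import hashlib

def make_pointer(data: bytes) -> dict:
    """Build a small stand-in record for a large blob: its SHA-256 digest and size."""
    return {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}

def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that bytes fetched from remote storage are the ones the pointer names."""
    return make_pointer(data) == pointer

weights = b"\x00" * 1_000_000    # pretend this is a large model file
pointer = make_pointer(weights)  # this tiny dict is what you would commit

print(pointer["size"], len(pointer["sha256"]))  # → 1000000 64
print(matches_pointer(weights, pointer))        # → True
```

The repository stays small because it only ever holds the pointer, while the hash keeps the link to the real file trustworthy.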

The editor landscape is easy, because there's a clear winner. In the 2024 Stack Overflow survey, VS Code was the primary editor for seventy-three point six percent of professional developers, more than twice its nearest competitor, and it stayed number one in 2025. For Python and ML, install two extensions from Microsoft: the Python extension, which gives you linting, IntelliSense, debugging, and environment selection, and the Jupyter extension, which lets you run notebook cells right inside the editor with that variable explorer and DataFrame viewer.

Then there's the AI-assisted coding wave, and this is very much a moving frontier. The three dominant tools in 2026 are Cursor, a standalone AI IDE that's actually a fork of VS Code, GitHub Copilot, an extension that works across editors, and Claude Code, a terminal-native agent. Adoption is real: roughly one point eight million developers use Copilot, and about half of the Fortune 500 deploy Cursor. Rough pricing runs about ten dollars a month for Copilot, twenty for Cursor, and twenty to two hundred for Claude Code. All three are now agentic, meaning they do multi-file editing and autonomous planning, and the best of them score above eighty percent on the SWE-bench Verified coding benchmark.

But here's the caveat, and it ties into a trap we'll hit at the end. AI autocomplete can let a beginner ship code they don't actually understand. That feels great right up until an interviewer grills you to a depth you never imagined, asking why your own code works. Use these tools, but make sure you can explain everything they write.

Now the hardware question everyone asks early: do I need a GPU? For these early phases, no. Phase one is classical machine learning, scikit-learn on tabular data, and that runs perfectly fine on any laptop CPU. GPUs matter for deep learning, where you need to do enormous parallel matrix multiplies, and those map naturally onto a GPU's thousands of cores.

So here's the breakdown. A CPU is fine for pandas data wrangling, classical ML, small models, and small-model inference. A GPU is what you need to train or fine-tune neural networks at a reasonable speed, and NVIDIA dominates this space because its CUDA platform is what the whole framework ecosystem is built around first. The recommended on-ramp is the cloud free tiers, Colab and Kaggle, which we already covered. A T4 is great for fine-tuning smaller models, inference, prototyping, and learning, though it's not for production.

When does local hardware actually matter? When you've got privacy-sensitive data you can't upload, when you iterate so frequently that session limits genuinely hurt, when latency matters, or when you want to stop paying ongoing rental costs. The entry point is a consumer GPU with twelve to twenty-four gigabytes of video memory. And the key insight: video memory, VRAM, is usually your binding constraint, not raw speed, because it caps how big a model and batch you can fit. My guidance: don't buy a GPU to start. Use free Colab and Kaggle, and only invest once you're regularly bottlenecked.
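
A back-of-the-envelope calculation shows why VRAM binds first. The sketch below counts only the memory needed to hold a model's weights, so treat it as a lower bound: activations, optimizer state, and framework overhead all come on top. The seven-billion-parameter model is a hypothetical example, and the function names are invented.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

def fits_in_vram(n_params, bytes_per_param, vram_gb):
    """Rough check: do the weights alone fit? Ignores activations and optimizer state."""
    return weight_memory_gb(n_params, bytes_per_param) <= vram_gb

# Hypothetical 7-billion-parameter model stored in 16-bit floats (2 bytes each).
print(weight_memory_gb(7e9, 2))  # → 14.0
print(fits_in_vram(7e9, 2, 12))  # → False
print(fits_in_vram(7e9, 2, 24))  # → True
```

So the same model that will not even load on a twelve-gigabyte card fits on a twenty-four-gigabyte one, regardless of how fast either card is.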

Which brings us to the heart of this episode: the portfolio-over-credential thesis. Let me set the job-market backdrop, because it's not all rosy and I won't pretend it is. The World Economic Forum's Future of Jobs report from early 2025, drawing on over a thousand employers covering fourteen million workers, projects one hundred seventy million new jobs created and ninety-two million displaced by 2030, a net gain of about seventy-eight million. The three fastest-growing jobs by percentage are big data specialists, fintech engineers, and AI and machine learning specialists. AI and big data top the fastest-growing skills list. Eighty-six percent of employers expect AI to transform their business by 2030, two-thirds plan to hire AI-skilled talent, and forty percent of core skills are expected to change over five years. The US Bureau of Labor Statistics projects computer and information research roles to grow about twenty percent from 2024 to 2034.

What about money? I'll report these as ranges, because the sources genuinely disagree, and that disagreement is worth saying out loud. For a machine learning engineer in the US in 2026, the average runs anywhere from around one hundred twenty-five thousand to nearly one hundred eighty-nine thousand depending on who's counting. Experienced engineers, five-plus years, see base salaries from about a hundred two thousand up to two hundred thirty-three thousand, with the ninetieth percentile around two hundred fifty thousand and senior roles at top labs clearing three hundred fifty thousand in total comp. Data scientists average roughly a hundred thirty thousand. Entry-level is typically eighty to a hundred twenty thousand, with Bay Area juniors higher.

Now the honest counterweight, because credibility matters. The entry-level market is genuinely tougher right now. New-grad job-market sentiment in 2026 is the most pessimistic since 2020. AI is shifting entry-level expectations upward, and employers increasingly weigh industry experience and demonstrated proficiencies among their top factors. And here's the thing: that is exactly why a portfolio matters more now, not less. It's the proxy for the experience you can't get without a job. It breaks the chicken-and-egg.

So why do shipped projects beat certificates? Certificates feel productive, but they fade. You finish course after course, collecting credentials, feeling great, and the skills quietly slip away because you never built anything you owned. Then in an interview, when someone asks you to explain a project, they will grill you to a depth you have never imagined, and they can absolutely tell which projects you actually built versus copied. A certificate proves attendance. A project you can defend proves capability. That's the whole difference.

What does a strong portfolio actually look like? The minimum viable version is a GitHub profile with three to five repositories, one deployed demo someone can click on, and one technical write-up. Use real, messy data, from the UCI machine learning repository, government open data, or genuine Kaggle competitions. Avoid the famous toy datasets, the Titanic survival set, the iris flowers, the handwritten digits, as your headline projects, because reviewers have seen them hundreds of times. Show the unglamorous work too. Data cleaning and feature engineering are sixty to eighty percent of real ML work, and skipping them signals inexperience. Make it scannable, because sixty-two percent of recruiters spend less than thirty seconds on a portfolio. Clear titles that emphasize business outcomes, a README in that problem-solution-impact format, and a deployed demo using something like Streamlit, Gradio, or Hugging Face Spaces. And make it end to end: data, model, evaluation, deployed demo. That end-to-end story ties straight to our north star, the operator, the one-person AI department.

A nuanced word on Kaggle, since people ask. It has a five-tier ranking, from novice up through contributor, expert, master, and grandmaster, earned with bronze, silver, and gold medals across competitions, notebooks, and datasets. The scarcity tells you something: out of more than twenty-three million accounts, only around three thousand are masters and just over six hundred are grandmasters. So a high rank is a real signal. But here's the honest framing: leaderboards reward squeezing a tenth of a percent more accuracy out of a clean, pre-defined dataset, which is the opposite of messy end-to-end work. Use Kaggle to learn technique and to grab free compute and data. Don't let it turn into leaderboard-grinding, which is just tutorial hell with a scoreboard.

What do hiring managers actually look for? The ability to take a real problem from messy data all the way to a working, explainable solution. Clean, readable code on a clean GitHub. Communication, a write-up or a blog post. Evidence that you understand why, not just what. And ideally something deployed they can actually poke at.

Notice how that list maps perfectly onto the toolkit we just built. The clean GitHub is your git habits. The reproducible environment is what lets a reviewer actually run your code instead of giving up. The notebook discipline is what keeps your results trustworthy. The write-up is your README in problem-solution-impact form. None of this is busywork. Every tool today exists to make a future project legible to the person deciding whether to hire you. That's the through-line: the workshop and the strategy are the same thing seen from two angles.

Let me close on the pitfalls, the traps that swallow self-learners. The biggest is tutorial hell, where you depend on tutorials and courses but can't build anything independently. You watch one video, get a little stuck, and bounce to another. But real learning happens precisely when you're on your own and you get stuck and you have to fight through it. The escape is to build something unique, on your own, that actually breaks, and then fix it. The 2025-and-onward variant is vibe coding hell, where you let an AI assistant generate code you can't explain. Same trap, shinier tooling, same illusion of progress without understanding. Then there's certificate collecting, which impresses nobody in a technical interview. And environment-setup yak-shaving, losing days to CUDA and conda and dependency errors instead of doing actual ML, which, again, is the whole reason we built a stable, reproducible stack today, with uv or Miniforge and lockfiles, and Colab as your fallback when you're stuck. And the notebook-state bug, which you now recognize from non-monotonic execution counters and irreproducible results, cured by restart and run all.

So where does this land on the map? The first three episodes of Phase zero built the conceptual picture: what AI is, its history, and the four paradigms. This last Phase zero episode built the workshop and set the strategy. Phase one comes next: your first models on real data, scikit-learn on tabular datasets, the fit-and-predict loop, train-test splits, evaluation, all of it shipping into the GitHub portfolio you set up today.

A quick word on one framework choice, since you'll wonder. PyTorch versus TensorFlow, and this is a moving frontier. PyTorch dominates research, around eighty-five percent of papers at top venues, leads in job postings, and owns the Hugging Face and LLM stack. TensorFlow keeps a large enterprise and production installed base, plus the edge and mobile niche through its lightweight version. The framing I like: PyTorch dominates the flow, the new research and new models and new hires, while TensorFlow leads on the stock, the installed base. For a learner today, default to PyTorch.

So your stable default stack, names to drop freely: Python, NumPy, pandas, scikit-learn, PyTorch, Hugging Face, Jupyter, git and GitHub, and VS Code. Set it up once, reproducibly, and stop fighting it. Next time, we put it to work on real tabular data, and your portfolio gets its first real entry.
