The Inherent Difficulty of Trying to Understand New Codebases
Most approaches to understanding new codebases don't work well. Here's what actually happens when you inherit unfamiliar code — and a better way.
You've been there.
Your manager says, "We're putting you on the payments team." Or a client forwards a GitHub link: "Can you take a look at this?"
You clone the repository. 1,247 files. 83,000 lines of code. Zero documentation.
Everyone expects you to be productive by Monday.
What do you do?
Understanding a new codebase is one of the hardest parts of being a developer. Whether you're joining a new team, inheriting legacy code, or evaluating a third-party library, you face the same challenge: thousands of lines of unfamiliar code and no clear starting point.
Most developers approach this the same way — by opening files and reading code. But this rarely works. Here's why, and what to do instead.
The "Just Pick a File" Trap
Usually you start with a file that looks important. main.py. App.tsx. Something with the word "server" in it.
It feels reasonable. Every program starts somewhere, right?
You scroll past a wall of imports. You recognize the framework. That's comforting. You click into one of the imported modules. Then another. Then another.
An hour later, you're deep inside password hashing utilities or request middleware, and a subtle unease sets in.
You've been reading code for a while now, but you don't feel more oriented. In fact, you feel less certain than when you started.
You're learning things — just not the things you need.
You know how a token is verified, but not why this service exists. You know what a function does, but not where it fits. You can explain a few lines of code in isolation, yet you couldn't describe the system to someone else if you tried.
That's the first trap.
Codebases are not linear, but reading them linearly feels like progress. It isn't.
Three Hours Later, You're Lost
The problem isn't intelligence or experience. It's that modern systems are webs.
Every file is connected to many others. Imports pull you sideways. Abstractions push you down. Utilities hide important decisions behind innocuous names.
Here's what this looks like in practice.
You're looking at a FastAPI backend. You see:
```python
from app.routers import auth, billing, jobs
```
Which should you look at first?
Without the map, you guess. You click auth because authentication seems foundational. But in this codebase, jobs is actually the core — auth and billing are just supporting modules.
You just spent 45 minutes in the wrong place.
With the map, you'd see: jobs has 47 dependencies. auth has 3. You'd know where the complexity lives before wasting time.
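You don't need a full tool to get a first approximation of those numbers. Here's a minimal sketch that counts how many files import each top-level module; the module names and file contents are hypothetical, echoing the FastAPI example above:

```python
import re
from collections import Counter

# Matches "from app.X import ..." or "import app.X" at the start of a line
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+app\.(\w+)", re.MULTILINE)

def import_counts(sources: dict[str, str]) -> Counter:
    """Count how many *files* import each app submodule."""
    counts = Counter()
    for path, text in sources.items():
        # set(): count each importing file once per module, not per statement
        for module in set(IMPORT_RE.findall(text)):
            counts[module] += 1
    return counts

# Hypothetical mini-repo: in real use, read these from disk
sources = {
    "api.py": "from app.jobs import scheduler\nfrom app.auth import login\n",
    "worker.py": "import app.jobs\n",
    "billing.py": "from app.jobs import queue\n",
}
print(import_counts(sources).most_common())
# jobs is imported by 3 files, auth by 1 — jobs is the core
```

Crude, but it answers the question that matters before you start reading: where does the complexity live?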
Reading code without context is like exploring a city by wandering into random buildings. You can describe individual rooms in detail, yet you still can't answer the questions that matter: Where's downtown? Which way is the hospital? How do you get home?
Eventually, most developers realize this approach isn't working.
So they try something else.
The ChatGPT Hallucination Problem
The next move often feels smarter. Instead of reading everything yourself, you paste a file into ChatGPT and ask, "What does this do?"
The answer sounds plausible. Confident. Structured. Reassuring.
And subtly wrong.
ChatGPT confidently tells you:
"This is a user authentication module. The main authentication logic is in `src/auth/handlers.py`, which handles login and session management. The password hashing uses bcrypt and tokens are stored in Redis with a 24-hour expiration."
Sounds great! Except:
- There is no `src/auth/handlers.py` (that folder doesn't exist)
- Passwords use argon2, not bcrypt
- Tokens are in PostgreSQL, not Redis
- The 24-hour expiration was removed 6 months ago
ChatGPT doesn't know your codebase. It doesn't know which files exist, which ones were deleted, or which architectural decisions were reversed six months ago. It fills in gaps with patterns it has seen elsewhere.
The danger isn't that it's useless — it's that it's almost right.
You walk away believing you understand the system, only to discover later that half of what you internalized never existed.
At that point, you're not just lost. You're lost with false confidence.
So you stop trusting AI explanations and do what developers have always done.
You ask a human.
Why Asking Coworkers Doesn't Scale
You find someone who's been on the team forever and ask them to explain how the codebase works.
Sometimes they're busy. Sometimes they're generous with their time. Either way, the result is usually the same: a verbal explanation that skips context you don't have and references systems you haven't seen yet.
They talk about "the old auth system." Or "the refactor we did last year." Or "the part that's weird but don't worry about it."
You nod along, writing notes you don't yet have a place to store mentally.
Later, you try to map what they said back to the code. It doesn't line up cleanly. Some of it is outdated. Some of it assumes knowledge you don't have. Some of it was correct once but no longer is.
This isn't their fault. Architectural knowledge decays. People compress explanations. Oral history doesn't translate cleanly into source code.
You still don't have the map.
The Test Documentation Myth
At some point, someone tells you, "Read the tests. Tests are documentation that never goes out of date."
So you open the test directory.
```python
# tests/test_auth.py
import pytest

def test_login_success():
    # TODO: fix this test
    pass

def test_login_invalid_password():
    # Skip: breaks in CI
    pytest.skip()

def test_user_registration():
    user = create_user("test@example.com", "password123")
    assert user.id is not None
    # What does this actually test???
```
Some tests are skipped. Some are broken. Some test implementation details that no longer matter. Others validate behaviors without explaining why those behaviors exist.
You learn how to call functions, but not why they're called. You see inputs and outputs, but not relationships.
Tests show pieces of the system — never the system itself.
You're still working bottom-up. One assertion at a time.
Trial and Error Isn't Learning
Another common move is to run the application, make a small change, and see what breaks. This feels empirical. Scientific, even.
Until a harmless-looking edit causes failures in three unrelated modules.
You change one line in the auth module. Suddenly:
- Tests in the billing module fail
- The admin dashboard won't load
- The API returns 500 errors
Things fail far away from where you were working, and you have no intuition for why.
That's when fear creeps in. You stop changing things. You start tiptoeing.
The codebase becomes something you survive, not something you understand.
The "Grep Everything" Approach
One last attempt: systematic search.
You want to find where authentication happens. You grep:
```shell
$ grep -r "authenticate" .
```

347 results.

```text
./src/auth/service.py:     def authenticate_user(username, password):
./src/middleware/auth.py:  # Authenticate requests
./tests/test_auth.py:      def test_authenticate():
./legacy/old_auth.py:      def authenticate(user):  # DEPRECATED
./docs/api.md:             Authenticate users via POST /auth/login
./node_modules/passport/index.js:  authenticate: function()
... (341 more results)
```
Which ones matter? Which is the current implementation? Is legacy/old_auth.py still used or not?
You have 350+ results. No context. No signal vs noise. No way to know what's central and what's leftover from three migrations ago.
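One way to recover some signal without a tool is to triage hits by path before reading any of them. A rough sketch, where the noise buckets and paths are illustrative:

```python
# Triage raw grep hits into signal vs. noise by path prefix.
# These prefixes are illustrative; adjust for your repo's layout.
NOISE_PREFIXES = ("./node_modules/", "./legacy/", "./tests/", "./docs/")

def triage(paths: list[str]) -> tuple[list[str], list[str]]:
    signal, noise = [], []
    for p in paths:
        (noise if p.startswith(NOISE_PREFIXES) else signal).append(p)
    return signal, noise

hits = [
    "./src/auth/service.py",
    "./src/middleware/auth.py",
    "./tests/test_auth.py",
    "./legacy/old_auth.py",
    "./docs/api.md",
    "./node_modules/passport/index.js",
]
signal, noise = triage(hits)
print(signal)  # only the two ./src/ files survive
```

This cuts the pile down, but it still can't tell you which of the surviving files is central. That takes structure.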
At this point, you might pause and wonder: why is this so hard?
The answer is uncomfortable.
You're trying to build a mental model from the bottom up. You're collecting puzzle pieces without knowing what the picture looks like. Every new detail adds noise, not clarity.
Trying to understand a codebase by reading files sequentially is like trying to understand a conversation by analyzing individual letters. The letters are real. The structure is correct. But you're looking at the wrong level.
It's an inherent problem with the approach.
The Common Thread: All Roads Lead Nowhere
All of these approaches — reading sequentially, asking AI, bugging coworkers, running experiments, grepping randomly — fail for the same reason.
They start at the wrong level.
They force you to assemble an architectural understanding from scattered implementation details. They assume that if you just accumulate enough facts about individual files, the system's shape will eventually reveal itself.
But structure doesn't emerge from details.
Structure has to be observed directly.
When you join a new team or inherit a system, what you actually need first isn't granular code knowledge. It's orientation.
You need to know:
- What are the major pieces?
- How do they connect?
- Where does execution flow?
- What's central and what's peripheral?
You need the map before you learn the streets.
What Actually Works: Structure First, Details Second
The moment things change is when you stop asking "What does this function do?" and start asking "What is the shape of this system?"
Here's what that looks like in practice.
First: Understand the organization
How is the repository laid out? Where are models? Where are routes? Where's configuration? Where's the actual business logic?
You're not reading code yet. You're learning the neighborhoods.
Second: Identify entry points
Where does execution begin? API endpoints? CLI commands? React components? Main functions?
These are your front doors.
Third: Map dependencies
What imports what? Which modules are foundational (imported by many) and which are peripheral (import nothing, imported by few)?
This is critical. A module that's imported by 30 other files is architectural. A module imported by 2 files is a detail.
These are your roads.
Fourth: Then read code
Now when you read code, you have context. You know where this file lives in the larger system. You know what depends on it. You know what it depends on.
You understand the layout of the city.
Reading code with context is a completely different experience.
Doing It the Manual Way (Takes 3-4 Hours)
If you can't or won't use automated tools, here's the systematic approach:
1. Map the Directory Structure
```shell
tree -L 2 -I node_modules
```
Look for patterns. Most codebases follow conventions:
- `src/` or `app/` — main code
- `models/` or `entities/` — data structures
- `routes/` or `controllers/` — request handlers
- `services/` — business logic
- `utils/` — helpers
You're not reading code yet. You're just seeing how things are organized.
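If `tree` isn't installed, the same two-level view is a few lines of Python. A minimal sketch (the sample directory names are invented for the demo):

```python
import os
import tempfile

def list_tree(root, max_depth=2, ignore=("node_modules", ".git"), depth=0):
    """Depth-limited directory listing, similar to `tree -L 2 -I node_modules`."""
    lines = []
    if depth >= max_depth:
        return lines
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path) or name in ignore:
            continue
        lines.append("  " * depth + name + "/")
        lines.extend(list_tree(path, max_depth, ignore, depth + 1))
    return lines

# Build a throwaway sample layout to demo (illustrative names)
root = tempfile.mkdtemp()
for d in ["src/models", "src/routes", "tests", "node_modules/x"]:
    os.makedirs(os.path.join(root, d))
print("\n".join(list_tree(root)))
```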
2. Find the Entry Points
Backend API:
- Look for `main.py`, `server.js`, `cmd/main.go`
- Find where routes are defined
- Check for API endpoint declarations
Frontend:
- Look for `App.tsx`, `pages/`, `routes/`
- Find the root component
- Check routing configuration
CLI tools:
- Find argument parsers
- Look for command definitions
- Check the main entry function
3. Trace One Complete Flow
Pick one user action. Login. Checkout. Search. Doesn't matter which.
Follow it from entry point → business logic → database → response.
Don't branch off. Stay on that one path from start to finish.
Now you understand a vertical slice of the system.
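To make "one vertical slice" concrete, here's a deliberately toy login flow, entry point down to storage and back. Every name here is hypothetical; the point is the shape of the path, not the code:

```python
# A toy vertical slice: route -> business logic -> storage -> response.
# All names are made up for illustration.

USERS = {"ada@example.com": "correct-horse"}  # stand-in for a database

def find_user(email):                        # storage layer
    return USERS.get(email)

def check_credentials(email, password):      # business logic
    stored = find_user(email)
    return stored is not None and stored == password

def login_route(payload):                    # entry point (e.g. POST /auth/login)
    ok = check_credentials(payload["email"], payload["password"])
    return {"status": 200 if ok else 401}

print(login_route({"email": "ada@example.com", "password": "correct-horse"}))
# {'status': 200}
```

In a real codebase these three layers live in three different files; tracing one request through them, without branching off, is the exercise.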
4. Map Dependencies Manually
```shell
grep -r "from app.auth" .
grep -r "import *database" .
```
Use your IDE's "find usages" feature. Build a rough diagram showing which modules depend on which.
Ask yourself:
- Which files are imported the most? (foundational)
- Which files import the most things? (complex)
- Are there circular dependencies? (A imports B imports A)
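All three questions can be answered over a simple adjacency dict once you've collected the imports. A sketch with invented edges:

```python
from collections import Counter

# who-imports-whom: module -> modules it imports (edges invented for illustration)
imports = {
    "auth.service": ["user.model", "database"],
    "user.model": ["auth.service", "database"],  # forms a cycle with auth.service
    "billing": ["database"],
    "database": [],
}

# Foundational modules are the ones imported by many others
imported_by = Counter(dep for deps in imports.values() for dep in deps)

def find_cycles(graph):
    """Return import cycles found by DFS, each as a path back to its start."""
    cycles, visiting, done = [], set(), set()

    def dfs(node, path):
        if node in visiting:                       # back-edge: found a cycle
            cycles.append(path[path.index(node):] + [node])
            return
        if node in done or node not in graph:
            return
        visiting.add(node)
        for dep in graph[node]:
            dfs(dep, path + [node])
        visiting.remove(node)
        done.add(node)

    for node in graph:
        dfs(node, [])
    return cycles

print(imported_by.most_common(1))  # [('database', 3)]
print(find_cycles(imports))
```

This is exactly the rough diagram the manual process produces, just in data form: `database` is foundational, and `auth.service ↔ user.model` is a cycle to watch.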
This works. It's just slow.
For a 50K line codebase, you're looking at 3-4 hours of manual work.
But you get the map.
Or Let Tools Do the Heavy Lifting (Takes 5 Minutes)
This is where automated structural analysis helps.
Instead of spending hours grepping imports and drawing diagrams by hand, tools like PViz analyze the repository and show you:
The Dependency Graph
Which files import which, in a consolidated and structured format.
You can see that auth.service is imported by 23 other files (critical) while utils.formatting is imported by 2 (peripheral).
Architectural Zones
The codebase is automatically grouped into logical areas: API routes, database models, business logic, utilities.
No guessing which files "belong together."
Coupling Hotspots
Files that import many things and are imported by many others.
These are the ones that break things when you change them.
Circular Dependencies
Detected automatically. Module A imports B imports A — architectural smell. Now you know where the technical debt lives.
Entry Points
Where execution actually begins, identified automatically.
Then you can ask questions directly:
- "Where is authentication implemented?"
- "What depends on the database module?"
- "What are the main entry points?"
- "Which file is most coupled?"
All answers grounded in the actual structure analysis. No hallucinations. No guessing.
Example output:
```text
Repository: payments-api (347 files, 87K SLOC)

Architectural zones:
  • api/       (23 files, 45 imports)
  • auth/      (12 files, imported by 23 others)  ← critical
  • billing/   (18 files, 12 imports)
  • database/  (8 files, imported by 67 others)   ← foundational

Coupling hotspots:
  • database/models.py  (imported by 34 files — refactor carefully)
  • auth/service.py     (imports 12 modules, imported by 23)

Circular dependencies: 2 found
  • auth.service ↔ user.model
  • jobs.router ↔ jobs.service
```
In 5 minutes, you now know:
- The codebase has 347 files across 4 main architectural zones
- The database module is foundational (67 files depend on it)
- The auth service is highly coupled (imported by 23 files)
- There are 2 circular dependencies to watch out for
You get the map in 5 minutes instead of 3 hours.
Try it: pvizgenerator.com
What Changes When You Have the Map
Once you have structural understanding, everything else becomes easier.
Reading code feels productive instead of aimless.
You know where you are and why it matters. Imports aren't rabbit holes anymore — they're signposts.
Changes feel safer.
You can see what depends on what, so you know what to test. You're not randomly breaking things three modules away.
Conversations with teammates become more specific.
Instead of "how does this work?" you ask "why does billing depend on auth instead of the other way around?"
Much better questions. Much better answers.
Onboarding accelerates.
New team members get oriented in hours, not weeks. They don't waste days wandering through utilities.
Fear decreases.
You're not tiptoeing around the codebase hoping you don't break things. You know what's safe to change and what's risky.
The code hasn't changed. Your understanding has.
The Real Shift
The breakthrough isn't a new tool or a clever trick.
It's a mindset change.
Stop trying to understand codebases from the inside out.
Start from the outside in.
Get the map first. See the architecture. Understand the major pieces and how they connect.
Then — and only then — dive into the details.
Everything else becomes easier once you do.
What to Do Next Time You Face Unfamiliar Code
Don't:
- Open a random file and start reading
- Ask ChatGPT to explain the architecture
- Grep randomly hoping to find things
- Make changes blindly to see what breaks
- Assume the README is accurate
Do:
- Get the architectural map first (manual or automated)
- Identify entry points before reading code
- Map dependencies to understand coupling
- Learn the major zones/modules
- Then dive into specific code with context
If you're doing it manually:
- Map directory structure with `tree` or folder browsing
- Find entry points (`main.py`, `App.tsx`, `server.js`)
- Trace one complete user flow from start to finish
- Grep for imports to map dependencies
- Draw rough diagrams
If you want to save time:
- Use PViz to automate the structural analysis
- Get dependency graphs, coupling metrics, and architectural insights in minutes
- Ask questions and get evidence-backed answers
- Download the analysis for your team
The difference between a codebase you survive and a codebase you understand is just one thing:
The map.
Get the map first. Then explore.
Everything else follows.
Try it free: pvizgenerator.com
Related reading:
- What is a Dependency Graph? (Complete Guide) — coming soon
- Legacy Code Analysis: A Systematic Approach — coming soon
- Circular Dependencies & Strongly Connected Components: The Complete Guide
Try PViz on Your Codebase
Get instant dependency graphs, architectural insights, and coupling analysis for any GitHub repository.