Combined from Anthropic's official docs and community refactoring lessons.
## 1. The #1 Rule: Skills Are Context Engineering, Not Documentation
The single most important insight - validated by both official docs and painful community experience - is that skills are not documentation dumps. They are active workflow knowledge that must be context-engineered.
A developer learned this the hard way: 36 skills totaling ~15,000 lines caused context explosions, slow activation, and degraded output quality. After refactoring with progressive disclosure, initial load dropped by 85%, activation went from ~500ms to <100ms, and relevant information ratio jumped from ~10% to ~90%.
The mantra: Context engineering isn't about loading more information. It's about loading the right information at the right time.
## 2. The 200-Line Rule for SKILL.md
Official docs recommend keeping SKILL.md under 500 lines. Community experience sharpens this to ~200 lines as the practical ceiling for the entry point.
Why 200 lines works:
- It's the sweet spot where Claude can efficiently scan and decide what to load next
- It forces you to write a table of contents, not a textbook
- Total loaded context stays at 400-700 lines of highly relevant content instead of 1,000+ lines of mixed relevance
If you can't fit the core instructions in 200 lines, you're putting too much in the entry point. Move the rest to reference files.
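The two limits above lend themselves to a quick automated check. The following is a minimal sketch, not an official tool: the function names and the ok/warn/fail labels are illustrative assumptions, and treating exactly 500 lines as a failure is just one reading of "under 500".

```python
from pathlib import Path

SOFT_LIMIT = 200  # community target for the SKILL.md entry point
HARD_LIMIT = 500  # official guidance: keep SKILL.md under this many lines


def classify_line_count(line_count: int) -> str:
    """Classify an entry-point length against the two limits above."""
    if line_count <= SOFT_LIMIT:
        return "ok"
    if line_count < HARD_LIMIT:
        return "warn"  # allowed by the official limit, above the community target
    return "fail"


def check_entry_point(skill_md: Path) -> str:
    """Count the lines in a SKILL.md file and classify the result."""
    line_count = len(skill_md.read_text(encoding="utf-8").splitlines())
    return classify_line_count(line_count)
```

A "warn" result is a prompt to move material into reference files rather than a hard error.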
## 3. Progressive Disclosure Architecture
This is non-optional. Structure every skill in three tiers:
### Tier 1: Metadata (Always Loaded)
YAML frontmatter only (~100 words). Just enough for Claude to decide if the skill is relevant.
```yaml
---
name: pdf-processing
description: Extracts text and tables from PDF files, fills forms, merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
```
### Tier 2: SKILL.md Entry Point (Loaded on Activation)
~200 lines max. Contains overview, quick start, navigation map. Points to references but doesn't include their content.
````markdown
# PDF Processing
## Quick start
Extract text with pdfplumber:
```python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
## Advanced features
Form filling: See FORMS.md
API reference: See REFERENCE.md
Examples: See EXAMPLES.md
````
### Tier 3: Reference Files (Loaded On-Demand)
200-300 lines each. Detailed, modular, focused on single topics. Claude reads only when needed.
### Directory Structure
skill-name/
├── SKILL.md # Entry point (≤200 lines)
├── references/
│ ├── detailed-guide.md # Loaded only when needed
│ ├── api-reference.md
│ └── examples.md
└── scripts/
    ├── validate.py # Executed, not loaded into context
    └── helper.sh
---
## 4. Organize by Workflow, Not by Tool
A critical reframing: skills should map to **what you actually do during development**, not to individual tools.
| Bad (tool-centric) | Good (workflow-centric) |
| ------------------------------------------------ | -------------------------------------------------- |
| `cloudflare/` (1,131 lines) | `devops/` (Cloudflare + Docker + GCloud + Vercel) |
| `tailwind/` + `shadcn/` separate | `ui-styling/` (shadcn + Tailwind + design tokens) |
| `nextjs/` + `turborepo/` + `remix/` separate | `web-frameworks/` (Next.js + Turborepo + Remix) |
Each skill teaches Claude **how to perform a specific development task**, not what a tool does. That's the difference between an encyclopedia and a capability.
---
## 5. Writing Effective Frontmatter
### Name Field
- Max 64 characters, lowercase letters/numbers/hyphens only
- Use gerund form for clarity: `processing-pdfs`, `testing-code`, `managing-databases`
- Avoid vague names: `helper`, `utils`, `tools`
- No reserved words: `anthropic`, `claude`
### Description Field
- Max 1024 characters, non-empty
- **Always write in third person** (it's injected into the system prompt)
- Include both what the skill does AND when to use it
- Include specific trigger terms
```yaml
# Bad
description: "Helps with documents"

# Good
description: "Extracts text and tables from PDF files, fills forms, merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."
```
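The length, character, and reserved-word rules for both fields can be checked mechanically. The sketch below is one way to do that, assuming the name and description have already been parsed out of the frontmatter. `validate_frontmatter` and its messages are hypothetical names for illustration, and the third-person and trigger-term guidance is not checked because it needs human judgment.

```python
import re

MAX_NAME_LENGTH = 64           # limit stated for the name field
MAX_DESCRIPTION_LENGTH = 1024  # limit stated for the description field
RESERVED_WORDS = ("anthropic", "claude")
NAME_PATTERN = re.compile(r"[a-z0-9-]+")  # lowercase letters, digits, hyphens


def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of rule violations; an empty list means both fields pass."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is {len(name)} characters; the maximum is {MAX_NAME_LENGTH}")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only lowercase letters, numbers, and hyphens")
    for word in RESERVED_WORDS:
        if word in name.lower():
            problems.append(f"name contains the reserved word '{word}'")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"description is {len(description)} characters; the maximum is {MAX_DESCRIPTION_LENGTH}"
        )
    return problems
```

For example, `validate_frontmatter("processing-pdfs", "Extracts text from PDF files.")` returns an empty list, while a name such as `PDF_Helper` is reported for its uppercase letters and underscore.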
## 6. Conciseness: Only Add What Claude Doesn't Know
Claude is already very smart. Challenge every piece of information:
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"
````markdown
# Good (~50 tokens)
## Extract PDF text
Use pdfplumber for text extraction:
```python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```

# Bad (~150 tokens)
## Extract PDF text
PDF (Portable Document Format) files are a common file format...
To extract text from a PDF, you'll need a library...
There are many libraries available...
````
---
## 7. Set Appropriate Degrees of Freedom
Match specificity to the task's fragility:
| Freedom Level | When to Use | Example |
| ------------- | ---------------------------------- | ------------------------------------------ |
| **High** | Multiple approaches valid | Code review guidelines, refactoring advice |
| **Medium** | Preferred pattern exists | Report templates with customizable sections|
| **Low** | Fragile, consistency critical | Database migrations, exact deploy commands |
Think of it as a path: narrow bridge with cliffs = exact instructions; open field = general direction.
---
## 8. Keep References One Level Deep
Claude may partially read files when they're referenced from other referenced files. Avoid nesting.
```markdown
# Bad: nested references
SKILL.md → advanced.md → details.md → actual info

# Good: flat references
SKILL.md → advanced.md (has the info)
SKILL.md → reference.md (has the info)
SKILL.md → examples.md (has the info)
```
For reference files over 100 lines, include a table of contents at the top so Claude can see the full scope even when previewing.
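Nested references are easy to introduce by accident, so a rough scan can help. The sketch below is a heuristic under stated assumptions: it treats any token ending in `.md` as a reference, resolves paths relative to the skill directory, and uses hypothetical helper names. It flags candidates for review and does not model how Claude actually decides what to read.

```python
import re
from pathlib import Path

# Matches bare or linked mentions such as "advanced.md" or "references/api.md".
MD_MENTION = re.compile(r"[\w./-]+\.md\b")


def mentioned_files(text: str) -> set[str]:
    """Collect every .md path mentioned in a block of Markdown text."""
    return set(MD_MENTION.findall(text))


def find_nested_references(skill_dir: Path) -> list[tuple[str, str]]:
    """Return (reference, target) pairs where a file referenced from
    SKILL.md mentions yet another .md file."""
    entry_text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    nested = []
    for ref in sorted(mentioned_files(entry_text)):
        ref_path = skill_dir / ref
        if ref_path.name == "SKILL.md" or not ref_path.is_file():
            continue  # self-mention, or mentioned but not bundled
        ref_text = ref_path.read_text(encoding="utf-8")
        for target in sorted(mentioned_files(ref_text)):
            # Ignore a file naming itself or pointing back to the entry point.
            if Path(target).name not in (ref_path.name, "SKILL.md"):
                nested.append((ref, target))
    return nested
```

An empty result means every `.md` file mentioned in SKILL.md holds its own content; each reported pair is a second hop worth flattening.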
## 9. Workflows and Feedback Loops
### Use Checklists for Complex Tasks
Provide a checklist Claude can copy and track progress:
## Deployment workflow
Copy this checklist:
- [ ] Step 1: Run tests
- [ ] Step 2: Build artifacts
- [ ] Step 3: Validate config
- [ ] Step 4: Deploy to staging
- [ ] Step 5: Verify staging
- [ ] Step 6: Deploy to production
### Implement Validation Loops
The pattern: run validator → fix errors → repeat
1. Make edits
2. Validate immediately: `python scripts/validate.py`
3. If validation fails → fix → validate again
4. Only proceed when validation passes
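The loop above can be sketched in code to make its control flow explicit. This is an illustration under assumptions, not part of any skill API: it assumes the validator exits with status 0 on success, and `run_validator`, `validate_until_clean`, and the `MAX_FIXES` cap are hypothetical names. In a real skill the fix step is Claude editing files, so the `fix` callback here is only a stand-in.

```python
import subprocess
import sys
from typing import Callable

MAX_FIXES = 5  # illustrative cap so a validator that never passes cannot loop forever

ValidationResult = tuple[bool, str]  # (passed, report text)


def run_validator(script: str = "scripts/validate.py") -> ValidationResult:
    """Run the validator script; assumes exit status 0 means success."""
    result = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def validate_until_clean(
    validate: Callable[[], ValidationResult],
    fix: Callable[[str], None],
    max_fixes: int = MAX_FIXES,
) -> bool:
    """Validate, fix, and re-validate until the check passes or the fix budget runs out."""
    for _ in range(max_fixes):
        passed, report = validate()
        if passed:
            return True
        fix(report)  # hand the validator's report to whatever applies the fix
    passed, _ = validate()  # confirm whether the final fix worked
    return passed
```

Returning `False` once the budget is spent keeps step 4 honest: the caller proceeds only when validation has actually passed.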
## 10. Common Patterns
### Template Pattern
Provide output format templates. Use strict templates for API responses, flexible templates for reports.
### Examples Pattern
Provide input/output pairs (like few-shot prompting):
**Input**: Added user authentication with JWT tokens
**Output**:
feat(auth): implement JWT-based authentication
Add login endpoint and token validation middleware
### Conditional Workflow Pattern
Guide Claude through decision points:
**Creating new content?** → Follow "Creation workflow"
**Editing existing content?** → Follow "Editing workflow"
## 11. Scripts: Solve, Don't Punt
When writing utility scripts for skills:
- Handle errors explicitly with helpful messages; don't just let the script crash
- Document all magic constants (no "voodoo constants")
- List all required package dependencies explicitly
- Make clear whether Claude should execute the script or read it as reference
- Use `pip install --break-system-packages` for Python packages
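As an illustration of the first two points, here is a minimal sketch of a utility script that reports failures with actionable messages and documents its one constant. The file-reading task, the `InputError` class, and the 10 MB limit are assumptions made for the example, not values from the source.

```python
import sys
from pathlib import Path

# Reject inputs above this size instead of reading them into memory.
# 10 MB is an illustrative limit chosen for this sketch.
MAX_INPUT_BYTES = 10 * 1024 * 1024


class InputError(Exception):
    """An input problem, with a message that says how to fix it."""


def read_input(path: Path) -> str:
    """Read a UTF-8 text file, turning common failures into actionable messages."""
    if not path.is_file():
        raise InputError(f"{path} is not a file. Check the path and try again.")
    size = path.stat().st_size
    if size > MAX_INPUT_BYTES:
        raise InputError(
            f"{path} is {size} bytes; the limit is {MAX_INPUT_BYTES}. Split the file first."
        )
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise InputError(f"{path} is not valid UTF-8. Re-save it as UTF-8.") from exc


def main(argv: list[str]) -> int:
    """Entry point that returns an exit code instead of letting exceptions escape."""
    if len(argv) != 2:
        print("usage: read_input.py INPUT_FILE", file=sys.stderr)
        return 2
    try:
        text = read_input(Path(argv[1]))
    except InputError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"read {len(text.splitlines())} lines from {argv[1]}")
    return 0
```

A real script would finish with `sys.exit(main(sys.argv))`, so that Claude sees a clear message and a meaningful exit code rather than a traceback.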
## 12. Iterative Development with Claude
The most effective development loop:
1. Complete a task without a skill - notice what context you repeatedly provide
2. Ask Claude A to create a skill from those patterns
3. Review for conciseness - cut anything Claude already knows
4. Test with Claude B (fresh instance) on real tasks
5. Observe behavior - where does it drift or miss?
6. Refine with Claude A based on observations
7. Repeat
### What to Watch For
- Does Claude read files in an order you didn't expect? → Restructure
- Does Claude miss references to important files? → Make links more prominent
- Does Claude over-read the same file? → Move that content to SKILL.md
- Does Claude never access a bundled file? → Remove it
## 13. Testing
- Test the cold start: Clear context, activate skill, measure. If >500 lines load on first activation, refactor.
- Test with all target models: What works for Opus may need more detail for Haiku.
- Build evaluations before writing docs: Identify gaps first, then write just enough to fix them.
- Create at least 3 evaluation scenarios per skill.
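The cold-start check in the first bullet reduces to simple arithmetic once you know which files were loaded. The sketch below assumes you have recorded those files by hand as a mapping from file name to contents; the function names are hypothetical, and only the 500-line threshold comes from the list above.

```python
COLD_START_BUDGET = 500  # lines; the refactor threshold for first activation


def cold_start_load(loaded_files: dict[str, str]) -> int:
    """Total lines across every file pulled into context on first activation."""
    return sum(len(text.splitlines()) for text in loaded_files.values())


def needs_refactor(loaded_files: dict[str, str]) -> bool:
    """True when the first activation exceeds the cold-start budget."""
    return cold_start_load(loaded_files) > COLD_START_BUDGET
```

For instance, a 180-line SKILL.md plus one 250-line reference file totals 430 lines and stays within budget, while adding a second 250-line reference pushes the load to 680 lines.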
## 14. Anti-Patterns to Avoid
| Anti-Pattern | Why It's Bad |
|---|---|
| Giant monolithic SKILL.md | Context explosion, slow activation, irrelevant noise |
| Tool-centric organization | Creates duplicates, doesn't match workflows |
| Windows-style paths (`\`) | Breaks on Unix systems |
| Multiple tool options | Confuses Claude; pick a default, add an escape hatch |
| Time-sensitive info | Becomes wrong; use "old patterns" sections instead |
| Inconsistent terminology | Confuses Claude; pick one term and stick with it |
| Deeply nested references | Claude partially reads nested files |
| Assuming tools are pre-installed | Always include install commands |
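Of the anti-patterns above, Windows-style paths are the easiest to detect automatically. The following is a rough heuristic sketch with hypothetical function names. It will also flag some escape sequences inside code samples (for example a letter followed by `\n`), so treat its output as candidates for review.

```python
import re

# One or more path-like segments joined by backslashes, e.g. scripts\helper.py.
BACKSLASH_PATH = re.compile(r"[\w.-]+(?:\\[\w.-]+)+")


def find_backslash_paths(text: str) -> list[tuple[int, str]]:
    """Return (line number, match) for each Windows-style path found in the text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in BACKSLASH_PATH.finditer(line):
            hits.append((line_number, match.group()))
    return hits


def to_forward_slashes(path: str) -> str:
    """Rewrite a backslash-separated path with forward slashes."""
    return path.replace("\\", "/")
```

Running `find_backslash_paths` over SKILL.md and each reference file before sharing a skill covers the "forward slashes only" item in the checklist.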
## 15. Quick Reference Checklist
### Before Sharing a Skill
- [ ] Description is specific, third-person, includes trigger terms
- [ ] SKILL.md body is under 200 lines (hard target) / 500 lines (absolute max)
- [ ] Heavy content is in separate reference files
- [ ] References are one level deep from SKILL.md
- [ ] No time-sensitive information
- [ ] Consistent terminology throughout
- [ ] Concrete examples, not abstract explanations
- [ ] Workflows have clear sequential steps
- [ ] Feedback loops for quality-critical tasks
- [ ] Tested with cold start activation
- [ ] Tested on real tasks, not just test scenarios
### For Skills with Code
- [ ] Scripts handle errors explicitly
- [ ] No magic constants (all values documented)
- [ ] Required packages listed with install commands
- [ ] Validation/verification steps for critical operations
- [ ] Clear distinction: execute vs. read-as-reference
- [ ] Forward slashes only in all file paths
### Key Metrics to Track
| Metric | Bad | Good |
|---|---|---|
| SKILL.md lines | 800+ | ≤200 |
| First activation load | 1,000+ lines | <500 lines |
| Relevant info ratio | ~10% | ~90% |
| Token efficiency | 1x (baseline) | 4-5x improved |
| Activation time | ~500ms | <100ms |
Sources: Anthropic Agent Skills Best Practices · Community Refactoring Post