
Technical SEO Audit Skill


Quick Install
```bash
npx skills add technical-seo-audit
```


> Purpose: Prevent SEO disasters like the One Person Company incident (2026-01-27)
>
> Reusable for: Any static website deployment (Cloudflare Pages, Vercel, Netlify, etc.)

✅ Pre-Deployment Checklist (MANDATORY)

Before deploying ANY website, verify these 4 things:

1️⃣ Create _headers File (CRITICAL)

Location: `deploy/_headers` (or `public/_headers`, depending on your build)

Content:

```
# Cloudflare Pages Headers Configuration

# Allow all pages to be indexed by search engines
/*
  X-Robots-Tag: index, follow
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
  Referrer-Policy: strict-origin-when-cross-origin
  Cache-Control: public, max-age=3600

# Static assets - longer cache
/images/*
  Cache-Control: public, max-age=31536000, immutable

# Sitemap and robots - short cache for updates
/sitemap.xml
  X-Robots-Tag: noindex
  Cache-Control: public, max-age=3600

/robots.txt
  X-Robots-Tag: noindex
  Cache-Control: public, max-age=3600
```

Why noindex for sitemap/robots?
  • These are tool files, not content
  • Google doesn't need to index them
  • Reduces crawl waste
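The header rules above can also be sanity-checked programmatically. The sketch below uses a hypothetical `hasIndexRule` helper (not part of any framework), assuming Cloudflare's `_headers` convention of a URL-pattern line followed by indented directives:

```javascript
// Sketch: verify a Cloudflare Pages _headers file allows indexing.
// hasIndexRule is a hypothetical helper, not an official API.
function hasIndexRule(headersText) {
  const lines = headersText.split('\n');
  let inCatchAll = false;
  for (const line of lines) {
    if (!line.startsWith(' ') && !line.startsWith('\t')) {
      // A non-indented line starts a new URL pattern block
      inCatchAll = line.trim() === '/*';
    } else if (inCatchAll && line.trim() === 'X-Robots-Tag: index, follow') {
      return true;
    }
  }
  return false;
}

const good = '/*\n  X-Robots-Tag: index, follow\n  X-Frame-Options: SAMEORIGIN\n';
const bad = '/*\n  X-Robots-Tag: noindex\n';
console.log(hasIndexRule(good)); // true
console.log(hasIndexRule(bad));  // false
```

A check like this can run in CI before every deploy, instead of relying on a manual review of the file.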

2️⃣ Generate Complete Sitemap

Requirements:
  • All HTML pages must be in sitemap
  • Use production domain (NOT .pages.dev or localhost)
  • Valid XML format
  • Include `<lastmod>`, `<changefreq>`, and `<priority>` tags
Generation Script Example:
```javascript
// scripts/generate_sitemap.js
const fs = require('fs');
const path = require('path');

const DOMAIN = 'https://your-domain.com';
const DEPLOY_DIR = './deploy';

const htmlFiles = fs.readdirSync(DEPLOY_DIR)
  .filter(f => f.endsWith('.html'))
  // Skip index.html: the homepage is already listed as "/"
  .filter(f => f !== 'index.html');

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>${DOMAIN}/</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
${htmlFiles.map(file => `  <url>
    <loc>${DOMAIN}/${file}</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>`).join('\n')}
</urlset>`;

fs.writeFileSync(path.join(DEPLOY_DIR, 'sitemap.xml'), sitemap);
console.log(`✅ Generated sitemap with ${htmlFiles.length + 1} URLs`);
```
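To guard against the wrong-domain mistake described later, the generated XML can be scanned for staging hosts before it ships. `findStagingUrls` is a hypothetical helper, not part of the generation script:

```javascript
// Sketch: flag sitemap <loc> entries that point at staging hosts
// (.pages.dev, localhost) instead of the production domain.
// findStagingUrls is a hypothetical helper.
function findStagingUrls(sitemapXml) {
  const locs = [...sitemapXml.matchAll(/<loc>(.*?)<\/loc>/g)].map(m => m[1]);
  return locs.filter(url =>
    /\.pages\.dev|localhost|127\.0\.0\.1/.test(url)
  );
}

const xml = `
<url><loc>https://your-domain.com/</loc></url>
<url><loc>https://my-site.pages.dev/about.html</loc></url>`;
console.log(findStagingUrls(xml)); // [ 'https://my-site.pages.dev/about.html' ]
```

Failing the build when this returns a non-empty array catches the problem before Google ever crawls the wrong URLs.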

3️⃣ Configure robots.txt

Location: `deploy/robots.txt`

Content:

```
User-agent: *
Allow: /

# Sitemap
Sitemap: https://your-domain.com/sitemap.xml

# AI Agents (optional)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /
```

DO NOT:

```
# ❌ WRONG - This blocks everything
User-agent: *
Disallow: /
```

4️⃣ Validate Before Deploying

Run this script EVERY TIME before deploying:
```bash
#!/bin/bash
# Pre-deployment validation

echo "🔍 SEO Validation Check..."

# 1. Check _headers exists
if [ ! -f "deploy/_headers" ]; then
  echo "❌ CRITICAL: deploy/_headers missing!"
  echo "   Google won't index your site!"
  exit 1
fi

# 2. Check _headers content
if ! grep -q "X-Robots-Tag: index, follow" deploy/_headers; then
  echo "❌ CRITICAL: _headers doesn't allow indexing!"
  exit 1
fi

# 3. Check sitemap exists
if [ ! -f "deploy/sitemap.xml" ]; then
  echo "❌ CRITICAL: sitemap.xml missing!"
  exit 1
fi

# 4. Count URLs in sitemap
SITEMAP_URLS=$(grep -c "<loc>" deploy/sitemap.xml)
HTML_FILES=$(ls deploy/*.html 2>/dev/null | wc -l | tr -d ' ')

if [ "$SITEMAP_URLS" -lt "$HTML_FILES" ]; then
  echo "❌ CRITICAL: Sitemap incomplete!"
  echo "   Sitemap: $SITEMAP_URLS URLs"
  echo "   HTML files: $HTML_FILES files"
  exit 1
fi

echo "✅ All SEO checks passed!"
exit 0
```


🔍 Post-Deployment Verification

After deploying, immediately verify:

Check HTTP Headers

```bash
curl -I https://your-domain.com/
```

✅ Should see:

```
x-robots-tag: index, follow
```

❌ If you see:

```
x-robots-tag: noindex
```

→ Your _headers file wasn't deployed correctly

Check Sitemap Accessibility

```bash
curl -s https://your-domain.com/sitemap.xml | head -20
```

Should show valid XML with the production domain

Check robots.txt

```bash
curl https://your-domain.com/robots.txt
```

Should allow crawling and reference the sitemap
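The robots.txt check can be automated alongside the others. The sketch below uses a hypothetical `validateRobots` helper that only covers the rules this guide relies on (`User-agent: *` with `Allow: /`, no blanket `Disallow: /`, and a `Sitemap:` line); it is not a full robots.txt parser:

```javascript
// Sketch: minimal robots.txt sanity check.
// validateRobots is a hypothetical helper, not a full parser.
function validateRobots(robotsTxt) {
  const lines = robotsTxt.split('\n').map(l => l.trim());
  return {
    allowsAll: lines.includes('User-agent: *') && lines.includes('Allow: /'),
    blocksAll: lines.includes('Disallow: /'),
    hasSitemap: lines.some(l => l.startsWith('Sitemap: http')),
  };
}

const robots = 'User-agent: *\nAllow: /\n\nSitemap: https://your-domain.com/sitemap.xml\n';
console.log(validateRobots(robots));
// { allowsAll: true, blocksAll: false, hasSitemap: true }
```

Running this against the fetched file (e.g. the `curl` output above) turns a visual inspection into a pass/fail check.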


📊 Google Indexing Timeline

After fixing SEO issues:

| Timeline | Expected Result |
|---|---|
| 24-48 hours | Google starts crawling |
| 3-7 days | 30-50% pages indexed |
| 2-4 weeks | 80-100% pages indexed |
| 1-3 months | Rankings stabilize |

Note: New sites are slow. Be patient.

🚫 Common Mistakes (Learned from One Person Company)

❌ Mistake #1: No _headers File

Problem: Cloudflare adds X-Robots-Tag: noindex by default
Result: 0% of the site indexed
Fix: Create deploy/_headers with X-Robots-Tag: index, follow

❌ Mistake #2: Incomplete Sitemap

Problem: Sitemap has 50 URLs, but the site has 208 pages
Result: Google only knows about 24% of the content
Fix: Regenerate the sitemap automatically before each deployment

❌ Mistake #3: Wrong Domain in Sitemap

Problem: Sitemap uses .pages.dev instead of the custom domain
Result: Google crawls the wrong URLs
Fix: Use the production domain in the sitemap generation script

❌ Mistake #4: Manual Sitemap Updates

Problem: Forgetting to update the sitemap when adding new pages
Result: New content never gets indexed
Fix: Automate sitemap generation in the build process

🛠️ Integration with Build Process

For Cloudflare Pages

Add to package.json:

```json
{
  "scripts": {
    "build": "npm run generate-content && npm run generate-sitemap && npm run validate-seo",
    "generate-sitemap": "node scripts/generate_sitemap.js",
    "validate-seo": "./scripts/pre_deploy_validation.sh"
  }
}
```

For Vercel/Netlify

Same approach - run validation before build:

```json
{
  "scripts": {
    "build": "npm run validate-seo && next build",
    "validate-seo": "./scripts/pre_deploy_validation.sh"
  }
}
```


🔄 Automation with Cron Jobs

See /Users/xiaoyinqu/heyboss/heyboss-cron-service for automated deployment system.

Benefits:
  • Automatic daily deployments
  • Pre-deployment validation
  • Lark notifications for results
  • No human errors

📝 For Future Websites

When starting a new website project:

  1. ✅ Copy deploy/_headers template
  2. ✅ Copy scripts/generate_sitemap.js
  3. ✅ Copy scripts/pre_deploy_validation.sh
  4. ✅ Add validation to build process
  5. ✅ Test with curl -I https://new-domain.com/
NEVER deploy without validating these 4 files:
  1. _headers (with X-Robots-Tag: index, follow)
  2. sitemap.xml (complete, production domain)
  3. robots.txt (allows crawling)
  4. index.html (links to all content)
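The fourth requirement (index.html linking to all content) can be checked automatically too. The sketch below uses a hypothetical `findOrphans` helper that compares the hrefs in the homepage against the list of generated pages:

```javascript
// Sketch: find pages that index.html never links to, so Googlebot
// cannot discover them by crawling. findOrphans is hypothetical.
function findOrphans(indexHtml, pageFiles) {
  const hrefs = [...indexHtml.matchAll(/href="([^"]+)"/g)].map(m => m[1]);
  return pageFiles.filter(f => !hrefs.some(h => h.endsWith(f)));
}

const indexHtml = '<a href="/about.html">About</a> <a href="/pricing.html">Pricing</a>';
const pages = ['about.html', 'pricing.html', 'faq.html'];
console.log(findOrphans(indexHtml, pages)); // [ 'faq.html' ]
```

In a real project, `indexHtml` would come from `fs.readFileSync('deploy/index.html', 'utf8')` and `pages` from listing `deploy/*.html`.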

🆘 Troubleshooting

Problem: Google not indexing after 2 weeks

Check:
  1. curl -I https://your-domain.com/ → Should show x-robots-tag: index, follow
  2. Google Search Console → Coverage report → Check for errors
  3. site:your-domain.com in Google → How many pages indexed?
If still blocked:
  • Request indexing manually in Google Search Console
  • Check for manual actions (penalties)
  • Verify domain ownership
  • Check server logs for Googlebot access

Problem: Some pages indexed, others not

Likely causes:
  1. Sitemap incomplete → Regenerate
  2. No internal links to pages → Add to index/navigation
  3. noindex meta tags on specific pages → Remove
  4. 404 errors → Fix broken links
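Cause #3 can be detected in bulk with a quick scan of the built HTML. `hasNoindexMeta` is a hypothetical helper using a loose regex (it assumes the `name` attribute precedes `content`); a real audit might parse the DOM instead:

```javascript
// Sketch: detect a robots meta tag containing "noindex" in page HTML.
// hasNoindexMeta is a hypothetical helper; the regex assumes the
// name attribute comes before content, which is the common order.
function hasNoindexMeta(html) {
  return /<meta[^>]+name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html);
}

console.log(hasNoindexMeta('<meta name="robots" content="noindex, nofollow">')); // true
console.log(hasNoindexMeta('<meta name="robots" content="index, follow">'));     // false
```

Looping this over every file in `deploy/` pinpoints exactly which pages are excluding themselves.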

📚 Related Documentation

  • One Person Company SKILL.md - Full deployment workflow
  • SEO_ROOT_CAUSE_ANALYSIS.md - Detailed postmortem of indexing failure
  • DEPLOYMENT_CHECKLIST.md - Step-by-step deployment guide
  • pre_deploy_validation.sh - Automated validation script


🚀 Quick Start Commands

For a NEW website project:

```bash
# 1. Create project structure
mkdir -p deploy/images scripts automation

# 2. Copy SEO protection files from One Person Company
cp ~/path/to/onepersoncompany/deploy/_headers deploy/
cp ~/path/to/onepersoncompany/scripts/pre_deploy_validation.sh scripts/
cp ~/path/to/onepersoncompany/scripts/generate_sitemap_and_index.js scripts/

# 3. Make scripts executable
chmod +x scripts/*.sh

# 4. Update domain in generation script
sed -i '' 's/onepersoncompany.com/your-domain.com/g' scripts/generate_sitemap_and_index.js

# 5. Run initial setup
node scripts/generate_sitemap_and_index.js
./scripts/pre_deploy_validation.sh

# 6. Deploy
./automation/deploy.sh
```

For EXISTING website (emergency fix):

```bash
# 1. Check if indexing is blocked
curl -I https://your-domain.com/ | grep -i x-robots-tag

# 2. If you see "noindex" - create _headers immediately
cat > deploy/_headers << 'EOF'
/*
  X-Robots-Tag: index, follow
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
  Cache-Control: public, max-age=3600
EOF

# 3. Regenerate sitemap
node scripts/generate_sitemap_and_index.js

# 4. Validate
./scripts/pre_deploy_validation.sh

# 5. Redeploy
./automation/deploy.sh

# 6. Verify fix (wait 5 minutes)
curl -I https://your-domain.com/ | grep x-robots-tag
# Should show: x-robots-tag: index, follow
```


📝 Template Files

Template: package.json (build script)

```json
{
  "scripts": {
    "prebuild": "npm run validate-seo",
    "build": "npm run generate-sitemap && npm run build-site",
    "generate-sitemap": "node scripts/generate_sitemap_and_index.js",
    "validate-seo": "./scripts/pre_deploy_validation.sh",
    "deploy": "npm run build && ./automation/deploy.sh"
  }
}
```

Template: GitHub Actions (CI/CD)

```yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm install

      - name: Generate sitemap
        run: node scripts/generate_sitemap_and_index.js

      - name: SEO Validation (CRITICAL)
        run: ./scripts/pre_deploy_validation.sh

      - name: Deploy to Cloudflare
        run: ./automation/deploy.sh
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
```


🧪 Testing Checklist

Before going live:

  • [ ] Run curl -I https://your-domain.com/ → Check x-robots-tag
  • [ ] Run curl https://your-domain.com/sitemap.xml → Verify accessibility
  • [ ] Open homepage in browser → Check all links work
  • [ ] Test on mobile → Verify viewport is correct
  • [ ] Run Lighthouse audit → Check for SEO score
  • [ ] Submit sitemap to Google Search Console
  • [ ] Request indexing for homepage in GSC
  • [ ] Set up Google Analytics (optional)

After deployment (within 24 hours):

  • [ ] site:your-domain.com in Google → Verify homepage appears
  • [ ] Check Google Search Console → Verify no errors
  • [ ] Check server logs → Verify Googlebot accessed the site
  • [ ] Test 5 random article URLs → All should load correctly

After 7 days:

  • [ ] Check indexing progress in GSC Coverage Report
  • [ ] Should see 30-50% of pages indexed
  • [ ] No "Excluded by 'noindex' tag" errors
  • [ ] Googlebot crawl rate should be increasing


Created: 2026-01-28
Last Updated: 2026-01-28
Lessons Learned From: One Person Company SEO disaster (2026-01-27)
Status: Production-ready, battle-tested