
Luna RAG + GLM Vision Cloud Deployment Guide

Last Updated: November 5, 2025
Status: Production Ready
Platform: Cloudflare Workers + Cloud Vector Databases


Cloud Architecture Overview

Infrastructure Components

┌─────────────────────────────────────────────────────────────┐
│                  Cloudflare Global Network                  │
│                   (200+ Cities Worldwide)                   │
└─────────────────────────────────────────────────────────────┘
                              ↓
        ┌─────────────────────────────────────────┐
        │     Cloudflare Workers (Serverless)     │
        │   - Luna RAG MCP Server                 │
        │   - Luna GLM Vision MCP Server          │
        │   - Integration Layer                   │
        └─────────────────────────────────────────┘
                ↓                    ↓
    ┌──────────────────┐  ┌──────────────────┐
    │  Cloud Vector DB │  │  Cloudflare R2   │
    │  (Pinecone/      │  │  (Object Storage)│
    │   Weaviate/      │  │  - Screenshots   │
    │   Qdrant)        │  │  - Test Reports  │
    └──────────────────┘  └──────────────────┘
                ↓                    ↓
    ┌──────────────────┐  ┌──────────────────┐
    │  Cloudflare KV   │  │  Cloudflare D1   │
    │  (Cache/Config)  │  │  (SQLite DB)     │
    └──────────────────┘  └──────────────────┘

Benefits of Cloud Deployment

  • ✅ Zero Infrastructure Management - No servers to maintain
  • ✅ Global Distribution - Sub-50ms latency worldwide
  • ✅ Auto-Scaling - Handle 0 to millions of requests
  • ✅ Cost-Effective - Pay only for what you use
  • ✅ High Availability - 99.99% uptime SLA
  • ✅ Built-in Security - DDoS protection, SSL/TLS
  • ✅ Easy Deployment - Single command deployment

📦 Prerequisites

Required Accounts

  1. Cloudflare Account (Free tier available)

  2. Vector Database (choose one): Pinecone, Weaviate Cloud, or Qdrant Cloud

  3. AI Provider (Choose one)

  4. GLM API

Required Tools

bash
# Install Wrangler CLI (Cloudflare Workers CLI)
npm install -g wrangler

# Verify installation
wrangler --version

# Login to Cloudflare
wrangler login

🚀 Quick Start Deployment

Step 1: Clone and Setup

bash
# Clone the repository
git clone https://github.com/shacharsol/luna-agents.git
cd luna-agents

# Navigate to cloud deployment
cd mcp-servers/cloud-deployment

# Install dependencies
npm install

Step 2: Configure Environment

Create a .env file:

bash
# Cloudflare Configuration
CLOUDFLARE_ACCOUNT_ID=your_account_id
CLOUDFLARE_API_TOKEN=your_api_token

# Vector Database (Pinecone)
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENVIRONMENT=us-west1-gcp
PINECONE_INDEX_NAME=luna-rag-context

# OpenAI (for embeddings)
OPENAI_API_KEY=your_openai_key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# GLM Vision
GLM_API_KEY=your_glm_key
GLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4
GLM_MODEL=glm-4.5v

# R2 Storage
R2_BUCKET_NAME=luna-test-reports
R2_ACCESS_KEY_ID=your_r2_access_key
R2_SECRET_ACCESS_KEY=your_r2_secret_key

Step 3: Deploy to Cloudflare

bash
# Deploy RAG MCP Server
wrangler deploy --name luna-rag-mcp

# Deploy GLM Vision MCP Server
wrangler deploy --name luna-glm-vision-mcp

# Deploy Integration Layer
wrangler deploy --name luna-integration-mcp

Step 4: Verify Deployment

bash
# Test RAG endpoint
curl https://luna-rag-mcp.your-subdomain.workers.dev/health

# Test GLM Vision endpoint
curl https://luna-glm-vision-mcp.your-subdomain.workers.dev/health

# Test Integration endpoint
curl https://luna-integration-mcp.your-subdomain.workers.dev/health
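
The same checks can be scripted, for example as a post-deploy step. This is a minimal sketch that assumes all three Workers answer `/health` with `{ "status": "healthy" }`, as the RAG Worker in this guide does; `checkHealth` is a hypothetical name, and `fetchImpl` is injectable only so the function can be exercised without network access.

```javascript
// Worker names from Step 3; the subdomain is your own workers.dev subdomain.
const WORKERS = ['luna-rag-mcp', 'luna-glm-vision-mcp', 'luna-integration-mcp'];

// Calls /health on each Worker and records whether it reported "healthy".
async function checkHealth(subdomain, fetchImpl = fetch) {
  const results = {};
  for (const name of WORKERS) {
    const url = `https://${name}.${subdomain}.workers.dev/health`;
    try {
      const res = await fetchImpl(url);
      const body = res.ok ? await res.json() : null;
      results[name] = Boolean(body && body.status === 'healthy');
    } catch {
      results[name] = false; // network error or invalid JSON
    }
  }
  return results;
}
```

`checkHealth('your-subdomain')` resolves to an object with one boolean per Worker, so a script can exit non-zero if any value is `false`.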

🔧 Detailed Configuration

1. Cloudflare Workers Setup

Create Workers

bash
# Create RAG Worker
wrangler init luna-rag-mcp
cd luna-rag-mcp

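# Create wrangler.toml (quoting 'EOF' prevents shell expansion inside the file)
cat > wrangler.toml << 'EOF'
name = "luna-rag-mcp"
main = "src/index.js"
compatibility_date = "2024-01-01"

[vars]
ENVIRONMENT = "production"

[[kv_namespaces]]
binding = "CACHE"
id = "your_kv_namespace_id"

[[r2_buckets]]
binding = "STORAGE"
bucket_name = "luna-context-storage"

[env.production]
name = "luna-rag-mcp"
route = "https://rag.luna-agents.com/*"

# Bindings and vars are not inherited by named environments,
# so repeat them for anything deployed with --env production.
[env.production.vars]
ENVIRONMENT = "production"

[[env.production.kv_namespaces]]
binding = "CACHE"
id = "your_kv_namespace_id"

[[env.production.r2_buckets]]
binding = "STORAGE"
bucket_name = "luna-context-storage"
EOF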

Worker Code Structure

javascript
// src/index.js
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
// Recent Pinecone SDK releases export `Pinecone` and take only an API key;
// the older `PineconeClient` and its `environment` option are legacy.
import { Pinecone } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from '@langchain/openai';

export default {
  async fetch(request, env, ctx) {
    // Initialize MCP Server (second argument declares the server's capabilities)
    const server = new Server(
      { name: 'luna-rag-mcp-cloud', version: '2.0.0' },
      { capabilities: { tools: {} } }
    );

    // Initialize Pinecone
    const pinecone = new Pinecone({ apiKey: env.PINECONE_API_KEY });

    // Initialize OpenAI Embeddings
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: env.OPENAI_API_KEY,
      modelName: env.OPENAI_EMBEDDING_MODEL
    });

    // Handle MCP requests
    return handleMCPRequest(request, server, pinecone, embeddings, env);
  }
};

async function handleMCPRequest(request, server, pinecone, embeddings, env) {
  const url = new URL(request.url);
  
  // Health check
  if (url.pathname === '/health') {
    return new Response(JSON.stringify({ status: 'healthy' }), {
      headers: { 'Content-Type': 'application/json' }
    });
  }

  // MCP tool handlers
  if (url.pathname === '/tools/query_context') {
    return await handleQueryContext(request, pinecone, embeddings, env);
  }

  if (url.pathname === '/tools/setup_rag') {
    return await handleSetupRAG(request, pinecone, embeddings, env);
  }

  return new Response('Not Found', { status: 404 });
}
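
The router above calls `handleQueryContext` without defining it. One possible shape is sketched below. The request body format (`{ query, topK }`), the `jsonResponse` helper, and the response shape are assumptions for illustration; the sketch also assumes the current Pinecone client, whose `index(name)` accessor returns an object with `query()`, and LangChain's `embedQuery()`.

```javascript
// Hypothetical handler: embeds the query text, searches the index, returns matches.
async function handleQueryContext(request, pinecone, embeddings, env) {
  if (request.method !== 'POST') {
    return jsonResponse({ error: 'Use POST' }, 405);
  }

  let body;
  try {
    body = await request.json();
  } catch {
    return jsonResponse({ error: 'Body must be JSON' }, 400);
  }

  const query = typeof body?.query === 'string' ? body.query.trim() : '';
  if (!query) {
    return jsonResponse({ error: '"query" is required' }, 400);
  }
  // Default to 5 results and clamp to the range 1..20.
  const topK = Math.min(Math.max(Number(body?.topK) || 5, 1), 20);

  const vector = await embeddings.embedQuery(query);
  const index = pinecone.index(env.PINECONE_INDEX_NAME);
  const result = await index.query({ vector, topK, includeMetadata: true });

  const matches = (result.matches || []).map((m) => ({
    id: m.id,
    score: m.score,
    metadata: m.metadata || {},
  }));
  return jsonResponse({ matches });
}

// Small helper so every branch returns JSON with the right header.
function jsonResponse(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}
```

Passing the clients in as arguments, as the skeleton already does, keeps the handler testable with simple stand-in objects.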

2. Vector Database Setup

Pinecone Configuration

javascript
// One-time setup script (runs in Node.js, hence process.env rather than env)
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

// Create a pod-based index (one-time setup)
await pinecone.createIndex({
  name: 'luna-rag-context',
  dimension: 1536, // OpenAI text-embedding-3-small
  metric: 'cosine',
  spec: {
    pod: {
      environment: process.env.PINECONE_ENVIRONMENT,
      podType: 'p1.x1',
      pods: 1,
      replicas: 1
    }
  }
});

// Get index
const index = pinecone.index('luna-rag-context');

// Upsert vectors (`embedding` is the 1536-element array returned by the embedding model).
// Record-array form; check the signature for your installed SDK version.
await index.upsert([
  {
    id: 'context-1',
    values: embedding,
    metadata: {
      filePath: 'src/components/Auth.tsx',
      type: 'component',
      language: 'typescript'
    }
  }
]);

// Query vectors (`queryEmbedding` is the embedded search text)
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true
});
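
To make the `metric: 'cosine'` and `topK` settings concrete, here is a small local sketch of what a cosine top-K search computes. It is a brute-force illustration only, not how Pinecone implements search, and the function names are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention chosen here for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Local equivalent of a topK query: score every record, keep the k best.
function topK(queryVector, records, k = 5) {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(queryVector, r.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, which is why cosine suits embeddings whose magnitude carries little meaning.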

Weaviate Cloud Configuration

javascript
import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'https',
  host: 'your-cluster.weaviate.network',
  apiKey: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
  headers: {
    'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY
  }
});

// Create schema
await client.schema
  .classCreator()
  .withClass({
    class: 'CodeContext',
    vectorizer: 'text2vec-openai',
    properties: [
      { name: 'content', dataType: ['text'] },
      // 'string' is deprecated in recent Weaviate versions; 'text' replaces it
      { name: 'filePath', dataType: ['text'] },
      { name: 'language', dataType: ['text'] },
      { name: 'type', dataType: ['text'] }
    ]
  })
  .do();

// Add objects
await client.data
  .creator()
  .withClassName('CodeContext')
  .withProperties({
    content: 'function login() { ... }',
    filePath: 'src/auth/login.ts',
    language: 'typescript',
    type: 'function'
  })
  .do();

// Query
const result = await client.graphql
  .get()
  .withClassName('CodeContext')
  .withNearText({ concepts: ['authentication'] })
  .withLimit(5)
  .withFields('content filePath language type')
  .do();

3. Cloudflare R2 Storage Setup

Create R2 Bucket

bash
# Create bucket for screenshots
wrangler r2 bucket create luna-screenshots

# Create bucket for test reports
wrangler r2 bucket create luna-test-reports

# List buckets
wrangler r2 bucket list

R2 Integration in Worker

javascript
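export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);

    // Upload screenshot to R2
    // (SCREENSHOTS must be declared as an R2 bucket binding in wrangler.toml)
    if (request.method === 'POST' && url.pathname === '/upload-screenshot') {
      const formData = await request.formData();
      const file = formData.get('screenshot');

      // formData.get() returns null or a string when no file was uploaded
      if (!file || typeof file === 'string') {
        return new Response('Missing "screenshot" file', { status: 400 });
      }

      const key = `screenshots/${Date.now()}-${file.name}`;
      await env.SCREENSHOTS.put(key, file.stream(), {
        httpMetadata: {
          contentType: file.type
        }
      });

      return new Response(JSON.stringify({
        success: true,
        url: `https://screenshots.luna-agents.com/${key}`
      }), {
        headers: { 'Content-Type': 'application/json' }
      });
    }

    // Retrieve screenshot from R2
    if (request.method === 'GET' && url.pathname.startsWith('/screenshots/')) {
      const key = url.pathname.slice(1);
      const object = await env.SCREENSHOTS.get(key);

      if (!object) {
        return new Response('Not Found', { status: 404 });
      }

      return new Response(object.body, {
        headers: {
          // Fall back to a generic type if none was stored with the object
          'Content-Type': object.httpMetadata?.contentType ?? 'application/octet-stream',
          'Cache-Control': 'public, max-age=31536000'
        }
      });
    }

    // Every code path must return a Response
    return new Response('Not Found', { status: 404 });
  }
};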
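
The upload handler builds the object key from the client-supplied file name, which may contain path separators or unusual characters. A possible hardening step is sketched below. `buildScreenshotKey` is a hypothetical helper, and the allowed character set and the 100-character cap are arbitrary choices.

```javascript
// Builds an R2 key of the form screenshots/<timestamp>-<safe file name>.
function buildScreenshotKey(fileName, now = Date.now()) {
  const base = String(fileName || 'screenshot')
    .split(/[\\/]/)
    .pop()                              // drop any directory part
    .replace(/[^A-Za-z0-9._-]/g, '_')   // keep a conservative character set
    .replace(/^\.+/, '')                // no leading dots
    .slice(-100);                       // bound the length, keeping the extension
  return `screenshots/${now}-${base || 'screenshot'}`;
}
```

In the handler this would replace the inline template string, for example `const key = buildScreenshotKey(file.name);`.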

4. Cloudflare KV for Caching

javascript
// Store in KV
await env.CACHE.put('context:login-component', JSON.stringify(context), {
  expirationTtl: 3600 // 1 hour
});

// Retrieve from KV
const cached = await env.CACHE.get('context:login-component', 'json');
if (cached) {
  return cached;
}

// Delete from KV
await env.CACHE.delete('context:login-component');

// List keys
const keys = await env.CACHE.list({ prefix: 'context:' });
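
Code that depends on `env.CACHE` is easier to unit-test with a stand-in namespace. The sketch below is a minimal in-memory stub covering only the four calls shown above (`get`, `put`, `delete`, `list`). `createMemoryKV` is a hypothetical helper, not a Cloudflare API, and it does not model KV's eventual consistency or limits.

```javascript
// In-memory stand-in for a KV namespace, intended for tests only.
// `clock` is injectable so expiry can be tested without waiting.
function createMemoryKV(clock = () => Date.now()) {
  const store = new Map(); // key -> { value, expiresAt }

  // Returns the entry if present and not expired; evicts it otherwise.
  const live = (key) => {
    const entry = store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && clock() >= entry.expiresAt) {
      store.delete(key);
      return null;
    }
    return entry;
  };

  return {
    async get(key, type = 'text') {
      const entry = live(key);
      if (!entry) return null;
      return type === 'json' ? JSON.parse(entry.value) : entry.value;
    },
    async put(key, value, options = {}) {
      const ttl = options.expirationTtl; // seconds, as in the KV API
      store.set(key, {
        value: String(value),
        expiresAt: ttl ? clock() + ttl * 1000 : null,
      });
    },
    async delete(key) {
      store.delete(key);
    },
    async list({ prefix = '' } = {}) {
      const keys = [...store.keys()]
        .filter((k) => k.startsWith(prefix) && live(k))
        .sort()
        .map((name) => ({ name }));
      return { keys, list_complete: true };
    },
  };
}
```

A test can then pass `{ CACHE: createMemoryKV() }` wherever the Worker code expects `env`.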

Security Configuration

API Key Management

bash
# Set secrets in Cloudflare Workers
wrangler secret put PINECONE_API_KEY
wrangler secret put OPENAI_API_KEY
wrangler secret put GLM_API_KEY
wrangler secret put R2_ACCESS_KEY_ID
wrangler secret put R2_SECRET_ACCESS_KEY

# List secrets
wrangler secret list

CORS Configuration

javascript
// Add CORS headers
function addCORSHeaders(response) {
  const headers = new Headers(response.headers);
  headers.set('Access-Control-Allow-Origin', '*');
  headers.set('Access-Control-Allow-Methods', 'GET, POST, PUT, DELETE, OPTIONS');
  headers.set('Access-Control-Allow-Headers', 'Content-Type, Authorization');
  
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}

// Handle OPTIONS requests
if (request.method === 'OPTIONS') {
  return new Response(null, {
    headers: {
      'Access-Control-Allow-Origin': '*',
      'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS',
      'Access-Control-Allow-Headers': 'Content-Type, Authorization'
    }
  });
}
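
The wildcard origin above lets any site call the API. If the endpoints should only be reachable from known front ends, an allowlist is a common alternative. This is a sketch under that assumption; the origins listed are placeholders and `corsHeadersFor` is a hypothetical helper.

```javascript
// Placeholder allowlist; replace with the origins that should reach your Workers.
const ALLOWED_ORIGINS = new Set(['https://app.example.com', 'http://localhost:3000']);

// Returns CORS headers for an allowed origin, or an empty object otherwise.
function corsHeadersFor(origin, allowed = ALLOWED_ORIGINS) {
  if (!origin || !allowed.has(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
    'Vary': 'Origin', // responses differ per origin, so caches must key on it
  };
}
```

In a Worker the origin comes from `request.headers.get('Origin')`, and the returned object can be spread into the `headers` of both the preflight and the actual response.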

Rate Limiting

javascript
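// Implement rate limiting with KV
// Note: KV is eventually consistent and accepts roughly one write per second per key,
// so this counter is approximate; treat it as a soft limit.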
async function checkRateLimit(clientId, env) {
  const key = `ratelimit:${clientId}`;
  const limit = 100; // requests per minute
  const window = 60; // seconds

  const current = await env.CACHE.get(key, 'json') || { count: 0, resetAt: Date.now() + window * 1000 };

  if (Date.now() > current.resetAt) {
    current.count = 0;
    current.resetAt = Date.now() + window * 1000;
  }

  if (current.count >= limit) {
    return { allowed: false, retryAfter: Math.ceil((current.resetAt - Date.now()) / 1000) };
  }

  current.count++;
  await env.CACHE.put(key, JSON.stringify(current), { expirationTtl: window });

  return { allowed: true, remaining: limit - current.count };
}
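
The fixed-window arithmetic in `checkRateLimit` can be separated from storage, which makes it easy to test with explicit timestamps. The sketch below mirrors the same logic as a pure function; `applyFixedWindow` is a hypothetical name, and the caller would still read the state from KV and write `state` back.

```javascript
// Pure fixed-window step: takes the stored state and the current time (ms),
// returns the decision plus the state to persist.
function applyFixedWindow(state, now, limit = 100, windowSeconds = 60) {
  const windowMs = windowSeconds * 1000;
  const current =
    state && now <= state.resetAt
      ? { ...state }
      : { count: 0, resetAt: now + windowMs }; // first request or window expired

  if (current.count >= limit) {
    return {
      allowed: false,
      retryAfter: Math.ceil((current.resetAt - now) / 1000), // seconds
      state: current,
    };
  }

  current.count += 1;
  return { allowed: true, remaining: limit - current.count, state: current };
}
```

For example, with `limit = 2` and a window that started at `t = 0`, a third request at `t = 2000` ms is rejected with `retryAfter` of 58 seconds, and a request at `t = 60001` ms starts a new window.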

📊 Monitoring & Observability

Cloudflare Analytics

javascript
// Log analytics events
export default {
  async fetch(request, env, ctx) {
    const startTime = Date.now();
    
    try {
      const response = await handleRequest(request, env);
      
      // Log successful request
      ctx.waitUntil(logAnalytics(env, {
        timestamp: new Date().toISOString(),
        path: new URL(request.url).pathname,
        method: request.method,
        status: response.status,
        duration: Date.now() - startTime,
        success: true
      }));
      
      return response;
    } catch (error) {
      // Log error
      ctx.waitUntil(logAnalytics(env, {
        timestamp: new Date().toISOString(),
        path: new URL(request.url).pathname,
        method: request.method,
        error: error.message,
        duration: Date.now() - startTime,
        success: false
      }));
      
      throw error;
    }
  }
};

async function logAnalytics(env, data) {
  await env.ANALYTICS.put(
    `log:${Date.now()}:${Math.random()}`,
    JSON.stringify(data),
    { expirationTtl: 86400 * 30 } // 30 days
  );
}

Custom Metrics

javascript
// Track custom metrics
class MetricsCollector {
  constructor(env) {
    this.env = env;
  }

  async trackQueryLatency(duration) {
    const key = `metrics:query_latency:${new Date().toISOString().split('T')[0]}`;
    const current = await this.env.CACHE.get(key, 'json') || { count: 0, total: 0, min: Infinity, max: 0 };
    
    current.count++;
    current.total += duration;
    current.min = Math.min(current.min, duration);
    current.max = Math.max(current.max, duration);
    current.avg = current.total / current.count;
    
    await this.env.CACHE.put(key, JSON.stringify(current), { expirationTtl: 86400 * 7 });
  }

  async trackEmbeddingCost(tokens) {
    const key = `metrics:embedding_cost:${new Date().toISOString().split('T')[0]}`;
    const current = await this.env.CACHE.get(key, 'json') || { tokens: 0, cost: 0 };
    
    current.tokens += tokens;
    current.cost = (current.tokens / 1000000) * 0.02; // $0.02 per 1M tokens
    
    await this.env.CACHE.put(key, JSON.stringify(current), { expirationTtl: 86400 * 30 });
  }
}

💰 Cost Optimization

Pricing Breakdown

Cloudflare Workers

  • Free Tier: 100,000 requests/day
  • Paid Plan: $5/month for 10M requests
  • Additional: $0.30 per million requests on the Standard plan (CPU time is billed separately)

Cloudflare R2

  • Storage: $0.015/GB/month
  • Class A Operations: $4.50 per million
  • Class B Operations: $0.36 per million
  • Free Egress: No bandwidth charges

Cloudflare KV

  • Free Tier: 100,000 reads/day, 1,000 writes/day
  • Paid: $0.50 per million reads, $5 per million writes

Vector Databases

Pinecone:

  • Starter: Free (1 pod, 5M vectors)
  • Standard: $70/month (1 pod)
  • Enterprise: Custom pricing

Weaviate Cloud:

  • Sandbox: Free (limited)
  • Standard: $25/month
  • Business: $200/month

Qdrant Cloud:

  • Free: 1GB storage
  • Starter: $25/month
  • Pro: $95/month
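
As a worked example of the arithmetic, the sketch below estimates a monthly R2 and KV bill from the per-unit rates listed above. It ignores free allowances and any plan minimums, and the usage figures in the note are made up for illustration; `estimateMonthlyStorageCost` is a hypothetical helper.

```javascript
// Rates copied from the Cloudflare R2 and KV lists above (USD).
const RATES = {
  r2StoragePerGb: 0.015,
  r2ClassAPerMillion: 4.5,
  r2ClassBPerMillion: 0.36,
  kvReadsPerMillion: 0.5,
  kvWritesPerMillion: 5,
};

// Returns the estimated monthly cost in USD, split by service.
function estimateMonthlyStorageCost(usage, rates = RATES) {
  const perMillion = (count, rate) => (count / 1e6) * rate;
  const r2 =
    usage.r2StorageGb * rates.r2StoragePerGb +
    perMillion(usage.r2ClassAOps, rates.r2ClassAPerMillion) +
    perMillion(usage.r2ClassBOps, rates.r2ClassBPerMillion);
  const kv =
    perMillion(usage.kvReads, rates.kvReadsPerMillion) +
    perMillion(usage.kvWrites, rates.kvWritesPerMillion);
  return { r2, kv, total: r2 + kv };
}
```

For 50 GB stored, 1M Class A and 10M Class B operations, 20M KV reads, and 2M KV writes, this gives about $8.85 for R2 (0.75 + 4.50 + 3.60) and $20.00 for KV (10 + 10), roughly $28.85 in total. KV writes dominate, which is one reason to cache with long TTLs rather than rewriting keys often.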

Cost Optimization Strategies

javascript
// 1. Implement aggressive caching (pass `env` in; Workers have no global env)
async function getCachedOrFetch(env, key, fetchFn, ttl = 3600) {
  const cached = await env.CACHE.get(key, 'json');
  // KV returns null on a miss; compare explicitly so cached falsy values still count
  if (cached !== null) return cached;
  
  const data = await fetchFn();
  await env.CACHE.put(key, JSON.stringify(data), { expirationTtl: ttl });
  return data;
}

// 2. Batch operations
async function batchEmbeddings(texts) {
  // Process in batches of 100 to reduce API calls
  const batchSize = 100;
  const batches = [];
  
  for (let i = 0; i < texts.length; i += batchSize) {
    batches.push(texts.slice(i, i + batchSize));
  }
  
  const results = await Promise.all(
    batches.map(batch => generateEmbeddings(batch))
  );
  
  return results.flat();
}

// 3. Use cheaper embedding models for non-critical queries
function selectEmbeddingModel(priority) {
  if (priority === 'high') {
    return 'text-embedding-3-large'; // More expensive, higher quality
  }
  return 'text-embedding-3-small'; // Cheaper, good quality
}

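// 4. Implement request deduplication
// Module-scope state is shared only within a single isolate, so this
// collapses duplicate requests on a best-effort basis, not globally.
const pendingRequests = new Map();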

async function deduplicatedRequest(key, fn) {
  if (pendingRequests.has(key)) {
    return await pendingRequests.get(key);
  }
  
  const promise = fn();
  pendingRequests.set(key, promise);
  
  try {
    const result = await promise;
    return result;
  } finally {
    pendingRequests.delete(key);
  }
}

🧪 Testing Cloud Deployment

Local Development

bash
# Run local development server
wrangler dev

# Test locally
curl http://localhost:8787/health

Staging Environment

bash
# Deploy to staging
wrangler deploy --env staging

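# Test staging (requires an [env.staging] section in wrangler.toml;
# workers.dev URLs include your account subdomain)
curl https://luna-rag-mcp-staging.your-subdomain.workers.dev/health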

Production Deployment

bash
# Deploy to production
wrangler deploy --env production

# Verify production (workers.dev URLs include your account subdomain)
curl https://luna-rag-mcp.your-subdomain.workers.dev/health

🔄 CI/CD Pipeline

GitHub Actions Workflow

yaml
name: Deploy to Cloudflare Workers

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run tests
        run: npm test
      
      - name: Deploy to Cloudflare Workers
        # Deploy only on pushes to main; pull requests just run the tests
        if: github.event_name == 'push'
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: deploy --env production

📞 Support & Resources

Documentation

Community


Version: 2.0.0

Built with ❤️ for developers