
Building a Secure File Upload Pipeline with Jenkins, Docker & AWS

A step-by-step walkthrough of how I built an automated file upload pipeline that validates Excel, JSON, and CSV files before landing them safely in S3 — using Jenkins running in Docker, AWS Lambda for sanity checks, and a clean quarantine-then-promote flow.


Table of Contents

  1. The Problem
  2. Architecture Overview
  3. Part 1 — Setting Up Jenkins in Docker
  4. Part 2 — AWS Setup
  5. Part 3 — The Lambda Validator
  6. Part 4 — The Jenkins Pipeline
  7. Part 5 — How It All Works Together
  8. Where to Make Diagrams
  9. Lessons Learned

The Problem

The requirement was simple on paper:

“The upload job should first upload the file to a quarantine bucket, apply a Lambda function in order to ensure the sanity check (no malware, correct format, etc.) on it before moving it to the target bucket (tmp_zone)”

But making it production-ready — with proper validation, clean error handling, and no AWS CLI dependency — turned out to be a journey worth documenting.

  • Files coming in: .xlsx, .json, .csv
  • Entry point: Jenkins UI (manual upload via browser)
  • Cloud: AWS Free Tier, N. Virginia (us-east-1)


Architecture Overview

[ User uploads file via Jenkins UI ]
              │
              ▼
   ┌─────────────────────┐
   │   Jenkins Pipeline   │
   │   (Docker container) │
   └─────────┬───────────┘
             │
    ① Validate extension
    (.xlsx / .json / .csv only)
             │
             ▼
   ┌─────────────────────┐
   │  S3 Quarantine Bucket│  ← quarantine-uploads-prod
   └─────────┬───────────┘
             │
    ② Invoke Lambda
             │
             ▼
   ┌─────────────────────┐
   │   file-validator    │  ← AWS Lambda (Python 3.12)
   │   Lambda Function   │
   │  • Size check       │
   │  • Format check     │
   │  • Structure check  │
   └─────────┬───────────┘
             │
         ┌───┴────┐
        PASS     FAIL
         │         │
         ▼         ▼
  ③ Promote    Clean up
  to Target    quarantine
  Bucket       → abort

   ┌─────────────────────┐
   │   S3 Target Bucket  │  ← user-defined (e.g. sam-uploads-prod)
   └─────────────────────┘
             │
    ④ Popup summary shown
       in Jenkins UI

Part 1 — Setting Up Jenkins in Docker

The docker-compose.yml

I ran Jenkins inside Docker to keep the setup portable and reproducible. Here’s the core docker-compose.yml:

version: "3.9"

services:
  jenkins:
    image: jenkins/jenkins:lts-jdk17
    container_name: jenkins
    restart: unless-stopped
    privileged: true
    user: root
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins_home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - ./pipelines:/var/jenkins_home/pipelines:ro
    environment:
      - JAVA_OPTS=-Djenkins.install.runSetupWizard=false

volumes:
  jenkins_home:

Start it with:

docker compose up -d

Then open http://localhost:8080 to access the Jenkins UI.


Jenkins Plugins Required

Install these from Manage Jenkins → Plugins → Available:

Plugin               Why
File Parameter       Allows file upload in “Build with Parameters”
Pipeline: AWS Steps  S3 upload/copy/delete + Lambda invoke, no AWS CLI needed

⚠️ Gotcha I hit: AWS CLI is not installed in the Jenkins container by default. Rather than installing it manually, I switched to the Pipeline: AWS Steps plugin which handles everything natively from Groovy — no shell commands needed.


Part 2 — AWS Setup

S3 Buckets

Create two buckets in us-east-1 (N. Virginia):

Bucket                           Purpose
quarantine-uploads-prod          Temporary holding zone while Lambda validates
sam-uploads-prod (or your name)  Final destination (the tmp_zone)

Settings for both:

  • Block all public access: ON
  • Versioning: optional


IAM User for Jenkins

Create a user jenkins-s3-user and attach these managed policies:

  • AmazonS3FullAccess
  • AWSLambda_FullAccess

Then generate an Access Key (type: “Application running outside AWS”) and save both the Key ID and Secret.

Adding Credentials to Jenkins

Manage Jenkins → Credentials → System → Global → Add Credentials

  • Kind: AWS Credentials
  • ID: aws-credentials ← this exact ID is used in the Jenkinsfile
  • Paste your Access Key ID + Secret Access Key



Part 3 — The Lambda Validator

Why no layers?

My original plan used openpyxl via a public Lambda layer. It failed with a cross-account permission error:

Action: lambda:GetLayerVersion
On resource: arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p312-openpyxl:4
Context: no resource-based policy allows the action

The solution? Use only Python built-ins: json, csv, io, and zipfile. Since XLSX files are ZIP archives internally, zipfile is enough to validate structure without any third-party library.
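Since the whole approach leans on the XLSX-is-a-ZIP fact, here is a standalone sketch (standard library only) showing that a minimal ZIP containing an xl/workbook.xml entry passes the same structural check. looks_like_xlsx is a hypothetical helper for illustration, not part of the Lambda:

```python
import io, zipfile

def looks_like_xlsx(data: bytes) -> bool:
    # XLSX files are ZIP archives; a real workbook always
    # contains an xl/workbook.xml entry.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "xl/workbook.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False

# Build a minimal "workbook" in memory to demonstrate the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(looks_like_xlsx(buf.getvalue()))  # True
print(looks_like_xlsx(b"not a zip"))    # False
```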

The handler code

import json, boto3, csv, io, zipfile

s3 = boto3.client("s3")
MAX_BYTES = 100 * 1024 * 1024  # 100 MB
ALLOWED_EXTENSIONS = {"xlsx", "json", "csv"}

def handler(event, context):
    bucket = event["bucket"]
    key    = event["key"]

    head = s3.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]

    if size == 0:        return _fail("File is empty")
    if size > MAX_BYTES: return _fail(f"File too large ({size} bytes, limit {MAX_BYTES})")

    ext = key.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return _fail(f"Extension '.{ext}' not allowed")

    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    if ext == "json":
        try:
            json.loads(data)
        except ValueError:
            return _fail("Invalid JSON")

    elif ext == "csv":
        reader  = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        headers = next(reader, None)
        if not headers: return _fail("CSV has no header row")

    elif ext == "xlsx":
        try:
            zf = zipfile.ZipFile(io.BytesIO(data))
            if "xl/workbook.xml" not in zf.namelist():
                return _fail("XLSX missing workbook")
        except zipfile.BadZipFile:
            return _fail("Not a valid XLSX file")

    return {"statusCode": 200, "body": {"valid": True, "reason": "ok"}}

def _fail(reason):
    return {"statusCode": 400, "body": {"valid": False, "reason": reason}}

Lambda configuration

Setting         Value
Runtime         Python 3.12
Handler         lambda_function.handler
Memory          128 MB
Timeout         30 sec
Execution role  Must have AmazonS3ReadOnlyAccess

⚠️ Gotcha I hit: The default handler is lambda_function.lambda_handler. Since my function is named handler not lambda_handler, I had to update this in Configuration → General configuration → Runtime settings → Handler.

📸 Image suggestion: Screenshot of Lambda Runtime settings showing the Handler field set to lambda_function.handler.


Part 4 — The Jenkins Pipeline

Build parameters

When a user clicks “Build with Parameters”, they see:

📎 UPLOAD_FILE         [Choose File]
🪣 DESTINATION_BUCKET  [sam-uploads-prod]

The destination bucket is editable — so the same pipeline can route files to different buckets without code changes.
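The fallback logic can be sketched in Python. resolve_bucket is a hypothetical helper mirroring the Groovy elvis expression in the Jenkinsfile, not something the pipeline actually runs:

```python
def resolve_bucket(param, default="sam-uploads-prod"):
    # Mirrors the Jenkinsfile expression:
    #   params.DESTINATION_BUCKET?.trim() ?: 'sam-uploads-prod'
    # Missing, empty, or whitespace-only input falls back to the default.
    return (param or "").strip() or default

print(resolve_bucket("  my-bucket  "))  # my-bucket
print(resolve_bucket(None))             # sam-uploads-prod
print(resolve_bucket("   "))            # sam-uploads-prod
```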

The full Jenkinsfile

import groovy.json.JsonSlurper

pipeline {
    agent any

    parameters {
        base64File(name: 'UPLOAD_FILE', description: 'Select file (.xlsx, .json, .csv)')
        string(name: 'DESTINATION_BUCKET', defaultValue: 'sam-uploads-prod',
               description: 'Target S3 bucket name')
    }

    environment {
        AWS_DEFAULT_REGION = 'us-east-1'
        QUARANTINE_BUCKET  = 'quarantine-uploads-prod'
        VALIDATOR_FUNCTION = 'file-validator'
        ALLOWED_EXTENSIONS = 'xlsx,json,csv'
    }

    stages {

        stage('Prepare File') {
            steps {
                withFileParameter('UPLOAD_FILE') {
                    script {
                        def originalName = env.UPLOAD_FILE_FILENAME ?: 'uploaded_file'
                        def ext = originalName.tokenize('.').last().toLowerCase()

                        if (!(ext in env.ALLOWED_EXTENSIONS.split(',')))
                            error("File type '.${ext}' not allowed")

                        def cleanName = originalName.replaceAll(/[^a-zA-Z0-9._-]/, '_')
                        def timestamp = new Date().format('yyyyMMdd-HHmmss')

                        env.FILE_KEY      = "uploads/${timestamp}-${cleanName}"
                        env.ORIGINAL_NAME = originalName
                        env.CLEAN_NAME    = cleanName
                        env.TEMP_FILE     = env.UPLOAD_FILE
                        env.TARGET_BUCKET = params.DESTINATION_BUCKET?.trim() ?: 'sam-uploads-prod'

                        sh 'cp "$TEMP_FILE" "$WORKSPACE/$CLEAN_NAME"'
                        env.LOCAL_FILE = "${env.WORKSPACE}/${cleanName}"
                    }
                }
            }
        }

        stage('Upload to Quarantine') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Upload(bucket: "${QUARANTINE_BUCKET}",
                                 file:   "${env.LOCAL_FILE}",
                                 path:   "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Validate via Lambda') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        def response = invokeLambda(
                            functionName: "${VALIDATOR_FUNCTION}",
                            payload: [bucket: env.QUARANTINE_BUCKET, key: env.FILE_KEY],
                            returnValueAsString: true
                        )
                        def parsed = new JsonSlurper().parseText(response)
                        if (parsed.statusCode != 200 || !parsed.body?.valid)
                            error("Validation failed: ${parsed.body?.reason}")
                    }
                }
            }
        }

        stage('Promote to Destination') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Copy(fromBucket: "${QUARANTINE_BUCKET}", fromPath: "${env.FILE_KEY}",
                               toBucket:   "${env.TARGET_BUCKET}",  toPath:   "${env.FILE_KEY}")
                        s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Summary') {
            steps {
                script {
                    currentBuild.description = """
                        ✅ <b>Upload Success</b><br/>
                        📄 <b>File:</b> ${env.ORIGINAL_NAME}<br/>
                        🎯 <b>Destination:</b> s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
                    """
                    input(message: """
✅ File uploaded successfully!
📄 File       : ${env.ORIGINAL_NAME}
🎯 Destination: s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
🕐 Time       : ${new Date().format('yyyy-MM-dd HH:mm:ss')}
                    """, ok: '✅ OK, Done!')
                }
            }
        }
    }

    post {
        failure {
            withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                script {
                    try { s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}") }
                    catch (e) { echo "Nothing to clean up." }
                }
            }
        }
    }
}

📸 Image suggestion: Screenshot of the Jenkins pipeline stages view showing all 5 green stages.


Part 5 — How It All Works Together

Here is the end-to-end flow when a user uploads a file:

Step 1 — User clicks “Build with Parameters”. They pick a file from their local machine and optionally change the destination bucket name.

Step 2 — Prepare File stage. Jenkins receives the file as a base64-encoded temp file. The pipeline decodes it, strips special characters from the filename (spaces, brackets, etc.), and copies it to a stable workspace path.
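The sanitize-and-timestamp step can be sketched in Python. make_file_key is a hypothetical helper mirroring the Groovy code in the Prepare File stage, not something the pipeline actually runs:

```python
import re
from datetime import datetime

def make_file_key(original_name: str) -> str:
    # Replace anything outside [a-zA-Z0-9._-] with underscores,
    # then prefix with a timestamp, as the Prepare File stage does.
    clean = re.sub(r"[^a-zA-Z0-9._-]", "_", original_name)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"uploads/{stamp}-{clean}"

print(make_file_key("ExportedEstimate (1).xlsx"))
# e.g. uploads/20240101-120000-ExportedEstimate__1_.xlsx
```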

Step 3 — Upload to Quarantine. The cleaned file is uploaded to quarantine-uploads-prod using the s3Upload step from the Pipeline AWS Steps plugin. No AWS CLI needed.

Step 4 — Lambda Validation. The pipeline invokes the file-validator Lambda, passing the bucket and key as a JSON payload. Lambda downloads the file from S3 and checks:

  • Is it empty?
  • Is it over 100 MB?
  • Is the extension allowed?
  • Is the structure valid? (JSON parseable / CSV has headers / XLSX has a workbook)

Step 5 — Promote or Abort. If Lambda returns { "valid": true }, the file is copied to the destination bucket and deleted from quarantine. If validation fails, the quarantine file is deleted and the build is marked failed with the reason shown in logs.
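The promote-or-abort branch can be sketched as a tiny Python function (decide is a hypothetical helper mirroring the Jenkinsfile's check on the Lambda response):

```python
def decide(response: dict) -> str:
    # Promote only when Lambda returned 200 AND body.valid is true,
    # matching the pipeline's check on the invokeLambda response.
    ok = response.get("statusCode") == 200 and response.get("body", {}).get("valid")
    return "promote" if ok else "abort"

print(decide({"statusCode": 200, "body": {"valid": True, "reason": "ok"}}))
print(decide({"statusCode": 400, "body": {"valid": False, "reason": "Invalid JSON"}}))
```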

Step 6 — Popup Summary. On success, Jenkins shows a popup dialog with the file name, destination path, and timestamp. The user clicks OK to close.

📸 Image suggestion: Screenshot of the Jenkins input popup showing the success summary.


Where to Make Diagrams

Here are free tools well suited to creating the architecture and flow diagrams for a post like this:

Recommended diagram set for this post

Diagram                Tool        What to show
Architecture overview  draw.io     S3 buckets, Lambda, Jenkins, arrows
AWS IAM setup          Excalidraw  User → policies → Jenkins
Jenkins UI             Screenshot  Build with Parameters screen
Success popup          Screenshot  The input dialog at end of pipeline

Lessons Learned

1. AWS CLI is not in Jenkins by default. Don’t assume it’s there. Use the Pipeline: AWS Steps plugin instead — it handles S3 and Lambda natively from Groovy with zero shell commands.

2. Public Lambda layers have cross-account restrictions. The Klayers public layers blocked my account with a permissions error. The fix was to eliminate the dependency entirely and use Python built-ins — XLSX files are ZIP archives, so zipfile is enough.

3. File uploads need filename sanitization. Files with spaces and parentheses in their names (like ExportedEstimate (1).xlsx) break S3 paths and shell commands. Always sanitize with a regex before using the filename anywhere.

4. Jenkins script security approvals are per-method. When using new File() or .getBytes() in Groovy scripts, Jenkins requires explicit admin approval for each method call. Switching to sh 'cp ...' avoids this entirely.

5. The Lambda handler name must match exactly. The default handler is lambda_function.lambda_handler. If your function uses a different name (like handler), update the Runtime settings — this is a common silent failure.

6. Pass Lambda payloads as Maps, not JSON strings. The invokeLambda step in the Jenkins plugin expects a Groovy Map [key: value], not a JSON string from JsonOutput.toJson(). Passing a string causes a TypeError: string indices must be integers inside Lambda.
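A quick Python sketch of what the Lambda sees in each case (the event values are illustrative):

```python
import json

# When the plugin serializes a Groovy Map, the Lambda receives a dict:
good_event = {"bucket": "quarantine-uploads-prod", "key": "uploads/file.csv"}
print(good_event["bucket"])  # quarantine-uploads-prod

# When a pre-serialized JSON string is passed, the Lambda receives a str,
# and indexing it with a key raises the TypeError from the lesson above.
bad_event = json.dumps(good_event)
try:
    bad_event["bucket"]
except TypeError as e:
    print(e)  # string indices must be integers...
```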


Built on AWS Free Tier · Jenkins LTS · Python 3.12 · us-east-1

This post is licensed under CC BY 4.0 by the author.
