
Building a Secure File Upload Pipeline with Jenkins, Docker & AWS

A step-by-step walkthrough of how I built an automated file upload pipeline that validates Excel, JSON, and CSV files before landing them safely in S3 — using Jenkins running in Docker, AWS Lambda for sanity checks, and a clean quarantine-then-promote flow.


Table of Contents

  1. The Problem
  2. Architecture Overview
  3. Part 1 — Setting Up Jenkins in Docker
  4. Part 2 — AWS Setup
  5. Part 3 — The Lambda Validator
  6. Part 4 — The Jenkins Pipeline
  7. Part 5 — How It All Works Together
  8. Where to Make Diagrams
  9. Lessons Learned

The Problem

The requirement was simple on paper:

“The upload job should first upload the file to a quarantine bucket, apply a Lambda function in order to ensure the sanity check (no malware, correct format, etc.) on it before moving it to the target bucket (tmp_zone)”

But making it production-ready — with proper validation, clean error handling, and no AWS CLI dependency — turned out to be a journey worth documenting.

  • Files coming in: .xlsx, .json, .csv
  • Entry point: Jenkins UI (manual upload via browser)
  • Cloud: AWS Free Tier, N. Virginia (us-east-1)


Architecture Overview

[ User uploads file via Jenkins UI ]
              │
              ▼
   ┌─────────────────────┐
   │   Jenkins Pipeline   │
   │   (Docker container) │
   └─────────┬───────────┘
             │
    ① Validate extension
    (.xlsx / .json / .csv only)
             │
             ▼
   ┌─────────────────────┐
   │  S3 Quarantine Bucket│  ← quarantine-uploads-prod
   └─────────┬───────────┘
             │
    ② Invoke Lambda
             │
             ▼
   ┌─────────────────────┐
   │   file-validator    │  ← AWS Lambda (Python 3.12)
   │   Lambda Function   │
   │  • Size check       │
   │  • Format check     │
   │  • Structure check  │
   └─────────┬───────────┘
             │
         ┌───┴────┐
        PASS     FAIL
         │         │
         ▼         ▼
  ③ Promote    Clean up
  to Target    quarantine
  Bucket       → abort

   ┌─────────────────────┐
   │   S3 Target Bucket  │  ← user-defined (e.g. sam-uploads-prod)
   └─────────────────────┘
             │
    ④ Popup summary shown
       in Jenkins UI

Part 1 — Setting Up Jenkins in Docker

The docker-compose.yml

I ran Jenkins inside Docker to keep the setup portable and reproducible. Here’s the core docker-compose.yml:

version: "3.9"

services:
  jenkins:
    image: jenkins/jenkins:lts-jdk17
    container_name: jenkins
    restart: unless-stopped
    privileged: true
    user: root
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins_home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - ./pipelines:/var/jenkins_home/pipelines:ro
    environment:
      - JAVA_OPTS=-Djenkins.install.runSetupWizard=false

volumes:
  jenkins_home:

Start it with:

docker compose up -d

Then open http://localhost:8080 to access the Jenkins UI.


Jenkins Plugins Required

Install these from Manage Jenkins → Plugins → Available:

Plugin               Why
File Parameter       Allows file upload in “Build with Parameters”
Pipeline: AWS Steps  S3 upload/copy/delete + Lambda invoke, no AWS CLI needed

⚠️ Gotcha I hit: AWS CLI is not installed in the Jenkins container by default. Rather than installing it manually, I switched to the Pipeline: AWS Steps plugin which handles everything natively from Groovy — no shell commands needed.


Part 2 — AWS Setup

S3 Buckets

Create two buckets in us-east-1 (N. Virginia):

Bucket                           Purpose
quarantine-uploads-prod          Temporary holding zone while Lambda validates
sam-uploads-prod (or your name)  Final destination (the tmp_zone)

Settings for both:

  • Block all public access: ON
  • Versioning: optional


IAM User for Jenkins

Create a user jenkins-s3-user and attach these managed policies:

  • AmazonS3FullAccess
  • AWSLambda_FullAccess

Then generate an Access Key (type: “Application running outside AWS”) and save both the Key ID and Secret.

Adding Credentials to Jenkins

Manage Jenkins → Credentials → System → Global → Add Credentials

  • Kind: AWS Credentials
  • ID: aws-credentials ← this exact ID is used in the Jenkinsfile
  • Paste your Access Key ID + Secret Access Key



Part 3 — The Lambda Validator

Why no layers?

My original plan used openpyxl via a public Lambda layer. It failed with a cross-account permission error:

Action: lambda:GetLayerVersion
On resource: arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p312-openpyxl:4
Context: no resource-based policy allows the action

The solution? Use only Python built-ins: json, csv, io, and zipfile. Since XLSX files are ZIP archives internally, zipfile is enough to validate structure without any third-party library.
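Since the whole approach leans on the XLSX-is-a-ZIP fact, here is a standalone sketch (standard library only) showing that a minimal ZIP containing an xl/workbook.xml entry passes the same structural check. looks_like_xlsx is a hypothetical helper for illustration, not part of the Lambda:

```python
import io, zipfile

def looks_like_xlsx(data: bytes) -> bool:
    # XLSX files are ZIP archives; a real workbook always
    # contains an xl/workbook.xml entry.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "xl/workbook.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False

# Build a minimal "workbook" in memory to demonstrate the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(looks_like_xlsx(buf.getvalue()))  # True
print(looks_like_xlsx(b"not a zip"))    # False
```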

The handler code

import json, boto3, csv, io, zipfile

s3 = boto3.client("s3")
MAX_BYTES = 100 * 1024 * 1024  # 100 MB
ALLOWED_EXTENSIONS = {"xlsx", "json", "csv"}

def handler(event, context):
    bucket = event["bucket"]
    key    = event["key"]

    head = s3.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]

    if size == 0:        return _fail("File is empty")
    if size > MAX_BYTES: return _fail(f"File too large ({size} bytes, limit {MAX_BYTES})")

    ext = key.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return _fail(f"Extension '.{ext}' not allowed")

    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    if ext == "json":
        try:
            json.loads(data)
        except ValueError:
            return _fail("Invalid JSON")

    elif ext == "csv":
        reader  = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        headers = next(reader, None)
        if not headers: return _fail("CSV has no header row")

    elif ext == "xlsx":
        try:
            zf = zipfile.ZipFile(io.BytesIO(data))
            if "xl/workbook.xml" not in zf.namelist():
                return _fail("XLSX missing workbook")
        except zipfile.BadZipFile:
            return _fail("Not a valid XLSX file")

    return {"statusCode": 200, "body": {"valid": True, "reason": "ok"}}

def _fail(reason):
    return {"statusCode": 400, "body": {"valid": False, "reason": reason}}

Lambda configuration

Setting         Value
Runtime         Python 3.12
Handler         lambda_function.handler
Memory          128 MB
Timeout         30 sec
Execution role  Must have AmazonS3ReadOnlyAccess

⚠️ Gotcha I hit: The default handler is lambda_function.lambda_handler. Since my function is named handler not lambda_handler, I had to update this in Configuration → General configuration → Runtime settings → Handler.

📸 Image suggestion: Screenshot of Lambda Runtime settings showing the Handler field set to lambda_function.handler.


Part 4 — The Jenkins Pipeline

Build parameters

When a user clicks “Build with Parameters”, they see:

📎 UPLOAD_FILE         [Choose File]
🪣 DESTINATION_BUCKET  [sam-uploads-prod]

The destination bucket is editable — so the same pipeline can route files to different buckets without code changes.
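The fallback logic can be sketched in Python. resolve_bucket is a hypothetical helper mirroring the Groovy elvis expression in the Jenkinsfile, not something the pipeline actually runs:

```python
def resolve_bucket(param, default="sam-uploads-prod"):
    # Mirrors the Jenkinsfile expression:
    #   params.DESTINATION_BUCKET?.trim() ?: 'sam-uploads-prod'
    # Missing, empty, or whitespace-only input falls back to the default.
    return (param or "").strip() or default

print(resolve_bucket("  my-bucket  "))  # my-bucket
print(resolve_bucket(None))             # sam-uploads-prod
print(resolve_bucket("   "))            # sam-uploads-prod
```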

The full Jenkinsfile

import groovy.json.JsonSlurper

pipeline {
    agent any

    parameters {
        base64File(name: 'UPLOAD_FILE', description: 'Select file (.xlsx, .json, .csv)')
        string(name: 'DESTINATION_BUCKET', defaultValue: 'sam-uploads-prod',
               description: 'Target S3 bucket name')
    }

    environment {
        AWS_DEFAULT_REGION = 'us-east-1'
        QUARANTINE_BUCKET  = 'quarantine-uploads-prod'
        VALIDATOR_FUNCTION = 'file-validator'
        ALLOWED_EXTENSIONS = 'xlsx,json,csv'
    }

    stages {

        stage('Prepare File') {
            steps {
                withFileParameter('UPLOAD_FILE') {
                    script {
                        def originalName = env.UPLOAD_FILE_FILENAME ?: 'uploaded_file'
                        def ext = originalName.tokenize('.').last().toLowerCase()

                        if (!(ext in env.ALLOWED_EXTENSIONS.split(',')))
                            error("File type '.${ext}' not allowed")

                        def cleanName = originalName.replaceAll(/[^a-zA-Z0-9._-]/, '_')
                        def timestamp = new Date().format('yyyyMMdd-HHmmss')

                        env.FILE_KEY      = "uploads/${timestamp}-${cleanName}"
                        env.ORIGINAL_NAME = originalName
                        env.CLEAN_NAME    = cleanName
                        env.TEMP_FILE     = env.UPLOAD_FILE
                        env.TARGET_BUCKET = params.DESTINATION_BUCKET?.trim() ?: 'sam-uploads-prod'

                        sh 'cp "$TEMP_FILE" "$WORKSPACE/$CLEAN_NAME"'
                        env.LOCAL_FILE = "${env.WORKSPACE}/${cleanName}"
                    }
                }
            }
        }

        stage('Upload to Quarantine') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Upload(bucket: "${QUARANTINE_BUCKET}",
                                 file:   "${env.LOCAL_FILE}",
                                 path:   "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Validate via Lambda') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        def response = invokeLambda(
                            functionName: "${VALIDATOR_FUNCTION}",
                            payload: [bucket: env.QUARANTINE_BUCKET, key: env.FILE_KEY],
                            returnValueAsString: true
                        )
                        def parsed = new JsonSlurper().parseText(response)
                        if (parsed.statusCode != 200 || !parsed.body?.valid)
                            error("Validation failed: ${parsed.body?.reason}")
                    }
                }
            }
        }

        stage('Promote to Destination') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Copy(fromBucket: "${QUARANTINE_BUCKET}", fromPath: "${env.FILE_KEY}",
                               toBucket:   "${env.TARGET_BUCKET}",  toPath:   "${env.FILE_KEY}")
                        s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Summary') {
            steps {
                script {
                    currentBuild.description = """
                        ✅ <b>Upload Success</b><br/>
                        📄 <b>File:</b> ${env.ORIGINAL_NAME}<br/>
                        🎯 <b>Destination:</b> s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
                    """
                    input(message: """
✅ File uploaded successfully!
📄 File       : ${env.ORIGINAL_NAME}
🎯 Destination: s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
🕐 Time       : ${new Date().format('yyyy-MM-dd HH:mm:ss')}
                    """, ok: '✅ OK, Done!')
                }
            }
        }
    }

    post {
        failure {
            withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                script {
                    try { s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}") }
                    catch (e) { echo "Nothing to clean up." }
                }
            }
        }
    }
}

📸 Image suggestion: Screenshot of the Jenkins pipeline stages view showing all 5 green stages.


Part 5 — How It All Works Together

Here is the end-to-end flow when a user uploads a file:

Step 1 — User clicks “Build with Parameters”. They pick a file from their local machine and optionally change the destination bucket name.

Step 2 — Prepare File stage. Jenkins receives the file as a base64-encoded temp file. The pipeline decodes it, strips special characters from the filename (spaces, brackets, etc.), and copies it to a stable workspace path.
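The sanitize-and-timestamp step can be sketched in Python. make_file_key is a hypothetical helper mirroring the Groovy code in the Prepare File stage, not something the pipeline actually runs:

```python
import re
from datetime import datetime

def make_file_key(original_name: str) -> str:
    # Replace anything outside [a-zA-Z0-9._-] with underscores,
    # then prefix with a timestamp, as the Prepare File stage does.
    clean = re.sub(r"[^a-zA-Z0-9._-]", "_", original_name)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"uploads/{stamp}-{clean}"

print(make_file_key("ExportedEstimate (1).xlsx"))
# e.g. uploads/20240101-120000-ExportedEstimate__1_.xlsx
```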

Step 3 — Upload to Quarantine. The cleaned file is uploaded to quarantine-uploads-prod using the s3Upload step from the Pipeline AWS Steps plugin. No AWS CLI needed.

Step 4 — Lambda Validation. The pipeline invokes the file-validator Lambda, passing the bucket and key as a JSON payload. Lambda downloads the file from S3 and checks:

  • Is it empty?
  • Is it over 100 MB?
  • Is the extension allowed?
  • Is the structure valid? (JSON parseable / CSV has headers / XLSX has a workbook)

Step 5 — Promote or Abort. If Lambda returns { "valid": true }, the file is copied to the destination bucket and deleted from quarantine. If validation fails, the quarantine file is deleted and the build is marked failed with the reason shown in logs.
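The promote-or-abort branch can be sketched as a tiny Python function (decide is a hypothetical helper mirroring the Jenkinsfile's check on the Lambda response):

```python
def decide(response: dict) -> str:
    # Promote only when Lambda returned 200 AND body.valid is true,
    # matching the pipeline's check on the invokeLambda response.
    ok = response.get("statusCode") == 200 and response.get("body", {}).get("valid")
    return "promote" if ok else "abort"

print(decide({"statusCode": 200, "body": {"valid": True, "reason": "ok"}}))
print(decide({"statusCode": 400, "body": {"valid": False, "reason": "Invalid JSON"}}))
```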

Step 6 — Popup Summary. On success, Jenkins shows a popup dialog with the file name, destination path, and timestamp. The user clicks OK to close.

📸 Image suggestion: Screenshot of the Jenkins input popup showing the success summary.


Where to Make Diagrams

Here are free tools well suited to creating the architecture and flow diagrams for a post like this:

Recommended diagram set for this post

Diagram                Tool        What to show
Architecture overview  draw.io     S3 buckets, Lambda, Jenkins, arrows
AWS IAM setup          Excalidraw  User → policies → Jenkins
Jenkins UI             Screenshot  Build with Parameters screen
Success popup          Screenshot  The input dialog at end of pipeline

Lessons Learned

1. AWS CLI is not in Jenkins by default. Don’t assume it’s there. Use the Pipeline: AWS Steps plugin instead — it handles S3 and Lambda natively from Groovy with zero shell commands.

2. Public Lambda layers have cross-account restrictions. The Klayers public layers blocked my account with a permissions error. The fix was to eliminate the dependency entirely and use Python built-ins — XLSX files are ZIP archives, so zipfile is enough.

3. File uploads need filename sanitization. Files with spaces and parentheses in their names (like ExportedEstimate (1).xlsx) break S3 paths and shell commands. Always sanitize with a regex before using the filename anywhere.

4. Jenkins script security approvals are per-method. When using new File() or .getBytes() in Groovy scripts, Jenkins requires explicit admin approval for each method call. Switching to sh 'cp ...' avoids this entirely.

5. The Lambda handler name must match exactly. The default handler is lambda_function.lambda_handler. If your function uses a different name (like handler), update the Runtime settings — this is a common silent failure.

6. Pass Lambda payloads as Maps, not JSON strings. The invokeLambda step in the Jenkins plugin expects a Groovy Map [key: value], not a JSON string from JsonOutput.toJson(). Passing a string causes a TypeError: string indices must be integers inside Lambda.
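A quick Python sketch of what the Lambda sees in each case (the event values are illustrative):

```python
import json

# When the plugin serializes a Groovy Map, the Lambda receives a dict:
good_event = {"bucket": "quarantine-uploads-prod", "key": "uploads/file.csv"}
print(good_event["bucket"])  # quarantine-uploads-prod

# When a pre-serialized JSON string is passed, the Lambda receives a str,
# and indexing it with a key raises the TypeError from the lesson above.
bad_event = json.dumps(good_event)
try:
    bad_event["bucket"]
except TypeError as e:
    print(e)  # string indices must be integers...
```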


Built on AWS Free Tier · Jenkins LTS · Python 3.12 · us-east-1

This post is licensed under CC BY 4.0 by the author.
