Building a Secure File Upload Pipeline with Jenkins, Docker & AWS
A step-by-step walkthrough of how I built an automated file upload pipeline that validates Excel, JSON, and CSV files before landing them safely in S3 — using Jenkins running in Docker, AWS Lambda for sanity checks, and a clean quarantine-then-promote flow.
Table of Contents
- The Problem
- Architecture Overview
- Part 1 — Setting Up Jenkins in Docker
- Part 2 — AWS Setup
- Part 3 — The Lambda Validator
- Part 4 — The Jenkins Pipeline
- Part 5 — How It All Works Together
- Where to Make Diagrams
- Lessons Learned
The Problem
The requirement was simple on paper:
“The upload job should first upload the file to a quarantine bucket, apply a Lambda function in order to ensure the sanity check (no malware, correct format, etc.) on it before moving it to the target bucket (tmp_zone)”
But making it production-ready — with proper validation, clean error handling, and no AWS CLI dependency — turned out to be a journey worth documenting.
- Files coming in: `.xlsx`, `.json`, `.csv`
- Entry point: Jenkins UI (manual upload via browser)
- Cloud: AWS Free Tier — N. Virginia (us-east-1)
Architecture Overview
```
[ User uploads file via Jenkins UI ]
                │
                ▼
     ┌─────────────────────┐
     │  Jenkins Pipeline   │
     │ (Docker container)  │
     └─────────┬───────────┘
               │
      ① Validate extension
        (.xlsx / .json / .csv only)
               │
               ▼
     ┌─────────────────────┐
     │ S3 Quarantine Bucket│ ← quarantine-uploads-prod
     └─────────┬───────────┘
               │
        ② Invoke Lambda
               │
               ▼
     ┌─────────────────────┐
     │   file-validator    │ ← AWS Lambda (Python 3.12)
     │   Lambda Function   │
     │  • Size check       │
     │  • Format check     │
     │  • Structure check  │
     └─────────┬───────────┘
               │
           ┌───┴────┐
         PASS      FAIL
           │         │
           ▼         ▼
     ③ Promote   Clean up
       to Target  quarantine
       Bucket     → abort
     ┌─────────────────────┐
     │   S3 Target Bucket  │ ← user-defined (e.g. sam-uploads-prod)
     └─────────────────────┘
               │
     ④ Popup summary shown
       in Jenkins UI
```
Part 1 — Setting Up Jenkins in Docker
The docker-compose.yml
I ran Jenkins inside Docker to keep the setup portable and reproducible. Here’s the core docker-compose.yml:
```yaml
version: "3.9"

services:
  jenkins:
    image: jenkins/jenkins:lts-jdk17
    container_name: jenkins
    restart: unless-stopped
    privileged: true
    user: root
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins_home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - ./pipelines:/var/jenkins_home/pipelines:ro
    environment:
      - JAVA_OPTS=-Djenkins.install.runSetupWizard=false

volumes:
  jenkins_home:
```
Start it with:
```bash
docker compose up -d
```
Then open http://localhost:8080 to access the Jenkins UI.
Jenkins Plugins Required
Install these from Manage Jenkins → Plugins → Available:
| Plugin | Why |
|---|---|
| File Parameter | Allows file upload in “Build with Parameters” |
| Pipeline: AWS Steps | S3 upload/copy/delete + Lambda invoke — no AWS CLI needed |
⚠️ Gotcha I hit: AWS CLI is not installed in the Jenkins container by default. Rather than installing it manually, I switched to the Pipeline: AWS Steps plugin which handles everything natively from Groovy — no shell commands needed.
Part 2 — AWS Setup
S3 Buckets
Create two buckets in us-east-1 (N. Virginia):
| Bucket | Purpose |
|---|---|
| quarantine-uploads-prod | Temporary holding zone while Lambda validates |
| sam-uploads-prod (or your own name) | Final destination — the tmp_zone |
Settings for both:
- Block all public access: ON
- Versioning: optional
IAM User for Jenkins
Create a user `jenkins-s3-user` and attach these managed policies:
- AmazonS3FullAccess
- AWSLambda_FullAccess
Then generate an Access Key (type: “Application running outside AWS”) and save both the Key ID and Secret.
Adding Credentials to Jenkins
Manage Jenkins → Credentials → System → Global → Add Credentials
- Kind: AWS Credentials
- ID: `aws-credentials` ← this exact ID is used in the Jenkinsfile
- Paste your Access Key ID + Secret Access Key
Part 3 — The Lambda Validator
Why no layers?
My original plan used openpyxl via a public Lambda layer. It failed with a cross-account permission error:
```
Action: lambda:GetLayerVersion
On resource: arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p312-openpyxl:4
Context: no resource-based policy allows the action
```
The solution? Use only Python built-ins — json, csv, io, zipfile. Since XLSX files are ZIP archives internally, zipfile is enough to validate structure without any third-party library.
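To see why `zipfile` is enough, here is a minimal sketch of the same structure check the Lambda performs. The `looks_like_xlsx` helper name is mine, not part of the pipeline; it builds a throwaway in-memory archive containing the one entry a real workbook must have:

```python
import io
import zipfile

def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes are a ZIP archive containing xl/workbook.xml."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "xl/workbook.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False

# Build a minimal in-memory "workbook": a ZIP with the one entry we check for.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(looks_like_xlsx(buf.getvalue()))        # a real .xlsx passes the same check
print(looks_like_xlsx(b"not a zip at all"))   # random bytes fail fast
```

This catches truncated downloads and files renamed to `.xlsx`, though it obviously cannot inspect cell contents the way `openpyxl` would.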
The handler code
```python
import json, boto3, csv, io, zipfile

s3 = boto3.client("s3")

MAX_BYTES = 100 * 1024 * 1024  # 100 MB
ALLOWED_EXTENSIONS = {"xlsx", "json", "csv"}

def handler(event, context):
    bucket = event["bucket"]
    key = event["key"]

    head = s3.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]
    if size == 0:
        return _fail("File is empty")
    if size > MAX_BYTES:
        return _fail(f"File too large: {size} bytes (max {MAX_BYTES})")

    ext = key.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return _fail(f"Extension '.{ext}' not allowed")

    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    if ext == "json":
        try:
            json.loads(data)
        except ValueError:
            return _fail("Invalid JSON")
    elif ext == "csv":
        reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        headers = next(reader, None)
        if not headers:
            return _fail("CSV has no header row")
    elif ext == "xlsx":
        try:
            zf = zipfile.ZipFile(io.BytesIO(data))
            if "xl/workbook.xml" not in zf.namelist():
                return _fail("XLSX missing workbook")
        except zipfile.BadZipFile:
            return _fail("Not a valid XLSX file")

    return {"statusCode": 200, "body": {"valid": True, "reason": "ok"}}

def _fail(reason):
    return {"statusCode": 400, "body": {"valid": False, "reason": reason}}
```
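Because the per-format checks operate on raw bytes, they can be exercised locally before deploying, without touching S3. A small sketch under that assumption — `validate_payload` is a hypothetical helper of mine that mirrors the handler’s format logic, not a function in the deployed Lambda:

```python
import csv, io, json, zipfile

def validate_payload(data: bytes, ext: str):
    """Mirror of the Lambda's per-format checks, minus the S3 calls."""
    if ext == "json":
        try:
            json.loads(data)
        except ValueError:
            return (False, "Invalid JSON")
    elif ext == "csv":
        reader = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        if next(reader, None) is None:
            return (False, "CSV has no header row")
    elif ext == "xlsx":
        try:
            if "xl/workbook.xml" not in zipfile.ZipFile(io.BytesIO(data)).namelist():
                return (False, "XLSX missing workbook")
        except zipfile.BadZipFile:
            return (False, "Not a valid XLSX file")
    else:
        return (False, f"Extension '.{ext}' not allowed")
    return (True, "ok")

print(validate_payload(b'{"a": 1}', "json"))          # (True, 'ok')
print(validate_payload(b"{broken", "json"))           # (False, 'Invalid JSON')
print(validate_payload(b"col1,col2\n1,2\n", "csv"))   # (True, 'ok')
```

Keeping the validation logic pure like this also makes it trivial to unit-test in CI before each Lambda deploy.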
Lambda configuration
| Setting | Value |
|---|---|
| Runtime | Python 3.12 |
| Handler | lambda_function.handler |
| Memory | 128 MB |
| Timeout | 30 sec |
| Execution role | Must have AmazonS3ReadOnlyAccess |
⚠️ Gotcha I hit: The default handler is `lambda_function.lambda_handler`. Since my function is named `handler`, not `lambda_handler`, I had to update this in Configuration → General configuration → Runtime settings → Handler.
📸 Image suggestion: Screenshot of Lambda Runtime settings showing the Handler field set to `lambda_function.handler`.
Part 4 — The Jenkins Pipeline
Build parameters
When a user clicks “Build with Parameters”, they see:
```
📎 UPLOAD_FILE         [Choose File]
🪣 DESTINATION_BUCKET  [sam-uploads-prod]
```
The destination bucket is editable — so the same pipeline can route files to different buckets without code changes.
The full Jenkinsfile
```groovy
import groovy.json.JsonSlurper

pipeline {
    agent any

    parameters {
        base64File(name: 'UPLOAD_FILE', description: 'Select file (.xlsx, .json, .csv)')
        string(name: 'DESTINATION_BUCKET', defaultValue: 'sam-uploads-prod',
               description: 'Target S3 bucket name')
    }

    environment {
        AWS_DEFAULT_REGION = 'us-east-1'
        QUARANTINE_BUCKET  = 'quarantine-uploads-prod'
        VALIDATOR_FUNCTION = 'file-validator'
        ALLOWED_EXTENSIONS = 'xlsx,json,csv'
    }

    stages {
        stage('Prepare File') {
            steps {
                withFileParameter('UPLOAD_FILE') {
                    script {
                        def originalName = env.UPLOAD_FILE_FILENAME ?: 'uploaded_file'
                        def ext = originalName.tokenize('.').last().toLowerCase()
                        if (!(ext in env.ALLOWED_EXTENSIONS.split(',')))
                            error("File type '.${ext}' not allowed")

                        def cleanName = originalName.replaceAll(/[^a-zA-Z0-9._-]/, '_')
                        def timestamp = new Date().format('yyyyMMdd-HHmmss')
                        env.FILE_KEY      = "uploads/${timestamp}-${cleanName}"
                        env.ORIGINAL_NAME = originalName   // used by the Summary stage
                        env.CLEAN_NAME    = cleanName
                        env.TEMP_FILE     = env.UPLOAD_FILE
                        env.TARGET_BUCKET = params.DESTINATION_BUCKET?.trim() ?: 'sam-uploads-prod'

                        sh 'cp "$TEMP_FILE" "$WORKSPACE/$CLEAN_NAME"'
                        env.LOCAL_FILE = "${env.WORKSPACE}/${cleanName}"
                    }
                }
            }
        }

        stage('Upload to Quarantine') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Upload(bucket: "${QUARANTINE_BUCKET}",
                                 file: "${env.LOCAL_FILE}",
                                 path: "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Validate via Lambda') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        def response = invokeLambda(
                            functionName: "${VALIDATOR_FUNCTION}",
                            payload: [bucket: env.QUARANTINE_BUCKET, key: env.FILE_KEY],
                            returnValueAsString: true
                        )
                        def parsed = new JsonSlurper().parseText(response)
                        if (parsed.statusCode != 200 || !parsed.body?.valid)
                            error("Validation failed: ${parsed.body?.reason}")
                    }
                }
            }
        }

        stage('Promote to Destination') {
            steps {
                withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                    script {
                        s3Copy(fromBucket: "${QUARANTINE_BUCKET}", fromPath: "${env.FILE_KEY}",
                               toBucket: "${env.TARGET_BUCKET}", toPath: "${env.FILE_KEY}")
                        s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}")
                    }
                }
            }
        }

        stage('Summary') {
            steps {
                script {
                    currentBuild.description = """
                        ✅ <b>Upload Success</b><br/>
                        📄 <b>File:</b> ${env.ORIGINAL_NAME}<br/>
                        🎯 <b>Destination:</b> s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
                    """
                    input(message: """
                        ✅ File uploaded successfully!
                        📄 File       : ${env.ORIGINAL_NAME}
                        🎯 Destination: s3://${env.TARGET_BUCKET}/${env.FILE_KEY}
                        🕐 Time       : ${new Date().format('yyyy-MM-dd HH:mm:ss')}
                    """, ok: '✅ OK, Done!')
                }
            }
        }
    }

    post {
        failure {
            withAWS(region: "${AWS_DEFAULT_REGION}", credentials: 'aws-credentials') {
                script {
                    try { s3Delete(bucket: "${QUARANTINE_BUCKET}", path: "${env.FILE_KEY}") }
                    catch (e) { echo "Nothing to clean up." }
                }
            }
        }
    }
}
```
📸 Image suggestion: Screenshot of the Jenkins pipeline stages view showing all 5 green stages.
Part 5 — How It All Works Together
Here is the end-to-end flow when a user uploads a file:
Step 1 — User clicks “Build with Parameters”
They pick a file from their local machine and optionally change the destination bucket name.
Step 2 — Prepare File stage
Jenkins receives the file as a base64-encoded temp file. The pipeline decodes it, strips special characters from the filename (spaces, brackets, etc.), and copies it to a stable workspace path.
Step 3 — Upload to Quarantine
The cleaned file is uploaded to quarantine-uploads-prod using the s3Upload step from the Pipeline AWS Steps plugin. No AWS CLI needed.
Step 4 — Lambda Validation
The pipeline invokes file-validator Lambda, passing the bucket and key as a JSON payload. Lambda downloads the file from S3 and checks:
- Is it empty?
- Is it over 100 MB?
- Is the extension allowed?
- Is the structure valid? (JSON parseable / CSV has headers / XLSX has a workbook)
Step 5 — Promote or Abort
If Lambda returns { "valid": true }, the file is copied to the destination bucket and deleted from quarantine. If validation fails, the quarantine file is deleted and the build is marked failed with the reason shown in logs.
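The promote-or-abort decision is simple enough to capture in a few lines. A sketch of the control flow only — `promote_or_abort` and the dict-of-dicts `store` standing in for the two S3 buckets are illustrative inventions, not pipeline code:

```python
def promote_or_abort(store, quarantine, target, key, verdict):
    """Copy quarantine/key to target on a passing verdict; always clear quarantine.

    `store` is a dict of {bucket: {key: bytes}} standing in for S3;
    `verdict` mimics the Lambda response body: {"valid": bool, "reason": str}.
    """
    data = store[quarantine].pop(key)              # file leaves quarantine either way
    if verdict.get("valid"):
        store.setdefault(target, {})[key] = data   # PASS: promote to the target bucket
        return "promoted"
    return "aborted: " + verdict.get("reason", "unknown")   # FAIL: file is discarded

store = {"quarantine-uploads-prod": {"uploads/x.csv": b"a,b\n1,2\n"},
         "sam-uploads-prod": {}}
print(promote_or_abort(store, "quarantine-uploads-prod", "sam-uploads-prod",
                       "uploads/x.csv", {"valid": True, "reason": "ok"}))
```

The key invariant either way: nothing is ever left sitting in quarantine after a build finishes.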
Step 6 — Popup Summary
On success, Jenkins shows a popup dialog with the file name, destination path, and timestamp. The user clicks OK to close.
📸 Image suggestion: Screenshot of the Jenkins input popup showing the success summary.
Where to Make Diagrams
Here are the best free tools to create architecture and flow diagrams for your post:
Recommended diagram set for this post
| Diagram | Tool | What to show |
|---|---|---|
| Architecture overview | draw.io | S3 buckets, Lambda, Jenkins, arrows |
| AWS IAM setup | Excalidraw | User → policies → Jenkins |
| Jenkins UI | Screenshot | Build with Parameters screen |
| Success popup | Screenshot | The input dialog at end of pipeline |
Lessons Learned
1. AWS CLI is not in Jenkins by default
Don’t assume it’s there. Use the Pipeline: AWS Steps plugin instead — it handles S3 and Lambda natively from Groovy with zero shell commands.
2. Public Lambda layers have cross-account restrictions
The Klayers public layers blocked my account with a permissions error. The fix was to eliminate the dependency entirely and use Python built-ins — XLSX files are ZIP archives, so zipfile is enough.
3. File uploads need filename sanitization
Files with spaces and parentheses in their names (like ExportedEstimate (1).xlsx) break S3 paths and shell commands. Always sanitize with a regex before using the filename anywhere.
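The same character class the Jenkinsfile uses can be checked in isolation. A quick sketch (the `sanitize_filename` name is mine; the regex is the one from the Prepare File stage):

```python
import re

def sanitize_filename(name: str) -> str:
    """Replace anything outside [a-zA-Z0-9._-] with '_' (same regex as the pipeline)."""
    return re.sub(r"[^a-zA-Z0-9._-]", "_", name)

print(sanitize_filename("ExportedEstimate (1).xlsx"))  # ExportedEstimate__1_.xlsx
print(sanitize_filename("q3 report [final].csv"))      # q3_report__final_.csv
```

Dots, underscores, and hyphens survive, so extensions stay intact; everything else (spaces, parentheses, brackets, unicode) becomes a safe underscore.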
4. Jenkins script security approvals are per-method
When using new File() or .getBytes() in Groovy scripts, Jenkins requires explicit admin approval for each method call. Switching to sh 'cp ...' avoids this entirely.
5. Lambda handler name must match exactly
The default handler is lambda_function.lambda_handler. If your function uses a different name (like handler), update the Runtime settings — this is a common silent failure.
6. Pass Lambda payloads as Maps, not JSON strings
The invokeLambda step in the Jenkins plugin expects a Groovy Map [key: value], not a JSON string from JsonOutput.toJson(). Passing a string causes a TypeError: string indices must be integers inside Lambda.
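The failure mode is easy to reproduce in plain Python: serializing an already-serialized payload means the Lambda decodes a string, not a dict, and the very first `event["bucket"]` blows up. A sketch of the double-encoding trap:

```python
import json

payload_as_map = {"bucket": "quarantine-uploads-prod", "key": "uploads/f.csv"}

# Correct: the plugin serializes the Map once, so Lambda's event is a dict.
event = json.loads(json.dumps(payload_as_map))
print(event["bucket"])   # quarantine-uploads-prod

# Wrong: pre-serializing (as JsonOutput.toJson() would) means the JSON string
# gets serialized *again*; after one decode, Lambda's event is still a str.
double_encoded = json.dumps(json.dumps(payload_as_map))
bad_event = json.loads(double_encoded)
try:
    bad_event["bucket"]
except TypeError as e:
    print(e)   # e.g. "string indices must be integers"
```

So the rule of thumb: hand `invokeLambda` the raw Map and let the plugin do the one-and-only serialization.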
Built on AWS Free Tier · Jenkins LTS · Python 3.12 · us-east-1