KI: Practicum 13 - AI Security Final Project
In this practicum you are no longer just trying out tools. You will build a mini security product that:
- has clearly defined input data
- has a detection process
- produces a decision as output (risk score + reasons)
- comes with a report and a demo that you can defend
The key: explain the security logic. AI only assists.
Project Options (Pick 1)
- AI Phishing Detector (the most "real-world" option, easy to test)
- AI PDP Audit (PDP = personal data protection; privacy compliance, suited to logs/CSV/datasets)
- Simple AI IDS (network log/anomaly detection, challenging but fun)
All projects share the same end-to-end framework.
Required Project Structure (Same for all options)
Stage 0: Environment Setup (Ubuntu 24.04)
Install the base dependencies
sudo apt update
sudo apt install -y python3 python3-venv python3-pip git gpg
python3 --version
gpg --version
Create the project folder
mkdir -p ~/ai-security-final/{data,src,reports,models}
cd ~/ai-security-final
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
Python packages (open-source)
For all project options:
pip install pandas scikit-learn numpy joblib rich
Optional (if you need tidier log parsing, for example flexible timestamp parsing):
pip install python-dateutil
Minimal folder structure:
ai-security-final/
  data/
  src/
  models/
  reports/
  README.md
Stage 1: Project Data Security (Required): GnuPG for Datasets & Output
Why? Datasets and reports often contain sensitive data. You must demonstrate that you can secure that data.
1. Create a GPG key (for the project)
gpg --full-generate-key
Choose:
- (1) RSA and RSA
- 3072 or 4096 bits
- name: AI Security Student
- email: student@lab.local
Check the key:
gpg --list-keys
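2. Encrypt the dataset (example)
Suppose your dataset is data/phishing_samples.csv. The command below uses --symmetric, so the file is protected by a passphrase you type in, not by the key pair from step 1. To encrypt to that key instead, replace --symmetric --cipher-algo AES256 with --encrypt --recipient student@lab.local.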
gpg --output data/phishing_samples.csv.gpg --symmetric --cipher-algo AES256 data/phishing_samples.csv
shred -u data/phishing_samples.csv
Decrypt when needed:
gpg --output data/phishing_samples.csv --decrypt data/phishing_samples.csv.gpg
Project rule: any dataset that contains personal or risky data must be stored encrypted; at the very least, use dummy data instead.
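To avoid leaving a plaintext copy on disk between runs, a script can ask gpg to decrypt to standard output and parse the bytes in memory. The following is a minimal sketch using only the standard library. The helper names build_decrypt_cmd, parse_csv_bytes and load_encrypted_csv are made up for this example, and it assumes gpg is on PATH and can obtain the passphrase through its usual prompt.

```python
import csv
import io
import subprocess


def build_decrypt_cmd(encrypted_path):
    """Return the gpg command that writes the plaintext to stdout."""
    # Without --output, `gpg --decrypt` prints the decrypted data to stdout.
    return ["gpg", "--quiet", "--decrypt", encrypted_path]


def parse_csv_bytes(raw):
    """Parse decrypted CSV bytes into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def load_encrypted_csv(encrypted_path):
    """Decrypt in memory and parse; this script writes no plaintext file."""
    result = subprocess.run(
        build_decrypt_cmd(encrypted_path), check=True, capture_output=True
    )
    return parse_csv_bytes(result.stdout)
```

A call such as load_encrypted_csv("data/phishing_samples.csv.gpg") would then replace the decrypt-to-file step. The plaintext still lives in process memory while the script runs.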
Stage 2: Pick a Project + Run It Step by Step
Below are three complete project tracks. Each one has:
- realistic sample data
- implementation steps
- training + inference code
- demo output
- a report format
Just pick one of them.
OPTION A: AI Phishing Detector (Recommended)
Goal: detect phishing messages from email/chat text → output a label + risk score + reasons.
1. Prepare the dataset (realistic but safe)
Create the file data/phishing_samples.csv (a mini example; you can extend it):
text,label
"URGENT: Your account will be suspended. Verify now at http://secure-login.example.com",1
"Hi team, meeting moved to 3pm. Link: https://meet.example.org/abc",0
"Reset password now. Your mailbox is full. Click http://mailbox-reset.example.net",1
"Invoice attached, please review. Thanks",0
"Bank: unusual activity detected. Confirm your OTP at http://bank-verify.example.xyz",1
"Reminder: submit assignment before Friday",0
Label: 1=phishing, 0=benign.
Challenge: extend this to 50–200 examples (realistic texts you write yourself are fine). With only six rows, the evaluation metrics are not meaningful.
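As the dataset grows by hand, it is easy to introduce empty texts, mistyped labels, or a badly skewed class ratio. The sketch below shows one possible sanity check before training. The function name validate_dataset and the sample rows are assumptions for this example, and the line numbers it reports assume one record per physical line.

```python
import csv
import io
from collections import Counter


def validate_dataset(csv_text):
    """Return (label_counts, problems) for a text,label CSV given as a string."""
    counts = Counter()
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        if not text:
            problems.append(f"line {line_no}: empty text")
        if label not in ("0", "1"):
            problems.append(f"line {line_no}: label must be 0 or 1, got {label!r}")
            continue
        counts[label] += 1
    return counts, problems


# Made-up rows; the last two are deliberately broken.
sample = '''text,label
"URGENT: verify now at http://secure-login.example.com",1
"Reminder: submit assignment before Friday",0
"",1
"Invoice attached",benign
'''
counts, problems = validate_dataset(sample)
print(dict(counts))
print(problems)
```

For the real file, the same function can be fed open("data/phishing_samples.csv", encoding="utf-8").read(). A roughly balanced count of 0 and 1 labels makes the later stratified split more informative.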
2. Create the training script (simple + explainable ML)
Create src/train_phishing.py:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
import joblib

DATA_PATH = "data/phishing_samples.csv"
MODEL_PATH = "models/phishing_model.joblib"


def main():
    df = pd.read_csv(DATA_PATH)
    X = df["text"].astype(str)
    y = df["label"].astype(int)

    # Hold out 30% for testing; stratify keeps the class ratio in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # TF-IDF over unigrams and bigrams, followed by a linear classifier.
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
        ("clf", LogisticRegression(max_iter=200)),
    ])
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print("=== Confusion Matrix ===")
    print(confusion_matrix(y_test, y_pred))
    print("\n=== Classification Report ===")
    print(classification_report(y_test, y_pred))

    # Persist the whole pipeline (vectorizer + classifier) for the detector.
    joblib.dump(model, MODEL_PATH)
    print(f"\nSaved model to: {MODEL_PATH}")


if __name__ == "__main__":
    main()
Run:
python src/train_phishing.py
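The confusion matrix printed by the training script is the raw material for the report's evaluation section. scikit-learn lays it out with true labels as rows and predicted labels as columns, so for labels 0 and 1 it reads [[TN, FP], [FN, TP]]. The sketch below derives precision, recall and F1 for the phishing class from such a matrix. The function name binary_metrics and the counts are hypothetical, not results from the sample dataset.

```python
def binary_metrics(cm):
    """Compute precision, recall and F1 for the positive class (label 1).

    `cm` uses scikit-learn's layout: rows are true labels, columns are
    predicted labels, so cm == [[tn, fp], [fn, tp]] for labels 0 and 1.
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "false_negatives": fn, "false_positives": fp}


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = [[40, 10], [5, 45]]
m = binary_metrics(example)
print(m)
```

For a phishing detector, false negatives (phishing labelled benign) are usually the costlier error, so recall on label 1 deserves explicit discussion in the limitations section.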
3. Create the detector + reasons (top keywords)
Create src/detect_phishing.py:
import joblib
from rich import print
from rich.console import Console

MODEL_PATH = "models/phishing_model.joblib"

# Keywords that often appear in phishing messages; used only to explain a result.
SUSPICIOUS_HINTS = [
    "urgent", "verify", "reset", "suspended", "otp", "password",
    "click", "confirm", "limited", "account", "bank"
]


def explain_text(text: str):
    # Case-insensitive substring match against the hint list.
    low = text.lower()
    hits = [h for h in SUSPICIOUS_HINTS if h in low]
    return hits[:10]


def main():
    model = joblib.load(MODEL_PATH)
    console = Console()
    console.print("[bold]AI Phishing Detector Demo[/bold]")
    console.print("Type a message/email. Press Enter on an empty line to quit.\n")

    while True:
        text = input("Message> ").strip()
        if not text:
            break

        proba = model.predict_proba([text])[0][1]  # probability of phishing
        label = "PHISHING" if proba >= 0.5 else "BENIGN"
        hints = explain_text(text)

        print("\n[bold]Result[/bold]")
        print(f"Label : [bold]{label}[/bold]")
        print(f"Risk score: [bold]{proba:.2f}[/bold] (0..1)")
        print(f"Reasons : {hints if hints else 'No obvious keyword hints'}")
        print("-" * 60)


if __name__ == "__main__":
    main()
Run demo:
python src/detect_phishing.py
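One limitation worth knowing before the demo: explain_text in detect_phishing.py uses a substring test, so "account" also fires on "accountant" and "otp" on "footprints". The variant below is a sketch of whole-word matching with a single compiled regex. The name explain_text_strict is an assumption for this example.

```python
import re

# Same hint words as in detect_phishing.py.
SUSPICIOUS_HINTS = [
    "urgent", "verify", "reset", "suspended", "otp", "password",
    "click", "confirm", "limited", "account", "bank",
]

# One compiled pattern; \b keeps "otp" from matching inside "footprints".
HINT_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, SUSPICIOUS_HINTS)) + r")\b",
    re.IGNORECASE,
)


def explain_text_strict(text):
    """Return distinct hint words found as whole words, in order of appearance."""
    seen = []
    for match in HINT_RE.finditer(text):
        word = match.group(1).lower()
        if word not in seen:
            seen.append(word)
    return seen


print(explain_text_strict("The accountant left footprints on the riverbank"))
print(explain_text_strict("URGENT: verify your account, then click here"))
```

The trade-off is that whole-word matching misses inflected forms such as "accounts" or "verifying". Which behavior is better depends on the data, and either choice is worth stating in the report.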
Realistic example inputs for the demo:
- “Admin: your account is about to be deactivated. Click this link to verify …”
- “Please check the invoice; it's a .zip file and the password is 12345”
- “Meeting at 2, Google Meet link …”
You score higher if you add detection of shortened URLs, odd-looking domains, the word “urgent”, and other patterns commonly used in scams.
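The URL-based checks can be prototyped with the standard library before wiring them into the detector. The sketch below extracts URLs from a message and returns plain-language reasons. The function name url_reasons is hypothetical, and the shortener and TLD lists are illustrative assumptions rather than a vetted threat feed.

```python
import re
from urllib.parse import urlparse

# Illustrative lists only; assumptions for this sketch, not a vetted feed.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}
RISKY_TLDS = {"xyz", "top", "click"}

URL_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
TRAILING_PUNCT = ".,;:!?)'\""


def url_reasons(text):
    """Return human-readable reasons for each suspicious URL found in `text`."""
    reasons = []
    for raw_url in URL_RE.findall(text):
        # Drop sentence punctuation that got glued to the end of the URL.
        url = raw_url.rstrip(TRAILING_PUNCT)
        try:
            parsed = urlparse(url)
            host = (parsed.hostname or "").lower()
        except ValueError:
            continue  # malformed URL, e.g. a broken IPv6 literal
        if not host:
            continue
        if host in SHORTENER_HOSTS:
            reasons.append(f"shortened URL: {host}")
        if IPV4_RE.match(host):
            reasons.append(f"raw IP address instead of a domain: {host}")
        elif host.rsplit(".", 1)[-1] in RISKY_TLDS:
            reasons.append(f"unusual top-level domain: {host}")
        if parsed.scheme == "http":
            reasons.append(f"no TLS (plain http): {host}")
    return reasons


print(url_reasons("Confirm your OTP at http://bank-verify.example.xyz"))
print(url_reasons("Link: https://meet.example.org/abc"))
```

These reasons can be appended to the keyword hints, so that the "Reasons" line in the demo covers both the wording and the link.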
OPTION B: AI PDP Audit (Privacy Audit Tool)
Goal: scan a CSV/log file → detect personal data → produce a risk report + recommendations.
1. Sample dataset
Create data/sample_users.csv:
name,email,phone,nik,address,notes
"Budi","budi@mail.com","08123456789","327xxxxxxxxxxxx","Bekasi","token=abc123"
"Siti","siti@gmail.com","082233445566","320xxxxxxxxxxxx","Jakarta","pwd=123456"
"Andi","andi@corp.co.id","081299988877","","Bandung","no issues"
2. Audit tool (regex + scoring)
Create src/pdp_audit.py:
import re
import pandas as pd
from rich import print
from rich.table import Table

# Regex patterns for common personal-data and secret formats.
PATTERNS = {
    "EMAIL": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
    "PHONE_ID": re.compile(r"\b(08\d{8,12})\b"),
    "NIK_LIKE": re.compile(r"\b\d{16}\b"),
    "TOKEN_LIKE": re.compile(r"\b(token|apikey|secret)\s*=\s*[A-Za-z0-9_-]{6,}\b", re.I),
    "PASSWORD_LIKE": re.compile(r"\b(pass|pwd|password)\s*=\s*\S+\b", re.I),
}

# Severity weight per finding type (higher = more sensitive).
SEVERITY = {
    "EMAIL": 2,
    "PHONE_ID": 2,
    "NIK_LIKE": 4,
    "TOKEN_LIKE": 4,
    "PASSWORD_LIKE": 5,
}


def scan_value(val: str):
    # Return the tag of every pattern that matches this cell.
    findings = []
    for k, rx in PATTERNS.items():
        if rx.search(val):
            findings.append(k)
    return findings


def main():
    path = "data/sample_users.csv"
    # dtype=str keeps every cell as text. Without it pandas parses the phone
    # column as integers and drops the leading 0, so PHONE_ID never matches.
    df = pd.read_csv(path, dtype=str)

    findings_rows = []
    total_score = 0
    for idx, row in df.iterrows():
        row_findings = []
        row_score = 0
        for col, v in row.items():
            s = "" if pd.isna(v) else str(v)
            hits = scan_value(s)
            for h in hits:
                row_findings.append((col, h))
                row_score += SEVERITY[h]
        total_score += row_score
        findings_rows.append((idx, row_score, row_findings))

    table = Table(title="PDP Audit Report (Quick Scan)")
    table.add_column("Row", justify="right")
    table.add_column("Risk Score", justify="right")
    table.add_column("Findings")
    for idx, score, f in findings_rows:
        pretty = ", ".join([f"{col}:{tag}" for col, tag in f]) if f else "-"
        table.add_row(str(idx), str(score), pretty)
    print(table)

    print(f"\n[bold]Total Risk Score:[/bold] {total_score}")
    if total_score >= 20:
        print("[bold red]High risk:[/bold red] apply masking/encryption and access control immediately.")
    elif total_score >= 10:
        print("[bold yellow]Medium risk:[/bold yellow] audit consent + minimize the data.")
    else:
        print("[bold green]Low risk:[/bold green] still make sure retention and access logging are in place.")


if __name__ == "__main__":
    main()
Run:
python src/pdp_audit.py
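The NIK_LIKE pattern flags any 16-digit run, which also catches order numbers and similar identifiers. One possible refinement is a plausibility check on the birth-date field. The sketch below assumes the commonly described NIK layout (6-digit region code, DDMMYY birth date with 40 added to the day for women, 4-digit serial). The function names and the sample numbers are made up for illustration.

```python
import re

NIK_RE = re.compile(r"\b\d{16}\b")


def looks_like_nik(candidate):
    """Heuristic: does a 16-digit string carry a plausible NIK birth date?

    Assumed layout: 6-digit region code, DDMMYY birth date (day + 40 for
    women), then a 4-digit serial number.
    """
    if not re.fullmatch(r"\d{16}", candidate):
        return False
    day = int(candidate[6:8])
    month = int(candidate[8:10])
    if day > 40:  # female holders have 40 added to the day
        day -= 40
    return 1 <= day <= 31 and 1 <= month <= 12


def find_plausible_niks(text):
    """Return 16-digit runs in `text` that pass the date plausibility check."""
    return [m for m in NIK_RE.findall(text) if looks_like_nik(m)]


# Synthetic values, not real identity numbers.
print(find_plausible_niks("id=3201234107950001 order=1234567890123456"))
```

This only reduces false positives; a 16-digit number with a date-like middle still passes, so it belongs in the limitations section rather than being presented as validation.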
Bonus points: save the audit result to the file reports/pdp_report.txt, then encrypt it with GPG.
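A report that quotes the raw values it found would itself become sensitive data. The sketch below writes findings with masked values, as one way to approach the bonus. The helper names mask_value and write_report and the tuple layout of the findings are assumptions for this example; it writes to a temporary folder so it can run anywhere.

```python
import os
import tempfile


def mask_value(value, keep=2):
    """Keep the first `keep` characters and replace the rest with '*'."""
    value = str(value)
    if len(value) <= keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep)


def write_report(findings, path):
    """Write one line per finding: row, column, tag, masked value."""
    with open(path, "w", encoding="utf-8") as fh:
        for row, col, tag, raw in findings:
            fh.write(f"row={row} col={col} tag={tag} value={mask_value(raw)}\n")


# Hypothetical findings in (row, column, tag, raw value) form.
findings = [(0, "email", "EMAIL", "budi@mail.com"),
            (1, "notes", "PASSWORD_LIKE", "pwd=123456")]
report_path = os.path.join(tempfile.mkdtemp(), "pdp_report.txt")
write_report(findings, report_path)
with open(report_path, encoding="utf-8") as fh:
    print(fh.read())
```

In the project, the path would be reports/pdp_report.txt, and the file can then be encrypted with the same pattern as in Stage 1: gpg --output reports/pdp_report.txt.gpg --symmetric --cipher-algo AES256 reports/pdp_report.txt.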
OPTION C: Simple AI IDS (Anomaly Detection from Logs)
Goal: read connection logs → detect “odd” behavior (for example port scanning / brute force) → raise an alert.
1. Create a simple log dataset (realistic example)
Create data/connections.csv:
src_ip,dst_port,count_per_minute
192.168.1.10,80,5
192.168.1.11,22,3
192.168.1.50,22,60
192.168.1.50,23,55
192.168.1.50,445,40
192.168.1.12,443,4
192.168.1.13,80,6
Interpretation:
- IP 192.168.1.50 is extremely busy across several ports → suspected scan/brute force
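The "many ports from one IP" signal only becomes visible once rows are grouped by source address. The sketch below aggregates the sample log into per-IP features and flags sources with a simple rule. The function names and the thresholds (min_ports=3, min_total=100) are assumptions chosen for this tiny dataset, not recommended values.

```python
import csv
import io
from collections import defaultdict


def per_ip_features(csv_text):
    """Aggregate connection rows into per-source-IP features."""
    ports = defaultdict(set)
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ip = row["src_ip"]
        ports[ip].add(int(row["dst_port"]))
        totals[ip] += int(row["count_per_minute"])
    return {
        ip: {"distinct_ports": len(ports[ip]), "total_per_minute": totals[ip]}
        for ip in ports
    }


def flag_suspects(features, min_ports=3, min_total=100):
    """Flag IPs that touch many ports at a high rate (thresholds are assumptions)."""
    return sorted(
        ip for ip, f in features.items()
        if f["distinct_ports"] >= min_ports and f["total_per_minute"] >= min_total
    )


sample = """src_ip,dst_port,count_per_minute
192.168.1.10,80,5
192.168.1.11,22,3
192.168.1.50,22,60
192.168.1.50,23,55
192.168.1.50,445,40
192.168.1.12,443,4
192.168.1.13,80,6
"""
features = per_ip_features(sample)
print(features["192.168.1.50"])
print(flag_suspects(features))
```

A rule like this gives a readable reason for each alert and a baseline to compare the anomaly model against.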
2. Anomaly detection with IsolationForest
Create src/ids_anomaly.py:
import pandas as pd
from sklearn.ensemble import IsolationForest
from rich import print
from rich.table import Table


def main():
    df = pd.read_csv("data/connections.csv")

    # Simple features. Note that dst_port is treated as a plain number here.
    X = df[["dst_port", "count_per_minute"]].astype(float)

    model = IsolationForest(contamination=0.2, random_state=42)
    df["anomaly"] = model.fit_predict(X)  # -1 = anomaly, 1 = normal
    df["score"] = model.decision_function(X)  # lower = more anomalous

    table = Table(title="Simple AI IDS (Anomaly Detection)")
    table.add_column("src_ip")
    table.add_column("dst_port", justify="right")
    table.add_column("count/min", justify="right")
    table.add_column("anomaly")
    table.add_column("score", justify="right")

    # Most anomalous rows first.
    for _, r in df.sort_values("score").iterrows():
        tag = "[bold red]ALERT[/bold red]" if r["anomaly"] == -1 else "OK"
        table.add_row(
            str(r["src_ip"]),
            str(int(r["dst_port"])),
            str(int(r["count_per_minute"])),
            tag,
            f"{r['score']:.3f}"
        )
    print(table)

    alerts = df[df["anomaly"] == -1]
    if len(alerts) > 0:
        print("\n[bold]Suggested Investigation Steps:[/bold]")
        print("- Check whether the IP is a normal user or an unknown device")
        print("- Check the auth log (/var/log/auth.log) if port 22 dominates")
        print("- If many different ports are involved: likely port scanning")
    else:
        print("\n[bold green]No anomaly detected[/bold green]")


if __name__ == "__main__":
    main()
Run:
python src/ids_anomaly.py
Bonus points: integrate real logs from auth.log or ufw.log (without personal data).
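For auth.log, a reasonable starting point is counting failed SSH password attempts per source address. The sketch below matches the OpenSSH "Failed password for ... from ... port ..." message and keeps only the IP, discarding the username so that no account names end up in the dataset. The function name failed_logins_per_ip is an assumption, the log lines are synthetic, and the pattern covers IPv4 only.

```python
import re
from collections import Counter

# Matches OpenSSH messages such as:
#   "Failed password for invalid user admin from 203.0.113.5 port 54321 ssh2"
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?\S+ "
    r"from (\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def failed_logins_per_ip(lines):
    """Count failed SSH password attempts per source IPv4 address."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Synthetic lines using documentation IP ranges, not taken from a real host.
sample_lines = [
    "Jan 10 12:00:01 lab sshd[1001]: Failed password for invalid user admin from 203.0.113.5 port 50101 ssh2",
    "Jan 10 12:00:03 lab sshd[1001]: Failed password for root from 203.0.113.5 port 50102 ssh2",
    "Jan 10 12:00:09 lab sshd[1002]: Accepted password for student from 192.168.1.10 port 50200 ssh2",
    "Jan 10 12:00:12 lab sshd[1003]: Failed password for root from 198.51.100.7 port 50300 ssh2",
]
print(failed_logins_per_ip(sample_lines))
```

Because the regex ignores everything before "Failed password", it does not depend on the timestamp format at the start of the line. Turning the counts into a per-minute rate would additionally require parsing those timestamps.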
Stage 3: Required Output: Tool + Report + Demo
1. Tool (Required)
At minimum, your tool must:
- accept input (file / text)
- produce output (label/alert/report)
- have a clear way to run it: python src/...
2. Report (a short but solid template)
Create reports/report.md (minimum contents):
- Problem background
- Brief threat model (who the attacker is, the target, the impact)
- System design
- Dataset & data protection (GPG? masking?)
- Test results (sample output, confusion matrix / alert list)
- Limitations & potential errors
- Recommendations for improvement
3. Demo (Required)
A 3–5 minute demo:
- explain the problem
- run the tool live
- show the output
- explain why the result came out that way
Grading Rubric
For a high grade, focus on these:
- it works end-to-end (not just code fragments)
- the output includes a risk score + reasons
- there is an evaluation and a limitations section (where the AI can be wrong)
- sensitive data is handled: masked/encrypted (GPG)
- tidy documentation: README + report
Final Checklist (Before Submitting)
- src/ contains the main script
- data/ is safe (dummy data or GPG-encrypted)
- models/ contains the model if it is an ML project
- reports/report.md exists
- The demo runs on Ubuntu 24.04 with clear commands
- Everything is open-source, nothing proprietary
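Parts of this checklist can be verified automatically before submitting. The sketch below checks the folder layout and looks for a plaintext dataset left next to its .gpg copy. The function name check_submission is hypothetical, and the demo builds a throwaway project in a temporary folder instead of touching the real one.

```python
import os
import tempfile


def check_submission(root):
    """Return a list of checklist problems found under `root` (empty = OK)."""
    problems = []
    for folder in ("src", "data", "models", "reports"):
        if not os.path.isdir(os.path.join(root, folder)):
            problems.append(f"missing folder: {folder}/")
    for required in ("README.md", "reports/report.md"):
        if not os.path.isfile(os.path.join(root, required)):
            problems.append(f"missing file: {required}")
    data_dir = os.path.join(root, "data")
    if os.path.isdir(data_dir):
        names = set(os.listdir(data_dir))
        for name in sorted(names):
            # A plaintext file next to its .gpg copy was probably left behind.
            if name + ".gpg" in names:
                problems.append(f"plaintext copy next to encrypted file: data/{name}")
    return problems


# Throwaway project with one deliberate mistake (plaintext CSV left in data/).
root = tempfile.mkdtemp()
for folder in ("src", "data", "models", "reports"):
    os.makedirs(os.path.join(root, folder))
for rel in ("README.md", "reports/report.md",
            "data/phishing_samples.csv", "data/phishing_samples.csv.gpg"):
    open(os.path.join(root, rel), "w").close()
print(check_submission(root))
```

Run against the real project, the call would be check_submission(os.path.expanduser("~/ai-security-final")). The items about the demo and licensing still need a manual check.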