<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://onnocenter.or.id/wiki/index.php?action=history&amp;feed=atom&amp;title=Cyber_Security%3A_Python%3A_Isolasi_Outliar</id>
	<title>Cyber Security: Python: Isolasi Outliar - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://onnocenter.or.id/wiki/index.php?action=history&amp;feed=atom&amp;title=Cyber_Security%3A_Python%3A_Isolasi_Outliar"/>
	<link rel="alternate" type="text/html" href="https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;action=history"/>
	<updated>2026-06-23T12:01:46Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.4</generator>
	<entry>
		<id>https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73635&amp;oldid=prev</id>
		<title>Onnowpurbo: /* 4. Run directly against the Wazuh file */</title>
		<link rel="alternate" type="text/html" href="https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73635&amp;oldid=prev"/>
		<updated>2026-06-22T21:21:26Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;4. Run directly against the Wazuh file&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:21, 22 June 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l482&quot; &gt;Line 482:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 482:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 4. Run directly against the Wazuh file==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 4. Run directly against the Wazuh file==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo apt install -y acl&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo setfacl -R -m u:onno:rx /opt/wazuh-data&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo setfacl -R -m u:onno:rx /opt/wazuh-data/logs&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo setfacl -R -m u:onno:rx /opt/wazuh-data/logs/archives&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo setfacl -m u:onno:r /opt/wazuh-data/logs/archives/archives.json&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  python3 wazuh_archive_outlier.py \&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  python3 wazuh_archive_outlier.py \&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Onnowpurbo</name></author>
	</entry>
	<entry>
		<id>https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73634&amp;oldid=prev</id>
		<title>Onnowpurbo: /* 2. Create a virtual environment */</title>
		<link rel="alternate" type="text/html" href="https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73634&amp;oldid=prev"/>
		<updated>2026-06-22T21:13:21Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;2. Create a virtual environment&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:13, 22 June 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l12&quot; &gt;Line 12:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 12:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 2. Create a virtual environment==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 2. Create a virtual environment==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; sudo apt install python3.14-venv&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  python3 -m venv venv&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  python3 -m venv venv&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  source venv/bin/activate&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  source venv/bin/activate&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Onnowpurbo</name></author>
	</entry>
	<entry>
		<id>https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73633&amp;oldid=prev</id>
		<title>Onnowpurbo: Created page with &quot;Sure. This is a '''read-only''' Python script for analyzing:   /opt/wazuh-data/logs/archives/archives.json  The goal: read the Wazuh log, build simple features, then use '...&quot;</title>
		<link rel="alternate" type="text/html" href="https://onnocenter.or.id/wiki/index.php?title=Cyber_Security:_Python:_Isolasi_Outliar&amp;diff=73633&amp;oldid=prev"/>
		<updated>2026-06-22T21:11:19Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;Sure. This is a &amp;#039;&amp;#039;&amp;#039;read-only&amp;#039;&amp;#039;&amp;#039; Python script for analyzing:   /opt/wazuh-data/logs/archives/archives.json  The goal: read the Wazuh log, build simple features, then use &amp;#039;...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;This is a '''read-only''' Python script for analyzing:&lt;br /&gt;
&lt;br /&gt;
 /opt/wazuh-data/logs/archives/archives.json&lt;br /&gt;
&lt;br /&gt;
The goal is to read the Wazuh log, build simple features from each event, and then use '''Isolation Forest''' to find events whose pattern is unusual (outliers).&lt;br /&gt;
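&lt;br /&gt;
As a hedged illustration of the feature-building step described above, the sketch below parses a few JSON Lines events, flattens them to dot notation, and scores each event by how rare its field=value pairs are. It uses only the Python standard library and is not the Isolation Forest model itself. The sample events and the names SAMPLE_LINES, flatten and rarity_scores are hypothetical, invented for this example.&lt;br /&gt;
&lt;br /&gt;

```python
import json
import math
from collections import Counter

# Hypothetical sample lines in JSON Lines layout (one JSON object per line).
# The agent names and rule ids are made up for illustration only.
SAMPLE_LINES = [
    '{"agent": {"name": "web01"}, "rule": {"id": "5501", "level": 3}}',
    '{"agent": {"name": "web01"}, "rule": {"id": "5501", "level": 3}}',
    '{"agent": {"name": "web01"}, "rule": {"id": "5501", "level": 3}}',
    '{"agent": {"name": "db01"}, "rule": {"id": "5710", "level": 10}}',
]


def flatten(obj, parent=""):
    """Flatten nested dicts into dot-notation keys, e.g. rule.level."""
    out = {}
    for key, value in obj.items():
        name = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out


def rarity_scores(lines):
    """Return the average of -log(freq / n) over each event's field=value pairs."""
    events = [flatten(json.loads(line)) for line in lines]
    pairs_per_event = [[f"{k}={v}" for k, v in ev.items()] for ev in events]
    # Count how often every field=value pair occurs across all events.
    counts = Counter(p for pairs in pairs_per_event for p in pairs)
    n = len(events)
    return [
        sum(-math.log(counts[p] / n) for p in pairs) / len(pairs)
        for pairs in pairs_per_event
    ]


scores = rarity_scores(SAMPLE_LINES)
# Index of the event whose pairs are rarest on average.
rarest = max(range(len(scores)), key=scores.__getitem__)
print(rarest, [round(s, 3) for s in scores])
```

&lt;br /&gt;
In this toy data the last event is the only one with agent db01 and rule 5710, so it receives the highest score. The full script computes a similar rarity score and passes it, together with hashed categorical features, to Isolation Forest.&lt;br /&gt;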
&lt;br /&gt;
== 1. Create a working folder==&lt;br /&gt;
&lt;br /&gt;
 mkdir -p ~/Apps/Wazuh-Outlier&lt;br /&gt;
 cd ~/Apps/Wazuh-Outlier&lt;br /&gt;
&lt;br /&gt;
== 2. Create a virtual environment==&lt;br /&gt;
&lt;br /&gt;
 python3 -m venv venv&lt;br /&gt;
 source venv/bin/activate&lt;br /&gt;
 pip install pandas numpy scikit-learn&lt;br /&gt;
&lt;br /&gt;
== 3. Create the script==&lt;br /&gt;
&lt;br /&gt;
 nano wazuh_archive_outlier.py&lt;br /&gt;
&lt;br /&gt;
Fill it with this script:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env python3&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
wazuh_archive_outlier.py&lt;br /&gt;
&lt;br /&gt;
Outlier analysis of the Wazuh archives.json file.&lt;br /&gt;
&lt;br /&gt;
Output:&lt;br /&gt;
- outliers.csv&lt;br /&gt;
- outliers.jsonl&lt;br /&gt;
- sample_all_scored.csv&lt;br /&gt;
&lt;br /&gt;
This script is READ ONLY with respect to the Wazuh file.&lt;br /&gt;
It does not delete, move, or modify archives.json.&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
import argparse&lt;br /&gt;
import json&lt;br /&gt;
import os&lt;br /&gt;
import sys&lt;br /&gt;
import math&lt;br /&gt;
from pathlib import Path&lt;br /&gt;
from collections import Counter&lt;br /&gt;
from datetime import datetime&lt;br /&gt;
&lt;br /&gt;
import numpy as np&lt;br /&gt;
import pandas as pd&lt;br /&gt;
from sklearn.feature_extraction import FeatureHasher&lt;br /&gt;
from sklearn.ensemble import IsolationForest&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
CAT_FIELDS = [&lt;br /&gt;
    &amp;quot;agent.name&amp;quot;, &amp;quot;agent.id&amp;quot;, &amp;quot;manager.name&amp;quot;, &amp;quot;location&amp;quot;,&lt;br /&gt;
    &amp;quot;decoder.name&amp;quot;, &amp;quot;rule.id&amp;quot;, &amp;quot;rule.description&amp;quot;,&lt;br /&gt;
    &amp;quot;data.srcip&amp;quot;, &amp;quot;data.dstip&amp;quot;, &amp;quot;data.srcport&amp;quot;, &amp;quot;data.dstport&amp;quot;,&lt;br /&gt;
    &amp;quot;data.protocol&amp;quot;, &amp;quot;data.action&amp;quot;, &amp;quot;data.status&amp;quot;, &amp;quot;data.url&amp;quot;,&lt;br /&gt;
    &amp;quot;srcip&amp;quot;, &amp;quot;dstip&amp;quot;, &amp;quot;srcport&amp;quot;, &amp;quot;dstport&amp;quot;, &amp;quot;protocol&amp;quot;,&lt;br /&gt;
    &amp;quot;program_name&amp;quot;, &amp;quot;syscheck.path&amp;quot;,&lt;br /&gt;
    &amp;quot;win.system.eventID&amp;quot;, &amp;quot;win.system.providerName&amp;quot;,&lt;br /&gt;
]&lt;br /&gt;
&lt;br /&gt;
NUM_FIELDS = [&lt;br /&gt;
    &amp;quot;rule.level&amp;quot;, &amp;quot;data.srcport&amp;quot;, &amp;quot;data.dstport&amp;quot;,&lt;br /&gt;
    &amp;quot;srcport&amp;quot;, &amp;quot;dstport&amp;quot;, &amp;quot;win.system.eventID&amp;quot;,&lt;br /&gt;
]&lt;br /&gt;
&lt;br /&gt;
COMMON_DST_PORTS = {&lt;br /&gt;
    22, 25, 53, 80, 110, 123, 143, 443, 465, 587,&lt;br /&gt;
    993, 995, 1514, 1515, 9200, 5601&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def read_last_lines(path: str, max_lines: int):&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    Read only the last N lines of a large file.&lt;br /&gt;
    max_lines = 0 means read the whole file.&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    if max_lines &amp;lt;= 0:&lt;br /&gt;
        with open(path, &amp;quot;r&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;, errors=&amp;quot;replace&amp;quot;) as f:&lt;br /&gt;
            return list(f)&lt;br /&gt;
&lt;br /&gt;
    block_size = 1024 * 1024&lt;br /&gt;
    data = b&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    with open(path, &amp;quot;rb&amp;quot;) as f:&lt;br /&gt;
        f.seek(0, os.SEEK_END)&lt;br /&gt;
        pos = f.tell()&lt;br /&gt;
&lt;br /&gt;
        while pos &amp;gt; 0:&lt;br /&gt;
            read_size = min(block_size, pos)&lt;br /&gt;
            pos -= read_size&lt;br /&gt;
            f.seek(pos)&lt;br /&gt;
            data = f.read(read_size) + data&lt;br /&gt;
&lt;br /&gt;
            if data.count(b&amp;quot;\n&amp;quot;) &amp;gt; max_lines:&lt;br /&gt;
                break&lt;br /&gt;
&lt;br /&gt;
    lines = data.splitlines()[-max_lines:]&lt;br /&gt;
    return [line.decode(&amp;quot;utf-8&amp;quot;, errors=&amp;quot;replace&amp;quot;) for line in lines]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def flatten_json(obj, parent=&amp;quot;&amp;quot;, out=None, depth=0, max_depth=4):&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    Convert nested JSON to dot notation.&lt;br /&gt;
    Examples:&lt;br /&gt;
    agent.name&lt;br /&gt;
    rule.level&lt;br /&gt;
    data.srcip&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    if out is None:&lt;br /&gt;
        out = {}&lt;br /&gt;
&lt;br /&gt;
    if depth &amp;gt; max_depth:&lt;br /&gt;
        return out&lt;br /&gt;
&lt;br /&gt;
    if isinstance(obj, dict):&lt;br /&gt;
        for k, v in obj.items():&lt;br /&gt;
            key = f&amp;quot;{parent}.{k}&amp;quot; if parent else str(k)&lt;br /&gt;
            flatten_json(v, key, out, depth + 1, max_depth)&lt;br /&gt;
    elif isinstance(obj, list):&lt;br /&gt;
        if all(not isinstance(x, (dict, list)) for x in obj):&lt;br /&gt;
            out[parent] = &amp;quot;,&amp;quot;.join(str(x) for x in obj[:20])&lt;br /&gt;
        else:&lt;br /&gt;
            out[parent] = f&amp;quot;list_len_{len(obj)}&amp;quot;&lt;br /&gt;
    else:&lt;br /&gt;
        out[parent] = obj&lt;br /&gt;
&lt;br /&gt;
    return out&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def clean_value(v, max_len=140):&lt;br /&gt;
    if v is None:&lt;br /&gt;
        return &amp;quot;&amp;quot;&lt;br /&gt;
    s = str(v).strip()&lt;br /&gt;
    if not s or s.lower() in {&amp;quot;none&amp;quot;, &amp;quot;null&amp;quot;, &amp;quot;nan&amp;quot;, &amp;quot;-&amp;quot;}:&lt;br /&gt;
        return &amp;quot;&amp;quot;&lt;br /&gt;
    return s[:max_len]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def pick(flat, *keys):&lt;br /&gt;
    for k in keys:&lt;br /&gt;
        v = clean_value(flat.get(k))&lt;br /&gt;
        if v:&lt;br /&gt;
            return v&lt;br /&gt;
    return &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def to_float(v):&lt;br /&gt;
    try:&lt;br /&gt;
        if v is None or v == &amp;quot;&amp;quot;:&lt;br /&gt;
            return None&lt;br /&gt;
        return float(str(v).strip())&lt;br /&gt;
    except Exception:&lt;br /&gt;
        return None&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def parse_timestamp(ts):&lt;br /&gt;
    if not ts:&lt;br /&gt;
        return None&lt;br /&gt;
    try:&lt;br /&gt;
        return datetime.fromisoformat(str(ts).replace(&amp;quot;Z&amp;quot;, &amp;quot;+00:00&amp;quot;))&lt;br /&gt;
    except Exception:&lt;br /&gt;
        return None&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def build_rows_and_cats(events):&lt;br /&gt;
    rows = []&lt;br /&gt;
    cat_pairs_per_event = []&lt;br /&gt;
&lt;br /&gt;
    for i, event in enumerate(events):&lt;br /&gt;
        flat = flatten_json(event)&lt;br /&gt;
        ts = pick(flat, &amp;quot;timestamp&amp;quot;)&lt;br /&gt;
        dt = parse_timestamp(ts)&lt;br /&gt;
&lt;br /&gt;
        row = {&lt;br /&gt;
            &amp;quot;idx&amp;quot;: i,&lt;br /&gt;
            &amp;quot;timestamp&amp;quot;: ts,&lt;br /&gt;
            &amp;quot;agent_name&amp;quot;: pick(flat, &amp;quot;agent.name&amp;quot;),&lt;br /&gt;
            &amp;quot;agent_id&amp;quot;: pick(flat, &amp;quot;agent.id&amp;quot;),&lt;br /&gt;
            &amp;quot;manager_name&amp;quot;: pick(flat, &amp;quot;manager.name&amp;quot;),&lt;br /&gt;
            &amp;quot;location&amp;quot;: pick(flat, &amp;quot;location&amp;quot;),&lt;br /&gt;
            &amp;quot;decoder_name&amp;quot;: pick(flat, &amp;quot;decoder.name&amp;quot;),&lt;br /&gt;
            &amp;quot;rule_id&amp;quot;: pick(flat, &amp;quot;rule.id&amp;quot;),&lt;br /&gt;
            &amp;quot;rule_level&amp;quot;: pick(flat, &amp;quot;rule.level&amp;quot;),&lt;br /&gt;
            &amp;quot;rule_description&amp;quot;: pick(flat, &amp;quot;rule.description&amp;quot;),&lt;br /&gt;
            &amp;quot;srcip&amp;quot;: pick(flat, &amp;quot;data.srcip&amp;quot;, &amp;quot;srcip&amp;quot;),&lt;br /&gt;
            &amp;quot;dstip&amp;quot;: pick(flat, &amp;quot;data.dstip&amp;quot;, &amp;quot;dstip&amp;quot;),&lt;br /&gt;
            &amp;quot;srcport&amp;quot;: pick(flat, &amp;quot;data.srcport&amp;quot;, &amp;quot;srcport&amp;quot;),&lt;br /&gt;
            &amp;quot;dstport&amp;quot;: pick(flat, &amp;quot;data.dstport&amp;quot;, &amp;quot;dstport&amp;quot;),&lt;br /&gt;
            &amp;quot;protocol&amp;quot;: pick(flat, &amp;quot;data.protocol&amp;quot;, &amp;quot;protocol&amp;quot;),&lt;br /&gt;
            &amp;quot;full_log&amp;quot;: pick(flat, &amp;quot;full_log&amp;quot;, &amp;quot;message&amp;quot;),&lt;br /&gt;
        }&lt;br /&gt;
&lt;br /&gt;
        full_log = row[&amp;quot;full_log&amp;quot;]&lt;br /&gt;
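        # Measure the untruncated log: pick() caps values at 140 characters.&lt;br /&gt;
        row[&amp;quot;full_log_len&amp;quot;] = len(str(flat.get(&amp;quot;full_log&amp;quot;) or flat.get(&amp;quot;message&amp;quot;) or &amp;quot;&amp;quot;))&lt;br /&gt;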
        row[&amp;quot;hour&amp;quot;] = dt.hour if dt else -1&lt;br /&gt;
        row[&amp;quot;day_of_week&amp;quot;] = dt.weekday() if dt else -1&lt;br /&gt;
&lt;br /&gt;
        cat_pairs = []&lt;br /&gt;
&lt;br /&gt;
        for field in CAT_FIELDS:&lt;br /&gt;
            val = clean_value(flat.get(field))&lt;br /&gt;
            if val:&lt;br /&gt;
                cat_pairs.append((field, val))&lt;br /&gt;
&lt;br /&gt;
        if not cat_pairs and full_log:&lt;br /&gt;
            cat_pairs.append((&amp;quot;full_log_prefix&amp;quot;, full_log[:80]))&lt;br /&gt;
&lt;br /&gt;
        rows.append(row)&lt;br /&gt;
        cat_pairs_per_event.append(cat_pairs)&lt;br /&gt;
&lt;br /&gt;
    return rows, cat_pairs_per_event&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def build_features(rows, cat_pairs_per_event):&lt;br /&gt;
    n = max(len(rows), 1)&lt;br /&gt;
&lt;br /&gt;
    pair_counts = Counter(&lt;br /&gt;
        f&amp;quot;{k}={v}&amp;quot;&lt;br /&gt;
        for pairs in cat_pairs_per_event&lt;br /&gt;
        for k, v in pairs&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    feature_dicts = []&lt;br /&gt;
    rare_pairs_per_event = []&lt;br /&gt;
&lt;br /&gt;
    for row, pairs in zip(rows, cat_pairs_per_event):&lt;br /&gt;
        feats = {}&lt;br /&gt;
        rare_parts = []&lt;br /&gt;
        rare_scores = []&lt;br /&gt;
&lt;br /&gt;
        for k, v in pairs:&lt;br /&gt;
            pair = f&amp;quot;{k}={v}&amp;quot;&lt;br /&gt;
            feats[pair] = 1.0&lt;br /&gt;
&lt;br /&gt;
            freq = pair_counts[pair]&lt;br /&gt;
            rarity = -math.log(max(freq / n, 1e-12))&lt;br /&gt;
            rare_scores.append(rarity)&lt;br /&gt;
&lt;br /&gt;
            if freq &amp;lt;= 3:&lt;br /&gt;
                rare_parts.append(pair)&lt;br /&gt;
&lt;br /&gt;
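        # Row keys use underscores (rule_level), so dotted NUM_FIELDS names such as rule.level never match.&lt;br /&gt;
        for field in (&amp;quot;rule_level&amp;quot;, &amp;quot;srcport&amp;quot;, &amp;quot;dstport&amp;quot;):&lt;br /&gt;
            val = to_float(row.get(field))&lt;br /&gt;
            if val is not None:&lt;br /&gt;
                feats[f&amp;quot;num:{field}&amp;quot;] = math.log1p(abs(val))&lt;br /&gt;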
&lt;br /&gt;
        if row[&amp;quot;hour&amp;quot;] &amp;gt;= 0:&lt;br /&gt;
            feats[&amp;quot;num:hour&amp;quot;] = row[&amp;quot;hour&amp;quot;] / 23.0&lt;br /&gt;
&lt;br /&gt;
        if row[&amp;quot;day_of_week&amp;quot;] &amp;gt;= 0:&lt;br /&gt;
            feats[&amp;quot;num:day_of_week&amp;quot;] = row[&amp;quot;day_of_week&amp;quot;] / 6.0&lt;br /&gt;
&lt;br /&gt;
        feats[&amp;quot;num:full_log_len&amp;quot;] = math.log1p(row[&amp;quot;full_log_len&amp;quot;])&lt;br /&gt;
        feats[&amp;quot;num:rare_score_avg&amp;quot;] = float(np.mean(rare_scores)) if rare_scores else 0.0&lt;br /&gt;
        feats[&amp;quot;num:rare_score_sum&amp;quot;] = float(np.sum(rare_scores)) if rare_scores else 0.0&lt;br /&gt;
&lt;br /&gt;
        feature_dicts.append(feats)&lt;br /&gt;
        rare_pairs_per_event.append(&amp;quot;, &amp;quot;.join(rare_parts[:5]))&lt;br /&gt;
&lt;br /&gt;
    return feature_dicts, rare_pairs_per_event&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def explain_reason(row, rare_fields):&lt;br /&gt;
    reasons = []&lt;br /&gt;
&lt;br /&gt;
    level = to_float(row.get(&amp;quot;rule_level&amp;quot;))&lt;br /&gt;
&lt;br /&gt;
    if level is not None and level &amp;gt;= 10:&lt;br /&gt;
        reasons.append(f&amp;quot;high rule.level ({int(level)})&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    dstport = to_float(row.get(&amp;quot;dstport&amp;quot;))&lt;br /&gt;
&lt;br /&gt;
    if dstport is not None:&lt;br /&gt;
        dstport_int = int(dstport)&lt;br /&gt;
&lt;br /&gt;
        if dstport_int not in COMMON_DST_PORTS:&lt;br /&gt;
            reasons.append(f&amp;quot;uncommon dstport ({dstport_int})&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    if row.get(&amp;quot;full_log_len&amp;quot;, 0) &amp;gt; 500:&lt;br /&gt;
        reasons.append(&amp;quot;long or unusual full_log&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    if rare_fields:&lt;br /&gt;
        reasons.append(&amp;quot;rare field combination&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    if not reasons:&lt;br /&gt;
        reasons.append(&amp;quot;statistical pattern differs from the majority of logs&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    return &amp;quot;; &amp;quot;.join(reasons)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def main():&lt;br /&gt;
    parser = argparse.ArgumentParser(&lt;br /&gt;
        description=&amp;quot;Detect outliers from Wazuh archives.json JSONL file.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--input&amp;quot;,&lt;br /&gt;
        required=True,&lt;br /&gt;
        help=&amp;quot;Path to archives.json&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--max-lines&amp;quot;,&lt;br /&gt;
        type=int,&lt;br /&gt;
        default=100000,&lt;br /&gt;
        help=&amp;quot;Read the last N lines. Use 0 for the whole file.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--contamination&amp;quot;,&lt;br /&gt;
        type=float,&lt;br /&gt;
        default=0.01,&lt;br /&gt;
        help=&amp;quot;Expected outlier ratio. 0.01 means 1 percent.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--output-dir&amp;quot;,&lt;br /&gt;
        default=&amp;quot;wazuh-outlier-results&amp;quot;,&lt;br /&gt;
        help=&amp;quot;Output folder.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--hash-features&amp;quot;,&lt;br /&gt;
        type=int,&lt;br /&gt;
        default=4096,&lt;br /&gt;
        help=&amp;quot;Number of hashed features for the ML model.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    parser.add_argument(&lt;br /&gt;
        &amp;quot;--top&amp;quot;,&lt;br /&gt;
        type=int,&lt;br /&gt;
        default=30,&lt;br /&gt;
        help=&amp;quot;Show the top N outliers in the terminal.&amp;quot;&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    args = parser.parse_args()&lt;br /&gt;
&lt;br /&gt;
    input_path = Path(args.input)&lt;br /&gt;
&lt;br /&gt;
    if not input_path.exists():&lt;br /&gt;
        print(f&amp;quot;ERROR: File not found: {input_path}&amp;quot;, file=sys.stderr)&lt;br /&gt;
        sys.exit(1)&lt;br /&gt;
&lt;br /&gt;
    out_dir = Path(args.output_dir)&lt;br /&gt;
    out_dir.mkdir(parents=True, exist_ok=True)&lt;br /&gt;
&lt;br /&gt;
    print(f&amp;quot;[1/5] Reading: {input_path}&amp;quot;)&lt;br /&gt;
    lines = read_last_lines(str(input_path), args.max_lines)&lt;br /&gt;
&lt;br /&gt;
    events = []&lt;br /&gt;
    bad_lines = 0&lt;br /&gt;
&lt;br /&gt;
    print(&amp;quot;[2/5] Parsing JSON lines...&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    for line in lines:&lt;br /&gt;
        line = line.strip()&lt;br /&gt;
&lt;br /&gt;
        if not line:&lt;br /&gt;
            continue&lt;br /&gt;
&lt;br /&gt;
        try:&lt;br /&gt;
            events.append(json.loads(line))&lt;br /&gt;
        except Exception:&lt;br /&gt;
            bad_lines += 1&lt;br /&gt;
&lt;br /&gt;
    if len(events) &amp;lt; 50:&lt;br /&gt;
        print(&lt;br /&gt;
            f&amp;quot;ERROR: Found only {len(events)} valid JSON events. More data is needed.&amp;quot;,&lt;br /&gt;
            file=sys.stderr&lt;br /&gt;
        )&lt;br /&gt;
        print(f&amp;quot;Bad lines skipped: {bad_lines}&amp;quot;, file=sys.stderr)&lt;br /&gt;
        sys.exit(2)&lt;br /&gt;
&lt;br /&gt;
    print(f&amp;quot;Valid events: {len(events)} | Bad lines skipped: {bad_lines}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    print(&amp;quot;[3/5] Building features...&amp;quot;)&lt;br /&gt;
    rows, cat_pairs = build_rows_and_cats(events)&lt;br /&gt;
    feature_dicts, rare_fields = build_features(rows, cat_pairs)&lt;br /&gt;
&lt;br /&gt;
    print(&amp;quot;[4/5] Training IsolationForest...&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    hasher = FeatureHasher(&lt;br /&gt;
        n_features=args.hash_features,&lt;br /&gt;
        input_type=&amp;quot;dict&amp;quot;,&lt;br /&gt;
        alternate_sign=False&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    X = hasher.transform(feature_dicts)&lt;br /&gt;
&lt;br /&gt;
    model = IsolationForest(&lt;br /&gt;
        n_estimators=200,&lt;br /&gt;
        contamination=args.contamination,&lt;br /&gt;
        random_state=42,&lt;br /&gt;
        n_jobs=-1,&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
    labels = model.fit_predict(X)&lt;br /&gt;
    scores = model.score_samples(X)&lt;br /&gt;
&lt;br /&gt;
    df = pd.DataFrame(rows)&lt;br /&gt;
&lt;br /&gt;
    df[&amp;quot;model_label&amp;quot;] = labels&lt;br /&gt;
    df[&amp;quot;model_score&amp;quot;] = scores&lt;br /&gt;
    df[&amp;quot;outlier_score&amp;quot;] = -scores&lt;br /&gt;
    df[&amp;quot;rare_fields&amp;quot;] = rare_fields&lt;br /&gt;
&lt;br /&gt;
    df[&amp;quot;reason&amp;quot;] = [&lt;br /&gt;
        explain_reason(row, rf)&lt;br /&gt;
        for row, rf in zip(rows, rare_fields)&lt;br /&gt;
    ]&lt;br /&gt;
&lt;br /&gt;
    df = df.sort_values(&amp;quot;outlier_score&amp;quot;, ascending=False)&lt;br /&gt;
&lt;br /&gt;
    outliers = df[df[&amp;quot;model_label&amp;quot;] == -1].copy()&lt;br /&gt;
&lt;br /&gt;
    display_cols = [&lt;br /&gt;
        &amp;quot;timestamp&amp;quot;, &amp;quot;agent_name&amp;quot;, &amp;quot;location&amp;quot;, &amp;quot;decoder_name&amp;quot;,&lt;br /&gt;
        &amp;quot;rule_level&amp;quot;, &amp;quot;rule_id&amp;quot;, &amp;quot;srcip&amp;quot;, &amp;quot;dstip&amp;quot;, &amp;quot;srcport&amp;quot;, &amp;quot;dstport&amp;quot;,&lt;br /&gt;
        &amp;quot;protocol&amp;quot;, &amp;quot;outlier_score&amp;quot;, &amp;quot;reason&amp;quot;, &amp;quot;rare_fields&amp;quot;, &amp;quot;full_log&amp;quot;,&lt;br /&gt;
    ]&lt;br /&gt;
&lt;br /&gt;
    existing_cols = [c for c in display_cols if c in df.columns]&lt;br /&gt;
&lt;br /&gt;
    all_csv = out_dir / &amp;quot;sample_all_scored.csv&amp;quot;&lt;br /&gt;
    out_csv = out_dir / &amp;quot;outliers.csv&amp;quot;&lt;br /&gt;
    out_jsonl = out_dir / &amp;quot;outliers.jsonl&amp;quot;&lt;br /&gt;
&lt;br /&gt;
    df[existing_cols].to_csv(all_csv, index=False)&lt;br /&gt;
    outliers[existing_cols].to_csv(out_csv, index=False)&lt;br /&gt;
&lt;br /&gt;
    with open(out_jsonl, &amp;quot;w&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;) as f:&lt;br /&gt;
        for _, row in outliers.iterrows():&lt;br /&gt;
            event = events[int(row[&amp;quot;idx&amp;quot;])]&lt;br /&gt;
&lt;br /&gt;
            event[&amp;quot;_ml_outlier&amp;quot;] = {&lt;br /&gt;
                &amp;quot;outlier_score&amp;quot;: float(row[&amp;quot;outlier_score&amp;quot;]),&lt;br /&gt;
                &amp;quot;model_score&amp;quot;: float(row[&amp;quot;model_score&amp;quot;]),&lt;br /&gt;
                &amp;quot;reason&amp;quot;: row[&amp;quot;reason&amp;quot;],&lt;br /&gt;
                &amp;quot;rare_fields&amp;quot;: row[&amp;quot;rare_fields&amp;quot;],&lt;br /&gt;
            }&lt;br /&gt;
&lt;br /&gt;
            f.write(json.dumps(event, ensure_ascii=False) + &amp;quot;\n&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    print(&amp;quot;[5/5] Done.&amp;quot;)&lt;br /&gt;
    print(f&amp;quot;All scored events : {all_csv}&amp;quot;)&lt;br /&gt;
    print(f&amp;quot;Outliers CSV      : {out_csv}&amp;quot;)&lt;br /&gt;
    print(f&amp;quot;Outliers JSONL    : {out_jsonl}&amp;quot;)&lt;br /&gt;
    print(f&amp;quot;Outliers found    : {len(outliers)} from {len(df)} events&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    print(&amp;quot;\nTop outliers:&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    preview_cols = [&lt;br /&gt;
        &amp;quot;timestamp&amp;quot;, &amp;quot;agent_name&amp;quot;, &amp;quot;location&amp;quot;, &amp;quot;decoder_name&amp;quot;,&lt;br /&gt;
        &amp;quot;rule_level&amp;quot;, &amp;quot;srcip&amp;quot;, &amp;quot;dstip&amp;quot;, &amp;quot;dstport&amp;quot;,&lt;br /&gt;
        &amp;quot;outlier_score&amp;quot;, &amp;quot;reason&amp;quot;,&lt;br /&gt;
    ]&lt;br /&gt;
&lt;br /&gt;
    preview_cols = [c for c in preview_cols if c in outliers.columns]&lt;br /&gt;
&lt;br /&gt;
    print(&lt;br /&gt;
        outliers[preview_cols]&lt;br /&gt;
        .head(args.top)&lt;br /&gt;
        .to_string(index=False)&lt;br /&gt;
    )&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    main()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save and exit the editor:&lt;br /&gt;
&lt;br /&gt;
 Ctrl+O&lt;br /&gt;
 Enter&lt;br /&gt;
 Ctrl+X&lt;br /&gt;
&lt;br /&gt;
== 4. Run directly against the Wazuh file ==&lt;br /&gt;
&lt;br /&gt;
 python3 wazuh_archive_outlier.py \&lt;br /&gt;
   --input /opt/wazuh-data/logs/archives/archives.json \&lt;br /&gt;
   --max-lines 100000 \&lt;br /&gt;
   --contamination 0.01 \&lt;br /&gt;
   --output-dir hasil-outlier&lt;br /&gt;
&lt;br /&gt;
This means:&lt;br /&gt;
&lt;br /&gt;
 --max-lines 100000&lt;br /&gt;
&lt;br /&gt;
reads only the '''last 100,000 lines''', which keeps the run safe on large files.&lt;br /&gt;
&lt;br /&gt;
 --contamination 0.01&lt;br /&gt;
&lt;br /&gt;
tells the model to treat roughly the '''most anomalous 1% of events''' as outliers.&lt;br /&gt;
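&lt;br /&gt;
One common way to keep only the last N lines without holding the whole file in memory is a bounded deque. The sketch below illustrates that idea and is not necessarily how the script implements `--max-lines`; `read_last_lines` is a hypothetical helper, and treating 0 as "no limit" is an assumption.&lt;br /&gt;

```python
from collections import deque
import io


def read_last_lines(handle, max_lines):
    # Hypothetical helper: return the last max_lines lines of a text stream.
    # Assumption: max_lines == 0 means no limit (read everything).
    if max_lines == 0:
        return [line.rstrip('\n') for line in handle]
    # A deque with maxlen discards the oldest entries as new lines arrive,
    # so memory use stays bounded by max_lines.
    tail = deque(handle, maxlen=max_lines)
    return [line.rstrip('\n') for line in tail]


# Invented four-line stream standing in for archives.json.
sample = io.StringIO('a\nb\nc\nd\n')
last_two = read_last_lines(sample, 2)
print(last_two)
```

The file is still scanned from the start, so this bounds memory rather than read time.&lt;br /&gt;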
&lt;br /&gt;
For higher sensitivity:&lt;br /&gt;
&lt;br /&gt;
 python3 wazuh_archive_outlier.py \&lt;br /&gt;
   --input /opt/wazuh-data/logs/archives/archives.json \&lt;br /&gt;
   --max-lines 100000 \&lt;br /&gt;
   --contamination 0.03 \&lt;br /&gt;
   --output-dir hasil-outlier&lt;br /&gt;
&lt;br /&gt;
This flags about '''3%''' of the events as outliers.&lt;br /&gt;
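&lt;br /&gt;
To see how the contamination value changes the number of flagged events, the sketch below labels the lowest-scoring fraction of a list of scores as -1. It is a simplified stand-in for the percentile cutoff that scikit-learn derives from `contamination`; `flag_lowest` and the scores are invented for illustration.&lt;br /&gt;

```python
def flag_lowest(scores, contamination):
    # Hypothetical helper: label the lowest-scoring fraction -1, the rest 1.
    n_flag = round(len(scores) * contamination)
    # Indices ordered from most anomalous (lowest score) to least anomalous.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_flag])
    return [-1 if i in flagged else 1 for i in range(len(scores))]


# Invented scores: 99 ordinary events plus one clearly isolated event.
scores = [-0.40 - 0.001 * i for i in range(99)] + [-0.80]
labels_1pct = flag_lowest(scores, 0.01)
labels_3pct = flag_lowest(scores, 0.03)
print(labels_1pct.count(-1), labels_3pct.count(-1))
```

A higher contamination always flags more events, whether or not they are genuinely unusual, so the extra rows mean more review work.&lt;br /&gt;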
&lt;br /&gt;
== 5. If you get permission denied ==&lt;br /&gt;
&lt;br /&gt;
Wazuh log files are usually owned by root, so use the approach below.&lt;br /&gt;
&lt;br /&gt;
The safe way: first take a copy of the last 100,000 lines.&lt;br /&gt;
&lt;br /&gt;
 sudo tail -n 100000 /opt/wazuh-data/logs/archives/archives.json &amp;gt; archives_sample.json&lt;br /&gt;
 sudo chown $USER:$USER archives_sample.json&lt;br /&gt;
&lt;br /&gt;
Then analyze the copy:&lt;br /&gt;
&lt;br /&gt;
 python3 wazuh_archive_outlier.py \&lt;br /&gt;
   --input archives_sample.json \&lt;br /&gt;
   --max-lines 0 \&lt;br /&gt;
   --contamination 0.01 \&lt;br /&gt;
   --output-dir hasil-outlier&lt;br /&gt;
&lt;br /&gt;
== 6. View the results ==&lt;br /&gt;
&lt;br /&gt;
 ls -lh hasil-outlier&lt;br /&gt;
&lt;br /&gt;
Key output files:&lt;br /&gt;
&lt;br /&gt;
 hasil-outlier/outliers.csv&lt;br /&gt;
 hasil-outlier/outliers.jsonl&lt;br /&gt;
 hasil-outlier/sample_all_scored.csv&lt;br /&gt;
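&lt;br /&gt;
Each line of `outliers.jsonl` is the original event with an added `_ml_outlier` object holding `outlier_score`, `model_score`, `reason`, and `rare_fields`. A minimal sketch for pulling that annotation out of one line follows; `summarize_outlier` is a hypothetical helper and the sample event is invented.&lt;br /&gt;

```python
import json


def summarize_outlier(line):
    # Hypothetical helper: parse one outliers.jsonl line and
    # return only the fields written under the _ml_outlier key.
    event = json.loads(line)
    ml = event.get('_ml_outlier', {})
    return {
        'outlier_score': ml.get('outlier_score'),
        'reason': ml.get('reason'),
        'rare_fields': ml.get('rare_fields'),
    }


# Invented example line; real lines come from hasil-outlier/outliers.jsonl.
demo_line = json.dumps({
    'rule': {'level': 12},
    '_ml_outlier': {
        'outlier_score': 0.71,
        'model_score': -0.71,
        'reason': 'example reason',
        'rare_fields': ['srcip'],
    },
})
summary = summarize_outlier(demo_line)
print(summary)
```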
&lt;br /&gt;
View the top 20 outliers (the first line of the file is the CSV header, so 21 lines are needed):&lt;br /&gt;
&lt;br /&gt;
 head -n 21 hasil-outlier/outliers.csv&lt;br /&gt;
&lt;br /&gt;
Or, for a more readable view:&lt;br /&gt;
&lt;br /&gt;
 column -s, -t &amp;lt; hasil-outlier/outliers.csv | less -S&lt;br /&gt;
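&lt;br /&gt;
Because `full_log` can itself contain commas, splitting on `,` may misalign the columns. As an alternative, Python's standard `csv` module honours quoted fields. In the sketch below, `top_rows` is a hypothetical helper and the sample file is invented so the example runs on its own.&lt;br /&gt;

```python
import csv
import itertools
import os
import tempfile


def top_rows(csv_path, n):
    # Hypothetical helper: return the first n data rows as dicts.
    # csv.DictReader respects quoting, so commas inside full_log stay intact.
    with open(csv_path, newline='', encoding='utf-8') as f:
        return list(itertools.islice(csv.DictReader(f), n))


# Invented sample standing in for hasil-outlier/outliers.csv.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'outliers.csv')
with open(demo_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['agent_name', 'outlier_score', 'full_log'])
    writer.writerow(['web01', '0.71', 'login failed, user=root, tries=9'])
    writer.writerow(['db01', '0.64', 'short log'])

demo_rows = top_rows(demo_path, 1)
for row in demo_rows:
    print(row['outlier_score'], row['agent_name'], row['full_log'])
```

Against the real output, the call would be `top_rows('hasil-outlier/outliers.csv', 20)`.&lt;br /&gt;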
&lt;br /&gt;
== 7. Interpreting the results ==&lt;br /&gt;
&lt;br /&gt;
Important columns:&lt;br /&gt;
&lt;br /&gt;
 outlier_score&lt;br /&gt;
&lt;br /&gt;
The larger the value, the more anomalous the event.&lt;br /&gt;
&lt;br /&gt;
 reason&lt;br /&gt;
&lt;br /&gt;
A rough explanation of why the event was flagged as an outlier.&lt;br /&gt;
&lt;br /&gt;
Example reasons:&lt;br /&gt;
&lt;br /&gt;
 high rule.level&lt;br /&gt;
 uncommon dstport&lt;br /&gt;
 long or unusual full_log&lt;br /&gt;
 rarely seen combination of fields&lt;br /&gt;
 statistical pattern differs from the majority of logs&lt;br /&gt;
&lt;br /&gt;
 rare_fields&lt;br /&gt;
&lt;br /&gt;
Fields whose values rarely appear in the dataset, for example a particular agent, decoder, source IP, or port, or an unusual combination of log attributes.&lt;br /&gt;
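&lt;br /&gt;
One simple way to think about "rare" values is to count how often each value of a field occurs and keep those below a share threshold. The sketch below shows that idea only; it is not the script's `build_features` logic, and `find_rare_values`, the 10% threshold, and the sample rows are assumptions for illustration.&lt;br /&gt;

```python
from collections import Counter


def find_rare_values(rows, field, max_share):
    # Hypothetical helper: return values of one field whose share
    # of all non-missing values is at most max_share.
    counts = Counter(
        row[field] for row in rows if row.get(field) is not None
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(v for v, c in counts.items() if max_share >= c / total)


# Invented rows: nine events from one agent, one from another.
rows = [{'agent_name': 'web01'} for _ in range(9)]
rows.append({'agent_name': 'lab-pc'})
rare_agents = find_rare_values(rows, 'agent_name', 0.1)
print(rare_agents)
```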
&lt;br /&gt;
== 8. If Wazuh runs in Docker and the file is inside the container ==&lt;br /&gt;
&lt;br /&gt;
Check the container name:&lt;br /&gt;
&lt;br /&gt;
 sudo docker ps --format &amp;quot;table {{.Names}}\t{{.Image}}\t{{.Status}}&amp;quot; | grep wazuh&lt;br /&gt;
&lt;br /&gt;
Pull the last 100,000 lines from the manager container:&lt;br /&gt;
&lt;br /&gt;
 sudo docker exec single-node-wazuh.manager-1 \&lt;br /&gt;
   tail -n 100000 /var/ossec/logs/archives/archives.json \&lt;br /&gt;
   &amp;gt; archives_sample.json&lt;br /&gt;
&lt;br /&gt;
Then run:&lt;br /&gt;
&lt;br /&gt;
 python3 wazuh_archive_outlier.py \&lt;br /&gt;
   --input archives_sample.json \&lt;br /&gt;
   --max-lines 0 \&lt;br /&gt;
   --contamination 0.01 \&lt;br /&gt;
   --output-dir hasil-outlier&lt;br /&gt;
&lt;br /&gt;
== Important note ==&lt;br /&gt;
&lt;br /&gt;
This script does not yet identify an “attack” with certainty. It only answers:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; “Which events differ most from the majority pattern in the logs?”&lt;br /&gt;
&lt;br /&gt;
So `outliers.csv` still needs to be reviewed by a SOC analyst. The most interesting events usually show a combination of:&lt;br /&gt;
&lt;br /&gt;
 high rule.level&lt;br /&gt;
 rare source IP&lt;br /&gt;
 unusual destination port&lt;br /&gt;
 unusual decoder&lt;br /&gt;
 a particular agent that suddenly behaves differently&lt;br /&gt;
 very long full_log&lt;br /&gt;
 event occurring at an unusual hour&lt;br /&gt;
&lt;br /&gt;
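A rough triage aid is to count how many of these signs a scored row shows and review the rows with the most first. The sketch below assumes rows shaped like those in `outliers.csv`; `count_signals`, the cutoffs, and the list of common ports are hypothetical choices, not part of the script.&lt;br /&gt;

```python
def count_signals(row, common_ports, level_cutoff, long_log_chars):
    # Hypothetical helper: count illustrative warning signs in one row.
    signals = 0
    # Sign 1: the rule level is at or above the chosen cutoff.
    if float(row.get('rule_level') or 0) >= level_cutoff:
        signals += 1
    # Sign 2: a destination port is present and not in the common set.
    port = row.get('dstport')
    if port not in (None, '') and int(float(port)) not in common_ports:
        signals += 1
    # Sign 3: the raw log text is unusually long.
    if len(row.get('full_log') or '') >= long_log_chars:
        signals += 1
    return signals


# Invented rows; the thresholds below are arbitrary examples.
noisy = {'rule_level': '12', 'dstport': '4444', 'full_log': 'x' * 600}
quiet = {'rule_level': '3', 'dstport': '443', 'full_log': 'ok'}
common = {22, 80, 443}
print(count_signals(noisy, common, 10, 500), count_signals(quiet, common, 10, 500))
```

This only orders the review queue; it does not replace the analyst's judgement.&lt;br /&gt;
&lt;br /&gt;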
For a Wazuh lab, this is a good enough first stage before later adding '''MITRE ATT&amp;amp;CK mapping''', a '''risk score''', and an '''LLM summary'''.&lt;/div&gt;</summary>
		<author><name>Onnowpurbo</name></author>
	</entry>
</feed>