Real-time FIX message anomaly detection for credit risk assessment

Anomaly detection in FIX messages is critical for ensuring operational integrity and mitigating risk, including the risk of fraud. The goal of this project is to develop a prototype alert system that identifies anomalous behavior from the FIX messages being sent.

Because the alert system needed to work with “real-time” data, a FIX message generator was created. It produces a stream of FIX messages representing both normal and anomalous trading activity. The controlled anomalies introduced for this prototype were as follows:

1. Sudden spikes in order sizes

2. Frequent order rejections

3. Frequent logout messages

4. Higher message frequency

5. Higher frequency of message rejections

6. Tendency towards one-sided orders
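
A generator along these lines could be sketched as follows. This is a minimal illustration, not the project's actual script: the function names, the FIX.4.2 version string, the CLIENT1 and BROKER CompIDs, the size range, and the 100x spike factor are all assumptions, and only two of the six anomaly types (size spikes and one-sided orders) are modeled.

```python
import random
from datetime import datetime, timedelta, timezone

SOH = "\x01"  # standard FIX field delimiter


def build_fix(fields):
    """Serialize (tag, value) pairs, adding BeginString, BodyLength and CheckSum."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    head = f"8=FIX.4.2{SOH}9={len(body)}{SOH}"  # FIX version is an assumption
    # CheckSum (tag 10) is the byte sum of everything before it, modulo 256.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"


def generate_message(rng, now, comp_id="CLIENT1", anomalous=False):
    """Build one execution-report-like message; anomalous ones are large buys."""
    qty = rng.randint(100, 1000)       # hypothetical "normal" size range
    side = rng.choice(["1", "2"])      # 1 = buy, 2 = sell
    if anomalous:
        qty *= 100                     # hypothetical spike factor
        side = "1"                     # one-sided: always buy
    return build_fix([
        ("35", "8"),                   # MsgType = ExecutionReport
        ("49", comp_id),               # SenderCompID (hypothetical value)
        ("56", "BROKER"),              # TargetCompID (hypothetical value)
        ("52", now.strftime("%Y%m%d-%H:%M:%S.%f")[:-3]),  # SendingTime
        ("54", side),                  # Side
        ("32", qty),                   # LastShares
    ])


def message_stream(n, anomaly_rate=0.05, seed=0):
    """Yield n messages, each anomalous with probability anomaly_rate."""
    rng = random.Random(seed)
    now = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)
    for _ in range(n):
        now += timedelta(milliseconds=rng.randint(50, 500))
        yield generate_message(rng, now, anomalous=rng.random() < anomaly_rate)


for raw in message_stream(3):
    print(raw.replace(SOH, "|"))  # show the delimiter as "|" for readability
```

Seeding the random generator keeps the stream reproducible, which makes it easier to check that each injected anomaly actually triggers an alert.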

The second important aspect was the alert system itself, which uses the Isolation Forest model. Isolation Forest was chosen for four reasons:

1. The data is high-dimensional, because FIX messages contain diverse fields such as order size, message type, comp_ids, timestamps, and many more. Isolation Forest copes well with this kind of data and can identify anomalies without extensive preprocessing.

2. The model also scales well. Given the high volume of FIX messages on a normal trading day, this scalability supports real-time anomaly detection.

3. Isolation Forest handles mixed data well, which is exactly what FIX messages are: a mixture of numerical and categorical fields (with the categorical fields encoded numerically before training).

4. Lastly, its robustness to outliers enhances its ability to differentiate between normal and anomalous messages.

Features such as order size (lastshares), message type (msg_type), side (side), and timestamp (sending_time) were chosen for their relevance to credit risk assessment and anomaly detection. The FIX messages were parsed and the features extracted in Python. Numerical features were normalized, and categorical features were encoded as needed for model training. The Isolation Forest model was trained on a subset of normal FIX message data. Hyperparameters such as contamination (the expected proportion of anomalies) and random_state (the random seed) were tuned to optimize detection accuracy.
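
The parsing, encoding, and training steps described above might look roughly like this. It is a sketch under stated assumptions rather than the project's code: it assumes scikit-learn's IsolationForest, uses a log transform as a stand-in for whatever normalization was applied, and the helper names, the synthetic training data, and the hyperparameter values are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw):
    """Split a raw FIX string into a {tag: value} dict."""
    return dict(field.split("=", 1) for field in raw.strip(SOH).split(SOH))


def to_features(msg):
    """Encode one parsed message as [log size, is_buy, is_reject, is_logout]."""
    msg_type = msg.get("35", "")
    return [
        np.log1p(float(msg.get("32", 0))),     # LastShares, log-scaled (assumed)
        1.0 if msg.get("54") == "1" else 0.0,  # Side: 1 = buy
        1.0 if msg_type == "3" else 0.0,       # MsgType 3 = session-level Reject
        1.0 if msg_type == "5" else 0.0,       # MsgType 5 = Logout
    ]


def make_raw(msg_type, side, qty):
    """Build a stripped-down raw message for the demo (hypothetical helper)."""
    return SOH.join([f"35={msg_type}", f"54={side}", f"32={qty}"]) + SOH


# Synthetic "normal" traffic: execution reports with log-normally distributed sizes.
rng = np.random.default_rng(0)
normal = [
    make_raw("8", rng.choice(["1", "2"]), int(rng.lognormal(6.0, 0.4)))
    for _ in range(500)
]
X_train = np.array([to_features(parse_fix(m)) for m in normal])

# contamination and random_state values here are illustrative, not the tuned ones.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X_train)

spike = to_features(parse_fix(make_raw("8", "1", 250000)))
typical = to_features(parse_fix(make_raw("8", "2", 400)))
X_new = np.array([spike, typical])

print(model.score_samples(X_new))  # lower score = more anomalous
print(model.predict(X_new))        # -1 flags an anomaly, 1 means normal
```

Because predict applies a cutoff derived from contamination, the same message can flip between normal and anomalous as that value changes, which is why the parameter needs tuning against labeled examples.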

The last component is the third script, Extractor.py, which extracts all FIX messages for a given client (identified by compID) from the log.
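
The extraction step can be illustrated with a short sketch. The function name, the one-message-per-line log layout, and the sample CompIDs are assumptions made for illustration; the actual Extractor.py may work differently.

```python
SOH = "\x01"  # standard FIX field delimiter


def extract_by_comp_id(log_lines, comp_id, delimiter=SOH):
    """Return messages whose SenderCompID (49) or TargetCompID (56) is comp_id.

    Assumes one raw FIX message per log line (hypothetical layout).
    """
    matches = []
    for line in log_lines:
        raw = line.rstrip("\n")
        fields = dict(
            field.split("=", 1)
            for field in raw.split(delimiter)
            if "=" in field
        )
        # Compare whole field values so "CLIENT1" does not match "CLIENT10".
        if comp_id in (fields.get("49"), fields.get("56")):
            matches.append(raw)
    return matches


# Hypothetical three-line log with two clients.
sample_log = [
    f"8=FIX.4.2{SOH}35=D{SOH}49=CLIENT1{SOH}56=BROKER{SOH}\n",
    f"8=FIX.4.2{SOH}35=8{SOH}49=BROKER{SOH}56=CLIENT2{SOH}\n",
    f"8=FIX.4.2{SOH}35=5{SOH}49=BROKER{SOH}56=CLIENT1{SOH}\n",
]

for message in extract_by_comp_id(sample_log, "CLIENT1"):
    print(message.replace(SOH, "|"))
```

Checking both tags 49 and 56 returns the messages a client sent as well as those sent back to it; a log that stores fields with a different separator, such as "|", can be handled through the delimiter argument.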

Evaluation metrics for the Isolation Forest model:

The alerts generated by Alert_system.py:

Results from Extractor.py for a known delinquent client:
