Hey, I'm Daniel

Technical Leader · ML Engineer · Bioinformatics Scientist

I'm a machine learning engineer and technical leader with a BS in Computer Science from DePaul University and an MS in Data Science from the University of Chicago. I specialize in designing and scaling deep learning systems for large-scale bioinformatics problems. I'm focused on building high-impact AI teams and translating cutting-edge research into real-world solutions.

Connect

Find me online

Check out my work, models, and open-source projects.

GitHub

Code, repos & contributions

Research

Publications

Peer-reviewed research I've contributed to.

Co-Author Microorganisms 2025

teamNGS Balances Sensitivity for Viruses with Comprehensive Microbial Detection in Clinical Specimens

Yamaguchi J, Orf GS, Malinauskas J, Mata M, Weiss SL, Forberg K, Meyer TV, Wiebe PO, Mowerman I, Piotrowski SJ, Glownia D, Rodgers MA, Hackett J Jr, Suputtamongkol Y, Phoompoung P, Gomathi S, Pradeep A, Solomon SS, Bbosa N, Kaleebu P, Ahouidi AD, Mboup S, Sequeira AF, Tojo A, Cloherty GA, Berg MG.

Probe-based capture represents a highly sensitive and cost-effective approach for overcoming host background and enriching viruses in metagenomic NGS (mNGS) libraries. Using clinical specimens collected globally from patients with fever or respiratory illness, teNGS achieved increased sensitivity with 100–10,000× increases in depth and >50% genome coverage for pathogens with titers ≥ 1000 cp/mL. teamNGS couples target enrichment with standard mNGS into a single sequencing run, improving genome recovery while maintaining comprehensive non-viral microbial detection.

PubMed · PMC (free article) · DOI: 10.3390/microorganisms13122854

Craft

Prompts

Prompts I use for ML workflows.

Binary Classification Model Evaluation

Comprehensive evaluation framework for scikit-learn binary classifiers

# Role & Objective

Act as a Senior Data Scientist. I am evaluating a binary classification model built using Python and scikit-learn. I need you to generate [CHOOSE ONE: the Python evaluation code / a detailed analysis of the provided metric outputs] based on the configuration below.

# Context

Project/Objective: [e.g., Predicting customer churn for a SaaS platform]
Positive Class (1): [e.g., Churned]
Negative Class (0): [e.g., Retained]
Class Imbalance: [e.g., Highly imbalanced, 90% Negative / 10% Positive]
Cost of Errors: [e.g., False Positives trigger unnecessary retention offers (costly). False Negatives are missed churn events (lost revenue).]

# Required Evaluation Metrics

Please structure your response to address the following metrics. For each, explain what the results indicate regarding the specific objective outlined above.

Cumulative Gains Curve: To assess how effectively we can prioritize the top percentiles of our data based on model confidence.
ROC Curve & AUC: To evaluate the general baseline separation of the model against a random guesser.
Confusion Matrix (at threshold = [e.g., 0.5]): To view the exact distribution of False Positives vs. False Negatives.
Classification Report: To review Precision, Recall, and F1-scores for each class individually.
Precision-Recall (PR) Curve & AUC-PR: To evaluate performance specifically on the positive class, given the class imbalance, and ensure the model isn't generating excessive False Positives.
Lift Curve: To determine exactly how many times better the model is than a random guess at specific data deciles.
Brier Score (or Log Loss / Cross-Entropy): To measure probability calibration and verify that the output scores are mathematically reliable for automated downstream pipelines.
Kolmogorov-Smirnov (KS) Statistic: To measure the maximum degree of separation between the positive and negative class probability distributions.

# Input Data / Instructions

Option A (Generate Code): Write clean, modular Python code using sklearn.metrics, matplotlib, and scikit-plot or yellowbrick to calculate and plot all the metrics requested above. Assume I have y_true, y_pred (hard predictions), and y_proba (predicted probabilities for the positive class) already loaded in my pipeline.

Option B (Analyze Outputs): Here are the outputs from the metrics. Please analyze them and provide a final recommendation on whether the model is ready for deployment or needs threshold tuning:
[PASTE METRIC OUTPUTS/SCORES HERE]
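Several of the binary metrics requested above (ROC AUC, Brier score, KS statistic, cumulative gains and lift) are simple enough to compute by hand, which is handy for sanity-checking numbers before pasting them into Option B. The sketch below is pure Python and illustrative only; it assumes 0/1 labels with both classes present and y_proba holding positive-class probabilities, and the helper names are hypothetical rather than part of any library.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_proba: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks.
    Assumes both classes are present."""
    pairs = sorted(zip(y_proba, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1  # extend over a run of tied scores
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y_true: Sequence[int], y_proba: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_proba)) / len(y_true)


def ks_statistic(y_true: Sequence[int], y_proba: Sequence[float]) -> float:
    """Largest gap between the score CDFs of the negative and positive classes."""
    pos = [p for y, p in zip(y_true, y_proba) if y == 1]
    neg = [p for y, p in zip(y_true, y_proba) if y == 0]
    best = 0.0
    for t in sorted(set(y_proba)):  # O(n^2): fine for a sanity check only
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(p <= t for p in neg) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best


def gain_and_lift(
    y_true: Sequence[int], y_proba: Sequence[float], top_fraction: float
) -> Tuple[float, float]:
    """Share of all positives captured in the top-scored slice, and its lift."""
    ranked = sorted(zip(y_proba, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    gain = sum(label for _, label in ranked[:n_top]) / sum(y_true)
    lift = gain / (n_top / len(ranked))  # 1.0 means no better than random selection
    return gain, lift


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_proba))              # 0.75
print(ks_statistic(y_true, y_proba))         # 0.5
print(gain_and_lift(y_true, y_proba, 0.25))  # (0.5, 2.0)
```

In a real pipeline, sklearn.metrics.roc_auc_score and sklearn.metrics.brier_score_loss return the first two values directly, and scipy.stats.ks_2samp applied to the positive-class and negative-class scores gives the same KS statistic.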

Multi-Class Classification Model Evaluation

Comprehensive evaluation framework for scikit-learn multi-class classifiers

# Role & Objective

Act as a Senior Data Scientist. I am evaluating a multi-class classification model built using Python and scikit-learn. I need you to generate [CHOOSE ONE: the Python evaluation code / a detailed analysis of the provided metric outputs] based on the configuration below.

# Context

Project/Objective: [e.g., Categorizing support tickets by severity level for automated routing]
Classes:
- Class 0: [e.g., Critical / Outage]
- Class 1: [e.g., High Priority]
- Class 2: [e.g., Medium Priority]
- Class 3: [e.g., Low / Informational]
Class Imbalance: [e.g., Moderate imbalance. Classes 2 and 3 make up 70% of the data; Classes 0 and 1 make up 30%.]
Cost of Errors: [e.g., Misclassifying a 'Critical' ticket as 'Low' causes SLA breaches. Confusing 'High' with 'Medium' is suboptimal but manageable.]

# Required Evaluation Metrics

Please structure your response to address the following metrics. For each, explain what the results indicate regarding the specific objective and cost of errors outlined above.

Classification Report (Macro & Weighted Averages): To review Precision, Recall, and F1-scores for individual classes, and assess how the model handles minority classes via the Macro average.
N x N Confusion Matrix: To identify specific misclassifications and class overlap (e.g., which specific categories are being confused with each other).
One-vs-Rest (OvR) ROC AUC: To evaluate how well the model isolates each individual class from the rest of the dataset.
Cohen's Kappa: To verify that the model's overall accuracy is driven by genuine learning rather than random chance or majority-class bias.
Multi-Class Log Loss: To measure probability calibration and ensure the predict_proba distributions across all classes are mathematically reliable.

# Input Data / Instructions

Option A (Generate Code): Write clean, modular Python code using sklearn.metrics, matplotlib, and seaborn (for the heatmap) to calculate and plot all the metrics requested above. Ensure parameters like average='macro' or multi_class='ovr' are set correctly. Assume I have y_true, y_pred (hard predictions), and y_proba (probability predictions array of shape (n_samples, n_classes)) already loaded.

Option B (Analyze Outputs): Here are the outputs from the metrics. Please analyze them, highlighting any specific class confusion, and provide a final recommendation on whether the model is ready for deployment:
[PASTE METRIC OUTPUTS/SCORES HERE]
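The macro vs. weighted distinction and Cohen's Kappa in the list above both fall straight out of the N x N confusion matrix. A pure-Python sketch, assuming the scikit-learn orientation (rows are true classes, columns are predicted classes) and using hypothetical helper names; sklearn.metrics.f1_score with average='macro' or average='weighted', and sklearn.metrics.cohen_kappa_score, are the library equivalents.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # cm[i][j] = count of true class i predicted as class j


def per_class_f1(cm: Matrix) -> List[float]:
    """One-vs-rest F1 for each class, read off the confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])                           # row total (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_and_weighted_f1(cm: Matrix) -> Tuple[float, float]:
    """Macro treats every class equally; weighted scales each class by its support."""
    f1 = per_class_f1(cm)
    support = [sum(row) for row in cm]
    macro = sum(f1) / len(f1)
    weighted = sum(f * s for f, s in zip(f1, support)) / sum(support)
    return macro, weighted


def cohens_kappa(cm: Matrix) -> float:
    """Observed agreement corrected for the agreement expected by chance
    from the row/column marginals. Assumes chance agreement is below 1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# A model that always predicts the majority class (class 2) on 10/10/80 data:
majority_only = [[0, 0, 10], [0, 0, 10], [0, 0, 80]]
print(cohens_kappa(majority_only))           # 0.0 despite 80% accuracy
print(macro_and_weighted_f1(majority_only))  # macro ~0.30, weighted ~0.71
```

The example shows the majority-class trap the Cohen's Kappa line describes: 80% accuracy, yet a kappa of 0 and a macro F1 far below the weighted F1.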

Time-Series Architecture Evaluation

Diagnostic framework for choosing between parametric (ARIMA) and deep learning (LSTM) approaches

# Role & Objective

Act as a Senior Data Scientist specializing in Time-Series Forecasting. I am trying to determine the optimal architectural approach for a forecasting problem. I need to decide whether to use a traditional parametric model (e.g., ARIMA, SARIMA, VAR) or a non-parametric deep learning model (e.g., LSTM, GRU).

Please generate [CHOOSE ONE: the Python diagnostic code / a detailed analysis of the provided diagnostic outputs] based on the configuration below.

# Context

Project/Objective: [e.g., Forecasting monthly product demand across regional warehouses to optimize inventory allocation.]
Data Frequency: [e.g., Monthly data]
Data Volume: [e.g., 20 years of historical data, ~240 observations per series]
Multivariate/Univariate: [e.g., Multivariate — predicting demand while using promotional spend and weather data as exogenous features]

# Required Evaluation Diagnostics

Please structure your response to address the following diagnostics. For each, explain what the results indicate regarding the choice between a parametric model and a neural network (LSTM).

Stationarity Checks (ADF & KPSS Tests): To determine if the data has a unit root or deterministic trend, and if standard differencing can make it stationary.
ACF & PACF Analysis: To assess the strength and clarity of linear dependencies across lags.
Seasonal Decomposition (STL): To evaluate the complexity of the seasonal patterns and the size/structure of the residual noise.
Volatility / Non-Linearity (ARCH/Ljung-Box): To detect conditional heteroskedasticity or complex non-linear patterns that standard linear models cannot capture.
Dimensionality & Viability Check: Based on the data volume and number of features provided in the context, explicitly state whether an LSTM is prone to overfitting here, or if the dataset is robust enough to support deep learning.

# Input Data / Instructions

Option A (Generate Code): Write clean, modular Python code using statsmodels, pandas, and matplotlib. The code should calculate the ADF and KPSS p-values, plot the ACF/PACF, perform an STL decomposition, and run ARCH and Ljung-Box tests for volatility and residual autocorrelation. Assume I have a pandas DataFrame df with a datetime index and my target variable in a column named target.

Option B (Analyze Outputs): Here are the outputs from the diagnostic tests. Please analyze them and provide a definitive recommendation: should I build a parametric model (ARIMA/VAR) or a non-parametric model (LSTM) for this specific data?
[PASTE METRIC OUTPUTS/SCORES/PLOT DESCRIPTIONS HERE]
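Of the diagnostics above, Ljung-Box is the easiest to show from first principles, and running it on squared residuals (the McLeod-Li test) is one common way to probe for the volatility clustering that the ARCH test targets. The pure-Python sketch below assumes an evenly spaced, non-constant series, returns only the Q statistic, and uses hypothetical helper names; in the statsmodels script the prompt asks for, acorr_ljungbox and het_arch from statsmodels.stats.diagnostic return the statistics together with p-values.

```python
from typing import List, Sequence


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation at lags 1..max_lag (standard biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series: Sequence[float], lags: int) -> float:
    """Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k); ~ chi-square(h) under no autocorrelation."""
    n = len(series)
    rho = acf(series, lags)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


def mcleod_li_q(residuals: Sequence[float], lags: int) -> float:
    """Ljung-Box on squared residuals: large values suggest ARCH-style volatility."""
    return ljung_box_q([r * r for r in residuals], lags)


# Strongly alternating series: lag-1 autocorrelation is large and negative.
alternating = [1.0, -1.0] * 10
print(acf(alternating, 1))          # [-0.95]
print(ljung_box_q(alternating, 1))  # ~20.9
```

A Q well above the chi-square critical value for h lags (3.84 for h = 1, about 18.31 for h = 10, at the 5% level) indicates autocorrelation that is unlikely to be noise.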

K-Means Clustering & Optimal k Selection

End-to-end unsupervised segmentation with programmatic elbow detection

# Role & Objective

Act as a Senior Data Scientist. My core objective is to answer the question: "What is the best, most natural number of groups (k) found in this dataset?" I am using unsupervised learning (K-Means clustering) to segment my data. I need you to write complete, executable Python code that scales the data, determines the optimal number of clusters by programmatically locating the elbow (the point of maximum curvature) of the WCSS curve, generates the final model, and labels the dataset.

# Context

Project/Objective: [e.g., Discovering natural customer segments based on purchasing behavior and engagement metrics.]
Data Characteristics: [e.g., 5 continuous numeric features: Avg Order Value, Purchase Frequency, Days Since Last Purchase, Session Duration, and Pages Viewed. No missing values.]
Business Value: [e.g., By accurately segmenting customers, we can create tailored marketing campaigns for each group rather than using a one-size-fits-all strategy.]

# Required Workflow & Outputs

Please generate a single, cohesive Python script that performs the following steps:

Pre-processing: Apply StandardScaler to the features. K-Means is a distance-based algorithm, so features on larger scales would otherwise dominate the distance calculations.
Determine Optimal 'k' (Elbow Method & Silhouette):
- Iterate through a range of clusters (e.g., k=2 to k=10).
- Calculate the Within-Cluster Sum of Squares (WCSS/Inertia) for each 'k'.
- Calculate the Silhouette Score for each 'k' as a secondary validation metric.
Calculate the Elbow Point: Do not rely on visual inspection. Programmatically locate the elbow of the WCSS curve (the point of maximum curvature, often loosely called the inflection point). You may use the kneed library (KneeLocator) or a distance-to-line mathematical implementation.
Visualization (seaborn):
- Create a professional seaborn (sns) line plot of the Elbow Curve (WCSS vs. Number of Clusters).
- Add a vertical dashed line or a distinct marker on the plot to explicitly highlight the calculated inflection point.
Final Model & Labeling:
- Instantiate a final K-Means model using the programmatically calculated optimal 'k'.
- Fit the model and predict the cluster labels.
- Append these labels as a new column named Cluster_Label to the original dataframe.
Cluster Profiling: Group the original dataframe by Cluster_Label and calculate the mean for each feature. Print this summary table so I can interpret the real-world characteristics of each group.

# Input Data / Instructions

Generate Code: Write clean, modular Python code using pandas, scikit-learn, matplotlib.pyplot, and seaborn. Assume my data is already loaded into a pandas DataFrame named df and the features I want to cluster on are in a list named features_to_cluster.

Optional (Analyze Output): If you want the LLM to analyze the results, paste the printed Cluster Profile or Silhouette Scores here after running the code.
[PASTE CLUSTER PROFILE / SILHOUETTE SCORES HERE]
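The workflow above allows a distance-to-line implementation instead of kneed. A minimal sketch of that heuristic in pure Python: scale both axes to [0, 1], draw the chord from the first to the last point of the WCSS curve, and pick the k whose point lies farthest from it. The function name is hypothetical and the WCSS values in the example are made up for illustration.

```python
import math
from typing import Sequence


def elbow_k(ks: Sequence[int], wcss: Sequence[float]) -> int:
    """Return the k whose (k, WCSS) point lies farthest from the chord joining
    the first and last points, after scaling both axes to [0, 1].
    Assumes ks is sorted ascending with at least two distinct values and
    that the WCSS curve is not constant."""
    k_lo, k_hi = min(ks), max(ks)
    w_lo, w_hi = min(wcss), max(wcss)
    xs = [(k - k_lo) / (k_hi - k_lo) for k in ks]
    ys = [(w - w_lo) / (w_hi - w_lo) for w in wcss]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance from each point to the line through the endpoints.
    distances = [
        abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / chord
        for x, y in zip(xs, ys)
    ]
    best = max(range(len(ks)), key=lambda i: distances[i])
    return ks[best]


ks = list(range(2, 11))
wcss = [1000, 600, 300, 250, 220, 200, 185, 175, 170]  # illustrative values
print(elbow_k(ks, wcss))  # 4
```

In the full script, each WCSS value would typically come from KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled).inertia_. The kneed route, KneeLocator(ks, wcss, curve='convex', direction='decreasing').knee, uses a different detection rule and may occasionally pick a neighbouring k.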

Engineer

Loops

Autonomous AI workflows designed to prompt themselves.

Binary Classifier Training Loop

Goal-driven loop — LLM builds, evaluates, and refines until performance thresholds are met

Build: the LLM writes or updates the model code.
Train: model.fit() runs on the dataset.
Evaluate: metrics are computed and written to LLM_context.json.
Decide: the LLM reads the context and determines the next action.

The cycle repeats until goal_met = true.
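The Build → Train → Evaluate → Decide cycle above reduces to a small driver loop. A minimal sketch, assuming the stages are passed in as hypothetical callables (in the real workflow Build and Decide would be LLM calls, and Train/Evaluate would wrap model.fit() and the metric code); the terminal status strings are also hypothetical. Only the control flow and the LLM_context.json hand-off are shown.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict

Context = Dict[str, Any]


def run_loop(
    build: Callable[[Context], None],                    # hypothetical: LLM edits model code
    train_and_evaluate: Callable[[], Dict[str, float]],  # hypothetical: fit() + metrics
    decide: Callable[[Context], bool],                   # hypothetical: LLM returns goal_met
    context_path: Path,
    max_iterations: int = 10,
) -> Context:
    context: Context = {
        "loop_meta": {"iteration": 0, "max_iterations": max_iterations,
                      "status": "in_progress"},
        "current_metrics": {},
        "history": [],
    }
    for iteration in range(1, max_iterations + 1):
        build(context)                      # Build: reads previous context, updates code
        metrics = train_and_evaluate()      # Train + Evaluate
        context["loop_meta"]["iteration"] = iteration
        context["current_metrics"] = metrics
        context["history"].append({"iteration": iteration, **metrics})
        context_path.write_text(json.dumps(context, indent=2))  # hand-off file
        if decide(context):                 # Decide: stop once goal_met is true
            context["loop_meta"]["status"] = "goal_met"
            break
    else:
        # for/else: runs only if the loop never hit `break`
        context["loop_meta"]["status"] = "max_iterations_reached"
    context_path.write_text(json.dumps(context, indent=2))
    return context
```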

Stopping Condition

After every iteration, the LLM reads LLM_context.json and checks whether all required metric goals have been met. Optional metrics inform its reasoning but do not gate the loop. The loop exits autonomously when goal_met is true, or when max_iterations is reached.

ROC-AUC ≥ 0.90 (required)
F1 (positive) ≥ 0.80 (required)
PR-AUC ≥ 0.82 (optional)
Brier Score ≤ 0.12 (optional)

LLM_context.json (iteration 3)

{
  "loop_meta": {
    "loop_id": "binary-clf-v1",
    "iteration": 3,
    "max_iterations": 10,
    "status": "in_progress",
    "last_updated": "2026-06-28T07:41:00Z"
  },

  "performance_goal": {
    "description": "User-configurable thresholds. Loop exits when ALL required goals are met.",
    "goals": [
      { "metric": "roc_auc",     "operator": ">=", "threshold": 0.90, "required": true  },
      { "metric": "pr_auc",      "operator": ">=", "threshold": 0.82, "required": false },
      { "metric": "f1_positive", "operator": ">=", "threshold": 0.80, "required": true  },
      { "metric": "brier_score", "operator": "<=", "threshold": 0.12, "required": false }
    ],
    "goal_met": false
  },

  "current_metrics": {
    "roc_auc":      { "value": 0.867, "goal_met": false, "delta_from_prev":  0.023 },
    "pr_auc":       { "value": 0.791, "goal_met": false, "delta_from_prev":  0.031 },
    "f1_positive":  { "value": 0.763, "goal_met": false, "delta_from_prev":  0.018 },
    "accuracy":     { "value": 0.884, "goal_met": null,  "delta_from_prev":  0.009 },
    "brier_score":  { "value": 0.114, "goal_met": true,  "delta_from_prev": -0.008 },
    "ks_statistic": { "value": 0.612, "goal_met": null,  "delta_from_prev":  0.041 },
    "cumulative_gains": {
      "top_10_pct": 0.41,
      "top_20_pct": 0.67,
      "top_30_pct": 0.83
    },
    "confusion_matrix": {
      "threshold": 0.50,
      "TP": 412, "FP": 63, "TN": 891, "FN": 134
    }
  },

  "model_config": {
    "algorithm": "GradientBoostingClassifier",
    "hyperparameters": {
      "n_estimators": 200,
      "max_depth": 4,
      "learning_rate": 0.08,
      "subsample": 0.85,
      "sample_weight": "balanced"
    },
    "feature_count": 34,
    "training_samples": 12400,
    "class_ratio": { "positive": 0.11, "negative": 0.89 }
  },

  "decision_context": {
    "llm_analysis": "ROC-AUC improved +0.023 this iteration but remains below 0.90 target. F1 on positive class (0.763) is the primary bottleneck. High FN count (134) suggests threshold should be lowered or recall-focused tuning applied. KS statistic improving — model separation is strengthening.",
    "recommended_next_action": "Lower decision threshold to 0.40 and increase n_estimators to 300. Consider SMOTE oversampling to address class imbalance impact on recall.",
    "confidence": "medium",
    "alternative_actions": [
      "Switch to XGBoost with scale_pos_weight tuning",
      "Add interaction features to improve positive class signal"
    ]
  },

  "history": [
    { "iteration": 1, "roc_auc": 0.821, "f1_positive": 0.727, "action": "Baseline GBC, default params" },
    { "iteration": 2, "roc_auc": 0.844, "f1_positive": 0.745, "action": "Tuned learning_rate to 0.08, added balanced sample_weight" },
    { "iteration": 3, "roc_auc": 0.867, "f1_positive": 0.763, "action": "Increased max_depth to 4, subsample to 0.85" }
  ]
}
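The stopping rule described under Stopping Condition (all required goals met, optional goals never gating the exit, a hard stop at max_iterations) can be checked mechanically against the performance_goal and current_metrics sections of LLM_context.json. A minimal sketch, assuming the file is valid JSON with the field names shown above; the helper names and the two terminal status strings are hypothetical.

```python
import json
import operator
from typing import Any, Dict

# Map the "operator" strings used in performance_goal to comparisons.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def evaluate_goals(context: Dict[str, Any]) -> bool:
    """Set per-metric goal_met flags, the overall goal_met, and the loop status."""
    goal_block = context["performance_goal"]
    metrics = context["current_metrics"]
    all_required_met = True
    for goal in goal_block["goals"]:
        entry = metrics[goal["metric"]]
        met = OPS[goal["operator"]](entry["value"], goal["threshold"])
        entry["goal_met"] = met
        if goal["required"] and not met:
            all_required_met = False  # optional goals never block the exit
    goal_block["goal_met"] = all_required_met
    meta = context["loop_meta"]
    if all_required_met:
        meta["status"] = "complete"
    elif meta["iteration"] >= meta["max_iterations"]:
        meta["status"] = "stopped_max_iterations"
    return all_required_met


def evaluate_file(path: str) -> bool:
    """Load LLM_context.json, apply the stopping rule, and write it back."""
    with open(path) as fh:
        context = json.load(fh)
    done = evaluate_goals(context)
    with open(path, "w") as fh:
        json.dump(context, fh, indent=2)
    return done
```

Applied to the iteration 3 values above, only brier_score (0.114 ≤ 0.12) passes; because roc_auc and f1_positive are required, goal_met stays false and the status stays in_progress.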

Concept: When AI builds itself — The Anthropic Institute · Boris Cherny, Head of Claude Code · “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.”