Research

I apply bioinformatics methods and machine learning to understand how gene-regulatory programs drive cancer and other diseases. My research combines transcriptomic analysis, network biology, and interpretable predictive modeling to identify regulatory programs associated with disease phenotypes, with an emphasis on population-aware analysis. I am now extending this work to single-cell and spatial resolution. Throughout, my hands-on training in molecular and cell biology keeps me close to how the data are actually generated.

Regulatory networks

Population-stratified regulatory networks in cancer

Tumor biology differs across ancestral populations, yet most regulatory analyses pool all patients together. I build population-specific ceRNA (competing endogenous RNA) and gene-regulatory networks to expose those differences. In prostate (TCGA-PRAD) and testicular (TCGA-TGCT) cohorts, I compared African American and European American patients by integrating mRNA, lncRNA, and miRNA expression with somatic mutation profiles, then built population- and tumor-type-specific networks that reveal distinct hub genes and pathways. My current work ranks regulatory axes with a multi-criterion composite score (statistical robustness, experimental miRNA–target evidence, network topology, and curated disease association) and links them to druggable targets through a custom DGIdb GraphQL integration.

Methods — DESeq2 · starBase / HMDD · TCGAbiolinks · Cytoscape / MCODE · Fisher Z
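
For readers unfamiliar with the Fisher Z approach listed above, here is a minimal sketch of one common use: testing whether the correlation between two genes differs between two independent cohorts. It assumes the test is applied to Pearson correlations from separate patient groups; the function name and the example values are hypothetical and are not taken from the analyses described here.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent correlation coefficients.

    r1, r2: correlations of the same gene pair in two cohorts.
    n1, n2: number of samples in each cohort.
    Returns (z statistic, two-sided p-value).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each cohort needs more than 3 samples")
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")

    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error

    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical values: a miRNA-mRNA pair correlated at -0.60 in one
    # cohort (n = 60) and -0.15 in the other (n = 300).
    z_stat, p_value = fisher_z_difference(-0.60, 60, -0.15, 300)
    print(f"z = {z_stat:.2f}, p = {p_value:.4g}")
```

In a network setting, a test like this would be run for every candidate edge, so the resulting p-values would normally need multiple-testing correction before any edge is called population-specific.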

Machine learning

Interpretable machine learning for biomarker discovery

I treat machine learning as a tool for discovery rather than a black box, pairing predictive performance with interpretability. In early-onset prostate cancer, I integrated WGCNA co-expression analysis with differential expression to narrow candidate genes, then built a nested leave-one-out cross-validation pipeline with LASSO feature selection across SVM, Random Forest, and ANN classifiers (AUC up to 0.91). In a White early-onset cohort, this pipeline identified the lncRNA PCSEAT as the only biomarker shared by all three models; its robustness was confirmed by permutation importance and SHAP, and the result was validated in an independent dataset. Related work applies the same approach to glioblastoma treatment response and tumor classification.

Methods — WGCNA · LASSO · SVM / RF / ANN · SHAP · nested cross-validation
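
The nested design matters because feature selection must be repeated inside every training fold; selecting genes on the full dataset first would leak information into the held-out sample. The sketch below illustrates that structure with scikit-learn on synthetic data. It is an illustration under stated assumptions, not the published pipeline: the function name, the alpha grid, the inner fold count, and the choice of a linear SVM as the only classifier are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_loo_scores(X, y, alphas=(0.01, 0.03), inner_folds=3, seed=0):
    """Return one held-out decision score per sample.

    Outer loop: leave-one-out. Inner loop: stratified k-fold grid search
    over the LASSO penalty. Scaling and feature selection live inside the
    pipeline, so they are refit on each training split only.
    """
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # Keep features with non-zero LASSO coefficients.
        ("select", SelectFromModel(Lasso(max_iter=10000))),
        # Illustrative classifier; another estimator could be swapped in.
        ("clf", SVC(kernel="linear")),
    ])
    param_grid = {"select__estimator__alpha": list(alphas)}
    inner_cv = StratifiedKFold(n_splits=inner_folds, shuffle=True,
                               random_state=seed)

    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        search = GridSearchCV(pipeline, param_grid, cv=inner_cv,
                              scoring="accuracy")
        search.fit(X[train_idx], y[train_idx])
        # Score the single held-out sample with the refit best model.
        scores[test_idx] = search.decision_function(X[test_idx])
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix (samples x genes).
    X_demo, y_demo = make_classification(
        n_samples=40, n_features=30, n_informative=5, n_redundant=0,
        n_clusters_per_class=1, class_sep=1.5, random_state=0)
    held_out = nested_loo_scores(X_demo, y_demo)
    print("Pooled leave-one-out AUC:", roc_auc_score(y_demo, held_out))
```

Because each outer fold holds out a single sample, AUC cannot be computed per fold; the sketch pools the held-out scores and computes one AUC across all samples.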

Emerging direction

Single-cell & spatial transcriptomics

I am extending network- and ML-based thinking to single-cell and spatial data, where the goal is to connect regulatory programs to cell identity and tissue organization. In an independent project on mouse-brain spatial transcriptomics, I built a scanpy/squidpy pipeline for preprocessing, clustering, and spatial-domain analysis, with a PyTorch cell-type classifier benchmarked against a scikit-learn baseline and interpreted through feature importance. Regulatory genomics at single-cell and spatial resolution is the direction I most want to grow into.

Methods — scanpy · squidpy · PyTorch · spatial-domain analysis

Experimental grounding

A foundation at the bench

My computational work is anchored in hands-on molecular biology. Through laboratory training I gained practical experience in nucleic-acid extraction, conventional and real-time PCR, gel electrophoresis, and mammalian cell culture with viability assays. This experience keeps my analyses tied to how the data are actually generated.

Methods — DNA/RNA extraction · qRT-PCR · electrophoresis · cell culture · MTT / AO-PI assays