Projects

Dynamic Load Balancing Algorithms in AMReX
NERSC, LBNL
- Investigated the current state of load-balancing algorithms suitable for dynamic load balancing in domain-decomposition-based high-performance computing (HPC) simulations.
- Developed a novel hybrid load-balancing algorithm combining space-filling curve (SFC) and Knapsack strategies, as well as an improved SFC bisection strategy based on the painter's partition algorithm.
- The painter's partition-based algorithm outperformed the original SFC-based strategies across all tested cases. Additionally, the combination algorithms outperformed their single-algorithm counterparts and should be evaluated for potential use in production-scale simulations.
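
The painter's partition step can be illustrated with a minimal sketch. It assumes the boxes are already ordered along the space-filling curve and that each has a known integer cost. It then splits that ordered sequence into at most k contiguous chunks (one per rank) so that the heaviest chunk is as light as possible, using a binary search on the maximum load. The function names and cost values are hypothetical. This is not AMReX's API or the project's production code.

```python
from typing import List


def _chunks_needed(weights: List[int], max_load: int) -> int:
    """Greedily count the contiguous chunks needed if no chunk may exceed max_load."""
    chunks, current = 1, 0
    for w in weights:
        if current + w > max_load:
            chunks += 1  # start a new chunk with this box
            current = w
        else:
            current += w
    return chunks


def painters_partition(weights: List[int], k: int) -> List[List[int]]:
    """Split SFC-ordered box costs into at most k contiguous chunks,
    minimizing the largest chunk cost (binary search on the answer)."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if _chunks_needed(weights, mid) <= k:
            hi = mid  # feasible: try a smaller maximum load
        else:
            lo = mid + 1  # infeasible: the maximum load must grow
    # Rebuild the chunks greedily under the optimal maximum load `lo`.
    parts: List[List[int]] = []
    current: List[int] = []
    total = 0
    for w in weights:
        if current and total + w > lo:
            parts.append(current)
            current, total = [], 0
        current.append(w)
        total += w
    parts.append(current)
    return parts


if __name__ == "__main__":
    costs = [4, 1, 3, 2, 5, 2, 1, 6]  # hypothetical per-box costs in SFC order
    parts = painters_partition(costs, k=3)
    print(parts, [sum(p) for p in parts])
```

For the example costs, the optimal maximum load is 9, giving the chunks [4, 1, 3], [2, 5, 2], and [1, 6]. Keeping chunks contiguous preserves the spatial locality that the SFC ordering provides.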

CHAI-KTQ: A Novel Framework for Scalable Large Language Models and Efficient Inference
Boolean Lab, UCSD
- Developed CHAI-KTQ, a novel framework designed to make large language model (LLM) inference more efficient while maintaining robust performance. It introduces three key extensions: CHAI Quant, CHAI Target, and CHAI Knowledge Distillation (CHAI KD).
- CHAI Quant applies mixed-precision quantization to clustered attention heads, reducing the key-value (KV) cache size by up to 55% and latency by 40% while keeping accuracy deviations below 1%.
- CHAI Target performs targeted fine-tuning of sensitive layers identified through attention sensitivity analysis, making predictions more robust and reducing uncertainty in critical tasks.
- CHAI KD enables efficient knowledge transfer from large teacher models to lightweight student models, achieving speed gains of 3,000 inferences/sec for 125M-parameter models with competitive performance on knowledge-intensive tasks such as PIQA and RTE.
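
The write-up does not specify CHAI KD's training objective. As an illustration only, the sketch below shows the widely used temperature-scaled distillation loss. It combines the KL divergence between the softened teacher and student distributions (scaled by T²) with ordinary cross-entropy on the hard labels. The temperature `T`, the weight `alpha`, and the NumPy formulation are assumptions, not details of CHAI KD.

```python
import numpy as np


def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student).

    Both logit arrays have shape (batch, classes); labels are integer class ids.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Soft-target term: KL divergence between the softened distributions.
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()
    # Hard-target term: standard cross-entropy at temperature 1.
    p_hard = softmax(student_logits)
    labels = np.asarray(labels)
    ce = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return float(alpha * T ** 2 * kl + (1 - alpha) * ce)


if __name__ == "__main__":
    teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])  # hypothetical logits
    student = np.array([[1.0, 0.8, -0.5], [0.0, 1.0, 0.9]])
    print(distillation_loss(student, teacher, labels=[0, 1]))
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains. Raising `T` softens both distributions, so the student also learns from the teacher's relative scores for incorrect classes.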

CPTQuant: Novel Mixed-Precision Post-Training Quantization Techniques for Large Language Models
Boolean Lab, UCSD
- Developed CPTQuant, a comprehensive strategy that introduces correlation-based (CMPQ), pruning-based (PMPQ), and Taylor decomposition-based (TDMPQ) mixed precision techniques.
- CMPQ adapts each layer's precision based on a canonical correlation analysis of different layers. PMPQ optimizes precision layer by layer according to each layer's sensitivity to sparsity. TDMPQ uses Taylor decomposition to assess each layer's sensitivity to input perturbation and adjusts its precision accordingly.
- CPTQuant achieves up to 4x compression and a 2x increase in efficiency with minimal accuracy drop compared with the Hugging Face FP16 baseline.
- PMPQ demonstrates an 11% higher compression ratio than other methods for classification tasks, while TDMPQ achieves a 30% greater compression ratio for language modeling tasks.
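
The three variants differ in how they score layer sensitivity (canonical correlation, sparsity sensitivity, or Taylor-based input sensitivity), but each ends in a per-layer bit-width assignment. A minimal sketch, assuming the sensitivity scores are already computed, shows how such an assignment translates into a compression ratio relative to FP16. The rank-based rule (most sensitive quarter keeps 16 bits, next quarter 8, the rest 4) and all names are hypothetical, not the published CPTQuant policy.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical tiers: (cumulative fraction of layers by sensitivity rank, bit-width).
DEFAULT_TIERS: Sequence[Tuple[float, int]] = ((0.25, 16), (0.5, 8), (1.0, 4))


def assign_bits(sensitivity: Dict[str, float],
                tiers: Sequence[Tuple[float, int]] = DEFAULT_TIERS) -> Dict[str, int]:
    """Rank layers from most to least sensitive and map each rank to a bit-width."""
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    bits = {}
    for i, name in enumerate(ranked):
        frac = (i + 1) / len(ranked)  # cumulative fraction of layers ranked so far
        bits[name] = next(b for cutoff, b in tiers if frac <= cutoff)
    return bits


def compression_vs_fp16(bits: Dict[str, int], params: Dict[str, int]) -> float:
    """Ratio of FP16 weight storage to mixed-precision weight storage."""
    fp16_bits = 16 * sum(params.values())
    mixed_bits = sum(bits[name] * n for name, n in params.items())
    return fp16_bits / mixed_bits


if __name__ == "__main__":
    # Hypothetical sensitivity scores (e.g., from CCA, sparsity, or Taylor analysis).
    scores = {"layer0": 0.9, "layer1": 0.1, "layer2": 0.5, "layer3": 0.05}
    sizes = {name: 1_000_000 for name in scores}
    bits = assign_bits(scores)
    print(bits, compression_vs_fp16(bits, sizes))
```

With four equally sized layers this rule yields 16, 8, 4, and 4 bits and a 2.0x ratio over FP16. Assigning 4 bits to every layer gives 4x, the ceiling for a scheme whose lowest precision is 4 bits.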

FedNAMs: Performing Interpretability Analysis in a Federated Learning Context
Boolean Lab, UCSD
- Federated learning continues to evolve but faces challenges in interpretability and explainability. To address these issues, this project introduces an approach that employs Neural Additive Models (NAMs) within a federated learning framework.
- FedNAMs combine the strengths of NAMs, which allow individual networks to focus on specific input features, with the decentralized nature of federated learning. This integration results in more interpretable analysis while enhancing privacy by training on local data across multiple devices. As a result, the risks associated with data centralization are minimized, and the model's robustness and generalizability are improved.
- Experiments on a range of classification tasks, using datasets such as OpenFetch ML Wine, UCI Heart Disease, and Iris, demonstrate that FedNAMs achieve strong interpretability with minimal loss in accuracy compared with traditional federated deep neural networks (DNNs). The study also identifies key predictive features at both the client level and the global level.
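
The additive structure that makes NAMs interpretable can be sketched as follows. The prediction is a bias plus a sum of independent per-feature functions f_i(x_i), so each feature's contribution can be read off directly. In this minimal NumPy sketch each f_i is a hypothetical one-hidden-layer ReLU network and local training is omitted. The server combines client models with FedAvg-style averaging weighted by local sample counts. Whether FedNAMs aggregates exactly this way is an assumption.

```python
import numpy as np


def init_nam(n_features: int, hidden: int = 8, seed: int = 0) -> dict:
    """Hypothetical NAM: one tiny ReLU network per feature, plus a shared bias."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_features, hidden)),  # per-feature input weights
        "b1": np.zeros((n_features, hidden)),              # per-feature hidden biases
        "W2": rng.normal(0.0, 0.5, (n_features, hidden)),  # per-feature output weights
        "bias": np.zeros(1),
    }


def feature_contributions(params: dict, X: np.ndarray) -> np.ndarray:
    """Shape functions f_i(x_i) for every sample and feature: (n_samples, n_features)."""
    h = np.maximum(0.0, X[:, :, None] * params["W1"][None] + params["b1"][None])
    return (h * params["W2"][None]).sum(axis=-1)


def nam_predict(params: dict, X: np.ndarray) -> np.ndarray:
    """NAM output: bias plus the sum of independent per-feature contributions."""
    return params["bias"] + feature_contributions(params, X).sum(axis=1)


def fedavg(client_params: list, client_sizes: list) -> dict:
    """Average every parameter tensor, weighted by each client's sample count."""
    total = sum(client_sizes)
    return {key: sum(p[key] * n for p, n in zip(client_params, client_sizes)) / total
            for key in client_params[0]}


if __name__ == "__main__":
    clients = [init_nam(3, seed=s) for s in range(2)]  # stand-ins for locally trained models
    global_model = fedavg(clients, client_sizes=[100, 300])
    X = np.array([[0.2, -1.0, 0.5], [1.0, 0.3, -0.4]])
    print(feature_contributions(global_model, X))  # one column per feature
```

Because `feature_contributions` returns one column per feature, averaging the absolute contributions on each client or on the aggregated model gives client-level and global feature rankings, respectively.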

Communication Efficient Asynchronous Peer-to-Peer Federated LLMs
Boolean Lab, UCSD
- Developed a secure, efficient, and privacy-preserving federated learning approach for decentralized settings. The work also addresses communication overhead in peer-to-peer networks by optimizing the weight-transfer path and mitigating node anomalies.
- Evaluated memory usage and latency in server-based and serverless environments. The serverless setting showed a 5x decrease in latency and a 13% increase in accuracy.
- Comparing synchronous and asynchronous scenarios revealed a 76% reduction in information-passing time for the asynchronous case. PageRank was the most effective method for eliminating anomalous nodes, improving the performance of the global federated LLM.
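
The write-up does not describe how the peer graph for PageRank is built. A minimal sketch, assuming a hypothetical endorsement graph in which `adj[i][j] = 1` means peer i accepts peer j's weight updates, ranks peers by power-iteration PageRank and flags the lowest-ranked ones as candidates for exclusion. The damping factor of 0.85 is the conventional default, not a value from the project.

```python
import numpy as np


def pagerank(adj, damping: float = 0.85, tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Power-iteration PageRank; adj[i][j] = 1 means peer i endorses peer j."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize; peers with no outgoing endorsements spread their rank uniformly.
    trans = np.where(out_deg > 0, adj / np.where(out_deg == 0, 1.0, out_deg), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * (rank @ trans)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def flag_anomalous(adj, drop: int = 1) -> list:
    """Return the indices of the `drop` lowest-ranked peers."""
    return [int(i) for i in np.argsort(pagerank(adj))[:drop]]


if __name__ == "__main__":
    # Hypothetical endorsement graph: peers 0-2 accept each other's updates;
    # peer 3 accepts peer 0's updates, but no peer accepts peer 3's.
    adj = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 0, 0]]
    print(pagerank(adj).round(3), flag_anomalous(adj))
```

In the example, peer 3 sends endorsements but receives none, so it keeps only the teleportation share of rank and is flagged first.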

Leveraging High-Performance Computing for Spatial Transcriptomic Identification of CDX2 Genes in Intestinal Crypts Using a Deep Neural Network
Boolean Lab, UCSD
- Spatial transcriptomics provides a valuable link between gene expression patterns and specific locations within tissues. By analyzing the spatial distribution of gene expression, insights can be gained into the molecular characteristics and functional diversity of cells residing in different regions of tissues.
- Developed a YOLOv8-based instance segmentation model to recognize the shapes of crypts and glands in intestinal tissue. Identifying these structures makes it possible to pinpoint genes differentially expressed along the crypt axis, specifically those influenced by immune cells, such as macrophage and epithelial genes.
- The YOLOv8 model achieves mAP50 and mAP50-95 scores of 0.937 and 0.567 for glands, and 0.748 and 0.654 for crypts, respectively.
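
mAP50 counts a predicted box as a true positive when its intersection over union (IoU) with a still-unmatched ground-truth box is at least 0.5. mAP50-95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU computation and greedy matching at a single threshold. It assumes boxes are (x1, y1, x2, y2) tuples and predictions are sorted by confidence. A full AP calculation would additionally sweep the confidence threshold and integrate the precision-recall curve.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes have zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(preds: Sequence[Box], gts: Sequence[Box],
                       thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match confidence-sorted predictions to unmatched ground truth.

    Returns (true positives, false positives, false negatives) at IoU >= thr.
    """
    matched: set = set()
    tp = fp = 0
    for p in preds:
        best_j, best_iou = None, thr
        for j, g in enumerate(gts):
            iou = box_iou(p, g)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            fp += 1
        else:
            matched.add(best_j)
            tp += 1
    return tp, fp, len(gts) - len(matched)


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]      # hypothetical ground truth
    preds = [(1, 1, 11, 11), (50, 50, 60, 60)]    # hypothetical predictions, by confidence
    print(box_iou(preds[0], gts[0]))              # 81 / 119
    print(match_at_threshold(preds, gts, 0.5), match_at_threshold(preds, gts, 0.75))
```

Here the first prediction overlaps its ground-truth box with IoU 81/119 ≈ 0.68, so it counts at a 0.5 threshold but not at 0.75, which is why mAP50-95 is typically lower than mAP50.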

OrgaTuring: Accelerating Organoid Discovery with Vision-AI
Boolean Lab, UCSD
- Developed an interpretable CNN-based deep learning model to automate and streamline the microscopic analysis of organoid images. The model enables real-time localization, quantification, tracking, and classification of Crohn's disease organoids from 2D and 3D images.
- Analyzed large confocal microscopy images with small sample sizes. Implemented focal loss to handle class imbalance and G-mean for better thresholding. The DNN, built on DenseNet-121, achieved a testing accuracy of 75% and an AUC-ROC of 0.67. Additionally, incorporated SHAP-based interpretability and applied domain adaptation methods.
- Exploring zero-shot and few-shot learning to handle small sample sizes more effectively. Additionally, creating patches from large images to build a classification model and a probabilistic method for class prediction. Also developing a novel conformal prediction technique with a dynamic level adjustment method that computes sensitivity maps based on gradient magnitudes.
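
Focal loss down-weights well-classified examples with a factor (1 - p_t)^γ. G-mean thresholding picks the decision threshold that maximizes √(TPR · TNR), balancing sensitivity and specificity on imbalanced classes. A minimal NumPy sketch of both for the binary case follows. The defaults α = 0.25 and γ = 2 are the common values from the original focal loss paper, not necessarily the settings used in this project.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))


def gmean_threshold(y_true, scores):
    """Pick the score threshold maximizing G-mean = sqrt(TPR * TNR)."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    best_t, best_g = 0.5, -1.0
    for t in np.unique(s):  # candidate thresholds: the observed scores
        pred = s >= t
        tpr = (pred & y).sum() / max(y.sum(), 1)
        tnr = (~pred & ~y).sum() / max((~y).sum(), 1)
        g = np.sqrt(tpr * tnr)
        if g > best_g:
            best_t, best_g = float(t), float(g)
    return best_t, best_g


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1]                # hypothetical imbalanced labels
    scores = [0.1, 0.2, 0.4, 0.35, 0.9]     # hypothetical model scores
    print(binary_focal_loss(labels, scores))
    print(gmean_threshold(labels, scores))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the binary cross-entropy, a quick sanity check. In the threshold example, 0.35 gives TPR = 1 and TNR = 2/3, the best balance among the candidate scores.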

Expression Gradient of Cancer Suppressor Gene using Vision-AI
Boolean Lab, UCSD
- Developed an original approach to gland instance segmentation using Mask R-CNN and YOLOv8 on colon tissue for colon cancer diagnosis. Annotated the first public U-shaped colon dataset.
- Mask R-CNN outperformed models such as SegNet and UNet, achieving (F1, IoU) scores of (0.98, 0.97) for background, (0.63, 0.46) for glands, and (0.59, 0.42) for crypts.
- The YOLOv8 model outperformed Mask R-CNN, with mean Average Precision at 50% IoU overlap (mAP50) scores of 0.937 for glands and 0.748 for crypts. Additionally, identified differentially expressed genes along the crypts; verified on 25 slides, the model predicted over 5,000 U-shaped glands.
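
Per-class F1 (Dice) and IoU can be computed pixel-wise from predicted and ground-truth label maps. For a single mask pair the two are linked by F1 = 2·IoU / (1 + IoU), which is consistent with the reported pairs (for example, IoU 0.46 corresponds to F1 ≈ 0.63). The sketch assumes integer label maps with hypothetical class ids 0 = background, 1 = gland, and 2 = crypt. By convention it scores a class absent from both maps as 1.0.

```python
import numpy as np


def per_class_f1_iou(pred: np.ndarray, target: np.ndarray, class_ids=(0, 1, 2)) -> dict:
    """Pixel-wise F1 (Dice) and IoU per class for integer label maps."""
    scores = {}
    for c in class_ids:
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        total = p.sum() + t.sum()
        f1 = 2 * inter / total if total else 1.0   # Dice coefficient
        iou = inter / union if union else 1.0      # Jaccard index
        scores[c] = (float(f1), float(iou))
    return scores


if __name__ == "__main__":
    target = np.array([[0, 0, 1, 1], [0, 2, 1, 1]])  # hypothetical ground-truth labels
    pred = np.array([[0, 0, 1, 0], [0, 2, 2, 1]])    # hypothetical model output
    for c, (f1, iou) in per_class_f1_iou(pred, target).items():
        print(c, round(f1, 3), round(iou, 3))
```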

Impact of Feature Correlation on Feature Importance Using SHAP
Dey's Lab, UCSD
- Conducted a detailed analysis to determine if Shapley interaction values effectively capture feature correlations and enhance feature ranking accuracy.
- Focused on the correlation between blood pressure and health behaviors such as sleep, exercise, diet, and stress management.
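
Shapley values attribute a prediction to features by averaging each feature's marginal contribution over all coalitions of the other features. The minimal exact sketch below is practical only for a handful of features. It assumes that absent features are replaced by a fixed baseline, which is one of several possible conventions. Shapley interaction values, the focus of this analysis, extend the same idea to pairs of features and are not computed here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    def toy_model(z):
        # Hypothetical model with one interaction term between features 0 and 2.
        return 2 * z[0] + 3 * z[1] + z[0] * z[2]

    print(shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0]))
```

For the toy model 2·x0 + 3·x1 + x0·x2 at x = (1, 2, 3) with an all-zero baseline, the values are (3.5, 6, 1.5). The interaction term's contribution of 3 is split evenly between x0 and x2, illustrating how main-effect attributions can blur credit that interaction values would isolate.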

Particle-Filter and Visual-Inertial SLAM
ERL Lab, UCSD
- Addressed the SLAM (Simultaneous Localization and Mapping) problem for autonomous vehicles. The method consists of three main steps: IMU (Inertial Measurement Unit) localization via EKF (Extended Kalman Filter) prediction, landmark mapping via EKF update, and visual-inertial SLAM.
- Implemented the differential-drive motion and scan-grid correlation observation models for simultaneous localization and occupancy-grid mapping.
- Developed a visual-inertial SLAM system in Python using an extended Kalman filter (EKF). The EKF was chosen over alternatives such as particle filters or factor graph-based methods because its prediction-update framework efficiently tracks an autonomous system's state over time while concurrently estimating landmark positions.
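
The landmark-mapping step can be sketched as a single EKF update. The document does not give the project's camera observation model, so this sketch substitutes a simpler range-bearing measurement taken from a known robot pose. That observation model, the function names, and the noise values are assumptions; only the linearize, compute gain, and correct structure carries over.

```python
import numpy as np


def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def ekf_landmark_update(mu, Sigma, z, pose, R):
    """One EKF update of a 2D landmark estimate (mu, Sigma) from a range-bearing
    measurement z = [r, phi] taken at a known robot pose (x, y, theta)."""
    dx, dy = mu[0] - pose[0], mu[1] - pose[1]
    q = dx ** 2 + dy ** 2
    r = np.sqrt(q)
    z_hat = np.array([r, wrap_angle(np.arctan2(dy, dx) - pose[2])])  # predicted measurement
    # Jacobian of the observation model with respect to the landmark position.
    H = np.array([[dx / r, dy / r],
                  [-dy / q, dx / q]])
    S = H @ Sigma @ H.T + R                  # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)       # Kalman gain
    innov = z - z_hat
    innov[1] = wrap_angle(innov[1])          # keep the bearing residual in range
    mu_new = mu + K @ innov
    Sigma_new = (np.eye(2) - K @ H) @ Sigma
    return mu_new, Sigma_new


if __name__ == "__main__":
    pose = np.array([0.0, 0.0, 0.0])                  # known robot pose (x, y, theta)
    z = np.array([5.0, np.arctan2(4.0, 3.0)])         # noise-free reading of a landmark at (3, 4)
    mu, Sigma = np.array([2.5, 4.5]), np.eye(2)       # hypothetical prior landmark estimate
    R = np.diag([0.01, 0.001])                        # hypothetical measurement noise
    mu, Sigma = ekf_landmark_update(mu, Sigma, z, pose, R)
    print(mu, np.trace(Sigma))
```

A single update moves the estimate from (2.5, 4.5) close to the true landmark and shrinks its covariance. In the full system the same update would run for every observed landmark after each IMU-driven prediction step.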