Research
Our research spans multiple critical areas in AI security and safety, addressing the most pressing challenges in the field.
AI Security
Developing robust defenses against adversarial attacks, model extraction, and other security threats in AI systems. Our research focuses on creating resilient models that can withstand sophisticated attack vectors while maintaining performance.
Research Topics
- Adversarial robustness
- Model extraction prevention
- Data poisoning defenses
- Backdoor detection
- Privacy-preserving ML
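To make the threat model concrete, here is a minimal sketch of one of the attacks mentioned above, the fast gradient sign method (FGSM), run against a toy logistic-regression "model". The weights, input, and epsilon are hypothetical, chosen purely for illustration; defenses in this area aim to keep predictions stable under exactly this kind of perturbation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy model: fixed weights and bias (hypothetical values).
w = np.array([2.0, -3.0, 1.0])
b = 0.5

def predict(x):
    return sigmoid(w @ x + b)

# Clean input, confidently classified as the positive class (y = 1).
x = np.array([1.0, -1.0, 0.5])
y = 1.0
p_clean = predict(x)

# FGSM: step each input feature by epsilon in the direction of the sign
# of the loss gradient. For logistic loss, d(loss)/dx = (p - y) * w.
grad_x = (predict(x) - y) * w
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad_x)

# The perturbed input lowers the model's confidence in the true label.
p_adv = predict(x_adv)
```

Each feature moves by at most epsilon, so the adversarial input stays close to the original while still degrading the prediction; robust training tries to close exactly this gap.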
AI Safety
Ensuring AI systems behave reliably and predictably and remain aligned with human values and intentions. We investigate methods to prevent unintended behaviors and ensure AI systems operate safely in real-world scenarios.
Research Topics
- Robustness and reliability
- Value alignment
- Failure mode analysis
- Safety verification
- Human-AI interaction
Model Evaluation
Creating comprehensive evaluation frameworks to assess AI system capabilities, limitations, and risks. Our work develops standardized methodologies for understanding model behavior across diverse contexts.
Research Topics
- Evaluation frameworks
- Capability assessment
- Risk analysis
- Benchmark development
- Performance metrics
Alignment Research
Investigating methods to align AI systems with human preferences and ensure beneficial outcomes. We explore techniques for training and fine-tuning models to better reflect human values and intentions.
Research Topics
- Preference learning
- Reward modeling
- Constitutional AI
- Interpretability
- Value learning
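As a concrete illustration of the preference-learning and reward-modeling topics above, here is a minimal sketch of the Bradley-Terry pairwise preference loss commonly used to train reward models. The scalar scores are made-up stand-ins for a reward model's outputs on a chosen and a rejected response.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).
    # Small when the human-preferred response is scored well above
    # the rejected one; large when the ordering is reversed.
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Hypothetical reward-model scores for two response pairs.
well_ordered = preference_loss(2.0, -1.0)   # preferred response scored higher
mis_ordered = preference_loss(-1.0, 2.0)    # preferred response scored lower
```

Minimizing this loss over many labeled comparisons pushes the reward model to rank responses the way human raters do, which is the signal later used to fine-tune the policy.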
For inquiries about our research or collaboration opportunities, please contact us.