I am currently an engineer at Microsoft/Azure, building hyperscale accelerators for cloud services. I've helped develop and launch various accelerated services, including Microsoft's first production FPGA-accelerated machine learning service for Bing Search and SDN Accelerated Networking for Azure Networking.
My previous research at UT-Austin, under Professor Derek Chiou and the UTFAST research group, focused on improving the design of highly parallel systems (from algorithms and applications down to microarchitecture) by improving the speed and flexibility of many-core simulation. I have worked on multiple projects involving novel approaches to HW/SW partitioning, systems analysis, and communication methods to enable efficient parallelization and design of accelerators across multiple problem domains.
My primary research efforts have focused on exploiting the logical decomposition of simulation across a functional/timing boundary. First, using novel parallelization and speculation techniques, such a partitioning makes aggressive fine-grain parallelization with hybrid CPU/FPGA platforms a practical reality, improving simulation rates by orders of magnitude. Second, using a novel set of user-exposed simulation mechanisms, such a partitioning enables software design-space exploration for optimizing large-scale algorithmic changes in parallel software prior to committing to expensive and potentially unnecessary code changes.