Publications

Books & Book Chapters

Heterogeneous Computing with OpenCL 2.0
D. Kaeli, P. Mistry, D. Schaa, D. P. Zhang. Morgan Kaufmann Publishers, 2015. Available in bookstores and on Amazon.
Chapter 13: OpenCL Profiling and Debugging, and Chapter 14: Performance Optimisation of an Image Analysis Application on dGPUs and APUs, in Heterogeneous Computing with OpenCL, 2nd Edition
D. P. Zhang. Morgan Kaufmann Publishers, 2012.
Multi-dimensional image segmentation and registration: Coronary artery segmentation and motion modelling
D. P. Zhang, LAP LAMBERT Academic Publishing, 2013.

Papers

Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing
Alex Breslow, Dong Ping Zhang, Joseph Greathouse, Nuwan Jayasena, Dean Tullsen. USENIX Annual Technical Conference, 2016.
Abstract

Hash tables are important data structures that lie at the heart of key applications such as key-value stores and relational databases. Typically bucketized cuckoo hash tables (BCHTs) are used because they provide high-throughput lookups and load factors that exceed 95%. Unfortunately, this performance comes at the cost of reduced memory access efficiency. Positive lookups (key is in the table) and negative lookups (where it is not) on average access 1.5 and 2.0 buckets, respectively, which results in 50 to 100% more table-containing cache lines being accessed than minimally necessary.

To reduce these surplus accesses, this paper presents the Horton table, a revamped BCHT that reduces the expected cost of positive and negative lookups to fewer than 1.18 and 1.06 buckets, respectively, while still achieving load factors of 95%. The key innovation is remap entries, small in-bucket records that allow (1) more elements to be hashed using a single, primary hash function, (2) items that overflow buckets to be tracked and rehashed with one of many alternate functions while maintaining a worst-case lookup cost of 2 buckets, and (3) the vast majority of negative searches to be shortened to 1 bucket access. With these advancements, Horton tables outperform BCHTs by 17% to 89%.
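For intuition, the remap-entry idea can be sketched in a few lines of Python (a deliberate simplification with invented constants, not the paper's implementation; real Horton tables also handle tag collisions and secondary-bucket overflow):

SLOTS, REMAPS, NUM_ALT = 4, 8, 4

class HortonTable:
    def __init__(self, num_buckets):
        self.n = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]           # up to SLOTS items each
        self.remaps = [[0] * REMAPS for _ in range(num_buckets)]  # 0 means 'no overflow'

    def _primary(self, key):
        return hash(('h0', key)) % self.n

    def _alt(self, key, fn):
        return hash(('alt', fn, key)) % self.n

    def _tag(self, key):
        return hash(('tag', key)) % REMAPS

    def insert(self, key, value):
        b = self._primary(key)
        if len(self.buckets[b]) < SLOTS:
            self.buckets[b].append((key, value))       # common case: primary bucket
            return
        fn = 1 + hash(('pick', key)) % NUM_ALT         # choose an alternate hash function
        self.remaps[b][self._tag(key)] = fn            # remember it in the remap entry
        self.buckets[self._alt(key, fn)].append((key, value))

    def lookup(self, key):
        b = self._primary(key)
        for k, v in self.buckets[b]:
            if k == key:
                return v                               # positive lookup: 1 bucket
        fn = self.remaps[b][self._tag(key)]
        if fn == 0:
            return None                                # negative lookup: still 1 bucket
        for k, v in self.buckets[self._alt(key, fn)]:
            if k == key:
                return v                               # remapped item: worst case 2 buckets
        return None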

HADM: Hybrid Analysis for Detection of Malware
Lifan Xu, Dong Ping Zhang, Nuwan Jayasena, John Cavazos. IntelliSys, 2016.
Abstract

Android is the most popular mobile operating system with a market share of over 80% [1]. Due to its popularity and also its open source nature, Android is now the platform most targeted by malware, creating an urgent need for effective defense mechanisms to protect Android-enabled devices.

In this paper, we propose a novel Android malware classification method called HADM, Hybrid Analysis for Detection of Malware. We first extract static and dynamic information, and convert this information into vector-based representations. It has been shown that combining advanced features derived by deep learning with the original features provides significant gains [2]. Therefore, we feed both the original dynamic and static feature vector sets to a Deep Neural Network (DNN) which outputs a new set of features. These features are then concatenated with the original features to construct DNN vector sets. Different kernels are then applied to the DNN vector sets. We also convert the dynamic information into graph-based representations and apply graph kernels to the graph sets. Learning results from the various vector and graph feature sets are combined using hierarchical Multiple Kernel Learning (MKL) to build a final hybrid classifier.
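The vector branch of this pipeline can be sketched roughly as follows (synthetic data and illustrative shapes, not the paper's code; the hierarchical MKL fusion step is only indicated by training one SVM per kernel):

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X = np.random.rand(200, 64)             # stand-in for static/dynamic feature vectors
y = np.random.randint(0, 2, 200)        # 1 = malware, 0 = benign (synthetic labels)

dnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
hidden = np.maximum(X @ dnn.coefs_[0] + dnn.intercepts_[0], 0)   # ReLU activations
X_dnn = np.hstack([X, hidden])          # 'DNN vector set': original + learned features

# One SVM per kernel; MKL would learn a weighted combination of such kernels.
svms = {k: SVC(kernel=k).fit(X_dnn, y) for k in ('linear', 'rbf', 'poly')}
print({k: clf.score(X_dnn, y) for k, clf in svms.items()})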

Dynamic Android Malware Classification Using Graph-Based Representations
Lifan Xu, Dong Ping Zhang, Marco A. Alvarez, Jose Andre Morales, John Cavazos. IEEE CSCloud, 2016.
Abstract

Malware classification for the Android ecosystem can be performed using a range of techniques. One major technique that has been gaining ground recently is dynamic analysis based on system call invocations recorded during the execution of Android applications. Dynamic analysis has traditionally been based on converting system calls into flat feature vectors and feeding the vectors into machine learning algorithms for classification.

In this paper, we implement three traditional feature-vector-based representations for Android system calls. For each feature vector representation, we also propose a novel graph-based representation. We then use graph kernels to compute pair-wise similarities and feed these similarity measures into a Support Vector Machine (SVM) for classification. To speed up the graph kernel computation, we compress the graphs using the Compressed Row Storage format, and then we apply OpenMP to parallelize the computation. Experiments show that the graph-based representations are able to improve the classification accuracy over the corresponding feature-vector-based representations from the same input. Finally, we show that different representations can be combined to further improve classification accuracy.
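A toy illustration of the graph-based idea (the transition-graph construction is an assumed simplification, and a cosine similarity over edge counts stands in for the paper's graph kernels):

from collections import Counter
import math

def call_graph(trace):
    # Weighted digraph over syscall names: edge (a, b) counts a->b transitions.
    return Counter(zip(trace, trace[1:]))

def edge_cosine(g1, g2):
    dot = sum(g1[e] * g2[e] for e in set(g1) & set(g2))
    norm = math.sqrt(sum(v * v for v in g1.values())) * \
           math.sqrt(sum(v * v for v in g2.values()))
    return dot / norm if norm else 0.0

benign = ['open', 'read', 'read', 'close', 'open', 'read']
suspect = ['open', 'read', 'close', 'socket', 'connect']
print(edge_cosine(call_graph(benign), call_graph(suspect)))

Pairwise similarities like this one would populate the kernel matrix fed to the SVM.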

Fine-Grained Task Migration for Graph Algorithms using Processing in Memory
Paula Aguilera, Dong Ping Zhang, Nam Sung Kim, Nuwan Jayasena. 18th Workshop on Advances in Parallel and Distributed Computational Models, 2016.
Abstract

Graphs are used in a wide variety of application domains, from social science to machine learning. Graph algorithms present large numbers of irregular accesses with little data reuse to amortize the high cost of memory accesses, requiring high memory bandwidth. Processing in memory (PIM) implemented through 3D die-stacking can deliver this high memory bandwidth. In a system with multiple memory modules with PIM, the in-memory compute logic has low latency and high bandwidth access to its local memory, while accesses to remote memory introduce high latency and energy consumption. Ideally, in such a system, computation and data are partitioned among the PIM devices to maximize data locality. But the irregular memory access patterns present in graph applications make it difficult to guarantee that the computation in each PIM device will only access its local data. A large number of remote memory accesses can negate the benefits of using PIM.

In this paper, we examine the feasibility and potential of fine-grained work migration to reduce remote data accesses in systems with multiple PIM devices. First, we propose a data-driven implementation of our study algorithms: breadth-first search (BFS), single source shortest path (SSSP) and betweenness centrality (BC), where each PIM device has a queue holding the vertices it needs to process. New vertices that need to be processed are enqueued at the PIM device co-located with the memory that stores those vertices (see the sketch below). Second, we propose hardware support that takes advantage of PIM to implement highly efficient queues that improve the performance of the queuing framework by up to 16.7%. Third, we develop a timing model for the queueing framework to explore the benefits of work migration vs. remote memory accesses. Finally, our analysis using the above framework shows that naïve task migration can lead to performance degradations, and identifies trade-offs among data locality, redundant computation, and load balance among PIM devices that must be taken into account to realize the potential benefits of fine-grained task migration.
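A serial Python sketch of the data-driven queuing scheme for BFS (the vertex-to-device mapping and the level-synchronous schedule are our assumptions, not the paper's hardware design):

from collections import deque

def partitioned_bfs(adj, source, num_pim=4):
    owner = lambda v: v % num_pim                 # assumed vertex-to-device mapping
    dist = {source: 0}
    queues = [deque() for _ in range(num_pim)]
    queues[owner(source)].append(source)
    while any(queues):
        nxt = [deque() for _ in range(num_pim)]   # next level's per-device queues
        for q in queues:                          # each device drains its own queue
            for u in q:
                for v in adj[u]:
                    if v not in dist:             # first discovery fixes the level
                        dist[v] = dist[u] + 1
                        nxt[owner(v)].append(v)   # enqueue at the owning device
        queues = nxt
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
print(partitioned_bfs(adj, 0))                    # {0: 0, 1: 1, 2: 1, 3: 2}

The key property is that work migrates to the device that owns the data, instead of data being fetched remotely.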

Scaling Deep Learning on Multiple In-Memory Processors
Lifan Xu, Dong Ping Zhang, Nuwan Jayasena. WoNDP: 3rd Workshop on Near-Data Processing, 2015.
Abstract

Deep learning methods are proven to be state-of-the-art in addressing many challenges in machine learning domains. However, this comes at the cost of high computational requirements and energy consumption. The emergence of Processing In Memory (PIM) with die-stacking technology presents an opportunity to speed up deep learning computation and reduce energy consumption by providing low-cost high-bandwidth memory accesses. PIM uses 3D die stacking to move computations closer to memory and therefore reduces data movement overheads. In this paper, we study the parallelization of deep learning methods on a system with multiple PIM devices. We select three typical layers, the convolutional, pooling, and fully connected layers, from common deep learning models and parallelize them using different schemes. Preliminary results show we are able to reach competitive or even better performance using multiple PIM devices when compared with traditional GPU parallelization.
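One of the parallelization schemes can be illustrated with a toy model-parallel split of a fully connected layer (synthetic sizes; NumPy products stand in for work dispatched to the PIM devices):

import numpy as np

num_pim, batch, n_in, n_out = 4, 32, 512, 256
x = np.random.rand(batch, n_in)
W = np.random.rand(n_in, n_out)
shards = np.array_split(W, num_pim, axis=1)   # each device stores a slice of the weights

partials = [x @ w for w in shards]            # each product would run on one PIM device
y = np.concatenate(partials, axis=1)          # gather the output slices
assert np.allclose(y, x @ W)                  # identical to the unpartitioned layer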

Realizing the Full Potential of Heterogeneity through Processing in Memory
Nuwan Jayasena, Dongping Zhang, Amin Farmahini-Farahani, Mike Ignatowski. WoNDP: 3rd Workshop on Near-Data Processing, 2015.
Abstract

While many processing in memory (PIM) research studies demonstrate significant improvements in memory system energy efficiency, relatively little attention has been paid to the sources of overall energy efficiency of PIM systems. In this paper, we quantify the sources of energy efficiency of a GPU-based PIM design and show that selecting low-power operating points for the in-memory processors is an important aspect, accounting for a 1.9x improvement in energy efficiency compared to a mainstream implementation of the evaluated GPU design. Memory interface efficiency of PIM provides an additional 3.8x improvement over that. These results also demonstrate that, due to memory system inefficiencies, implementing high-performance and low-power heterogeneous cores on the same die attached to a conventional memory system can only realize a fraction of the overall improvement realized by PIM (52% in our study). While these results in general confirm conventional wisdom, we quantify the relative importance of these processor and memory efficiency factors across a wide range of benchmarks and encourage further research to enable and leverage the symbiosis between PIM and heterogeneous computing to further improve energy efficiency.

TOP-PIM: Throughput-Oriented Programmable Processing in Memory
Dong Ping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph Greathouse, Lifan Xu, Mike Ignatowski. The 23rd International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC), Best Paper Award Finalist, 2014.
Abstract

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical to continued performance scaling. Moving computation closer to memory presents an opportunity to reduce both energy and data movement overheads. We explore the use of 3D die stacking to move memory-intensive computations closer to memory. This approach to processing in memory addresses some drawbacks of prior research on in-memory computing and is commercially viable in the foreseeable future.

Because 3D stacking provides increased bandwidth, we study throughput-oriented computing using programmable GPU compute units across a broad range of benchmarks, including graph and HPC applications. We also introduce a methodology for rapid design space exploration by analytically predicting performance and energy of in-memory processors based on metrics obtained from execution on today’s GPU hardware. Our results show that, on average, viable PIM configurations show moderate performance losses (27%) in return for significant energy efficiency improvements (76% reduction in EDP) relative to a representative mainstream GPU at 22nm technology. At 16nm technology, on average, viable PIM configurations are performance competitive with a representative mainstream GPU (7% speedup) and provide even greater energy efficiency improvements (85% reduction in EDP).
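The flavour of such an analytical model can be conveyed with a simple roofline-style estimate (the formula and device numbers below are illustrative assumptions, not the paper's calibrated model):

def kernel_time(flops, bytes_moved, peak_flops, peak_bw):
    # Roofline assumption: the kernel is bound by whichever resource saturates first.
    return max(flops / peak_flops, bytes_moved / peak_bw)

kernel = dict(flops=2e12, bytes_moved=4e12)        # a memory-bound kernel profile
host_gpu = dict(peak_flops=5e12, peak_bw=0.3e12)   # fast logic, conventional DRAM
pim_gpu = dict(peak_flops=1e12, peak_bw=1.0e12)    # slower logic, stacked DRAM

for name, dev in (('host', host_gpu), ('pim', pim_gpu)):
    t = kernel_time(kernel['flops'], kernel['bytes_moved'], **dev)
    print(name, t)   # the PIM point wins on this memory-bound kernel

In the paper, the inputs to the model are metrics measured on today's GPU hardware rather than invented profiles like these.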

Bibtex

@inproceedings{Zhang:2014:TTP:2600212.2600213, author = {Zhang, Dongping and Jayasena, Nuwan and Lyashevsky, Alexander and Greathouse, Joseph L. and Xu, Lifan and Ignatowski, Michael}, title = {TOP-PIM: Throughput-oriented Programmable Processing in Memory}, booktitle = {Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing}, series = {HPDC '14}, year = {2014}, isbn = {978-1-4503-2749-7}, location = {Vancouver, BC, Canada}, pages = {85--98}, numpages = {14}, url = {http://doi.acm.org/10.1145/2600212.2600213}, doi = {10.1145/2600212.2600213}, acmid = {2600213}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {energy efficiency, gpgpu, performance modeling and analysis, processing in memory}, }

Efficient Parallel Image Clustering and Search on a Heterogeneous Platform
Dong Ping Zhang, Lifan Xu, Lee Howes. 22nd High Performance Computing Symposium (HPC), Best Paper Award, 2014.
Abstract

We present a parallel image clustering and search framework for large scale datasets that does not require image annotation, segmentation or registration. This work addresses the image search problem while avoiding the need for user-specified or auto-generated metadata; instead we rely on image data alone to avoid the ambiguity inherent in user-provided information. First, we propose a parallel algorithm exploiting heterogeneous hardware resources to generate global descriptors for the set of input images; given a group of query images, we derive their global descriptors in parallel. Second, we propose building a customisable search tree over the image database by performing a hierarchical K-means (H-Kmeans) clustering of the corresponding descriptors. Last, we design a novel parallel vBFS algorithm to search through the H-Kmeans tree and locate the set of closest matches for query image descriptors.

To validate our design we analyse the search performance and energy efficiency under a range of hardware clock frequencies and in comparison with alternative approaches. The result of our analysis shows that the framework greatly increases the search efficiency and thereby reduces the energy consumption per query.
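A compact sketch of the H-Kmeans search tree (the branching factor and leaf size are assumptions; the paper's parallel vBFS search explores several branches at once rather than descending greedily):

import numpy as np
from sklearn.cluster import KMeans

def build(descriptors, k=4, leaf=16):
    if len(descriptors) <= leaf:
        return descriptors                    # leaf bucket of candidate matches
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    return [(c, build(descriptors[km.labels_ == i], k, leaf))
            for i, c in enumerate(km.cluster_centers_)]

def search(node, q):
    if isinstance(node, np.ndarray):
        return node                           # reached a leaf bucket
    centers = np.array([c for c, _ in node])
    best = np.argmin(np.linalg.norm(centers - q, axis=1))
    return search(node[best][1], q)           # descend into the closest branch

db = np.random.rand(1000, 128)                # stand-in global image descriptors
tree = build(db)
print(search(tree, np.random.rand(128)).shape)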

Bibtex

@inproceedings{Zhang:2014HPC, author = {Zhang, Dongping and Xu, Lifan and Howes, Lee}, title = {Efficient Parallel Image Clustering and Search on a Heterogeneous Platform}, booktitle = {22nd High Performance Computing Symposium (HPC)}, year = {2014}, }

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPUs
Lifan Xu, Wei Wang, Marco A. Alvarez, John Cavazos, Dong Ping Zhang. The Seventh Workshop on Programmability Issues for Heterogeneous Multicores (in conjunction with HiPEAC), Best Paper Award, 2014.
Abstract

In this paper, we present a study on the parallelization of the shortest path graph kernel from machine learning theory. We first propose a modification of the original algorithm, which we refer to as Fast Computation of Shortest Path kernel (FCSP). We then explore two different parallelization schemes on the CPU and four different implementations on the GPU. We investigate the advantages of each and implement a hybrid version which, for different pairs of graphs, dynamically chooses the best implementation from multicore execution and GPU execution. Finally, we apply each of these implementations to several data sets composed of graphs from different domains. We first create a set of synthetic data sets to evaluate the benefits and drawbacks of our different implementations. Then, we evaluate our implementations on a set of four real-world graph data sets. The results show the GPU version of FCSP offers a maximum 18x speedup over the sequential CPU version and a maximum 2x speedup over a parallel CPU implementation.
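A sequential reference version of the shortest path kernel looks roughly like this (a toy delta kernel on unlabeled path lengths; FCSP's contribution is restructuring this all-pairs comparison for multicore and GPU execution):

import numpy as np
from scipy.sparse.csgraph import shortest_path

def sp_kernel(adj1, adj2):
    d1, d2 = shortest_path(adj1), shortest_path(adj2)
    p1 = d1[np.isfinite(d1) & (d1 > 0)]       # all shortest-path lengths in graph 1
    p2 = d2[np.isfinite(d2) & (d2 > 0)]
    return sum(1.0 for a in p1 for b in p2 if a == b)   # delta kernel on lengths

g1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph
g2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)   # triangle
print(sp_kernel(g1, g2))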

Bibtex

@inproceedings{Zhang:2014MultiProg, author = {Lifan Xu and Wei Wang and Marco A. Alvarez and John Cavazos and Dong Ping Zhang}, title = {Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPUs}, booktitle = {The Seventh Workshop on Programmability Issues for Heterogeneous Multicores in conjunction with HiPEAC}, year = {2014}, }

High-level Programming Model Abstractions for Processing in Memory
M. Chu, N. Jayasena, D. P. Zhang, M. Ignatowski. WoNDP: 1st Workshop on Near-Data Processing, in conjunction with the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46), 2013.
Abstract

While the idea of processing in memory (PIM) has been around for decades, both hardware and software limitations have kept it from growing in practical, real-world use. Recent advancements in 3D die-stacking technology have begun to make inroads towards solving some of the implementation issues, but software programmability questions remain. This position paper presents high-level programming models as a solution for PIM programmability by abstracting away many of the low-level architectural details. While we acknowledge that expert programmers still will want low-level, detailed control for optimization, we see high-level programming abstractions as a way to broaden the use of PIM to a larger audience of developers and increase the adoption of PIM architectures in future systems.
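As a purely hypothetical illustration of what such an abstraction could look like (this tiny API is invented for this page, not proposed in the paper):

import numpy as np

class PIMArray:
    """Hypothetical container that partitions an array across PIM stacks."""
    def __init__(self, data, num_devices=4):
        self.shards = np.array_split(data, num_devices)   # one shard per memory stack

    def map(self, fn):
        # A real runtime would launch fn on the logic die beneath each shard;
        # applying it serially here only illustrates the programming model.
        return [fn(shard) for shard in self.shards]

arr = PIMArray(np.arange(1_000_000))
partials = arr.map(np.sum)       # conceptually runs in memory, one result per device
print(sum(partials))             # final reduction on the host

The point of the abstraction is that placement and dispatch decisions belong to the runtime, not the programmer.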

Bibtex

@inproceedings{Zhang:2013Wondp1, author = {M. Chu and N. Jayasena and D. P. Zhang and M. Ignatowski}, title = {High-level Programming Model Abstractions for Processing in Memory}, booktitle = {WoNDP: 1st Workshop on Near-Data Processing in conjunction with the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46)}, year = {2013}, }

A Processing in Memory Taxonomy and a Case for Studying Fixed-Function PIM
G. Loh, N. Jayasena, M. Oskin, M. Nutter, D. Roberts, M. Meswani, D. P. Zhang, M. Ignatowski. WoNDP: 1st Workshop on Near-Data Processing, in conjunction with the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46), 2013.
Abstract

The emergence of die-stacking technology with mixed logic and memory processes has brought about a renaissance in “processing in memory” (PIM) concepts, first envisioned decades ago. For some, the PIM concept conjures an image of a complete processing unit (e.g., CPU, GPU) integrated directly with memory, perhaps on a logic chip 3D-stacked under one or more memory chips. However, PIM potentially covers a very wide spectrum of compute capabilities embedded in/with the memory. This position paper presents an initial taxonomy for in-memory computing, and advocates for the exploration of simpler computing mechanisms in the memory stack in addition to fully-programmable PIM architectures.

Bibtex

@inproceedings{Zhang:2013Wondp2, author = {G. Loh and N. Jayasena and M. Oskin and M. Nutter and D. Roberts and M. Meswani and D. P. Zhang and M. Ignatowski}, title = {A Processing in Memory Taxonomy and a Case for Studying Fixed-Function PIM}, booktitle = {WoNDP: 1st Workshop on Near-Data Processing in conjunction with the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46)}, year = {2013}, }

A new perspective on processing-in-memory architecture design
D. P. Zhang, N. Jayasena, A. Lyashevsky, J. Greathouse, M. Meswani, M. Nutter, M. Ignatowski. ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, in conjunction with the Conference on Programming Language Design and Implementation, 2013.
Abstract

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to expect from the computing industry. Moving computation closer to main memory presents an opportunity to reduce the overheads associated with data movement. We explore the potential of using 3D die stacking to move memory-intensive computations closer to memory. This approach to processing-in-memory addresses some drawbacks of prior research on in-memory computing and appears commercially viable in the foreseeable future. We show promising early results from this approach and identify areas that are in need of research to unlock its full potential.

Bibtex

@inproceedings{zhang_2012_new_perspective_pim, author = {Dongping Zhang and Nuwan Jayasena and Joseph Greathouse and Mitesh Meswani and Mark Nutter and Alexander Lyashevsky and Mike Ignatowski}, title = {A new perspective on processing-in-memory architecture design}, booktitle = {ACM SIGPLAN Workshop on Memory Systems Performance and Correctness}, year = {2013}, location = {Seattle, Washington, USA}, }

Vasculature segmentation using parallel multi-hypothesis template tracking on heterogeneous platforms
D. P. Zhang, L. Howes. SPIE Electronic Imaging: Parallel Processing in Image Processing Systems, 2013.
Abstract

We present a parallel multi-hypothesis template tracking algorithm on heterogeneous platforms using a layered dispatch programming model. The contributions of this work are: an architecture-specific optimised solution for vasculature structure enhancement, an approach to segment the vascular lumen network from volumetric CTA images, and a layered dispatch programming model to free developers from hand-crafting mappings to particularly constrained execution domains on high-throughput architectures. This abstraction is demonstrated through a vasculature segmentation application and can also be applied in other real-world applications.

Current GPGPU programming models define a grouping concept which may lead to poorly scoped local/shared memory regions and an inconvenient approach to projecting complicated iteration spaces. To improve on this situation, we propose a simpler and more flexible programming model that leads to easier computation projections and hence a more convenient mapping of the same algorithm to a wide range of architectures.

We first present an optimised image enhancement solution step by step, then solve a separable nonlinear least squares problem using a parallel Levenberg-Marquardt algorithm for template matching, and perform an energy efficiency analysis and performance comparison on a variety of platforms, including multi-core CPUs, discrete GPUs and APUs. We propose and discuss the efficiency of a layered-dispatch programming abstraction for mapping algorithms onto heterogeneous architectures.
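The template-matching step can be illustrated as a small nonlinear least squares fit (a 1-D toy signal and an assumed Gaussian vessel profile; the paper uses a parallel Levenberg-Marquardt solver on real volumetric data):

import numpy as np
from scipy.optimize import least_squares

x = np.linspace(-5, 5, 101)
observed = 2.0 * np.exp(-0.5 * ((x - 0.7) / 1.2) ** 2) + 0.05 * np.random.randn(101)

def residuals(p):
    amp, mu, sigma = p                        # template parameters to recover
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2) - observed

fit = least_squares(residuals, x0=[1.0, 0.0, 1.0], method='lm')
print(fit.x)                                  # approx. amplitude 2.0, centre 0.7, width 1.2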

Bibtex

@inproceedings{zhang_2013_vascular_segmentation, author = {Dongping Zhang and Lee Howes}, title = {Vasculature segmentation using parallel multi-hypothesis template tracking on heterogeneous platforms}, booktitle = {SPIE Electronic Imaging: Parallel Processing in Image Processing Systems}, year = {2013}, location = {San Francisco, California, USA}, }

Multi-Method Analysis of MRI Images in Early Diagnostics of Alzheimer’s Disease
Robin Wolz, Valtteri Julkunnen, Juha Koikkalainen, Eini Niskanen, Dong Ping Zhang, Daniel Rueckert, Hilkka Soininen, Jyrki Lötjönen, and the Alzheimer's Disease Neuroimaging Initiative. PLoS ONE, 2011.
Abstract

The role of structural brain magnetic resonance imaging (MRI) is becoming more and more emphasized in the early diagnostics of Alzheimer's disease (AD). This study aimed to assess the improvement in classification accuracy that can be achieved by combining features from different structural MRI analysis techniques. Automatically estimated MR features used are hippocampal volume, tensor-based morphometry, cortical thickness and a novel technique based on manifold learning. Baseline MRIs acquired from all 834 subjects (231 healthy controls (HC), 238 stable mild cognitive impairment (S-MCI), 167 MCI to AD progressors (P-MCI), 198 AD) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database were used for evaluation. We compared the classification accuracy achieved with linear discriminant analysis (LDA) and support vector machines (SVM). The best results achieved with individual features are 90% sensitivity and 84% specificity (HC/AD classification), 64%/66% (S-MCI/P-MCI) and 82%/76% (HC/P-MCI) with the LDA classifier. The combination of all features improved these results to 93% sensitivity and 85% specificity (HC/AD), 67%/69% (S-MCI/P-MCI) and 86%/82% (HC/P-MCI). Compared with previously published results in the ADNI database using individual MR-based features, the presented results show that a comprehensive analysis of MRI images combining multiple features improves classification accuracy and predictive power in detecting early AD. The most stable and reliable classification was achieved when combining all available features.
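Schematically, the feature-combination experiment looks like this (synthetic arrays in place of the ADNI features; the real study uses the four MR feature families named above):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

n = 300
hippo_vol = np.random.rand(n, 1)      # stand-ins for the four feature families
tbm       = np.random.rand(n, 10)
cortical  = np.random.rand(n, 10)
manifold  = np.random.rand(n, 5)
X = np.hstack([hippo_vol, tbm, cortical, manifold])   # combined feature vector
y = np.random.randint(0, 2, n)        # e.g. HC vs AD labels

for clf in (LinearDiscriminantAnalysis(), SVC(kernel='linear')):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())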

Bibtex

@article{wolz_2011_alzheimers, author = {Robin Wolz and Valtteri Julkunnen and Juha Koikkalainen and Eini Niskanen and Dong Ping Zhang and Daniel Rueckert and Hilkka Soininen and Jyrki Lötjönen and The Alzheimer's Disease Neuroimaging Initiative}, title = {Multi-Method Analysis of MRI Images in Early Diagnostics of Alzheimer's Disease}, journal = {PLoS ONE}, year = {2011}, }

Motion tracking of left ventricle and coronaries in 4D CTA
D. P. Zhang, X. H. Zhuang, P. Edwards, S. Ourselin and D. Rueckert. SPIE Medical Imaging, 2011.
Abstract

In this paper, we present a novel approach for simultaneous motion tracking of the left ventricle and coronary arteries from cardiac Computed Tomography Angiography (CTA) images. We first use the multi-scale vesselness filter proposed by Frangi et al. to enhance vessels in the cardiac CTA images. The vessel centrelines are then extracted as the minimal cost path from the enhanced images. The centrelines at end-diastole (ED) are used as prior input for the motion tracking. All other centrelines are used to evaluate the accuracy of the motion tracking. To segment the left ventricle automatically, we perform three levels of registration using a cardiac atlas obtained from MR images. The cardiac motion is derived from cardiac CTA sequences by using a non-rigid registration algorithm driven by local-phase information. The CTA image at each time frame is registered to the ED frame by maximising the proposed similarity function and following a serial registration scheme. Once the images have been aligned, a dynamic motion model of the left ventricle can be obtained by applying the computed free-form deformations to the segmented left ventricle at the ED phase. A similar propagation method also applies to the coronary arteries. To validate the accuracy of the motion model we compare the actual position of the coronaries and left ventricle in each time frame with the predicted ones as estimated from the proposed tracking method.
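The enhancement-then-centreline idea can be sketched with off-the-shelf routines (a 2-D toy image; the paper operates on 3-D CTA volumes with its own implementation):

import numpy as np
from skimage.filters import frangi
from skimage.graph import route_through_array

img = np.zeros((64, 64))
img[30:34, :] = 1.0                           # a synthetic bright horizontal 'vessel'
vesselness = frangi(img, black_ridges=False)  # enhance bright tubular structures
cost = 1.0 / (vesselness + 1e-6)              # strong response becomes cheap to traverse
path, _ = route_through_array(cost, (32, 0), (32, 63))
print(len(path))                              # points on the extracted centreline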

Bibtex

@inproceedings{zhang_2011_motiontracking, author = {D. P. Zhang and X. H. Zhuang and P. Edwards and S. Ourselin and D. Rueckert}, title = {Motion tracking of left ventricle and coronaries in 4D CTA}, booktitle = {SPIE Medical Imaging}, year = {2011}, }

Coronary Motion Estimation from CTA Using Probability Atlas and Diffeomorphic Registration
D. P. Zhang, L. Risser, F.-X. Vialard, P. Edwards, C. Metz, L. Neefjes, N. Mollet, W. Niessen and D. Rueckert. 5th International Workshop on Medical Imaging and Augmented Reality (paper + presentation), 2010.
Abstract

In this paper, we present a method for coronary artery motion estimation from 4D cardiac CT angiogram (CTA) data sets. The proposed method potentially allows the construction of a patient-specific 4D coronary motion model from pre-operative CTA which can be used for guiding totally endoscopic coronary artery bypass surgery (TECAB). The proposed approach consists of three steps. Firstly, prior to motion tracking, we form a coronary probability atlas from manual segmentations of the CTA scans of a number of subjects. Secondly, the vesselness response image is calculated and enhanced for the end-diastolic and end-systolic CTA images in each 4D sequence. Thirdly, we design a special-purpose registration framework for tracking the highly localized coronary motion. It combines the coronary probability atlas, the intensity information from the CTA image and the corresponding vesselness response image to fully automate the coronary motion tracking procedure and improve its accuracy. We perform pairwise 3D registration of cardiac time frames using a multi-channel implementation of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, where each channel contains a given level of description of the registered shapes. For validation, we compare the automatically tracked coronaries with those segmented manually at the end-diastolic phase for each subject.

Bibtex

@inproceedings{zhang_2010_coronary_motion_estimation, author = {D. P. Zhang and L. Risser and F.-X. Vialard and P. Edwards and C. Metz and L. Neefjes and N. Mollet and W. Niessen and D. Rueckert}, title = {Coronary Motion Estimation from CTA Using Probability Atlas and Diffeomorphic Registration}, booktitle = {5th International Workshop on Medical Imaging and Augmented Reality}, year = {2010}, }

Nonrigid Registration and Template Matching for Coronary Motion Modeling from 4D CTA
D. P. Zhang, L. Risser, O. Friman, C. Metz, L. Neefjes, N. Mollet, W. Niessen and D. Rueckert. 4th International Workshop on Biomedical Image Registration (paper + presentation), 2010.
Abstract

In this paper, we present a method for coronary artery motion tracking in 4D cardiac CT angiogram data sets. The proposed method allows the construction of a patient-specific 4D coronary motion model from pre-operative CTA which can be used for guiding totally endoscopic coronary artery bypass surgery (TECAB). The proposed approach consists of three steps. Firstly, the coronary arteries are extracted in the end-diastolic time frame using a minimal cost path approach. To achieve this, the start and end points of the coronaries are identified interactively and the minimal cost path between the start and end points is computed using the A* graph search algorithm. Secondly, the cardiac motion is estimated throughout the cardiac cycle by using a non-rigid image registration technique based on a free-form B-spline transformation model and maximization of normalized mutual information. Finally, the coronary arteries are tracked automatically through all other phases of the cardiac cycle, by deforming the extracted coronaries at end-diastole to all other time frames according to the motion field computed in the second step. The estimated coronary centerlines are then refined by a template matching algorithm to improve the accuracy. We compare the proposed approach with two alternative approaches: the first is based on the minimal cost path extraction of the coronaries with start and end points manually identified in each time frame, while the second is based on propagating the extracted coronaries from the end-diastolic time frame to other time frames using image-based non-rigid registration only. Our results show that the proposed approach performs more robustly than the non-rigid registration based method and that the resulting motion model is comparable to the motion model constructed from semi-automatic extractions of the coronaries in all time frames.
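The normalised mutual information measure maximised in the second step can be written as NMI(A, B) = (H(A) + H(B)) / H(A, B); a small sketch from joint histograms (the bin count is an assumption):

import numpy as np

def nmi(a, b, bins=32):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))   # Shannon entropy
    return (h(px) + h(py)) / h(pxy)

fixed = np.random.rand(64, 64)
print(nmi(fixed, fixed))                      # perfect alignment scores the maximum, 2.0
print(nmi(fixed, np.random.rand(64, 64)))     # unrelated images score close to 1.0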

Bibtex

@incollection{zhang_2010_bir, title={Nonrigid Registration and Template Matching for Coronary Motion Modeling from 4D CTA}, author={Zhang, DongPing and Risser, Laurent and Friman, Ola and Metz, Coert and Neefjes, Lisan and Mollet, Nico and Niessen, Wiro and Rueckert, Daniel}, booktitle={Biomedical Image Registration}, series={Lecture Notes in Computer Science}, volume={6204}, editor={Fischer, Bernd and Dawant, Benoît M. and Lorenz, Cristian}, publisher={Springer Berlin Heidelberg}, year={2010}, pages={210--221}, isbn={978-3-642-14365-6}, doi={10.1007/978-3-642-14366-3_19}, url={http://dx.doi.org/10.1007/978-3-642-14366-3_19}, keywords={Nonrigid Deformation; Computer Integrated Surgery; Intra-modality Registration; Motion Detection and Tracking} }

Coronary Artery Motion Modeling from 3D Cardiac CT Sequences Using Template Matching and Graph Search
D. P. Zhang, L. Risser, C. Metz, L. Neefjes, N. Mollet, W. Niessen and D. Rueckert. IEEE International Symposium on Biomedical Imaging: From Nano to Macro (paper + presentation), 2010.
Abstract

We present a novel method for coronary artery motion tracking in 4D cardiac CT data sets. The algorithm allows the automatic construction of a 4D coronary motion model from pre-operative CT which can be used for guiding totally-endoscopic coronary artery bypass surgery (TECAB). The proposed approach is based on two steps: In the first step, the coronary arteries are extracted in the end-diastolic time frame using a minimal cost path approach. To achieve this, the start and end points of the coronaries are identified interactively and the minimal cost path between the start and end points is computed using the A* graph algorithm. In the second stage the coronaries are tracked automatically through all other phases of the cardiac cycle. This is achieved by automatically identifying the start and end points in subsequent time points through a non-rigid template-tracking algorithm. Once the start and end points have been located, the minimal cost path is constructed in every time frame.

We compare the proposed approach to two alternative approaches: the first is based on a semi-automatic extraction of the coronaries with start and end points manually supplied in each time frame, and the second is based on propagating the extracted coronaries from the end-diastolic time frame to other time frames using non-rigid registration. Our results show that the proposed approach performs significantly better than the non-rigid registration based method and that the resulting motion model is comparable to the motion model constructed from semi-automatic extractions of the coronaries.

Bibtex

@INPROCEEDINGS{5490171, author={Dong Ping Zhang and Risser, L. and Metz, C. and Neefjes, L. and Mollet, N. and Niessen, W. and Rueckert, D.}, booktitle={Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on}, title={Coronary artery motion modeling from 3D cardiac CT sequences using template matching and graph search}, year={2010}, pages={1053-1056}, keywords={blood vessels;cardiology;computerised tomography;image sequences;medical image processing;surgery;3D cardiac CT sequence;4D cardiac CT data set;cardiac cycle;coronary artery motion modeling;graph search;nonrigid template tracking algorithm;template matching;totally endoscopic coronary artery bypass surgery;Arteries;Cardiology;Computed tomography;Costs;Educational institutions;Heart;Radiology;Robots;Surgery;Tracking;Cardiovascular Image Analysis;Image Guided Surgery;Image Registration;Motion Detection and Tracking}, doi={10.1109/ISBI.2010.5490171}, ISSN={1945-7928}, }

Coronary Artery Tracking from Dynamic Cardiac CT Sequences
D. P. Zhang, O. Pedro, K. Mori, P. J. Edwards and D. Rueckert. 13th Annual Conference on Medical Image Understanding and Analysis (paper + presentation), 2009.

Bibtex

@inproceedings{zhangMIUA, author = {D. P. Zhang and O. Pedro and K. Mori and P. J. Edwards and D. Rueckert}, title = {Coronary Artery Tracking from Dynamic Cardiac CT Sequences}, booktitle = {13th Annual Conference on Medical Image Understanding and Analysis}, year = {2009}, location = {UK}, }

4D motion modeling of the coronary arteries from CT images for robotic assisted minimally invasive surgery
D. P. Zhang, E. Edwards, L. Mei and D. Rueckert. SPIE Medical Imaging (paper + presentation), 2009.

Bibtex

@inproceedings{2009SPIE.7259E..31Z, author = {{Zhang}, D.~P. and {Edwards}, E. and {Mei}, L. and {Rueckert}, D.}, title = {4D motion modeling of the coronary arteries from CT images for robotic assisted minimally invasive surgery}, booktitle = {Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series}, year = {2009}, series = {Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series}, volume = {7259}, month = feb, eid = {72590X}, doi = {10.1117/12.811518}, adsurl = {http://adsabs.harvard.edu/abs/2009SPIE.7259E..31Z}, adsnote = {Provided by the SAO/NASA Astrophysics Data System} }

Image Guidance for Robotic Minimally Invasive Coronary Artery Bypass
M. Figl, D. Rueckert, D. J. Hawkes, R. Casula, M. Hu, O. Pedro, D. P. Zhang, G. P. Penney, F. Bello and P. J. Edwards. Computerized Medical Imaging and Graphics, 2010.
Abstract

A novel system for image guidance in totally endoscopic coronary artery bypass (TECAB) is presented. A key requirement is the availability of 2D-3D registration techniques that can deal with non-rigid motion and deformation. Image guidance for TECAB is mainly required before the mechanical stabilisation of the heart, when the most dominant source of misregistration is the deformation and non-rigid motion of the heart.

To augment the images in the endoscope of the da Vinci robot, we have to find the transformation from the coordinate system of the preoperative imaging modality to the system of the endoscopic cameras.

In a first step we build a 4D motion model of the beating heart. Intraoperatively we can use the ECG or video processing to determine the phase of the cardiac cycle, as well as the heart and respiratory frequencies. We then take the heart surface from the motion model and register it to the stereo endoscopic images of the da Vinci robot, or of a validation system, using photo-consistency. To take advantage of the fact that a whole image sequence is available for registration, we use the different phases together to drive the registration. We found the similarity function to be much smoother when using more phases. This also showed promising behaviour in convergence tests.

Images of the vessels available in the preoperative coordinate system can then be transformed to the camera system and projected into the calibrated endoscope view using two video mixers with chroma keying. It is hoped that the augmented view can improve the efficiency of TECAB surgery and reduce the conversion rate to more conventional procedures.
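The photo-consistency idea behind the registration step can be illustrated schematically (toy arrays; camera projection and the surface model are abstracted away):

import numpy as np

def photo_consistency(colours_left, colours_right):
    # Mean squared colour difference over projected surface points; lower is better.
    return np.mean((colours_left - colours_right) ** 2)

phases = 8
scores = []
for t in range(phases):
    left = np.random.rand(500, 3)                   # colours of model points, left view
    right = left + 0.05 * np.random.randn(500, 3)   # right view with noise
    scores.append(photo_consistency(left, right))
print(np.mean(scores))                              # phase-averaged cost drives the search

Averaging the score over cardiac phases is what smooths the similarity function, as reported above.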

Bibtex

@article{figl_2010_cmig, author = {Figl, Michael and Rueckert, Daniel and Hawkes, David and Casula, Roberto and Hu, Mingxing and Pedro, Ose and Zhang, DongPing and Penney, Graeme and Bello, Fernando and Edwards, Philip}, title = {Image Guidance for Robotic Minimally Invasive Coronary Artery Bypass}, journal = {Computerized Medical Imaging and Graphics}, year = {2010}, }

Coronary Motion Modeling for Augmented Reality Guidance of Endoscopic Coronary Artery Bypass
M. Figl, D. Rueckert, D. J. Hawkes, R. Casula, M. Hu, O. Pedro, D. P. Zhang, G. P. Penney, F. Bello and P. J. Edwards. International Symposium on Biomedical Simulation, 2008.
Abstract

The overall aim of our project is to guide totally endoscopic coronary artery bypass. This requires construction of a 4D preoperative model of the coronary arteries and myocardium. The model must be aligned with the endoscopic view of the patient’s beating heart and presented to the surgeon using augmented reality. We propose that the model can be constructed from coronary CT. Segmentation can be performed for one phase of the cardiac cycle only and propagated to the others using non-rigid registration. We have compared the location of the coronaries produced by this method to hand segmentation.

Registration of the model to the endoscopic view of the patient is achieved in two phases. Temporal registration is performed by identification of corresponding motion between model and video. Then we calculate photo-consistency between the two da Vinci endoscope views and average over the frames of the motion model. This has been shown to improve the shape of the cost function. Phantom results are presented.

The model can then be transformed to the calibrated endoscope view and overlaid using two video mixers.

Bibtex

@incollection{figl_2008_coronary_motion, title={Coronary Motion Modelling for Augmented Reality Guidance of Endoscopic Coronary Artery Bypass}, author={Figl, Michael and Rueckert, Daniel and Hawkes, David and Casula, Roberto and Hu, Mingxing and Pedro, Ose and Zhang, DongPing and Penney, Graeme and Bello, Fernando and Edwards, Philip}, booktitle={Biomedical Simulation}, series={Lecture Notes in Computer Science}, volume={5104}, editor={Bello, Fernando and Edwards, P. J.}, publisher={Springer Berlin Heidelberg}, year={2008}, isbn={978-3-540-70520-8}, pages={197--202} }

Image Guidance for Robotic Minimally Invasive Coronary Artery Bypass
M. Figl, D. Rueckert, D. J. Hawkes, R. Casula, M. Hu, O. Pedro, D. P. Zhang, G. P. Penney, F. Bello and P. J. Edwards. International Workshop on Medical Imaging and Augmented Reality, 2008.
Abstract

A novel system for image guidance in totally endoscopic coronary artery bypass (TECAB) is presented. A key requirement is the availability of 2D-3D registration techniques that can deal with non-rigid motion and deformation. Image guidance for TECAB is mainly required before the mechanical stabilization of the heart, thus the most dominant source of non-rigid deformation is the motion of the beating heart.

To augment the images in the endoscope of the da Vinci robot, we have to find the transformation from the coordinate system of the preoperative imaging modality to the system of the endoscopic cameras.

In a first step we build a 4D motion model of the beating heart. Intraoperatively we can use the ECG or video processing to determine the phase of the cardiac cycle. We can then take the heart surface from the motion model and register it to the stereo-endoscopic images of the da Vinci robot using 2D-3D registration methods. We are investigating robust feature tracking and intensity-based methods for this purpose.

Images of the vessels available in the preoperative coordinate system can then be transformed to the camera system and projected into the calibrated endoscope view using two video mixers with chroma keying. It is hoped that the augmented view can improve the efficiency of TECAB surgery and reduce the conversion rate to more conventional procedures.

Bibtex

@incollection{figl_2008_image_guidance, title={Image Guidance for Robotic Minimally Invasive Coronary Artery Bypass}, author={Figl, Michael and Rueckert, Daniel and Hawkes, David and Casula, Roberto and Hu, Mingxing and Pedro, Ose and Zhang, DongPing and Penney, Graeme and Bello, Fernando and Edwards, Philip}, booktitle={Medical Imaging and Augmented Reality}, series={Lecture Notes in Computer Science}, volume={5128}, editor={Dohi, Takeyoshi and Sakuma, Ichiro and Liao, Hongen}, publisher={Springer Berlin Heidelberg}, year={2008}, isbn={978-3-540-79981-8}, url={http://dx.doi.org/10.1007/978-3-540-79982-5_23}, pages={202--209} }

Augmented Reality Image Guidance for Minimally Invasive Coronary Artery Bypass
M. Figl, D. Rueckert, D. J. Hawkes, R. Casula, M. Hu, O. Pedro, D. P. Zhang, G. P. Penney, F. Bello and P. J. Edwards. SPIE Medical Imaging: Visualization, Image-Guided Procedures and Modeling, 2008.
Abstract

We propose a novel system for image guidance in totally endoscopic coronary artery bypass (TECAB). A key requirement is the availability of 2D-3D registration techniques that can deal with non-rigid motion and deformation. Image guidance for TECAB is mainly required before the mechanical stabilization of the heart, thus the most dominant source of non-rigid deformation is the motion of the beating heart. To augment the images in the endoscope of the da Vinci robot, we have to find the transformation from the coordinate system of the preoperative imaging modality to the system of the endoscopic cameras. In a first step we build a 4D motion model of the beating heart. Intra-operatively we can use the ECG or video processing to determine the phase of the cardiac cycle. We can then take the heart surface from the motion model and register it to the stereo-endoscopic images of the da Vinci robot using 2D-3D registration methods. We are investigating robust feature tracking and intensity-based methods for this purpose. Images of the vessels available in the preoperative coordinate system can then be transformed to the camera system and projected into the calibrated endoscope view using two video mixers with chroma keying. It is hoped that the augmented view can improve the efficiency of TECAB surgery and reduce the conversion rate to more conventional procedures.

Registration of a 4D Cardiac Motion Model to Endoscopic Video for Augmented Reality Image Guidance of Robotic Coronary Artery Bypass
M. Figl, D. Rueckert, D. J. Hawkes, R. Casula, M. Hu, O. Pedro, D. P. Zhang, G. P. Penney, F. Bello and P. J. Edwards. International Workshop on Augmented Environments for Medical Imaging and Computer-Aided Surgery, 2008.
Abstract

The aim of the work described in this paper is the registration of a 4D preoperative motion model of the heart to the video view of the patient through the intraoperative endoscope, in order to overlay the real video sequence with it. As the heart motion is cyclical, it can be modeled using multiple reconstructions of cardiac gated coronary CT. We propose the use of photoconsistency between the two views through the da Vinci endoscope to align the views to the preoperative heart surface model from CT, and we propose averaging the photoconsistency over the cardiac cycle to improve the registration compared to a single view. Results are presented for simulated renderings and for real video of a beating heart phantom. We found much smoother behaviour of the test function at the minimum when using multiple phases for the registration; furthermore, convergence was better when more phases were used.

Cardiac CT Image Analysis with Subdivision Method and Nonrigid Image Registration
D. P. Zhang, P. J. Edwards, D. Rueckert. Medical Image and Signal Analysis, 2007.

Invited talks

Exploring the design space of processing-in-memory architecture for exascale computing
D. P. Zhang. Keynote, The 7th Workshop on Programmability Issues for Heterogeneous Multicores, 2014.
Abstract

This talk highlights the growing importance of co-designing software solutions and hardware architectures in the general-purpose computing industry. It also explores why computational capacity is out of date as a single metric, and why the power consumed by communication and data movement is just as important. As a concrete example, the talk explores the processing-in-memory (PIM) design space through the software design of a PIM API, a simulator and performance models.

Processing-in-memory research
D. P. Zhang. Grace Hopper Celebration of Women in Computing, 2013.
Supporting Computer Vision through High Performance GPU Programming
D. P. Zhang. Plenary talk, IEEE Winter Vision Meetings, 2013.
Abstract

In this talk, I will discuss the support that AMD hardware and software infrastructure can provide for developing applications in computer vision and related domains. This includes offering and supporting the OpenCL C++ bindings and the OpenCL C++ kernel language extension, the Bolt C++ template library for harnessing heterogeneous compute power, the OpenCL module developed for the industry-standard OpenCV library, and two university collaboration projects: content-based image retrieval and future computing architectures for simultaneous localization and mapping (SLAM). The presentation will also highlight the evolution of AMD discrete GPU and APU architecture designs and how AMD is working to increase programmability and ease domain scientists' access to this new level of compute resources.

Biomedical data analysis on heterogeneous platforms
D. P. Zhang. AMD Fusion Developer Summit, 2012.

PhD Thesis

Coronary artery segmentation and motion modelling
Dong Ping Zhang. Department of Computing, Imperial College London, UK, 2010.
Abstract

Conventional coronary artery bypass surgery requires an invasive sternotomy and the use of a cardiopulmonary bypass, which leads to a long recovery period and has high infectious potential. Totally endoscopic coronary artery bypass (TECAB) surgery based on image-guided robotic surgical approaches has been developed to allow clinicians to conduct the bypass surgery off-pump, with only three pin-hole incisions in the chest cavity through which two robotic arms and one stereo endoscopic camera are inserted. However, the restricted field of view of the stereo endoscopic images leads to possible vessel misidentification and coronary artery mis-localization. This results in 20-30% conversion rates from TECAB surgery to the conventional approach.

We have constructed patient-specific 3D + time coronary artery and left ventricle motion models from preoperative 4D Computed Tomography Angiography (CTA) scans. By temporally and spatially aligning this model with the intraoperative endoscopic views of the patient's beating heart, this work assists the surgeon in identifying and locating the correct coronaries during TECAB procedures, and thus has the prospect of reducing the conversion rate from TECAB to conventional coronary bypass procedures.

This thesis mainly focuses on designing segmentation and motion tracking methods for the coronary arteries in order to build pre-operative patient-specific motion models. Various vessel centreline extraction and lumen segmentation algorithms are presented, including intensity-based approaches, a geometric model matching method and a morphology-based method. A probabilistic atlas of the coronary arteries is formed from a group of subjects to facilitate the vascular segmentation and registration procedures. A non-rigid registration framework based on a free-form deformation model and multi-level multi-channel large deformation diffeomorphic metric mapping is proposed to track the coronary motion. The methods are applied to 4D CTA images acquired from various groups of patients and quantitatively evaluated.

Bibtex

@phdthesis{Zhang2010, author = {Zhang, Dong Ping}, title = {Coronary artery segmentation and motion modelling}, year = {2010}, school = {Department of Computing, Imperial College London, United Kingdom}, }

MSc Thesis

Constrained Optimization Techniques for Image Registration
Dong Ping Zhang. Department of Computing, Imperial College London, UK, 2006. Awarded with Distinction.
Abstract

Medical image registration problems seek the best transformations that minimize the cost function composed of similarity measurements. Normalized mutual information is used to measure the alignment of different modality images in this project. Optimization techniques are used to accelerate the automatic registration process. This thesis compares various optimization methods with respect to accuracy and robustness.

Five gradient-based and three gradient-free optimization methods are compared: nonlinear conjugate gradient DR, conjugate gradient HZDZ, conjugate gradient HSDY, Powell, quasi-Newton, downhill descent DR, Kiefer-Wolfowitz, and simultaneous perturbation stochastic approximation.

The performance of the methods is tested in two types of experiments: first, CT brain images are registered to MR brain images; second, PET and MR brain images are registered. The six MR-CT and four MR-PET data sets used in this project have manually registered deformations available as a gold standard. Special attention is paid to the accuracy and convergence speed of the optimization methods. The experiments show that the Powell optimizer is the best choice for both registration problems. All three conjugate gradient approaches achieve similar performance.
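The experimental setup can be illustrated with a 1-D toy registration driven by the Powell optimiser (negative correlation stands in for the normalized mutual information measure used in the thesis):

import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import shift

fixed = np.random.rand(128)
moving = shift(fixed, 7.5, mode='nearest')    # ground-truth offset of 7.5 samples

def cost(t):
    # Dissimilarity between the fixed image and the shifted-back moving image.
    return -np.corrcoef(fixed, shift(moving, -t[0], mode='nearest'))[0, 1]

res = minimize(cost, x0=[0.0], method='Powell')
print(res.x)                                  # should approach 7.5

Being gradient-free, Powell's method needs only cost evaluations, which suits similarity measures whose gradients are awkward to compute.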

Bibtex

@mastersthesis{Zhang2006, author = {Zhang, Dong Ping}, title = {Constrained Optimization Techniques for Image Registration}, year = {2006}, school = {Department of Computing, Imperial College London, United Kingdom}, }