*march wb, xiao b, yu cd, biros g*

** askit: an efficient parallel library for high-dimensional kernel summations**

*siam journal on scientific computing, 2016 (in print)*

abstract

*xiao b, biros g*

** parallel algorithms for nearest neighbor search problems in high dimensions**

*siam journal on scientific computing, 2016 (in print)*

abstract

The library supports both exact and approximate nearest neighbor searches. The latter is based on iterative, randomized, and greedy KD-tree searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an all-neighbors query search on a 13 TB synthetic dataset of 0.8 billion points in 2,048 dimensions on the 131K cores on Oak Ridge’s XK6 “Jaguar” system. These results represent several orders of magnitude improvement over current state-of-the-art methods. Also, we apply our method to non-synthetic data from machine learning data repositories. For example, we perform an all-nearest-neighbor search on a variant of the ”MNIST” handwritten digit dataset with 8 million points in 784 dimensions on 16,384 cores of the ”Stampede” system at the Texas Advanced Computing Center, achieving less than one second per RKDT iteration.

*yu c, huang j, austin w, xiao b, biros g*

**perfomance optimization of the k-nearest neighbors kernel on x86 architectures**

*acm/ieee SC'15, austin tx, november 2015*

abstract

*march b, xiao b, tharakan s, yu c, biros g*

**a kernel independent FMM in general dimensions **

*acm/ieee SC'15, austin tx, november 2015*

abstract

*march b, xiao b, tharakan s, yu c, biros g*

**robust treecode approximation for kernel machines **

*Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD15), ACM, Sydney, Australia, August 2015*

abstract

for example the bandwidth for the Gaussian kernel. We consider two approximation methods: Nystrom and an algebraic treecode developed in our group. Nystrom methods construct a global low-rank approximation of the kernel matrix. Treecodes approximate just the off-diagonal blocks, typically using a hierarchical decomposition. We present a theoretical error analysis of our treecode and relate it to the error of Nystrom methods. Our analysis reveals how the block-rank structure of the kernel matrix controls the performance of the treecode. We evaluate our treecode by comparing it to the classical Nystrom method and a state-of-the-art fast approximate Nystrom method. We test the kernel matrix approximation accuracy for

several different bandwidths and datasets. On the MNIST2M dataset (2M points in 784 dimensions) for a Gaussian kernel with bandwidth $h=1$, the Nystrom methods' error is over 90\% whereas our treecode delivers error less than 1\%. We also test the performance of the three methods on binary classification using two models: a Bayes classifier and kernel ridge regression. Our evaluation reveals the existence of bandwidth values that should be examined in cross-validation but whose corresponding kernel matrices cannot be approximated well by Nystrom methods. In contrast, the treecode scheme performs much better for these values.

*march b, xiao b, tharakan s, yu c, biros g*

**An algebraic parallel treecode in arbitrary dimensions **

*Proceedings of the 29st IEEE International Parallel & Distributed Processing Symposium (IPDPS15), Hyderabad, India, May 2015*

abstract

We introduce novel parallel algorithms for ASKIT, derive complexity estimates, and demonstrate scalability on synthetic, scientific, and image datasets. In particular, we introduce a local essential tree construction that extends to arbitrary dimensions in a scalable manner. We introduce data transformations for memory locality and use GPU acceleration. We report results on the “Maverick” and “Stampede” systems at the Texas Advanced Computing Center. Our largest computations involve two billion points in 64 dimensions on 32,768 x86 cores and 8 million points in 784 dimensions on 16,384 x86 cores.

*gholami a, sundar h, malhotra d, biros g*

**FFT,FMM, or Multigrid? A comparative study of state-of-the-art Poisson solvers **

*Submitted for publication.*

abstract

*malhotra d, biros g*

**a distributed memory fast multipole method for volume potentials**

*Submitted for publication.*

abstract

parallelism to employ the Xeon Phi on the Stampede platform at the Texas Advanced Computing Center (TACC). We achieve about 600GFlop/s of double precision performance on a single node. Our largest run on Stampede took 3.5s on 16K cores for a problem with 18E+9 unknowns for a highly nonuniform particle distribution (corresponding to an effective resolution exceeding 2E+23 unknowns since we used 23 levels in our octree).

*moon l, long d, joshi s, tripathi v, xiao b, biros g*

**parallel algorithms for clustering and nearest neighbor search problems in high dimensions**

*acm/ieee scxy conference series, poster *

abstract

*rahimian a, lashuk i, veerapaneni s. k, aparna c, malhotra d, moon l, sampath r, shringarpure a, vetter j, vuduc r, zorin d, biros g*

**petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures**

*acm/ieee scxy conference series, pp. 1–11, 2010, (Gordon Bell Prize)*

abstract

*veerapaneni sk, rahimian a, biros g, zorin d*

**a fast algorithm for simulating vesicle flows in three dimensions**

*in review, pp. 1–40, 2010*

abstract

*chaillat s, biros g*

**FaIMS: a fast algorithm for the inverse medium problem with multiple frequencies and multiple sources for the scalar Helmholtz equation**

*Journal of Computational Physics, 231(20), pp. 4403 - 4421, 2012*

abstract

*ghilliotti g, rahimian a, biros g, misbah c*

**vesicle migration and spatial organization driven by flow line curvature**

*physical review letters, in press, pp. 1–4, 2010*

abstract

*gooya a, biros g, davatzikos c*

**deformable registration of glioma images using an EM algorithm and diffusion-reaction modeling **

*IEEE Transcations in Medical Imaging, in press, pp. 1–15, 2010*

abstract

*sampath rs, biros g*

**parallel elastic registration using a multigrid preconditioned Gauss-Newton-Krylov solver, grid continuation and octrees **

*in review, pp. 1–30, 2010*

abstract

*adavani s, biros g*

**fast algorithms for inverse problems with parabolic PDE constraints**

*in review, pp. 1–15, 2010*

abstract

*rahimian a, veerapaneni sk, biros g*

**dynamic simulation of locally inextensible vesicles suspended in an arbitrary two-dimensional domain, a boundary integral method**

*journal of computational physics, pp. 6466–6484, (229) 2010*

abstract

*sampath rs, biros g*

**a parallel geometric multigrid method for finite elements on octree meshes**

*siam journal on scientific computing, 32(3), pp.1361–1392, 2010*

abstract

*aparna c, williams s,oliker l, lashuk i, biros g, vuduc r*

**optimizing and tuning the fast multipole method for state-of-the-art multicore architectures**

*ieee proceedings of ipdps, pp. 1–15, 2010*

abstract

*kaoui b, biros g, misbah c*

**why do red blood cells have asymmetric shapes even in a symmetric flow?**

*physical refview laters, (103), 2009*

abstract

*lashuk i, aparna c, langston h, nguyen ta, sampath ra, shringarpure a, vuduc r, ying l, zorin d, biros g*

**a massively parallel adaptive fast-multipole method on heterogeneous architectures**

*acm/ieee scxy conference series, pp. 1–11, 2009*

abstract

sampath s.s, adavani s.s, sundar h, lashuk i, and biros g

dendro: parallel algorithms for multigrid and amr methods on 2:1 balanced octrees

acm/ieee scxy conference series, 2008

veerapaneni s.k, raj r, biros g, and purohit p.k

analytical and numerical solutions for shapes of quiescent 2D vesicles

international journal of nonlinear mechanics, 2008

sundar h, davatzikos c, and biros g

biomechanically-constrained 4D estimation of myocardial motion

submitted for publication, 2008

veerapaneni s.k, geuyffier d, zorin d, and biros g

a boundary integral method for simulating the dynamics of inextensible vesicles suspended in a viscous fluid in 2D

journal of computational physics, 2009

biros g and dogan g

a multilevel algorithm for inverse problems with elliptic PDE constraints

inverse problems , 2008

veerapaneni s.k and biros g

the Chebyshev fast Gauss and nonuniform fast Fourier transforms and their application to the evaluation of distributed heat potentials

journal of computational physics, 2008

sundar h, sampath r.s, adavani s.s, davatzikos c, and biros g

low-constant parallel algorithms for finite element simulations using linear octrees

acm/ieee scxy conference series, 2007

sundar h, sampath r.s, and biros g

bottom-up construction and 2:1 balance refinement of linear octrees in parallel

siam journal on scientific computing, 2007

adavani s.s and biros g

multigrid algorithms for inverse problems with linear parabolic pde constraints

siam journal on scientific computing, 2007

hogea c, davatzikos c, and biros g

an image-driven parameter estimation problem for a reaction-diffusion glioma growth model with mass effects

journal of mathematical biology, 2007

veerapaneni s.k. and biros g

a fast high-order integral equation solver for the heat equation with moving boundaries in 1d

siam journal on scientific computing, 2007

ying l, biros g, and zorin d

a high-order 3d boundary integral equation solver for elliptic PDEs with smooth boundaries

journal of computational physics, 2006

akcelik v, biros g, draganescu a, ghattas o, hill j, and van bloemen waanders b

dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants

acm/ieee scxy conference series, 2005

ying l, biros g, zorin d, and harper l

a new parallel kernel-independent fast multipole method

acm/ieee scxy conference series, 2003

ying l. biros g, and zorin d

a kernel-independent adaptive fast multipole method in two and three dimensions

journal of computational physics, 2004

akcelik v, bielak j, biros g, epanomeritakis i, fernandez a, ghattas o, kim e.j. , o'hallaron d, and tu t

high-resolution forward and inverse earthquake modeling on terascale computers

acm/ieee scxy conference series, 2003

akcelik v, biros g, and ghattas o.

parallel multiscale Gauss-Newton-Krylov methods for inverse wave propagation

acm/ieee scxy conference series, 2002

biros g, ying l, and zorin d.

an embedded boundary integral equation solver for the unsteady incompressible Navier-Stokes equations

submitted 2004

akcelik v, biros g, ghattas o,long k. r. and van bloemen waanders b

a variational finite element method for source inversion for convective-diffusive transport

finite elements in analysis and design, 2002

biros g, ying l. and zorin d.

an embedded boundary integral equation solver for the Stokes equations

journal of computational physics, 2004

biros g, ying l. and zorin d.

the embedded boundary integral equation solver for the incompressible Navier-Stokes equations

international association for boundary element methods symposium, 2002

biros g. and ghattas o

inexactness issues in LNKS algorithms for PDE-constrained optimization

springer's lecture notes in computational science and engineering, 2002

biros g. and ghattas o.

parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization.

part i: the Krylov-Schur solver

siam journal on scientific computing, 2005

biros g. and ghattas o.

parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization.

part ii: the Lagrange-Newton solver, and its application to optimal control of steady viscous flows

siam journal on scientific computing, 2005

biros g. and ghattas o.

a Lagrange-Newton-Krylov-Schur method for PDE-constrained optimization

SIAG/OPT news and views,2000

biros g. and ghattas o

parallel domain decomposition methods for optimal control of viscous incompressible flows

parallel computational fluid mechanics, 1999

biros g. and ghattas o

parallel Newton-Krylov algorithms for PDE-constrained optimization

acm/ieee scxy conference series,1999

biros g, kallivokas l. f, ghattas o, jaramaz b

direct ct-scan to finite element modeling using a 3d fictitious domain method with an application to biomechanics

fourth u.s national congress on computational mechanics, 1997

biros g.

2d contour smoothing and surface reconstruction of tubular anatomical structures

ms thesis (biomedical engineering), 1996

biros g, dimitropoulos d, and polymenidis p

expert system development for machinery maintenance using temperature and vibration monitoring

diploma thesis (mechanical engineering)

aristotle university, 1995