This page includes select material from formal print and electronic publications.

It does not include my blog, my newsletters (Complex Machinery and Block & Mortar), or my one-pager guides (“Will AI Help Here?” and “How Do I Do AI?”).

Understanding Patterns of Disruption

I paired up with longtime co-author Ken Gleason to explore how shifts in the technology landscape have opened (or even reopened) the door to disruptive, market-shifting business models. In this paper, we mix technology, economics, and markets to show how to spot a potential business disruptor and how to make the most of it.

On Leadership

Many people move from hands-on technical roles into leadership positions without formal management training. They often learn the hard way that being an engineering manager or analytics lead is not a natural progression of their technical skill set; instead, it requires developing a very different kind of muscle. In this O'Reilly Radar piece, a colleague and I offer guidance for newly minted leaders and leaders-to-be.

Business Models for the Data Economy

The recent surge in data collection and analysis opens up a number of business models, though only a couple of them get much attention. Business Models for the Data Economy explores eight ways to add value and generate revenue in the world of data.

This O'Reilly Radar blog post, “Building a Business on Data,” describes the paper in greater detail. You can also download the paper for free through the O'Reilly catalog.

Steering the ship that is data science

This is the second in a set of O'Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We explore some parallels between today’s data science boom and the late-1990s tech boom. In particular, we ask: how can data science reap the rewards of being the Hot New Thing while avoiding its pitfalls?

Leading Indicators

This is the first in a set of O'Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We consider how to size up an organization’s data science efforts from the outside, perhaps as a prospective interview candidate.

Bad Data Handbook: Mapping the World of Data Problems

A road map of data problems and solutions. This book describes various real-world data problems, from the hands-on technical grunt work to the high-level strategic issues.

I was the book's editor, which means I was responsible for developing the concept and leading the project. I supported and coordinated the efforts of nineteen contributing authors. I also co-wrote a chapter, “Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough.”

Parallel R: Data Analysis in the Distributed World

Parallel R describes strategies for getting R to work in the Big Data era. In other words, Stephen Weston and I explain how to work past R's limitations (it is memory-bound and single-threaded) and let R work in a parallel, distributed manner suited to modern datasets.

The book covers well-known R packages for parallelism (Snow, Multicore, Parallel) as well as newer, Hadoop-related tools (RHIPE, Segue, Hadoop streaming). Much of my contribution explores how to mix R and Hadoop.

Managing RPM-Based Systems with Kickstart and Yum

An exploration of automated builds and systems management, using the Red Hat Kickstart and yum tools.

APR Networking & the Reactor Pattern

An introduction to Apache Portable Runtime (APR) networking. I use the classic Reactor pattern as an example.
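The core of the Reactor pattern is a loop that waits for I/O readiness on many handles at once, then dispatches each ready handle to the handler registered for it. As a minimal sketch of that idea (using Python's standard selectors module rather than the APR C API the article covers), where the Reactor class and its method names are illustrative assumptions:

```python
import selectors
from typing import Callable, Optional


class Reactor:
    """Minimal Reactor: wait for readiness events on many sockets and
    dispatch each one to the handler registered for it."""

    def __init__(self) -> None:
        self._selector = selectors.DefaultSelector()

    def register(self, sock, handler: Callable) -> None:
        # The handler rides along as the key's data, so dispatch is a lookup.
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def unregister(self, sock) -> None:
        self._selector.unregister(sock)

    def run_once(self, timeout: Optional[float] = None) -> int:
        # Demultiplex: block until at least one socket is readable (or timeout).
        events = self._selector.select(timeout)
        # Dispatch: hand each ready socket to its registered handler.
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)
```

A real server would call run_once() in a loop and register a listening socket whose handler accepts each connection and registers the new socket in turn.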

What Is Jetty

A page from the O'Reilly “What Is” series, this article describes the Jetty servlet container and its underlying API. Jetty is designed with embedding in mind; that is, you can add webapp (servlet, JSP, web services) functionality to a Java application without having to repackage it as a formal WAR.

GNU Autoconf

Use GNU Autoconf to simplify cross-platform builds of your native-code apps. Familiar with the standard ./configure; make; make install routine? Autoconf is what generates that ./configure script.

App-Managed DataSources with commons-dbcp

I'm all for standards, such as J2EE's container-managed database connection pooling. Sometimes, though, you have to take a different path. This article explains how to create a database connection pool inside your application using two Jakarta libraries, commons-pool and commons-dbcp.
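The idea behind an app-managed pool is simple: open a fixed number of connections up front, lend one out per unit of work, and return it to the pool instead of closing it. A minimal sketch of that idea, using Python's queue module and a caller-supplied connection factory in place of commons-pool and commons-dbcp (the ConnectionPool class and its methods are hypothetical, not the Jakarta API):

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class ConnectionPool:
    """Hypothetical app-managed pool: opens `size` connections up front
    and lends them out one at a time."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[Any]:
        # Borrow: blocks (up to timeout) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool rather than closing it.
            self._idle.put(conn)

    def available(self) -> int:
        # Number of idle connections currently in the pool.
        return self._idle.qsize()
```

Because Queue.get() blocks when every connection is checked out, the pool size also caps how many concurrent database sessions the app can open.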

Processing XML with Xerces and SAX

Second in a two-part series, this article explains how to use the SAX side of the (Apache) Xerces C++ library to process XML documents.

The Perl-Compatible Regular Expressions Library

Want the power of Perl's regular expressions (regexps) in your C and C++ apps? Use the Perl-Compatible Regular Expressions Library, or PCRE.

Processing XML with Xerces and the DOM

First in a two-part series, this article explains how to use the DOM side of the (Apache) Xerces C++ library to process XML documents.

Simplify Network Programming with libCURL

The curl command-line tool is a Swiss Army knife of URL handling and downloading. Use its backend library, libCURL, to add file-transfer power to your native-code applications.

Pre-Patched Kickstart Installs

Third in a series, this article explains how to create a pre-patched Kickstart tree (that is, one with the updates already applied) and add some change control to your yum cronjobs.

Custom Containers & Iterators for STL-Friendly Code

Many C++ STL container objects look and act alike, but they don't share a parent class. Learn how to extend existing containers or create new ones using STL's “concepts,” a kind of loosely enforced polymorphism.

The Watchful Eye of FAM

Watching for changes in a file or directory? Calling poll() can be expensive. Let the File Alteration Monitor, or FAM, watch for you and report results to your code.

Advanced Linux Installations and Upgrades with Kickstart

Second in a series, this article shows how to customize your Kickstart process and leverage Kickstart for OS upgrades.

Migrating to Page Controllers

Use the Page Controller pattern in your PHP web applications to separate business logic from the HTML.
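The articles use PHP, but the split is language-neutral: one controller per page does the business logic, and a separate view holds only markup. A minimal sketch in Python, where the order-summary page, ORDER_VIEW, and order_page() are hypothetical examples:

```python
from string import Template
from typing import List, Tuple

# View: presentation markup only, no business logic (hypothetical page).
ORDER_VIEW = Template("<h1>Order $order_id</h1><p>$count item(s), total $total</p>")


def order_page(order_id: int, lines: List[Tuple[float, int]]) -> str:
    """Page controller for the order-summary page: compute the values,
    then hand plain data to the view for rendering."""
    # Business logic lives here, not in the template.
    count = sum(qty for _price, qty in lines)
    total = sum(price * qty for price, qty in lines)
    return ORDER_VIEW.substitute(order_id=order_id, count=count, total=f"{total:.2f}")
```

Changing the page's look means editing ORDER_VIEW alone; changing the pricing rules means editing order_page() alone.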

Hands-Off Fedora Installs with Kickstart

First in a series, this article is an introduction to the Kickstart automated OS-install tool for Linux. Why click through the installer a few (hundred) times? For Red Hat, Fedora, CentOS, and other RPM-based Linux distros, let Kickstart do the work so you can hang out at the pub.

Building a PHP Front Controller

Apply the Front Controller design pattern to your PHP apps, and in return you'll get a single entry point through which to apply common services (such as security or page templating).
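A front controller receives every request and applies common services before dispatching to a per-page handler. A sketch of that shape in Python (the article itself uses PHP; the route table, the user argument standing in for authentication, and the status strings are assumptions for illustration):

```python
from typing import Callable, Dict, Optional

Handler = Callable[[dict], str]


class FrontController:
    """Single entry point: every request goes through handle()."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def route(self, path: str, handler: Handler) -> None:
        self._routes[path] = handler

    def handle(self, path: str, user: Optional[str]) -> str:
        # Common service: every request passes the same security check.
        if user is None:
            return "403 Forbidden"
        handler = self._routes.get(path)
        if handler is None:
            return "404 Not Found"
        # Common service: every page gets the same template wrapper.
        return "<html><body>" + handler({"user": user}) + "</body></html>"
```

Individual handlers stay small because security and templating live in one place.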

Programming Linux 2.6

A review of the developer-oriented features in Linux kernel 2.6.

Changing a Program's Identity

Learn how to safely use the setuid() and setgid() system calls to make your app change its identity at runtime.
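The subtle part of changing identity is the order of the calls: a root process must shed its supplementary groups and switch its group ID before it switches its user ID, because once it is no longer root it loses permission to do so. A minimal sketch using the thin wrappers in Python's os module (Unix only; drop_privileges() is a hypothetical helper name):

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Permanently switch this process to uid/gid (Unix only)."""
    if os.geteuid() == 0:
        # Clear root's supplementary groups while we still may.
        os.setgroups([])
    # Group first: after the user ID leaves root, we can no longer change it.
    os.setgid(gid)
    # As root, setuid() sets the real, effective, and saved user IDs.
    os.setuid(uid)
    # Verify: if we gave up root, regaining it must now fail.
    if uid != 0:
        try:
            os.setuid(0)
        except PermissionError:
            return
        raise RuntimeError("privileges were not fully dropped")
```

The final check is cheap insurance: if the saved user ID still held root, the process could quietly take its privileges back.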

Writing a Trace System

You can't always use a debugger in production! Add a configurable trace (logging) system to your app so you can track down problems at runtime.
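A trace system can be as small as a severity threshold plus an output stream, with the threshold set from outside the program so you can turn up verbosity in production without a debugger or a rebuild. A minimal sketch, where the Tracer class, the level names, and the APP_TRACE environment variable are assumptions rather than the article's design:

```python
import os
import sys
from enum import IntEnum
from typing import TextIO


class Level(IntEnum):
    ERROR = 1
    WARN = 2
    INFO = 3
    DEBUG = 4


class Tracer:
    """Configurable trace facility: messages more verbose than the
    configured level are dropped."""

    def __init__(self, level: Level = Level.WARN, out: TextIO = sys.stderr) -> None:
        self.level = level
        self.out = out

    @classmethod
    def from_env(cls, var: str = "APP_TRACE") -> "Tracer":
        # Hypothetical variable, e.g. APP_TRACE=debug; unknown values fall back to WARN.
        name = os.environ.get(var, "WARN").upper()
        return cls(Level[name] if name in Level.__members__ else Level.WARN)

    def trace(self, level: Level, msg: str) -> None:
        if level <= self.level:
            self.out.write(f"[{level.name}] {msg}\n")
```

Because level is an ordinary attribute, the app could also raise it at runtime (say, from an admin command) and lower it again once the problem is found.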

Software Packaging with RPM

The RPM is the unit of measurement in Red Hat Linux and its derivatives (Fedora, CentOS, and so on). Learn how to package your software as an RPM so you can take advantage of the OS's package management system.