talks and training

I speak at conferences, private industry gatherings, and other events. I also provide teaching/training sessions. My talks cover a variety of topics, but the unifying theme is solving business problems through practical application of data analysis and technology.

Would you like me to speak at your conference, company function, or other event?
Interested in on-site training for your team?

Please contact me. Services are available worldwide.


The following is a list of selected talks that I have delivered or will soon deliver. (Certain private talks are not disclosed here.) I’m also working on other talks that are not yet mentioned here.

Well, Now What? The world of data seen through the lens of risk, bubbles, and rhinos

(delivered July 2020 at CDx Summit)

The COVID-19 pandemic has induced stress on, among other things, business budgets. Companies are now reviewing every department in search of ways to trim spending. Does this mean the end of data science/ML/AI’s heyday of no-accountability, no-questions-asked money? If so, how do we recover?

In this talk, I explore the data world’s coming scrutiny through the lens of financial risk, stock market bubbles, and gray rhinos.

Time Series Analysis for Trading: It’s All About the Residuals

(delivered 2019 and 2020 to a graduate class in algorithmic trading)

Time series analysis is a branch of mathematical modeling that focuses on time-based phenomena. This is a cornerstone of electronic trading, as market data is expressed as time series of prices.

In this lecture, I explored the theory behind the when/why of univariate time series modeling (AR, MA, ARIMA), and used live code examples to demonstrate the what/how of the model-evaluate cycle (pandas, statsmodels).
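To give a flavor of the model-evaluate cycle, here is a minimal sketch that is not taken from the lecture itself. It uses only the Python standard library rather than pandas and statsmodels, and it assumes a simulated AR(1) series in place of real market data. The function names and parameter values are illustrative. The idea: fit a model, then inspect the residuals. If the residuals still show autocorrelation, the model has left structure on the table.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Generate an AR(1) series x[t] = phi * x[t-1] + noise (illustrative data only)."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def fit_ar1(series):
    """Estimate phi by least squares on (x[t-1], x[t]) pairs, with no intercept."""
    prev, curr = series[:-1], series[1:]
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; a value near zero suggests no leftover structure."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    numerator = sum(a * b for a, b in zip(centered[:-1], centered[1:]))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


# Model step: fit the AR(1) coefficient to the simulated series.
series = simulate_ar1(phi=0.7, n=2000, seed=42)
phi_hat = fit_ar1(series)

# Evaluate step: residuals are what the fitted model failed to explain.
residuals = [c - phi_hat * p for p, c in zip(series[:-1], series[1:])]

print(f"estimated phi: {phi_hat:.3f}")
print(f"lag-1 autocorrelation of raw series: {lag1_autocorrelation(series):.3f}")
print(f"lag-1 autocorrelation of residuals:  {lag1_autocorrelation(residuals):.3f}")
```

The raw series should show strong lag-1 autocorrelation, while the residuals should show very little. With real data, a residual check that fails is the cue to revisit the model.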

Professional Software Development: It’s More Than Just Code

(delivered 2018 and 2019 to an undergraduate computer science class)

It’s easy to think that being a software developer is all about writing code. That’s the focus of most computer science courses, and what’s most often mentioned in media coverage of the profession.

In reality, writing code is only a small part of being a successful and effective software developer. In this lecture, I walked students through several concrete, straightforward measures – “cheat codes,” as I called them – they could take to make the most of their career once they graduate.

DSS Podcast: Applications of Data Science in Media & Entertainment

(as host of the Data Science Salon Podcast)

I sat down with Harini Krishnan (Capsule8) and Ayan Battacharya (Deloitte Consulting) to explore data science in the world of media and entertainment.

DSS Podcast: Prolific vs. Private Data in Media Advertising

(as host of the Data Science Salon Podcast)

This was an extended conversation with Lauren Lombardo (Nielsen) and Sergey Fogelson (Viacom) on the impact of AI in advertising, and the ethics around certain practices.

Exploring Your Data / Generating Your Own Data

(delivered 2018 to a graduate class in algorithmic trading)

You’re a new analyst in the firm, and someone has just handed you a pile of data on which to build your models. What do you do?

In this lecture, I shared a step-by-step approach to exploring a new, unfamiliar dataset, and explained the realities of the troublesome data that students would see in the workplace. (This was a live demo, using Python and Jupyter Notebook.)

As bonus material, I demonstrated how students could generate their own data for model testing.
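As a rough illustration (a sketch, not the material from the live demo), the snippet below generates a small synthetic price dataset with deliberately planted problems, then runs a few first-pass checks against it. The column names, the symbol "TEST", and the specific checks are assumptions chosen for the example. Because the flaws are planted, you know in advance what the checks should find.

```python
import random
from collections import Counter


def make_messy_prices(n, seed=0):
    """Build synthetic daily price rows with deliberately planted problems."""
    rng = random.Random(seed)
    rows, price = [], 100.0
    for day in range(n):
        price *= 1.0 + rng.gauss(0.0, 0.01)  # small random daily move
        rows.append({"day": day, "symbol": "TEST", "close": round(price, 2)})
    rows[3]["close"] = None      # planted missing value
    rows[7]["close"] = -1.0      # planted impossible price
    rows.append(dict(rows[5]))   # planted duplicate row
    return rows


def profile(rows):
    """First-pass checks: row count, missing values, bad values, duplicate keys."""
    closes = [r["close"] for r in rows]
    present = [c for c in closes if c is not None]
    day_counts = Counter(r["day"] for r in rows)
    return {
        "rows": len(rows),
        "missing_close": closes.count(None),
        "non_positive_close": sum(1 for c in present if c <= 0),
        "duplicate_days": sorted(d for d, k in day_counts.items() if k > 1),
        "min_close": min(present),
        "max_close": max(present),
    }


report = profile(make_messy_prices(20, seed=1))
for name, value in report.items():
    print(f"{name}: {value}")
```

The same generator can feed a model under test: if the model or pipeline does not notice the planted problems, that is worth knowing before real data arrives.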

What is data science, really? (Getting past the hype and getting to business value.)

(delivered September 2017 to a group of executives as part of a private CollaborativeGain speaker series)

The business world is sold on the promise of data: whether you call it big data, data science, machine learning, or anything else, everyone seems hell-bent on using data to transform and improve their company. Once you push away the press hype and vendor pitches, though, there is little practical advice on how to get started. It’s no wonder that so many companies’ data efforts start off on the wrong path and ultimately derail with no return on investment.

The promise of data is indeed real. It doesn’t start with the alphabet soup of terminology, but with you, the leadership: you need to understand what is possible with data and develop a realistic plan. In this talk we’ll explore what “data science” really means, the people and processes you’ll need to make it work for you, and ways it can get you into hot water. You’ll get practical guidance from an industry veteran who has seen what works and what doesn’t.

Data Monetization: Turn Corporate Data into Revenue Streams

(delivered August 2017 at Dataversity)

Scan any number of financial industry news publications, and stories regarding Wall Street’s hunger for new data sets to improve alpha abound. While this might seem like a new trend, monetizing data - or the ability to turn corporate data into revenue streams - has existed for decades. But both the supply side and the demand side have changed. On the supply side, the extreme variety of data that now exists (location-based, geospatial, socio-demographic, online search trends, pricing, etc.) combines with high computing power and new digital requirements to create a fertile data market environment. On the demand side, to remain competitive, companies in a wide variety of industries, not just the financial sector, are leveraging data in all forms to maintain an edge or be disruptive.

During this session we’ll explore what data monetization is and the forms it can take; characteristics of data that could make it more valuable to external parties; and key considerations in making data products available to external parties. Intellectual property, data privacy, and contractual issues will also be explored.

Data Science, a strategist perspective

(as a guest on the Analytikus podcast)

I met up with Miguel Molina-Cosculluela, founder of Analytikus, to discuss the state of data science, how companies can make smart first steps in data, and how the field may change over time.

Beyond the Big Data Hype: Putting Analytics to Work

(keynote address – March 2016 at a private industry event)

At this private event, my co-presenter and I provided more than 100 executives with practical, industry-specific, actionable guidance on how to use data to improve their business standing. Our data maturity model served as a foundation for the talk, which explored business data initiatives from traditional business intelligence (BI) all the way through advanced reinforcement learning.

The Growing Emphasis on Leveraging IoT Data

(moderator – November 2015 at the ITA Internet of Things Summit)

As moderator, I led a discussion among four panelists on the business, social, and technological concerns of using Internet of Things (IoT) data in the enterprise. Panelists included representatives from the University of Chicago’s department of Computational Analysis and Public Policy, Zebra Technologies, IBM, and the Digital Manufacturing & Design Innovation Institute (DMDII).

Hadoop and Elastic MapReduce

(delivered January 2014 at DataPotluck)

Hadoop is a powerful tool for large-scale data analysis. That power comes with a hefty price tag: the cost of building and maintaining the underlying compute cluster can hinder Hadoop adoption in small and large firms alike.

In this talk, I’ll explain how to put Hadoop to work for you, and how to use Elastic MapReduce (EMR), the hosted Hadoop solution provided by Amazon Web Services. Learn how EMR can help you get Hadoop in a hurry and on the cheap, without the costly cluster commitment.

Building Your Analytics Shop, Step By Step

(delivered 2013/10/30 at Strata+Hadoop World 2013)

Also known as: Busting Myths About Building Your Analytics Shop.

Whether you call it “Big Data,” “data science,” or simply, “analytics,” the field has quickly become an integral part of business. There is plenty of technical guidance for the hands-on practitioner, but people in leadership roles – those who are responsible for setting a company’s direction and aligning analytics to business goals – are left with scant help beyond vendor marketing materials.

I delivered this talk with Brett Goldstein (@bjgol) at Strata+Hadoop World NYC 2013. It was based on our upcoming book, Making Analytics Work: Case by Case. We framed it as an exercise in busting myths about building an analytics practice.

History Lesson

(delivered 2013/09/12 at DataGotham)

One can draw several parallels between today’s data analysis fever and the IT boom of the 1990s. How do we learn from the IT fallout, to reap our rewards yet avoid the pitfalls?

Integrating R+Hadoop into Your Data Analytics Pipeline

(delivered 2013/08/10 at KDD 2013 Big Data Camp)

This was built on the “R+Hadoop” talk, described below, but with a focus on when to apply that strategy to the data analytics pipeline.

As an analytics tool, R strains under modern large-scale datasets. People have devised a number of ways to help R function in the big-data arena, one of which is to drive it with Hadoop. But how does this work, and when is it an appropriate solution? This talk will describe the what and the how of mixing R and Hadoop, and more importantly, the when and why of this approach.

Text-mining Your City

(first delivered 2012/10/25 at Strata+Hadoop World 2012)

I presented this talk with Brett Goldstein, Chicago’s chief data officer (CDO). We had collaborated on some social media analysis for civic good, and delivered a talk to explore what we learned.

Dealing with Bad Data

(first delivered 2012/02/29 at Strata Conference)

People often limit the definition of “bad data” to missing values or difficult formats. I say that “bad data” is so much more: it is any data that gets in the way. This can involve missing or hard-to-access datasets, inconsistencies, unexpected modifications in upstream data sources, and so on. If it derails your analysis effort, then it’s bad data in my book.

Dealing with Bad Data explores this notion of “bad data,” how it impacts an analysis effort, and how to address the problems it causes.

Machine Learning’s Impact on the Library

(first delivered 2011/11 to a graduate-level Information Management class in an MLIS program)

Quite a bit of library science involves categorizing and indexing content, to make it easy to find later, and librarians have been doing this since long before “Big Data” was a buzzword. That puts librarians among the world’s first data miners.

Technology advances have changed the types and volume of content librarians manage, as well as the expectations of information-seekers. Similarly, technology will play a strong role in the evolution of library science. In this talk, I explained how natural language processing (NLP) and machine learning would influence the next phase of how librarians classify and index content.

A Bit of R & Hadoop: Getting R to Dance With the Elephant (a.k.a., “R+Hadoop”)

(I have delivered this presentation at various meetups in the US, Canada, and England.)

The formal title for this talk is, A Bit of R & Hadoop: Getting R to Dance With the Elephant. It is sometimes listed under other titles, such as Mixing R and Hadoop: Large-Scale Data Analysis and Computations, though people mostly know it as my “R+Hadoop talk.”

In short: R is quite a useful tool, but it strains under new-age Big Data problems. One solution is to use Hadoop’s scalable, parallel computing framework to drive R. This talk explores the what, how, and why of getting R to dance with the elephant. It starts with an introduction to Hadoop/MapReduce, and wraps up with some tools to use Hadoop through R.

Elastic MapReduce: Hadoop in the Cloud

This presentation’s formal title is, Elephants in the Cloud, on the Cheap. People sometimes refer to it as my “EMR talk.”

(I also offer an extended version of this talk, suitable for a teaching/training session. Please see below.)

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.
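For readers new to the MapReduce model, here is a minimal single-process sketch of the classic word-count example. It is not Hadoop or EMR code, and the function names are illustrative. It only shows the three conceptual phases (map, shuffle, reduce) that a real cluster would spread across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


lines = ["the elephant dances", "the elephant in the cloud"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Each call to map_phase and reduce_phase depends only on its own input, which is what lets a framework such as Hadoop run many of them in parallel.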

training / teaching

I’m also working on other training materials that are not listed here. If you’d like training for your company on another topic, please let me know.

Introduction to R

R is a free, open-source, and powerful tool for data analysis. It has become even more popular in recent years as the analytics (“data science,” “Big Data”) field has grown.

That said, R’s command-line interface and differences from other programming languages can present a steep learning curve.

Would you like your team to use R? This hands-on training session provides a solid introduction to R basics, syntax, working with data, and graphics.

AWS: Elastic MapReduce: Hadoop in the Cloud

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.

AWS: Asynchronous Messaging Systems Using Amazon SQS

Done well, message-driven systems can support a variety of robust, scalable, and flexible applications. They also offer “free” concurrency under certain circumstances. (Contrary to popular belief, async messaging is not just for trading systems.)

This session explores asynchronous messaging concepts, the basics of message-driven systems, and how to implement them using the Amazon Web Services Simple Queue Service (SQS) as the middleware.
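The shape of a message-driven system can be sketched in a few lines. The example below is an in-process stand-in that uses Python's standard queue and threading modules instead of SQS, so it runs without an AWS account. The message fields and the squaring "work" are placeholders. With SQS, the producer and workers would typically be separate processes that send, receive, and delete messages through the service rather than sharing an in-memory queue.

```python
import queue
import threading


def worker(inbox, results, lock):
    """Consume messages until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            inbox.task_done()
            break
        outcome = message["value"] ** 2  # placeholder for real work
        with lock:
            results[message["id"]] = outcome
        inbox.task_done()


inbox = queue.Queue()
results, lock = {}, threading.Lock()

# Several identical workers read from the same queue.
workers = [
    threading.Thread(target=worker, args=(inbox, results, lock))
    for _ in range(3)
]
for thread in workers:
    thread.start()

# The producer only knows about the queue, not about the workers.
for i in range(10):
    inbox.put({"id": i, "value": i})
for _ in workers:
    inbox.put(None)  # one shutdown sentinel per worker

inbox.join()
for thread in workers:
    thread.join()
print(sorted(results.items()))
```

The producer never changes when workers are added or removed. That decoupling is one way to read the “free” concurrency mentioned above: scaling out means starting more consumers on the same queue.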