talks and training

I speak at conferences, private industry gatherings, and other events. I also provide teaching/training sessions. My talks cover a variety of topics, but the unifying theme is solving business problems through practical application of data analysis and technology.

Would you like me to speak at your conference, company function, or other event?
Interested in on-site training for your team?

Please contact me. Services are available worldwide.


The following is a list of selected talks that I have delivered or will soon deliver. (Certain private talks are not disclosed here.) I’m also working on other talks that are not yet mentioned here.

Be Friends, not Frenemies. AI and Educators: 5 Steps to Take Now

(delivered October 2023, to an audience of educators and administrators)

Following up on the February webinar, “ChatGPT In Education,” I met with Michael Manley (CTO, ThinkCERCA) and Daniel Rivera (Technology Director for First District RESA) to talk about AI in education.

Generative AI is poised to be around for a while, so it’s in educators’ best interests to put it to good use. Our talk explored new developments in the generative AI landscape, and offered five ways teachers and students can responsibly use generative AI in classroom settings.

ChatGPT In Education

(delivered February 2023, to an audience of educators and administrators)

Almost overnight, the name “ChatGPT” seems to be everywhere. Now that people have a tool that can generate reams of human-readable (though, not always correct or appropriate) text on-demand, what does that mean for educators? What’s a teacher to do when students can have a machine write their essays for them?

I delivered this talk with ThinkCERCA CTO Michael Manley. We explained what AI really is, how generative AI models work, and what ChatGPT can(not) do. We wrapped up with three things educators can do right now when it comes to ChatGPT, plus three patterns to avoid.

What’s next for AI?

(delivered December 2022, at a private gathering of experienced technology leaders)

The field we now call “AI” started around 2009, when predictive analytics and Big Data started getting traction. More than a decade later, the field has certainly changed: we have new tools and new business use cases for this powerful technology. Then again, it’s very much the same: it is still all about, as I like to say, analyzing data for fun and profit.

This talk explores what we can expect for the next decade of AI.

(Some of the themes therein stem from the “Rebranding Data” article I published on O’Reilly Radar.)

ML/AI for Technical Leaders: Automating and Outsourcing ML

(delivered August 2022, at private gatherings for technical leaders: CTOs, VPs of Engineering, and so on)

This is the second talk in a series for technical leaders, to help them understand the work their company’s data scientists do and how to make the company more effective with ML/AI.

The tools for ML/AI continue to improve year after year. Companies are able to derive even greater benefit from this technology as a result. Automated machine learning (autoML) tools stand to reduce the time, effort, and money involved in developing ML models. They may also lead data scientists to question their purpose, and perhaps even their employment.

In this talk, I explore why autoML is such a powerful opportunity, describe the current state of the tools available, and explain how to broach this topic with your company’s data scientists and machine learning engineers.

(Some of the ideas in this talk relate to the article “Automating the Automators: Shift Change in the Robot Factory” which I published on O’Reilly Radar.)

ML/AI for Technical Leaders: Talking the Talk (A crash-course in two AI dialects)

(delivered October and November 2021, at private gatherings for technical leaders: CTOs, VPs of Engineering, and so on)

As a CTO or Director of Engineering, you’re in an interesting spot: either you’re getting pressure from stakeholders to “do AI” (whatever that means) or you’re trying to make sense of what your data science team is asking of you. And then there are the incessant calls from vendors who just swear that their latest AI-flavored offering will cure all that ails you. For a hefty fee, of course.

AI is a big field and there’s a lot to learn. The first step to making sense of all this is learning how to talk the talk. In this talk, I explain what AI really is, how to guide your stakeholders down the road to effective AI, and how to “speak machine learning” with your data scientists. You’ll also get better at evaluating the vendor pitches.

Well, Now What? The world of data seen through the lens of risk, bubbles, and rhinos

(delivered July 2020 at CDx Summit)

The COVID-19 pandemic has induced stress on, among other things, business budgets. Companies are now reviewing every department in search of ways to trim spending. Does this mean the end of data science/ML/AI’s heyday of no-accountability, no-questions-asked money? If so, how do we recover?

In this talk, I explore the data world’s coming scrutiny through the lens of financial risk, stock market bubbles, and gray rhinos.

Time Series Analysis for Trading: It’s All About the Residuals

(delivered 2019 and 2020 to a graduate class in algorithmic trading)

Time series analysis is a branch of mathematical modeling that focuses on time-based phenomena. This is a cornerstone of electronic trading, as market data is expressed as time series of prices.

In this lecture, I explored the theory behind the when/why of univariate time series modeling (AR, MA, ARIMA), and used live code examples to demonstrate the what/how of the model-evaluate cycle (pandas, statsmodels).
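
As a rough illustration of the "residuals" idea in the title, here is a minimal sketch of one pass through a model-evaluate cycle. It is not taken from the lecture, which used pandas and statsmodels; this version sticks to the Python standard library, fits only an AR(1) model by least squares, and runs on simulated data. The function names, the coefficient 0.7, and the seed are all assumptions made for the example.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Generate n points of x[t] = phi * x[t-1] + noise (made-up test data)."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def fit_ar1(series):
    """Estimate phi by least squares (no intercept): regress x[t] on x[t-1]."""
    prev, curr = series[:-1], series[1:]
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


def residuals(series, phi):
    """One-step-ahead errors: the part of the series the model did not explain."""
    return [c - phi * p for p, c in zip(series[:-1], series[1:])]


def lag1_autocorr(values):
    """Lag-1 autocorrelation; a value near zero suggests little structure is left."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    return sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)


series = simulate_ar1(phi=0.7, n=2000, seed=42)
phi_hat = fit_ar1(series)
resid = residuals(series, phi_hat)

# Compare the structure in the raw series with what remains in the residuals.
print(round(phi_hat, 2), round(lag1_autocorr(series), 2), round(lag1_autocorr(resid), 2))
```

If the model fits, the residuals should look like noise: the raw series is strongly autocorrelated, while the residual autocorrelation should sit close to zero. Residuals that still show structure would suggest the model is missing something.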

Professional Software Development: It’s More Than Just Code

(delivered 2018 and 2019 to an undergraduate computer science class)

It’s easy to think that being a software developer is all about writing code. That’s the focus of most computer science courses, and what’s most often mentioned in media coverage of the profession.

In reality, writing code is only a small part of being a successful and effective software developer. In this lecture, I walked students through several concrete, straightforward measures – “cheat codes,” as I called them – they could take to make the most of their career once they graduate.

DSS Podcast: Applications of Data Science in Media & Entertainment

(as host of the Data Science Salon Podcast)

I sat down with Harini Krishnan (Capsule8) and Ayan Battacharya (Deloitte Consulting) to explore data science in the world of media and entertainment.

DSS Podcast: Prolific vs. Private Data in Media Advertising

(as host of the Data Science Salon Podcast)

This was an extended conversation with Lauren Lombardo (Nielsen) and Sergey Fogelson (Viacom) on the impact of AI in advertising, and the ethics around certain practices.

Exploring Your Data / Generating Your Own Data

(delivered 2018 to a graduate class in algorithmic trading)

You’re a new analyst in the firm, and someone has just handed you a pile of data on which to build your models. What do you do?

In this lecture, I shared a step-by-step approach to exploring a new, unfamiliar dataset while explaining the realities of the troublesome data students would see in the workplace. (This was a live demo, using Python and Jupyter Notebook.)

As bonus material, I demonstrated how they could generate their own data for model testing.
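
One way to picture generating data for model testing is the sketch below. It is an assumption-laden example rather than the lecture's actual demo: it builds a synthetic price series whose log-returns are Gaussian with a known drift and volatility, then checks that simple estimates recover those known values. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def make_prices(n, start=100.0, drift=0.0005, vol=0.01, seed=0):
    """Hypothetical generator: n prices whose log-returns are Gaussian(drift, vol)."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(drift, vol)))
    return prices


def log_returns(prices):
    """Log-return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]


prices = make_prices(n=5000, seed=7)
rets = log_returns(prices)

# Because we chose the true parameters, we can see how close the estimates land.
est_drift = statistics.mean(rets)
est_vol = statistics.stdev(rets)
print(round(est_drift, 4), round(est_vol, 4))
```

The appeal of synthetic data is that the right answer is known in advance, so a model or estimator that fails to recover it has a problem that real data might have hidden.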

What is data science, really? (Getting past the hype and getting to business value.)

(delivered September 2017 to a group of executives as part of a private CollaborativeGain speaker series)

The business world is sold on the promise of data: whether you call it big data, data science, machine learning, or anything else, everyone seems hell-bent on using data to transform and improve their company. Once you push away the press hype and vendor pitches, though, there is little practical advice on how to get started. It’s no wonder that so many companies’ data efforts start off on the wrong path and ultimately derail with no return on investment.

The promise of data is indeed real. It doesn’t start with the alphabet soup of terminology, but with you, the leadership: you need to understand what is possible with data and develop a realistic plan. In this talk we’ll explore what “data science” really means, the people and processes you’ll need to make it work for you, and ways it can get you into hot water. You’ll get practical guidance from an industry veteran who has seen what works and what doesn’t.

Data Monetization: Turn Corporate Data into Revenue Streams

(delivered August 2017 at Dataversity)

Scan any number of financial industry news publications, and stories regarding Wall Street’s hunger for new data sets to improve alpha abound. While this might seem like a new trend, monetizing data - or the ability to turn corporate data into revenue streams - has existed for decades. But both the supply side and the demand side have changed. On the supply side, the extreme variety of data that now exists (location-based, geospatial, socio-demographic, online search trends, pricing, etc.) combines with high computing power and new digital requirements to create a fertile data market environment. On the demand side, to remain competitive, companies in a wide variety of industries, not just the financial sector, are leveraging data in all forms to maintain an edge or be disruptive.

During this session we’ll explore what data monetization is and the forms it can take; characteristics of data that could make it more valuable to external parties; and key considerations in making data products available to external parties. Intellectual property, data privacy, and contractual issues will also be explored.

Data Science, a strategist perspective

(as a guest on the Analytikus podcast)

I met up with Miguel Molina-Cosculluela, founder of Analytikus, to discuss the state of data science, how companies can make smart first steps in data, and how the field may change over time.

Beyond the Big Data Hype: Putting Analytics to Work

(keynote address – March 2016 at a private industry event)

At this private event, my co-presenter and I provided more than 100 executives with practical, industry-specific, actionable guidance on how to use data to improve their business standing. Our data maturity model served as a foundation for the talk, which explored business data initiatives from traditional business intelligence (BI) all the way through advanced reinforcement learning.

The Growing Emphasis on Leveraging IoT Data

(moderator – November 2015 at the ITA Internet of Things Summit)

As moderator, I led four panelists in a discussion of the business, social, and technological concerns of using Internet of Things (IoT) data in the enterprise. Panelists included representatives from the University of Chicago’s department of Computational Analysis and Public Policy, Zebra Technologies, IBM, and the Digital Manufacturing & Design Innovation Institute (DMDII).

Hadoop and Elastic MapReduce

(delivered January 2014 at DataPotluck)

Hadoop is a powerful tool for large-scale data analysis. That power comes with a hefty price tag: the cost of building and maintaining the underlying compute cluster can hinder Hadoop adoption in small and large firms alike.

In this talk, I’ll explain how to put Hadoop to work for you, and how to use Elastic MapReduce (EMR), the hosted Hadoop solution provided by Amazon Web Services. Learn how EMR can help you get Hadoop in a hurry and on the cheap, without the costly cluster commitment.

Building Your Analytics Shop, Step By Step

(delivered 2013/10/30 at Strata+Hadoop World 2013)

Also known as: Busting Myths About Building Your Analytics Shop.

Whether you call it “Big Data,” “data science,” or simply, “analytics,” the field has quickly become an integral part of business. There is plenty of technical guidance for the hands-on practitioner, but people in leadership roles – those who are responsible for setting a company’s direction and aligning analytics to business goals – are left with scant help beyond vendor marketing materials.

I delivered this talk with Brett Goldstein (@bjgol) at Strata+Hadoop World NYC 2013. It was based on our upcoming book, Making Analytics Work: Case by Case. We framed it as an exercise in busting myths about building an analytics practice.

History Lesson

(delivered 2013/09/12 at DataGotham)

One can draw several parallels between today’s data analysis fever and the IT boom of the 1990s. How do we learn from the IT fallout, to reap our rewards yet avoid the pitfalls?

Integrating R+Hadoop into Your Data Analytics Pipeline

(delivered 2013/08/10 at KDD 2013 Big Data Camp)

This was built on the “R+Hadoop” talk, described below, but with a focus on when to apply that strategy to the data analytics pipeline.

As an analytics tool, R strains under modern large-scale datasets. People have devised a number of ways to help R function in the big-data arena, one of which is to drive it with Hadoop. But how does this work, and when is it an appropriate solution? This talk will describe the what and the how of mixing R and Hadoop, and more importantly, the when and why of this approach.

Text-mining Your City

(first delivered 2012/10/25 at Strata+Hadoop World 2012)

I presented this talk with Brett Goldstein, Chicago’s chief data officer (CDO). We had collaborated on some social media analysis for civic good, and delivered a talk to explore what we learned.

Dealing with Bad Data

(first delivered 2012/02/29 at Strata Conference)

People often limit the definition of “bad data” to missing values or difficult formats. I say that “bad data” is so much more: it is any data that gets in the way. This can involve missing or hard-to-access datasets, inconsistencies, unexpected modifications in upstream data sources, and so on. If it derails your analysis effort, then it’s bad data in my book.

Dealing with Bad Data explores this notion of “bad data,” how it impacts an analysis effort, and how to address the problems it causes.

Machine Learning’s Impact on the Library

(first delivered 2011/11 to a graduate-level Information Management class in an MLIS program)

Quite a bit of library science involves categorizing and indexing content, to make it easy to find later, and librarians have been doing this since long before “Big Data” was a buzzword. That puts librarians among the world’s first data miners.

Technology advances have changed the types and volume of content librarians manage, as well as the expectations of information-seekers. Similarly, technology will play a strong role in the evolution of library science. In this talk, I explained how natural language processing (NLP) and machine learning would influence the next phase of how librarians classify and index content.

A Bit of R & Hadoop: Getting R to Dance With the Elephant (a.k.a., “R+Hadoop”)

(I have delivered this presentation at various meetups in the US, Canada, and England.)

The formal title for this talk is, A Bit of R & Hadoop: Getting R to Dance With the Elephant. It is sometimes listed under other titles, such as Mixing R and Hadoop: Large-Scale Data Analysis and Computations, though people mostly know it as my “R+Hadoop talk.”

In short: R is quite a useful tool, but it strains under new-age Big Data problems. One solution is to use Hadoop’s scalable, parallel computing framework to drive R. This talk explores the what, how, and why of getting R to dance with the elephant. It starts with an introduction to Hadoop/MapReduce, and wraps up with some tools to use Hadoop through R.

Elastic MapReduce: Hadoop in the Cloud

This presentation’s formal title is, Elephants in the Cloud, on the Cheap. People sometimes refer to it as my “EMR talk.”

(I also offer an extended version of this talk, suitable for a teaching/training session. Please see below.)

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.
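
For readers new to the MapReduce concepts mentioned above, the following is a minimal, single-process sketch of the map, shuffle, and reduce phases, using the customary word-count example. It does not use Hadoop or EMR, and it is not material from the talk; the function names and sample input are invented for illustration. On a real cluster, the framework would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the elephant dances", "the cloud is cheap", "the elephant"])
print(counts)
```

Each mapper sees only its own line and each reducer sees only one key's values, so neither needs to know about the rest of the data. That independence is what lets the work be spread across a cluster.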

training / teaching

I’m also working on other training materials that are not listed here. If you’d like training for your company on another topic, please let me know.

Introduction to R

R is a free, open-source, and powerful tool for data analysis. It has become even more popular in recent years as the analytics (“data science,” “Big Data”) field has grown.

That said, R’s command-line interface and differences from other programming languages can present a steep learning curve.

Would you like your team to use R? This hands-on training session provides a solid introduction to R basics, syntax, working with data, and graphics.

AWS: Elastic MapReduce: Hadoop in the Cloud

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.

AWS: Asynchronous Messaging Systems Using Amazon SQS

Done well, message-driven systems can support a variety of robust, scalable, and flexible applications. They also offer “free” concurrency under certain circumstances. (Contrary to popular belief, async messaging is not just for trading systems.)

This session explores asynchronous messaging concepts, the basics of message-driven systems, and how to implement them using the Amazon Web Services Simple Queue Service (SQS) as the middleware.
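
To give a flavor of the message-driven pattern, here is a minimal sketch that uses Python's in-process `queue.Queue` as a stand-in for the middleware. It does not call SQS or any AWS API, and it is not drawn from the session materials; the message contents, the `None` sentinel, and the function names are assumptions for illustration. The point is only that the producer and consumer share nothing but the queue.

```python
import queue
import threading


def producer(q, messages):
    """Put each message on the queue, then a sentinel to signal completion."""
    for message in messages:
        q.put(message)
    q.put(None)  # sentinel (an assumption of this sketch, not an SQS feature)


def consumer(q, results):
    """Pull messages until the sentinel arrives, processing each in turn."""
    while True:
        message = q.get()
        if message is None:
            break
        results.append(message.upper())  # placeholder for real work


q = queue.Queue()
results = []
messages = ["order placed", "order shipped", "order delivered"]

# The consumer runs on its own thread and knows nothing about the producer.
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()
producer(q, messages)
worker.join()

print(results)
```

A hosted queue differs from this toy in important ways. For example, standard SQS queues offer at-least-once delivery with best-effort ordering, so a real consumer would need to tolerate duplicate and out-of-order messages.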