talks and training

I speak at meetups, conferences, and other events. I also provide teaching/training sessions. My talks cover a variety of topics, but the unifying theme is solving business problems through practical application of technology and data.

Would you like me to speak at your conference, company function, or other event?
Interested in on-site training for your team?

Please contact me. Services are available worldwide.


talks

The following is a list of selected talks that I have delivered or will soon deliver. I’m also working on other talks that are not yet mentioned here.

Beyond the Big Data Hype: Putting Analytics to Work

(keynote address – March 2016 at a private industry event)

At this private event, my co-presenter and I provided more than 100 executives with practical, industry-specific, actionable guidance on how to use data to improve their business standing. Our data maturity model served as a foundation for the talk, which explored business data initiatives from traditional business intelligence (BI) all the way through advanced reinforcement learning.

The Growing Emphasis on Leveraging IoT Data

(moderator – November 2015 at the ITA Internet of Things Summit)

As moderator, I led four panelists through a discussion of the business, social, and technological concerns of using Internet of Things (IoT) data in the enterprise. Panelists included representatives from the University of Chicago’s department of Computational Analysis and Public Policy, Zebra Technologies, IBM, and the Digital Manufacturing & Design Innovation Institute (DMDII).

Hadoop and Elastic MapReduce

(delivered January 2014 at DataPotluck)

Hadoop is a powerful tool for large-scale data analysis. That power comes with a hefty price tag: the cost of building and maintaining the underlying compute cluster can hinder Hadoop adoption in small and large firms alike.

In this talk, I’ll explain how to put Hadoop to work for you, and how to use Elastic MapReduce (EMR), the hosted Hadoop solution provided by Amazon Web Services. Learn how EMR can help you get Hadoop in a hurry and on the cheap, without the costly cluster commitment.

Building Your Analytics Shop, Step By Step

(delivered 2013/10/30 at Strata+Hadoop World 2013)

Also known as: Busting Myths About Building Your Analytics Shop.

Whether you call it “Big Data,” “data science,” or simply, “analytics,” the field has quickly become an integral part of business. There is plenty of technical guidance for the hands-on practitioner, but people in leadership roles – those who are responsible for setting a company’s direction and aligning analytics to business goals – are left with scant help beyond vendor marketing materials.

I delivered this talk with Brett Goldstein (@bjgol) at Strata+Hadoop World NYC 2013. It was based on our upcoming book, Making Analytics Work: Case by Case. We framed it as an exercise in busting myths about building an analytics practice.

History Lesson

(delivered 2013/09/12 at DataGotham)

One can draw several parallels between today’s data analysis fever and the IT boom of the 1990s. How do we learn from the IT fallout, so that we reap the rewards yet avoid the pitfalls?

Integrating R+Hadoop into Your Data Analytics Pipeline

(delivered 2013/08/10 at KDD 2013 Big Data Camp)

This was built on the “R+Hadoop” talk, described below, but with a focus on when to apply that strategy to the data analytics pipeline.

As an analytics tool, R strains under modern large-scale datasets. People have devised a number of ways to help R function in the big-data arena, one of which is to drive it with Hadoop. But how does this work, and when is it an appropriate solution? This talk will describe the what and the how of mixing R and Hadoop, and more importantly, the when and why of this approach.

Text-mining Your City

(first delivered 2012/10/25 at Strata+Hadoop World 2012)

I presented this talk with Brett Goldstein, Chicago’s chief data officer (CDO). We had collaborated on some social media analysis for civic good, and delivered a talk to explore what we learned.

Dealing with Bad Data

(first delivered 2012/02/29 at Strata Conference)

People often limit the definition of “bad data” to missing values or difficult formats. I say that “bad data” is so much more: it is any data that gets in the way. This can involve missing or hard-to-access datasets, inconsistencies, unexpected modifications in upstream data sources, and so on. If it derails your analysis effort, then it’s bad data in my book.

Dealing with Bad Data explores this notion of “bad data,” how it impacts an analysis effort, and how to address the problems it causes.

Machine Learning’s Impact on the Library

(first delivered 2011/11 to a graduate-level Information Management class in an MLIS program)

Quite a bit of library science involves categorizing and indexing content, to make it easy to find later, and librarians have been doing this since long before “Big Data” was a buzzword. That puts librarians among the world’s first data miners.

Technology advances have changed the types and volume of content librarians manage, as well as the expectations of information-seekers. Similarly, technology will play a strong role in the evolution of library science. In this talk, I explained how natural language processing (NLP) and machine learning would influence the next phase of how librarians classify and index content.

A Bit of R & Hadoop: Getting R to Dance With the Elephant (a.k.a., “R+Hadoop”)

(I have delivered this presentation at various meetups in the US, Canada, and England.)

The formal title for this talk is, A Bit of R & Hadoop: Getting R to Dance With the Elephant. It is sometimes listed under other titles, such as Mixing R and Hadoop: Large-Scale Data Analysis and Computations, though people mostly know it as my “R+Hadoop talk.”

In short: R is quite a useful tool, but it strains under new-age Big Data problems. One solution is to use Hadoop’s scalable, parallel computing framework to drive R. This talk explores the what, how, and why of getting R to dance with the elephant. It starts with an introduction to Hadoop/MapReduce, and wraps up with some tools to use Hadoop through R.

Elastic MapReduce: Hadoop in the Cloud

This presentation’s formal title is Elephants in the Cloud, on the Cheap. People sometimes refer to it as my “EMR talk.”

(I also offer an extended version of this talk, suitable for a teaching/training session. Please see below.)

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.


training / teaching

I’m also working on other training materials that are not listed here. If you’d like training for your company on another topic, please let me know.

Introduction to R

R is a free, open-source, and powerful tool for data analysis. It has become even more popular in recent years as the analytics (“data science,” “Big Data”) field has grown.

That said, R’s command-line interface and its differences from other programming languages can present a steep learning curve.

Would you like your team to use R? This hands-on training session provides a solid introduction to R basics, syntax, working with data, and graphics.

AWS: Elastic MapReduce: Hadoop in the Cloud

Many companies want Hadoop’s power, but the cost of an on-site, self-managed cluster can be quite a shock. Amazon Web Services offers Elastic MapReduce (EMR), a hosted, on-demand form of Hadoop as an alternative to an on-site cluster. This session explains MapReduce concepts, Hadoop, and EMR, in terms of both theory and hands-on practical guidance.
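As a rough illustration of the MapReduce concepts mentioned above, here is a minimal word-count sketch in plain Python. It does not use Hadoop or EMR; it only mimics the map, shuffle, and reduce steps in a single process. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for each word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the elephant dances", "the elephant"]))
```

In a real cluster the framework handles the shuffle and spreads the map and reduce calls across many machines; here everything runs locally so the data flow is easy to follow.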

AWS: Asynchronous Messaging Systems Using Amazon SQS

Done well, message-driven systems can support a variety of robust, scalable, and flexible applications. They also offer “free” concurrency under certain circumstances. (Contrary to popular belief, async messaging is not just for trading systems.)

This session explores asynchronous messaging concepts, the basics of message-driven systems, and how to implement them using the Amazon Web Services Simple Queue Service (SQS) as the middleware.
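To make the message-driven idea concrete, here is a minimal sketch in Python. It is an assumption-laden stand-in: an in-process queue.Queue plays the role of the middleware, and no SQS calls are made. The names worker and process, and the "uppercase the message" task, are hypothetical and exist only for this example.

```python
import queue
import threading


def worker(inbox, results):
    """Consume messages until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work for this worker
            break
        results.put(message.upper())  # stand-in for real message handling


def process(messages, num_workers=3):
    """Send messages through a queue to several workers; return sorted results."""
    inbox = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for message in messages:
        inbox.put(message)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)  # arrival order is not guaranteed, so sort


if __name__ == "__main__":
    print(process(["order-1", "order-2", "order-3"]))
```

The producer never calls a worker directly; it only puts messages on the queue. That decoupling is what lets you add workers without touching the producer, which is one way to read the “free” concurrency mentioned above.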