Marcin is a data developer, data infrastructure administrator and consultant at TantusData. He has extensive hands-on experience with technical problems related to Big Data (clusters with hundreds of nodes), as well as practical knowledge of business data analysis. Companies Marcin has worked for or consulted for include Spotify, Apple and small startups.
Spark MLlib – what can you do in a week?
Have you ever wondered how hard (or easy) it is to start a machine learning project with Spark? Are you concerned about your basic knowledge of math or machine learning? Are you worried about your lack of experience with Spark? Are you wondering whether moving from vanilla R to Spark MLlib would be hard, and what specific benefits you would get? I will share what I learned during my first week of working with Spark MLlib, from the perspective of a Spark expert with only a university background in machine learning and solid, if somewhat rusty, math. The talk will walk through a few examples of using some of the most popular ML algorithms implemented in Spark MLlib. It will show not only how you can benefit from distributing computation with Spark, but also what traps and difficulties await someone who is not familiar with Spark or with the specific properties of the selected algorithms.