Cloudera VIP Customer Meetup
13:30 – 13:45 Opening Remarks
13:45 – 14:45 Cloudera Data Science Workbench (CDSW), the Machine Learning Platform for Enterprise (Toward an Enterprise-Grade Machine Learning Platform)
Guest speaker: Josh Yeh, Software Engineer at Cloudera, Palo Alto
Works on Cloudera Data Science Workbench (CDSW) with Apache HDFS, YARN, and Spark
Works end to end (E2E) with ML/DL/AI frameworks such as Keras and TensorFlow in CDSW
Education: EECS, UC Berkeley College of Engineering
Machine learning is all the rage. It offers great opportunities for enterprises that already capture vast amounts of data, and Cloudera’s customers are using our platform to solve machine learning problems every day.
However, getting data from an enterprise data hub is no trivial task for a data scientist. The main challenges include:
Accessing vast amounts of production data from a secured production cluster.
Maintaining machine learning tools, libraries, and frameworks.
Training efficiently on GPU clusters.
As a result, data scientists often have only a limited dataset for modeling and training. Working without production data creates data silos and small-dataset problems. In addition, data governance poses another set of problems for cluster administrators. Data scientists also want to shorten development cycles and deploy trained models into production as efficiently as possible, which is hard to accomplish in a production environment. Cloudera Data Science Workbench is the solution that enables data scientists while meeting enterprise data security requirements.
14:45 – 15:00 Break
15:00 – 16:00 Cloudera’s Storage Systems Overview (Deconstructing the Storage Systems of the Big Data Platform)
Guest speaker: Wei-Chiu Chuang, Ph.D., Software Engineer at Cloudera, Palo Alto
Wei-Chiu joined Cloudera in 2015 as a software engineer, where he is responsible for the development of Cloudera’s storage systems, mostly the Hadoop Distributed File System (HDFS). He is an Apache Hadoop Committer and Project Management Committee member in recognition of his contributions to the open source project. He is also a co-founder of the Taiwan Data Engineering Association, an organization that promotes better data engineering technologies and applications in Taiwan. Wei-Chiu received his Ph.D. in Computer Science from Purdue University for his research in distributed systems and programming models.
In the past, Cloudera’s platform (CDH) supported two storage types that Big Data applications could leverage: HDFS and HBase. But Cloudera’s customers are constantly discovering new use cases and expecting the platform to support more types of workloads. Therefore, Cloudera’s storage systems team now supports three new storage systems optimized for different use cases: Kudu for IoT, and S3 and ADLS for the cloud. Meanwhile, good old HDFS and HBase are getting a refresh with the releases of Hadoop 3.0 and HBase 2.0.
In this talk, I will present an overview of these storage systems. I will highlight the new capabilities introduced in Hadoop 3.0 and HBase 2.0, then shift the focus to Kudu for IoT use cases, followed by S3 and ADLS for cloud use cases. With these new storage options available, we fully believe Cloudera’s customers will find better uses for their data and make what is impossible today possible tomorrow.