From a classic ETL approach with Postgres to streaming ETL using Kafka Streams – Carles Planas
Implementing ETLs with PostgreSQL and PL/pgSQL for more than 300 million HTTP dialogs per day from a single client is neither scalable nor maintainable, so moving forward to streaming ETLs with Kafka Streams, while challenging, became necessary.
Our DDC data collector, which is able to capture this amount of traffic with few resources, sends the raw data to a Kafka cluster, which lets us digest this traffic instead of loading it into PostgreSQL.
Our stack should be able to work with different BI tools in order to visualize the important data insights of a particular client, for example building OLAP cubes in batch processes or serving real-time queries across different dimensions of the data. To that end, we are implementing our ETL applications by means of Kafka Streams.
By the end of this talk you will know more about our experience working with the Kafka Streams library for aggregation, disaggregation and join applications; the motivation behind choosing Kafka Streams, and why we decided to use it to replace even our batch processes; our experience using Avro and Kryo for serialization inside our applications; and finally our plans to deploy those applications on Kubernetes.
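The Kafka Streams DSL itself is Java; purely to illustrate the kind of windowed aggregation the talk refers to, here is a minimal language-neutral sketch in Python of counting HTTP dialogs per client in tumbling time windows (the client names, timestamps and window size are invented for illustration):

```python
from collections import defaultdict

def aggregate_dialogs(events, window_ms=60_000):
    """Count HTTP dialogs per (client, tumbling window), mimicking a
    groupByKey().windowedBy().count() style topology."""
    counts = defaultdict(int)
    for client, timestamp_ms in events:
        # Align each event to the start of its tumbling window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(client, window_start)] += 1
    return dict(counts)

# Hypothetical sample: (client, event timestamp in ms).
events = [("ota-1", 1_000), ("ota-1", 30_000), ("ota-1", 61_000), ("ota-2", 5_000)]
print(aggregate_dialogs(events))
# {('ota-1', 0): 2, ('ota-1', 60000): 1, ('ota-2', 0): 1}
```

In a real Kafka Streams application the equivalent state lives in windowed state stores and is updated incrementally per record, rather than being recomputed over a list.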
The talk will include a demo of our application capturing OTA (online travel agency)-like HTTP dialog traffic, the deployment of our applications that aggregate, disaggregate and join streams, and finally the resulting data stored in InfluxDB and visualized with Grafana.
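For context on that last step: InfluxDB ingests points as text in its line protocol (`measurement,tags fields timestamp`). A minimal sketch of rendering one aggregated count as a point — the measurement, tag and field names here are invented for illustration, not taken from the demo:

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one InfluxDB line-protocol point:
    measurement,tag=value field=value timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # The trailing 'i' marks an integer field in line protocol.
    field_str = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol("http_dialogs", {"client": "ota-1"},
                        {"count": 42}, 1568876400000000000)
print(line)  # http_dialogs,client=ota-1 count=42i 1568876400000000000
```

In practice an InfluxDB client library builds these lines for you; the point is only that each aggregated window result maps naturally to one timestamped point that Grafana can then query.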
*Limited capacity. To secure your place, contact datahack and confirm your registration at https://www.datahack.es/eventos/kafka-streams-marea-datos/. Once the reserved places have been admitted, entry will be first come, first served until capacity is reached.
Last update: 19/09/2019