Data articles
Learn how French data teams do data engineering and machine learning in production
Categories
Data: 227
Backend: 215
DevOps: 183
Organization: 173
Best Practices: 165
Frontend: 146
Product: 146
Career: 120
Event: 93
Mobile: 76
Latest articles
Doctrine
- 9 May 2022
Moving all our Airflow Architecture to Kubernetes
Alan
- 28 April 2022
Machine Learning at Alan
Welcome to the Jungle
- 20 April 2022
How our growth challenged our data migration process
Veepee
- 12 April 2022
How to evaluate and monitor a Machine Learning model from experiment to production?
Adevinta
- 8 April 2022
Enabling data science on Google Cloud Platform at Adevinta
namR
- 25 March 2022
Computer Vision Pipeline with Kubernetes
Teads
- 22 February 2022
Running Spark Pipelines on EMR Using Spots Instances
Veepee
- 21 February 2022
Training pipeline orchestration with Kubeflow pipelines
Back Market
- 18 February 2022
From Delta Lake to BigQuery
AntVoice
- 17 February 2022
How we are streaming thousands of rows per second into BigQuery — Part II: Google Storage loading
Toucan Toco
- 15 February 2022
Youprep: Why it's easier than SQL
iAdvize
- 11 February 2022
Tableau Data Catalog: Let’s do the jigsaw puzzle!
iAdvize
- 10 February 2022
Tableau Data Catalog: Diving into workbooks’ complexity
iAdvize
- 9 February 2022
Tableau Data Catalog: Harvesting Tableau fields
iAdvize
- 8 February 2022
Tableau Data Catalog: Discovering Tableau’s metadata…
iAdvize
- 7 February 2022
Tableau Data Catalog: How the need came
Dataiku
- 25 January 2022
What I Wish I Had Known Before Developing a Geospatial Analysis Feature
Voodoo
- 13 January 2022
How Voodoo did Airflow 101
Doctrine
- 4 January 2022
A/B testing experience at Doctrine
namR
- 17 December 2021
Prédire la présence d’objets sur les toits de bâtiments
Teads
- 14 December 2021
⭐
Setup a slim CI for dbt with BigQuery and Docker
Lifen
- 13 December 2021
How to fix Interoperability in Healthcare
Adevinta
- 3 December 2021
Machine Learning: when should you use it?
Getaround Europe
- 2 December 2021
GDPR compliance and account deletion
Algolia
- 1 December 2021
Multilingual search: Decompounding with language-specific lexicons
Artefact
- 25 November 2021
How did we predict sales for products with almost no historical data (launches)
Stuart
- 18 November 2021
⭐
How we’re building our data platform as a product
Pennylane
- 17 November 2021
Data at Pennylane: what it entails, and how we do it
Contentsquare
- 16 November 2021
Behavioral Time Series Segmentation in ClickHouse
Doctolib
- 5 November 2021
How to ship a machine learning model in days not months?
Adevinta
- 4 November 2021
Learnings from building machine learning products at scale
Preligens
- 3 November 2021
Applying Neural Architecture Search (NAS) to a real use case — our key learnings
Doctrine
- 29 October 2021
Semantic recommendation system using CamemBERT
Younited Credit
- 19 October 2021
⭐
Data Organisation: why are there so many roles ?
Preligens
- 6 October 2021
How we built an AI Factory — Parts 2&3
Teads
- 5 October 2021
Production A/B Test Analysis Framework at Teads
Algolia
- 1 October 2021
Building real-time Analytics APIs at scale
Teads
- 30 September 2021
⭐
Managing a BigQuery data warehouse at scale
Adevinta
- 21 September 2021
Treating data as a product at Adevinta
Adevinta
- 16 September 2021
A/B testing to improve recommender products
Deezer
- 15 September 2021
Recommending music to new users
PhotoRoom
- 13 September 2021
4 times faster image segmentation with TRTorch
Artefact
- 9 September 2021
How to choose the right visualizations to better debug your forecasting models
AB Tasty
- 7 September 2021
⭐
Data Quality: Timeseries Anomaly Detection at Scale with Thirdeye
Adevinta
- 6 September 2021
The collaborative process of building a Machine Learning platform
Preligens
- 31 August 2021
⭐
How we built an AI factory — Part 1
namR
- 19 August 2021
Développer un système de livraison d’images sur l’imagerie aérienne française
Criteo
- 13 August 2021
Image-based ML techniques to classify billions of e-commerce products into thousands of categories
Criteo
- 12 August 2021
Introducing Autofaiss: An Automatic K-Nearest-Neighbor Indexing Library At Scale
Criteo
- 3 August 2021
A/B Test Decisions: Reducing Type 1 Errors And Using Elasticity
AntVoice
- 21 July 2021
How we are streaming thousands of rows per second into BigQuery — Part I: Google Cloud Dataflow
Shine
- 7 July 2021
⭐
Apache beam at Shine — part I
Adevinta
- 17 June 2021
⭐
Building a data mesh to support an ecosystem of data products at Adevinta
Veepee
- 17 June 2021
⭐
A journey to create a training pipeline using TFRecords and TFRanking
namR
- 11 June 2021
Apprendre des modèles urbains pour prédire la hauteur des bâtiments
Alan
- 10 June 2021
⭐
Automated document processing at Alan
Artefact
- 9 June 2021
Applying machine learning algorithms to satellite imagery for agriculture applications
Adevinta
- 8 June 2021
⭐
Introducing Unicron, our big data and machine learning platform
OVHcloud
- 7 June 2021
Network devices overheat monitoring
Artefact
- 26 May 2021
The path to developing a high-performance demand forecasting model — Part 2
Meetic
- 26 May 2021
How to build a useful chatbot
Contentsquare
- 25 May 2021
Reducing inference latency with TensorFlow Serving on Amazon SageMaker
Back Market
- 20 May 2021
⭐
Subscriptions on a Delta Lake
Adevinta
- 20 May 2021
Measuring output, outcomes and impact in platform teams at Adevinta
Bedrock
- 19 May 2021
The organisational challenge of building a Data team: lessons learnt
leboncoin
- 11 May 2021
⭐
Cooling down hot data: From Kafka to Athena
Lifen
- 5 May 2021
⭐
Fast graph-based layout detection
Artefact
- 5 May 2021
Leveraging satellite imagery for machine learning computer vision applications
Carrefour
- 30 April 2021
Machine Learning @Carrefour: tackling promotional product shortage
Dataiku
- 21 April 2021
Distributed Hyperparameter Search: How It’s Done in Dataiku
Artefact
- 14 April 2021
The path to developing a high-performance demand forecasting model — Part 1
Adeo
- 26 March 2021
Natural Language Understanding (NLU) for retail : our first use cases and tech approach
BlaBlaCar
- 24 March 2021
How BlaBlaCar uses Machine Learning to maximise the success of new drivers
Artefact
- 17 March 2021
Automating the training of ML models with Google Cloud AI Platform
Fretlink
- 12 March 2021
⭐
Build your own “data lake” for reporting purposes in a multi-services environment
Decathlon
- 4 March 2021
Improving performance of image classification models using pretraining and a combination of…
Artefact
- 3 March 2021
Using NLP to extract quick and valuable insights from your customers’ reviews
Artefact
- 22 February 2021
Introducing NLPretext
TF1
- 19 February 2021
⭐
Migration du backend MYTF1 vers Kafka
Contentsquare
- 18 February 2021
Ten Flink Gotchas we wish we had known
Decathlon
- 17 February 2021
Building a RNN Recommendation Engine with TensorFlow
Lifen
- 5 February 2021
⭐
Our journey from Gitlab to Kubeflow
Artefact
- 3 February 2021
Sales forecasting in retail: what we learned from the M5 competition
Dailymotion
- 28 January 2021
How Deep Learning can boost Contextual Advertising Capabilities
Veepee
- 27 January 2021
Learning to rank at Veepee
Artefact
- 25 January 2021
How to use computer vision to help medical experts diagnose Lymphoma?
Contentsquare
- 21 January 2021
How we improved our Akka Stream application throughput by 6x
Adevinta
- 14 January 2021
Tuning related-items recommenders with Bayesian Optimization
Criteo
- 12 January 2021
How to compare two treatments?
Maisons du Monde
- 18 December 2020
Qlik Sense under control
Decathlon
- 15 December 2020
Smile detection for image moderation
Decathlon
- 11 December 2020
Introducing DecaVision to train image classifiers with Google’s free TPUs
Doctolib
- 10 December 2020
Integrating an HTTPS interface in hospitals using Mirth Connect communication server
Artefact
- 8 December 2020
How did we put our sales forecasting solution for croissants into production?
Teads
- 3 December 2020
Updating to Spark 3.0 in production
Deepki
- 23 November 2020
Application Monitoring : Entre créativité et veille technique
Adevinta
- 20 November 2020
⭐
How we use Machine Learning to impact user experience
Artefact
- 17 November 2020
Reducing product stock-outs in hypermarkets with Time Series modeling
Artefact
- 17 November 2020
How to train a language model from scratch without any linguistic knowledge?
Artefact
- 17 November 2020
⭐
How did we forecast croissant sales with Catboost?
Artefact
- 17 November 2020
NLU benchmark for intent detection and named-entity recognition in call center conversations
Deepki
- 16 November 2020
L’ETL chez Deepki
Dailymotion
- 20 October 2020
How we used Cross-Lingual Transfer Learning to categorize our content
Decathlon
- 13 October 2020
10 tips to improve your machine learning models with TensorFlow
Criteo
- 6 October 2020
Out of sight, out of click
Deezer
- 1 October 2020
Improving your ingestion performance using the multidisciplinary approach
Qonto
- 30 September 2020
⭐
How Cohort Analyses inform Qonto
Linkvalue
- 29 September 2020
Comment prédire un classement global à partir d’estimations locales ?
Tweag I/O
- 23 September 2020
Announcing Lagoon - A tool for centralizing semi-structured datasets
Criteo
- 22 September 2020
⭐
Under the hood of Spark performance, or why query compilation matters
Hugging Face
- 22 September 2020
Simple considerations for simple people building fancy neural networks
AXA
- 22 September 2020
Getting started with NLP.js
leboncoin
- 9 September 2020
⭐
Migrating our machine learning platform from AWS Sagemaker to Kubernetes & Kubeflow
Criteo
- 1 September 2020
⭐
Why your AB-test needs confidence intervals
Deepki
- 24 August 2020
Comment manipuler un DataFrame Pandas avec le MultiIndex ?
Voodoo
- 17 August 2020
Building cheaper and more performant EMR pipelines
Criteo
- 11 August 2020
How Much Can Bad Data Cost Us?
Criteo
- 6 August 2020
DeepR — Training TensorFlow Models for Production
Octo
- 21 July 2020
Retour d’expérience : refactoring d’un modèle de Machine Learning qui tourne en Production
PhotoRoom
- 12 July 2020
The Hunt for Cheap Deep Learning GPUs
Criteo
- 2 July 2020
⭐
DataDoc — The Criteo Data Observability Platform
Criteo
- 10 June 2020
Introducing ADA: Another Domain Adaptation Library
AXA
- 4 June 2020
Détection des abandonnistes avec ksqlDB
Maisons du Monde
- 1 June 2020
⭐
Migration to Airflow: One year feedback
Doctrine
- 28 May 2020
Making queries 100x faster with Snowflake
Decathlon
- 15 May 2020
Making Jupyter Kernels remanent in AWS Sagemaker
GitGuardian
- 6 May 2020
Assessing model performance in secrets detection: accuracy, precision & recall explained
Doctrine
- 4 May 2020
⭐
A single legal text representation at Doctrine: the legal camemBERT
Deezer
- 29 April 2020
Detecting explicit content in songs
ManoMano
- 17 April 2020
⭐
A framework for feature engineering and machine learning pipelines
namR
- 14 April 2020
Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data
Deepki
- 10 April 2020
Tutoriel : Traiter automatiquement des fichiers sur le cloud grâce à SNS
ManoMano
- 6 April 2020
⭐
Product categorization: classical Machine Learning problem for a difficult e-commerce task
Cheerz
- 2 April 2020
Cheerz simple data stack
Publicis Sapient Engineering
- 1 April 2020
L’analyse de séries temporelles avec Prophet et DeepAR
Teads
- 31 March 2020
⭐
Reducing AWS EMR data processing costs
OVHcloud
- 24 March 2020
Announcing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pulsar
Criteo
- 10 March 2020
Open sourcing cluster-pack
OVHcloud
- 6 March 2020
Doing BIG automation with Celery
Decathlon
- 5 March 2020
Identifying the sport in an image using the Sport Vision API
Doctrine
- 26 February 2020
⭐
A generic pipeline to make offline inferences
OVHcloud
- 26 February 2020
Introducing Director – a tool to build your Celery workflows
Maisons du Monde
- 24 February 2020
Road to add form for Airflow’s dag
Decathlon
- 21 February 2020
Personalization strategy and recommendation systems at Décathlon Canada
Algolia
- 17 February 2020
⭐
Scaling Algolia’s Personalization engine with Bigtable’s schema
OVHcloud
- 14 February 2020
Contributing to Apache HBase: custom data balancing
Cdiscount
- 13 February 2020
Estimating when a message will be consumed in Kafka
Criteo
- 6 February 2020
Incrementality: a simple question commanding subtle answers
namR
- 5 February 2020
Estimating Vegetated Surfaces with Computer Vision: how we improved our model and scaled up
Teads
- 29 January 2020
Spark UDAF could be an option!
Sicara
- 27 January 2020
TensorFlow 2.0 Tutorial : Optimizing Training Time Performance
Sicara
- 13 January 2020
Optimize Response Time of your Machine Learning API In Production
Doctolib
- 3 January 2020
Redshift: Troubleshooting made simple
Deepki
- 17 December 2019
Comment fait-on pour obtenir des courbes de charge en temps réel ?
Hugging Face
- 3 December 2019
Encoder-decoders in Transformers: a hybrid pre-trained architecture for seq2seq
Kapten
- 29 November 2019
Why going real time?
Meilleurs Agents
- 27 November 2019
La qualité des données d’une plateforme : un vecteur de confiance.
Deepki
- 27 November 2019
Dask vs Pandas : Fight !
Hugging Face
- 26 November 2019
How To Write With Transformer
Dailymotion
- 21 November 2019
Bringing Machine Learning models into production without effort at Dailymotion
Preligens
- 17 November 2019
Boosting object detection performance through ensembling on satellite imagery
Criteo
- 14 November 2019
Big Data Quality at Criteo
Deezer
- 4 November 2019
Releasing Spleeter: Deezer R&D source separation engine
Doctrine
- 3 November 2019
⚖ Structuring legal documents with Deep Learning
Linkvalue
- 31 October 2019
From Duels to Sport Racing
Tweag I/O
- 30 October 2019
Here You See the Small Porcupine Perched in Its Tree, Preparing and Crunching Some Data with Me
Sicara
- 28 October 2019
Deep Learning Memory Usage and Pytorch Optimization Tricks
Hugging Face
- 18 October 2019
Benchmarking Transformers: PyTorch and TensorFlow
Preligens
- 10 October 2019
Super-resolution for satellite imagery analysis
Dailymotion
- 4 October 2019
Realtime data processing with Apache Beam and Google Dataflow at Dailymotion
Criteo
- 30 September 2019
Boosting image processing performance, from ImageMagick to Libvips
Sicara
- 26 September 2019
Face Detectors: Understand DSFD and the State-of-the-art Algorithms
Alan
- 25 September 2019
Alan’s solution to sharing data knowledge
Criteo
- 19 September 2019
Lessons learned from annotating 5 million images
Dailymotion
- 5 September 2019
How Dailymotion revamped its Strategy and built a Programmatic Platform
Preligens
- 3 September 2019
Improving the description of satellite images using GIS data
Cdiscount
- 30 August 2019
Cdiscount image dataset for visual search and product classification
Hugging Face
- 28 August 2019
Smaller, faster, cheaper, lighter: Introducing DilBERT, a distilled version of BERT
Dailymotion
- 9 August 2019
Getting started with Data Lineage
Hugging Face
- 9 August 2019
From TensorFlow to PyTorch
Deezer
- 31 July 2019
Women & Hip-Hop
Sicara
- 31 July 2019
Introducing tf-explain, Interpretability for TensorFlow 2.0
Sicara
- 30 July 2019
Few-Shot Image Classification with Meta-Learning
Dailymotion
- 19 July 2019
How to design Deep Learning models with Sparse Inputs in Tensorflow Keras
Glose
- 19 July 2019
How to evaluate readers text comprehension?
Sicara
- 16 July 2019
Image Registration: From SIFT to Deep Learning
Sicara
- 14 July 2019
Determine Your Network Hyper-parameters With Bayesian Optimization
Qonto
- 10 July 2019
Using DVC to create an efficient version control system for data projects
Dailymotion
- 25 June 2019
Building modern recommender systems: when deep learning meets product principles
Hugging Face
- 24 June 2019
Scaling a massive State-of-the-Art Deep Learning model in production
Glose
- 20 June 2019
How to Evaluate Text Readability with NLP
Criteo
- 6 June 2019
Train TensorFlow models on YARN in just a few lines of code !
Teads
- 23 May 2019
Lessons learned while optimizing Spark aggregation jobs
namR
- 21 May 2019
Why data quality matters?
Hugging Face
- 17 May 2019
Introducing FastBert — A simple Deep Learning library for BERT Models
Glose
- 14 May 2019
Fast bag-of-words using spaCy and cython
Criteo
- 13 May 2019
Upgrading Kafka on a large infra
Voodoo
- 13 May 2019
Leverage AWS to create a data pipeline at scale in a couple of weeks
Deepki
- 13 May 2019
Du G dans le SI, pour quoi faire ?
Snips
- 10 May 2019
Snips Open Sources Tract
OVHcloud
- 10 May 2019
Alerting based on IPMI data collection
Hugging Face
- 9 May 2019
How to build a State-of-the-Art Conversational AI with Transfer-Learning
Snips
- 9 May 2019
Infrared Voice Controlled With Snips
Snips
- 26 April 2019
Debunking the Unnecessary Compromise: Users Can Have a Voice Assistant and Their Privacy Too
Criteo
- 16 April 2019
Hyper-parameter optimization algorithms: a short review
Deepki
- 15 April 2019
Accessibilité des données énergétiques aux consommateurs
namR
- 9 April 2019
Deep Learning for Roof Detection in Aerial Images in 3 minutes
Sicara
- 8 April 2019
How Apache Airflow Distributes Jobs on Celery workers
ManoMano
- 20 March 2019
Improve business decisions with three machine learning interpretability tools
Decathlon
- 12 March 2019
Personalize your app or Website using your catalog of images
Sicara
- 12 March 2019
Edge Detection in Opencv 4.0, A 15 Minutes Tutorial
Teads
- 26 February 2019
Give meaning to 100 Billion events a day — Part II
Preligens
- 26 February 2019
How to use deep learning on satellite imagery — Playing with the loss function
Cdiscount
- 12 February 2019
Détection de bots
Welcome to the Jungle
- 11 February 2019
Exploring collaborative filtering for job recommendations
Hugging Face
- 29 January 2019
Multi-label Text Classification using BERT – The Mighty Transformer
Decathlon
- 28 January 2019
Building a visual search algorithm in a few steps using transfer learning
Lifen
- 24 January 2019
Storing AI predictions in FHIR
leboncoin
- 9 January 2019
Data Traffic Control with Apache Airflow
Sicara
- 9 January 2019
How to Perform Fraud Detection with Personalized Page Rank
Criteo
- 8 January 2019
Packaging code with PEX — a PySpark example
Getaround Europe
- 7 January 2019
Why we've chosen Snowflake ❄️ as our Data Warehouse
Lifen
- 16 November 2018
Fake it until you make it