Leveraging the power of BERT sentence embeddings using Spark NLP

Photo by Brett Jordan on Unsplash

In a previous post, I demonstrated how different word embeddings (GloVe, ELMo, BERT) could be used for a text classification task. We saw how (and why) capturing context was important to maximize accuracy.

We also explored several ways text can be preprocessed to improve results. At the same time, we visualized how each step transformed our raw text.

Today we’re throwing it all out the window! I’m going to demonstrate how I achieved 90% classification accuracy without any preprocessing at all. Are you ready? …

A guide to state-of-the-art text classification using Spark NLP

Photo by Amador Loureiro on Unsplash

One of the most challenging tasks for machine learning models is finding the best way to generate numeric representations for words so the model can use that information in its calculations.

In computer vision tasks, the red channel in a color (RGB) image will always refer to the red channel, and the green channel to the green channel. Text, however, is heavily based on context, such that the same word can take on multiple meanings depending on its use. Pandas, for example, can refer to cute and fuzzy bears or a Python data analysis library.

This is further complicated…

A simple approach to gain insight into the limitations of your CNN model

Photo by Marc-Olivier Jodoin on Unsplash

There is something magical about the way modern algorithms solve our current data-driven problems. Despite coming from a neuroscience background with a strong understanding of our sensory systems, I feel humbled when I write algorithms that mimic them.

One thing that strikes me most, however, is the fact that both can be described as black boxes, a term usually reserved for deep learning models. For example, right now as you read this, what are you really seeing?

words…sentences…paragraphs? No way!

You’re not seeing anything. Rather, light emitted at differing wavelengths, angles, and intensities enters your retina to produce a…

How to appropriately measure bivariate relationships in your data

Photo by Isaac Smith on Unsplash

Valentine’s Day may be a distant memory, yet it still feels like the perfect opportunity to discuss relationships with you: relationships within your data, that is.

By the end of this post, you will have a better understanding of the most common correlation coefficients and when to use them. Because you will no longer be relying on default settings, you can be more confident when interpreting and presenting the results of your correlational studies.

We will look at:

  • Pearson’s r
  • Spearman’s rho
  • Kendall’s tau
  • Point biserial correlation

If you wish to follow along, you can find the dataset I used here
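To make the four coefficients listed above concrete, here is a minimal pure-Python sketch. It is not the code from the post: the toy data (`x`, `y`, `group`) and all function names are hypothetical, and Kendall's tau is computed as the simple tau-a variant without tie correction (statistics libraries such as SciPy default to tau-b).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: strength of the linear relationship between two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks (monotonic relationships)."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def point_biserial(binary, y):
    """Point-biserial correlation: Pearson's r with one variable coded 0/1."""
    return pearson_r(binary, y)


# Hypothetical toy data: y grows monotonically, but not linearly, with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
group = [0, 0, 0, 1, 1]  # a binary variable, e.g. control vs. treatment

r = pearson_r(x, y)
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
r_pb = point_biserial(group, y)
print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}, "
      f"Kendall tau={tau:.3f}, point-biserial={r_pb:.3f}")
```

On this toy data the two rank-based coefficients reach 1 because the relationship is perfectly monotonic, while Pearson's r falls slightly below 1 because the relationship is not linear. That difference is one reason the choice of coefficient should not be left to a default.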

How to measure relationships

Comparing the efficacy of GRU, LSTM, and BiLSTM to predict Bitcoin price

Image source: Photo by André François McKenzie on Unsplash


I have been following crypto prices for several years now. I am fascinated with the evolution of the blockchain and its implications. I’ve chuckled more than once at the idea of digital currency. Not that it’s new, but I was born in the ’80s, when we had to fill out a paper and speak with a human if we wanted to withdraw actual paper money… Remember paper money?

In any case, today I want to share one of my recent projects with you. I will be comparing three models to determine their efficacy at predicting the price of Bitcoin, the King…

Image source: https://unsplash.com/@bacila_vlad

This is my first Medium article! (Yay!!!) The project is based on my capstone project for my master’s in data science. I had a lot of fun and I learned so much. I would love to hear your feedback on this project if you have some free time. You can find the complete project here: https://github.com/ryancburke/blindness_detection

1. Introduction

A quick PubMed search for artificial intelligence revealed more than 15,000 scientific papers published last year. This field has seen a rapid evolution due to the significant contribution it makes on a global and individual scale. At a global level, AI is being utilized…

Ryan Burke

Freelance data scientist and a life-long learner.
