His friend, who works in satellites, tells him he can look at the past month's satellite feed, since Moffett Field is right across the bay and his satellite flies directly over it. It expanded from there to the point that people started live-streaming all the distribution centers to try to predict whether there would be sufficient demand for the new model and, ultimately, whether the share price would fall or rise.

This is an excellent example of unstructured data: it is simply a picture of how many cars are sitting on a given lot at a given time. Some users even wrote automated counters and live-streamed the locations so that traders could have the information on demand, whenever they wanted it. The problem is that the alt-data lacked context. As Tesla ramped up production, it also expanded its temporary storage. Without knowing the other factors, even the fastest, most accurate real-time alt-data is open to widely differing interpretations.

Bitvore's Use Of Alt-Data Takes A Different Approach

Alt-data isn't valuable unless it is correlated with more traditional data sources, and the single most valuable of these is timestamped news. Many things can be discovered that never show up in the news, but those discoveries lack context until the news validates them. That's not to say all news sources are equal. There is a production cycle and an escalation process for certain items. Bitvore has become very good at identifying early news items that will turn out to be significant before they are widely covered by more traditional, slower-moving media.
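The correlation step described above can be sketched as a time-window join: each alt-data observation is paired with news about the same entity published within a few days of it. This is a minimal illustration under stated assumptions, not Bitvore's pipeline; the `AltDataPoint` and `NewsItem` types, the `correlate` function, and the three-day window are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AltDataPoint:
    entity: str
    observed_at: datetime
    value: float  # hypothetical measurement, e.g. a count of cars on a lot


@dataclass
class NewsItem:
    entity: str
    published_at: datetime
    headline: str


def correlate(points, news, window=timedelta(days=3)):
    """Pair each alt-data point with news about the same entity
    published within +/- `window` of the observation (hypothetical helper)."""
    # Index news by entity so each observation only scans relevant items.
    by_entity = {}
    for item in news:
        by_entity.setdefault(item.entity, []).append(item)

    results = []
    for point in points:
        context = [
            item
            for item in by_entity.get(point.entity, [])
            if abs(item.published_at - point.observed_at) <= window
        ]
        results.append((point, context))  # empty context = uncorroborated
    return results
```

An observation that comes back with an empty context list is exactly the context-free case described earlier, so a sketch like this might route it for review rather than treat it as a signal on its own.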

This expertise helps with predictive models. In the short term, we can find valuable news items by correlating them with our alt-data and by leveraging machine learning models tuned on tens or hundreds of millions of records across many companies and industries. For longer-term predictions, we look for patterns in our analysis. We tag individual items with something called a signal: an indicator, with a very high degree of reliability, that something financially impactful happened. We also correlate each signal with the company it mentions. Combining the company and the signal gives us precision news, a highly reliable indicator that something important happened.
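The idea of precision news, a signal combined with the company it concerns, can be sketched as a simple pairing step. The `TaggedItem` and `PrecisionNews` types, the signal names, and the 0.9 confidence cutoff are hypothetical; the article does not describe how Bitvore's classifiers or entity resolution actually work.

```python
from dataclasses import dataclass


@dataclass
class TaggedItem:
    item_id: str
    companies: list[str]       # companies mentioned (hypothetical entity-resolution output)
    signals: dict[str, float]  # signal name -> classifier confidence (hypothetical)


@dataclass(frozen=True)
class PrecisionNews:
    item_id: str
    company: str
    signal: str


def to_precision_news(items, min_confidence=0.9):
    """Emit one (company, signal) record per item for every signal whose
    confidence meets the (hypothetical) reliability threshold."""
    out = []
    for item in items:
        for signal, confidence in item.signals.items():
            if confidence < min_confidence:
                continue  # drop low-reliability signals
            for company in item.companies:
                out.append(PrecisionNews(item.item_id, company, signal))
    return out
```

Keeping the threshold high trades recall for reliability, which matches the emphasis above on signals being trustworthy rather than exhaustive.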

Our latest predictive efforts use that highly reliable information to predict other signals. For instance, in our municipal product, if a city eliminates a fire, police, or ambulance service, a school district forgoes teacher raises, or officials start discussing pension costs (all signals in our system), we can predict with near certainty that the city will announce a budget shortfall at the end of the fiscal year. Likewise, if a city then announces a budget shortfall, raises new money by issuing new bonds, pushes through public employee raises, or raises property taxes (also all signals in our system), we can predict a city or county bankruptcy.
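The municipal escalation described above can be approximated with a rule-based sketch. Bitvore's actual models are machine-learned and its signal taxonomy isn't given here, so the signal names below are hypothetical, and reading "further" as requiring an earlier-stage signal before flagging bankruptcy risk is an interpretive assumption.

```python
# Hypothetical signal names; the real taxonomy is not described in this article.
SHORTFALL_PRECURSORS = {
    "service_eliminated",       # fire, police, or ambulance service cut
    "teacher_raises_forgone",
    "pension_costs_discussed",
}
BANKRUPTCY_PRECURSORS = {
    "budget_shortfall_announced",
    "new_bond_issue",
    "public_employee_raises",
    "property_tax_increase",
}


def predict(observed: set[str]) -> list[str]:
    """Return stage predictions for one municipality's observed signals.

    Assumption: bankruptcy risk is only flagged when an early-stage
    (shortfall) precursor has also been observed.
    """
    predictions = []
    early_stage = bool(observed & SHORTFALL_PRECURSORS)
    if early_stage:
        predictions.append("budget_shortfall")
    if early_stage and observed & BANKRUPTCY_PRECURSORS:
        predictions.append("bankruptcy_risk")
    return predictions
```

Hard-coded rules like these are easy to audit but brittle; a learned model could weigh combinations and timing of signals instead of treating any single precursor as sufficient.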

Companies follow similar patterns. Fundraising, a flurry of new product launches, executive churn, and other patterns of signals can precede a search for new money (fundraising), an attempt to sell the company (merger and acquisition), financial distress, or even bankruptcy. While these predictions are not absolute, simply knowing there is a higher chance of such an event over the next two or four quarters is extremely useful information.

Why Do Data Scientists Spend 60-80% Of Their Time Dealing With Unstructured Alternative Data?

In short:

  • Multiple, disparate sources of data
  • Normalization issues
  • Cleansing issues

In data science, there is always a tradeoff between a small but very clean data set and a large but dirty one. Data can be dirty in many ways. The first is concordance. If you have several different company names, such as Family Dollar Stores, Dollar Tree, Dollar General, Dollar Express, and Dollar Holdings, you have a concordance problem: which names refer to the same company and which are different? Which companies still exist and which have gone away? Sometimes it's hard even for humans to tell. Geographically, we have to differentiate between the City of West, Texas and West Texas, between Central Pennsylvania and the City of Center, Pennsylvania, and among hundreds of other truly ambiguous items.
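A common first pass at the concordance problem is to normalize names and then flag similar pairs for review. This sketch uses the standard library's `difflib.SequenceMatcher`; the suffix list, the 0.8 threshold, and the `normalize` and `candidate_pairs` helpers are hypothetical choices, not a description of Bitvore's method.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical suffixes to strip; a production list would need careful curation.
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd", "holdings", "stores"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove generic/legal suffix tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def candidate_pairs(names, threshold=0.8):
    """Return (a, b, score) for name pairs whose normalized forms are
    similar enough to need human review; nothing is merged automatically."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

Note that `normalize` maps both "West, Texas" and "West Texas" to the same string, so that pair is flagged with a perfect score even though the two are different places. That is why a sketch like this only surfaces candidates; deciding which names truly refer to the same entity still needs additional context.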
