Take Web servers for instance. If you went to your company executives in the early 90s and told them that you were going to use the company's expensive network connection, clog it up with traffic, run a piece of software on an expensive company machine so that people outside of the company that you don't know can grab proprietary company information that your competitors can use against you ... the executives would probably fire you.

But that's exactly what companies do with Web servers today (minus the firing). Once the benefits of having a Web server far outweighed the initial costs and concerns, companies took advantage of the collective value. This is a perfect example of a network effect. Alt-data is just starting to provide enough value to overcome the initial costs and concerns, and its adoption will only accelerate from here.

What Is Unstructured Alternative Data?

Raw text is considered unstructured data, but the truth is, even raw text comes with some points of structure. What source did it come from? When was it published? Who is the author? At Bitvore, we focus mostly on semi-structured data like textual news items, though we do look at press releases, SEC filings, investor presentations, public records, ratings, social media, product reviews, job postings, and other information.

There are companies that do use more visual alt-data like satellite images of how many cars are sitting in a storage lot or how much foot traffic goes through various airports, buildings, malls, or public spaces. That sort of information, while useful, falls outside of our interest and customer areas.

Because we can infer some of that structure, we can reason about the data and derive more structure from it. Did it come from a reputable source, or is the source blacklisted? Was it written by a human, or is it robonews or junk? What is the subject of the story, and where did it take place? Many of these early answers, which help us separate invaluable, valuable, and worthless information, can be derived structurally even before we apply more powerful machine learning algorithms.
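To make the idea of structural triage concrete, here is a minimal Python sketch. It is not Bitvore's pipeline: the `NewsItem` fields, the source lists, the 20-word threshold, and the `triage` function are all hypothetical, chosen only to show how source, byline, and length can sort items before any machine learning runs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical source lists; a real system would maintain these elsewhere.
BLACKLISTED_SOURCES = {"spam-wire.example"}
REPUTABLE_SOURCES = {"reputable-news.example"}


@dataclass
class NewsItem:
    source: str              # where the item came from
    published: datetime      # when it was published
    author: Optional[str]    # byline, if any
    text: str                # raw body text


def triage(item: NewsItem) -> str:
    """Return 'discard', 'keep', or 'review' using only structural metadata."""
    if item.source in BLACKLISTED_SOURCES:
        return "discard"
    # Crude robonews heuristic (an assumption): no byline and very short text.
    if item.author is None and len(item.text.split()) < 20:
        return "discard"
    if item.source in REPUTABLE_SOURCES:
        return "keep"
    # Unknown source with plausible structure: leave it for later stages.
    return "review"
```

Items that come back as "review" are the ones where heavier analysis would earn its keep; the cheap checks simply shrink the pile first.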

Another source of semi-structured data is Web sites. Web sites count as semi-structured because the answers to your questions are not simply values you can look up in a fixed place on the site. Who is the CEO? Who is on the board? What is the last big deal the company did? For how much? With which customer? When did they last launch a product?

Web scraping technologies exist, but without some analysis, it's hard to extract the answers you need. The key question is: how do you get a machine to understand and answer these questions at the same level of quality as a human sitting down and digging through the Web site? The answer is that neither humans nor machines are perfect, but a little machine learning goes a long way toward doing far more, far faster, and for far more sites than is feasible with any number of humans.
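As a rough illustration of why scraping alone falls short, consider a naive baseline in Python that strips a page down to its visible text and looks for a name followed by a CEO title. Everything here is an assumption made for the sketch (the `TextExtractor` class, the `CEO_PATTERN` regular expression, and the `find_ceo` helper); it uses only the standard library and is not how any particular product answers the question.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, ignoring script and style blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


# Hypothetical pattern: a capitalized two- or three-word name, then a CEO title.
CEO_PATTERN = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+){1,2}), (?:CEO|Chief Executive Officer)"
)


def find_ceo(html: str) -> Optional[str]:
    """Return the first name matching CEO_PATTERN in the page text, or None."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    match = CEO_PATTERN.search(text)
    return match.group(1) if match else None
```

The pattern finds "Jane Doe, CEO of Example Corp" but misses "Our chief executive, Jane Doe, leads the team", which a human reader would resolve at a glance. Closing that kind of gap across many sites is where a learned model is more practical than an ever-growing list of hand-written rules.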

How Do Data Scientists Use Alternative Data To Build Predictive Models For Analysts?

There's an urban legend that gets passed along among alt-data data scientists. It starts out like an old joke. Two guys walk into a bar. A stock analyst following Tesla is drinking away his sorrows because his clients keep asking him what is happening with the company. Tesla keeps promising tens of thousands of cars, but every time he visits, he finds it stockpiling thousands of cars that aren't moving anywhere.
