Shift to a Data-rich Web
April 2, 2009  |  Web

The Web has undergone a massive amount of change since its inception nearly twenty years ago: from static HTML-only pages that only the technical few could edit, to rich Web applications and social media that give anyone with Internet access the ability to contribute to the ever-growing base that is the Web. I always find it amazing that my four young nephews, all aged three and under, will experience Web revolutions far different from those seen by those of us who watched the rapid growth of the Internet and the shift from static to dynamic Web content. Now that any one person can contribute to the Web, what is next?

Over the last couple of weeks I have been catching up on the recent batch of TED presentations, lectures in which the best and brightest share their ideas, realistic or not, with the world in hopes of inducing positive change. One of the most captivating, for me at least, was the talk given by Tim Berners-Lee. Tim Berners-Lee, commonly referred to as TBL by my Internet computing professor, conceived and created the World Wide Web twenty years ago. For more information on the amazing contributions TBL has made to the world of technology, please check out his bio. In his TED talk from February 2009, TBL shared his view that the next generation of the Web will contain large amounts of accessible, linked data that anyone can use. I encourage you to watch the presentation, in which TBL briefly describes the Web he created before moving on to open data.

Before analyzing the presentation, I would like to discuss some relevant educational theory, drawn from one of my favourite business classes, Management Information Systems. Basically, there is a hierarchy in which data is transformed into information, information into knowledge, and knowledge into wisdom.

DATA > INFORMATION > KNOWLEDGE > WISDOM

Data represents the raw symbols collected. No analysis has been done and no relationships have been made, so raw data has no significance on its own. For example, 2, 4 and 100 are raw data.
Information comes from relating pieces of data to one another. Perhaps these numbers are dimensions giving us the size of a piece of wood: 2″ wide by 4″ deep by 100″ long. These numbers have now come together to provide something meaningful.
Knowledge is gained when information proves useful in decision making. Let us assume that you are building a bridge that requires long pieces of wood measuring 2″ by 4″ by 75″. Looking at the information you have, you now know that the piece of wood in your possession is indeed long enough.
Wisdom takes analysis to another level by answering WHY something is as it is. Wisdom can only be achieved through time and thought, and it is not easily reached in every case, least of all in this one involving a piece of wood.
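To make the first three levels of the hierarchy a little more concrete, here is a minimal Python sketch of the wood example. It is only an illustration: the names `Board`, `to_information` and `meets_requirement` are hypothetical, the units are assumed to be inches as in the example, and wisdom (the WHY) is deliberately left out because it does not reduce to a function.

```python
from dataclasses import dataclass

# DATA: raw symbols with no context or relationships attached.
raw_data = [2, 4, 100]


@dataclass
class Board:
    """INFORMATION: the same numbers related to one another as the
    dimensions of a piece of wood (hypothetical class; units assumed inches)."""
    width: float
    depth: float
    length: float


def to_information(values):
    """Hypothetical helper: relate the raw values as width, depth and length."""
    width, depth, length = values
    return Board(width, depth, length)


def meets_requirement(board, required):
    """KNOWLEDGE (hypothetical helper): use the information to make a decision.
    The cross-section must match and the board must be at least as long as needed."""
    return (board.width == required.width
            and board.depth == required.depth
            and board.length >= required.length)


board = to_information(raw_data)
bridge_piece = Board(2, 4, 75)  # the 2" x 4" x 75" piece the bridge needs
print(meets_requirement(board, bridge_piece))  # True
```

The step from data to information is just attaching meaning to the numbers; the step to knowledge is asking the information a question that matters for a decision.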

This short lesson is simply meant to show the difference between data and information. As we know it, the Web contains loads of data and information, but in a largely unorganized and unlinked manner. As TBL says, “we haven’t got data on the Web as data”. This doesn't mean there is no data on the Web (it is everywhere); it means that the data has already been turned into information or exists in a form from which it cannot easily be extracted.

My interpretation of TBL’s presentation is that he wants the Web to offer collections of database- and spreadsheet-friendly data: no analysis, no CSS, no JavaScript, just column after column of raw data in a standardized format or, as he so elegantly put it, “give us the unadulterated data”, followed by the chanting of “RAW DATA NOW!” I am the type of person who can never have too much data, particularly in business, so TBL’s outlook really gets me excited. Unfortunately, I see a fundamental issue preventing data from being “open” on a large scale.

Data behind closed doors. This is probably the biggest obstacle to creating a network of open data. TBL calls it just what it is: “database hugging”. Enterprises, labs, small businesses and many other groups all have a strong sense of ownership over their data, whether or not it is relevant to their focus.

Imagine what an open network of data could do for medical research. TBL says, “the power of the data the other scientists have collected is locked up and we need to get it unlocked to tackle those huge problems”. Unlocking this data could dramatically speed up the work of scientists trying to solve many of our problems, but is this possible? With so many private labs doing the research, achieving an open data stream to the Web is difficult at best. It would require a great shift in the mindset of these organizations, a shift that is unlikely to occur without financial gain.

Shifting gears to the enterprise, we find data locked down within the company and accessible only to those who need it. I have worked in enough places to understand that just because a company keeps outsiders away from its data does not mean that all insiders have access; it always surprises me how difficult it is to get access to the data someone needs to do their job. How likely is it that a telecom would willingly put its service, demographic, geographic and competitive data online in an open format? No chance! A firm's ability to gain knowledge and wisdom from its information and data can be a huge competitive advantage when it comes to service upgrades, marketing and sales. Opening this data may be great for consumers, since companies would have to compete on service and price, bringing new technology to market earlier, but business as we know it would, like laboratories, require a shift in fundamentals.

Is there a solution? In my opinion, the solution is compromise. A 100% open and free network of data would greatly help independent consultants, students and anyone who enjoys analysis, but without big business on board this will not happen. The opening of government data is a start; it provides useful sociodemographic and economic data to citizens and businesses at no cost and in a relatively standardized format. It will be difficult to get enterprises and laboratories to do the same without a financial incentive. To gain access to this private data, it may be possible to drop the word “open” for some of it and sell it instead. The Web could consist of a large amount of open, free data that is easily accessible and standardized, alongside a section of proprietary data that is still easily accessible and standardized but available at a cost, whether through pay-per-use or subscription.

Information pertaining to clients, customers, patients and accounts should remain locked down to protect individual interests, but if a firm believes in its product, selling the raw data it used in planning and development could be worth its while. Keep in mind that the firm is not selling its information, knowledge or wisdom, just the raw data. Unfortunately, selling anything in a digital format carries one inherent danger: piracy. If data changes hands for a fee, there is always the concern of what the buyer will do with it. Will they continue to protect it and honour the deal? This is just one of many problems that any solution to the call for open data may face.

Do I believe Tim Berners-Lee is correct in his analysis regarding a shift towards a data-rich Web? I really hope he is. After all, having data that is open, accessible and standardized will lead to the best analysis, helping us climb from data to information to knowledge and finally to wisdom at record speed. Only then can we solve so many global problems in the short time frames they sometimes demand.
