Data, data everywhere
Information has gone from scarce to superabundant. That brings huge new benefits, says Kenneth Cukier—but also big headaches
Feb 25th 2010
From The Economist print edition
WHEN the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information. A successor, the Large Synoptic Survey Telescope, due to come on stream in Chile in 2016, will acquire that quantity of data every five days.
Such astronomical amounts of information can be found closer to Earth too. Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress (see article for an explanation of how data are quantified). Facebook, a social-networking website, is home to 40 billion photos. And decoding the human genome involves analysing 3 billion base pairs—which took ten years the first time it was done, in 2003, but can now be achieved in one week.
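A back-of-envelope check makes the Wal-Mart comparison concrete. The petabyte figure and the 167-to-1 ratio come from the text; the binary unit conversion is an assumption of this sketch (a decimal conversion would give a similar answer):

```python
# Back-of-envelope check of the Wal-Mart comparison in the text.
# Figures from the article: ~2.5 petabytes, said to be 167 times the
# books in the Library of Congress.
PB = 2.5
TB_PER_PB = 1024  # assuming binary units; 1000 (decimal) is also defensible

walmart_tb = PB * TB_PER_PB        # total database size in terabytes
per_library_tb = walmart_tb / 167  # implied size of one Library of Congress
print(f"Implied size per Library of Congress: {per_library_tb:.1f} TB")
```

The implied figure, roughly 15 terabytes for the Library's books, is in line with commonly cited estimates.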
All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.
But they are also creating a host of new problems. Despite the abundance of tools to capture, process and share all this information—sensors, computers, mobile phones and the like—it already exceeds the available storage space (see chart 1). Moreover, ensuring data security and protecting privacy is becoming harder as the information multiplies and is shared ever more widely around the world.
Alex Szalay, an astrophysicist at Johns Hopkins University, notes that the proliferation of data is making them increasingly inaccessible. “How to make sense of all these data? People should be worried about how we train the next generation, not just of scientists, but people in government and industry,” he says.
“We are at a different period because of so much information,” says James Cortada of IBM, who has written a couple of dozen books on the history of information in society. Joe Hellerstein, a computer scientist at the University of California in Berkeley, calls it “the industrial revolution of data”. The effect is being felt everywhere, from business to science, from government to the arts. Scientists and computer engineers have coined a new term for the phenomenon: “big data”.
Epistemologically speaking, information is made up of a collection of data and knowledge is made up of different strands of information. But this special report uses “data” and “information” interchangeably because, as it will argue, the two are increasingly difficult to tell apart. Given enough raw data, today’s algorithms and powerful computers can reveal new insights that would previously have remained hidden.
The business of information management—helping organisations to make sense of their proliferating data—is growing by leaps and bounds. In recent years Oracle, IBM, Microsoft and SAP between them have spent more than $15 billion on buying software firms specialising in data management and analytics. This industry is estimated to be worth more than $100 billion and growing at almost 10% a year, roughly twice as fast as the software business as a whole.
Chief information officers (CIOs) have become somewhat more prominent in the executive suite, and a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
More of everything
There are many reasons for the information explosion. The most obvious one is technology. As the capabilities of digital devices soar and prices plummet, sensors and gadgets are digitising lots of information that was previously unavailable. And many more people have access to far more powerful tools. For example, there are 4.6 billion mobile-phone subscriptions worldwide (though many people have more than one, so the world’s 6.8 billion people are not quite as well supplied as these figures suggest), and 1 billion-2 billion people use the internet.
Moreover, there are now many more people who interact with information. Between 1990 and 2005 more than 1 billion people worldwide entered the middle class. As they get richer they become more literate, which fuels information growth, notes Mr Cortada. The results are showing up in politics, economics and the law as well. “Revolutions in science have often been preceded by revolutions in measurement,” says Sinan Aral, a business professor at New York University. Just as the microscope transformed biology by exposing germs, and the electron microscope changed physics, all these data are turning the social sciences upside down, he explains. Researchers are now able to understand human behaviour at the population level rather than the individual level.
The amount of digital information increases tenfold every five years. Moore’s law, which the computer industry now takes for granted, says that the processing power and storage capacity of computer chips double or their prices halve roughly every 18 months. The software programs are getting better too. Edward Felten, a computer scientist at Princeton University, reckons that the improvements in the algorithms driving computer applications have played as important a part as Moore’s law for decades.
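The two growth rates quoted above are in fact consistent with each other, as a quick calculation shows:

```python
# The text cites two rates: digital information growing tenfold every five
# years, and chip capability doubling roughly every 18 months (Moore's law).
# Compounding the second over five years shows the two are numerically close.
growth_18mo_doubling = 2 ** (60 / 18)  # 60 months = 5 years -> ~3.33 doublings
print(f"Doubling every 18 months gives {growth_18mo_doubling:.1f}x in five years")
# i.e. roughly the same tenfold-per-five-years pace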
A vast amount of that information is shared. By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes, according to Cisco, a maker of communications gear. And the quantity of data continues to grow faster than the ability of the network to carry it all.
People have long groused that they were swamped by information. Back in 1917 the manager of a Connecticut manufacturing firm complained about the effects of the telephone: “Time is lost, confusion results and money is spent.” Yet what is happening now goes way beyond incremental growth. The quantitative change has begun to make a qualitative difference.
This shift from information scarcity to surfeit has broad effects. “What we are seeing is the ability to have economies form around the data—and that to me is the big change at a societal and even macroeconomic level,” says Craig Mundie, head of research and strategy at Microsoft. Data are becoming the new raw material of business: an economic input almost on a par with capital and labour. “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” says Rollin Ford, the CIO of Wal-Mart.
Sophisticated quantitative analysis is being applied to many aspects of life, not just missile trajectories or financial hedging strategies, as in the past. For example, Farecast, a part of Microsoft’s search engine Bing, can advise customers whether to buy an airline ticket now or wait for the price to come down by examining 225 billion flight and price records. The same idea is being extended to hotel rooms, cars and similar items. Personal-finance websites and banks are aggregating their customer data to show up macroeconomic trends, which may develop into ancillary businesses in their own right. Number-crunchers have even uncovered match-fixing in Japanese sumo wrestling.
Dross into gold
“Data exhaust”—the trail of clicks that internet users leave behind from which value can be extracted—is becoming a mainstay of the internet economy. One example is Google’s search engine, which is partly guided by the number of clicks on an item to help determine its relevance to a search query. If the eighth listing for a search term is the one most people go to, the algorithm puts it higher up.
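The click-driven reordering described above can be sketched in a few lines. This is an illustrative toy, not Google's actual ranking algorithm; the function and data names are invented for the example:

```python
# A minimal sketch of click-based reranking: results that attract the most
# clicks are promoted toward the top of the list. Illustrative only --
# real search ranking combines many more signals than raw click counts.
def rerank_by_clicks(results, clicks):
    """Order results by observed click counts, most-clicked first.

    results -- list of result identifiers, in original ranked order
    clicks  -- dict mapping result identifier -> observed click count
    """
    return sorted(results, key=lambda r: clicks.get(r, 0), reverse=True)

results = ["page_a", "page_b", "page_c"]
clicks = {"page_a": 3, "page_b": 1, "page_c": 40}  # most users pick page_c
print(rerank_by_clicks(results, clicks))  # page_c moves to the top
```

Because Python's sort is stable, results with equal click counts keep their original relative order, so the prior ranking still acts as a tie-breaker.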
As the world is becoming increasingly digital, aggregating and analysing data is likely to bring huge benefits in other fields as well. For example, Mr Mundie of Microsoft and Eric Schmidt, the boss of Google, sit on a presidential task force to reform American health care. “Early on in this process Eric and I both said: ‘Look, if you really want to transform health care, you basically build a sort of health-care economy around the data that relate to people’,” Mr Mundie explains. “You would not just think of data as the ‘exhaust’ of providing health services, but rather they become a central asset in trying to figure out how you would improve every aspect of health care. It’s a bit of an inversion.”
To be sure, digital records should make life easier for doctors, bring down costs for providers and patients and improve the quality of care. But in aggregate the data can also be mined to spot unwanted drug interactions, identify the most effective treatments and predict the onset of disease before symptoms emerge. Computers already attempt to do these things, but need to be explicitly programmed for them. In a world of big data the correlations surface almost by themselves.
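The idea that correlations "surface by themselves" can be illustrated with a toy calculation. The records below are made up, and a plain Pearson correlation stands in for the far more careful statistical methods real pharmacovigilance would use:

```python
# Illustrative sketch with synthetic data: given aggregate patient records,
# a simple correlation between taking two drugs together and suffering an
# adverse reaction can emerge without anyone programming the computer to
# look for that specific interaction.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 1 = patient took both drugs / had the reaction; 0 = did not (made-up data)
took_both = [1, 1, 1, 0, 0, 0, 1, 0]
reaction  = [1, 1, 0, 0, 0, 0, 1, 0]
print(f"correlation: {pearson(took_both, reaction):.2f}")  # a strong positive link
```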
Sometimes those data reveal more than was intended. For example, the city of Oakland, California, releases information on where and when arrests were made, which is put out on a private website, Oakland Crimespotting. At one point a few clicks revealed that police swept the whole of a busy street for prostitution every evening except on Wednesdays, a tactic they probably meant to keep to themselves.
But big data can have far more serious consequences than that. During the recent financial crisis it became clear that banks and rating agencies had been relying on models which, although they required a vast amount of information to be fed in, failed to reflect financial risk in the real world. This was the first crisis to be sparked by big data—and there will be more.
The way that information is managed touches all areas of life. At the turn of the 20th century new flows of information through channels such as the telegraph and telephone supported mass production. Today the availability of abundant data enables companies to cater to small niche markets anywhere in the world. Economic production used to be based in the factory, where managers pored over every machine and process to make it more efficient. Now statisticians mine the information output of the business for new ideas.
“The data-centred economy is just nascent,” admits Mr Mundie of Microsoft. “You can see the outlines of it, but the technical, infrastructural and even business-model implications are not well understood right now.” This special report will point to where it is beginning to surface.
U.S. banks last year posted their sharpest decline in lending since 1942, suggesting that the industry's continued slide is making it harder for the economy to recover.
Besides registering their biggest full-year decline in total loans outstanding in 67 years, U.S. banks set a number of grim milestones. According to the FDIC, the number of U.S. banks at risk of failing hit a 16-year high at 702. More than 5% of all loans were at least three months past due, the highest level recorded in the 26 years the data have been collected. And the problems are expected to last through 2010.
The struggling U.S. banking industry remains a problem for policy makers eager for banks to lend again. Lawmakers on Capitol Hill and administration officials have pushed banks to lend, particularly in light of the billions in taxpayer aid injected into the financial industry over the past two years. Banking groups and their members counter that they're under pressure from regulators to be more prudent and that demand from struggling consumers and businesses isn't there.
Initiatives such as the Obama administration's $30 billion small-business lending program will rely on banks making loans at a time when many of those same firms are wrestling with a rising tide of commercial real estate problems or being told to add to their reserves by regulators.
Some small-business owners say they could expand if they could just get a loan. Nick Sachs, president of Homewatch CareGivers Cincinnati-Metro, says he's been asking banks for a loan of $150,000 to $250,000 since 2008. He says his home-health-care franchise could hire 20 to 30 aides and even one or two office assistants.
After being rejected for a loan by Huntington Bancshares Inc. over a year ago, Mr. Sachs recently re-applied to the Columbus, Ohio, bank. He did so in part because Huntington said in February that it would double its annual small-business lending over the next three years and extend credit to as many as 27,000 more businesses.
Maureen Brown, a Huntington spokeswoman, said the bank's "turnaround loans" have been well-received. She said the bank doesn't comment on individual loan applicants. Huntington has posted a string of five quarterly losses dating to 2008.
The FDIC said that the decline in loan balances in the quarter hit all major categories—from construction to commercial loans and residential mortgages—with the exception of credit card loans.
It remains unclear whether the sharp decline in loans outstanding stems from banks' tightening standards and a fear of lending or from weak demand from potential borrowers spooked by the downturn. Another cause could be banks actively reducing the size of their loan portfolios, creating a natural decline.
Most surveys suggest a combination of factors is at play. A January survey by the Federal Reserve of senior loan officers showed banks have slowed their efforts to tighten lending standards, but have not backed off the more stringent loan terms they put in place over the past two years. The same report, however, also showed that demand for loans from businesses and consumers continues to fall.
"Lending has been weak and spending by businesses and consumers has also been weak," FDIC Chief Economist Richard Brown said.
Bankers, on the other hand, say creditworthy borrowers are hard to come by. Fifth Third Bancorp recently extended a $3.5 million line of credit to Chicago-based One Hope United after the state of Illinois, beset by a budget crisis, delayed payments to the child-and-family-services provider.
Steve Abbey, Fifth Third senior vice president, said One Hope United is a rare exception of a nonprofit borrower that could qualify for credit from Fifth Third because of a cash crunch. Most other nonprofits that need cash right now "haven't set themselves up to borrow money and pay it back," Mr. Abbey said. "They just need money."
FDIC Chairman Sheila Bair said officials are eager for banks to make loans in their communities, putting the onus on the bigger institutions to do more small-business lending. "The larger institutions I think need to step up to the plate here too," Ms. Bair said, describing as "significant" the declines in their loan balances and credit lines.
One issue complicating banks' ability to lend is the looming problem of troubled commercial-real-estate loans. The FDIC's Mr. Brown said these loans take longer than residential mortgages to go bad, dragging out the hit to a bank's balance sheet.
The FDIC's report revealed that asset-quality indicators for banks continued to deteriorate in the fourth quarter as borrowers continued to fall behind on their loans. Banks wrote down $53 billion in loans in the final three months of last year. The quarterly write-off rate was the highest ever recorded in the 26 years the FDIC has collected the data. A total of $391.3 billion of all loans and leases, or 5.4%, were at least three months past due at the end of 2009.
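The two past-due figures quoted above also imply the overall size of the industry's loan book, a simple division:

```python
# Implied size of U.S. banks' total loan book, from the FDIC figures in the
# text: $391.3 billion of past-due loans represented 5.4% of all loans
# and leases at the end of 2009.
past_due_bn = 391.3       # billions of dollars at least three months past due
past_due_share = 0.054    # past-due loans as a share of all loans and leases
total_loans_bn = past_due_bn / past_due_share
print(f"Implied total loans and leases: ${total_loans_bn / 1000:.2f} trillion")
```

The result, roughly $7.2 trillion, gives a sense of the scale against which the $53 billion quarterly write-down sits.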
"While the economy is moving ahead banking results tend to lag behind," Mr. Brown said. "The problem loans and the earnings of the industry will improve somewhat after the economy improves."
Write to Marshall Eckblad at email@example.com