Idea Summary

Write a summary paragraph: In about a paragraph, describe what problem you are solving, how many people are experiencing the problem, and what your solution is.

Big Data, meaning voluminous and rapidly growing amounts of data that exceed the processing capacity of conventional database systems, is now the main driving force for enterprises to innovate by gaining deeper insight into their customers, partners and overall business. Hadoop is a distributed open-source software system that has evolved rapidly over the last few years to become the leading platform for storing, managing, processing and analyzing Big Data. It is estimated that, by 2015, more than half the world's data will be processed by Hadoop. However, Hadoop comes at a high cost because it relies heavily on users, system administrators, and expensive consultants to manually configure and optimize the workloads and computing resources at many levels. Our solution, Starfish, is an intelligent automated management tool for Hadoop that makes the processing of Big Data fast, resource-efficient and cost-effective. Starfish is indispensable for making Hadoop accessible to a broad spectrum of users in financial services, telecommunications, security, media, web, retail, bioinformatics, healthcare, manufacturing, and government.

Tell us more about the problem you are solving. Why is it a problem and how big of a problem is it?

Hadoop is currently the most popular distributed system used in the cloud computing era. However, its complexity makes it hard to manage efficiently. A Hadoop deployment usually consists of hundreds of machines, and thousands of configuration options must be set correctly for the overall system to run reliably and efficiently. Currently, to manage a Hadoop deployment, companies either maintain a dedicated in-house management team or hire external Hadoop services, both of which are expensive.
Companies without the budget for professional help in managing their Hadoop deployments have to turn to the Hadoop community, where it may take several days just to get a response.

Who do you think your target customers are and how many are there?

Starfish targets two market segments: companies that mainly use professional Hadoop support (the professional market) and companies that rely on voluntary Hadoop community support (the community market).

The professional market is large: it is estimated that Big Data will be a $70 billion industry in 2015, with professional services the largest part (at least 30%). This market is growing at a rapid rate of 15% to 20% a year. Meanwhile, Hadoop is expected to process more than half of the data in the world by 2015. We therefore estimate that the Hadoop professional market will be at least $20 billion in 2015. The goal of Starfish is to be an automated solution that replaces the extensive and slow human effort in this market.

We estimate the community market based on data storage. Studies have reported that the total size of data storage grows at 50% per year and will reach about 8000 exabytes (1 exabyte = 1 billion GB) in 2015. As mentioned before, 50% of this data, or about 4000 exabytes, will be stored and processed in Hadoop. If we estimate the community market as 20% of the data in Hadoop, then in 2015 there will be about 800 exabytes (800 billion GB) of data in Hadoop that needs to be managed without professional support. Starfish is well suited to this rapidly growing market.

Do you think your customers are looking for a solution?

For the professional market, Hadoop experts are paid to provide support. Although more people are learning to use and manage Hadoop, the growth in the number of Hadoop experts lags far behind the growth in the number of customers, which lowers the quality of professional service for each customer.
Cloudera, the leading Hadoop professional service provider, has indicated that many customers tend to misconfigure the Hadoop system. For common issues such as configuring the system properly, customers will benefit greatly from Starfish’s automatic solution instead of paying expensive Hadoop experts to look into the same issues again and again. Customers will save considerable expense and time, while professionals can avoid tedious, repetitive work and thus provide better service to more customers.

For the community market, the demand for an intelligent management tool like Starfish will be even greater. The main reason these customers do not seek professional support is that they lack large budgets. Since Starfish provides computer-based solutions that are much cheaper than professional human services, customers will try Starfish if they cannot find a solution from the community or cannot afford the time spent waiting for voluntary support.

The dominant players in the professional market are Cloudera, Amazon and IBM. These competitors, however, all focus on easing management and providing manual support. No one has developed a tool with intelligent and automatic management.

Tell us about your solution. How does it work and what are the benefits?

Currently, we are developing Starfish solutions for the most frequent questions Hadoop customers have. Starfish analyzes critical information collected from a running Hadoop deployment and uses a comprehensive model of Hadoop’s internal mechanisms to make the right recommendations for customers. While more functionality is being developed, some of the coolest features are listed here.

The most powerful feature is that Starfish can suggest a better configuration to improve system performance. In the cloud computing era, this feature means reducing the significant costs of renting cloud resources.
Note that there is no single rule for how to configure the complex Hadoop system, and even a Hadoop professional might take hours or days to find a desirable configuration. With Starfish, Hadoop beginners can do this within minutes, and the result can be 2x-3x better!

Starfish can also recommend the minimum amount of cloud resources required to perform a given task within a specific time objective. Customers therefore do not have to buy unnecessary cloud resources and can cut operating costs further.

Since the Hadoop system is still evolving to strengthen its flexibility and power, we are building more intelligence for automated management into Starfish that can be exploited in the future. For example, one feature under development lets Starfish fix misconfigurations that cause a system failure, which would save customers a significant amount of time otherwise spent waiting for Hadoop professional support or for the community to solve the problem.

We believe that Starfish’s low-cost, high-quality recommendations will benefit both the professional market and the community market.

Do you have any intellectual property (IP) that can be protected? Is it protected?

Yes. We have filed provisional patents.

What's your plan for developing your product or service including some dates and milestones?

Starfish will be developed into an enterprise version and a cloud version. For long-term customers, especially in the professional market, we recommend the enterprise version with a fixed annual charge, so that they can use Starfish anywhere and an unlimited number of times per year. For customers in the community market, we recommend the cloud version, which charges per use so that these customers can afford the service on tight budgets. The Starfish system has been under development for two years in the Duke University Computer Science department. The prototype has shown promising results.
We are requesting $0.5 million in seed funding to make it production ready in six months, and $1 million in Series A funding to develop new intelligent features.

How much funding to get to a company exit?

We expect the Hadoop market to continue exploding over the next decade, and the need to manage Hadoop efficiently and effectively will become more and more urgent. We are likely to be a synergistic acquisition for one of the larger market players. Current investment in our product will establish a competitive edge in the rapidly growing Hadoop market.

Tell us about yourselves (Who is on your team, what are you studying, what year are you)

The developer of Starfish is Herodotos Herodotou, a fifth-year Duke University Computer Science PhD Candidate who has worked on Starfish for two years and has several publications about Starfish in top research venues. He will continue to work on Starfish’s new intelligent features. The other two members are Jie Li, a first-year Duke University Computer Science PhD Candidate who will enhance Starfish into production-quality software, and Xuan Wang, a first-year Duke University MEMP Candidate who will handle Starfish’s business development. Our team is advised by Shivnath Babu, who is currently an Assistant Professor in Duke University Computer Science and works on data-intensive systems and cloud computing.

Use of Funds - if you won $50,000 how would you use it?

We will use the prize to launch a demo of Starfish’s production version and seek further investment. The prize will cover expenses for servers, marketing experiments, travel, etc.

Anything else you would like to share with us?

We appreciate the Duke Start-Up Challenge competition, which prompted the idea of commercializing our research prototype.

Vote for us on the Duke Start-Up Challenge Facebook Page!