Starfish


Idea Summary

Write a summary paragraph: In about a paragraph, describe what problem you are solving, how many people are experiencing the problem, and what your solution is.

Big Data – voluminous and rapidly growing amounts of data that exceed the processing capacity of
conventional database systems – is now the main driving force for enterprises to innovate by gaining
deeper insight about their customers, partners and overall business. Hadoop is a distributed open-
source software system that has evolved rapidly over the last few years to become the leading platform
for storing, managing, processing and analyzing Big Data. It is estimated that, by 2015, more than
half the world's data will be processed by Hadoop. However, Hadoop comes at a high cost because it
relies heavily on users, system administrators, and expensive consultants to manually configure and
optimize the workloads and computing resources at many levels. Our solution, Starfish, is an intelligent
automated management tool for Hadoop that makes the processing of Big Data fast, resource-efficient
and cost-effective. Starfish is indispensable for making Hadoop accessible to a broad spectrum of users
in financial services, telecommunications, security, media, web, retail, bioinformatics, healthcare,
manufacturing, and government.

Tell us more about the problem you are solving.  Why is it a problem and how big of a problem is it?

Hadoop is currently the most popular distributed system used in the cloud computing era. However, it
is hard to manage the system efficiently due to its complexity. A Hadoop deployment usually consists
of hundreds of machines and there are thousands of configuration options that need to be taken care
of for the overall system to run reliably and efficiently. Currently, to manage a Hadoop deployment,
companies either maintain a dedicated in-house management team or hire external Hadoop consultants,
both of which are expensive. Companies without the budget for professional help in managing their
Hadoop deployments have to turn to the Hadoop community, where it may take several days just to get a
response.

Who do you think your target customers are and how many are there?

Starfish is targeting two market segments: the companies that mainly use Hadoop professional support
(professional market) and companies that use voluntary Hadoop community support (community
market).

The professional market is large: it is estimated that Big Data will be a $70 Billion industry in 2015 and
that professional services will be the largest part (at least 30%). This market is growing at a rapid rate of
15% to 20% a year. Meanwhile, Hadoop is going to process more than half of the data in the world by
2015. So we estimate that the Hadoop professional market will be at least $20 Billion in 2015. The goal
of Starfish is to be an automated solution that replaces the large and slow human efforts in this market.

We estimate the community market based on data storage. Studies have reported that the total size
of data storage grows at 50% per year and will reach about 8000 Exabytes (1 Exabyte = 1 Billion GB) in
2015. As mentioned before, 50% of this data will be stored and processed in Hadoop. If we estimate
the community market as 20% of this data, then in 2015, there will be about 800 Billion GB of data in
Hadoop that needs to be managed without professional support. Starfish is well designed to suit this
rapidly growing market.
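
The arithmetic behind this estimate can be checked with a short script. This is only a back-of-the-envelope sketch: the three inputs are the figures quoted above, and the function name and the use of whole-number percentages are our own illustrative choices.

```python
# Back-of-the-envelope check of the community-market estimate.
# All inputs are the figures quoted in the text; none is measured data.

GB_PER_EXABYTE = 10**9  # 1 Exabyte = 1 Billion GB, as stated in the text


def community_market_gb(total_exabytes, hadoop_pct, community_pct):
    """Return the GB of Hadoop data managed without professional support.

    Percentages are whole numbers (50 means 50%) so the arithmetic stays
    in integers and avoids floating-point rounding.
    """
    hadoop_exabytes = total_exabytes * hadoop_pct // 100
    community_exabytes = hadoop_exabytes * community_pct // 100
    return community_exabytes * GB_PER_EXABYTE


# 8000 Exabytes in total, 50% in Hadoop, 20% of that in the community market.
estimate_gb = community_market_gb(8000, 50, 20)
print(f"{estimate_gb // 10**9} Billion GB")
```

Changing any one of the three inputs shows how sensitive the estimate is to that assumption.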

Do you think your customers are looking for a solution?

For the professional market, Hadoop experts are paid to provide support. Though more people are
learning to use and manage Hadoop, the growth in the number of Hadoop experts lags far behind the
growth in the number of customers, which lowers the quality of professional service for each customer.
Cloudera, the leading Hadoop professional service provider, has indicated that many customers tend to
misconfigure the Hadoop system. For common issues like configuring the system properly, customers
will benefit a lot by using Starfish’s automatic solution instead of requiring expensive professional
Hadoop experts to look into these issues again and again. Customers will save a lot of expense and time,
while professionals can avoid tedious repeated work and thus provide better service to more customers.

For the community market, the demand for having an intelligent management tool like Starfish will be
even more significant. The main reason these customers do not seek professional support is their lack
of large budgets. Since Starfish provides computer-based solutions that are much cheaper
than professional human services, customers will try Starfish if they cannot find a solution from the
community or cannot afford to wait for such voluntary support.

The dominant players in the professional market are Cloudera, Amazon and IBM. However, these
competitors all focus on easing management and providing manual support. None has
developed a tool for intelligent, automatic management.

Tell us about your solution.  How does it work and what are the benefits?

Currently, Starfish is developing solutions for the most frequent questions Hadoop customers have. It
analyzes critical information collected from a running Hadoop deployment, and uses a comprehensive
model that understands Hadoop’s internal mechanisms in order to make the right recommendations for
customers. While more functionality is being developed, some of the coolest features are listed here.

The most powerful feature is that Starfish can suggest a better configuration to improve the system
performance. In the cloud computing era, this feature means reducing the significant costs of renting
cloud resources. Note that there is no single rule for how to configure the complex Hadoop system, and
even for a Hadoop professional, it might take hours or days to find a desirable configuration. But with
Starfish, Hadoop beginners can do this within minutes, and the result can be 2x-3x better!

Starfish can also recommend the minimum amount of cloud resources required for performing a certain
task within a specific time objective. Thus, customers do not have to buy unnecessary cloud resources
and can cut down the operating costs further.

Since the Hadoop system is still evolving to strengthen its flexibility and power, more intelligence for
automated management is being developed in Starfish that can be exploited in the future. For example,
one feature we are developing is that Starfish can fix misconfigurations that cause a failure of the
system, which would save customers a significant amount of time otherwise spent waiting for Hadoop
professional support or for the community to solve the problem. We believe that Starfish’s low cost and
high quality recommendations will benefit both the professional market and the community market.

Do you have any intellectual property (IP) that can be protected?  Is it protected?

Yes. We have filed provisional patents.

What's your plan for developing your product or service including some dates and milestones?  

Starfish will be developed into an enterprise version and a cloud version. For long-term customers,
especially in the professional market, we recommend the enterprise version with a fixed annual charge,
so that they can use Starfish anywhere and for an unlimited number of times per year. For customers
in the community market, we recommend the cloud version which charges per use so that these
customers can afford the service under tight budgets.

The Starfish system has been under development for two years in the Duke University Computer Science
department. The prototype has shown promising results. We are requesting $0.5 million in seed funding
to make it production ready in six months, and $1 million in Series A funding to develop new intelligent
features.

How much funding to get to a company exit?

We expect the Hadoop market to continue exploding over the next decade, and the need
to manage Hadoop efficiently and effectively will become more and more urgent. We are likely to
be a synergistic acquisition for one of the larger market players. Current investment in our product will
establish a competitive edge in the rapidly growing Hadoop market.

Tell us about yourselves (Who is on your team, what are you studying, what year are you)

The developer of Starfish is Herodotos Herodotou, a fifth-year Duke University Computer Science PhD
Candidate who has worked on Starfish for two years and has several publications about Starfish in
top research venues. He will continue to work on Starfish’s new intelligent features. The other two
members are Jie Li, a first-year Duke University Computer Science PhD Candidate who will enhance
Starfish into production-quality software, and Xuan Wang, a first-year Duke University MEMP Candidate
who will handle Starfish’s business development.

Our team is advised by Shivnath Babu, who is currently an Assistant Professor in Duke University
Computer Science and works on data-intensive systems and cloud computing.

Use of Funds - if you won $50,000 how would you use it?

We will use the prize to launch a demo of Starfish’s production version and seek further investment. The
prize will cover the expense for servers, marketing experiments, travel, etc.

Anything else you would like to share with us? 

We appreciate the Duke Start-up Challenge competition, which triggered the idea of commercializing our
research prototype.

--
Visit Starfish homepage: http://www.cs.duke.edu/starfish/

