May 17-18, 2016

San Francisco, CA

The future is already here; it is just not evenly distributed. But it clearly shows in our 150 talks across 7 conferences, arranged in a five-day conference matrix, with 50+ founders, CEOs, and CTOs speaking.

In-depth talks from Google (BigQuery and Translate), Baidu Research, MetaMind, StitchFix (Deep Learning), Microsoft, Bloomberg, Quora, Kaggle, Dato (Machine Learning), Netflix (Recommender Systems), IBM (Watson), Facebook, ClearStory (DataViz), LinkedIn, Yahoo, H2O, Confluent, Mesosphere (Data Pipelines), Samsung, Automatic (IoT), AMPLab, Databricks, Salesforce, Workday, Cloudera (Spark), Pivotal (OSS), Zillow, Pandora, Nitro, Lucidworks, Mattermark, Credit Karma, Alpine Labs, University of California-Berkeley, Stanford University, City of San Francisco, and many others.

Only 300 tickets will be available for each day, to keep a truly intimate technical community atmosphere.

View the Data By the Bay speakers and schedule.

Text By the Bay started in 2015 as the first applied NLP conference for the Bay Area. The key idea is to take open-source tools developed by the best researchers and practitioners working at scale, and to build a community of startup users who understand and improve those tools to run a business. Scientific rigor and excellent software engineering are the two key properties of the systems we use.

Conference News

Registration is open

We are using an advanced pass system that allows you to select a pass for several days, from 2 to 5. Each day's capacity is 400, and there are 100 Very Early/Early passes available. Please see the TICKETS page for full details.

Schedule is Published

The full schedule is published for all five days, all seven conferences. Very Early Bird registration is in effect, ending March 15.

Text By the Bay: Scalable Natural Language Engineering

A growing applied NLP conference bringing together researchers and practitioners, using computational linguistics and text mining to build new companies through understanding.

Please see the umbrella Data By the Bay description of a good talk By the Bay.

For Text By the Bay, some specific topics of interest include those covered in Text By the Bay 2015 and beyond, such as

  • Open-source libraries for parsing, entity linking, etc., at scale
  • Public corpora, crowdsourcing, labels, AI platforms, human-computer systems
  • Semantic modeling, knowledge bases, ontologies
  • Deep Learning for NLP
  • And more!

Last year, we started with a two-day, three-track, 50-talk conference. We've put together an inspiring program centered around language, Big Data, text and images, deep learning, UI, social networks, and much more.

This year, we're running the first data grid conference sequence, with seven verticals over five days. Each day's attendance is limited to only 400 seats, and we expect every day to fill up. We hope you join us in May By the Bay!

Keynote Speakers

Ricardo Baeza-Yates

Ricardo Baeza-Yates has been VP of Research and Chief Research Scientist at Yahoo Labs, based in Sunnyvale, California, since August 2014. Before that, he founded and led, from 2006 to 2015, the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor, and before that founder and Director of the Center for Web Research, at the Dept. of Computing Science of the University of Chile (on leave of absence to this day). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS and M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago.

He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley, with a second, enlarged edition in 2011 that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications.

From 2002 to 2004 he served as an elected member of the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Data Pipelines By the Bay

May 16, 2016

Building on Big Data Scala, this is the first conference showing end-to-end unity of Data Engineering and Data Science for big, fast, streaming data.

Text By the Bay

May 17-18, 2016 (Day 2 parallel with Democracy By the Bay and Law By the Bay)

The first applied NLP conference for the Bay Area, building on the highly-acclaimed 2015 edition: 50 talks from 50 top companies, all online at

Democracy By the Bay

May 18, 2016 (parallel with Law By the Bay and Text By the Bay)

NLP and Data Science with focus on politics, society, and government.

Law By the Bay

May 18, 2016 (parallel with Democracy By the Bay and Text By the Bay)

NLP and Data Science with focus on legal data and processes.

Legal search (100% recall), case-specific NLP, ambiguity analysis, etc.

AIoT By the Bay

May 19, 2016

Not everything is text. Multiple talks at Text By the Bay dealt with multi-modal data such as images with text. The AI and IoT day is all about sensor data streams, images, vision, speech, and music.

Life Sciences By the Bay

May 20, 2016 (parallel with Data UX By the Bay)

There are several major categories of data mining related to life and health. First, genomics, where the Bay Area leads with Spark and ADAM. Second, medical sensor and imaging data, with companies like Enlitic.

Data UX By the Bay

May 20, 2016 (parallel with Life Sciences By the Bay)

Data should be visualized, with massive datasets distilled into a clear and actionable display that calls attention to what's really important. And then UX should naturally lead to the appropriate action.

Data By the Bay – Common Thread

May 16-20, 2016

For each conference, we'll have common horizontal themes: platforms and algorithms.

Our Sponsors

Host Sponsor

Partner Sponsors

Friend Sponsors

Media Sponsors

Technology Insights and Events

Be a supporting member of San Francisco's premier Data/AI conference. We want to hear from you! Contact us for a prospectus and sponsorship agreement, or to talk about how we can help you be a contributing sponsor for the Data By the Bay conference!


The Agenda

Come to Text By the Bay well-rested and ready to meet your fellow developers. We'll have a full day of talks (keynotes, full-length, and lightning) and build a startup-centric data engineering community for the Bay Area!

Get Updates

Stay informed with the Text By the Bay conference news and event updates.

If you'd like to sponsor Text By the Bay, contact


Conference Schedule

View the Data By the Bay schedule & directory.

Conference Tickets

You can buy tickets for two or more days of the conference as passes. Once you buy a pass, you will receive an email with instructions on how to redeem the days you want. Each day has a capacity of 400 and will automatically be disabled once full. We'll list sold-out days on the TICKETS page as soon as they become unavailable.

Currently available days: Day 1, Day 2, Day 3, Day 4, Day 5.

Pricing works as follows: regular admission is $500/day. Very Early Bird is $400/day, Early Bird is $450/day, and Late Bird is $550/day. We will allocate only 100 Very Early/Early Bird tickets for each day, since our capacity is limited and the word is only getting out. The passes are 2/3/4/5-day bundles, discounted $50 for each extra day (so a 2-day Very Early Bird bundle is $750, a 2-day Early Bird bundle is $850, a 2-day Regular Admission bundle is $950, etc.). We use Stripe directly to process all payments.
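
If you'd like to estimate a pass price before heading to the TICKETS page, the bundle arithmetic can be sketched in a few lines of Python. This is only an illustration, not our checkout code: the function and constant names are made up for this example, and it assumes the $50 discount applies as a flat amount for every day beyond the first, which matches the 2-day examples above but is an extrapolation for 3-, 4-, and 5-day passes.

```python
# Illustrative sketch only; names are hypothetical and not part of any real ticketing API.
# Per-day rates are taken from the pricing paragraph above.
DAILY_RATES = {
    "very_early_bird": 400,
    "early_bird": 450,
    "regular": 500,
    "late_bird": 550,
}

# Assumption: a flat $50 off for each day beyond the first.
EXTRA_DAY_DISCOUNT = 50


def pass_price(tier: str, days: int) -> int:
    """Return the estimated price in dollars of a pass for `days` days at `tier`."""
    if tier not in DAILY_RATES:
        raise ValueError(f"unknown tier: {tier!r}")
    if not 2 <= days <= 5:
        raise ValueError("passes are sold as 2- to 5-day bundles")
    return DAILY_RATES[tier] * days - EXTRA_DAY_DISCOUNT * (days - 1)


if __name__ == "__main__":
    # The three 2-day bundles quoted above.
    for tier in ("very_early_bird", "early_bird", "regular"):
        print(tier, pass_price(tier, 2))
```

For the 2-day bundles this gives $750, $850, and $950, the same figures quoted above. Prices for longer passes depend on the flat-discount assumption, so please treat the TICKETS page as authoritative.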

Full-time students inquiring about discounts: please email proof of enrollment and dates of interest.