Data Sharing Overview | Yext Hitchhikers Platform

Overview

Yext Data Sharing is a feature that leverages Snowflake’s Secure Data Sharing functionality to share raw data from the Yext Snowflake warehouse with a customer’s Snowflake account.

Yext Analytics collects a lot of useful analytics data for customers and provides robust built-in tools to aggregate and visualize that data, either in-platform through Report Builder and Dashboards, or programmatically through our Analytics and Logs APIs.

With that said, while they are flexible and powerful enough for the vast majority of use cases, the standard Analytics reporting tools can’t reasonably satisfy every single one.

Yext Data Sharing opens up several new avenues for businesses to realize the full potential of their data:

  • Fully customizable reporting - Write SQL queries against raw data to customize reports to your liking. Calculate complex, business-specific metrics that aren’t available out-of-the-box. Run large, compute-intensive SQL queries that aren’t supported in Report Builder or Reports API.
  • Centralized business insights - Integrate your Yext data with the rest of the internal business data in your Snowflake warehouse to consolidate your reporting workflow.
  • Sync data seamlessly - Move data from Yext to your internal warehouses more efficiently. Existing Snowflake customers can sync data warehouse-to-warehouse. Yext’s integrations with third-party BI tools like Tableau are built on top of the Reports API, which has limitations. Data Sharing allows you to build more flexible and powerful integrations on top of Snowflake with their ecosystem of BI partners.

How It Works

The concept of Yext Data Sharing is actually pretty simple. There is a lot going on under the hood to support sharing, but the process boils down to two main steps:

  1. A provider Snowflake account (Yext) creates a data share object (Share A), and adds secure views to it.
  2. Yext adds an external consuming Snowflake account to the data share. Users belonging to the account receive read-only query access to the secure views in the data share. In other words, users can read (query) the data, but cannot write (alter) the data.

[Figure: Data Sharing diagram]

No actual data is copied or transferred between accounts during this process. All sharing is accomplished through Snowflake’s services layer and metadata store. This makes the feature extremely secure and access to the data becomes near-instantaneous for consumers.
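As a rough illustration of the two steps above, the provider side of a Snowflake share generally looks like the following. This is a sketch using standard Snowflake sharing commands, not Yext's actual setup; sharing_db, share_a, and consumer_org.consumer_account are hypothetical placeholders.

```sql
-- Step 1: the provider creates a share and adds secure views to it.
-- All object names below are hypothetical placeholders.
create share share_a;
grant usage on database sharing_db to share share_a;
grant usage on schema sharing_db.pages to share share_a;
grant select on view sharing_db.pages.sites to share share_a;

-- Step 2: the provider adds the consuming account to the share.
alter share share_a add accounts = consumer_org.consumer_account;
```

Because the grants are limited to usage and select, the consuming account can query the shared views but cannot alter them.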

Account Prerequisites

In order to participate in Yext Data Sharing, you must be a Snowflake customer whose account lives in the us-east-1 (AWS) region. This is the region of our primary Snowflake account where we share data from. Additionally, your Yext account must be in the US partition – Data Sharing is not yet supported for EU partition customers.

For more on Snowflake’s different cloud regions, see the Snowflake documentation.
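To check which region your Snowflake account lives in, you can run Snowflake's built-in current_region() function from a worksheet. For an account in the required region, the result should name AWS and us-east-1. The exact output format (for example, whether it carries a region-group prefix) is not covered here, so confirm it against Snowflake's documentation.

```sql
-- Returns the name of the region for the account of the current session.
select current_region();
```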

Additionally, only the ‘root’, or top-level, business in an account tree can define a data share. If you are a reseller who controls multiple sub-accounts, individual sub-accounts cannot define their own data shares; instead, their data is included in the root (parent) account’s data share.

Yext Data Share

Note
All database objects shared between accounts are read-only. This means that the objects cannot be modified in any way, including adding, deleting, or modifying table data.

Key Components

There are four main primitives in a Yext Data Share – the sharing database, secure views, the Yext Data Share, and the Snowflake Share.

Yext has a designated global sharing database that contains all of the data to be included in Data Sharing. This sharing database contains several secure views. A Snowflake view is a named query definition whose result can be queried as if it were a table. A view can be used almost anywhere a table is used: in joins, subqueries, etc.

Secure views are specifically designed for data privacy. With regular views, anybody can see the view definition and the underlying tables. With secure views, only authorized users can see this information. The secure view is ultimately what contains the raw data for you to consume and interact with.
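For reference, a secure view in Snowflake is declared by adding the secure keyword to an ordinary view definition. The sketch below is illustrative only: internal_db.pages.sites_raw and sharing_db are hypothetical names, and the column list is borrowed from the example queries later on this page rather than from Yext's real view definitions.

```sql
-- Hypothetical example: the source table and database names are placeholders.
create or replace secure view sharing_db.pages.sites as
select
    site_id,
    site_name,
    repo_uuid,
    creation_timestamp
from internal_db.pages.sites_raw;
```

Consumers who query a view like this see its rows, but not the query text or the underlying table it reads from.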

Yext populates the Snowflake Share with these secure views and creates a Yext Data Share resource in Config-as-Code (CaC). The Snowflake Share resource is ultimately what you import into your Snowflake account.

The flowchart below illustrates how these pieces interact.

Row Level Security

Yext Data Sharing ensures that you only see the data that applies to your business by applying row-level security to the data share object. This means that you will only receive query access to the rows of a given secure view that contain your business’ data.

For more information on how row-level security works and how Yext Data Sharing handles security and sensitive data in general, see Security and Data Sensitivity in Data Sharing.
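One common way to implement row-level filtering in a shared secure view is to join against a mapping table and filter on Snowflake's current_account() function, which evaluates to the account running the query. The sketch below shows that general pattern only. It is an assumption about how such filtering can work, not a description of Yext's implementation, and every object name in it is hypothetical.

```sql
-- Hypothetical mapping table: account_access(business_id, snowflake_account)
-- records which consuming Snowflake account may see which business's rows.
create or replace secure view sharing_db.pages.sites_filtered as
select s.*
from internal_db.pages.sites_raw as s
join sharing_db.private.account_access as a
  on a.business_id = s.business_id
-- current_account() is evaluated in the consumer's session,
-- so each consumer only gets the rows mapped to its own account.
where a.snowflake_account = current_account();
```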

Signing Up

Yext Data Sharing will be available by request for mutual customers of Yext and Snowflake via a personalized listing on the Snowflake Marketplace.

You should be able to sign up by requesting access through the marketplace listing. After that, a member of your Yext account team will be able to set up your data share resource. Once the share is set up, all you need to do is import it into your Snowflake warehouse and you can start querying the data!

For more information on how to sign up and get started, see Security and Data Sensitivity in Data Sharing
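If you import the share with SQL rather than through the Snowflake UI, the consumer-side steps look roughly like this. This is a sketch under assumptions: provider_account, share_name, and analyst_role are placeholders, and the database name yext is chosen only to match the fully qualified names used in the example queries on this page.

```sql
-- List the shares visible to your account, including inbound ones.
show shares;

-- Create a read-only database from the inbound share.
-- provider_account and share_name are placeholders for the values listed above.
create database yext from share provider_account.share_name;

-- Allow another role to query the imported database.
-- analyst_role is a hypothetical role name.
grant imported privileges on database yext to role analyst_role;
```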

Available Data

Product Domain            Schema           Views
Public                    public           public.businesses
Platform                  platform         platform.api_requests
                                           platform.apps
                                           platform.consumer_api_requests
                                           platform.endpoints
                                           platform.function_invocations
Pages                     pages            pages.builds
                                           pages.generations
                                           pages.generation_function_invocations
                                           pages.pages_errors
                                           pages.sites
Analytics                 analytics        analytics.analytics_events
Content                   content          content.entities
                                           content.entity_types
                                           content.labels
                                           content.profile_field_data_cdc
                                           content.entity_data_cdc (table still in development)
                                           content.folders
Listings                  listings         listings.publishers
                                           listings.google_performance_metrics
                                           listings.google_search_keywords
                                           listings.search_term_type
Listings (Legacy Schema)  legacy_listings  legacy_listings.agg_google_my_business_location_daily
                                           legacy_listings.agg_bing_weekly
                                           legacy_listings.agg_location_daily
                                           legacy_listings.agg_google_my_business_metrics
                                           legacy_listings.agg_facebook_demographics_daily
                                           legacy_listings.agg_facebook_location_daily
Reviews                   reviews          reviews.entity_reviews
                                           reviews.entity_review_comments

For each product domain, we will provide the view definition and a data dictionary for each secure view, some sample use cases, and starter SQL queries. The schemas and content of the views are the same across all environments.
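Once the share is imported, you can also explore the schemas and views directly with standard Snowflake commands. The sketch below assumes the imported database is named yext, as in the fully qualified example queries on this page; substitute whatever name you chose at import time.

```sql
-- List the schemas (product domains) in the imported database.
show schemas in database yext;

-- List the views available in one schema.
show views in schema yext.pages;

-- Inspect the columns of a single view.
describe view yext.pages.sites;

-- Peek at a few rows before writing a larger query.
select * from yext.public.businesses limit 10;
```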

Example Use Cases

Here are a couple of example use cases where you might find a lot of value in using Yext Data Sharing.

Example 1: Pages Deployment Health with Fleet Management

Let’s say you’re a customer that uses Yext Pages for a ‘fleet management’ use case, where you’re trying to manage thousands of sites at scale from the same GitHub repository. Whenever you trigger a new deploy, you’ll often want to parallelize the deployments so that you’re running hundreds or even thousands of deployments at once.

You need a way to track the health of these deployments and to ensure that you’re on top of any failed builds. The Yext platform does a great job of tracking the status of each individual deploy, but you can only view one site at a time in the UI.

If you have thousands of sites, the only way to check the status of these deployments is to check each individual site in the Yext platform, which is not scalable at all. There isn’t a way to calculate custom metrics like this out-of-the-box.

With Yext Data Sharing, you can run some custom SQL queries to find out how many deployments you made today and how many of those failed.

Total # of Deployments Today

select count(*)
from pages.sites
where date(creation_timestamp) = current_date()

# of Deployment Errors Today

select count(*)
from pages.pages_errors 
join pages.sites using(site_id)
where date(timestamp) = current_date()

It looks like you had a few failed site deployments! Let’s run another query to get some more information on which sites these failures were associated with.

Metadata about Sites Associated with a Failed Deploy

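select 
    resource_name as business_id,
    name as business_name, 
    -- most recent error per site and error type
    max(pages_errors.timestamp) as latest_error_timestamp,
    concat('https://yext.com/s/', businesses.business_id, '/yextsites/', pages_errors.site_id) as site_link,
    sites.site_name,
    pages_errors.site_id,
    pages_errors.error_type,
    sites.repo_uuid
from pages.pages_errors 
join pages.sites using(site_id)
-- businesses supplies resource_name and name; this join assumes pages_errors exposes business_id
join public.businesses on businesses.business_id = pages_errors.business_id
where date(pages_errors.timestamp) = current_date()
group by 1,2,4,5,6,7,8
order by 3 desc;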

You can add the outputs of these queries to a Snowflake dashboard and continually refresh and monitor it to stay on top of all of your deployments!
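For a dashboard tile, a breakdown of today's errors by type can be easier to scan than a raw count. The sketch below reuses only the pages.pages_errors columns that appear in the queries above (error_type and timestamp); check the data dictionary for the exact column semantics before relying on it.

```sql
-- Today's deployment errors, grouped by error type.
select
    error_type,
    count(*) as error_count
from pages.pages_errors
where date(timestamp) = current_date()
group by error_type
order by error_count desc;
```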

Example 2: Listings Monitoring for Reseller

Let’s say I’m a reseller partner who wants to power monthly Listings reporting for thousands of sub-accounts, each of whom has dedicated end users that rely on this reporting. However, the Reports API has a row limit of 500k rows, meaning no report generated can be larger than that.

I need a way to create reports for each of my sub-accounts, dimensioned by entity and day, but if I try to send this report to the Reports API, the report will likely fail because the generated output would exceed the row limit.

Keep in mind that the Reports API is simply a layer of abstraction on top of the raw data in Snowflake. With Yext Data Sharing, I don’t need to worry about the row limits or other limitations of a REST API. I can just directly query the data to pull the report that I want to create.

Listings Impressions for Every Sub-Account in an Account Tree

select
-- Dimensions
    businesses.resource_name, -- Sub-account dimension
    google_performance_metrics.entity_id,
    google_performance_metrics.date,
    case
      when google_performance_metrics.metric in ('BUSINESS_IMPRESSIONS_DESKTOP_MAPS', 'BUSINESS_IMPRESSIONS_MOBILE_MAPS') then 'MAPS'
      else 'SEARCH' 
    end as app,
-- Metric
    sum(case 
        when metric in ('BUSINESS_IMPRESSIONS_DESKTOP_MAPS','BUSINESS_IMPRESSIONS_MOBILE_MAPS','BUSINESS_IMPRESSIONS_DESKTOP_SEARCH','BUSINESS_IMPRESSIONS_MOBILE_SEARCH') 
        then value end) 
    as listings_impressions
from yext.listings.google_performance_metrics 
join yext.public.businesses using(business_id)
where date between 'yyyy-mm-dd' and 'yyyy-mm-dd'
  and app is not null
  and google_performance_metrics.business_id is not null
  and google_performance_metrics.entity_id is not null
group by 1,2,3,4
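Since the goal in this example is monthly reporting, the same data can also be rolled up to one row per sub-account per month. This is a sketch that reuses the views, columns, and metric names from the query above; the date bounds are the same placeholders and need to be replaced with real dates.

```sql
-- Monthly Google impressions per sub-account.
select
    businesses.resource_name,
    date_trunc('month', google_performance_metrics.date) as month,
    sum(google_performance_metrics.value) as listings_impressions
from yext.listings.google_performance_metrics
join yext.public.businesses using(business_id)
where google_performance_metrics.metric in (
        'BUSINESS_IMPRESSIONS_DESKTOP_MAPS',
        'BUSINESS_IMPRESSIONS_MOBILE_MAPS',
        'BUSINESS_IMPRESSIONS_DESKTOP_SEARCH',
        'BUSINESS_IMPRESSIONS_MOBILE_SEARCH')
  and google_performance_metrics.date between 'yyyy-mm-dd' and 'yyyy-mm-dd'
group by 1,2
order by 1,2;
```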

Frequently Asked Questions

Are there any limitations?

Keep in mind that this is an initial release, so the system has a few limitations for the time being:

  • A consuming Snowflake account can only be associated with a single Yext Data Share across a product environment. This means there can be only one Yext Data Share in the production environment, but the same business can have another one in a sandbox environment if they have an associated Yext account in that environment.
  • For customers that control multiple subsidiary accounts, such as reseller partners, only the ‘root’ (parent) account can define a data share.
  • Customers can’t directly interact with the Yext Data Share resource yet. For support, reach out to your Yext account team to submit a ticket to Engineering, or use Snowflake support.

Can I participate in Data Sharing without a Snowflake account of my own?

You will need to have a Snowflake account to participate in Yext Data Sharing.

If you aren’t already a Snowflake customer and aren’t yet ready to fully commit to signing up for Snowflake, you can sign up for a 30-day trial account. This will give you access to Yext Data Sharing, and it can be converted to a full paid account at any time within that 30-day window.

Are there pricing implications?

Yes, but you will not be billed by Yext.

Instead, you’ll be billed directly through Snowflake. Snowflake charges based on the compute resources, such as virtual warehouses, used to query the shared data.

If you’re using a trial account, you won’t be billed by Snowflake until you decide to convert to a full paid account.

The incremental cost of Yext Data Sharing depends entirely on your usage. If you rarely query the shared database, you probably won’t see much of a change in your regular monthly Snowflake costs. If you query it frequently, or run large, compute-intensive queries, expect your bill to increase accordingly.

For more on Snowflake’s compute costs, see the Snowflake documentation.
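To see how much compute your queries are consuming, you can check Snowflake's account usage views. The sketch below sums credits per warehouse over the last 30 days. It assumes your role has access to the snowflake.account_usage schema, and it is most useful if you run your Yext queries on a dedicated warehouse so their cost is easy to isolate.

```sql
-- Credits consumed per warehouse over the last 30 days.
select
    warehouse_name,
    sum(credits_used) as credits_last_30_days
from snowflake.account_usage.warehouse_metering_history
where start_time >= dateadd(day, -30, current_timestamp())
group by warehouse_name
order by credits_last_30_days desc;
```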

How will I know how to set up my SQL queries?

The Data Dictionary documentation should be a useful resource for setting up SQL queries. This document lays out the data model, and for each view, contains detailed descriptions of each column and a few sample queries.

Can I move my Yext data outside of Snowflake?

Yes!

Once your Yext data is in your Snowflake warehouse, you are free to use it for your internal business purposes.

Whether you’d like to keep it in Snowflake, combine it with other business data in a different environment such as GCP, or export it to a BI tool like Tableau, you have the flexibility to choose what you do with your data as long as it is for your internal business purposes (which means you cannot sell your Yext data, or disclose or distribute your Yext data externally).
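Because shared objects are read-only, the usual first step before transforming or exporting the data is to copy it into a table you own. The sketch below assumes the imported database is named yext; my_db.reporting.yext_sites_snapshot is a hypothetical target table in your own account.

```sql
-- Materialize a point-in-time copy of a shared view into your own table.
-- The target database, schema, and table names are placeholders.
create or replace table my_db.reporting.yext_sites_snapshot as
select *
from yext.pages.sites;
```

The copy is a snapshot, so rerun or schedule the statement if you need it to stay current with the shared view.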

Can I still use the Reports API?

The Reports API isn’t going anywhere!

Yext Data Sharing is meant to augment your current workflows for getting data out of Yext. The Reports API is still a great fit for simpler, standardized use cases.

However, for more complex, custom, or large-scale use cases, Yext Data Sharing is much more flexible and powerful than the Reports API and is the more appropriate solution.

Is Data Sharing secure?

Yes, Data Sharing is highly secure! If anything, Data Sharing will help minimize security risks by allowing you to make your raw data available directly in your data warehouse without having to involve a third-party (ETL) pipeline and without actually copying or transferring the data.