Schema.org(LD+JSON) extraction for connectors workaround

Hi Community,

Connectors cannot currently extract data in LD+JSON format. Here is a workaround.

Note: This is an unofficial workaround that can be used (at your own risk) until Yext releases a permanent solution. It works with JSON objects but not with JSON arrays.

The JSON data can be classified into 2 categories.

  1. Standard data: the same format across all the pages crawled.

  2. Variable data: the format differs from page to page.

We will use a Yext function in a transform to solve this.

Before we dive into creating the function, please make sure:

  • The data is in a valid JSON format. You can use a JSON viewer to validate it.
  • The JSON is in one of the columns in your connector.
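
Both checks (valid JSON, and an object rather than an array at the top level) can also be done in code before wiring up the connector. This is only a minimal sketch, assuming the column value arrives as a plain string. isJsonObject is a hypothetical helper name, not part of any Yext API.

```typescript
// Hypothetical helper: returns true only when the input parses as JSON
// and the top-level value is an object (not an array or a primitive).
function isJsonObject(input: string): boolean {
  try {
    const parsed: unknown = JSON.parse(input);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
  } catch {
    // JSON.parse throws a SyntaxError on invalid JSON.
    return false;
  }
}

console.log(isJsonObject('{"name": "Family"}'));   // true
console.log(isJsonObject('[{"name": "Family"}]')); // false (top-level array)
console.log(isJsonObject("{name: 'Family'}"));     // false (not valid JSON)
```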

Create a function and paste the code below into it.

export function getStandardData(inputJson: string) {
  return printValues(JSON.parse(inputJson));
}

export function getVariableData(inputJson: string) {
  const parsed = JSON.parse(inputJson);
  /**
   *  Add your keys in array format like ["name"]
   *  To use nested JSON, use the format of parent.child or parent.child.grandchild
   */
  const keys: string[] = [];
  // Build one "key###value" pair per key and join the pairs with "$$".
  return keys.map((item) => buildString(item, parsed)).join("$$");
}

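//  Build and return a "key###value" string for a single key
const buildString = (keyIn, data) => {
  // Dotted keys such as "child.name" are resolved level by level.
  // "??" keeps falsy values like 0 or "" and only maps missing values to null.
  const value = keyIn.includes(".") ? getValue(keyIn, data) : (data[keyIn] ?? null);
  return `${keyIn}###${value}`;
};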

const getValue = (initialKey, obj) => {
  const [key, ...rest] = initialKey.split(".");
  if (obj[key] == null) {
    return "null";
  }
  if (rest.length) {
    return getValue(rest.join("."), obj[key]);
  }
  return obj[key];
};

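//  Flatten every leaf of the object into "key ### value" pairs joined by "$$".
//  Nested keys are prefixed with their parent keys, joined by "_".
function printValues(obj, parent = ""): string {
  // Collect pairs locally so repeated calls do not accumulate earlier results.
  const pairs: string[] = [];
  for (const key in obj) {
    if (typeof obj[key] === "object") {
      // Nested object: recurse with an extended prefix (null yields no pairs).
      const nested = printValues(obj[key], `${parent}${key}_`);
      if (nested) pairs.push(nested);
    } else {
      pairs.push(`${parent}${key} ### ${obj[key]}`);
    }
  }
  return pairs.join("$$");
}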

Let’s use it in our connector.

  • Add a transform
    • Transform - Function
    • Columns - Column holding the JSON
    • Plugin - Your plugin name.
    • Function - getStandardData for standard data and getVariableData for variable data
  • The data returned will be joined using $$.
  • Create a transform
    • Transform - Split Column
    • Column - Column on which the function is run.
    • Add Column - Add all the new column names.
    • Delimiter - $$
  • Once you’ve got all the columns you need, add another transform.
    • Transform - Find and Replace
    • Columns - Select all the columns created in the above step.
    • Find - Regex
    • Regex - (.*)###\s?
    • Replace - Leave it blank
Map the columns to relevant entity fields and run the connector.

Here is an example of using getVariableData to extract the name, count, child name, child street, and parents' state.

{
  "name": "Family",
  "count": 6,
  "child": {
    "name": "Jim",
    "age": "24",
    "address": {
      "street": "5th ave",
      "state": "NY",
      "country": "US"
    }
  },
  "parents": {
    "name": "John",
    "age": "45",
    "address": {
      "street": "New brew st",
      "state": "CA",
      "country": null
    }
  }
}

In the keys array, add non-nested keys directly and nested keys joined by a . in hierarchical order.
let keys = ["name", "count", "child.name", "child.address.street", "parents.address.state"];
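
To make the expected result concrete, here is a self-contained sketch that looks up these keys against the sample object and builds the key###value string joined by $$. lookup is a hypothetical stand-in for the plugin's helper functions, and the last key is written as parents.address.state because the sample object's key is parents.

```typescript
const sample = {
  name: "Family",
  count: 6,
  child: { name: "Jim", age: "24", address: { street: "5th ave", state: "NY", country: "US" } },
  parents: { name: "John", age: "45", address: { street: "New brew st", state: "CA", country: null } },
};

// Hypothetical helper: walk a dotted path such as "child.address.street".
// Returns null when any step along the path is missing or null.
function lookup(path: string, obj: unknown): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return null;
    current = (current as Record<string, unknown>)[key] ?? null;
  }
  return current;
}

const keys = ["name", "count", "child.name", "child.address.street", "parents.address.state"];
const result = keys.map((key) => `${key}###${String(lookup(key, sample))}`).join("$$");

console.log(result);
// name###Family$$count###6$$child.name###Jim$$child.address.street###5th ave$$parents.address.state###CA
```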

Note: Once the function is uploaded, you can play around with it in the Admin console (CAC), for example by adding or removing keys.
