
Augmenting Collected Events

Upon entering the CDP, all collected events are transformed in real time to add information or to remove sensitive user information before they are persisted. The following sections describe the data we augment events with.

Metadata

The following event metadata is derived and available for all events.

| Metadata | Location on the tracking event | Example |
| --- | --- | --- |
| Timestamp (readable) | meta.timestamp | 2022-07-07T16:09:17.809Z |
| Timestamp in milliseconds (epoch unix format) | meta.timestampMillis | 1657210157809 |
| Is a valid event | meta.isValidEvent | true |
| User ID | meta.userId | {"id": "1pCookie", "id": "c6ac2829223e182cc225b2278a2e2622"} |
| Request made by a bot/crawler | meta.fromBot | false |
| Request made by device with a blacklisted user-agent | meta.isBlacklisted | false |
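
The two timestamp examples above describe the same instant. As a minimal sketch, assuming the readable form is an ISO 8601 UTC string with millisecond precision (the helper name millis_to_iso is hypothetical and not part of the CDP), the relationship can be checked like this:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: converts an epoch value in milliseconds (the format of
# meta.timestampMillis) to an ISO 8601 UTC string (the format of meta.timestamp).
def millis_to_iso(millis: int) -> str:
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # timedelta works on integers, so no floating-point rounding is involved.
    moment = epoch + timedelta(milliseconds=millis)
    # isoformat() renders UTC as "+00:00"; swap it for the "Z" suffix.
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(millis_to_iso(1657210157809))  # 2022-07-07T16:09:17.809Z
```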

IP Address

For requests that are sent directly by the user's device, through one of our SDKs, we use the IP address of the device that executed the HTTP request. For server-to-server events or events that were received through one of the supported webhooks, the IP address has to be explicitly stated as part of the request. Some third-party integrations via webhooks might not support sending the IP address. Refer to each webhook's details for information on how to send IP addresses.

After extracting the IP address, it is persisted in the event under the field meta.ip.

IP Anonymization

IP addresses are automatically anonymized by default.

Anonymization occurs as early as possible, before any logging or persistence takes place, and the full IP address is never stored. The anonymization principle is simple: every IP address found on the request has its last octet set to 0. For example, 123.22.22.14 becomes 123.22.22.0.
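
The rule above can be sketched in a few lines. This is an illustration only, not the CDP's implementation: the function name anonymize_ipv4 is hypothetical, and the sketch covers IPv4 only because the text above does not describe how IPv6 addresses are handled.

```python
import ipaddress

# Hypothetical helper illustrating the "last octet set to 0" rule for IPv4.
def anonymize_ipv4(ip: str) -> str:
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    # Keep the first three octets and zero the last one (mask 255.255.255.0).
    return str(ipaddress.IPv4Address(int(address) & 0xFFFFFF00))


print(anonymize_ipv4("123.22.22.14"))  # 123.22.22.0
```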

Geolocation

The IP address from the user tracking event is analyzed so that we can derive geolocation information. The information obtained through this process is purposely imprecise in order to avoid tracing the address or location of a particular user.

| Derived Information | Location on the tracking event | Example |
| --- | --- | --- |
| Country | meta.country | United States of America |
| City | meta.city | New York |
| Location (latitude and longitude) | meta.location | [40.7128, -74.0060] |
| Autonomous System Number (ASN) | meta.asn | COMCAST-7922 |
| Postal Code | meta.postalCode | 32073 |

In order to provide this functionality the CDP uses GeoLite2 databases created by MaxMind, available from https://www.maxmind.com.

Entrypoint

The meta.entrypoint field can be used to identify how the event was collected on the CDP.

Under meta.entrypoint.type we label the channel through which the event reached the CDP:

  • tag - the event was collected directly by either the JavaScript tag or the Android/iOS SDK.
  • server - the event entered the CDP through the server-to-server endpoint.
  • templated - the event entered the system through one of the templated webhooks.
  • source - similar to templated but currently it only applies to the integration with Segment.

Entrypoints of type templated and source can also contain an additional field, name, which states the third-party integration from which the event originated.

Device Information

We use the value of the User-Agent HTTP header to identify the user's device. As with the IP address, the User-Agent is taken directly from events sent using one of our SDKs, or has to be explicitly sent if the request entered through a server-to-server endpoint or through a third-party webhook.

We store the following information under meta.user-device:

  • user-agent-family
  • user-agent-major
  • user-agent-minor
  • os-family
  • os-major
  • os-minor
  • device-family
  • device-type

Interaction Type

We classify each event type according to an interaction type. This information is stored under meta.interactionType.

The interaction type can either be:

  • passive - events with type activationRequest, matchRequest, and cookieSyncRequest;
  • outbound - events with type adView, emailDelivery, and emailSend;
  • active - all the other event types.
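
The classification above amounts to a lookup with a default. A minimal sketch follows; the function name interaction_type is hypothetical, and pageView is used only as an example of an event type that falls outside the two listed groups.

```python
# Event types listed above for each non-default interaction type.
PASSIVE_EVENT_TYPES = {"activationRequest", "matchRequest", "cookieSyncRequest"}
OUTBOUND_EVENT_TYPES = {"adView", "emailDelivery", "emailSend"}


# Hypothetical helper returning the value described for meta.interactionType.
def interaction_type(event_type: str) -> str:
    if event_type in PASSIVE_EVENT_TYPES:
        return "passive"
    if event_type in OUTBOUND_EVENT_TYPES:
        return "outbound"
    # Every event type not listed above is treated as active.
    return "active"


print(interaction_type("matchRequest"))  # passive
print(interaction_type("emailSend"))     # outbound
print(interaction_type("pageView"))      # active
```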

Product Data

When an event contains one or more products, the received product details are complemented with information about the product if a Product Feed is available. More information on how to enable and set up this feature can be found in the Offline Imports section.

Currency Conversion

We convert every field that contains a currency to the default system currency. We support a variety of currencies and the conversion rates used are updated daily. All fields under meta.data should have the currency converted to the system default.

We only convert the currency for fields that are part of our events schema. Custom fields are not converted. The unaltered payload with the original currency can be found under meta.rawData.

Origin

We add some metadata about how the event originated under meta.origin. This field encompasses various factors that might have led to the occurrence of the event and it contains the following fields:

  • source - Where the user came from. Possible values:
    • direct - The user accessed the website directly;
    • email - The event is related to an email event;
    • none - The event occurred while the user was navigating the website, after they had already entered through some other channel;
    • google - The user accessed the website through one of Google's channels (like Adwords and DoubleClick);
    • Other - Related to UTM vars or the web page the user was on before being directed to your website;
  • medium - The type of traffic or tool used to get to the website;
  • campaign - Ad campaign that originated this event;
  • keyword - Any keyword related to a possible ad that originated this event;
  • content - Used to differentiate the content of a possible ad that originated this event.

These fields can be derived from the event using the following prioritized list:

| # | Conditions by priority | Extracted origin |
| --- | --- | --- |
| 1 | Event contains utm_* query parameters (at least utm_source) | source: value from utm_source; medium: value from utm_medium; campaign: value from utm_campaign; content: value from utm_content; keyword: value from utm_term |
| 2 | Event contains gclid query parameter | source: google; medium: cpc adwords |
| 3 | Event contains gclsrc query parameter | source: google; medium: cpc doubleclick |
| 4 | Event type is email* | source: email; campaign: value from campaignId |
| 5 | Event is missing a referrer | source: direct |
| 6 | Event referrer is a search engine | source: domain from referrer; medium: organic search |
| 7 | Event referrer is a payment platform | source: none |
| 8 | Event referrer is blacklisted | source: none |
| 9 | Any other referrer | source: domain from referrer; medium: referal; content: path from referrer |
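
The prioritized list above can be read as a chain of checks where the first match wins. The sketch below illustrates that ordering and is not the CDP's implementation. The function name derive_origin, its parameters, and the three lookup sets are assumptions made for the example, since this page does not list the actual search engines, payment platforms, or blacklisted referrers, nor where the query parameters sit in the event. The string values are copied verbatim from the table.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup sets; the real lists are not given in this document.
SEARCH_ENGINES = {"www.google.com", "www.bing.com"}
PAYMENT_PLATFORMS = {"www.paypal.com"}
BLACKLISTED_REFERRERS = {"spam.example"}

# utm_* query parameter that feeds each origin field (rule 1).
UTM_FIELDS = (
    ("medium", "utm_medium"),
    ("campaign", "utm_campaign"),
    ("content", "utm_content"),
    ("keyword", "utm_term"),
)


def derive_origin(event_type, url, referrer=None, campaign_id=None):
    """Return the origin dict for an event, checking the rules in priority order."""
    # Keep the first value of each query parameter of the page URL.
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}

    if "utm_source" in params:                      # rule 1
        origin = {"source": params["utm_source"]}
        for field, param in UTM_FIELDS:
            if param in params:
                origin[field] = params[param]
        return origin
    if "gclid" in params:                           # rule 2
        return {"source": "google", "medium": "cpc adwords"}
    if "gclsrc" in params:                          # rule 3
        return {"source": "google", "medium": "cpc doubleclick"}
    if event_type.startswith("email"):              # rule 4
        origin = {"source": "email"}
        if campaign_id is not None:
            origin["campaign"] = campaign_id
        return origin
    if not referrer:                                # rule 5
        return {"source": "direct"}

    parsed = urlparse(referrer)
    domain = parsed.hostname or ""
    if domain in SEARCH_ENGINES:                    # rule 6
        return {"source": domain, "medium": "organic search"}
    if domain in PAYMENT_PLATFORMS or domain in BLACKLISTED_REFERRERS:
        return {"source": "none"}                   # rules 7 and 8
    # Rule 9: any other referrer ("referal" is spelled as in the table).
    return {"source": domain, "medium": "referal", "content": parsed.path}


print(derive_origin("pageView", "https://shop.example/?gclid=abc"))
print(derive_origin("pageView", "https://shop.example/",
                    referrer="https://blog.example/post/1"))
```

Because the checks run in order, an event that carries both utm_source and gclid is attributed by its UTM parameters, which matches the priority column of the table.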