Data aggregation is the least visible part of a family office platform and the most consequential. It is the layer nobody sees in a demonstration, the capability that rarely features in a vendor's marketing materials, and the foundation on which every report, every analytics output, and every AI-generated response depends. It is also the area where the gap between platforms claiming equivalent capability is widest.
The challenge for a family office evaluating aggregation tools is that the differences are not visible on the surface. Two platforms can present similar consolidated portfolio views while relying on fundamentally different data architectures underneath: one built on direct institutional feeds with rigorous normalisation and validation, the other on web scraping, manual workarounds, and inconsistently standardised data. The reports look similar. Their reliability over time does not.
This framework covers the five dimensions that separate genuinely strong data aggregation from platforms that are adequate in favourable conditions and unreliable when the complexity of a real portfolio is applied to them.
The first question to ask of any data aggregation platform is not how many connections it has. It is what kind of connections they are.
Direct SFTP feeds from custodians and banks represent the appropriate standard for family office data aggregation. An SFTP connection delivers structured, validated transaction and valuation data automatically from the institution, on a scheduled basis, in a format agreed between the platform and the data provider. The data arrives consistently, reliably, and without manual intervention. A platform with 500 direct SFTP connections can automate the data receipt process for the vast majority of portfolios it serves.
The alternative, used by some platforms to extend their apparent coverage, is web scraping: automated tools that log into banking portals and extract data by reading the screen rather than receiving a structured feed. Web scraping is fragile. It breaks when a portal updates its layout, introduces new authentication steps, or makes any structural change to the page the scraper was reading. It is slow and dependent on portal availability, and it produces unstructured data that requires significantly more processing and carries higher error rates than a direct institutional feed.
The practical consequence for a family office is that a platform with 1,000 connections, of which 600 are web scrapers, delivers lower-quality, less reliable data for those 600 sources than a platform with 500 direct SFTP connections. The headline connectivity number is not the relevant figure. The proportion of those connections that are direct institutional feeds is.
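The arithmetic behind that comparison can be made explicit. The following is a minimal sketch using the figures from the example above; the vendor names and the `VendorConnectivity` class are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorConnectivity:
    """Headline connectivity figures as a vendor might state them (illustrative)."""
    name: str
    total_connections: int
    scraper_connections: int  # web scrapers or portal-based extraction tools

    @property
    def direct_connections(self) -> int:
        # Whatever is not a scraper is treated here as a direct SFTP or API feed.
        return self.total_connections - self.scraper_connections

    @property
    def direct_share(self) -> float:
        return self.direct_connections / self.total_connections


# Figures taken from the example in the text; vendor names are placeholders.
vendor_a = VendorConnectivity("Vendor A", total_connections=1000, scraper_connections=600)
vendor_b = VendorConnectivity("Vendor B", total_connections=500, scraper_connections=0)

for v in (vendor_a, vendor_b):
    print(f"{v.name}: {v.direct_connections} direct feeds ({v.direct_share:.0%} of headline)")
# Vendor A: 400 direct feeds (40% of headline)
# Vendor B: 500 direct feeds (100% of headline)
```

On these numbers the vendor with the larger headline figure has fewer direct feeds, which is why the split matters more than the total.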
The question to put to any vendor is direct: of your stated connections, what proportion are direct SFTP or API feeds from the institution, and what proportion are web scrapers or portal-based extraction tools? A vendor that cannot answer that question clearly, or that conflates the two categories, has answered the question without meaning to.
Beyond connection type, the coverage of the specific institutions in the office's portfolio matters more than the absolute number of connections. A platform with 500 direct connections that covers 90% of the custodians and banks appearing in the office's book delivers more value than a platform with 1,000 connections that covers 60% of them. Before finalising any evaluation, the office should provide a representative list of its custodians and banks and ask the vendor to confirm which are covered by direct feeds, which by alternative methods, and how gaps are handled. A separate piece on how data sourcing works across the full range of sources a family office manages covers that landscape in full.
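The coverage check described above can be run as a simple comparison once the vendor has responded. This is a sketch only: the institution names are placeholders, and the three connection labels are assumed categories rather than any vendor's actual terminology.

```python
# Hypothetical labels for how a vendor says it covers an institution.
DIRECT, SCRAPER, NOT_COVERED = "direct_feed", "scraper", "not_covered"


def coverage_report(office_institutions, vendor_coverage):
    """Return the share of the office's institutions on direct feeds, plus the gaps.

    office_institutions: list of institution names from the office's own book.
    vendor_coverage: dict mapping institution name to a connection label.
    """
    if not office_institutions:
        raise ValueError("the office's institution list is empty")
    classified = {
        inst: vendor_coverage.get(inst, NOT_COVERED) for inst in office_institutions
    }
    direct = [inst for inst, kind in classified.items() if kind == DIRECT]
    # Anything not on a direct feed is a gap the vendor should explain.
    gaps = {inst: kind for inst, kind in sorted(classified.items()) if kind != DIRECT}
    return len(direct) / len(office_institutions), gaps


# Placeholder data: five institutions in the office's book.
office_list = ["Custodian A", "Custodian B", "Bank C", "Bank D", "Administrator E"]
vendor_answer = {
    "Custodian A": DIRECT,
    "Custodian B": DIRECT,
    "Bank C": SCRAPER,
    "Bank D": DIRECT,
}

direct_share, gaps = coverage_report(office_list, vendor_answer)
print(f"Direct-feed coverage of the office's book: {direct_share:.0%}")
for inst, kind in gaps.items():
    print(f"  gap: {inst} ({kind})")
```

The point of the design is that the denominator is the office's own list, not the vendor's headline connection count.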
Data from different custodians, banks, and administrators does not arrive in a common format. Asset names differ between institutions. Transaction categories vary. Currency conventions are inconsistent. Valuation methodologies differ. Without a robust normalisation process, the consolidated environment the platform presents to the office contains the same holding described differently by two custodians, transactions categorised in ways that distort asset class analysis, and performance figures that do not reconcile across sources.
This is not a minor inconvenience. It is a fundamental data quality problem that propagates through every report, every analytics output, and every AI-generated response the platform produces. A consolidated view built on unnormalised data looks like a consolidated view. It behaves like a collection of inconsistently formatted source files with the joins hidden.
The normalisation process that genuinely addresses this standardises all incoming data against a consistent schema before it is stored. The same bond held at two custodians appears as one position in every report. A transaction described differently by two institutions is categorised consistently. Currency conversions are applied uniformly. The consolidated picture the office works from reflects a single standard rather than the aggregated inconsistencies of its sources.
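As an illustration of what standardising against a consistent schema means, the sketch below maps two differently shaped custodian records onto one `Position` type and consolidates by ISIN. Everything here is assumed for the example: the field names, the two source layouts, the placeholder ISIN, and the FX rate do not come from any real custodian feed or platform.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Position:
    """Hypothetical common schema every source is mapped into before storage."""
    isin: str
    custodian: str
    quantity: Decimal
    market_value: Decimal
    currency: str


def from_custodian_a(row: dict) -> Position:
    # Assumed layout for source A: upper-case column names, nominal amounts.
    return Position(
        isin=row["ISIN"].strip().upper(),
        custodian="Custodian A",
        quantity=Decimal(row["Nominal"]),
        market_value=Decimal(row["MktVal"]),
        currency=row["Ccy"].strip().upper(),
    )


def from_custodian_b(row: dict) -> Position:
    # Assumed layout for source B: different names for the same facts.
    return Position(
        isin=row["security_id"].strip().upper(),
        custodian="Custodian B",
        quantity=Decimal(row["qty"]),
        market_value=Decimal(row["value"]),
        currency=row["currency"].strip().upper(),
    )


def consolidate(positions, fx_to_base: dict) -> dict:
    """Sum market value per ISIN in base currency, using one FX table for all sources."""
    totals = defaultdict(Decimal)
    for p in positions:
        totals[p.isin] += p.market_value * fx_to_base[p.currency]
    return dict(totals)


# The same bond reported by two custodians (placeholder ISIN and values).
row_a = {"ISIN": "xs0000000001 ", "Nominal": "1000000", "MktVal": "985000.00", "Ccy": "eur"}
row_b = {"security_id": "XS0000000001", "qty": "500000", "value": "540000.00", "currency": "USD"}

positions = [from_custodian_a(row_a), from_custodian_b(row_b)]
fx = {"EUR": Decimal("1"), "USD": Decimal("0.9")}  # illustrative rates, base EUR
consolidated = consolidate(positions, fx)
print(consolidated)  # one entry for the bond, not two
```

Because both mappers clean the identifier and currency code the same way, the two source rows collapse into a single consolidated position.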
The questions to ask when evaluating this capability are specific. How does the platform's normalisation process work, and at what point in the data pipeline is it applied? How are conflicts between data from two sources, such as different valuations for the same position on the same date, identified and resolved? Is the normalisation schema documented and available for the office to inspect? And how are changes to source data formats, which custodians make periodically, handled when they occur?
A platform that can describe its normalisation process precisely, with evidence from comparable client portfolios, has invested in this layer seriously. One that describes it in general terms and redirects to reporting quality is not prioritising the part of the infrastructure that reporting quality depends on.
For a family office with significant allocations to private equity, real estate, hedge funds, and direct investments, the platform's capability for handling alternative investment data is as important as its handling of listed assets. This is also the dimension where the gap between strong and adequate platforms is most significant.
Listed assets held at custodians arrive through automated daily feeds and slot cleanly into the normalisation and consolidation process. Alternative investments do not. Capital account statements, NAV letters, LP reports, and valuation notices arrive weeks after month-end in PDF and Excel formats from counterparties that have no standardised data delivery obligation. Without a specific capability for handling this data, it either enters the consolidated environment through manual data entry, creating a bottleneck that scales badly as the alternatives allocation grows, or it remains outside the consolidated picture entirely.
AI-powered document parsing is the current standard for platforms that take alternatives coverage seriously. Rather than a team member reading a capital account statement and entering figures manually, the platform reads the document automatically, identifies the relevant data points, validates them against expected values, and incorporates them into the consolidated environment. The delay before administrators publish their data cannot be changed. The manual processing burden once the data arrives can be eliminated, and the gap between alternatives data and listed data in the consolidated picture can be substantially reduced.
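The validation step is the part of this flow that is easiest to illustrate. The sketch below assumes the parser has already produced a set of figures from a capital account statement and checks them against a common roll-forward identity (closing balance equals opening balance plus contributions, minus distributions, plus net gain or loss) and against the prior statement. The field names, the tolerance, and the fund name are assumptions for the example, not a description of any particular platform's checks.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CapitalAccountExtract:
    """Figures a document parser might extract from one statement (hypothetical fields)."""
    fund: str
    opening_balance: Decimal
    contributions: Decimal
    distributions: Decimal
    net_gain_loss: Decimal
    closing_balance: Decimal


def validate_extract(extract, prior_closing=None, tolerance=Decimal("0.01")):
    """Return a list of issues; an empty list means the extract passed both checks."""
    issues = []

    # Check 1: the statement should roll forward internally.
    expected_closing = (
        extract.opening_balance
        + extract.contributions
        - extract.distributions
        + extract.net_gain_loss
    )
    if abs(expected_closing - extract.closing_balance) > tolerance:
        issues.append(
            f"roll-forward mismatch: expected {expected_closing}, "
            f"parsed {extract.closing_balance}"
        )

    # Check 2: this period's opening should match last period's closing, if known.
    if prior_closing is not None and abs(prior_closing - extract.opening_balance) > tolerance:
        issues.append(
            f"opening balance {extract.opening_balance} does not match "
            f"prior closing {prior_closing}"
        )
    return issues


good = CapitalAccountExtract(
    fund="Example Fund I",
    opening_balance=Decimal("1000000"),
    contributions=Decimal("250000"),
    distributions=Decimal("100000"),
    net_gain_loss=Decimal("50000"),
    closing_balance=Decimal("1200000"),
)
# Same statement, but the parser misread one digit of the closing balance.
misread = CapitalAccountExtract(
    fund="Example Fund I",
    opening_balance=Decimal("1000000"),
    contributions=Decimal("250000"),
    distributions=Decimal("100000"),
    net_gain_loss=Decimal("50000"),
    closing_balance=Decimal("1290000"),
)

print(validate_extract(good, prior_closing=Decimal("1000000")))  # []
print(validate_extract(misread))  # one roll-forward issue
```

An extract that fails a check like this would be routed to a person for review rather than written into the consolidated environment.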
The questions to evaluate this capability are direct. Does the platform support automated document parsing for capital account statements, NAV letters, and LP reports? What document formats does it handle, and how are documents submitted for processing? What validation is applied to extracted data before it is incorporated into the consolidated environment? And what is the process when a fund administrator changes its document format, as administrators periodically do?
For banks and custodians that lack the technical infrastructure to provide direct SFTP feeds, AI document parsing serves the same function: automating the extraction of data from statements that would otherwise require manual entry. A platform that applies document parsing consistently across both alternatives documents and bank statements from non-connected institutions has built a genuinely comprehensive data coverage model rather than an automated layer with significant manual gaps underneath it. A separate piece on the specific challenges that alternatives create for consolidated reporting covers the five areas where platforms most commonly fall short.
The volume of data a platform ingests is not the measure of its aggregation quality. The proportion of that data that is accurate, current, and properly validated before it reaches reporting and analytics is. A platform that ingests high volumes of data quickly but with inconsistent quality creates a different problem rather than solving the original one.
The data quality layer that distinguishes strong aggregation platforms covers three specific capabilities.
The first is automated validation rules applied to incoming data before it is stored, checking that values fall within expected ranges, that transaction types match the asset class, and that valuations are consistent with prior periods within acceptable tolerances. These rules catch the majority of data quality issues at the point of ingestion rather than after they have propagated through reporting.
The second is automated exception detection that flags anomalies and surfaces them to the team for review. The team should be reviewing a manageable list of exceptions, not auditing every transaction. The volume of exceptions relative to total data processed, and the process for resolving them, are practical indicators of the underlying data quality the platform delivers.
The third is reconciliation capability that allows the office to verify the platform's view of the portfolio against custodian statements and resolve discrepancies when they arise. A platform that makes reconciliation easy, with clear tooling for identifying and investigating differences, is one that respects the office's need to maintain its own view of data accuracy rather than simply trusting the platform's output.
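A reconciliation of the platform's view against a custodian statement can be sketched as a comparison of two sets of holdings. The function name, the tolerance, and the placeholder ISINs below are assumptions made for illustration; real reconciliation tooling would also compare valuations, cash, and transactions.

```python
from decimal import Decimal


def reconcile(platform: dict, statement: dict, tolerance=Decimal("0.01")) -> list:
    """Compare quantities per ISIN and return the breaks that need investigating.

    platform, statement: dicts mapping ISIN to quantity held.
    Each break is a tuple: (isin, reason, platform_quantity, statement_quantity).
    """
    breaks = []
    for isin in sorted(platform.keys() | statement.keys()):
        p = platform.get(isin)
        s = statement.get(isin)
        if p is None:
            breaks.append((isin, "missing on platform", None, s))
        elif s is None:
            breaks.append((isin, "missing on statement", p, None))
        elif abs(p - s) > tolerance:
            breaks.append((isin, "quantity differs", p, s))
    return breaks


# Placeholder holdings: one match, one quantity break, one missing position.
platform_view = {
    "XS0000000001": Decimal("1000000"),
    "XS0000000002": Decimal("250"),
}
custodian_statement = {
    "XS0000000001": Decimal("1000000"),
    "XS0000000002": Decimal("300"),
    "XS0000000003": Decimal("75"),
}

for item in reconcile(platform_view, custodian_statement):
    print(item)
```

Matching positions produce no output, so the team sees only the differences it has to resolve.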
The data quality question to put to any vendor is the same one that reveals the true capability of the aggregation layer: what is your average exception rate as a proportion of total data processed, and what does the exception resolution process look like in practice? The answer, and the willingness to provide it with specificity, is revealing.
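To make the exception rate concrete, the sketch below applies three illustrative rules of the kind described above (a range check, a transaction-type check, and a period-on-period tolerance check) and reports exceptions as a proportion of records processed. The rule set, the 25% tolerance, the allowed transaction types, and the record fields are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Record:
    """One incoming record (hypothetical fields)."""
    record_id: str
    asset_class: str
    transaction_type: str
    value: Decimal
    prior_value: Decimal


# Assumed mapping of asset class to the transaction types that make sense for it.
ALLOWED_TYPES = {
    "equity": {"buy", "sell", "dividend"},
    "bond": {"buy", "sell", "coupon", "redemption"},
}
MAX_PERIOD_MOVE = Decimal("0.25")  # assumed tolerance for change versus the prior period


def check(record: Record) -> list:
    """Return the reasons this record should be flagged; empty if it passes."""
    reasons = []
    if record.value < 0:
        reasons.append("negative valuation")
    if record.transaction_type not in ALLOWED_TYPES.get(record.asset_class, set()):
        reasons.append("transaction type does not match asset class")
    if record.prior_value > 0:
        move = abs(record.value - record.prior_value) / record.prior_value
        if move > MAX_PERIOD_MOVE:
            reasons.append("valuation moved beyond tolerance")
    return reasons


def run_validation(records):
    """Validate every record; return the flagged ones and the exception rate."""
    exceptions = {}
    for r in records:
        reasons = check(r)
        if reasons:
            exceptions[r.record_id] = reasons
    rate = len(exceptions) / len(records) if records else 0.0
    return exceptions, rate


records = [
    Record("r1", "equity", "buy", Decimal("100"), Decimal("98")),
    Record("r2", "bond", "coupon", Decimal("500"), Decimal("500")),
    Record("r3", "equity", "coupon", Decimal("100"), Decimal("100")),  # wrong type
    Record("r4", "bond", "sell", Decimal("990"), Decimal("1000")),
]
exceptions, rate = run_validation(records)
print(exceptions)
print(f"exception rate: {rate:.0%}")
```

Only the flagged records reach the team, and the rate gives a single figure that can be tracked over time or compared across vendors.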
For a family office aggregating the complete financial picture of a single family into a platform, the architecture of how that data is stored and governed is not a secondary consideration. It is a fundamental part of what separates platforms built for the SFO context from those adapted to serve it.
Physical data isolation means the office's data resides in an environment that is architecturally separate from the data of any other organisation using the same platform. It is not logically separated within a shared system, but held in a dedicated database that cannot be accessed, queried, or affected by activity in another client's environment. This is the appropriate standard for information of the sensitivity a family office holds, and it matters specifically for data aggregation because the aggregation layer is where the most complete and current version of the portfolio picture exists. A separate piece on what physical isolation means in practice, and why it matters, covers the distinction in detail.
Data ownership is a related but distinct question. The office should retain ownership of its data and its data environment. If the relationship with the platform provider ends, the data and its history should be exportable in a usable format rather than held within a proprietary system the office cannot access independently. The choice of region where data is stored is also a practical governance question for offices with specific data residency requirements.
Finally, as AI becomes an integral part of how family offices interact with their aggregated data, the governance of the AI layer depends directly on the governance of the data layer beneath it. An AI agent that queries the consolidated data environment should operate within the same user permissions that govern access to the rest of the platform, and within the boundaries of the office's physically isolated data environment. A platform that has invested in data architecture for AI readiness, not just reporting readiness, is better positioned for how the technology will continue to develop. A separate piece offering a practical framework for preparing the data infrastructure for AI covers the specific requirements in full.
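One way to picture an AI layer that inherits user permissions is to apply the permission filter before any data reaches the agent. The sketch below is a simplified illustration under that assumption: the `User` and `Holding` types, the entity names, and the `answer_total_value` function are hypothetical stand-ins, not a description of how any platform implements its AI access controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    permitted_entities: frozenset  # legal entities this user may see


@dataclass(frozen=True)
class Holding:
    entity: str
    asset: str
    market_value: float


def holdings_visible_to(user: User, holdings: list) -> list:
    """Apply the same permission filter the rest of the platform would apply."""
    return [h for h in holdings if h.entity in user.permitted_entities]


def answer_total_value(user: User, holdings: list) -> float:
    """Stand-in for an AI agent's data access: it only sees the filtered set."""
    visible = holdings_visible_to(user, holdings)
    return sum(h.market_value for h in visible)


# Placeholder data for two entities within one office's environment.
all_holdings = [
    Holding("Family Trust", "Listed equities", 4_000_000.0),
    Holding("Family Trust", "Private equity fund", 2_500_000.0),
    Holding("Holding Company", "Real estate", 6_000_000.0),
]
principal = User("principal", frozenset({"Family Trust", "Holding Company"}))
analyst = User("analyst", frozenset({"Family Trust"}))

print(answer_total_value(principal, all_holdings))  # 12500000.0
print(answer_total_value(analyst, all_holdings))    # 6500000.0
```

The same question returns different answers for the two users because the filter sits beneath the agent, so the agent cannot surface data the person asking is not entitled to see.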
The most reliable test of a data aggregation platform in a family office context is not a demonstration using vendor-prepared data. It is a structured assessment using the office's actual custodian and bank list, a representative sample of its alternative investment document types, and a clear question about how each source would be handled.
A platform that can confirm direct SFTP coverage for the majority of the office's custodians, describe its normalisation process with precision, demonstrate its alternative investment document handling with live examples, and evidence its data quality controls with specific metrics has built a genuinely strong aggregation capability. One that speaks primarily about its reporting outputs, its visualisation quality, or its AI features without addressing the data infrastructure questions has prioritised the visible layer over the foundation it depends on.
For offices beginning this evaluation, Sesame One for family offices provides a starting point for understanding how the full data infrastructure layer is built, maintained, and governed.