Apache SkyWalking – Profiling

Zh: AI Coding 如何重塑软件架构师的工作方式

Sun, 15 Mar 2026 00:00:00 +0000

以 SkyWalking GraalVM Distro 为例，看 AI Coding 如何把一批探索性 PoC 打磨成一条可重复的迁移流水线。

这个项目给我最大的启发，不是 AI 能写多少代码，而是 AI Coding 改变了架构设计的试错成本。当一个想法可以很快做成 PoC、跑起来验证、不行就推翻重来时，架构师就更有机会逼近自己真正想要的设计，而不是过早停在“团队现在做得出来”的折中方案上。

这种变化在成熟开源系统里尤其重要。Apache SkyWalking OAP 长期以来一直是一个功能强大且经过生产验证的可观测性后端，但大型 Java 平台该有的问题它一个不少：运行时字节码生成、重反射初始化、classpath 扫描、基于 SPI 的模块装配，以及动态 DSL 执行——这些机制方便扩展，但做 GraalVM Native Image 时全是障碍。

SkyWalking GraalVM Distro 的出现，源于我们把这个挑战当成一个架构设计问题来处理，而不是一次性的移植工程。目标不仅是让 OAP 能以原生二进制运行，更是把 GraalVM 迁移本身做成一条可重复执行、能够持续跟上上游演进的自动化流水线。

如果你想看完整的技术设计、基准数据和上手方式，请阅读配套文章：SkyWalking GraalVM Distro：设计与基准测试。

从停滞的想法到可运行的系统

这件事其实很多年前就开始了。在这个仓库创建不久之后，yswdqz 曾花了数个月探索迁移方案。真正做下来才发现，这个项目远比 GraalVM 文档里列出的那些单点限制复杂得多，这项工作最终也因此搁置了很多年。

这段停滞很重要。缺少的并不是想法。成熟维护者通常从来不缺想法，真正稀缺的，是把这些想法真正做出来的时间、人力和精力。即使架构师已经看到了几条很有前景的路线，有限的开发资源也会迫使大家更早做出权衡：优先选择实现成本最低的方案，而不是那个更干净、更可复用、更经得起未来变化的方案。

这种情况非常普遍，并不特殊。在开源社区里，很多工作依赖志愿者或有限的企业赞助；在商业产品里，约束的形式不同，但本质仍然一样：路线图承诺、团队规模和交付压力都会让工程资源始终紧张。在这两种环境里，很多好想法被放弃，并不是因为它们错了，而是因为要把它们真正验证清楚、实现完整，成本太高。

还有一个同样重要的约束：架构师通常同时也是非常资深的工程师，而不是一个可以全职扑在实现细节上的人。问题在于个人编码精力有限、时间高度碎片化，同时还要在代码尚未出现之前，不断向其他资深工程师解释自己的设计意图。传统上，这种解释主要通过图、文档和沟通完成。它很慢、信息损失大，而且充满不确定性。我们都体验过“传话游戏”：哪怕是很简单的意思，也很容易被误解，而等误解真正暴露出来时，时间已经过去很多了。

到了 2025 年末，AI Coding 让”同时尝试多条路线”这件事终于变得现实。我们不必再因为实现能力稀缺而过早接受折中，而是可以在多个设计之间来回切换，用代码验证，快速淘汰弱方案，持续迭代，直到架构本身变得足够稳固、足够实用、足够高效。

这种设计自由度至关重要。GraalVM 文档对单个限制讲得很清楚，但成熟 OSS 平台遇到的是一整套彼此牵连的系统性问题。只修补一个动态机制远远不够。要让 native image 真正落地，我们必须把整类运行时行为改造成构建期产物和自动生成的元数据。

在这条路的早期历史中，还有一座非常具体的大山。那时上游 SkyWalking 仍然大量依赖 Groovy 来处理 LAL、MAL 和 Hierarchy 脚本。理论上，这只不过是另一个“不支持运行时动态行为”的例子；但在实践中，Groovy 是整条路径上最大的障碍。它不仅意味着脚本执行，还意味着一整套在 JVM 里极其便利、在 native image 里却极其不友好的动态模型。

为了跨过这道坎，我们围绕 AOT-first 模式重新设计了 OAP 的核心引擎。早期实验必须直接面对 Groovy 时代的运行时行为，并尝试不同的脚本编译方案来绕过去。最终方案走得更远：对齐上游编译器流水线，把动态生成前移到构建期，并引入自动化机制，让这条迁移路径在上游持续演进时依然保持可控。具体来说，就是把 OAL、MAL、LAL 和 Hierarchy 的生成过程变成构建期预编译器的输出，而不是继续保留为启动期的动态行为。

AI Coding 如何改写架构迭代

这次转变的关键，并不只是“写代码更快了”。AI 真正改变的，是想法、原型、验证和重设计之间来回迭代的速度。围绕同一个问题，我们可以很快做出几个可运行的 PoC，迅速淘汰不成立的方向，再把值得保留的抽象慢慢沉淀成一套连贯的迁移系统。

这并不会削弱人的架构价值，反而会放大它。哪些行为应该前移到构建期，哪些地方应该保留可配置性，哪里应该引入 same-FQCN 替换，如何让上游同步保持可控，以及哪些抽象值得不惜代价保留下来，这些判断仍然只能由人来做。不同的是，AI 的速度让我们终于有机会把这些更好的设计真正做出来，而不是过早退回到更简单、也更差的折中方案。

这才是软件架构师工作方式真正发生变化的地方。过去，架构师往往已经知道更干净的方向在哪里，但有限的工程产能会逼着那个愿景退回到一个更便宜的妥协方案。现在，架构师在某种意义上又重新变回了“能快速动手的人”：可以直接用代码把思路搭出来，把高层抽象落成接口，再用真实运行的实现去证明设计。

这不仅改变了实现，也改变了沟通方式。在开源里，我们常说：talk is cheap, show me the code。在 AI Coding 时代，“把代码拿出来”这件事变得容易多了。设计不再那么依赖一个缓慢的、自上而下的翻译过程：从想法到文档，再到解释，再到实现。代码可以更早出现，也可以更早跑起来。

这也让其他资深工程师受益。他们不必只靠图、会议或长篇解释来还原整个设计，而是可以直接审查抽象、阅读真实代码、运行它、质疑它，并在具体实现上一起打磨。这让架构协作更快、更清晰，也少了很多沟通误差。

也正因为如此，我总觉得今天很多 AI 讨论有点跑偏。很多项目确实很有趣、也很好玩，拿来体验当然没问题，但高级工程工作并不会因为“给代码库接了个 agent”就自然变好。真正重要的，不是哪个 demo 看起来最炫，而是哪些工程能力真的被放大了，同时软件开发本身的纪律有没有被保留下来。

对于架构师和资深工程师来说，这里真正重要的能力包括：

快速做对比式原型验证：不是只用 slides 和文档去论证某个想法，而是直接把多个方案做成可运行代码来比较。
大规模代码理解能力：能在大量模块之间快速阅读，同时保持对整个系统的全局认识。
系统性的重构能力：把基于反射、依赖运行时动态行为的路径，系统性地改造成适配 AOT 约束的设计。
搭建自动化的能力：当一个迁移步骤在每次上游同步时都必须重做一次，靠手工处理本身就很费时费力，而且越往后只会越累。AI 让我们真正有条件去投资生成器、清单、一致性检查和漂移检测，把重复的人力劳动变成可重复的自动化流程。
大范围审查能力：在很大的代码面上检查边界条件、兼容性约束，以及方案是否经得起反复执行。

这些能力也都体现在最终的设计结果里。same-FQCN 替换为 GraalVM 特定行为建立了清晰、受控的边界；反射元数据不再依赖手工维护的猜测清单，而是直接从构建产物中生成；各种清单机制和漂移检测，则把原本模糊的“上游同步风险”变成了显式的工程工作流。

对于初级工程师，我觉得这里的启发同样重要。AI 不会让架构设计、系统约束、接口设计、测试和可维护性这些基本功变得不重要。恰恰相反，这些能力只会变得更重要，因为它们决定了“被加速的实现”最终产出的是一个可持续演进的系统，还是只是更快地制造出更多代码。真正的杠杆来自工程判断力，而不是新鲜感。

Claude Code 和 Gemini AI 在整个过程中都扮演了工程加速器的角色。在 GraalVM Distro 这个项目里，它们具体帮我们做了几件事：

把迁移思路直接做成可运行代码：不是争论哪个方向可能行得通，而是把多个真实原型做出来、跑起来、比较掉，把不成立的方向淘汰掉。
重构重反射、重动态的代码路径：把不适合运行时的模式系统性替换成 AOT 友好的实现方式。
让上游同步真正可持续：每次 distro 从上游 SkyWalking 拉取变更后，元数据扫描、配置再生成和重新编译都必须再来一次。AI 帮助我们把这些过程做成流水线，使每次同步都变成一个可控、且大部分自动化的过程，而不是一次比一次更长的手工重复劳动。
在大范围内审查逻辑和边界情况：特别是在功能对等性比纯实现速度更重要的地方。

最终产出的，不只是一次大重写，而是一套可重复的系统：预编译器、manifest 驱动的加载、反射配置生成、替换边界，以及让上游迁移可审查、可自动化的漂移检测机制。

如果你想看这种开发方法背后的更广泛背景，可以读这篇文章：在成熟开源大型项目中实践 Agentic Vibe Coding：软件工程与工程控制论还在延续。这篇文章则是这个故事的下一步：不仅是在一个成熟代码库里增强功能，而是重新激活一项曾经停滞的工作，并把它真正做成可运行系统。

真正改变的到底是什么

这个项目最重要的结果，并不是一张 benchmark 表。基准数据当然属于 distro 本身，而且它们很重要，因为它们证明这套系统是真实可运行的。但对这篇文章来说，更深层的变化发生在方法论层面：AI Coding 改变了我们探索、验证和打磨架构方案的方式。

过去，架构往往更像一项以文档为主、后面拖着漫长而昂贵实现过程的活动。现在，我们可以更快地在想法、原型、比较和重设计之间切换。这让我们真正有机会去追求更高抽象层次的方案，保留更干净的边界，并建设那些让迁移过程可持续维护的自动化机制。

这项工作的技术证据，就是 SkyWalking GraalVM Distro 本身：它不仅是一个可运行的系统，更是一条由预编译器、自动生成的反射元数据、受控替换边界和漂移检查组成的迁移流水线。基准数据之所以重要，是因为它们证明这套系统在实践里是成立的；但从架构角度看，真正的结果是：这次迁移不再是一场一次性的移植，而是变成了一套可重复执行的系统工程。关于完整测试方法、原始数据和技术设计，请阅读配套文章：SkyWalking GraalVM Distro：设计与基准测试。

项目仓库位于 apache/skywalking-graalvm-distro。我们欢迎社区成员测试这个新发行版、提交 issue，并帮助它逐步走向生产可用。

对我来说，更深层的启发并不止于这个发行版。AI Coding 不会让架构变得不重要，反而会让架构更值得被认真追求。当实现速度提升到一定程度时，我们终于有机会在真实代码里验证更多想法，保留那些真正好的抽象，并把那些过去常常因为投入太大而半途妥协的系统真正做出来。

对于资深工程师来说，瓶颈正在从单纯的代码实现速度，转向品味、系统判断力，以及定义稳定边界的能力。对于初级工程师来说，真正该走的路不是追逐每一种看上去都很刺激的 AI 工作流，而是把基础能力练得更扎实，让加速真正产生复利：理解需求、阅读陌生系统、质疑假设，并识别出在系统快速变化时仍然必须保持正确的那些部分。AI Coding 降低了验证好设计的代价，但并没有降低工程判断本身的门槛。

Zh: SkyWalking GraalVM Distro：设计与基准测试

Sun, 15 Mar 2026 00:00:00 +0000

这篇文章会完整介绍我们如何把 Apache SkyWalking OAP 迁移到 GraalVM Native Image。目标不是做一次性移植，而是把这件事做成一套能持续跟上上游演进的流程。

如果你想看这项工作的更大背景，以及 AI Coding 如何让这个项目真正做得出来，请阅读：AI Coding 如何重塑软件架构师的工作方式。

为什么 GraalVM 在这里是刚需

GraalVM Native Image 可以把 Java 应用做 Ahead-of-Time（AOT）编译，生成独立可执行文件。对于像 SkyWalking OAP 这样的可观测性后端来说，这不是“锦上添花”的性能优化，而是明确的工程刚需。

可观测性平台必须是基础设施中最可靠的部分。它必须在自己要观测的那些故障发生时依然存活。在云原生环境里，工作负载会不断扩缩容、迁移和重启，负责观测一切的后端本身不能还是那个启动慢、空闲占用大、恢复缓慢的重型进程。

我们的基准测试结果让这个结论变得非常具体：

**启动时间：**约 5 ms 对比约 635 ms。在 Kubernetes 集群里，当 OAP Pod 被驱逐或重新调度时，635 ms 的差距意味着这段时间里的遥测数据可能会丢失。5 ms 的情况下，新 Pod 往往在大部分客户端还没感知到中断之前就已经重新开始接收数据了。
**空闲内存：**约 41 MiB 对比约 1.2 GiB。可观测性后端是 24/7 常驻运行的。在多租户或边缘部署场景里，基础 RSS 降了 97%，可以放进更小的节点，而不再必须占用一台专用机器。
**负载下内存：**在 20 RPS 下约 629 MiB 对比约 2.0 GiB。生产级负载下内存降了 70%，直接对应更少的节点、更低的云账单，以及在后端本身成为扩容瓶颈之前更多的余量。
**没有预热惩罚：**峰值吞吐可以更早发挥出来。JVM 的 JIT 编译器往往需要数分钟流量才能完成热点优化，在这段时间里，尾延迟更差，数据处理也会滞后。原生二进制没有同样的阶段。
**更小的攻击面：**不再需要完整 JDK 运行时，需要跟踪和修补的 CVE 也就少了很多。对于一个会接收整个集群所有服务数据的组件来说，这一点很重要。

这些都不是“小修小补”。它们直接改变了哪些部署形态开始变得可行：无服务器形态的可观测性后端、边车式采集模型、内存预算极其紧张的边缘节点。只有当后端足够轻、足够快时，这些方案才真正有落地空间。

挑战：一个成熟、动态特性很多的 Java 平台

SkyWalking OAP 身上有大型 Java 平台的所有典型问题：运行时字节码生成、重反射初始化、classpath 扫描、基于 SPI 的模块装配，以及动态 DSL 执行。这些机制方便扩展，但做 GraalVM native image 时全是障碍。

GraalVM 文档中列出的限制，只是问题的开始。在一个成熟的 OSS 平台里，这些限制会深深缠绕在多年积累下来的运行时设计决策中。常规的 GraalVM native image 很难处理运行时类生成、反射、动态发现和脚本执行，而这些在 SkyWalking OAP 中都不是零散存在的，它们本来就是系统设计的一部分。

在这个发行版的早期历史里，还有一座非常具体的大山。那时上游 SkyWalking 仍然高度依赖 Groovy 来处理 LAL、MAL 和 Hierarchy 脚本。理论上，它只是另一个“不支持运行时动态”的组件；但在实践里，Groovy 是整条路径上最大的障碍。它不仅仅是脚本执行问题，而是代表着一整套在 JVM 世界里极其便利、在 native image 世界里极其不友好的动态模型。

设计目标：让迁移这件事可以重复做

设计目标不是”把 native-image 跑通一次就完”，而是做出一套能反复用、能长期维护的迁移系统：

把运行时生成的产物前移到构建期。 OAL、MAL、LAL、Hierarchy 规则，以及 meter 相关的生成类，都在构建期完成编译并打包，而不是等到启动时才动态生成。
用确定性的加载机制替代动态发现。 classpath 扫描和运行时注册路径被转换为基于 manifest 的加载方式。
减少运行时反射，并在构建期生成 native 元数据。 反射配置不再依赖人工维护的猜测清单，而是根据真实 manifest 和扫描结果生成。
让上游同步边界保持清晰。 same-FQCN replacements 会被显式打包、列清单，并通过陈旧性检查守住边界。
让变化第一时间暴露出来。 一旦上游 provider、规则文件或被替换的源文件发生变化，测试就会失败，迫使我们做显式审查。

这才是最关键的架构转变。好的抽象和前瞻性，在 AI 时代并没有变得不重要，反而变得更重要了，因为它们决定了 AI 带来的速度，最终产出的是一个可维护的系统，还是一堆膨胀得更快的代码。

把运行时动态行为变成构建期产物

SkyWalking OAP 里有多个在 JVM 世界里很自然、但在 native image 里很棘手的动态子系统：

OAL 会在运行时生成类。
LAL、MAL 和 Hierarchy 在历史上与大量基于 Groovy 的运行时行为绑定在一起，这也是早期 distro 工作中最难处理的阻碍之一。
MAL、LAL 和 Hierarchy 规则依赖运行时编译行为。
基于 Guava 的 classpath 扫描会发现注解、dispatcher、decorator 和 meter function。
基于 SPI 的模块和 provider 发现依赖更动态的运行时环境。
YAML/config 初始化和框架集成依赖反射访问。

在 SkyWalking GraalVM Distro 里，这些问题不是靠零散补丁一个个修掉的，而是被统一收敛到一条构建期流水线里。

预编译器会在构建过程中运行 DSL 引擎、导出生成类、写入 manifest、序列化配置数据，并生成 native-image 元数据。这样一来，启动时只需要做类加载和注册，不再需要运行时代码生成。运行期之所以能变得更简单，是因为原本的复杂性被前移到了构建期。

这也是为什么这个项目不只是一次性能优化。我们的设计目标，是把复杂性前移到一个更容易验证、更容易自动化、也更便于反复执行的位置。

same-FQCN 替换：一条可控的边界

这个发行版里最实用的设计选择之一，就是使用 same-FQCN 替换类。我们没有依赖模糊的启动技巧，也没有依赖未文档化的加载顺序假设。相反，我们会重新打包 GraalVM 特定 jar，排除原本的上游类，再让替换类占据完全相同的 fully-qualified class name。

这对可维护性非常关键，因为它建立了一条非常清晰的边界：

上游类仍然定义行为契约；
GraalVM 侧的替换类提供兼容的实现策略；
打包过程则让这次替换变得显式可见。

例如，OAL 的加载过程从运行时编译变成了基于 manifest 的预编译类加载。类似的替换也处理了 MAL 和 LAL DSL 加载、模块装配、配置初始化，以及多个对反射敏感的路径。目标不是把一切都 fork 出去，而是只替换那些运行时模型从根本上不适合 native image 的部分。

随后，这条边界还会通过测试来守护：测试会对照与 replacement 对应的上游源文件做哈希。当上游改动了这些文件中的任何一个，构建就会失败，并明确告诉我们哪个 replacement 需要重新审查。这样一来，“如何跟上上游”就不再是一个充满焦虑的抽象问题，而变成一项明确、可落地的工程工作。

反射配置不是猜出来的，而是生成出来的

在很多 GraalVM 迁移项目里，reflect-config.json 最终会变成一个靠经验不断累积的工件。它会越来越大，越来越陈旧，最后没有人真正清楚它是不是完整，也不清楚每一项配置为什么存在。这种模式在一个持续演化的大型 OSS 平台里是无法扩展的。

在这个发行版里，反射元数据直接从构建产物和扫描结果中生成，包括：

OAL、MAL、LAL、Hierarchy 以及 meter 生成类的 manifest；
注解扫描得到的类；
Armeria HTTP handler；
GraphQL resolver 和 schema 映射类型；
被接受的 ModuleConfig 类。

这是一种健康得多的模式。我们不再依赖人去记住所有可能触发反射访问的路径，而是让系统根据真实迁移流水线推导出反射元数据。构建过程本身，成为了事实来源。

让上游同步变得现实可行

如果这个发行版只是一次性的工程冲刺，那它的意义会小很多。真正困难的事情，是在上游 SkyWalking 继续演进的同时，让它还能持续维护下去。

这也是为什么仓库里会有一整套显式的清单和漂移检测机制：

provider 清单，用来强制新上游 provider 被分类；
规则文件清单，用来强制新 DSL 输入被显式确认；
预编译 YAML 输入的 SHA watcher；
带 GraalVM 特定 replacement 的上游源文件 SHA watcher。

好的抽象不仅仅是代码结构优雅，更在于你是否选择了一种能在未来变化面前继续成立的迁移设计。

基准测试结果

我们在一台 Apple M3 Max（macOS、Docker Desktop、10 CPUs / 62.7 GB）上，对标准 JVM OAP 和 GraalVM Distro 做了对比测试，两者都连接到 BanyanDB。

启动测试（Docker Compose，无流量，3 次取中位数）

指标	JVM OAP	GraalVM OAP	差异
冷启动时间	635 ms	5 ms	约快 127 倍
热启动时间	630 ms	5 ms	约快 126 倍
空闲 RSS	约 1.2 GiB	约 41 MiB	约降低 97%

启动时间的测量方式，是从 OAP 第一条应用日志时间戳开始，到出现 listening on 11800 日志（即 gRPC 服务 ready）为止。

持续负载下（Kind + Istio 1.25.2 + Bookinfo，约 20 RPS，2 个 OAP 副本）

在 60 秒预热之后，每 10 秒采样一次，共 30 个样本。

指标	JVM OAP	GraalVM OAP	差异
CPU 中位数（millicores）	101	68	-33%
CPU 平均值（millicores）	107	67	-37%
内存中位数（MiB）	2068	629	-70%
内存平均值（MiB）	2082	624	-70%

两个版本报告的 entry-service CPM 一致，说明在这个测试负载下，两者的流量处理能力相同。

我们每 30 秒通过 swctl 对所有已发现服务收集这些指标： service_cpm、service_resp_time、service_sla、service_apdex、service_percentile。

完整的基准测试脚本和原始数据位于发行版仓库中的 benchmark/ 目录。

当前状态

这个项目已经是一个可运行的实验性发行版，托管在独立仓库中：apache/skywalking-graalvm-distro。

当前发行版有意聚焦在一种现代、高性能的运行模式上：

存储： BanyanDB
集群模式： Standalone 和 Kubernetes
配置方式： 无配置或 Kubernetes ConfigMap
运行模型： 固定模块集合、预编译产物和 AOT 友好的装配方式

这种聚焦是刻意的。要把迁移做成一套可重复的系统，第一步必须先把边界收清楚，做出一个真正能跑起来的版本，然后再在不失控的前提下逐步扩展。

快速开始

由于 SkyWalking GraalVM Distro 的设计目标就是追求极致性能，它目前最适合与 BanyanDB 存储后端搭配使用。当前发布的镜像已经可以在 Docker Hub 获取，你可以直接用下面这个 docker-compose.yml 启动整套系统。

version: '3.8'

services:
  banyandb:
    image: ghcr.io/apache/skywalking-banyandb:e1ba421bd624727760c7a69c84c6fe55878fb526
    container_name: banyandb
    restart: always
    ports:
      - "17912:17912"
      - "17913:17913"
    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data --measure-metadata-cache-wait-duration 1m --stream-metadata-cache-wait-duration 1m
    healthcheck:
      test: ["CMD", "sh", "-c", "nc -nz 127.0.0.1 17912"]
      interval: 5s
      timeout: 10s
      retries: 120

  oap:
    image: apache/skywalking-graalvm-distro:0.1.1
    container_name: oap
    depends_on:
      banyandb:
        condition: service_healthy
    restart: always
    ports:
      - "11800:11800"
      - "12800:12800"
    environment:
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_HEALTH_CHECKER: default
    healthcheck:
      test: ["CMD-SHELL", "nc -nz 127.0.0.1 11800 || exit 1"]
      interval: 5s
      timeout: 10s
      retries: 120

  ui:
    image: ghcr.io/apache/skywalking/ui:10.3.0
    container_name: ui
    depends_on:
      oap:
        condition: service_healthy
    restart: always
    ports:
      - "8080:8080"
    environment:
      SW_OAP_ADDRESS: http://oap:12800

只需要执行：

docker compose up -d

欢迎社区来测试这个新发行版、提交 issue，并帮助我们推动它走向生产可用。

特别感谢 GraalVM 团队提供的技术基础。

Blog: How AI Changed the Economics of Architecture

Fri, 13 Mar 2026 00:00:00 +0000

SkyWalking GraalVM Distro: A case study in turning runnable PoCs into a repeatable migration pipeline.

The most important lesson from this project is not that AI can generate a large amount of code. It is that AI changes the economics of architecture. When runnable PoCs become cheap to build, compare, discard, and rebuild, architects can push further toward the design they actually want instead of stopping early at a compromise they can afford to implement.

That shift matters a lot in mature open source systems. Apache SkyWalking OAP has long been a powerful and production-proven observability backend, but it also carries all the realities of a large Java platform: runtime bytecode generation, reflection-heavy initialization, classpath scanning, SPI-based module wiring, and dynamic DSL execution that are friendly to extensibility but hostile to GraalVM native image.

SkyWalking GraalVM Distro is the result of treating that challenge as a design-system problem instead of a one-off porting exercise. The goal was not only to make OAP run as a native binary, but to turn GraalVM migration itself into a repeatable automation pipeline that can stay aligned with upstream evolution.

For the full technical design, benchmark data, and getting-started guide, see the companion post: SkyWalking GraalVM Distro: Design and Benchmarks.

From Paused Idea to Runnable System

This journey actually began years ago. Shortly after this repository was created, yswdqz spent several months exploring the transition. The project proved much harder in practice than the individual GraalVM limitations sounded on paper, and the work eventually paused for years.

That pause is important. The missing ingredient was not ideas. Mature maintainers usually have more ideas than time. The real constraint was implementation economics. Even when the architect can see several promising directions, limited developer resources force an earlier trade-off: choose the path that is cheapest to implement, not necessarily the path that is cleanest, most reusable, or most future-proof.

This is a very common reality, not an exceptional one. In open source communities, much of the work depends on volunteers or limited company sponsorship. In commercial products, the pressure is different but the constraint is still real: roadmap commitments, staffing limits, and delivery deadlines keep engineering resources tight. In both worlds, good ideas are often abandoned not because they are wrong, but because they are too expensive to validate and implement thoroughly.

There is another constraint that matters just as much: the architect is usually also a very senior engineer, not a full-time implementation machine. That means limited personal coding energy, fragmented time, and a constant need to explain ideas to other senior engineers before the code exists. Traditionally, that explanation happens through diagrams, documents, and conversations. It is slow, lossy, and unpredictable. We all know some version of the Telephone Game: even simple words are easy to misunderstand, and by the time the misunderstanding becomes visible, a lot of time has already passed.

What changed in late 2025 was that AI engineering made multiple runnable ideas affordable. Instead of picking an early compromise because implementation capacity was scarce, we could switch repeatedly between designs, validate them with code, discard weak directions quickly, and keep iterating until the architecture became solid, practical, and efficient enough to hold.

That design freedom was critical. GraalVM documentation gives clear guidance on isolated limitations, but a mature OSS platform hits them as a connected system. Fixing only one dynamic mechanism is not enough. To make native image practical, we had to turn whole categories of runtime behavior into build-time artifacts and automated metadata generation.

There was also a very concrete mountain in front of us in the early history of this distro. In the first several commits of the repository, upstream SkyWalking still relied heavily on Groovy for LAL, MAL, and Hierarchy scripts. In theory, that was just one more unsupported runtime-heavy component. In practice, Groovy was the biggest obstacle in the whole path. It represented not only script execution, but a whole dynamic model that was deeply convenient on the JVM side and deeply unfriendly to native image.

To bridge the gap, we re-architected the core engines of OAP around an AOT-first model. Earlier experiments had to confront Groovy-era runtime behavior directly and explore alternative script-compilation approaches to get around it. The finalized direction went further: align with the upstream compiler pipeline, move dynamic generation to build time, and add automation so the migration stays controllable as upstream keeps moving. Concretely, that meant turning OAL, MAL, LAL, and Hierarchy generation into build-time precompiler outputs instead of leaving them as startup-time dynamic behavior.

AI Speed Changed the Design Loop

The scale of this transformation was not only about coding faster. AI changed the loop between idea, prototype, validation, and redesign. We could build runnable PoCs for different approaches, throw away weak ones quickly, and preserve the promising abstractions until they formed a coherent migration system.

That does not reduce the role of human architecture. It raises the value of it. Human judgment was still required to decide what should become build-time, what should stay configurable, where to introduce same-FQCN replacements, how to keep upstream sync controllable, and which abstractions were worth preserving. But AI speed made it realistic to pursue those better designs instead of settling for a simpler compromise too early.

This is the real change in the economics of architecture. In the past, an architect might already know the cleaner direction, but limited engineering capacity often forced that vision back toward a cheaper compromise. Now the architect can return much closer to being a fast developer again: building code, shaping high-abstraction interfaces, and using design patterns to prove the vision directly in the real world.

That changes communication as much as implementation. In open source, we often say, talk is cheap, show me the code. With AI engineering, showing the code becomes much more straightforward. The design no longer depends so heavily on a slow top-down translation from idea to documents to interpretation to implementation. The code can appear earlier, and it can run earlier.

Other senior engineers benefit from this too. They do not need to reconstruct the whole design only from diagrams, meetings, or long explanations. They can review the actual abstraction, see the behavior in code, run it, challenge it, and refine it from something concrete. That makes architectural collaboration faster, clearer, and less lossy.

This is also where I think the current AI discussion is often noisy. Many projects are fun, surprising, and worth exploring, but advanced engineering work is not improved merely by attaching an agent to a codebase. The important question is not which demo looks most magical. The important question is which engineering capabilities are actually being accelerated without losing the discipline of software development itself.

For architects and senior engineers, the capabilities that mattered most here were:

Fast comparative prototyping: Building several runnable approaches in code instead of defending one idea with slides and documents.
Large-scale code comprehension: Reading across many modules quickly enough to keep the whole system in view.
Systematic refactoring: Converting reflection-heavy or runtime-dynamic paths into designs that fit AOT constraints.
Automation construction: When a migration step must be repeated every upstream sync, doing it manually once is already expensive. Doing it manually again next time is even more expensive. AI made it practical to invest in generators, inventories, consistency checks, and drift detectors that turn repeated manual work into repeatable automation.
Review at breadth: Checking edge cases, compatibility boundaries, and repeatability across a large surface area.

Those capabilities were visible in the resulting design. Same-FQCN replacements created a controlled boundary for GraalVM-specific behavior. Reflection metadata was generated from build outputs instead of maintained as a hand-written guess list. Inventories and drift detectors turned upstream sync from a vague maintenance risk into an explicit engineering workflow.

For junior engineers, I think the lesson is equally important. AI does not remove the need to learn architecture, invariants, interfaces, testing, or maintenance. It makes those skills more valuable, because they determine whether accelerated implementation produces a durable system or just more code faster. The leverage comes from engineering judgment, not from novelty.

Claude Code and Gemini AI acted as engineering accelerators throughout this process. In the GraalVM Distro specifically, they helped us:

Explore migration strategies as running code: Instead of debating which approach might work, we built and compared multiple real prototypes, discarded the weak ones, and kept what held up.
Refactor reflection-heavy and dynamic code paths: Replace runtime-hostile patterns with AOT-friendly alternatives across the codebase.
Make upstream sync sustainable: Every time the distro pulls from upstream SkyWalking, metadata scanning, config regeneration, and recompilation must happen again. AI helped build the pipeline so that each sync is a controlled, largely automated process rather than a fresh manual effort that grows longer each time.
Review logic and edge cases at scale: Especially in places where feature parity mattered more than raw implementation speed.

The result was not just a large rewrite. It was a repeatable system: precompilers, manifest-driven loading, reflection-config generation, replacement boundaries, and drift detectors that make upstream migration reviewable and automatable.

For the broader methodology behind this style of development, see Agentic Vibe Coding in a Mature OSS Project. This post is the next step in that story: not only enhancing an active mature codebase, but reviving a paused effort and making it actually runnable.

What Actually Changed

The most important outcome of this project is not a benchmark table. The benchmark results belong to the distro itself, and they matter because they prove the system is real. But for this post, the deeper result is methodological: AI engineering changed how architecture could be explored, validated, and refined.

Instead of treating architecture as a mostly document-driven activity followed by a long and expensive implementation phase, we were able to move much faster between idea, prototype, comparison, and redesign. That made it realistic to pursue higher-abstraction solutions, preserve cleaner boundaries, and build the automation needed to keep the migration maintainable over time.

The technical evidence for that work is the SkyWalking GraalVM Distro itself: not only a runnable system, but a migration pipeline expressed as precompilers, generated reflection metadata, controlled replacement boundaries, and drift checks. The benchmark data matter because they prove the system works in practice, but the architectural result is that the migration became a repeatable system rather than a one-time port. For detailed benchmark methodology, per-pod data, and the full technical design, see SkyWalking GraalVM Distro: Design and Benchmarks.

The project is hosted at apache/skywalking-graalvm-distro. We invite the community to test it, report issues, and help move it toward production readiness.

For me, the deeper takeaway is broader than this distro. AI engineering does not make architecture less important. It makes architecture more worth pursuing. When implementation speed rises enough, we can afford to test more ideas in code, keep the good abstractions, and build systems that would previously have been judged too expensive to finish well.

For senior engineers, that means the bottleneck shifts away from raw typing speed and toward taste, system judgment, and the ability to define stable boundaries. For junior engineers, it means the path forward is not to chase every exciting AI workflow, but to become stronger at the fundamentals that let acceleration compound: understanding requirements, reading unfamiliar systems, questioning assumptions, and recognizing what must remain correct as everything around it changes. AI changed the economics of architecture because it lowered the cost of validating better designs without lowering the bar for engineering judgment.

Blog: SkyWalking GraalVM Distro: Design and Benchmarks

Fri, 13 Mar 2026 00:00:00 +0000

A technical deep-dive into how we migrated Apache SkyWalking OAP to GraalVM Native Image — not as a one-off port, but as a repeatable pipeline that stays aligned with upstream.

For the broader story of how AI engineering made this project economically viable, see How AI Changed the Economics of Architecture.

Why GraalVM Is Not Optional

GraalVM Native Image compiles Java applications Ahead-of-Time (AOT) into standalone executables. For an observability backend like SkyWalking OAP, this is not a performance optimization — it is an operational necessity.

An observability platform must be the most reliable component in the infrastructure. It has to survive the failures it is supposed to observe. In cloud-native environments where workloads scale, migrate, and restart constantly, the backend that watches everything cannot itself be the slow, heavy process that takes seconds to recover and gigabytes to idle.

Our benchmarks make the case concrete:

Startup: ~5 ms vs ~635 ms. In a Kubernetes cluster where an OAP pod gets evicted or rescheduled, a 635 ms gap means lost telemetry — traces, metrics, and logs that arrive during that window are simply dropped. At 5 ms, the new pod is receiving data before most clients even notice the disruption.
Idle memory: ~41 MiB vs ~1.2 GiB. Observability backends run 24/7. In a multi-tenant or edge deployment, a 97% reduction in baseline RSS is the difference between fitting the observability stack on a small node and needing a dedicated one.
Memory under load: ~629 MiB vs ~2.0 GiB at 20 RPS. A 70% reduction at production-like traffic means fewer nodes, lower cloud bills, and more headroom before the backend itself becomes a scaling bottleneck.
No warm-up penalty: Peak throughput is available from the first request. The JVM’s JIT compiler needs minutes of traffic before it optimizes hot paths — during that window, tail latency is worse and data processing lags behind. A native binary has no such phase.
Smaller attack surface: No JDK runtime means fewer CVEs to track and patch. For a component that ingests data from every service in the cluster, that matters.

These are not incremental improvements. They change what deployment topologies are practical. Serverless observability backends, sidecar-model collectors, edge nodes with tight memory budgets — all become realistic when the backend is this light and this fast.

The Challenge: A Mature, Dynamic Java Platform

SkyWalking OAP carries all the realities of a large Java platform: runtime bytecode generation, reflection-heavy initialization, classpath scanning, SPI-based module wiring, and dynamic DSL execution. These patterns are friendly to extensibility but hostile to GraalVM native image.

The documented GraalVM limitations are only the beginning. In a mature OSS platform, those limitations are deeply entangled with years of runtime design decisions. Standard GraalVM native images struggle with runtime class generation, reflection, dynamic discovery, and script execution — all of which had deep roots in SkyWalking OAP.

There was also a very concrete mountain in the early history of this distro. Upstream SkyWalking relied heavily on Groovy for LAL, MAL, and Hierarchy scripts. In theory, that was just one more unsupported runtime-heavy component. In practice, Groovy was the biggest obstacle in the whole path. It represented not only script execution, but a whole dynamic model that was deeply convenient on the JVM side and deeply unfriendly to native image.

The Design Goal: Make Migration Repeatable

The final design is not just “run native-image successfully.” It is a system that keeps migration work repeatable:

Pre-compile runtime-generated assets at build time. OAL, MAL, LAL, Hierarchy rules, and meter-related generated classes are compiled during the build and packaged as artifacts instead of being generated at startup.
Replace dynamic discovery with deterministic loading. Classpath scanning and runtime registration paths are converted into manifest-driven loading.
Reduce runtime reflection and generate native metadata from the build. Reflection configuration is produced from actual manifests and scanned classes instead of being maintained as a hand-written guess list.
Keep the upstream sync boundary explicit. Same-FQCN replacements are intentionally packaged, inventoried, and guarded with staleness checks.
Make drift visible immediately. If upstream providers, rule files, or replaced source files change, tests fail and force explicit review.

That is the architectural shift that matters most. Reusable abstraction and foresight did not become less important in the AI era. They became more important, because they determine whether AI speed produces a maintainable system or just a fast-growing pile of code.

Turning Runtime Dynamism into Build-Time Assets

SkyWalking OAP has several dynamic subsystems that are natural in a JVM world but problematic for native image:

OAL generates classes at runtime.
LAL, MAL, and Hierarchy were historically tied to Groovy-heavy runtime behavior, which became one of the biggest practical blockers in the early distro work.
MAL, LAL, and Hierarchy rules depend on runtime compilation behavior.
Guava-based classpath scanning discovers annotations, dispatchers, decorators, and meter functions.
SPI-based module/provider discovery expects a more dynamic runtime environment.
YAML/config initialization and framework integrations depend on reflective access.

In SkyWalking GraalVM Distro, these are not solved one by one as isolated patches. They are pulled into a build-time pipeline.

The precompiler runs the DSL engines during the build, exports generated classes, writes manifests, serializes config data, and generates native-image metadata. That means startup becomes class loading and registration, not runtime code generation. The runtime path is simpler because the build path became richer.

This is also why the project is more than a performance exercise. The design goal was to move complexity into a place where it is easier to verify, easier to automate, and easier to repeat.

Same-FQCN Replacements as a Controlled Boundary

One of the most practical design choices in this distro is the use of same-FQCN replacement classes. We do not rely on vague startup tricks or undocumented ordering assumptions. Instead, the GraalVM-specific jars are repackaged so the original upstream classes are excluded and the replacement classes occupy the exact same fully-qualified names.

This matters for maintainability. It creates a very clear boundary:

the upstream class still defines the behavior contract,
the GraalVM replacement provides a compatible implementation strategy,
and the packaging makes that swap explicit.

For example, OAL loading changes from runtime compilation into manifest-driven loading of precompiled classes. Similar replacements handle MAL and LAL DSL loading, module wiring, config initialization, and several reflection-sensitive paths. The goal is not to fork everything. The goal is to replace only the places where the runtime model is fundamentally unfriendly to native image.

That boundary is then guarded by tests that hash the upstream source files corresponding to the replacements. When upstream changes one of those files, the build fails and tells us exactly which replacement needs review. This is what turns “keeping up with upstream” from an anxiety problem into a visible engineering task.

Reflection Config Is Generated, Not Guessed

In many GraalVM migrations, reflect-config.json becomes a manually accumulated artifact. It grows over time, gets stale, and nobody is fully sure whether it is complete or why each entry exists. That approach does not scale well for a large, evolving OSS platform.

In this distro, reflection metadata is generated from the build outputs and scanned classes:

manifests for OAL, MAL, LAL, Hierarchy, and meter-generated classes,
annotation-scanned classes,
Armeria HTTP handlers,
GraphQL resolvers and schema-mapped types,
and accepted ModuleConfig classes.

This is a much healthier model. Instead of asking people to remember every reflective access path, the system derives reflection metadata from the actual migration pipeline. The build becomes the source of truth.

Keeping Upstream Sync Practical

If this distro were only a one-time engineering sprint, it would be much less interesting. The real challenge is keeping it alive while upstream SkyWalking continues to evolve.

That is why the repo includes explicit inventories and drift detectors:

provider inventories that force new upstream providers to be categorized,
rule-file inventories that force new DSL inputs to be acknowledged,
SHA watchers for precompiled YAML inputs,
and SHA watchers for upstream source files with GraalVM-specific replacements.

Good abstraction is not only about elegant code structure. It is about choosing a migration design that can survive contact with future change.

Benchmark Results

We benchmarked the standard JVM OAP against the GraalVM Distro on an Apple M3 Max (macOS, Docker Desktop, 10 CPUs / 62.7 GB), both connecting to BanyanDB.

Boot Test (Docker Compose, no traffic, median of 3 runs)

Metric	JVM OAP	GraalVM OAP	Delta
Cold boot startup	635 ms	5 ms	~127x faster
Warm boot startup	630 ms	5 ms	~126x faster
Idle RSS	~1.2 GiB	~41 MiB	~97% reduction

Boot time is measured from OAP’s first application log timestamp to the listening on 11800 log line (gRPC server ready).

Under Sustained Load (Kind + Istio 1.25.2 + Bookinfo at ~20 RPS, 2 OAP replicas)

30 samples at 10s intervals after 60s warmup.

Metric	JVM OAP	GraalVM OAP	Delta
CPU median (millicores)	101	68	-33%
CPU avg (millicores)	107	67	-37%
Memory median (MiB)	2068	629	-70%
Memory avg (MiB)	2082	624	-70%

Both variants reported identical entry-service CPM, confirming equivalent traffic processing capability.

Service metrics collected every 30s via swctl for all discovered services: service_cpm, service_resp_time, service_sla, service_apdex, service_percentile.

Full benchmark scripts and raw data are in the benchmark/ directory of the distro repository.

Current Status

The project is a runnable experimental distribution, hosted in its own repository: apache/skywalking-graalvm-distro.

The current distro intentionally focuses on a modern, high-performance operating model:

Storage: BanyanDB
Cluster modes: Standalone and Kubernetes
Configuration: none or Kubernetes ConfigMap
Runtime model: fixed module set, precompiled assets, and AOT-friendly wiring

This focus is deliberate. A repeatable migration system starts by making a clear scope runnable, then expanding without losing control.

Getting Started

Because the SkyWalking GraalVM Distro is designed for peak performance, it is optimized to work with BanyanDB as its storage backend. The current published image is available on Docker Hub, and you can boot the stack using the following docker-compose.yml.

version: '3.8'

services:
  banyandb:
    image: ghcr.io/apache/skywalking-banyandb:e1ba421bd624727760c7a69c84c6fe55878fb526
    container_name: banyandb
    restart: always
    ports:
      - "17912:17912"
      - "17913:17913"
    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data --measure-metadata-cache-wait-duration 1m --stream-metadata-cache-wait-duration 1m
    healthcheck:
      test: ["CMD", "sh", "-c", "nc -nz 127.0.0.1 17912"]
      interval: 5s
      timeout: 10s
      retries: 120

  oap:
    image: apache/skywalking-graalvm-distro:0.1.1
    container_name: oap
    depends_on:
      banyandb:
        condition: service_healthy
    restart: always
    ports:
      - "11800:11800"
      - "12800:12800"
    environment:
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_HEALTH_CHECKER: default
    healthcheck:
      test: ["CMD-SHELL", "nc -nz 127.0.0.1 11800 || exit 1"]
      interval: 5s
      timeout: 10s
      retries: 120

  ui:
    image: ghcr.io/apache/skywalking/ui:10.3.0
    container_name: ui
    depends_on:
      oap:
        condition: service_healthy
    restart: always
    ports:
      - "8080:8080"
    environment:
      SW_OAP_ADDRESS: http://oap:12800

Simply run:

docker compose up -d

We invite the community to test this new distribution, report issues, and help us move it toward a production-ready state.

Special thanks to the GraalVM team for the technology foundation.

Blog: Profiling Java application with SkyWalking bundled async-profiler

Mon, 09 Dec 2024 00:00:00 +0000

Background

Apache SkyWalking is an open-source Application Performance Management system that helps users gather logs, traces, metrics, and events from various platforms and display them on the UI. In version 10.1.0, Apache SkyWalking can perform CPU analysis through eBPF, which supports multiple languages, but not Java. This article discusses how Apache SkyWalking 10.2.0 uses async-profiler to collect CPU, memory allocation, and locks for analysis, solving this limitation, and also provides memory allocation and occupancy analysis.

Why use async-profiler

The async-profiler is a low overhead sampling profiler for Java that does not suffer from the Safepoint bias problem. It features HotSpot-specific API to collect stack traces and to track memory allocations. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM. The async-profiler also officially supports the instruction set architectures commonly used on Linux and Mac platforms, and the sampling data can be stored in the JFR format. Compared with the JFR tool officially provided by JDK, it supports lower JDK versions (JDK 6).

Architecture diagram

The processes of running a profiling task

A user submits a async-profiler task in the UI
The Java agent retrieves the task from the OAP Server
Java agent excuses the task to collect profiling data sampling through async-profiler
After the profiling is completed, the agent uploads the JFR file to the OAP server.
The server parses the JFR file to generate profiling results and marks the task as completed status.
The user could check the performance analysis result from the UI side.

Demo

You can setup SkyWalking showcase locally to preview this feature. In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export FEATURE_FLAGS=java-agent-injector,single-node,elasticsearch
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Run the Async Profiling Task Step by Step

After the deployment is complete, users can navigate to the service page where the Java agent is configured. Upon entering the service page, users will be able to see the Async Profiling component. By clicking on this component, users will gain access to the relevant functionality page, where they can perform some operations.

Create a New Task

Clicking New Task on the Async Profiling page will direct you to the following configuration page. The usage of each parameter is explained as follows:

Instance: This parameter allows you to select the instance of the service that will execute the profiling. It supports selecting multiple instances simultaneously for performance analysis.
Duration: Specifies the duration for the task. The default duration is conservatively set to a maximum of 20 minutes, but this can be adjusted through the Java agent configuration.
Async Profiling Events: The profiling events are categorized into three types of sampling, which will be explained below:
- CPU Sampling: CPU, WALL, CTIMER, ITIMER. See the differences between these four CPU sampling types.
- Memory Allocation Sampling: ALLOC.
- Lock Occupancy Sampling: LOCK.
ExecArgs: Extended parameters for async-profiler. Detailed usage instructions are available.

Check the Progresses Of the Task

By clicking the task details icon, users can view the task status logs, relevant parameters, as well as instances where data collection has either failed or been successfully completed. Instances that have successfully completed data collection will be available for subsequent performance analysis.

It is important to note that, in containerized deployments where users have not configured volume mounts, there may be cases where JFR files cannot be received. To address this, the OAP Server by default uses memory to receive and parse JFR files. The maximum acceptable size for JFR files is conservatively set to 30MB by default.

Users can customize the default JFR file size in the OAP configuration and opt to store the files on the filesystem before parsing them, enabling the platform to handle larger JFR files and ensuring smoother memory allocation.

Currently, the JFR parser requires approximately 1GB of memory to process a 200MB JFR file. (Note that this refers only to memory allocation, not the actual memory required for parsing.) Users can use this as a reference when configuring their OAP Server

Performance Analysis

Users can select a task and choose the instances they wish to analyze for performance (multiple instances can be selected for aggregated flame graph analysis). After selecting the desired JFR event type for analysis, users can click the Analyze button to display the corresponding flame graph.

Some Details

Differences in CPU sampling during task creation

The CPU sampling mechanism supports several modes, each representing a different sampling engine implemented by async-profiler. These modes include CPU, WALL, CTIMER, and ITIMER, and differ primarily in how they collect and generate sampling signals. The following provides a detailed description of each sampling:

CPU: cpu mode relies on perf_events. The idea is the same - to generate a signal every N nanoseconds of CPU time, which in this case is achieved by configuring PMU to generate an interrupt every K CPU cycles.
WALL: Same as CPU sampling, but also samples threads in non-runnable state, such as threads in sleep
ITIMER: itimer mode is based on setitimer(ITIMER_PROF) syscall, which ideally generates a signal every given interval of the CPU time consumed by the process.
CTIMER: ctimer aims to address these limitations of perf_events and itimer. ctimer relies on timer_create. It combines benefits of cpu and itimer, except that it does not allow collecting kernel stacks.

For details, please refer to async-profiler

ExecArgs in task creation

By default, task parameters are separated by commas. When creating a task, users should refer to the following example format for input: lock=10us,interval=10ms.

Currently, the following parameters are supported by default:

Option	Description
chunksize=N	approximate size of JFR chunk in bytes (default: 100 MB)
chunktime=N	duration of JFR chunk in seconds (default: 1 hour)
lock[=DURATION]	profile contended locks overflowing the DURATION ns bucket
jstackdepth=N	maximum Java stack depth (default: 2048)
interval=N	sampling interval in ns (default: 10'000'000, i.e. 10 ms)
alloc[=BYTES]	profile allocations with BYTES interval

For other parameters, please refer to async-profiler and need to be tested by yourself

Comparison table between sampling types and JFR events in task analysis

Task sample type	JFR event type	Description	Unit
CPU WALL ITIMER CTIMER	EXECUTION_SAMPLE	Multiple AsyncProfilerEventType types correspond to the EXECUTION_SAMPLE event. This is primarily due to the fact that different sampling types employ distinct underlying mechanisms and have varying sampling scopes.	Sample times. The execution time can be calculated based on the sampling interval. For instance, if the number of samples is 10 and the interval is set to 10ms, the total execution time can be estimated as 100ms (the default interval is 10ms)
LOCK	THREAD_PARK JAVA_MONITOR_ENTER	Empty	ns
ALLOC	OBJECT_ALLOCATION_IN_NEW_TLAB OBJECT_ALLOCATION_OUTSIDE_TLAB	Empty	byte
Add `live` option to extended parameters	PROFILER_LIVE_OBJECT	Because it is not in the event parameter of async-profiler, it is not selected separately in the task sampling type of the UI during implementation, but is used as an extended parameter	byte

Performance expenses

There is no performance overhead when an instance is not receiving an async-profiler task. Performance impact is only introduced once the async-profiler performance analysis is initiated. The extent of this overhead depends on the specific configuration parameters. When using the default settings, the performance impact typically ranges from 0.3% to 10%. For more detailed information, please refer to the issue.

Blog: SkyWalking 10 Release: Service Hierarchy, Kubernetes Network Monitoring by eBPF, BanyanDB, and More

Mon, 13 May 2024 00:00:00 +0000

The Apache SkyWalking team today announced the 10 release. SkyWalking 10 provides a host of groundbreaking features and enhancements. The introduction of Layer and Service Hierarchy streamlines monitoring by organizing services and metrics into distinct layers and providing seamless navigation across them. Leveraging eBPF technology, Kubernetes Network Monitoring delivers granular insights into network traffic, topology, and TCP/HTTP metrics. BanyanDB emerges as a high-performance native storage solution, while expanded monitoring support encompasses Apache RocketMQ, ClickHouse, and Apache ActiveMQ Classic. Support for Multiple Labels Names enhances flexibility in metrics analysis, while enhanced exporting and querying capabilities streamline data dissemination and processing.

This release blog briefly introduces these new features and enhancements as well as some other notable changes.

Layer and Service Hierarchy

Layer concept was introduced in SkyWalking 9.0.0, it represents an abstract framework in computer science, such as Operating System(OS_LINUX layer), Kubernetes(k8s layer). It organizes services and metrics into different layers based on their roles and responsibilities in the system. SkyWalking provides a suite of monitoring and diagnostic tools for each layer, but there is a gap between the layers, which can not easily bridge the data across different layers.

In SkyWalking 10, SkyWalking provides new abilities to jump/connect across different layers and provide a seamless monitoring experience for users.

Layer Jump

In the topology graph, users can click on a service node to jump to the dashboard of the service in another layer. The following figures show the jump from the GENERAL layer service topology to the VIRTUAL_DATABASE service layer dashboard by clicking the topology node. Figure 1: Layer Jump

Figure 2: Layer jump Dashboard

Service Hierarchy

SkyWalking 10 introduces a new concept called Service Hierarchy, which defines the relationships of existing logically same services in various layers. OAP will detect the services from different layers, and try to build the connections. Users can click the Hierarchy Services in any layer’s service topology node or service dashboard to get the Hierarchy Topology. In this topology graph, users can see the relationships between the services in different layers and the summary of the metrics and also can jump to the service dashboard in the layer. When a service occurs performance issue, users can easily analyze the metrics from different layers and track down the root cause:

The examples of the Service Hierarchy relationships:

The application song deployed in the Kubernetes cluster with SkyWalking agent and Service Mesh at the same time. So the application song across the GENERAL, MESH, MESH_DP and K8S_SERVICE layers which could be monitored by SkyWalking, the Service Hierarchy topology as below: Figure 3: Service Hierarchy Agent With K8s Service And Mesh With K8s Service.

And can also have the Service Instance Hierarchy topology to get the single instance status across the layers as below: Figure 4: Instance Hierarchy Agent With K8s Service(Pod)
The PostgreSQL database psql deployed in the Kubernetes cluster and used by the application song. So the database psql across the VIRTUAL_DATABASE, POSTGRESQL and K8S_SERVICE layers which could be monitored by SkyWalking, the Service Hierarchy topology as below: Figure 5: Service Hierarchy Agent(Virtual Database) With Real Database And K8s Service

For more supported layers and how to detect the relationships between services in different layers please refer to the Service Hierarchy. how to configure the Service Hierarchy in SkyWalking, please refer to the Service Hierarchy Configuration section.

Monitoring Kubernetes Network Traffic by using eBPF

In the previous version, skyWalking provides Kubernetes (K8s) monitoring from kube-state-metrics and cAdvisor, which can monitor the Kubernetes cluster status and the metrics of the Kubernetes resources.

In SkyWalking 10, by leverage Apache SkyWalking Rover 0.6+, SkyWalking has the ability to monitor the Kubernetes network traffic by using eBPF, which can collect and map access logs from applications in Kubernetes environments. Through these data, SkyWalking can analyze and provide the Service Traffic, Topology, TCP/HTTP level metrics from the Kubernetes aspect.

The following figures show the Topology and TCP Dashboard of the Kubernetes network traffic:

Figure 6: Kubernetes Network Traffic Topology

Figure 7: Kubernetes Network Traffic TCP Dashboard

More details about how to monitor the Kubernetes network traffic by using eBPF, please refer to the Monitoring Kubernetes Network Traffic by using eBPF.

BanyanDB - Native APM Database

BanyanDB 0.6.0 and BanyanDB Java client 0.6.0 are released with SkyWalking 10, As a native storage solution for SkyWalking, BanyanDB is going to be SkyWalking’s next-generation storage solution. This is recommended to use for medium-scale deployments from 0.6 until 1.0.
It has shown high potential performance improvement. Less than 50% CPU usage and 50% memory usage with 40% disk volume compared to Elasticsearch in the same scale.

Apache RocketMQ Server Monitoring

Apache RocketMQ is an open-source distributed messaging and streaming platform, which is widely used in various scenarios including Internet, big data, mobile Internet, IoT, and other fields. SkyWalking provides a basic monitoring dashboard for RocketMQ, which includes the following metrics:

Cluster Metrics: including messages produced/consumed today, total producer/consumer TPS, producer/consumer message size, messages produced/consumed until yesterday, max consumer latency, max commitLog disk ratio, commitLog disk ratio, pull/send threadPool queue head wait time, topic count, and broker count.
Broker Metrics: including produce/consume TPS, producer/consumer message size.
Topic Metrics: including max producer/consumer message size, consumer latency, producer/consumer TPS, producer/consumer offset, producer/consumer message size, consumer group count, and broker count.

The following figure shows the RocketMQ Cluster Metrics dashboard: Figure 8: Apache RocketMQ Server Monitoring

For more metrics and details about the RocketMQ monitoring, please refer to the Apache RocketMQ Server Monitoring,

ClickHouse Server Monitoring

ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real-time, it is widely used for online analytical processing (OLAP). ClickHouse monitoring provides monitoring of the metrics 、events and asynchronous metrics of the ClickHouse server, which includes the following parts of metrics:

Server Metrics
Query Metrics
Network Metrics
Insert Metrics
Replica Metrics
MergeTree Metrics
ZooKeeper Metrics
Embedded ClickHouse Keeper Metrics

The following figure shows the ClickHouse Cluster Metrics dashboard: Figure 9: ClickHouse Server Monitoring

For more metrics and details about the ClickHouse monitoring, please refer to the ClickHouse Server Monitoring, and here is a blog that can help for a quick start Monitoring ClickHouse through SkyWalking.

Apache ActiveMQ Server Monitoring

Apache ActiveMQ Classic is a popular and powerful open-source messaging and integration pattern server. SkyWalking provides a basic monitoring dashboard for ActiveMQ, which includes the following metrics:

Cluster Metrics: including memory usage, rates of write/read, and average/max duration of write.
Broker Metrics: including node state, number of connections, number of producers/consumers, and rate of write/read under the broker. Depending on the cluster mode, one cluster may include one or more brokers.
Destination Metrics: including number of producers/consumers, messages in different states, queues, and enqueue duration in a queue/topic.

The following figure shows the ActiveMQ Cluster Metrics dashboard: Figure 10: Apache ActiveMQ Server Monitoring

For more metrics and details about the ActiveMQ monitoring, please refer to the Apache ActiveMQ Server Monitoring, and here is a blog that can help for a quick start Monitoring ActiveMQ through SkyWalking.

Support Multiple Labels Names

Before SkyWalking 10, SkyWalking does not store the labels names in the metrics data, which makes MQE have to use _ as the generic label name, it can’t query the metrics data with multiple labels names.

SkyWalking 10 supports storing the labels names in the metrics data, and MQE can query or calculate the metrics data with multiple labels names. For example: The k8s_cluster_deployment_status metric has labels namespace, deployment and status. If we want to query all deployment metric values with namespace=skywalking-showcase and status=true, we can use the following expression:

k8s_cluster_deployment_status{namespace='skywalking-showcase', status='true'}

related enhancement:

Since Alarm rule configuration had migrated to the MQE in SkyWalking 9.6.0, the alarm rule also supports multiple labels names.
PromeQL service supports multiple labels names query.

Metrics gRPC exporter

SkyWalking 10 enhanced the metrics gPRC exporter, it supports exporting all types of metrics data to the gRPC server.

SkyWalking Native UI Metrics Query Switch to V3 APIs

SkyWalking Native UI metrics query deprecate the V2 APIs, and all migrated to V3 APIs and MQE.

Other Notable Enhancements

Support Java 21 runtime and oap-java21 image for Java 21 runtime.
Remove CLI(swctl) from the image.
More MQE functions and operators supported.
Enhance the native UI and improve the user experience.
Several bugs and CVEs fixed.

Zh: SkyWalking 10 发布：服务层次结构、基于 eBPF 的 Kubernetes 网络监控、BanyanDB 等

Mon, 13 May 2024 00:00:00 +0000

Apache SkyWalking 团队今天宣布发布 SkyWalking 10。SkyWalking 10 提供了一系列突破性的功能和增强功能。Layer 和 Service Hierarchy 的引入通过将服务和指标组织成不同的层次，并提供跨层无缝导航，从而简化了监控。利用 eBPF 技术，Kubernetes 网络监控提供了对网络流量、拓扑和 TCP/HTTP 指标的详细洞察。BanyanDB 作为高性能的原生存储解决方案出现，同时扩展的监控支持包括 Apache RocketMQ、ClickHouse 和 Apache ActiveMQ Classic。对多标签名称的支持增强了指标分析的灵活性，而增强的导出和查询功能简化了数据分发和处理。

本文简要介绍了这些新功能和增强功能以及其他一些值得注意的变化。

Layer 和 Service Hierarchy

Layer 概念是在 SkyWalking 9.0.0 中引入的，它代表计算机科学中的一个抽象框架，例如操作系统（OS_LINUX layer）、Kubernetes（k8s layer）。它根据系统中服务和指标的角色和职责将其组织到不同的层次。SkyWalking 为每个层提供了一套监控和诊断工具，但层之间存在 gap，无法轻松跨层桥接数据。

在 SkyWalking 10 中，SkyWalking 提供了跨层跳转/连接的新功能，为用户提供无缝的监控体验。

Layer Jump

在拓扑图中，用户可以点击服务节点跳转到另一层服务的仪表板。下图显示了通过点击拓扑节点从 GENERAL 层服务拓扑跳转到 VIRTUAL_DATABASE 服务层仪表板的过程。

Service Hierarchy

SkyWalking 10 引入了一个新概念，称为 Service Hierarchy，它定义了各层中现有逻辑相同服务的关系。OAP 将检测不同层次的服务，并尝试建立连接。用户可以点击任何层的服务拓扑节点或服务仪表板中的 Hierarchy Services 获取 Hierarchy Topology。在此拓扑图中，用户可以看到不同层次服务之间的关系和指标摘要，并且可以跳转到该层的服务仪表板。当服务发生性能问题时，用户可以轻松分析不同层次的指标并找出根本原因：

以下是 Service Hierarchy 关系的示例：

应用程序 song 同时在 Kubernetes 集群中部署了 SkyWalking agent 和 Service Mesh。因此，应用程序 song 跨越了 GENERAL、MESH、MESH_DP 和 K8S_SERVICE 层，SkyWalking 可以监控这些层次，Service Hierarchy 拓扑如下：

还可以有 Service Instance Hierarchy 拓扑来获取跨层的单实例状态，如下所示：

在 Kubernetes 集群中部署并由应用程序 song 使用的 PostgreSQL 数据库 psql。因此，数据库 psql 跨越 VIRTUAL_DATABASE、POSTGRESQL 和 K8S_SERVICE 层，SkyWalking 可以监控这些层次，Service Hierarchy 拓扑如下：

有关更多支持的层次以及如何检测不同层次服务之间的关系，请参阅 Service Hierarchy。有关如何在 SkyWalking 中配置 Service Hierarchy，请参阅 Service Hierarchy Configuration 部分。

使用 eBPF 监控 Kubernetes 网络流量

在之前的版本中，SkyWalking 提供了来自 kube-state-metrics 和 cAdvisor 的 Kubernetes (K8s) 监控，它可以监控 Kubernetes 集群状态和 Kubernetes 资源的指标。

在 SkyWalking 10 中，通过利用 Apache SkyWalking Rover 0.6+，SkyWalking 具有使用 eBPF 监控 Kubernetes 网络流量的能力，可以收集和映射 Kubernetes 环境中应用程序的访问日志。通过这些数据，SkyWalking 可以从 Kubernetes 角度分析和提供服务流量、拓扑、TCP/HTTP 级别指标。

下图显示了 Kubernetes 网络流量的拓扑和 TCP 仪表板：

有关如何使用 eBPF 监控 Kubernetes 网络流量的更多详细信息，请参阅使用 eBPF 监控 Kubernetes 网络流量。

BanyanDB - 原生 APM 数据库

BanyanDB 0.6.0 和 BanyanDB Java 客户端 0.6.0 随 SkyWalking 10 一起发布。作为 SkyWalking 的原生存储解决方案，BanyanDB 将成为 SkyWalking 的下一代存储解决方案。推荐在 0.6 到 1.0 期间用于中等规模的部署。它展示了高性能改进的潜力。与 Elasticsearch 在同一规模下相比，CPU 使用率降低 50%，内存使用率降低 50%，磁盘使用量减少 40%。

Apache RocketMQ 服务器监控

Apache RocketMQ 是一个开源的分布式消息和流平台，广泛应用于互联网、大数据、移动互联网、物联网等领域。SkyWalking 为 RocketMQ 提供了一个基本的监控仪表板，包括以下指标：

Cluster Metrics：包括当天产生/消费的消息数、总生产者/消费者 TPS、生产者/消费者消息大小、截至昨天产生/消费的消息数、最大消费者延迟、最大 commitLog 磁盘比、commitLog 磁盘比、拉取/发送线程池队列头等待时间、topic count 和 broker count。
Broker Metrics：包括生产/消费 TPS、生产者/消费者消息大小。
Topic Metrics：包括最大生产者/消费者消息大小、消费者延迟、生产/消费 TPS、生产/消费偏移、生产/消费消息大小、消费者组数和代理数。

下图显示了 RocketMQ Cluster Metrics 仪表板：

有关 RocketMQ 监控的更多指标和详细信息，请参阅 Apache RocketMQ 服务器监控。

ClickHouse Server 监控

ClickHouse 是一个开源的列式数据库管理系统，可以实时生成分析数据报告，广泛用于在线分析处理 (OLAP)。ClickHouse 监控提供了 ClickHouse 服务器的指标、事件和异步指标的监控，包括以下部分的指标：

Server Metrics
Query Metrics
Network Metrics
Insert Metrics
Replica Metrics
MergeTree Metrics
ZooKeeper Metrics
Embedded ClickHouse Keeper Metrics

下图显示了 ClickHouse Cluster Metrics 仪表板：

有关 ClickHouse 监控的更多指标和详细信息，请参阅 ClickHouse 服务器监控，以及一篇可以帮助快速入门的博客通过 SkyWalking 监控 ClickHouse。

Apache ActiveMQ 服务器监控

Apache ActiveMQ Classic 是一个流行且强大的开源消息和集成模式服务器。SkyWalking 为 ActiveMQ 提供了一个基本的监控仪表板，包括以下指标：

Cluster Metrics：包括内存使用率、写入/读取速率和平均/最大写入持续时间。
Broker Metrics：包括节点状态、连接数、生产者/消费者数和代理下的写入/读取速率。根据集群模式，一个集群可以包含一个或多个代理。
Destination Metrics：包括生产者/消费者数、不同状态的消息、队列和队列/主题中的入队持续时间。

下图显示了 ActiveMQ Cluster Metrics 仪表板：

有关 ActiveMQ 监控的更多指标和详细信息，请参阅 Apache ActiveMQ 服务器监控，以及一篇可以帮助快速入门的博客通过 SkyWalking 监控 ActiveMQ。

支持多标签名称

在 SkyWalking 10 之前，SkyWalking 不会在指标数据中存储标签名称，这使得 MQE 必须使用 _ 作为通用标签名称，无法使用多个标签名称查询指标数据。

SkyWalking 10 支持在指标数据中存储标签名称，MQE 可以使用多个标签名称查询或计算指标数据。例如：k8s_cluster_deployment_status 指标具有 namespace、deployment 和 status 标签。如果我们想查询所有 namespace=skywalking-showcase 和 status=true 的部署指标值，可以使用以下表达式：

k8s_cluster_deployment_status{namespace='skywalking-showcase', status='true'}

Metrics gRPC 导出器

SkyWalking 10 增强了 metrics gPRC exporter，支持将所有类型的指标数据导出到 gRPC 服务器。

SkyWalking 原生 UI 指标查询切换到 V3 API

SkyWalking 原生 UI 指标查询弃用 V2 API，全部迁移到 V3 API 和 MQE。

其他值得注意的增强功能

支持 Java 21 运行时和 oap-java21 镜像用于 Java 21 运行时。
从镜像中删除 CLI（swctl）。
支持更多的 MQE 函数和操作。
增强原生 UI 并改善用户体验。
修复了一些 Bug 和 CVE。

Blog: Monitoring Kubernetes network traffic by using eBPF

Mon, 18 Mar 2024 00:00:00 +0000

Background

Apache SkyWalking is an open-source Application Performance Management system that helps users gather logs, traces, metrics, and events from various platforms and display them on the UI. With version 9.7.0, SkyWalking can collect access logs from probes in multiple languages and from Service Mesh, generating corresponding topologies, tracing, and other data. However, it could not initially collect and map access logs from applications in Kubernetes environments. This article explores how the 10.0.0 version of Apache SkyWalking employs eBPF technology to collect and store application access logs, addressing this limitation.

Why eBPF?

To monitor the network traffic in Kubernetes, the following features support be support:

Cross Language: Applications deployed in Kubernetes may be written in any programming language, making support for diverse languages important.
Non-Intrusiveness: It’s imperative to monitor network traffic without making any modifications to the applications, as direct intervention with applications in Kubernetes is not feasible.
Kernel Metrics Monitoring: Often, diagnosing network issues by analyzing traffic performance at the user-space level is insufficient. A deeper analysis incorporating kernel-space network traffic metrics is frequently necessary.
Support for Various Network Protocols: Applications may communicate using different transport protocols, necessitating support for a range of protocols.

Given these requirements, eBPF emerges as a capable solution. In the next section, we will delve into detailed explanations of how Apache SkyWalking Rover resolves these aspects.

Kernel Monitoring and Protocol Analysis

In previous articles, we’ve discussed how to monitor network traffic from programs written in various languages. This technique remains essential for network traffic monitoring, allowing for the collection of traffic data without language limitations. However, due to the unique aspects of our monitoring trigger mechanism and the specific features of kernel monitoring, these two areas warrant separate explanations.

Kernel Monitoring

Kernel monitoring allows users to gain insights into network traffic performance based on the execution at the kernel level, specifically from Layer 2 (Data Link) to Layer 4 (Transport) of the OSI model.

Network monitoring at the kernel layer is deference from the syscall (user-space) layer in terms of the metrics and identifiers used. While the syscalls layer can utilize file descriptors to correlate various operations, kernel layer network operations primarily use packets as unique identifiers. This discrepancy necessitates a mapping relationship that SkyWalking Rover can use to bind these two layers together for comprehensive monitoring.

Let’s dive into the details of how data is monitored in both sending and receiving modes.

Observe Sending

When sending data, tracking the status and timing of each packet is crucial for understanding the state of each transmission. Within the kernel, operations progress from Layer 4 (L4) down to Layer 2 (L2), maintaining the same thread ID as during the syscalls layer, which simplifies data correlation.

SkyWalking Rover monitors several key kernel functions to observe packet transmission dynamics, listed from L4 to L2:

kprobe/tcp_sendmsg: Captures the time when a packet enters the L4 protocol stack for sending and the time it finishes processing. This function is essential for tracking the initial handling of packets at the transport layer.
kprobe/tcp_transmit_skb: Records the total number of packet transmissions and the size of each packet sent. This function helps identify how many times a packet or a batch of packets is attempted to be sent, which is critical for understanding network throughput and congestion.
tracepoint/tcp/tcp_retransmit_skb: Notes whether packet retransmission occurs, providing insights into network reliability and connection quality. Retransmissions can significantly impact application performance and user experience.
tracepoint/skb/kfree_skb: Records packet loss during transmission and logs the reason for such occurrences. Understanding packet loss is crucial for diagnosing network issues and ensuring data integrity.
kprobe/__ip_queue_xmit: Records the start and end times of processing by the L3 protocol. This function is vital for understanding the time taken for IP-level operations, including routing decisions.
kprobe/nf_hook_slow: Records the total time and number of occurrences spent in Netfilter hooks, such as iptables rule evaluations. This monitoring point is important for assessing the impact of firewall rules and other filtering mechanisms on packet flow.
kprobe/neigh_resolve_output: If resolving an unknown MAC address is necessary before sending a network request, this function records the occurrences and total time spent on this resolution. MAC address resolution times can affect the initial packet transmission delay.
kprobe/__dev_queue_xmit: Records the start and end times of entering the L2 protocol stack, providing insights into the data link layer’s processing times.
tracepoint/net/net_dev_start_xmit and tracepoint/net/net_dev_xmit: Records the actual time taken to transmit each packet at the network interface card (NIC). These functions are crucial for understanding the hardware-level performance and potential bottlenecks at the point of sending data to the physical network.

According to the interception of the above method, Apache SkyWalking Rover can provide key execution time and metrics for each level when sending network data, from the application layer (Layer 7) to the transport layer (Layer 4), and finally to the data link layer (Layer 2).

Observe Receiving

When receiving data, the focus is often on the time it takes for packets to travel from the network interface card (NIC) to the user space. Unlike the process of sending data, data receiving in the kernel proceeds from the data link layer (Layer 2) up to the transport layer (Layer 4), until the application layer (Layer 7) retrieves the packet’s content. In SkyWalking Rover, monitors the following key system functions to observe this process, listed from L2 to L4:

tracepoint/net/netif_receive_skb: Records the time when a packet is received by the network interface card. This tracepoint is crucial for understanding the initial point of entry for incoming data into the system.
kprobe/ip_rcv: Records the start and end times of packet processing at the network layer (Layer 3). This probe provides insights into how long it takes for the IP layer to handle routing, forwarding, and delivering packets to the correct application.
kprobe/nf_hook_slow: Records the total time and occurrences spent in Netfilter hooks, same with the sending traffic flow.
kprobe/tcp_v4_rcv: Records the start and end times of packet processing at the transport layer (Layer 4). This probe is key to understanding the efficiency of TCP operations, including connection management, congestion control, and data flow.
tracepoint/skb/skb_copy_datagram_iovec: When application layer protocols use the data, this tracepoint binds the packet to the syscall layer data at Layer 7. This connection is essential for correlating the kernel’s handling of packets with their consumption by user-space applications.

Based on the above methods, network monitoring can help you understand the complete execution process and execution time from when data is received by the network card to when it is used by the program.

Metrics

By intercepting the methods mentioned above, we can gather key metrics that provide insights into network performance and behavior. These metrics include:

Packets: The size of the packets and the frequency of their transmission or reception. These metric offers a fundamental understanding of the network load and the efficiency of data movement between the sender and receiver.
Connections: The number of connections established or accepted between services and the time taken for these connections to be set up. This metric is crucial for analyzing the efficiency of communication and connection management between different services within the network.
L2-L4 Events: The time spent on key events within the Layer 2 to Layer 4 protocols. This metric sheds light on the processing efficiency and potential bottlenecks within the lower layers of the network stack, which are essential for data transmission and reception.

Protocol Analyzing

In previous articles, we have discussed parsing HTTP/1.x protocols. However, with HTTP/2.x, the protocol’s stateful nature and the pre-established connections between services complicate network profiling. This complexity makes it challenging for Apache SkyWalking Rover to fully perceive the connection context, hindering protocol parsing operations.

Transitioning network monitoring to Daemon mode offers a solution to this challenge. By continuously observing service operations around the clock, SkyWalking Rover can begin monitoring as soon as a service starts. This immediate initiation allows for the tracking of the complete execution context, making the observation of stateful protocols like HTTP/2.x feasible.

Probes

To detect when a process is started, monitoring a specific trace point (tracepoint/sched/sched_process_fork) is essential. This approach enables the system to be aware of process initiation events. Given the necessity to filter process traffic based on certain criteria such as the process’s namespace, Apache SkyWalking Rover follows a series of steps to ensure accurate and efficient monitoring. These steps include:

Monitoring Activation: The process is immediately added to a monitoring whitelist upon detection. This step ensures that the process is considered for monitoring from the moment it starts, without delay.
Push to Queue: The process’s PID (Process ID) is pushed into a monitoring confirmation queue. This queue holds the PIDs of newly detected processes that are pending further confirmation from a user-space program. This asynchronous approach allows for the separation of immediate detection and subsequent processing, optimizing the monitoring workflow.
User-Space Program Confirmation: The user-space program retrieves process PIDs from the queue and assesses whether each process should continue to be monitored. If a process is deemed unnecessary for monitoring, it is removed from the whitelist.

This process ensures that SkyWalking Rover can dynamically adapt its monitoring scope based on real-time conditions and configurations, allowing for both comprehensive coverage and efficient resource use.

Limitations

The monitoring of stateful protocols like HTTP/2.x currently faces certain limitations:

Inability to Observe Pre-existing Connections: Monitoring the complete request and response cycle requires that monitoring be initiated before any connections are established. This requirement means that connections set up before the start of monitoring cannot be observed.
Challenges with TLS Requests: Observing TLS encrypted traffic is complex because it relies on asynchronously attaching uprobes (user-space attaching) for observation. If new requests are made before these uprobes are successfully attached, it becomes impossible to access the data before encryption or after decryption.

Demo

Next, let’s quickly demonstrate the Kubernetes monitoring feature, so you can understand more specifically what it accomplishes.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export FEATURE_FLAGS=java-agent-injector,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Done

Once deployed, Apache SkyWalking Rover automatically begins monitoring traffic within the system upon startup. Then, reports this traffic data to SkyWalking OAP, where it is ultimately stored in a database.

In the Service Dashboard within Kubernetes, you can view a list of monitored Kubernetes services. If any of these services have HTTP traffic, this information would be displayed alongside them in the dashboard.

Figure 1: Kubernetes Service List

Additionally, within the Topology Tab, you can observe the topology among related services. In each service or call relationship, there would display relevant TCP and HTTP metrics.

Figure 2: Kubernetes Service Topology

When you select a specific service from the Service list, you can view service metrics at both the TCP and HTTP levels for the chosen service.

Figure 3: Kubernetes Service TCP Metrics

Figure 4: Kubernetes Service HTTP Metrics

Furthermore, by using the Endpoint Tab, you can see which URIs have been accessed for the current service.

Figure 5: Kubernetes Service Endpoint List

Conclusion

In this article, I’ve detailed how to utilize eBPF technology for network monitoring of services within a Kubernetes cluster, a capability that has been implemented in Apache SkyWalking Rover. This approach leverages the power of eBPF to provide deep insights into network traffic and service interactions, enhancing visibility and observability across the cluster.

Zh: 将 Apache SkyWalking 与 Arthas 集成

Sun, 17 Sep 2023 00:00:00 +0000

背景介绍

Arthas 是一款常用的 Java 诊断工具，我们可以在 SkyWalking 监控到服务异常后，通过 Arthas 进一步分析和诊断以快速定位问题。

在 Arthas 实际使用中，通常由开发人员拷贝或者下载安装包到服务对应的VM或者容器中，attach 到对应的 Java 进程进行问题排查。这一过程不可避免的会造成服务器敏感运维信息的扩散，而且在分秒必争的问题排查过程中，这些繁琐的操作无疑会浪费大量时间。

SkyWalking Java Agent 伴随 Java 服务一起启动，并定期上报服务、实例信息给OAP Server。我们可以借助 SkyWalking Java Agent 的插件化能力，开发一个 Arthas 控制插件，由该插件管理 Arthas 运行生命周期，通过页面化的方式，完成Arthas的启动与停止。最终实现效果可以参考下图：

要完成上述功能，我们需要实现以下几个关键点：

开发 agent arthas-control-plugin，执行 arthas 的启动与停止命令
开发 oap arthas-controller-module ，下发控制命令给 arthas agent plugin
定制 skywalking-ui, 连接 arthas-tunnel-server，发送 arthas 命令并获取执行结果

以上各个模块之间的交互流程如下图所示：

connect

disconnect

本文涉及的所有代码均已发布在 github skywalking-x-arthas 上，如有需要，大家可以自行下载代码测试。文章后半部分将主要介绍代码逻辑及其中包含的SkyWalking扩展点。

agent arthas-control-plugin

首先在 skywalking-java/apm-sniffer/apm-sdk-plugin 下创建一个 arthas-control-plugin，该模块在打包后会成为 skywalking-agent/plugins 下的一个插件，其目录结构如下：

arthas-control-plugin/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── apache
        │           └── skywalking
        │               └── apm
        │                   └── plugin
        │                       └── arthas
        │                           ├── config
        │                           │   └── ArthasConfig.java    # 模块配置
        │                           ├── service
        │                           │   └── CommandListener.java # boot service，监听 oap command
        │                           └── util
        │                               ├── ArthasCtl.java       # 控制 arthas 的启动与停止
        │                               └── ProcessUtils.java
        ├── proto
        │   └── ArthasCommandService.proto                       # 与oap server通信的 grpc 协议定义
        └── resources
            └── META-INF
                └── services                                     # boot service spi service
                    └── org.apache.skywalking.apm.agent.core.boot.BootService

16 directories, 7 files

在 ArthasConfig.java 中，我们定义了以下配置，这些参数将在 arthas 启动时传递。

以下的配置可以通过 agent.config 文件、system prop、env variable指定。关于 skywalking-agent 配置的初始化的具体流程，大家可以参考 SnifferConfigInitializer 。

public class ArthasConfig {
    public static class Plugin {
        @PluginConfig(root = ArthasConfig.class)
        public static class Arthas {
            // arthas 目录
            public static String ARTHAS_HOME;
            // arthas 启动时连接的tunnel server
            public static String TUNNEL_SERVER;
            // arthas 会话超时时间
            public static Long SESSION_TIMEOUT;
            // 禁用的 arthas command
            public static String DISABLED_COMMANDS;
        }
    }
}

接着，我们看下 CommandListener.java 的实现，CommandListener 实现了 BootService 接口，并通过 resources/META-INF/services 下的文件暴露给 ServiceLoader。

BootService 的定义如下，共有prepare()、boot()、onComplete()、shutdown()几个方法，这几个方法分别对应插件生命周期的不同阶段。

public interface BootService {
    void prepare() throws Throwable;

    void boot() throws Throwable;

    void onComplete() throws Throwable;

    void shutdown() throws Throwable;

    default int priority() {
        return 0;
    }
}

在 ServiceManager 类的 boot() 方法中，定义了BootService 的 load 与启动流程，该方法由SkyWalkingAgent 的 premain 调用，在主程序运行前完成初始化与启动：

public enum ServiceManager {
    INSTANCE;
    ...
    ...
    public void boot() {
        bootedServices = loadAllServices();

        prepare();
        startup();
        onComplete();
    }
    ...
    ...
}

回到我们 CommandListener 的 boot 方法，该方法在 agent 启动之初定义了一个定时任务，这个定时任务会轮询 oap ，查询是否需要启动或者停止arthas:

public class CommandListener implements BootService, GRPCChannelListener {
    ...
    ...
    @Override
    public void boot() throws Throwable {
        getCommandFuture = Executors.newSingleThreadScheduledExecutor(
            new DefaultNamedThreadFactory("CommandListener")
        ).scheduleWithFixedDelay(
            new RunnableWithExceptionProtection(
                this::getCommand,
                t -> LOGGER.error("get arthas command error.", t)
            ), 0, 2, TimeUnit.SECONDS
        );
    }
    ...
    ...
}

getCommand方法中定义了start、stop的处理逻辑，分别对应页面上的 connect 和 disconnect 操作。这两个 command 有分别转给 ArthasCtl 的 startArthas 和 stopArthas 两个方法处理，用来控制 arthas 的启停。

在 startArthas 方法中，启动arthas-core.jar 并使用 skywalking-agent 的 serviceName 和 instanceName 注册连接至配置文件中指定的arthas-tunnel-server。

ArthasCtl 逻辑参考自 Arthas 的 BootStrap.java ，由于不是本篇文章的重点，这里不再赘述，感兴趣的小伙伴可以自行查看。

switch (commandResponse.getCommand()) {
    case START:
        if (alreadyAttached()) {
            LOGGER.warn("arthas already attached, no need start again");
            return;
        }
        try {
            arthasTelnetPort = SocketUtils.findAvailableTcpPort();
            ArthasCtl.startArthas(PidUtils.currentLongPid(), arthasTelnetPort);
        } catch (Exception e) {
            LOGGER.info("error when start arthas", e);
        }
        break;
    case STOP:
        if (!alreadyAttached()) {
            LOGGER.warn("no arthas attached, no need to stop");
            return;
        }
        try {
            ArthasCtl.stopArthas(arthasTelnetPort);
            arthasTelnetPort = null;
        } catch (Exception e) {
            LOGGER.info("error when stop arthas", e);
        }
        break;
}

看完 arthas 的启动与停止控制逻辑，我们回到 CommandListener 的 statusChanged 方法，由于要和 oap 通信，这里我们按照惯例监听 grpc channel 的状态，只有状态正常时才会执行上面的getCommand轮询。

public class CommandListener implements BootService, GRPCChannelListener {
    ...
    ...
    @Override
    public void statusChanged(final GRPCChannelStatus status) {
        if (GRPCChannelStatus.CONNECTED.equals(status)) {
            Object channel = ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel();
            // DO NOT REMOVE Channel CAST, or it will throw `incompatible types: org.apache.skywalking.apm.dependencies.io.grpc.Channel
            // cannot be converted to io.grpc.Channel` exception when compile due to agent core's shade of grpc dependencies.
            commandServiceBlockingStub = ArthasCommandServiceGrpc.newBlockingStub((Channel) channel);
        } else {
            commandServiceBlockingStub = null;
        }
        this.status = status;
    }
    ...
    ...
}

上面的代码，细心的小伙伴可能会发现，getChannel() 的返回值被向上转型成了 Object, 而在下面的 newBlockingStub 方法中，又强制转成了 Channel。

看似有点多此一举，其实不然，我们将这里的转型去掉,尝试编译就会收到下面的错误：

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:compile (default-compile) on project arthas-control-plugin: Compilation failure
[ERROR] .../CommandListener.java:[59,103] 不兼容的类型: org.apache.skywalking.apm.dependencies.io.grpc.Channel无法转换为io.grpc.Channel

上面的错误提示 ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel() 的返回值类型是 org.apache.skywalking.apm.dependencies.io.grpc.Channel，无法被赋值给 io.grpc.Channel 引用。

我们查看GRPCChannelManager的getChannel()方法代码会发现，方法定义的返回值明明是 io.grpc.Channel，为什么编译时会报上面的错误？

其实这是skywalking-agent的一个小魔法，由于 agent-core 最终会被打包进 skywalking-agent.jar，启动时由系统类装载器（或者其他父级类装载器）直接装载，为了防止所依赖的类库和被监控服务的类发生版本冲突，agent 核心代码在打包时使用了maven-shade-plugin, 该插件会在 maven package 阶段改变 grpc 依赖的包名，我们在源代码里看到的是 io.grpc.Channel，其实在真正运行时已经被改成了 org.apache.skywalking.apm.dependencies.io.grpc.Channel，这便可解释上面编译报错的原因。

除了grpc以外，其他一些 well-known 的 dependency 也会进行 shade 操作，详情大家可以参考 apm-agent-core pom.xml ：

<plugin>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                ...
                ...
                <relocations>
                    <relocation>
                        <pattern>${shade.com.google.source}</pattern>
                        <shadedPattern>${shade.com.google.target}</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>${shade.io.grpc.source}</pattern>
                        <shadedPattern>${shade.io.grpc.target}</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>${shade.io.netty.source}</pattern>
                        <shadedPattern>${shade.io.netty.target}</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>${shade.io.opencensus.source}</pattern>
                        <shadedPattern>${shade.io.opencensus.target}</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>${shade.io.perfmark.source}</pattern>
                        <shadedPattern>${shade.io.perfmark.target}</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>${shade.org.slf4j.source}</pattern>
                        <shadedPattern>${shade.org.slf4j.target}</shadedPattern>
                    </relocation>
                </relocations>
                ...
                ...
            </configuration>
        </execution>
    </executions>
</plugin>

除了上面的注意点以外，我们来看一下另一个场景，假设我们需要在 agent plugin 的 interceptor 中使用 plugin 中定义的 BootService 会发生什么？

我们回到 BootService 的加载逻辑，为了加载到 plugin 中定义的BootService，ServiceLoader 指定了类装载器为AgentClassLoader.getDefault()，（这行代码历史非常悠久，可以追溯到2018年：Allow use SkyWalking plugin to override service in Agent core. #1111 ），由此可见，plugin 中定义的 BootService 的 classloader 是 AgentClassLoader.getDefault()：

void load(List<BootService> allServices) {
    for (final BootService bootService : ServiceLoader.load(BootService.class, AgentClassLoader.getDefault())) {
        allServices.add(bootService);
    }
}

再来看下 interceptor 的加载逻辑，InterceptorInstanceLoader.java 的 load 方法规定了如果父加载器相同，plugin 中的 interceptor 将使用一个新创建的 AgentClassLoader （在绝大部分简单场景中，plugin 的 interceptor 都由同一个 AgentClassLoader 加载）：

public static <T> T load(String className,
    ClassLoader targetClassLoader) throws IllegalAccessException, InstantiationException, ClassNotFoundException, AgentPackageNotFoundException {
    ...
    ...
    pluginLoader = EXTEND_PLUGIN_CLASSLOADERS.get(targetClassLoader);
    if (pluginLoader == null) {
        pluginLoader = new AgentClassLoader(targetClassLoader);
        EXTEND_PLUGIN_CLASSLOADERS.put(targetClassLoader, pluginLoader);
    }
    ...
    ...
}

按照类装载器的委派机制，interceptor 中如果用到了 BootService，也会由当前的类的装载器去装载。所以 ServiceManager 中装载的 BootService 和 interceptor 装载的 BootService 并不是同一个（一个 class 文件被不同的 classloader 装载了两次），如果在 interceptor 中调用 BootService 方法，同样会发生 cast 异常。由此可见，目前的实现并不支持我们在interceptor中直接调用 plugin 中 BootService 的方法，如果需要调用，只能将 BootService 放到 agent-core 中，由更高级别的类装载器优先装载。

这其实并不是 skywalking-agent 的问题，skywalking agent plugin 专注于自己的应用场景，只需要关注 trace、meter 以及默认 BootService 的覆盖就可以了。只是我们如果有扩展 skywalking-agent 的需求，要对其类装载机制做到心中有数，否则可能会出现一些意想不到的问题。

oap arthas-controller-module

看完 agent-plugin 的实现，我们再来看看 oap 部分的修改，oap 同样是模块化的设计，我们可以很轻松的增加一个新的模块，在 /oap-server/ 目录下新建 arthas-controller 子模块：

arthas-controller/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── apache
        │           └── skywalking
        │               └── oap
        │                   └── arthas
        │                       ├── ArthasControllerModule.java      # 模块定义
        │                       ├── ArthasControllerProvider.java    # 模块逻辑实现者
        │                       ├── CommandQueue.java
        │                       └── handler
        │                           ├── CommandGrpcHandler.java      # grpc handler,供 plugin 通信使用
        │                           └── CommandRestHandler.java      # http handler,供 skywalking-ui 通信使用
        ├── proto
        │   └── ArthasCommandService.proto
        └── resources
            └── META-INF
                └── services                                         # 模块及模块实现的 spi service
                    ├── org.apache.skywalking.oap.server.library.module.ModuleDefine
                    └── org.apache.skywalking.oap.server.library.module.ModuleProvider

模块的定义非常简单，只包含一个模块名，由于我们新增的模块并不需要暴露service给其他模块调用，services 我们返回一个空数组

public class ArthasControllerModule extends ModuleDefine {

    public static final String NAME = "arthas-controller";

    public ArthasControllerModule() {
        super(NAME);
    }

    @Override
    public Class<?>[] services() {
        return new Class[0];
    }
}

接着是模块实现者，实现者取名为 default，module 指定该 provider 所属模块，由于没有模块的自定义配置，newConfigCreator 我们返回null即可。 start 方法分别向 CoreModule 的 grpc 服务和 http 服务注册了两个 handler，grpc 服务和 http 服务就是我们熟知的 11800 和 12800 端口：

public class ArthasControllerProvider extends ModuleProvider {

    @Override
    public String name() {
        return "default";
    }

    @Override
    public Class<? extends ModuleDefine> module() {
        return ArthasControllerModule.class;
    }

    @Override
    public ConfigCreator<?> newConfigCreator() {
        return null;
    }

    @Override
    public void prepare() throws ServiceNotProvidedException {

    }

    @Override
    public void start() throws ServiceNotProvidedException, ModuleStartException {
        // grpc service for agent
        GRPCHandlerRegister grpcService = getManager().find(CoreModule.NAME)
                                                      .provider()
                                                      .getService(GRPCHandlerRegister.class);
        grpcService.addHandler(
            new CommandGrpcHandler()
        );

        // rest service for ui
        HTTPHandlerRegister restService = getManager().find(CoreModule.NAME)
                                                      .provider()
                                                      .getService(HTTPHandlerRegister.class);
        restService.addHandler(
            new CommandRestHandler(),
            Collections.singletonList(HttpMethod.POST)
        );
    }

    @Override
    public void notifyAfterCompleted() throws ServiceNotProvidedException {

    }

    @Override
    public String[] requiredModules() {
        return new String[0];
    }
}

最后在配置文件中注册本模块及模块实现者，下面的配置表示 arthas-controller 这个 module 由 default provider 提供实现：

arthas-controller:
  selector: default
  default:

CommandGrpcHandler 和 CommandHttpHandler 的逻辑非常简单，CommandHttpHandler 定义了 connect 和 disconnect 接口，收到请求后会放到一个 Queue 中供 CommandGrpcHandler 消费，Queue 的实现如下，这里不再赘述：

public class CommandQueue {

    private static final Map<String, Command> COMMANDS = new ConcurrentHashMap<>();

    // produce by connect、disconnect
    public static void produceCommand(String serviceName, String instanceName, Command command) {
        COMMANDS.put(serviceName + instanceName, command);
    }

    // consume by agent getCommand task
    public static Optional<Command> consumeCommand(String serviceName, String instanceName) {
        return Optional.ofNullable(COMMANDS.remove(serviceName + instanceName));
    }
}

skywalking-ui arthas console

完成了 agent 和 oap 的开发，我们再看下 ui 部分：

connect：调用oap server connect 接口，并连接 arthas-tunnel-server
disconnect：调用oap server disconnect 接口，并与 arthas-tunnel-server 断开连接
arthas 命令交互，这部分代码主要参考 arthas，大家可以查看 web-ui console 的实现

修改完skywalking-ui的代码后，我们可以直接通过 npm run dev 测试了。

如果需要通过主项目打包，别忘了在apm-webapp 的 ApplicationStartUp.java 类中添加一条 arthas 的路由：

Server
    .builder()
    .port(port, SessionProtocol.HTTP)
    .service("/arthas", oap)
    .service("/graphql", oap)
    .service("/internal/l7check", HealthCheckService.of())
    .service("/zipkin/config.json", zipkin)
    .serviceUnder("/zipkin/api", zipkin)
    .serviceUnder("/zipkin",
        FileService.of(
            ApplicationStartUp.class.getClassLoader(),
            "/zipkin-lens")
            .orElse(zipkinIndexPage))
    .serviceUnder("/",
        FileService.of(
            ApplicationStartUp.class.getClassLoader(),
            "/public")
            .orElse(indexPage))
    .build()
    .start()
    .join();

总结

BootService 启动及停止流程
如何利用 BootService 实现自定义逻辑
Agent Plugin 的类装载机制
maven-shade-plugin 的使用与注意点
如何利用 ModuleDefine 与 ModuleProvider 定义新的模块
如何向 GRPC、HTTP Service 添加新的 handler

如果你还有任何的疑问，欢迎大家与我交流。

Blog: Activating Automatical Performance Analysis -- Continuous Profiling

Sun, 25 Jun 2023 00:00:00 +0000

Background

In previous articles, We have discussed how to use SkyWalking and eBPF for performance problem detection within processes and networks. They are good methods to locate issues, but still there are some challenges:

The timing of the task initiation: It’s always challenging to address the processes that require performance monitoring when problems occur. Typically, manual engagement is required to identify processes and the types of performance analysis necessary, which cause extra time during the crash recovery. The root cause locating and the time of crash recovery conflict with each other from time to time. In the real case, rebooting would be the first choice of recovery, meanwhile, it destroys the site of crashing.
Resource consumption of tasks: The difficulties to determine the profiling scope. Wider profiling causes more resources than it should. We need a method to manage resource consumption and understand which processes necessitate performance analysis.
Engineer capabilities: On-call is usually covered by the whole team, which have junior and senior engineers, even senior engineers have their understanding limitation of the complex distributed system, it is nearly impossible to understand the whole system by a single one person.

The Continuous Profiling is a new created mechanism to resolve the above issues.

Automate Profiling

As profiling is resource costing and high experience required, how about introducing a method to narrow the scope and automate the profiling driven by polices creates by senior SRE engineer? So, in 9.5.0, SkyWalking first introduced preset policy rules for specific services to be monitored by the eBPF Agent in a low-energy manner, and run profiling when necessary automatically.

Policy

Policy rules specify how to monitor target processes and determine the type of profiling task to initiate when certain threshold conditions are met.

These policy rules primarily consist of the following configuration information:

Monitoring type: This specifies what kind of monitoring should be implemented on the target process.
Threshold determination: This defines how to determine whether the target process requires the initiation of a profiling task.
Trigger task: This specifies what kind of performance analysis task should be initiated.

Monitoring type

The type of monitoring is determined by observing the data values of a specified process to generate corresponding metrics. These metric values can then facilitate subsequent threshold judgment operations. In eBPF observation, we believe the following metrics can most directly reflect the current performance of the program:

Monitor Type	Unit	Description
System Load	Load	System load average over a specified period.
Process CPU	Percentage	The CPU usage of the process as a percentage.
Process Thread Count	Count	The number of threads in the process.
HTTP Error Rate	Percentage	The percentage of HTTP requests that result in error responses (e.g., 4xx or 5xx status codes).
HTTP Avg Response Time	Millisecond	The average response time for HTTP requests.

Monitoring network type metrics is not as simple as obtaining basic process information. It requires the initiation of eBPF programs and attaching them to the target process for observation. This is similar to the principles of network profiling task we introduced in the previous article, except that we no longer collect the full content of the data packets. Instead, we only collect the content of messages that match specified HTTP prefixes.

By using this method, we can significantly reduce the number of times the kernel sends data to the user space, and the user-space program can parse the data content with less system resource usage. This ultimately helps in conserving system resources.

Metrics collector

The eBPF agent would report metrics of processes periodically as follows to indicate the process performance in time.

Name	Unit	Description
process_cpu	(0-100)%	The CPU usage percent
process_thread_count	count	The thread count of process
system_load	count	The average system load for the last minute, each process have same value
http_error_rate	(0-100)%	The network request error rate percentage
http_avg_response_time	ms	The network average response duration

Threshold determination

For the threshold determination, the judgement is made by the eBPF Agent based on the target monitoring process in its own memory, rather than relying on calculations performed by the SkyWalking backend. The advantage of this approach is that it doesn’t have to wait for the results of complex backend computations, and it reduces potential issues brought about by complicated interactions.

By using this method, the eBPF Agent can swiftly initiate tasks immediately after conditions are met, without any delay.

It includes the following configuration items:

Threshold: Check if the monitoring value meets the specified expectations.
Period: The time period(seconds) for monitoring data, which can also be understood as the most recent duration.
Count: The number of times(seconds) the threshold is triggered within the detection period, which can also be understood as the total number of times the specified threshold rule is triggered in the most recent duration(seconds). Once the count check is met, the specified Profiling task will be started.

Trigger task

When the eBPF Agent detects that the threshold determination in the specified policy meets the rules, it can initiate the corresponding task according to pre-configured rules. For each different target performance task, their task initiation parameters are different:

On/Off CPU Profiling: It automatically performs performance analysis on processes that meet the conditions, defaulting to 10 minutes of monitoring.
Network Profiling: It performs network performance analysis on all processes in the same Service Instance on the current machine, to prevent the cause of the issue from being unrealizable due to too few process being collected, defaulting to 10 minutes of monitoring.

Once the task is initiated, no new profiling tasks would be started for the current process for a certain period. The main reason for this is to prevent frequent task creation due to low threshold settings, which could affect program execution. The default time period is 20 minutes.

Data Flow

The figure 1 illustrates the data flow of the continuous profiling feature:

Figure 1: Data Flow of Continuous Profiling

eBPF Agent with Process

Firstly, we need to ensure that the eBPF Agent and the process to be monitored are deployed on the same host machine, so that we can collect relevant data from the process. When the eBPF Agent detects a threshold validation rule that conforms to the policy, it immediately triggers the profiling task for the target process, thereby reducing any intermediate steps and accelerating the ability to pinpoint performance issues.

Sliding window

The sliding window plays a crucial role in the eBPF Agent’s threshold determination process, as illustrated in the figure 2:

Figure 2: Sliding Window in eBPF Agent

Each element in the array represents the data value for a specified second in time. When the sliding window needs to verify whether it is responsible for a rule, it fetches the content of each element from a certain number of recent elements (period parameter). If an element exceeds the threshold, it is marked in red and counted. If the number of red elements exceeds a certain number, it is deemed to trigger a task.

Using a sliding window offers the following two advantages:

Fast retrieval of recent content: With a sliding window, complex calculations are unnecessary. You can know the data by simply reading a certain number of recent array elements.
Solving data spikes issues: Validation through count prevents situations where a data point suddenly spikes and then quickly returns to normal. Verification with multiple values can reveal whether exceeding the threshold is frequent or occasional.

eBPF Agent with SkyWalking Backend

The eBPF Agent communicates periodically with the SkyWalking backend, involving three most crucial operations:

Policy synchronization: Through periodic policy synchronization, the eBPF Agent can keep processes on the local machine updated with the latest policy rules as much as possible.
Metrics sending: For processes that are already being monitored, the eBPF Agent periodically sends the collected data to the backend program. This facilitates real-time query of current data values by users, who can also compare this data with historical values or thresholds when problems arise.
Profiling task reporting: When the eBPF detects that a certain process has triggered a policy rule, it automatically initiates a performance task, collects relevant information from the current process, and reports it to the SkyWalking backend. This allows users to know when, why, and what type of profiling task was triggered from the interface.

Demo

Next, let’s quickly demonstrate the continuous profiling feature, so you can understand more specifically what it accomplishes.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.5.0
export SW_UI_IMAGE=apache/skywalking-ui:9.5.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.5.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Create Continuous Profiling Policy

Currently, continues profiling feature is set by default in the Service Mesh panel at the Service level.

Figure 3: Continuous Policy Tab

By clicking on the edit button aside from the Policy List, the polices of current service could be created or updated.

Figure 4: Edit Continuous Profiling Policy

Multiple polices are supported. Every policy has the following configurations.

Target Type: Specifies the type of profiling task to be triggered when the threshold determination is met.
Items: For profiling task of the same target, one or more validation items can be specified. As long as one validation item meets the threshold determination, the corresponding performance analysis task will be launched.
1. Monitor Type: Specifies the type of monitoring to be carried out for the target process.
2. Threshold: Depending on the type of monitoring, you need to fill in the corresponding threshold to complete the verification work.
3. Period: Specifies the number of recent seconds of data you want to monitor.
4. Count: Determines the total number of seconds triggered within the recent period.
5. URI Regex/List: This is applicable to HTTP monitoring types, allowing URL filtering.

Done

After clicking the save button, you can see the currently created monitoring rules, as shown in the figure 5:

Figure 5: Continuous Profiling Monitoring Processes

The data can be divided into the following parts:

Policy list: On the left, you can see the rule list you have created.
Monitoring Summary List: Once a rule is selected, you can see which pods and processes would be monitored by this rule. It also summarizes how many profiling tasks have been triggered in the last 48 hours by the current pod or process, as well as the last trigger time. This list is also sorted in descending order by the number of triggers to facilitate your quick review.

When you click on a specific process, a new dashboard would show to list metrics and triggered profiling results.

Figure 6: Continuous Profiling Triggered Tasks

The current figure contains the following data contents:

Task Timeline: It lists all profiling tasks in the past 48 hours. And when the mouse hovers over a task, it would also display detailed information:
1. Task start and end time: It indicates when the current performance analysis task was triggered.
2. Trigger reason: It would display the reason why the current process was profiled and list out the value of the metric exceeding the threshold when the profiling was triggered. so you can quickly understand the reason.
Task Detail: Similar to the CPU Profiling and Network Profiling introduced in previous articles, this would display the flame graph or process topology map of the current task, depending on the profiling type.

Meanwhile, on the Metrics tab, metrics relative to profiling policies are collected to retrieve the historical trend, in order to provide a comprehensive explanation of the trigger point about the profiling.

Figure 7: Continuous Profiling Metrics

Conclusion

In this article, I have detailed how the continuous profiling feature in SkyWalking and eBPF works. In general, it involves deploying the eBPF Agent service on the same machine where the process to be monitored resides, and monitoring the target process with low resource consumption. When it meets the threshold conditions, it would initiate more complex CPU Profiling and Network Profiling tasks.

In the future, we will offer even more features. Stay tuned!

Twitter, ASFSkyWalking
Slack. Send Request to join SkyWalking slack mail to the mail list(dev@skywalking.apache.org), we will invite you in.
Subscribe to our medium list.

Zh: 自动化性能分析——持续剖析

Sun, 25 Jun 2023 00:00:00 +0000

背景

在之前的文章中，我们讨论了如何使用 SkyWalking 和 eBPF 来检测性能问题，包括进程和网络。这些方法可以很好地定位问题，但仍然存在一些挑战：

任务启动的时间: 当需要进行性能监控时，解决需要性能监控的进程始终是一个挑战。通常需要手动参与，以标识进程和所需的性能分析类型，这会在崩溃恢复期间耗费额外的时间。根本原因定位和崩溃恢复时间有时会发生冲突。在实际情况中，重新启动可能是恢复的第一选择，同时也会破坏崩溃的现场。
任务的资源消耗: 确定分析范围的困难。过宽的分析范围会导致需要更多的资源。我们需要一种方法来管理资源消耗并了解哪些进程需要性能分析。
工程师能力: 通常由整个团队负责呼叫，其中有初级和高级工程师，即使是高级工程师也对复杂的分布式系统有其理解限制，单个人几乎无法理解整个系统。

持续剖析（Continuous Profiling） 是解决上述问题的新机制。

自动剖析

由于性能分析的资源消耗和高经验要求，因此引入一种方法以缩小范围并由高级 SRE 工程师创建策略自动剖析。因此，在 9.5.0 中，SkyWalking 首先引入了预设策略规则，以低功耗方式监视特定服务的 eBPF 代理，并在必要时自动运行剖析。

策略

策略规则指定了如何监视目标进程并确定在满足某些阈值条件时应启动何种类型的分析任务。

这些策略规则主要包括以下配置信息：

监测类型: 这指定了应在目标进程上实施什么样的监测。
阈值确定: 这定义了如何确定目标进程是否需要启动分析任务。
触发任务: 这指定了应启动什么类型的性能分析任务。

监测类型

监测类型是通过观察指定进程的数据值来生成相应的指标来确定的。这些指标值可以促进后续的阈值判断操作。在 eBPF 观测中，我们认为以下指标最能直接反映程序的当前性能：

监测类型	单位	描述
系统负载	负载	在指定时间段内的系统负载平均值。
进程 CPU	百分比	进程的 CPU 使用率百分比。
进程线程计数	计数	进程中的线程数。
HTTP 错误率	百分比	导致错误响应（例如，4xx 或 5xx 状态代码）的 HTTP 请求的百分比。
HTTP 平均响应时间	毫秒	HTTP 请求的平均响应时间。

指标收集器

eBPF 代理会定期报告以下进程度量，以指示进程性能：

名称	单位	描述
process_cpu	(0-100)%	CPU 使用率百分比
process_thread_count	计数	进程中的线程数
system_load	计数	最近一分钟的平均系统负载，每个进程的值相同
http_error_rate	(0-100)%	网络请求错误率百分比
http_avg_response_time	毫秒	网络平均响应持续时间

阈值确定

对于阈值的确定，eBPF 代理是基于其自身内存中的目标监测进程进行判断，而不是依赖于 SkyWalking 后端执行的计算。这种方法的优点在于，它不必等待复杂后端计算的结果，减少了复杂交互所带来的潜在问题。

通过使用此方法，eBPF 代理可以在条件满足后立即启动任务，而无需任何延迟。

它包括以下配置项：

阈值: 检查监测值是否符合指定的期望值。
周期: 监控数据的时间周期（秒），也可以理解为最近的持续时间。
计数: 检测期间触发阈值的次数（秒），也可以理解为最近持续时间内指定阈值规则触发的总次数（秒）。一旦满足计数检查，指定的分析任务将被开始。

触发任务

当 eBPF Agent 检测到指定策略中的阈值决策符合规则时，根据预配置的规则可以启动相应的任务。对于每个不同的目标性能任务，它们的任务启动参数都不同：

On/Off CPU Profiling: 它会自动对符合条件的进程进行性能分析，缺省情况下监控时间为 10 分钟。
Network Profiling: 它会对当前机器上同一 Service Instance 中的所有进程进行网络性能分析，以防问题的原因因被收集进程太少而无法实现，缺省情况下监控时间为 10 分钟。

一旦任务启动，当前进程将在一定时间内不会启动新的剖析任务。主要原因是为了防止因低阈值设置而频繁创建任务，从而影响程序执行。缺省时间为 20 分钟。

数据流

图 1 展示了持续剖析功能的数据流：

图 1: 持续剖析的数据流

eBPF Agent进行进程跟踪

首先，我们需要确保 eBPF Agent 和要监测的进程部署在同一台主机上，以便我们可以从进程中收集相关数据。当 eBPF Agent 检测到符合策略的阈值验证规则时，它会立即为目标进程触发剖析任务，从而减少任何中间步骤并加速定位性能问题的能力。

滑动窗口

滑动窗口在 eBPF Agent 的阈值决策过程中发挥着至关重要的作用，如图 2 所示：

图 2: eBPF Agent 中的滑动窗口

数组中的每个元素表示指定时间内的数据值。当滑动窗口需要验证是否负责某个规则时，它从最近的一定数量的元素 (period 参数) 中获取每个元素的内容。如果一个元素超过了阈值，则标记为红色并计数。如果红色元素的数量超过一定数量，则被认为触发了任务。

使用滑动窗口具有以下两个优点：

快速检索最近的内容：使用滑动窗口，无需进行复杂的计算。你可以通过简单地读取一定数量的最近数组元素来了解数据。
解决数据峰值问题：通过计数进行验证，可以避免数据点突然增加然后快速返回正常的情况。使用多个值进行验证可以揭示超过阈值是频繁还是偶然发生的。

eBPF Agent与OAP后端通讯

eBPF Agent 定期与 SkyWalking 后端通信，涉及三个最关键的操作：

策略同步：通过定期的策略同步，eBPF Agent 可以尽可能地让本地机器上的进程与最新的策略规则保持同步。
指标发送：对于已经被监视的进程，eBPF Agent 定期将收集到的数据发送到后端程序。这就使用户能够实时查询当前数据值，用户也可以在出现问题时将此数据与历史值或阈值进行比较。
剖析任务报告：当 eBPF 检测到某个进程触发了策略规则时，它会自动启动性能任务，从当前进程收集相关信息，并将其报告给 SkyWalking 后端。这使用户可以从界面了解何时、为什么和触发了什么类型的剖析任务。

演示

接下来，让我们快速演示持续剖析功能，以便你更具体地了解它的功能。

部署 SkyWalking Showcase

SkyWalking Showcase 包含完整的示例服务，并可以使用 SkyWalking 进行监视。有关详细信息，请查看官方文档。

在此演示中，我们只部署服务、最新发布的 SkyWalking OAP 和 UI。

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.5.0
export SW_UI_IMAGE=apache/skywalking-ui:9.5.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.5.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

部署完成后，请运行以下脚本以打开 SkyWalking UI：http://localhost:8080/。

kubectl port-forward svc/ui 8080:8080 --namespace default

创建持续剖析策略

目前，持续剖析功能在 Service Mesh 面板的 Service 级别中默认设置。

图 3: 持续策略选项卡

通过点击 Policy List 旁边的编辑按钮，可以创建或更新当前服务的策略。

图 4: 编辑持续剖析策略

支持多个策略。每个策略都有以下配置。

Target Type：指定符合阈值决策时要触发的剖析任务的类型。
Items：对于相同目标的剖析任务，可以指定一个或多个验证项目。只要一个验证项目符合阈值决策，就会启动相应的性能分析任务。
1. Monitor Type：指定要为目标进程执行的监视类型。
2. Threshold：根据监视类型的不同，需要填写相应的阈值才能完成验证工作。
3. Period：指定你要监测的最近几秒钟的数据数量。
4. Count：确定最近时间段内触发的总秒数。
5. URI 正则表达式/列表：这适用于 HTTP 监控类型，允许 URL 过滤。

完成

单击保存按钮后，你可以看到当前已创建的监控规则，如图 5 所示：

图 5: 持续剖析监控进程

数据可以分为以下几个部分：

策略列表：在左侧，你可以看到已创建的规则列表。
监测摘要列表：选择规则后，你可以看到哪些 pod 和进程将受到该规则的监视。它还总结了当前 pod 或进程在过去 48 小时内触发的性能分析任务数量，以及最后一个触发时间。该列表还按触发次数降序排列，以便你快速查看。

当你单击特定进程时，将显示一个新的仪表板以列出指标和触发的剖析结果。

图 6: 持续剖析触发的任务

当前图包含以下数据内容：

任务时间轴：它列出了过去 48 小时的所有剖析任务。当鼠标悬停在任务上时，它还会显示详细信息：
1. 任务的开始和结束时间：它指示当前性能分析任务何时被触发。
2. 触发原因：它会显示为什么会对当前进程进行剖析，并列出当剖析被触发时超过阈值的度量值，以便你快速了解原因。
任务详情：与前几篇文章介绍的 CPU 剖析和网络剖析类似，它会显示当前任务的火焰图或进程拓扑图，具体取决于剖析类型。

同时，在 Metrics 选项卡中，收集与剖析策略相关的指标以检索历史趋势，以便在剖析的触发点提供全面的解释。

图 7: 持续剖析指标

结论

在本文中，我详细介绍了 SkyWalking 和 eBPF 中持续剖析功能的工作原理。通常情况下，它涉及将 eBPF Agent 服务部署在要监视的进程所在的同一台计算机上，并以低资源消耗监测目标进程。当它符合阈值条件时，它会启动更复杂的 CPU 剖析和网络剖析任务。

在未来，我们将提供更多功能。敬请期待！

Twitter：ASFSkyWalking
Slack：向邮件列表 (dev@skywalking.apache.org) 发送“Request to join SkyWalking Slack”，我们会邀请你加入。
订阅我们的 Medium 列表。

Blog: eBPF enhanced HTTP observability - L7 metrics and tracing

Thu, 12 Jan 2023 00:00:00 +0000

Background

Apache SkyWalking is an open-source Application Performance Management system that helps users collect and aggregate logs, traces, metrics, and events for display on a UI. In the previous article, we introduced how to use Apache SkyWalking Rover to analyze the network performance issue in the service mesh environment. However, in business scenarios, users often rely on mature layer 7 protocols, such as HTTP, for interactions between systems. In this article, we will discuss how to use eBPF techniques to analyze performance bottlenecks of layer 7 protocols and how to enhance the tracing system using network sampling.

This article will show how to use Apache SkyWalking with eBPF to enhance metrics and traces in HTTP observability.

HTTP Protocol Analysis

HTTP is one of the most common Layer 7 protocols and is usually used to provide services to external parties and for inter-system communication. In the following sections, we will show how to identify and analyze HTTP/1.x protocols.

Protocol Identification

In HTTP/1.x, the client and server communicate through a single file descriptor (FD) on each side. Figure 1 shows the process of communication involving the following steps:

Connect/accept: The client establishes a connection with the HTTP server, or the server accepts a connection from the client.
Read/write (multiple times): The client or server reads and writes HTTPS requests and responses. A single request-response pair occurs within the same connection on each side.
Close: The client and server close the connection.

To obtain HTTP content, it’s necessary to read it from the second step of this process. As defined in the RFC, the content is contained within the data of the Layer 4 protocol and can be obtained by parsing the data. The request and response pair can be correlated because they both occur within the same connection on each side.

Figure 1: HTTP communication timeline.

HTTP Pipeline

HTTP pipelining is a feature of HTTP/1.1 that enables multiple HTTP requests to be sent over a single TCP connection without waiting for the corresponding responses. This feature is important because it ensures that the order of the responses on the server side matches the order of the requests.

Figure 2 illustrates how this works. Consider the following scenario: an HTTP client sends multiple requests to a server, and the server responds by sending the HTTP responses in the same order as the requests. This means that the first request sent by the client will receive the first response from the server, the second request will receive the second response, and so on.

When designing HTTP parsing, we should follow this principle by adding request data to a list and removing the first item when parsing a response. This ensures that the responses are processed in the correct order.

Figure 2: HTTP/1.1 pipeline.

Metrics

Based on the identification of the HTTP content and process topology diagram mentioned in the previous article, we can combine these two to generate process-to-process metrics data.

Figure 3 shows the metrics that currently support the analysis between the two processes. Based on the HTTP request and response data, we can analyze the following data:

Metrics Name	Type	Unit	Description
Request CPM(Call Per Minute)	Counter	count	The HTTP request count
Response Status CPM(Call Per Minute)	Counter	count	The count of per HTTP response status code
Request Package Size	Counter/Histogram	Byte	The request package size
Response Package Size	Counter/Histogram	Byte	The response package size
Client Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the client side
Server Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the server side

Figure 3: Process-to-process metrics.

HTTP and Trace

During the HTTP process, if we unpack the HTTP requests and responses from raw data, we can use this data to correlate with the existing tracing system.

Trace Context Identification

In order to track the flow of requests between multiple services, the trace system usually creates a trace context when a request enters a service and passes it along to other services during the request-response process. For example, when an HTTP request is sent to another server, the trace context is included in the request header.

Figure 4 displays the raw content of an HTTP request intercepted by Wireshark. The trace context information generated by the Zipkin Tracing system can be identified by the “X-B3” prefix in the header. By using eBPF to intercept the trace context in the HTTP header, we can connect the current request with the trace system.

Figure 4: View of HTTP headers in Wireshark.

Trace Event

We have added the concept of an event to traces. An event can be attached to a span and consists of start and end times, tags, and summaries, allowing us to attach any desired information to the Trace.

When performing eBPF network profiling, two events can be generated based on the request-response data. Figure 5 illustrates what happens when a service performs an HTTP request with profiling. The trace system generates trace context information and sends it in the request. When the service executes in the kernel, we can generate an event for the corresponding trace span by interacting with the request-response data and execution time in the kernel space.

Previously, we could only observe the execution status in the user space. However, by combining traces and eBPF technologies, we can now also get more information about the current trace in the kernel space, which would impact less performance for the target service if we do similar things in the tracing SDK and agent.

Figure 5: Logical view of profiling an HTTP request and response.

Sampling

To ensure efficient data storage and minimize unnecessary data sampling, we use a sampling mechanism for traces in our system. This mechanism triggers sampling only when certain conditions are met. We also provide a list of the top N traces, which allows users to quickly access the relevant request information for a specific trace.

To help users easily identify and analyze relevant events, we offer three different sampling rules:

Slow Traces: Sampling is triggered when the response time for a request exceeds a specified threshold.
Response Status [400, 500): Sampling is triggered when the response status code is greater than or equal to 400 and less than 500.
Response Status [500, 600): Sampling is triggered when the response status code is greater than or equal to 500 and less than 600.

In addition, we recognize that not all request or response raw data may be necessary for analysis. For example, users may be more interested in requesting data when trying to identify performance issues, while they may be more interested in response data when troubleshooting errors. As such, we also provide configuration options for request or response events to allow users to specify which type of data they would like to sample.

Profiling in a Service Mesh

The SkyWalking and SkyWalking Rover projects have already implemented the HTTP protocol analyze and trace associations. How do they perform when running in a service mesh environment?

Deployment

Figure 6 demonstrates the deployment of SkyWalking and SkyWalking Rover in a service mesh environment. SkyWalking Rover is deployed as a DaemonSet on each machine where a service is located and communicates with the SkyWalking backend cluster. It automatically recognizes the services on the machine and reports metadata information to the SkyWalking backend cluster. When a new network profiling task arises, SkyWalking Rover senses the task and analyzes the designated processes, collecting and aggregating network data before ultimately reporting it back to the SkyWalking backend service.

Figure 6: SkyWalking rover deployment topology in a service mesh.

Tracing Systems

Starting from version 9.3.0, the SkyWalking backend fully supports all functions in the Zipkin server. Therefore, the SkyWalking backend can collect traces from both the SkyWalking and Zipkin protocols. Similarly, SkyWalking Rover can identify and analyze trace context in both the SkyWalking and Zipkin trace systems. In the following two sections, network analysis results will be displayed in the SkyWalking and Zipkin UI respectively.

SkyWalking

When SkyWalking performs network profiling, similar to the TCP metrics in the previous article, the SkyWalking UI will first display the topology between processes. When you open the dashboard of the line representing the traffic metrics between processes, you can see the metrics of HTTP traffic from the “HTTP/1.x” tab and the sampled HTTP requests with tracing in the “HTTP Requests” tab.

As shown in Figure 7, there are three lists in the tab, each corresponding to a condition in the event sampling rules. Each list displays the traces that meet the pre-specified conditions. When you click on an item in the trace list, you can view the complete trace.

Figure 7: Sampled HTTP requests within tracing context.

When you click on an item in the trace list, you can quickly view the specified trace. In Figure 8, we can see that in the current service-related span, there is a tag with a number indicating how many HTTP events are related to that trace span.

Since we are in a service mesh environment, each service involves interacting with Envoy. Therefore, the current span includes Envoy’s request and response information. Additionally, since the current service has both incoming and outgoing requests, there are events in the corresponding span.

Figure 8: Events in the trace detail.

When the span is clicked, the details of the span will be displayed. If there are events in the current span, the relevant event information will be displayed on a time axis. As shown in Figure 9, there are a total of 6 related events in the current Span. Each event represents a data sample of an HTTP request/response. One of the events spans multiple time ranges, indicating a longer system call time. It may be due to a blocked system call, depending on the implementation details of the HTTP request in different languages. This can also help us query the possible causes of errors.

Figure 9: Events in one trace span.

Finally, we can click on a specific event to see its complete information. As shown in Figure 10, it displays the sampling information of a request, including the SkyWalking trace context protocol contained in the request header from the HTTP raw data. The raw request data allows you to quickly re-request the request to solve any issues.

Figure 10: The detail of the event.

Zipkin

Zipkin is one of the most widely used distributed tracing systems in the world. SkyWalking can function as an alternative server to provide advanced features for Zipkin users. Here, we use this way to bring the feature into the Zipkin ecosystem out-of-box. The new events would also be treated as a kind of Zipkin’s tags and annotations.

To add events to a Zipkin span, we need to do the following:

Split the start and end times of each event into two annotations with a canonical name.
Add the sampled HTTP raw data from the event to the Zipkin span tags, using the same event name for corresponding purposes.

Figures 11 and 12 show annotations and tags in the same span. In these figures, we can see that the span includes at least two events with the same event name and sequence suffix (e.g., “Start/Finished HTTP Request/Response Sampling-x” in the figure). Both events have separate timestamps to represent their relative times within the span. In the tags, the data content of the corresponding event is represented by the event name and sequence number, respectively.

Figure 11: Event timestamp in the Zipkin span annotation.

Figure 12: Event raw data in the Zipkin span tag.

Demo

In this section, we demonstrate how to perform network profiling in a service mesh and complete metrics collection and HTTP raw data sampling. To follow along, you will need a running Kubernetes environment.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Start Network Profiling Task

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In figure 13, we have selected an instance with a list of tasks in the network profiling tab.

Figure 13: Network Profiling tab in the Data Plane.

When we click the Start button, as shown in Figure 14, we need to specify the sampling rules for the profiling task. The sampling rules consist of one or more rules, each of which is distinguished by a different URI regular expression. When the HTTP request URI matches the regular expression, the rule is used. If the URI regular expression is empty, the default rule is used. Using multiple rules can help us make different sampling configurations for different requests.

Each rule has three parameters to determine if sampling is needed:

Minimal Request Duration (ms): requests with a response time exceeding the specified time will be sampled.
Sampling response status code between 400 and 499: all status codes in the range [400-499) will be sampled.
Sampling response status code between 500 and 599: all status codes in the range [500-599) will be sampled.

Once the sampling configuration is complete, we can create the task.

Figure 14: Create network profiling task page.

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

When you click on the line between processes, you can view the data between the two processes, which is divided into three tabs:

TCP: displays TCP-related metrics.
HTTP/1.x: displays metrics in the HTTP 1 protocol.
HTTP Requests: displays the analyzed request and saves it to a list according to the sampling rule.

Figure 16: TCP metrics in a network profiling task.

Figure 17: HTTP/1.x metrics in a network profiling task.

Figure 18: HTTP sampled requests in a network profiling task.

Conclusion

In this article, we detailed the overview of how to analyze the Layer 7 HTTP/1.x protocol in network analysis, and how to associate it with existing trace systems. This allows us to extend the scope of data we can observe from just user space to also include kernel-space data.

In the future, we will delve further into the analysis of kernel data, such as collecting information on TCP packet size, transmission frequency, network card, and help on enhancing distributed tracing from another perspective.

Additional Resources

Zh: 使用 eBPF 提升 HTTP 可观测性 - L7 指标和追踪

Thu, 12 Jan 2023 00:00:00 +0000

背景

Apache SkyWalking 是一个开源应用性能管理系统，帮助用户收集和聚合日志、追踪、指标和事件，并在 UI 上显示。在上一篇文章中，我们介绍了如何使用 Apache SkyWalking Rover 分析服务网格环境中的网络性能问题。但是，在商业场景中，用户通常依靠成熟的第 7 层协议（如 HTTP）来进行系统之间的交互。在本文中，我们将讨论如何使用 eBPF 技术来分析第 7 层协议的性能瓶颈，以及如何使用网络采样来增强追踪系统。

本文将演示如何使用 Apache SkyWalking 与 eBPF 来增强 HTTP 可观察性中的指标和追踪。

HTTP 协议分析

HTTP 是最常用的 7 层协议之一，通常用于为外部方提供服务和进行系统间通信。在下面的章节中，我们将展示如何识别和分析 HTTP/1.x 协议。

协议识别

在 HTTP/1.x 中，客户端和服务器通过两端的单个文件描述符（File Descriptor）进行通信。图 1 显示了涉及以下步骤的通信过程：

Connect/Accept：客户端与 HTTP 服务器建立连接，或者服务器接受客户端的连接。
Read/Write（多次）：客户端或服务器读取和写入 HTTPS 请求和响应。单个请求 - 响应对在每边的同一连接内发生。
Close：客户端和服务器关闭连接。

为了获取 HTTP 内容，必须从此过程的第二步读取它。根据 RFC 定义，内容包含在 4 层协议的数据中，可以通过解析数据来获取。请求和响应对可以相关联，因为它们都在两端的同一连接内发生。

图 1：HTTP 通信时间线。

HTTP 管线化

HTTP 管线化（Pipelining）是 HTTP/1.1 的一个特性，允许在等待对应的响应的情况下在单个 TCP 连接上发送多个 HTTP 请求。这个特性很重要，因为它确保了服务器端的响应顺序必须与请求的顺序匹配。

图 2 说明了这是如何工作的，考虑以下情况：HTTP 客户端向服务器发送多个请求，服务器通过按照请求的顺序发送 HTTP 响应来响应。这意味着客户端发送的第一个请求将收到服务器的第一个响应，第二个请求将收到第二个响应，以此类推。

在设计 HTTP 解析时，我们应该遵循这个原则，将请求数据添加到列表中，并在解析响应时删除第一个项目。这可以确保响应按正确的顺序处理。

图 2： HTTP/1.1 管道。

指标

根据前文提到的 HTTP 内容和流程拓扑图的识别，我们可以将这两者结合起来生成进程间的指标数据。

图 3 显示了目前支持两个进程间分析的指标。基于 HTTP 请求和响应数据，可以分析以下数据：

指标名称	类型	单位	描述
请求 CPM（Call Per Minute）	计数器	计数	HTTP 请求计数
响应状态 CPM (Call Per Minute)	计数器	计数	每个 HTTP 响应状态码的计数
请求包大小	计数器 / 直方图	字节	请求包大小
响应包大小	计数器 / 直方图	字节	响应包大小
客户端持续时间	计数器 / 直方图	毫秒	客户端单个 HTTP 响应的持续时间
服务器持续时间	计数器 / 直方图	毫秒	服务器端单个 HTTP 响应的持续时间

图 3：进程到进程指标。

HTTP 和追踪

在 HTTP 过程中，如果我们能够从原始数据中解包 HTTP 请求和响应，就可以使用这些数据与现有的追踪系统进行关联。

追踪上下文标识

为了追踪多个服务之间的请求流，追踪系统通常在请求进入服务时创建追踪上下文，并在请求 - 响应过程中将其传递给其他服务。例如，当 HTTP 请求发送到另一个服务器时，追踪上下文包含在请求头中。

图 4 显示了 Wireshark 拦截的 HTTP 请求的原始内容。由 Zipkin Tracing 系统生成的追踪上下文信息可以通过头中的 “X-B3” 前缀进行标识。通过使用 eBPF 拦截 HTTP 头中的追踪上下文，可以将当前请求与追踪系统连接起来。

图 4：Wireshark 中的 HTTP Header 视图。

Trace 事件

我们已经将事件这个概念加入了追踪中。事件可以附加到跨度上，并包含起始和结束时间、标签和摘要，允许我们将任何所需的信息附加到追踪中。

在执行 eBPF 网络分析时，可以根据请求 - 响应数据生成两个事件。图 5 说明了在带分析的情况下执行 HTTP 请求时发生的情况。追踪系统生成追踪上下文信息并将其发送到请求中。当服务在内核中执行时，我们可以通过与内核空间中的请求 - 响应数据和执行时间交互，为相应的追踪跨度生成事件。

以前，我们只能观察用户空间的执行状态。现在，通过结合追踪和 eBPF 技术，我们还可以在内核空间获取更多关于当前追踪的信息，如果我们在追踪 SDK 和代理中执行类似的操作，将对目标服务的性能产生较小的影响。

图 5：分析 HTTP 请求和响应的逻辑视图。

抽样

该机制仅在满足特定条件时触发抽样。我们还提供了前 N 条追踪的列表，允许用户快速访问特定追踪的相关请求信息。为了帮助用户轻松识别和分析相关事件，我们提供了三种不同的抽样规则：

慢速追踪：当请求的响应时间超过指定阈值时触发抽样。
响应状态 [400,500)：当响应状态代码大于或等于 400 且小于 500 时触发抽样。
响应状态 [500,600)：当响应状态代码大于或等于 500 且小于 600 时触发抽样。

此外，我们认识到分析时可能并不需要所有请求或响应的原始数据。例如，当试图识别性能问题时，用户可能更感兴趣于请求数据，而在解决错误时，他们可能更感兴趣于响应数据。因此，我们还提供了请求或响应事件的配置选项，允许用户指定要抽样的数据类型。

服务网格中的分析

SkyWalking Rover 项目已经实现了 HTTP 协议的分析和追踪关联。当在服务网格环境中运行时它们的表现如何？

部署

图 6 演示了 SkyWalking 和 SkyWalking Rover 在服务网格环境中的部署方式。SkyWalking Rover 作为一个 DaemonSet 部署在每台服务所在的机器上，并与 SkyWalking 后端集群通信。它会自动识别机器上的服务并向 SkyWalking 后端集群报告元数据信息。当出现新的网络分析任务时，SkyWalking Rover 会感知该任务并对指定的进程进行分析，在最终将数据报告回 SkyWalking 后端服务之前，收集和聚合网络数据。

图 6：服务网格中的 SkyWalking rover 部署拓扑。

追踪系统

从版本 9.3.0 开始，SkyWalking 后端完全支持 Zipkin 服务器中的所有功能。因此，SkyWalking 后端可以收集来自 SkyWalking 和 Zipkin 协议的追踪。同样，SkyWalking Rover 可以在 SkyWalking 和 Zipkin 追踪系统中识别和分析追踪上下文。在接下来的两节中，网络分析结果将分别在 SkyWalking 和 Zipkin UI 中显示。

SkyWalking

当 SkyWalking 执行网络分析时，与前文中的 TCP 指标类似，SkyWalking UI 会首先显示进程间的拓扑图。当打开代表进程间流量指标的线的仪表板时，您可以在 “HTTP/1.x” 选项卡中看到 HTTP 流量的指标，并在 “HTTP Requests” 选项卡中看到带追踪的抽样的 HTTP 请求。

如图 7 所示，选项卡中有三个列表，每个列表对应事件抽样规则中的一个条件。每个列表显示符合预先规定条件的追踪。当您单击追踪列表中的一个项目时，就可以查看完整的追踪。

图 7：Tracing 上下文中的采样 HTTP 请求。

当您单击追踪列表中的一个项目时，就可以快速查看指定的追踪。在图 8 中，我们可以看到在当前的服务相关的跨度中，有一个带有数字的标签，表示与该追踪跨度相关的 HTTP 事件数。

由于我们在服务网格环境中，每个服务都涉及与 Envoy 交互。因此，当前的跨度包括 Envoy 的请求和响应信息。此外，由于当前的服务有传入和传出的请求，因此相应的跨度中有事件。

图 8：Tracing 详细信息中的事件。

当单击跨度时，将显示跨度的详细信息。如果当前跨度中有事件，则相关事件信息将在时间轴上显示。如图 9 所示，当前跨度中一共有 6 个相关事件。每个事件代表一个 HTTP 请求 / 响应的数据样本。其中一个事件跨越多个时间范围，表示较长的系统调用时间。这可能是由于系统调用被阻塞，具体取决于不同语言中的 HTTP 请求的实现细节。这也可以帮助我们查询错误的可能原因。

图 9：一个 Tracing 范围内的事件。

最后，我们可以单击特定的事件查看它的完整信息。如图 10 所示，它显示了一个请求的抽样信息，包括从 HTTP 原始数据中的请求头中包含的 SkyWalking 追踪上下文协议。原始请求数据允许您快速重新请求以解决任何问题。

图 10：事件的详细信息。

Zipkin

Zipkin 是世界上广泛使用的分布式追踪系统。SkyWalking 可以作为替代服务器，提供高级功能。在这里，我们使用这种方式将功能无缝集成到 Zipkin 生态系统中。新事件也将被视为 Zipkin 的标签和注释的一种。

为 Zipkin 跨度添加事件，需要执行以下操作：

将每个事件的开始时间和结束时间分别拆分为两个具有规范名称的注释。
将抽样的 HTTP 原始数据从事件添加到 Zipkin 跨度标签中，使用相同的事件名称用于相应的目的。

图 11 和图 12 显示了同一跨度中的注释和标签。在这些图中，我们可以看到跨度包含至少两个具有相同事件名称和序列后缀的事件（例如，图中的 “Start/Finished HTTP Request/Response Sampling-x”）。这两个事件均具有单独的时间戳，用于表示其在跨度内的相对时间。在标签中，对应事件的数据内容分别由事件名称和序列号表示。

图 11：Zipkin span 注释中的事件时间戳。

图 12：Zipkin span 标签中的事件原始数据。

演示

在本节中，我们将演示如何在服务网格中执行网络分析，并完成指标收集和 HTTP 原始数据抽样。要进行操作，您需要一个运行中的 Kubernetes 环境。

部署 SkyWalking Showcase

SkyWalking Showcase 包含一套完整的示例服务，可以使用 SkyWalking 进行监控。有关详细信息，请参阅官方文档。

在本演示中，我们只部署了服务、最新发布的 SkyWalking OAP 和 UI。

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

部署完成后，运行下面的脚本启动 SkyWalking UI：http://localhost:8080/。

kubectl port-forward svc/ui 8080:8080 --namespace default

启动网络分析任务

目前，我们可以通过单击服务网格面板中的 Data Plane 项和 Kubernetes 面板中的 Service 项来选择要监视的特定实例。

在图 13 中，我们已在网络分析选项卡中选择了一个具有任务列表的实例。

图 13：数据平面中的网络分析选项卡。

当我们单击 “开始” 按钮时，如图 14 所示，我们需要为分析任务指定抽样规则。抽样规则由一个或多个规则组成，每个规则都由不同的 URI 正则表达式区分。当 HTTP 请求的 URI 与正则表达式匹配时，将使用该规则。如果 URI 正则表达式为空，则使用默认规则。使用多个规则可以帮助我们为不同的请求配置不同的抽样配置。

每个规则都有三个参数来确定是否需要抽样：

最小请求持续时间（毫秒）：响应时间超过指定时间的请求将被抽样。
在 400 和 499 之间的抽样响应状态代码：范围 [400-499) 中的所有状态代码将被抽样。
在 500 和 599 之间的抽样响应状态代码：范围 [500-599) 中的所有状态码将被抽样。

抽样配置完成后，我们就可以创建任务了。

图 14：创建网络分析任务页面。

完成

几秒钟后，你会看到页面的右侧出现进程拓扑结构。

图 15：网络分析任务中的流程拓扑。

当您单击进程之间的线时，您可以查看两个过程之间的数据，它被分为三个选项卡：

TCP：显示与 TCP 相关的指标。
HTTP/1.x：显示 HTTP 1 协议中的指标。
HTTP 请求：显示已分析的请求，并根据抽样规则保存到列表中。

图 16：网络分析任务中的 TCP 指标。

图 17：网络分析任务中的 HTTP/1.x 指标。

图 18：网络分析任务中的 HTTP 采样请求。

总结

在本文中，我们详细介绍了如何在网络分析中分析 7 层 HTTP/1.x 协议，以及如何将其与现有追踪系统相关联。这使我们能够将我们能够观察到的数据从用户空间扩展到内核空间数据。

在未来，我们将进一步探究内核数据的分析，例如收集 TCP 包大小、传输频率、网卡等信息，并从另一个角度提升分布式追踪。

其他资源

Blog: Diagnose Service Mesh Network Performance with eBPF

Tue, 27 Sep 2022 00:00:00 +0000

Background

This article will show how to use Apache SkyWalking with eBPF to make network troubleshooting easier in a service mesh environment.

Apache SkyWalking is an application performance monitor tool for distributed systems. It observes metrics, logs, traces, and events in the service mesh environment and uses that data to generate a dependency graph of your pods and services. This dependency graph can provide quick insights into your system, especially when there’s an issue.

However, when troubleshooting network issues in SkyWalking’s service topology, it is not always easy to pinpoint where the error actually is. There are two reasons for the difficulty:

Traffic through the Envoy sidecar is not easy to observe. Data from Envoy’s Access Log Service (ALS) shows traffic between services (sidecar-to-sidecar), but not metrics on communication between the Envoy sidecar and the service it proxies. Without that information, it is more difficult to understand the impact of the sidecar.
There is a lack of data from transport layer (OSI Layer 4) communication. Since services generally use application layer (OSI Layer 7) protocols such as HTTP, observability data is generally restricted to application layer communication. However, the root cause may actually be in the transport layer, which is typically opaque to observability tools.

Access to metrics from Envoy-to-service and transport layer communication can make it easier to diagnose service issues. To this end, SkyWalking needs to collect and analyze transport layer metrics between processes inside Kubernetes pods - a task well suited to eBPF. We investigated using eBPF for this purpose and present our results and a demo below.

Monitoring Kubernetes Networks with eBPF

With its origins as the Extended Berkeley Packet Filter, eBPF is a general purpose mechanism for injecting and running your own code into the Linux kernel and is an excellent tool for monitoring network traffic in Kubernetes Pods. In the next few sections, we'll provide an overview of how to use eBPF for network monitoring as background for introducing Skywalking Rover, a metrics collector and profiler powered by eBPF to diagnose CPU and network performance.

How Applications and the Network Interact

Interactions between the application and the network can generally be divided into the following steps from higher to lower levels of abstraction:

User Code: Application code uses high-level network libraries in the application stack to exchange data across the network, like sending and receiving HTTP requests.
Network Library: When the network library receives a network request, it interacts with the language API to send the network data.
Language API: Each language provides an API for operating the network, system, etc. When a request is received, it interacts with the system API. In Linux, this API is called syscalls.
Linux API: When the Linux kernel receives the request through the API, it communicates with the socket to send the data, which is usually closer to an OSI Layer 4 protocol, such as TCP, UDP, etc.
Socket Ops: Sending or receiving the data to/from the NIC.

Our hypothesis is that eBPF can monitor the network. There are two ways to implement the interception: User space (uprobe) or Kernel space (kprobe). The table below summarizes the differences.

	Pros	Cons
uprobe	• Get more application-related contexts, such as whether the current request is HTTP or HTTPS. • Requests and responses can be intercepted by a single method	• Data structures can be unstable, so it is more difficult to get the desired data. • Implementation may differ between language/library versions. • Does not work in applications without symbol tables.
kprobe	• Available for all languages. • The data structure and methods are stable and do not require much adaptation. • Easier correlation with underlying data, such as getting the destination address of TCP, OSI Layer 4 protocol metrics, etc.	• A single request and response may be split into multiple probes. • Contextual information is not easy to get for stateful requests. For example header compression in HTTP/2.

For the general network performance monitor, we chose to use the kprobe (intercept the syscalls) for the following reasons:

It’s available for applications written in any programming language, and it’s stable, so it saves a lot of development/adaptation costs.
It can be correlated with metrics from the system level, which makes it easier to troubleshoot.
As a single request and response are split into multiple probes, we can use technology to correlate them.
For contextual information, It’s usually used in OSI Layer 7 protocol network analysis. So, if we just monitor the network performance, then they can be ignored.

Kprobes and network monitoring

Following the network syscalls of Linux documentation, we can implement network monitoring by intercepting two types of methods: socket operations and send/receive methods.

Socket Operations

When accepting or connecting with another socket, we can get the following information:

Connection information: Includes the remote address from the connection which helps us to understand which pod is connected.
Connection statics: Includes basic metrics from sockets, such as round-trip time (RTT), lost packet count in TCP, etc.
Socket and file descriptor (FD) mapping: Includes the relationship between the Linux file descriptor and socket object. It is useful when sending and receiving data through a Linux file descriptor.

Send/Receive

The interface related to sending or receiving data is the focus of performance analysis. It mainly contains the following parameters:

Socket file descriptor: The file descriptor of the current operation corresponding to the socket.
Buffer: The data sent or received, passed as a byte array.

Based on the above parameters, we can analyze the following data:

Bytes: The size of the packet in bytes.
Protocol: The protocol analysis according to the buffer data, such as HTTP, MySQL, etc.
Execution Time: The time it takes to send/receive the data.

At this point (Figure 1) we can analyze the following steps for the whole lifecycle of the connection:

Connect/Accept: When the connection is created.
Transform: Sending and receiving data on the connection.
Close: When the connection is closed.

Figure 1

Protocol and TLS

The previous section described how to analyze connections using send or receive buffer data. For example, following the HTTP/1.1 message specification to analyze the connection. However, this does not work for TLS requests/responses.

Figure 2

When TLS is in use, the Linux Kernel transmits data encrypted in user space. In the figure above, The application usually transmits SSL data through a third-party library (such as OpenSSL). For this case, the Linux API can only get the encrypted data, so it cannot recognize any higher layer protocol. To decrypt inside eBPF, we need to follow these steps:

Read unencrypted data through uprobe: Compatible multiple languages, using uprobe to capture the data that is not encrypted before sending or after receiving. In this way, we can get the original data and associate it with the socket.
Associate with socket: We can associate unencrypted data with the socket.

OpenSSL Use case

For example, the most common way to send/receive SSL data is to use OpenSSL as a shared library, specifically the SSL_read and SSL_write methods to submit the buffer data with the socket.

Following the documentation, we can intercept these two methods, which are almost identical to the API in Linux. The source code of the SSL structure in OpenSSL shows that the Socket FD exists in the BIO object of the SSL structure, and we can get it by the offset.

In summary, with knowledge of how OpenSSL works, we can read unencrypted data in an eBPF function.

Introducing SkyWalking Rover, an eBPF-based Metrics Collector and Profiler

SkyWalking Rover introduces the eBPF network profiling feature into the SkyWalking ecosystem. It’s currently supported in a Kubernetes environment, so must be deployed inside a Kubernetes cluster. Once the deployment is complete, SkyWalking Rover can monitor the network for all processes inside a given Pod. Based on the monitoring data, SkyWalking can generate the topology relationship diagram and metrics between processes.

Topology Diagram

The topology diagram can help us understand the network access between processes inside the same Pod, and between the process and external environment (other Pod or service). Additionally, it can identify the data direction of traffic based on the line flow direction.

In Figure 3 below, all nodes within the hexagon are the internal process of a Pod, and nodes outside the hexagon are externally associated services or Pods. Nodes are connected by lines, which indicate the direction of requests or responses between nodes (client or server). The protocol is indicated on the line, and it’s either HTTP(S), TCP, or TCP(TLS). Also, we can see in this figure that the line between Envoy and Python applications is bidirectional because Envoy intercepts all application traffic.

Figure 3

Metrics

Once we recognize the network call relationship between processes through the topology, we can select a specific line and view the TCP metrics between the two processes.

The diagram below (Figure 4) shows the metrics of network monitoring between two processes. There are four metrics in each line. Two on the left side are on the client side, and two on the right side are on the server side. If the remote process is not in the same Pod, only one side of the metrics is displayed.

Figure 4

The following two metric types are available:

Counter: Records the total number of data in a certain period. Each counter contains the following data: a. Count: Execution count. b. Bytes: Packet size in bytes. c. Execution time: Execution duration.
Histogram: Records the distribution of data in the buckets.

Based on the above data types, the following metrics are exposed:

Name	Type	Unit	Description
Write	Counter and histogram	Millisecond	The socket write counter.
Read	Counter and histogram	Millisecond	The socket read counter.
Write RTT	Counter and histogram	Microsecond	The socket write round trip time (RTT) counter.
Connect	Counter and histogram	Millisecond	The socket connect/accept with another server/client counter.
Close	Counter and histogram	Millisecond	The socket with other socket counter.
Retransmit	Counter	Millisecond	The socket retransmit package counter.
Drop	Counter	Millisecond	The socket drop package counter.

Demo

In this section, we demonstrate how to perform network profiling in the service mesh. To follow along, you will need a running Kubernetes environment.

NOTE: All commands and scripts are available in this GitHub repository.

Install Istio

Istio is the most widely deployed service mesh, and comes with a complete demo application that we can use for testing. To install Istio and the demo application, follow these steps:

Install Istio using the demo configuration profile.
Label the default namespace, so Istio automatically injects Envoy sidecar proxies when we’ll deploy the application.
Deploy the bookinfo application to the cluster.
Deploy the traffic generator to generate some traffic to the application.

export ISTIO_VERSION=1.13.1

# install istio
istioctl install -y --set profile=demo
kubectl label namespace default istio-injection=enabled

# deploy the bookinfo applications
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/destination-rule-all.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/virtual-service-all-v1.yaml

# generate traffic
kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/traffic-generator.yaml

Install SkyWalking

The following will install the storage, backend, and UI needed for SkyWalking:

git clone https://github.com/apache/skywalking-kubernetes.git
cd skywalking-kubernetes
cd chart
helm dep up skywalking
helm -n istio-system install skywalking skywalking \
 --set fullnameOverride=skywalking \
 --set elasticsearch.minimumMasterNodes=1 \
 --set elasticsearch.imageTag=7.5.1 \
 --set oap.replicas=1 \
 --set ui.image.repository=apache/skywalking-ui \
 --set ui.image.tag=9.2.0 \
 --set oap.image.tag=9.2.0 \
 --set oap.envoy.als.enabled=true \
 --set oap.image.repository=apache/skywalking-oap-server \
 --set oap.storageType=elasticsearch \
 --set oap.env.SW_METER_ANALYZER_ACTIVE_FILES='network-profiling'

Install SkyWalking Rover

SkyWalking Rover is deployed on every node in Kubernetes, and it automatically detects the services in the Kubernetes cluster. The network profiling feature has been released in the version 0.3.0 of SkyWalking Rover. When a network monitoring task is created, the SkyWalking rover sends the data to the SkyWalking backend.

kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/skywalking-rover.yaml

Start the Network Profiling Task

Once all deployments are completed, we must create a network profiling task for a specific instance of the service in the SkyWalking UI.

To open SkyWalking UI, run:

kubectl port-forward svc/skywalking-ui 8080:80 --namespace
istio-system

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In the figure below, we have selected an instance with a list of tasks in the network profiling tab. When we click the start button, the SkyWalking Rover starts monitoring this instance’s network.

Figure 5

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

Figure 6

When you click on the line between processes, you can see the TCP metrics between the two processes.

Figure 7

Conclusion

In this article, we detailed a problem that makes troubleshooting service mesh architectures difficult: lack of context between layers in the network stack. These are the cases when eBPF begins to really help with debugging/productivity when existing service mesh/envoy cannot. Then, we researched how eBPF could be applied to common communication, such as TLS. Finally, we demo the implementation of this process with SkyWalking Rover.

For now, we have completed the performance analysis for OSI layer 4 (mostly TCP). In the future, we will also introduce the analysis for OSI layer 7 protocols like HTTP.

Get Started with Istio

To get started with service mesh today, Tetrate Istio Distro is the easiest way to install, manage, and upgrade Istio. It provides a vetted upstream distribution of Istio that’s tested and optimized for specific platforms by Tetrate plus a CLI that facilitates acquiring, installing, and configuring multiple Istio versions. Tetrate Istio Distro also offers FIPS certified Istio builds for FedRAMP environments.

For enterprises that need a unified and consistent way to secure and manage services and traditional workloads across complex, heterogeneous deployment environments, we offer Tetrate Service Bridge, our flagship edge-to-workload application connectivity platform built on Istio and Envoy.

Additional Resources

Blog: Pinpoint Service Mesh Critical Performance Impact by using eBPF

Tue, 05 Jul 2022 00:00:00 +0000

Content

Background

Apache SkyWalking observes metrics, logs, traces, and events for services deployed into the service mesh. When troubleshooting, SkyWalking error analysis can be an invaluable tool helping to pinpoint where an error occurred. However, performance problems are more difficult: It’s often impossible to locate the root cause of performance problems with pre-existing observation data. To move beyond the status quo, dynamic debugging and troubleshooting are essential service performance tools. In this article, we’ll discuss how to use eBPF technology to improve the profiling feature in SkyWalking and analyze the performance impact in the service mesh.

Trace Profiling in SkyWalking

Since SkyWalking 7.0.0, Trace Profiling has helped developers find performance problems by periodically sampling the thread stack to let developers know which lines of code take more time. However, Trace Profiling is not suitable for the following scenarios:

Thread Model: Trace Profiling is most useful for profiling code that executes in a single thread. It is less useful for middleware that relies heavily on async execution models. For example Goroutines in Go or Kotlin Coroutines.
Language: Currently, Trace Profiling is only supported in Java and Python, since it’s not easy to obtain the thread stack in the runtimes of some languages such as Go and Node.js.
Agent Binding: Trace Profiling requires Agent installation, which can be tricky depending on the language (e.g., PHP has to rely on its C kernel; Rust and C/C++ require manual instrumentation to make install).
Trace Correlation: Since Trace Profiling is only associated with a single request it can be hard to determine which request is causing the problem.
Short Lifecycle Services: Trace Profiling doesn’t support short-lived services for (at least) two reasons:
1. It’s hard to differentiate system performance from class code manipulation in the booting stage.
2. Trace profiling is linked to an endpoint to identify performance impact, but there is no endpoint to match these short-lived services.

Fortunately, there are techniques that can go further than Trace Profiling in these situations.

Introduce eBPF

We have found that eBPF — a technology that can run sandboxed programs in an operating system kernel and thus safely and efficiently extend the capabilities of the kernel without requiring kernel modifications or loading kernel modules — can help us fill gaps left by Trace Profiling. eBPF is a trending technology because it breaks the traditional barrier between user and kernel space. Programs can now inject bytecode that runs in the kernel, instead of having to recompile the kernel to customize it. This is naturally a good fit for observability.

In the figure below, we can see that when the system executes the execve syscalls, the eBPF program is triggered, and the current process runtime information is obtained by using function calls.

Using eBPF technology, we can expand the scope of Skywalking’s profiling capabilities:

Global Performance Analysis: Before eBPF, data collection was limited to what agents can observe. Since eBPF programs run in the kernel, they can observe all threads. This is especially useful when you are not sure whether a performance problem is caused by a particular request.
Data Content: eBPF can dump both user and kernel space thread stacks, so if a performance issue happens in kernel space, it’s easier to find.
Agent Binding: All modern Linux kernels support eBPF, so there is no need to install anything. This means it is an orchestration-free vs an agent model. This reduces friction caused by built-in software which may not have the correct agents installed, such as Envoy in a Service Mesh.
Sampling Type: Unlike Trace Profiling, eBPF is event-driven and, therefore, not constrained by interval polling. For example, eBPF can trigger events and collect more data depending on a transfer size threshold. This can allow the system to triage and prioritize data collection under extreme load.

eBPF Limitations

While eBPF offers significant advantages for hunting performance bottlenecks, no technology is perfect. eBPF has a number of limitations described below. Fortunately, since SkyWalking does not require eBPF, the impact is limited.

Linux Version Requirement: eBPF programs require a Linux kernel version above 4.4, with later kernel versions offering more data to be collected. The BCC has documented the features supported by different Linux kernel versions, with the differences between versions usually being what data can be collected with eBPF.
Privileges Required: All processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode. As such, bugs or other issues in such code may have a big impact.
Weak Support for Dynamic Language: eBPF has weak support for JIT-based dynamic languages, such as Java. It also depends on what data you want to collect. For Profiling, eBPF does not support parsing the symbols of the program, which is why most eBPF-based profiling technologies only support static languages like C, C++, Go, and Rust. However, symbol mapping can sometimes be solved through tools provided by the language. For example, in Java, perf-map-agent can be used to generate the symbol mapping. However, dynamic languages don’t support the attach (uprobe) functionality that would allow us to trace execution events through symbols.

Introducing SkyWalking Rover

SkyWalking Rover introduces the eBPF profiling feature into the SkyWalking ecosystem. The figure below shows the overall architecture of SkyWalking Rover. SkyWalking Rover is currently supported in Kubernetes environments and must be deployed inside a Kubernetes cluster. After establishing a connection with the SkyWalking backend server, it saves information about the processes on the current machine to SkyWalking. When the user creates an eBPF profiling task via the user interface, SkyWalking Rover receives the task and executes it in the relevant C, C++, Golang, and Rust language-based programs.

Other than an eBPF-capable kernel, there are no additional prerequisites for deploying SkyWalking Rover.

CPU Profiling with Rover

CPU profiling is the most intuitive way to show service performance. Inspired by Brendan Gregg‘s blog post, we’ve divided CPU profiling into two types that we have implemented in Rover:

On-CPU Profiling: Where threads are spending time running on-CPU.
Off-CPU Profiling: Where time is spent waiting while blocked on I/O, locks, timers, paging/swapping, etc.

Profiling Envoy with eBPF

Envoy is a popular proxy, used as the data plane by the Istio service mesh. In a Kubernetes cluster, Istio injects Envoy into each service’s pod as a sidecar where it transparently intercepts and processes incoming and outgoing traffic. As the data plane, any performance issues in Envoy can affect all service traffic in the mesh. In this scenario, it’s more powerful to use eBPF profiling to analyze issues in production caused by service mesh configuration.

Demo Environment

If you want to see this scenario in action, we’ve built a demo environment where we deploy an Nginx service for stress testing. Traffic is intercepted by Envoy and forwarded to Nginx. The commands to install the whole environment can be accessed through GitHub.

On-CPU Profiling

On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high. If the stack is dumped more times, it means that the thread stack occupies more CPU resources.

When installing Istio using the demo configuration profile, we found there are two places where we can optimize performance:

Zipkin Tracing: Different Zipkin sampling percentages have a direct impact on QPS.
Access Log Format: Reducing the fields of the Envoy access log can improve QPS.

Zipkin Tracing

Zipkin with 100% sampling

In the default demo configuration profile, Envoy is using 100% sampling as default tracing policy. How does that impact the performance?

As shown in the figure below, using the on-CPU profiling, we found that it takes about 16% of the CPU overhead. At a fixed consumption of 2 CPUs, its QPS can reach 5.7K.

Disable Zipkin tracing

At this point, we found that if Zipkin is not necessary, the sampling percentage can be reduced or we can even disable tracing. Based on the Istio documentation, we can disable tracing when installing the service mesh using the following command:

istioctl install -y --set profile=demo \
   --set 'meshConfig.enableTracing=false' \
   --set 'meshConfig.defaultConfig.tracing.sampling=0.0'

After disabling tracing, we performed on-CPU profiling again. According to the figure below, we found that Zipkin has disappeared from the flame graph. With the same 2 CPU consumption as in the previous example, the QPS reached 9K, which is an almost 60% increase.

Tracing with Throughput

With the same CPU usage, we’ve discovered that Envoy performance greatly improves when the tracing feature is disabled. Of course, this requires us to make trade-offs between the number of samples Zipkin collects and the desired performance of Envoy (QPS).

The table below illustrates how different Zipkin sampling percentages under the same CPU usage affect QPS.

Zipkin sampling %	QPS	CPUs	Note
100% (default)	5.7K	2	16% used by Zipkin
1%	8.1K	2	0.3% used by Zipkin
disabled	9.2K	2	0% used by Zipkin

Access Log Format

Default Log Format

In the default demo configuration profile, the default Access Log format contains a lot of data. The flame graph below shows various functions involved in parsing the data such as request headers, response headers, and streaming the body.

Simplifying Access Log Format

Typically, we don’t need all the information in the access log, so we can often simplify it to get what we need. The following command simplifies the access log format to only display basic information:

istioctl install -y --set profile=demo \
   --set meshConfig.accessLogFormat="[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE%\n"

After simplifying the access log format, we found that the QPS increased from 5.7K to 5.9K. When executing the on-CPU profiling again, the CPU usage of log formatting dropped from 2.4% to 0.7%.

Simplifying the log format helped us to improve the performance.

Off-CPU Profiling

Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage. For example, when there are too many threads in one service, using off-CPU profiling could reveal which threads spend more time context switching.

We provide data aggregation in two dimensions:

Switch count: The number of times a thread switches context. When the thread returns to the CPU, it completes one context switch. A thread stack with a higher switch count spends more time context switching.
Switch duration: The time it takes a thread to switch the context. A thread stack with a higher switch duration spends more time off-CPU.

Write Access Log

Enable Write

Using the same environment and settings as before in the on-CPU test, we performed off-CPU profiling. As shown below, we found that access log writes accounted for about 28% of the total context switches. The “__write” shown below also indicates that this method is the Linux kernel method.

Disable Write

SkyWalking implements Envoy’s Access Log Service (ALS) feature which allows us to send access logs to the SkyWalking Observability Analysis Platform (OAP) using the gRPC protocol. Even by disabling the access logging, we can still use ALS to capture/aggregate the logs. We’ve disabled writing to the access log using the following command:

istioctl install -y --set profile=demo --set meshConfig.accessLogFile=""

After disabling the Access Log feature, we performed the off-CPU profiling. File writing entries have disappeared as shown in the figure below. Envoy throughput also increased from 5.7K to 5.9K.

Conclusion

In this article, we’ve examined the insights Apache Skywalking’s Trace Profiling can give us and how much more can be achieved with eBPF profiling. All of these features are implemented in skywalking-rover. In addition to on- and off-CPU profiling, you will also find the following features:

Continuous profiling, helps you automatically profile without manual intervention. For example, when Rover detects that the CPU exceeds a configurable threshold, it automatically executes the on-CPU profiling task.
More profiling types to enrich usage scenarios, such as network, and memory profiling.

Blog: Scaling with Apache SkyWalking

Mon, 24 Jan 2022 00:00:00 +0000

Background

In the Apache SkyWalking ecosystem, the OAP obtains metrics, traces, logs, and event data through SkyWalking Agent, Envoy, or other data sources. Under the gRPC protocol, it transmits data by communicating with a single server node. Only when the connection is broken, the reconnecting policy would be used based on DNS round-robin mode. When new services are added at runtime or the OAP load is kept high due to increased traffic of observed services, the OAP cluster needs to scale out for increased traffic. The load of the new OAP node would be less due to all existing agents having connected to previous nodes. Even without scaling, the load of OAP nodes would be unbalanced, because the agent would keep the connection due to random policy at the booting stage. In these cases, it would become a challenge to keep up the health status of all nodes, and be able to scale out when needed.

In this article, we mainly discuss how to solve this challenge in SkyWalking.

How to Load Balance

SkyWalking mainly uses the gRPC protocol for data transmission, so this article mainly introduces load balancing in the gRPC protocol.

Proxy Or Client-side

Based on the gRPC official Load Balancing blog, there are two approaches to load balancing:

Client-side: The client perceives multiple back-end services and uses a load-balancing algorithm to select a back-end service for each RPC.
Proxy: The client sends the message to the proxy server, and the proxy server load balances the message to the back-end service.

From the perspective of observability system architecture:

	Pros	Cons
Client-side	High performance because of the elimination of extra hop	Complex client (cluster awareness, load balancing, health check, etc.) Ensure each data source to be connected provides complex client capabilities
Proxy	Simple Client	Higher latency

We choose Proxy mode for the following reasons:

Observable data is not very time-sensitive, a little latency caused by transmission is acceptable. A little extra hop is acceptable and there is no impact on the client-side.
As an observability platform, we cannot/should not ask clients to change. They make their own tech decisions and may have their own commercial considerations.

Transmission Policy

In the proxy mode, we should determine the transmission path between downstream and upstream.

Different data protocols require different processing policies. There are two transmission policies:

Synchronous: Suitable for protocols that require data exchange in the client, such as SkyWalking Dynamic Configuration Service. This type of protocol provides real-time results.
Asynchronous batch: Used when the client doesn’t care about the upstream processing results, but only the transmitted data (e.g., trace report, log report, etc.)

The synchronization policy requires that the proxy send the message to the upstream server when receiving the client message, and synchronously return the response data to the downstream client. Usually, only a few protocols need to use the synchronization policy.

As shown below, after the client sends the request to the Proxy, the proxy would send the message to the server synchronously. When the proxy receives the result, it returns to the client.

The asynchronous batch policy means that the data is sent to the upstream server in batches asynchronously. This policy is more common because most protocols in SkyWalking are primarily based on data reporting. We think using the queue as a buffer could have a good effect. The asynchronous batch policy is executed according to the following steps:

The proxy receives the data and wraps it as an Event object.
An event is added into the queue.
When the cycle time is reached or when the queue elements reach the fixed number, the elements in the queue will parallel consume and send to the OAP.

The advantage of using queues is:

Separate data receiving and sending to reduce the mutual influence.
The interval quantization mechanism can be used to combine events, which helps to speed up sending events to the OAP.
Using multi-threaded consumption queue events can make fuller use of network IO.

As shown below, after the proxy receives the message, the proxy would wrap the message as an event and push it to the queue. The message sender would take batch events from the queue and send them to the upstream OAP.

Routing

Routing algorithms are used to route messages to a single upstream server node.

The Round-Robin algorithm selects nodes in order from the list of upstream service nodes. The advantage of this algorithm is that the number of times each node is selected is average. When the size of the data is close to the same, each upstream node can handle the same quantity of data content.

With the Weight Round-Robin, each upstream server node has a corresponding routing weight ratio. The difference from Round-Robin is that each upstream node has more chances to be routed according to its weight. This algorithm is more suitable to use when the upstream server node machine configuration is not the same.

The Fixed algorithm is a hybrid algorithm. It can ensure that the same data is routed to the same upstream server node, and when the upstream server scales out, it still maintains routing to the same node; unless the upstream node does not exist, it will reroute. This algorithm is mainly used in the SkyWalking Meter protocol because this protocol needs to ensure that the metrics of the same service instance are sent to the same OAP node. The Routing steps are as follows:

Generate a unique identification string based on the data content, as short as possible. The amount of data is controllable.
Get the upstream node of identity from LRU Cache, and use it if it exists.
According to the identification, generate the corresponding hash value, and find the upstream server node from the upstream list.
Save the mapping relationship between the upstream server node and identification to LRU Cache.

The advantage of this algorithm is to bind the data with the upstream server node as much as possible, so the upstream server can better process continuous data. The disadvantage is that it takes up a certain amount of memory space to save the corresponding relationship.

As shown below, the image is divided into two parts:

The left side represents that the same data content always is routed to the same server node.
The right side represents the data routing algorithm. Get the number from the data, and use the remainder algorithm to obtain the position.

We choose to use a combination of Round-Robin and Fixed algorithm for routing:

The Fixed routing algorithm is suitable for specific protocols, mainly used when passing metrics data to the SkyWalking Meter protocol
The Round-Robin algorithm is used by default. When the SkyWalking OAP cluster is deployed, the configuration of the nodes needs to be as much the same as possible, so there would be no need to use the Weight Round-Robin algorithm.

How to balance the load balancer itself?

Proxy still needs to deal with the load balancing problem from client to itself, especially when deploying a Proxy cluster in a production environment.

There are three ways to solve this problem:

Connection management: Use the max_connection config on the client-side to specify the maximum connection duration of each connection. For more information, please read the proposal.
Cluster awareness: The proxy has cluster awareness, and actively disconnects the connection when the load is unbalanced to allow the client to re-pick up the proxy.
Resource limit+HPA: Restrict the connection resource situation of each proxy, and no longer accept new connections when the resource limit is reached. And use the HPA mechanism of Kubernetes to dynamically scale out the number of the proxy.

	Connection management	Cluster awareness	Resource Limit+HPA
Pros	Simple to use	Ensure that the number of connections in each proxy is relatively	Simple to use
Cons	Each client needs to ensure that data is not lost The client is required to accept GOWAY responses	May cause a sudden increase in traffic on some nodes Each client needs to ensure that data is not lost	Traffic will not be particularly balanced in each instance

We choose Limit+HPA for these reasons:

Easy to config and use the proxy and easy to understand based on basic data metrics.
No data loss due to broken connection. There is no need for the client to implement any other protocols to prevent data loss, especially when the client is a commercial product.
The connection of each node in the proxy cluster does not need to be particularly balanced, as long as the proxy node itself is high-performance.

SkyWalking-Satellite

We have implemented this Proxy in the SkyWalking-Satellite project. It’s used between Client and SkyWalking OAP, effectively solving the load balancing problem.

After the system is deployed, the Satellite would accept the traffic from the Client, and the Satellite will perceive all the nodes of the OAP through Kubernetes Label Selector or manual configuration, and load balance the traffic to the upstream OAP node.

As shown below, a single client still maintains a connection with a single Satellite, Satellite would establish the connection with each OAP, and load balance message to the OAP node.

When scaling Satellite, we need to deploy the SWCK adapter and configure the HPA in Kubernetes. SWCK is a platform for the SkyWalking users, provisions, upgrades, maintains SkyWalking relevant components, and makes them work natively on Kubernetes.

After deployment is finished, the following steps would be performed:

Read metrics from OAP: HPA requests the SWCK metrics adapter to dynamically read the metrics in the OAP.
Scaling the Satellite: Kubernetes HPA senses that the metrics values are in line with expectations, so the Satellite would be scaling automatically.

As shown below, use the dotted line to divide the two parts. HPA uses SWCK Adapter to read the metrics in the OAP. When the threshold is met, HPA would scale the Satellite deployment.

Example

In this section, we will demonstrate two cases:

SkyWalking Scaling: After SkyWalking OAP scaling, the traffic would auto load balancing through Satellite.
Satellite Scaling: Satellite’s own traffic load balancing.

NOTE: All commands could be accessed through GitHub.

SkyWalking Scaling

We will use the bookinfo application to demonstrate how to integrate Apache SkyWalking 8.9.1 with Apache SkyWalking-Satellite 0.5.0, and observe the service mesh through the Envoy ALS protocol.

Before starting, please make sure that you already have a Kubernetes environment.

Install Istio

Istio provides a very convenient way to configure the Envoy proxy and enable the access log service. The following step:

Install the istioctl locally to help manage the Istio mesh.
Install Istio into the Kubernetes environment with a demo configuration profile, and enable the Envoy ALS. Transmit the ALS message to the satellite. The satellite we will deploy later.
Add the label into the default namespace so Istio could automatically inject Envoy sidecar proxies when you deploy your application later.

# install istioctl
export ISTIO_VERSION=1.12.0
curl -L https://istio.io/downloadIstio | sh - 
sudo mv $PWD/istio-$ISTIO_VERSION/bin/istioctl /usr/local/bin/

# install istio
istioctl install -y --set profile=demo \
	--set meshConfig.enableEnvoyAccessLogService=true \
	--set meshConfig.defaultConfig.envoyAccessLogService.address=skywalking-system-satellite.skywalking-system:11800

# enbale envoy proxy in default namespace
kubectl label namespace default istio-injection=enabled

Install SWCK

SWCK provides convenience for users to deploy and upgrade SkyWalking related components based on Kubernetes. The automatic scale function of Satellite also mainly relies on SWCK. For more information, you could refer to the official documentation.

# Install cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.3.1/cert-manager.yaml

# Deploy SWCK
mkdir -p skywalking-swck && cd skywalking-swck
wget https://dlcdn.apache.org/skywalking/swck/0.6.1/skywalking-swck-0.6.1-bin.tgz
tar -zxvf skywalking-swck-0.6.1-bin.tgz
cd config
kubectl apply -f operator-bundle.yaml

Deploy Apache SkyWalking And Apache SkyWalking-Satellite

We have provided a simple script to deploy the skywalking OAP, UI, and Satellite.

# Create the skywalking components namespace
kubectl create namespace skywalking-system
kubectl label namespace skywalking-system swck-injection=enabled
# Deploy components
kubectl apply -f https://raw.githubusercontent.com/mrproliu/sw-satellite-demo-scripts/5821a909b647f7c8f99c70378e197630836f45f7/resources/sw-components.yaml

Deploy Bookinfo Application

export ISTIO_VERSION=1.12.0
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl wait --for=condition=Ready pods --all --timeout=1200s
kubectl port-forward service/productpage 9080

Next, please open your browser and visit http://localhost:9080. You should be able to see the Bookinfo application. Refresh the webpage several times to generate enough access logs.

Then, you can see the topology and metrics of the Bookinfo application on SkyWalking WebUI. At this time, you can see that the Satellite is working!

Deploy Monitor

We need to install OpenTelemetry Collector to collect metrics in OAPs and analyze them.

# Add OTEL collector
kubectl apply -f https://raw.githubusercontent.com/mrproliu/sw-satellite-demo-scripts/5821a909b647f7c8f99c70378e197630836f45f7/resources/otel-collector-oap.yaml

kubectl port-forward -n skywalking-system  service/skywalking-system-ui 8080:80

Next, please open your browser and visit http://localhost:8080/ and create a new item on the dashboard. The SkyWalking Web UI pictured below shows how the data content is applied.

Scaling OAP

Scaling the number of OAPs by deployment.

kubectl scale --replicas=3 -n skywalking-system deployment/skywalking-system-oap

Done!

After a period of time, you will see that the number of OAPs becomes 3, and the ALS traffic is balanced to each OAP.

Satellite Scaling

After we have completed the SkyWalking Scaling, we would carry out the Satellite Scaling demo.

Deploy SWCK HPA

SWCK provides an adapter to implement the Kubernetes external metrics to adapt the HPA through reading the metrics in SkyWalking OAP. We expose the metrics service in Satellite to OAP and configure HPA Resource to auto-scaling the Satellite.

Install the SWCK adapter into the Kubernetes environment:

kubectl apply -f skywalking-swck/config/adapter-bundle.yaml

Create the HPA resource, and limit each Satellite to handle a maximum of 10 connections:

kubectl apply -f https://raw.githubusercontent.com/mrproliu/sw-satellite-demo-scripts/5821a909b647f7c8f99c70378e197630836f45f7/resources/satellite-hpa.yaml

Then, you could see we have 9 connections in one satellite. One envoy proxy may establish multiple connections to the satellite.

$ kubectl get HorizontalPodAutoscaler -n skywalking-system
NAME       REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/skywalking-system-satellite   9/10      1         3         1          5m18s

Scaling Application

The scaling application could establish more connections to the satellite, to verify whether the HPA is in effect.

kubectl scale --replicas=3 deployment/productpage-v1 deployment/details-v1

Done!

By default, Satellite will deploy a single instance and a single instance will only accept 11 connections. HPA resources limit one Satellite to handle 10 connections and use a stabilization window to make Satellite stable scaling up. In this case, we deploy the Bookinfo application in 10+ instances after scaling, which means that 10+ connections will be established to the Satellite.

So after HPA resources are running, the Satellite would be automatically scaled up to 2 instances. You can learn about the calculation algorithm of replicas through the official documentation. Run the following command to view the running status:

$ kubectl get HorizontalPodAutoscaler -n skywalking-system --watch
NAME       REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/skywalking-system-satellite   11/10     1         3         1          3m31s
hpa-demo   Deployment/skywalking-system-satellite   11/10     1         3         1          4m20s
hpa-demo   Deployment/skywalking-system-satellite   11/10     1         3         2          4m38s
hpa-demo   Deployment/skywalking-system-satellite   11/10     1         3         2          5m8s
hpa-demo   Deployment/skywalking-system-satellite   6/10      1         3         2          5m23s

By observing the “number of connections” metric, we would be able to see that when the number of connections of each gRPC exceeds 10 connections, then the satellite automatically scales through the HPA rule. As a result, the connection number is down to normal status (in this example, less than 10)

swctl metrics linear --name satellite_service_grpc_connect_count --service-name satellite::satellite-service

Blog: SkyWalking Python Agent Supports Profiling Now

Sun, 12 Sep 2021 00:00:00 +0000

The Java Agent of Apache SkyWalking has supported profiling since v7.0.0, and it enables users to troubleshoot the root cause of performance issues, and now we bring it into Python Agent. In this blog, we will show you how to use it, and we will introduce the mechanism of profiling.

How to use profiling in Python Agent

This feature is released in Python Agent at v0.7.0. It is turned on by default, so you don’t need any extra configuration to use it. You can find the environment variables about it here.

Here are the demo codes of an intentional slow application.

import time

def method1():
    time.sleep(0.02)
    return '1'

def method2():
    time.sleep(0.02)
    return method1()

def method3():
    time.sleep(0.02)
    return method2()

if __name__ == '__main__':
    import socketserver
    from http.server import BaseHTTPRequestHandler

    class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

        def do_POST(self):
            method3()
            time.sleep(0.5)
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write('{"song": "Despacito", "artist": "Luis Fonsi"}'.encode('ascii'))

    PORT = 19090
    Handler = SimpleHTTPRequestHandler

    with socketserver.TCPServer(("", PORT), Handler) as httpd:
        httpd.serve_forever()

We can start it with SkyWalking Python Agent CLI without changing any application code now, which is also the latest feature of v0.7.0. We just need to add sw-python run before our start command(i.e. sw-python run python3 main.py), to start the application with python agent attached. More information about sw-python can be found there.

Then, we should add a new profile task for the / endpoint from the SkyWalking UI, as shown below.

We can access it by curl -X POST http://localhost:19090/, after that, we can view the result of this profile task on the SkyWalking UI.

The mechanism of profiling

When a request lands on an application with the profile function enabled, the agent begins the profiling automatically if the request’s URI is as required by the profiling task. A new thread is spawned to fetch the thread dump periodically until the end of request.

The agent sends these thread dumps, called ThreadSnapshot, to SkyWalking OAPServer, and the OAPServer analyzes those ThreadSnapshot(s) and gets the final result. It will take a method invocation with the same stack depth and code signature as the same operation, and estimate the execution time of each method from this.

Let’s demonstrate how this analysis works through the following example. Suppose we have such a program below and we profile it at 10ms intervals.

def main():
    methodA()

def methodA():
    methodB()

def methodB():
    methodC()
    methodD()

def methodC():
    time.sleep(0.04)

def methodD():
    time.sleep(0.06)

The agent collects a total of 10 ThreadSnapShot(s) over the entire time period(Diagram A). The first 4 snapshots represent the thread dumps during the execution of function C, and the last 6 snapshots represent the thread dumps during the execution of function D. After the analysis of OAPServer, we can see the result of this profile task on the SkyWalking Rocketbot UI as shown in the right of the diagram. With this result, we can clearly see the function call relationship and the time consumption situation of this program.

Diagram A

You can read more details of profiling theory from this blog.

We hope you enjoy the profile in the Python Agent, and if so, you can give us a star on Python Agent and SkyWalking on GitHub.

Blog: End-User Tracing in a SkyWalking-Observed Browser

Thu, 25 Mar 2021 00:00:00 +0000

Origin: End-User Tracing in a SkyWalking-Observed Browser - The New Stack

Apache SkyWalking: an APM (application performance monitor) system, especially designed for microservices, cloud native, and container-based (Docker, Kubernetes, Mesos) architectures.

skywalking-client-js: a lightweight client-side JavaScript exception, performance, and tracing library. It provides metrics and error collection to the SkyWalking backend. It also makes the browser the starting point for distributed tracing.

Background

Web application performance affects the retention rate of users. If a page load time is too long, the user will give up. So we need to monitor the web application to understand performance and ensure that servers are stable, available and healthy. SkyWalking is an APM tool and the skywalking-client-js extends its monitoring to include the browser, providing performance metrics and error collection to the SkyWalking backend.

Performance Metrics

The skywalking-client-js uses [window.performance] (https://developer.mozilla.org/en-US/docs/Web/API/Window/performance) for performance data collection. From the MDN doc, the performance interface provides access to performance-related information for the current page. It’s part of the High Resolution Time API, but is enhanced by the Performance Timeline API, the Navigation Timing API, the User Timing API, and the Resource Timing API. In skywalking-client-js, all performance metrics are calculated according to the Navigation Timing API defined in the W3C specification. We can get a PerformanceTiming object describing our page using the window.performance.timing property. The PerformanceTiming interface contains properties that offer performance timing information for various events that occur during the loading and use of the current page.

We can better understand these attributes when we see them together in the figure below from W3C:

The following table contains performance metrics in skywalking-client-js.

Metrics Name	Describe	Calculating Formulae	Note
redirectTime	Page redirection time	redirectEnd - redirectStart	If the current document and the document that is redirected to are not from the same origin, set redirectStart, redirectEnd to 0
ttfbTime	Time to First Byte	responseStart - requestStart	According to Google Development
dnsTime	Time to DNS query	domainLookupEnd - domainLookupStart
tcpTime	Time to TCP link	connectEnd - connectStart
transTime	Time to content transfer	responseEnd - responseStart
sslTime	Time to SSL secure connection	connectEnd - secureConnectionStart	Only supports HTTPS
resTime	Time to resource loading	loadEventStart - domContentLoadedEventEnd	Represents a synchronized load resource in pages
fmpTime	Time to First Meaningful Paint	-	Listen for changes in page elements. Traverse each new element, and calculate the total score of these elements. If the element is visible, the score is 1 * weight; if the element is not visible, the score is 0
domAnalysisTime	Time to DOM analysis	domInteractive - responseEnd
fptTime	First Paint Time	responseEnd - fetchStart
domReadyTime	Time to DOM ready	domContentLoadedEventEnd - fetchStart
loadPageTime	Page full load time	loadEventStart - fetchStart
ttlTime	Time to interact	domInteractive - fetchStart
firstPackTime	Time to first package	responseStart - domainLookupStart

Skywalking-client-js collects those performance metrics and sends them to the OAP (Observability Analysis Platform) server , which aggregates data on the back-end side that is then shown in visualizations on the UI side. Users can optimize the page according to these data.

Exception Metrics

There are five kinds of errors that can be caught in skywalking-client-js:

The resource loading error is captured by window.addeventlistener ('error ', callback, true)
window.onerror catches JS execution errors
window.addEventListener('unhandledrejection', callback) is used to catch the promise errors
the Vue errors are captured by Vue.config.errorHandler
the Ajax errors are captured by addEventListener('error', callback); addEventListener('abort', callback); addEventListener('timeout', callback); in send callback.

The Skywalking-client-js traces error data to the OAP server, finally visualizing data on the UI side. For an error overview of the App, there are several metrics for basic statistics and trends of errors, including the following metrics.

App Error Count, the total number of errors in the selected time period.
App JS Error Rate, the proportion of PV with JS errors in a selected time period to total PV.
All of Apps Error Count, Top N Apps error count ranking.
All of Apps JS Error Rate, Top N Apps JS error rate ranking.
Error Count of Versions in the Selected App, Top N Error Count of Versions in the Selected App ranking.
Error Rate of Versions in the Selected App, Top N JS Error Rate of Versions in the Selected App ranking.
Error Count of the Selected App, Top N Error Count of the Selected App ranking.
Error Rate of the Selected App, Top N JS Error Rate of the Selected App ranking.

For pages, we use several metrics for basic statistics and trends of errors, including the following metrics:

Top Unstable Pages / Error Rate, Top N Error Count pages of the Selected version ranking.
Top Unstable Pages / Error Count, Top N Error Count pages of the Selected version ranking.
Page Error Count Layout, data display of different errors in a period of time.

User Metrics

SkyWalking browser monitoring also provides metrics about how the visitors use the monitored websites, such as PV(page views), UV(unique visitors), top N PV(page views), etc.

In SPAs (single page applications), the page will be refreshed only once. The traditional method only reports PV once after the page loading, but cannot count the PV of each sub-page, and can’t make other types of logs aggregate by sub-page.

SkyWalking browser monitoring provides two processing methods for SPA pages:

Enable SPA automatic parsing. This method is suitable for most single page application scenarios with URL hash as the route. In the initialized configuration item, set enableSPA to true, which will turn on the page’s hashchange event listener (trigger re reporting PV), and use URL hash as the page field in other data reporting.
Manual reporting. This method can be used in all single page application scenarios. This method can be used if the first method is not usable. The following example provides a set page method to manually update the page name when data is reported. When this method is called, the page PV will be re reported by default:

app.on('routeChange', function (to) {
    ClientMonitor.setPerformance({
    collector: 'http://127.0.0.1:8080',
    service: 'browser-app',
    serviceVersion: '1.0.0',
    pagePath: to.path,
    autoTracePerf: true,
    enableSPA: true,
  });
});

Let’s take a look at the result found in the following image. It shows the most popular applications and versions, and the changes of PV over a period of time.

Make the browser the starting point for distributed tracing

SkyWalking browser monitoring intercepts HTTP requests to trace segments and spans. It supports tracking these following modes of HTTP requests: XMLHttpRequest and fetch. It also supports tracking libraries and tools based on XMLHttpRequest and fetch - such as Axios, SuperAgent, OpenApi, and so on.

Let’s see how the SkyWalking browser monitoring intercepts HTTP requests:

After this, use window.addEventListener('xhrReadyStateChange', callback) and set the readyState value tosw8 = xxxx in the request header. At the same time, reporting requests information to the back-end side. Finally, we can view trace data on the trace page. The following graphic is from the trace page:

To see how we listen for fetch requests, let’s see the source code of fetch

As you can see, it creates a promise and a new XMLHttpRequest object. Because the code of the fetch is built into the browser, it must monitor the code execution first. Therefore, when we add listening events, we can’t monitor the code in the fetch. Just after monitoring the code execution, let’s rewrite the fetch:

import { fetch } from 'whatwg-fetch'; window.fetch = fetch;

In this way, we can intercept the fetch request through the above method.

Additional Resources

End-User Tracing in a SkyWalking-Observed Browser.

Blog: Apache SkyWalking: Use Profiling to Fix the Blind Spot of Distributed Tracing

Mon, 13 Apr 2020 00:00:00 +0000

This post originally appears on The New Stack

This post introduces a way to automatically profile code in production with Apache SkyWalking. We believe the profile method helps reduce maintenance and overhead while increasing the precision in root cause analysis.

Limitations of the Distributed Tracing

In the early days, metrics and logging systems were the key solutions in monitoring platforms. With the adoption of microservice and distributed system-based architecture, distributed tracing has become more important. Distributed tracing provides relevant service context, such as system topology map and RPC parent-child relationships.

Some claim that distributed tracing is the best way to discover the cause of performance issues in a distributed system. It’s good at finding issues at the RPC abstraction, or in the scope of components instrumented with spans. However, it isn’t that perfect.

Have you been surprised to find a span duration longer than expected, but no insight into why? What should you do next? Some may think that the next step is to add more instrumentation, more spans into the trace, thinking that you would eventually find the root cause, with more data points. We’ll argue this is not a good option within a production environment. Here’s why:

There is a risk of application overhead and system overload. Ad-hoc spans measure the performance of specific scopes or methods, but picking the right place can be difficult. To identify the precise cause, you can “instrument” (add spans to) many suspicious places. The additional instrumentation costs more CPU and memory in the production environment. Next, ad-hoc instrumentation that didn’t help is often forgotten, not deleted. This creates a valueless overhead load. In the worst case, excess instrumentation can cause performance problems in the production app or overload the tracing system.
The process of ad-hoc (manual) instrumentation usually implies at least a restart. Trace instrumentation libraries, like Zipkin Brave, are integrated into many framework libraries. To instrument a method’s performance typically implies changing code, even if only an annotation. This implies a re-deploy. Even if you have the way to do auto instrumentation, like Apache SkyWalking, you still need to change the configuration and reboot the app. Otherwise, you take the risk of GC caused by hot dynamic instrumentation.
Injecting instrumentation into an uninstrumented third party library is hard and complex. It takes more time and many won’t know how to do this.
Usually, we don’t have code line numbers in the distributed tracing. Particularly when lambdas are in use, it can be difficult to identify the line of code associated with a span. Regardless of the above choices, to dive deeper requires collaboration with your Ops or SRE team, and a shared deep level of knowledge in distributed tracing.

Regardless of the above choices, to dive deeper requires collaboration with your Ops or SRE team, and a shared deep level of knowledge in distributed tracing.

Profiling in Production

Introduction

To reuse distributed tracing to achieve method scope precision requires an understanding of the above limitations and a different approach. We called it PROFILE.

Most high-level languages build and run on a thread concept. The profile approach takes continuous thread dumps. We merge the thread dumps to estimate the execution time of every method shown in the thread dumps. The key for distributed tracing is the tracing context, identifiers active (or current) for the profiled method. Using this trace context, we can weave data harvested from profiling into existing traces. This allows the system to automate otherwise ad-hoc instrumentation. Let’s dig deeper into how profiling works:

We consider a method invocation with the same stack depth and signature (method, line number etc), the same operation. We derive span timestamps from the thread dumps the same operation is in. Let’s put this visually:

Above, represents 10 successive thread dumps. If this method is in dumps 4-8, we assume it started before dump 4 and finished after dump 8. We can’t tell exactly when the method started and stopped. but the timestamps of thread dumps are close enough.

To reduce overhead caused by thread dumps, we only profile methods enclosed by a specific entry point, such as a URI or MVC Controller method. We identify these entry points through the trace context and the APM system.

The profile does thread dump analysis and gives us:

The root cause, precise to the line number in the code.
Reduced maintenance as ad-hoc instrumentation is obviated.
Reduced overload risk caused by ad-hoc instrumentation.
Dynamic activation: only when necessary and with a very clear profile target.

Implementing Precise Profiling with Apache SkyWalking 7

Distributed profiling is built-into Apache SkyWalking application performance monitoring (APM). Let’s demonstrate how the profiling approach locates the root cause of the performance issue.

final CountDownLatchcountDownLatch= new CountDownLatch(2);
 
threadPool.submit(new Task1(countDownLatch));
threadPool.submit(new Task2(countDownLatch));
 
try {
   countDownLatch.await(500, TimeUnit.MILLISECONDS);
} catch (InterruptedExceptione) {
}

Task1 and Task2 have a race condition and unstable execution time: they will impact the performance of each other and anything calling them. While this code looks suspicious, it is representative of real life. People in the OPS/SRE team are not usually aware of all code changes and who did them. They only know something in the new code is causing a problem.

To make matters interesting, the above code is not always slow: it only happens when the condition is locked. In SkyWalking APM, we have metrics of endpoint p99/p95 latency, so, we are easy to find out the p99 of this endpoint is far from the avg response time. However, this is not the same as understanding the cause of the latency. To locate the root cause, add a profile condition to this endpoint: duration greater than 500ms. This means faster executions will not add profiling load.

This is a typical profiled trace segment (part of the whole distributed trace) shown on the SkyWalking UI. We now notice the “service/processWithThreadPool” span is slow as we expected, but why? This method is the one we added the faulty code to. As the UI shows that method, we know the profiler is working. Now, let’s see what the profile analysis result say.

This is the profile analysis stack view. We see the stack element names, duration (include/exclude the children) and slowest methods have been highlighted. It shows clearly, “sun.misc.Unsafe.park” costs the most time. If we look for the caller, it is the code we added: CountDownLatch.await.

The Limitations of the Profile Method

No diagnostic tool can fit all cases, not even the profile method.

The first consideration is mistaking a repeatedly called method for a slow method. Thread dumps are periodic. If there is a loop of calling one method, the profile analysis result would say the target method is slow because it is captured every time in the dump process. There could be another reason. A method called many times can also end up captured in each thread dump. Even so, the profile did what it is designed for. It still helps the OPS/SRE team to locate the code having the issue.

The second consideration is overhead, the impact of repeated thread dumps is real and can’t be ignored. In SkyWalking, we set the profile dump period to at least 10ms. This means we can’t locate method performance issues if they complete in less than 10ms. SkyWalking has a threshold to control the maximum parallel degree as well.

Understanding the above keeps distributed tracing and APM systems useful for your OPS/SRE team.

How to Try This

Everything we discussed, including the Apache SkyWalking Java Agent, profile analysis code, and UI, could be found in our GitHub repository. We hope you enjoyed this new profile method, and love Apache SkyWalking. If so, give us a star on GitHub to encourage us.

SkyWalking 7 has just been released. You can contact the project team through the following channels:

Follow SkyWalking twitter.
Subscribe mailing list: dev@skywalking.apache.org. Send to dev-subscribe@kywalking.apache.org to subscribe to the mail list.

Co-author Sheng Wu is a Tetrate founding engineer and the founder and VP of Apache SkyWalking. He is solving the problem of observability for large-scale service meshes in hybrid and multi-cloud environments.

Adrian Cole works in the Spring Cloud team at VMware, mostly on Zipkin

Han Liu is a tech expert at Lagou. He is an Apache SkyWalking committer

Blog: SkyWalking performance in Service Mesh scenario

Fri, 25 Jan 2019 00:00:00 +0000

Author: Hongtao Gao, Apache SkyWalking & ShardingShpere PMC
GitHub, Twitter, Linkedin

Service mesh receiver was first introduced in Apache SkyWalking 6.0.0-beta. It is designed to provide a common entrance for receiving telemetry data from service mesh framework, for instance, Istio, Linkerd, Envoy etc. What’s the service mesh? According to Istio’s explain:

The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them.

As a PMC member of Apache SkyWalking, I tested trace receiver and well understood the performance of collectors in trace scenario. I also would like to figure out the performance of service mesh receiver.

Different between trace and service mesh

Following chart presents a typical trace map:

You could find a variety of elements in it just like web service, local method, database, cache, MQ and so on. But service mesh only collect service network telemetry data that contains the entrance and exit data of a service for now(more elements will be imported soon, just like Database). A smaller quantity of data is sent to the service mesh receiver than the trace.

But using sidecar is a little different.The client requesting “A” that will send a segment to service mesh receiver from “A”’s sidecar. If “A” depends on “B”, another segment will be sent from “A”’s sidecar. But for a trace system, only one segment is received by the collector. The sidecar model splits one segment into small segments, that will increase service mesh receiver network overhead.

Deployment Architecture

In this test, I will pick two different backend deployment. One is called mini unit, consist of one collector and one elasticsearch instance. Another is a standard production cluster, contains three collectors and three elasticsearch instances.

Mini unit is a suitable architecture for dev or test environment. It saves your time and VM resources, speeds up depolyment process.

The standard cluster provides good performance and HA for a production scenario. Though you will pay more money and take care of the cluster carefully, the reliability of the cluster will be a good reward to you.

I pick 8 CPU and 16GB VM to set up the test environment. This test targets the performance of normal usage scenarios, so that choice is reasonable. The cluster is built on Google Kubernetes Engine(GKE), and every node links each other with a VPC network. For running collector is a CPU intensive task, the resource request of collector deployment should be 8 CPU, which means every collector instance occupy a VM node.

Testing Process

Receiving mesh fragments per second(MPS) depends on the following variables.

Ingress query per second(QPS)
The topology of a microservice cluster
Service mesh mode(proxy or sidecar)

In this test, I use Bookinfo app as a demo cluster.

So every request will touch max 4 nodes. Plus picking the sidecar mode(every request will send two telemetry data), the MPS will be QPS * 4 *2.

There are also some important metrics that should be explained

Client Query Latency: GraphQL API query response time heatmap.
Client Mesh Sender: Send mesh segments per second. The total line represents total send amount and the error line is the total number of failed send.
Mesh telemetry latency: service mesh receiver handling data heatmap.
Mesh telemetry received: received mesh telemetry data per second.

Mini Unit

You could find collector can process up to 25k data per second. The CPU usage is about 4 cores. Most of the query latency is less than 50ms. After login the VM on which collector instance running, I know that system load is reaching the limit(max is 8).

According to the previous formula, a single collector instance could process 3k QPS of Bookinfo traffic.

Standard Cluster

Compare to the mini-unit, cluster’s throughput increases linearly. Three instances provide total 80k per second processing power. Query latency increases slightly, but it’s also very small(less than 500ms). I also checked every collector instance system load that all reached the limit. 10k QPS of BookInfo telemetry data could be processed by the cluster.

Conclusion

Let’s wrap them up. There are some important things you could get from this test.

QPS varies by the there variables. The test results in this blog are not important. The user should pick property value according to his system.
Collector cluster’s processing power could scale out.
The collector is CPU intensive application. So you should provide sufficient CPU resource to it.

This blog gives people a common method to evaluate the throughput of Service Mesh Receiver. Users could use this to design their Apache Skywalking backend deployment architecture.