ITPub博客

首页 > 数据库 > Oracle > 【思考点亮生命之光】AWR之父Graham Wood专访

【思考点亮生命之光】AWR之父Graham Wood专访

Oracle 作者:tianya_2011 时间:2017-10-12 08:42:40 0 删除 编辑


Graham Wood  

Oracle RWP资深架构师


Graham 在性能优化方面有30多年的经验,设计和优化过众多大用户量数据量的高端系统,创建及参与创建了包括 Statspack 在内的很多监控工具以及数据库性能相关的特性,简化了数据库监控的流程,是 AWR,ASH,ADDM 的架构师,被业界尊称为 AWR之 父。Graham 曾多次在 Oracle Open World 以及世界各地的第三方会议上演讲,分享如何高效的使用 Oracle 的强大产品。

It’s 22 Mar, less than a week before the RWP China Tour, another technical seminar about Oracle database will also be held at the end of this month in Beijing, on cooperation with RWP and ACOUG.

由 RWP 团队和 ACOUG 联合主办的 RWP 中国之旅将于3月28日在京隆重举办,倒计时一周,一场围绕 Oracle 性能的技术盛会将全面袭来,大家是否充满期待?

 

We are all so excited about this conference, and the other reason we are gathering here today is (now you can prepare your applause if you like) that I am going to introduce you Mr. Wood, the father of AWR, the senior architect of Oracle Real-World Performance.

下面为大家隆重介绍:AWR 之父,Graham Wood,同时也是 Oracle RWP 团队的资深架构师。


Q1:Hello, Mr. Wood. It’s an honor to have this conversation with you, Iguess most people work on database have known you well, but for new friends whomight also read this, please tell something about yourself, say hello toeverybody.

问题一:Mr. Wood您好,很荣幸能对您做这次采访,首先请您简单介绍一下自己,跟大家打个招呼吧!

Graham:Hi, I am Graham Wood. I’ve been working at Oracle for over 30 years now and almost all of that time has been working on performance related. Almost as soon as I started working with the Oracle database I was working on projects for which performance was key. As well as working with customers worldwide, I was the architect for AWR, ASH and ADDM which were introduced in Oracle 10g with the intention of making it easier to diagnosis the root cause of performance problems. Now I spend most of my time using those tools.

嗨,我是Graham Wood,已经在 Oracle 工作了30多年,几乎一直从事与与性能有关的工作。 几乎从我开始从事 Oracle,我主要负责的就是性能优化。在与全球客户一起工作的同时,我还是 AWR,ASH 和 ADDM 的架构师,这些工具在 Oracle 10g 的时候被引入,目的是使诊断性能问题的根本原因更容易一些。这也是我现在的主要做的事情。


I’m not only interested in the performance of computer systems. I took a flight on Concorde before it was retired and last time I was in China I took the world’s fastest train to Shanghai Airport. I also like fast cars and fast motorcycles.

我不仅仅是对计算机系统性能感兴趣,在协和飞机退役之前,我还曾乘坐过一次;上次在中国,我也乘坐世界上最快的火车前往上海机场; 快车和快速摩托车也是我最爱的。

2. It has been two years since last RWP China Tour, which was held in Oct 2015. No need to mention that it was a real great success, 300 technology enthusiasts gather together, they sharing, discussing and of course progressing. It was so much more than collision of technology.

上一次的RWP中国之旅是在2015年举办的,那是一次成功的技术盛会。广大的Oracle技术爱好者欢聚一堂,思维碰撞,共同成长!3月28,我们将再次在北京相聚,我相信大家都已经迫不及待要再次一睹RWP团队的风采,并在其最新研究中感悟技术之美!


Q2:Mr. Wood, would you like to share some interesting memory or the most unforgettable part about last RWP China Tour in Beijing, and what’s your biggest expectation about this time, what subject will you share in conference?

问题二:关于上次的RWP中国之旅您还记得吗?有没有什么有趣或者难忘的经历想和大家分享的呢?这次RWP之行,您有什么期待,您将会分享什么内容给大家呢?

Graham:I think that we were all very impressed by the enthusiasm and thirst for knowledge from the attendees. For me the most unforgettable part was the age of the attendees. When presenting at conferences I used to give out pens to attendees who were not born in 1986 when I joined Oracle. It would be very expensive to do that in China!

与会者的热情和对知识的渴望让我们印象深刻。 对我来说,最难忘的是与会者的年龄差。之前的会议上,我们会给在1986年还没有出生的参会者赠送礼物,那一年刚好是我加入Oracle。但在中国,竟然有大部分的参会者都是1986年之后出生的年轻人。


Obviously technology has moved on in the last couple of years and we have seen new patterns emerging in the systems that we look at. Unfortunately new technology is not automatically better in all cases, and we will be sharing example of misusing development approaches that can severely limit system throughput.

显然,技术在过去的几年中已经发生了变化,系统中出现了新的模式。但很不幸,在所有情况下新技术都不会自我完善,所以我们将分享一些由于开发不当导致严重限制系统吞吐量的示例。

Q3:AWR is very important content for Oracle database, it has been helping operators to quickly recognize and resolve the performance issues of database since Oracle 10g, brought a lot of convenience. As the father of AWR, what is your original intention of designing this product, and what’s the major change about AWR in 12c release 2, if you don’t mind, we would also like to know how will it develop in future?

问题三:AWR是Oracle数据库中很重要的一部分,自从10g 中推出该功能以后,它帮助广大的运维者能够快速排查和处理性能问题,作为AWR之父,您最初是出于什么样的初衷设计这样一个工具的?在12.2中AWR发生的最重要的变化您可以跟可以介绍一下吗? 如果您不介意的话,我们也想知道AWR今后将会朝着什么样的方向发展。

Graham:The goal of AWR was to have a single, unified repository populated by ‘always on’ data collection about the operation of an Oracle database. By having this data collected continuously we are never in the position that we have to guess about the reason when we have a performance issue on a system, or have to reproduce a problem before we can diagnose it. The data is always there to allow us to diagnose a problem the very first time it happens. Key to the success of the diagnosis is the concept of DB Time, focusing on the only thing that really matters in performance, time. It allows us to determine the amount of leverage that any one issue has in the overall performance of the database and also allows for some automation of the diagnostic process by ADDM.

AWR 的目标是拥有一个统一的存储库,数据来源于 Oracle 数据库操作中“始终开启”的数据采集功能。通过不断累积数据,我们则无需在系统出现性能问题时揣测原因,或者在诊断前必须将问题重现。 历史数据总能帮我们在故障发生的第一时间将原因确诊。诊断成功的关键是 DB Time 概念,关注的重点永远是性能问题的最主要因素 - 时间。 它能让我们确定任何一个因素在数据库整体性能上的比例,还能通过 ADDM 进行自动化诊断。


With each new version of the database there are new challenges and requirements for AWR. For Oracle Database 12c release 2 there were obviously major new features that had to be addressed in AWR, most prominently the pluggable multi-tenant databases, while continuing to enhance and improve for existing features, such Active Data Guard, capturing SQL Monitor data into AWR and incorporating data from ASH and ADDM into the AWR report, so we no longer need to ask for multiple files.

AWR 对每个新的数据库版本都有新的挑战和要求。对于 Oracle 12.2,最明显的主要新功能是必须在 AWR 中解决,最突出就是可插拔的多租户数据库,不断增强和改进了现有功能,如 Active Data Guard,将 SQL 监控数据捕获到 AWR 并整合来自 ASH 和 ADDM 的数据到 AWR 报告中,所以我们无需再要多个文件。

Q4:With the era of automation and intelligent operation coming, many people want to rescue DBA from repeat cumbersome work, so they have designed a variety of performance monitoring and optimization software. What do you think makes a good monitoring or optimization software,and with so many of these  software and tools out there, would you like to analyse the best application scenario for these tools/softwares, how to make the most of them ?

问题四:随着自动化和智能化运维的发展,很多人都想把DBA从繁冗重复的工作中解救出来,为此出现了很多各类的性能监控和优化工具,Oracle也有很多自带的性能工具,您能跟大家分析一下这些工具的最佳使用场景吗?如何才能更好地发挥这些工具的价值呢?

Graham:There certainly are many tools out there and the key tousing them effectively is make sure that you have correctly scoped the problem before using the right tool. In RWP we always take a top down approach, normally starting with AWR/ADDM to look at a database wide view. From there we will drill down into the issues found. If the issue comes down to SQL, then we will drill down into SQL Monitor reports to get the most detailed information about how and why a statement is consuming resources. Key data sources that we look at in SQL Monitor are the Estimated Rows and Actual Rows which will often lead us to understanding poor plans caused by cardinality mismatches. If we are trying to understand the flow of an application (often in the case where the developer has moved on) then we will use SQL Trace to give a complete database side view of the application. SQL Trace can also give us lots of details about how much time an application is spending outside the database and lead to a ‘not a database problem’ diagnosis where the majority of the time is outside the database. Other scenarios may require us to look at other data. For example diagnosing locking issues we would look at ASH data to identify the activity (if any) of the lock holder and waiters. We can do this easily through the Enterprise Manager Top Activity screen or directly by writing SQL against the ASH data in AWR.

工具肯定是有很多的,有效使用的关键是确保在使用之前,你已经正确地定位了问题。在 RWP 中,我们始终采取自上而下的方法,通常从 AWR / ADDM 开始全面检查数据库,从而可以深入了解发现的根源。如果问题归结于 SQL,那么我们将深入了解 SQL 监控报告,以获取如何以及为何某一语句消耗资源的相关详细信息。我们在 SQL 监控中看到的关键数据源是估计行和实际行,这些通常会使我们了解基数不匹配导致的不良计划。如果我们试图了解应用程序的流程(通常在开发人员已经移动的情况下),那么我们将使用 SQL 跟踪来提供应用程序的完整数据库端视图。 SQL 跟踪还可以提供关于应用程序在数据库之外花费时间的细节,并给出“非数据库问题”的诊断。其他情况可能要求我们查看其他数据,例如诊断锁定问题,可以查看 ASH 数据,以确定锁的持有和等待活动(如果有的话)。可以通过企业管理器顶部活动屏幕轻松完成,也可以直接通过在 AWR 中针对 ASH 数据编写 SQL。

Q5: In the past few years, the traditional industry has been under huge impact of Internet,in fact they need to changes their IT architecture to adapt the trend ,so how do you think the IT architecture ofinternet or traditional industry would change in like 20 or more years?

问题五:在最近这几年,互联网给传统行业带来了巨大的影响和改变, 很多传统行业做了IT系统架构的改变去迎接潮流,在您看来,今后比如说十年或者20年的时间里,传统行业的IT架构将会有什么样的发展趋势呢?

Graham:20 years! I don’t think anyone has a crystal ball that good! I can certainly say that in that time, applications will come and go, but data will always be there and processing that data will always be major way that companies can add value. The way that users interact with systems will change. When I started with Oracle users interacted with green screens connected to main frames, now a high proportion of interactions are through mobile applications, and I’m sure that in 20 years time many of the interactions will be through voice commands as we are seeing with Apple’s Siri and Amazon’s Alexa (a voice activated assistant that links to home automation tools) But underlying all of these interactions will still be data and solid, rigorous SQL databases will still be needed to manage that data.

20年!我觉得技术发展这么快,20年是很难预测的! 但我可以肯定,在那个时候,应用程序会发生很大的变化,但是数据的核心地位却是不变的。数据治理将永远是企业增值的主要方式。 用户与系统交互的方式将会改变。 当我开始通过绿色屏幕开始与 Oracle 用户进行交互时,绝大部分是通过移动应用程序来进行;而且我相信20年后,许多互动将通过语音命令进行,就像与 Apple Siri 和 Amazon Alexa (一个连接到家庭自动化工具的语音激活助手,)但是,所有这些交互的基础仍然是数据,而且仍然需要严格的 SQL 数据库来管理数据。

 

Q6:During your 30 years database performance tuning experience, what’sthe most reason of the performance issues you’ve encountered result from, whatis the ideal system like and how to build a perfect database system?

问题六:在您30多年的性能优化的经验里,您遇到的最多的导致性能问题的原因是什么呢?一个性能良好的完美系统应该是什么样的呢?如何才能搭建这样一套系统呢?

Graham:Well over 30 years all of the hardware related issues have changed completely. Moore’sLaw has continued and more recently we have seen a big change from spinning disk to solid state technology for permanent storage. But all the way through the single most common reason for performance issues has been suboptimal design and build of applications. A well designed database and interfaces makes it much simpler build a well performing application, and a poorly designed database and interface make it impossible.

30多年来,所有硬件相关问题都已经完全改变。 摩尔定律还在持续中,最近我们已经看到从磁盘到固态存储的巨大变化。 但自始至终,性能问题的最常见原因,一直都是是次优设计和应用程序的构建。精心设计的数据库和接口使构建良好的应用程序变得简单很多,反之则成为不可能。


As far as an ideal system, it is really all about balance. A balanced system is one that can fully utilize all hardware resources, a system that has enough IO bandwidth to be able to keep the CPUs busy or has enough memory that you don’t need to do much IO. This has been a big part of the reason for the success of the Oracle engineered systems such as Exadata. But still, having a powerful, well balanced system can only go so far to help an application that is poorly designed whereas it can give exceptional performance with a well designed one.

就一个理想的系统来说,平衡才是核心。 一个平衡的系统能够充分利用所有硬件资源,并且具有足够的IO带宽使CPU持续工作,此外还拥有足够的内存,而无需对IO做过多处理。 这已经成为Oracle工程系统成功的主要原因,例如 Exadata。 但是,拥有一个功能强大并且平衡的系统,到目前为止,对一个设计不善的应用程序的帮助微乎其微,而对一个精心设计的应用程序则可以提供卓越的性能改善。

 

Q7:There is a new feature in Oracle 12.2 that is about ADG&AWR, describe as following:

(The Oracle Diagnostics Pack can be used with an Active Data Guard standby database that is open for read-only access. This enables you to capture the performance data tothe Automatic Workload Repository (AWR) for an Active Data Guard standby database and to run Automatic Database Diagnostic Monitor (ADDM) analysis on the AWR data. This feature enables performance tuning for read-only workloads executing on an Active Data Guard standby database.)


I assume you are familiar with that, we would like to know while using this new feature, is there anything we should pay attention to, does it have limit or restraints?

问题七:在Oracle 12.2中,有一个关于AWR的新特性,现在在ADG的standby库上也能够生成AWR报告,这样就可以避免对Primary的影响。

我想您对这个特性应该很熟悉,想请您介绍一下,在使用这个新特性的时候有什么要注意的问题吗?存在什么样的限制吗?

Graham:When ADG was first introduced in Oracle 11g it came with some limitations. As the ADG was real only it was not possible for it to capture AWR snapshots. The work around that was put in place was to modify the old Statspack code to allow capture of performance data from the ADG back into the primary database. With Oracle 12c release 2 the full set of AWR data is now available for ADG. The only limitation that I am aware of is that a database bounce is required for the transition from ADG to primary to complete. This istemporary limitation and we expect that the limitation will be removed in a patch set.

当 ADG首次在Oracle 11g中被引入时有一些限制。 由于 ADG 不可能捕获AWR 快照。 实施的解决方法是修改旧的 Statspack 代码,以便将 ADG 中的性能数据捕获回主数据库。 在 Oracle 12.2中,ADG 已经可用 AWR 的全套数据。而我知道的唯一限制是从 ADG 到开始到完成的过渡需要数据库反弹。 这个限制是暂时的,有望在补丁中被移除。


Q8: The PaaS is the least developed cloud service in the three, in past few years, many company lose interest for PaaS just because the flexibility of programming languages and infrastructure, so in future, how do you think this problem would be solved?  

问题八:PaaS 是三种云服务模式中发展最慢的一个,在过去的这些年,很多企业因为PaaS在编程语言和架构上的灵活性不够而放弃使用,您觉得以后这些问题会怎样解决呢?

Graham:PaaS is certainly the least developed compared to IaaS and SaaS. I think that the key to success in that market is ensuring the PaaS offerings can support the ‘flavor of the month’ development tools on top of solid software stack so that developers can quickly try things out. I think it has to change more quickly, be more ‘agile’ than the other two offerings.

与 IaaS 和 SaaS 相比,PaaS 肯定是发展最不成熟的。我认为,在市场上取得成功的关键是确保 PaaS 产品能够在坚实的软件堆栈之上支持当下最新的开发工具,以便开发人员能够快速尝试。我认为它必须要更加善变,比其他两个产品也要更加“敏捷”。


Q9:In your experience of database performance tuning over these years, what issue or what kind of issue is the most you’ve ever Encountered, for each kind of performance problem, does the portion change over these years, How and why, Would you like to share with us?

问题九:在您这么多年的性能优化经验里,遇到的哪一类问题比较多,这些性能问题的类型所占的比例,在这些年有什么样的变化呢 ?

Graham:Well, we certainly see less IO bound systems these days. It used to be the most common symptom seen in poorly performing systems but now that IO subsystem can deliver many GBs per second of bandwidth it is less frequently the issue. But once you really got to the root cause it always came down to schema design and the effective use of SQL. That certainly hasn’t changed over the years even though the languages and interfaces used by developers have changed completely.

现在我们看到更少的IO绑定系统。它曾经是在性能不佳的系统中最常见的症状,但现在IO子系统可以每秒提供许多GB的带宽,不太常见。 一旦深入到根本原因,总是会归结到架构设计和SQL的有效使用。很多年过去了,尽管开发人员使用的语言和界面已经完全改变,但这依然没变。

 

Q10:People who work on database especially oracle database area they admire your work a lot, some of them might be there on conference site 28 Mar, would you like to share your study methods to them, or any suggestions?

问题十:在28号的RWP会上,将会有很多RWP的粉丝到现场,大家对你们团队的工作充满了敬佩,您愿意跟大家分享一下您的学习方法吗,或者关于如何做好性能优化,您有什么建议给他们吗?

Graham:The approach that we take is top-down, looking at the entire systems (not just the database) and finding where the time is actually being spent, which is where there is the biggest leverage for improvement. Of course, that involves looking at data, rather than guessing and jumping to potential solutions, and thinking. As one of my colleagues in RWP often states it ‘Thinking is so important’.

我们采取的方法是自上而下的,查看整个系统(而不仅仅是数据库),并查明实际花费的时间,哪里有最大改进。当然,这涉及到查看数据,而不是猜测和跳转到潜在的解决方案和想法。就像我一位 RWP 的同事经常说的一样 “生命在于思考”。



来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/25373498/viewspace-2145870/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2011-04-11

  • 博文量
    85
  • 访问量
    606871