Memory footprint of the JVM(转至springteam)

The JVM can be a complex beast. Thankfully, much of that complexity is under the hood, and we as application developers and deployers often don’t have to worry about it too much. With the rise of container-based deployment strategies, one area of complexity that needs some attention is the JVM’s memory footprint.

Two kinds of memory

The JVM divides its memory into two main categories: heap memory and non-heap memory. Heap memory is the part with which people are typically the most familiar. It’s where objects that are created by the application are stored. They remain there until they are no longer referenced and are garbage collected. Typically, the amount of heap that an application is using will fluctuate as a function of the current load.

The JVM’s non-heap memory is divided into several different areas. We can use the HotSpot VM’s native memory tracking (NMT) to examine its memory usage across these areas. Note that, while NMT does not track all native memory usage (it does not track third party native code memory allocations, for example), it is sufficient for a large class of typical Spring applications. NMT can be used by starting the application with -XX:NativeMemoryTracking=summary and then using jcmd <pid> VM.native_memory summary to display the memory usage summary.

Let’s illustrate the use of NMT by looking at an application, in this case our old friend, Petclinic. The following pie chart shows the JVM’s memory usage as reported by NMT (minus its own overhead) when starting Petclinic with a 48MB max heap (-Xmx48M):

As you can see non-heap memory accounts for the vast majority of the JVM’s memory usage with the heap memory accounting for only one sixth of the total. In this case it was roughly 44MB (with 33MB of that being used immediately after garbage collection). The non-heap memory usage was 223MB in total.

Native Memory areas

  • Compressed class space: used to store information about the classes that have been loaded. Constrained by MaxMetaspaceSize. A function of the number of classes that have been loaded.
  • Thread: memory used by threads in the JVM. A function of the number of threads that are running.
  • Code cache: memory used by the JIT to store its output. A function of the number of classes that have been loaded. Constrained by ReservedCodeCacheSize. Can be reduced by tuning the JIT to, for example, disable tiered compilation.
  • GC: stores data used by the GC. Varies depending on which garbage collector is being used.
  • Symbol: stores symbols such as field names, method signatures, and interned strings. Excessive symbol memory usage can be an indicator that Strings have been interned too aggressively.
  • Internal: stores other internal data that does not fit into any of the other areas.


Compared to heap memory, non-heap memory is less likely to vary under load. Once an application has loaded all of the classes that it will use and the JIT is fully warmed up, things will settle into a steady state. To see a reduction in compressed class space usage, the class loader that loaded the classes needs to be garbage collected. This was more common in the past when applications were deployed to servlet containers or app servers – the application’s class loader would be garbage collected when the application was undeployed – but rarely happens with modern approaches to application deployment.

Sizing the JVM

Configuring the JVM to make efficient use of a given amount of available RAM isn’t easy. If you launch the JVM with -Xmx16M and expect it to use, at most, 16MB of RAM you are in for a nasty surprise.

An interesting area when sizing the JVM is the JIT’s code cache. By default, the HotSpot JVM will use up to 240MB. If the code cache is too small the JIT will run out of space to store its output and performance will suffer as a result. If the cache is too large, memory may be wasted. When sizing the code cache, it’s important to look at the effect on both your application’s memory usage and its performance.

When running in a Docker container, recent versions of Java are now aware of the container’s memory limits and attempt to size the JVM accordingly. Unfortunately, this sizing often over-allocates non-heap memory and under-allocates the heap. Say you have an application running in a container with 2 CPUs and 512MB of memory available. You want it to be able to handle more load so you double the CPUs to 4 and the memory to 1GB. As we discussed above, heap usage typically varies depending on the load, and non-heap usage much less so. Therefore, we’d like the vast majority of the extra 512MB of memory to be given to the heap to cope with the increased load. Unfortunately, the JVM does not do so by default and will allocate the additional memory more equally between its heap and non-heap areas.

Thankfully, the CloudFoundry team have a wealth of knowledge about the JVM’s memory footprint. If you’re pushing apps to CloudFoundry, the build pack will automatically apply this knowledge for you. If you’re not using CloudFoudry, or you’d like to understand more about how to size your JVM, the design document for version three of the Java buildpack’s memory calculator provides some highly recommended further reading.


Virtual memory used by a Java process extends far beyond just Java Heap. You know, JVM includes many subsytems: Garbage Collector, Class Loading, JIT compilers etc., and all these subsystems require certain amount of RAM to function.

JVM is not the only consumer of RAM. Native libraries (including standard Java Class Library) may also allocate native memory. And this won’t be even visible to Native Memory Tracking. Java application itself can also use off-heap memory by means of direct ByteBuffers.

So what takes memory in a Java process?

JVM parts (mostly shown by Native Memory Tracking)

  1. Java HeapThe most obvious part. This is where Java objects live. Heap takes up to -Xmx amount of memory.
  2. Garbage CollectorGC structures and algorithms require additional memory for heap management. These structures are Mark Bitmap, Mark Stack (for traversing object graph), Remembered Sets (for recording inter-region references) and others. Some of them are directly tunable, e.g. -XX:MarkStackSizeMax, others depend on heap layout, e.g. the larger are G1 regions (-XX:G1HeapRegionSize), the smaller are remembered sets.GC memory overhead varies between GC algorithms. -XX:+UseSerialGC and -XX:+UseShenandoahGC have the smallest overhead. G1 or CMS may easily use around 10% of total heap size.
  3. Code CacheContains dynamically generated code: JIT-compiled methods, interpreter and run-time stubs. Its size is limited by -XX:ReservedCodeCacheSize (240M by default). Turn off -XX:-TieredCompilation to reduce the amount of compiled code and thus the Code Cache usage.
  4. CompilerJIT compiler itself also requires memory to do its job. This can be reduced again by switching off Tiered Compilation or by reducing the number of compiler threads: -XX:CICompilerCount.
  5. Class loadingClass metadata (method bytecodes, symbols, constant pools, annotations etc.) is stored in off-heap area called Metaspace. The more classes are loaded – the more metaspace is used. Total usage can be limited by -XX:MaxMetaspaceSize (unlimited by default) and -XX:CompressedClassSpaceSize (1G by default).
  6. Symbol tablesTwo main hashtables of the JVM: the Symbol table contains names, signatures, identifiers etc. and the String table contains references to interned strings. If Native Memory Tracking indicates significant memory usage by a String table, it probably means the application excessively calls String.intern.
  7. ThreadsThread stacks are also responsible for taking RAM. The stack size is controlled by -Xss. The default is 1M per thread, but fortunately the things are not so bad. OS allocates memory pages lazily, i.e. on the first use, so the actual memory usage will be much lower (typically 80-200 KB per thread stack). I wrote a script to estimate how much of RSS belongs to Java thread stacks.There are other JVM parts that allocate native memory, but they do not usually play a big role in total memory consumption.

Direct buffers

An application may explicitly request off-heap memory by calling ByteBuffer.allocateDirect. The default off-heap limit is equal to -Xmx, but it can be overridden with -XX:MaxDirectMemorySize. Direct ByteBuffers are included in Other section of NMT output (or Internal before JDK 11).

The amount of used direct memory is visible through JMX, e.g. in JConsole or Java Mission Control:

BufferPool MBean

Besides direct ByteBuffers there can be MappedByteBuffers – the files mapped to virtual memory of a process. NMT does not track them, however, MappedByteBuffers can also take physical memory. And there is no a simple way to limit how much they can take. You can just see the actual usage by looking at process memory map: pmap -x <pid>

Address           Kbytes    RSS    Dirty Mode  Mapping
00007f2b3e557000   39592   32956       0 r--s- some-file-17405-Index.db
00007f2b40c01000   39600   33092       0 r--s- some-file-17404-Index.db
                           ^^^^^               ^^^^^^^^^^^^^^^^^^^^^^^^

Native libraries

JNI code loaded by System.loadLibrary can allocate as much off-heap memory as it wants with no control from JVM side. This also concerns standard Java Class Library. In particular, unclosed Java resources may become a source of native memory leak. Typical examples are ZipInputStream or DirectoryStream.

JVMTI agents, in particular, jdwp debugging agent – can also cause excessive memory consumption.

This answer describes how to profile native memory allocations with async-profiler.

Allocator issues

A process typically requests native memory either directly from OS (by mmap system call) or by using malloc – standard libc allocator. In turn, malloc requests big chunks of memory from OS using mmap, and then manages these chunks according to its own allocation algorithm. The problem is – this algorithm can lead to fragmentation and excessive virtual memory usage.

jemalloc, an alternative allocator, often appears smarter than regular libc malloc, so switching to jemalloc may result in a smaller footprint for free.


There is no guaranteed way to estimate full memory usage of a Java process, because there are too many factors to consider.

Total memory = Heap + Code Cache + Metaspace + Symbol tables +
               Other JVM structures + Thread stacks +
               Direct buffers + Mapped files +
               Native Libraries + Malloc overhead + ...

It is possible to shrink or limit certain memory areas (like Code Cache) by JVM flags, but many others are out of JVM control at all.

One possible approach to setting Docker limits would be to watch the actual memory usage in a “normal” state of the process. There are tools and techniques for investigating issues with Java memory consumption: Native Memory Trackingpmapjemallocasync-profiler.





WARN  |-o.e.j.server.handler.ErrorHandler:106  - Error page loop /WEB-INF/jsp/common/404.jsp


//WebAppContext.getResource(String path):
_baseResource + jsp.prefix + jsp.fileName + jsp.suffix
其中_baseResource 的路径查找步骤为:
getCommonDocumentRoot();//判断当前运行目录下是否存在src/main/webapp | public | static 这3个目录中的一个,如果存在,则将当前目录设置为_baseResource

在IDEA的Run/Debug Configurations中,将Working directory设置为要运行的模块目录绝对路径,例如/opt/myProject/ops。


Caused by: freemarker.template.TemplateModelException: Error while loading tag library for URI “/my-taglib” from TLD location “servletContext:/my-taglib”; see cause exception.

freemarker查找taglib时,默认使用TaglibFactory.DEFAULT_META_INF_TLD_SOURCES  = Collections.singletonList(WebInfPerLibJarMetaInfTldSource.INSTANCE)
在IDEA下直接run SpringBoot,相关的依赖包并不会拷贝到/WEB-INF/lib/下面去,所以就查找不到这些jar包里面的tld文件了,需要让freemarker去classpath中的所有jar包里面去查找,解决方案:

public class IdeFreeMarkerConfigurer extends FreeMarkerConfigurer {

    public void afterPropertiesSet() throws IOException, TemplateException {

                new TaglibFactory.ClasspathMetaInfTldSource(Pattern.compile(".*\\.jar$", Pattern.DOTALL))));

在XML或者Java代码中,配置使用这个FreeMarkerConfigurer 就可以了






 1     1 ms    <1 毫秒   <1 毫秒
 2     5 ms     5 ms     6 ms
 3     6 ms     6 ms     6 ms
 4     7 ms     *        7 ms
 5     6 ms     5 ms     6 ms
 6     9 ms     9 ms     9 ms
 7    23 ms    22 ms    22 ms
 8     *        *        *     请求超时。
 9     *        *        *     请求超时。
10     *        *        *     请求超时。
11     *        *        *     请求超时。
12     *        *        *     请求超时。
13     *        *       58 ms
14    76 ms    53 ms    51 ms
15    95 ms    96 ms    96 ms []
16    50 ms    46 ms    46 ms []
17   100 ms    97 ms   100 ms []
18   106 ms    96 ms    94 ms []
19   106 ms   126 ms   106 ms []
20   130 ms   118 ms   117 ms []
21     *        *        *     请求超时。
22     *        *        *     请求超时。
23     *        *        *     请求超时。
24     *        *        *     请求超时。
25     *        *        *     请求超时。
26     *        *        *     请求超时。
27     *        *        *     请求超时。
28     *        *        *     请求超时。
29     *        *        *     请求超时。
30     *        *        *     请求超时。

初看丢包发生在日本,于是给Vultr提了一个ticket咨询,答复是他们收到很多类似的反馈,这个是ISP的问题,他们没办法处理,建议我更换一个服务器IP。但tracert的结果明明是日本丢包的,看起来很像是vultr把包扔了,随便找个日本的vultr IP再跟踪路由看看:

1     1 ms    <1 毫秒   <1 毫秒
  2    21 ms     6 ms     6 ms
  3     6 ms     8 ms     5 ms
  4     5 ms     4 ms     5 ms
  5     6 ms    18 ms     8 ms
  6    88 ms    33 ms    33 ms
  7    32 ms     *        *
  8     *        *       28 ms
  9    35 ms    36 ms    37 ms
 10    54 ms    61 ms     *
 11    67 ms    68 ms     *
 12     *        *        *     请求超时。
 13    61 ms    55 ms    64 ms
 14   129 ms   115 ms   124 ms []
 15    72 ms    70 ms    62 ms []
 16     *        *      133 ms []
 17   114 ms   109 ms   111 ms []
 18     *      128 ms     * []
 19   140 ms   143 ms   140 ms []
 20     *        *        *     请求超时。
 21   119 ms   117 ms   117 ms []




接下来只能更换vultr的主机IP了,先给要更换IP的主机创建一个Snapshot,大概耗时半小时左右,然后Destroy掉主机,销毁前建议找个地方记录一下该主机的root密码,因为从Snapshot恢复后,root密码再也找不回来了,显示Unknown, set in snapshot.只能很麻烦的重置密码。主机也可以先不Destroy,等新的主机正常运行后再删掉老的主机。Deploy new server ->  Server Type -> Snapshot,选择刚刚创建的快照,十来分钟后,服务恢复完成,IP更换成功,检查新IP是否被墙,如果新IP被墙了,继续销毁后新建,直到分配到可用的IP。每次更换IP只需要0.01刀,还是很划算的,只是耗时有点长,能否分到可用的IP还得看运气,因为vultr的日本机房支持支付宝付款后已经成了重灾区了,很多IP都被GFW盯上了。






public class Singleton {  
    private static Singleton = new Singleton();
    private Singleton() {}
    public static getSignleton(){
        return singleton;




public class Singleton {
    private static Singleton singleton = null;
    private Singleton(){}
    public static Singleton getSingleton() {
        if(singleton == null) singleton = new Singleton();
        return singleton;



public class Singleton {
    private static volatile Singleton singleton = null;
    private Singleton(){}
    public static Singleton getSingleton(){
        synchronized (Singleton.class){
            if(singleton == null){
                singleton = new Singleton();
        return singleton;



public class Singleton {
    private static volatile Singleton singleton = null;
    private Singleton(){}
    public static Singleton getSingleton(){
        if(singleton == null){
            synchronized (Singleton.class){
                if(singleton == null){
                    singleton = new Singleton();
        return singleton;


那么,这种写法是不是绝对安全呢?前面说了,从语义角度来看,并没有什么问题。但是其实还是有坑。说这个坑之前我们要先来看看volatile这个关键字。其实这个关键字有两层语义。第一层语义相信大家都比较熟悉,就是可见性。可见性指的是在一个线程中对该变量的修改会马上由工作内存(Work Memory)写回主内存(Main Memory),所以会马上反应在其它线程的读取操作中。顺便一提,工作内存和主内存可以近似理解为实际电脑中的高速缓存和主存,工作内存是线程独享的,主存是线程共享的。volatile的第二层语义是禁止指令重排序优化。大家知道我们写的代码(尤其是多线程代码),由于编译器优化,在实际执行的时候可能与我们编写的顺序不同。编译器只保证程序执行结果与源代码相同,却不保证实际指令的顺序与源代码相同。这在单线程看起来没什么问题,然而一旦引入多线程,这种乱序就可能导致严重问题。volatile关键字就可以从语义上解决这个问题。


  1. 线程A发现变量没有被初始化, 然后它获取锁并开始变量的初始化。
  2. 由于某些编程语言的语义,编译器生成的代码允许在线程A执行完变量的初始化之前,更新变量并将其指向部分初始化的对象。
  3. 线程B发现共享变量已经被初始化,并返回变量。由于线程B确信变量已被初始化,它没有获取锁。如果在A完成初始化之前共享变量对B可见(这是由于A没有完成初始化或者因为一些初始化的值还没有穿过B使用的内存(缓存一致性)),程序很可能会崩溃。

Symantec JIT 编译 singletons[i].reference = new Singleton(); 这段代码时,如果不加volatile关键词,会生成如下字节码:

0206106A   mov         eax,0F97E78h
0206106F   call        01F6B210                  ; allocate space for
                                                 ; Singleton, return result in eax
02061074   mov         dword ptr [ebp],eax       ; EBP is &singletons[i].reference 
                                                ; store the unconstructed object here.
02061077   mov         ecx,dword ptr [eax]       ; dereference the handle to
                                                 ; get the raw pointer
02061079   mov         dword ptr [ecx],100h      ; Next 4 lines are
0206107F   mov         dword ptr [ecx+4],200h    ; Singleton's inlined constructor
02061086   mov         dword ptr [ecx+8],400h
0206108D   mov         dword ptr [ecx+0Ch],0F84030h





public class Singleton {
    private static class Holder {
        private static Singleton singleton = new Singleton();
    private Singleton(){}
    public static Singleton getSingleton(){
        return Holder.singleton;


  • 都需要额外的工作(Serializable、transient、readResolve())来实现序列化,否则每次反序列化一个序列化的对象实例时都会创建一个新的实例。
  • 可能会有人使用反射强行调用我们的私有构造器(如果要避免这种情况,可以修改构造器,让它在创建第二个实例的时候抛异常)。



public enum Singleton {
    private String name;
    public String getName(){
        return name;
    public void setName(String name){ = name;

使用枚举除了线程安全和防止反射强行调用构造器之外,还提供了自动序列化机制,防止反序列化的时候创建新的对象。因此,Effective Java推荐尽可能地使用枚举来实现单例。


比如枚举,虽然Effective Java中推荐使用,但是在Android平台上却是不被推荐的。在这篇Android Training中明确指出:

Enums often require more than twice as much memory as static constants. You should strictly avoid using enums on Android.



  • 线程安全
  • 延迟加载
  • 序列化与反序列化安全


《Effective Java(第二版)》
The “Double-Checked Locking is Broken” Declaration