The difference between a++ and ++a from assembly code

以下是科技公司面試常考的考題，主要是測驗你對於 a++ 和 ++a 的認知。

    int a=9527;
    a=a++;
    printf("a=%d\n", a);

請問上面這段程式碼會印出什麼？

我們首先想要了解 a++ 和 ++a 的差別，就要去看它們所產生出來的 assembly code。

先介紹一個可以線上產生 assembly code 的網站，可以幫助我們快速深刻的了解 C。 https://gcc.godbolt.org/

a++

來考慮下面的程式碼：

int test() {
    int a=9527;
    int b = a++;
}

丟到網站上就產生下面的 assembly code

test:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 9527
        mov     eax, DWORD PTR [rbp-4]
        lea     edx, [rax+1]
        mov     DWORD PTR [rbp-4], edx
        mov     DWORD PTR [rbp-8], eax
        nop
        pop     rbp
        ret

如果我們只關注在 int b=a++：

        mov     eax, DWORD PTR [rbp-4]
        lea     edx, [rax+1]
        mov     DWORD PTR [rbp-4], edx
        mov     DWORD PTR [rbp-8], eax

第一行 DWORD PTR [rbp-4] 代表變數 a 的位址，所以翻譯成中文就是把 a 的位址複製到暫存器 eax。

在講解第二行之前必須先瞭解 eax 和 rax 的關係： rax, eax, ax, ah, al

以及中括號(brackets)在 assembly language 的意義： get the value of the address

因此第二行的翻譯就是將 a 加一後的值存入暫存器 edx，注意這邊並不更改 a 在 memory 的值。
第三行的翻譯就是將儲存在 edx 的值(a+1)存入 a memory 的所在位址。
第四行的翻譯就是將儲存在 eax 的值(a)存入 b memory 的所在位址。

因此最後 b=9527 然後 a=9528。

++a

接下來考慮下面的程式碼：

int test() {
    int a=9527;
    int b = ++a;
}

丟到網站上就產生下面的 assembly code

test:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 9527
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        mov     DWORD PTR [rbp-8], eax
        nop
        pop     rbp
        ret

如果我們只關注在 int b=++a：

        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        mov     DWORD PTR [rbp-8], eax

第一行直接使用 add 指令將 1 加到 a memory 所在位址。
第二行將 a memory 的值(a+1)存入 eax。
第三行將儲存在 eax 的值(a+1)存入 b memory 的所在位址。

有觀察到差異了嗎？

C 對於 statement 的運算必須經由暫存器，因此關鍵在於暫存器所存的值是 a 還是 a+1，不論是a++ 或是 ++a，a 在 memory 的值都會先加一。

a=a++

考慮以下程式碼：

int test() {
    int a=9527;
    a = a++;
}

丟到網站上就產生下面的 assembly code

test:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 9527
        mov     eax, DWORD PTR [rbp-4]
        lea     edx, [rax+1]
        mov     DWORD PTR [rbp-4], edx
        mov     DWORD PTR [rbp-4], eax
        nop
        pop     rbp
        ret

如果我們只關注在 a=a++：

        mov     eax, DWORD PTR [rbp-4]
        lea     edx, [rax+1]
        mov     DWORD PTR [rbp-4], edx
        mov     DWORD PTR [rbp-4], eax

根據上面的討論，這四行 assembly code 所代表的意義就是：將 a 的值複製到 eax，然後利用暫存器 edx將 a memory 所在的位址的值加一，最後將eax(a) 存回 a memory 所在的位址，因此a並不會更改它的值。

Conclusion

    int a=9527;
    a=a++;
    printf("a=%d\n", a);

會印出a=9527。

Mistakes

這題其實是有爭議的，因為不論是a++還是++a，之前的討論都建立在「a在 memory 中的值都會先加一」，差別在於暫存器的值是 a+1 或是 a。但其實a在memory中的值什麼時候更新(+1)，是取決於 compiler 的實作，因此a=a++其實是「undefined behavior」。

The difference between a++ and ++a from assembly code

a++

++a

a=a++

Conclusion

Mistakes

Reference

Comments

a++

++a

a=a++

Conclusion

Mistakes

Reference

Comments

Related Posts

Containers in C++ STL (三) 22 Nov 2019

Containers in C++ STL (二) 17 Nov 2019

Containers in C++ STL (一) 15 Nov 2019