Learn by Making Notes.
Share by Posting Notes.

Fri, Sep 25, 2015

I bought my first Gundam model last week and assembled it very perfectly :) within near two hours.

Gundam
Gundam
Gundam

Thu, Sep 24, 2015

Days ago, I took an online interview from Microsoft Shanghai. After some quick questions, I was asked to solve a simple question, sorting a linked list, in a online editor, whose content was shared between me and the interviewer in real-time.

Solutions

Below are three solutions. Source is here.

Solution 1

Iterate from the head of list to the end. For an element, if it’s larger than the next one, swap them. Therefore, after the first pass, the largest one goes to the last position of the list. After the second pass, the second largest one gets to the second last position. Keep iterating until the list is sorted.

Obviously, the complexity is O(n^2).

#include "gtest/gtest.h"
#include <iostream>

struct Node{
    int value;
    Node *next;
};
typedef struct Node * NodePtr;

bool isSorted(NodePtr head);
void swap(NodePtr *prev, NodePtr p);

NodePtr sort_1(NodePtr head) {
    if (head == NULL) {
        return head;
    }
    
    while (!isSorted(head)) {
        NodePtr ptr = head;
        NodePtr *prev = &head;
        while (ptr->next != NULL) {
            if (ptr->value > ptr->next->value) {
                swap(prev, ptr);
            }
            ptr = (*prev)->next;
            prev = &((**prev).next); // hate this...
        }
    }
    return head;
}

bool isSorted(NodePtr head) {
    if (head == NULL) {
        return true;
    }
    
    NodePtr ptr = head;
    while (ptr->next != NULL) {
        if (ptr->value > ptr->next->value) {
            return false;
        }
        ptr = ptr->next;
    }
    return true;
}

/*
 *      -> [ ] -> []
 *    /     |
 * prev     p
 */
void swap(NodePtr *prev, NodePtr p) {
    *prev = p->next;
    NodePtr tmp = p->next->next;
    p->next->next = p;
    p->next = tmp;
}

Solution 2

Solution 2 solves the problem recursively. The problem of size N can be solved by solving problem of size N-1 firstly and then inserting the first element into the sorted list.

The code is cleaner, and the complexity is also O(n^2).

NodePtr insert(NodePtr list, NodePtr node);

NodePtr sort_2(NodePtr node) {
    NodePtr sub = NULL;
    if (node->next != NULL) {
        sub = sort_2(node->next);
    }
    
    return insert(sub, node);
}

NodePtr insert(NodePtr list, NodePtr node) {
    if (list == NULL) {
        return node;
    }
    
    // find the position to insert
    NodePtr ptr = list;
    NodePtr prev = NULL;
    while (ptr != NULL && ptr->value < node->value) {
        prev = ptr;
        ptr = ptr->next;
    }
    
    NodePtr newHead = list;
    if (prev == NULL) {
        node->next = ptr;
        newHead = node;
    } else {
        prev->next = node;
        node->next = ptr;
    }

    return newHead;
}

Solution 3

This solution does it in a “divide-and-conquer” way. It divides the problem into small ones, sorts them, and merges small sorted list into larger ones till the whole list is sorted.

The inner loop takes O(n) time. The outer loop runs O(lgn) times. So this soluton is O(nlgn).

阅读更多…

Mon, Sep 07, 2015

基本概念

Shell启动时,会从HISTFILE(默认是~/.bash_history)里读取命令的历史记录。如果设置了HISTFILESIZE变量,那么只会读取HISTFILESIZE条历史记录。 Shell退出时,这个shell的最后HISTSIZE条历史记录会写到HISTFILE里。如果shell打开了histappend选项,会以append的方式回写HISTFILE文件, 否则回写时会覆盖HISTFILE。如果保存时历史记录超过了HISTFILESIZE,那么HISTFILE会被截断(truncated)。如果不想被截断,可以考虑unset HISTFILESIZE,或者把它设成null、非数字的值或者负数值。

如果不想保存历史信息,可以unset shell的history option,(set +o history),或者unset HISTFILE变量。

在图形界面下,简单地点击shell窗口的关闭按钮可能不会保存这个shell的历史记录。通过执行exit命令来退出shell。

可以通过CTRL-pCTRL-n来上下浏览历史记录,通过CTRL-r反向搜索历史记录。

Environment Variables

HISTFILE,历史记录所在的文件。默认是~/.bash_history。不想保存历史信息时,可以unset这个变量。

HISTFILESIZE,设置HISTFILE文件里可以保存多少条记录,例如export HISTFILESIZE=10000

HISTSIZE,设置一个shell session可以存多少条历史记录。

HISTIGNORE,保存历史时忽略某些命令。例如export HISTIGNORE="&:[ ]*:exit",会忽略三种类型的历史记录;分别是: &,和上一条历史记录相同的记录;[ ]*,以空格开头的历史记录;exit,忽略exit命令。Pattern之间用:分隔。

HISTCONTROL,和HISTIGNORE类似,也是用来忽略某些历史记录,见man bash

HISTCMD,下一个命令在历史记录中的index。假设当前HISTCMD的值是123,然后执行ls,再执行history,会看到:

$ history
 ...
 122 ...
 123 ls

HISTTIMEFORMAT,可以在history的输出中显示历史记录的时间戳。HISTTIMEFORMAT字符串的可用格式见man strftime

Bash Options

history
可以控制是否保留历史记录。打开这个选项,set -o history;关闭这个选项,set +o history

histappend
打开后,写HISTFILE时会用append的方式;否则,使用覆盖的方式。shopt -s histappend,打开;shopt -u histappend,关闭。 append的时候只会把这个shell启动后新增的历史记录写到HISTFILE里。如果histappend选项处于关闭状态(默认行为),那么HISTFILE只会保留 最后关闭的那个shell的历史(其它shell的历史都被覆盖了)。

histreedit
打开这个option后,如果替换(例如^xxx^yyy^)失败,会失败的替换重新输出到shell的输入行上。

histverify
打开后,在做history expansion时会不立即执行它,而是把expansion的结果输出到shell上,让用户有机会在执行前修改它。 比如,打开这个option后,输入!!不会立即执行上一个命令,而是把上一个命令打印到shell的输入行上。

(注:可以看到一些选项用set来设置,另外一些用shopt来设置。这两个都是bash的bulitin command,它们所控制选项有所不同。详见man bash。)

history命令

Bash提供了一个builtin command,history,来操作历史记录。history命令有下面几种执行格式,

  history [n]
  history -c
  history -d offset
  history [-anrw] [filename]
  history -ps arg

history的一些例子,

$ history   # 列出所有历史记录
$ history 3 # 列出最近3条历史记录

$ history -c   # 清除当前shell的历史记录,不会影响HISTFILE
$ history -d 3 # 删除编号为3的记录
$ history -w   # 用当前shell的历史来覆盖HISTFILE
$ history -a   # 把当前shell启动后产生的新的历史记录append到HISTFILE里
$ history -r   # 读取HISTFILE中的历史记录,把它们append到当前shell的历史记录中
$ history -n   # 见man bash
$ history -p a b c 'd e'  # 会按行输出"a","b","c","d e" 
$ history -s "cmd arg1"   # 把"cmd arg1"保存到当前shell的历史中;不执行命令,就可以保留历史记录

在不退出任何一个shell的情况下,可以通过-a-r在shell间分享历史记录。例如,在shell A里执行history -a,然后在shell B里执行history -r,这样shell B就能看见shell A的历史了。

使用-w, -r, -a, -n选项时默认是操作HISTFILE文件,也可以在指定别的文件。

History Expansion

通过history expansion,可以重新执行之前执行过的命令或者提取之前执行过的命令的参数。 History Expansion有两种类型,event Designators以及word Designators。 Event designators用来重新执行命令,而word designators用来提取历史命令的参数。

阅读更多…

Fri, Aug 14, 2015

Share a weird css issue I met today, that isthere are always some space between inline-block div elements of a line. For these div elements, margin are confirmed to be 0. See how it looks in the top part of below snapshot.

space between

And below is corresponding code snippet,

<div style="border:red 1px solid;">
    <div style="width: 100px;display:inline-block; border:green 1px solid;">div 1</div>
    <div style="width: 150px;display:inline-block; border:green 1px solid;">div 2</div>
    <div style="width: 100px;display:inline-block; border:green 1px solid;">div 3</div>
</div>

After some time on googling, the root cause was located; It’s the whitespace between inline-block div elements. In the above code, whitespace is newline character.

Two solutions can be used to fix this issue. One is eliminating whitespace, like

<div style="border:red 1px solid;">
    <div style="width: 100px;display:inline-block; border:green 1px solid;">div 1
    </div><div style="width: 150px;display:inline-block; border:green 1px solid;">div 2
    </div><!-- no space here --><div style="width: 100px;display:inline-block; border:green 1px solid;">div 3
</div></div>

The other is to use display:table-cell;, which may be better than former method, since it does not hurt your beautiful format.

<div style="border:red 1px solid;">
    <div style="width: 100px;display:table-cell; border:green 1px solid;">div 1</div>
    <div style="width: 150px;display:table-cell; border:green 1px solid;">div 2</div>
    <div style="width: 100px;display:table-cell; border:green 1px solid;">div 3</div>
</div>

Reference here. Code for above snapshot here.

Tue, Aug 11, 2015

In my daily work, JPA is used in a Java EE environment/container. Here is a simple example using JPA in a Java SE environment, in which it’s more easily to demonstrate usage of JPA. Source code of this example is available on github.

This example is managed by Maven, and tries to involve as little dependencies as possible. It uses Hibernate as JPA vendor. And Mysql is taken as the database server. By passing some options, this example also helps create corresponding schema in database.

pom.xml

hibernate-core is not enough to use Hibernate as JPA provider. hibernate-entitymanager should be used instead. For JPA dependency, here it use artifact javax.persistence from group org.eclipse.persistence, not artifact persistence-api from group javax.persistence, which is too old to use.

And logback dependency is added for logging purpose.

<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.hong</groupId>
  <artifactId>plain-jpa-app</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <dependencies>
    <!-- use hibernate-entitymanager, if you are using hibernate simply as jpa provider -->
    <!--  <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-core</artifactId>
      <version>4.3.10.Final</version>
    </dependency> -->
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-entitymanager</artifactId>
      <version>4.3.10.Final</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.persistence</groupId>
      <artifactId>javax.persistence</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.8.1</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.36</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.1.3</version>
    </dependency>
  </dependencies>
</project>

Entities

There are four entities in this example, Customer, Product, OrderLine, and Order. Only list code for OrderLine and Order below. For other, please check them on github.

package hello;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

import javax.persistence.*;

import org.joda.time.DateTime;

@Entity
// @Table can be omitted, if defaults are ok. however, specify 'name' of @Table
// is better, so that the mapping between entity and table wouldn't break 
// if we refactor Order class name later on
@Table(name = "SALES_ORDER")
public class Order {
    @Id
    @SequenceGenerator(name = "ORDER_SEQ", sequenceName = "ORDER_SEQ", allocationSize = 1)
    @GeneratedValue(generator = "ORDER_SEQ", strategy = GenerationType.AUTO)
    private Long id;
    
    // do 'optional = false' in @ManyToOne and 'nullable = false' in @JoinColumn
    // have the same effect?
    @ManyToOne(optional = false)
    @JoinColumn(name = "CUSTOMER_ID", nullable = false)
    private Customer customer;
    
    @Column(name = "AMOUNT")
    private BigDecimal amount;
    
    /*
     * https://docs.oracle.com/javaee/6/tutorial/doc/bnbqa.html
     * for 'date', jpa supports java.util.Date, java.util.Calendar, java.sql.Date, 
     * java.sql.Time, java.sql.TimeStamp. 
     * 
     * http://www.joda.org/joda-time/
     * it's said joda time the best choice before java 1.8.
     * 
     * since the app uses java 1.7, and also it needs time for jpa to support java
     * 1.8's 'date class', so we use joda time (via a converter) in this app.
     */
    @Column(name = "CREATED_TIME")
    @Convert(converter = JodaTimeConverter.class)
    private DateTime createdTime;
    
    @OneToMany(mappedBy = "parent", cascade = CascadeType.ALL, fetch = FetchType.LAZY, orphanRemoval = true)
    private final List<OrderLine> lines = new ArrayList<OrderLine>();
    
    public Long getId() {
        return id;
    }
    
    public void setId(Long id) {
        this.id = id;
    }
    
    public Customer getCustomer() {
        return customer;
    }
    
    public void setCustomer(Customer customer) {
        this.customer = customer;
    }
    
    public BigDecimal getAmount() {
        return amount;
    }
    
    public void setAmount(BigDecimal amount) {
        this.amount = amount;
    }
    
    public DateTime getCreatedTime() {
        return createdTime;
    }
    
    public void setCreatedTime(DateTime createdTime) {
        this.createdTime = createdTime;
    }
    
    public List<OrderLine> getLines() {
        return lines;
    }
}
package hello;

import java.math.BigDecimal;

import javax.persistence.*;

@Entity
@Table(name = "ORDER_LINE")
public class OrderLine {
    @Id
    @SequenceGenerator(name = "ORDER_LINE_SEQ", sequenceName = "ORDER_LINE_SEQ", allocationSize = 1)
    @GeneratedValue(generator = "ORDER_LINE_SEQ", strategy = GenerationType.AUTO)
    private Long id;
    
    @ManyToOne(optional = false)
    @JoinColumn(name = "ORDER_ID", nullable = false, updatable = false)
    private Order parent;
    
    @ManyToOne
    @JoinColumn(name = "PRODUCT_ID", nullable = false)
    private Product product;
    
    @Column(name = "QUANTITY")
    private BigDecimal quantity;
    
    @Column(name = "PRICE")
    private BigDecimal price;
    
    public Long getId() {
        return id;
    }
    
    public void setId(Long id) {
        this.id = id;
    }
    
    public Order getParent() {
        return parent;
    }
    
    public void setParent(Order order) {
        parent = order;
    }
    
    public Product getProduct() {
        return product;
    }
    
    public void setProduct(Product product) {
        this.product = product;
    }
    
    public BigDecimal getQuantity() {
        return quantity;
    }
    
    public void setQuantity(BigDecimal quantity) {
        this.quantity = quantity;
    }
    
    public BigDecimal getPrice() {
        return price;
    }
    
    public void setPrice(BigDecimal price) {
        this.price = price;
    }
}

Type and Converter for Date Time

It’s said Joda time should be used as THE date time class prior to Java 8. Since this example is developed in Java 7, so I take this advice.

To Use Joda time in JPA, I add a converter for it.

阅读更多…

Fri, Jun 26, 2015

整理电脑时发现之前写的一篇关于JNI的文档,放在以供参考。

Our (old) product uses JNI(Java Native Interface) to call C++ code from Java side.

JNI Basic

Using JNI, you only declare a method in Java, while implement that method in C++ Code. Follow below steps, so that when you invoke that Java method, Java/JVM will help invoke corresponding C++ implementation.

  1. Declare method to be native in java

     public class ProxyService {
         // ...
         public native void execute(long cmdCode);
         //...
     }
    
  2. Generate header files for C++
    Java provides command to help generate header files.

     $ javac com/abc/def/ghi/jkl/xyzserver/*.java # compile first  
     $ javah -jni com.abc.def.ghi.jkl.xyzserver.ProxyService 
    

    It generates a header file called com\_abc\_def\_ghi\_jkl\_xyzserver\_ProxyService.h. You can find below lines there,

     JNIEXPORT void JNICALL Java_com_abc_def_ghi_jkl_xyzserver_ProxyService_execute
       (JNIEnv *, jobject, jlong);
    
  3. Implement that C++ function

     JNIEXPORT void JNICALL Java_com_abc_def_ghi_jkl_xyzserver_ProxyService_execute
       (JNIEnv * env, jobject instance, jlong commandCode)
     {
         // C++ code for your "native" java method
     }
    

    The above is a very simple example of JNI. Usually, you already have an existing C++ class, and just want to “wrap” the C++ class into a Java class. To do this, let Java object have a member of long type, which holds the address of a C++ object of your C++ class.

     public class ProxyService {
         long nativeInstanceAddr;
         public native void newCppObj();
         public native void delCppObj();
         public native void execute(long cmdCode);
     }
    

    In C++ code,

     JNIEXPORT void JNICALL Java_com_abc_def_ghi_jkl_xyzserver_ProxyService_newCppObj
       (JNIEnv * env, jobject instance)
     {
         jclass c = env->getobjectclass(obj);
         jfieldid field = env->getfieldid(c, "nativeInstanceAddr", "j");
         jlong address = env->getlongfield(obj, field);
         address = // new some c++ object
     }
    
     JNIEXPORT void JNICALL Java_com_abc_def_ghi_jkl_xyzserver_ProxyService_delCppObj
       (JNIEnv * env, jobject instance)
     {
         jlong address = // get the value of nativeInstanceAddr field like in newCppObj()
         delete (SomeCPPClass *)address;
     }
    
     JNIEXPORT void JNICALL Java_com_abc_def_ghi_jkl_xyzserver_ProxyService_execute
       (JNIEnv * env, jobject instance, jlong commandCode)
     {
         jlong address = // get the value of nativeInstanceAddr field like in newCppObj()
         (SomeCPPClass *)address->execute(commandCode);
     }
    
  4. Load C++ library in Java

     public class DLLLoader {
         public synchronized static void loadDll() {
             //...
             System.loadLibrary(libName);
         }
     }
    

When to Release C++ Object

Java objects are taken care by JVM. However, for the C++ objects which are created via JNI, the JVM doesn’t manage (garbage-collect) them automatically (them live in their “own heap”). You have to delete C++ objects manually.

For example, C++ objects can be released when a transaction is committed or rolled back, or when a HTTP request finishes.

public void commit() {
    try
    {
        //...
    }
    catch(Exception e)
    {
        //...
    }
    finally
    {
        // delete c++ objects here
        // for example, call ProxyService::delCppObj() here
    }
}

Thu, Jun 25, 2015

没事的时候翻了《Coders at Work》的几个章节。看看对大师们的采访还是挺有意思的。《Masterminds of Programming》 也是一本很不错的采访录,不过它的主题偏向于语言设计;相比之下,《Coders at Work》的采访主题更宽泛一点。

Joshua Bloch

Google的首席Java架构师,设计和实现了Java Collections Framework,等等等等……

Seibel: Are there any books that every programmer should read?

Bloch: … I still think everyone should read, is Design Patterns. It gives us a common vocabulary. There are a lot of good ideas in there.

Another is Elements of Style, which isn’t even a programming book. … for two reasons: The first is that a large part of every software engineer’s job is writing prose.

The Art of Computer Programming. In truth, I haven’t read the whole series or anything close to it. When I’m working on a particular algorithm, though, I go there to see what he has to say about it.

Another old one is Frederick Brooks’s The Mythical Man Month. It’s 40 years old and still as true today as when it was written. … Everyone should read that. The main message of the book is “adding people to a late software project makes it later,” and it’s still true.

These days, everybody has to learn about concurrency. So Java Concurrency in Practice is another good bet.

Merriam-Webster’s Collegiate Dictionary, 11th Edition. … It’s not something you actually read, but as I said, when you’re writing programs you need to be able to name your identifiers well.

Bloch推荐了《设计模式》、《Java并发编程实战》;也推荐了一本写作相关的书(《Elements of Style》,豆瓣上的评分是9.1/10) 和韦氏词典。优美、简洁的注释、文档和API需要良好的写作能力;同时一本词典可以让程序员在给类、变量、方法起名字时减少“词穷” 的尴尬。书里的很多被采访者都说没通读过《The Art of Computer Programming》,Bloch也是把它当成工具书和参考书;估计 Bloch在实现Java Collections Framework里的算法部分时翻了不少《The Art of Computer Programming》。

Bloch还推荐了《人月神话》。“adding people to a late software project makes it later” ,看来“C?O”或者每个有 power来分配资源和管理项目的人都应该好好读读。

Seibel: What about other tools?

Bloch: I’m not good with programming tools. I wish I were. The build and source-control tools change more than I would like, and it’s hard for me to keep up. So I bother my more tool-savvy colleagues each time I set up a new environment.

I’m not proud of this. Engineers have things that they’re good at and things they’re not so good at.

估计Bloch对maven、git之类的工具不熟、貌似也不想尝试精通他们,所以搞不定工具时就去问他的同事😅。对average people才 会要求每样都会点;如果你在某些点足够闪光,人们不会太关心那些你不擅长的东西。另外,就掌握一个工具而言,能满足daily usage 应该就可以了;时间总是一定的,可以把精通工具的时间花在“更有价值的”地方。

Seibel: What’s your short list of ones you want to play with more?

Bloch: I want to try Scala … I should also play with Python

又一次说明了不用样样精通。Be focus,be specialised.

Seibel: I was reading Java Puzzlers and Effective Java and it struck me that there are a lot of little weird corners for a language that started out so simple.

Bloch: There are weird corners, yes, but that’s just a fact of life; all languages have them. You haven’t seen a book called C Puzzlers. Why not?

TODO: 有空看看这两本书,观摩一下java的“weird corners”。

另外,Bloch在谈到他的编程风格、编程哲学时推荐了他自己的一个演讲, “How to Design a Good API and Why It Matters”; 在谈到测试时,推荐了Bentley的一篇论文, “Engineering a Sort Function”。 有空可以看看。

Ken Thompson

发明了Unix、C语言、UTF-8编码……

Seibel: You print out invariants; do you also use asserts that check invariants?

Thompson: Rarely. I convince myself it’s correct and then either comment out the prints or throw them away.

大师也是人,怎么方便怎么来。

Seibel: You were at AT&T with Bjarne Stroustrup. Were you involved at all in the development of C++?

Thompson: I’m gonna get in trouble.

Seibel: That’s fine.

Thompson: I would try out the language as it was being developed and make comments on it. It was part of the work atmosphere there. And you’d write something and then the next day it wouldn’t work because the language changed. It was very unstable for a very long period of time. At some point I said, no, no more.

In an interview I said exactly that, that I didn’t use it just because it wouldn’t stay still for two days in a row. When Stroustrup read the interview he came screaming into my room about how I was undermining him and what I said mattered and I said it was a bad language. I never said it was a bad language. On and on and on. Since then I kind of avoid that kind of stuff.

有人的地方就有江湖,all human live in social society, you cannot avoid it.

Seibel: And are there development tools that just make you happy to program?

Thompson: I love yacc. I just love yacc. It just does exactly what you want done. Its complement, Lex, is horrible. It does nothing you want done.

Seibel: Do you use it anyway or do you write your lexers by hand?

Thompson: I write my lexers by hand. Much easier.

待续

Tue, Jun 16, 2015

之前的一篇文章讲了一个缓冲区溢出的简单示例,这里再讲一下格式化字符串存在的安全性问题。

Variable Argument List

C语言的库函数printf(const char * format_str, ...)是我们很熟悉的一个函数,它接受一个格式化字符串,以及一组数目不定的其它参数。

printf()通常通过“可变长度参数列表”(variable argument list)来实现。“可变长度参数列表”的一个简单例子如下所示(详 见man 3 stdarg),

#include <stdarg.h> 
#include <stdio.h>
    
void printNDoubles(int n, ... ) {        /* 函数声明中的"..."表示可变长度参数 */
    va_list args;                        /* 对应的类型 */
    va_start(args, n);                   /* 找到可变长度参数列表的起始处 */

    int i = 0;
    for (; i < n; ++i) {
        double d = va_arg(args, double); /* 取可变长度参数列表中的下一个参数 */
        printf("%dth double is:%f\n", i, d);
    }
    va_end( args );                      /* 做一下清理工作 */
}

int main(int argc, char **argv) {
    printNDoubles(3, 1.1, 2.2, 3.3);

    return 0;
}

对于printf()而言,它会根据格式化字符串中的格式信息,比如%d%f等,利用va_arg()依次取出可变参数列表中的相应参数。

一般而言,va_startva_arg等会被实现成宏(macro);这篇文章给出了一个简单的实现,

typedef unsigned char *va_list;
#define va_start(list, param) (list = (((va_list)&param) + sizeof(param)))
#define va_arg(list, type)    (*(type *)((list += sizeof(type)) - sizeof(type)))

TODO: 看一下gcc/glibc中的实现)

va_start等的实现依赖于函数调用时参数入栈(stack)的一些惯例(convention)。函数调用发生时,caller传给callee的 参数是按顺序入栈的;也就是说,printNDoubles(3, 1.1, 2.2, 3.3)的参数"31.12.23.3在栈上是互相挨着 存放的。用va_start(args, n)找到可变长度参数列表在栈中的起始位置(就是紧挨着参数n的“下一个”位置)后,就可以 用va_arg来依次查看可变参数列表中的参数了。

下面看一下函数的局部变量和函数调用时的参数在栈上是怎么布局的。

函数的局部变量(local variables)

函数的局部变量保存在它自己的栈上;局部变量的声明顺序和它们在栈上的存储位置可以通过gdb反汇编方便地看出来。

(所有的源文件都是以gcc -m32 -fno-stack-protector -Wno-format-security -g test.c在Ubuntu 14.04上编译。)

int main(int argc, char **argv) {
    int s1 = 42;
    int s2 = 8;
    char s3[] = "hello";
    /* ... */
}

对应的汇编语句如下,

mov    DWORD PTR [esp+0x1c],0x2a        // s1
mov    DWORD PTR [esp+0x18],0x8         // s2
mov    DWORD PTR [esp+0x12],0x6c6c6568  // s3,“lleh”(l:0x6c, e:0x65, h:0x68)
mov    WORD PTR [esp+0x16],0x6f         // s3,“o”(o:0x6f)

栈的示意图,

|----------| esp,低地址
|s3 8Byte  | esp+0x12
|----------|
|s2 4Byte  | esp+0x18
|----------|
|s1 4Byte  | esp+0x1c
|----------| 高地址

函数调用时的参数

int main(int argc, char **argv) {
	/* ... */
	printf("%s",argv[1]);
	/* ... */
}

在gdb里执行上面的代码,执行时传入一个参数((gdb) run abc)。调用printf()时的汇编代码片段如下,

0x8048445 <main+40>     mov    eax,DWORD PTR [ebp+0xc]   // [ebp+0xc]是argv[]数组
0x8048448 <main+43>     add    eax,0x4                   // 加4,argv+1,就是argv[1]的地址
0x804844b <main+46>     mov    eax,DWORD PTR [eax]       // 把eax设成argv[1]的值,就是字符串“abc”的地址
0x804844d <main+48>     mov    DWORD PTR [esp+0x4],eax   // printf的参数argv[1]入栈
0x8048451 <main+52>     mov    DWORD PTR [esp],0x8048500 // printf的参数"%s"入栈,0x8048500是字符串"%s"的地址
0x8048458 <main+59>     call   0x80482f0 <printf@plt>

在程序执行到call 0x80482f0 <printf@plt>指令时,检查一下寄存器和内存的状态,验证一下上面的判断。

(gdb) info registers eax
eax            0xffffd0b8       -12104
(gdb) x/w $eax
0xffffd0b8:     0xffffd2a6
(gdb) x/16bx 0xffffd2a6
0xffffd2a6:     0x61    0x62    0x63    0x00  确实是字符串"abc" ...
0xffffd2ae:     略...
(gdb) x/16bx 0x8048500
0x8048500:      0x25    0x73    0x00  字符串"%s" ...
0x8048508:      略...

栈的示意图,

|----------------------| 
|printf的参数1,"%s"    | esp,低地址
|----------------------|
|printf的参数2,argv[1] | esp+0x4
|----------------------|
| ...                  | 高地址

格式化字符串存在的安全问题

下面的代码允许用户传入格式化字符串,这会产生安全问题。

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
   	int magic = 0xbad15bad;
   	char *str = (char *)malloc(20);
   	memset(str, 0, 20);

   	printf(argv[1]); /* 允许用户传入格式化字符串 */

   	printf("now str is :%s\n", str);

   	return 0;
}
通过格式化字符串查看内存内容

用下面的参数执行程序,

$ ./a.out %#x_%#x_%#x_%#x_%#x_%#x_%#x_%#x_
0_0x14_0xf75cf10d_0xf77433c4_0xf7781000_0x87eb008_0xbad15bad_0x80484f0_now str is :

可以看到局部变量magic被打印出来了。

阅读更多…