AlexJ's Computer Science Journal

alexandru.juncu.ro

Linked lists

If you are a computer science student or have ever taken a data structure course you must have heard about the concept of linked lists. The structure is a way of storing a set of data, similar to an array. Unlike an array, you can’t access the items with an index, but on the other hand, you don’t need to to have fixed size of the data set since elements can be added or deleted at any time. Operation with this data structure usually take more and each element needs more memory space than one in an array, but you can have a great number of elements without needing a contiguous memory space like you would with an array. For the rest of the article, I’m going to focus on the C programming language.

Since I was first taught about linked lists, in high-school, the concept was the same. You need to define a structure with some fields for the data that you want to store in each element and an extra field (or more) that is a pointer to an instance of this type of structure. You can have a simple linked list where in any element you always keep a reference (pointer) to the next element in the list (you’ll know the list ends when the element has NULL as next) or you can have a double linked list where you have both a next reference and a previous one. In the first case you can only go through the list in one direction, while in the second case, you can go back and forward as you wish. Here is an example:

struct my_linked_list {

int id;

char name[MAX_LEN];

struct my_linked_list *next;

};

struct my_double_linked_list {

float x;

float y;

struct my_double_linked_list *next;

struct my_double_linked_list *prev;

};

For any actions on the list, you need to have a pointer to one element of the list (usually the first element). You can treat the list as other abstract data structures, like a Stack/FIFO where you only push and pop the first element (top) or a queue where you add after the last element and take out the first element, or you can just consider it a generic list and add/remove element even in the middle of the list. More information about lists in C here and here

What is important is that for each type of list (list of inters, list of strings, list of a custom structure) you need to write functions for that specific to that list. So if you write functions for adding, removing, searching elements of an int list, you can’t use the same functions for a list of floats. You need to declare a new structure for a list element and functions to work with it. Keep in mind this in not C++ or Java where you can have an Object class.

When I took my kernel programming class, I found out about the implementation of linked lists in the Linux Kernel. At first, the concept was strange to me, because this implementation offered an API to a defined structured called struct list_head that worked different than what I knew about lists. Instead of defining list structure with my data as an element, I had to just add this list_head element as a field. So rather than having a specific list element structure that contained a generic type like int, char[] or another custom structure, I could use a generic structure (that contained generic fields) with this generic list element type. And the API provided generic functions to add, remove an iterate given a list head. An example would look something like this:

struct my_structure{

int value;

struct list_head mylist ;

}

At first look, it might not seem like a difference, until you realize that the list element is not tied to the type of the structure. This could mean that you could have a list of elements of different types (and now we can tell C++/Java fans to quit bashing C because it’s not flexible). Yes, you do still have say in code that an instance of that structure will be in a list, but in the classic implementation, you needed to describe the structure anyway (if it wasn’t s simply type like an int or float) and then describe a new structure that was a list element with a field of the first structure. And now we see how generic the implementation is.

Once you get passed the initial shock of things being upside down, you realize how power of this implementation. For example, an element can be part of more than one list at the same time (just add a list_head field for each one). And you don’t need to reinvent the wheel implementing each list and its operations.

These lists are used throughout the kernel. For example, task_struct, which defines a process basically, has lots of lists associated with it. This is a critical structure in the kernel, but if someone needs to add extra functionality that involves a linked list, he/she doesn’t care about breaking other lists by changing list implementation. Someone would just add a new field in the task_struct and use the same task_struct instance part of the new list, keeping everything consistent.

Recently I was reading an book with interview questions and there was a section about linked lists. And the classical theory was presented and the answers were implementations using that classical model. My own answers were using this Linux kernel style implementation I noticed that they made the problems much more easier to solve. So now I keep asking myself why isn’t this implementation more publicised. I know it’s used in non kernel projects (the implementation is very easy to port, everything is in the lists.h file).

To find out more, check out the tutorial on the kernelnewbies.org wiki and also take a look on a similar article about linked lists.

Comment

AlphaOmega Captcha Classica  –  Enter Security Code
     
 

*