Searching
Objective #1: Use a sequential search algorithm with arrays & ArrayLists
- A common algorithm used with arrays and ArrayList's is
the sequential search. A sequential searches is also called a linear search.
- The sequential search
uses a loop to methodically inspect each element of an array or ArrayList until the item which you are looking for is found. Typically a for loop is used rather than a while loop. You simply check each element of an array position by position until you find the one that you are looking for.
- I use a sequential search when I look for a piece of chocolate cake on the dessert line at theOld Country Buffet restaurant! My eyes carefully scan the available pieces of cake from the beginning of the dessert buffet line to the end. When I find the the first piece of chocolate cake, I'm finished!
- In any search, the item upon which the search is based is called the key and the field being searched is called the key field. For example, if you are searching student records for a student named John Doe then the key might be "Doe" and the field that you are searching would be the "last name" field.
- An advantage to using the sequential search algorithm is that it is relatively easy to code.
- A break statement or a boolean, flag variable can be used to exit early from a loop as soon as the key is found.
- Also, a boolean, flag variable can be used to keep track of whether the key was found or not.
- Additional variables can be used to keep track of the maximum or minimum value in the array, the number of occurrences of the key, or the position within an array.
- In addition to using a sequential search to look for a number in an array of integers, you could even search through an array of objects looking for a specific object (e.g looking for a password in an array of String objects.)
- Here is an animation of a linear search http://www.cosc.canterbury.ac.nz/mukundan/dsal/LSearch.html
- The following algorithm searches through an array
of integers named numbers for
the value 99:
*********************** version with an array *************************
boolean found = false;
int position = -1;
for (int i = 0; i < numbers.length; i++)
{
if (numbers[i] == 99)
{
found = true;
position = i;
break;
}
}
if (found)
{
System.out.println("The key was found in position " + position);
}
else
{
System.out.println("The key was not found.");
}
Notice the use of the boolean, flag variable found. A flag variable is used in many situations to keep track if something is on or off or up or down. In this case, found is true if the key is in the array and it is false if it is not found anywhere in the array. Rather than using the boolean data type, you could use an int variable as a flag variable setting it equal to 1 if it was found and 0 if it was not found. However, the code is easier to understand with the boolean.
Also notice that the variable position is used to mark where the key was found within the array. Sometimes it is only necessary to know if a key is found or not. Other times, it is important to determine where exactly the key is found in an array. Be sure to add one to the position if the end user isn't aware that arrays are zero-based. Why do you think position is initialized to -1 rather than 0 in the example above?
Here is a version of the same algorithm using a while loop instead of a for loop:
boolean found = false;
int i = 0;
while (!found && i < numbers.length)
{
if (numbers[i] == 99)
{
found = true;
}
else
{
i++;
}
}
if (found)
{
System.out.println("The key was found in position " + i);
}
else
{
System.out.println("The key was not found.");
}
- The following algorithm searches through an ArrayList of Bot objects
named botList for
a Bot that
has the name "fred":
*********************** version with an ArrayList *************************
boolean found = false;
int position = -1;
for (int i = 0; i < botList.size(); i++)
{
String next = botList.get(i).getName();
if (next.equals("fred"))
{
found = true;
position = i;
break;
}
}
if (found)
{
System.out.println("The key was found in position " + position);
}
else
{
System.out.println("The key was not found.");
}
- Another useful algorithm is one that counts the
number of occurrences of an object in an ArrayList or
a value in array. Here are algorithms that count the number of occurrences
of keyObject and
the value 99 in an ArrayList and
an array:
*********************** version with an array *************************
int count = 0;
for (int i = 0; i < numbers.length; i++)
{
if (numbers[i] == 99)
{
count++;
}
}
System.out.println("The key occurs " + count + " times.");
*********************** version with an ArrayList *************************
int count = 0;
Integer keyObject = 99; // autoboxing occurs here
for (i = 0; i < numbersList.size(); i++)
{
Integer temp = numbersList.get(i);
if (temp.equals(99))
{
count++;
}
}
Notice that you can avoid using the Integer object temp by using
if (numbersList.get(i).equals(99))
{
count++;
}
- Another useful algorithm is one that finds the maximum
(or minimum) value in an array or ArrayList.
Study the following examples which find the greatest value in an ArrayList or
array:
*********************** version with an array *************************
int max = numbers[0]; // or int max = Integer.MIN_VALUE;
int position = 0;
for (int i = 1; i < numbers.length; i++)
{
if (numbers[i] > max)
{
max = numbers[i];
position = i;
}
}
System.out.println("The max value is " + max + " in position " + position);
*********************** version with an ArrayList *************************
Bot max = botList.get(0) ;
int position = 0;
for (i = 1; i < botList.size(); i++)
{
Bot temp = botList.get(i);
if (temp.getAge() > max.getAge())
{
max = temp;
position = i;
}
}
System.out.println("The oldest Bot is " + max.getAge() + " years old and it is found in position " + position);
You can avoid using the temp object with this if statement
if (botList.get(i).getAge() > max.getAge())
{
max = botList.get(i);
position = i;
}
- Parallel arrays are two separate arrays that are used together in the logic of an algorithm to store associated data. For example, an array of names and an
array of scores where the score in position zero of the scores array belongs to the person whose name is found in position zero of the names array and so on. Two parallel arrays are the same size but
they do not have to contain elements of the same data type. The names array would be an array of strings and the scores array would be an array of integers.
For example, you may want to search one parallel array for the greatest gpa and then print the associated name of the student who has the greatest gpa.
String[] names = new String[10];
double[] gpas = new double[10];
// pretend that these two parallel arrays are filled with students' names and matching gpa's
double maxSoFar = gpas[0];
for (int i = 1; i < names.length; i++)
{
if (gpas[i] > maxSoFar)
{
maxSoFar = gpas[i];
indexOfMax = i;
}
}
System.out.println("The student with the highest gpa is " + names[indexOfMax]);
Notice that maxSoFar was initialized to gpas[0] rather than the value 0. Also, notice that the loop variable i is initialized to 1 rather than 0.
Objective #2: Trace and apply the binary search algorithm.
- If the data in an array is already sorted in order (alphabetically for example) then a binary search is much more efficient and quicker than a linear search.
- In the case of a binary search, you first examine the "middle" element. If that element is "lower" (alphabetically, for example) than the element which you are looking for, then you discard the lower half of the data and continue to search the upper half. The process repeats itself when you then look at the middle element of the remaining "upper half" of the data.
- Here is an animation of a binary search http://www.cosc.canterbury.ac.nz/mukundan/dsal/BSearch.html
- The binary search algorithm requires the following preconditions:
- The data must be sorted.
- The number of elements in the list must be stored in a separate variable.
- The programmer must be able to randomly access elements in the list by relative position
- Here's a code segment
while (low <= high && !found)
{
mid = (low + high) / 2;
if (key > numbers[mid])
{
low = mid + 1;
}
else if
(key < numbers[mid])
{
high = mid - 1;
}
else
{
found = true;
}
}
if (found)
{
System.out.println("The value" + key + " was found in position " + mid);
}
else
{
System.out.println("The value" + key + " was not found");
}
The following information will not be tested on the AP exam or unit tests.
Objective #3: Trace and apply a hash algorithm.
- A lookup table (also known as a key-to-address
transformation)
can be used to allow very fast searches. Data elements are placed into a
lookup table by converting some characteristic of each element into a specific
position of the lookup table. Often the lookup table is a one or two-dimensional
array.
- Sometimes, there is no good algorithm or function for mapping each different
data element to a different position of the array. That is, two or more elements
both map to the same position of the array. This problem is called a collision.
If that happens, then there is no easy way to guarantee that someone can
find
the
element
they
are searching
for. They may find one of the other elements that mapped to the same array
location.
- Hashing is a search algorithm that improves
upon lookup tables by finding a way to minimize and resolve collisions. A hash
function is used to convert a
data element to a hash value. The hash function is essentially
an assignment statement. For example, to hash a bunch of words into a hash
table, you could use a hash function that finds the sum of the Unicode values
for each letter in the word. The hash value is used to find a position for
the element in a hash
table.
Like a lookup table, a hash table is simply a data structure such as a one
or two-dimensional array. But it could also be
an array of
linked lists or a linked list of linked lists among other kinds of data structures
that we haven't studied. The key difference though between using hashing
and using a lookup table it that the hashing algorithm deals with collisions
that occur.
- One way to handle collisions is to use a hash table data structure
that allows two or more elements to both occupy the same position. By using
a two-dimensional array where the row represents the initial hash values,
you can place the original element into column 0 and the colliding element
into column 1. A third element that collides to the same hash table row
could be placed in column 3 and so one. This method for dealing with collisions
is called chaining. Each row number in the two-dimensional array is called
a bucket. Instead of a two-dimensional array, the hash table could be a
one-dimensional
array with a linked list or a binary search tree as a bucket attached to
each row of the array.
- Another way to handle handle collisions is called probing. When a second
element hashes to an already occupied hash table position, a probing function
is used to find a new slot in the hash table. A real simple probing function
would be to increment the position by one and place the second element into
the next position in the hash table array. If that position is also occupied
then the probing function could be applied again to send the element to the
next position that is hopefully empty. A poor probing function such as this
one (i.e. incrementing by one) leads to clustering with lots of data elements
are settling into the same area of a hash table. It's better to balance the
elements out throughout the table so quadratic probing is sometimes used.
Where the probing function is index + n ^ 2 for each successive n.
- A big disadvantage to lookup tables and hashing is the
large amount of memory that may be necessary. The size of the one or two-dimensional
array used as the lookup or hash table may have to be very large to accomodate
a poor hash function. If their is not a lot of data to put into the lookup
or hash table, then the table will be very sparse.
- The time performance for a hash is O(1) in the
best case and O(n) in its worst-case.