Better Genetics Handling Part 1

When I first committed to creating this strain database, an absolute must for me was a decent way to link all the strains together via genetics. I didn’t think a great deal about the best way to do this so I came up with a rather rudimentary methodology. I stored the information in a couple of simple ways.

First, I stored a genetics string with each strain. It looked something like this:

({24} x {56}) x {48}

From there I also stored the links in a strain_links table in the database. Basically a strain_id to strain_id relationship for all of the strains listed in the genetics string. So, for the example above I’d have 3 rows, each with the strain in question as the parent_id and the other strains as the related id. Crude and simple but it worked for what I was trying to achieve at a time. This provided a simple 1 to 1 relationship between the strains and gave me a parsable genetics string that I could replace numbers with to provide the output and links that are seen for strains where genetics is available. While this worked, it made it difficult to create the genetics trees that I so desired. I wanted to be able to easily show a tree of genetics going as far back as possible (to the landraces) or, until there wasn’t genetics information available for the parents.

I was stuck with a couple problems now. So, I did a bit of thinking on the best way to handle the genetics in a better fashion. Because I stored the relationships in strings, I knew I would have to do some string parsing to get things right. There were a few considerations for this that were important:

  1. All plants have a mother and a father. The relationships should be as simple as that.
  2. Some plants have information about genetics but not what the mothers and fathers are. This is denoted by a / instead of an x in the genetics.
  3. Due to some not so consistent data entry by myself during many of the late nights putting this together, there are inconsistencies such as X where there should be a / or putting errant spaces or different cases for the X or x.
  4. Some genetics strings contained many strings such as Ed Rosenthal’s Super Bud ({989} x {2093} x {245} x {1934} x {1945} – also a good example of where / should have been used).
  5. Since there was a need to switch to just a mother and a father in most cases, genetics strings like ({strain1} x {strain2}) x {strain3} needed to have a strain in the database for {strain1} x {strain2}

Because of all of these considerations, I knew I could not write a simple regex or loop to get this information. I began by creating my table I would be using for genetics. It looks like this:

+--------------------+------------+------+-----+---------+----------------+
| Field              | Type       | Null | Key | Default | Extra          |
+--------------------+------------+------+-----+---------+----------------+
| strain_genetics_id | int(11)    | NO   | PRI | NULL    | auto_increment |
| child_id           | int(11)    | NO   | MUL | NULL    |                |
| mother_id          | int(11)    | NO   |     | 0       |                |
| father_id          | int(11)    | NO   |     | 0       |                |
| active             | tinyint(4) | NO   |     | 1       |                |
| updated            | datetime   | NO   |     | NULL    |                |
| parent_id          | int(11)    | YES  |     | 0       |                |
+--------------------+------------+------+-----+---------+----------------+

Straightforward child to parents table.

After this, I wanted to examine the data.

SELECT COUNT(strain_id) FROM strain_reports WHERE genetics_string IS NOT NULL AND genetics_string NOT IN ('', '???');

Out of the 1,491 strains that I had genetics for, I wanted to find out how many of those were simple mother and father relationships. I knew that I could treat all of these the same. As it turned out, roughly 2/3 of them were. I discovered this by grabbing all of my strains and exploding on the genetics string to see how many pieces there were.

$strainParts = explode(' ', str_replace(array('{', '}'), '', $geneticsString));

In the process, I got rid of the {}’s that I was using to help with parsing the strings. From there, it was time to start inserting the relationships. I created a couple of functions to assist with this. Before anyone rags on the code, I know, a global $db isn’t the best and I also know I could’ve written them a bit better. However, this code is for me, running locally, and is for a script. There’s some copy-pasta and some not-so-elegant stuff but I’m okay with that as I’m trying to get work done at this point.

function insertMotherFather ($childId, $motherId = false, $fatherId = false) {
    global $db2;
 
    if ($motherId && $fatherId) {
        $sql = "INSERT INTO strain_genetics (child_id, mother_id, father_id) VALUES ($childId, $motherId, $fatherId)";
    } else if ($motherId) {
        $sql = "INSERT INTO strain_genetics (child_id, mother_id) VALUES ($childId, $motherId)";
    } else if ($fatherId) {
        $sql = "INSERT INTO strain_genetics (child_id, father_id) VALUES ($childId, $fatherId)";
    }
 
    var_dump($sql);
 
    $result = $db2->query($sql);
}

In some cases, my genetics strain looked like this:

{345} x ???

In these cases, it was known what the mother was, but not the father. I needed my function to handle those cases so I ensured my mother_id and father_id had appropriate defaults and did my inserting.

For the others where a / was provided instead of an x, I could only provide parent records. This meant I would insert two records in the strain_genetics table. One for each parent. The function looked similar:

function insertHybrid ($childId, $parentId) {
    global $db2;
 
    $sql = "INSERT INTO strain_genetics (child_id, parent_id) VALUES ($childId, $parentId)";
    $result = $db2->query($sql);
}

With my functions written, it was time to loop through and get my genetics relationships in there:

// Check to be sure this is formatted how we expect with x or / in the middle
$vars = array ('x', '/');
if (in_array($strainParts[1], $vars)) {
    // If it's a / in the middle, we don't know which is the mother or father
    if ($strainParts[1] == '/') {
        // In some cases, the second part of the relationship is unknown like: {234} / ??? so we check each
        if (is_numeric($strainParts[0])) {
            insertHybrid($strainId, $strainParts[0]);
        }
        if (is_numeric($strainParts[2])) {
            insertHybrid($strainId, $strainParts[2]);
        }
    } else if ($strainParts[1] == 'x') {
        /**
           * This is a mother and father relationship. Because of the aforementioned case where the mother or 
           * father is unknown, we have to check
           */
        if (is_numeric($strainParts[0]) && is_numeric($strainParts[2])) {
            insertMotherFather($strainId, $strainParts[0], $strainParts[2]);
        } else {
            if (is_numeric($strainParts[0])) {
                insertMotherFather($strainId, $strainParts[0]);
            }
            if (is_numeric($strainParts[2])) {
                insertMotherFather($strainId, false, $strainParts[2]);
            }
        }
    }
}

Badda-bing, badda-boom, 1,037 strains handled and 1,204 rows of strain information in my new, usable format. Phew. This left me with roughly 400 strains to parse and alter. At this point I could probably start to do things a bit more manually, but decided I would continue writing code to handle another big chunk.

I will discuss the unique challenges that arose from more complicated genetics strings in part two of “Better Genetics Handling” which you can expect tonight or tomorrow which you can find here.

If you liked this post or just want to keep up with the latest developments concerning Smokereports.com, please consider subscribing to my feed or following smokereports on Twitter.

May 12, 2011 · admin · No Comments
Posted in: development, genetics, list, mysql, php

Leave a Reply