Anonymising Data on a Magento Development Site

When copying data from a production website to a development environment, clients commonly ask for customer data to be obfuscated so that it bears no resemblance to real customer information. So how do we go about anonymising data on a Magento development site?

First, identify the areas of data that need to be scrambled. The most important ones are the following.

  • Customer account information
  • Customer address information
  • Order information
  • Newsletter subscription information

Magento models expose getter and setter methods that let us read and change this information, and calling save() on a model writes the changes to the database.

Therefore we can create a quick and simple module to install on a development environment to help us achieve this.
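
As a quick illustration of the getter and setter methods mentioned above, here is a minimal sketch. It assumes a hypothetical standalone script placed in the Magento root and a customer with ID 1; neither is part of the module built in this article.

```php
<?php
// Hypothetical standalone script in the Magento root; for illustration only.
require_once 'app/Mage.php';
Mage::app();

// Customer ID 1 is an assumption; use any ID that exists on the development site.
$customer = Mage::getModel('customer/customer')->load(1);

if (!$customer->getId()) {
    exit('Customer not found' . PHP_EOL);
}

// Magic getter: equivalent to $customer->getData('firstname')
echo $customer->getFirstname(), PHP_EOL;

// Magic setter: equivalent to $customer->setData('firstname', 'Test')
$customer->setFirstname('Test');

// Nothing is written to the database until save() is called.
$customer->save();
```

The module below applies the same load, set and save pattern to every customer, address, order and subscriber.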

As always, we need to start with the module declaration file. The module will be called Anonymous_Data.

<?xml version="1.0"?>
<config>
    <modules>
        <Anonymous_Data>
            <codePool>local</codePool>
            <active>true</active>
        </Anonymous_Data>
    </modules>
</config>

Now add the module’s config.xml configuration file.

<?xml version="1.0"?>
<config>
    <global>
        <blocks>
            <anonymous>
                <class>Anonymous_Data_Block_Adminhtml</class>
            </anonymous>
        </blocks>
        <helpers>
            <anonymous>
                <class>Anonymous_Data_Helper</class>
            </anonymous>
        </helpers>
        <models>
            <anonymous>
                <class>Anonymous_Data_Model_Adminhtml</class>
            </anonymous>
        </models>
    </global>
    <admin>
        <routers>
            <adminhtml>
                <args>
                    <modules>
                        <anonymous after="Mage_Adminhtml">Anonymous_Data_Adminhtml</anonymous>
                    </modules>
                </args>
            </adminhtml>
        </routers>
    </admin>
    <adminhtml>
        <layout>
            <updates>
                <anonymous>
                    <file>anonymous.xml</file>
                </anonymous>
            </updates>
        </layout>
    </adminhtml>
</config>
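
The class prefixes in this file decide which class an alias such as 'anonymous/customer' resolves to. The following is a simplified, standalone sketch of that mapping, not Magento's actual code (the real logic lives in Mage_Core_Model_Config and also handles rewrites). The resolveClassName() function is a hypothetical helper written for this illustration.

```php
<?php
// Simplified approximation of Magento 1 alias resolution (no rewrites, no Magento required).
// resolveClassName() is a hypothetical helper, not a Magento function.

function resolveClassName($classPrefix, $alias)
{
    // 'anonymous/customer' => group 'anonymous', class part 'customer'
    list($group, $classPart) = explode('/', $alias, 2);

    // Capitalise each underscore-separated word: 'system_backup' => 'System_Backup'
    $suffix = str_replace(' ', '_', ucwords(str_replace('_', ' ', $classPart)));

    return $classPrefix . '_' . $suffix;
}

echo resolveClassName('Anonymous_Data_Model_Adminhtml', 'anonymous/customer'), PHP_EOL;
// prints: Anonymous_Data_Model_Adminhtml_Customer

echo resolveClassName('Anonymous_Data_Block_Adminhtml', 'anonymous/anonymous'), PHP_EOL;
// prints: Anonymous_Data_Block_Adminhtml_Anonymous
```

This is why the model and block classes in this module carry _Adminhtml in their names even though the aliases do not.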

It is common to add a helper file that will assist with any translations of items within our module.

<?php
class Anonymous_Data_Helper_Data extends Mage_Core_Helper_Abstract
{
}

Now for the models. For simplicity, we’ll split the different data into separate model files:

  • Customer.php – for customer data and the related address and order information.
  • Guestorder.php – for guest order data.
  • Newsletter.php – for newsletter email addresses.

<?php
class Anonymous_Data_Model_Adminhtml_Customer extends Anonymous_Data_Model_Adminhtml_Abstract
{
    public function __construct()
    {
        $this->_init('anonymous/customer');
    }
}

<?php
class Anonymous_Data_Model_Adminhtml_Guestorder extends Anonymous_Data_Model_Adminhtml_Abstract
{
    public function __construct()
    {
        $this->_init('anonymous/guestorder');
    }
}

<?php
class Anonymous_Data_Model_Adminhtml_Newsletter extends Anonymous_Data_Model_Adminhtml_Abstract
{
    public function __construct()
    {
        $this->_init('anonymous/newsletter');
    }
}

There will also be an Abstract.php file containing the shared constants and methods, such as the number of random characters to use.

<?php
class Anonymous_Data_Model_Adminhtml_Abstract extends Mage_Core_Model_Abstract
{

    const CHARACTER_SET = 'abcdefghijklmnopqrstuvwxyz';

    const RANDOM_CHAR_LENGTH = 8;

    protected $_randomFirstname;

    protected $_randomLastname;

    protected $_randomEmailAddress;

    public function getRandomCharString()
    {
        return substr(str_shuffle(self::CHARACTER_SET), 0, self::RANDOM_CHAR_LENGTH);
    }

    public function getRandom10DigitTelephone()
    {
        // Two 5-digit halves keep each mt_rand() call within 32-bit integer range.
        return mt_rand(10000, 99999) . mt_rand(10000, 99999);
    }

    public function getRandomEmailAddress($email)
    {
        $parts = explode('@', $email);
        $randomEmailPart = $this->getRandomCharString();

        return $this->_randomEmailAddress = $randomEmailPart. '@' . $parts[1];
    }
}

There are a few things to note from this file.

  • An alpha string of 8 random characters will be generated for customer names and email addresses.
  • Email addresses will only be anonymised before the ‘@’ symbol. This is to prevent any issues with invalid email addresses.
  • The $_randomFirstname, $_randomLastname and $_randomEmailAddress are properties used to store the randomly generated first name, last name and email address. These are re-used when anonymising order data to keep the data the same (although this is not necessary).
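
To make the email handling described above concrete, here is a standalone sketch that runs without Magento. The function names are hypothetical, and the fallback to the reserved example.com domain for values without an ‘@’ is an assumption that is not part of the module.

```php
<?php
// Standalone sketch, no Magento required. Function names are hypothetical.

function randomCharString($length = 8)
{
    // Shuffle the alphabet and keep the first $length letters (no letter repeats).
    return substr(str_shuffle('abcdefghijklmnopqrstuvwxyz'), 0, $length);
}

function anonymiseEmail($email)
{
    // strrchr() returns the last '@' and everything after it, or false if there is none.
    $domain = strrchr($email, '@');

    if ($domain === false) {
        // Assumption: fall back to the reserved example.com domain for malformed values.
        $domain = '@example.com';
    }

    return randomCharString() . $domain;
}

echo anonymiseEmail('jane.doe@shop.test'), PHP_EOL; // random local part, domain kept
echo anonymiseEmail('not-an-email'), PHP_EOL;       // random local part, fallback domain
```

Keeping the original domain means the scrambled address still passes format validation, which is the reason the module only replaces the part before the ‘@’.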

And finally, an Anonymous.php that will be responsible for starting the anonymisation process.

<?php
class Anonymous_Data_Model_Adminhtml_Anonymous extends Anonymous_Data_Model_Adminhtml_Abstract
{
    public function __construct()
    {
        $this->_init('anonymous/anonymous');
    }

    public function run()
    {
        try {
            Mage::getModel('anonymous/customer')->anonymiseCustomerData();
            Mage::getModel('anonymous/guestorder')->anonymiseGuestOrderData();
            Mage::getModel('anonymous/newsletter')->anonymiseNewsletterData();
        } catch (Exception $e) {
            Mage::throwException($e->getMessage());
        }
    }
}

Now we need to add a controller so that an area in the admin panel can be accessed.

<?php
class Anonymous_Data_Adminhtml_AnonymousController extends Mage_Adminhtml_Controller_Action
{
    public function indexAction()
    {
        $this->loadLayout();
        $this->renderLayout();
    }

    public function runAction()
    {
        try {
            Mage::getModel('anonymous/anonymous')->run();
            Mage::getSingleton('adminhtml/session')->addSuccess("Data has been anonymised!");
        } catch (Exception $e) {
            Mage::getSingleton('adminhtml/session')->addError(sprintf("Error: %s", $e->getMessage()));
        }
        return $this->_redirectUrl($this->_getRefererUrl());
    }
}

Now for the adminhtml.xml. The section in the admin will be located under System -> Tools for this example.

<?xml version="1.0"?>
<config>
    <menu>
        <system>
            <children>
                <tools>
                    <children>
                        <anonymous>
                            <title>Anonymise Data</title>
                            <sort_order>9999</sort_order>
                            <action>adminhtml/anonymous</action>
                        </anonymous>
                    </children>
                </tools>
            </children>
        </system>
    </menu>
</config>

Next, add the block class that will help us render information on our page within the admin.

<?php
class Anonymous_Data_Block_Adminhtml_Anonymous extends Mage_Adminhtml_Block_Template
{
    public function getAnonymousProcessUrl()
    {
        return $this->getUrl('adminhtml/anonymous/run');
    }
}

The config.xml file references a layout XML file, anonymous.xml. This assigns our block and template to the admin page.

<?xml version="1.0"?>
<layout>
    <adminhtml_anonymous_index>
        <reference name="content">
            <block type="anonymous/anonymous" name="anonymous" template="anonymous/anonymous.phtml" />
        </reference>
    </adminhtml_anonymous_index>
</layout>

The template file will contain text requiring the user to confirm they want to continue, as well as a link to the Magento system backup page should a database backup need to be taken beforehand.

<div class="container">
    <p><?php echo $this->__('Are you sure you want to anonymise data within Magento? The following data will be scrambled:'); ?></p>
    <ul>
        <li><?php echo $this->__(' - Customer data including account, address and order-related information'); ?></li>
        <li><?php echo $this->__(' - Guest order data'); ?></li>
        <li><?php echo $this->__(' - Newsletter subscription data'); ?></li>
    </ul>
</div>
<div class="buttons">
    <button class="button" onclick="setLocation('<?php echo $this->getAnonymousProcessUrl(); ?>')"><?php echo $this->__('Confirm'); ?></button>
    <button class="button" onclick="setLocation('<?php echo $this->getUrl('adminhtml/system_backup/index'); ?>')"><?php echo $this->__('Take Database Backup'); ?></button>
</div>

Now that the initial setup is complete, we can add the code to manipulate the database within our models.

The Customer.php model will alter customer account information, saved address and order information.

<?php
class Anonymous_Data_Model_Adminhtml_Customer extends Anonymous_Data_Model_Adminhtml_Abstract
{
    const DEFAULT_PASSWORD = 'xxxxxxxxxx';
    
    public function __construct()
    {
        $this->_init('anonymous/customer');
    }

    public function anonymiseCustomerData()
    {
        $customers = Mage::getModel('customer/customer')
            ->getCollection()
            ->addAttributeToSelect('*');

        foreach ($customers as $customer)  {

            $customerId = $customer->getId();
            
            // Generate the names once so the account, addresses and orders all match.
            $this->_randomFirstname = $this->getRandomCharString();
            $this->_randomLastname = $this->getRandomCharString();

            $customer->setFirstname($this->_randomFirstname);
            $customer->setLastname($this->_randomLastname);
            $customer->setCompany($this->getRandomCharString());

            $this->_randomEmailAddress = $this->getRandomEmailAddress($customer->getEmail());
            $customer->setEmail($this->_randomEmailAddress);

            $customer->setPassword(self::DEFAULT_PASSWORD);

            $this->anonymiseCustomerAddresses($customer);
            $this->anonymiseCustomerOrders($customerId);

            $customer->save();
        }
    }

    public function anonymiseCustomerAddresses($customer)
    {
        foreach ($customer->getAddresses() as $address) {
            $address->setFirstname($this->_randomFirstname);
            $address->setLastname($this->_randomLastname);
            $address->setStreet($this->getRandomCharString());
            $address->setRegion($this->getRandomCharString());
            $address->setPostcode($this->getRandomCharString());
            $address->setTelephone($this->getRandom10DigitTelephone());
            $address->save();
        }
    }

    public function anonymiseCustomerOrders($customerId)
    {
        $orderCollection = Mage::getModel('sales/order')
            ->getCollection()
            ->addFieldToSelect('*')
            ->addFieldToFilter('customer_id', $customerId)
            ->addFieldToFilter('state', array('in' => Mage::getSingleton('sales/order_config')->getVisibleOnFrontStates()));
        
        $this->anonymiseOrderData($orderCollection);
        
    }
}

The Guestorder.php file will filter the order collection by the customer_is_guest attribute.

<?php
class Anonymous_Data_Model_Adminhtml_Guestorder extends Anonymous_Data_Model_Adminhtml_Abstract
{

    public function __construct()
    {
        $this->_init('anonymous/guestorder');
    }

    public function anonymiseGuestOrderData()
    {
        $orderCollection = Mage::getModel('sales/order')
            ->getCollection()
            ->addFieldToSelect('*')
            ->addFieldToFilter('customer_is_guest', 1);

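        foreach ($orderCollection as $order) {
            // Guest orders have no customer record, so generate fresh values for each order.
            $this->_randomFirstname = $this->getRandomCharString();
            $this->_randomLastname = $this->getRandomCharString();
            $this->_randomEmailAddress = $this->getRandomEmailAddress($order->getCustomerEmail());

            // anonymiseOrderData() iterates over its argument, so pass a one-item array.
            $this->anonymiseOrderData(array($order));
        }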
    }
}

Newsletter.php only needs to alter the subscriber email address.

<?php
class Anonymous_Data_Model_Adminhtml_Newsletter extends Anonymous_Data_Model_Adminhtml_Abstract
{
    public function __construct()
    {
        $this->_init('anonymous/newsletter');
    }

    public function anonymiseNewsletterData()
    {
        $subscribers = Mage::getModel('newsletter/subscriber')
            ->getCollection();

        foreach ($subscribers as $subscriber) {
            $randomEmailAddress = $this->getRandomEmailAddress($subscriber->getSubscriberEmail());

            $subscriber->setSubscriberEmail($randomEmailAddress);
            $subscriber->save();
        }
    }
}

And finally, add in the anonymiseOrderData() method to Abstract.php.

<?php
class Anonymous_Data_Model_Adminhtml_Abstract extends Mage_Core_Model_Abstract
{

    ....

    public function anonymiseOrderData($orderCollection)
    {
        foreach ($orderCollection as $order) {

            $order->setCustomerFirstname($this->_randomFirstname);
            $order->setCustomerLastname($this->_randomLastname);
            $order->setCustomerEmail($this->_randomEmailAddress);

            // getShippingAddress() returns false for virtual orders, so skip missing addresses.
            $addresses = array_filter(array($order->getBillingAddress(), $order->getShippingAddress()));

            foreach ($addresses as $address) {
                $address->setFirstname($this->_randomFirstname);
                $address->setLastname($this->_randomLastname);
                $address->setEmail($this->_randomEmailAddress);
                $address->setCompany($this->getRandomCharString());
                $address->setStreet($this->getRandomCharString());
                $address->setRegion($this->getRandomCharString());
                $address->setPostcode($this->getRandomCharString());
                $address->setTelephone($this->getRandom10DigitTelephone());
            }

            $order->save();
        }
    }
}

Depending on how many customers and orders you have saved in your system, the process might take a few minutes.

Once it has completed, the customer, address, order and newsletter data covered above will be obfuscated.
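
On sites with a large number of customers, loading a whole collection at once can use a lot of memory. One common way to reduce this is to process the collection page by page. The following is a sketch only: it assumes a hypothetical standalone script in the Magento root, and the page size of 500 is an arbitrary choice.

```php
<?php
// Hypothetical standalone script run from the Magento root; not part of the module above.
require_once 'app/Mage.php';
Mage::app();

// The page size of 500 is an arbitrary assumption; tune it to the memory available.
$customers = Mage::getModel('customer/customer')
    ->getCollection()
    ->addAttributeToSelect('*')
    ->setPageSize(500);

$lastPage = $customers->getLastPageNumber();

for ($page = 1; $page <= $lastPage; $page++) {
    $customers->setCurPage($page);

    foreach ($customers as $customer) {
        // Placeholder for the per-customer work done in anonymiseCustomerData().
        echo $customer->getId(), PHP_EOL;
    }

    // Drop the loaded items so the next iteration queries the next page.
    $customers->clear();
}
```

The same paging approach could be applied to the order and subscriber collections if they are also large.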

Note: This article is based on Magento Community/Open Source version 1.9.