2011-07-14

Book Review - Pro Puppet

Recently, I was contacted by Apress and asked to read and review their new book, "Pro Puppet". Since I had been working my way through understanding Puppet for my new job, I was quite excited to check out yet another resource. I'll try to be fair in this review, but sometimes your first love will always have a special place in your heart. And without further ado, in all of its glory, my review.


First off, let's define what Puppet is. Puppet is a solution for managing configurations on a wide variety of UNIX-like systems. It's designed to let an administrator build policies on a centralized "Puppet Master" and have those configuration policies applied to all of the managed systems in a reliable, repeatable, and secure manner. It's an itch that has needed to be scratched in the Linux world for some time. Puppet is not the only contender in this arena, but it appears to be the most widely used.

When I started reading Pro Puppet I had already been working with Puppet for a few months, based on the documentation I had found on the Internet and a now out-of-date copy of Pulling Strings With Puppet. While Pulling Strings With Puppet is an excellent reference, it was based on a much older version of Puppet which is no longer supported. Pro Puppet, by contrast, is based on the 2.6.x series of the software (and 2.7 is out as of this writing) and as such has many useful resources for the newly minted Puppet user.

Pro Puppet starts by explaining what Puppet is and what it can be used for. It also explains a little about how Puppet is structured. Puppet uses a Domain Specific Language (or DSL) to describe the configuration policies, and the book does an excellent job of slowly immersing the reader in the nuances of the language. I can say that it took me weeks to learn from Internet resources how to do what Pro Puppet can explain in just a day.

The book goes into excellent detail about how to install Puppet on various platforms, including Ubuntu, Debian, Solaris, Red Hat, and even Windows! Each of the detailed procedures takes the reader step by step through the install process and can quickly get the user up and running with Puppet.

As you continue reading, you are led from starting out with Puppet, through creating advanced manifests, and finally on to scaling Puppet to massive environments. The final few chapters deal with extending Puppet, enabling true push support through Marionette Collective, and finally setting up reporting.

I have to say that this book had perfect timing for me. I had been fighting with learning how to use Puppet to manage systems for my job. I find that the documentation from Puppet Labs is difficult to follow and hard to put together in a useful manner. Once I started reading this book and putting its simplified lessons to use, I was able to make much more useful Puppet manifests and truly start managing my servers in an automated fashion. If you need Puppet to manage your systems or if you are just interested in learning Puppet for other purposes, I highly recommend this book.



2011-03-04

Introduction To Java Persistence

    This is the blog representation of the presentation I gave at CodepaLOUsa in March of 2011. The presentation included some basic introduction and history and some code examples which are shown below.

The Long Road To JPA

    Java has long had database capabilities. We started with JDBC, which is excellent for working directly with the database. That is a nice thing to have if you are a DBA; but as programmers, we need data . . . not databases. So, while JDBC would allow us to get data from the database, we had to manipulate the data to put it into usable objects and business logic code. This is tedious and time consuming.
    Along came Enterprise Java Beans (EJB). EJB allowed us to map database information straight to Java objects. This was a revolution, but unfortunately it had drawbacks which limited its adoption. Primarily, EJB was difficult to configure because of "XML Hell". Secondly, EJB did not allow us to define relationships between Entities.
    To account for the need in the programming community to handle data in a more sane manner, many vendors started creating the predecessors to JPA. These implementations included products like Hibernate and TopLink. These tools handled mapping Plain Old Java Objects (POJOs) to database entities. They also allowed the objects to have relationships mapped between each other to handle table joins. The down side of these early implementations was that they were still difficult to configure and presented the programmer with more "XML Hell".
    The Java community kept demanding better solutions for these problems, and the EJB3 specification finally introduced the Java Persistence API (JPA). JPA can be used without any XML mapping files: it uses annotations to simplify defining entities, and those same annotations can be used to create entity relationship mappings in a quick and efficient manner.

What Problems Can JPA Help Me To Solve?

    Some of the challenges in writing applications around databases can be mitigated by using JPA. When values are passed as query parameters, JPA handles converting them into safe values and thus helps prevent SQL injection attacks. JPA can also have clustering, caching, and connection pooling plugged in without any changes to the business logic code. Together, the caching and the annotations reduce both development time and runtime overhead, improving your time to market for applications.
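To make the injection point concrete, here is a minimal plain-Java sketch (no JPA classes involved) that contrasts a query built by string concatenation with one whose text stays fixed while the value travels separately, which is the idea behind a named parameter such as :name. The class and method names are invented for this illustration and are not part of any library.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration only: none of these names come from JPA.
final class QuerySketch {
    final String text;                 // the query text a parser would see
    final Map<String, String> params;  // values bound separately from the text

    QuerySketch(String text, Map<String, String> params) {
        this.text = text;
        this.params = params;
    }

    // Unsafe: user input becomes part of the query text itself.
    static QuerySketch concatenated(String name) {
        return new QuerySketch(
            "SELECT p FROM Person p WHERE p.name = '" + name + "'",
            Collections.<String, String>emptyMap());
    }

    // Safer: the text never changes; the input is carried as a bound value.
    static QuerySketch parameterized(String name) {
        return new QuerySketch(
            "SELECT p FROM Person p WHERE p.name = :name",
            Collections.singletonMap("name", name));
    }
}
```

Passing an input such as x' OR '1'='1 to concatenated() changes the structure of the query, while parameterized() leaves the text untouched no matter what the user types.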

Definitions


Bean - A Java object which has (typically) private fields/properties and uses getters and setters to manipulate/access those values.

Annotation - A method of adding configuration/compiler meta-data to a Class definition.

Entity - A special type of Bean which has either external (XML configuration) or internal (Annotations) metadata making it possible for JPA to manage the Bean.

POJO - A "Plain Old Java Object". A class which is not special in structure, content or compilation.
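As a quick illustration of the "Bean" definition above, here is a minimal sketch of a bean with no JPA involved at all. The class name PersonBean and its two properties are made up for the example.

```java
// A plain bean: private fields, a public no-argument constructor,
// and public getters/setters. (Hypothetical example class.)
class PersonBean {
    private String name;
    private int age;

    public PersonBean() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

Adding persistence metadata to a class shaped like this is what turns it into an Entity.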

What Does JPA Do?

    There are a few things that JPA does to make the life of the programmer much simpler. Primarily, it automates many of the traditional tasks involved in storing and retrieving information from a database. Secondly, it can improve how data is accessed through caching, connection pooling, and abstraction of the underlying SQL queries, so that the code is database agnostic without any specialized coding. Additionally, JPA can make data more mobile between disparate machines while keeping it tied to a database through the "detached" entity concept, which allows a record to be manipulated and passed around between systems and still be persisted to the underlying database.

The Basics Of JPA Entities

    What is an Entity? Well, in its simplest form an Entity is just a Bean class with some annotations added: private properties, public getter/setter methods, and a few small annotations. Please see an example of a basic Entity below.

import java.io.Serializable;
import javax.persistence.*;

@Entity
@Table(name="mytable")
public class MyEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    @Column(unique=true, nullable=false)
    private int index = 0 ;

    @Column(name="mydatacolumn")
    private String myDataColumn = null ;

    // Getters & Setters omitted . . .
}

@Entity - This annotation tells the JPA provider that this class is an Entity and should be managed by the Persistence Context.

@Table - This annotation allows us to specify some additional details about the underlying table, such as its name.

@Id - An annotation used to indicate that the annotated field is to be used as the unique key on this entity. It is also possible to create an @EmbeddedId, which indicates that the fields in the specified embeddable class are to be used as a composite key.

@GeneratedValue - This is how we can specify the strategy to be used when generating the primary key value.

@Column - Optional annotation which can be used to more specifically control how a field is represented in the database tables.

Configuring JPA

    Configuration of JPA is much simpler than its predecessors and more flexible in implementation options. You can use the persistence.xml file, a properties file, or configure the settings within the program. Each of these options has its merits and limitations, so it's up to you to determine what works best in your projects. In addition to these methods, many containers (like Spring and JBoss) can provide the persistence configuration to their applications, so the application itself needs little or none of its own.
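As a rough sketch of the "configure the settings within the program" option, the standard JPA 2.0 property keys can be collected in a Map. The class name JpaSettings is hypothetical, and the H2 driver, URL, and credentials are placeholder assumptions; substitute whatever your database needs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; only the property keys are standard JPA 2.0 names.
final class JpaSettings {
    // Builds a map of connection settings. All values are placeholders.
    static Map<String, String> inMemoryOverrides() {
        Map<String, String> props = new HashMap<String, String>();
        props.put("javax.persistence.jdbc.driver", "org.h2.Driver");
        props.put("javax.persistence.jdbc.url", "jdbc:h2:mem:example");
        props.put("javax.persistence.jdbc.user", "sa");
        props.put("javax.persistence.jdbc.password", "");
        return props;
    }
    // With a JPA provider on the classpath, the map would be handed over like this:
    // Persistence.createEntityManagerFactory("JPAExample", JpaSettings.inMemoryOverrides());
}
```

Settings passed this way add to, or override, whatever the persistence unit already declares, which is handy for pointing tests at a throwaway database.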

The Persistence Context

    An oft misunderstood concept, the Persistence Context is, at its simplest level, a cache layer between your application and the underlying database. So, when you manage an object via JPA and modify it, the change will not necessarily be written to the database immediately. Conversely, when you read a managed object, it may already be held in memory and thus no request to the database is needed. Overall, this provides significant performance improvements for database intensive applications.

    When using JPA, a Persistence Context is created when you instantiate an EntityManagerFactory. When working within a container (JBoss/Spring), this could be done for you behind the scenes. There are two ways that a Persistence Context can be handled: Transactional and Resource Local. Transactional Persistence Contexts are typically used in containers where the server application manages the Persistence Context. Resource Local contexts mean that the programmer is responsible for starting and stopping transactions and managing the state of entities (attaching/detaching). Each option has its place, depending on the application being developed. By default, JPA tends to use Resource Local Persistence Contexts when no other option is specified.

    Once an EntityManagerFactory, and thus a Persistence Context, has been created, you can start creating and managing entities. This is done using an EntityManager instance. The EntityManager can be thought of as somewhat analogous to a Connection in JDBC terms or a Session in Hibernate (but much more). You can use the EntityManager to query for information from the database, and you can use it to persist new data into the database. Beyond the basic capabilities, an EntityManager keeps track of all managed objects, and if they are changed within the Persistence Context it will persist those changes back to the storage system . . . Automatically! For example, if I use the EntityManager to pull a Person object from my database and subsequently make changes to that object, then as long as I am still within the boundaries of the Persistence Context, JPA will store those changes when the transaction commits, without my having to call a method to trigger it.
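The caching and automatic write-back described in this section can be modeled with a toy sketch in plain Java. To be clear, this is not how any JPA provider is implemented; ToyPerson, ToyContext, and the Map standing in for a database table are all invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a real JPA provider is far more involved than this.
final class ToyPerson {
    final int id;
    String name;
    ToyPerson(int id, String name) { this.id = id; this.name = name; }
}

final class ToyContext {
    private final Map<Integer, String> table;   // stand-in for a database table (id -> name)
    private final Map<Integer, ToyPerson> managed = new HashMap<Integer, ToyPerson>();
    int loads = 0;                               // counts simulated database reads

    ToyContext(Map<Integer, String> table) { this.table = table; }

    // Returns the managed instance, reading the "database" only on first access.
    ToyPerson find(int id) {
        ToyPerson p = managed.get(id);
        if (p == null) {
            String name = table.get(id);
            loads++;
            if (name == null) return null;       // no such row
            p = new ToyPerson(id, name);
            managed.put(id, p);
        }
        return p;
    }

    // Writes back every managed object whose state differs from the table,
    // and reports how many rows were updated.
    int commit() {
        int written = 0;
        for (ToyPerson p : managed.values()) {
            if (!p.name.equals(table.get(p.id))) {
                table.put(p.id, p.name);
                written++;
            }
        }
        return written;
    }
}
```

Calling find() twice for the same id returns the same object and touches the table once, and a changed name only reaches the table when commit() runs. That is the behaviour the Persistence Context gives you, minus all of the hard parts.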

JPA-QL, The JPA Query Language

    If you have had to use multiple database engines from time to time, then you are aware that not all SQL implementations are created equal. In many cases, this meant having to rewrite large amounts of code in order to move from one database server to another. With JPA-QL, that is no longer the case. JPA-QL is an abstracted layer over SQL. It still can have many of the complexities and flexibility of SQL, but it is platform agnostic, instead allowing the JPA implementation to use the correct SQL dialect under the hood. This allows an application to be written once, and by merely changing some of the JPA configuration options we can switch from one database platform to another.

    JPA queries can be much simpler than their SQL counterparts. In most cases, to access a list of results from a table, the programmer only has to write a short query like "SELECT p FROM Person p" (Hibernate also accepts the shorthand "From Person") and they will get all results from the table that the "Person" entity refers to. In addition, you can have a "WHERE" clause in the form of:
EntityManager mgr = myEntityManagerFactory.createEntityManager() ;

mgr.getTransaction().begin() ;
// Named parameters are bound with setParameter(); the typed createQuery() overload requires JPA 2.0
List<Person> peopleNamedDeven = mgr.createQuery("SELECT p FROM Person p WHERE p.forename = :foreName", Person.class).setParameter("foreName", "Deven").getResultList() ;
// Do something with these objects
mgr.getTransaction().commit() ;

Entity Relationships

    Database joins across multiple tables are a special case, and are handled in a completely intuitive manner within JPA. When you create an entity, you can use annotations to indicate that one entity is related to another entity. Once these Object Relational Mappings (ORM) are in place, referring to the associated getters is all it takes to perform a join and get the results. Here's an example:

Person.java
import java.io.Serializable;
import javax.persistence.*;

@Entity
@Table(name="people")
public class Person implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private int index = 0 ;

    private String name = null ;

    private int age = 0 ;

    // Getters and Setters omitted . . .
}

Company.java
import java.io.Serializable;
import java.util.List;
import javax.persistence.*;

@Entity
@Table(name="companies")
public class Company implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private int index = 0 ;

    private String name = null ;

    // JPA requires collection-valued fields to be declared by interface type (List, Set, ...)
    @OneToMany
    private List<Person> employees = null ;

    // Getters and Setters omitted . . .
}

JPALogic.java
import javax.persistence.EntityManager;
import javax.persistence.Persistence;

public class JPALogic {

    public void manipulateRelatedObjects() {
        // Create an EntityManager for the "JPAExample" persistence unit
        EntityManager mgr = Persistence.createEntityManagerFactory("JPAExample").createEntityManager() ;

        // Start a new transaction block
        mgr.getTransaction().begin() ;

        // Use a JPA-QL query to fetch an instance of the Company class
        Company aCompany = mgr.createQuery("SELECT c FROM Company c WHERE c.index = :pk", Company.class).setParameter("pk", 1).getSingleResult() ;

        // Use the getter from Company to retrieve a list of employees which are related to this company, and then grab the first item in the list
        Person anEmployee = aCompany.getEmployees().get(0) ;


        // Do something with aCompany and/or anEmployee


        // Commit the transaction
        mgr.getTransaction().commit() ;
    }
}

    Let's analyze the class file listings above. The first one, Person.java, is very straightforward. It is a simple entity to represent a person. I used the "@Table" annotation to specify the name which should be used for the table in the database. I could also do other things in that annotation, such as specify unique constraints. So, little about that class needs to be explained. Do take note, though, that entities need to implement the Serializable interface if detached instances will ever be passed by value between systems (for example, through a remote interface), so it is a good habit to implement it on all entities.

    The next class, Company, has some more interesting new annotations for us to understand. Most important is the "@OneToMany" annotation. This is used to specify that there is a relationship between this entity and the entity specified in the associated property, in this case a List of "Person" objects. Without any other annotations, a one-way @OneToMany relationship is stored by default in a separate join table which holds the primary keys of both entities. To store the relationship as a foreign key column in the "people" table instead, with a foreign key constraint requiring that the column reference a valid "index" in the "companies" table, you can add a @JoinColumn annotation and so better control how the relationship is created in the database schema:

    @OneToMany
    @JoinColumn(name="company_id", referencedColumnName="index", nullable=true)
    private List<Person> employees = null ;

This would cause the foreign key field in the "people" table to be named "company_id" and to reference the "index" column of the "companies" table. It also allows the column to be null, so a person does not have to belong to a company. When creating these sorts of relationships, the Hibernate implementation will also use as much information as it can to implement indexes and constraints which will improve database responsiveness.

    With the above classes, we implemented a uni-directional relationship. That means that you could grab an instance of a Company and use it to get the associated Person objects, but not vice-versa. In order to accomplish that, there is just one small change required.

Person.java
import java.io.Serializable;
import javax.persistence.*;

@Entity
@Table(name="people")
public class Person implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private int index = 0 ;

    private String name = null ;

    private int age = 0 ;

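    @ManyToOne(optional=true)
    @JoinColumn(name="company_id", nullable=true)
    private Company employer = null ;

    // Getters and Setters omitted . . .
}

We add the @ManyToOne annotation to the Person class, which makes the "employer" property the owning side of the relationship. In the Company class, the annotation on the "employees" property then becomes @OneToMany(mappedBy="employer"), which tells the JPA implementation that the relationship is already mapped by the "employer" property in the target class. It's just that simple!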

Inheritance And Polymorphism In JPA

    As object oriented programmers, we are all familiar with the concepts of inheritance and polymorphism. JPA includes these object oriented concepts in a very intuitive manner. An example would be keeping a table of animals in your database. Many animals share common traits, but some have unique attributes as well. So, we can define an entity called "Animal" which contains the common attributes of all animals. Once that entity is created, we could then extend that class with more specific entities like Cat or Dog or Bird, and then put specific attributes about those species into the subclass. By default, when you do this, JPA will create a single table for all animals, with columns to store attributes for all inheriting classes. You can reduce the number of columns by using @Column annotations so that compatible fields use the same column:

Bird.class
@Entity
public class Bird extends Animal implements Serializable {

    private static final long serialVersionUID = 1L;

    @Column(name="color")
    private String featherColor = null ;

    // Getters and Setters omitted . . .
}

Cat.class
@Entity
public class Cat extends Animal implements Serializable {

    private static final long serialVersionUID = 1L;

    @Column(name="color")
    private String furColor = null ;

    // Getters and Setters omitted . . .
}

In the Bird and Cat classes above, we have different properties of featherColor and furColor, but since they are compatible types, we can use the same column in the database and reduce the complexity. But, when we query using JPA and cast the result to one of these classes, the fields and accessors will be specific to that class.
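The idea of one shared table feeding several subclasses can be sketched without JPA. In the toy below, a row is a Map, the "DTYPE" key plays the role of the discriminator column (the default column name JPA uses for single-table inheritance), and the shared "color" column lands in a different field depending on the subclass. All class names here are hypothetical stand-ins for the entities above.

```java
import java.util.Map;

// Toy stand-ins for the Animal, Bird and Cat entities; no JPA involved.
abstract class ToyAnimal { String name; }
class ToyBird extends ToyAnimal { String featherColor; }
class ToyCat extends ToyAnimal { String furColor; }

final class SingleTableSketch {
    // Turns one row of the shared table into the subclass named by its discriminator.
    static ToyAnimal fromRow(Map<String, String> row) {
        String type = row.get("DTYPE");
        ToyAnimal animal;
        if ("Bird".equals(type)) {
            ToyBird bird = new ToyBird();
            bird.featherColor = row.get("color");   // shared column, bird-specific field
            animal = bird;
        } else if ("Cat".equals(type)) {
            ToyCat cat = new ToyCat();
            cat.furColor = row.get("color");        // same column, cat-specific field
            animal = cat;
        } else {
            throw new IllegalArgumentException("Unknown discriminator: " + type);
        }
        animal.name = row.get("name");
        return animal;
    }
}
```

A real provider does this mapping for you; the sketch is only meant to show why two differently named fields can happily share one column.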

Embedded Classes and Composite Keys

    JPA would not be the amazing leap ahead that it is without some of the innovations that it provides for reducing and reusing code. One such feature is the concept of "Embedded" classes and keys. Instead of creating a series of fields to store addresses for various types of entities (Companies, People, and Customers all have addresses, right?), we can instead create an @Embeddable class to make those address fields available to multiple entities without having to rewrite the same boilerplate code. See the examples below:

Address.class
@Embeddable
public class Address implements Serializable {

    private static final long serialVersionUID = 1L;

    private String streetAddr1 = null ;

    private String streetAddr2 = null ;

    private String city = null ;

    @Column(length=2)
    private String state = null ;

    @Column(length=24)
    private String postalCode = null ;
}

Company
@Entity
public class Company implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    private int index = 0 ;

    @Embedded
    private Address address = null ;

    // Getters and Setters omitted . . . 
}

As you can see in the above example code, we could embed an Address object into as many entities as we like and the appropriate fields would be added to your tables without any additional code.

    It is often the case, especially with legacy database schemas, that we do not have a single primary key field to use for accessing records. In my experience, I have seen tables which use up to 8 fields to describe a unique record!! Don't worry, JPA can handle this with @EmbeddedId annotations. An embedded id is an @Embeddable class which is added as a property using @EmbeddedId instead of @Embedded. That's the only difference between embedded classes and embedded Ids. An embedded Id is also re-usable, such that if you have a number of tables which use the same fields to achieve a unique key, you can just add the EmbeddedId class and move on... You could also use multiple @Id annotations (together with an @IdClass) to build a primary key, but I have generally found that to be far less readable than the @EmbeddedId syntax.
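One detail worth sketching is that a composite key class needs to be Serializable and to define equals() and hashCode() across all of its key fields, so that two keys with the same values are treated as the same record. The example below is a minimal plain-Java sketch; OrderLineKey and its two fields are hypothetical, and the JPA annotations are only mentioned in comments so that it compiles without a JPA jar.

```java
import java.io.Serializable;

// Hypothetical composite key. With a JPA provider on the classpath this class
// would be public and annotated @Embeddable, and the entity would hold it in a
// field annotated @EmbeddedId.
class OrderLineKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private int orderNumber;
    private int lineNumber;

    public OrderLineKey() { }   // key classes need a public no-argument constructor

    public OrderLineKey(int orderNumber, int lineNumber) {
        this.orderNumber = orderNumber;
        this.lineNumber = lineNumber;
    }

    public int getOrderNumber() { return orderNumber; }
    public int getLineNumber() { return lineNumber; }

    // Two keys are equal when every key field matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof OrderLineKey)) return false;
        OrderLineKey that = (OrderLineKey) other;
        return orderNumber == that.orderNumber && lineNumber == that.lineNumber;
    }

    // Must agree with equals(): equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * orderNumber + lineNumber;
    }
}
```

Looking a record up by key would then mean building the key object first and handing it to the EntityManager's find() method.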

Conclusion

    Hopefully you have found this basic introduction helpful, and hopefully JPA can assist you in being more productive with less effort, as all programmers are wont to be. If you have any questions, please post them and I will answer as I am able. Additionally, see below for the links to other references on JPA.

JPA Concepts
Schuchert's JPA Tutorial
The GlassFish Persistence FAQ


2010-11-18

The Trials And Tribulations Of Windows Terminal Servers (Part 2)

    In the first segment I wrote on Terminal Services I discussed Terminal Services roaming profiles, their difficulties and some best practices. In this article I want to cover the various performance improvements you can make on your network to improve the overall responsiveness of terminal services. Be aware that while most of these are from well tested documents, making changes to the Windows registry is never guaranteed to turn out well . . . You have been warned.

Performance Tuning For Terminal Services


    Let's begin with some simple tweaks which do not require the black arts of registry hacking. Log on to your terminal server(s) and open up the system properties control panel, either by right-clicking on "My Computer" and selecting "Properties" or by opening it up in the "Control Panel". Click on the "Advanced" tab. Click the "Settings" button which is under the "Performance" heading. Click on the "Advanced" tab in the new dialog which opens. For your terminal servers you should have the "Processor Scheduling" set to favor "Programs" and you should have the "Memory usage" set to favor "Programs" as well. This is telling Windows that the kernel should give resource priority to interactive applications instead of server services. In addition, you should create a swap file that is at least 4GB and if at all possible place the swap file on a different physical disk. If you're using a RAID array or a SAN, putting this on a different logical drive will not make much difference.

    In a Terminal Services environment, unless you only have a single terminal server, your terminal performance is very much tied to how the rest of your network servers are performing. With roaming profiles and redirected folders, if your file server is not at its peak, your users will let you know about it. So, I recommend following a wonderfully detailed document published by IBM about tuning Windows servers. The file can be downloaded HERE. Read through it carefully and try to apply what you know about how your network is organized and how your users function to determine the best settings for your network.

    When using Internet Explorer inside of a terminal session, you will notice in many cases that animations (Flash, GIFs, etc.) will cause the terminal server to work very hard on encoding the screen changes and transmitting them to the client. This can cause high CPU load and higher bandwidth use on your network. There are several Group Policy settings which you can use to prevent animations from taking up precious CPU resources, but I have found that using Mozilla Firefox (>=3.0) with the "FlashBlock" and "AdBlock" extensions works better than anything. Many legitimate web sites require the use of Flash, and the group policies for Internet Explorer are more like a fireman's axe than a surgeon's knife for eliminating unwanted content. By using FlashBlock you can disable Flash animations by default, but still allow them to run and load if the user needs them. Score one for Open Source!!

    One of the more annoying applications which users need access to tends to be Adobe Acrobat. Acrobat version 9, while better than previous versions in many ways, still tends to overuse the CPU of a terminal server. In addition, the Adobe applications tend to lock files in the "Application Data" folder and thus cause problems with roaming profiles. There are several ways to alleviate this issue. One way is to not use Adobe Acrobat, but instead a 3rd party PDF reader like Evince. It does not have as many features as Adobe's product, but it works far better on a terminal server.

    The biggest headache I have run across in a Terminal Services environment is the problem of printers. Terminal Servers do not deal well with printers, and I learned long ago that offloading the printing work to a print server helps. Beyond that, offloading the print drivers to a separate server seems to be the best practice. It is possible, with some print server configurations, to have the print server present all printers using a generic PostScript interface and driver. This ensures that only a single print driver is required on the Terminal Servers, and it is a very reliable and well tested driver. The specifics of how to build such a print server are beyond the scope of this document, but check the links below for suggestions on how to get started.

References:

CUPS Print Server



2010-10-26

The Trials And Tribulations Of Windows Terminal Servers (Part 1)

    My employer made the decision to move to thin client computing several years ago, and I wanted to share with people some of the lessons learned and best practices for using Windows Terminal Services. In addition, I want to let people know what will not work (as best I can tell). I've experienced a lot of frustrations, but as we have overcome problems we have documented and studied. Without further ado, here's what I know.



Terminal Services Profiles


    First off, Terminal Services Roaming Profiles suck. We use a load balancer to distribute our users across 12-14 terminal servers, and to make that work properly you have to use Terminal Services roaming profiles. They're simple to configure, but a basic configuration is rarely optimal.

    By default, everything that would normally be in your "Documents and Settings" folder on XP is stored in your profile, with the exception of the "Local Settings" hidden folder. When you log on to a terminal server, the profile is copied from a file server location on the network. When you log off, the profile is copied back to the file server location.

    Here are some of the problems with how roaming profiles work in terminal services. I'll discuss several concerns in detail over the next few paragraphs, and hopefully save you some of the pain and agony that we have gone through.

    First, without lots of tweaking, the user registry hive will not unload in time while logging off to be copied back to the file server location. This can be fixed using the User Profile Hive Cleanup Service, but that is not well documented anywhere. You must install the UPHC on every terminal server and reboot before it will work properly. It helps a lot, but is still not perfect. Sometimes hung applications cannot be cleaned up by UPHC and they will corrupt the user's registry hive; if this happens the user's profile has to be recreated from scratch (at least, I have not found another solution).

    By default, the "My Documents" and "Desktop" folders are also contained inside of the profile. This means that after some time of users saving files in those locations, their log on times grind slower and slower while waiting to copy all of their data. You can mitigate this by implementing folder redirection along with loopback policy processing. You'll learn very quickly in a terminal services environment that Group Policies are INDISPENSABLE. By placing your terminal servers into their own Organizational Unit within Active Directory, you can then apply group policies to that group of servers. Within the computer policy, if you enable "Loopback Policy Processing", you can apply user level policies which only take effect when a user logs on to a terminal server but not when they log on to a desktop. Our best results have been with redirecting folders to the user's home directory folder on the file server. We usually redirect the "My Documents" and the "Desktop" folder, but I recommend against redirecting the "Application Data" folder due to performance problems.



    In the next installment, I will discuss some of the performance tweaks you can implement and what they accomplish. Also, if you have comments or suggestions, feel free to comment below!!



Alive And Kicking . . .

I would first like to apologize for neglecting this blog for so long. There are literally hundreds of comments I never even knew about on the site, and I never responded because I had my notification settings wrong... Mea culpa!!

Anyhow, going forward this site should be updated more often as I have a new position which allows me the freedom to post on a regular basis. I promise that I will be more attentive to your comments and questions from now on and I hope that you continue to find the information on this site useful!!

Deven Phillips
